[HN Gopher] You-get: Dumb downloader that scrapes the web
___________________________________________________________________
You-get: Dumb downloader that scrapes the web
Author : Anon84
Score : 197 points
Date : 2024-10-27 12:45 UTC (10 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| politelemon wrote:
| It seems they do not want you to report an issue without an
| accompanying fix for it.
|
| > If you would like to report a problem you find when using you-
| get, please open a Pull Request, which should include [snip]
|
| Can't say I've encountered this before.
| kylecazar wrote:
| They want you to just submit a PR with a test that, if passed,
| would indicate the problem for you is fixed.
| thangngoc89 wrote:
| What happens if you don't know Python? Python is a relatively
| easy language to learn but no way I'm gonna learn Python just
| to report an issue
| Filligree wrote:
| Good chance you wouldn't be writing good bug reports
| either, then. Github issues have enough noise that a first-
| pass filter like this feels like a good idea, even if it
| has some false positives.
| papichulo2023 wrote:
| I fail to see the logic in your comment. Just another
| case of Goodhart's law.
| achierius wrote:
| This isn't really a metric though. It's a formal
| existence proof that the bug exists. The key difference
| IMO is that you have to create a test which A) looks (to
| the maintainer) like it should pass, while simultaneously
| B) not passing. It's much harder to game.
|
| There are other cases where Goodhart's law fails as well:
| consider quant firms, where the "metric" used to judge a
| trader is basically how much money you pull in. Seems to
| be working fine for them
| dartos wrote:
| If you can't describe your bug in a test, then you
| probably can't describe it sufficiently in English
| either.
|
| Seems to make sense
| latexr wrote:
| This in no way aligns with reality. I _frequently_
| interact with users who can't code at all but make good
| bug reports. One of the best ways to ensure success is to
| have a form (GitHub allows creating those) which describes
| exactly what is necessary and guides people in the right
| direction.
|
| What you're saying is even worse, since you're implying
| someone could be an expert computer programmer or power
| user, but because they're unfamiliar with the specific
| language this project chose, they are incapable of making
| good bug reports. That makes no sense.
| js8 wrote:
| The same thing that happens if the author of the OSS you
| use doesn't know English.
| dartos wrote:
| Then you don't get to contribute bug reports.
|
| Perfectly fine rule for a maintainer to have.
| dotancohen wrote:
| If the bug is egregious enough, somebody else will find it.
| If the bug is important enough to you but esoteric, then
| ask on a forum or enlist the help of someone you know who
| does know Python.
|
| How do you currently submit bug reports on e.g. MS Word or
| Adobe Photoshop? This way is certainly more open than those
| commonly-deployed software.
| epcoa wrote:
| Did you (or anyone) in this thread look to see exactly what
| they are looking for with their provided examples?
|
| https://github.com/soimort/you-
| get/pull/2680/commits/313b8d2...
|
| You do not need to know Python deeply to construct what
| they are expecting. They're not actually looking for a unit
| test or something.
| latexr wrote:
| > Did you (or anyone) in this thread look to see exactly
| what they are looking for with their provided examples?
|
| I did. And I looked at all examples of "good commits",
| not just the trivial ones.
|
| https://github.com/soimort/you-get/pull/2685/files
|
| That's already complex for someone unfamiliar with the
| software (which might nonetheless be able to open a
| competent bug report).
| nunez wrote:
| That's exactly it. They put up a gate that blocks low-
| effort issues that only add busywork. I like it!
| sigseg1v wrote:
| I kind of like this. It's a more formal proof of concept. You
| prove the bug exists by writing a failing test. If they
| cannot construct a failing test then it's either too hard to
| mock or reproduce (and therefore maybe not even worth fixing,
| for a free tool), or it's impossible because it's not a bug.
| Frees up maintainer time from dealing with reports that
| aren't bugs.
| latexr wrote:
| > If they cannot construct a failing test then it's either
| too hard to mock or reproduce (...), or it's impossible
| because it's not a bug.
|
| Or, you know, the user is not a developer. Or is unfamiliar
| with Python, or their test suite, or git, or...
|
| It is perfectly possible to be good at reporting bugs but
| be incapable of submitting pull requests.
| newaccount74 wrote:
| The problem with popular tools is that they have more
| bugs than can be fixed. So bug reports are pretty much
| worthless: You know that there are 1000 bugs out there,
| but you only have resources to fix 10 of them.
|
| By asking users to provide reproducible test cases, you
| can massively reduce the amount of work you have to do.
| Of course that means 90% of bugs will never be reported.
| But since you don't have the resources to fix them
| anyway, why not just focus on the bugs that can be
| reproduced and come with a test case...
| onionisafruit wrote:
| Interesting. I like the idea of encouraging people to try
| creating a test or even a whole fix, but saying that's all you
| will accept is a bit much. On the other hand, I'm not doing the
| work to maintain you-get. I don't know what they deal with.
| This may be an effective way to filter a flood of repetitive
| issues from people who don't know how to run a command line
| program.
| probably_wrong wrote:
| I believe there are two extremes. On one end you get a bunch
| of repetitive non-issues, while on the other end you only get
| issues about (say) bugs in FreeBSD 13.3 because only hard-
| core users have the skills and patience to follow THE
| PROCESS.
|
| I know how to make an isolated virtual environment, install
| the package, make a fork, create a test and make a PR. But I
| don't know whether I care enough about a random project to
| actually do it.
| wccrawford wrote:
| As the other commenter said, they want a failing test, not a
| fix.
|
| > A detailed description of the encountered problem;
| > At least one commit, addressing the problem through some unit
| > test(s). Examples of good commits: #2675, #2680, #2685
|
| "Addressing" is probably a bad word to use here.
| "Demonstrating" would have been better, IMO.
| tylerchilds wrote:
| the most expensive piece of writing software is scoping work.
|
| i'm almost tempted to add a test suite just to give people
| more agency over my output because right now i'm only
| soliciting feedback in person to cut down on internet
| bullshit, like what happened to xz-utils
| thih9 wrote:
| It's relatively easy to write a failing test and it massively
| cuts down the work related to moderating issues. Also, reduces
| the danger of github issues turning into a support forum.
|
| If this results in the project being easier to maintain and
| being maintained longer, then I'm fine with this.
| seneca wrote:
| > It's relatively easy to write a failing test and it
| massively cuts down the work related to moderating issues.
|
| Relative to what? Learning someone else's code base well
| enough to write a useful test is not trivial.
|
| It's not a bad method, but the vast majority of users won't
| be capable of writing a test that encapsulates their issue.
| chucksmash wrote:
| In the case of this tool, adding a failing test case looks
| trivial if you've got the URL of a page it fails on.
|
| Provided the maintainer is willing to provide some minimal
| guidance to issue reporters who lack the necessary know-
| how, it even seems like a clever back door way of helping
| people learn to contribute to open source.
| zufallsheld wrote:
| Serverspec does the same:
| https://github.com/mizzy/serverspec?tab=readme-ov-file#maint...
| omoikane wrote:
| The Chinese version of the text has an extra header line that
| translates to "to prevent abuse via GitHub Issues, we are not
| accepting general issues". An earlier commit has this for the
| English text:
|
| > `you-get` is currently experimenting with an aggressive
| > approach to handling issues. Namely, a bug report must be
| > addressed with some code via a pull request.
|
| https://github.com/soimort/you-get/commit/75b44b83826b3c2d9a...
|
| Maybe they got too much spam.
|
| By the way, `tests/test.py` seems to just run the extractors
| against various websites directly. I can't find where it's
| mocking out network requests and replies. Maybe this is to
| simplify the process for people creating pull requests?
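|
| For comparison, this is roughly what mocking out the network
| could look like with the standard library. It is only a sketch
| and not how you-get's tests are written: `fetch_title` and
| `fetch_title_offline` are hypothetical helpers, and the canned
| HTML is invented.

```python
import io
import re
import urllib.request
from unittest import mock


def fetch_title(url):
    """Hypothetical extractor helper: fetch a page, return its <title>."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8")
    match = re.search(r"<title>(.*?)</title>", html, re.S)
    return match.group(1).strip() if match else None


def fetch_title_offline(url, canned_html):
    """Call fetch_title with urlopen replaced by a canned response."""
    fake_response = io.BytesIO(canned_html.encode("utf-8"))
    # BytesIO supports read() and the with-statement, which is all
    # fetch_title needs from a response object.
    with mock.patch("urllib.request.urlopen", return_value=fake_response):
        return fetch_title(url)
```

| The trade-off is visible here: a mocked test is fast and
| deterministic, but it cannot notice when the real site changes
| its markup, which is the usual way these extractors break.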
| godelski wrote:
| I can get this, but I aggressively report accounts and
| issues. I'm not sure how GitHub handles them but they seem to
| not come back.
|
| Though what I'm unsure how to deal with is legitimate users
| being idiotic. For example, recently one issue was opened
| that asked where the source code was. Not only was there a
| directory named "src" but there were some links in the readme
| to specific parts. While I do appreciate GitHub and places
| like hugging face [0], there are a lot of very aggressive and
| demanding noobs.
|
| I'd like ways to handle them better.... I'm tired of people
| yelling at me because 5 year old research code no longer
| works out of the box or because you've never touched code
| before.
|
| [0] check any hugging face issue and you'll see far more
| spam. Same accounts will open multiple issues that just
| berate owners and hugging face makes it difficult to report
| these accounts.
| throwaway314155 wrote:
| The solution is to ignore them and close their issue. Open
| source maintainers have enough to worry about and are
| unpaid, it's okay to be a little dictatorial when it comes
| to "bad questions".
| KTibow wrote:
| Can someone explain why this is better than yt-dlp
| uniqueuid wrote:
| That's an interesting question. They only depend on a single
| library, but I wonder how much code is really their own. I
| found it curious, for example, that there is a dedicated mp4
| joiner (I mean, if you already have ffmpeg, there is probably
| no way you can do it better yourself).
|
| https://github.com/soimort/you-get/blob/develop/src/you_get/...
| grugagag wrote:
| How did you infer better than yt-dlp? I think the more the
| better when it comes to this space as google fights back.
| xg15 wrote:
| But some information on what the differences from yt-dlp are,
| and what the reasons for starting an entirely new project
| were, would still be helpful.
|
| (Also, a multitude of tools isn't really all that helpful if
| they all stop working in the same instant because they all
| relied on the same APIs etc)
| vanjajaja1 wrote:
| > Search on Google Videos and download
| >
| > $ you-get "Richard Stallman eats"
|
| I don't often read instruction manuals, but this time I did and I
| found this gross easter egg
| dotancohen wrote:
| Can it back up a text webpage? Can it remove popups for
| newsletters, or subscription, or logins, or cookies'
| notifications? Can it read pages that require signing in?
| demberto wrote:
| this different from JDownloader2?
| tcsenpai wrote:
| I like this. I am imagining a companion extension for chrome/ff
| that uses you-get as a backend to implement it in a seamless way.
| Forward thinking idea: imagine going on youtube and have you-get
| extension bypass the youtube player and playing the content
| directly without ads. When I say youtube I might also say any
| other platform.
| mikojan wrote:
| Sounds like FastStream Video Player
|
| https://addons.mozilla.org/en-US/firefox/addon/faststream/?u...
| xg15 wrote:
| I wouldn't exactly call a ytdl-style media downloader with a
| whole library of site-specific extractors and converters "dumb"
| but still cool that more projects like ytdl exist.
| andai wrote:
| For a while I had expensive internet and low bandwidth, but I
| loved listening to music and lectures on YouTube. At some point I
| realized that getting only the audio stream would save me 90% in
| bandwidth costs. [0]
|
| youtube-dl (and yt-dlp) has a flag, I believe -G, which gives you
| the URL(s) for the requested format/quality. I used the command
| line on my computer and put the link in VLC. On my phone I had
| this elaborate workaround involving downloading the file to my
| VPS first over SSH, then downloading it to my phone, until I
| realized my phone browser can consume the URL directly, so I set
| up a PHP frontend for `youtube-dl -G -f bestaudio {url}`
|
| It's no longer online and I lost the code, but it was like one
| line of code.
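|
| The original was PHP and is lost, so this is only a Python
| sketch of the same idea, not a reconstruction. It uses the
| long-form `--get-url` option rather than a short flag, and
| `build_command` and `direct_audio_url` are names invented
| here. It assumes yt-dlp (or youtube-dl) is on the PATH.

```python
import subprocess
from urllib.parse import urlparse


def build_command(video_url, tool="yt-dlp"):
    """Build the argv that asks the tool for the direct audio URL."""
    parsed = urlparse(video_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"refusing non-web URL: {video_url!r}")
    # "--" ends option parsing, so user input is never read as a flag.
    return [tool, "--get-url", "-f", "bestaudio", "--", video_url]


def direct_audio_url(video_url, tool="yt-dlp"):
    """Run the tool and return what it prints (needs network access)."""
    result = subprocess.run(
        build_command(video_url, tool),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

| Passing an argument list instead of a shell string, plus the
| scheme check, is the minimum I would want before exposing
| something like this to arbitrary user input.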
|
| I mention this because you-get seems to support the same usecase
| (via --url / -u), so I wanted to let people know how useful this
| is!
|
| (While it was online I shared it on some forums and got very
| positive feedback, people used it for audiobooks etc.)
|
| [0] Also playing with screen off saves 80% battery life! YouTube
| knows these facts and that's why they made background playback
| (which fetches only audio stream) a paid feature...
| 01HNNWZ0MV43FF wrote:
| I think it's -x to just rip audio now
| TechDebtDevin wrote:
| Brave Mobile browser allows turning on background video audio
| thus eliminating the need for YouTube Premium and similar
| subscriptions.
| l3x4ur1n wrote:
| I don't know why your comment is downvoted because I use this
| feature of Brave very often and I also exclusively watch YT
| in Brave mobile (no ads).
| gaudystead wrote:
| For me, it was as easy as adding a shortcut to the YouTube
| homepage on Brave that it basically acts like the YouTube
| app, but with ad blocking built in. It's the only way I
| watch YT videos on mobile.
| icar wrote:
| You might be interested in GrayJay app.
| TechDebtDevin wrote:
| There are a lot of people that don't like Brave's business
| model. But I've never given Brave a dime and turn off their
| ad network stuff and they've saved me hundreds of dollars
| on Youtube Premium over the years.
| cocok wrote:
| For Firefox:
|
| https://github.com/mozilla/video-bg-play
| ww520 wrote:
| That's the -F option to list all the formats, including the
| audio streams. Pick the audio format with -f to download the
| audio. I usually pick the .m4a format and then run it through
| ffmpeg to convert to mp3.
| KMnO4 wrote:
| What's the point of converting it to mp3? AAC inside an m4a
| container usually has better sound quality than similarly
| compressed mp3, and definitely better than reencoding.
| userbinator wrote:
| MP3 is accepted by far more players.
| krick wrote:
| That's a really unnecessarily complicated workflow you have.
| It's achievable by yt-dlp with just 3 flags:
|
| --extract-audio
|
| --format bestaudio
|
| --audio-format mp3
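|
| If you drive this from a script, the three flags map onto an
| argument list like the sketch below. The function names and
| the `ALLOWED_FORMATS` set are made up for illustration (the
| set is a subset I am confident about, not the full list), and
| the conversion step needs ffmpeg installed.

```python
import subprocess

# Illustrative subset of values for --audio-format, not exhaustive.
ALLOWED_FORMATS = {"best", "aac", "flac", "m4a", "mp3", "opus", "vorbis", "wav"}


def audio_download_command(video_url, audio_format="mp3"):
    """Build the yt-dlp argv for an audio-only download."""
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format!r}")
    return [
        "yt-dlp",
        "--extract-audio",
        "--format", "bestaudio",
        "--audio-format", audio_format,
        "--", video_url,
    ]


def download_audio(video_url, audio_format="mp3"):
    """Run the download (needs yt-dlp, ffmpeg and network access)."""
    subprocess.run(audio_download_command(video_url, audio_format), check=True)
```

| Making the format a parameter covers people who would rather
| keep m4a or go to opus instead of mp3.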
| knowitnone wrote:
| you're unnecessarily making huge assumptions. Some people
| don't want the bestaudio or mp3
| krick wrote:
| If I would make any assumptions, I would post another 30
| options from my config that are nice to have when you
| download audio from youtube. These 3 are exactly
| equivalent to what gp does.
| andai wrote:
| Same but I converted to Opus, because I was trying to squeeze
| it into as little bandwidth as possible. It was mostly speech
| content and Opus auto detects and optimizes for speech at low
| bitrates.
| Synaesthesia wrote:
| BTW if you browse YouTube with Firefox browser on Android you
| can play back YouTube videos with the screen locked using
| background player fix extension.
| 6yyyyyy wrote:
| NewPipe can do this very nicely, it even lets you build a
| playlist of videos.
| wutwutwat wrote:
| A service that takes arbitrary user input and then attempts to
| download/proxy whatever is at the end of that input. Brave
| soul.
| khimaros wrote:
| on Android YTDLnis solves this very nicely. simply share the
| video URL to the app and it can download whichever format you
| like https://github.com/deniscerri/ytdlnis
| cquintana92 wrote:
| One of my last weekend projects was something similar: convert
| youtube playlists into podcast-compatible URLs:
|
| https://github.com/cquintana92/yt2pc
| dredmorbius wrote:
| mpv similarly has this option. I _listen_ to far more videos
| than I _watch_.
|
| <https://mpv.io/>
| MattDaEskimo wrote:
| Another library released which lies about what it is to
| circumvent anti-bot security.
|
| Let's just not act surprised when tighter attestation comes
| into effect.
| ajsnigrutin wrote:
| This library/program solves problems that people have with
| pages like youtube... too many ads, no way to download videos
| for offline use (or archive for when they get removed), and
| better performance with a native player.
|
| If I was forced to watch all the ads on youtube, I wouldn't
| watch videos there at all.
| therein wrote:
| A future in which YouTube will refuse to stream you data
| because you didn't pass client attestation is definitely coming
| and I wish we could stop it.
|
| It is a dark future where some of us will accept it, and the
| rest of us will be constantly taking part in a cat-mouse chase in
| which we glitch out attestation tokens from vulnerable devices
| to get by.
| userbinator wrote:
| We need laws against user-agent discrimination.
| troupo wrote:
| I used to "save" interesting links by emailing them to myself.
|
| Now most of them are dead, twitter accounts removed, youtube
| videos deleted, facebook pages bought by media management
| companies, sites rebuilt etc.
|
| Whatever the primary goal of this tool, it, and other similar
| tools, are invaluable in actually saving and preserving content
| krick wrote:
| Given the title and the first few sentences from a description I
| assumed that it's some heuristic-based tool to try and grab
| whatever there is on the page, which would be useful if there's
| no tool which implemented the support for this site (which in
| most cases just means "yt-dlp doesn't support it"). But
| apparently it's also extractor-based with a separate extractor
| for each somewhat-popular source. So, basically it's just less
| sophisticated clone of yt-dlp?
| jdthedisciple wrote:
| Anybody else getting this error constantly?
|
|     you-get: [error] oops, something went wrong.
|     you-get: don't panic, c'est la vie. please try the following steps:
|     you-get:   (1) Rule out any network problem.
|     you-get:   (2) Make sure you-get is up-to-date.
|     you-get:   (3) Check if the issue is already known, on
|     you-get:         https://github.com/soimort/you-get/wiki/Known-Bugs
|     you-get:         https://github.com/soimort/you-get/issues
|     you-get:   (4) Run the command with '--debug' option,
|     you-get:       and report this issue with the full output.
|
| Tried with debug flag but didn't really help
|
|     pattern = str(pattern, 'latin1')
|               ^^^^^^^^^^^^^^^^^^^^^^
|     TypeError: decoding to str: need a bytes-like object, NoneType found
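|
| The traceback fragment does not show where the None came from,
| but the error itself is easy to reproduce in isolation. In the
| sketch below, `decode_latin1` just mirrors the failing line
| and `safe_decode_latin1` is a hypothetical guard; neither is
| taken from you-get.

```python
def decode_latin1(raw):
    """Mirror of the failing line: decode bytes as latin1."""
    return str(raw, 'latin1')


def safe_decode_latin1(raw):
    """Same decode, but fail early with a clearer message on None."""
    if raw is None:
        raise ValueError("expected bytes but got None; "
                         "an earlier lookup probably found nothing")
    return str(raw, 'latin1')


def reproduce():
    """Return the TypeError message produced by decoding None."""
    try:
        decode_latin1(None)
    except TypeError as exc:
        return str(exc)
    return None
```

| So the decode call is only where the problem surfaces; the
| thing to chase is whichever earlier step handed it None.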
|
| I was curious to see if it can bypass age restriction (though I
| tried on non-age-restricted video too with the same error).
| natch wrote:
| Is this just a fork of yt-dlp with credits rewritten?
| fnoobnar wrote:
| I'm not sure I understand why Bandcamp is on the list of
| supported sites: they allow you to just download the files on the
| condition you first pay the artist for them.
|
| The fact you can download it with this tool is because the artist
| is letting you listen to it for free before buying it.
| Downloading it with this tool seems totally unnecessary and a bit
| of a jerk move. Bandcamp hosts mostly small and independent
| artists and labels.
| khaki54 wrote:
| I presume you could subscribe and still use this tool? People
| use automation tools like this to download things that they
| already pay for because it saves them the effort of logging
| into 5 different apps depending on which walled garden it's in.
| hluska wrote:
| Do artists get paid on Bandcamp if they bypass the login?
| lovethevoid wrote:
| Their list of supported sites isn't a declaration of where you
| should use this tool for moralistic reasons. It's just a list
| of popular sites it works on.
| wanderingmind wrote:
| Nice work. But as a consumer, why should I use you-get over yt-
| dlp? What are its strengths over yt-dlp, which works quite well
| on a huge range of websites? [1]
|
| [1] https://github.com/yt-dlp/yt-
| dlp/blob/master/supportedsites....
___________________________________________________________________
(page generated 2024-10-27 23:00 UTC)