[HN Gopher] YouTube-dl has an interpreter for a subset of JavaSc...
___________________________________________________________________
YouTube-dl has an interpreter for a subset of JavaScript in 870
lines of Python
Author : yuuta
Score : 304 points
Date : 2022-09-10 18:12 UTC (4 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| lolinder wrote:
| To be clear, this is an _extremely_ tiny subset of JS. It looks
| like they only implemented the features needed to run a very
| specific function. For example, the only symbol allowed after
| "new" is "Date", everything else throws an exception.
|
| It's still fun that it's there, but it's not as big a deal as it
| sounds from the tweet.
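|
| A minimal sketch of what such a restriction can look like. This
| is not the youtube-dl code: `eval_new` and `JSInterpError` are
| hypothetical names, and treating the single argument as
| milliseconds since the epoch is an assumption made for the demo.

```python
import re
from datetime import datetime, timezone


class JSInterpError(Exception):
    """Raised when the toy evaluator meets syntax it does not support."""


# Matches expressions of the form: new Name(args)
_NEW_RE = re.compile(r'new\s+(?P<name>[A-Za-z_$][\w$]*)\((?P<args>[^)]*)\)$')


def eval_new(expr):
    """Evaluate a `new X(...)` expression, supporting only `new Date(ms)`."""
    m = _NEW_RE.match(expr.strip())
    if m is None:
        raise JSInterpError('not a "new" expression: %r' % expr)
    if m.group('name') != 'Date':
        # Every constructor other than Date is rejected outright.
        raise JSInterpError('unsupported constructor: %s' % m.group('name'))
    args = m.group('args').strip()
    if not args:
        return datetime.now(timezone.utc)
    # Assumption: a single numeric argument is milliseconds since the epoch.
    return datetime.fromtimestamp(int(args) / 1000, tz=timezone.utc)


print(eval_new('new Date(0)').isoformat())
try:
    eval_new('new Map()')
except JSInterpError as exc:
    print(exc)
```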
| krab wrote:
| It will only grow - as new scripts will need to be interpreted,
| new features will be added.
| lolinder wrote:
| I would be horrified if this grew much further. It's
| perfectly fine for its current scope, but the architecture
| would not scale at all to a full interpreter without
| essentially starting from scratch.
| em-bee wrote:
| if it's going to need much more than that then it probably
| would make more sense to port the whole application to
| javascript instead.
|
| but then this could be turned into a commandline browser that
| is able to interpret a whole web-page and save the resulting
| html structure instead of the source as curl/wget would do.
| mdaniel wrote:
| I was expecting this to be about Duktape
| <https://github.com/svaarala/duktape>, but heh, for sure no. I'd
| bet $1 there's no way youtube-dl would switch, but I wonder if
| yt-dlp would?
| kristopolous wrote:
| To understand why, I have a far simpler tool that focuses on a
| subset of sites (adult content video aggregators)
|
| https://github.com/kristopolous/tube-get
|
| It too deals with this problem but does so in a way that'd be
| easy to maliciously sabotage
|
| Look right about here
| https://github.com/kristopolous/tube-get/blob/master/tube-ge...
|
| As to why this program exists, this was originally written
| between about 2010 and 2015 or so and technically predates the
| yt-* ecosystem.
|
| The tool still works fine, and it's not a strict subset of
| yt-dlp or YouTube-dl because it takes a different approach.
| Although its overall site coverage is smaller, I've used it as
| a "second try" system when yt-* fails, and it comes up with
| success maybe about half the time.
| [deleted]
| homarp wrote:
| the tests for it:
| https://github.com/ytdl-org/youtube-dl/blob/master/test/test...
| lewisl9029 wrote:
| Another really cool JS dialect I recently learned about is njs
| from the nginx team: https://github.com/nginx/njs
|
| This video goes into some of the design and tradeoffs:
| https://www.youtube.com/watch?v=Jc_L6UffFOs
|
| TL;DW: they optimized for fast creation/destruction of low-
| footprint VMs with no JIT or garbage collection.
| M30 wrote:
| How should a programming noob interpret this? Be impressed at
| what was achieved here? Be concerned about security implications
| using the tool? Something else entirely?
| Test0129 wrote:
| > How should a programming noob interpret this?
|
| Usually in a virtual machine.
| smcl wrote:
| All of the above, really.
| tenebrisalietum wrote:
| > How should a programming noob interpret this?
|
| The browser is client-facing and everything there is possible
| to reverse engineer and figure out. So if you design a web-
| based application, and are depending on client-side Javascript
| for any security or distribution enforcement, it can be
| helpful, but can ultimately be unwound and cracked even if
| obfuscated, etc.
|
| > Be impressed at what was achieved here?
|
| Yes. Try to download a YouTube video without it or an online
| service which is probably using it internally.
| rkangel wrote:
| This is the compiler writer equivalent of parsing HTML with
| regex:
|
| It is technically wrong - it isn't a sufficiently rich and
| powerful approach to handle all JS (HTML) that you might throw
| at it. It'll work for a while until it eventually barfs when
| you least expect it.
|
| EXCEPT that if the inputs you are giving it come from some
| understood source(s) that aren't likely to change, then a
| simpler approach than the "all singing, all dancing" correct
| one may be appropriate and justified, e.g. because it might be
| easier to write, easier to maintain and/or have less attack
| surface.
| lolinder wrote:
| It's an extremely tiny subset of JS--as an example, the only
| object that can be instantiated is Date. Anything other than
| "Date" after "new" throws an exception.
|
| It's definitely neat, but not especially useful outside of the
| confines of its current application, and the security concerns
| of such a tiny subset will be minimal.
| petters wrote:
| > Anything other than "Date" after "new" throws an exception
|
| It's even very sensitive to white space.
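|
| To illustrate the kind of whitespace sensitivity a regex-driven
| evaluator can have, here is a sketch with two made-up patterns
| (neither is taken from youtube-dl): a strict one that expects
| exactly one space after `new`, and a tolerant one that accepts
| arbitrary whitespace between tokens.

```python
import re

# Hypothetical strict pattern: one space after "new", no space before "(".
STRICT = re.compile(r'new Date\((\d*)\)')
# A more tolerant variant that accepts arbitrary whitespace between tokens.
TOLERANT = re.compile(r'new\s+Date\s*\(\s*(\d*)\s*\)')

samples = ['new Date(0)', 'new  Date(0)', 'new Date (0)', 'new Date( 0 )']
for src in samples:
    # Columns: source text, strict match?, tolerant match?
    print(repr(src), bool(STRICT.fullmatch(src)), bool(TOLERANT.fullmatch(src)))
```

| All four samples are the same expression to a real JS parser;
| only the first one satisfies the strict pattern.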
| bjt2n3904 wrote:
| The goal of youtube-dl is to download a video off of YouTube
| for offline storage.
|
| This isn't something YouTube particularly enjoys. They would
| rather you keep coming back -- every visit is more ad revenue
| for them. If you have an offline copy, you don't need to visit
| YouTube anymore.
|
| YouTube has an incentive, therefore, to make it more difficult
| to download (or "scrape") their content.
|
| I'm not particularly sure of the specific details, but
| apparently YouTube has added JavaScript (a programming language
| that executes in the browser) as a hurdle to jump over. A
| simple python script doesn't have enough brains to execute
| JavaScript, only enough to realize that it exists. (Clearly,
| youtube-dl is sophisticated enough to have jumped over it.)
|
| These are the conclusions I come to, having written software
| for about a decade.
|
| 1) Once you give information to someone, be it text, pictures,
| sound, or video -- they will do whatever they want with it, and
| you have no control. Oh, yes -- it may be illegal. Maybe
| unethical. But the fact of the matter is you do not have
| control over information once it leaves your hands.
|
| 2) Adding hurdles to make it harder to access the information
| does little to stop someone who is dedicated to accessing it.
|
| 3) Implementing a subset of JavaScript in such an elegant and
| tiny manner is quite impressive.
|
| How you interpret these facts depends on your worldviews. If
| you are a media and content creator, you will view these facts
| differently than a politician, and a teenager.
|
| As an engineer and amateur philosopher, I certainly support the
| rights of content creators to be paid for their work. And yet,
| I fear that more and more, content creators want to lease me a
| right to listen to their music, instead of letting me own a
| copy of it.
|
| I used to own CDs, DVDs, movies, and books. What happens if
| Amazon or YouTube decides to not serve me anymore? Anything
| I've "purchased" from them, I lose access to.
|
| Furthermore, if I create a song, I used to be able to burn
| copies onto CDs and distribute them on the street corners. Now, you
| have to sign up to stream on Spotify. This is a double edged
| sword -- I get a wide audience, but Spotify will do whatever
| they want with me.
|
| This troubles me.
| jraph wrote:
| I do wonder why YouTube does not try harder to make it difficult
| to do this computation meant to prove you are a legit YouTube web
| client. Providing an easy-to-find, simple JS function
| interpretable with 900 lines of Python is like they don't try at
| all. They might as well do nothing.
|
| Or is their goal just to make youtube-dl not 100% reliable? Or to
| be able to say "look, you are running our code in a way we did
| not intend, you can't do this because you are breaking the EULA"?
| Cthulhu_ wrote:
| I'm guessing the amount of people using it is low enough to not
| bother with mitigation. Then again, there's a LOT of YT videos
| that take clips from other videos (which in most cases falls
| under fair use), which I can imagine would use this tool.
| Arnavion wrote:
| They do make it harder from time to time. In fact yt-dlp's
| interpreter has been broken for a month or so now and the devs
| finally gave up and told users to just install PhantomJS (which
| itself hasn't been updated since 2016 and probably has bugs /
| vulns of its own, but whatever).
|
| https://github.com/yt-dlp/yt-dlp/issues/4635#issuecomment-12...
| zuminator wrote:
| I'd guess that their efforts to make it harder are limited by
| the fact that they want YouTube to be able to play on thousands
| of different low powered set top boxes and cheap phones. So
| whatever obfuscated code they use has to be simple enough to be
| run and periodically updated by all these different devices,
| and that same simplicity makes it emulable.
| rcarmo wrote:
| Awesome. Even if it's likely incomplete, it might come in really
| handy for some scraping I need to do...
| Uptrenda wrote:
| Anyone who has ever pulled a website from a script knows the pain
| that is Javascript. Normally you want to just get some text and
| work out the API actions but a lot of sites use horribly
| obfuscated Javascript -- either because that's what modern web
| development is (lolz) -- or because it's part of their 'security.'
| That means if you want to write browser-based bots properly --
| you ought to use a browser. There are special browsers that run
| 'headlessly' or are designed mostly for bot use. Like
| https://www.selenium.dev/ which plugs into a few different
| 'browser engines.'
|
| But now you have another problem. Your simple script goes from
| being a small, simple, self-contained, and elegant gem, to
| requiring a full browser, specialized drivers, and/or daemons
| running just to work. If you're using something like Python you
| just frankly don't have very good packaging. So it's hard to
| string together all that into a solution and have it magically
| work for everyone. What YouTube-dl have done is good engineering.
| Even though it's not a full JS interpreter: they've kept their
| software lean, self-contained, and easier to use.
| eurasiantiger wrote:
| Just npm install puppeteer.
| lolinder wrote:
| Puppeteer is cool, but it's exactly what OP is warning
| against: it's a full browser that is downloaded and run
| through npm. It's remarkably well packaged, but still far
| more error prone than a simple HTTP request, and far more
| likely to break on its own just with the passage of time.
| haunter wrote:
| The same in yt-dlp
| https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/jsinterp...
|
| Interesting to see the diffcheck between the two
| https://www.diffchecker.com/8EJGN27K
| cheschire wrote:
| Is yt-dlp's implementation being better the reason why I have
| fewer throttling issues than with youtube-dl?
| [deleted]
| [deleted]
| anony23 wrote:
| What purpose does it serve?
| [deleted]
| throwaway0984 wrote:
| IIRC it's used to extract/generate the signatures needed for
| YouTube media URLs
| oynqr wrote:
| You need to run some obscured JS to get decent download speeds
| from Youtube. Something along the lines of PoW.
| db48x wrote:
| It's not like proof of work at all. It's just a challenge and
| response; youtube includes a random number in the webpage for
| each video, and expects to see a request parameter with a
| particular value calculated from that random number when you
| request the video. If you don't do the arithmetic it
| throttles you to 50kb/s.
|
| Since the calculation of the response is done in JS, and they
| occasionally change the formula, some download programs are
| moving towards running the JS rather than trying to keep up
| with the changes.
|
| It's really just bullshit to make people's lives harder.
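|
| The challenge and response described above can be sketched as
| follows. Everything here is a stand-in: `transform` is an
| arbitrary placeholder for the formula that lives in the JS, and
| `issue_challenge` and `serve` are hypothetical names, not
| YouTube's or youtube-dl's.

```python
import secrets

FULL_SPEED = 'full speed'
THROTTLED = 'throttled'


def transform(n):
    """Placeholder for the JS formula; XOR with a nonzero constant always changes n."""
    return n ^ 0x5A5A5A


def issue_challenge():
    """Server side: pick the random number that gets embedded in the page."""
    return secrets.randbelow(2 ** 32)


def serve(challenge, response):
    """Server side: throttle unless the response matches the expected value."""
    return FULL_SPEED if response == transform(challenge) else THROTTLED


n = issue_challenge()
print(serve(n, transform(n)))  # the client did the arithmetic
print(serve(n, n))             # the client sent the number back unchanged
```

| If the server swaps `transform` for a new formula, a client with
| a hand-written copy of the old one starts getting throttled,
| which is the incentive to run the JS instead.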
| xg15 wrote:
| Next step will probably be moving the calculation to
| webassembly or requiring the script to fetch the result via
| websocket or webrtc...
| mistrial9 wrote:
| .. pirate determination is a thing to behold, as is crazed-
| repetitive digital grabs.. It's not a fair or accurate
| characterization to dismiss it as "making people's lives
| harder" .. it is remarkable that the Debian distros now
| include ytdl; let's do what is reasonable to make it
| continue
| db48x wrote:
| You can't exactly pirate a youtube video, since they're
| all publicly available.
| MiguelX413 wrote:
| That's not really how piracy works. I say this as an
| advocate of it.
| rany_ wrote:
| They need to run a JavaScript function to download YouTube
| videos at normal speeds.
|
| Edit: it's also required to download music, otherwise it will
| just fail
|
| Source:
|
| - https://github.com/ytdl-org/youtube-dl/issues/29326#issuecom...
|
| - https://github.com/ytdl-org/youtube-dl/blob/d619dd712f63aab1...
|
| - https://github.com/ytdl-org/youtube-dl/commit/cf001636600430...
| ajkjk wrote:
| Wow:
|
| > Overview of the control flow (already known):
| >
| > - The Youtube API provides you with n - your video access
| >   token
| > - If their new changes apply to your client (they do for
| >   "web") then it is expected your client will modify n based
| >   on internal logic. This logic is inside player...base.js
| > - n is modified by a cryptic function
| > - Modified n is sent back to server as proof that we're an
| >   official client. If you send n unmodified, the server will
| >   eventually throttle you.
|
| So they can always change the function to keep you on your
| toes, hence you need to be able to run semi-arbitrary JS in
| order to keep using the API.
|
| Waste of human brainpower but I guess that energy is better
| spent imagining a world where Google isn't in charge instead
| of kvetching about what they're doing with their influence.
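|
| On the client side, the last two steps of that flow amount to
| rewriting one request parameter. A sketch, assuming `n` travels
| as a URL query parameter; the URL, `rewrite_n` and
| `placeholder_transform` are invented for illustration, and the
| real transformation is whatever the player JS currently defines.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def placeholder_transform(n):
    """Stand-in for the function found in the player JS (assumption)."""
    return n[::-1]


def rewrite_n(url, transform=placeholder_transform):
    """Return `url` with its `n` query parameter replaced by transform(n)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    if 'n' in query:
        # parse_qs maps each key to a list of values; only the first is used.
        query['n'] = [transform(query['n'][0])]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


print(rewrite_n('https://example.invalid/videoplayback?id=42&n=abc123'))
```

| The hard part is not this rewrite but obtaining `transform`,
| which is why a tool ends up evaluating the JS it was handed.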
| elaus wrote:
| I'd have to read up on the specifics as well, but I think
| basically Youtube uses a lot of obfuscated, rapidly and
| automatically changing Javascript code to fetch the video data.
| A project like youtube-dl has to run this code to be able to
| download videos, because that's what's happening in the browser
| as well.
| temp_account_32 wrote:
| For those interested further, in some of the past few weeks
| youtube-dl had stopped working intermittently for multiple
| hours at a time, and it was precisely related to this code.
|
| We have a custom-made Discord music bot on our server which
| uses ytdl to stream songs so we can listen together, and at
| one point we were listening and suddenly got some obscure
| JavaScript error.
|
| We began joking that there's some bug in the code which
| breaks it after 6PM, but later found out that Google had
| changed some of the obfuscated JS and this basically broke
| this part of code, which prevented us from fetching the song
| information.
| bitexploder wrote:
| What is interesting is it seems to be constant cat and mouse.
| I download a YT vid. It crawls. Update yt-dlp, it flies
| again. I love yt-dlp and use it a lot.
| londons_explore wrote:
| If you start a youtube video and then pause it and resume a
| few days later, you'll notice that the youtube page plays for
| ~30 seconds (i.e. what's buffered) and then the page refreshes.
| I'd guess this refresh is to pick up the new javascript and
| any updates to the HTML code.
|
| It's kinda annoying if you have a lot of youtube tabs open
| for a long time and come back to them.
| lupire wrote:
| But why not just use a normal JS engine called from Python?
| hadrien01 wrote:
| It's used in the YouTube extractor:
| https://github.com/ytdl-org/youtube-dl/blob/d619dd712f63aab1...
|
| I believe YouTube limits your bitrate if you don't pass a
| specific calculated value; it's possible youtube-dl has to
| parse and eval JS to get it.
| RicoElectrico wrote:
| > I believe YouTube limits your bitrate if you don't pass a
| specific calculated value
|
| It's starting to become Widevine bullshit all over again.
| kevin_thibedeau wrote:
| It's their platform. They can do with it what they want.
| vukgr wrote:
| Just because they have the right to do it doesn't make it
| right.
| jraph wrote:
| They've also chosen to be a monopoly.
| uwuemu wrote:
| MiguelX413 wrote:
| There's a difference between combatting entitlement to a
| platform and complaining about something not serving a
| greater good. Leftists are also censored either way.
| Those companies censor or don't censor according to what
| would maximize profit. Monetization and reach are
| probably mutually exclusive with freedom.
| tsukikage wrote:
| > you also think billionares should be taxed more
|
| That's... quite a response in defense of a tool intended
| for breaching TOS and performing copyright infringement.
| Can you clarify exactly who it is and isn't OK to steal
| from, again? I'm struggling here.
| btdmaster wrote:
| > As a matter of policy (as well as legality), youtube-dl
| does not include support for services that specialize in
| infringing copyright. As a rule of thumb, if you cannot
| easily find a video that the service is quite obviously
| allowed to distribute (i.e. that has been uploaded by the
| creator, the creator's distributor, or is published under
| a free license), the service is probably unfit for
| inclusion to youtube-dl.
|
| Does using a different User Agent instead of a typical
| browser amount to copyright infringement in any
| jurisdiction?
| tsukikage wrote:
| Copyright law only permits making copies of artistic
| works when you have license to do so. Youtube only
| permits use of content it serves in the specific
| situations described in its terms of service. All other
| use is prohibited.
|
| You can see the terms of service here:
|
| https://www.youtube.com/static?gl=GB&template=terms
|
| In particular, the first three points in the "permissions
| and restrictions" section explicitly prohibit tools like
| youtube-dl. I've pasted these below:
|
| The following restrictions apply to your use of the Service.
| You are not allowed to:
|
| 1. access, reproduce, download, distribute, transmit,
|    broadcast, display, sell, license, alter, modify or
|    otherwise use any part of the Service or any Content
|    except: (a) as specifically permitted by the Service; (b)
|    with prior written permission from YouTube and, if
|    applicable, the respective rights holders; or (c) as
|    permitted by applicable law;
|
| 2. circumvent, disable, fraudulently engage, or otherwise
|    interfere with the Service (or attempt to do any of these
|    things), including security-related features or features
|    that: (a) prevent or restrict the copying or other use of
|    Content; or (b) limit the use of the Service or Content;
|
| 3. access the Service using any automated means (such as
|    robots, botnets or scrapers) except: (a) in the case of
|    public search engines, in accordance with YouTube's
|    robots.txt file; (b) with YouTube's prior written
|    permission; or (c) as permitted by applicable law;
|
| As a convenient figleaf, it is also possible to use
| youtube-dl for some purposes that are not dubious. Of the
| people I know who use the tool, none of them do that.
| Dylan16807 wrote:
| It's their platform but it's also a web site and that
| comes with certain expectations of interoperability.
| [deleted]
| forchune3 wrote:
| it's sort of an extension of the state / surveillance
| RicoElectrico wrote:
| Many channels would be more than happy to enable download
| options, if possible.
|
| Hell, how does the Creative Commons licence, which they totally
| give you the option to select, work in the case of videos that
| can't be downloaded in any way?
| londons_explore wrote:
| But would the channel owner be happy to enable download
| options if $0.09 per GB downloaded was subtracted from
| their ad revenue?
| sylware wrote:
| Nowadays "javascript" refers to the scriptable, grotesquely and
| absurdly complex and massive web engines, aka google financed
| blink and gecko, then apple financed webkit, plus their SDKs.
|
| The currently obfuscated javascript media players will try to
| break yt-dlp by leveraging the complexity and size of those
| scripted web engines. They will put them out of reach of small
| teams or individuals and, even "better", it will force ppl to
| use an apple or google web engine, killing any attempt to
| provide a real alternative.
|
| A standalone javascript interpreter is actually some work, but
| seems to stay in the "reasonable" realm: look at quickjs from M.
| Bellard and friends (the guy who created qemu, ffmpeg, tinycc,
| etc): plain and simple C (no need of a c++ compiler), doing the
| job more than well enough.
|
| That's why noscript/basic (x)html is so much important.
| dtx1 wrote:
| > but seems to stay in the "reasonable" realm
|
| > M. Bellard and friends
|
| Choose one, that dude is a wizard wielding C like a brain
| surgeon wields a scalpel.
| randyrand wrote:
| Chrome and Safari both have open source JS engines...
| userbinator wrote:
| That's beside the point. Open-source is not useful to the
| smaller players if it is too complex to comprehend and
| constantly churned.
| olliej wrote:
| Yeah I agree with almost all of this - the massive size and
| complexity of commercial engines makes it seem like JS the
| language must also be complex.
|
| I also agree with the idea that these sites will probably be
| able to/want to create JS that breaks these small/lightweight
| engines requiring constant work :-/
|
| This final point I disagree with entirely. You can't point to
| Bellard doing something as evidence that it's reasonable. This
| is a guy that wrote a program that generated a TV signal via a
| VGA card. :D
| oblak wrote:
| ah, but quickjs is an actual js engine. I have tried a couple
| of versions with real progress between them. This thing here is
| not
| languageserver wrote:
| > That's why noscript/basic (x)html is so much important.
|
| xhtml has been dead for a decade
| esprehn wrote:
| This isn't really JS, it's a purpose built evaluator that's only
| for evaluating a particular script on YouTube, assuming a huge
| list of things are true about how YouTube JS is written.
|
| Ex. It's got a hard coded list of methods for String, and it
| doesn't respect prototypes. It only supports creating Date
| instances, and won't work if you override the global Date. It
| parses with regexes and implements all operators with python's
| operator module (which is the wrong type semantics) etc. Nearly
| none of the semantics of JS are implemented.
|
| It's sort of the sandwich categorization problem:
|
| If I write a C# "interpreter" in perl that's only 200 lines and
| just handles string.Join, string.Concat and Console.WriteLine,
| and it doesn't actually try to implement C# syntax or semantics
| at all and just uses perl semantics for those operations is it
| actually C#? :P
|
| I say "not a sandwich".
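|
| To make the "wrong type semantics" point concrete, here is a
| small sketch of an operator table built on Python's `operator`
| module. The table and the `apply` helper are illustrative, not
| copied from youtube-dl; the JavaScript results in the comments
| follow from the language's coercion rules.

```python
import operator

# Naive operator table of the kind a small evaluator might use (illustrative).
OPERATORS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
    '%': operator.mod,
}


def apply(op, left, right):
    """Apply a binary operator with Python semantics, reporting failures as text."""
    try:
        return OPERATORS[op](left, right)
    except (TypeError, ZeroDivisionError) as exc:
        return '%s: %s' % (type(exc).__name__, exc)


print(apply('+', '1', 2))   # JS gives "12": the number is coerced to a string
print(apply('/', 1, 0))     # JS gives Infinity: no exception is raised
print(apply('%', -1, 3))    # JS gives -1: the sign follows the dividend
print(apply('*', '3', 2))   # JS gives 6: the string is coerced to a number
```

| Python raises on the first two, returns 2 for the third and
| '33' for the fourth, so the shortcut is only safe while the
| script being run stays away from such cases.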
| Test0129 wrote:
| This really isn't fair. Just because it doesn't faithfully
| implement whatever standard Javascript is on doesn't mean it
| isn't an interpreter. All an interpreter is is something that
| executes a script directly rather than requiring compilation.
| It is a de facto interpreter for a subset of javascript. Nothing
| more, nothing less. The title could be more clear, however.
| baobabKoodaa wrote:
| There's a huge difference between an interpreter for
| "JavaScript" and an interpreter for a "subset of JavaScript".
| Test0129 wrote:
| Making a pedantic argument on what constitutes an
| interpreter is silly. The title is bad. It is an
| interpreter. I'll continue to eat downvotes on this because
| of the pedantry of HN.
| khazhoux wrote:
| Technically, it's only the pedantry of a _subset_ of HN.
| lupire wrote:
| It's an interpretation of a subset of the pedantry on HN.
| jraph wrote:
| I didn't downvote, but I don't think esprehn is being
| unfair. Their comment is very informative. They didn't
| argue that what was implemented is not an interpreter,
| they did explain why it's not a JavaScript interpreter
| and not even an interpreter for a subset of JavaScript.
| It's just a special purpose interpreter suitable for
| YouTube's code that cannot be re-used for any code that
| uses the subset that it seems to implement.
|
| It's not pedantry (or I'm pedantic). It's a reaction to
| the title that can lead people to believe that a complete
| JavaScript interpreter has been written in less than a
| thousand lines of Python. This reaction is perfectly
| understandable.
| chess_buster wrote:
| I evaluated it with my Pedantic Interpreter which only
| results in the `pedantic` token.
| blondin wrote:
| my vote is meaningless and i am sorry about that. but
| just wanted to let you know that what you said made
| sense. do not let people get to you.
|
| most of us know that a thousand or so lines of code is
| not a full JavaScript interpreter and cannot be the real
| thing.
|
| there is no argument or conversation to have about it.
| baobabKoodaa wrote:
| > Making a pedantic argument on what constitutes an
| interpreter is silly. The title is bad. It is an
| interpreter.
|
| It's not a pedantic argument. Based on the title I
| thought that somebody wrote something akin to V8 in 800
| lines of Python. After reading the comments I realized
| those 800 lines just interpret a particular JavaScript
| function written by Youtube. Those things are different.
| Pointing out the fact that they are different is not
| pedantry. The title is misleading and the comments
| pointing that out are helpful.
| [deleted]
| blast wrote:
| esprehn didn't say it isn't an interpreter. They're saying it
| _is_ an interpreter and what it's interpreting isn't (all
| of) JS. That's also what you're saying, so you're agreeing
| with esprehn.
|
| Edit: You misunderstood baobabKoodaa in the same way. Nobody
| is arguing about what constitutes an interpreter, except you.
| The question is only what language is being interpreted.
|
| Before accusing someone of pedantry, it would first be good
| not to completely misread them.
| blast wrote:
| I suppose this means it would be easy for YouTube to fuck with
| youtube-dl simply by throwing in more features of JS?
| joshenders wrote:
| Cat, meet mouse.
| dang wrote:
| Ok, we've changed this title to shrink the scope of the
| interpreter.
|
| Submitted title was "YouTube-dl has a JavaScript interpreter
| written in 870 lines of Python".
| jraph wrote:
| And as a user of youtube-dl, I'm quite happy about this. This
| probably allows a very safe, restricted "subset" of JS. Way
| better than using a full JS engine. 900 lines is still small
| and manageable.
| jiggawatts wrote:
| That's the exact same logic I hear from developers who say
| things like:
|
| Why do I need a full XML parser when I can just extract what
| I need with regex?
|
| And:
|
| All that RPC IDL stuff is overcomplicated, REST is so much
| easier because I can just write the client by hand.
| sebzim4500 wrote:
| I'm trying to get the threat model here. Is the concern that
| Youtube will inject JS into the payload which tries to break
| out of the youtube-dl js sandbox using some zero day in
| whatever js engine they would use instead?
| rwmj wrote:
| Google attempting zero days on client computers would be
| something. It's not totally without precedent (Sony CD
| rootkits -
| https://en.wikipedia.org/wiki/Sony_BMG_copy_protection_rootk...)
| but would still be major news.
| [deleted]
| loeg wrote:
| youtube-dl targets a lot of websites other than Google
| properties, many of which are a lot sketchier (think, uh,
| NSFW streaming sites).
| kevingadd wrote:
| Embedding a whole js engine and then interopping with it
| from python would be non trivial. Good luck fixing any bugs
| or corner cases you hit that way. The V8 and spidermonkey
| embedding apis are both c++ (iirc) and non trivial to use
| correctly.
|
| Having full control like this +simple code is probably
| lower risk and more maintainable, even if there's the
| challenge of expanding feature set if scripts change.
|
| The alternative would be a console js shell, but those are
| very different from browsers so that poses its own
| challenges.
| esprehn wrote:
| Fwiw there are python bindings for QuickJS and Duktape:
|
| https://github.com/PetterS/quickjs
|
| https://github.com/stefano/pyduktape
|
| https://github.com/amol-/dukpy
|
| I can't speak to the quality of those bindings, but they
| do seem maintained.
| em-bee wrote:
| apparently yt-dlp is somehow calling out to a js engine
| if available
| jraph wrote:
| Let's say they end up using Node. Node has a quite complete
| standard library that lets you access files and everything.
|
| Now if they do it right and only embed some bare JS
| interpreter, it's still way harder to audit than these <
| 900 lines, for which it is quite easy to convince oneself
| that the interpreted script cannot do much.
| geysersam wrote:
| Nowadays they could probably use Deno. Without
| permissions it doesn't allow network or file access etc.
| mjevans wrote:
| yt-dlp sometimes doesn't know how to evaluate the javascript
| / ECMAScript and will call out to an optional dependency, a
| real javascript interpreter, if installed.
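|
| One way such an optional fallback could be wired up, as a sketch
| rather than yt-dlp's actual logic: look for an external
| interpreter on PATH and otherwise stay with the built-in
| evaluator. `pick_js_backend` is a hypothetical name, and using
| `phantomjs` as the executable name is an assumption based on the
| PhantomJS mention elsewhere in this thread.

```python
import shutil


def pick_js_backend(candidates=('phantomjs',)):
    """Return ('external', path) for the first candidate on PATH, else ('builtin', None)."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return 'external', path
    return 'builtin', None


# The result depends on what is installed on the machine running this.
print(pick_js_backend())
```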
| tra3 wrote:
| It quacks like a duck at midnight, but it's actually a frog?
| olliej wrote:
| This is super cool.
|
| Some of the stuff is _kind of_ questionable to me in the sense
| that I could believe you could probably make some kind of
| sufficiently wonky JS that this would do the "wrong" thing.
|
| But it's super cool that they are able to do this as I think it
| shows that claims of JS complexity based on the size of JS
| engines is overlooking just how much of that size/complexity
| comes from the "make it fast" drive vs. what the language
| requires. Here you have a <1000LoC implementation of the core of
| the JS language, removed from things like regex engines, GCs,
| etc.
|
| Mad props to them for even attempting it as well - it simply
| would not have ever occurred to me to say "let's just write a
| small JS engine" and I would have spent stupid amounts of time
| attempting to use JSC* from python instead.
|
| [* JSC appears to be the only JS engine with a pure C API, and
| the API and ABI are stable so on iOS/macOS at least you can just
| use the system one which reduces binary size+build annoyance. The
| downside is that C is terrible, and C++ (differently terrible?
| :D) APIs make for much more pleasant interfaces to the VM -
| constructors+destructors mean that you get automatic lifetime
| management so handles to objects aren't miserable, you can have
| templates that allow your API to provide handles that have real
| type information. JSC only has JSValueRef and JSObjectRef, and as
| a JSObjectRef is a JSValueRef it's actually just a typedef to
| const JSValueRef :D OTOH I do think JSC's partially
| conservative GC for stack/temporary variables is superior to
| Handles for the most part, but it's also absolutely
| necessary to have an API that isn't absolutely wretched. The real
| problem with JSC's API is that it has not got any love for many
| many many .... many years so it doesn't have any way to handle or
| interact with many modern features without some kludgy wrappers
| where you push your API objects into JS and have the JS code wrap
| them up. The API objects are also super slow, as they basically
| get treated as "oh ffs" objects that obey no rules. I really do
| wish it would get updated to something more pleasant and really
| usable.]
| esprehn wrote:
| This doesn't actually implement any of the JS language though,
| it just reuses all of python's semantics and hard codes a tiny
| list of e.g. String methods.
|
| I also assume you mean mainstream JS engine, but Duktape,
| JerryScript and QuickJS are all C APIs.
|
| They probably could have used ex.
| https://github.com/PetterS/quickjs instead of the hacks in the
| OP linked file.
| olliej wrote:
| Ah, I only briefly scanned the implementation, and it looked
| like it was doing actual work - is it mostly string replacing
| to get approximate python equivalent syntax? Regardless
| that's disappointing.
|
| You are correct though that I was only thinking of the big
| engines - bias on my part alas.
|
| For your suggested alternate engines, JerryScript and QuickJS
| seem more complete than Duktape but I can't quite work out
| the GC strategy of JerryScript. Bellard says QuickJS has a
| cycle detector but I'm generally dubious of them based on
| prior experience.
|
| If I was shipping software that had to actually include a JS
| engine, if perf was not an issue I would probably use
| JerryScript or QuickJS as binary size I think would be a more
| critical component.
___________________________________________________________________
(page generated 2022-09-10 23:00 UTC)