hngopher.com

       [HN Gopher] Reverse engineering the obfuscated TikTok VM
       ___________________________________________________________________
        
       Reverse engineering the obfuscated TikTok VM
        
       Author : xfeeefeee
       Score  : 380 points
       Date   : 2025-04-21 01:59 UTC (21 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | xfeeefeee wrote:
       | The fascinating process of reverse engineering this VM is
       | detailed here.
       | 
       | TikTok uses a custom virtual machine (VM) as part of its
       | obfuscation and security layers. This project includes tools to:
       | 
       | Deobfuscate webmssdk.js that has the virtual machine.
       | 
       | Decompile TikTok's virtual machine instructions into readable
       | form.
       | 
       | Script Inject: Replace webmssdk.js with the deobfuscated VM
       | injector.
       | 
       | Sign URLs: Generate signed URLs which can be used to perform
       | auth-based requests, e.g. posting comments.
        
         | noduerme wrote:
         | Is calling a massive embedded JS obfuscator a "VM" a bit of a
         | stretch? Ultimately it's not translating anything to a lower-
         | level language.
         | 
         | Still, I had no idea. This is really taking JS obfuscation to
         | the next level.
         | 
         | One kind of wonders, what is the purpose of that level of
         | obfuscation? The naive take is that obfuscation is usually to
         | protect intellectual property... but this is client-side code
         | that wouldn't give away anything about their secret sauce
         | algorithm.
        
           | throwaway48476 wrote:
           | VM obfuscation is a common technique for malware developers.
           | 
           | The VM term is applied because the obfuscator creates a
           | custom instruction set and executes custom byte code. This is
           | generated per build.
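To make that definition concrete, here is a minimal sketch of a bytecode VM in Python: an invented instruction set plus a dispatch loop that executes it. The opcode names and numbers are hypothetical and are not taken from TikTok's VM; an obfuscator of the kind described above could reshuffle such a table on every build.

```python
# Hypothetical opcode table; a per-build obfuscator could shuffle these numbers.
PUSH, ADD, MUL, JMP_IF_ZERO, HALT = range(5)

def run(bytecode):
    """Execute a flat list of ints on a tiny stack machine; return the top of stack."""
    stack = []
    pc = 0  # program counter: index of the next instruction
    while True:
        op = bytecode[pc]
        pc += 1
        if op == PUSH:
            stack.append(bytecode[pc])  # the operand follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == JMP_IF_ZERO:
            target = bytecode[pc]
            pc += 1
            if stack.pop() == 0:
                pc = target
        elif op == HALT:
            return stack[-1] if stack else None
        else:
            raise ValueError(f"unknown opcode {op} at index {pc - 1}")

# Encodes (2 + 3) * 4 in the invented instruction set.
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
result = run(program)
```

Without the opcode table, `program` is just a list of integers, which is where the obfuscation value comes from.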
        
             | noduerme wrote:
             | I appreciate you making the distinction that anything which
             | creates a custom instruction set is thus a VM. I think
             | that's the way a lot of people here who are currently at my
             | throat seem to define it, so I'm glad you put it in clear
             | terms. I would define it as a custom instruction set _plus_
             | some sort of plug-in that allows those opcodes to be run
             | closer to the metal than the language they're written in.
             | FWIW I'd call this thing more of an obfuscation framework.
             | But maybe I'm just a dino. I am really glad you made this
             | comment, though. It clarified for me why so many people
             | went bananas when I said this wasn't a VM.
        
           | MonkeyClub wrote:
           | > Is calling a massive embedded JS obfuscator a "VM" a bit of
           | a stretch? Ultimately it's not translating anything to a
           | lower-level language.
           | 
           | From the repo's README:
           | 
           | "TikTok is using a full-fledged bytecode VM, if you browse
           | through it, it supports scopes, nested functions and
           | exception handling. This isn't a typical VM and shows that it
           | is definitely sophisticated."
        
             | noduerme wrote:
             | But that's basically an emulator of a VM, isn't it? It's
             | like rewriting the Flash AVM2 into JS... it's still
             | _running in JS_ whereas the _original_ VM was C++. It could
             | JIT compile stuff but only because it literally was
             | reserving memory that could overflow, and (semi-technical
             | take here) from that advantage, of being closer to the
             | metal, flowed all of the flaws in AVM2 that precipitated
             | most of Adobe's woes with Flash. A VM implant in a web
             | page that uses a plugin like Java or Flash, to get around
             | running browser-sandboxed code, which can take over
             | physical memory, is far different from just emulating a VM
             | in Javascript. I wouldn't call writing a ton of opcodes in
             | JS, which resolved to JS functions, a "virtual machine",
             | because it isn't reserving anything or doing anything that
             | Javascript can't do. Someone correct me here if I'm
             | wrong... this is just heavy-duty obfuscation.
             | 
             | Also, one major purpose of a VM is to _improve_ performance
             | over what's available in the browser. If you use that as a
             | measurement, this clearly doesn't fit that goal.
        
               | gruez wrote:
               | >But that's basically an emulator of a VM, isn't it?
               | 
               | Emulators and VMs aren't mutually exclusive.
               | 
               | >Also, one major purpose of a VM is to improve
               | performance over what's available in the browser. If you
               | use that as a measurement, this clearly doesn't fit that
               | goal.
               | 
               | And from your other comment:
               | 
               | >I would define it as a custom instruction set plus some
               | sort of plug-in that allows those opcodes to be run
               | closer to the metal than the language they're written in.
               | 
               | A virtual machine just means a machine that's virtual.
               | All the other expectations you apply on top of it (eg.
               | "improve performance over what's available in the
               | browser") are totally irrelevant. The JVM clearly doesn't
               | make Java code faster than running it natively, but
               | nobody denies it's a virtual machine. The same goes for
               | VMware products ("VM" is literally in the name!), which
               | execute x86 code but are further away from "the metal"
               | that they're running on.
        
           | userbinator wrote:
           | You are replying to a comment that looks extremely unhuman.
        
             | codetrotter wrote:
             | It looks like OP filled out the text area along with the
             | URL when submitting the post.
             | 
             | HN takes that text and turns it into a comment. I've seen
             | it happen before.
             | 
             | The unfortunate outcome of that IMO is that sometimes text
             | that makes sense as a description of a submission feels a
             | bit out of place as a comment due to how it is worded.
             | And these comments sometimes then end up getting downvoted.
             | 
             | I wouldn't be completely sure it was not human written.
             | Even though it feels a bit weird to read it as a comment.
        
               | xfeeefeee wrote:
               | > It looks like OP filled out the text area along with
               | the URL when submitting the post. HN takes that text and
               | turns it into a comment.
               | 
               | Yeah, this is exactly what happened, but I decided to
               | keep it rather than delete it, and filled it out more
               | with the synopsis from the repo.
               | 
               | Looking back at it, it really does look like an AI
               | bulleted summary. I probably should have noted that the
               | last part was indeed a quotation.
        
         | dmitrygr wrote:
         | What is the purpose of you posting a bad ChatGPT summary of the
         | original post?
        
           | xfeeefeee wrote:
           | I quoted the synopsis from the readme thinking it would be
           | helpful.
        
       | godelski wrote:
       | This seems like quite a lot of work to hide the code. What would
       | the legitimate reasons for this be? It looks like it would make
       | the program less optimized, and more complexity just leads to
       | more errors.
       | 
       | I understand the desire to make it harder for bots, but 1) it
       | doesn't seem to be effective, and bots seem to be going a very
       | different route, and 2) there have got to be better ways that
       | are more effective. It's not like you're going to stop clones
       | through this, because clones can replicate by just seeing how
       | things work and reverse engineering black-box style.
        
         | davidsojevic wrote:
         | Making it harder for bots usually means that it drives up the
         | cost for the bots to operate; so if they need to run in a
         | headless browser to get around the anti-bot measures it might
         | mean that it takes, for example, 1.5 seconds to execute a
         | request as compared to the 0.1 seconds it would without them in
         | place.
         | 
         | On top of that 1.5 seconds, there is also a much larger CPU
         | and memory cost from having to run that browser, compared to a
         | simple direct HTTP request, whose cost is near negligible.
         | 
         | So while you'll never truly defeat a sufficiently motivated
         | actor, you may be able to drive their costs up high enough that
         | it makes it difficult to enter the space or difficult to turn a
         | profit if they're so inclined.
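As a back-of-the-envelope check on those figures, the sketch below compares hourly throughput for a single worker issuing requests one after another. The 0.1 s and 1.5 s latencies are the hypothetical examples from the comment above, not measurements, and the extra CPU and memory cost of the browser is not modeled.

```python
MS_PER_HOUR = 3_600_000

def requests_per_hour(ms_per_request):
    """Requests one sequential worker can complete in an hour."""
    return MS_PER_HOUR // ms_per_request

direct = requests_per_hour(100)     # plain HTTP client, 0.1 s per request
headless = requests_per_hour(1500)  # headless browser, 1.5 s per request
slowdown = direct / headless        # how many times more workers are needed
```

Under these assumptions the bot operator needs 15 times as many workers for the same volume, each of them heavier than a plain HTTP client.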
        
           | godelski wrote:
           | I understand the argument. You can't have perfect defense and
           | speedbumps are quite effective. I'm not trying to disagree
           | with that.
           | 
           | But it does not seem like the solution is effective at
           | mitigating bots. Presumably bots are going a different route
           | considering how prolific they are, which warrants another
           | solution. If they are going through this route then it
           | certainly isn't effective either and also warrants another
           | solution.
           | 
           | It seems like this obfuscation requires a fair amount of
           | work, especially since you need to frequently update the code
           | to rescramble it. Added complexity also increases risks for
           | bugs and vulnerabilities, which ultimately undermine the
           | whole endeavor.
           | 
           | I'm trying to understand why this level of effort is worth
           | the cost. (Other than nefarious reasons. Those ones are
           | rather obvious)
        
         | noduerme wrote:
         | A generous take would be that they have their own internal GUI
         | tools that make it easier for non-programmers to set up visual
         | elements in this. That was historically the reason to invent
         | VMs like Flash. A less generous take would account for the
         | enormous potential for hiding nefarious code inside such a
         | thing, and account for the nature of the government which
         | deployed it, and conclude that it was a national security /
         | defense project disguised as a candy-coated trojan horse.
        
           | supriyo-biswas wrote:
           | VM-based architectures are really common in the obfuscation
           | space, which is why you have executable packers[1], JS
           | packers[2] and bot management products[3][4] leveraging
           | similar techniques.
           | 
           | As for why the obfuscation is needed: bot management products
           | suffer from a fundamental weakness in that ultimately, all of
           | them simply collect static data from the environment,
           | therefore it would make much more sense to make the steps
           | involved as difficult to reverse engineer as possible. Once
           | that is done, all you need to do is slightly change the
           | schematics of your script every few weeks and publish a new
           | bundle, and you've got yourself a pretty unsubvertible*
           | protection scheme.
           | 
           | Regarding the "trojan horse", I think someone is yet to show
           | proof that it's a Javascript exploit.
           | 
           | (*Unsubvertible is obviously relative, but raising the cost
           | of the attack from, say, $0.01/1000 requests to $10/1000
           | requests would massively cut down on abuse.)
           | 
           | [1] https://vmpsoft.com/
           | 
           | [2] https://jscrambler.com/
           | 
           | [3] https://github.com/neuroradiology/InsideReCaptcha
           | 
           | [4] https://www.zenrows.com/blog/bypass-
           | cloudflare#_qEu5MvVdnILJ...
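The footnote's figures can be turned into a quick calculation. This is only a sketch: the two per-1000 rates are the illustrative numbers from the comment above, and the monthly volume of one million requests is an assumption added here.

```python
from decimal import Decimal

def campaign_cost(requests, dollars_per_1000):
    """Total cost in dollars for a given number of requests."""
    return Decimal(requests) * dollars_per_1000 / 1000

volume = 1_000_000  # assumed request volume, for illustration only
before = campaign_cost(volume, Decimal("0.01"))  # cheap direct requests
after = campaign_cost(volume, Decimal("10"))     # requests that need a full browser
factor = after / before
```

A thousandfold increase turns a $10 campaign into a $10,000 one, which is the sense in which the protection is "unsubvertible" for low-margin abuse.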
        
         | throwaway48476 wrote:
         | Makes it easier to hide code that does browser fingerprinting.
        
         | rfoo wrote:
         | Google has been doing this since forever for recaptcha. And, to
         | be fair, it seems to be fairly effective for bot detection.
         | 
         | https://github.com/neuroradiology/InsideReCaptcha
         | 
         | > bots seem to be going a very different route
         | 
         | If the "very different route" means running a headless browser,
         | then it's a success for this tech. Because the bot must run a
         | blackbox JS now, and this gives people a whole new street of
         | ways to run bot detection, using the bot's CPU.
        
           | godelski wrote:
           | Okay... but those bots exist... and in high numbers... By
           | "very different route" I mean "measure to _effectively_ stop
           | the bots" (or dramatically reduce them). It seems like if
           | they're using a headless browser then they're still being
           | quite effective in accomplishing their goals.
        
         | Scaevolus wrote:
         | Obfuscation is one part of defense in depth. TikTok also has a
         | variety of captchas to block scrapers, independent of this.
         | 
         | None of it's perfect, and they can be worked around, but by
         | providing a barrier you've restricted some of the bad actors
         | (spambots, scrapers) from acting at all.
         | 
         | It's easier to deal with 100 spambots than 1000!
        
           | like_any_other wrote:
           | Unless the scrapers are DDoSing the site, I refuse to
           | consider the downloading of publicly posted data as
           | malicious. It shows how captured the conversation has become
           | by corporate interests, that viewing or storing data posted
           | free of charge, publicly, by their users, in a way not
           | approved by that corporation, is seen as malicious, and the
           | only _morally_ allowed way to view it is to use their
           | spyware-laden client.
        
             | Scaevolus wrote:
             | What if the user has disabled downloads of a video? Should
             | the creator (and copyright owner) of a piece of media not
             | be allowed even token attempts to prevent copying?
        
               | ndriscoll wrote:
               | No because that interferes with fair use. If someone
               | publicly posts a video, everyone has the right to copy it
               | without any permission or awareness from the original
               | author for things like commentary/criticism (it would be
               | silly to require the copyright owner's permission to
               | criticise a work!).
        
             | areyourllySorry wrote:
             | this is also a measure against bots that write, not just
             | those that read
        
       | ronsor wrote:
       | There is no legitimate reason for a social media platform to
       | employ this much obfuscation.
        
         | miohtama wrote:
         | It's to keep bots away and not turn into another Twitter.
        
           | dns_snek wrote:
           | That's probably not the goal. There are bots advertising
           | illegal services (e.g. ads for "hacking services", illegal
           | drugs) in most comment sections. If you report these
           | comments, 99.9% of the time the report will be rejected with
           | "no violations found" and the spam stays up.
        
             | bolognafairy wrote:
             | That doesn't mean that it's "probably not the intention".
        
               | dns_snek wrote:
               | The balance of evidence suggests otherwise. If they cared
               | about spam bots they would take action when spammers are
               | handed to them on a silver platter. The kinds of spammers
               | who will leave 30 identical comments advertising illegal
               | services, not some weird moderation corner case.
               | 
               | If you ever end up on a video that's related to drugs,
               | there will be entire chains of bots just advertising to
               | each other and TikTok won't find any violations when
               | reported. But sure, I'm sure they care a whole lot about
               | not ending up like Twitter.
        
               | TheDong wrote:
               | So you're saying that TikTok's support team doing a poor
               | job of handling reports is proof that the engineering
               | team wasn't tasked with reducing spam by writing code
               | obfuscation?
               | 
               | TikTok is a huge company, evidence of what the support
               | department does or doesn't do has only minor bearing on
               | the whole company, and basically none on the engineering
               | department.
               | 
               | The thing that seems most likely to me is that they care
               | about spam, the engineering department did this one
               | thing, and the support department is either overworked or
               | cares less. Or really efficient which is why you only see
               | "a lot of spam", not "literally nothing but spam".
        
               | wpietri wrote:
               | A large company is much less cohesive than you realize.
               | You can't reliably reason about the goals of one part
               | because another part isn't consistent. This particular
               | difference could easily be explained by insufficient
               | funding to moderation, which is endemic in social media.
        
           | lazyeye wrote:
           | Because bots can't interact with web pages at the browser
           | level like humans do...
        
         | krackers wrote:
         | The legitimate reason could be bot protection, the same way
         | recaptcha uses a similar VM technique for obfuscation.
        
         | supriyo-biswas wrote:
         | See my other comment on this thread:
         | https://news.ycombinator.com/item?id=43748994
        
         | vasco wrote:
         | You not being able to come up with one is different from there
         | not being any possible reason.
        
         | yard2010 wrote:
         | This is not a social media platform but a government backed
         | tool for doing stuff for the government.
        
         | fidotron wrote:
         | If you believe this you underestimate how adversarial the
         | software world really is. TikTok will be on the receiving end
         | of botnets run by everyone from commercial entities to
         | state-backed groups and criminals.
         | 
         | They won't be betting that this stops that entirely, but it
         | adds a layer of friction that is easy for them to change on a
         | continuous basis. These things are also very good for leaving
         | honeypots in: if someone is found to still be using something
         | after a change, you can tag them as a bot or as otherwise
         | hacking. Both of those approaches are also widely used in game
         | anti-cheat mechanisms, and as shown there the lengths people
         | will go to anyway are completely insane.
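The honeypot idea can be sketched server-side as follows. Everything here is hypothetical: the build identifiers, the assumption that each request reveals which build of the signing script produced it, and the labels returned. It illustrates the general technique, not anything known about TikTok's backend.

```python
# Hypothetical build identifiers; the server rotates CURRENT_BUILD on each release.
CURRENT_BUILD = "build-42"
# Retired builds act as honeypots once every real client has had time to update.
RETIRED_BUILDS = {"build-40", "build-41"}

def classify(request_build):
    """Label a request by the signing-script build it appears to come from."""
    if request_build == CURRENT_BUILD:
        return "ok"
    if request_build in RETIRED_BUILDS:
        # A real browser would have fetched the new bundle by now; a client
        # still using a retired scheme has probably hard-coded the old one.
        return "suspected-bot"
    return "unknown"
```

A real deployment would presumably allow a grace period after each rotation so that browsers with a cached bundle are not flagged.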
        
           | fmxsh wrote:
           | It's an excellent strategy for the reasons you mention. And a
           | kind of "security by principle of least privilege".
        
           | lazyeye wrote:
           | Nah..I agree with the parent comment, there is simply no
           | legitimate reason for a social media app to employ this level
           | of obfuscation.
        
       | davidsojevic wrote:
       | Very impressive work! I always enjoy a good write up about
       | reverse engineering efforts and yours was really simple to
       | follow.
       | 
       | Many popular/large websites and bot protection services usually
       | have environment checking as a baseline and mouse-movement
       | tracking in some of the more aggressive anti-bot checks.
       | 
       | It's always interesting to see how long it takes from when the
       | measures have been defeated/publicised until the service ends up
       | making changes to their mechanism to make you start over
       | (hopefully not from scratch).
        
         | xfeeefeee wrote:
         | All credit should go to Lukas
         | https://github.com/LukasOgunfeitimi
         | 
         | I was sharing this here since I thought it was a great write
         | up, but did not intend to pass it off as my own!
         | 
         | There is certainly always a good amount of push and pull,
         | though my personal concern as a contributor to yt-dlp under
         | another alias is more about archival of the underlying media
         | rather than automating things like comments.
         | 
         | YouTube also uses an interesting scheme for authenticating
         | requests for media as well which required implementing a very
         | basic JavaScript interpreter within Python for yt-dlp too. I
         | expect this kind of thing to continue to become even more
         | common and complicated.
        
       | worldsavior wrote:
       | That's a very strong obfuscation. Takes a lot of work to
       | deobfuscate such a thing. Great writeup.
        
       | domfie wrote:
       | Looks like a lot of work. I recently discovered webcrack and the
       | tool jehna/humanify for such deobfuscation tasks.
        
         | 3abiton wrote:
         | It could be interesting to see a comparison to OP's work.
        
       | 0xDEADFED5 wrote:
       | this is cool. i briefly worked on a TikTok bot a while back and
       | it was a huge pain in the ass.
        
       | heinternets wrote:
       | Is TikTok so obfuscated to prevent people from knowing the full
       | extent of data collection and device fingerprinting?
        
         | gruez wrote:
         | 1. Practically speaking all this javascript fingerprinting
         | pales in comparison to what native apps have access to. Most
         | people aren't using tiktok on their browsers, and the browser
         | version heavily pushes you to using the app, so you should be
         | far more worried about whatever's happening in the app.
         | 
         | 2. Despite tiktok having a giant target painted on its back for
         | its perceived connections to the CCP, I haven't really seen any
         | evidence that it does any more tracking/fingerprinting than
         | most other websites (eg. facebook) or security services (eg.
         | cloudflare or recaptcha) already do.
        
           | nicce wrote:
           | > 2. Despite tiktok having a giant target painted on its back
           | for its perceived connections to the CCP, I haven't really
           | seen any evidence that it does any more
           | tracking/fingerprinting than most other websites (eg.
           | facebook) or security services (eg. cloudflare or recaptcha)
           | already do.
           | 
           | Take a look at the request parameters in TikTok vs. Instagram,
           | for example.
           | 
           | Every request for TikTok forces you to pass most of the
           | information that the browser can collect from the end-user
           | before the server responds:
           | 
           | https://www.nullpt.rs/reverse-engineering-tiktok-vm-1
        
             | gruez wrote:
             | >Every request for TikTok forces you to pass most of the
             | information that the browser can collect from the end-user
             | before the server responds:
             | 
             | Half of the parameters are stuff relating to the app
             | itself, or could be inferred from other sources like user-
             | agent. The other fingerprinting stuff (eg. canvas or webgl
             | fingerprinting) is basically industry standard and by no
             | means unique to tiktok. Even the claim that "browser can
             | collect from the end-user before server responds" doesn't
             | hold up to scrutiny, because there's no meaningful
             | difference between that, and browser check interstitials
             | (eg. the cloudflare checkbox), which fingerprint you before
             | letting you access the content. It's also unclear how
             | that's more sinister than the alternative approach of
             | sending telemetry/fingerprinting data to a separate
             | endpoint.
        
       | RexM wrote:
       | Is this VM somehow related to Lynx (their cross-platform dev
       | tooling)?
       | 
       | https://lynxjs.org/
       | 
       | Also discussed on HN
       | 
       | https://news.ycombinator.com/item?id=43264957
        
       | weinzierl wrote:
       | Is there also a VM in their iOS app? I thought a VM would be
       | against Apple's policies?
        
         | xmodem wrote:
         | Apple's policies prevent using JIT compilation; they don't ban
         | VMs outright.
        
           | jacobp100 wrote:
           | This is the correct answer. They even expose JavaScriptCore
           | to apps.
        
         | Scaevolus wrote:
         | Their mobile apps have equivalent signature code, but it's
         | compiled to native binaries instead.
        
       | kleiba wrote:
       | I've been using a shitty streaming website whose player
       | interrupts the playback of a video at irregular intervals and
       | presents a cryptic error message. I've started looking into the
       | JavaScript code to see if I can't code up a work-around mechanism
       | (basically debugging their garbage implementation), and of course
       | (why actually?) their player code is also obfuscated.
       | 
       | And I've gotta say, employing an AI assistant has proven to be an
       | invaluable help in trying to understand obfuscated code. It's
       | actually really cool to take a function of gobbledegook
       | JavaScript and ask the AI to rewrite it in a more canonical and
       | easily understandable way, with inline comments. Of course, there
       | are flaws every now and then, but the ability to do this has been
       | such a game changer for reverse engineering, IMO.
       | 
       | I can even ask it to take a guess at finding better
       | variable/function names, and the AI can infer from the code
       | (maybe it has seen the unobfuscated libraries during training?)
       | what this code is actually doing at a high level and turn
       | something like e.g(e.g) into player.initialize(player.state),
       | which is nothing short of amazing.
       | 
       | So for anyone doing similar work, I cannot recommend highly
       | enough to have an AI agent as another tool in your tool belt.
        
         | lukan wrote:
         | Which AI agents did you use?
        
           | kleiba wrote:
           | I've tried different ones, they all seem to do a great job.
        
             | klabetron wrote:
             | Out of curiosity (as someone disappointingly new to prompt
             | engineering), what's an example prompt you used with some
             | success?
        
               | esseph wrote:
               | Ask questions. Be disappointed in the outcomes.
               | 
               | Ask more questions. Get some right answers. Repeat.
               | 
               | Make question asking muscle get swole.
        
               | nurettin wrote:
               | Actually knowing the subject and presenting insights
               | gives me much better results than simply asking it to do
               | what I mean.
        
               | Loughla wrote:
               | For help with prompt engineering, take a graduate level
               | grant writing course. It teaches you how to ask the right
               | questions to get answers from humans and how to break
               | down complicated processes into bite-size pieces; really
               | useful for LLMs.
        
               | specialist wrote:
               | Heh. Probably also useful should a djinn ever grant you
               | three wishes.
        
             | sureIy wrote:
             | Could you name a couple?
        
             | ImPostingOnHN wrote:
             | next up is using AI to obfuscate it better in the first
             | place, and then the terrible code gets scraped and used in
             | further training, with an arms race ensuing, until all code
             | on the internet is unintelligible but somehow works and can
             | only be maintained by a specific AI that has a particularly
             | encoded form of insanity
        
             | titaphraz wrote:
             | > they all seem to do a great job
             | 
             | Yeah right.
        
         | saagarjha wrote:
         | Is it truly obfuscated, or just minified?
        
           | johann8384 wrote:
           | Well the example in the article was obfuscated with several
           | specific examples.
        
             | saagarjha wrote:
             | I mean the JavaScript the LLM reversed for them
        
         | poincaredisk wrote:
         | I'm surprised by this. As a professional reverse engineer,
         | I've actually found LLMs to be terrible at deobfuscation of JS
         | (especially in the context of JS malware). But maybe my
         | requirements are higher and it's actually OK for occasional use
         | against weak packers?
        
           | Bilal_io wrote:
           | I've used it for small files and it did very well
           | prettifying, naming the variables and adding comments for
           | context. But I can imagine it doing a bad job with large
           | files.
        
           | ctoth wrote:
           | Have you seen this?
           | 
           | https://github.com/jehna/humanify
           | 
           | What they do is ground the LLM to the AST with Babel to
           | ensure you still get the same shape of AST out of your
           | deobfuscation pass. Probably this tool could be cleaned up,
           | made to work with multiple llm and parser backends, have its
           | prompts improved, &c.
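The grounding idea can be illustrated with a small shape check. This sketch uses Python's standard `ast` module on Python source as a stand-in for Babel on JavaScript, so it shows the principle rather than humanify's actual implementation: blank out every renameable identifier, then require the two trees to be otherwise identical.

```python
import ast

def shape(source):
    """Dump the AST of `source` with variable, argument and function names blanked."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_"
        elif isinstance(node, ast.arg):
            node.arg = "_"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            node.name = "_"
    return ast.dump(tree)

def same_shape(before, after):
    """True if `after` differs from `before` only in the names it uses."""
    return shape(before) == shape(after)

obfuscated = "def a(b, c):\n    return b.d(c) + 1\n"
renamed = "def add_offset(player, state):\n    return player.d(state) + 1\n"
tampered = "def add_offset(player, state):\n    return player.d(state) + 2\n"

rename_ok = same_shape(obfuscated, renamed)      # only names changed
tamper_caught = not same_shape(obfuscated, tampered)  # a constant changed
```

Attribute names such as `d` are deliberately left alone here, so an LLM that "helpfully" renames a property or alters a constant would fail the check.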
        
       | sylware wrote:
       | What's terrible are the humans writing such software...
       | 
       | But if AI can help to fight those people's work, good for
       | humanity I guess.
       | 
       | That said... Is AI going to de-obfuscate/reverse engineer their
       | obfuscated AI prompts or web apps?
        
       | SoKamil wrote:
       | > As this is a Javascript file executed on the web, it is
       | actually possible to replace the normal webmssdk.js with the
       | deobfuscated file and use TikTok normally.
       | 
       | > This can be achieved by using two browser extensions known as
       | Tampermonkey for executing custom code and CSP to disable CSP so
       | I can fetch files from blocked origins. This is so I can put
       | latestDeobf.js in my own file server and have it be fetched each
       | time, this is so I can easily edit the file and let the changes
       | take effect each time I refresh. This makes it much easier to
       | debug when reversing functions.
       | 
       | I believe you can achieve the same effect without any 3rd party
       | extensions. You can use Local Overrides in Chrome DevTools.
       | 
       | Great work!
        
         | wutwutwat wrote:
         | You can also install some trusted certs and MITM the requests,
         | replacing the content with whatever you'd like
         | 
         | Likely overkill for this use case, but no matter the client,
         | you can in theory do whatever you want to any traffic up until
         | the point it leaves your network.
        
           | ImPostingOnHN wrote:
           | what toolset do you use for on-the-fly translation?
           | 
           | ad-hoc code, or something with a more structured workflow,
           | maybe?
           | 
           | this sounds like a fun thing to try, thanks for your time
        
             | 18172828286177 wrote:
             | See Burpsuite
        
             | SoKamil wrote:
             | Charles, Proxyman, or mitmproxy if you like open source +
             | terminal would do the job.
        
               | geoka9 wrote:
               | mitmproxy will even allow you to script the
               | intercept/override behavior, which can be really handy.
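               | 
               | A minimal sketch of such a script, assuming the
               | file name from the article (webmssdk.js) and a
               | local copy named latestDeobf.js. The helper
               | should_replace is made up; response is the
               | mitmproxy hook that runs once a server response
               | has arrived.

```python
from pathlib import Path

TARGET_NAME = "webmssdk.js"          # script name mentioned in the thread
LOCAL_COPY = Path("latestDeobf.js")  # assumed path to your edited copy


def should_replace(url: str) -> bool:
    """True when the URL path (ignoring query and fragment) ends in TARGET_NAME."""
    path = url.split("?", 1)[0].split("#", 1)[0]
    return path.endswith("/" + TARGET_NAME)


def response(flow):
    """mitmproxy event hook: swap the response body for the local copy."""
    if should_replace(flow.request.pretty_url) and LOCAL_COPY.exists():
        flow.response.text = LOCAL_COPY.read_text(encoding="utf-8")
```

               | Saved as, say, replace_sdk.py, it can be loaded
               | with "mitmdump -s replace_sdk.py". The client still
               | has to trust the mitmproxy CA certificate.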
        
       | Wowfunhappy wrote:
       | ...can I ask a really stupid question? What is a VM in this
       | context?
       | 
       | I've used VMs for years to run Windows on top of macOS or Linux
       | on top of Windows or macOS on top of macOS when I need an
       | isolated testing environment. I also know that Java works via the
       | "Javascript Virtual Machine" which I've always thought of as
       | "Java code actually runs in its own lightweight operating system
       | on top of the host OS, which makes it OS-agnostic". The JVM can't
       | run on bare metal because it doesn't have hardware drivers, but
       | presumably it _could_ if you wrote those drivers.
       | 
       | But presumably the VM being discussed in TFA isn't that kind of
       | VM, right? Bytedance didn't write an operating system in
       | Javascript?
       | 
       | I've been seeing "VM" used in lots of contexts like this recently
       | and it makes me think I must be missing something, but it's the
       | sort of question I don't know how to Google. AIs have not been
       | helpful either, plus I don't trust them.
        
         | jacobp100 wrote:
         | Yes the VM discussed is similar to JVM
        
         | turtleyacht wrote:
         | Virtual Machine Decompiling:
         | https://github.com/LukasOgunfeitimi/TikTok-ReverseEngineerin...
         | 
         | And also VM223, with statements that do stuff to an array
         | "stack":
         | https://github.com/LukasOgunfeitimi/TikTok-ReverseEngineerin...
         | 
         | One obvious giveaway for a VM is laying out memory, or
         | processing some intermediate language. In this case, it could
         | be the latter.
         | 
         | In-browser, you have Chrome V8 running Javascript; that
         | Javascript could be running an interpreted environment where
         | abstractions are not purely business logic, but an execution
         | model separate from domain stuff: auth, video, user, etc.
         | 
         | By that observation, this C snippet is a VM:
         |     #include <stdio.h>
         |     int main(void) {
         |         char instruction = 'p'; /* or array */
         |         if (instruction == 'p') {
         |             puts("document.appendChild(...)");
         |         }
         |         return 0;
         |     }
         | 
         | If the program outputs to a vm.js file, it's kinda-sorta a
         | "VM." I would call it something else, maybe a generator of
         | sorts (for now). Just in my opinion, for me, if I were working
         | on a VM, the threshold of calling it that would be much higher
         | than the above.
         | 
         | On the other hand, if I had to comment _in the generated
         | Javascript_ debugging hints referring to execution stack or
         | stack pointers, it is kind of a VM idea.
        
         | yjftsjthsd-h wrote:
         | Nit:
         | 
         | > I also know that Java works via the "Javascript Virtual
         | Machine"
         | 
         |  _Java_ Virtual machine. That Java and JavaScript are named the
         | way they are is... basically a historical accident of a cross-
         | promotion gone too far, IMO. They aren't really related (at
         | least, in the way that the name might imply).
         | 
         | Now to your real question. Virtual machines are _anything_ that
         | is one computer pretending to be another computer. Sometimes,
         | that's an x86_64 PC pretending to be another x86_64 PC to run
         | a different OS. Sometimes that's an x86_64 PC pretending to be
         | a 50-year-old mainframe (https://opensimh.org/ really shines
         | there). Sometimes it's an ARM laptop running macOS pretending
         | to be an x86_64 PC so it can run Windows. And, relevant here,
         | sometimes it's a phone pretending to be a machine that has
         | never actually existed in hardware. You can just make up an
         | imaginary machine that has any old characteristics you want.
         | Maybe it has a built-in high-level network card that magically
         | turns HTTP requests into responses without programs having to
         | implement HTTP themselves. Maybe it has an imaginary graphics
         | card that directly renders buttons. Maybe you imagine a CPU
         | that runs Java opcodes directly. Whatever it is, if you can
         | imagine a system and then write a program that emulates it, you
         | can make a virtual machine and run stuff in it.
        
           | Wowfunhappy wrote:
           | > Java Virtual machine. That Java and JavaScript are named
           | the way they are is... basically a historical accident of a
           | cross-promotion gone too far
           | 
           | Oops, that was a typo! Thank you.
        
         | ngneer wrote:
         | This is not a stupid question. I have seen other comments on
         | the thread that confuse the two terms and run with it. Better
         | to ask than assume. Especially since "VM" is the same label for
         | two or three distinct yet related notions in security.
         | 
         | The VM you are familiar with indeed can run an OS, and is
         | indeed not what TikTok does.
         | 
         | #1 VMM - hypervisor runs VMs
         | 
         | #2 JVM/.NET - efficient bytecode
         | 
         | #3 Obfuscation - obscure bytecode
         | 
         | The main thing is that for #2 and #3 the machine language
         | changes.
         | 
         | With "virtualization" as used in most contexts, involving a
         | virtual machine monitor, or hypervisor, one creates zero or
         | more new (virtual) machines, to execute on multiple software
         | recipes. All the recipes are written in the same (machine)
         | language, for all the machines. This can help security by
         | introducing isolation, for example, where one VM cannot read
         | memory belonging to another VM unless the hypervisor allows it.
         | 
         | With the "virtual machine" used for obfuscation, the machine
         | language changes. The system performs the same actions as it
         | would without obfuscation, but now it is performing those
         | actions using a different machine language. Behaviorally, the
         | result is the same. But, the new language makes it harder to
         | reverse engineer the behavior.
         | 
         | Stupid example:
         | 
         | Original instruction: MOV A,B
         | 
         | Under hypervisor virtualization, VM0 and VM1 will perform this
         | same instruction.
         | 
         | Under obfuscation virtualization, software will perform
         | instructions that amount to the same result, but are harder to
         | figure out. So, the MOV instruction is redefined and mapped
         | onto a new (virtual) machine. The new machine does not simply
         | leverage the existing instruction, but rather an obfuscated
         | sequence. For example:
         | 
         | A <- B + C + D * E
         | 
         | A <- A - C
         | 
         | A <- A - D * E
         | 
         | Obviously, the above transformation is easy to understand and
         | undo. Others are harder to understand and undo. Look up
         | MOVfuscator to see how crazy things may get.
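         | 
         | The three-step sequence above can be checked mechanically.
         | A small sketch, with function names made up for
         | illustration: the extra terms C and D * E are added and
         | then subtracted again, so A always ends up equal to B.

```python
def plain_mov(b: int) -> int:
    """Original instruction: MOV A,B."""
    return b


def obfuscated_mov(b: int, c: int, d: int, e: int) -> int:
    """The same move, hidden behind terms that cancel out."""
    a = b + c + d * e   # A <- B + C + D * E
    a = a - c           # A <- A - C
    a = a - d * e       # A <- A - D * E
    return a


# Whatever values the "noise" registers C, D and E hold, A ends up equal to B.
for b, c, d, e in [(7, 3, 5, 2), (-1, 100, 0, 9), (0, 0, 0, 0)]:
    assert obfuscated_mov(b, c, d, e) == plain_mov(b)
```

         | Python integers never overflow, but the identity also holds
         | on fixed-width registers, because wraparound arithmetic is
         | still addition modulo a power of two.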
        
         | fmxsh wrote:
         | It sounds more advanced than it is.
         | 
         | It's a function wrapping the functionality of its host
         | environment. It then provides the caller with its own byte code
         | language to execute instructions. The virtual machine
         | translates those instructions to the corresponding real
         | functionality of the host environment (Javascript) upon
         | execution.
         | 
         | This particular case is sophisticated but the idea is simple.
         | 
         | Correct me if I'm wrong. I'm not knowledgeable in this. This is
         | my current understanding of it.
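         | 
         | A toy sketch of that idea, not TikTok's actual VM: the
         | opcodes, their numbering, and the host table below are all
         | invented for illustration. The loop reads one made-up
         | instruction at a time and maps it onto real host
         | functionality.

```python
# Opcodes for an imaginary stack machine; the numbering is arbitrary.
PUSH, ADD, MUL, CALL, HALT = range(5)

# Host functionality the VM exposes to bytecode, looked up by index.
HOST = {
    0: lambda text: str(text).upper(),
    1: lambda n: f"result={n}",
}


def run(bytecode, host=HOST):
    """Execute a flat list of opcodes and operands; return the top of the stack."""
    stack = []
    pc = 0  # program counter: index of the next item to read
    while True:
        op = bytecode[pc]
        pc += 1
        if op == PUSH:
            stack.append(bytecode[pc])  # operand follows the opcode
            pc += 1
        elif op == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == CALL:
            fn = host[bytecode[pc]]  # operand selects a host function
            pc += 1
            stack.append(fn(stack.pop()))
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op} at index {pc - 1}")


# (2 + 3) * 4, then hand the number to host function 1.
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, CALL, 1, HALT]
print(run(program))  # result=20
```

         | Anyone reading only the program list sees numbers, not
         | logic, until they work out what each opcode does.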
        
         | Jasper_ wrote:
         | The words "virtual machine" and "interpreter" are mostly
         | interchangeable; they both refer to a mechanism to run a
         | computer program not by compiling it to machine code, but to
         | some intermediate "virtual" machine code which will then get
         | run. The terminology is new, but the idea is older, "P-code"
         | was the term we used to use before it fell out of favor.
         | 
         | Sun popularized the term "virtual machine" when marketing Java
         | instead of using "interpreter" or "P-code", both for marketing
         | reasons (VMware had just come on the scene and was making tech
         | headlines), but also to get away from the perception of classic
         | interpreters being slower than native code since Java had a JIT
         | compiler. Just-in-time compilers that compiled to the host's
         | machine code at runtime were well-known in research domains at
         | the time, but were much less popular than the more dominant
         | execution models of "AST interpreter" and "bytecode
         | interpreter".
         | 
         | There might be some gatekeepers that suggest that "interpreter"
         | means AST interpreter (not true for the Python interpreter, for
         | instance), or VM always means JIT compiled (not true for Ruby,
         | which calls its bytecode-based MRI "RubyVM" in a few places),
         | but you can ignore them.
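         | 
         | The Python point is easy to see for yourself, since CPython
         | compiles source to bytecode first and the standard dis
         | module can list it. In this short sketch the variable names
         | are arbitrary. Opcode names differ between Python versions,
         | so no expected listing is given.

```python
import dis

# Compile an expression to a code object without running it.
code = compile("price * quantity + 1", "<demo>", "eval")

# Each entry is one instruction of CPython's virtual machine.
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)

# The same code object is then executed by the bytecode interpreter.
result = eval(code, {"price": 3, "quantity": 4})
print(result)  # 13
```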
        
       | itsthecourier wrote:
       | this level of obfuscation in a social app is super suspicious
        
         | doublerabbit wrote:
         | I wouldn't say so; it's pretty common. It's used to add a layer
         | of security. You should take a look at a casino app.
         | 
         | Did you know that the chip on many Chip & PIN bank cards runs
         | a Java Card virtual machine that is activated when you tap the
         | card or insert it into a card reader?
         | 
         | https://en.wikipedia.org/wiki/Java_Card
        
       | mrkramer wrote:
       | In my bookmarks I found these RE examples as well:
       | https://www.nullpt.rs/reverse-engineering-tiktok-vm-1
       | 
       | https://ibiyemiabiodun.com/projects/reversing-tiktok-pt2/
        
       | lazyeye wrote:
       | An oldie but a goodie. A guide to manipulating online comments to
       | hide/dilute/obfuscate undesirable commentary....
       | 
       | https://cryptome.org/2012/07/gent-forum-spies.htm
        
       ___________________________________________________________________
       (page generated 2025-04-21 23:01 UTC)