[HN Gopher] Reverse engineering the obfuscated TikTok VM
___________________________________________________________________
Reverse engineering the obfuscated TikTok VM
Author : xfeeefeee
Score : 380 points
Date : 2025-04-21 01:59 UTC (21 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| xfeeefeee wrote:
| The fascinating process of reverse engineering this VM is
| detailed here.
|
| TikTok uses a custom virtual machine (VM) as part of its
| obfuscation and security layers. This project includes tools to:
|
| Deobfuscate webmssdk.js, which contains the virtual machine.
|
| Decompile TikTok's virtual machine instructions into readable
| form.
|
| Script Inject: Replace webmssdk.js with the deobfuscated VM
| injector.
|
| Sign URLs: Generate signed URLs which can be used to perform
| auth-based requests, e.g. posting comments.
| noduerme wrote:
| Is calling a massive embedded JS obfuscator a "VM" a bit of a
| stretch? Ultimately it's not translating anything to a lower-
| level language.
|
| Still, I had no idea. This is really taking JS obfuscation to
| the next level.
|
| One does wonder: what is the purpose of that level of
| obfuscation? The naive take is that obfuscation is usually to
| protect intellectual property... but this is client-side code
| that wouldn't give away anything about their secret sauce
| algorithm.
| throwaway48476 wrote:
| VM obfuscation is a common technique for malware developers.
|
| The VM term is applied because the obfuscator creates a
| custom instruction set and executes custom byte code. This is
| generated per build.
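To make "a custom instruction set ... generated per build" concrete, here is a minimal sketch in C. It is not TikTok's VM (the thread does not describe its opcode format); the opcode names, the `build_seed` parameter and the shuffle-based encoding are assumptions chosen for illustration. The interpreter loop is identical for every build, but each build assigns different byte values to the same operations, so the same program ships as different bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Logical operations of a hypothetical stack machine. */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT, OP_COUNT };

/* Per-build encoding: encode[op] is the byte emitted for op, and
   decode[byte] maps a byte back to op (or -1 if the byte is unused). */
typedef struct {
    uint8_t encode[OP_COUNT];
    int decode[256];
} opcode_map;

/* Derive this build's opcode bytes from a seed with a Fisher-Yates
   shuffle. rand() is fine for a sketch but is not a secret source. */
static void make_opcode_map(opcode_map *m, unsigned build_seed) {
    uint8_t bytes[256];
    for (int i = 0; i < 256; i++) bytes[i] = (uint8_t)i;
    srand(build_seed);
    for (int i = 255; i > 0; i--) {
        int j = rand() % (i + 1);
        uint8_t tmp = bytes[i]; bytes[i] = bytes[j]; bytes[j] = tmp;
    }
    for (int i = 0; i < 256; i++) m->decode[i] = -1;
    for (int op = 0; op < OP_COUNT; op++) {
        m->encode[op] = bytes[op];
        m->decode[bytes[op]] = op;
    }
}

/* "Compile" (2 + 3) * 4 into bytecode using this build's encoding. */
static size_t assemble_demo(const opcode_map *m, uint8_t *out) {
    size_t n = 0;
    out[n++] = m->encode[OP_PUSH]; out[n++] = 2;
    out[n++] = m->encode[OP_PUSH]; out[n++] = 3;
    out[n++] = m->encode[OP_ADD];
    out[n++] = m->encode[OP_PUSH]; out[n++] = 4;
    out[n++] = m->encode[OP_MUL];
    out[n++] = m->encode[OP_HALT];
    return n;
}

/* Fetch-decode-execute loop; returns the top of the stack at HALT. */
static long run(const opcode_map *m, const uint8_t *code, size_t len) {
    long stack[64];
    size_t sp = 0, pc = 0;
    while (pc < len) {
        switch (m->decode[code[pc++]]) {
        case OP_PUSH: assert(sp < 64 && pc < len); stack[sp++] = code[pc++]; break;
        case OP_ADD:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: assert(sp >= 1); return stack[sp - 1];
        default:      assert(!"byte is not an opcode in this build"); return 0;
        }
    }
    assert(!"program ended without HALT");
    return 0;
}
```

Building the map with two different seeds and running each build's `assemble_demo` output through `run` gives 20 both times, while the encoded bytes will generally differ between seeds. In a scheme like this, a decompiler written against one build has to recover the opcode mapping again after every rebuild.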
| noduerme wrote:
| I appreciate you making the distinction that anything which
| creates a custom instruction set is thus a VM. I think
| that's the way a lot of people here who are currently at my
| throat seem to define it, so I'm glad you put it in clear
| terms. I would define it as a custom instruction set _plus_
| some sort of plug-in that allows those opcodes to be run
| closer to the metal than the language they're written in.
| FWIW I'd call this thing more of an obfuscation framework.
| But maybe I'm just a dino. I am really glad you made this
| comment, though. It clarified for me why so many people
| went bananas when I said this wasn't a VM.
| MonkeyClub wrote:
| > Is calling a massive embedded JS obfuscator a "VM" a bit of
| a stretch? Ultimately it's not translating anything to a
| lower-level language.
|
| From the Repo's README:
|
| "TikTok is using a full-fledged bytecode VM, if you browse
| through it, it supports scopes, nested functions and
| exception handling. This isn't a typical VM and shows that it
| is definitely sophisticated."
| noduerme wrote:
| But that's basically an emulator of a VM, isn't it? It's
| like rewriting the Flash AVM2 into JS... it's still
| _running in JS_ whereas the _original_ VM was C++. It could
| JIT compile stuff but only because it literally was
| reserving memory that could overflow, and (semi-technical
| take here) from that advantage, of being closer to the
| metal, flowed all of the flaws in AVM2 that precipitated
| most of Adobe's woes with Flash. A VM implant in a web
| page that uses a plugin like Java or Flash, to get around
| running browser-sandboxed code, which can take over
| physical memory, is far different from just emulating a VM
| in Javascript. I wouldn't call writing a ton of opcodes in
| JS, which resolved to JS functions, a "virtual machine",
| because it isn't reserving anything or doing anything that
| Javascript can't do. Someone correct me here if I'm
| wrong... this is just heavy-duty obfuscation.
|
| Also, one major purpose of a VM is to _improve_ performance
| over what's available in the browser. If you use that as a
| measurement, this clearly doesn't fit that goal.
| gruez wrote:
| >But that's basically an emulator of a VM, isn't it?
|
| Emulators and VMs aren't mutually exclusive.
|
| >Also, one major purpose of a VM is to improve
| performance over what's available in the browser. If you
| use that as a measurement, this clearly doesn't fit that
| goal.
|
| And from your other comment:
|
| >I would define it as a custom instruction set plus some
| sort of plug-in that allows those opcodes to be run
| closer to the metal than the language they're written in.
|
| A virtual machine just means a machine that's virtual.
| All the other expectations you apply on top of it (eg.
| "improve performance over what's available in the
| browser") are totally irrelevant. The JVM clearly doesn't
| make Java code run faster than it would natively, but
| nobody denies it's a virtual machine. The same goes for
| VMware products ("VM" is literally in the name!), which
| execute x86 code but sit further away from "the metal"
| they're running on.
| userbinator wrote:
| You are replying to a comment that looks extremely unhuman.
| codetrotter wrote:
| It looks like OP filled out the text area alongside with
| the URL when submitting the post.
|
| HN takes that text and turns it into a comment. I've seen
| it happen before.
|
| The unfortunate outcome of that IMO is that sometimes text
| that makes sense as a description of a submission feels a
| bit out of place as a comment due to how they are worded.
| And these comments sometimes then end up getting downvoted.
|
| I wouldn't be completely sure it was not human written.
| Even though it feels a bit weird to read it as a comment.
| xfeeefeee wrote:
| > It looks like OP filled out the text area alongside
| with the URL when submitting the post. HN takes that text
| and turns it into a comment.
|
| Yeah, this is exactly what happened, but I decided to
| keep it rather than delete it, and filled it out more with
| the synopsis from the repo.
|
| Looking back at it, it really does look like an AI
| bulleted summary. I probably should have noted that the
| last part was indeed a quotation.
| dmitrygr wrote:
| What is the purpose of you posting a bad ChatGPT summary of the
| original post?
| xfeeefeee wrote:
| I quoted the synopsis from the readme thinking it would be
| helpful.
| godelski wrote:
| This seems like quite a lot of work to hide the code. What would
| the legitimate reasons for this be? I ask because it looks like
| it would make the program less optimized, and more complexity
| just leads to more errors.
|
| I understand the desire to make it harder for bots, but 1) it
| doesn't seem to be effective, and bots seem to be going a very
| different route; 2) there have got to be better, more effective
| ways. It's not like you're going to stop clones through this,
| because clones can replicate it by just seeing how things work
| and reverse engineering it black-box style.
| davidsojevic wrote:
| Making it harder for bots usually means that it drives up the
| cost for the bots to operate; so if they need to run in a
| headless browser to get around the anti-bot measures it might
| mean that it takes, for example, 1.5 seconds to execute a
| request as compared to the 0.1 seconds it would without them in
| place.
|
| On top of those 1.5 seconds, there is also a much larger CPU
| and memory cost from having to run that browser, compared to a
| simple direct HTTP request, whose cost is nearly negligible.
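Taking the comment's illustrative figures at face value, and assuming one worker handles requests one at a time, the latency alone (before counting the browser's extra CPU and memory) cuts per-worker throughput from 10 requests per second to about 0.67:

```latex
\frac{\text{throughput}_{\text{direct}}}{\text{throughput}_{\text{browser}}}
  = \frac{1 / (0.1\,\text{s})}{1 / (1.5\,\text{s})}
  = \frac{1.5\,\text{s}}{0.1\,\text{s}}
  = 15
```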
|
| So while you'll never truly defeat a sufficiently motivated
| actor, you may be able to drive their costs up high enough that
| it makes it difficult to enter the space or difficult to turn a
| profit if they're so inclined.
| godelski wrote:
| I understand the argument. You can't have perfect defense and
| speedbumps are quite effective. I'm not trying to disagree
| with that.
|
| But it does not seem like the solution is effective at
| mitigating bots. Presumably bots are going a different route
| considering how prolific they are, which warrants another
| solution. If they are going through this route then it
| certainly isn't effective either and also warrants another
| solution.
|
| It seems like this obfuscation requires a fair amount of
| work, especially since you need to frequently update the code
| to rescramble it. Added complexity also increases risks for
| bugs and vulnerabilities, which ultimately undermine the
| whole endeavor.
|
| I'm trying to understand why this level of effort is worth
| the cost. (Other than nefarious reasons. Those ones are
| rather obvious)
| noduerme wrote:
| A generous take would be that they have their own internal GUI
| tools that make it easier for non-programmers to set up visual
| elements in this. That was historically the reason to invent
| VMs like Flash. A less generous take would account for the
| enormous potential for hiding nefarious code inside such a
| thing, and account for the nature of the government which
| deployed it, and conclude that it was a national security /
| defense project disguised as a candy-coated trojan horse.
| supriyo-biswas wrote:
| VM-based architectures are really common in the obfuscation
| space, which is why you have executable packers[1], JS
| packers[2] and bot management products[3][4] leveraging
| similar techniques.
|
| As for why the obfuscation is needed: bot management products
| suffer from a fundamental weakness in that, ultimately, all of
| them simply collect static data from the environment;
| therefore it makes much more sense to make the steps
| involved as difficult to reverse engineer as possible. Once
| that is done, all you need to do is slightly change the
| schematics of your script every few weeks and publish a new
| bundle, and you've got yourself a pretty unsubvertible*
| protection scheme.
|
| Regarding the "trojan horse", I think no one has yet shown
| proof that it's a JavaScript exploit.
|
| (*Unsubvertible is obviously relative, but raising the cost of
| the attack from, say, $0.01/1000 requests to $10/1000
| requests would massively cut down on abuse.)
|
| [1] https://vmpsoft.com/
|
| [2] https://jscrambler.com/
|
| [3] https://github.com/neuroradiology/InsideReCaptcha
|
| [4] https://www.zenrows.com/blog/bypass-cloudflare#_qEu5MvVdnILJ...
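Using the footnote's example figures, the cost increase is a factor of a thousand; for a hypothetical abuser sending a million requests, that is the difference between about $10 and about $10,000:

```latex
\frac{\$10 / 1000\ \text{requests}}{\$0.01 / 1000\ \text{requests}} = 1000,
\qquad
10^{6}\ \text{requests}:\quad \$10 \;\longrightarrow\; \$10{,}000
```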
| throwaway48476 wrote:
| Makes it easier to hide code that does browser fingerprinting.
| rfoo wrote:
| Google has been doing this since forever for recaptcha. And, to
| be fair, it seems to be fairly effective for bot detection.
|
| https://github.com/neuroradiology/InsideReCaptcha
|
| > bots seem to be going a very different route
|
| If the "very different route" means running a headless browser,
| then it's a success for this tech, because the bot must now run
| a black-box JS, and this gives people a whole new set of ways
| to run bot detection, using the bot's CPU.
| godelski wrote:
| Okay... but those bots exist... and in high numbers... By
| "very different route" I mean "measures to _effectively_ stop
| the bots" (or dramatically reduce them). It seems like if
| they're using a headless browser then they're still being
| quite effective in accomplishing their goals.
| Scaevolus wrote:
| Obfuscation is one part of defense in depth. Tiktok also has a
| variety of captchas to block scrapers, independent of this.
|
| None of it's perfect, and they can be worked around, but by
| providing a barrier you've restricted some of the bad actors
| (spambots, scrapers) from acting at all.
|
| It's easier to deal with 100 spambots than 1000!
| like_any_other wrote:
| Unless the scrapers are DDoSing the site, I refuse to
| consider the downloading of publicly posted data as
| malicious. It shows how captured the conversation has become
| by corporate interests, that viewing or storing data posted
| free of charge, publicly, by their users, in a way not
| approved by that corporation, is seen as malicious, and the
| only _morally_ allowed way to view it is to use their
| spyware-laden client.
| Scaevolus wrote:
| What if the user has disabled downloads of a video? Should
| the creator (and copyright owner) of a piece of media not
| be allowed even token attempts to prevent copying?
| ndriscoll wrote:
| No because that interferes with fair use. If someone
| publicly posts a video, everyone has the right to copy it
| without any permission or awareness from the original
| author for things like commentary/criticism (it would be
| silly to require the copyright owner's permission to
| criticise a work!).
| areyourllySorry wrote:
| this is also a measure against bots that write, not just
| those that read
| ronsor wrote:
| There is no legitimate reason for a social media platform to
| employ this much obfuscation.
| miohtama wrote:
| It's to keep bots away and to avoid turning into another Twitter.
| dns_snek wrote:
| That's probably not the goal. There are bots advertising
| illegal services (e.g. ads for "hacking services", illegal
| drugs) in most comment sections. If you report these
| comments, 99.9% of the time the report will be rejected with
| "no violations found" and the spam stays up.
| bolognafairy wrote:
| That doesn't mean that it's "probably not the intention".
| dns_snek wrote:
| The balance of evidence suggests otherwise. If they cared
| about spam bots they would take action when spammers are
| handed to them on a silver platter. The kinds of spammers
| who will leave 30 identical comments advertising illegal
| services, not some weird moderation corner case.
|
| If you ever end up on a video that's related to drugs,
| there will be entire chains of bots just advertising to
| each other and TikTok won't find any violations when
| reported. But sure, I'm sure they care a whole lot about
| not ending up like Twitter.
| TheDong wrote:
| So you're saying that TikTok's support team doing a poor
| job of handling reports is proof that the engineering
| team wasn't tasked with reducing spam by writing code
| obfuscation?
|
| TikTok is a huge company, evidence of what the support
| department does or doesn't do has only minor bearing on
| the whole company, and basically none on the engineering
| department.
|
| The thing that seems most likely to me is that they care
| about spam, the engineering department did this one
| thing, and the support department is either overworked or
| cares less. Or it's really efficient, which is why you only see
| "a lot of spam", not "literally nothing but spam".
| wpietri wrote:
| A large company is much less cohesive than you realize.
| You can't reliably reason about the goals of one part
| because another part isn't consistent. This particular
| difference could easily be explained by insufficient
| funding to moderation, which is endemic in social media.
| lazyeye wrote:
| Because bots cant interact with web pages at the browser
| level like humans do...
| krackers wrote:
| The legitimate reason could be bot protection, the same way
| recaptcha uses a similar VM technique for obfuscation.
| supriyo-biswas wrote:
| See my other comment on this thread:
| https://news.ycombinator.com/item?id=43748994
| vasco wrote:
| You not being able to come up with one is different from there
| not being any possible reason.
| yard2010 wrote:
| This is not a social media platform but a government backed
| tool for doing stuff for the government.
| fidotron wrote:
| If you believe this you underestimate how adversarial the
| software world really is. TikTok will be on the receiving end
| of botnets run by everything from commercial entities to
| state-backed groups and criminals.
|
| They won't be betting that this stops that entirely, but it
| adds a layer of friction that is easy for them to change on a
| continuous basis. These things are also very good for leaving
| honeypots in: if someone is found to still be using
| something after a change, you can tag them as a bot or as
| otherwise hacking. Both of those approaches are also widely used in game
| anti-cheat mechanisms, and as shown there the lengths people
| will go to anyway are completely insane.
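The honeypot idea can be sketched as a server-side check. Everything here is hypothetical: the `build_registry` type, the build id strings and the three verdicts are made up, and a real deployment would presumably need a grace period for clients that loaded the old script just before a rotation, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical verdicts for an incoming signed request. */
typedef enum { REQ_OK, REQ_STALE_BUILD, REQ_UNKNOWN_BUILD } verdict;

/* Hypothetical record of which script build is live and which were rotated out. */
typedef struct {
    const char *current;        /* build id of the script currently served */
    const char *const *retired; /* builds served earlier, now replaced */
    size_t n_retired;
} build_registry;

static int is_retired(const build_registry *r, const char *build_id) {
    for (size_t i = 0; i < r->n_retired; i++)
        if (strcmp(build_id, r->retired[i]) == 0)
            return 1;
    return 0;
}

/* A client still signing requests with a retired build after the rotation
   gets REQ_STALE_BUILD, i.e. a candidate for the bot tag; an id that was
   never issued at all gets REQ_UNKNOWN_BUILD. */
static verdict classify_request(const build_registry *r, const char *build_id) {
    assert(r != NULL && build_id != NULL);
    if (strcmp(build_id, r->current) == 0)
        return REQ_OK;
    return is_retired(r, build_id) ? REQ_STALE_BUILD : REQ_UNKNOWN_BUILD;
}
```

Keeping "stale" separate from "unknown" is a design choice: a stale build suggests someone reversed an old bundle and kept using it, while an unknown id is more likely forged outright.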
| fmxsh wrote:
| It's an excellent strategy for the reasons you mention. And a
| kind of "security by principle of least privilege".
| lazyeye wrote:
| Nah..I agree with the parent comment, there is simply no
| legitimate reason for a social media app to employ this level
| of obfuscation.
| davidsojevic wrote:
| Very impressive work! I always enjoy a good write up about
| reverse engineering efforts and yours was really simple to
| follow.
|
| Many popular/large websites and bot protection services usually
| have environment checking as a baseline and mouse-movement
| tracking in some of the more aggressive anti-bot checks.
|
| It's always interesting to see how long it takes from when the
| measures have been defeated/publicised until the service ends up
| making changes to their mechanism to make you start over
| (hopefully not from scratch).
| xfeeefeee wrote:
| All credit should go to Lukas
| https://github.com/LukasOgunfeitimi
|
| I was sharing this here since I thought it was a great write
| up, but did not intend to pass it off as my own!
|
| There is certainly always a good amount of push and pull,
| though my personal concern as a contributor to yt-dlp under
| another alias is more about archival of the underlying media
| rather than automating things like comments.
|
| YouTube also uses an interesting scheme for authenticating
| requests for media, which required implementing a very
| basic JavaScript interpreter within Python for yt-dlp too. I
| expect this kind of thing to continue to become even more
| common and complicated.
| worldsavior wrote:
| That's a very strong obfuscation. Takes a lot of work to
| deobfuscate such a thing. Great writeup.
| domfie wrote:
| Looks like a lot of work. I recently discovered webcrack and the
| tool jehna/humanify for such deobfuscation tasks
| 3abiton wrote:
| It could be interesting to see a comparison to OP's work.
| 0xDEADFED5 wrote:
| this is cool. i briefly worked on a TikTok bot a while back and
| it was a huge pain in the ass.
| heinternets wrote:
| Is TikTok so obfuscated to prevent people from knowing the full
| extent of data collection and device fingerprinting?
| gruez wrote:
| 1. Practically speaking all this javascript fingerprinting
| pales in comparison to what native apps have access to. Most
| people aren't using tiktok on their browsers, and the browser
| version heavily pushes you to using the app, so you should be
| far more worried about whatever's happening in the app.
|
| 2. Despite tiktok having a giant target painted on its back for
| its perceived connections to the CCP, I haven't really seen any
| evidence that it does any more tracking/fingerprinting than
| most other websites (eg. facebook) or security services (eg.
| cloudflare or recaptcha) already do.
| nicce wrote:
| > 2. Despite tiktok having a giant target painted on its back
| for its perceived connections to the CCP, I haven't really
| seen any evidence that it does any more
| tracking/fingerprinting than most other websites (eg.
| facebook) or security services (eg. cloudflare or recaptcha)
| already do.
|
| Take a look at the request parameters in TikTok vs. Instagram,
| for example.
|
| Every request for TikTok forces you to pass most of the
| information that browser can collect from the end-user before
| server responds:
|
| https://www.nullpt.rs/reverse-engineering-tiktok-vm-1
| gruez wrote:
| >Every request for TikTok forces you to pass most of the
| information that browser can collect from the end-user
| before server responds:
|
| Half of the parameters are stuff relating to the app
| itself, or could be inferred from other sources like user-
| agent. The other fingerprinting stuff (eg. canvas or webgl
| fingerprinting) is basically industry standard and by no
| means unique to tiktok. Even the claim that "browser can
| collect from the end-user before server responds" doesn't
| hold up to scrutiny, because there's no meaningful
| difference between that, and browser check interstitials
| (eg. the cloudflare checkbox), which fingerprint you before
| letting you access the content. It's also unclear how
| that's more sinister than the alternative approach of
| sending telemetry/fingerprinting data to a separate
| endpoint.
| RexM wrote:
| Is this VM somehow related to Lynx (their cross platform dev
| tooling?)
|
| https://lynxjs.org/
|
| Also discussed on HN
|
| https://news.ycombinator.com/item?id=43264957
| weinzierl wrote:
| Is there also a VM in their iOS app? I thought a VM would be
| against Apple's policies?
| xmodem wrote:
| Apple's policies prevent using JIT compilation; they don't ban
| VMs outright.
| jacobp100 wrote:
| This is the correct answer. They even expose JavaScriptCore
| to apps
| Scaevolus wrote:
| Their mobile apps have equivalent signature code, but it's
| compiled to native binaries instead.
| kleiba wrote:
| I've been using a shitty streaming website whose player
| interrupts the playback of a video in irregular intervals and
| presents a cryptic error message. I've started looking into the
| JavaScript code to see if I can't code up a work-around mechanism
| (basically debugging their garbage implementation), and of course
| (why actually?) their player code is also obfuscated.
|
| And I've gotta say, employing an AI assistant has proven to be an
| invaluable help in trying to understand obfuscated code. It's
| actually really cool to take a function of gobbledegook
| JavaScript and ask the AI to rewrite it in a more canonical and
| easily understandable way, with inline comments. Of course, there
| are flaws every now and then, but the ability to do this has been
| such a game changer for reverse engineering, IMO.
|
| I can even ask to take a guess at finding better
| variable/function names and the AI can infer from the code (maybe
| has seen the unobfuscated libraries during training?) what this
| code is actually doing on a high-level and turn something like
| e.g(e.g) into player.initialize(player.state) which is nothing
| short of amazing.
|
| So for anyone doing similar work, I cannot recommend highly
| enough to have an AI agent as another tool in your tool belt.
| lukan wrote:
| Which AI agents did you use?
| kleiba wrote:
| I've tried different ones, they all seem to do a great job.
| klabetron wrote:
| Out of curiosity (as someone disappointingly new to prompt
| engineering), what's an example prompt you used with some
| success?
| esseph wrote:
| Ask questions. Be disappointed in the outcomes.
|
| Ask more questions. Get some right answers. Repeat.
|
| Make question asking muscle get swole.
| nurettin wrote:
| Actually knowing the subject and presenting insights
| gives me much better results than simply asking it to do
| what I mean.
| Loughla wrote:
| For help with prompt engineering, take a graduate level
| grant writing course. It teaches you how to ask the right
| questions to get answers from humans and how to break
| down complicated processes into bite size pieces; really
| usable for LLMs.
| specialist wrote:
| Heh. Probably also useful should a djinn ever grant you
| three wishes.
| sureIy wrote:
| Could you name a couple?
| ImPostingOnHN wrote:
| next up is using AI to obfuscate it better in the first
| place, and then the terrible code gets scraped and used in
| further training, with an arms race ensuing, until all code
| on the internet is unintelligible but somehow works and can
| only be maintained by a specific AI that has a particularly
| encoded form of insanity
| titaphraz wrote:
| > they all seem to do a great job
|
| Yeah right.
| saagarjha wrote:
| Is it truly obfuscated, or just minified?
| johann8384 wrote:
| Well, the article shows the code was obfuscated, with several
| specific examples.
| saagarjha wrote:
| I mean the JavaScript the LLM reversed for them
| poincaredisk wrote:
| I'm surprised by this. As a professional reverse engineer,
| I've actually found LLMs to be terrible at deobfuscation of JS
| (especially in the context of JS malware). But maybe my
| requirements are higher and it's actually OK for occasional use
| against weak packers?
| Bilal_io wrote:
| I've used it for small files and it did very well
| prettifying, naming the variables and adding comments for
| context. But I can imagine it doing a bad job with large
| files.
| ctoth wrote:
| Have you seen this?
|
| https://github.com/jehna/humanify
|
| What they do is ground the LLM to the AST with Babel to
| ensure you still get the same shape of AST out of your
| deobfuscation pass. Probably this tool could be cleaned up,
| made to work with multiple llm and parser backends, have its
| prompts improved, &c.
| sylware wrote:
| What's terrible are the humans writing such software...
|
| But if AI can help to fight those people's work, good for
| humanity I guess.
|
| That said... Is AI going to de-obfuscate/reverse engineer their
| obfuscated AI prompts or web apps?
| SoKamil wrote:
| > As this is a Javascript file executed on the web, it is
| actually possible to replace the normal webmssdk.js with the
| deobfuscated file and use TikTok normally.
|
| > This can be achieved by using two browser extensions known as
| Tampermonkey for executing custom code and CSP to disable CSP so
| I can fetch files from blocked origins. This is so I can put
| latestDeobf.js in my own file server and have it be fetched each
| time, this is so I can easily edit the file and let the changes
| take effect each time I refresh. This makes it much easier to
| debug when reversing functions.
|
| I believe you can achieve the same effect without any 3rd party
| extensions. You can use Local Overrides in Chrome DevTools.
|
| Great work!
| wutwutwat wrote:
| You can also install some trusted certs and MITM the requests,
| replacing the content with whatever you'd like
|
| Likely overkill for this use case, but no matter the client,
| you can in theory do whatever you want to any traffic up until
| the point it leaves your network.
| ImPostingOnHN wrote:
| what toolset do you use for on-the-fly translation?
|
| ad-hoc code, or something with a more structured workflow,
| maybe?
|
| this sounds like a fun thing to try, thanks for your time
| 18172828286177 wrote:
| See Burp Suite
| SoKamil wrote:
| Charles, Proxyman, or mitmproxy if you like open source +
| terminal would do the job.
| geoka9 wrote:
| mitmproxy will even allow you to script the
| intercept/override behavior, which can be really handy.
| Wowfunhappy wrote:
| ...can I ask a really stupid question? What is a VM in this
| context?
|
| I've used VMs for years to run Windows on top of macOS or Linux
| on top of Windows or macOS on top of macOS when I need an
| isolated testing environment. I also know that Java works via the
| "Javascript Virtual Machine" which I've always thought of as
| "Java code actually runs in its own lightweight operating system
| on top of the host OS, which makes it OS-agnostic". The JVM can't
| run on bare metal because it doesn't have hardware drivers, but
| presumably it _could_ if you wrote those drivers.
|
| But presumably the VM being discussed in TFA isn't that kind of
| VM, right? Bytedance didn't write an operating system in
| Javascript?
|
| I've been seeing "VM" used in lots of contexts like this recently
| and it makes me think I must be missing something, but it's the
| sort of question I don't know how to Google. AIs have not been
| helpful either, plus I don't trust them.
| jacobp100 wrote:
| Yes the VM discussed is similar to JVM
| turtleyacht wrote:
| Virtual Machine Decompiling:
| https://github.com/LukasOgunfeitimi/TikTok-ReverseEngineerin...
|
| And also VM223, with statements that do stuff to an array
| "stack": https://github.com/LukasOgunfeitimi/TikTok-ReverseEngineerin...
|
| One obvious giveaway for a VM is laying out memory, or
| processing some intermediate language. In this case, it could
| be the latter.
|
| In-browser, you have Chrome V8 running Javascript; that
| Javascript could be running an interpreted environment where
| abstractions are not purely business logic, but an execution
| model separate from domain stuff: auth, video, user, etc.
|
| By that observation, this C snippet is a VM:
|
|     char instruction = 'p'; /* or an array */
|     if (instruction == 'p') {
|         puts("document.appendChild(...)");
|     }
|
| If the program outputs to a vm.js file, it's kinda-sorta a
| "VM." I would call it something else, maybe a generator of
| sorts (for now). Just in my opinion, for me, if I were working
| on a VM, the threshold of calling it that would be much higher
| than the above.
|
| On the other hand, if I had to comment _in the generated
| Javascript_ debugging hints referring to execution stack or
| stack pointers, it is kind of a VM idea.
| yjftsjthsd-h wrote:
| Nit:
|
| > I also know that Java works via the "Javascript Virtual
| Machine"
|
| _Java_ Virtual machine. That Java and JavaScript are named the
| way they are is... basically a historical accident of a
| cross-promotion gone too far, IMO. They aren't really related (at
| least, in the way that the name might imply).
|
| Now to your real question. Virtual machines are _anything_ that
| is one computer pretending to be another computer. Sometimes,
| that's an x86_64 PC pretending to be another x86_64 PC to run
| a different OS. Sometimes that's an x86_64 PC pretending to be
| a 50-year-old mainframe ( https://opensimh.org/ really shines
| there). Sometimes it's an ARM laptop running macOS pretending
| to be an x86_64 PC so it can run Windows. And, relevant here,
| sometimes it's a phone pretending to be a machine that has
| never actually existed in hardware. You can just make up an
| imaginary machine that has any old characteristics you want.
| Maybe it has a built-in high-level network card that magically
| turns HTTP requests into responses without programs having to
| implement HTTP themselves. Maybe it has an imaginary graphics
| card that directly renders buttons. Maybe you imagine a CPU
| that runs Java opcodes directly. Whatever it is, if you can
| imagine a system and then write a program that emulates it, you
| can make a virtual machine and run stuff in it.
| Wowfunhappy wrote:
| > Java Virtual machine. That Java and JavaScript are named
| the way they are is... basically a historical accident of a
| cross-promotion gone too far
|
| Oops, that was a typo! Thank you.
| ngneer wrote:
| This is not a stupid question. I have seen other comments on
| the thread that confuse the two terms and run with it. Better
| to ask than assume. Especially since "VM" is the same label for
| two or three distinct yet related notions in security.
|
| The VM you are familiar with indeed can run an OS, and is
| indeed not what TikTok does.
|
| #1 VMM - hypervisor runs VMs
|
| #2 JVM/.NET - efficient bytecode
|
| #3 Obfuscation - obscure bytecode
|
| The main thing is that for #2 and #3 the machine language
| changes.
|
| With "virtualization" as used in most contexts, involving a
| virtual machine monitor, or hypervisor, one creates zero or
| more new (virtual) machines to execute multiple software
| recipes. All the recipes are written in the same (machine)
| language, for all the machines. This can help security by
| introducing isolation, for example, where one VM cannot read
| memory belonging to another VM unless the hypervisor allows it.
|
| With the "virtual machine" used for obfuscation, the machine
| language changes. The system performs the same actions as it
| would without obfuscation, but now it is performing those
| actions using a different machine language. Behaviorally, the
| result is the same. But, the new language makes it harder to
| reverse engineer the behavior.
|
| Stupid example:
|
| Original instruction: MOV A,B
|
| Under hypervisor virtualization, VM0 and VM1 will perform this
| same instruction.
|
| Under obfuscation virtualization, software will perform
| instructions that amount to the same result, but are harder to
| figure out. So, the MOV instruction is redefined and mapped
| onto a new (virtual) machine. The new machine does not simply
| leverage the existing instruction, rather an obfuscated
| sequence. For example:
|
| A <- B + C + D * E
|
| A <- A - C
|
| A <- A - D * E
|
| Obviously, the above transformation is easy to understand and
| undo. Others are harder to understand and undo. Look up
| MOVfuscator to see how crazy things may get.
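The three-step substitution in the "stupid example" can be checked mechanically. This C sketch (the function names are made up) uses unsigned 32-bit arithmetic, which wraps modulo 2^32, so the junk terms C and D * E cancel exactly even when the intermediate sums overflow.

```c
#include <assert.h>
#include <stdint.h>

/* The plain instruction: MOV A,B just copies B into A. */
static uint32_t mov_plain(uint32_t b) {
    return b;
}

/* The same instruction after substitution:
     A <- B + C + D * E
     A <- A - C
     A <- A - D * E
   C, D and E are junk inputs that cancel out. uint32_t arithmetic
   wraps modulo 2^32, so the result equals B even when the
   intermediate sums overflow. */
static uint32_t mov_obfuscated(uint32_t b, uint32_t c, uint32_t d, uint32_t e) {
    uint32_t a = b + c + d * e;
    a = a - c;
    a = a - d * e;
    return a;
}
```

As the comment says, this particular junk is trivially removed by algebraic simplification; real obfuscators choose substitutions that are much harder to fold back.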
| fmxsh wrote:
| It sounds more advanced than it is.
|
| It's a function wrapping the functionality of its host
| environment. It then provides the caller with its own bytecode
| language to execute instructions. The virtual machine
| translates those instructions to the corresponding real
| functionality of the host environment (Javascript) upon
| execution.
|
| This particular case is sophisticated but the idea is simple.
|
| Correct me if I'm wrong. I'm not knowledgeable in this. This is
| my current understanding of it.
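The "translates those instructions to the corresponding real functionality of the host" part can be sketched as a dispatch table. In a browser VM the host functions would be JavaScript built-ins; here ordinary C functions stand in for them, and the slot order in `host_table`, the `(opcode, operand)` pair format and the single accumulator are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* VM state: one accumulator plus a string argument (illustrative only). */
typedef struct {
    long acc;
    const char *str;
} vm_state;

typedef void (*handler)(vm_state *s, long operand);

/* Stand-ins for host functionality; in a browser VM these would be
   JavaScript built-ins rather than C functions. */
static void h_strlen(vm_state *s, long x) { (void)x; s->acc = (long)strlen(s->str); }
static void h_load(vm_state *s, long x)   { s->acc = x; }
static void h_add(vm_state *s, long x)    { s->acc += x; }

/* Bytecode only names slots in this table; which host function sits
   in which slot is known only to the VM. */
static const handler host_table[] = { h_strlen, h_load, h_add };
enum { N_HANDLERS = (int)(sizeof host_table / sizeof host_table[0]) };

/* Each instruction is an (opcode, operand) pair of longs. */
static long execute(vm_state *s, const long *code, size_t n_instructions) {
    for (size_t i = 0; i < n_instructions; i++) {
        long op = code[2 * i];
        long arg = code[2 * i + 1];
        assert(op >= 0 && op < N_HANDLERS);
        host_table[op](s, arg);
    }
    return s->acc;
}
```

For example, the program `{0, 0, 2, 10}` means "take the length of the string argument, then add 10", so with the string "webmssdk" it returns 18. Reading the bytecode alone tells you nothing until you know what sits in each table slot.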
| Jasper_ wrote:
| The words "virtual machine" and "interpreter" are mostly
| interchangeable; they both refer to a mechanism to run a
| computer program not by compiling it to machine code, but to
| some intermediate "virtual" machine code which will then get
| run. The terminology is new, but the idea is older; "P-code"
| was the term we used to use before it fell out of favor.
|
| Sun popularized the term "virtual machine" when marketing Java
| instead of using "interpreter" or "P-code", both for marketing
| reasons (VMware had just come on the scene and was making tech
| headlines), but also to get away from the perception of classic
| interpreters being slower than native code since Java had a JIT
| compiler. Just-in-time compilers that compiled to the host's
| machine code at runtime were well-known in research domains at
| the time, but were much less popular than the more dominant
| execution models of "AST interpreter" and "bytecode
| interpreter".
|
| There might be some gatekeepers that suggest that "interpreter"
| means AST interpreter (not true for the Python interpreter, for
| instance), or VM always means JIT compiled (not true for Ruby,
| which calls its bytecode-based MRI "RubyVM" in a few places),
| but you can ignore them.
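The difference between the two dominant models named above, "AST interpreter" and "bytecode interpreter", shows up clearly on a single expression. This C sketch evaluates (2 + 3) * 4 both ways; the `node` layout and the `BC_*` opcodes are made up. The repo README describes TikTok's VM as a bytecode VM, i.e. the second style.

```c
#include <assert.h>
#include <stddef.h>

/* A tiny expression language: integer literals, + and *. */
typedef enum { N_NUM, N_ADD, N_MUL } node_kind;

typedef struct node {
    node_kind kind;
    long value;                   /* used when kind == N_NUM */
    const struct node *lhs, *rhs; /* used for N_ADD and N_MUL */
} node;

/* 1) AST interpreter: walk the tree directly, recursing into children. */
static long eval_ast(const node *n) {
    switch (n->kind) {
    case N_NUM: return n->value;
    case N_ADD: return eval_ast(n->lhs) + eval_ast(n->rhs);
    case N_MUL: return eval_ast(n->lhs) * eval_ast(n->rhs);
    }
    assert(!"bad node kind");
    return 0;
}

/* 2) Bytecode interpreter: flatten the tree into stack-machine
   instructions (post-order), then run them in a flat loop. */
enum { BC_PUSH, BC_ADD, BC_MUL };

static size_t compile(const node *n, long *out, size_t pos) {
    if (n->kind == N_NUM) {
        out[pos++] = BC_PUSH;
        out[pos++] = n->value;
        return pos;
    }
    pos = compile(n->lhs, out, pos);
    pos = compile(n->rhs, out, pos);
    out[pos++] = (n->kind == N_ADD) ? BC_ADD : BC_MUL;
    return pos;
}

static long eval_bytecode(const long *code, size_t len) {
    long stack[32];
    size_t sp = 0;
    for (size_t pc = 0; pc < len;) {
        switch (code[pc++]) {
        case BC_PUSH: assert(sp < 32); stack[sp++] = code[pc++]; break;
        case BC_ADD:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case BC_MUL:  assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; break;
        default:      assert(!"bad opcode"); return 0;
        }
    }
    assert(sp == 1);
    return stack[0];
}
```

Both paths return 20 for (2 + 3) * 4; the bytecode version compiles to eight longs (PUSH 2, PUSH 3, ADD, PUSH 4, MUL). Once a program exists only in that flat form, the original tree structure has to be reconstructed, which is part of what makes decompiling a bytecode VM work.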
| itsthecourier wrote:
| this level of obfuscation in a social app is super suspicious
| doublerabbit wrote:
| I wouldn't say so; it's pretty common. It's used to add a layer
| of security. You should take a look at a casino app.
|
| Did you know that many Chip & Pin bank cards run a Java Virtual
| Machine on the chip, which is activated when you tap the card or
| insert it into a card reader?
|
| https://en.wikipedia.org/wiki/Java_Card
| mrkramer wrote:
| In my bookmarks I found these RE examples as well:
| https://www.nullpt.rs/reverse-engineering-tiktok-vm-1
|
| https://ibiyemiabiodun.com/projects/reversing-tiktok-pt2/
| lazyeye wrote:
| An oldie but a goodie. A guide to manipulating online comments to
| hide/dilute/obfuscate undesirable commentary....
|
| https://cryptome.org/2012/07/gent-forum-spies.htm
___________________________________________________________________
(page generated 2025-04-21 23:01 UTC)