[HN Gopher] Reverse Engineering TikTok's VM Obfuscation
       ___________________________________________________________________
        
       Reverse Engineering TikTok's VM Obfuscation
        
       Author : hazebooth
       Score  : 569 points
        Date   : 2022-12-23 19:36 UTC (1 day ago)
        
 (HTM) web link (nullpt.rs)
 (TXT) w3m dump (nullpt.rs)
        
       | mhasbini wrote:
       | Deobfuscated script without the vm part:
       | https://gist.github.com/mhasbini/f9269d230ed8eb6dfdbb1bd1be9...
        
       | Aperocky wrote:
        | Isn't the same concept also used in YouTube? I believe a Python
        | mock of the equivalent VM exists in youtube-dl.
        
         | mdaniel wrote:
         | I recall that discussion recently, and thus just happen to have
         | it handy:
         | 
         | a very, very specialized "regex" based JS evaluator that
         | presumably did just enough to make the YT one run:
         | https://github.com/ytdl-org/youtube-dl/blob/2021.12.17/youtu...
         | 
         | and its callsite: https://github.com/ytdl-org/youtube-
         | dl/blob/2021.12.17/youtu...
         | 
          | So the short version is that I would not classify that as a VM,
          | and I don't even believe it's obfuscated. Perhaps there are
          | other extractors that do what you're describing; I didn't go
          | looking.
        
         | linux2647 wrote:
         | IIRC not exactly. YouTube provides some arbitrary JavaScript
         | that must be evaluated as a form of a challenge. It changes
         | with every page request, but it's just a set of math
         | operations. It's easier to evaluate the JS than to statically
         | analyze it
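          | 
          | A minimal sketch of the kind of challenge this describes (the
          | function body is made up, not YouTube's actual code): the
          | server ships a throwaway transform and the client just runs
          | it.

```javascript
// Hypothetical challenge string as it might arrive from the server.
// Evaluating it directly is far easier than statically analyzing it.
const challengeSource = `
  (function (sig) {
    var a = sig.split("");
    a.reverse();
    a.splice(0, 2);
    return a.join("");
  })
`;

const solve = eval(challengeSource); // yields the transform function
const token = solve("abcdefg");      // "edcba": reversed, first two dropped
```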
        
       | apienx wrote:
       | Solid case! Thanks for taking the time to write it up.
       | 
       | Those who care and have to use TikTok can probably add their own
       | virtualization layer (and tolerate the hit in cost/performance).
        
         | chinathrow wrote:
         | No one _has_ to use social media.
        
           | QuantumGood wrote:
           | Wouldn't an example be a job that requires it? Are you
           | attempting a meta comment, and really mean something like
           | "anyone can quit a job that requires social media usage"?
        
             | jesuspiece wrote:
             | They're trying to be unique and cool by denouncing the use
             | of social media
        
       | antiviral wrote:
       | This is excellent work.
       | 
        | It also shows how TikTok _may_ be in violation of several US/EU
        | privacy laws. I really wonder now who this data is shared with.
       | Perhaps someone should bring this article to the FTC's attention
       | for further review.
        
       | codedokode wrote:
        | It is interesting that while technologies like canvas, WebGL or
        | WebRTC were intended for other purposes, their main usage became
        | fingerprinting. For example, WebGL provides valuable information
        | about the GPU model and its drivers.
       | 
       | This shows how browser developers race to provide new features
       | ignoring privacy impact.
       | 
       | I don't understand why features that allow fingerprinting
       | (reading back canvas pixels or GPU buffers) are not hidden behind
       | a permission.
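        | 
        | As an illustration, a canvas fingerprint boils down to drawing
        | something, reading the pixels back, and hashing them. This is a
        | sketch under those assumptions, not any specific tracker's code
        | (the canvas part needs a browser DOM; the hash runs anywhere):

```javascript
// 32-bit FNV-1a: a small, common non-cryptographic hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, mod 2^32
  }
  return h >>> 0;
}

// Browser-only: subtle rendering differences between GPUs, drivers and
// font stacks make the read-back pixel data weakly identifying.
function canvasFingerprint() {
  const canvas = document.createElement("canvas");
  const ctx = canvas.getContext("2d");
  ctx.textBaseline = "top";
  ctx.font = "14px Arial";
  ctx.fillText("fingerprint \u2603", 2, 2);
  return fnv1a(canvas.toDataURL()); // the pixel read-back in question
}
```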
        
         | PetahNZ wrote:
         | Come on, it's not their main usage... An intentional side
         | effect maybe, but their main usage is clear.
        
           | 0xy wrote:
           | If something is used 99% of the time for tracking and 1% of
           | the time for genuine useful reasons, it's safe to say it's a
           | tracking mechanism.
           | 
           | Intent is irrelevant, the APIs are fundamentally insecure.
           | Google directly benefits from this financially.
        
         | ivoras wrote:
         | Of course it's not that simple.
         | 
         | In most parts of the world, if a person is in a public space,
         | anyone can take a photo of that person, including shop owners.
          | This photo could be considered a type of "fingerprint" for
          | that person. The only important difference is that in some
          | countries, you are not allowed to make money off of such
          | photos.
         | 
         | The Internet is a lot like a big public space, and possibly
         | worse - while you are using certain services (web pages or
         | apps), it might be argued that you are actually "on premises"
         | for that service provider.
         | 
         | The best we can do now is more and more education about what
         | can go wrong with such data collection.
        
           | ajsnigrutin wrote:
           | Yes, but taking photos is expensive, fingerprinting online is
           | cheap. Also, there's a difference between taking a photo of
           | the eiffel tower and taking a photo of a bunch of other
           | tourists there (legal), or intentionally targeting and
           | photographing an individual and creating a database of those
           | photos (illegal in most countries).
        
         | fxtentacle wrote:
         | It's because the developer of the browser needs fingerprinting
         | for their ads.
         | 
         | I don't think Chrome accidentally exposed data that Google
         | wanted.
        
           | IshKebab wrote:
           | Please don't spread obviously untrue conspiracy theories.
           | 
           | The main reason is that it's _really hard_ to avoid
           | fingerprinting (while providing rich features like WebGL and
           | WebRTC anyway).
           | 
           | A secondary reason is that web browsers started off from a
           | position of leaking fingerprint data all over the place so
           | there's not much incentive to care about it for new features.
           | 
           | You might be interested in this effort to reduce
           | fingerprinting: https://developer.chrome.com/en/docs/privacy-
           | sandbox/privacy...
           | 
            | (The real conspiracy is that Google added logins to Chrome
            | specifically so that they _don't_ have to rely on
            | fingerprinting. They have a huge incentive to stop
            | fingerprinting because it leaves them as the only entity that
            | can track users.)
        
           | danielheath wrote:
           | I thought the developer of the browser is the only ad
           | provider that _doesn't_ need it (since they have other,
           | better ways to get that intel which their competitors do
           | not).
        
             | asdfghjkjhg wrote:
             | they (google) did try.
             | 
             | that's the profile icon you see on your google-chrome UI.
             | 
             | but only fools use that feature.
        
               | somekyle2 wrote:
               | what makes someone who uses that feature a "fool"? Some
               | users don't particularly mind being tracked.
        
               | Cockbrand wrote:
               | Also, it's very convenient in a work context if your
               | employer uses G Suite/Workspace. I don't have anything to
               | hide work-wise, and I do everything else in incognito
               | windows.
        
           | supriyo-biswas wrote:
           | The fly in the ointment with this theory is why Apple (or
           | even Mozilla) would expose the same kind of information.
           | Apple has only recently started experimenting with ads, and
           | their ads are limited to the apps that they control.
           | 
           | The more benign explanation would be to allow developers to
           | work around device-specific or browser-specific bugs.
           | 
           | (I'm aware Apple changes the GPU Model to "Apple GPU",
           | however they do expose a ton of other properties that make it
           | possible to fingerprint a device.)
        
             | jakear wrote:
             | Apple devices are in fact fairly difficult to fingerprint.
             | In my experiments [1] all instances of the same hardware
             | model (on iOS, iPadOS, and macOS) give the same
             | fingerprint, so the best a tracker can get is "uses iPhone
             | 14". Better than nothing, but not terribly unique.
             | 
             | [1] fingrprintr.pages.dev
        
               | [deleted]
        
               | [deleted]
        
             | RobotToaster wrote:
             | Isn't Mozilla's main source of income from Google?
        
             | threatofrain wrote:
              | Continuing to push the browser to be a general app
             | platform is the only way it can survive against native
             | experience, which is already eating into the enthusiasm for
             | the web. It seems like the trend for consumer companies is
             | to _maybe_ launch first on the web for velocity but
             | eventually migrate to native experiences.
             | 
             | I wonder to what degree we can enable hardware performance
             | without leaking user data.
        
             | camyule wrote:
              | Firefox does have a mechanism to limit the amount of data
             | being leaked for fingerprinting, but it's disabled by
             | default: https://support.mozilla.org/en-US/kb/firefox-
             | protection-agai...
        
               | philliphaydon wrote:
                | Wow, I just realised I've had this enabled ever since I
                | first remember the feature being announced, and the
                | internet hasn't broken.
        
               | cmeacham98 wrote:
               | They're not that big of a deal, but my two biggest
               | annoyances with RFP:
               | 
               | 1. prefers-color-scheme is completely broken, _even in
               | the dev tools_. Mozilla refuses to fix this in any way,
               | it is allegedly  "by design" that you have to disable all
               | RFP protection if you're a web dev and need to test the
               | dark color scheme of your website.
               | 
                | 2. Similarly, RFP always reports your timezone as UTC,
                | with no way to change it.
        
               | arein3 wrote:
               | They could add switches for individual features to mask
               | on a hidden/advanced menu
        
               | cmeacham98 wrote:
               | Mozilla refuses to add _any_ toggle to disable RFP's
               | control over features it touches, including even an
               | about:config entry.
               | 
               | See example bugzilla:
               | https://bugzilla.mozilla.org/show_bug.cgi?id=1535189
               | 
               | My "fix" for this involves using a janky old version of
               | an addon that attempts to muck with the CSS/JS to
               | reproduce the effect.
        
               | nightpool wrote:
                | That's a great way to get even more fingerprinting
                | potential: each additional switch is another bit of
                | identification on top of the actual fingerprint itself.
        
         | madeofpalk wrote:
         | > This shows how browser developers race to provide new
         | features ignoring privacy impact.
         | 
          | I think it shows how naive browser vendors were, many years
          | ago, about how this tech could be misused.
         | 
         | These days I think browser vendors are very much aware of it
         | and will frequently block features or proposals that they feel
         | compromise on privacy and/or could be used as a tracking
          | vector, especially Firefox and Safari. Sort this list
          | https://mozilla.github.io/standards-positions/ by _Mozilla
          | Position_ to see the reasons they reject/refuse to implement
          | standards and proposals.
        
         | jsnell wrote:
         | It is absurd to claim that the main use of WebRTC is
         | fingerprinting. Especially during the pandemic the world pretty
         | much ran on WebRTC. Real-time media is clearly a pretty core
         | functionality for the web to be a serious application platform,
         | it wasn't just some kind of a trojan horse for tracking.
         | 
         | Now, it is true that a lot of older web APIs do expose too much
          | fingerprinting surface. But design sensibilities have changed
          | a lot over time; it's just not the case that you can
         | make statements about what browser developers do now based on
         | what designs from a decade or two ago look like. These days
         | privacy is a top issue when it comes to any new browser APIs.
         | 
          | But let's take your question at face value: why aren't these
          | specific things behind a permission dialog? Because the
          | permissions would be totally unactionable to a normal user.
          | "This page wants to send you notifications" or "this page wants
          | to use the microphone" is understandable. "This page wants to
          | read pixels from a canvas" isn't. If you go the permission
          | route, the options are to either a) teach users that they need
          | to click through nonsensical permission dialogs, with all the
          | obvious downsides; or b) make the notifications so scary or the
          | permissions so inaccessible that the features might as well not
          | exist. And the latter would be bad! Because the legit use cases
          | for e.g. reading from a canvas _do_ exist; they're just pretty
          | rare.
         | 
         | The Privacy Sandbox approach to this is to track and limit how
         | much entropy a site is extracting via these kinds of side
         | channels. So if you legit need to read canvas pixels, you'll
         | have to give up on other features that could leak
          | fingerprinting data. (I personally don't really believe that
          | approach will work, but it is at least principled. What I'd
         | like to see instead is limiting the use of these APIs to
         | situations where the site has a stable identifier for the user
         | anyway. But that requires getting away from implementing auth
         | with cookies as opaque blobs of data with unknown semantics,
         | and moving to some kind of proper session support where the
         | browsers understands the semantics of signed-in session, and
         | it's made clear to users when they're signing in somewhere and
         | where they're signed in right now. And then you can make a lot
         | better tradeoffs with limiting the fingerprinting surface in
         | the non-signed in cases.)
        
           | psychphysic wrote:
            | Do you mean more websites use WebRTC for legitimate purposes
            | than for fingerprinting? Or that more instances of it being
            | activated are legitimate, or more traffic is legitimate
            | (probably true given the bandwidth needed for audio/video)?
            | 
            | But I suspect that by the other two metrics it's correct to
            | say most uses are to fingerprint.
        
           | trifurcate wrote:
           | > "This page wants to send you notifications" or "this page
           | wants to use the microphone" is understandable. "This page
           | wants to read pixels from a canvas" isn't.
           | 
            | Yes, it is. Tor Browser already does this:
            | https://www.bleepstatic.com/content/posts/2017/10/30/CanvasF...
           | 
           | That specific wording may be a touch too verbose for the
           | average end user, but it's not impossible nor is it strange.
           | Just include a note about how this is 99% likely a
           | fingerprinting measure; option b) isn't so bad in this case.
           | Of course, due to the nature of how fingerprinting works, the
           | absolute breadth of features that would be gated behind
           | something like this would be offputting.
           | 
           | I am also wary of what you suggested with gating this kind of
           | fingerprinting to when the website has positively identified
           | the user anyway; in a way, this seems to me even more
           | valuable than fingerprint data without an associated "strong"
           | identity.
        
             | ballenf wrote:
             | Giving users the permissions would simply be a training
             | exercise in "I have to say 'yes' or TikTok breaks". Like
             | how Android worked a few years ago with the other
             | permissions.
        
               | [deleted]
        
               | trifurcate wrote:
               | Android largely works now with these permission prompts,
               | though. TikTok asks you for a million permissions too,
               | and many average end users decline. Many people also opt
               | out of tracking on Facebook et al. when iOS prompts them
               | about it.
        
               | monkpit wrote:
               | > and many average end users decline
               | 
               | [citation needed]
        
               | scarface74 wrote:
               | Really? How much more of a citation do you need than
               | Facebook admitted during their quarterly financials the
               | effect that iOS users opting out had?
               | 
               | https://www.cnbc.com/2022/02/02/facebook-says-apple-ios-
               | priv...
        
               | saagarjha wrote:
               | If you don't present the tracking prompt exactly how
               | Apple wants you to they boot you from the store. The same
               | is not true for a website.
        
           | 0xy wrote:
            | Of course its main use is fingerprinting. Do you think
            | WebRTC is instantiated for genuine reasons the majority of
            | the time? That's the real absurdity.
           | 
           | WebRTC is instantiated most often by ad networks and anti-
           | fraud services.
           | 
           | Same thing with Chrome's fundamentally insecure AudioContext
           | tracking scheme (yes, it's a tracking scheme), which is used
           | by trackers 99% of the time. It provides audio latency
           | information which is highly unique (why?).
           | 
           | Given Chrome's stated mission of secure APIs and their
           | actions of implementing leaky APIs with zeal, I have reason
           | enough to question their motives.
           | 
           | After all, AudioContext is abused heavily on Google's ad
           | networks. Google knows this.
        
             | Datagenerator wrote:
              | One alternative, Librewolf, needs some more promotion; it
              | has safer security defaults.
        
             | arein3 wrote:
              | Wow, that's really shitty on Google's part.
        
         | ghayes wrote:
         | Take a look at Firefox's Fingerprinting Prevention feature.
         | This includes a permission for canvas, as well as:
         | 
         | - Your timezone is reported to be UTC
         | 
         | - Not all fonts installed on your computer are available to
         | webpages
         | 
         | - The browser window prefers to be set to a specific size
         | 
         | - Your browser reports a specific, common version number and
         | operating system
         | 
         | - Your keyboard layout and language is disguised
         | 
         | - Your webcam and microphone capabilities are disguised
         | 
         | - The Media Statistics Web API reports misleading information
         | 
         | - Any Site-Specific Zoom settings are not applied
         | 
         | - The WebSpeech, Gamepad, Sensors, and Performance Web APIs are
         | disabled
         | 
         | https://support.mozilla.org/en-US/kb/firefox-protection-agai...
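          | 
          | For reference, this whole bundle is gated behind a single
          | about:config preference (this is the real pref name; it can
          | also be set from a user.js file):

```js
// about:config / user.js toggle for Firefox's fingerprinting protection
user_pref("privacy.resistFingerprinting", true);
```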
        
       | TobyTheDog123 wrote:
        | TikTok changes this algorithm about once every three months.
        | I've reverse-engineered it a couple of times, and have since
        | given up and decided to run a headless browser to do it for me.
        | I'd love to see some tool developed to automate solving this so
        | I can sign requests in a more limited context (a la Cloudflare
        | Workers / C@E).
        
         | nullpt_rs wrote:
          | Author of the post here. If you have an older version of the
          | script you're able to post or send over, I'd love to take a
          | look at it, see what changes they made, and potentially
          | automate the extraction.
        
           | TobyTheDog123 wrote:
           | Hey I'd love to:
           | 
           | 1.0.0.200: https://hastebin.com/tudivadufa.apache Unknown
           | version: https://hastebin.com/jasuxineti.js
           | 
           | Some of these might have some console.logs (or curse words),
           | but as a whole should be representative
        
         | moneywoes wrote:
         | Are you able to scrape with a headless browser?
        
           | TobyTheDog123 wrote:
           | Yeah, I can get basic user information pretty reliably just
           | from the initial page load.
           | 
           | I had a secondary use case of allowing users to sign-in in
           | order to import the (verified/creator) users they follow, but
           | quickly realized Apple wouldn't allow that data to be used
           | (after the whole OG app ordeal), so I never had a real reason
           | to follow up and crack it again.
        
       | draw_down wrote:
       | > void 0 (a fancy obfuscated way of saying undefined)
       | 
       | Kind of. But it was possible at one point, maybe still is, to
       | rebind `undefined` to some other value, causing trouble. `void`
       | is an operator, a language keyword; it's guaranteed to give you
       | the true undefined value. (In other words, the value whose type
       | is `undefined`.)
       | 
       | If you're coding against an environment as adversarial as these
       | people clearly believe they are, you'd go with `void` as well.
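        | 
        | A quick demonstration of the shadowing hazard (a sloppy-mode
        | sketch; `undefined` here is a local binding, not the global):

```javascript
// `undefined` is an ordinary identifier, so it can be shadowed;
// `void 0` always evaluates to the one true undefined value.
function demo() {
  var undefined = 42; // legal: shadows the global `undefined`
  return {
    shadowed: undefined === 42, // true: we're seeing the local binding
    real: void 0 === undefined  // false: void 0 is actual undefined
  };
}
var r = demo();
```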
        
         | kerneloops wrote:
         | Another reason to use `void 0` is that "void 0" takes only 6
         | characters while "undefined" takes 9, saving some bandwidth. It
         | is common practice for JavaScript minifiers to use this
         | substitution.
        
           | marginalia_nu wrote:
           | Given it will be gzip-compressed in transport, does this
           | really save a meaningful amount of bandwidth?
        
             | draw_down wrote:
             | It's really more that there is no reason not to do it. Void
             | is marginally safer as well as shorter, so any
             | minifier/transpile step etc will make this substitution.
        
       | born-jre wrote:
        | Something hit me when reading this. You know how zkSNARKs are
        | touted as tech that will one day let apps work on users'
        | private data while preserving their privacy? Could they be used
        | the opposite way, as an obfuscation technique: encrypt the
        | user's data inside a zk oracle on the client side and send it
        | to the server. You could reverse engineer what the inputs to
        | the oracle are, but not what exactly it sends to the server?
        
         | renonce wrote:
         | zkSNARK allows you to make a proof for a statement that some
         | boolean expression is satisfiable, without leaking any
         | information about how the expression can be satisfied. That
          | helps _prove_ something but not work on any data. The technique
          | you described sounds more like homomorphic encryption, which
          | currently is many orders of magnitude slower than native
          | hardware and lacks practical use.
        
           | born-jre wrote:
           | What about sth like this https://github.com/zkonduit/ezkl ?
        
       | thih9 wrote:
       | I've seen some of these techniques elsewhere; e.g. javascript-
       | obfuscator supports replacing variable names with hex values [1]
       | or transforming call structure into something more complex [2].
       | Bytecode generation is new to me; is there an existing JS
       | obfuscation tool, preferably open source, that supports it?
       | 
       | [1]: https://github.com/javascript-obfuscator/javascript-
       | obfuscat...
       | 
       | [2]: https://github.com/javascript-obfuscator/javascript-
       | obfuscat...
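        | 
        | To make [1] and [2] concrete, here's a made-up before/after in
        | the same spirit (the hex names and lookup key are illustrative,
        | not actual javascript-obfuscator output):

```javascript
// Original:
//   function add(a, b) { return a + b; }
//   var result = add(2, 3);

// After hex renaming plus routing the call through an object lookup,
// the intent is no longer obvious at a glance:
var _0x51c2 = {
  "qzXw": function (_0x12aa, _0x34bb) { return _0x12aa + _0x34bb; }
};
var result = _0x51c2["qzXw"](2, 3); // still computes 5
```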
        
         | hoosieree wrote:
         | It's only for C, but Tigress[1] supports a _ton_ of obfuscation
         | types. Virtualization and JIT are very effective, especially
         | when used together with control flow transforms like Split and
         | Flatten.
         | 
         | Renaming variables or encoding them is fairly trivial to
         | reverse.
         | 
         | [1] https://tigress.wtf/transformations.html
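          | 
          | Flattening is easy to sketch in JS even though Tigress targets
          | C: straight-line code becomes a state machine driven by a
          | dispatcher loop (the example and state numbering are mine):

```javascript
// Sums 1..n, but with the control flow "flattened" into a dispatcher:
// each basic block is a switch case and `state` decides what runs next.
function flattenedSum(n) {
  var state = 0, i, total;
  while (state !== -1) {
    switch (state) {
      case 0: total = 0; i = 1; state = 1; break; // init block
      case 1: state = (i <= n) ? 2 : 3; break;    // loop condition
      case 2: total += i; i++; state = 1; break;  // loop body
      case 3: state = -1; break;                  // exit
    }
  }
  return total;
}
```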
        
         | xchkr1337 wrote:
          | Compiling JS to bytecode is not that uncommon; there are a few
          | anti-bot services that rely on it for obfuscation (like
          | reCAPTCHA or F5/Shape Security), but so far I haven't seen any
          | open source projects for obfuscating this way.
        
         | czx4f4bd wrote:
         | Based on my previous research into this, the magic keywords to
         | find this kind of thing on Google are "virtualization
         | obfuscation" or "VM obfuscation".
         | 
         | rusty-jsyc is the main open source implementation I've found,
         | though it hasn't been touched in a few years:
         | https://jwillbold.com/posts/obfuscation/2019-06-16-the-secre...
         | (GitHub: https://github.com/jwillbold/rusty-jsyc)
         | 
         | I think there are other implementations, but they're
         | proprietary so I didn't look into them very much. There are
         | lots of posts out there about reversing virtualization
         | obfuscation, but not many about implementing it. Seems like
         | most people who put the effort into implementing it tend to
         | prefer selling it commercially (which I suppose makes sense).
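          | 
          | The core idea is small enough to show in a toy: compile the
          | logic to a custom bytecode and ship an interpreter, so a
          | reverser must first reconstruct the VM. The opcodes here are
          | invented for illustration:

```javascript
// A tiny stack-based VM. Real virtualization obfuscators use dozens of
// opcodes, encrypted bytecode, and randomized encodings per build.
var OP_PUSH = 0, OP_ADD = 1, OP_MUL = 2, OP_END = 3;

function run(bytecode) {
  var stack = [], pc = 0;
  while (pc < bytecode.length) {
    switch (bytecode[pc++]) {
      case OP_PUSH: stack.push(bytecode[pc++]); break;
      case OP_ADD:  stack.push(stack.pop() + stack.pop()); break;
      case OP_MUL:  stack.push(stack.pop() * stack.pop()); break;
      case OP_END:  return stack.pop();
    }
  }
}

// (2 + 3) * 4, visible only as opaque numbers in the shipped script:
var secret = run([OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_END]);
```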
        
         | 0x008 wrote:
          | If I recall correctly: Electron can run JavaScript compiled
          | with "Bytenode", which produces a form of bytecode intended
          | to be run by the V8 engine.
        
       | frozencell wrote:
       | The hunt begins.
        
       | noduerme wrote:
       | This is really awesome work.
       | 
       | I spent a lot of time in the early 2000s coming up with nasty
       | obfuscation techniques to protect certain IP that inherently
       | needed to be run client-side in casino games. Up to and including
       | inserting bytecode that was custom crafted to intentionally crash
       | off-the-shelf decompilers that had to run the code to disassemble
       | it (and forcing them to phone home in the process where
       | possible!)
       | 
       | My view on obfuscation is that since it's never a valid security
       | practice, it's only admissible for hiding machinery from the
       | general public. For instance, if you have IP you want to protect
       | from average script kiddies. Any serious IP can be replicated by
       | someone with deep pockets anyway. Most other uses of code
       | obfuscation are nefarious, and obfuscated code should always be
       | assumed to be malicious until proven otherwise. I'm not a
       | reputable large company, but no reputable large company should be
       | going to these lengths to hide their process from the user,
       | because doing so serves no valid security purpose.
        
         | bobleeswagger wrote:
         | > since it's never a valid security practice
         | 
         | Why not? It's just another tool in the security game.
         | 
          | I _want_ to be with you on thinking that all obfuscation is
          | malicious. I know that individuals have every right to
          | obfuscation and privacy as a matter of the 1st and 4th
          | amendments in the US, but I'm not sure I can always say that
          | obfuscation by a corporation is evil without a more compelling
          | argument. I'm as anti-establishment as they come, too.
        
           | ViViDboarder wrote:
            | I think the reason is that they don't trust their users, or
            | don't want them to know what they are doing on the user's
            | machine. To me, that is already a malicious premise, even if
            | they aren't trying to exfiltrate my data or anything.
        
             | bobleeswagger wrote:
             | I guess the acceptable form of obfuscation would mean only
             | IP is protected by it, not everything. I wonder what it
             | would take to enforce this as the norm, certainly doesn't
             | sound easy.
        
           | mtnygard wrote:
           | I read the GP a bit differently... I didn't read it as saying
           | obfuscation is evil, just that it is ineffective. More like
           | "obfuscation can't prevent reversing, therefore it's not a
           | valid security practice since all it does is slow down the
           | casual observer but does not stop the determined adversary."
           | The statement that most use of obfuscation is nefarious is a
           | corollary... since obfuscation doesn't protect IP it is
           | mostly used to hide malicious activity.
        
         | dbrueck wrote:
         | Agreed - obfuscation is useful for keeping honest people
         | honest. If someone is sufficiently motivated, they will
         | circumvent it, but for the vast majority of people it's just
         | not worth the effort so they'll move to something else.
         | 
         | For example, in our application we have some optionally
         | downloadable content that includes some code for an interpreted
         | language. That code lives on disk in an obfuscated form because
         | we are not yet ready to make the API public (it's on our
         | "someday" roadmap), we don't want to clean up the code for
         | public viewing, and above all because there are different
         | licensing requirements around each content pack.
         | 
         | We looked at various "real" security options and they all have
         | holes, and they all add a ton of complexity. We then also
         | looked at the likely intersection between "people who would pay
         | for this" and "people who could crack this", and there's not
          | much there. In the end, obfuscation is cheap (especially in
          | terms of implementation and maintenance) and steers our real
          | customers away from violating the license, and we don't waste
          | resources on dishonest people.
         | 
         | If I'm being charitable, the obfuscation in the article has an
         | out of whack cost/benefit ratio. If I'm being cynical, the
         | obfuscation they are doing strays well into the realm of
         | nefarious. :)
        
           | thrashh wrote:
           | People knock on obfuscation but everything in life is based
           | on trust. Locks being breakable, the fruit stand in front of
           | a shop being unprotected, fences being scalable. Everything
           | is a cost/benefit
        
         | jstanley wrote:
          | Wait, why is a casino protecting its so-called "intellectual
          | property" legitimate and above-board, but TikTok doing the
          | same is not?
        
           | margalabargala wrote:
           | I don't think OP was defending their own earlier work or
           | otherwise exempting it from their assertion that all
           | obfuscated code should be considered malicious.
        
             | rnd0 wrote:
             | That's how I read it too. I had the feeling that the
             | experience convinced the OP that it's not valid except in
             | some circumstances.
        
             | jstanley wrote:
             | Having reread it, I think you might be right.
             | 
             | > it's only admissible for hiding machinery from the
             | general public.
             | 
             | I had originally read this to imply that somehow it's OK
             | for a casino to hide its machinery from the general public,
             | but it's not OK for TikTok to hide its machinery from the
             | general public, but maybe "machinery" here is intended much
             | more narrowly, and OP thinks it applies neither to casinos
             | nor TikTok.
        
               | compsciphd wrote:
               | I read it as the only "legitimate" point is to hide it
               | from the general public. As people with more resources
               | will be able to figure it out. Whether you view that as
               | legitimate is up to each person to decide: does hiding
               | it from the general public have real value or not? In
               | general the answer might be no.
        
           | neodymiumphish wrote:
           | I think the distinction in what's obfuscated is important.
           | Casino apps are trying to hide the code that detects
           | cheating, handles number generation, etc., while TikTok is
           | trying to hide its data collection. Obfuscation itself
           | isn't necessarily bad.
        
             | im3w1l wrote:
             | > Number generation
             | 
             | Number generation is extremely important and it's also
             | regulated. You don't put such a thing in the client,
             | obfuscated or not.
        
           | kevin_thibedeau wrote:
           | Because they're doing it on hardware that they control.
        
         | maria2 wrote:
         | White box crypto is kind of like obfuscation, but tries to make
         | it impossible to extract the information.
        
           | krackers wrote:
           | There's also indistinguishability obfuscation which I recall
           | recently had a breakthrough in terms of practical
           | construction
        
           | awestroke wrote:
           | No, encryption is very different from obfuscation, even if
           | the former is often used in the latter
        
             | xurukefi wrote:
             | You missed the point. maria2 is talking about whitebox
             | crypto. The "whitebox" part means that the decryption
             | process happens on your machine, including the secrets, which
             | are present in some obfuscated scrambled form in memory.
             | Getting the secret key is a matter of debugging and
             | understanding the obfuscation scheme. A prime example of
             | this is DRM like Widevine (L3) in the chrome browser.
        
               | bitexploder wrote:
               | I am really failing to understand the distinction here.
               | Encryption with say, AES has very different properties
               | and use cases compared to an obfuscation scheme. You can
               | use encryption as a part of an obfuscation scheme, but
               | obfuscation is a shell game, all the way down. Crypto is
               | not, mathematically. They are categorically different
               | things, right?
        
               | grog454 wrote:
               | The math is irrelevant when the key is known to all
               | parties.
        
               | bitexploder wrote:
               | Sure. That is DRM and obfuscation in a nutshell. How
               | annoying can I make this for you to reverse engineer?
        
               | toast0 wrote:
               | Obfuscation with encryption can be done with good
               | ciphers, like AES, but the key is still shipped with the
               | code, so it's still just cat and mouse.
               | 
               | It's a little different if the key is hardware specific,
               | so each binary only runs on one system and it's hard to
               | extract the keys, but that's not a typical setup. Usually
               | it's this code needs to run on the general public's
               | computers or phones, and that's too general a target to
               | rely on hardware crypto.
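
The cat-and-mouse dynamic described above can be sketched in a few
lines. This is a hypothetical toy (repeating-key XOR standing in for
a real cipher such as AES); the only point it demonstrates is that a
key shipped alongside the code undoes the whole scheme:

```python
# Toy sketch: an "encrypted" payload shipped together with its key.
# XOR stands in for a real cipher like AES; the weakness is the same
# either way, because the key travels with the code.
KEY = b"s3cret"  # embedded in the shipped binary

def xor(data: bytes, key: bytes) -> bytes:
    """Apply a repeating-key XOR (the same call encrypts and decrypts)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

payload = xor(b"console.log('hi')", KEY)  # what the server would ship

# Any recipient -- legitimate client or reverse engineer -- can
# recover the plaintext, since the key is right there:
print(xor(payload, KEY).decode())  # prints: console.log('hi')
```

Swapping in a stronger cipher changes nothing about this failure mode;
only moving the key off the client (e.g. into hardware) would.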
        
               | matmann2001 wrote:
               | The key word was "whitebox". They aren't
               | speaking generally about cryptography.
        
           | [deleted]
        
       | Alifatisk wrote:
       | I never knew that Tiktok was shipped with its own virtual
       | machine!
       | 
       | But that explains the obvious subdomain vm.tiktok.com
        
         | llacb47 wrote:
         | Don't think that's what vm means there. The "m" likely
         | stands for "maliva", which is TikTok's overseas (US/Europe)
         | CDN.
        
       | wiml wrote:
       | Given that the beginning of the "weird string" has a magic number
       | and a version field, I wonder if the point of this is not so much
       | obfuscation as transpilation? The magic number corresponds to
       | ASCII "HNOJ" "@?RC", or perhaps "JONH" "CR?@", which doesn't turn
       | anything up on Google but it seems odd to include that redundant
       | header if your main goal is minification or obfuscation.
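
The two ASCII readings in the comment above are just the same four
bytes in opposite byte orders. The byte values below are a
hypothetical reconstruction chosen to match those guesses, not the
actual header:

```python
# Hypothetical first four header bytes, chosen so they decode to the
# ASCII guesses quoted in the comment above.
magic = bytes([0x48, 0x4E, 0x4F, 0x4A])

print(magic.decode("ascii"))        # read in order: HNOJ
print(magic[::-1].decode("ascii"))  # read reversed (other endianness): JONH
```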
        
       | derefr wrote:
       | That HTTP request is kind of hideous. All those extra parameters
       | that have nothing to do with what the response will end up being,
       | and which change often. Seems like a great way to toss out all
       | your API-response edge-cache-ability.
        
         | kevincox wrote:
         | With HTTPS you need to own the edge cache yourself, and most
         | caches have options to ignore whichever headers and URL
         | parameters you choose. That way they can log the tracking
         | data and serve the cached response as if those parameters
         | were never there.
        
       | Exuma wrote:
       | This article is 2 hours old and his Twitter is already changed?
        
         | deepzn wrote:
         | Looks like they are more active on Mastodon.
        
         | mdaniel wrote:
         | Someone reported that he just had a typo in the twitter handle,
         | IIRC an extra "r" at the end; FWIW, navigating up one level
         | also has a link to the twitter handle and works just fine:
         | https://twitter.com/nullpt_rs
        
       | KirillPanov wrote:
       | Awesome, really awesome work. However:
       | 
       | > If that is something you are interested in, keep an eye out for
       | the second part of this series :)
       | 
       | Your site is missing an RSS/Atom feed, so I can't do that. ::sad
       | face::
        
         | CallMeMarc wrote:
         | We're sharing the same fate apparently! Just added a PR to
         | their repository to add some feeds, hope it gets merged soon.
         | 
         | https://github.com/nullpt-rs/blog/pull/1
        
       | Kukumber wrote:
       | Nice use of low altitude satellites to track individuals and
       | sniff telecoms all over the world
       | 
       | This decompiled object class also spies on the grid network;
       | that's quite interesting and very clever
       | 
       | I never knew we could also lobby governments to push for some
       | office and cloud software full of spyware; even France had to
       | ban them! [1]
       | 
       | This TikTok app is very dangerous!
       | 
       | Of course /s
       | 
       | [1] - https://news.ycombinator.com/item?id=33686599
        
         | lazyeye wrote:
         | Yes it is.
         | 
         | Why does China block all foreign social media apps within
         | its own borders?
        
       | derefr wrote:
       | FYI, most CAPTCHA and anti-DDoS services (e.g. Cloudflare) do
       | something very similar, sending the user an obfuscated program
       | implemented on top of an obfuscated JS VM, that they effectively
       | have to execute as-is, in a real browser, to get back the correct
       | results the gateway is looking for. This is done to prevent
       | simple scraping scripts (the Scrapy type) from being used to
       | scrape the site. If you want to scrape, you have to pay the
       | extra overhead of driving a real browser. (And not even a
       | headless one; they have tricks to detect that, too.)
        
       | amelius wrote:
       | Can someone explain what VM they are talking about, where that
       | VM runs, and what is running inside it?
        
         | dbrueck wrote:
         | It's a custom VM running inside their app, though calling it a
         | VM might be a bit of a stretch because it doesn't appear to be
         | a general-purpose computing mechanism but more of a
         | higher-level command processor.
         | 
         | It sounds like the forthcoming part 2 article will go into more
         | depth.
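
A "command processor" of the kind described can be illustrated with a
minimal bytecode interpreter: a dispatch loop walking a list of
encoded commands. This is a generic hypothetical sketch; none of
these opcodes are TikTok's, and the real VM's encoding is far more
involved:

```python
# Minimal stack-based bytecode interpreter: each instruction is an
# (opcode, argument) pair dispatched by a simple loop.
def run(program, env):
    stack = []
    for op, arg in program:
        if op == "PUSH":     # push a literal value
            stack.append(arg)
        elif op == "LOAD":   # push a variable from the environment
            stack.append(env[arg])
        elif op == "ADD":    # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "STORE":  # pop a value into the environment
            env[arg] = stack.pop()
    return env

# Equivalent to the source statement: y = x + 1
env = run([("LOAD", "x"), ("PUSH", 1), ("ADD", None), ("STORE", "y")],
          {"x": 41})
print(env["y"])  # prints: 42
```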
        
       ___________________________________________________________________
       (page generated 2022-12-24 23:01 UTC)