[HN Gopher] All the data can be yours: reverse engineering APIs
       ___________________________________________________________________
        
       All the data can be yours: reverse engineering APIs
        
       Author : noleary
       Score  : 168 points
       Date   : 2024-11-06 07:43 UTC (5 days ago)
        
 (HTM) web link (jero.zone)
 (TXT) w3m dump (jero.zone)
        
       | Eikon wrote:
       | This approach is generally unwelcome to website owners (it's
       | worth noting that automated API clients are distinct from
       | regular user agents). As a "reverse engineer", you have no idea
       | how expensive it is for an endpoint to process a request.
       | 
       | Instead, I'd recommend reaching out to the website owners
       | directly to discuss your API needs - they're often interested in
       | hearing about potential integrations and use cases.
       | 
       | If you don't receive a response, proceeding with unauthorized API
       | usage is mostly abusive and poor internet citizenship.
        
         | lolinder wrote:
         | I think it can be done responsibly. If you're not imposing any
         | more traffic on them than you would be visiting their site
         | manually, then using their APIs to get at structured data is
         | actually win-win compared to the alternative (load the whole
         | thing and scrape).
         | 
         | Where I'll agree with you is cases where people do this and
         | impose way more traffic than is typical and often more than is
         | necessary (i.e. no caching). But that's not really specific to
         | reverse engineering APIs, that's just about being a good
         | internet citizen in general.
         | 
         | I'm of the opinion that a user agent is a user agent, and
         | website owners shouldn't pick and choose what user agents they
         | support. Target the behaviors that affect your infrastructure,
         | not the means of access and algorithms used to process what you
         | send.
        
           | Eikon wrote:
           | > If you're not imposing any more traffic on them than you
           | would be visiting their site manually, then using their APIs
           | to get at structured data is actually win-win compared to the
           | alternative (load the whole thing and scrape).
           | 
           | I agree but that's not usually how it goes. From what I've
           | seen, it's mostly very poorly written scripts with no rate
           | limiting and no backoff strategy that will be hitting your
           | API servers.
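           | 
           | A minimal sketch of the backoff such scripts lack, assuming a
           | hypothetical fetch() callable that returns (status, body).
           | get_with_backoff and its parameters are illustrative names,
           | not any library's API:

```python
import random
import time


def get_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until the server stops signalling overload.

    fetch is a hypothetical zero-argument callable returning (status, body).
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        # Stop on any non-throttling response, or when attempts run out.
        if status not in (429, 503) or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff with jitter: roughly 1s, 2s, 4s, ...
        delay = base_delay * 2 ** attempt
        sleep(delay + random.uniform(0, delay / 2))
```

           | Passing sleep in as a parameter keeps the retry logic
           | testable without actually waiting.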
        
             | lolinder wrote:
             | Would you rather have those poorly-written scripts hitting
             | your APIs or have a poorly-written puppeteer script loading
             | every asset you have--hitting the APIs along with
             | _everything else_?
             | 
             | Casting shade on API reverse engineering when what you
             | actually have is a failure to rate limit is throwing the
             | baby out with the bathwater. Abusive users will abuse until
             | you build in a technological method to stop them, and user-
             | agent sniffing provably doesn't work to stop bad actors.
             | 
             | The concept of a flexible, customizable User Agent that
             | operates on my behalf is a key idea that's foundational to
             | the web, and I'm not willing to cede that cultural ground
             | in the vague hope that we can make the bad guys feel bad
             | and start just using Chrome like civilized people.
        
         | MathMonkeyMan wrote:
         | Nah.
         | 
         | Bots looking for exploits are rude, and spamming an endpoint
         | with more traffic than normal is rude, but a human trying to
         | figure out the API that you exposed to the internet? That's
         | just fair play.
         | 
         | Also, better to ask for forgiveness than to ask for permission.
         | The author is adding value to the world while hurting nobody,
         | and the answer would likely be an automatic "no" anyway.
        
         | boolemancer wrote:
         | In my personal view, this seems a little overbearing.
         | 
         | If you expose an API, and you want to tell a user that they are
         | "unauthorized" to use it, it should return a 401 status code so
         | that the caller knows they're unauthorized.
         | 
         | If you can't do that because their traffic looks like normal
         | usage of the API by your web app, then I question why their
         | usage is problematic for you.
         | 
         | At the end of the day, you don't get to control what 'browser'
         | the user uses to interact with your service. Sure, it might be
         | Chrome, but it just as easily might be Firefox, or Lynx, or
         | something the user built from scratch, or someone manually
         | typing out HTTP requests in netcat, or, in this case, someone
         | building a custom client for your specific service.
         | 
         | If you host a web server, it's on you to remember that and
         | design accordingly, not on the user to limit how they use your
         | service.
        
           | AlienRobot wrote:
           | That's like saying if someone accepts cash that means you
           | should be allowed to pay a $100 bill with a thousand dimes.
           | 
           | Just because you're right doesn't mean you aren't wrong.
        
             | lolinder wrote:
             | The $100 tab paid in dimes causes severe inconvenience to
             | the person trying to count them and to the person who has
             | to take them to the bank to cash them in and wait for them
             | to be counted again.
             | 
             | Their very reasonable question was: if you can't
             | distinguish the reverse engineered traffic from the traffic
             | through your own app in order to block it, then what harm
             | is the traffic doing? Presumably it's flying under your
             | rate limits, and the traffic has a valid session token from
             | a real customer. If you're unable to single it out and
             | return a 4xx, why does it matter where it's coming from?
             | 
             | I can think of a few reasons it might, but I'm not
             | particularly sympathetic to them. They generally boil down
             | to "I won't be able to use my app to manipulate the user
             | into taking actions they'd otherwise not take."
             | 
             | I'd be interested to hear if there are better reasons.
        
               | AlienRobot wrote:
               | "if you can't distinguish the reverse engineered traffic
               | from the traffic through your own app in order to block
               | it, then what harm is the traffic doing?"
               | 
               | If you really believe this you'll use a custom user agent
               | instead of spoofing Chrome. :-)
               | 
               | Some websites use HTTP referer to block traffic. Ask
               | yourself if any reverse engineer would be stopped by what
               | is obviously the website telling you not to access an
               | endpoint.
               | 
               | I'll add that end users don't have complete information
               | about the website. They can't know how many resources a
               | website has to devote to dealing with reverse engineering
               | (webmasters can't just play cat and mouse with you just
               | because you're wasting their money), nor do they know the
               | cost of an endpoint. I mean, most tech-inclined people use
               | ad blockers when it's obvious 90% of websites pay the cost
               | of their endpoints by showing ads, so I doubt they would
               | respect anything more subtle than that.
        
               | boolemancer wrote:
               | If an endpoint costs a lot to run, implement rate limits
               | and return 429 status codes so callers know that they're
               | calling too often.
               | 
               | That endpoint will be expensive regardless of whether
               | it's your own app or a third party that's calling it too
               | often, so design it with that in mind.
               | 
               | Your app isn't special, it's just another client. Treat
               | it that way.
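               | 
               | A rough sketch of that kind of limit, using a token
               | bucket (one common technique, chosen here as an
               | assumption). TokenBucket and status_for are made-up
               | names, not part of any framework:

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def status_for(bucket):
    """Status a handler might return: 200 if allowed, 429 if throttled."""
    return 200 if bucket.allow() else 429
```

               | One bucket per session token or IP applies the same
               | limit to the first-party app and to any other client.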
        
               | AlienRobot wrote:
               | The only reason why "another client" can exist is due to
               | limitations of the Internet itself.
               | 
               | If you could ensure that the web server can only be
               | accessed by your client, you would do that, but there is
               | no way to do this that can't be reverse-engineered.
               | 
               | Essentially your argument is that just because a door is
               | open you're allowed to enter, and I don't believe that
               | makes any sense.
        
               | TeMPOraL wrote:
               | > _If you really believe this you'll use a custom user
               | agent instead of spoofing Chrome. :-)_
               | 
               | Read up on the history of the User-Agent string, and why
               | everyone claims they're Mozilla and "like Gecko". Yes,
               | it's because of all the silly people who, since the
               | earliest days of the WWW, tried to change what they serve
               | based on the contents of the User-Agent header.
        
             | Bjartr wrote:
             | Not the greatest example. If someone has incurred a $100
             | debt to you, then, from a legal perspective, you must
             | consider delivery of a thousand dimes as having paid the
             | debt. You don't get a choice on that without prior
             | contractual agreement.
             | 
             | https://uscode.house.gov/view.xhtml?req=granuleid:USC-
             | prelim...
             | 
             | (In the United States at least)
        
               | lolinder wrote:
               | This is not an accurate reading of the code. Snopes
               | quotes an FAQ on the US Treasury site (now missing, but
               | presumably still correct) [0]:
               | 
               | > Q: I thought that United States currency was legal
               | tender for all debts. Some businesses or governmental
               | agencies say that they will only accept checks, money
               | orders or credit cards as payment, and others will only
               | accept currency notes in denominations of $20 or smaller.
               | Isn't this illegal?
               | 
               | > A: The pertinent portion of law that applies to your
               | question is the Coinage Act of 1965, specifically Section
               | 31 U.S.C. 5103, entitled "Legal tender," which states:
               | "United States coins and currency (including Federal
               | reserve notes and circulating notes of Federal reserve
               | banks and national banks) are legal tender for all debts,
               | public charges, taxes, and dues."
               | 
               | > This statute means that all United States money as
               | identified above are a valid and legal offer of payment
               | for debts when tendered to a creditor. There is, however,
               | no Federal statute mandating that a private business, a
               | person or an organization must accept currency or coins
               | as for payment for goods and/or services. Private
               | businesses are free to develop their own policies on
               | whether or not to accept cash unless there is a State law
               | which says otherwise. For example, a bus line may
               | prohibit payment of fares in pennies or dollar bills. In
               | addition, movie theaters, convenience stores and gas
               | stations may refuse to accept large denomination currency
               | (usually notes above $20) as a matter of policy.
               | 
               | [0] https://www.snopes.com/fact-check/legal-tender-
               | payment/
        
         | gaeb69 wrote:
         | I agree. When scraping my school's portal which uses Canvas, my
         | school actually allows it.
         | 
         | Sometimes you can get the green light just by reading API
         | docs/School's privacy policy as they're usually obliged to have
         | one (ofc this primarily applies to school APIs like in OP's
         | article)
        
         | mindslight wrote:
         | A company can always document their API if they want power
         | users to be informed about the company's preferred ways of
         | doing things. They generally don't because they want to create
         | a bunch of market friction so you're forced into using their
         | proprietary client apps to interact with their services. I for
         | one applaud efforts like the original post, and think it's an
         | instance of _outstanding_ Internet citizenship, where
         | citizenship means working to preserve Internet societal norms
         | like decentralization.
        
         | rmbyrro wrote:
         | This would be akin to asking a shop owner if I'm allowed to
         | pick a pamphlet from an endless stack of pamphlets placed on
         | the sidewalk. If they don't want the public to pick pamphlets,
         | don't put them where the public can reach 'em!
        
           | mtnGoat wrote:
           | Same could be said about things in your front yard?
           | 
           | I'm not sure taking liberties with things you don't own, is
           | always the best policy, nor is putting the entire
           | responsibility on the owner.
           | 
           | I don't think this is something you can boil down to a simple
           | black and white.
        
             | barnabee wrote:
             | Accessing a publicly available web service is not "taking
             | liberties with things you don't own".
             | 
             | If you put a server on the public internet and I send it a
             | message (assuming I'm not using ill-gotten credentials,
             | etc.), anything it responds with is your problem, not mine.
        
             | Chris2048 wrote:
             | > Same could be said about things in your front yard?
             | 
             | No. It _could_ be said, but wouldn't be true - objects in
             | a domestic front yard are nothing like pamphlets placed on
             | the sidewalk.
        
         | tomjen3 wrote:
         | This is a weird attitude. The Internet we used to know meant
         | that you could do things with it. Certainly you should not
         | reverse engineer so as to get access to others' accounts in
         | the first place, but that should be impossible anyway.
         | 
         | I am aware that lots of companies have ideas(TM) about how you
         | should be able to use their products(TM) and may even add these
         | to their Terms of Service, a document that has somehow become
         | the last refuge for the bureaucratic organisation desperate to
         | maintain control when forced to connect things to the great
         | unbureaucratic internet.
         | 
         | To that, I say: too bad. I never signed up for the new version
         | of the internet and I do not consider TOS to be anything but
         | noise. I used Pidgin back in the day and would again if it
         | worked.
         | 
         | This absurd idea that website owners should have any say about
         | what runs on your computer/device is nonsense.
        
         | hipadev23 wrote:
         | This is such a cute take.
        
         | barnabee wrote:
         | What's wanted (or not) by website operators is irrelevant.
         | 
         | Adversarial interoperability is a cornerstone of the value of
         | the internet and something we should fight hard to keep.
        
       | sccxy wrote:
       | > Mobile apps have no choice but to use HTTP APIs. You can easily
       | download a lot of iOS apps through the Mac App Store, then run
       | strings on their bundles to look for endpoints.
       | 
       | Are there any good tutorials for that? 'strings' is not the
       | greatest name for searching good information.
        
         | mooreds wrote:
         | I googled a bit and found this, which points to some other
         | tools.
         | 
         | https://www.corellium.com/blog/ios-mobile-reverse-engineerin...
         | 
         | (No affiliation.)
        
         | syntaxing wrote:
         | I was about to post the exact same thing; I don't see anything
         | API-related for iOS apps on Google.
         | 
         | Edit: After some Googling and trying it on my MacBook, there
         | is a native CLI tool called "strings". Supposedly it does the
         | following: strings is primarily used to find and display
         | printable character sequences in binary files, object files,
         | and executables. Which means the author is probably looking at
         | the app to see the hardcoded characters in the app binary(?)
         | and searching for the API endpoints.
        
           | emilamlom wrote:
           | Just for context, strings is super commonly used when
           | reverse-engineering anything. It's a great first step because
           | it's easy, fast, and gets some decent clues to help you get
           | your bearings in an unknown binary file.
        
         | out-of-ideas wrote:
         | gnu strings: (first google result from "gnu strings" search)
         | https://sourceware.org/binutils/docs/binutils/strings.html
        
         | JonChesterfield wrote:
         | strings is a unix program that shows you strings in a binary
         | file
        
         | instalabs wrote:
         | Not sure what "strings" is, but I always use Charles Proxy to
         | inspect traffic for any mobile app:
         | https://apps.apple.com/us/app/charles-proxy/id1134218562
        
         | jonhohle wrote:
         | `strings` is the Unix command line utility[0] of the same name.
         | 
         |     strings file
         | 
         | will tell you all of the ASCII strings in file.
         | 
         | 0 - https://www.unix.com/man-page/osx/1/strings/
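         | 
         | For intuition, a rough Python approximation of what the tool
         | does (not its actual implementation). ascii_strings is a
         | made-up helper, and the minimum length of 4 is an assumption
         | based on the tool's usual default:

```python
import re


def ascii_strings(data: bytes, min_len: int = 4):
    """Return runs of at least min_len printable ASCII bytes found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Usage on a file (the path is a placeholder):
#     with open("some_binary", "rb") as f:
#         for s in ascii_strings(f.read()):
#             print(s)
```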
        
           | poincaredisk wrote:
           | strings -el
           | 
           | For utf-16 encoded ascii strings (very useful when dealing
           | with windows executables).
        
         | spondyl wrote:
         | Just to clarify a bit more for those who are new to strings and
         | because the audience for the post may lean towards people
         | newer to reverse engineering:
         | 
         | While most of the time, you're dealing with variables and such
         | in programs, at some point you have to hardcode some
         | information such as URLs to query so something like
         | 
         |     import requests
         | 
         |     BASE_URL = "https://example.com"
         |     result = requests.get(BASE_URL + "/api/blah")
         | 
         | If we pretend this is in an Android app which is stored as an
         | apk file (a zip file basically), running strings would spit out
         | "https://example.com" and "/api/blah"
         | 
         | It'll also spit out anything that appears to be an ASCII
         | character so plenty of junk but it's often quite handy as a
         | starting point.
         | 
         | There are, of course, much more precise tools such as man in
         | the middle proxying, but with that you'll only capture traffic
         | for endpoints actually used by said app. The app may contain
         | other endpoints left unused, rarely triggered and so on.
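         | 
         | Since an apk is basically a zip, a hedged sketch of that
         | first pass could decompress each member and look for
         | URL-like byte runs. urls_in_archive and URL_RE are
         | illustrative names. It only catches absolute http(s) URLs,
         | so a bare path like "/api/blah" would still need a plain
         | strings pass:

```python
import re
import zipfile

# Absolute http(s) URLs made of printable, non-space ASCII bytes.
URL_RE = re.compile(rb"https?://[\x21-\x7e]{4,}")


def urls_in_archive(path):
    """Return the sorted URL-like ASCII strings found in any archive member."""
    found = set()
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            # read() decompresses the member, so compressed entries count too.
            for match in URL_RE.findall(archive.read(name)):
                found.add(match.decode("ascii"))
    return sorted(found)
```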
        
         | TeMPOraL wrote:
         | "strings" is a Unix CLI utility that automates the equivalent
         | of a tried and true practice on Windows: opening an executable
         | file in Notepad.exe and scrolling around until you find human-
         | readable text (usually near the end of the file).
        
       | treyd wrote:
       | I wonder how difficult it would be to combine many of these
       | techniques into some automated script that dumps a manifest of
       | the different types of undocumented APIs there are. LLMs have
       | also been shown to be pretty good at answering semantic questions
       | about large blobs of minified code, perhaps there could be some
       | success there?
        
         | cobertos wrote:
         | There are too many fiddly bits around which endpoints return
         | what data in which shape, and they require a custom solution
         | each time. An LLM would probably have a very easy shot at this
         | problem though.
         | 
         | It's much more straightforward if you can find a GraphQL,
         | Swagger, or OpenAPI spec to automate conversion I'd imagine.
        
       | matthewfcarlson wrote:
       | I went to the 75grand app listed in the article and saw a listing
       | for Cafe Mac and did a double take. Apple's employee cafe is
       | Caffè Macs, so I was quite confused for a second.
        
       | gaeb69 wrote:
       | Sick app btw. Funny this comes up because I'm working on the
       | exact same thing for my school. Note that if your school uses
       | Canvas, Canvas's API is well documented and has GraphQL endpoints.
        
       | Lucasoato wrote:
       | > 75grand's success was even met with jealousy from the college
       | 
       | That's a common story; at my university (UniPD, in Padova)
       | something even worse happened. They tried hard to shut down an
       | unofficial app (Uniweb) that was installed by most of the
       | students, in favor of the "official" one, which was completely
       | unusable (and was probably born out of a rigged contract). In
       | the end the best one won and became official, but that was
       | after a lot of struggle.
        
         | mtnGoat wrote:
         | I'm just not sure jealousy is the correct word for this though.
         | Most systems don't like these kinds of things for a number of
         | reasons.
        
           | TeMPOraL wrote:
           | In my (perhaps limited) experience, most of those reasons are
           | just different ways of spelling "our official solution is
           | shit, because we don't care and/or make money in some
           | underhanded way".
        
         | mock-possum wrote:
         | > They tried hard to shut down an unofficial app (Uniweb) that
         | was installed by most of the students in favor of the
         | "official" one, that was completely unusable
         | 
         | Sounds like what happened with the Apollo app and Reddit
        
       | z3c0 wrote:
       | > The error messages helpfully suggested fields I hadn't known
       | about by "correcting" my typos.
       | 
       | Glad to see this being called out. Sure, I get why it's
       | convenient. Misspelling a field by one character is a daily
       | occurrence ("activty" and "heirarchy" are my regulars). The catch
       | is that spellchecking queries and returning valid fields in the
       | error effectively reduces entropy by both character space and
       | message length, varying by the type of distance used in the
       | spellcheck.
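       | 
       | A small sketch of the leak, assuming a hypothetical schema and
       | a similarity-based suggester built on Python's difflib (real
       | APIs may use a different distance). A near-miss guess comes
       | back with the exact field name:

```python
import difflib

# Hypothetical schema; a caller would not normally know these names.
KNOWN_FIELDS = ["activity", "hierarchy", "internalNotes", "createdAt"]


def error_for(field):
    """Build the kind of 'helpful' error that suggests the closest valid field."""
    suggestions = difflib.get_close_matches(field, KNOWN_FIELDS, n=1, cutoff=0.6)
    if suggestions:
        return f'Unknown field "{field}". Did you mean "{suggestions[0]}"?'
    return f'Unknown field "{field}".'


print(error_for("activty"))
print(error_for("internalNote"))
```

       | An attacker only has to land within the cutoff of a real
       | name, not on it, which is the entropy reduction in question.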
        
       | colesantiago wrote:
       | Most of these techniques are extremely old and very outdated.
       | 
       | Teams that I've seen working on apps now implement much stronger
       | checks on APIs, especially in mobile apps, such as SafetyNet on
       | Android and DeviceCheck on iOS, among other methods, which makes
       | finding endpoints with strings a rather basic approach.
       | 
       | And most apps are now encrypted so you just see junk in the logs.
        
         | rozap wrote:
         | And on the web side, fingerprinting is rampant and there are JS
         | challenges in Cloudflare, Imperva, etc. which make it trickier.
         | It's frustrating to run a whole browser with a virtual screen
         | and load the whole page, which is ofc like 15 MB of JS and
         | other trash, just to do a very simple thing.
         | 
         | Granted, smaller fish like the ones OP is referring to
         | generally don't have aggressive anti-automation measures in
         | place, so it can be easy... but generally these techniques
         | don't work if the operator has put the proper measures in place.
        
       | danielvaughn wrote:
       | At a former job, we reverse engineered the trading APIs of most
       | American retail stock brokerages (Fidelity, E-Trade, Robinhood,
       | TD Ameritrade, etc). We did it by rooting an iPhone and using
       | Charles Proxy to grab the unencrypted traffic.
       | 
       | I learned a lot from that experience, and it's also just plain
       | fun to do. We did get some strongly worded letters from Robinhood
       | though, lol. They tried blocking our servers but we just set up
       | this automated system in Digital Ocean that would spin up a new
       | droplet each time we detected a blockage, and they were never
       | able to stop us after that.
       | 
       | Fun times.
        
         | packetlost wrote:
         | I did almost this exact same thing back in 2015-ish when I was
         | in high school, over Christmas break. I reverse engineered the
         | anime streaming site Crunchyroll's API via their Android and
         | PS3 apps using some HTTP proxy application and trial and error.
         | I ended up having a proper HLS-based streaming player and
         | Android TV app back when their Android app was still Flash
         | based. It was lots of fun!
        
           | danielvaughn wrote:
           | 2015/2016 was exactly when I was doing the above job. We
           | could've hired you as an intern!
        
             | packetlost wrote:
             | Maybe! I considered pursuing a job there at the time, but I
             | opted to get a degree instead. Being located in the Midwest
             | would've made it rather challenging anyways.
        
             | packetlost wrote:
             | Also this would have been winter of 2014 and into early
             | 2015, so a bit before :)
             | 
             | Sadly the CR forums are gone, so my rather popular thread
             | that had feedback and support is long gone.
        
         | IAmGraydon wrote:
         | You were able to connect to those APIs without auth? As far as
         | I know, they all require it.
        
           | danielvaughn wrote:
           | No, we would use our own accounts that we sourced from either
           | the CEO/CTO or someone else.
        
             | IAmGraydon wrote:
             | Why couldn't they block you then? They should have been
             | able to quickly disable the accounts.
        
               | danielvaughn wrote:
               | Our product was built for end users, so the traffic
               | coming from our servers could technically be from any
               | account. But as to why we weren't blocked during testing,
               | that I'm not sure about. It's been about 8 years since I
               | did that work - I assume we had someone's account who
               | wasn't obviously connected to the company.
        
         | lesuorac wrote:
         | Can't one just list all of digital ocean's ip blocks?
         | 
           | Like sure, then you can add in Hetzner or w/e and keep
           | adjusting, but idk, if somebody keeps ban dodging by using the
           | same provider it seems like you'd just try banning that
           | provider early on?
        
           | danielvaughn wrote:
           | Yeah idk, they should've been able to but for some reason
           | they didn't.
        
           | bigiain wrote:
           | Same sort of timeframe, a project I worked on used networking
           | via mobile hotspots on a bunch of Android phones with SIMs
           | from a provider that used CGNAT. If the target websites
           | wanted to block that, they'd be blocking well over 10% of all
           | mobile phones in Australia.
           | 
           | (Hmmm, all the devices we used then would have just stopped
           | working with the shutdown of the 3G network here. I wonder if
           | it's all broken, or if they've upgraded all those devices to
           | 4G/5G ones?)
        
         | TeMPOraL wrote:
         | > _We did get some strongly worded letters from Robinhood
         | though, lol._
         | 
         | Unsurprisingly, the most sleazy players are the first ones to
         | go after someone accessing their services in ways they didn't
         | anticipate or intend :).
        
       | stoplight wrote:
       | This is how I made a better version of the nhl.com site [1] that
       | has a better UI (you can see scores/schedules much more easily),
       | is mobile first, has no ads, and responsiveness built in. I did
       | the same for the AHL [2], and the PWHL [3].
       | 
       | [1] https://nhl-remix.vercel.app/
       | [2] https://ahl-remix.vercel.app/
       | [3] https://pwhl-remix.vercel.app/
        
         | kmoser wrote:
         | For a minute I thought PWHL was short for "pwn-NHL".
        
         | 12345hn6789 wrote:
         | Poked around a bit. It's responsive and looks great on mobile.
         | Kudos
        
       | rcpt wrote:
       | Anyone have this for Twitter? I want to remove most of my tweets
       | but the official API costs $200
        
         | s09dfhks wrote:
         | Their delete post endpoints probably require auth. Otherwise,
         | what's to stop you from deleting someone else's posts?
        
         | markerz wrote:
         | Maybe reverse it from the web app.
         | 
         | I deleted a tweet and saw this request:
         | 
         |     HTTP POST
         |     https://x.com/i/api/graphql/VstuveVgh5q5jk7lmnVopqr/DeleteTweet
         |     {
         |         "variables": {
         |             "tweet_id": "12344567899123",
         |             "dark_request": false
         |         },
         |         "queryId": "VstuveVgh5q5jk7lmnVopqr"
         |     }
         | 
         | You can execute these from javascript in the browser if the
         | auth part is too complicated.
         | 
         | ### Update, this is the pure javascript console way, if you
         | don't want to write your own client doing HTTP posts
         | 
         | I played with the console more and got these parts:
         | 
         |     // Find all tweets on screen (this gives you the tweet IDs too)
         |     document.querySelectorAll('a > time')
         | 
         |     // Click the "more" button on the first tweet
         |     document.querySelectorAll('a > time')[0]
         |         .parentElement.parentElement.parentElement.parentElement
         |         .parentElement.parentElement.parentElement.parentElement
         |         .parentElement.parentElement.parentElement
         |         .querySelector('button').click()
         | 
         |     // Click delete on the tweet
         |     document.querySelectorAll('[data-testid="Dropdown"]')[0].children[0].click()
         | 
         |     // Confirm delete
         |     document.querySelectorAll('[data-testid="confirmationSheetConfirm"]')[0].click()
        
         | noman-land wrote:
         | I did this by writing a script in the console of twitter.com
         | that walked all my tweets and deleted them one by one. Nothing
         | fancy needed.
        
       | rubslopes wrote:
       | I do exactly this, _but for the company that I work for._
       | 
       | I'm on the dashboards and integrations team, and I don't have
       | direct access to the codebase of the main product. As the
       | internal APIs have no documentation at all, I'm always "hacking"
       | our own system using the browser inspector to find out how our
       | endpoints work.
        
       ___________________________________________________________________
       (page generated 2024-11-11 23:00 UTC)