[HN Gopher] All the data can be yours: reverse engineering APIs
___________________________________________________________________
All the data can be yours: reverse engineering APIs
Author : noleary
Score : 168 points
Date : 2024-11-06 07:43 UTC (5 days ago)
(HTM) web link (jero.zone)
(TXT) w3m dump (jero.zone)
| Eikon wrote:
| This approach is generally seen as unwanted by website owners
| (it's worth noting that automated API clients are distinct from
| regular user agents). As a "reverse engineer", you have no idea
| how expensive it is for an endpoint to process a request.
|
| Instead, I'd recommend reaching out to the website owners
| directly to discuss your API needs - they're often interested in
| hearing about potential integrations and use cases.
|
| If you don't receive a response, proceeding with unauthorized API
| usage is mostly abusive and poor internet citizenship.
| lolinder wrote:
| I think it can be done responsibly. If you're not imposing any
| more traffic on them than you would be visiting their site
| manually, then using their APIs to get at structured data is
| actually win-win compared to the alternative (load the whole
| thing and scrape).
|
| Where I'll agree with you is cases where people do this and
| impose way more traffic than is typical and often more than is
| necessary (i.e. no caching). But that's not really specific to
| reverse engineering apis, that's just about being a good
| internet citizen in general.
|
| I'm of the opinion that a user agent is a user agent, and
| website owners shouldn't pick and choose what user agents they
| support. Target the behaviors that affect your infrastructure,
| not the means of access and algorithms used to process what you
| send.
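The "no caching" point above can be made concrete with a small time-to-live cache in front of whatever function performs the request. This is only a sketch: `TTLCache`, the `fetch` callable and the 300-second default are hypothetical names and values chosen for illustration, not anything from the article.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Remember responses per URL so repeated lookups do not hit the server."""

    def __init__(self, fetch: Callable[[str], bytes], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # hypothetical callable that does the real request
        self._ttl = ttl_seconds  # assumed freshness window
        self._clock = clock      # injectable so the cache can be tested offline
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        cached = self._entries.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]     # still fresh: no network traffic at all
        body = self._fetch(url)  # missing or stale: one real request
        self._entries[url] = (now, body)
        return body
```

Wrapping the real request function this way means a script that polls in a loop sends at most one request per URL per TTL window.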
| Eikon wrote:
| > If you're not imposing any more traffic on them than you
| would be visiting their site manually, then using their APIs
| to get at structured data is actually win-win compared to the
| alternative (load the whole thing and scrape).
|
| I agree but that's not usually how it goes. From what I've
| seen, it's mostly very poorly written scripts with no rate
| limiting and no backoff strategy that will be hitting your
| api servers.
| lolinder wrote:
| Would you rather have those poorly-written scripts hitting
| your APIs or have a poorly-written puppeteer script loading
| every asset you have--hitting the APIs along with
| _everything else_?
|
| Casting shade on API reverse engineering when what you
| actually have is a failure to rate limit is throwing the
| baby out with the bathwater. Abusive users will abuse until
| you build in a technological method to stop them, and user-
| agent sniffing provably doesn't work to stop bad actors.
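One common "technological method" for this is a token bucket kept per client. The sketch below is a minimal, single-threaded version under assumed parameters; `TokenBucket` and `status_for` are illustrative names, and a real deployment would key one bucket per session or IP and probably live in a proxy or gateway.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `burst` requests at once, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_second
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def status_for(bucket: TokenBucket) -> int:
    """Status a handler could return: 200 if allowed, 429 if over the limit."""
    return 200 if bucket.allow() else 429
```

The limiter never looks at the User-Agent header. It only measures behavior, so it treats the official app and a third-party client identically.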
|
| The concept of a flexible, customizable User Agent that
| operates on my behalf is a key idea that's foundational to
| the web, and I'm not willing to cede that cultural ground
| in the vague hope that we can make the bad guys feel bad
| and start just using Chrome like civilized people.
| MathMonkeyMan wrote:
| Nah.
|
| Bots looking for exploits is rude, spamming an endpoint with
| more traffic than normal is rude.. but a human trying to figure
| out the API that you exposed to the internet? That's just fair
| play.
|
| Also, better to ask for forgiveness than to ask for permission.
| The author is adding value to the world while hurting nobody,
| and the answer would likely be an automatic "no" anyway.
| boolemancer wrote:
| In my personal view, this seems a little overbearing.
|
| If you expose an API, and you want to tell a user that they are
| "unauthorized" to use it, it should return a 401 status code so
| that the caller knows they're unauthorized.
|
| If you can't do that because their traffic looks like normal
| usage of the API by your web app, then I question why their
| usage is problematic for you.
|
| At the end of the day, you don't get to control what 'browser'
| the user uses to interact with your service. Sure, it might be
| Chrome, but it just as easily might be Firefox, or Lynx, or
| something the user built from scratch, or someone manually
| typing out HTTP requests in netcat, or, in this case, someone
| building a custom client for your specific service.
|
| If you host a web server, it's on you to remember that and
| design accordingly, not on the user to limit how they use your
| service.
| AlienRobot wrote:
| That's like saying if someone accepts cash that means you
| should be allowed to pay a $100 bill with a thousand dimes.
|
| Just because you're right doesn't mean you aren't wrong.
| lolinder wrote:
| The $100 tab paid in dimes causes severe inconvenience to
| the person trying to count them and to the person who has
| to take them to the bank to cash them in and wait for them
| to be counted again.
|
| Their very reasonable question was: if you can't
| distinguish the reverse engineered traffic from the traffic
| through your own app in order to block it, then what harm
| is the traffic doing? Presumably it's flying under your
| rate limits, and the traffic has a valid session token from
| a real customer. If you're unable to single it out and
| return a 4xx, why does it matter where it's coming from?
|
| I can think of a few reasons it might, but I'm not
| particularly sympathetic to them. They generally boil down
| to "I won't be able to use my app to manipulate the user
| into taking actions they'd otherwise not take."
|
| I'd be interested to hear if there are better reasons.
| AlienRobot wrote:
| "if you can't distinguish the reverse engineered traffic
| from the traffic through your own app in order to block
| it, then what harm is the traffic doing?"
|
| If you really believe this you'll use a custom user agent
| instead of spoofing Chrome. :-)
|
| Some websites use HTTP referer to block traffic. Ask
| yourself if any reverse engineer would be stopped by what
| is obviously the website telling you not to access an
| endpoint.
|
| I'll add that end users don't have complete information
| about the website. They can't know how many resources a
| website has to devote to dealing with reverse engineering
| (webmasters can't just play cat and mouse with you just
| because you're wasting their money) nor do they know the
| cost of an endpoint. I mean, most tech-inclined people use
| ad blockers when it's obvious 90% of websites pay the cost
| of their endpoints by showing ads, so I doubt they would
| respect anything more subtle than that.
| boolemancer wrote:
| If an endpoint costs a lot to run, implement rate limits
| and return 429 status codes so callers know that they're
| calling too often.
|
| That endpoint will be expensive regardless of whether
| it's your own app or a third party that's calling it too
| often, so design it with that in mind.
|
| Your app isn't special, it's just another client. Treat
| it that way.
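The other half of that contract is a client that actually respects a 429. A minimal sketch, assuming a hypothetical `send` callable that returns `(status, headers, body)`, a `Retry-After` header given in seconds (the HTTP-date form is not handled here), and made-up defaults for the base delay and cap:

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait: the server's Retry-After if given, else exponential."""
    if retry_after is not None and retry_after.isdigit():
        return min(cap, float(retry_after))
    return min(cap, base * (2 ** attempt))


def fetch_politely(send: Callable[[], Response], max_attempts: int = 5,
                   sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, backing off on 429 and 5xx instead of hammering the server."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 and status < 500:
            return status, headers, body
        if attempt < max_attempts - 1:
            delay = backoff_delay(attempt, headers.get("Retry-After"))
            # A little jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise RuntimeError(f"giving up after {max_attempts} attempts")
```

`send` would typically wrap a single HTTP call. Injecting it and `sleep` keeps the retry logic testable without any network access.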
| AlienRobot wrote:
| The only reason why "another client" can exist is due to
| limitations of the Internet itself.
|
| If you could ensure that the web server can only be
| accessed by your client, you would do that, but there is
| no way to do this that can't be reverse-engineered.
|
| Essentially your argument is that just because a door is
| open that means you're allowed to enter inside, and I
| don't believe that makes any sense.
| TeMPOraL wrote:
| > _If you really believe this you'll use a custom user
| agent instead of spoofing Chrome. :-)_
|
| Read up on the history of the User-Agent string, and why
| everyone claims they're Mozilla and "like Gecko". Yes,
| it's because of all the silly people who, since the earliest
| days of the WWW, tried to change what they serve based on
| the contents of the User-Agent header.
| Bjartr wrote:
| Not the greatest example. If someone has incurred a $100
| debt to you, then, from a legal perspective, you must
| consider delivery of a thousand dimes as having paid the
| debt. You don't get a choice on that without prior
| contractual agreement.
|
| https://uscode.house.gov/view.xhtml?req=granuleid:USC-
| prelim...
|
| (In the United States at least)
| lolinder wrote:
| This is not an accurate reading of the code. Snopes
| quotes an FAQ on the US Treasury site (now missing, but
| presumably still correct) [0]:
|
| > Q: I thought that United States currency was legal
| tender for all debts. Some businesses or governmental
| agencies say that they will only accept checks, money
| orders or credit cards as payment, and others will only
| accept currency notes in denominations of $20 or smaller.
| Isn't this illegal?
|
| > A: The pertinent portion of law that applies to your
| question is the Coinage Act of 1965, specifically Section
| 31 U.S.C. 5103, entitled "Legal tender," which states:
| "United States coins and currency (including Federal
| reserve notes and circulating notes of Federal reserve
| banks and national banks) are legal tender for all debts,
| public charges, taxes, and dues."
|
| > This statute means that all United States money as
| identified above are a valid and legal offer of payment
| for debts when tendered to a creditor. There is, however,
| no Federal statute mandating that a private business, a
| person or an organization must accept currency or coins
| as for payment for goods and/or services. Private
| businesses are free to develop their own policies on
| whether or not to accept cash unless there is a State law
| which says otherwise. For example, a bus line may
| prohibit payment of fares in pennies or dollar bills. In
| addition, movie theaters, convenience stores and gas
| stations may refuse to accept large denomination currency
| (usually notes above $20) as a matter of policy.
|
| [0] https://www.snopes.com/fact-check/legal-tender-payment/
| gaeb69 wrote:
| I agree. When scraping my school's portal which uses Canvas, my
| school actually allows it.
|
| Sometimes you can get the green light just by reading API
| docs/School's privacy policy as they're usually obliged to have
| one (ofc this primarily applies to school APIs like in OP's
| article)
| mindslight wrote:
| A company can always document their API if they want power
| users to be informed about the company's preferred ways of
| doing things. They generally don't because they want to create
| a bunch of market friction so you're forced into using their
| proprietary client apps to interact with their services. I for
| one applaud efforts like the original post, and think it's an
| instance of _outstanding_ Internet citizenship, where
| citizenship means working to preserve Internet societal norms
| like decentralization.
| rmbyrro wrote:
| This would be akin to asking a shop owner if I'm allowed to
| pick a pamphlet from an endless stack of pamphlets placed on the
| sidewalk. If they don't want the public to pick pamphlets, don't
| put them where the public can reach 'em!
| mtnGoat wrote:
| Same could be said about things in your front yard?
|
| I'm not sure taking liberties with things you don't own is
| always the best policy, nor is putting the entire
| responsibility on the owner.
|
| I don't think this is something you can boil down to a simple
| black and white.
| barnabee wrote:
| Accessing a publicly available web service is not "taking
| liberties with things you don't own".
|
| If you put a server on the public internet and I send it a
| message (assuming I'm not using ill-gotten credentials,
| etc.), anything it responds with is your problem, not mine.
| Chris2048 wrote:
| > Same could be said about things in your front yard?
|
| No. It _could_ be said, but wouldn't be true - objects in
| a domestic front yard are nothing like pamphlets placed on
| the sidewalk.
| tomjen3 wrote:
| This is a weird attitude. The Internet we used to know meant
| that you could do things with it. Certainly you should not
| reverse engineer so as to get access to others' accounts in the
| first place, but that should be impossible anyway.
|
| I am aware that lots of companies have ideas(TM) about how you
| should be able to use their products(TM) and may even add these
| to their Terms of Service, a document that has somehow become
| the last refuge for the bureaucratic organisation desperate to
| maintain control when forced to connect things to the great
| unbureaucratic internet.
|
| To that, I say: too bad. I never signed up for the new version
| of the internet and I do not consider TOS to be anything but
| noise. I used Pidgin back in the day and would again if it
| worked.
|
| This absurd idea that website owners should have any say about
| what runs on your computer/device is nonsense.
| hipadev23 wrote:
| This is such a cute take.
| barnabee wrote:
| What's wanted (or not) by website operators is irrelevant.
|
| Adversarial interoperability is a cornerstone of the value of
| the internet and something we should fight hard to keep.
| sccxy wrote:
| > Mobile apps have no choice but to use HTTP APIs. You can easily
| download a lot of iOS apps through the Mac App Store, then run
| strings on their bundles to look for endpoints.
|
| Are there any good tutorials for that? 'strings' is not the
| greatest name for searching good information.
| mooreds wrote:
| I googled a bit and found this, which points to some other
| tools.
|
| https://www.corellium.com/blog/ios-mobile-reverse-engineerin...
|
| (No affiliation.)
| syntaxing wrote:
| I was about to post the exact same thing; I don't see anything
| API-related for iOS apps on Google.
|
| Edit: After some Googling and trying it on my MacBook, there
| is a native CLI tool called "strings". Supposedly it does the
| following: strings is primarily used to find and display
| printable character sequences in binary files, object files,
| and executables. Which means the author is probably looking at
| the hardcoded strings in the app binary(?) and searching them
| for the API endpoints.
| emilamlom wrote:
| Just for context, strings is super commonly used when
| reverse-engineering anything. It's a great first step because
| it's easy, fast, and gets you some decent clues to help you get
| your bearings in an unknown binary file.
| out-of-ideas wrote:
| gnu strings: (first google result from "gnu strings" search)
| https://sourceware.org/binutils/docs/binutils/strings.html
| JonChesterfield wrote:
| strings is a unix program that shows you strings in a binary
| file
| instalabs wrote:
| Not sure what "strings" is, but I always use Charles Proxy to
| inspect traffic for any mobile app:
| https://apps.apple.com/us/app/charles-proxy/id1134218562
| jonhohle wrote:
| `strings` is the Unix command line utility[0] of the same name.
| strings file
|
| will tell you all of the ASCII strings in file.
|
| 0 - https://www.unix.com/man-page/osx/1/strings/
| poincaredisk wrote:
| strings -el
|
| For utf-16 encoded ascii strings (very useful when dealing
| with windows executables).
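For readers without the tool at hand, the core of what `strings` and `strings -el` do can be approximated in a few lines. This is a rough sketch, not a reimplementation of the binutils program: the function names are made up here, and the minimum run length of 4 is an assumed default for this sketch.

```python
import re
from typing import List


def ascii_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Runs of at least `min_len` printable ASCII bytes (roughly `strings`)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def utf16le_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Printable ASCII stored as 16-bit little-endian (roughly `strings -el`)."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]
```

Typical use would be `ascii_strings(open(path, "rb").read())`, followed by filtering the output for anything that looks like a URL or a path.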
| spondyl wrote:
| Just to clarify a bit more for those who are new to strings and
| because the audience for the post may lean towards people
| fresher to reverse engineering:
|
| While most of the time, you're dealing with variables and such
| in programs, at some point you have to hardcode some
| information such as URLs to query so something like
|
| import requests
| BASE_URL = "https://example.com"
| result = requests.get(BASE_URL + "/api/blah")
|
| If we pretend this is in an Android app which is stored as an
| apk file (a zip file basically), running strings would spit out
| "https://example.com" and "/api/blah"
|
| It'll also spit out anything that appears to be an ASCII
| character so plenty of junk but it's often quite handy as a
| starting point.
|
| There are, of course, much more precise tools such as
| man-in-the-middle proxying, but with that you'll only capture
| traffic for endpoints actually used by said app. The app may
| contain other endpoints left unused, rarely triggered and so on.
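Since an apk is a zip archive and its members are usually compressed, one way to try the idea above is to decompress each member and scan it for URL-like byte runs. A minimal sketch, assuming the interesting strings are plain ASCII; `find_endpoints` and the `/api/` pattern are illustrative choices, not a general rule for how apps name their routes.

```python
import io
import re
import zipfile
from typing import Dict, List

# Absolute http(s) URLs, or paths starting with /api/ (an assumed convention).
URL_OR_PATH = re.compile(rb"https?://[\x21-\x7e]{4,}|/api/[\x21-\x7e]+")


def find_endpoints(archive: bytes) -> Dict[str, List[str]]:
    """Map each archive member to the URL-like strings found inside it."""
    hits: Dict[str, List[str]] = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for name in zf.namelist():
            content = zf.read(name)  # decompressed bytes of this member
            found = sorted({m.group().decode("ascii")
                            for m in URL_OR_PATH.finditer(content)})
            if found:
                hits[name] = found
    return hits
```

Expect plenty of noise in practice (SDK URLs, XML namespaces, documentation links), so the output is a starting point for manual review rather than a list of endpoints.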
| TeMPOraL wrote:
| "strings" is a Unix CLI utility that automates the equivalent
| of a tried and true practice on Windows: opening an executable
| file in Notepad.exe and scrolling around until you find human-
| readable text (usually near the end of the file).
| treyd wrote:
| I wonder how difficult it would be to combine many of these
| techniques into some automated script that dumps a manifest of
| the different types of undocumented APIs there are. LLMs have
| also been shown to be pretty good at answering semantic questions
| about large blobs of minified code, perhaps there could be some
| success there?
| cobertos wrote:
| There are too many fiddly bits of which endpoints return what
| data in which shape, which requires a custom solution each time.
| An LLM would probably have a very easy shot at this problem
| though.
|
| It's much more straightforward if you can find a GraphQL,
| Swagger, or OpenAPI spec to automate conversion I'd imagine.
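When an OpenAPI document does turn up, enumerating its operations is mostly a dictionary walk. The sketch below assumes the spec has already been parsed into a Python dict (for example with `json.load`) and only reads the `paths` object; `list_operations` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple

# Keys of a path item that denote operations rather than shared metadata.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (METHOD, path, operationId) for every operation in the spec."""
    operations = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for key, value in item.items():
            if key.lower() in HTTP_METHODS:  # skip "parameters", "summary", ...
                operations.append((key.upper(), path, value.get("operationId", "")))
    return operations
```

From a list like this it is a short step to generating client stubs, though request and response shapes still need the `components` section, which this sketch ignores.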
| matthewfcarlson wrote:
| I went to the 75grand app listed in the article and saw a listing
| for Cafe Mac and did a double take. Apple's employee cafe is
| Caffè Macs, so I was quite confused for a second.
| gaeb69 wrote:
| Sick app btw. Funny this comes up because I'm working on the
| exact same thing for my school. Note that if your school uses
| Canvas, Canvas' API is well documented and has GraphQL endpoints.
| Lucasoato wrote:
| > 75grand's success was even met with jealousy from the college
|
| That's a common story; at my university (UniPD, in Padova)
| something even worse happened. They tried hard to shut down an
| unofficial app (Uniweb) that was installed by most of the
| students in favor of the "official" one, that was completely
| unusable (and probably was born out of a rigged contract). In
| the end the best one won and became official, but that was
| after a lot of struggle.
| mtnGoat wrote:
| I'm just not sure jealousy is the correct word for this though.
| Most systems don't like these kinds of things for a number of
| reasons.
| TeMPOraL wrote:
| In my (perhaps limited) experience, most of those reasons are
| just different ways of spelling "our official solution is
| shit, because we don't care and/or make money in some
| underhanded way".
| mock-possum wrote:
| > They tried hard to shut down an unofficial app (Uniweb) that
| was installed by most of the students in favor of the
| "official" one, that was completely unusable
|
| Sounds like what happened with the Apollo app and Reddit
| z3c0 wrote:
| > The error messages helpfully suggested fields I hadn't known
| about by "correcting" my typos.
|
| Glad to see this being called out. Sure, I get why it's
| convenient. Misspelling a field by one character is a daily
| occurrence ("activty" and "heirarchy" are my regulars). The catch
| is that spellchecking queries and returning valid fields in the
| error effectively reduces entropy by both character space and
| message length, varying by the type of distance used in the
| spellcheck.
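A toy version of that leak: the server below "helpfully" suggests the nearest real field for any unknown one, and a client recovers field names just by sending near-misses. Everything here is hypothetical (the schema, the message format, the 0.6 cutoff), and `difflib` stands in for whatever distance metric a real server uses.

```python
import difflib
from typing import List, Set

SCHEMA_FIELDS = ["activity", "hierarchy", "email", "internalNotes"]  # made-up schema


def error_for(field: str) -> str:
    """Server side: the error message returned for a requested field."""
    if field in SCHEMA_FIELDS:
        return "ok"
    suggestions = difflib.get_close_matches(field, SCHEMA_FIELDS, n=1, cutoff=0.6)
    if suggestions:
        return f'Unknown field "{field}". Did you mean "{suggestions[0]}"?'
    return f'Unknown field "{field}".'


def probe(guesses: List[str]) -> Set[str]:
    """Client side: collect every field name the error messages give away."""
    leaked = set()
    for guess in guesses:
        message = error_for(guess)
        if "Did you mean" in message:
            leaked.add(message.split('"')[3])  # the second quoted token
    return leaked
```

Each suggestion confirms a complete field name from a partial guess. This is the entropy reduction described above: the attacker only needs to land within the cutoff distance, not on the exact string.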
| colesantiago wrote:
| Most of these techniques are extremely old and very outdated.
|
| Teams that I've seen working on apps now implement much stronger
| checks on APIs, especially in mobile apps, such as SafetyNet
| (Android) and DeviceCheck (iOS), which makes using strings a
| rather basic way to find them.
|
| And most apps are now encrypted so you just see junk in the logs.
| rozap wrote:
| And on the web side, fingerprinting is rampant and there are JS
| challenges in cloudflare, imperva, etc which make it trickier.
| Frustrating to run a whole browser with a virtual screen, load
| the whole page which is ofc like 15mb of JS and other trash,
| just to do a very simple thing.
|
| Granted, smaller fish like the ones OP is referring to
| generally don't have aggressive anti automation measures in
| place, so it can be easy...but generally these techniques don't
| work if the operator has put the proper measures in place.
| danielvaughn wrote:
| At a former job, we reverse engineered the trading APIs of most
| American retail stock brokerages (Fidelity, E-Trade, Robinhood,
| TD Ameritrade, etc). We did it by rooting an iPhone and using
| Charles Proxy to grab the unencrypted traffic.
|
| I learned a lot from that experience, and it's also just plain
| fun to do. We did get some strongly worded letters from Robinhood
| though, lol. They tried blocking our servers but we just set up
| this automated system in Digital Ocean that would spin up a new
| droplet each time we detected a blockage, and they were never
| able to stop us after that.
|
| Fun times.
| packetlost wrote:
| I did almost this exact same thing back in 2015-ish, when I was
| in high school, over Christmas break. I reverse engineered the
| anime streaming site Crunchyroll's API via their Android and
| PS3 apps using some HTTP proxy application and trial and error.
| I ended up with a proper HLS-based streaming player and Android
| TV app back when their Android app was still Flash based. It
| was lots of fun!
| danielvaughn wrote:
| 2015/2016 was exactly when I was doing the above job. We
| could've hired you as an intern!
| packetlost wrote:
| Maybe! I considered pursuing a job there at the time, but I
| opted to get a degree instead. Being located in the Midwest
| would've made it rather challenging anyways.
| packetlost wrote:
| Also this would have been winter of 2014 and into early
| 2015, so a bit before :)
|
| Sadly the CR forums are gone, so my rather popular thread
| that had feedback and support is long gone.
| IAmGraydon wrote:
| You were able to connect to those APIs without auth? As far as
| I know, they all require it.
| danielvaughn wrote:
| No we would use our own accounts that we sourced from either
| the CEO/CTO or someone else.
| IAmGraydon wrote:
| Why couldn't they block you then? They should have been
| able to quickly disable the accounts.
| danielvaughn wrote:
| Our product was built for end users, so the traffic
| coming from our servers could technically be from any
| account. But as to why we weren't blocked during testing,
| that I'm not sure about. It's been about 8 years since I
| did that work - I assume we had someone's account who
| wasn't obviously connected to the company.
| lesuorac wrote:
| Can't one just list all of digital ocean's ip blocks?
|
| Like sure then you can add in Hetzner or w/e and keep adjusting
| but idk if somebody keeps ban dodging by using the same
| provider it seems like you'd just try banning that provider
| early on?
| danielvaughn wrote:
| Yeah idk, they should've been able to but for some reason
| they didn't.
| bigiain wrote:
| Same sort of timeframe, a project I worked on used networking
| via mobile hotspots on a bunch of Android phones with SIMs
| from a provider that used CGNAT. If the target websites
| wanted to block that, they'd be blocking well over 10% of all
| mobile phones in Australia.
|
| (Hmmm, all the devices we used then would have just stopped
| working with the shutdown of the 3G network here. I wonder if
| it's all broken, or if they've upgraded all those devices to
| 4/5G ones?)
| TeMPOraL wrote:
| > _We did get some strongly worded letters from Robinhood
| though, lol._
|
| Unsurprisingly, the most sleazy players are the first ones to
| go after someone accessing their services in ways they didn't
| anticipate or intend :).
| stoplight wrote:
| This is how I made a better version of the nhl.com site [1] that
| has a better UI (you can see scores/schedules much more easily),
| is mobile first, has no ads, and responsiveness built in. I did
| the same for the AHL [2], and the PWHL [3].
|
| [1] https://nhl-remix.vercel.app/
| [2] https://ahl-remix.vercel.app/
| [3] https://pwhl-remix.vercel.app/
| kmoser wrote:
| For a minute I thought PWHL was short for "pwn-NHL".
| 12345hn6789 wrote:
| Poked around a bit. It's responsive and looks great on mobile.
| Kudos
| rcpt wrote:
| Anyone have this for Twitter? I want to remove most of my tweets
| but the official API costs $200
| s09dfhks wrote:
| Their delete post endpoints probably require auth. What's to
| stop you from deleting someone else's posts
| markerz wrote:
| Maybe reverse it from the web app.
|
| I deleted a tweet and saw this request:
|
| HTTP POST
| https://x.com/i/api/graphql/VstuveVgh5q5jk7lmnVopqr/DeleteTweet
| {
|   "variables": {
|     "tweet_id": "12344567899123",
|     "dark_request": false
|   },
|   "queryId": "VstuveVgh5q5jk7lmnVopqr"
| }
|
| You can execute these from javascript in the browser if the
| auth part is too complicated.
|
| ### Update, this is the pure javascript console way, if you
| don't want to write your own client doing HTTP posts
|
| I played with the console more and got these parts:
|
| // Find all tweets on screen (this gives you the tweet IDs too)
| document.querySelectorAll('a > time')
|
| // Click the "more" button on the first tweet
| document.querySelectorAll('a > time')[0]
|   .parentElement.parentElement.parentElement.parentElement
|   .parentElement.parentElement.parentElement.parentElement
|   .parentElement.parentElement.parentElement
|   .querySelector('button').click()
|
| // Click delete on the tweet
| document.querySelectorAll('[data-testid="Dropdown"]')[0].children[0].click()
|
| // Confirm delete
| document.querySelectorAll('[data-testid="confirmationSheetConfirm"]')[0].click()
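For anyone who would rather write their own client, the captured DeleteTweet request above maps onto a few lines of Python. This sketch only builds the request object and never sends it. The URL, query ID and body fields are copied from the capture. The `Content-Type` header and the `auth_headers` parameter are assumptions, since the capture does not show which cookies or tokens the real call carries, and the query ID may well change over time.

```python
import json
import urllib.request
from typing import Dict

QUERY_ID = "VstuveVgh5q5jk7lmnVopqr"  # value shown in the captured request
ENDPOINT = f"https://x.com/i/api/graphql/{QUERY_ID}/DeleteTweet"


def build_delete_request(tweet_id: str,
                         auth_headers: Dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) a POST mirroring the captured DeleteTweet call."""
    body = {
        "variables": {"tweet_id": tweet_id, "dark_request": False},
        "queryId": QUERY_ID,
    }
    # auth_headers is a placeholder: copy whatever the browser actually sends.
    headers = {"Content-Type": "application/json", **auth_headers}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Without the session's real headers, the server can be expected to reject the call.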
| noman-land wrote:
| I did this by writing a script in the console of twitter.com
| that walked all my tweets and deleted them one by one. Nothing
| fancy needed.
| rubslopes wrote:
| I do exactly this, _but for the company that I work for._
|
| I'm on the dashboards and integrations team, and I don't have
| direct access to the codebase of the main product. As the
| internal APIs have no documentation at all, I'm always "hacking"
| our own system using the browser inspector to find out how our
| endpoints work.
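The browser-inspector workflow above can be partly automated: most browsers can export the Network tab as a HAR file, which is JSON with one entry per request. A minimal sketch that summarizes such an export by method and path; `endpoints_from_har` is a hypothetical helper, and it reads only the `log.entries[].request` fields.

```python
import json
from collections import Counter
from typing import List, Tuple
from urllib.parse import urlsplit


def endpoints_from_har(har_text: str) -> List[Tuple[str, str, int]]:
    """Return sorted (method, path, hit count) tuples from a HAR export."""
    har = json.loads(har_text)
    counts: Counter = Counter()
    for entry in har["log"]["entries"]:
        request = entry["request"]
        path = urlsplit(request["url"]).path  # drop host and query string
        counts[(request["method"], path)] += 1
    return sorted((method, path, n) for (method, path), n in counts.items())
```

Clicking through a feature once, exporting the HAR and running it through a summary like this gives a first draft of the undocumented endpoint list.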
___________________________________________________________________
(page generated 2024-11-11 23:00 UTC)