[HN Gopher] The AWS S3 Denial of Wallet Amplification Attack
___________________________________________________________________
The AWS S3 Denial of Wallet Amplification Attack
Author : croes
Score : 165 points
Date : 2024-05-01 19:00 UTC (4 hours ago)
(HTM) web link (blog.limbus-medtec.com)
(TXT) w3m dump (blog.limbus-medtec.com)
| andersa wrote:
| > Potential remedies
|
| - Stop using S3 and other AWS things (perhaps AWS stands for
| Amazon Web Scams?) already and switch to Cloudflare R2...
| vuln wrote:
| CIAWS
| voidwtf wrote:
| The way billing is calculated should be clearly labeled along
| with the pricing. Azure does this too, it's super unclear what
| metric they're using to determine what will be billed for
| requests. We're having to find out via trial and error. If we
| request 0-2GB of a 6GB file but the client cancels after 400MB,
| are we paying for 2GB, 400MB, or 6GB?
|
| Is there a billed difference between Range: 0-, no "Range"
| header, and Range: 0-1GB if the client downloads 400MB in each
| scenario?
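|
| For reference, a minimal boto3 sketch of the three scenarios
| (bucket and key names are made up; which of these maps to the
| billed bytes is exactly the unclear part):
|
|     import boto3
|
|     s3 = boto3.client("s3")  # credentials/region from environment
|
|     # Scenario 1: bounded range, first 2 GiB of a 6 GB object
|     resp = s3.get_object(
|         Bucket="example-bucket", Key="big-file.bin",
|         Range="bytes=0-2147483647",
|     )
|     body = resp["Body"]
|     body.read(400 * 1024**2)  # client reads 400 MB...
|     body.close()              # ...then gives up mid-transfer
|
|     # Scenario 2: open-ended range from byte 0
|     # s3.get_object(..., Range="bytes=0-")
|
|     # Scenario 3: no Range header at all
|     # s3.get_object(Bucket="example-bucket", Key="big-file.bin")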
| __roland__ wrote:
| Sorry for not making this clearer (we'll fix this part of the
| post): the gotcha is not that AWS fails to honor range
| requests; it's that canceling them will still add the full
| requested range of bytes to your egress bill (and this can add
| up quickly), even though no bytes (or far fewer) have actually
| been transferred.
| alchemist1e9 wrote:
| On the other hand, you did ask for them, so what does
| "canceling" mean? Just playing devil's advocate: they likely
| did start fetching the data for you, and that takes
| resources. Otherwise they would be open to a DoS attack that
| initiates many requests and then cancels them.
| __roland__ wrote:
| Sure, that's true. The thing is: this was the same
| requested (and cancelled) range on the same file(s), over
| and over (it was a bug). Looking at this from the outside,
| even some internal S3 caching should have produced many cache
| hits rather than re-fetching the requested ranges internally
| every time (there were dozens of identical requests per
| second, each immediately cancelled).
|
| On top of this, S3 already bills (separately) for any
| request against a bucket (see the other current issue with
| the invalid PUT requests against a secured bucket, which
| still got billed to the bucket owner;
| https://news.ycombinator.com/item?id=40203126). So I'd say
| both the requests and the cancellations were already paid
| for; the surprise was the 'egress' cost on top, of data
| that was not actually leaving the AWS network.
|
| Still, you are right that this consumes _some_ additional
| AWS resources, and it is probably a non-trivial issue to
| fix in the 'billing system'.
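|
| For illustration, the buggy client behavior looked roughly
| like this (a schematic boto3 sketch, not our actual code;
| names are invented):
|
|     import boto3
|
|     s3 = boto3.client("s3")
|
|     # Schematic of the bug: the same range on the same object,
|     # requested and abandoned dozens of times per second.
|     for _ in range(100):
|         resp = s3.get_object(
|             Bucket="example-bucket",    # placeholder names
|             Key="dataset.bin",
|             Range="bytes=0-1073741823", # ask for 1 GiB
|         )
|         resp["Body"].close()            # cancel almost immediately
|
|     # Almost nothing crosses the wire, yet the full requested
|     # range apparently lands on the egress bill every time.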
| CharlesW wrote:
| "Thank you to everyone who brought this article to our attention.
| We agree that customers should not have to pay for unauthorized
| requests that they did not initiate. We'll have more to share on
| exactly how we'll help prevent these charges shortly." -- Jeff
| Barr, Chief Evangelist, Amazon Web Services
|
| https://twitter.com/jeffbarr/status/1785386554372042890
| wmf wrote:
| Note that there are two separate issues being discussed.
| gnabgib wrote:
| I don't think that's the same problem - that's about the failed
| PUTs still costing a dev money [0] (272 points, 2 days ago, 99
| comments).
|
| [0]: https://news.ycombinator.com/item?id=40203126
| wmf wrote:
| "With range requests, the client can request to retrieve a part
| of a file, but not the entire file. ... Due to the way AWS
| calculates egress costs the transfer of the entire file is
| billed." WTF if true.
| fabian2k wrote:
| That sounds egregious enough that I have trouble believing this
| can be correct. My understanding is that AWS bills for egress
| for every service; parts of the file that aren't transferred
| are not part of this, so they can't be billed. There could certainly
| be S3-specific charges that affect cases like this, no idea.
| But if AWS bills the full egress traffic costs for a range
| request I'd consider that essentially fraud.
| belter wrote:
| https://github.com/ZJONSSON/node-unzipper/issues/308
| paulddraper wrote:
| tl;dr
|
| AWS user believes that testing on a 1Gbps connection for 45
| min can't be more than $10 of egress.
|
| Gets a $500 bill instead.
|
| Note: This user specified a _lower_ range but not an
| _upper_ range on the request (and closed the connection
| prematurely). Essentially read() with an offset, for a ZIP
| tool.
|
| See also: https://news.ycombinator.com/item?id=40205213
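|
| Sketched with the Python requests library (placeholder URL;
| node-unzipper does the equivalent in JavaScript):
|
|     import requests
|
|     # Open-ended range: a lower bound but no upper bound, i.e.
|     # "from byte 4096 to the end" -- effectively read() with an
|     # offset, as a ZIP tool might do.
|     resp = requests.get(
|         "https://example-bucket.s3.amazonaws.com/archive.zip",
|         headers={"Range": "bytes=4096-"},
|         stream=True,
|     )
|     resp.raw.read(64 * 1024)  # read only the 64 KB needed
|     resp.close()              # close the connection prematurely
|
| Back of the envelope: 45 minutes at a full 1 Gbps is at most
| ~340 GB on the wire, i.e. roughly $30 at the usual ~$0.09/GB
| tier, so a $500 bill implies ~5.5 TB of billed "egress" that
| could never have physically crossed the link.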
| yonixwm wrote:
| So I guess the attack in the OP is a case where AWS
| calculates the price based on the unbounded Range request
| header and not on the actual egress.
| scottlamb wrote:
| More or less. The article quotes AWS as saying the
| following:
|
| > Amazon S3 attempts to stop the streaming of data, but
| it does not happen instantaneously.
|
| ...which doesn't really explain it. It shouldn't send
| more than a TCP window after the connection is closed,
| and TCP windows are at most 1 GiB [1], usually much less,
| so this completely fails to explain the article's
| observed 3 TB sent vs 130 TB billed.
|
| The article goes on to say:
|
| > Okay, this is half the explanation. AWS customers are
| not billed for the data actually transferred to the
| Internet but instead for some amount of data that is
| cached internally.
|
| In other words, how much they bill really isn't bounded
| by how much is sent at all. This is unacceptable.
|
| [1] https://en.wikipedia.org/wiki/TCP_window_scale_option
| easton wrote:
| > this completely fails to explain the article's observed
| 3 TB sent vs 130 TB billed
|
| I interpreted that as their code doing this over and over
| again, so in total they retrieved 3TB across a set of
| requests. Still horrifying, but mildly more explainable.
| slt2021 wrote:
| This can be explained if this is not egress out of AWS, but
| egress out of the S3 system itself.
|
| S3 is block storage underneath, so retrieving an object from
| such a high-availability, high-performance service means it
| pulls some block X of data and caches it before sending it
| through the socket.
|
| That block of data leaves internal S3 storage; it is just
| not sent through the bigger Internet egress subsystem.
|
| So technically AWS may argue this is egress for S3, just
| not for AWS.
| vermilingua wrote:
| Then subsequent requests that hit the cache shouldn't be
| charged by that logic.
| slt2021 wrote:
| S3 is a complex system; you could be hitting a different
| node with subsequent requests, where this cache entry does
| not exist yet.
|
| If you think egress is expensive, well, storing data in
| RAM for caching purposes is 1000000x more expensive.
|
| A lot of stuff could be happening. The main problem is that
| AWS (I think) is charging for egress out of the S3 system,
| but customers are looking at their ingress on the client
| side, and there is a mismatch.
| paulddraper wrote:
| > AWS customers are not billed for the data actually
| transferred to the Internet but instead for some amount
| of data that is cached internally.
|
| But egress fees only apply to S3 transfers outside of
| AWS?
|
| So which is it? Data transferred to the Internet? Or data
| processed internally?
| fabian2k wrote:
| There is probably a small area where it's difficult to
| measure, so I would not expect billing to be exact to the
| byte here. But billing for the requested range when the
| entire range was not actually transferred is just not
| correct and not acceptable.
| paulddraper wrote:
| Certainly if you are charging for _internet egress_.
|
| Like, charging for requests or internal data processing,
| sure okay.
|
| But this is a charge specifically for the _data
| transferred from AWS to the internet_. So if you're not
| transferring data to the internet...
| fabian2k wrote:
| The part where I think there is some flexibility is about
| the difference between "bytes attempted to transfer" and
| "bytes actually transferred". I think it is pretty fair
| to bill for the former, as long as you abort requests in
| a reasonable way. So I don't expect billing to match the
| transferred bytes exactly, but I do expect it not to
| exceed them by more than whatever the transfer chunk size
| is.
| paulddraper wrote:
| Sure. In this case specifically AWS is attempting to
| transfer 70Gbps through a 1Gbps pipe.
| klabb3 wrote:
| That's an orthogonal issue. There's no interpretation of
| "egress" that means "stuff we do internally before
| leaving aws data centers". If the tcp conn is reset only
| a few MB would leave aws frontend servers. Instead, it
| appears they've been basing the number off the range in
| the request and/or whatever internal caching/loading
| they're doing within S3, which again has nothing to do
| with egress.
|
| I mean, we already know egress is short for egregious.
| It's an incredibly bad look to be overestimating the
| "fuck you" part of the bill.
| __roland__ wrote:
| Sorry, I think that part of our write-up is misleading (I was
| involved in analyzing the issue described here). To the best
| of our understanding, what happens is the following:
|
| - A client sends range requests and cancels them quickly.
|
| - The full range request data will be billed (NOT the whole
| file), so this should read that the entire requested
| _range_ gets billed, even if it never gets transferred (the
| explanation we received is that it's due to some internal
| buffering S3 is doing, and they do count this as egress).
|
| In any case, if you send and cancel such requests quickly
| (which is easy enough; this was not even an adversarial
| situation, just a bug in some client API code), the billed
| egress is many times higher than your theoretical bandwidth
| would allow (and about 80x higher than the AWS
| documentation suggests is possible, hence the blog post).
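|
| To make the amplification concrete, a back-of-the-envelope
| calculation (request size, cancel rate, and price are
| illustrative, not our actual incident figures):
|
|     # Illustrative numbers only.
|     range_bytes  = 1 * 1024**3  # each request asks for 1 GiB
|     requests_sec = 20           # cancelled almost immediately
|     price_per_gb = 0.09         # typical S3 internet egress tier, USD
|
|     billed_gb_per_hour = range_bytes / 1024**3 * requests_sec * 3600
|     print(f"billed: {billed_gb_per_hour:,.0f} GB/h "
|           f"= ${billed_gb_per_hour * price_per_gb:,.0f}/h")
|     # ~72,000 GB/h billed (~$6,480/h) from a connection that
|     # physically transfers almost nothing.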
| nijave wrote:
| This is a problem with lots of services. Blocking large
| quantities of legitimate-looking requests is a hard
| problem. Request cancellation is also tricky and not
| supported well in a lot of frameworks/programming
| languages.
| nicklecompte wrote:
| This must be a regression bug in AWS's internal system. At a
| past job (2020) we used S3 to store a large amount of genomic
| data, and a web application used range requests to visualize
| tiny segments of the genetic sequence in relevant genes - like
| 5kb out of 50GB. If AWS had billed the cost of an entire
| genome/exome every time we did that, we would have noticed. I
| monitored costs pretty closely, S3 was never a problem compared
| to EC2.
|
| It also seemed like the root cause was an _interrupted_ range
| request (although I wasn't fully clear on that). Even so, that
| seems like a recent regression. It took me ages to get that
| stupid app working, I interrupted a lot of range requests :)
| nielsole wrote:
| S3 egress is free if the traffic stays within AWS.
| Sounds like your clients were EC2 instances so this wouldn't
| apply to you, would it?
| mikepurvis wrote:
| If it was a web application as stated in the GP, then it
| would indeed be egress as the request would be coming from
| a browser.
| nicklecompte wrote:
| Yes, it was client-side JavaScript making the range
| requests, asking for a string of genomic data to render
| in the browser. It was only to give the scientists a
| pretty picture :) The EC2 costs were largely
| ElasticSearch for a different function, which never
| looked at the data in S3.
| __roland__ wrote:
| You are right, this is about _canceling_ range requests and
| still getting billed, not about requesting ranges and getting
| billed for the complete file egress. Sorry; we'll make the
| post clearer.
| belter wrote:
| https://news.ycombinator.com/item?id=40203126
|
| https://news.ycombinator.com/item?id=40221108
| itsdrewmiller wrote:
| Those are about a different issue - not a great time for S3
| billing!
| belter wrote:
| Correct. It's like a game of negative chess: whoever racks
| up the biggest bill, in the shortest amount of time, with
| the least amount of activity, wins :-)
| andrewstuart wrote:
| Why is anyone using S3 when Cloudflare R2 is free?
| ezekiel68 wrote:
| Because of "The Rise of Worse is Better" (search it) and
| because a 900-lb industry gorilla is never displaced quickly or
| easily.
| surfingdino wrote:
| Because of the other AWS services you get access to.
| zedpm wrote:
| Lots of reasons. My company started using AWS (and specifically
| S3) something like 9 years ago; R2 wasn't even on the radar
| back then. If I were starting from scratch today, I'd be
| looking seriously at Cloudflare as a platform, but it's only in
| the last year or two that they've offered these services that
| would make it possible to build substantial applications.
| waiwai933 wrote:
| R2 bandwidth is free, but storage is not.
|
| R2 also doesn't have all the features that S3 does - including
| an equivalent of S3 Glacier, which is cheaper storage than R2.
| R2 also doesn't have object tagging, object-level permissions,
| or object locking. Sure, you could build your own layer in
| front of R2 that gives you these features, but are you
| necessarily saving money over just using S3?
| bearjaws wrote:
| Hate that it's essentially half ChatGPT-generated, especially
| given the huge explanation of AWS.
| tills13 wrote:
| A "AI Generated" label would be nice, here.
| anon373839 wrote:
| These AI accusations are becoming a tired trope. What,
| exactly, about the article gives you the impression that it
| was generated by an LLM?
| bakugo wrote:
| If writers don't want people to think their content is AI
| generated, maybe they shouldn't put ugly AI generated
| images on top of everything they write.
| anon373839 wrote:
| Ah, so it's the illustrations?
| flockonus wrote:
| AI-phobic
| __roland__ wrote:
| I can assure you this was not AI-generated, apart from the
| 'symbolic image' (which should be fairly obvious :).
|
| Maybe that's just our non-native English shining through. In
| any case, as a small European company in the healthcare space,
| we are quite used to having to explain "the cloud" (with all
| potential and pitfalls) to our customers. They are also (part
| of) the target audience for this post, hence the additional
| explanations.
|
| (Not OP and not author of the article, but was involved in the
| write-up.)
| bennettnate5 wrote:
| "Denial of Wallet" seems a misnomer--it makes it sound like
| source of payment is being blocked. They should really use the
| same term cellular systems have been for decades to describe this
| kind of threat, namely an "overbilling attack".
| KomoD wrote:
| "Denial of Wallet" has been used in countless articles (incl.
| academic) and places to refer to attacks that increase usage
| bills.
| akira2501 wrote:
| We use CloudFront and we deny public users the ability to access
| S3 directly. You can even use Signed URLs with CloudFront if you
| like. I'm not sure I'd ever feel comfortable letting the public
| at large hit my S3 endpoints.
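|
| For anyone curious, a CloudFront signed URL looks roughly like
| this with botocore (the key pair ID, key file, and domain are
| placeholders):
|
|     from datetime import datetime, timedelta
|
|     import rsa  # third-party 'rsa' package
|     from botocore.signers import CloudFrontSigner
|
|     def rsa_signer(message: bytes) -> bytes:
|         # Sign with the private key of a CloudFront trusted key pair.
|         with open("private_key.pem", "rb") as f:
|             key = rsa.PrivateKey.load_pkcs1(f.read())
|         return rsa.sign(message, key, "SHA-1")  # CloudFront uses SHA-1
|
|     signer = CloudFrontSigner("KEYPAIRID123", rsa_signer)
|     url = signer.generate_presigned_url(
|         "https://d111111abcdef8.cloudfront.net/big-file.bin",
|         date_less_than=datetime.utcnow() + timedelta(hours=1),
|     )
|     print(url)  # expires in an hour; S3 itself stays private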
| INTPenis wrote:
| As it should be, but recently on HN it was posted that AWS will
| charge you for any unauthorized PUT request to your S3 buckets.
| Meaning even 4xx errors will rack up a charge.
|
| So your S3 bucket names must now be treated as secret
| passphrases that stand between an attacker and your budget.
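|
| The gist, as a sketch (hypothetical bucket name):
|
|     import requests
|
|     # An unauthenticated PUT against someone else's bucket. It is
|     # rejected with 403 AccessDenied -- but, per the linked
|     # reports, the *bucket owner* is billed for the request.
|     r = requests.put(
|         "https://example-bucket.s3.amazonaws.com/junk",
|         data=b"x" * 1024,
|     )
|     print(r.status_code)  # 403, yet it still costs the owner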
| nijave wrote:
| In all fairness, systems administrators have always had to
| pay for unauthorized requests and for systems to mitigate
| the risk.
|
| The new thing is that hyperscalers have so much capacity you
| can get flooded by these long before the service degrades or
| goes offline.
| kazen44 wrote:
| Also, the per-request cost of doing this is insane compared
| to either absorbing or rate-limiting the bandwidth the
| requests take.
|
| Cloud computing charges you by the request/byte/cpu cycle.
| Servers do not have this issue.
|
| Also, is it simply not possible to rate limit this on a
| per-IP basis? Make clients only able to do X requests per
| second from each unique IP/network flow.
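|
| Per-IP limiting is simple enough in principle, e.g. a token
| bucket (an in-memory sketch; the reply below explains why
| this falls short in practice):
|
|     import time
|     from collections import defaultdict
|
|     RATE, BURST = 10.0, 20.0  # requests/sec per IP, bucket size
|
|     # ip -> (tokens remaining, last refill timestamp)
|     buckets = defaultdict(lambda: (BURST, time.monotonic()))
|
|     def allow(ip: str) -> bool:
|         """Classic token bucket keyed by client IP."""
|         tokens, last = buckets[ip]
|         now = time.monotonic()
|         tokens = min(BURST, tokens + (now - last) * RATE)  # refill
|         if tokens < 1.0:
|             buckets[ip] = (tokens, now)
|             return False
|         buckets[ip] = (tokens - 1.0, now)
|         return True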
| nijave wrote:
| >Cloud computing charges you by the request/byte/cpu
| cycle. Servers do not have this issue.
|
| Sure they do. Processing requests takes bandwidth, CPU,
| memory, and disk I/O.
|
| >Also, is it simply not possible to rate limit this on a
| per IP basis
|
| It's largely useless. You'll block legitimate
| bots/programs, people on CGNAT, and people on corporate
| networks, while bad actors will use botnets, residential
| IPs, and VPNs to gain access to thousands or millions of
| unique IPs.
| akira2501 wrote:
| Wow. Okay. New horrors brought to us by the modern world
| we've created.
|
| Thankfully, it does look like AWS is appropriately
| embarrassed over this, and is going to maybe do something.
|
| https://twitter.com/jeffbarr/status/1785386554372042890
| nijave wrote:
| Direct S3 is pretty common for file distribution where latency
| is less of a concern.
|
| e.g. build an installer and distribute it, or generate a report
| and hand out a signed URL.
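|
| In code, that second case is the usual presigned-URL pattern
| with boto3 (names are placeholders):
|
|     import boto3
|
|     s3 = boto3.client("s3")
|
|     # Hand out a time-limited link to the generated report; anyone
|     # with the URL can fetch it directly from S3 until it expires.
|     url = s3.generate_presigned_url(
|         "get_object",
|         Params={"Bucket": "example-reports",
|                 "Key": "report-2024-05.pdf"},
|         ExpiresIn=3600,  # seconds
|     )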
| Havoc wrote:
| It's almost like the combination of publicly accessible +
| charged per use + big clouds refusing to allow hard caps on
| spend is a terrible idea...
| jsheard wrote:
| Azure is probably the most egregious example of this, AWS and
| GCP can at least _claim_ they have architectural barriers to
| implementing a hard spending cap, but Azure _already has one_
| and arbitrarily only allows certain subscription types to use
| it. If you have a student account then you get a certain amount
| of credit each month and if you over-spend it then most
| services are automatically suspended until the next month,
| unless you explicitly opt out of the spending limit and commit
| to paying the excess out of pocket. However, if you have a
| standard account you're not allowed to set a spending limit
| for, uh... reasons.
|
| https://learn.microsoft.com/en-us/azure/cost-management-bill...
| anonymousDan wrote:
| AWS Educate has the same ability to impose a hard cap, I
| believe...
| carbotaniuman wrote:
| I guess it's a matter of students not having money to spend,
| plus bad optics, while a company might be cowed into paying
| the bill.
| dylan604 wrote:
| That's insane as well. They already built the system, but you
| just can't use it because we want the option for you to screw
| up and pad our billing. There are many projects I've worked
| on where a service not being available until the 1st of the
| next month would be nothing more than a minor annoyance, and
| I would much rather have that happen than get an unexpected
| bill. This is also something that I think would be a nice CYA
| tool when developing something in the cloud for the first
| time. It's easy to make an expensive mistake when learning
| cloud services, as TFA shows.
| tomp wrote:
| How about the "PUT deny" attack?
|
| AFAIK it cannot be protected against.
|
| https://twitter.com/Lauramaywendel/status/178506487864384308...
| zedpm wrote:
| Jeff Barr posted that AWS is actively working on a resolution
| for this:
| https://twitter.com/jeffbarr/status/1785386554372042890. Given
| who he is, I take this as a strong indication that there will
| be a reasonable fix in the near future.
| lulznews wrote:
| Does this apply to CloudFront requests also?
| KomoD wrote:
| So much fluff, just get to the point.
|
| At least 500-600 words weren't needed and just added noise to the
| article, making it harder to read.
| surfingdino wrote:
| AWS APIs need a cleanup. I am constantly running into issues not
| documented in the official docs, boto3 docs, or even on
| StackOverflow. It's not even funny when a whole day goes by
| trying to figure out why I see nothing in the body of a 200 OK
| response when I request data which I know is there in the bowels
| of AWS. Then it turns out that one param doesn't allow values
| below a certain number, even though the docs say otherwise.
| Twirrim wrote:
| Historically, they've been scared of versioning their APIs (not
| many services have done it; DynamoDB has, for example).
|
| It leads to a "bad customer experience", having to update lots
| of code, and also increases maintenance costs while you keep
| two separate code paths functional.
|
| There's a lot about the S3 API that would be changed, including
| the response codes etc., if S3 engineers had the freedom to
| change
| it! I remember many conversations on the topic when I worked
| alongside them in AWS.
| andrewxdiamond wrote:
| The level of effort S3 engineers put in to maintain perfect
| API compatibility is quite insane. Even tiny details such as
| whitespace or ordering have messed up project timelines and
| blocked important launches.
| Twirrim wrote:
| All to honor some random, arbitrary, maybe not even
| conscious decision made by an early S3 engineer when they
| were implementing something.
| adverbly wrote:
| Sounds like they were using the Range header on large files. I
| have built systems in the past using exactly this pattern
| (without the intentionally dropped requests).
|
| I hope this doesn't result in any significant changes as I really
| liked using this pattern for sequential data processing of
| potentially large blobs.
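|
| A minimal sketch of that pattern with boto3 (names and chunk
| size are illustrative):
|
|     import boto3
|
|     s3 = boto3.client("s3")
|     CHUNK = 64 * 1024**2  # 64 MiB per request
|
|     def iter_object(bucket: str, key: str):
|         """Stream a large object sequentially via bounded ranges."""
|         size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
|         for start in range(0, size, CHUNK):
|             end = min(start + CHUNK, size) - 1
|             resp = s3.get_object(
|                 Bucket=bucket, Key=key, Range=f"bytes={start}-{end}"
|             )
|             yield resp["Body"].read()  # read each range fully; no cancels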
| ignoreusernames wrote:
| Early Athena (AWS's managed prestodb) had a similar bug when
| measuring columnar file scans: if it touched the file, it
| counted the whole file instead of just the column chunks read.
| If I'm not mistaken, this was a bug in presto itself, but a
| simple patch landed upstream long before we did the tests.
| This was the first and only time we considered using a
| relatively early AWS product. It was so bad that our
| half-assed, self-deployed version outperformed Athena by every
| metric we cared about.
___________________________________________________________________
(page generated 2024-05-01 23:01 UTC)