[HN Gopher] In S3 simplicity is table stakes
___________________________________________________________________
In S3 simplicity is table stakes
Author : riv991
Score : 155 points
Date : 2025-03-14 11:55 UTC (11 hours ago)
(HTM) web link (www.allthingsdistributed.com)
(TXT) w3m dump (www.allthingsdistributed.com)
| rook1 wrote:
| "I think one thing that we've learned from the work on Tables is
| that it's these _properties of storage_ that really define S3
| much more than the object API itself."
|
| Between the above philosophy, S3 Tables, and Express One Zone
| (SSD perf), it makes me really curious about what other storage
| modalities S3 moves towards supporting going forward.
| neom wrote:
| little history: When we were getting ready to do an API at
| DigitalOcean I got asked "uhm... how should it feel?" I thought
| about that for about 11 seconds and said "if all our APIs feel as
| good as S3, it should be fine" - it's a good API.
| flessner wrote:
| The API, absolutely.
|
| It's only sad that the SDKs are often on the heavy side. I
| remember that the npm package used to be multiple megabytes, as
| it bundled large parts of the core AWS SDK. Nowadays, I believe
| it's still around half a megabyte.
| malfist wrote:
| I don't know if the npm package is the same way, but the Java
| SDK now has options by service, so you can include just S3
| instead of all of AWS.
| rglover wrote:
| The footprint of the JS SDK is much better as they split all
| of it into service-specific packages, but the SDK APIs are a
| bit confusing (everything is a class that has to be
| instantiated--even config).
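|
| For what it's worth, a minimal sketch of the v3 style (bucket
| and key names here are made up): each service is its own
| package, and every operation is a Command class you instantiate
| and pass to the client's send() method.
|
|     import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
|
|     // The client itself is a class that must be instantiated,
|     // config and all.
|     const s3 = new S3Client({ region: "us-east-1" });
|
|     // So is every operation: build a Command, then send() it.
|     const res = await s3.send(
|       new GetObjectCommand({ Bucket: "my-bucket", Key: "hello.txt" })
|     );
|     console.log(await res.Body?.transformToString());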
| greatpostman wrote:
| I can guarantee you, nothing is simple about S3. I bet engineers
| spend months on a single configuration change to the underlying
| subsystems.
| jacksnipe wrote:
| That's what makes it a good abstraction.
| ecshafer wrote:
| If you read the article, they say exactly that. It's by Dr.
| Werner Vogels; they know exactly what goes into S3, since they
| are a principal engineer on the project.
| apetresc wrote:
| It's Vogels' blog but this is a guest post by Andy Warfield.
| ecshafer wrote:
| My mistake, I stand corrected.
| ginko wrote:
| Would have been good if they mentioned they meant Amazon S3. It
| took me a while to figure out what this was about.
|
| Initially I thought this was about S3 standby mode.
| waiwai933 wrote:
| > I've seen other examples where customers guess at new APIs they
| hope that S3 will launch, and have scripts that run in the
| background probing them for years! When we launch new features
| that introduce new REST verbs, we typically have a dashboard to
| report the call frequency of requests to it, and it's often the
| case that the team is surprised that the dashboard starts posting
| traffic as soon as it's up, even before the feature launches, and
| they discover that it's exactly these customer probes, guessing
| at a new feature.
|
| This surprises me; has anyone done something similar and
| benefitted from it? It's the sort of thing where I feel like
| you'd maybe get a result 1% of the time if that, and then only
| years later when everyone has moved on from the problem they were
| facing at the time...
| easton wrote:
| Maybe this is a faster way of getting AWS feature requests
| heard.
|
| I'm going to write a script that keeps trying to call
| ecs:MakeFargateCacheImages.
| ajb wrote:
| It could also be hackers, as the moment a new service launches
| is exactly when it will be most buggy. And the contents of S3
| are a big payoff.
| myflash13 wrote:
| I have a feeling that economies of scale have a point of
| diminishing returns. At what point does it become more costly and
| complicated to store your data on S3 versus just maintaining a
| server with RAID disks somewhere?
|
| S3 is an engineering marvel, but it's an insanely complicated
| backend architecture just to store some files.
| Tanjreeve wrote:
| Probably never. The complexity is borne by Amazon. Even before
| any of the development begins if you want a RAID setup with
| some sort of decent availability you've already multiplied your
| server costs by the number of replicas you'd need. It's a
| Sisyphean task that also has little value for most people.
|
| Much like Twitter, it's conceptually simple, but it's a hard
| problem to solve at any scale beyond a toy.
| sjsdaiuasgdia wrote:
| That's going to depend a lot on what your needs are,
| particularly in terms of redundancy and durability. S3 takes
| care of a lot of that for you.
|
| One server with a RAID array can survive, usually, 1 or maybe 2
| drive failures. The remaining drives in the array will have to
| do more work when a failed drive is replaced and data is copied
| to the new array member. This sometimes leads to additional
| failures before replacement completes, because the drives in
| the array are probably all the same model, bought at the same
| time, and thus have similar manufacturing quality and materials.
| This is part of why it's generally said that RAID != backup.
|
| You can make a local backup to something like another server
| with its own storage, external drives, or tape storage.
| Capacity, recovery time, and cost varies a lot across the
| available options here. Now you're protected against the
| original server failing, but you're not protected against
| location-based impacts - power/network outages, weather
| damage/flooding, fire, etc.
|
| You can make a remote backup. That can be in a location you own
| / control, or you can pay someone else to use their storage.
|
| Each layer of redundancy adds cost and complexity.
|
| AWS says they can guarantee 99.999999999% durability and 99.99%
| availability. You can absolutely design your own system that
| meets those thresholds, but that is far beyond what one server
| with a RAID array can do.
| vb-8448 wrote:
| How many businesses or applications really need 99.999999999%
| durability and 99.99% availability? Is your whole stack
| organized to deliver the aforementioned durability and
| availability?
| huntaub wrote:
| I think that this is, to Andy's point, basically about
| simplicity. It's not that your business necessarily _needs_
| 11 9s of durability for continuity purposes, but it sure is
| nice that you _never_ have to think about the durability of
| the storage layer (vs. even something like EBS where 5 9s
| of durability isn 't quite enough to go from "improbable"
| to "impossible").
| sjsdaiuasgdia wrote:
| Different people and organizations will have different
| needs, as indicated in the first sentence of my post. For
| some use cases one server is totally fine, but it's good to
| think through your use cases and understand how loss of
| availability or loss of data would impact you, and how much
| you're willing to pay to avoid that.
|
| I'll note that data durability is a bit of a different
| concern than service availability. A service being down for
| some amount of time sucks, but it'll probably come back up
| at some point and life moves on. If data is lost
| completely, it's just _gone_. It's going to have to be re-
| created from other sources, generated fresh, or accepted as
| irreplaceable and lost forever.
|
| Some use cases can tolerate losing some or all of the data.
| Many can't, so data durability tends to be a concern for
| non-trivial use cases.
| hadlock wrote:
| There are a lot of companies whose livelihood depends on their
| proprietary data, for whom loss of that data would be a
| company-ending event. I'm not sure how the calculus works
| out exactly, but having additional backups and types of
| backups to reduce risk is probably one of the smaller
| business expenses one can pick up. Sending a couple TB of
| data to three+ cloud providers on top of your physical
| backups is in the tens of dollars per month.
| Asraelite wrote:
| > One server with a RAID array can survive, usually, 1 or
| maybe 2 drive failures.
|
| Standard RAID configurations can only handle 2 failures, but
| there are libraries and filesystems allowing arbitrarily high
| redundancy.
| sjsdaiuasgdia wrote:
| As long as it's all in one server, there's still a lot of
| situations that can immediately cut through all that
| redundancy.
|
| As long as it's all in one physical location, there's still
| fire and weather as ways to immediately cut through all
| that redundancy.
| Cthulhu_ wrote:
| There are a few stories from companies that moved _away_ from
| S3, like Dropbox, and who shared their investments and
| expenses.
|
| The long and short of it is that to get anywhere near the
| redundancy, reliability, performance, etc. of S3, you're
| spending a _lot_ of money.
| hylaride wrote:
| There is a diminishing return of what percentage you save,
| sure. But Amazon will always be at that edge. They already have
| amortized the equipment, labour, administration, electricity,
| storage, cooling, etc.
|
| They also already have support for storage tiering,
| replication, encryption, ACLs, integration with other services
| (from web access to sending notifications of storage events to
| lambda, sqs, etc). You get all of this whether you're saving
| one eight-bit file or trillions of gigabyte-sized ones.
|
| There are reasons why you may need to roll your own storage
| setup (regulatory, geographic, some other unique reason), but
| you'll never be more economical than S3, especially if the
| storage is mostly sitting idle.
| timewizard wrote:
| > At what point does it become more costly and complicated to
| store your data on S3 versus just maintaining a server with
| RAID disks somewhere?
|
| It's more costly immediately. S3 storage prices are above what
| you would pay even for triply redundant media and you have to
| pay for data transfer at a very high rate to both send and
| receive data to the public internet.
|
| It's far less complicated though. You just create a bucket and
| you're off to the races. Since the S3 API endpoints are all
| public there's not even a delay for spinning up the
| infrastructure.
|
| Where S3 shines for me is two things. The first is automatic
| lifecycle management: objects can be moved between storage
| classes based on their age, and even automatically deleted
| after expiration. The second is S3 events, which are also
| _durable_ and make S3 into an actual appliance instead of just
| a convenient key/value store.
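|
| As a rough sketch of such a rule with the v3 JavaScript SDK
| (bucket name and prefix are hypothetical): transition objects
| under logs/ to Glacier after 30 days and delete them after a
| year.
|
|     import {
|       S3Client,
|       PutBucketLifecycleConfigurationCommand,
|     } from "@aws-sdk/client-s3";
|
|     const s3 = new S3Client({ region: "us-east-1" });
|     await s3.send(new PutBucketLifecycleConfigurationCommand({
|       Bucket: "my-bucket", // hypothetical
|       LifecycleConfiguration: {
|         Rules: [{
|           ID: "archive-then-expire",
|           Status: "Enabled",
|           Filter: { Prefix: "logs/" },
|           // Move to a colder storage class after 30 days...
|           Transitions: [{ Days: 30, StorageClass: "GLACIER" }],
|           // ...and delete entirely after a year.
|           Expiration: { Days: 365 },
|         }],
|       },
|     }));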
| zerd wrote:
| One interesting thing about S3 is the vast scale of it. E.g. if
| you need to store 3 PB of data you might need 150 HDDs +
| redundancy, but if you store it on S3 it's chopped up and put
| on tens of thousands of HDDs, which helps with IOPS and
| throughput. Of course that's shared with others, which is why
| smart placement is key, so that hot objects are spread out.
|
| Some details in
| https://www.allthingsdistributed.com/2023/07/building-and-op...
| / https://www.youtube.com/watch?v=sc3J4McebHE
| _fat_santa wrote:
| S3 is up there as one of my favorite tech products ever. Over the
| years I've used it for all sorts of things but most recently I've
| been using it to roll my own DB backup system.
|
| One of the things that shocks me about the system is the level of
| object durability. A few years ago I was taking an AWS
| certification course and learned that their durability number
| means that one can expect to lose data about once every 10,000
| years. Since then anytime I talk about S3's durability I bring up
| that example and it always seems to convey the point for a
| layperson.
|
| And it's "simplicity" is truly elegant. When I first started
| using S3 I thought of it as a dumb storage location but as I
| learned more I realized that it had some wild features that they
| all managed to hide properly so when you start it looks like a
| simple system but you can gradually get deeper and deeper until
| your S3 bucket is doing some pretty sophisticated stuff.
|
| Last thing I'll say is, you know your API is good when "S3
| compatable API" is a selling point of your competitors.
| victorp13 wrote:
| > Last thing I'll say is, you know your API is good when "S3
| compatable API" is a selling point of your competitors.
|
| Counter-point: You know that you're the dominant player. See:
| .psd, .pdf, .xlsx. Not particularly good file types, yet widely
| supported by competitor products.
| pavlov wrote:
| Photoshop, PDF and Excel are all products that were truly
| much better than their competitors at the time of their
| introduction.
|
| Every file format accumulates cruft over thirty years,
| especially when you have hundreds of millions of users and
| you have to expand the product for use cases the original
| developers never imagined. But that doesn't mean the success
| wasn't justified.
| jalk wrote:
| PDF is not a product. I get what you are saying, but I can't
| say that I've ever liked Adobe Acrobat.
| pavlov wrote:
| PDF is a product, just like PostScript was.
| eternityforest wrote:
| Most people use libraries to read and write the files, and
| judge them pretty much entirely by popularity.
|
| A very popular file format pretty much defines the semantics
| and feature set for that category in everyone's mind, and if
| you build around those features, then you can probably expect
| good compatibility.
|
| Nobody thinks about the actual on disk data layout, they
| think about standardization and semantics.
|
| I rather like PDF, although it doesn't seem to be well suited
| for 500MB scans of old books and the like, they really seem
| to bog down on older mobile devices.
| merb wrote:
| The durability is not so good when you have a lot of objects
| achierius wrote:
| Why not? I don't work with web-apps or otherwise use object
| stores very often, but naively I would expect that "my
| objects not disappearing" would be a good thing.
| oblio wrote:
| I think their point is that you'd need even higher
| durability. With millions of objects, even 5+ nines means
| that you lose objects relatively constantly.
| laluser wrote:
| It's _designed_ for that level of durability, but it's only as
| good as a single change or correlated set of hardware failures
| that can quickly change the theoretical durability model. Even
| corrupting data is possible too.
| huntaub wrote:
| You're totally correct, but these products also need to be
| specifically designed against these failure cases (i.e. it's
| more than just MTTR + MTTF == durability). You (of course)
| can't just run deployments without validating that the
| durability property is satisfied throughout the change.
| laluser wrote:
| Yep! There's a lot of checksum verification, carefully
| orchestrated deployments, hardware diversity, erasure code
| selection, the list goes on and on. I help run a multi-
| exabyte storage system - I've seen a few things.
| Gys wrote:
| > their durability number means that one can expect to lose
| data about once every 10,000 years
|
| What does that mean? If I have 1 million objects, do I lose 100
| per year?
| lukevp wrote:
| What it means is in any given year, you have a 1 in 10,000
| chance that a data loss event occurs. It doesn't stack like
| that.
|
| If you had light bulbs that lasted 1,000 hrs on average, and
| you had 10k light bulbs, and turned them all on at once, then
| they would all last 1,000 hours on average. Some would die
| earlier and some later, but the top line number does not tell
| you anything about the distribution, only the average (mean).
| That's what MTTF is; the mean time for a given part to where
| it has a greater likelihood to have failed by then vs not. It
| doesn't tell you if the distribution of light bulbs burning
| out is 10 hrs or 500 hrs wide. If it's the latter, you'll start
| seeing bulbs out within 750 hrs, but if it's the former it'd be
| 995 hrs before anything burned out.
| ceejayoz wrote:
| Amazon claims 99.999999999% durability.
|
| If you have ten million objects, you should lose one every
| 10k years or so.
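|
| Back-of-the-envelope, taking 11 9s to mean an annual loss
| probability of 1e-11 per object:
|
|     // Expected annual losses scale linearly with object count.
|     const annualLossProbability = 1e-11; // 11 9s of durability
|     const objects = 10_000_000;          // ten million objects
|     const lostPerYear = objects * annualLossProbability; // 1e-4
|     console.log(1 / lostPerYear); // ~10,000 years per lost object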
| graemep wrote:
| How does that compare to competitors and things like
| distributed file systems?
| huntaub wrote:
| I generally see object storage systems advertise 11 9s of
| durability. You would usually see a commercial
| distributed file system (obviously stuff like Ceph and
| Lustre will depend on your specific configuration)
| advertise less (to trade off performance for durability).
| gamegoblin wrote:
| In general if you actually do the erasure coding math,
| almost all distributed storage systems that use erasure
| coding will have waaaaay more than 11 9s of theoretical
| durability
|
| S3's _original_ implementation might have only had 11 9s,
| and it just doesn't make sense to keep updating this
| number; beyond a certain point it's just meaningless
|
| Like "we have 20 nines" "oh yeah, well we have 30 nines!"
|
| To give an example of why this is the case, if you go
| from a 10:20 sharding scheme to a 20:40 sharding scheme,
| your storage overhead is roughly the same (2x), but you
| have doubled the number of nines
|
| So it's quite easy to get a ton of theoretical 9s with
| erasure coding
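|
| A toy illustration of that math, assuming independent shard
| failures at some annual probability p and ignoring repair
| (which real systems lean on heavily): a k-of-n code loses data
| only when more than n - k shards fail, so going from 10-of-20
| to 20-of-40 keeps the 2x overhead while the nines roughly
| double.
|
|     // P(data loss) for a k-of-n erasure code with independent
|     // annual shard failure probability p.
|     function lossProbability(k: number, n: number, p: number) {
|       const binom = (m: number, i: number): number => {
|         let c = 1;
|         for (let j = 0; j < i; j++) c = (c * (m - j)) / (j + 1);
|         return c;
|       };
|       let total = 0;
|       for (let i = n - k + 1; i <= n; i++) {
|         total += binom(n, i) * p ** i * (1 - p) ** (n - i);
|       }
|       return total;
|     }
|
|     // Same 2x storage overhead, wildly different durability:
|     console.log(lossProbability(10, 20, 0.05)); // ~5e-10
|     console.log(lossProbability(20, 40, 0.05)); // ~2.5e-17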
| toolslive wrote:
| It's really not that impressive, but you have to use
| erasure coding (chop the data D into X parts, use these to
| generate Y extra pieces, and store all X+Y of them) instead of
| replication (store D n times).
| 8organicbits wrote:
| Isn't it just a marketing number? I didn't think durability
| was part of the S3 SLA, for example.
| cruffle_duffle wrote:
| > but as I learned more I realized that it had some wild
| features that they all managed to hide properly so when you
| start it looks like a simple system but you can gradually get
| deeper and deeper until your S3 bucket is doing some pretty
| sophisticated stuff.
|
| Over the years working on countless projects I've come to
| realize that the more "easy" something looks to an end user,
| the more work it took to make it that way. It takes a _lot_ of
| work to create and polish something to the state where you'd
| call it graceful, elegant and beautiful.
|
| There are exceptions for sure, but often times hidden under
| every delightful interface is an iceberg of complexity. When
| something "just works" you know there was a hell of a lot of
| effort that went into making it so.
| Cthulhu_ wrote:
| I've used it for server backups too, just a simple webserver.
| Built a script that takes the webserver files, config files and
| makes a database dump, packages it all into a .tar.gz file on
| monday mornings, and uploads it to S3 using a "write only into
| this bucket" access key. In S3 I had it set up so it sends me
| an email whenever a new file was added, and that anything older
| than 3 weeks is put into cold storage.
|
| Of course, I lost that script when the server crashed, the one
| thing I didn't back up properly.
| wodenokoto wrote:
| I've never worked with AWS, but have a certification from GCP
| and currently use Azure.
|
| What do you see as special for S3? Isn't it just another
| bucket?
| angulardragon03 wrote:
| I did a GCP training a while back, and the anecdote from one of
| the trainers was that the Cloud Storage team (GCP's
| S3-compatible product) hadn't lost a single byte of data since
| GCS had existed as a product. Crazy at that scale.
| hobo_in_library wrote:
| Eh, they have lost a bit
| conorjh wrote:
| tf is that title supposed to mean?
| dgfitz wrote:
| Somehow gambling is directly correlated to using S3? That's the
| best I got.
| MillironX wrote:
| Table stakes is a poker term meaning the absolute minimum
| amount you are allowed to bet. So the title translates to "In
| S3, simplicity is the bare minimum" or "In S3, simplicity is so
| important that if we didn't have it, we might as well not even
| have S3."
| MatthewCampbell wrote:
| It's a risky idiom in general because it's often used to
| prevent debate. "Every existing product has feature X, so
| feature X is table stakes." "Why are we testing whether we
| really need this feature, it's table stakes!" My observation
| has been that "table stakes" features are often the best ones
| to reject. (Not so in the case of this title, though)
| alberth wrote:
| S3 is the simplest CRUD app you could create.
|
| It's essentially just the 4 functions of C.R.U.D done to a file.
|
| Most problems in tech are not that simple.
|
| Note: not knocking the service. just pointing out not all things
| are so inherently basic (and valuable at the same time).
| riv991 wrote:
| Isn't that the most impressive part? That the abstraction makes
| it seem so simple
| bdcravens wrote:
| That's the public interface. The underlying architecture is
| where the power is.
| cruffle_duffle wrote:
| That's when you really know you hid all the complexity well.
| When people call your globally replicated data store with
| granular permissions, sophisticated data retention policies,
| versioning, and manage to have, what, seven (ten?) nines or
| something, "simple".
|
| No problem. I'm sure ChatGPT could cook up a replacement in a
| weekend. Like Dropbox it's just rsync with some scripts that
| glue it together. How hard could it possibly be?
|
| I mean people serve entire websites right out of s3 buckets.
| Using it as a crude CDN of sorts.
|
| It's a modern marvel.
| sebastiansm wrote:
| I could build a netflix in a weekend.
| golly_ned wrote:
| A file system is simple. Open, read, close. Most tech problems
| are not that simple. How hard could a filesystem be?
| dekhn wrote:
| Locking, checks after unclean shutdown, sparse files, high
| performance, reliability... are all things that make
| filesystems harder.
| shepherdjerred wrote:
| Anyone can create a CRUD API. It takes a _lot_ of work to make
| a CRUD API that scales with high availability and a reasonable
| consistency model. The vast majority of engineers would take
| months or years to write a demo.
|
| If you don't believe me, you might want to reconsider how
| skilled the average developer _really_ is.
| o10449366 wrote:
| These comments are so uniquely "HN" cringe-worthy.
| rglover wrote:
| I used to have the same opinion until I built my own CDN.
| Scaling something like that is no joke, let alone ensuring you
| handle consistency and caching properly.
|
| A basic implementation is simple, but at S3 scale, that's a
| whole different ball game.
| icedchai wrote:
| Now add versioning, replication, logging, encryption, ACLs, CDN
| integration, event triggering (Lambda). I could go on. These
| are just some other features I can name off the top of my head.
| And it all has to basically run with zero downtime, 24x7...
| arnath wrote:
| I found out last year that you can actually run a full SPA using
| S3 and a CDN. It's kind of a nuts platform
| ellisv wrote:
| I use S3+CloudFront for static sites, and Cloudflare Workers
| if needed.
|
| It's always crazy to me that people will run a could-be-static
| site on Netlify/Vercel/etc.
| Cthulhu_ wrote:
| We've used Netlify at previous projects because it was easy.
| No AWS accounts or knowledge needed: just push
| to master, let the CI build (it was a Gatsby site) and it was
| live.
| ellisv wrote:
| I think Netlify is great but to me it's overkill if you
| just have a static site.
|
| I understand that Netlify is much simpler to get started
| with and setting up an AWS account is somewhat more
| complex. If you have several sites, it's worth spending the
| time to learn.
| jorams wrote:
| Since everything you need to run "a full SPA" is to serve some
| static files over an internet connection I'm not sure how that
| tells you anything interesting about the platform. It's
| basically the simplest thing a web server can do.
| user9999999999 wrote:
| If only metadata could be queried without processing a CSV
| output file first (imagine even storing thumbnails in there!),
| copied objects had actual events rather than something you have
| to dig through CloudTrail for, and you could get the last
| update time from a bucket to make caching easier.
| huntaub wrote:
| Are you talking about getting metadata from many objects in the
| bucket simultaneously? You might be interested in S3 Metadata:
| https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingM...
| aeyes wrote:
| When S3 Tables launched they made the Metadata available using
| this technology. So you can query it like an Apache Iceberg
| table.
|
| https://aws.amazon.com/blogs/aws/introducing-queryable-objec...
| user9999999999 wrote:
| Whoops, I was wrong! You can store base64-encoded metadata on
| the object and then run a HEAD request to get it, but it's
| limited to 2 KB. Also, you _can_ query the metadata, but its
| latency is more suited to batch processing than Lambdas.
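|
| A sketch of that pattern with the v3 JS SDK (bucket, key, and
| the thumbnail string are hypothetical): user-defined metadata
| rides along as x-amz-meta-* headers, and HeadObject returns it
| without fetching the body.
|
|     import {
|       S3Client,
|       PutObjectCommand,
|       HeadObjectCommand,
|     } from "@aws-sdk/client-s3";
|
|     const s3 = new S3Client({ region: "us-east-1" });
|     const thumb = "iVBORw0KGgo..."; // tiny base64 thumbnail
|
|     // User-defined metadata is capped at 2 KB per object.
|     await s3.send(new PutObjectCommand({
|       Bucket: "my-bucket",
|       Key: "photos/cat.jpg",
|       Body: new Uint8Array([/* image bytes */]),
|       Metadata: { thumb },
|     }));
|
|     // HEAD returns the metadata without downloading the body.
|     const head = await s3.send(new HeadObjectCommand({
|       Bucket: "my-bucket",
|       Key: "photos/cat.jpg",
|     }));
|     console.log(head.Metadata?.thumb);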
| chasd00 wrote:
| S3 was one of the first offerings coming out of AWS right? It's
| pretty legendary and a great concept to begin with. You can tell
| by how much sense it makes and then trying to wrap your head
| around the web dev world pre-S3.
| Cthulhu_ wrote:
| ? The web dev world pre-S3 was pretty much the same, but you
| stored your files on a regular server (and set up your own
| redundancy and backup strategy). Not that much different to be
| honest from an end user's point of view.
| kevindamm wrote:
| At a lot of places there wasn't even a redundancy nor backup
| strategy, so it really was just as simple as registering with
| a hosting company and ssh+ftp (or cPanel or something like
| that for what amounted to the managed solutions of the time).
|
| I agree, things before S3 weren't really that different. LAMP
| stacks everywhere, and developer skills were very portable
| between different deployments of these LAMP stacks. Single
| machines didn't scale as much then, but for most small-medium
| sites they really didn't need to.
| ipsento606 wrote:
| Lots of comments here talking about how great S3 is.
|
| Anyone willing to give a cliff notes about what's good about it?
|
| I've been running various sites and apps for a decade, but have
| never touched S3 because the bandwidth costs are 1, sometimes 2
| orders of magnitude more expensive than other static hosting
| solutions.
| waiwai933 wrote:
| S3 is great for being able to stick files somewhere and not
| have to think about any of the surrounding infrastructure on an
| ongoing basis [1]. You don't have to worry about keeping a RAID
| server, swapping out disks when one fails, etc.
|
| For static hosting, it's fine, but as you say, it's not
| necessarily the cheapest, though you can bring the cost down by
| sticking a CDN (Cloudflare/CloudFront) in front of it. There
| are other use cases where it really shines though.
|
| [1]: I say ongoing basis because you will need to figure out
| your security controls, etc. at the beginning so it's not
| totally no-thought.
| jjice wrote:
| S3 has often fallen into a "catch all" solution for me whenever
| I need to store data large enough that I don't want to keep it
| in a database (RDBMS or Redis).
|
| Need to save a file somewhere? Dump it in S3. It's generally
| affordable (obviously dependent on scale and use), fast, easy,
| and super configurable.
|
| Being able to expose something to the outside, or with a
| presigned URL is a huge advantage as well.
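|
| A minimal sketch of presigning with the v3 JS SDK (bucket and
| key are made up): anyone holding the resulting URL can GET the
| object until it expires, with no AWS credentials of their own.
|
|     import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
|     import { getSignedUrl } from "@aws-sdk/s3-request-presigner";
|
|     const s3 = new S3Client({ region: "us-east-1" });
|
|     // URL is valid for one hour, then requests start failing.
|     const url = await getSignedUrl(
|       s3,
|       new GetObjectCommand({ Bucket: "my-bucket", Key: "report.pdf" }),
|       { expiresIn: 3600 }
|     );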
|
| Off the top of my head, I think of application storage
| generally in this tier ordering (just off the top of my head
| based on the past few years of software development, no real
| deep thought here):
|
| 1. General application data that needs to be read, written, and
| related - RDBMS
|
| 2. Application data that needs to be read and written fast, no
| relations - Redis
|
| 3. Application data that is mostly stored and read - S3
|
| Replace any of those with an equivalent storage layer.
| ak217 wrote:
| One underappreciated feature of S3 - that allowed it to excel
| in workloads like the Tables feature described in the article -
| is that it's able to function as the world's highest-throughput
| network filesystem. And you don't have to do anything to
| configure it (as the article points out). By storing data on
| S3, you get to access the full cross-sectional bandwidth of
| EC2, which is colossal. For effectively all workloads, you will
| max out your network connection before S3's. This enables
| workloads that can't scale anywhere else. Things like data
| pipelines generating unplanned hundred-terabit-per-second
| traffic spikes with hotspots that would crash any filesystem
| cluster I've ever seen. And you don't have to pay a lot for it:
| once you're done using the bandwidth, you can archive the data
| elsewhere or delete it.
| huntaub wrote:
| You've totally hit the nail on the head. This is the _real_
| moat of S3, the fact that they have _so much_ front-end
| throughput available from the gigantic buildout that folks
| can take advantage of without _any_ capacity pre-planning.
| nerdjon wrote:
| There are a few things about S3 that I find extremely powerful.
|
| The biggest is: if I need to store some data, and I know what
| the data is (so I don't need to, say, traverse a file structure
| at a moment's notice; I know my filenames), I can store that
| data. I don't need to figure out how much space I need ahead of
| time, and it is there when I need it. Maybe it automatically
| moves to another storage tier to
| save me some money but I can reliably assume it will be there
| when I need it. Just that simplicity alone is worth a lot, I
| never need to think later that I need to expand some space,
| possibly introducing downtime depending on the setup, maybe
| dealing with partitions, etc.
|
| Related to that is static hosting. I have run a CDN and other
| static content out of S3 with CloudFront in front of it. The
| storage cost was almost non existent due to how little actual
| data we were talking about and only paid for cloudfront costs
| when there were requests. If nothing was being used it was
| almost "free". Even when being used it was very cheap for my
| use cases.
|
| Creating daily inventory reports in S3 is awesome.
|
| But the thing that really is almost "magic", once you
| understand its quirks, is Athena (and QuickSight, built on top
| of it, and similar tools): the ability to store data in S3,
| like the inventory reports I already mentioned, access logs,
| CloudWatch logs, or any structured data that you may not need
| to query often enough to warrant a full long-running database.
| It may
| cost you a few dollars to run your Athena query and it is not
| going to be super quick, but if you know what you're looking
| for it is amazing.
| hadlock wrote:
| Do you need to replace your SFTP server? S3. Do you need to
| backup TB of db files? S3. Do you need a high performance web
| cache? S3. Host your SPA? S3 backs cloudfront. Shared
| filesystem between desktop computers? Probably a bad idea but
| you can do it with S3. Need a way for customers to securely
| drop files somewhere? Signed S3 URI. Need to store metrics?
| Logs? S3. Load balancer logs? S3. And it's cheaper than an EBS
| volume, and doesn't need resizing every couple of quarters. And
| there are various SLAs which make it cheaper (Glacier) or more
| expensive (High Performance). S3 makes a great storage backend
| for a lot of use cases especially when your data is coming in
| from multiple regions across the globe. There are some quibbles
| about eventual consistency but in general it is an easy backend
| to build for.
| neerajk wrote:
| S3 is the closest thing to magic I have seen. It deserves a
| Whitman, but Gemini will have to do:
|
| A lattice woven, intricate and deep, Where countless objects
| safely lie asleep.
|
| To lean upon its strength, its constant hum, A silent guardian,
| 'til kingdom come.
|
| Versioning whispers tales of changes past, While lifecycle rules
| decree what shadows last.
|
| The endless hunger of a world online, S3's silent symphony, a
| work divine.
|
| Consider then, this edifice of thought, A concept forged,
| meticulously wrought.
|
| More than a bucket, more than simple store, A foundational layer,
| forevermore.
| bob1029 wrote:
| I really enjoy using S3 to serve arbitrary blobs. It perfectly
| solves the problem space for my use cases.
|
| I avoid getting tangled in authentication mess by simply naming
| my files using type-4 GUIDs and dumping them in public buckets.
| The file name is effectively the authentication token and
| expiration policies are used to deal with the edges.
|
| This has been useful for problems like emailing customers
| gigantic reports and transferring build artifacts between
| systems. Having a stable URL that "just works" everywhere easily
| pays for the S3 bill in terms of time & frustration saved.
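|
| A sketch of that pattern (bucket name hypothetical, and the
| bucket is assumed to allow public reads): a v4 UUID carries
| ~122 bits of randomness, so the key itself is the unguessable
| capability.
|
|     import { randomUUID } from "node:crypto";
|     import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
|
|     const s3 = new S3Client({ region: "us-east-1" });
|
|     // The random UUID in the key acts as the bearer token.
|     const key = `reports/${randomUUID()}.csv`;
|     await s3.send(new PutObjectCommand({
|       Bucket: "my-public-bucket",
|       Key: key,
|       Body: "id,total\n1,42\n",
|       ContentType: "text/csv",
|     }));
|
|     // Stable URL that "just works" until lifecycle expiry:
|     console.log(`https://my-public-bucket.s3.amazonaws.com/${key}`);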
| bityard wrote:
| My favorite use case for S3 API-compatible solutions: I often
| run into systems that generate lots of arbitrary data that only
| have temporary importance. A common example might be
| intermediate build artifacts, or testing ephemera (browser
| screenshots, etc). Things that are needed for X number of
| months and then just need to disappear.
|
| Yeah, we can dump those to a filesystem. But then we have to
| ask which filesystem? What should the directory layout be? If
| there are millions or billions of objects, walking the whole
| tree gets expensive. Do we write a script to clean everything
| up? Run it via cron or some other job runner?
|
| With S3, you just write your artifact to S3 with a TTL and it
| gets deleted automagically when it should. No cron jobs, no
| walking the whole tree. And you can set up other lifecycle
| options if you need it moved to other (cheaper) storage later
| on, backups, versioning, and whatever else.
|
| For on-prem, you have Minio, Garage, or SeaweedFS. These are
| pretty nice to deploy the servers however you need for the
| level of reliability/durability you require.
| bentobean wrote:
| It's funny--S3 started as a "simple" storage service, and now
| it's handling entire table abstractions. Reminds me how SQL was
| declared dead every few years, yet here we are, building ever
| more complex data solutions on top of supposedly simple
| foundations.
| imiric wrote:
| I instinctively distrust any software or protocol that implies
| it is "simple" in its name: SNMP, SMTP, TFTP, SQS, etc. They're
| usually the cause of an equal or more amount of headaches than
| alternatives.
|
| Maybe such solutions are a reaction to previous more "complex"
| solutions, and they do indeed start simple, but inevitably get
| swallowed by the complexity monster with age.
| great_wubwub wrote:
| TFTP is probably the exception to that rule. All the other
| protocols started out easily enough and added more and more
| cruft. TFTP stayed the way it's always been - minimalist,
| terrifyingly awful at most things, handy for a few corner
| cases. If you know when to use it and when to use something
| like SCP, you're golden.
|
| If TFTP had gone the way of SNMP, we'd have 'tftp <src>
| <dest> --proto tcp --tls --retries 8 --log-type json' or some
| horrendous mess like that.
| paulddraper wrote:
| > In S3 simplicity is table stakes
|
| The S3 API is not "simple."
|
| Authentication being a good part of that.
| brikym wrote:
| I'm curious about S3 tables. Azure has had tables in their
| storage account product for years. What are the differences?
| diroussel wrote:
| It's great that they added iceberg support I guess, but it's a
| shame that they also removed S3 Select. S3 Select wasn't perfect.
| For instance, the performance was nowhere near as good as using
| DuckDB to scan a Parquet file, since Duck is smart and S3
| Select does a full table scan.
|
| But S3 Select was way cheaper than the new Iceberg support. So
| if your needs are only for reading one Parquet snapshot, with
| no need to do updates, then this change is not welcome.
|
| Great article though, and I was pleased to see this at the end:
|
| > We've invested in a collaboration with DuckDB to accelerate
| Iceberg support in Duck,
| StratusBen wrote:
| For those interested in S3 Tables which is referenced in this
| blog post, we literally just published this overview on what they
| are and cost considerations of them that people might find
| interesting: https://www.vantage.sh/blog/amazon-s3-tables
| 1a527dd5 wrote:
| https://www.vantage.sh/blog/amazon-s3-tables#s3-tables-cost
|
| I can't make head or tails of the beginning of this sentence:-
|
| > Pricing for S3 Tables is all and all not bad.
|
| Otherwise lovely article!
| shawabawa3 wrote:
| "all and all" is a typo for "all in all" which means
| "overall", or "taking everything into consideration"
|
| So they are saying the pricing is not bad considering
| everything it does
| CobrastanJorji wrote:
| > When we moved S3 to a strong consistency model, the customer
| reception was stronger than any of us expected.
|
| This feels like one of those Apple-like stories about inventing
| and discovering an amazing, brand new feature that delighted
| customers but not mentioning the motivating factor of the
| competing products that already had it. A more honest sentence
| might have been "After years of customers complaining that the
| other major cloud storage providers had strong consistency
| models, customers were relieved when we finally joined the
| party."
| lizknope wrote:
| Clicked on the article thinking it was about S3 Graphics, the
| company that made the graphics chip in my first PC. Now I see
| it's some amazon cloud storage thing.
___________________________________________________________________
(page generated 2025-03-14 23:00 UTC)