[HN Gopher] In S3 simplicity is table stakes
___________________________________________________________________
In S3 simplicity is table stakes
Author : riv991
Score : 155 points
Date : 2025-03-14 11:55 UTC (11 hours ago)
(HTM) web link (www.allthingsdistributed.com)
(TXT) w3m dump (www.allthingsdistributed.com)
| rook1 wrote:
| "I think one thing that we've learned from the work on Tables is
| that it's these _properties of storage_ that really define S3
| much more than the object API itself."
|
| Between the above philosophy, S3 Tables, and Express One Zone
| (SSD perf), it makes me really curious about what other storage
| modalities S3 moves towards supporting going forward.
| neom wrote:
| little history: When we were getting ready to do an API at
| DigitalOcean I got asked "uhm... how should it feel?" I thought
| about that for about 11 seconds and said "if all our APIs feel as
| good as S3, it should be fine" - it's a good API.
| flessner wrote:
| The API, absolutely.
|
| It's only sad that the SDKs are often on the heavy side. I
| remember that the npm package used to be multiple megabytes, as
| it bundled large parts of the core AWS SDK. Nowadays, I believe
| it's still around half a megabyte.
| malfist wrote:
| I don't know if the npm package is the same way, but the Java
| SDK now has options by service, so you can include just S3
| instead of all of AWS.
| rglover wrote:
| The footprint of the JS SDK is much better as they split all
| of it into service-specific packages, but the SDK APIs are a
| bit confusing (everything is a class that has to be
| instantiated--even config).
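|
| For what it's worth, a minimal sketch of the v3 style (bucket
| and key names here are made up): each service is its own
| package, and every operation is a Command class you instantiate
| and pass to the client's send() method.
|
|     import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
|
|     // The client itself is a class that must be instantiated,
|     // config and all.
|     const s3 = new S3Client({ region: "us-east-1" });
|
|     // So is every operation: build a Command, then send() it.
|     const res = await s3.send(
|       new GetObjectCommand({ Bucket: "my-bucket", Key: "hello.txt" })
|     );
|     console.log(await res.Body?.transformToString());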
| greatpostman wrote:
| I can guarantee you, nothing is simple about S3. I bet engineers
| spend months on a single configuration change to the underlying
| subsystems.
| jacksnipe wrote:
| That's what makes it a good abstraction.
| ecshafer wrote:
| If you read the article, they say exactly that. It's by Dr.
| Werner Vogels; they know exactly what goes into S3, since they
| are a principal engineer on the project.
| apetresc wrote:
| It's Vogels' blog but this is a guest post by Andy Warfield.
| ecshafer wrote:
| My mistake, I stand corrected.
| ginko wrote:
| Would have been good if they mentioned they meant Amazon S3. It
| took me a while to figure out what this was about.
|
| Initially I thought this was about S3 standby mode.
| waiwai933 wrote:
| > I've seen other examples where customers guess at new APIs they
| hope that S3 will launch, and have scripts that run in the
| background probing them for years! When we launch new features
| that introduce new REST verbs, we typically have a dashboard to
| report the call frequency of requests to it, and it's often the
| case that the team is surprised that the dashboard starts posting
| traffic as soon as it's up, even before the feature launches, and
| they discover that it's exactly these customer probes, guessing
| at a new feature.
|
| This surprises me; has anyone done something similar and
| benefitted from it? It's the sort of thing where I feel like
| you'd maybe get a result 1% of the time if that, and then only
| years later when everyone has moved on from the problem they were
| facing at the time...
| easton wrote:
| Maybe this is a faster way of getting AWS feature requests
| heard.
|
| I'm going to write a script that keeps trying to call
| ecs:MakeFargateCacheImages.
| ajb wrote:
| It could also be hackers, as the moment a new service launches
| is exactly when it will be most buggy. And the contents of S3
| are a big payoff.
| myflash13 wrote:
| I have a feeling that economies of scale have a point of
| diminishing returns. At what point does it become more costly and
| complicated to store your data on S3 versus just maintaining a
| server with RAID disks somewhere?
|
| S3 is an engineering marvel, but it's an insanely complicated
| backend architecture just to store some files.
| Tanjreeve wrote:
| Probably never. The complexity is borne by Amazon. Even before
| any of the development begins if you want a RAID setup with
| some sort of decent availability you've already multiplied your
| server costs by the number of replicas you'd need. It's a
| Sisyphean task that also has little value for most people.
|
| Much like Twitter, it's conceptually simple, but it's a hard
| problem to solve at any scale beyond a toy.
| sjsdaiuasgdia wrote:
| That's going to depend a lot on what your needs are,
| particularly in terms of redundancy and durability. S3 takes
| care of a lot of that for you.
|
| One server with a RAID array can survive, usually, 1 or maybe 2
| drive failures. The remaining drives in the array will have to
| do more work when a failed drive is replaced and data is copied
| to the new array member. This sometimes leads to additional
| failures before replacement completes, because the drives in
| the array are probably all the same model, bought at the same
| time, and thus have similar manufacturing quality and materials.
| This is part of why it's generally said that RAID != backup.
|
| You can make a local backup to something like another server
| with its own storage, external drives, or tape storage.
| Capacity, recovery time, and cost varies a lot across the
| available options here. Now you're protected against the
| original server failing, but you're not protected against
| location-based impacts - power/network outages, weather
| damage/flooding, fire, etc.
|
| You can make a remote backup. That can be in a location you own
| / control, or you can pay someone else to use their storage.
|
| Each layer of redundancy adds cost and complexity.
|
| AWS says they can guarantee 99.999999999% durability and 99.99%
| availability. You can absolutely design your own system that
| meets those thresholds, but that is far beyond what one server
| with a RAID array can do.
| vb-8448 wrote:
| How many businesses or applications really need 99.999999999%
| durability and 99.99% availability? Is your whole stack
| organized to deliver the aforementioned durability and
| availability?
| huntaub wrote:
| I think that this is, to Andy's point, basically about
| simplicity. It's not that your business necessarily _needs_
| 11 9s of durability for continuity purposes, but it sure is
| nice that you _never_ have to think about the durability of
| the storage layer (vs. even something like EBS where 5 9s
| of durability isn 't quite enough to go from "improbable"
| to "impossible").
| sjsdaiuasgdia wrote:
| Different people and organizations will have different
| needs, as indicated in the first sentence of my post. For
| some use cases one server is totally fine, but it's good to
| think through your use cases and understand how loss of
| availability or loss of data would impact you, and how much
| you're willing to pay to avoid that.
|
| I'll note that data durability is a bit of a different
| concern than service availability. A service being down for
| some amount of time sucks, but it'll probably come back up
| at some point and life moves on. If data is lost
| completely, it's just _gone_. It's going to have to be re-
| created from other sources, generated fresh, or accepted as
| irreplaceable and lost forever.
|
| Some use cases can tolerate losing some or all of the data.
| Many can't, so data durability tends to be a concern for
| non-trivial use cases.
| hadlock wrote:
| There are a lot of companies whose livelihood depends on their
| proprietary data, for whom loss of that data would be a
| company-ending event. I'm not sure how the calculus works
| out exactly, but having additional backups and types of
| backups to reduce risk is probably one of the smaller
| business expenses one can pick up. Sending a couple TB of
| data to three+ cloud providers on top of your physical
| backups is in the tens of dollars per month.
| Asraelite wrote:
| > One server with a RAID array can survive, usually, 1 or
| maybe 2 drive failures.
|
| Standard RAID configurations can only handle 2 failures, but
| there are libraries and filesystems allowing arbitrarily high
| redundancy.
| sjsdaiuasgdia wrote:
| As long as it's all in one server, there's still a lot of
| situations that can immediately cut through all that
| redundancy.
|
| As long as it's all in one physical location, there's still
| fire and weather as ways to immediately cut through all
| that redundancy.
| Cthulhu_ wrote:
| There are a few stories from companies that moved _away_ from
| S3, like Dropbox, and who shared their investments and
| expenses.
|
| The long and short of it is that to get anywhere near the
| redundancy, reliability, performance, etc. of S3, you're
| spending a _lot_ of money.
| hylaride wrote:
| There is a diminishing return of what percentage you save,
| sure. But Amazon will always be at that edge. They already have
| amortized the equipment, labour, administration, electricity,
| storage, cooling, etc.
|
| They also already have support for storage tiering,
| replication, encryption, ACLs, integration with other services
| (from web access to sending notifications of storage events to
| lambda, sqs, etc). You get all of this whether you're saving
| one eight-bit file or trillions of gigabyte-sized ones.
|
| There are reasons why you may need to roll your own storage
| setup (regulatory, geographic, some other unique reason), but
| you'll never be more economical than S3, especially if the
| storage is mostly sitting idle.
| timewizard wrote:
| > At what point does it become more costly and complicated to
| store your data on S3 versus just maintaining a server with
| RAID disks somewhere?
|
| It's more costly immediately. S3 storage prices are above what
| you would pay even for triply redundant media and you have to
| pay for data transfer at a very high rate to both send and
| receive data to the public internet.
|
| It's far less complicated though. You just create a bucket and
| you're off to the races. Since the S3 API endpoints are all
| public there's not even a delay for spinning up the
| infrastructure.
|
| Where S3 shines for me is two things. The first is automatic
| lifecycle management: objects can be moved between storage
| classes based on their age, and even automatically deleted
| after expiration. The second is S3 events, which are also
| _durable_ and make S3 into an actual appliance instead of just
| a convenient key/value store.
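|
| As a rough sketch of such a rule with the v3 JavaScript SDK
| (bucket name and prefix are hypothetical): transition objects
| under logs/ to Glacier after 30 days and delete them after a
| year.
|
|     import {
|       S3Client,
|       PutBucketLifecycleConfigurationCommand,
|     } from "@aws-sdk/client-s3";
|
|     const s3 = new S3Client({ region: "us-east-1" });
|     await s3.send(new PutBucketLifecycleConfigurationCommand({
|       Bucket: "my-bucket", // hypothetical
|       LifecycleConfiguration: {
|         Rules: [{
|           ID: "archive-then-expire",
|           Status: "Enabled",
|           Filter: { Prefix: "logs/" },
|           // Move to a colder storage class after 30 days...
|           Transitions: [{ Days: 30, StorageClass: "GLACIER" }],
|           // ...and delete entirely after a year.
|           Expiration: { Days: 365 },
|         }],
|       },
|     }));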
| zerd wrote:
| One interesting thing about S3 is the vast scale of it. E.g. if
| you need to store 3 PB of data you might need 150 HDDs +
| redundancy, but if you store it on S3 it's chopped up and put
| on tens of thousands of HDDs, which helps with IOPS and
| throughput. Of course that's shared with others, which is why
| smart placement is key, so that hot objects are spread out.
|
| Some details in
| https://www.allthingsdistributed.com/2023/07/building-and-op...
| / https://www.youtube.com/watch?v=sc3J4McebHE
| _fat_santa wrote:
| S3 is up there as one of my favorite tech products ever. Over the
| years I've used it for all sorts of things but most recently I've
| been using it to roll my own DB backup system.
|
| One of the things that shocks me about the system is the level of
| object durability. A few years ago I was taking an AWS
| certification course and learned that their durability number
| means that one can expect to lose data about once every 10,000
| years. Since then anytime I talk about S3's durability I bring up
| that example and it always seems to convey the point for a
| layperson.
|
| And it's "simplicity" is truly elegant. When I first started
| using S3 I thought of it as a dumb storage location but as I
| learned more I realized that it had some wild features that they
| all managed to hide properly so when you start it looks like a
| simple system but you can gradually get deeper and deeper until
| your S3 bucket is doing some pretty sophisticated stuff.
|
| Last thing I'll say is, you know your API is good when "S3
| compatable API" is a selling point of your competitors.
| victorp13 wrote:
| > Last thing I'll say is, you know your API is good when "S3
| compatable API" is a selling point of your competitors.
|
| Counter-point: You know that you're the dominant player. See:
| .psd, .pdf, .xlsx. Not particularly good file types, yet widely
| supported by competitor products.
| pavlov wrote:
| Photoshop, PDF and Excel are all products that were truly
| much better than their competitors at the time of their
| introduction.
|
| Every file format accumulates cruft over thirty years,
| especially when you have hundreds of millions of users and
| you have to expand the product for use cases the original
| developers never imagined. But that doesn't mean the success
| wasn't justified.
| jalk wrote:
| PDF is not a product. I get what you are saying, but I can't
| say that I've ever liked Adobe Acrobat.
| pavlov wrote:
| PDF is a product, just like PostScript was.
| eternityforest wrote:
| Most people use libraries to read and write the files, and
| judge them pretty much entirely by popularity.
|
| A very popular file format pretty much defines the semantics
| and feature set for that category in everyone's mind, and if
| you build around those features, then you can probably expect
| good compatibility.
|
| Nobody thinks about the actual on disk data layout, they
| think about standardization and semantics.
|
| I rather like PDF, although it doesn't seem to be well suited
| for 500MB scans of old books and the like, they really seem
| to bog down on older mobile devices.
| merb wrote:
| The durability is not so good when you have a lot of objects
| achierius wrote:
| Why not? I don't work with web-apps or otherwise use object
| stores very often, but naively I would expect that "my
| objects not disappearing" would be a good thing.
| oblio wrote:
| I think their point is that you'd need even higher
| durability. With millions of objects, even 5+ nines means
| that you lose objects relatively constantly.
| laluser wrote:
| It's _designed_ for that level of durability, but it's only as
| good as a single change or correlated set of hardware failures
| that can quickly change the theoretical durability model. Even
| corrupting data is possible too.
| huntaub wrote:
| You're totally correct, but these products also need to be
| specifically designed against these failure cases (i.e. it's
| more than just MTTR + MTTF == durability). You (of course)
| can't just run deployments without validating that the
| durability property is satisfied throughout the change.
| laluser wrote:
| Yep! There's a lot of checksum verification, carefully
| orchestrated deployments, hardware diversity, erasure code
| selection, the list goes on and on. I help run a multi-
| exabyte storage system - I've seen a few things.
| Gys wrote:
| > their durability number means that one can expect to lose
| data about once every 10,000 years
|
| What does that mean? If I have 1 million objects, do I lose 100
| per year?
| lukevp wrote:
| What it means is in any given year, you have a 1 in 10,000
| chance that a data loss event occurs. It doesn't stack like
| that.
|
| If you had light bulbs that lasted 1,000 hrs on average, and
| you had 10k light bulbs, and turned them all on at once, then
| they would all last 1,000 hours on average. Some would die
| earlier and some later, but the top line number does not tell
| you anything about the distribution, only the average (mean).
| That's what MTTF is; the mean time for a given part to where
| it has a greater likelihood to have failed by then vs not. It
| doesn't tell you if the distribution of light bulbs burning
| out is 10 hrs or 500 hrs wide. If it's the latter, you'll start
| seeing bulbs out within 750 hrs, but if it's the former it'd be
| 995 hrs before anything burned out.
| ceejayoz wrote:
| Amazon claims 99.999999999% durability.
|
| If you have ten million objects, you should lose one every
| 10k years or so.
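|
| Back-of-the-envelope, taking 11 9s to mean an annual loss
| probability of 1e-11 per object:
|
|     // Expected annual losses scale linearly with object count.
|     const annualLossProbability = 1e-11; // 11 9s of durability
|     const objects = 10_000_000;          // ten million objects
|     const lostPerYear = objects * annualLossProbability; // 1e-4
|     console.log(1 / lostPerYear); // ~10,000 years per lost object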
| graemep wrote:
| How does that compare to competitors and things like
| distributed file systems?
| huntaub wrote:
| I generally see object storage systems advertise 11 9s of
| durability. You would usually see a commercial
| distributed file system (obviously stuff like Ceph and
| Lustre will depend on your specific configuration)
| advertise less (to trade off performance for durability).
| gamegoblin wrote:
| In general if you actually do the erasure coding math,
| almost all distributed storage systems that use erasure
| coding will have waaaaay more than 11 9s of theoretical
| durability
|
| S3's _original_ implementation might have only had 11 9s,
| and it just doesn't make sense to keep updating this
| number; beyond a certain point it's just meaningless
|
| Like "we have 20 nines" "oh yeah, well we have 30 nines!"
|
| To give an example of why this is the case, if you go
| from a 10:20 sharding scheme to a 20:40 sharding scheme,
| your storage overhead is roughly the same (2x), but you
| have doubled the number of nines
|
| So it's quite easy to get a ton of theoretical 9s with
| erasure coding
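|
| A toy illustration of that math, assuming independent shard
| failures at some annual probability p and ignoring repair
| (which real systems lean on heavily): a k-of-n code loses data
| only when more than n - k shards fail, so going from 10-of-20
| to 20-of-40 keeps the 2x overhead while the nines roughly
| double.
|
|     // P(data loss) for a k-of-n erasure code with independent
|     // annual shard failure probability p.
|     function lossProbability(k: number, n: number, p: number) {
|       const binom = (m: number, i: number): number => {
|         let c = 1;
|         for (let j = 0; j < i; j++) c = (c * (m - j)) / (j + 1);
|         return c;
|       };
|       let total = 0;
|       for (let i = n - k + 1; i <= n; i++) {
|         total += binom(n, i) * p ** i * (1 - p) ** (n - i);
|       }
|       return total;
|     }
|
|     // Same 2x storage overhead, wildly different durability:
|     console.log(lossProbability(10, 20, 0.05)); // ~5e-10
|     console.log(lossProbability(20, 40, 0.05)); // ~2.5e-17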
| toolslive wrote:
| It's really not that impressive, but you have to use
| erasure coding (chop the data D into X parts, use these to
| generate Y extra pieces, and store all X+Y of them) instead of
| replication (store D n times).
| 8organicbits wrote:
| Isn't it just a marketing number? I didn't think durability
| was part of the S3 SLA, for example.
| cruffle_duffle wrote:
| > but as I learned more I realized that it had some wild
| features that they all managed to hide properly so when you
| start it looks like a simple system but you can gradually get
| deeper and deeper until your S3 bucket is doing some pretty
| sophisticated stuff.
|
| Over the years working on countless projects I've come to
| realize that the more "easy" something looks to an end user,
| the more work it took to make it that way. It takes a _lot_ of
| work to create and polish something to the state where you'd
| call it graceful, elegant and beautiful.
|
| There are exceptions for sure, but often times hidden under
| every delightful interface is an iceberg of complexity. When
| something "just works" you know there was a hell of a lot of
| effort that went into making it so.
| Cthulhu_ wrote:
| I've used it for server backups too, just a simple webserver.
| Built a script that takes the webserver files, config files and
| makes a database dump, packages it all into a .tar.gz file on
| monday mornings, and uploads it to S3 using a "write only into
| this bucket" access key. In S3 I had it set up so it sends me
| an email whenever a new file was added, and that anything older
| than 3 weeks is put into cold storage.
|
| Of course, I lost that script when the server crashed, the one
| thing I didn't back up properly.
| wodenokoto wrote:
| I've never worked with AWS, but have a certification from GCP
| and currently use Azure.
|
| What do you see as special for S3? Isn't it just another
| bucket?
| angulardragon03 wrote:
| I did a GCP training a while back, and the anecdote from one of
| the trainers was that the Cloud Storage team (GCP's
| S3-compatible product) hadn't lost a single byte of data since
| GCS had existed as a product. Crazy at that scale.
| hobo_in_library wrote:
| Eh, they have lost a bit
| conorjh wrote:
| tf is that title supposed to mean?
| dgfitz wrote:
| Somehow gambling is directly correlated to using S3? That's the
| best I got.
| MillironX wrote:
| Table stakes is a poker term meaning the absolute minimum
| amount you are allowed to bet. So the title translates to "In
| S3, simplicity is the bare minimum" or "In S3, simplicity is so
| important that if we didn't have it, we might as well not even
| have S3."
| MatthewCampbell wrote:
| It's a risky idiom in general because it's often used to
| prevent debate. "Every existing product has feature X, so
| feature X is table stakes." "Why are we testing whether we
| really need this feature, it's table stakes!" My observation
| has been that "table stakes" features are often the best ones
| to reject. (Not so in the case of this title, though)
| alberth wrote:
| S3 is the simplest CRUD app you could create.
|
| It's essentially just the 4 functions of C.R.U.D done to a file.
|
| Most problems in tech are not that simple.
|
| Note: not knocking the service. just pointing out not all things
| are so inherently basic (and valuable at the same time).
| riv991 wrote:
| Isn't that the most impressive part? That the abstraction makes
| it seem so simple
| bdcravens wrote:
| That's the public interface. The underlying architecture is
| where the power is.
| cruffle_duffle wrote:
| That's when you really know you hid all the complexity well.
| When people call your globally replicated data store with
| granular permissions, sophisticated data retention policies,
| versioning, and manage to have, what, seven (ten?) nines or
| something, "simple".
|
| No problem. I'm sure ChatGPT could cook up a replacement in a
| weekend. Like Dropbox it's just rsync with some scripts that
| glue it together. How hard could it possibly be?
|
| I mean people serve entire websites right out of s3 buckets.
| Using it as a crude CDN of sorts.
|
| It's a modern marvel.
| sebastiansm wrote:
| I could build a netflix in a weekend.
| golly_ned wrote:
| A file system is simple. Open, read, close. Most tech problems
| are not that simple. How hard could a filesystem be?
| dekhn wrote:
| Locking, checks after unclean shutdown, sparse files, high
| performance, reliability... are all things that make
| filesystems harder.
| shepherdjerred wrote:
| Anyone can create a CRUD API. It takes a _lot_ of work to make
| a CRUD API that scales with high availability and a reasonable
| consistency model. The vast majority of engineers would take
| months or years to write a demo.
|
| If you don't believe me, you might want to reconsider how
| skilled the average developer _really_ is.
| o10449366 wrote:
| These comments are so uniquely "HN" cringe-worthy.
| rglover wrote:
| I used to have the same opinion until I built my own CDN.
| Scaling something like that is no joke, let alone ensuring you
| handle consistency and caching properly.
|
| A basic implementation is simple, but at S3 scale, that's a
| whole different ball game.
| icedchai wrote:
| Now add versioning, replication, logging, encryption, ACLs, CDN
| integration, event triggering (Lambda). I could go on. These
| are just some other features I can name off the top of my head.
| And it all has to basically run with zero downtime, 24x7...
| arnath wrote:
| I found out last year that you can actually run a full SPA using
| S3 and a CDN. It's kind of a nuts platform
| ellisv wrote:
| I use S3+CloudFront for static sites, and Cloudflare Workers
| if needed.
|
| It's always crazy to me that people will run a could-be-static
| site on Netlify/Vercel/etc.
| Cthulhu_ wrote:
| We've used Netlify at previous projects because it was easy.
| No AWS accounts or knowledge needed: just push
| to master, let the CI build (it was a Gatsby site) and it was
| live.
| ellisv wrote:
| I think Netlify is great but to me it's overkill if you
| just have a static site.
|
| I understand that Netlify is much simpler to get started
| with and setting up an AWS account is somewhat more
| complex. If you have several sites, it's worth spending the
| time to learn.
| jorams wrote:
| Since everything you need to run "a full SPA" is to serve some
| static files over an internet connection I'm not sure how that
| tells you anything interesting about the platform. It's
| basically the simplest thing a web server can do.
| user9999999999 wrote:
| If only metadata could be queried without processing a CSV
| output file first (imagine even storing thumbnails in there!),
| copied objects had actual events rather than something you have
| to dig through CloudTrail for, and you could get the last
| update time from a bucket to make caching easier.
| huntaub wrote:
| Are you talking about getting metadata from many objects in the
| bucket simultaneously? You might be interested in S3 Metadata:
| https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingM...
| aeyes wrote:
| When S3 Tables launched they made the Metadata available using
| this technology. So you can query it like an Apache Iceberg
| table.
|
| https://aws.amazon.com/blogs/aws/introducing-queryable-objec...
| user9999999999 wrote:
| Whoops, I was wrong! You can store base64-encoded metadata on
| the object and then run a HEAD request to get it, but it's
| limited to 2 KB. Also, you _can_ query the metadata, but its
| latency is more suited to batch processing than Lambdas.
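|
| A sketch of that pattern with the v3 JS SDK (bucket, key, and
| the thumbnail string are hypothetical): user-defined metadata
| rides along as x-amz-meta-* headers, and HeadObject returns it
| without fetching the body.
|
|     import {
|       S3Client,
|       PutObjectCommand,
|       HeadObjectCommand,
|     } from "@aws-sdk/client-s3";
|
|     const s3 = new S3Client({ region: "us-east-1" });
|     const thumb = "iVBORw0KGgo..."; // tiny base64 thumbnail
|
|     // User-defined metadata is capped at 2 KB per object.
|     await s3.send(new PutObjectCommand({
|       Bucket: "my-bucket",
|       Key: "photos/cat.jpg",
|       Body: new Uint8Array([/* image bytes */]),
|       Metadata: { thumb },
|     }));
|
|     // HEAD returns the metadata without downloading the body.
|     const head = await s3.send(new HeadObjectCommand({
|       Bucket: "my-bucket",
|       Key: "photos/cat.jpg",
|     }));
|     console.log(head.Metadata?.thumb);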
| chasd00 wrote:
| S3 was one of the first offerings coming out of AWS right? It's
| pretty legendary and a great concept to begin with. You can tell
| by how much sense it makes and then trying to wrap your head
| around the web dev world pre-S3.
| Cthulhu_ wrote:
| ? The web dev world pre-S3 was pretty much the same, but you
| stored your files on a regular server (and set up your own
| redundancy and backup strategy). Not that much different to be
| honest from an end user's point of view.
| kevindamm wrote:
| At a lot of places there wasn't even a redundancy nor backup
| strategy, so it really was just as simple as registering with
| a hosting company and ssh+ftp (or cPanel or something like
| that for what amounted to the managed solutions of the time).
|
| I agree, things before S3 weren't really that different. LAMP
| stacks everywhere, and developer skills were very portable
| between different deployments of these LAMP stacks. Single
| machines didn't scale as much then, but for most small-medium
| sites they really didn't need to.
| ipsento606 wrote:
| Lots of comments here talking about how great S3 is.
|
| Anyone willing to give a cliff notes about what's good about it?
|
| I've been running various sites and apps for a decade, but have
| never touched S3 because the bandwidth costs are 1, sometimes 2
| orders of magnitude more expensive than other static hosting
| solutions.
| waiwai933 wrote:
| S3 is great for being able to stick files somewhere and not
| have to think about any of the surrounding infrastructure on an
| ongoing basis [1]. You don't have to worry about keeping a RAID
| server, swapping out disks when one fails, etc.
|
| For static hosting, it's fine, but as you say, it's not
| necessarily the cheapest, though you can bring the cost down by
| sticking a CDN (Cloudflare/CloudFront) in front of it. There
| are other use cases where it really shines though.
|
| [1]: I say ongoing basis because you will need to figure out
| your security controls, etc. at the beginning so it's not
| totally no-thought.
| jjice wrote:
| S3 has often fallen into a "catch all" solution for me whenever
| I need to store data large enough that I don't want to keep it
| in a database (RDBMS or Redis).
|
| Need to save a file somewhere? Dump it in S3. It's generally
| affordable (obviously dependent on scale and use), fast, easy,
| and super configurable.
|
| Being able to expose something to the outside, or with a
| presigned URL is a huge advantage as well.
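|
| A minimal sketch of presigning with the v3 JS SDK (bucket and
| key are made up): anyone holding the resulting URL can GET the
| object until it expires, with no AWS credentials of their own.
|
|     import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
|     import { getSignedUrl } from "@aws-sdk/s3-request-presigner";
|
|     const s3 = new S3Client({ region: "us-east-1" });
|
|     // URL is valid for one hour, then requests start failing.
|     const url = await getSignedUrl(
|       s3,
|       new GetObjectCommand({ Bucket: "my-bucket", Key: "report.pdf" }),
|       { expiresIn: 3600 }
|     );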
|
| Off the top of my head, I think of application storage
| generally in this tier ordering (just off the top of my head
| based on the past few years of software development, no real
| deep thought here):
|
| 1. General application data that needs to be read, written, and
| related - RDBMS
|
| 2. Application data that needs to be read and written fast, no
| relations - Redis
|
| 3. Application data that is mostly stored and read - S3
|
| Replace any of those with an equivalent storage layer.
| ak217 wrote:
| One underappreciated feature of S3 - that allowed it to excel
| in workloads like the Tables feature described in the article -
| is that it's able to function as the world's highest-throughput
| network filesystem. And you don't have to do anything to
| configure it (as the article points out). By storing data on
| S3, you get to access the full cross-sectional bandwidth of
| EC2, which is colossal. For effectively all workloads, you will
| max out your network connection before S3's. This enables
| workloads that can't scale anywhere else. Things like data
| pipelines generating unplanned hundred-terabit-per-second
| traffic spikes with hotspots that would crash any filesystem
| cluster I've ever seen. And you don't have to pay a lot for it:
| once you're done using the bandwidth, you can archive the data
| elsewhere or delete it.
| huntaub wrote:
| You've totally hit the nail on the head. This is the _real_
| moat of S3, the fact that they have _so much_ front-end
| throughput available from the gigantic buildout that folks
| can take advantage of without _any_ capacity pre-planning.
| nerdjon wrote:
| There are a few things about S3 that I find extremely powerful.
|
| The biggest is: if I need to store some data, and I know what
| the data is (so I don't need to, say, traverse a file structure
| at a moment's notice; I know my filenames), I can store that
| data. I don't need to figure out how much space I need ahead of
| time, and it is there when I need it. Maybe it automatically
| moves to another storage tier to
| save me some money but I can reliably assume it will be there
| when I need it. Just that simplicity alone is worth a lot, I
| never need to think later that I need to expand some space,
| possibly introducing downtime depending on the setup, maybe
| dealing with partitions, etc.
|
| Related to that is static hosting. I have run a CDN and other
| static content out of S3 with CloudFront in front of it. The
| storage cost was almost non existent due to how little actual
| data we were talking about and only paid for cloudfront costs
| when there were requests. If nothing was being used it was
| almost "free". Even when being used it was very cheap for my
| use cases.
|
| Creating daily inventory reports in S3 is awesome.
|
| But the thing that really is almost "magic", once you
| understand its quirks, is Athena (and QuickSight, built on top
| of it, and similar tools): the ability to store data in S3,
| like the inventory reports I already mentioned, access logs,
| CloudWatch logs, or any structured data that you may not need
| to query often enough to warrant a full long-running database.
| It may
| cost you a few dollars to run your Athena query and it is not
| going to be super quick, but if you know what you're looking
| for it is amazing.
| hadlock wrote:
| Do you need to replace your SFTP server? S3. Do you need to
| backup TB of db files? S3. Do you need a high performance web
| cache? S3. Host your SPA? S3 backs cloudfront. Shared
| filesystem between desktop computers? Probably a bad idea but
| you can do it with S3. Need a way for customers to securely
| drop files somewhere? Signed S3 URI. Need to store metrics?
| Logs? S3. Load balancer logs? S3. And it's cheaper than an EBS
| volume, and doesn't need resizing every couple of quarters. And
| there are various SLAs which make it cheaper (Glacier) or more
| expensive (High Performance). S3 makes a great storage backend
| for a lot of use cases especially when your data is coming in
| from multiple regions across the globe. There are some quibbles
| about eventual consistency but in general it is an easy backend
| to build for.
| neerajk wrote:
| S3 is the closest thing to magic I have seen. It deserves a
| Whitman, but Gemini will have to do:
|
| A lattice woven, intricate and deep, Where countless objects
| safely lie asleep.
|
| To lean upon its strength, its constant hum, A silent guardian,
| 'til kingdom come.
|
| Versioning whispers tales of changes past, While lifecycle rules
| decree what shadows last.
|
| The endless hunger of a world online, S3's silent symphony, a
| work divine.
|
| Consider then, this edifice of thought, A concept forged,
| meticulously wrought.
|
| More than a bucket, more than simple store, A foundational layer,
| forevermore.
| bob1029 wrote:
| I really enjoy using S3 to serve arbitrary blobs. It perfectly
| solves the problem space for my use cases.
|
| I avoid getting tangled in authentication mess by simply naming
| my files using type-4 GUIDs and dumping them in public buckets.
| The file name is effectively the authentication token and
| expiration policies are used to deal with the edges.
|
| This has been useful for problems like emailing customers
| gigantic reports and transferring build artifacts between
| systems. Having a stable URL that "just works" everywhere easily
| pays for the S3 bill in terms of time & frustration saved.
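|
| A sketch of that pattern (bucket name hypothetical, and the
| bucket is assumed to allow public reads): a v4 UUID carries
| ~122 bits of randomness, so the key itself is the unguessable
| capability.
|
|     import { randomUUID } from "node:crypto";
|     import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
|
|     const s3 = new S3Client({ region: "us-east-1" });
|
|     // The random UUID in the key acts as the bearer token.
|     const key = `reports/${randomUUID()}.csv`;
|     await s3.send(new PutObjectCommand({
|       Bucket: "my-public-bucket",
|       Key: key,
|       Body: "id,total\n1,42\n",
|       ContentType: "text/csv",
|     }));
|
|     // Stable URL that "just works" until lifecycle expiry:
|     console.log(`https://my-public-bucket.s3.amazonaws.com/${key}`);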
| bityard wrote:
| My favorite use case for S3 API-compatible solutions: I often
| run into systems that generate lots of arbitrary data that only
| have temporary importance. A common example might be
| intermediate build artifacts, or testing ephemera (browser
| screenshots, etc). Things that are needed for X number of
| months and then just need to disappear.
|
| Yeah, we can dump those to a filesystem. But then we have to
| ask which filesystem? What should the directory layout be? If
| there are millions or billions of objects, walking the whole
| tree gets expensive. Do we write a script to clean everything
| up? Run it via cron or some other job runner?
|
| With S3, you just write your artifact to S3 with a TTL and it
| gets deleted automagically when it should. No cron jobs, no
| walking the whole tree. And you can set up other lifecycle
| options if you need it moved to other (cheaper) storage later
| on, backups, versioning, and whatever else.
|
| For on-prem, you have Minio, Garage, or SeaweedFS. These are
| pretty nice to deploy the servers however you need for the
| level of reliability/durability you require.
| bentobean wrote:
| It's funny--S3 started as a "simple" storage service, and now
| it's handling entire table abstractions. Reminds me how SQL was
| declared dead every few years, yet here we are, building ever
| more complex data solutions on top of supposedly simple
| foundations.
| imiric wrote:
| I instinctively distrust any software or protocol that implies
| it is "simple" in its name: SNMP, SMTP, TFTP, SQS, etc. They're
| usually the cause of an equal or more amount of headaches than
| alternatives.
|
| Maybe such solutions are a reaction to previous more "complex"
| solutions, and they do indeed start simple, but inevitably get
| swallowed by the complexity monster with age.
| great_wubwub wrote:
| TFTP is probably the exception to that rule. All the other
| protocols started out easily enough and added more and more
| cruft. TFTP stayed the way it's always been - minimalist,
| terrifyingly awful at most things, handy for a few corner
| cases. If you know when to use it and when to use something
| like SCP, you're golden.
|
| If TFTP had gone the way of SNMP, we'd have 'tftp <src>
| <dest> --proto tcp --tls --retries 8 --log-type json' or some
| horrendous mess like that.
| paulddraper wrote:
| > In S3 simplicity is table stakes
|
| The S3 API is not "simple."
|
| Authentication being a good part of that.
| brikym wrote:
| I'm curious about S3 tables. Azure has had tables in their
| storage account product for years. What are the differences?
| diroussel wrote:
| It's great that they added iceberg support I guess, but it's a
| shame that they also removed S3 Select. S3 Select wasn't perfect.
| For instance, the performance was nowhere near as good as using
| DuckDB to scan a Parquet file, since Duck is smart and S3
| Select does a full table scan.
|
| But S3 Select was way cheaper than the new Iceberg support. So
| if your needs are only for reading one Parquet snapshot, with
| no need to do updates, then this change is not welcome.
|
| Great article though, and I was pleased to see this at the end:
|
| > We've invested in a collaboration with DuckDB to accelerate
| Iceberg support in Duck,
| StratusBen wrote:
| For those interested in S3 Tables which is referenced in this
| blog post, we literally just published this overview on what they
| are and cost considerations of them that people might find
| interesting: https://www.vantage.sh/blog/amazon-s3-tables
| 1a527dd5 wrote:
| https://www.vantage.sh/blog/amazon-s3-tables#s3-tables-cost
|
| I can't make head or tails of the beginning of this sentence:-
|
| > Pricing for S3 Tables is all and all not bad.
|
| Otherwise lovely article!
| shawabawa3 wrote:
| "all and all" is a typo for "all in all" which means
| "overall", or "taking everything into consideration"
|
| So they are saying the pricing is not bad considering
| everything it does
| CobrastanJorji wrote:
| > When we moved S3 to a strong consistency model, the customer
| reception was stronger than any of us expected.
|
| This feels like one of those Apple-like stories about inventing
| and discovering an amazing, brand new feature that delighted
| customers but not mentioning the motivating factor of the
| competing products that already had it. A more honest sentence
| might have been "After years of customers complaining that the
| other major cloud storage providers had strong consistency
| models, customers were relieved when we finally joined the
| party."
| lizknope wrote:
| Clicked on the article thinking it was about S3 Graphics, the
| company that made the graphics chip in my first PC. Now I see
| it's some amazon cloud storage thing.
___________________________________________________________________
(page generated 2025-03-14 23:00 UTC)