[HN Gopher] S3 Express Is All You Need
___________________________________________________________________
S3 Express Is All You Need
Author : ryanworl
Score : 96 points
Date : 2023-11-28 19:04 UTC (3 hours ago)
(HTM) web link (www.warpstream.com)
(TXT) w3m dump (www.warpstream.com)
| BonoboIO wrote:
| Does anyone here have a use case which would perform better
| with this new S3 Express tier?
|
| And a second question: would it be worth the 8x surcharge?
| parhamn wrote:
| I think the key benefit touched on by this article is the
| potential 10x improvement in access speeds (which has many
| applications beyond reducing your S3 op charges).
|
| > S3 Express One Zone can improve data access speeds by 10x and
| reduce request costs by 50% compared to S3 Standard and scales
| to process millions of requests per minute.
| paulddraper wrote:
| A cache with large blobs (images, etc)
| awoimbee wrote:
| If it's only a cache it should be on EBS, which is still way
| faster and half the price. I started a migration to S3 for
| such a project (container image caching) but then stopped
| when I realized what I was doing.
| paulddraper wrote:
| 1. You'd need an access/authentication layer on top of
| that.
|
| 2. Variable throughput may be a concern.
|
| 3. You may have availability concerns.
| rbranson wrote:
| EBS attaches a single block storage volume to a single
| host[1]. S3 Express is a service-based object store. Apples
| and oranges.
|
| [1] Yes, I am aware of multi-attach but this introduces a
| scaling bottleneck and requires a fairly exotic setup.
| YetAnotherNick wrote:
| Yes, EBS is the gold standard, but managing EBS to scale up
| and down instantly, be available to multiple instances, handle
| lifecycle management, manage replicas, switch over, etc. is
| definitely not easy. And EBS is a bad choice when the
| throughput you need is very spiky.
| barsandtones wrote:
| This will work great with Mountpoint for S3, which AWS recently
| released. It will outperform EFS if your application does not
| require full POSIX compatibility.
| tjoff wrote:
| > _However, the new storage class does open up an exciting new
| opportunity for all modern data infrastructure: the ability to
| tune an individual workload for low latency and higher cost or
| higher latency and lower cost with the exact same architecture
| and code._
|
| I get it, but at the same time that is also what you lost when
| you locked yourself in with a particular vendor.
| imheretolearn wrote:
| > I get it, but at the same time that is also what you lost
| when you locked yourself in with a particular vendor.
|
| What are the other viable, practical alternatives?
| toomuchtodo wrote:
| A storage adapter that speaks the S3-compatible API to the
| target, assuming you're not relying on vendor-specific
| extensions or behavior (i.e. this).
|
| Off the top of my head, Backblaze B2, Cloudflare R2, etc. are
| S3 compatible, and MinIO locally.
|
| https://www.google.com/search?q=s3+compatible
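|
| (For illustration only, a minimal sketch of swapping vendors by
| changing the endpoint, assuming boto3; the endpoint URL,
| credentials, and bucket name are placeholders for a local MinIO
| instance and assume the bucket already exists:)
|
|     import boto3
|
|     # Same client code, different vendor: point the S3 client
|     # at any S3-compatible endpoint (MinIO, R2, B2, ...).
|     s3 = boto3.client(
|         "s3",
|         endpoint_url="http://localhost:9000",
|         aws_access_key_id="minioadmin",
|         aws_secret_access_key="minioadmin",
|     )
|
|     s3.put_object(Bucket="my-bucket", Key="hello.txt", Body=b"hi")
|     obj = s3.get_object(Bucket="my-bucket", Key="hello.txt")
|     print(obj["Body"].read())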
| anamexis wrote:
| There are no vendor specific extensions or behavior here,
| are there? Isn't it just a different billing structure?
| jacobr1 wrote:
| Notifications, for event-processing architectures, aren't
| part of the API common to these systems.
| williamdclt wrote:
| I suppose "super low latency" is behaviour, in the sense
| that "a large enough quantitative difference is a
| qualitative difference". If you rely on the perf and only S3
| provides that, then you are effectively locked into S3's
| implementation.
| Spooky23 wrote:
| I used to run one on-prem from DDN. Another good one is
| Nutanix. There are many out there.
|
| If you have a big use case and you really understand your
| needs, it's very doable.
| influx wrote:
| There's not much to the S3 API, and data import/export even at
| massive scale is available with Snowball. Sure, there are many
| other AWS services that aren't available at other vendors, but
| blob storage is commoditized at this point.
| amarshall wrote:
| Exporting data from S3 is ludicrously expensive, even with
| Snowball it's $30/TB just for network egress.
| paulddraper wrote:
| Except this has uniform billing, security, locality,
| monitoring, tools, etc
| tjoff wrote:
| I did mention vendor lock in?
| throwawaaarrgh wrote:
| You can use a different vendor any time, it's all S3
| compatible. You just don't get the same performance and
| billing.
| Sirupsen wrote:
| Most production storage systems/databases built on top of S3
| spend a significant amount of effort building an SSD/memory
| caching tier to make them performant enough for production (e.g.
| on top of RocksDB). But it's not easy to keep it in sync with
| blob...
|
| Even with the cache, the cold query latency lower-bound to S3 is
| subject to ~50ms roundtrips [0]. To build a performant system,
| you have to tightly control roundtrips. S3 Express changes that
| equation dramatically, as S3 Express approaches HDD random read
| speeds (single-digit ms), so we can build production systems that
| don't need an SSD cache--just the zero-copy, deserialized in-
| memory cache.
|
| Many systems will probably continue to have an SSD cache (~100 us
| random reads), but now MVPs can be built without it, and cold
| query latency goes down dramatically. That's a big deal.
|
| We're currently building a vector database on top of object
| storage, so this is extremely timely for us... I hope GCS ships
| this ASAP. [1]
|
| [0]: https://github.com/sirupsen/napkin-math
|
| [1]: https://turbopuffer.com/
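|
| (For illustration, a minimal sketch of that read path, i.e. an
| in-process cache in front of object storage, assuming boto3;
| the bucket and key names are hypothetical and this is not
| turbopuffer's actual design:)
|
|     import boto3
|
|     s3 = boto3.client("s3")
|     cache: dict[str, bytes] = {}  # in-memory tier, no SSD layer
|
|     def read_block(bucket: str, key: str) -> bytes:
|         # Hot reads come from process memory; a miss costs one
|         # object-store roundtrip (~50ms on S3 Standard,
|         # single-digit ms on S3 Express One Zone), so the number
|         # of roundtrips per query dominates cold latency.
|         if key not in cache:
|             resp = s3.get_object(Bucket=bucket, Key=key)
|             cache[key] = resp["Body"].read()
|         return cache[key]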
| jamesblonde wrote:
| We built HopsFS-S3 [0] for exactly this problem, and have been
| running it as part of Hopsworks for a number of years. It's a
| network-aware, write-through cache for S3 with an HDFS API.
| Metadata operations are performed on HopsFS, so you don't have
| the other problems, like max listing operations returning 1000
| files/dirs.
|
| NVMe is what is changing the equation, not SSD. NVMe disks now
| have up to 8 GB/s, although the crap in the cloud providers
| barely goes to 2 GB/s - and only for expensive instances. So,
| instead of 40X better throughput than S3, we can get like 10X.
| Right now, these workloads are much better on-premises on the
| cheapest M.2 NVMe disks ($200 for 4TB with 4 GB/s read/write)
| backed by an S3 object store like Scality.
|
| [0] https://www.hopsworks.ai/post/faster-than-aws-s3
| dekhn wrote:
| The numbers you're giving are throughput (bytes/sec), not
| latency.
|
| The comment you're replying to is talking mostly about
| latency, reporting that S3 Express object GET latencies (time
| to open the object and return its head) are in the
| single-digit ms range, where S3 was ~50ms before.
|
| BTW EBS can do 4GB/sec per volume. But you will pay for it.
| throwitaway222 wrote:
| I don't understand why EFS never gets major shout-outs - it's
| way better than S3: systems can mount it as a drive, it's
| shared across systems, and it already has super low latency...
| Not sure what S3 Express is really useful for if EFS already
| exists.
| candiddevmike wrote:
| EFS is really expensive and has terrible latency with small
| files in my experience
| brazzledazzle wrote:
| Yeah the main reason is that it's incredibly expensive. You
| can improve performance by allocating ahead of time but NFS
| has never been at its best when working with a bunch of tiny
| files.
| richieartoul wrote:
| Do you have any more details you can share about the
| performance of EFS? I've never met anyone who has actually
| used it in anger.
| gchamonlive wrote:
| Throughput scales with the amount of data stored in it; it's
| in the docs. So depending on the application, even if latency
| is better, the speeds are atrocious at lower volumes of
| persisted data.
| saddlerustle wrote:
| That's not true anymore with EFS Elastic Throughput
| a2tech wrote:
| Yes, I built a moderately large system on it that used lots
| of small shared files. The performance was fairly terrible.
| There's weird little niggles with it--we had random
| slowdowns, throughput issues, and things just didn't work
| quite right.
|
| It was an ok solution for what we were doing, but several
| times I came really close to just dumping it and standing
| up an NFS server using EBS volumes.
|
| I also used it a couple of times to store webroots and that
| was a complete disaster with systems that had lots of small
| files (Drupal I'm looking at you).
| huntaub wrote:
| Note that EFS One Zone is priced the same as S3 Express One
| Zone, with similar latency. One isn't better or worse than the
| other; it just depends on what kind of access your
| application needs.
| dekhn wrote:
| When you set up EFS did you maximize the IO settings?
|
| Before doing that it was unacceptably slow. After doing that
| it was unacceptably expensive.
| toomuchtodo wrote:
| EFS exists if you don't care much about spend and performance
| while having to forklift a POSIX compliant use case into AWS
| for persistent data.
| a2tech wrote:
| Thats basically how we were using it. It could have been
| worse.
| yeeeloit wrote:
| I wonder if Mountpoint for S3 along with this new Express
| option makes it a direct competitor to EFS for some use cases.
|
| https://docs.aws.amazon.com/AmazonS3/latest/userguide/mountp...
| tneely wrote:
| I'm quite curious about this too - both from a cost and
| performance perspective. If S3 Express is close enough to EFS
| on these metrics, then I'd say it wins out due to the sheer
| ubiquity and portability of S3 these days.
| sparrc wrote:
| In my experience the biggest drawback with EFS is startup time
| for systems that mount it in.
|
| For example a container or EC2 instance might only need a tiny
| bit of your storage and with s3 can just download what it needs
| when it needs it.
|
| As opposed to EFS where the container or instance needs to load
| in the entire datastore on startup which can add minutes to
| startup time if the EFS drive is large.
| dpedu wrote:
| My understanding is that EFS is exposed as an NFS share. I
| haven't used it personally, but NFS mounting is generally
| fast, nearly instant. What does "load in the entire
| datastore" mean?
| ericpauley wrote:
| EFS mounting is definitely nearly instant. I use it
| constantly.
| emgeee wrote:
| Some additional context here is that WarpStream is building a
| Kafka-compatible streaming system that uses S3 as the object
| store. This allows them to leverage cheap zone-transfer costs
| for redundancy + automatic storage tiering to cut down on the
| costs of running and maintaining these systems. This has
| previously come at the cost of latency due to S3's read/write
| speeds, but S3 Express makes them more competitive with
| Confluent's managed Kafka offerings for latency-sensitive
| applications.
|
| IMO WarpStream is a really cool product and this new S3
| offering makes them even better.
| refset wrote:
| I am eager to hear how it will affect their latency numbers:
|
| > Engineering is about trade-offs, and we've made a significant
| one with WarpStream: latency. The current implementation has a
| P99 of ~400ms for Produce requests because we never acknowledge
| data until it has been durably persisted in S3 and committed to
| our cloud control plane. In addition, our current P99 latency
| of data end-to-end from producer-to-consumer is around 1s
|
| via https://www.warpstream.com/blog/kafka-is-dead-long-live-
| kafk...
| fswd wrote:
| I solved this problem locally. When a file is uploaded to the
| server, it is cached in Redis before going to S3. Whenever the
| codebase needs to use the file, it checks Redis, and if it is
| not there it fetches it from S3 and caches it again.
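|
| (A rough sketch of that pattern for illustration, assuming
| boto3 and redis-py; the bucket handling, key scheme, and TTL
| are made up:)
|
|     import boto3
|     import redis
|
|     s3 = boto3.client("s3")
|     r = redis.Redis()  # local Redis acting as the shared cache
|
|     def upload(bucket: str, key: str, data: bytes) -> None:
|         # Write-through: cache first, then persist to S3.
|         r.set(key, data, ex=3600)
|         s3.put_object(Bucket=bucket, Key=key, Body=data)
|
|     def fetch(bucket: str, key: str) -> bytes:
|         # Read-through: serve from Redis, fall back to S3 and
|         # re-cache on a miss.
|         data = r.get(key)
|         if data is None:
|             obj = s3.get_object(Bucket=bucket, Key=key)
|             data = obj["Body"].read()
|             r.set(key, data, ex=3600)
|         return data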
| jamiesonbecker wrote:
| Exactly. Write-through cache is exactly how Userify[0] used to
| work for self-hosted versions. (when it was Python, we used
| Redis to keep state synced across multiple processes, but now
| that it's a Go app, we do all the caching and state management
| in memory using Ristretto[1])
|
| However, we now install by default to local disk filesystem,
| since it's much faster to just do a periodic S3 hot sync, like
| with restic or aws-cli, than to treat S3 as the primary backing
| store, or just version the EBS or instance volume. The other
| reason you might want to use S3 as a primary is if you use a
| lot of disk, but our files are compressed and extremely small,
| even for a large installation with tens of thousands of users
| and instances.
|
| 0. https://userify.com (ssh key management + sudo for teams)
|
| 1. https://github.com/dgraph-io/ristretto
| osti wrote:
| If I'm not wrong, this is the low latency S3 that is written in
| Rust. Finally launched after years in the making.
| FridgeSeal wrote:
| Do you have any sources for that? Very interested to know more
| about this.
| osti wrote:
| Unfortunately I don't; this is internal information that I
| don't know if I should even be sharing here. I never worked on
| S3 and I no longer work at AWS, so someone from within would
| have to weigh in.
| paulddraper wrote:
| Surely being written in a non-Rust language is not responsible
| for an extra 40ms of latency, right?
|
| Or is rust really that magic?
| osti wrote:
| Of course not; it's designed differently from the original
| S3. AWS came out with this to compete with Azure premium blob
| storage, which has very good first-byte latency, and Azure
| had it 4 years ago.
|
| https://azure.microsoft.com/en-us/blog/premium-block-blob-
| st...
| estebarb wrote:
| ShardStore? (More info: https://www.thestack.technology/aws-
shardstore-s3/ ) It seems that it was deployed years ago.
| francoismassot wrote:
| We tested S3 Express for our search engine quickwit [0] a couple
| of weeks ago.
|
| While this was really satisfying on the performance side, we were
| a bit disappointed by the price, and I mostly agree with the
| article on this matter.
|
| I can see some very specific use cases where the pricing should
| be OK, but currently I would say most of our users will just
| stay on classic S3 and add some local SSD caching if they have
| a lot of requests.
|
| [0] https://github.com/quickwit-oss/quickwit/
| mgaunard wrote:
| Many S3 implementations appear to simply be transparent downloads
| to disk rather than a true "use the network as a disk".
___________________________________________________________________
(page generated 2023-11-28 23:00 UTC)