[HN Gopher] S3 Express Is All You Need
       ___________________________________________________________________
        
       S3 Express Is All You Need
        
       Author : ryanworl
       Score  : 96 points
       Date   : 2023-11-28 19:04 UTC (3 hours ago)
        
 (HTM) web link (www.warpstream.com)
 (TXT) w3m dump (www.warpstream.com)
        
       | BonoboIO wrote:
        | Does anyone here have a use case that would perform better with
        | this new S3 Express tier?
        | 
        | And a second question: would it be worth the 8x surcharge?
        
         | parhamn wrote:
          | I think the key benefit touched on by this article is the
          | potential 10x improvement in access speed (which has many
          | applications beyond reducing your S3 op charges).
         | 
         | > S3 Express One Zone can improve data access speeds by 10x and
         | reduce request costs by 50% compared to S3 Standard and scales
         | to process millions of requests per minute.
        
         | paulddraper wrote:
         | A cache with large blobs (images, etc)
        
           | awoimbee wrote:
           | If it's only a cache it should be on EBS, which is still way
           | faster and 2x less expensive. I started a migration to s3 for
           | such a project (container image caching) but then stopped
           | when I realized what I was doing.
        
             | paulddraper wrote:
             | 1. You'd need an access/authentication layer on top of
             | that.
             | 
             | 2. Variable throughput may be a concern.
             | 
             | 3. You may have availability concerns.
        
             | rbranson wrote:
             | EBS attaches a single block storage volume to a single
             | host[1]. S3 Express is a service-based object store. Apples
             | and oranges.
             | 
             | [1] Yes, I am aware of multi-attach but this introduces a
             | scaling bottleneck and requires a fairly exotic setup.
        
             | YetAnotherNick wrote:
              | Yes, EBS is the gold standard, but managing EBS volumes to
              | scale up and down instantly, be available to multiple
              | instances, handle lifecycle management, manage replicas,
              | switch over, etc. is definitely not easy. And EBS is a bad
              | choice when the required throughput is very spiky.
        
         | barsandtones wrote:
          | This will work great with Mountpoint for S3, which AWS recently
          | released. It will outperform EFS if your application does not
          | require full POSIX compatibility.
        
       | tjoff wrote:
       | > _However, the new storage class does open up an exciting new
       | opportunity for all modern data infrastructure: the ability to
       | tune an individual workload for low latency and higher cost or
       | higher latency and lower cost with the exact same architecture
       | and code._
       | 
       | I get it, but at the same time that is also what you lost when
       | you locked yourself in with a particular vendor.
        
         | imheretolearn wrote:
         | > I get it, but at the same time that is also what you lost
         | when you locked yourself in with a particular vendor.
         | 
          | What other viable, practical alternatives are there?
        
           | toomuchtodo wrote:
            | Use a storage adapter that speaks the S3-compatible API to
            | the target, assuming you're not relying on vendor-specific
            | extensions or behavior (i.e. this).
           | 
           | Off the top of my head, Backblaze B2, Cloudflare R2, etc are
           | S3 compatible, and Minio locally.
           | 
           | https://www.google.com/search?q=s3+compatible
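            | 
            | A minimal sketch of that adapter approach in Python with
            | boto3 (the endpoint URL, credentials, and bucket name are
            | placeholders; the same client code can point at AWS S3,
            | MinIO, B2, or R2):
            | 
            |   import boto3
            | 
            |   # Point the standard S3 client at any S3-compatible
            |   # endpoint. The URL below is a placeholder.
            |   s3 = boto3.client(
            |       "s3",
            |       endpoint_url="https://s3.example-compatible-store.com",
            |       aws_access_key_id="YOUR_KEY",
            |       aws_secret_access_key="YOUR_SECRET",
            |   )
            | 
            |   # The rest of the code is identical regardless of vendor.
            |   s3.put_object(Bucket="my-bucket", Key="hello.txt",
            |                 Body=b"hello")
            |   obj = s3.get_object(Bucket="my-bucket", Key="hello.txt")
            |   print(obj["Body"].read())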
        
             | anamexis wrote:
             | There are no vendor specific extensions or behavior here,
             | are there? Isn't it just a different billing structure?
        
               | jacobr1 wrote:
                | Notifications, for event-processing architectures, aren't
                | part of the API common to these systems.
        
               | williamdclt wrote:
                | I suppose "super low latency" is behaviour, in the sense
                | that "a large enough quantitative difference is a
                | qualitative difference". If you rely on the performance
                | and only S3 provides it, then you are effectively locked
                | into S3's implementation.
        
           | Spooky23 wrote:
           | I used to run one on-prem from DDN. Another good one is
           | Nutanix. There are many out there.
           | 
           | If you have a big use case and you really understand your
           | needs, it's very doable.
        
         | influx wrote:
          | There's not much to the S3 API, and data import/export even at
          | massive scale is available with Snowball. Sure, there are many
          | other AWS services that aren't available from other vendors,
          | but blob storage is commoditized at this point.
        
           | amarshall wrote:
           | Exporting data from S3 is ludicrously expensive, even with
           | Snowball it's $30/TB just for network egress.
        
         | paulddraper wrote:
         | Except this has uniform billing, security, locality,
         | monitoring, tools, etc
        
           | tjoff wrote:
           | I did mention vendor lock in?
        
         | throwawaaarrgh wrote:
         | You can use a different vendor any time, it's all S3
         | compatible. You just don't get the same performance and
         | billing.
        
       | Sirupsen wrote:
       | Most production storage systems/databases built on top of S3
       | spend a significant amount of effort building an SSD/memory
       | caching tier to make them performant enough for production (e.g.
       | on top of RocksDB). But it's not easy to keep it in sync with
       | blob...
       | 
       | Even with the cache, the cold query latency lower-bound to S3 is
       | subject to ~50ms roundtrips [0]. To build a performant system,
       | you have to tightly control roundtrips. S3 Express changes that
       | equation dramatically, as S3 Express approaches HDD random read
       | speeds (single-digit ms), so we can build production systems that
       | don't need an SSD cache--just the zero-copy, deserialized in-
       | memory cache.
       | 
       | Many systems will probably continue to have an SSD cache (~100 us
       | random reads), but now MVPs can be built without it, and cold
       | query latency goes down dramatically. That's a big deal
       | 
       | We're currently building a vector database on top of object
       | storage, so this is extremely timely for us... I hope GCS ships
       | this ASAP. [1]
       | 
       | [0]: https://github.com/sirupsen/napkin-math [1]:
       | https://turbopuffer.com/
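        | 
        | A minimal sketch of that read path in Python with boto3, using a
        | plain dict as a stand-in for the in-memory cache (bucket and key
        | names are illustrative; with an up-to-date SDK the same GetObject
        | call should also work against an S3 Express directory bucket):
        | 
        |   import boto3
        | 
        |   s3 = boto3.client("s3")
        |   cache = {}  # in-memory cache; here just a dict
        | 
        |   def read_block(bucket: str, key: str) -> bytes:
        |       """Warm reads come from memory; cold reads pay one
        |       object-storage roundtrip."""
        |       cache_key = f"{bucket}/{key}"
        |       if cache_key in cache:
        |           return cache[cache_key]  # in-memory hit
        |       # ~50ms roundtrip on S3 Standard, single-digit ms on
        |       # S3 Express One Zone
        |       obj = s3.get_object(Bucket=bucket, Key=key)
        |       data = obj["Body"].read()
        |       cache[cache_key] = data
        |       return data
        | 
        |   # Usage (placeholder names):
        |   # chunk = read_block("my-index-bucket", "segments/000123.bin")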
        
         | jamesblonde wrote:
          | We built HopsFS-S3 [0] for exactly this problem, and have been
          | running it as part of Hopsworks for a number of years now. It's
          | a network-aware, write-through cache for S3 with an HDFS API.
          | Metadata operations are performed on HopsFS, so you don't have
          | the other problems, like list operations returning a maximum of
          | 1000 files/dirs.
         | 
          | NVMe is what is changing the equation, not SSD. NVMe disks now
          | reach up to 8 GB/s, although what the cloud providers offer
          | barely goes to 2 GB/s - and only on expensive instances. So,
          | instead of 40x better throughput than S3, we get something like
          | 10x. Right now, these workloads run much better on-premises on
          | the cheapest M.2 NVMe disks ($200 for 4TB with 4 GB/s
          | read/write) backed by an S3 object store like Scality.
         | 
         | [0] https://www.hopsworks.ai/post/faster-than-aws-s3
        
           | dekhn wrote:
            | The numbers you're giving are throughput (bytes/sec), not
            | latency.
            | 
            | The comment you're replying to is talking mostly about
            | latency - reporting that S3 Express object GET latencies
            | (time to open the object and return its head) are in the
            | single-digit ms, where S3 was ~50ms before.
           | 
           | BTW EBS can do 4GB/sec per volume. But you will pay for it.
        
       | throwitaway222 wrote:
        | I don't understand why EFS never gets major shout-outs - it's way
        | better than S3: systems can mount it as a drive, it's shared
        | across systems, and it already has super low latency... Not sure
        | what S3 Express is really useful for if EFS already exists.
        
         | candiddevmike wrote:
         | EFS is really expensive and has terrible latency with small
         | files in my experience
        
           | brazzledazzle wrote:
           | Yeah the main reason is that it's incredibly expensive. You
           | can improve performance by allocating ahead of time but NFS
           | has never been at its best when working with a bunch of tiny
           | files.
        
           | richieartoul wrote:
           | Do you have any more details you can share about the
           | performance of EFS? I've never met anyone who has actually
           | used it in anger.
        
             | gchamonlive wrote:
              | Throughput scales with the amount of data stored - it's in
              | the docs. So depending on the application, even if latency
              | is better, the speeds are atrocious at low volumes of
              | persisted data.
        
               | saddlerustle wrote:
               | That's not true anymore with EFS Elastic Throughput
        
             | a2tech wrote:
              | Yes, I built a moderately large system on it that used lots
              | of small shared files. The performance was fairly terrible.
              | There are weird little niggles with it--we had random
              | slowdowns, throughput issues, and things just didn't work
              | quite right.
             | 
             | It was an ok solution for what we were doing, but several
             | times I came really close to just dumping it and standing
             | up an NFS server using EBS volumes.
             | 
             | I also used it a couple of times to store webroots and that
             | was a complete disaster with systems that had lots of small
             | files (Drupal I'm looking at you).
        
           | huntaub wrote:
           | Note that EFS One Zone is priced the same as S3 Express One
           | Zone with similar latency. One isn't better or worse than the
           | other, it only depends on what kind of access your
           | application needs.
        
           | dekhn wrote:
           | When you set up EFS did you maximize the IO settings?
           | 
           | Before doing that it was unacceptably slow. After doing that
           | it was unacceptably expensive.
        
         | toomuchtodo wrote:
         | EFS exists if you don't care much about spend and performance
         | while having to forklift a POSIX compliant use case into AWS
         | for persistent data.
        
           | a2tech wrote:
            | That's basically how we were using it. It could have been
           | worse.
        
         | yeeeloit wrote:
         | I wonder if Mountpoint for S3 along with this new Express
         | option makes it a direct competitor to EFS for some use cases.
         | 
         | https://docs.aws.amazon.com/AmazonS3/latest/userguide/mountp...
        
           | tneely wrote:
           | I'm quite curious about this too - both from a cost and
           | performance perspective. If S3 Express is close enough to EFS
           | on these metrics, then I'd say it wins out due to the sheer
           | ubiquity and portability of S3 these days.
        
         | sparrc wrote:
         | In my experience the biggest drawback with EFS is startup time
         | for systems that mount it in.
         | 
         | For example a container or EC2 instance might only need a tiny
         | bit of your storage and with s3 can just download what it needs
         | when it needs it.
         | 
         | As opposed to EFS where the container or instance needs to load
         | in the entire datastore on startup which can add minutes to
         | startup time if the EFS drive is large.
        
           | dpedu wrote:
           | My understanding is that EFS is exposed as an NFS share. I
           | haven't used it personally, but NFS mounting is generally
           | fast, nearly instant. What does "load in the entire
           | datastore" mean?
        
             | ericpauley wrote:
             | EFS mounting is definitely nearly instant. I use it
             | constantly.
        
       | emgeee wrote:
        | Some additional context here is that WarpStream is building a
        | Kafka-compatible streaming system that uses S3 as the object
        | store. This allows them to leverage cheap zone-transfer costs for
        | redundancy + automatic storage tiering to cut down on the costs
        | of running and maintaining these systems. This has previously
        | come at the cost of latency due to S3's read/write speeds, but
        | S3 Express makes them more competitive with Confluent's managed
        | Kafka offerings for latency-sensitive applications.
       | 
       | IMO warpstream is a really cool product and this new S3 offering
       | makes them even better
        
         | refset wrote:
         | I am eager to hear how it will affect their latency numbers:
         | 
         | > Engineering is about trade-offs, and we've made a significant
         | one with WarpStream: latency. The current implementation has a
         | P99 of ~400ms for Produce requests because we never acknowledge
         | data until it has been durably persisted in S3 and committed to
         | our cloud control plane. In addition, our current P99 latency
         | of data end-to-end from producer-to-consumer is around 1s
         | 
         | via https://www.warpstream.com/blog/kafka-is-dead-long-live-
         | kafk...
        
       | fswd wrote:
        | I solved this problem locally. When a file is uploaded to the
        | server, it is cached in Redis before going to S3. Whenever the
        | codebase needs the file, it checks Redis first, and if it is not
        | there it fetches it from S3 and caches it again.
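        | 
        | A minimal sketch of that write-through pattern in Python with the
        | redis and boto3 clients (bucket name and TTL are illustrative):
        | 
        |   import boto3
        |   import redis
        | 
        |   s3 = boto3.client("s3")
        |   r = redis.Redis(host="localhost", port=6379)
        | 
        |   BUCKET = "my-bucket"  # placeholder
        |   TTL_SECONDS = 3600    # illustrative cache lifetime
        | 
        |   def upload(key: str, data: bytes) -> None:
        |       """Write-through: cache in Redis, then persist to S3."""
        |       r.set(key, data, ex=TTL_SECONDS)
        |       s3.put_object(Bucket=BUCKET, Key=key, Body=data)
        | 
        |   def fetch(key: str) -> bytes:
        |       """Serve from Redis if present; otherwise fetch from S3
        |       and re-cache."""
        |       cached = r.get(key)
        |       if cached is not None:
        |           return cached
        |       data = s3.get_object(Bucket=BUCKET,
        |                            Key=key)["Body"].read()
        |       r.set(key, data, ex=TTL_SECONDS)
        |       return data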
        
         | jamiesonbecker wrote:
         | Exactly. Write-through cache is exactly how Userify[0] used to
         | work for self-hosted versions. (when it was Python, we used
         | Redis to keep state synced across multiple processes, but now
         | that it's a Go app, we do all the caching and state management
         | in memory using Ristretto[1])
         | 
         | However, we now install by default to local disk filesystem,
         | since it's much faster to just do a periodic S3 hot sync, like
         | with restic or aws-cli, than to treat S3 as the primary backing
         | store, or just version the EBS or instance volume. The other
         | reason you might want to use S3 as a primary is if you use a
         | lot of disk, but our files are compressed and extremely small,
         | even for a large installation with tens of thousands of users
         | and instances.
         | 
         | 0. https://userify.com (ssh key management + sudo for teams)
         | 
         | 1. https://github.com/dgraph-io/ristretto
        
       | osti wrote:
        | If I'm not mistaken, this is the low-latency S3 that is written
        | in Rust. Finally launched after years in the making.
        
         | FridgeSeal wrote:
         | Do you have any sources for that? Very interested to know more
         | about this.
        
           | osti wrote:
            | Unfortunately I don't; this is internal information that I'm
            | not sure I should even be sharing here. I never worked on S3,
            | and I no longer work at AWS, so someone from within would
            | have to weigh in.
        
         | paulddraper wrote:
         | Surely being written in a non-Rust language is not responsible
         | for an extra 40ms of latency, right?
         | 
         | Or is rust really that magic?
        
           | osti wrote:
            | Of course not; it's designed differently from the original
            | S3. AWS came out with this to compete with Azure premium blob
            | storage, which has very good first-byte latency - and Azure
            | had it 4 years ago.
           | 
           | https://azure.microsoft.com/en-us/blog/premium-block-blob-
           | st...
        
         | estebarb wrote:
          | ShardStore? (More info: https://www.thestack.technology/aws-
          | shardstore-s3/ ) It seems that it was deployed years ago.
        
       | francoismassot wrote:
       | We tested S3 Express for our search engine quickwit [0] a couple
       | of weeks ago.
       | 
       | While this was really satisfying on the performance side, we were
       | a bit disappointed by the price, and I mostly agree with the
       | article on this matter.
       | 
       | I can see some very specific use cases where the pricing should
       | be OK but currently, I would say most of our users will just stay
       | on the classic S3 and add some local SSD caching if they have a
       | lot of requests.
       | 
       | [0] https://github.com/quickwit-oss/quickwit/
        
       | mgaunard wrote:
       | Many S3 implementations appear to simply be transparent downloads
       | to disk rather than a true "use the network as a disk".
        
       ___________________________________________________________________
       (page generated 2023-11-28 23:00 UTC)