Database Architects
Monday, February 19, 2024

SSDs Have Become Ridiculously Fast, Except in the Cloud

In recent years, flash-based SSDs have largely replaced disks for most storage use cases. Internally, each SSD consists of many independent flash chips, each of which can be accessed in parallel. Assuming the SSD controller keeps up, the throughput of an SSD therefore primarily depends on the interface speed to the host. In the past six years, we have seen a rapid transition from SATA to PCIe 3.0 to PCIe 4.0 to PCIe 5.0. As a result, there was an explosion in SSD throughput:

[Figure: ssd-bandwidth]

At the same time, we saw not just better performance, but also more capacity per dollar:

[Figure: ssd-capacity]

The two plots illustrate the power of a commodity market. The combination of open standards (NVMe and PCIe), huge demand, and competing vendors led to great benefits for customers. Today, top PCIe 5.0 data center SSDs such as the Kioxia CM7-R or Samsung PM1743 achieve up to 13 GB/s read throughput and 2.7M+ random read IOPS. Modern servers have around 100 PCIe lanes, making it possible to put a dozen SSDs (each usually using 4 lanes) in a single server at full bandwidth. For example, in our lab we have a single-socket Zen 4 server with 8 Kioxia CM7-R SSDs, which achieves 100 GB/s (!) I/O bandwidth:

[Figure: iostat output]

AWS EC2 was an early NVMe pioneer, launching the i3 instance with 8 physically-attached NVMe SSDs in early 2017. At that time, NVMe SSDs were still expensive, and having 8 in a single server was quite remarkable. The per-SSD read (2 GB/s) and write (1 GB/s) performance was considered state of the art as well. Another step forward occurred in 2019 with the launch of i3en instances, which doubled storage capacity per dollar. Since then, several NVMe instance types, including i4i and im4gn, have been launched. Surprisingly, however, the performance has not increased; seven years after the i3 launch, we are still stuck with 2 GB/s per SSD. Indeed, the venerable i3 and i3en instances remain the best EC2 has to offer in terms of IO/$ and SSD capacity/$, respectively.

Personally, I find this very surprising given the SSD bandwidth explosion and cost reductions we have seen on the commodity market. At this point, the performance gap between state-of-the-art SSDs and those offered by major cloud vendors, especially in read throughput, write throughput, and IOPS, is nearing an order of magnitude. (Azure's top NVMe instances are only slightly faster than AWS's.) What makes this stagnation in the cloud even more surprising is that we have seen great advances in other areas. For example, during the same 2017 to 2023 time frame, EC2 network bandwidth exploded, increasing from 10 Gbit/s (c4) to 200 Gbit/s (c7gn).

Now, I can only speculate why the cloud vendors have not caught up on the storage side:

* One theory is that EC2 intentionally caps the write speed at 1 GB/s to avoid frequent device failure, given that the total number of writes per SSD is limited. However, this does not explain why the read bandwidth is stuck at 2 GB/s.
* A second possibility is that there is no demand for faster storage because very few storage systems can actually exploit tens of GB/s of I/O bandwidth. See our recent VLDB paper. On the other hand, as long as fast storage devices are not widely available, there is also little incentive to optimize existing systems.
* A third theory is that if EC2 were to launch fast and cheap NVMe instance storage, it would disrupt the cost structure of its other storage service (in particular EBS). This is, of course, the classic innovator's dilemma, but one would hope that one of the smaller cloud vendors would take this step to gain a competitive edge.
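As a rough sanity check on the bandwidth figures above, here is a small back-of-the-envelope sketch in Python. The per-lane PCIe rates and the ~12.5 GB/s sustained per-drive figure are approximations on my part, not numbers from the measurements, but they illustrate why a PCIe 5.0 x4 drive can advertise ~13 GB/s reads and why eight of them add up to roughly the 100 GB/s shown in the iostat screenshot:

```python
# Back-of-the-envelope PCIe/SSD bandwidth arithmetic (approximate figures).

# Usable per-lane bandwidth in GB/s, one direction, after link encoding.
PCIE_GBPS_PER_LANE = {
    "PCIe 3.0": 0.985,   # 8 GT/s, 128b/130b encoding
    "PCIe 4.0": 1.969,   # 16 GT/s
    "PCIe 5.0": 3.938,   # 32 GT/s
}
LANES_PER_SSD = 4  # typical NVMe SSD link width

for gen, per_lane in PCIE_GBPS_PER_LANE.items():
    print(f"{gen} x4 link limit: {per_lane * LANES_PER_SSD:5.1f} GB/s")
# PCIe 5.0 x4 tops out around 15.8 GB/s, which is why a drive like the
# Kioxia CM7-R can advertise ~13 GB/s reads.

# Eight such drives use only 32 of the ~100 PCIe lanes of a modern server
# and, at a sustained ~12.5 GB/s each, saturate roughly 100 GB/s in total:
print(f"8 SSDs x ~12.5 GB/s = ~{8 * 12.5:.0f} GB/s aggregate")

# Compare with an EC2 NVMe instance stuck at ~2 GB/s per SSD:
print(f"Per-SSD read-bandwidth gap: ~{13 / 2:.1f}x")
```

These are only link limits, of course; the drives themselves are also bounded by their controllers and flash chips, but today's top drives get fairly close to them.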
Overall, I'm not fully convinced by any of these three arguments. Actually, I hope that we'll soon see cloud instances with 10 GB/s SSDs, making this post obsolete.

Posted by Viktor Leis at 9:00 AM

9 comments:

1. Anonymous (February 20, 2024 at 6:42 PM): Related Hacker News discussion: https://news.ycombinator.com/item?id=39443679

2. AdamK (February 20, 2024 at 7:27 PM): Cloud providers buy only high-capacity drives, so they have less transfer speed per TB of storage available (compared to commodity drives). If the drive is shared between multiple VMs, throughput is shared between them, and they have to obey SLAs. This could also be a way to manage wear of the drives.

3. Anonymous (February 20, 2024 at 7:54 PM): Fucking nerds. Fuck your databases.
   * Anonymous (February 20, 2024 at 9:06 PM): Reason?
   * Anonymous (February 20, 2024 at 10:54 PM): databases must have fucked your mother for you to be so angry

4. Anonymous (February 20, 2024 at 9:19 PM): I've seen speeds of 60k IOPS and 7 GB/s on Akamai Cloud/Linode, even in the $5 Nanodes. It varies depending on the class of system you get: you get a random Zen 2 or Zen 3 class core, and the better disks are on the Zen 3 instances. Still pretty slow for databases compared to bare metal. The fractional vCPUs they sell are comparable with the disk difference. Cloud resources are a pretty bad deal right now and don't reflect the gains from the last 3 years, which have been huge on the CPU side too.

5. Anonymous (February 20, 2024 at 9:20 PM): Interesting. I can say that locally on my workstation, SSDs are beneficial for searching with tools like FileSearchEX, where one has to load the contents of thousands of files into memory over and over to find keywords. But I wonder, as the HN article states, is the reason a protocol in front of the actual drives?

6. Anonymous (February 20, 2024 at 11:12 PM): Cloud just means someone else's hardware, and your virtual machine shares I/O with everyone. If you want faster, use your own hardware, or keep syncing your cloud instance around until you find hardware without noisy neighbors.

7. Anonymous (February 20, 2024 at 11:36 PM): This link isn't directly salient to this post, as it's about CXL adoption, but the preamble does a nice job framing why folks see what they see with cloud services re: performance. https://blog.jrlabs.io/posts/2024-02-19-what-is-cxl/