[HN Gopher] Waiting for Postgres 18: Accelerating Disk Reads wit...
___________________________________________________________________
Waiting for Postgres 18: Accelerating Disk Reads with Asynchronous
I/O
Author : lfittl
Score : 342 points
Date : 2025-05-07 14:57 UTC (8 hours ago)
(HTM) web link (pganalyze.com)
(TXT) w3m dump (pganalyze.com)
| nu11ptr wrote:
| Is this new async I/O feature for Linux only?
|
| I know Windows has IOCP and also now an IORing implementation of
| its own (Less familiar with macOS capabilities other than POSIX
| AIO).
|
| https://learn.microsoft.com/en-us/windows/win32/api/ioringap...
|
| Update: Most of the comments below seem to be missing the fact
| that Windows now also has an IORing implementation, as I
| mentioned above. Comparison article here:
|
| https://windows-internals.com/ioring-vs-io_uring-a-compariso...
| stingraycharles wrote:
| Sounds like this feature is based on io_uring which is a Linux
| feature. I would be surprised if they implemented async io on
| Windows before they would on Linux given the user/deployment
| base being very Linux-heavy.
| PaulHoule wrote:
| For a long time there have been APIs on the books for doing
| asynchronous file I/O on Linux, but they weren't worth using
| because they didn't really speed anything up.
| immibis wrote:
| IIRC they literally just did the sync I/O on a worker
| thread.
| anarazel wrote:
| They sped up things for a long time - but only when using
| unbuffered IO. The new thing with io_uring is that it also
| accelerates buffered IO. In the initial version it was all
| through kernel worker threads, but these days several
| filesystems have better paths for common cases.
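|
| To make that concrete, here's a minimal illustrative sketch of a
| single buffered read submitted through liburing (not Postgres
| code; the file name and buffer size are made up and error
| handling is trimmed):
|
|     #include <fcntl.h>
|     #include <liburing.h>
|     #include <stdio.h>
|
|     int main(void)
|     {
|         struct io_uring ring;
|         io_uring_queue_init(8, &ring, 0);          /* ring with 8 entries */
|
|         int fd = open("datafile", O_RDONLY);       /* plain buffered fd */
|         char buf[8192];
|
|         struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
|         io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
|         io_uring_submit(&ring);                    /* one syscall to submit */
|
|         struct io_uring_cqe *cqe;
|         io_uring_wait_cqe(&ring, &cqe);            /* reap the completion */
|         printf("read returned %d\n", cqe->res);
|         io_uring_cqe_seen(&ring, cqe);
|
|         io_uring_queue_exit(&ring);
|         return 0;
|     }
|
| The win for buffered IO is that the kernel can complete this in
| the background (via its worker threads or the improved
| filesystem paths mentioned above) while the application keeps
| submitting more requests.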
| p_ing wrote:
| Yeah, surprise - Linux had to play catch-up to a Windows 1994
| release! Same with the scheduler; and I'd argue Windows does OOM
| handling better than Linux today...
|
| Windows even had the concept of io_uring before, but network
| only with Registered I/O back in the Windows 8 (8.1?) days.
|
| Linux still lacks the "all I/O is async" model that NT has.
|
| The underlying kernel and executive of Windows aren't
| primitive pieces of trash. They're quite advanced, ruined by
| PMs and the Ads division throwing crap on top.
|
| And yes, Windows' I/O Ring is a near 1:1 copy of the Linux
| implementation, but IOCP/OVERLAPPED I/O data structure
| preceded it since NT's first release.
|
| This isn't a pissing match, just we all hope that kernel devs
| learn from each other and take the best ideas. Sometimes we,
| IT, don't get to choose the OS we run -- it's dictated by the
| apps the business requires.
| greenavocado wrote:
| How difficult would it be to completely tear out the
| Windows desktop experience and just use the system and
| display drivers without the rest? Has anybody attempted
| such a feat?
| dboreham wrote:
| There are several such things. The Windows installer uses
| one. X-Box uses another.
| p_ing wrote:
| There is Windows Server Core which removes everything but
| a CLI, but you still have the normal login experience,
| you still have a "desktop" (no start menu, taskbar, etc),
| you can still launch normal Win32 apps... for the most
| part (task manager, notepad, and so on).
|
| Win32 is also responsible for core Services, which means
| you can't de-Windows-ify Windows and strip it down to an
| NT API-only. All other personalities (OS/2, POSIX, SFU)
| have a dependency on Win32, as well.
|
| You're still running the WindowServer of course; it's
| part of the Executive.
|
| That said, with a bunch of modifications, NTDEV did get
| Windows 11 down to its bare minimum, and text only to
| boot. So I guess it's technically possible, though not
| useful.
|
| https://www.youtube.com/watch?v=SL6t_iuitxM
| smileybarry wrote:
| > There is Windows Server Core which removes everything
| but a CLI, but you still have the normal login
| experience, you still have a "desktop" (no start menu,
| taskbar, etc), you can still launch normal Win32 apps...
| for the most part (task manager, notepad, and so on).
|
| Yep, they've replaced nearly every UI with text (the
| login window is a TUI), though there's still some shell
| DLLs and the whole thing still uses a window manager.
| That's honestly for the best, since it allows you to
| migrate full installations with some UI-based apps to a
| Core installation with them intact.
|
| > That said, with a bunch of modifications, NTDEV did get
| Windows 11 down to its bare minimum, and text only to
| boot. So I guess it's technically possible, though not
| useful.
|
| Windows has had a "text mode" since at least Windows XP
| IIRC, but it's really not that useful, if at all. Even
| for rescue operations you're better off with Windows PE.
| Ericson2314 wrote:
| There is always https://en.wikipedia.org/wiki/ReactOS !
| :)
| password4321 wrote:
| Free-as-in-beer Hyper-V Server 2019 is in extended
| support (only security updates) until 2029.
| cyberax wrote:
| > Same with the scheduler
|
| Windows does OOM far better than Linux because it doesn't
| really overcommit RAM.
|
| But the CPU _scheduler_ in Linux is far, far, far better
| than in Windows. Linux can even do hard-realtime, after
| all.
| nu11ptr wrote:
| > this feature is based on io_uring which is a Linux feature
|
| And now also a Windows feature, see my comment above for info
| lfittl wrote:
| It depends on the I/O method - as described in the article,
| "io_uring" is only available on Linux (and requires building
| with liburing, as well as io_uring being enabled in the
| kernel), but the default (as of beta1) is actually "worker",
| which works on any operating system.
|
| The "worker" method uses a dedicated pool of I/O worker
| processes that run in the background, and whilst not as
| performant as io_uring in our benchmark, it did clearly outperform
| the "sync" method (which is the same as what Postgres currently
| has in 17 and older).
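|
| For anyone who wants to experiment, the relevant settings look
| roughly like this in postgresql.conf (beta1 behaviour as
| described above - names and defaults could still change before
| the final release):
|
|     io_method = worker    # 'sync', 'worker', or 'io_uring' (Linux, built with liburing)
|     io_workers = 3        # size of the I/O worker pool when io_method = worker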
| nu11ptr wrote:
| > "io_uring" is only available on Linux
|
| Windows now also has IORing (see my comment above)
| spwa4 wrote:
| Yes, although Windows has had async I/O since Windows NT 3.1,
| their API is still not supported by Postgres.
| nu11ptr wrote:
| Yes, that was IOCP, however, Windows now also has IORing (see
| my comment above)
| anarazel wrote:
| FWIW, there are prototype patches for an IOCP based
| io_method. We just couldn't get them into an acceptable state
| for PG 18. I barely survived getting in what we did...
| niux wrote:
| I recently deployed Postgres on a dedicated Hetzner EX-44 server
| (20 cores, 64GB RAM, 2x 512GB NVMe SSDs in RAID 1) for
| EUR39/month. The price-to-performance ratio is exceptional,
| providing enterprise-level capacity at a fraction of typical
| cloud costs.
|
| For security, I set up Tailscale, which adds only ~5ms of
| latency while completely eliminating public network exposure - a
| worthwhile tradeoff for the significant security benefits.
|
| My optimization approach includes:
|
| - Workload-specific configuration generated via PGTune
| (https://pgtune.leopard.in.ua/)
|
| - Real-time performance monitoring with PgHero for identifying
| bottlenecks
|
| - Automated VACUUM ANALYZE operations scheduled via pg_cron
| targeting write-heavy tables, which prevents performance
| degradation and helps me sleep soundly
|
| - A custom CLI utility I built for ZSTD-compressed backups that
| achieves impressive compression ratios while maintaining high
| throughput, with automatic S3 uploading:
| https://github.com/overflowy/pgbackup
|
| This setup has been remarkably stable and performant, handling
| our workloads with substantial headroom for growth.
| trollied wrote:
| That's great, but you need a solid HA/backup and recovery
| strategy if you even remotely care about your data.
| Tostino wrote:
| I would absolutely use another backup utility if I were you
| (barman, pgbackrest, etc.) - in addition to yours, if you want.
|
| You are just wrapping pg_dump, which is not a full-featured
| backup solution. Great for a snapshot...
|
| Use some of the existing tools and you get point-in-time
| recovery, easy restores to hot standbys for replication, a good
| failover story, backup rotations, etc.
| niux wrote:
| The reason I wrote my own tool is because I couldn't find
| anything for Pg17 at the time and pgbackrest seemed overkill
| for my needs. Also, the CLI handles backup rotations as well.
| Barman looks interesting though, I'll definitely have a look,
| thanks!
| Tostino wrote:
| pgbackrest was always easy to use in my experience. Not
| very hard to set up or configure, and low overhead. Supports
| spool directories for WAL shipping, compression, and block
| incremental backups (YAY!!!). I ran my last company on it
| for the last ~6 years I was there. Never any complaints,
| solid software (which is what you want for backups).
|
| I have been using barman indirectly through CloudNativePG
| with my latest company, but don't have the operational
| experience to speak on it yet.
| 9dev wrote:
| pgbackrest only looks scary because it's so flexible, but
| the defaults work great in almost all cases. The most
| complex thing you'll need to do is create a storage
| bucket to write to and configure the appropriate storage
| provider in pgbackrest's config file.
|
| When it's set up properly, it's solid as a rock. I'd really
| recommend you check it out again; it likely solves
| everything you did more elegantly, and also covers a ton of
| things you didn't think of. Been there, done that :)
| arp242 wrote:
| Snapshots are backups. Not sufficient backups for some cases,
| but perfectly fine for others.
| lousken wrote:
| do you still need it? haven't managed pg for a while, but
| shouldn't pg17 have some solution for backups?
| codegeek wrote:
| Would you be willing to open source the setup ? I would love to
| learn from it.
| digdugdirk wrote:
| Seconded. This corner of the programming world is a deep dark
| and scary place for people who haven't had solid industry
| experience. It'd be hugely helpful to have a barebones
| starting point to begin learning best practices.
| the8472 wrote:
| On linux there also is preadv2(..., RWF_NOWAIT) which can be used
| to do optimistic non-blocking read from the page cache. That
| might be useful for io_method = worker to shave off a bit of
| latency. Try reading on the main thread with NOWAIT and only
| offload to a worker thread when that fails.
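|
| For readers unfamiliar with that flag, a minimal illustrative
| sketch of the idea (the helper name and fallback policy are
| made up - this is not Postgres code):
|
|     #define _GNU_SOURCE
|     #include <errno.h>
|     #include <sys/uio.h>
|
|     /* Returns bytes read, or -1 if the caller should fall back to
|        a blocking/offloaded read path. */
|     ssize_t try_cached_read(int fd, void *buf, size_t len, off_t off)
|     {
|         struct iovec iov = { .iov_base = buf, .iov_len = len };
|         ssize_t n = preadv2(fd, &iov, 1, off, RWF_NOWAIT);
|         if (n >= 0)
|             return n;          /* data was already in the page cache */
|         if (errno == EAGAIN)
|             return -1;         /* not cached: offload to an I/O worker */
|         return -1;             /* real error; report it in real code */
|     }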
| anarazel wrote:
| FWIW, I played with that - unfortunately it seems that the
| overhead of doing the page cache lookup twice is a cure worse
| than the disease.
|
| Note that we do not offload IO to workers when doing I/O that
| the caller will synchronously wait for, just when the caller
| actually can do IO asynchronously. That reduces the need to
| avoid the offload cost.
|
| It turns out, as some of the results in Lukas' post show, that
| the offload to the worker is often actually beneficial
| _particularly_ when the data is in the kernel page cache - it
| parallelizes the memory copy from kernel to userspace and
| postgres' checksum computation. Particularly on Intel server
| CPUs, which have had pretty mediocre per-core memory bandwidth
| in the last ~ decade, memory bandwidth turns out to be a
| bottleneck for page cache access and checksum computations.
|
| Edit: Fix negation
| gavinray wrote:
| Do you think there's a possibility of Direct IO being adopted
| at some point in the future now that AIO is available?
| anarazel wrote:
| > Do you think there's a possibility of Direct IO being
| adopted at some point in the future now that AIO is
| available?
|
| Explicitly a goal.
|
| You can turn it on today, with a bunch of caveats (via
| debug_io_direct=data). If you have the right workload -
| e.g. read-only with lots of seqscans, bitmap index scans, etc.
| - you can see rather substantial perf gains. But it'll suck
| in many cases in 18.
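|
| (Illustrative postgresql.conf line for the above - to be clear,
| a developer option, not a production recommendation:)
|
|     debug_io_direct = data   # use direct I/O for relation data files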
|
| We need at least:
|
| - AIO writes in checkpointer, bgwriter and backend buffer
| replacement (think bulk loading data with COPY)
|
| - readahead support in a few more places, most crucially
| index range scan (works out ok today if the heap is
| correlated with the index, sucks badly otherwise)
|
| EDIT: Formatting
| the8472 wrote:
| Ah yeah, getting good kernel<>userspace oneshot memcpy
| performance for large files is surprisingly hard. mmap has
| setup/teardown overhead that's significant for oneshot
| transfers, regular read/write calls suffer from page
| cache/per page overhead. Hopefully all the large folio work
| in the kernel will help with that.
| anarazel wrote:
| From what I've seen a surprisingly large part of the
| overhead is due to SMAP when doing larger reads from the
| page cache - i.e. if I boot with clearcpuid=smap (not for
| prod use!), larger reads go _significantly_ faster. On both
| Intel and AMD CPUs interestingly.
|
| On Intel it's also not hard to simply reach the per-core
| memory bandwidth with modern storage HW. This matters most
| prominently for writes by the checkpointing process, which
| needs to compute data checksums given the current postgres
| implementation (if enabled). But even for reads it can be a
| bottleneck, e.g. when prewarming the buffer pool after a
| restart.
| yxhuvud wrote:
| Well, nowadays there is
| https://www.phoronix.com/news/Linux-RWF_UNCACHED-2024
| the8472 wrote:
| That doesn't speed up userspace<>kernel memcpy, it just
| reduces cache churn. Despite its name it still goes
| through the page cache, it just triggers writeback and
| drops the pages once that's done. For example when
| copying to a tmpfs it makes zero difference since that
| lives entirely in memory.
| senderista wrote:
| So you're less dependent on the page replacement
| algorithm being scan-resistant, since you can use this
| flag for scan/loop workloads, right?
| zX41ZdbW wrote:
| The usage in ClickHouse database:
| https://github.com/ClickHouse/ClickHouse/blob/d2697c0ea112fd...
|
| Aside from a few problems in specific Linux kernel versions, it
| works great.
| pseudopersonal wrote:
| Does anyone know when the update allowing more concurrent
| connections is dropping, so we can stop using pgbouncer?
| __s wrote:
| That'll likely need moving away from the process-per-connection
| model, so not any time soon
| skeptrune wrote:
| How close is this to the way MySQL does it with InnoDB? It
| appears to be about the same.
| seunosewa wrote:
| Yep. It's a low hanging fruit they should've picked years ago.
|
| They will eventually figure out using b-trees for tables too.
| ARandomerDude wrote:
| Can you elaborate on this B-tree part of your comment? I know
| B-tree is the default index type in pg, but it sounds like
| there's more to the story that I'm not familiar with.
| greenavocado wrote:
| PostgreSQL stores table data in heap files (unordered
| collections of pages/blocks), not B-trees. Indexes (including
| primary key indexes) use B-trees (specifically B+ trees). When
| you query a table via an index, the B-tree index points to
| locations in the heap file.
|
| InnoDB uses a clustered index approach. The primary key
| index is a B-tree. The actual table data is stored in the
| leaf nodes of this B-tree. Secondary indexes point to the
| primary key.
|
| One is not better than the other in general terms. InnoDB's
| clustered B-tree approach shines when:
|
| You frequently access data in primary key order
|
| Your workload has many range scans on the primary key
|
| You need predictable performance for primary key lookups
|
| Your data naturally has a meaningful ordering that matches
| your access patterns
|
| PostgreSQL's heap approach excels when:
|
| You frequently update non-key columns (fewer page
| splits/reorganizations)
|
| You have many secondary indexes (they don't embed the
| primary key, so they're smaller)
|
| Your access patterns vary widely and don't follow one
| particular field
|
| You need faster table scans when indexes aren't applicable
|
| I personally find PostgreSQL's approach more flexible for
| complex analytical workloads with unpredictable access
| patterns, while InnoDB's clustered approach feels more
| optimized for OLTP workloads with predictable key-based
| access patterns. The "better" system depends entirely on
| your specific workload, data characteristics, and access
| patterns.
| farazbabar wrote:
| Don't forget high-speed committed writes to append-only
| tables (the opposite of scans); the Postgres approach is
| better here as well.
| saltcured wrote:
| It's also deeply entwined with the MVCC concurrency
| control and the ability to do DDL in transactions, right?
| jiggawatts wrote:
| SQL Server supports every combination of heap storage,
| clustered storage, MVCC, and DDL in transactions.
| Sesse__ wrote:
| Indexes that point directly to the row's position on disk are
| also significantly faster to access; it is a persistent pain
| point for OLAP on InnoDB that all secondary indexes are
| indirect. You can work around it by adding additional
| columns to the index to make your lookups covering, but
| it's kludgy and imprecise and tends to bloat the index
| even further. (The flip side is that if you have tons of
| indexes, and update some unrelated column, InnoDB doesn't
| need to update those indexes to point to the location of
| the new row. But I'm generally very rarely annoyed by
| that in comparison.)
| tehlike wrote:
| There's a lot more PG needs to do from a storage layout /
| access point of view, and they have been working on it for a
| while. OrioleDB has shown what might be possible, and they have
| been upstreaming it.
|
| Having the ability to use something LSM-like as a storage
| engine would be great - and potentially allow better compression
| than what we currently get with TOAST - which is not a lot...
| PG doesn't even have out-of-the-box page compression...
| shayonj wrote:
| Very nicely written post! I'd love to start running these in
| production on NVMe and hope it's something major cloud providers
| start to offer ASAP. The performance gains are _extremely_
| attractive
| martinald wrote:
| I sort of had to chuckle at the 20k IOPS AWS instance, given even
| a consumer $100-200 NVMe gives ~1 million+ IOPS these days. I
| suspect now we have PCIe 5.0 NVMes this will go up even further.
|
| I always do wonder how much "arbitrary" cloud limits on things
| like this cause so many issues. I'm sure that async IO is very
| helpful anyway, but I bet on a 1million IOPS NVMe it is nowhere
| near as important.
|
| We're effectively optimising critical infrastructure tech for
| ~2010 hardware because that's when big cloud got going and there
| have been so few price reductions on things since then vs the
| underlying hardware costs.
|
| Obviously a consumer NVMe is not "enterprise" but my point is we
| are 3+ orders of magnitude off performance on cheap consumer
| hardware vs very expensive 'enterprise' AWS/big cloud costs.
| binary132 wrote:
| Meanwhile people are running things on raspberry pi home
| clusters thinking they're winning
| brulard wrote:
| Maybe they are. With NVMe hat, you get decent IO performance
| adgjlsfhk1 wrote:
| It's still moderately bad. Raspberry pi is limited to 2
| gen3 pcie lanes which is ~4-8x slower than the drive (and
| you will likely be further limited by cpu speed)
| lfittl wrote:
| Yep, I find cloud storage performance to be quite frustrating,
| but it's the reality for many production database deployments
| I've seen.
|
| It's worth noting that even on really fast local NVMe drives the
| new asynchronous I/O work delivers performance benefits, since
| it's so much more efficient at issuing I/Os and reducing syscall
| overhead (for io_uring).
|
| Andres Freund (one of the principal authors of the new
| functionality) did a lot of benchmarking on local NVMe drives
| during development. Here is one mailing list thread I could find
| that shows a 2x and better benefit with the patch set at the
| time: https://www.postgresql.org/message-
| id/flat/uvrtrknj4kdytuboi...
| anonymars wrote:
| > given even a consumer $100-200 NVMe gives ~1million+ IOPS
| these days
|
| In the face of sustained writes? For how long?
| merb wrote:
| Sustained reads would not even give 1 million IOPS in that
| case. Maybe when you only read the same file that fits into the
| NVMe cache, which probably never happens in a production
| database...
| p_ing wrote:
| The Samsung 9910 has 1 GB of LPDDR4X cache per TB of capacity.
| I won't pretend to understand the magic NVMe drives
| possess, but if you got a 4TB or 8TB 9910, could you not in
| theory pull in all of the data you require to cache?
|
| I would assume, and it might be a poor assumption, that
| NVMe controllers don't pull in files, but rather blocks, so
| even if you had a database that exceeded cache size, in
| theory if the active blocks of that database did not exceed
| cache size, it could be "indefinitely" cached for a read-
| only pattern.
| wtallis wrote:
| The DRAM on a SSD like that isn't for caching user data,
| it's for caching the drive's metadata about which logical
| blocks (as seen by the OS) correspond to which physical
| locations in the flash memory.
| jauntywundrkind wrote:
| I think you'd be surprised. Sustained write performance has
| gotten pretty good. Decent but not fancy consumer drives
| will often do 1GBps sustained, for bulkier writes. That's
| much better than we used to expect: flash has gotten much
| better with so many layers! This mid-range PCIe5 drive
| sustains a nice 1.5GBps:
| https://www.techpowerup.com/review/team-group-ge-
| pro-2-tb/6....
|
| I don't think sustained reads are a problem? Benches like
| the CrystalDiskMark do a full disk random read test;
| they're designed to bust through cache afaik. 7.2 GB/s of 4k
| reads would translate to ~1.8M IOPS. Even if this is
| massively optimistic, you need to slash a _lot_ of zeroes /
| orders of magnitude to get down to 20k IOPS, which you will
| also pay >$100/mo for.
| __s wrote:
| Even worse on Azure where we had to ask customers to scale up
| vcpu to increase iops
|
| https://azure.microsoft.com/en-us/pricing/details/managed-di...
|
| Increasing vcpu also opened up more disk slots to try to
| improve the situation with disk striping
| maherbeg wrote:
| instance store on aws can give up to 3.3mil iops
| https://aws.amazon.com/blogs/aws/now-available-i3-instances-...
| - the main problem is just using networked storage.
| perching_aix wrote:
| Instance store is also immediately wiped when the instance is
| halted / restarted, which can theoretically happen at any
| time, for example by a mystery instance failure, or a
| patching tool that's helpfully restarting your boxes during
| offhours.
| slashdev wrote:
| My understanding is this is not true - only when the instance
| permanently fails and is moved.
| the8472 wrote:
| stop or hibernate kills it. https://docs.aws.amazon.com/A
| WSEC2/latest/UserGuide/instance...
| slashdev wrote:
| Yeah, so restart does not.
|
| Which means you can count on it about as much as a server
| of your own, if you could not repair the server.
|
| I know a database company that uses instance storage as
| the primary storage. It's common.
| coder543 wrote:
| That quoted IOPS number is only with an 8-disk stripe
| (requiring the full instance), even if you don't need 488GB
| of RAM or a $3600/mo instance, I believe.
|
| The per-disk performance is still nothing to write home
| about, and 8 _actually fast_ disks would blow this instance
| type out of the water.
| the8472 wrote:
| The NVMe on other instance types is quite throttled. E.g. on
| a G5.4xlarge instance EBS is limited to 593MB/s and 20000IOPS
| while instance-attached NVMe is limited to 512MB/s (read) at
| 125000 IOPS, a fraction of the IO a workstation or gaming PC
| with a similar GPU and RAM would have. And stopping the
| instance wipes it, which means you can't do instance warmup
| with those, everything must be populated at boot.
| codegeek wrote:
| You probably already know this but I will say it anyway. Cloud
| services like AWS are not succeeding in the enterprise because
| of their hardware (which is often outdated). They succeed
| because, in the enterprise, CIOs and CTOs want something that is
| known, has a brand, and that everyone else uses. It's like the
| old adage of "No
| one got fired for using IBM". Now it is "No one gets fired for
| hosting with AWS no matter how ridiculous the cost and
| corresponding feature is".
| the8472 wrote:
| > No one gets fired for hosting with AWS
|
| But consider the counterfactual: Non-realized customers
| because AWS certified solutions architect(tm) software
| couldn't deliver the price/perf they would have needed.
|
| At $work this is a very real problem because a software
| system was built on api gateway, lambdas, sqs and a whole
| bunch of other moving pieces (serverless! scalable! easy
| compliance!) that combined resulted in way too much latency
| to meet a client's goal.
| 9dev wrote:
| > No one gets fired for hosting with AWS no matter how
| ridiculous the cost and corresponding feature is
|
| Actually, AWS is so expensive that hosting everything we ran on
| Hetzner there would have simply depleted our funding, and the
| company would not exist anymore.
| gopalv wrote:
| > had to chuckle at the 20k IOPS AWS instance, given even a
| consumer $100-200 NVMe gives ~1million+ IOPS these days
|
| The IOPS figure usually hides the fact that it is not a single
| I/O that is really fast, but a collection of them.
|
| Higher IOPS is generally best achieved by reducing the latency
| of a single operation, but it's the average latency that
| actually contributes to the "fast query" experience, because the
| next IO often branches off the result of the last one (like an
| index or filter lookup).
|
| As more and more disk-to-CPU connectivity goes over the
| network, we can still deliver large IOPS numbers even when we
| have very high latencies (by spreading the data across hundreds
| of SSDs and routing it fast), because with network storage we
| pay a huge latency cost for durability of the data, simply
| because of location diversification.
|
| Every foot is a nanosecond, approximately.
|
| That tradeoff is worth it, because you don't need clusters
| to deal with a bad CPU or two - just stop & start to fix
| memory/CPU errors.
|
| The AWS model pushes the latency problem to the customer and we
| see it in the IOPS measurements, but what we're really seeing
| is latency x queue depth, not the hardware capacity.
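|
| To put rough, purely illustrative numbers on that (assumed
| figures, not measurements): at a fixed queue depth, IOPS is
| roughly the number of requests in flight divided by the
| per-request latency:
|
|     IOPS ~ queue_depth / latency
|     network storage: 32 in flight / ~1.5 ms  ~  21,000 IOPS
|     local NVMe:      32 in flight / ~0.1 ms  ~ 320,000 IOPS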
| Hilift wrote:
| Everything in the cloud is throttled. Network, IOPS, CPU. And
| probably implemented incorrectly. AWS makes billions if the
| customer infrastructure is great or terrible. I found that
| anything smaller than an AWS EC2 m5.8xlarge had noticeably bad
| performance on loaded servers (Windows). The list price for
| that would be about $13k per year, but most organizations get
| lower than list prices.
|
| This also applies to services, not only compute. For anything
| associated with Microsoft Office 365 Exchange, scripts may run
| 10x slower against the cloud using the MSOnline cmdlets. It's
| absolute insanity: a dump of all mailbox statistics that used to
| take about one hour could take almost 24 hours against Office
| 365. You have to be careful to not use
| the same app or service account in multiple places, because the
| throttle limits are per-account.
| immibis wrote:
| I noticed this with bandwidth. AWS price for bandwidth:
| $90.00/TB after 0.1TB/month. Price everywhere else (low cost
| VPSes): $1.50/TB after 1-5TB/month. Price some places
| (dedicated servers): $0.00/TB up to ~100TB/month, $1.50/TB
| after.
|
| You pay 60 times the price for the privilege of being on AWS.
|
| Bandwidth is just their most egregious price difference. The
| servers are more expensive too. The storage is more expensive
| (except for Glacier). The serverless platforms are mostly more
| expensive than using a cheap server.
|
| There are only two AWS products that I understand to have good
| prices: S3 Glacier (and only if you never restore!), and
| serverless apps (Lambda / API Gateway) if your traffic is low
| enough to fit in the Always Free tier. For everything else, it
| appears you get ripped off by using AWS.
| yxhuvud wrote:
| FWIW, using the same approach as in the article, i.e. io_uring,
| is one of the few ways to actually reach anywhere close to that
| 1 million, so it is not as if they are competing concerns.
| Tostino wrote:
| Thank you for the effort that went into getting this committed. I
| remember seeing the first discussions about async I/O (and using
| io_uring) like 6 or 7 years ago. Amazing amount of work to get
| the design right.
|
| Looking forward to the other places that async I/O can be used in
| future Postgres releases now that the groundwork is done.
| p_ing wrote:
| Is io_uring still plagued by security issues enabled by its use?
| Or have those largely been fixed? My understanding was many Linux
| admins (or even distros by default?) were disabling io_uring.
| hansvm wrote:
| https://github.com/axboe/liburing/discussions/1047
| p_ing wrote:
| Thanks. It looks like it is still going through growing
| pains.
|
| https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=io_uring
|
| https://www.theregister.com/2025/04/29/linux_io_uring_securi.
| ..
|
| But most of the 'off by default' are from ~2023 and not a
| current concern.
| znpy wrote:
| Disabling io_uring because "guy on the internet said so" or
| "$faang_company says so" is beyond dumb.
|
| One should evaluate the risk according to their specific use
| case.
|
| It can be a good idea to disable it if you run untrusted
| workloads (eg: other people's containers, sharing the same
| kernel) but if you have a kernel on a machine (virtual or real)
| dedicated to your own workload you can pretty much keep using
| io_uring. There are other technologies to enforce security (eg:
| SELinux and similar).
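|
| For the untrusted-workload case, recent kernels (6.6+, if I
| recall correctly) also expose a sysctl to restrict or disable it
| system-wide, roughly:
|
|     # illustrative /etc/sysctl.d/ snippet
|     # 0 = enabled, 1 = restricted to privileged users/groups, 2 = disabled
|     kernel.io_uring_disabled = 2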
| wtallis wrote:
| I think at this point, those "other technologies to enforce
| security" are the main area of concern for io_uring users: if
| those other security layers don't know about io_uring they
| won't apply any restrictions to it.
| song wrote:
| Are there good performance comparisons between postgres, mariadb
| and percona? I'm really curious at this point in which cases each
| of those databases shines.
| KronisLV wrote:
| Probably depends on the particular workload, but there are at
| least some attempts at benchmarking vaguely typical workloads:
| https://datasystemreviews.com/postgresql-vs-mariadb-performa...
| kev009 wrote:
| A lot of work has gone into FreeBSD's aio(4) so it will be
| interesting to see how that works, because it doesn't have the
| drawbacks of Linux/glibc aio.
| tiffanyh wrote:
| Would you mind expanding more on this topic?
|
| Is FreeBSD doing anything significantly different and/or
| better?
| WhyNotHugo wrote:
| It's pretty disappointing that simply using O_NONBLOCK doesn't
| work as expected on regular files. It would be such a simple and
| portable mechanism to do async I/O using the same interfaces that
| we already use for networking.
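|
| For illustration (an assumed snippet, not from the article): the
| flag is accepted on a regular file, but reads still block on
| disk, unlike a socket, which would return EAGAIN:
|
|     #include <fcntl.h>
|     #include <unistd.h>
|
|     int main(void)
|     {
|         /* O_NONBLOCK is silently ignored for regular-file data I/O... */
|         int fd = open("datafile", O_RDONLY | O_NONBLOCK);
|         char buf[8192];
|
|         /* ...so this still blocks until the disk read completes. */
|         ssize_t n = read(fd, buf, sizeof(buf));
|         (void) n;
|         close(fd);
|         return 0;
|     }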
| dbbk wrote:
| This looks promising! Wonder if it's coming to Neon
| gitroom wrote:
| insane how long it took postgres to get async i/o right - feels
| like all the big changes spark a million little tradeoffs, right?
| you think stuff like io_uring is finally gonna push postgres to
| catch up with the clouds
___________________________________________________________________
(page generated 2025-05-07 23:00 UTC)