[HN Gopher] Fast Commits for Ext4
___________________________________________________________________
Fast Commits for Ext4
Author : lukastyrychtr
Score : 77 points
Date : 2021-01-15 18:55 UTC (4 hours ago)
(HTM) web link (lwn.net)
(TXT) w3m dump (lwn.net)
| The_rationalist wrote:
| I wonder if this interacts with
| https://github.com/clearlinux-pkgs/linux/blob/master/0102-in...
| ape4 wrote:
| ext4 is getting better just as Fedora has moved to Btrfs
| sroussey wrote:
| Curious how this change set would affect MySQL or PostgreSQL?
| jldugger wrote:
| Probably not much -- the main win is not having to fsync files
| unrelated to your work, which is great for desktops which run
| multiple unrelated tasks (browser, terminals, rss readers,
| email clients, steam/game updates, package downloads). But I
| have to imagine SQL databases are typically run on systems
| dedicated to that singular task.
| cbhl wrote:
| Do folks typically turn on journaling at the filesystem layer
| when running a database?
|
| The database itself contains journaling, so one might choose to
| run with data=writeback or even directly against the block
| device if they were concerned about performance.
| comboy wrote:
| I don't think that those who read the manual do:
| https://www.postgresql.org/docs/13/wal-intro.html (unless
| they care about quick crash recovery)
| jabberwcky wrote:
| You definitely need both, these are two completely different
| kinds of journalling:
|
| - Filesystem journalling is making robust changes to the data
| structures describing directories, files, and where files
| live, in units of atomic filesystem operations. For example,
| the filesystem journal may record "CREATE FILE", which
| translates to "update directory entry 1234 in directory block
| 5678, then allocate and initialize extent descriptor 9999,
| then write an inode at array entry 74234"
|
| - Database journalling is making robust changes to the data
| structures describing the actual file contents, in units of
| atomic logical application operations. For example, a DB
| journal may record "INSERT ROW", which translates to "update
| block 123 of this index file, and block 234 of this data
| file". Application-specific relationships like that cannot
| be captured by the filesystem on UNIX.
|
| (Note: NTFS is transactional on Windows. It's entirely
| possible to correlate independent writes and make them
| atomic, so on Windows at least, in theory a DB could exist
| without a separate journal. I don't know if this is used in
| practice. Even if it were in use, it would place severe
| limits on the kinds of concurrency optimizations a database
| system could otherwise perform, because all of that stuff
| moves behind the curtain of the OS interfaces.)
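|
| To make the database side of that distinction concrete, here
| is a minimal sketch of application-level journalling in
| Python. It is not how MySQL or PostgreSQL implement their
| logs; the toy JSON "table", the file layout, and the names
| insert_row, recover, read_table and write_table are all
| assumptions made up for illustration.

```python
import json
import os


def write_table(data_path, table):
    """Rewrite the toy data file (a JSON dictionary) and flush it to disk."""
    with open(data_path, "w", encoding="utf-8") as data:
        json.dump(table, data)
        data.flush()
        os.fsync(data.fileno())


def read_table(data_path):
    """Load the toy data file, or return an empty table if it does not exist."""
    if not os.path.exists(data_path):
        return {}
    with open(data_path, encoding="utf-8") as data:
        return json.load(data)


def insert_row(journal_path, data_path, key, value):
    """Log the logical operation first, then update the data file."""
    record = json.dumps({"key": key, "value": value}) + "\n"
    # Step 1: append the record to the journal and force it to disk.
    with open(journal_path, "a", encoding="utf-8") as journal:
        journal.write(record)
        journal.flush()
        os.fsync(journal.fileno())
    # Step 2: only now touch the data file. If we crash in here,
    # recover() can rebuild the table from the journal.
    table = read_table(data_path)
    table[key] = value
    write_table(data_path, table)


def recover(journal_path, data_path):
    """Rebuild the data file by replaying the journal from the start."""
    table = {}
    if os.path.exists(journal_path):
        with open(journal_path, encoding="utf-8") as journal:
            for line in journal:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record: the insert never committed
                table[entry["key"]] = entry["value"]
    write_table(data_path, table)
    return table
```

| The filesystem only sees appends and rewrites of two files.
| The rule "the journal record must be durable before the data
| file changes" lives entirely in the application, which is the
| relationship the filesystem journal cannot capture.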
| quotemstr wrote:
| > One of the things that I did discuss with Harshad was using
| some heuristics, where if there are two "unrelated" applications
| (e.g., different session id, or process group leader, or
| different uid, etc. --- details to be determined later), we would
| not entangle writes to unrelated files via fsync(2), while
| forcing files written by the same application to share fate with
| one another even if only one file is fsync'ed.
|
| Ugh. This is why we can't have nice things. I really don't want
| the kernel's filesystem performance to depend on the number of
| different UIDs writing to the filesystem. That is insanity!
|
| Ted Ts'o is just wrong here: performance should take priority
| over preserving the behavior of applications that rely on non-
| contractual implementation details of the Linux kernel. fsync
| should sync _only_ the indicated file, and that's that. We can
| add a mount option to let users opt into the older, safer
| behavior, but we shouldn't suffer for essentially an eternity
| because somewhere, someone might have written an application that
| depends on an ext4 implementation detail.
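|
| For reference, the interface being argued about takes a single
| file descriptor. A minimal Python sketch of an application
| that asks for durability on one file and not another follows;
| the helper names write_durably and write_lazily are
| hypothetical, and whether other files get flushed as a side
| effect is the filesystem behavior under discussion, not
| something this code controls.

```python
import os


def write_durably(path, payload):
    """Write payload to path, then ask the kernel to flush that one file."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # The request names only this descriptor's file.
        os.fsync(fd)
    finally:
        os.close(fd)


def write_lazily(path, payload):
    """Write payload without fsync; the kernel flushes it on its own schedule."""
    with open(path, "wb") as handle:
        handle.write(payload)
```

| An application that only ever calls write_durably on its
| important file is the one that could break if it silently
| relied on its other, never-synced files reaching disk at the
| same time.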
___________________________________________________________________
(page generated 2021-01-15 23:00 UTC)