[HN Gopher] Linux NILFS file system: automatic continuous snapshots
___________________________________________________________________
Linux NILFS file system: automatic continuous snapshots
Author : solene
Score : 191 points
Date : 2022-10-11 11:58 UTC (11 hours ago)
(HTM) web link (dataswamp.org)
(TXT) w3m dump (dataswamp.org)
| wazoox wrote:
| I've been running NILFS2 on my main work NAS for 8 years. It
| never failed us :)
| mdaniel wrote:
| I mean this honestly: how did you evaluate such a new
| filesystem in order to bet a work NAS upon it?
| wazoox wrote:
 | I did some testing, then installed it on a secondary system
 | that initially hosted mostly unimportant files. We gradually
 | added more, and since after a few years it had caused
 | absolutely no problems, we went further (and added a backup
 | procedure). Then we migrated to new hardware, and it's still
 | going strong (it's a fairly small volume, about 15 TB).
| yonrg wrote:
| I would do it by using it! ... and probably some backup
| remram wrote:
| How is this pronounced? Nil-F-S? Nilfuss? Nai-L-F-S? N-I-L-F-S?
| heavyset_go wrote:
| The first one.
| Rygian wrote:
| How close is this to a large continuous tape loop for video
| surveillance?
|
| I would very much welcome a filesystem that breaks away from the
| directories/files paradigm. Any time-based data store would
| greatly benefit from that.
| rcthompson wrote:
| I think all you would need to add is a daemon that
| automatically deletes the oldest file(s) whenever free space
| drops below a certain threshold, so that the filesystem GC can
| reclaim that space for new files.
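 |
 | A minimal sketch of such a daemon (the mount point, threshold,
 | batch size and polling interval are all invented here):
        #!/bin/sh
        # Keep at least 10% free space by deleting the oldest recordings.
        # Note: on NILFS2 the space only really comes back once
        # nilfs_cleanerd reclaims the checkpoints covering those blocks,
        # so delete one batch per pass instead of looping on df.
        MOUNT=/srv/recordings
        MIN_FREE=10     # percent
        BATCH=5         # files removed per pass

        while sleep 600; do
            used=$(df --output=pcent "$MOUNT" | tail -1 | tr -dc '0-9')
            if [ $((100 - used)) -lt "$MIN_FREE" ]; then
                ls -1tr "$MOUNT" | head -n "$BATCH" | while read -r f; do
                    rm -f -- "$MOUNT/$f"
                done
            fi
        done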
| tommiegannert wrote:
| If NILFS is continuously checkpointing, couldn't you even
| remove the file right after you add it, for simplicity?
| Rygian wrote:
| I know and use 'logrotate'.
|
| My point was more on the tracks of a filesystem where a
| single file can be overwritten over and over again, and it's
| up to the filesystem to transparently ensure the full
| capacity of the disk is put towards retaining old versions of
| the file.
| nix23 wrote:
| Hmm maybe something like Bluestore?
|
 | https://docs.ceph.com/en/latest/rados/configuration/storage-...
| Rygian wrote:
| I definitely need to dive into Ceph, thanks for the
| pointer :-)
| darau1 wrote:
| What's the difference between a snapshot, and a checkpoint?
| okasaki wrote:
| from TA:
|
| > A checkpoint is a snapshot of your system at a given point in
| time, but it can be deleted automatically if some disk space
| must be reclaimed. A checkpoint can be transformed into a
| snapshot that will never be removed.
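 |
 | For reference, the nilfs-utils commands behind this (the
 | device path and checkpoint number are placeholders):
        # list checkpoints and snapshots; the CNO column is the number
        lscp

        # promote checkpoint 1234 to a snapshot so the cleaner keeps it
        chcp ss 1234

        # snapshots can be mounted read-only to pull out old versions
        mount -t nilfs2 -r -o cp=1234 /dev/sdb1 /mnt/snapshot

        # when done, unmount and demote it back to a plain checkpoint
        umount /mnt/snapshot
        chcp cp 1234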
| sargun wrote:
 | I've always wondered why NILFS (or similar) isn't used for cases
 | where ransomware is a risk. I'm honestly surprised that it's not
 | mandated to use an append-only / log-structured filesystem for
 | some critical systems (think patient records), where the cost of
 | losing data is so high, the data is rarely mutated, and trading
 | that off against wasted storage isn't a bad deal (after all, HDD
 | storage is incredibly cheap, and nobody said you had to keep the
 | working set and the log on the same device).
| compsciphd wrote:
 | you don't need a log-structured fs to do this, you could just
 | have regular zfs/btrfs snapshots too.
 |
 | BUT
 |
 | if an attacker has the ability to delete an entire file system
 | / encrypt it, they really have the ability to delete the
 | snapshots as well; the only reason they might not is due to
 | "security through obscurity".
 |
 | Now, what I have argued is that an append-only file system
 | which works in a SAN-like environment (i.e. you have random
 | reads, but append-only writes, with those properties enforced
 | remotely) could give you that, but to an extent you'd still get
 | similar behavior by just exporting ZFS shares (or even as
 | block devices) and snapshotting them regularly on the remote
 | end.
| ephbit wrote:
 | > if an attacker has the ability to delete an entire file
 | system / encrypt it, they really have the ability to delete
 | the snapshots as well, ..
|
| How so?
|
| Let's say you have one machine holding the actual data for
| working on it. And some backup server. You could use btrfs
| send over ssh and regularly btrfs receive the data on the
 | backup machine. Even if they got encrypted by ransomware they
| wouldn't be lost in the backups. As long as they're not
| deleted there how could a compromised work machine compromise
| the data on the backup machine?
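 |
 | A sketch of that flow, assuming /home is a btrfs subvolume
 | (hostnames, paths and snapshot names are made up):
        # initial full copy to the backup machine
        btrfs subvolume snapshot -r /home /home/.snap-2022-10-11
        btrfs send /home/.snap-2022-10-11 | ssh backup 'btrfs receive /backups/home'

        # later runs only send the delta against the previous snapshot
        btrfs subvolume snapshot -r /home /home/.snap-2022-10-12
        btrfs send -p /home/.snap-2022-10-11 /home/.snap-2022-10-12 \
            | ssh backup 'btrfs receive /backups/home'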
| ggm wrote:
| Didn't VMS have this baked in? My memory is that all 8.3 file
| names had 8.3[;nnn] version tagging under the hood
| usr1106 wrote:
| That's what it looked like, but I doubt it was deep in the
| filesystem. It was basically just a naming convention. User had
| to purge old versions manually. This gets tedious if you have
| many files that change often. Snapshots are a safety net, not
| something you want to have in your way all day long.
| ggm wrote:
 | Er.. my memory is that it did COW inside VMS fs semantics and
 | was not manually achieved. You did have to manually delete. So
 | I don't think it was just a hack.
|
| It didn't do directories so was certainly not as good as
| snapshot but we're talking 40 years ago!
| jerf wrote:
| What happens if you run "dd if=/dev/zero of=/any/file/here", thus
| simply loading the disk with all the zeros it can handle? Do you
| lose all your snapshots as they are deleted to make room, or does
| it keep some space aside for this situation?
|
| (Not a "gotcha" question, a legitimate question.)
| regularfry wrote:
| I know this isn't what you're getting at, but is it smart
| enough to create a sparse file when you specifically pick zero
| as your filler byte?
| solene wrote:
| the garbage collector daemon will delete older checkpoints
| beyond the preserve time to make some room.
| Volundr wrote:
 | It's configurable:
 | https://nilfs.sourceforge.io/en/man5/nilfs_cleanerd.conf.5.h....
 | Cleanerd is responsible for maintaining a certain amount of
 | free space on the system, and you can control the rules for
 | doing so (e.g. a checkpoint won't be eligible for being cleaned
 | until it is 1 week old).
|
| It's also worth knowing NILFS2 has checkpoints and snapshots.
| What you actually get are continuous "checkpoints". These can
| be upgraded to snapshots at any time with a simple command.
| Checkpoints are garbage collected, snapshots are not (until
| they are downgraded back into checkpoints).
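 |
 | For concreteness, those knobs live in /etc/nilfs_cleanerd.conf;
 | an excerpt with example values (check the man page above for
 | the exact syntax your version supports):
        # never reclaim a checkpoint younger than this (seconds): 1 week
        protection_period       604800

        # start cleaning when free segments drop below min, stop at max
        min_clean_segments      10%
        max_clean_segments      20%

        # how often to check free space, and how much to clean per pass
        clean_check_interval    60
        nsegments_per_clean     2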
| throwaway787544 wrote:
| didgetmaster wrote:
| Does NILFS do checksums and snapshotting for every single file in
| the system? One of my biggest complaints about file systems in
| general is that they are all designed to treat every file the
| exact same way.
|
| We now have storage systems (even SSDs) that are big enough to
| hold hundreds of millions of files. Those files can be a mix of
| small files, big files, temp files, personal files, and public
| files. Yet every file system must treat your precious thesis
| paper the same way it treats a huge cat video you downloaded off
| the Internet.
|
| We need some kind of 'object store' where each object can be
| given a set of attributes that govern how the file system treats
| it. Backup, encryption, COW, checksums, and other operations
| should not be wasted on a bunch of data that no one really cares
| about.
|
| I have been working on a kind of object file system that
| addresses this problem.
| nix23 wrote:
 | Well, you can kind of do that with zfs filesystems, where the
 | "object" is the recordsize.
| mustache_kimono wrote:
| I was going to ask: "Is there any limit on the number of ZFS
| filesystems in a pool?" Google says 2^64 is the limit.
|
 | Couldn't one just generate a filesystem per object, if
 | snapshots, etc., on a per-object level is what one cared
 | about? Wonder how quickly this would fall over?
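 |
 | Something like this, say (pool and dataset names invented):
        zfs create -o compression=off -o checksum=off tank/obj/cat-video
        zfs create -o copies=2 -o compression=zstd  tank/obj/thesis
        zfs snapshot tank/obj/thesis@draft-1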
|
| > Backup, encryption, COW, checksums, and other operations
| should not be wasted on a bunch of data that no one really
| cares about.
|
| This GP comment is a little goofy though. There was a user I
| once encountered who wanted ZFS, but a la carte. "I want the
| snapshots but I don't need COW." You have to explain, "You
| don't get the snapshots unless you have the COW", etc.
| Conan_Kudo wrote:
| On Btrfs, you can mark a folder/file/subvolume to have
| nocow, which has the effect of only doing a COW operation
| when you are creating snapshots.
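 |
 | In practice that's the chattr 'C' attribute; it only applies
 | to files created after the flag is set (paths are examples):
        # new files under this directory will skip COW (and checksums)
        mkdir -p /var/lib/vm-images
        chattr +C /var/lib/vm-images

        # verify; existing non-empty files must be copied in, not converted
        lsattr -d /var/lib/vm-images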
| mustache_kimono wrote:
| And that may work for btrfs, but again at some cost:
|
| "When you enable nocow on your files, Btrfs cannot
| compute checksums, meaning the integrity against bitrot
| and other corruptions cannot be guaranteed (i.e. in nocow
| mode, Btrfs drops to similar data consistency guarantees
| as other popular filesystems, like ext4, XFS, ...). In
| RAID modes, Btrfs cannot determine which mirror has the
| good copy if there is corruption on one of them."[0]
|
 | [0]: https://wiki.tnonline.net/w/Blog/SQLite_Performance_on_Btrfs...
| lazide wrote:
| Yup. It's a pretty fundamental thing. COW and data
| checksums (and usually automatic/inline compression) co-
| exist that way because it's otherwise too expensive
| performance wise, and potentially dangerous corruption
| wise.
|
| For instance, if you modify a single byte in a large
| file, you need to update the data on disk as well as the
| checksum in the block header, and other related data.
| Chances are, these are in different sectors, and also
| require re-reading in all the other data in the block to
| compute the checksum. Anywhere in that process is a
| chance for corruption of the original data and the
| update.
|
| If the byte changes the final compressed size, it may not
| fit in the current block at all, causing an expensive (or
| impossible) re-allocation.
|
| You could end up with the original data and update both
| invalid.
|
| Writing out a new COW block is done all at once, and if
| it fails, the write failed atomically, with the original
| data still intact.
| tjoff wrote:
| > _Chances are, these are in different sectors, and also
| require re-reading in all the other data in the block to
| compute the checksum. Anywhere in that process is a
| chance for corruption of the original data and the
| update._
|
| Not much different than any interrupted write though. And
| a COW needs to reread just as much.
|
| > _If the byte changes the final compressed size, it may
| not fit in the current block at all, causing an expensive
| (or impossible) re-allocation._
|
 | Something that you must always pay in a COW filesystem
 | anyway? And it's handled by other non-COW filesystems anyway.
|
| Just because a filesystem isn't COW doesn't mean every
| change needs to be in place either. Of course, a
| filesystem that is primarily COW might not want to
| maintain compression for non-COW edge-cases and that is
| quite reasonable.
| Arnavion wrote:
| While filesystem-integrated RAID makes sense since the
| filesystem can do filesystem-specific RAID placements (eg
| zfs), for now the safest RAID experience seems to be
| filesystem on mdadm on dm-integrity on disk partition, so
| that the RAID and RAID errors are invisible to the
| filesystem.
| mustache_kimono wrote:
| > the safest RAID experience seems to be filesystem on
| mdadm on dm-integrity on disk partition, so that the RAID
| and RAID errors are invisible to the filesystem.
|
| I suppose I don't understand this. Why would this be the
| case?
| Arnavion wrote:
| dm-integrity solves the problem of identifying which
| replica is good and which is bad. mdadm solves the
| problem of reading from the replica identified as good
| and fixing / reporting the replica identified as bad. The
| filesystem doesn't notice or care.
| mustache_kimono wrote:
| Ahh, so you intend, "If you can't use ZFS/btrfs, use dm-
| integrity"?
| Arnavion wrote:
| No. I don't use ZFS since it's not licensed correctly, so
| I have no opinion on it. And BTRFS raid is not safe
| enough for use. So I'm saying "Use filesystem on mdadm on
| dm-integrity".
| llanowarelves wrote:
| I have been spinning my wheels on personal backups and file
| organization the last few months. It is tough to perfectly
| structure it.
|
 | I think directories or volumes with different properties,
 | split up as /consumer-media /work-media /work /docs
 | /credentials etc., may be the way to go.
|
| Then you can set integrity, encryption etc separately, either
| at filesystem level or as part of the software-level backup
| strategy.
| lazide wrote:
| Why is it 'wasted'? Those things are mostly free on modern
| hardware.
|
 | The challenge with your thesis here is that the only one who
 | can know what is 'that important' is _YOU_, and your decision-
 | making and communication bandwidth is already the limiting
 | factor.
|
| For many users, that cat video would be heartbreaking to lose,
| and they don't have term papers to worry about.
|
| So having to decide or think what is or is not 'important
| enough' to you, and communicate that to the system, just makes
| everything slower than putting everything on a system good
| enough to protect the most sensitive and high value data you
| have.
| didgetmaster wrote:
| Nothing is free or even 'mostly free' when managing data.
| Data security (encryption), redundancy (backups), and
| integrity (checksums, etc.) all impose a cost on the system.
|
| Getting each piece of data properly classified will always be
| a challenge (AI or other tools may help with that), but it
| would still be nice to be able to do it. If I have a 50GB
| video file that I could easily re-download off the Internet,
| it would be nice to be able to turn off any security,
| redundancy, or integrity features for it.
|
 | I wonder how many petabytes of storage space are being wasted
| by having multiple backups of all the operating system files
| that could be easily downloaded from multiple websites. Do I
| really need to encrypt that GB file that 10 million people
| also have a copy of? Am I worried if a single pixel in that
| high resolution photo has changed due to bit rot?
| Arnavion wrote:
| >Do I really need to encrypt that GB file that 10 million
| people also have a copy of?
|
| Indeed you don't. Poettering has a similar idea in [1]
| (scroll down to "Summary of Resources and their
| Protections" for the tl;dr table), where he imagines OS
| files are only protected by dm-verity (for Silverblue-style
| immutable distros) / dm-integrity (for regular mutable
| distros).
|
 | [1]: https://0pointer.net/blog/authenticated-boot-and-disk-encryp...
| derefr wrote:
| > For many users, that cat video would be heartbreaking to
| lose, and they don't have term papers to worry about.
|
| Depends on where that cat video is / how it ended up on the
| disk.
|
| The user explicitly saved it to their user-profile Downloads
| directory? Yeah, sure, the user might care a lot about
| preserving that data. There's intent there.
|
 | The user's web browser _implicitly_ saved it into the
 | browser's cache directory? No, the user absolutely doesn't care.
| That directory is a pure transparent optimization over just
| loading the resource from the URL again; and the browser
| makes no guarantees of anything in it surviving for even a
| few minutes. The user doesn't even _know_ they have the data;
| only the browser does. As such, the browser should be able to
| tell the filesystem that this data is discardable cache data,
| and the filesystem should be able to apply different storage
| policies based on that.
|
| This is already true of managed cache/spool/tmp directories
| vis-a-vis higher-level components of the OS. macOS, for
| example, knows that stuff that's under ~/Library/Caches can
| be purged when disk space is tight, so it counts it as
| "reclaimable space"; and in some cases (caches that use
| CoreData) the OS can even garbage-collect them itself.
|
| So, why not also avoid making these files a part of backups?
| Why not avoid checksumming them? Etc.
| lazide wrote:
| Backups - possibly, but no one I know counts COW/Snapshots,
| etc. as backups. Backup software generally already avoids
| copying those.
|
| They can be ways to restore to a point in time
| deterministically - but then they are absolutely needed to
| do so! Otherwise, the software is going to be acting
| differently with a bunch of data gone from underneath it,
| no?
|
 | Checksumming is more about being able to detect errors
| (and deterministically know if data corruption is
| occurring). So yes, absolutely temporary and cache files
| should be checksummed. If that data is corrupted, it will
| cause crashes of the software using them and downstream
| corruption after all.
|
| Why would I _not_ want that to get caught before my
| software crashes or my output document (for instance) is
| being silently corrupted because one of the temporary files
 | used when editing it got corrupted to/from disk?
| derefr wrote:
| > So yes, absolutely temporary and cache files should be
| checksummed. If that data is corrupted, it will cause
| crashes of the software using them and downstream
| corruption after all.
|
| ...no? I don't care if a video in my browser's cache ends
| up with a few corrupt blocks when I play it again a year
| later. Video codecs are designed to be tolerant of that.
| You'll get a glitchy section in a few frames, and then
| hit the next keyframe and everything will clean up.
|
| In fact, _most_ encodings -- of images, audio, even text
| -- are designed to be self-synchronizing in the face of
| corruption.
|
| I think you're thinking specifically of _working-state_
| files, which usually _need_ to be perfect and guaranteed-
 | trusted, because they're in normalized low-redundancy
| forms and are also used to derive other data from.
|
| But when I say "caching", I'm talking about cached
| _final-form assets_ intended for direct human
| consumption. These get corrupted all the time, from
| network errors during download, disk storage errors on
 | NASes, etc.; and people mostly just don't care. For
| video, they just watch past it. For a web page, they
| hard-refresh it and everything's fine the second time
| around.
|
| If you think it's impossible to differentiate these two
| cases: well, that's because we don't explicitly ask
| developers to differentiate them. There could be separate
| ~/Library/ViewCache and ~/Library/StateCache directories.
|
| And before you ask, a good example of a large "ViewCache"
| asset that's _not_ browser-related: a video-editor
| render-preview video file (the low-quality / thumbnail-
| sized kind, used for scrubbing.)
| lazide wrote:
| If they are corrupted _on disk_ the behavior is not so
| deterministic as a 'broken image' and a reload. Corrupted
| _on disk_ content causes software crashes, hangs, and
| other broken behavior users definitely don't like.
| Especially when it's the filesystem metadata which gets
| corrupted.
|
| Because _merely trying to read it_ can cause severe
| issues at the filesystem level.
|
| I take it you haven't dealt with failing storage much
| before?
| derefr wrote:
| I maintain database and object-storage clusters for a
| living. Dealing with failing storage is half my job.
|
| > Especially when it's the filesystem metadata which gets
| corrupted.
|
| We're not talking about filesystem metadata, though.
| Filesystem metadata is all "of a piece" -- if you have a
| checksumming filesystem, then you can't _not_ checksum
| some of the filesystem metadata, because all the metadata
| lives in (the moral equivalent of) a single database file
| the filesystem maintains, and _that database_ gets
 | checksummed. It's all one data structure, where the
| checksumming is a thing you do _to_ that data structure,
| not to individual nodes within it. (For a tree filesystem
| like btrfs, this would be the non-cryptographic
| equivalent of a merkle-tree hash.) The only way you could
| even potentially turn off filesystem features for some
| metadata (dirent, freelist, etc) nodes but not others,
| would be to split your filesystem into multiple
| filesystems.
|
| No, to be clear, we're specifically talking about what
| happens inside the filesystem's _extents_. _Those_ can
| experience corruption without that causing any undue
| issues, besides "the data you get from fread(3) is
| wrong." Unlike filesystem metadata, which is _all_
 | required for the filesystem's _integrity_, a
| checksumming filesystem can _choose_ whether to "look"
| inside file extents, or to treat them as opaque. And it
| can (in theory) make that choice per file, if it likes.
| From the FS's perspective, an extent is just a range of
| reserved disk blocks.
|
| Now, an assumption: only storage _arrays_ use spinning
| rust for anything any more. The only disk problems
| _consumer devices_ face any more are SSD degradation
| problems, not HDD degradation problems.
|
| (Even if you don't agree with this assumption by itself,
| it's much more clear-cut if you consider only devices
| operated by people willing to choose to use a filesystem
| that's not the default one for their OS.)
|
| This assumption neatly cleaves the problem-space in two:
|
| - How should a filesystem _on a RAID array, set up for a
| business or prosumer use-case,_ deal with HDD faults?
|
| - How should a _single-device_ filesystem _used in a
| consumer use-case_ deal with SDD faults?
|
| The HDD-faults case comes down to: filesystem-level
| storage pool management with filesystem-driven redundant
| reads, with kernel blocking-read timeouts to avoid hangs,
| with async bad-sector remapping for timed out reads.
| Y'know: ZFS.
|
| While the SDD-faults case comes down to: read the bad
| data. Deal with the bad data. You won't get any hangs,
| until the day the whole thing just stops working. The
| worst you'll get is bit-rot. And even then, it's rare,
| because NAND controllers use internal space for error-
| correction, entirely invisibly to the kernel. (See also:
| http://dtrace.org/blogs/ahl/2016/06/19/apfs-part5/)
|
| In fact, in my own personal experience, the most likely
| cause of incorrect or corrupt data ending up on an
| SSD/NVMe disk, is that the _CPU or memory_ of the system
| is bad, and so one or the other is corrupting the memory
| that will be written to disk _before_ or _during_ the
 | write. (I've personally had this happen at least twice.
| What to look for to diagnose this: PCIe "link training"
| errors.)
| rodgerd wrote:
| > Does NILFS do checksums and snapshotting for every single
| file in the system?
|
| NILFS is, by default, a filesystem that only ever appends until
| you garbage collect the tail. It doesn't really "snapshot" in
| the way that ZFS or btrfs do, because you can just walk the
| entire history of the filesystem until you run out of history.
| The snapshots are just bookmarks of a consistent state.
| heavyset_go wrote:
| You can turn off CoW, checksumming, compression, etc at the
| file and directory levels using btrfs.
| Arnavion wrote:
| Indeed. You can also make a directory into a subvolume so
| that that directory is not included in snapshots of the
| parent volume.
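 |
 | A sketch of that trick (assuming /home is itself a subvolume
 | and the cache directory doesn't exist yet, or is moved aside
 | first):
        # a nested subvolume is skipped by snapshots of its parent
        btrfs subvolume create /home/user/.cache

        # read-only snapshots of /home now exclude the cache tree
        btrfs subvolume snapshot -r /home /home/.snap-$(date +%F)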
| spookthesunset wrote:
| It might sound weird but the hard part of what you describe is
| not the technology but how to design the UX in a way that you
| aren't babysitting everything.
|
| And doing that is not at all easy. For all anybody knows your
| cat video is "worth more" to you than your thesis paper. How
| can you get the system to determine the worth of each file
| without manually setting an attribute each time you create a
| file? And if you let the system guess, the cost of failure
| could be very high! What if it decided your thesis paper was
 | worthless and stored it with a lower "integrity" (or whatever
| you call the metric)?
|
| I dunno. Storage is getting cheaper all the time and it might
| just be easier to fuck it and treat all files with the same
| high level of integrity. Maybe it would be so much work for a
| user to manually manage they'd just mark everything the same?
| didgetmaster wrote:
| You could always set the default behavior to be uniform for
| all files (e.g. protect everything or protect nothing) and
| just forget about it. But it would be nice to be able to
| manually set the protection level for specific files that are
| the exception.
|
| If I was copying an important file into an unprotected
| environment, I could change how it was handled (likewise if I
| was downloading some huge video I didn't care about into a
| system where the default protection was set to high).
|
| I agree that if you have 100 million files, then it could be
| nearly impossible to classify every single one of them
| correctly.
| spookthesunset wrote:
 | I'd think doing it on a directory basis would be ideal
| nintendo1889 wrote:
| A directory basis, or even better, a numerical priority
| that could be manually set in the application that
| generated them, or automatically, based on the user or
| application or in a hypervisor, based on the VM. Then it
| could be an opportunistic setting.
|
| I thought ZFS had some sort of unique settings like this.
| koolba wrote:
| How does this compare to ZFS + cron to create snapshots every X
| minutes?
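 |
 | (For reference, that setup is roughly the crontab line below;
 | paths and the dataset name are placeholders, and % must be
 | escaped in crontab entries.)
        # snapshot tank/home every 15 minutes with a timestamped name
        */15 * * * * /sbin/zfs snapshot tank/home@auto-$(date +\%Y\%m\%d-\%H\%M)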
| harvie wrote:
 | A week ago my client lost data on ZFS by accidentally deleting
 | a folder. Unfortunately the data had been created and deleted
 | in the interval between two snapshots. One would expect that it
 | might still be possible to recover, because ZFS is CoW.
 |
 | There are some solutions like photorec (which now has ZFS
 | support), but it expects you can identify the file by the
 | footprint of its contents, which was not the case. Also, many
 | of these solutions would require taking ZFS offline for
 | forensic analysis, and that was also not possible because lots
 | of other clients were using the same pool at the time.
 |
 | So this failed me, and I really wished at the time that ZFS
 | had continuous snapshots.
 |
 | BTW, on ZFS I use ZnapZend. It's the second best thing after
 | continuous snapshots:
 |
 | https://www.znapzend.org/ https://github.com/oetiker/znapzend/
 |
 | There are also some ZFS snapshotting daemons in Debian, but
 | this is much more elegant and flexible.
 |
 | But since znapzend is a userspace daemon (as are all ZFS
 | snapshotters) you need some kind of monitoring and warning
 | mechanism for when something goes wrong and it can no longer
 | create snapshots (crashes, gets killed by the OOM killer or
 | something...). In NILFS2 every write/delete creates a
 | checkpoint, so the kernel basically guarantees everything is
 | snapshotted without you having to watch it.
| yonrg wrote:
 | I run this setup: zfs + zfsnap (not cron anymore, now a
 | systemd timer).
 |
 | I can't tell if NILFS does this too, but with zfsnap I
 | maintain different retention times: every 5 minutes for 1
 | hour, hourly for 1 day, daily for a week. That's fewer than
 | 60 snapshots. The older ones are cleaned up.
 |
 | In addition, zfs brings compression and encryption. That's why
 | I have it on the laptops, too.
| goodpoint wrote:
 | There is no comparison. NILFS provides *continuous* snapshots,
 | so you can inspect and roll back changes as needed.
 |
 | It does so without a performance penalty compared to other
 | logging filesystems.
 |
 | And without using additional space forever. The backlog rotates
 | forward continuously.
|
| It's a really unique feature that makes a lot of sense for
| desktop use, where you might want to recover files that were
| created and deleted after a short time.
| harvie wrote:
 | Perhaps we can leverage the "inotify" API to make a ZFS
 | snapshot every time some file changes... But I think ZFS is
 | not really good at handling huge numbers of snapshots. The
 | NILFS2 snapshots are probably more lightweight when compared
 | to ZFS ones.
| goodpoint wrote:
| The NILFS snapshots are practically free (for a logging
| filesystem, obviously).
| mustache_kimono wrote:
| > Perhaps we can leverage "inotify" API to make ZFS
| snapshot everytime some file had been changed...
|
| ZFS and btrfs users are already living in the future:
        inotifywait -r -m --format %w%f -e close_write "/srv/downloads/" |
            while read -r line; do
                # command below will snapshot the dataset
                # upon which the closed file is located
                sudo httm --snap "$line"
            done
|
| See: https://kimono-koans.github.io/inotifywait/
| [deleted]
| fuckstick wrote:
 | > It does so without a performance penalty.
|
| What is the basis for comparison? Sounds like a pretty
| meaningless statement at its face.
| goodpoint wrote:
| Compared to other logging filesystems obviously.
| fuckstick wrote:
| Nilfs baseline (write throughput especially) is slow as
| shit compared to other filesystems including f2fs. So
| just because you have this feature that doesn't make it
| even slower isn't that interesting - you pay for it one
| way or the other.
| usr1106 wrote:
 | For many users the filesystem speed of your home directory is
 | completely irrelevant unless you run on a Raspberry Pi
 | using SD cards. You just don't notice it.
 |
 | Of course, if you have a server handling, let's say, video
 | files, things will be very different. And there are some
 | users who process huge amounts of data.
 |
 | I've run 2 lvm snapshots (daily and weekly) on my home
 | partition for years. Write performance is abysmal if you
 | measure it, but you don't notice it in daily development
 | work.
| [deleted]
| [deleted]
| 1MachineElf wrote:
| >It's a really unique feature that makes a lot of sense for
 | desktop use
|
| Sounds like it could serve as a basis for a Linux
| implementation of something like Apple Time Machine.
| [deleted]
| mustache_kimono wrote:
| With 'httm', a few of us are already living in that bright
| future: https://github.com/kimono-koans/httm
| masklinn wrote:
| Afaik Time Machine does not do continuous snapshots, just
| periodic (and triggered).
|
| So you can already do that with zfs: take a snapshot and
| send it to the backup drive.
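 |
 | A sketch of that approach (pool and dataset names invented):
        # initial full copy to the backup pool
        zfs snapshot tank/home@tm-0
        zfs send tank/home@tm-0 | zfs receive backup/home

        # subsequent runs only send what changed since the last snapshot
        zfs snapshot tank/home@tm-1
        zfs send -i tank/home@tm-0 tank/home@tm-1 | zfs receive backup/home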
| harvie wrote:
| "It does without a performance penalty"
|
| yeah. it's already so terribly slow that it's unlikely that
| taking snapshots can make it any slower :-D
| Volundr wrote:
| That was not my experience with NILFS. It outperformed ext4
| on my laptop NVME.
| akvadrako wrote:
| The benchmarks here look pretty bad:
|
| https://www.phoronix.com/review/linux-58-filesystems/4
| Volundr wrote:
| The last page looks pretty bad. If you look at the others
| it's more of a mixed bag, but yeah.
|
| I don't remember what benchmark I ran before deciding to
| run it on my laptop. Given my work at the time probably
| pgbench, but I couldn't say for sure. It was long enough
| ago I also might've been benchmarking against ext3, not
| 4.
| harvie wrote:
 | I think I was running it on a 6 TB conventional HDD RAID1.
 | Also note that the read and write speeds might be quite
 | asymmetrical... in general it also depends on the workload type.
| pkulak wrote:
| > There is no comparison.
|
| What if I compare it to BTRFS + Snapper? No performance
| penalty there, plus checksumming.
| AshamedCaptain wrote:
| btrfs and snapperd do have a performance penalty as the
| number of snapshots increases. Having 100+ usually means
| snapper list will take north of an hour. You can easily
| reach these numbers if you are taking a snapshot every
| handful of minutes.
|
| Even background snapper cleanups will start to take a toll,
| since even if they are done with ionice they tend to block
| simultaneous accesses to the filesystem while they are in
| progress. If you have your root on the same filesystem,
| it's not pretty -- lots of periodic system-wide freezes
| with the HDD LEDs non-stop blinking. I tend to limit
| snapshots always to < 20 for that reason (and so does the
| default snapperd config).
| mike256 wrote:
 | About 2 years ago I believed the same. Then I used BTRFS as
 | a store for VM images (with periodic snapshots) and
 | performance got really, really bad. After I deleted
 | all snapshots performance was good again. There is a big
 | performance penalty in btrfs with more than about 100
 | snapshots.
| Volundr wrote:
| NILFS is really, really cool. In concept. Unfortunately the
| tooling and support just isn't there. I ran it for quite some
 | time on my laptop and the continuous snapshotting is everything I
| hoped it'd be. At one point however there was a change to the
| kernel that rendered it unbootable. Despite being a known and
| recorded bug it took forever to get fixed (about a year if I
| recall correctly) leaving me stuck on an old kernel the whole
| time.
|
| This was made more frustrating by the lack of any tooling such as
| fsck to help me diagnose the issue. The only reason I figured out
| it was a bug was that I booted a live CD to try to rescue the
| system and it booted fine.
|
| When I finally replaced that laptop I went back to ZFS and
| scripted snapshots. As much as I want to, I just can't recommend
| NILFS for daily use.
| yonrg wrote:
| Do you happen to remember which change in kernel was the cause?
|
 | I had trouble with unpopular file systems as the root file
 | system when the initrd was not built properly. So sysresccd is
 | always good to have within reach. That said, I think I won't
 | put any file system other than the distro default on root.
 | Data which require special care are on other partitions.
| CGamesPlay wrote:
| How did Linus not go on a rampage after breaking userspace for
| an entire year? Is NILFS not part of the kernel mainline, I
| guess?
| jraph wrote:
| If I understand correctly, I don't think this is a userspace-
| breaking bug, as in: a kernel API changed and made a
| userspace program not work anymore.
|
| It is a bug that prevents the kernel from booting. That's
| bad, but that's not the same thing. That's not a userspace
| compatibility issue such as the ones Linus chases. The user
| space isn't even involved if the kernel cannot boot. Or if it
| is actually a userspace program that causes a kernel crash,
| it is a crash, which is not really the same thing as an API
| change (one could argue, but that's a bit far-fetched, the
| intents are not the same, etc - I don't see Linus explode on
| somebody who introduced a crash the way he would explode on
| someone changing a userspace API).
| yjftsjthsd-h wrote:
| > Is NILFS not part of the kernel mainline, I guess?
|
| Good guess, but no:
|
| https://github.com/torvalds/linux/tree/master/fs/nilfs2
|
| > How did Linus not go on a rampage after breaking userspace
| for an entire year?
|
| I would very much like to know that as well. Any chance it
| didn't get reported (at least, not as "this broke booting")?
| Volundr wrote:
| I reported it along with a few other users in
| https://marc.info/?l=linux-nilfs&m=157540765215806&w=2. I
| think it just isn't widely enough used that Linus noticed
| we were broken. If I recall correctly it also wasn't
| directly fixed so much as incidentally. I just kept
| checking new kernel versions as they were released until
| one worked. There was never anything in the change-log
| (that I recall) about fixing the bug, just another change
| that happened to fix the issue.
|
| Edit: Looking through the archives, it looks like my memory
| was somewhat uncharitable. It was reported in November and
 | directly patched in June
 | (https://marc.info/?l=linux-nilfs&m=159154670627428&w=2)
 | so about 7 months after
| reporting. Not sure what kernel release that would've
| landed in, so could've been closer to 8.
| bityard wrote:
| > How did Linus not go on a rampage after breaking userspace
| for an entire year?
|
| Linus' commandment about not breaking userspace is frequently
| misunderstood. He wants to ensure that user-space /programs/
| do not break (even if they rely on buggy behavior that made
| it into a release), not that the /user/ will never see any
| breakage of the system whatsoever, which is of course an
| impossible goal. Device drivers and filesystems are firmly
| system-level stuff, bugs and backwards-incompatible changes
| in those areas are regrettable but happen all the same.
| cmurf wrote:
 | Very nice introduction to NILFS, which has been in the Linux
| kernel since 2009.
| newcup wrote:
| I think NILFS is a hidden gem. I've been using it exclusively in
| my Linux laptops, desktops etc. since ca. 2014. Apart from one
| kernel regression bug related to NILFS2 it's worked flawlessly
 | (no data corruption even with the bug, just no access to the
 | file system; effectively it forced running an older kernel
 | while the bug was being fixed).
 |
 | The continuous snapshotting has saved me a couple of times; I've
 | just mounted a version of the file system from a few hours or
 | weeks ago to access overwritten or deleted data. I also use
 | NILFS on backup disks to provide combined deduplication and
 | snapshots easily (just rsync & NILFS' mkss, the latter to make
 | sure the "checkpoints" aren't silently garbage collected in case
 | the backup disk gets full).
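 |
 | A sketch of that backup flow (paths invented; nilfs-utils'
 | mkcp -s creates a checkpoint and marks it as a snapshot in
 | one step):
        # sync the data onto the NILFS2-backed backup disk
        rsync -a --delete /home/ /mnt/backup/home/

        # pin the resulting state so the cleaner never reclaims it
        # (pass the backup volume's device if several nilfs2
        # filesystems are mounted)
        mkcp -s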
| nix23 wrote:
| >I think NILFS is a hidden gem. I've been using it exclusively
| in my Linux laptops, desktops etc. since ca. 2014
|
 | Yes, it's really sad: here we have a native and stable check-
 | summing fs, and nearly no one knows about it.
| yjftsjthsd-h wrote:
| > check-summing fs
|
| Is it? Last I'd heard was
|
| > nilfs2 store checksums for all data. However, at least the
| current implementation does not verify it when reading.
|
| https://www.spinics.net/lists/linux-nilfs/msg01063.html
| nix23 wrote:
 | Hmm, you could be right; I found nothing saying it is
 | verified at read time, just with fsck.
| conradev wrote:
| BTRFS is also a native copy on write filesystem that verifies
| a configurable checksum and supports snapshots.
|
| The snapshots are not automatic, but short of that it is
| pretty feature complete
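 |
 | For example (device path is a placeholder; --csum needs a
 | reasonably recent btrfs-progs):
        # pick the checksum algorithm at mkfs time (crc32c is the default)
        mkfs.btrfs --csum xxhash /dev/sdX

        # snapshots are taken manually, or scheduled via snapper/btrbk
        btrfs subvolume snapshot -r /home /home/.snap-$(date +%F)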
| nix23 wrote:
 | That's why I specifically wrote -> stable...
| 77pt77 wrote:
| BTRFS is not stable?
| guipsp wrote:
| BTRFS is pretty stable nowadays.
| nerpderp82 wrote:
| What does that mean quantifiably?
| guipsp wrote:
| Synology deploys it in their products
| ComputerGuru wrote:
| > Apart from one kernel regression bug related to NILFS2 it's
| worked flawlessly
|
| Maybe on x86? I've tried repeatedly to use it on ARM for
| RaspberryPi where it would have been perfect, but always ran
| into various kernel panics as soon as the file system is
| mounted or accessed.
| heavyset_go wrote:
| I've used NILFS2 on flash storage on some old non-RPi ARMv7
| hardware for a while without a problem. Switched to F2FS for
| performance reasons, though.
| newcup wrote:
| True, I only have used it on x86 devices. Thanks for the
| heads up!
|
| I've heard so many stories of SD card failures (against which
| snapshotting might be of no help) with RaspberryPi that I've
| decided to send any valuable data promptly to safety over a
| network. (Though, I personally haven't had any problems with
| failing SD's.)
| rodgerd wrote:
| NILFS is absolutely wonderful; it was very unfortunate that
| Linus chose to dub btrfs as the ext4 successor all those years
| ago, because it cut off a lot of interest in the plethora of
| interesting work that was going on at the time.
|
| A decade later and btrfs is still riddled with problems and
| incomplete, people are still using xfs and ext4 for lack of
| trust, one kernel dev has a side hobby trying to block openzfs,
| and excellent little projects like nilfs are largely unknown.
| perrygeo wrote:
| > one kernel dev has a side hobby trying to block openzfs
|
| Can you elaborate?
| nintendo1889 wrote:
| I remember DEC/HP releasing the source to the digital unix AdvFS
| filesystem on sourceforge with the intent of porting it over to
| linux, but it never materialized. AdvFS had many advanced
| features. The source is still available and within it are some
 | PDF slides that explain a lot of its features.
| Nifty3929 wrote:
| Do any file systems have good, native support for tagging and
 | complex searches based on those tags?
| DannyBee wrote:
 | BeFS was the last real one I'm aware of at the complexity you
 | are talking about (plenty of FSen have some very basic indexed
 | support for, say, file sizes, but not the kind of generic
 | tagging you are talking about).
|
| At this point, the view seems to be "attributes happen in the
| file system, indexing happens in user space".
|
| Especially on linux.
|
 | Part of the reason is, as I understand it, the
 | surface/complexity of including query languages in the kernel,
 | which is not horribly unreasonable.
|
| So all the common FSen have reasonable xattr support, and
| inotify/etc that support notification of attribute changes.
|
| The expectation seems to be that the fact that inotify might
| drop events now and then is not a dealbreaker. The modern queue
| length is usually 16384 anyway.
|
| I'm not saying there aren't tradeoffs here, but this seems to
| be the direction taken overall.
|
| I actually would love to have an FS with native indexed xattr
| and a way to get at them.
|
| I just don't think we'll get back there again anytime soon.
| Nifty3929 wrote:
| Okay - how about tagging and non-complex searches then.
| Beggars can't be choosers :-)
|
| Really what I'd like is just to search for some specific
| tags, or maybe list a directory excluding some tag, or
| similar. For bonus points, maybe a virtual directory that
| represents a search like this, and which "contains" the
| results of that search. (A "Search Folder")
|
| I'll check out BeFS. Thanks!
| harvie wrote:
 | I had issues with file locking when running some legacy database
 | software on NILFS2. It probably caused data corruption in that
 | database (not in the FS itself).
 |
 | The SF website of NILFS2 suggests that there are some
 | unimplemented features, one of them being synchronous IO, which
 | might have caused that issue?
 |
 | https://nilfs.sourceforge.io/en/current_status.html
 |
 | In some cases, NILFS2 is safer storage for your data than
 | ZFS. So NILFS might work for some simple use cases (eg. locally
 | storing documents that you modify often), but it's certainly not
 | ready to be deployed as a generic filesystem. It's relatively
 | slow and sometimes behaves a bit weirdly. If something goes
 | really bad, the recovery might be a bit painful. There is no
 | fsck yet, nor community support. NILFS2 can self-heal to some
 | extent.
|
 | I really like the idea of NILFS2, but at this point I would
 | prefer a patch adding continuous snapshotting to ZFS. Unlike
 | NILFS2, ZFS has lots of active developers and a big community,
 | while NILFS2 is almost dead. The fact that it's been in the
 | kernel for quite some time and most people haven't even noticed
 | it (despite its very interesting features) speaks for itself.
 |
 | Don't get me wrong. I wish more developers would get interested
 | in NILFS2, fix these issues and make it on par with EXT4, XFS
 | and ZFS... But ZFS still has more features overall, so we might
 | just add continuous snapshots in memoriam of NILFS2.
| yjftsjthsd-h wrote:
 | > In some cases, NILFS2 is safer storage for your data than
 | ZFS.
|
| What cases? Do you just mean due to continuous snapshots
| protecting against accidental deletes or such, or are there
| more "under the covers" things it fixes?
| ComputerGuru wrote:
 | It's basically append-only for recent things, so
 | theoretically you can't lose anything (within a reasonable
| timeframe). I don't know if the porcelain exposes everything
| you need to avail yourself of that design functionality,
| though.
| compsciphd wrote:
| we used NILFS 15 years ago in dejaview -
| https://www.cs.columbia.edu/~nieh/pubs/sosp2007_dejaview.pdf
|
 | We combined nilfs + our process snapshotting tech (we tried to
 | mainline it, but it didn't go anywhere, though many of the
 | concepts ended up in CRIU) + our remote display + screen
 | reading tech (i.e. normal APIs) to create an environment that
 | could record everything you ever saw, visually and textually,
 | enable you to search it, and enable you to recreate the state
 | as it was at that time with no noticeable interruption to the
 | user (process downtime was like 0.02s).
| heavyset_go wrote:
| This is cool, thanks for sharing it.
___________________________________________________________________
(page generated 2022-10-11 23:00 UTC)