[HN Gopher] Why are my ZFS disks so noisy?
       ___________________________________________________________________
        
       Why are my ZFS disks so noisy?
        
       Author : todsacerdoti
       Score  : 168 points
       Date   : 2024-09-26 20:37 UTC (1 day ago)
        
 (HTM) web link (allthingsopen.org)
 (TXT) w3m dump (allthingsopen.org)
        
       | hamandcheese wrote:
       | I don't know if this exists or not, but I'd like to try something
       | like a fuse filesystem which can transparently copy a file to a
       | fast scratch SSD when it is first accessed.
       | 
       | I have a somewhat large zfs array and it makes consistent noise
       | as I stream videos from it. The streaming is basically a steady
       | trickle compared to what the array is capable of. I'd rather
       | incur all the noise up front, as fast as possible, then continue
       | the stream from a silent SSD.
        
         | toast0 wrote:
         | If you've got enough ram, you might be able to tune prefetching
         | to prefetch the whole file? Although, I'm not sure how tunable
         | that actually is.
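          |
          | A rough sketch of what I mean, assuming OpenZFS on Linux (the
          | parameter name is from memory, so double-check it against
          | your version):
          |
          |       # allow up to 64 MiB of sequential prefetch per stream
          |       echo 67108864 > /sys/module/zfs/parameters/zfetch_max_distance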
        
           | iforgotpassword wrote:
           | Yes, I think nowadays you do this with
           | blockdev --setra <num_sectors> /dev/sdX
           | 
           | But I feel like there was a sysctl for this too in the past.
           | I used it back in the day to make the HDD in my laptop spin
           | down immediately after a new song started playing in
           | rhythmbox by setting it to 16MB.
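            |
            | For example, 16MB of readahead would be something like this
            | (--setra counts 512-byte sectors, and sdX is whatever your
            | data disk is):
            |
            |       blockdev --setra 32768 /dev/sdX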
        
         | hnlmorg wrote:
         | Not aware of anything that directly matches your description,
          | however all major operating systems do cache filesystem objects
         | in RAM. So if you pre-read the file, then it should be read
         | back from cache when you come to stream it.
         | 
         | Additionally, ZFS supports using SSDs to supplement the cache.
        
         | spockz wrote:
         | Just to test whether your OS setup etc is already up to par,
         | try reading the whole file, eg by calculating the hash with
          | something like `md5` (yes, I know md5 is not secure). This
          | should put the file mostly in the OS cache. But with video
          | files easily exceeding 50GiB these days, you need quite a lot
          | of RAM to keep it all in cache. Maybe you could set up the
          | SSD as a scratch disk for swap? I'm not sure how/if you can
          | tweak what it is used for.
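          |
          | Something like this (md5sum on Linux; the path is just an
          | example):
          |
          |       md5sum /tank/media/movie.mkv   # first run hits the disks
          |       md5sum /tank/media/movie.mkv   # now mostly from cache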
         | 
         | As a sibling says, ZFS should support this pretty
         | transparently.
        
         | 3np wrote:
         | It may not be 100% what you're looking for and will probably
         | not make your drives _silent_ while streaming but putting L2ARC
         | on that SSD and tweaking prefetch might get you a good way
         | there.
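          |
          | The L2ARC part is a one-liner (pool and device names are
          | placeholders):
          |
          |       zpool add tank cache /dev/nvme0n1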
         | 
         | Another much simpler filesystem-agnostic alternative would be
         | to copy it over to the SSD with a script and commence streaming
         | from there. You'll have to wait for the entire file to copy for
         | the stream to start, though. I think some streaming servers may
         | actually support this natively if you mount /var and/or
         | /var/tmp on the faster drive and configure it to utilize it as
         | a "cache".
        
         | Mayzie wrote:
         | > I don't know if this exists or not, but I'd like to try
         | something like a fuse filesystem which can transparently copy a
         | file to a fast scratch SSD when it is first accessed.
         | 
         | You may be interested in checking out bcache[1] or bcachefs[2].
         | 
         | [1] https://www.kernel.org/doc/html/latest/admin-
         | guide/bcache.ht...
         | 
         | [2] https://bcachefs.org/
        
           | theblazehen wrote:
           | lvm-cache works as well, if you're already using LVM.
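            |
            | Roughly like this with a reasonably recent LVM, assuming a
            | VG "vg0" with a slow LV "media" and the SSD already added
            | to the VG as a PV:
            |
            |       lvcreate -n fast -L 100G vg0 /dev/nvme0n1
            |       lvconvert --type cache --cachevol fast vg0/media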
           | 
           | https://github.com/45Drives/autotier is exactly what they
           | were asking for as well
        
             | magicalhippo wrote:
             | I've done some testing with using ZFS on top of LVM with
             | dm-writecache.
             | 
             | Worked well enough on the small scale, but sadly haven't
             | had the time or hardware to test it in a more production-
             | like environment.
             | 
             | Also, it was starting to feel a bit like a Jenga tower,
              | increasing the chances of bugs and other weird issues
              | striking.
        
               | kaliszad wrote:
                | I wouldn't recommend combining those two. It's just
                | begging for problems.
        
               | magicalhippo wrote:
               | Yeah that's my worry. Still, got 6x old 3TB disks that
               | still work and a few spare NVMEs, so would be fun to try
               | it for teh lulz.
        
           | LtdJorge wrote:
           | Yeah, bcache is exactly that
        
         | naming_the_user wrote:
          | Depending on the file size, I wonder if it'd be better to just
         | prefetch the entire file into RAM.
        
           | fsckboy wrote:
            | good idea, but that's not his problem. he needs a media
            | player that will sprint through to the end of the file when
            | he first opens it. They don't do that cuz they figure they
            | might be streaming from the net, so why tax _that_ part of
            | the system.
        
             | throw73737 wrote:
             | VLC has all sorts of prefetch settings.
        
             | usefulcat wrote:
             | With ZFS that wouldn't necessarily help. I've been using
             | ZFS at work for many years with mostly large files. Even if
             | I repeatedly read the same file as fast as possible, ZFS
             | will not cache the entire thing in memory (there's no
             | shortage of RAM so that's not it). This is unlike most
              | Linux filesystems, which are usually pretty aggressive with
             | caching.
             | 
             | Maybe there is some combination of settings that will get
             | it to cache more aggressively; just saying that it's not a
             | given that it will do so.
        
         | dialup_sounds wrote:
         | Reminded of this -- ZFS on Apple's Fusion Drive
         | 
         | http://jolly.jinx.de/teclog/2012.10.31.02-fusion-drive-loose...
        
         | nicman23 wrote:
          | i mean you can do that with a simple wrapper
          | 
          | have an ssd mounted at e.g. /tmp/movies
          | 
          | and create a script in .bin/ (or whatever)
          | 
          |       #!/bin/sh
          |       # make a scratch dir on the ssd, copy there, play, clean up
          |       tmp_m="$(mktemp -d /tmp/movies/XXXXXX)"
          |       cp "$@" "$tmp_m"
          |       mpv "$tmp_m"/*
          |       rm -r "$tmp_m"
          | 
          | please note i have not tried the script but it probably works
        
         | anotherhue wrote:
          | mpv has --cache-size (or something) that you can set to a few
          | GB. If you run out of RAM it should swap to your SSD.
         | 
         | Edit: demuxer-max-bytes=2147483647
        
         | 2OEH8eoCRo0 wrote:
         | You could use overlayfs with the upper layer on the SSD
         | "scratch" and trigger a write operation
        
         | madeofpalk wrote:
         | This is essentially what macOS's Fusion Drive is/was
         | https://en.wikipedia.org/wiki/Fusion_Drive
         | 
          | I'm unsure if they ship any Macs with these anymore. I guess
          | not, since the Apple Silicon iMacs don't have spinning hard
          | drives?
        
         | mystified5016 wrote:
         | Bcachefs.
         | 
         | You can do all sorts of really neat things. You can define
         | pools of drives at different cache levels. You can have a bunch
         | of mechanical drives for deep storage, some for hot storage,
         | SSD to cache recently read files, then write-through from the
         | SSD down to mechanical drives, either immediately or after a
         | delay.
         | 
         | It's pretty much everything I could wish for from a filesystem,
         | though I haven't actually taken the time to try it out yet.
         | AFAIK it's still somewhat experimental, more or less in beta.
        
           | cassianoleal wrote:
           | Is it fully merged to the kernel? I remember a few weeks ago
           | some drama between Kent Overstreet and Linus about it but I
           | didn't go into details. Has that been resolved?
           | 
           | Edit: or maybe it was some drama over userspace tooling, I
           | can't remember tbh.
        
             | MattTheRealOne wrote:
              | Bcachefs has been fully merged into the kernel since 6.7.
              | The drama was around Overstreet trying to merge significant
              | code changes into a release-candidate kernel, which should
              | only be receiving minor bug fixes at that stage of the
              | development process. It was a developer communication
              | issue, not anything that impacts users. The changes will just
             | have to wait until the next kernel version.
        
               | cassianoleal wrote:
               | I think this is what I was recalling.
               | 
               | https://jonathancarter.org/2024/08/29/orphaning-bcachefs-
               | too...
        
         | throw0101d wrote:
          | > _I don't know if this exists or not, but I'd like to try
         | something like a fuse filesystem which can transparently copy a
         | file to a fast scratch SSD when it is first accessed._
         | 
         | ZFS has caching for writes (SLOG)[0][1] and reads
          | (L2ARC),[2][3] which were introduced many years ago when HDDs
         | were cheap and flash was still very, very expensive:
         | 
         | * https://www.brendangregg.com/blog/2009-10-08/hybrid-
         | storage-...
         | 
         | [0] https://openzfs.github.io/openzfs-
         | docs/man/master/7/zpoolcon...
         | 
         | [1] https://openzfs.github.io/openzfs-docs/man/master/8/zpool-
         | cr...
         | 
         | [2] https://openzfs.github.io/openzfs-
         | docs/man/master/7/zpoolcon...
         | 
         | [3] https://openzfs.github.io/openzfs-docs/man/master/8/zpool-
         | ad...
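          |
          | Adding them is straightforward (pool and device names are
          | placeholders):
          |
          |       zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1   # SLOG
          |       zpool add tank cache /dev/nvme2n1                     # L2ARC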
        
           | aidenn0 wrote:
           | I should point out that the SLOG only caches _synchronous_
           | writes, which are written twice with ZFS.
           | 
           | Also, the L2ARC is great, but does still have RAM overhead.
            | There are also useful tunables. I had a workload on a RAM-
            | limited machine where directory walking was common but data
            | reads were fairly random, and an L2ARC configured for
            | metadata only sped it up by a large amount.
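            |
            | For reference, the metadata-only bit is just this (dataset
            | name is a placeholder):
            |
            |       zfs set secondarycache=metadata tank/data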
        
             | kaliszad wrote:
              | You can also have a dedicated special vdev for metadata
              | and/or small files. ZFS can pull off many tricks if you
              | configure it well. Of course, the L2ARC is better if you
              | don't trust the caching device that much (e.g. you only
              | have a single SSD).
        
           | magicalhippo wrote:
           | > ZFS has caching for writes
           | 
           | Not really.
           | 
           | It will accumulate _synchronous_ writes into the ZIL, and you
           | put the ZIL on a fast SLOG vdev. But it will only do so for a
            | limited amount of time/space, and is not meant as a proper
           | write-back cache but rather as a means to quickly service
           | _synchronous_ writes.
           | 
            | By default asynchronous writes do _not_ use the ZIL, and
            | hence the SLOG vdev, at all. You can force them to, but that
            | can also be a bad idea unless you have Optane drives, as
            | you're then bottlenecked by the ZIL/SLOG.
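            |
            | For reference, forcing it looks like this (dataset name is
            | a placeholder), with the caveat above:
            |
            |       zfs set sync=always tank/data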
        
         | _hyn3 wrote:
         | cat filename > /dev/null
         | 
         | Reads the entire file into the OS buffer.
        
         | sulandor wrote:
          | add this to mpv.conf:
          | 
          |       cache=yes
          |       demuxer-max-bytes=5G
          |       demuxer-max-back-bytes=5G
        
       | Maledictus wrote:
       | I expected a way to find out what the heck the system is sending
        | to those disks. Like per process, and what the kernel/ZFS is
        | adding.
        
         | M95D wrote:
         | It can be done with CONFIG_DM_LOG_WRITES and another set of
          | drives, but AFAIK it needs to be set up before (or more
          | exactly _under_) ZFS.
        
         | Sesse__ wrote:
         | blktrace is fairly useful in this regard.
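          |
          | E.g. (device name is a placeholder):
          |
          |       blktrace -d /dev/sda -o - | blkparse -i -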
        
       | paol wrote:
       | I know the article is just using the noise thing as an excuse to
       | deep dive into zfs and proxmox, which is cool, but if what you
       | really care about is reducing noise I thought I'd leave some
       | practical advice here:
       | 
       | 1. Most hard drive noise is caused by mechanical vibrations being
       | transmitted to the chassis the drive is mounted on.
       | 
       | 2. Consequently the most effective way to reduce noise is to
       | reduce the mechanical coupling in the drive mounting mechanism.
       | Having the drives in a noise-isolating case is helpful too, but
       | only as a secondary improvement. Optimizing the drive mounting
       | should really be the first priority.
       | 
       | 3. If space isn't a concern the optimal thing is to have a large
       | case (like an ATX or larger) with a large number of HDD bays. The
        | mounting should use soft rubber or silicone grommets. Some
       | mounting systems can work with just the grommets, but systems
       | that use screws are ok too as long as the screw couples to the
       | grommet not the chassis. In a good case like this any number of
       | hard drives can be made essentially inaudible.
       | 
       | 4. If space is a concern, a special purpose "NAS like" case
       | (example: the Jonsbo N line of cases) can approach the size of
        | consumer NAS boxes. The lack of space makes optimal acoustics
        | difficult, but it will still be a 10x improvement over typical
        | consumer NASes.
       | 
        | 5. Lastly, what you _shouldn't_ ever do is get one of those
        | consumer NAS boxes. They are made with no concern for noise at
       | all, and manufacturing cheapness constraints tend to make them
       | literally pessimal at it. I had a QNAP I got rid of that couldn't
       | have been more effective at amplifying drive noise if it had been
       | designed for that on purpose.
        
         | naming_the_user wrote:
         | Yeah, the article title seemed kind of weird to me. I have a
          | ZFS NAS, it's just a bunch of drives in an ATX case with (what
          | I'd consider nowadays to be) the standard rubber grommets.
         | 
         | I mean, you can hear it, but it's mostly just the fans and
         | drives spinning, it's not loud at all.
         | 
          | The recommendations seem reasonable, but for noise? If it's
          | noisy, I think something is probably wrong.
        
           | nicolaslem wrote:
           | I totally understand the article title, I have a ZFS NAS that
           | makes the same kind of noise as described there. Roughly
            | every five seconds the drives make a sound that is different
           | from the background hum of a running computer. In a calm
           | environment this is very distracting. I even had a guest
           | sleeping in an adjacent room complain about it once.
        
             | ndiddy wrote:
             | This is likely a different problem than the article
             | describes. Most newer hard drives will move the actuator
             | arm back and forth every few seconds when the drive is
             | inactive. It has to do with evenly distributing the
             | lubrication on the arm to increase the life of the drive.
        
             | ssl-3 wrote:
             | That's a tunable in ZFS.
             | 
             | vfs.zfs.txg.timeout defaults to 5 seconds, but it can be
             | set (much) higher if you wish.
             | 
             | I don't care if I lose up to a minute or two of work
             | instead of <=5 seconds in the face of an unplanned failure,
             | so I set it to a couple of minutes on my desktop rig years
             | ago and never looked back.
             | 
             | AFAIK there's also no harm in setting it both dynamically
             | and randomly. I haven't tried it, but periodically setting
             | vfs.zfs.txg.timeout to a random value between [say] 60 and
             | 240 seconds should go a long ways towards making it easier
             | to ignore by breaking up the regularity.
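              |
              | For reference (the value is just an example):
              |
              |       # FreeBSD
              |       sysctl vfs.zfs.txg.timeout=120
              |       # Linux (OpenZFS module parameter)
              |       echo 120 > /sys/module/zfs/parameters/zfs_txg_timeout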
             | 
             | (Or: Quieter disks. Some of mine are very loud; some are
             | very quiet. Same box, same pool, just different models.
             | 
             | Or: Put the disks somewhere else, away from the user and
             | the sleeping guests.)
        
         | cjs_ac wrote:
         | To add to this, I mounted an HDD to a case using motherboard
         | standoffs in a place that was obviously intended for SSDs. Not
         | only was it very loud, the resonance between the disk and the
         | case also broke the disk after six months.
        
           | jonhohle wrote:
           | And you probably weren't even listening to Rhythm Nation![0]
           | 
           | 0 - https://devblogs.microsoft.com/oldnewthing/20220816-00/?p
           | =10...
        
         | magicalhippo wrote:
         | I recently upgraded my home-NAS from a Fractal Define R4 to a
         | Define 7 XL. The R4 had the rubber grommets, but hot-swappable
         | trays that were just held in by spring force. As such they
         | rattled a lot.
         | 
         | The Define 7 has the same grommet system, but the trays can be
         | fastened by screws to the support rails.
         | 
         | The difference in noise was significant. Even though I went
          | from 6 to 10 disks, it's much quieter now.
        
         | patrakov wrote:
         | No. The most effective way to remove HDD noise is to remove
          | HDDs and add SSDs. I haven't had any HDDs since 2016.
         | 
         | P.S. I also talked to a customer in the past who stored their
         | backups in an SSD-only Ceph cluster. They were citing higher
         | reliability of SSDs and higher density, which was important
         | because they had very limited physical space in the datacenter.
         | In other words, traditional 3.5" HDDs would not have allowed
         | them to store that much data in that many rack units.
        
           | toast0 wrote:
           | SSDs are great. Quieter, can be denser, faster, available in
           | small sizes for small money, more reliable, etc.
           | 
           | But they're not great for low cost bulk storage. If you're
           | putting together a home NAS, you probably want to do well on
           | $/TB and don't care so much about transfer speeds.
           | 
            | But if you've found 10TB+ SSDs for under $200, let us know
           | where to find them.
        
           | Guvante wrote:
           | A 20 TB HDD is <$400
           | 
           | An 8 TB SSD is >$600
           | 
            | $80/TB vs $20/TB is a fourfold increase.
           | 
            | Also, a 16 TB SSD is about $2,000, so it's more like a 5x
            | increase in a data center setup.
        
             | magicalhippo wrote:
             | The 4TB M.2 SSDs are getting to a price point where one
             | might consider them. The problem is that it's not trivial
             | to connect a whole bunch of them in a homebrew NAS without
             | spending tons of money.
             | 
             | Best I've found so far is cards like this[1] that allow for
             | 8 U.2 drives, and then some M.2 to U.2 adapters like
             | this[2] or this[3].
             | 
             | In a 2x RAID-Z1 or single RAID-Z2 setup that would give
             | 24TB of redundant flash storage for a tad more than a
             | single 16TB enterprise SSD.
             | 
             | [1]: https://www.aliexpress.com/item/1005005671021299.html
             | 
             | [2]: https://www.aliexpress.com/item/1005005870506081.html
             | 
             | [3]: https://www.aliexpress.com/item/1005006922860386.html
        
               | bpye wrote:
               | On AM5 you can do 6 M.2 drives without much difficulty,
               | and with considerably better perf. Your motherboard will
               | need to support x4/x4/x4/x4 bifurcation on the x16 slot,
               | but you put 4 there [0], and then use the two on board x4
               | slots, one will use the CPU lanes and the other will be
               | connected via the chipset.
               | 
               | [0] -
               | https://www.aliexpress.com/item/1005002991210833.html
        
           | 7bit wrote:
           | > The most effective way to remove HDD noise is to remove
           | HDDs and add SSDs.
           | 
           | Lame
        
         | kaliszad wrote:
          | You clearly haven't read the full article, as Jim Salter
          | covers the mechanical stuff at the end of it.
         | 
         | Also, you want to reduce vibrations because of this:
         | https://www.youtube.com/watch?v=tDacjrSCeq4 (Shouting in the
         | datacenter)
        
         | Larrikin wrote:
         | >Lastly what you shouldn't ever do is get one of those
         | consumers NAS boxes. They are made with no concern for noise at
         | all, and manufacturing cheapness constraints tend to make them
         | literally pessimal at it. I had a QNAP I got rid of that
         | couldn't have been more effective at amplifying drive noise if
         | it had been designed for that on purpose.
         | 
         | Is there any solution that lets me mix and match drive sizes as
         | well as upgrade? I'm slowly getting more and more into self
         | hosting as much of digital life as possible, so I don't want to
         | be dependent on Synology, but they offered a product that let
         | me go from a bunch of single drives with no redundancy to being
         | able to repurpose them into a solution where I can swap out
          | drives and, most importantly, grow. As far as I can tell
          | there's no open source equivalent. As soon as I've set up a
          | file system with the drives I already have, the only solution
          | is to buy the same number of drives with more space once I
          | run out.
        
           | kiney wrote:
           | BTRFS
        
           | mrighele wrote:
           | > As soon as I've set up a file system with the drives I
           | already have the only solution is to buy the same amount of
           | drives with more space once I run out.
           | 
            | Recent versions of ZFS support raidz expansion [1], which lets
           | you add extra disks to a raidz1/2/3 pool. It has a number of
           | limitations, for example you cannot change the type of pool
           | (mirror to raidz1, raidz1 to raidz2 etc.) but if you plan to
           | expand your pool one disk at a time it can be useful. Just
           | remember that 1) old data will not take advantage of the
           | extra disk until you copy it around and 2) the size of the
           | pool is limited by the size of the smallest disk in the pool.
           | 
           | [1] https://github.com/openzfs/zfs/pull/15022
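            |
            | The expansion itself is a single attach with a new enough
            | OpenZFS (pool, vdev and device names are placeholders):
            |
            |       zpool attach tank raidz1-0 /dev/sdX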
        
           | kccqzy wrote:
           | I might be misunderstanding your needs but my home server
           | uses just LVM. When I run out of disk space, I buy a new
           | drive, use `pvcreate` followed by `vgextend` and `lvextend`.
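            |
            | For example (VG/LV names and the size are placeholders; -r
            | also grows the filesystem on top):
            |
            |       pvcreate /dev/sdX
            |       vgextend vg0 /dev/sdX
            |       lvextend -r -L +4T vg0/data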
        
           | devilbunny wrote:
           | And I've never used a QNAP, but I'm on my second Synology and
           | their drive carriages all use rubber/silicone grommets to
           | isolate drive vibration from the case. It's not _silent_ -
           | five drives of spinning rust will make some noise regardless
           | - but it sits in a closet under my stairs that backs up to my
           | media cabinet and you have to be within a few feet to hear it
           | even in the closet over background noise in the house.
           | 
           | I don't use any of their "personal cloud" stuff that relies
           | on them. It's just a Linux box with some really good features
           | for drive management and package updates. You can set up and
           | maintain any other services you want without using their
           | manager.
           | 
           | The ease with which I could set it up as a destination for
           | Time Machine backups has absolutely saved my bacon on at
           | least one occasion. My iMac drive fell to some strange data
           | corruption and would not boot. I booted to recovery, pointed
           | it at the Synology, and aside from the restore time, I only
           | lost about thirty minutes' work. The drive checked out fine
           | and is still going strong. Eventually it will die, and when
           | it does I'll buy a new Mac and tell it to restore from the
           | Synology. I have double-disk redundancy, so I can lose any
           | two of five drives with no loss of data so long as I can get
           | new drives to my house and striped in before a third fails.
           | That would take about a week, so while it's possible, it's
           | unlikely.
           | 
           | If I were really paranoid about that, I'd put together a
           | group buy for hard drives from different manufacturers,
           | different runs, different retailers, etc., and then swap them
           | around so none of us were using drives that were all from the
           | same manufacturer, factory, and date. But I'm not that
           | paranoid. If I have a drive go bad, and it's one that I have
           | more than one of the same (exact) model, I'll buy enough to
           | replace them all, immediately replace the known-bad one, and
           | then sell/give away the same-series.
        
       | jftuga wrote:
       | Is there an online calculator to help you find the optimal
       | combination of # of drives, raid level, and block size?
       | 
       | For example, I'm interested in setting up a new RAID-Z2 pool of
       | disks and would like to minimize noise and number of writes.
       | Should I use 4 drives or 6? Also, what would be the optimal block
        | size(s) in this scenario?
        
         | thg wrote:
         | Not sure if this is exactly what you're looking for, but I
         | suppose it will be of at least some help:
         | https://docs.google.com/spreadsheets/d/1tf4qx1aMJp8Lo_R6gpT6...
        
         | mastax wrote:
         | https://wintelguy.com/zfs-calc.pl
        
       | nubinetwork wrote:
        | I don't think you can really get away from the noise; ZFS
        | writes to disk every 5 seconds pretty much all the time...
        
       | m463 wrote:
       | I run proxmox, and ever since day 1 I noticed it hits the disk.
       | 
       | a _LOT_.
       | 
        | I dug into it, and even without ANY VMs or containers running, it
       | writes a bunch of stuff out every second.
       | 
        | I turned off a bunch of stuff, I think:
        | 
        |       systemctl disable pve-ha-crm
        |       systemctl disable pve-ha-lrm
       | 
       | But stuff like /var/lib/pve-firewall and /var/lib/rrdcached was
       | still written to every second.
       | 
        | I think I also played around with the commit=n mount option.
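        |
        | e.g. in /etc/fstab for ext4 (flushes the journal every 60s
        | instead of the default 5s; the UUID is a placeholder):
        |
        |       UUID=xxxx / ext4 defaults,commit=60 0 1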
       | 
       | The point of this is - I tried running proxmox with zfs, and it
       | wrote to the disk even more often.
       | 
        | Maybe that's OK for physical hard disks, but I didn't want to
        | burn out my SSD immediately.
        | 
        | For physical disks, all that writing could also be noisy.
        
       ___________________________________________________________________
       (page generated 2024-09-27 23:01 UTC)