[HN Gopher] Why are my ZFS disks so noisy?
___________________________________________________________________
Why are my ZFS disks so noisy?
Author : todsacerdoti
Score : 168 points
Date : 2024-09-26 20:37 UTC (1 day ago)
(HTM) web link (allthingsopen.org)
(TXT) w3m dump (allthingsopen.org)
| hamandcheese wrote:
| I don't know if this exists or not, but I'd like to try something
| like a fuse filesystem which can transparently copy a file to a
| fast scratch SSD when it is first accessed.
|
| I have a somewhat large zfs array and it makes consistent noise
| as I stream videos from it. The streaming is basically a steady
| trickle compared to what the array is capable of. I'd rather
| incur all the noise up front, as fast as possible, then continue
| the stream from a silent SSD.
| toast0 wrote:
| If you've got enough ram, you might be able to tune prefetching
| to prefetch the whole file? Although, I'm not sure how tunable
| that actually is.
| iforgotpassword wrote:
| Yes, I think nowadays you do this with
| blockdev --setra <num_sectors> /dev/sdX
|
| But I feel like there was a sysctl for this too in the past.
| I used it back in the day to make the HDD in my laptop spin
| down immediately after a new song started playing in
| rhythmbox by setting it to 16MB.
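|
| (For reference, a rough sketch of both knobs on Linux; sdX and
| the values are just placeholders:)
|
|     # one-off readahead, in 512-byte sectors (32768 = 16 MiB)
|     blockdev --setra 32768 /dev/sdX
|     # roughly equivalent sysfs knob, in KiB
|     echo 16384 > /sys/block/sdX/queue/read_ahead_kb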
| hnlmorg wrote:
| Not aware of anything that directly matches your description;
| however, all major operating systems do cache filesystem
| objects in RAM. So if you pre-read the file, it should be read
| back from cache when you come to stream it.
|
| Additionally, ZFS supports using SSDs to supplement the cache.
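|
| For example, an L2ARC cache device can be added to an existing
| pool with something like the following (pool and device names
| are placeholders):
|
|     zpool add tank cache /dev/nvme0n1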
| spockz wrote:
| Just to test whether your OS setup etc. is already up to par,
| try reading the whole file, e.g. by calculating the hash with
| something like `md5` (yes, I know md5 is not secure). This
| should put the file mostly in the OS cache. But with video
| files being able to hit more than 50GiB in size these days,
| you need quite a lot of RAM to keep it all in cache. Maybe you
| can set up the SSD as a scratch disk for swap? I'm not sure
| how/if you can tweak what it is used for.
|
| As a sibling says, ZFS should support this pretty
| transparently.
| 3np wrote:
| It may not be 100% what you're looking for, and it will
| probably not make your drives _silent_ while streaming, but
| putting L2ARC on that SSD and tweaking prefetch might get you
| a good way there.
|
| Another much simpler filesystem-agnostic alternative would be
| to copy it over to the SSD with a script and commence streaming
| from there. You'll have to wait for the entire file to copy for
| the stream to start, though. I think some streaming servers may
| actually support this natively if you mount /var and/or
| /var/tmp on the faster drive and configure it to utilize it as
| a "cache".
| Mayzie wrote:
| > I don't know if this exists or not, but I'd like to try
| something like a fuse filesystem which can transparently copy a
| file to a fast scratch SSD when it is first accessed.
|
| You may be interested in checking out bcache[1] or bcachefs[2].
|
| [1] https://www.kernel.org/doc/html/latest/admin-
| guide/bcache.ht...
|
| [2] https://bcachefs.org/
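|
| (A minimal sketch of the flow from the kernel bcache docs;
| device names are placeholders and make-bcache will wipe them:)
|
|     make-bcache -B /dev/sdb         # backing (slow) device
|     make-bcache -C /dev/nvme0n1     # cache (fast) device
|     # attach the cache set to the backing device, then pick a
|     # cache mode
|     echo <cset-uuid> > /sys/block/bcache0/bcache/attach
|     echo writeback > /sys/block/bcache0/bcache/cache_mode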
| theblazehen wrote:
| lvm-cache works as well, if you're already using LVM.
|
| https://github.com/45Drives/autotier is exactly what they
| were asking for, too.
| magicalhippo wrote:
| I've done some testing with using ZFS on top of LVM with
| dm-writecache.
|
| Worked well enough on the small scale, but sadly haven't
| had the time or hardware to test it in a more production-
| like environment.
|
| Also, it was starting to feel a bit like a Jenga tower,
| increasing the chances of bugs and other weird issues
| striking.
| kaliszad wrote:
| I wouldn't recommend combining those two. It's only
| begging for problems.
| magicalhippo wrote:
| Yeah that's my worry. Still, got 6x old 3TB disks that
| still work and a few spare NVMEs, so would be fun to try
| it for teh lulz.
| LtdJorge wrote:
| Yeah, bcache is exactly that
| naming_the_user wrote:
| Depending on the file-size I wonder if it'd be better to just
| prefetch the entire file into RAM.
| fsckboy wrote:
| good idea, but that's not his problem. he needs a media
| player that will sprint through to the end of the file when
| he first opens it. They don't do that cuz they figure they
| might be streaming from the net, so why tax _that_ part of
| the system.
| throw73737 wrote:
| VLC has all sorts of prefetch settings.
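|
| (e.g. the local-file cache, which is specified in
| milliseconds; the value here is just an example:)
|
|     vlc --file-caching=60000 movie.mkv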
| usefulcat wrote:
| With ZFS that wouldn't necessarily help. I've been using
| ZFS at work for many years with mostly large files. Even if
| I repeatedly read the same file as fast as possible, ZFS
| will not cache the entire thing in memory (there's no
| shortage of RAM so that's not it). This is unlike most Linux
| filesystems, which are usually pretty aggressive with caching.
|
| Maybe there is some combination of settings that will get
| it to cache more aggressively; just saying that it's not a
| given that it will do so.
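|
| (For anyone who wants to check the same thing, the per-dataset
| caching policy and overall ARC behaviour are visible with
| something like this; the dataset name is a placeholder:)
|
|     zfs get primarycache,secondarycache tank/data
|     arc_summary | less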
| dialup_sounds wrote:
| Reminded of this -- ZFS on Apple's Fusion Drive
|
| http://jolly.jinx.de/teclog/2012.10.31.02-fusion-drive-loose...
| nicman23 wrote:
| i mean you can do that with a simple wrapper:
|
| have an SSD mounted at e.g. /tmp/movies
|
| and create a script in .bin/ (or whatever):
|
|     #!/bin/sh
|     # copy the file(s) to a scratch dir on the SSD, play from
|     # there, then clean up
|     tmp_m="$(mktemp -d /tmp/movies/XXXXXX)"
|     cp -- "$@" "$tmp_m"
|     mpv "$tmp_m"/*
|     rm -rf "$tmp_m"
|
| please note i have not tried the script but it probably works
| anotherhue wrote:
| mpv has --cache-size (or something) that you can set at a few
| GB. If you run out of ram it should swap to your ssd.
|
| Edit: demuxer-max-bytes=2147483647
| 2OEH8eoCRo0 wrote:
| You could use overlayfs with the upper layer on the SSD
| "scratch" and trigger a write operation
| madeofpalk wrote:
| This is essentially what macOS's Fusion Drive is/was
| https://en.wikipedia.org/wiki/Fusion_Drive
|
| I'm unsure if they ship any Macs with these anymore. I guess
| not, since the Apple Silicon iMacs don't have spinning hard
| drives?
| mystified5016 wrote:
| Bcachefs.
|
| You can do all sorts of really neat things. You can define
| pools of drives at different cache levels. You can have a
| bunch of mechanical drives for deep storage, some for hot
| storage, an SSD to cache recently read files, and then write-
| through from the SSD down to the mechanical drives, either
| immediately or after a delay.
|
| It's pretty much everything I could wish for from a filesystem,
| though I haven't actually taken the time to try it out yet.
| AFAIK it's still somewhat experimental, more or less in beta.
| cassianoleal wrote:
| Is it fully merged into the kernel? I remember a few weeks ago
| some drama between Kent Overstreet and Linus about it, but I
| didn't go into the details. Has that been resolved?
|
| Edit: or maybe it was some drama over userspace tooling, I
| can't remember tbh.
| MattTheRealOne wrote:
| Bcachefs has been fully merged into the kernel since 6.7. The
| drama was around Overstreet trying to merge significant code
| changes into a release-candidate kernel that should only be
| receiving minor bug fixes at that stage of the development
| process. It was a developer communication issue, not anything
| that impacts the user. The changes will just have to wait
| until the next kernel version.
| cassianoleal wrote:
| I think this is what I was recalling.
|
| https://jonathancarter.org/2024/08/29/orphaning-bcachefs-
| too...
| throw0101d wrote:
| > _I don't know if this exists or not, but I'd like to try
| something like a fuse filesystem which can transparently copy
| a file to a fast scratch SSD when it is first accessed._
|
| ZFS has caching for writes (SLOG)[0][1] and reads
| (L2ARC),[2][3] which was introduced many years ago when HDDs
| were cheap and flash was still very, very expensive:
|
| * https://www.brendangregg.com/blog/2009-10-08/hybrid-
| storage-...
|
| [0] https://openzfs.github.io/openzfs-
| docs/man/master/7/zpoolcon...
|
| [1] https://openzfs.github.io/openzfs-docs/man/master/8/zpool-
| cr...
|
| [2] https://openzfs.github.io/openzfs-
| docs/man/master/7/zpoolcon...
|
| [3] https://openzfs.github.io/openzfs-docs/man/master/8/zpool-
| ad...
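|
| For the write side, the SLOG is just a log vdev added to the
| pool, ideally mirrored; something like the following, with
| placeholder device names:
|
|     zpool add tank log mirror /dev/nvme0n1p1 /dev/nvme1n1p1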
| aidenn0 wrote:
| I should point out that the SLOG only caches _synchronous_
| writes, which are written twice with ZFS.
|
| Also, the L2ARC is great, but it does still have RAM overhead.
| There are also useful tunables. I had a workload on a RAM-
| limited machine where directory walking was common but data
| reads were fairly random, and an L2ARC configured for metadata
| only sped it up by a large amount.
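|
| (That metadata-only L2ARC is a per-dataset property, roughly:)
|
|     zfs set secondarycache=metadata tank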
| kaliszad wrote:
| You can also have a dedicated special vdev for metadata only
| and/or small files. ZFS can pull off many tricks if you
| configure it well. Of course, the L2ARC is better if you don't
| trust the caching device that much (e.g. you only have a
| single SSD).
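|
| A rough sketch; the special vdev must be redundant, since
| losing it loses the pool, and the names below are placeholders:
|
|     zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1
|     # optionally also send small file blocks to the special vdev
|     zfs set special_small_blocks=32K tank/somedataset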
| magicalhippo wrote:
| > ZFS has caching for writes
|
| Not really.
|
| It will accumulate _synchronous_ writes into the ZIL, and you
| can put the ZIL on a fast SLOG vdev. But it will only do so
| for a limited amount of time/space, and it is not meant as a
| proper write-back cache but rather as a means to quickly
| service _synchronous_ writes.
|
| By default asynchronous writes do _not_ use the ZIL, and
| hence the SLOG vdev, at all. You can force them to, but that
| can also be a bad idea unless you have Optane drives, as
| you're then bottlenecked by the ZIL/SLOG.
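|
| (Forcing it is the per-dataset sync property; the dataset name
| here is a placeholder:)
|
|     zfs set sync=always tank/vms    # push everything through the ZIL/SLOG
|     zfs set sync=standard tank/vms  # default: only fsync/O_SYNC writes do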
| _hyn3 wrote:
| cat filename > /dev/null
|
| Reads the entire file into the OS buffer.
| sulandor wrote:
| add this to mpv.conf:
|
|     cache=yes
|     demuxer-max-bytes=5G
|     demuxer-max-back-bytes=5G
| Maledictus wrote:
| I expected a way to find out what the heck the system is
| sending to those disks. Like per process, and what the
| kernel/ZFS is adding.
| M95D wrote:
| It can be done with CONFIG_DM_LOG_WRITES and another set of
| drives, but AFAIK it needs to be set up before (or more
| exactly _under_) ZFS.
| Sesse__ wrote:
| blktrace is fairly useful in this regard.
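|
| For example, a live trace piped straight into blkparse, plus
| ZFS's own per-vdev view (device and pool names are
| placeholders):
|
|     blktrace -d /dev/sda -o - | blkparse -i -
|     zpool iostat -v tank 1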
| paol wrote:
| I know the article is just using the noise thing as an excuse to
| deep dive into zfs and proxmox, which is cool, but if what you
| really care about is reducing noise I thought I'd leave some
| practical advice here:
|
| 1. Most hard drive noise is caused by mechanical vibrations being
| transmitted to the chassis the drive is mounted on.
|
| 2. Consequently the most effective way to reduce noise is to
| reduce the mechanical coupling in the drive mounting mechanism.
| Having the drives in a noise-isolating case is helpful too, but
| only as a secondary improvement. Optimizing the drive mounting
| should really be the first priority.
|
| 3. If space isn't a concern, the optimal thing is to have a
| large case (like an ATX or larger) with a large number of HDD
| bays. The mounting should use soft rubber or silicone grommets.
| Some mounting systems can work with just the grommets, but
| systems that use screws are OK too, as long as the screw
| couples to the grommet and not the chassis. In a good case like
| this any number of hard drives can be made essentially
| inaudible.
|
| 4. If space is a concern, a special-purpose "NAS like" case
| (example: the Jonsbo N line of cases) can approach the size of
| consumer NAS boxes. The lack of space makes optimal acoustics
| difficult, but it will still be a 10x improvement over typical
| consumer NASes.
|
| 5. Lastly, what you _shouldn't_ ever do is get one of those
| consumer NAS boxes. They are made with no concern for noise at
| all, and manufacturing cheapness constraints tend to make them
| literally pessimal at it. I had a QNAP I got rid of that
| couldn't have been more effective at amplifying drive noise if
| it had been designed for that on purpose.
| naming_the_user wrote:
| Yeah, the article title seemed kind of weird to me. I have a
| ZFS NAS, it's just a bunch of drives in an ATX case with (what
| I'd consider to nowadays be) the standard rubber grommets.
|
| I mean, you can hear it, but it's mostly just the fans and
| drives spinning, it's not loud at all.
|
| The recommendations seem reasonable, but for noise? If it's
| noisy, something is probably wrong, I think.
| nicolaslem wrote:
| I totally understand the article title, I have a ZFS NAS that
| makes the same kind of noise as described there. Roughly
| every five seconds the drives make a sound that is different
| from the background hum of a running computer. In a calm
| environment this is very distracting. I even had a guest
| sleeping in an adjacent room complain about it once.
| ndiddy wrote:
| This is likely a different problem than the article
| describes. Most newer hard drives will move the actuator
| arm back and forth every few seconds when the drive is
| inactive. It has to do with evenly distributing the
| lubrication on the arm to increase the life of the drive.
| ssl-3 wrote:
| That's a tunable in ZFS.
|
| vfs.zfs.txg.timeout defaults to 5 seconds, but it can be
| set (much) higher if you wish.
|
| I don't care if I lose up to a minute or two of work
| instead of <=5 seconds in the face of an unplanned failure,
| so I set it to a couple of minutes on my desktop rig years
| ago and never looked back.
|
| AFAIK there's also no harm in setting it both dynamically
| and randomly. I haven't tried it, but periodically setting
| vfs.zfs.txg.timeout to a random value between [say] 60 and
| 240 seconds should go a long way toward making it easier
| to ignore by breaking up the regularity.
|
| (Or: Quieter disks. Some of mine are very loud; some are
| very quiet. Same box, same pool, just different models.
|
| Or: Put the disks somewhere else, away from the user and
| the sleeping guests.)
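|
| (On Linux the same tunable is exposed as a module parameter;
| the values below are just examples:)
|
|     echo 120 > /sys/module/zfs/parameters/zfs_txg_timeout
|
| or, on FreeBSD:
|
|     sysctl vfs.zfs.txg.timeout=120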
| cjs_ac wrote:
| To add to this, I mounted an HDD to a case using motherboard
| standoffs in a place that was obviously intended for SSDs. Not
| only was it very loud, the resonance between the disk and the
| case also broke the disk after six months.
| jonhohle wrote:
| And you probably weren't even listening to Rhythm Nation![0]
|
| 0 - https://devblogs.microsoft.com/oldnewthing/20220816-00/?p
| =10...
| magicalhippo wrote:
| I recently upgraded my home-NAS from a Fractal Define R4 to a
| Define 7 XL. The R4 had the rubber grommets, but hot-swappable
| trays that were just held in by spring force. As such they
| rattled a lot.
|
| The Define 7 has the same grommet system, but the trays can be
| fastened by screws to the support rails.
|
| The difference in noise was significant. Even though I went
| from 6 to 10 disks, it's much quieter now.
| patrakov wrote:
| No. The most effective way to remove HDD noise is to remove
| the HDDs and add SSDs. I haven't had any HDDs since 2016.
|
| P.S. I also talked to a customer in the past who stored their
| backups in an SSD-only Ceph cluster. They were citing higher
| reliability of SSDs and higher density, which was important
| because they had very limited physical space in the datacenter.
| In other words, traditional 3.5" HDDs would not have allowed
| them to store that much data in that many rack units.
| toast0 wrote:
| SSDs are great. Quieter, can be denser, faster, available in
| small sizes for small money, more reliable, etc.
|
| But they're not great for low cost bulk storage. If you're
| putting together a home NAS, you probably want to do well on
| $/TB and don't care so much about transfer speeds.
|
| But if you've found 10TB+ SSDs for under $200, let us know
| where to find them.
| Guvante wrote:
| A 20 TB HDD is <$400.
|
| An 8 TB SSD is >$600.
|
| $80/TB vs $20/TB is a fourfold increase.
|
| Also, a 16 TB SSD is $2,000, so it's more like a 5x increase
| in a data center setup.
| magicalhippo wrote:
| The 4TB M.2 SSDs are getting to a price point where one
| might consider them. The problem is that it's not trivial
| to connect a whole bunch of them in a homebrew NAS without
| spending tons of money.
|
| Best I've found so far is cards like this[1] that allow for
| 8 U.2 drives, and then some M.2 to U.2 adapters like
| this[2] or this[3].
|
| In a 2x RAID-Z1 or single RAID-Z2 setup that would give
| 24TB of redundant flash storage for a tad more than a
| single 16TB enterprise SSD.
|
| [1]: https://www.aliexpress.com/item/1005005671021299.html
|
| [2]: https://www.aliexpress.com/item/1005005870506081.html
|
| [3]: https://www.aliexpress.com/item/1005006922860386.html
| bpye wrote:
| On AM5 you can do 6 M.2 drives without much difficulty,
| and with considerably better perf. Your motherboard will
| need to support x4/x4/x4/x4 bifurcation on the x16 slot,
| but you put 4 there [0], and then use the two on board x4
| slots, one will use the CPU lanes and the other will be
| connected via the chipset.
|
| [0] -
| https://www.aliexpress.com/item/1005002991210833.html
| 7bit wrote:
| > The most effective way to remove HDD noise is to remove
| HDDs and add SSDs.
|
| Lame
| kaliszad wrote:
| You clearly haven't read the full article, as Jim Salter writes
| about the mechanical stuff at the end of it.
|
| Also, you want to reduce vibrations because of this:
| https://www.youtube.com/watch?v=tDacjrSCeq4 (Shouting in the
| datacenter)
| Larrikin wrote:
| >Lastly, what you shouldn't ever do is get one of those
| consumer NAS boxes. They are made with no concern for noise at
| all, and manufacturing cheapness constraints tend to make them
| literally pessimal at it. I had a QNAP I got rid of that
| couldn't have been more effective at amplifying drive noise if
| it had been designed for that on purpose.
|
| Is there any solution that lets me mix and match drive sizes as
| well as upgrade? I'm slowly getting more and more into self
| hosting as much of digital life as possible, so I don't want to
| be dependent on Synology, but they offered a product that let
| me go from a bunch of single drives with no redundancy to being
| able to repurpose them into a solution where I can swap out
| drives and, most importantly, grow. As far as I can tell
| there's no open-source equivalent. As soon as I've set up a
| file system with the drives I already have, the only solution
| is to buy the same number of drives with more space once I run
| out.
| kiney wrote:
| BTRFS
| mrighele wrote:
| > As soon as I've set up a file system with the drives I
| already have the only solution is to buy the same amount of
| drives with more space once I run out.
|
| Recent versions of ZFS support raidz expansion [1], which lets
| you add extra disks to a raidz1/2/3 pool. It has a number of
| limitations, for example you cannot change the type of pool
| (mirror to raidz1, raidz1 to raidz2, etc.), but if you plan to
| expand your pool one disk at a time it can be useful. Just
| remember that 1) old data will not take advantage of the
| extra disk until you copy it around, and 2) the size of the
| pool is limited by the size of the smallest disk in the pool.
|
| [1] https://github.com/openzfs/zfs/pull/15022
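|
| The expansion itself is a zpool attach against the raidz vdev;
| a sketch with placeholder names, worth checking against
| zpool-attach(8) for your version:
|
|     zpool attach tank raidz1-0 /dev/sdf
|     zpool status tank    # shows expansion progress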
| kccqzy wrote:
| I might be misunderstanding your needs but my home server
| uses just LVM. When I run out of disk space, I buy a new
| drive, use `pvcreate` followed by `vgextend` and `lvextend`.
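|
| (Roughly, with placeholder names and assuming an ext4 LV on
| top:)
|
|     pvcreate /dev/sdd
|     vgextend vg0 /dev/sdd
|     lvextend -l +100%FREE /dev/vg0/data
|     resize2fs /dev/vg0/data    # grow the filesystem to match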
| devilbunny wrote:
| And I've never used a QNAP, but I'm on my second Synology and
| their drive carriages all use rubber/silicone grommets to
| isolate drive vibration from the case. It's not _silent_ -
| five drives of spinning rust will make some noise regardless
| - but it sits in a closet under my stairs that backs up to my
| media cabinet and you have to be within a few feet to hear it
| even in the closet over background noise in the house.
|
| I don't use any of their "personal cloud" stuff that relies
| on them. It's just a Linux box with some really good features
| for drive management and package updates. You can set up and
| maintain any other services you want without using their
| manager.
|
| The ease with which I could set it up as a destination for
| Time Machine backups has absolutely saved my bacon on at
| least one occasion. My iMac drive fell to some strange data
| corruption and would not boot. I booted to recovery, pointed
| it at the Synology, and aside from the restore time, I only
| lost about thirty minutes' work. The drive checked out fine
| and is still going strong. Eventually it will die, and when
| it does I'll buy a new Mac and tell it to restore from the
| Synology. I have double-disk redundancy, so I can lose any
| two of five drives with no loss of data so long as I can get
| new drives to my house and striped in before a third fails.
| That would take about a week, so while it's possible, it's
| unlikely.
|
| If I were really paranoid about that, I'd put together a
| group buy for hard drives from different manufacturers,
| different runs, different retailers, etc., and then swap them
| around so none of us were using drives that were all from the
| same manufacturer, factory, and date. But I'm not that
| paranoid. If I have a drive go bad, and it's one that I have
| more than one of the same (exact) model, I'll buy enough to
| replace them all, immediately replace the known-bad one, and
| then sell/give away the same-series.
| jftuga wrote:
| Is there an online calculator to help you find the optimal
| combination of # of drives, raid level, and block size?
|
| For example, I'm interested in setting up a new RAID-Z2 pool of
| disks and would like to minimize noise and number of writes.
| Should I use 4 drives or 6? Also, what would be the optimal
| block size(s) in this scenario?
| thg wrote:
| Not sure if this is exactly what you're looking for, but I
| suppose it will be of at least some help:
| https://docs.google.com/spreadsheets/d/1tf4qx1aMJp8Lo_R6gpT6...
| mastax wrote:
| https://wintelguy.com/zfs-calc.pl
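|
| For the capacity side of that question, the back-of-the-
| envelope rule is that a RAID-Z2 vdev of N disks stores data on
| roughly N-2 of them: with 10 TB drives (just an example size),
| 4 drives give about 20 TB usable and 6 drives about 40 TB,
| before padding and metadata overhead.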
| nubinetwork wrote:
| I don't think you can really get away from noise; ZFS writes
| to disk every 5 seconds pretty much all the time...
| m463 wrote:
| I run Proxmox, and ever since day 1 I noticed it hits the disk.
|
| a _LOT_.
|
| I dug into it and even without ANY VMs or containers running,
| it writes a bunch of stuff out every second.
|
| I turned off a bunch of stuff, I think:
|
|     systemctl disable pve-ha-crm
|     systemctl disable pve-ha-lrm
|
| But stuff like /var/lib/pve-firewall and /var/lib/rrdcached was
| still written to every second.
|
| I think I played around with commit=n mount and also
|
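| (For reference, on ext4 the commit interval is a mount option;
| the value here is just an example:)
|
|     mount -o remount,commit=120 /
|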
| The point of this is: I tried running Proxmox with ZFS, and it
| wrote to the disk even more often.
|
| Maybe that's OK for physical hard disks, but I didn't want to
| burn out my SSD immediately.
|
| For physical disks it could be noisy.
___________________________________________________________________
(page generated 2024-09-27 23:01 UTC)