[HN Gopher] ZFS on a single core RISC-V hardware with 512MB
___________________________________________________________________
ZFS on a single core RISC-V hardware with 512MB
Author : magicalhippo
Score : 77 points
Date : 2022-03-13 17:16 UTC (5 hours ago)
(HTM) web link (andreas.welcomes-you.com)
(TXT) w3m dump (andreas.welcomes-you.com)
| 2Gkashmiri wrote:
| Hope this, coupled with more tech on RISC-V hardware, can bring
| it to the level of the Raspberry Pi, with all the community,
| hardware devices, accessories and all that.
|
| Will it take a decade? Less?
| FullyFunctional wrote:
| I hope it won't be a decade, but remember the (original)
| Raspberry Pi launched on a very mature part, with a _very_
| mature (ancient) ISA.
|
| Discount pricing aside, Intel has promised to tape out SiFive's
| P650. Rivos, Tenstorrent, and others are also working on fast
| cores, but it'll be at least 2-3 years before they hit the
| market, if at all.
|
| So far SiFive's dual-issue in-order core (~40 Geekbench 5.4.1),
| like on the now-cancelled BeagleV, is the fastest chip you can
| buy as a lay person. The D1 (~32 Geekbench 5.4.1) is cheaper but
| less powerful.
| michaelmrose wrote:
| There has not ever been a reason for memory to be correlated with
| storage capacity nor any reason to believe that such a
| correlation ought to exist.
|
| Nobody ever said, "Well, I plugged in a 20TB external hard
| drive, so I'd better plug in a few more sticks of RAM so that
| works."
|
| Dedup needs RAM in proportion to storage because it maintains an
| entry in an in-memory table for every unique block written.
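|
| (A rough, hypothetical back-of-envelope sketch, assuming the
| commonly cited figure of ~320 bytes per dedup-table entry and a
| 128K average record size; the real entry size and your actual
| block sizes will vary:)
|
|     # Hypothetical estimate of ZFS dedup table (DDT) RAM.
|     # Every unique block gets one entry, so RAM scales with
|     # the number of blocks, i.e. with pool size.
|     def ddt_ram_bytes(pool_bytes, avg_block_bytes=128 * 1024,
|                       ddt_entry_bytes=320):
|         blocks = pool_bytes // avg_block_bytes
|         return blocks * ddt_entry_bytes
|
|     # 2 TiB of 128K records -> about 5 GiB of DDT; smaller
|     # records make it far worse.
|     print(ddt_ram_bytes(2 * 1024**4) / 1024**3)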
| lazide wrote:
| You uh just kind of contradicted yourself?
|
| All file systems have metadata which is good to keep in memory.
| Having built several 50+TB NAS boxes recently, I can say it
| isn't just ZFS either. And the penalty when you don't have
| enough RAM isn't always a linear performance hit. It can be
| kernel panics, exponential decay in performance, etc.
| michaelmrose wrote:
| I didn't contradict myself at all. Virtually nobody needs
| dedup; it's not remotely worth the RAM cost for 99.9% of users.
|
| Can you quantify what you are saying? What OS/filesystem?
| What minimum RAM requirement for what amount of storage?
| lazide wrote:
| That's a good point - I've seen free memory drop every time
| I've built the larger file systems (and not from just the
| cache), but I never tried to quantify it. And I don't see
| any good stats or notes on it.
|
| Seems like no one is building these larger systems on boxes
| small enough for it to matter, or at least Google isn't
| finding it.
| michaelmrose wrote:
| Another way of saying this is that RAM usage doesn't
| meaningfully scale with storage size for the storage systems
| encountered by actual, non-theoretical people, because the
| minimum RAM available on any system one encounters is enough
| to service the amount of storage it is possible to attach to
| said system.
| magicalhippo wrote:
| > There has not ever been a reason for memory to be correlated
| with storage capacity nor any reason to believe that such a
| correlation ought to exist.
|
| However, specific implementations can indeed have memory
| requirements that scale in relation to storage capacity. For
| example, if the implementation keeps the bitmap of free space
| in memory, then more storage = larger bitmap = more memory
| required.
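|
| (A toy illustration of that scaling, not how ZFS actually
| tracks free space - ZFS uses space maps rather than a flat
| bitmap - with a made-up 4K block size:)
|
|     # Toy example: a flat free-space bitmap, one bit per block.
|     def bitmap_bytes(pool_bytes, block_bytes=4096):
|         blocks = pool_bytes // block_bytes
|         return blocks // 8  # 8 blocks tracked per byte
|
|     # 2 TiB at 4K blocks -> 64 MiB of bitmap; 20 TiB -> 640 MiB,
|     # already more than this board's 512MB of RAM.
|     print(bitmap_bytes(2 * 1024**4) / 1024**2)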
|
| There have been several attempts in ZFS to reduce memory overhead.
| I'm pretty sure that if you took a decade old version of ZFS
| you'd struggle to run it on a system with 512MB RAM.
| michaelmrose wrote:
| At present 512MB of RAM is notable for how ridiculously tiny
| it is, and 2TB is still an acceptable amount of storage.
| Without resorting to decades-obsolete software, can you put a
| pin on exactly how much storage it would take to render that
| tiny amount of RAM unusable, and then explain how much storage
| it would take to render a machine with 4GB of RAM likewise
| unusable, so that we may demonstrate memory usage scaling with
| storage?
| FullyFunctional wrote:
| Ha, this is awesome, thanks for checking that out. One point of
| note though: I'm pretty sure it would have been faster to build
| the kernel + OpenZFS on Debian/RISC-V in QEMU. QEMU on decent
| hardware runs very fast, much faster than the D1.
|
| ADD: Geekbench 5.4.1 on RISC-V
|
| - under QEMU/Ryzen 9 3900XT: 82
|
| - under QEMU/M1: 76
|
| - Native D1: 32 (https://browser.geekbench.com/v5/cpu/13259016)
|
| The M1 result is skewed because for some reason AES emulation is
| much faster on the Ryzen. The rest of the integer stuff is
| faster on the M1, by up to 30%.
| bombcar wrote:
| Anyone have low-power RISC-V or ARM hardware that supports many
| SATA ports?
| mustache_kimono wrote:
| Don't know why no one has made such a board a priority.
| Seems like a sweet spot.
| vorpalhex wrote:
| Would also love to see this, even if it's experimental or beta.
|
| PiBox is the only contender I am aware of.
| dark-star wrote:
| But can it do dedupe on such a box? I think the recommendation is
| still "1GB of RAM for each TB of storage" if you're using
| dedupe...
|
| I still have some boards with ~512MB of RAM lying around (an
| UltraSPARC for example) that I'd love to re-purpose as a cheap
| NAS, just for the heck of doing it on a non-x86 platform....
| Wowfunhappy wrote:
| Yeah, I think the author may be mixing up recommendations for
| dedup vs non-dedup. The solution is always to not enable dedup;
| it's a niche feature that's not worthwhile outside of _very_
| specific scenarios.
| R0b0t1 wrote:
| The speed you want them to run at is also a factor. The rule of
| thumb hasn't applied for a while; he's right to note that in
| the post.
| Wowfunhappy wrote:
| I thought it does generally apply for dedup, though,
| because ZFS is then required to keep the dedup tables in
| memory?
| lazide wrote:
| I've tried dedup out, and even with a large, powerful box with a
| LOT of duplicate files (multi-TB repositories of media files
| which get duplicated several times due to coarse snapshotting
| from other, less fancy systems), I got near zero deduplication.
| I think it was literally in the low single-digit percentages.
|
| ZFS dedup is block-based, and the actual block size varies
| depending on data feed rate for most workloads (ZFS queues up
| async writes and merges them), so in practice, once a file gets
| some non-zero block offset somewhere, which happens all the
| time, even identical files don't dedup.
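|
| (A simplified sketch of why alignment matters: hash fixed-size
| blocks of two byte-identical payloads, one of them shifted by a
| few bytes, and count how many block hashes still match. The
| block size, payload and shift are made up for illustration, and
| this is not how ZFS computes its checksums:)
|
|     import hashlib
|     import os
|
|     def block_hashes(data, block_size=128 * 1024):
|         # Hash fixed-size blocks, roughly how block-level dedup
|         # sees data.
|         return {hashlib.sha256(data[i:i + block_size]).digest()
|                 for i in range(0, len(data), block_size)}
|
|     payload = os.urandom(1024 * 1024)       # 1 MiB of test data
|     aligned = block_hashes(payload)         # original copy
|     shifted = block_hashes(b"\x00" * 13 + payload)  # offset copy
|
|     # The aligned copy shares every block hash; the shifted copy
|     # shares none, so block-level dedup saves essentially nothing.
|     print(len(aligned & block_hashes(payload)), len(aligned & shifted))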
| Wowfunhappy wrote:
| Wow, that's worse than I realized! Honestly, this makes me
| wonder whether the feature should even exist in ZFS. Given
| the enormous hardware requirements and minimal savings...
| well, I'd be curious to hear if anyone has ever found a real
| use case.
| FullyFunctional wrote:
| Does anyone actually use dedup? I think even the OpenZFS
| documentation says compression is more useful in practice.
| If anything, dedup should be an offline feature, to be run as
| scheduled by the operator.
|
| My setup tries to get the absolute highest bandwidth and
| uses NVMe sticks in a stripe (I get my redundancy
| elsewhere), no compression, no dedup, and yet it can only
| hit ~3.5 GB/s reads (TrueNAS Core, EPYC 7443P, Samsung
| 980PRO, 256 GiB). I hope TrueNAS SCALE will perform better.
| watersb wrote:
| My first-ever large (>4TB) ZFS pool is still stuck with
| dedup. It's a backup server; it gets about 2x with
| deduplication.
|
| At the time, it was the difference between slow and
| impossible: I couldn't afford another 2x of disks.
|
| These days, the pool could fit on a portable SSD that
| would fit in my pocket.
|
| Careful file-based dedup on top of ZFS might be more
| effective.
|
| Small changes to single large files see some advantage
| with block-based deduplication. You see this in
| collections of disk images for virtual machines.
|
| You might see that in database applications, depending on
| log structure. I don't know, I don't have that
| experience.
|
| For most of us, file-based deduplication might work out
| better, and is almost certainly easier to understand. You
| can come up with a mental model of what you're working
| with, dealing with successive collections of files.
|
| Even though files are just another abstraction over
| blocks, it's an abstraction that leaks less without the
| deduplication.
|
| I haven't used a combination of encryption and
| deduplication. That was Really Hard for ZFS to implement,
| and I'm not sure how meaningful such a combination is in
| practice.
| bombcar wrote:
| It would be nice if ZFS were able to combine dedup and
| compression - basically be able to notice that a
| block/file/datastream was similar/identical to another
| one, and do compression along with a pointer ...
| lazide wrote:
| Practically speaking, the tradeoffs to make that work are
| unlikely to make you or anyone else happy except in some
| VERY specific workloads.
| willis936 wrote:
| ZFS can have both features enabled at once.
|
| Though there is no clean way to disable either.
| Compression can be removed from files by rewriting them,
| but removing deduplication requires copying over all data
| to a fresh pool.
| mlok wrote:
| ZFS dedup has been wonderful for me: dedupratio = 7.05x (144
| GB stored on a 25 GB volume, and still 1.3 GB left free). I
| use it for backups of versions of the same folders and files
| slowly evolving over a long period of time (>15 years), which
| gives a lot of duplication, of course. (I could also use
| compression on top of it.)
| magicalhippo wrote:
| While regular dedup is only a win for _highly_ specific
| workloads, the file-based deduplication[1][2] that is in the
| works seems like it has some potential.
|
| They discussed it, along with some options for a background-
| scanning dedup service (trying to find potential files to
| dedup), in the February leadership meeting[3].
|
| [1]: https://openzfs.org/wiki/OpenZFS_Developer_Summit_2020_talks...
|
| [2]: https://youtu.be/hYBgoaQC-vo
|
| [3]: https://www.youtube.com/watch?v=hij7PGGjevc
| spullara wrote:
| Wow, that is a very naive dedup algorithm.
| lazide wrote:
| Without restricting pretty heavily how you can interact
| with files, or causing severe bottlenecks, it's probably
| the best that can be done, since the FS API doesn't provide
| any real guarantees about what data WILL be written later,
| how much of it, etc. So it has to figure things out as it
| goes with minimal performance impact.
___________________________________________________________________
(page generated 2022-03-13 23:00 UTC)