[HN Gopher] 6 Raspberry Pis, 6 SSDs on a Mini ITX Motherboard
___________________________________________________________________
6 Raspberry Pis, 6 SSDs on a Mini ITX Motherboard
Author : ingve
Score : 311 points
Date : 2022-08-17 14:45 UTC (8 hours ago)
(HTM) web link (www.jeffgeerling.com)
(TXT) w3m dump (www.jeffgeerling.com)
| alberth wrote:
| Imagine if instead of 6 Pis this were 6 M2 ARM chips on a
| mini-ITX board.
| dis-sys wrote:
| That would cost 6 arms and 6 legs.
| alberth wrote:
| As a thought experiment:
|
| - The 8GB Pi costs $75 (Geekbench score of 318 single / 808
| multi-core)
|
| - The 8GB M1 Mac mini costs $699 (Geekbench score of 1752
| single / 7718 multi-core)
|
| This isn't too far off linear price scaling.
|
| The M1 Mac mini costs 9.3x more, but is 5.5x faster single-
| core & 9.5x faster multi-core.
|
| https://browser.geekbench.com/v5/cpu/15902536
|
| https://browser.geekbench.com/v5/cpu/15912650
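|
| A quick sanity check on those ratios (just a back-of-the-
| envelope sketch in Python, using the prices and Geekbench
| numbers above):
|
|       pi_price, pi_single, pi_multi = 75, 318, 808
|       m1_price, m1_single, m1_multi = 699, 1752, 7718
|       print(f"{m1_price / pi_price:.1f}x the price")      # 9.3x
|       print(f"{m1_single / pi_single:.1f}x single-core")  # 5.5x
|       print(f"{m1_multi / pi_multi:.1f}x multi-core")     # 9.6x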
| rbanffy wrote:
| > The M1 Mac mini costs 9.3x more, but is 5.5x faster
| single-core & 9.5x faster multi-core.
|
| It's not often that we get a Mac with better price/performance
| than any other computer ;-)
|
| Their M-series is quite remarkable.
| kllrnohj wrote:
| You can get things like Epyc mini-ITX boards (
| https://www.anandtech.com/show/16277/asrock-rack-deep-mini-i...
| ) if you're just looking to ramp up the compute in a tiny
| space. Divide it up into 6 VMs if you still want it to be a
| cluster even.
| rbanffy wrote:
| I find these boards - the ones that take little modules and
| network them into a cluster - very interesting. I'd like to
| see more of them in the future. I hope someone makes a
| Framework-compatible motherboard like this at some point.
|
| The Intel Edison module could have been a viable building block
| for one (and it happened a long time ago in computing terms -
| 2014) - it was self-contained, with RAM and storage on the
| module - but it lacked Ethernet to connect multiple ones on a
| board - and I don't remember it having a fast external bus to
| build a network on. And it was quickly discontinued.
| AreYouSirius wrote:
| tenebrisalietum wrote:
| > Many people will say "just buy one PC and run VMs on it!", but
| to that, I say "phooey."
|
| I mean with VM-leaking things like Spectre (not sure how much
| similar things affect ARM tbh) having physical barriers between
| your CPUs can be seen as a positive thing.
| mrweasel wrote:
| Sure, it's just that the Raspberry Pi isn't really fast enough
| for most production workloads. Having a cluster of them doesn't
| really help, you'd still be better off with a single PC.
|
| As a learning tool, having the ability to build a real hardware
| cluster in a Mini-ITX case is awesome. I do sort of wonder what
| the business case for these boards is, I mean are there
| actually enough people who want to do something like this...
| schools maybe? I still think it's beyond weird that there is
| so much hardware available for building Pi clusters, but I
| can't get an ARM desktop motherboard, with a PCI slot, capable
| of actually being used as a desktop, for a reasonable price.
| schaefer wrote:
| The Nvidia Jetson AGX Orin Dev Kit is getting pretty damn
| close to a performant Linux/ARM desktop system.
| GB5 scores (Single: 763 / Multi: 7193) - that's roughly
| 80% the performance of my current x86 desktop.
|
| Ubuntu 20.04 LTS; 12-core Cortex-A78AE v8.2 64-bit at 2.20
| GHz; 32 GB LPDDR5 memory, 256-bit, 204.8 GB/s; NVIDIA
| graphics/AI acceleration; PCIe slot; 64 GB eMMC 5.1; M.2
| PCIe Gen 4; DisplayPort 1.4a + MST.
|
| The deal breaker, if there is one, is price: $2k
| geerlingguy wrote:
| I think a lot of these types of boards are built with the
| business case of either "edge/IoT" (which still for some
| reason causes people to toss money at them since they're hot
| buzzwords... just need 5G too for the trifecta), or for
| deploying many ARM cores/discrete ARM64 computers in a
| space/energy-efficient manner. Some places need little ARM
| build farms, and that's where I've seen the most non-hobbyist
| interest in the CM4 blade, Turing Pi 2, and this board.
| ReadTheLicense wrote:
| The future of cloud is Zero Isolation... With all the
| mitigations slowing it down, and energy prices high and
| rising, having super-small nodes that are always dedicated to
| one task seems interesting.
| sitkack wrote:
| Anyone else having problems with the shipping page? Says it
| cannot ship to my address, formatted incorrectly ...
| erulabs wrote:
| Man, Ceph really doesn't get enough love. For all the distributed
| systems hype out there - be it Kubernetes or blockchains or
| serverless - the ol' rock solid distributed storage systems sat
| in the background iterating like crazy.
|
| We had a _huge_ Rook/Ceph installation in the early days of our
| startup before we killed off the product that used it (sadly). It
| did explode under some rare, unusual cases, but I sometimes miss
| it! For folks who aren't aware, a rough TLDR is that Ceph is to
| ZFS/LVM what Kubernetes is to containers.
|
| This seems like a very cool board for a Ceph lab - although -
| extremely expensive - and I say that as someone who sells very
| expensive Raspberry Pi based computers!
| halbritt wrote:
| I love it, but when it fails at scale, it can be hard to reason
| about. Or at least that was the case when I was using it a few
| years back. Still keen to try it again and see what's changed.
| I haven't run it since bluestore was released.
| teraflop wrote:
| Yeah, I've been running a small Ceph cluster at home, and my
| only real issue with it is the relative scarcity of good
| _conceptual_ documentation.
|
| I personally learned about Ceph from a coworker and fellow
| distributed systems geek who's a big fan of the design. So I
| kind of absorbed a lot of the concepts before I ever actually
| started using it. There have been quite a few times where I
| look at a command or config parameter, and think, "oh, I know
| what that's _probably_ doing under the hood"... but when I
| try to actually check that assumption, the documentation is
| missing, or sparse, or outdated, or I have to "read between
| the lines" of a bunch of different pages to understand what's
| really happening.
| geerlingguy wrote:
| I think many people (myself included) had been burned by major
| disasters on earlier clustered storage solutions (like early
| Gluster installations). Ceph seems to have been under the radar
| for a bit of time when it got to a more stable/usable point,
| and came more in the limelight once people started deploying
| Kubernetes (and Rook, and more integrated/wholistic clustered
| storage solutions).
|
| So I think a big part of Ceph's success (at least IMO) was its
| timing, and its adoption into a more cloud-first ecosystem.
| That narrowed the use cases down from what the earliest
| networked storage software was trying to solve.
| mcronce wrote:
| Ceph is fantastic. I use it as the storage layer in my homelab.
| I've done some things that I can only concisely describe as
| _super fucked up_ to this Ceph cluster, and every single time
| I've come out the other side with zero data loss, not having to
| restore a backup.
| erulabs wrote:
| Haha "super fucked up" is a much better way of describing the
| "usual, rare" situations I was putting it into as well :P
| dylan604 wrote:
| Care to provide examples of what these things were that you
| were doing to a storage pool? I guess I'm just not
| imaginative enough to think about ways of using a storage
| pool other than storing data in it.
| erulabs wrote:
| In our case we were a free-to-use-without-any-signup way
| of testing Kubernetes. You could just go to the site and
| spin up pods. Looking back, it was a bit insane.
|
| Anyways, you can imagine we had all sorts of attacks and
| miners or other abusive software running. This on top of
| using ephemeral nodes for our free service meant hosts
| were always coming and going and ceph was always busy
| migrating data around. The wrong combo of nodes dying and
| bursting traffic and beta versions of Rook meant we ran
| into a huge number of edge cases. We did some
| optimization and re-design, but it turned out there just
| weren't enough folks interested in paying for multi-
| tenant Kubernetes. We did learn an absolute ton about
| multi-tenant K8s, so, if anyone is running into those
| challenges, feel free to hire us :P
| Lex-2008 wrote:
| not OP, but I would start with filling disk space up to
| 100%, or creating zillions of empty files. In case of
| distributed filesystems - maybe removing one node (under
| heavy load preferably), or "cloning" nodes so they had
| same UUIDs (preferably nodes storing some data on them -
| to see if the data will be de-duplicated somehow).
|
| Or just a disk with unreliable USB connection?
| 3np wrote:
| We're more and more feeling we made the wrong call with
| gluster... The underlying bricks being a POSIX fs felt a lot
| safer at the time but in hindsight ceph or one of the newer
| ones would probably have been a better choice. So much
| inexplicable behavior. For your sake I hope the grass really is
| greener.
| rwmj wrote:
| Red Hat (owner of Gluster) has announced EOL in 2024:
| https://access.redhat.com/support/policy/updates/rhs/
|
| Ceph is where the action is now.
| imiric wrote:
| Can someone with experience with Ceph and MinIO or SeaweedFS
| comment on how they compare?
|
| I currently run a single-node SnapRAID setup, but would like to
| expand to a distributed one, and would ideally prefer something
| simple (which is why I chose SnapRAID over ZFS). Ceph feels too
| enterprisey and complex for my needs, but at the same time, I
| wouldn't want to entrust my data to a simpler project that can
| have major issues I only discover years down the road.
|
| SeaweedFS has an interesting comparison[1], but I'm not sure
| how biased it is.
|
| [1]: https://github.com/seaweedfs/seaweedfs#compared-to-ceph
| nik736 wrote:
| Ceph seems to be always related to big block storage outages.
| This is why I am very wary of using it. Has this changed? Edit:
| rephrased a bit
| antongribok wrote:
| Ceph is incredibly stable and resilient.
|
| I've run Ceph at two Fortune 50 companies since 2013 to now,
| and I've not lost a single production object. We've had
| outages, yes, but not because of Ceph, it was always
| something else causing cascading issues.
|
| Today I have a few dozen clusters with over 250 PB total
| storage, some on hardware with spinning rust that's over 5
| years old, and I sleep very well at night. I've been doing
| storage for a long time, and no other system, open source or
| enterprise, has given me such a feeling of security in
| knowing my data is safe.
|
| Any time I read about a big Ceph outage, it's always a bunch
| of things that should have never been allowed in production,
| compounded by non-existent monitoring, and poor understanding
| of how Ceph works.
| shrubble wrote:
| Can you talk about the method that Ceph has for determining
| whether there was bit rot in a system?
|
| My understanding is that you have to run a separate
| task/process that has Ceph go through its file structures
| and check it against some checksums. Is it a separate step
| for you, do you run it at night, etc.?
| lathiat wrote:
| That's called ceph scrub & deep-scrub.
|
| By default it "scrubs" basic metadata daily and does a
| deep scrub where it fully reads the object and confirms
| the checksum is correct from all 3 replicas weekly for
| all of the data in the cluster.
|
| It's automatic and enabled by default.
| shrubble wrote:
| So what amount of disk bandwidth/usage is involved?
|
| For instance, say that I have 30TB of disk space used and
| it is across 3 replicas, thus 3 systems.
|
| When I kick off the deep scrub operation, what amount of
| reads will happen on each system? Just the smaller amount
| of metadata or the actual full size of the files
| themselves?
| teraflop wrote:
| In Ceph, objects are organized into placement groups
| (PGs), and a scrub is performed on one PG at a time,
| operating on all replicas of that PG.
|
| For a normal scrub, only the metadata (essentially, the
| list of stored objects) is compared, so the amount of
| data read is very small. For a deep scrub, each replica
| reads and verifies the contents of all its data, and
| compares the hashes with its peers. So a deep scrub of
| all PGs ends up reading the entire contents of every
| disk. (Depending on what you mean by "disk space used",
| that could be 30TB, or 30TBx3.)
|
| The deep scrub frequency is configurable, so e.g. if each
| disk is fast enough to sequentially read its entire
| contents in 24 hours, and you choose to deep-scrub every
| 30 days, you're devoting 1/30th of your total IOPS to
| scrubbing.
|
| Note that "3 replicas" is not necessarily the same as "3
| systems". The normal way to use Ceph is that if you set a
| replication factor of 3, each PG has 3 replicas that are
| _chosen_ from your pool of disks /servers; a cluster with
| N replicas and N servers is just a special case of this
| (with more limited fault-tolerance). In a typical
| cluster, any given scrub operation only touches a small
| fraction of the disks at a time.
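|
| As a rough illustration of that 1/30th figure (a sketch with
| assumed disk size and read speed, not measurements from any
| real cluster):
|
|       disk_tb = 16              # assumed per-disk capacity
|       seq_read_mb_s = 200       # assumed sustained read speed
|       scrub_interval_days = 30
|
|       full_read_hours = disk_tb * 1e6 / seq_read_mb_s / 3600
|       fraction = full_read_hours / (scrub_interval_days * 24)
|       print(f"deep scrubs use ~{fraction:.1%} of read bandwidth")
|       # -> ~3.1%, i.e. roughly 1/30th of each disk's throughput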
| teraflop wrote:
| Just to add to the other comment: Ceph checksums data and
| metadata on every read/write operation. So even if you
| completely disable scrubbing, if data on a disk becomes
| corrupted, the OSD will detect it and the client will
| transparently fail over to another replica, rather than
| seeing bad data or an I/O error.
|
| Scrubbing is only necessary to _proactively_ detect bad
| sectors or silent corruption on infrequently-accessed
| data, so that you can replace the drive early without
| losing redundancy.
| aftbit wrote:
| Imagine being able to buy 6 Raspberry Pis! I have so many
| projects I'd like to do, both personal and semi-commercial, but
| it's been literal years since I've seen a Raspberry Pi 4
| available in stock somewhere in the USA, let alone 6.
| hackeraccount wrote:
| Crap. When did that happen? I swear I bought like two or three
| not that long ago and they were like 40 or 50 apiece.
| tssva wrote:
| Micro Center often has them in stock. My local Micro Center
| currently has 17 RPi 4 4GB in stock. They are available only in
| store, but you can check stock on their website. Find a store
| near someone you know who is willing to purchase one for you
| and ship it.
| alexk307 wrote:
| Cool but I still can't find a single raspberry pi compute module
| despite having been in the market for 2 years...
| simcop2387 wrote:
| https://rpilocator.com/ is probably the best place to keep an
| eye out for them. This is unfortunately also the case for non-
| CM RPis. Been wanting to get some more Pi 4s to replace some
| rather old Pi 3s (non+) that I've got running, just because I want
| UEFI boot on everything since it makes managing things that
| much easier.
| mongol wrote:
| Yes when will this dry spell end?
| geerlingguy wrote:
| So far it seems like "maybe 2023"... this year supplies have
| been _slightly_ better, but not amazing. Usually a couple CM4
| models pop up over the course of a week on rpilocator.com.
| onlyrealcuzzo wrote:
| > I was able to get 70-80 MB/sec write speeds on the cluster, and
| 110 MB/sec read speeds. Not too bad, considering the entire
| thing's running over a 1 Gbps network. You can't really increase
| throughput due to the Pi's IO limitations--maybe in the next
| generation of Pi, we can get faster networking!
|
| This isn't clear to me. What am I missing?
| gorkish wrote:
| The device has an onboard 8 port unmanaged gigabit switch. The
| two external ports are just switch ports and cannot be
| aggregated in any way. The entire cluster is therefore limited
| effectively to 1gbps throughput.
|
| IMO it ruins the product utterly and completely. They should
| have integrated a switch IC similar to what's used in the
| netgear gs110mx which has 8 gigabit and 2 multi-gig interfaces.
| geerlingguy wrote:
| It would be really cool if they could split out 2.5G
| networking to all the Pis, but with the current generation of
| Pi it only has one PCIe lane, so you'd have to add in a PCIe
| switch for each Pi if you still wanted those M.2 NVMe
| slots... that adds a lot of cost and complexity.
|
| Failing that, a 2.5G external port would be the simplest way
| to make this thing more viable as a storage cluster board,
| but that would drive up the switch chip cost a bit (cutting
| into margins). So the last thing would be allowing management
| of the chip (I believe this Realtek chip does allow it, but
| it's not exposed anywhere), so you could do link
| aggregation... but that's not possible here either. So yeah,
| the 1 Gbps is a bummer. Still fun for experimentation, and
| very niche production use cases, but less useful generally.
| lathiat wrote:
| 110 MB/s is gigabit. It's limited to gigabit networking and only
| has 1 Gbps out from the cluster board. So there's no way to get
| an aggregate speed of more than 1 Gbps (~110 MB/s) on this
| particular cluster board.
|
| If each Pi's Ethernet were broken out individually and you used a
| 10G uplink switch or multiple 1G client ports then you could do
| better.
|
| The write speed being lower than the read speed will be because
| writes have to be replicated to two other nodes in the ceph
| cluster (everything has 3 replicas) which are also sharing
| bandwidth on those same 1G links. Reads don't need to replicate
| so can consume the full bandwidth.
|
| So basically it's all network limited for this use case. Needs
| a 2.5G uplink, LACP link aggregation or individual Ethernet
| ports to do better.
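|
| A quick unit check on those numbers (just a sketch of the
| arithmetic, assuming ~6% protocol overhead on the wire):
|
|       link_mbps = 1000
|       wire_mb_s = link_mbps / 8        # 125 MB/s theoretical
|       practical = wire_mb_s * 0.94     # ~117 MB/s after overhead
|       print(f"read ceiling ~{practical:.0f} MB/s")  # measured: ~110
|
|       # every client write is stored 3x, so roughly three units of
|       # traffic cross the internal gigabit links per unit written,
|       # which is why writes land well below the read ceiling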
| sitkack wrote:
| Which ICs would you use for this? Do you have something in
| mind?
| Retr0id wrote:
| I'm not sure about specific ICs, but if you took a look
| inside a Netgear GS110MX you'd have a good candidate IC.
| magicalhippo wrote:
| Just a random search on Mouser, but something like the
| BCM53134O[1] with four 1GbE ports and one 2.5GbE port. A bit
| pricier, you have the BCM53156XU[2] with eight 1GbE ports
| and a 10G port for fiber.
|
| Not my field so could be other, better parts.
|
| [1]: https://eu.mouser.com/ProductDetail/Broadcom-
| Avago/BCM53134O...
|
| [2]: https://eu.mouser.com/ProductDetail/Broadcom-
| Avago/BCM53156X...
| CameronNemo wrote:
| 110 megabytes/s == 880 megabits/s, which is approaching the top
| speed of the network interface, which is the main bottleneck.
| A board with more IO, like the RK3568 which has 2x PCIe 2 lanes
| and 2x PCIe 3 lanes, or a hypothetical RPi 5, could deliver more
| throughput.
| naranha wrote:
| Do you think Raspberry Pis without ECC RAM are fine to use for a
| Ceph storage cluster? I did some research yesterday on the same
| topic; many say ECC RAM is essential for Ceph (and ZFS too). But
| I'm not sure what to believe - sure, data could get corrupted in
| RAM before being written to the cluster, but how likely is that?
| dpedu wrote:
| ECC is not necessary for ZFS - this is a commonly repeated
| myth.
|
| https://news.ycombinator.com/item?id=14447297
| dijit wrote:
| raspberry pi's have ECC memory; it's just on-die ECC.
|
| (this was a surprise to me too)
| EvanAnderson wrote:
| It sounds like the ECC counters are completely hidden from
| the SoC, though:
| https://forums.raspberrypi.com/viewtopic.php?t=315415
| naranha wrote:
| Sounds like what DDR5 is doing too: errors are corrected
| automatically, but not necessarily communicated(?)
| amarshall wrote:
| DDR5 on-die ECC is _not_ the same as traditional ECC. To
| that point, there are DDR5 modules with full ECC. On-die
| DDR5 ECC is there because it _needs_ to be for the
| modules to really work at all.
| jonhohle wrote:
| This looks incredible. Is it possible to expose a full PCI
| interface from an NVMe slot? I have an old SAS controller that I
| want to keep running. If I could do that from a Pi, that would be
| incredible.
| geerlingguy wrote:
| If you want to use SAS with the Pi, I've only gotten newer
| generation Broadcom/LSI cards working so far--see my notes for
| the storage controllers here:
| https://pipci.jeffgeerling.com/#sata-cards-and-storage
| jonhohle wrote:
| Incredible resource, thanks! I'm currently using an older
| MegaRAID card, but could upgrade if I can find a reasonable
| configuration to migrate.
| formerly_proven wrote:
| > newer
|
| Which is probably for the best - I don't know how these newer
| cards behave, but a commonality of all the older RAID/HBA
| cards seems to be "no power management allowed". Maybe they
| improved that area, because it's pretty unreasonable for an
| idle RAID card to burn double digit Watts if you ask me...
| geerlingguy wrote:
| The 9405W cards I most recently tested seem to consume
| about 7W steady state (which is more than the Pi that was
| driving it!), so yeah... they're still not quite as
| efficient as running a smaller SATA card if you just need a
| few drives. But if you want SAS or Tri-mode (NVMe/SAS/SATA)
| and have an HBA or RAID card, this is a decent enough way
| to do it!
| mcronce wrote:
| You can get M.2 to PCI-E add-in-card adapters, yeah - as long
| as it's an M.2 slot that supports NVMe, not SATA only
| crest wrote:
| I don't see how they could have hooked a 2.5Gb/s Ethernet NIC
| to the CM4 modules without using up the single PCIe 2.0 lane,
| other than adding a power-hungry, expensive, and often
| out-of-stock PCIe switch chip.
| wtallis wrote:
| There's no such thing as an NVMe slot, just M.2 slots that have
| PCIe lanes. NVMe is a protocol that runs on top of PCIe, and is
| something that host systems support at the software level, not
| in hardware. (Firmware support for NVMe is necessary to boot
| off NVMe SSDs, but the Pi doesn't have that and must boot off
| eMMC or SD cards.)
| gorkish wrote:
| Would buy this in an instant if it weren't hobbled as hell by the
| onboard realtek switch. If it had an upstream 2.5/5/10g port it
| would be instantly 6 times more capable.
| antongribok wrote:
| Aside from running Ceph as my day job, I have a 9-node Ceph
| cluster on Raspberry Pi 4s at home that I've been running for a
| year now, and I'm slowly starting to move things away from ZFS to
| this cluster as my main storage.
|
| My setup is individual nodes, with 2.5" external HDDs (mostly
| SMR), so I actually get slightly better performance than this
| cluster, and I'm using 4+2 erasure coding for the main data pool
| for CephFS.
|
| CephFS has so far been incredibly stable and all my Linux laptops
| reconnect to it after sleep with no issues (in this regard it's
| better than NFS).
|
| I like this setup a lot better now than ZFS, and I'm slowly
| starting to migrate away from ZFS, and now I'm even thinking of
| setting up a second Ceph cluster. The best thing with Ceph is
| that I can do a maintenance on a node at any time and storage
| availability is never affected, with ZFS I've always dreaded any
| kind of upgrade, and any reboot requires an outage. Plus with
| Ceph I can add just one disk at a time to the cluster and disks
| don't have to be the same size. Also, I can move the physical
| nodes individually to a different part of my home, change
| switches and network cabling without an outage now. It's a nice
| feeling.
| bityard wrote:
| Is 9 the minimum number of nodes you need for a reasonable ceph
| setup or is that just what you arrived at for your use case?
| geerlingguy wrote:
| I've seen setups with as few as 2 nodes with osds and a
| monitor (so 3 in total), but I believe 4-5 nodes is the
| minimum recommendation.
| antongribok wrote:
| I would say the minimum is whatever your biggest replication
| or erasure coding config is, plus 1. So, with just replicated
| setups, that's 4 nodes, and with EC 4+2, that's 7 nodes. With
| EC 8+3, which is pretty common for object storage workloads,
| that's 12 nodes.
|
| Note, a "node" or a failure domain, can be configured as a
| disk, an actual node (default), a TOR switch, a rack, a row,
| or even a datacenter. Ceph will spread the replicas across
| those failure domains for you.
|
| At work, our bigger clusters can withstand a rack going down.
| Also, the more nodes you have, the less of an impact it is on
| the cluster when a node goes down, and the faster the
| recovery.
|
| I started with 3 RPis then quickly expanded to 6, and the
| only reason I have 9 nodes now is because that's all I could
| find.
| mbreese wrote:
| Can I ask an off topic/in-no way RPi related question?
|
| For larger ceph clusters, how many disks/SSD/nvme are
| usually attached to a single node?
|
| We are in the middle of transitioning from a handful of big
| (3x60 disk, 1.5PB total) JBOD Gluster/ZFS arrays and I'm
| trying to figure out how to migrate to a ceph cluster of
| equivalent size. It's hard to figure out exactly what the
| right size/configuration should be. And I've been using ZFS
| for so long (10+ years) that thinking of _not_ having
| healing zpools is a bit scary.
| antongribok wrote:
| For production, we have two basic builds, one for block
| storage, which is all-flash, and one for object storage
| which is spinning disks plus small NVMe for
| metadata/Bluestore DB/WAL.
|
| The best way to run Ceph is to build as small a server as
| you can get away with economically and scale that
| horizontally to 10s or 100s of servers, instead of trying
| to build a few very large vertical boxes. I have run Ceph
| on some 4U 72-drive SuperMicro boxes, but it was not fun
| trying to manage hundreds of thousands of threads on a
| single Linux server (not to mention NUMA issues with
| multiple sockets). An ideal server would be one node to
| one disk, but that's usually not very economical.
|
| If you don't have access to custom ODM-type gear or
| open-19 and other such exotics, what's been working for
| me have been regular single socket 1U servers, both for
| block and for object.
|
| For block, this is a very normal 1U box with 10x SFF SAS
| or NVMe drives, single CPU, a dual 25Gb NIC.
|
| For spinning disk, again a 1U box, but with a deeper
| chassis you can fit 12x LFF and still have room for a
| PCI-based NVMe card, plus a dual 25Gb NIC. You can get
| these from SuperMicro, Quanta, HP.
|
| Your 3x60 disk setup sounds like it might fit in 12U
| (assuming 3x 4U servers). With our 1U servers I believe
| that can be done with 15x 1U servers (1.5 PiB usable
| would need roughly 180x 16TB disks with EC 8+3, you'll
| need more with 3x replication).
|
| Of course, if you're trying to find absolute minimum
| requirements that you can get away with, we'd have to
| know a lot more details about your workload and existing
| environment.
|
| EDITING to add:
|
| Our current production disk sizes are either 7.68 or
| 15.36 TB for SAS/NVMe SSDs at 1 DWPD or less, and 8 TB
| for spinning disk. I want to move to 16 TB drives, but
| haven't done so for various tech and non-tech reasons.
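|
| Rough math behind that disk count, if it helps (a sketch,
| assuming you keep ~15% free space as headroom):
|
|       disks, disk_tb = 180, 16
|       k, m = 8, 3                       # EC 8+3
|       raw_tb = disks * disk_tb          # 2880 TB raw
|       usable_tb = raw_tb * k / (k + m)  # ~2095 TB after EC overhead
|       usable_pib = usable_tb * 1e12 / 2**50
|       print(f"{usable_pib:.2f} PiB before headroom")     # ~1.86 PiB
|       print(f"{usable_pib * 0.85:.2f} PiB at 85% full")  # ~1.58 PiB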
| lathiat wrote:
| For the standard 3x replicated setup, 3 nodes is the minimum
| for any kind of practical redundancy but you really want 4 so
| that after failure of 1 node all the data can be recovered
| onto the other 3 and still have failure resiliency.
|
| For erasure-coded setups, which aren't really suited to block
| storage but mainly to object storage via radosgw (S3) or CephFS,
| you need a minimum of k+m nodes and realistically k+m+1. That
| would translate to 6 minimum but realistically 7 nodes for
| k=4,m=2. That's 4 data chunks and 2 redundant chunks, which
| means you use 1.5x the storage of the raw data (half that of a
| replicated setup). You can do k=2,m=1 also, so 4 nodes in
| that case.
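|
| Put another way (just the arithmetic above in one place,
| treating 3x replication as k=1,m=2):
|
|       layouts = [("3x replication", 1, 2), ("EC 4+2", 4, 2),
|                  ("EC 8+3", 8, 3)]
|       for name, k, m in layouts:
|           print(f"{name}: min {k + m} nodes, ideally {k + m + 1}, "
|                 f"{(k + m) / k:.2f}x raw per usable byte")
|       # 3x replication: min 3 nodes, ideally 4, 3.00x
|       # EC 4+2:         min 6 nodes, ideally 7, 1.50x
|       # EC 8+3:         min 11 nodes, ideally 12, 1.38x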
| [deleted]
| kllrnohj wrote:
| I was running glusterfs on an array of ODROID-HC2s (
| https://www.hardkernel.com/shop/odroid-hc2-home-cloud-two/ )
| and it was fun, but I've since migrated back to just a single
| big honking box (specifically a threadripper 1920x running
| unraid). Monitoring & maintaining an array of systems was its
| own IT job that kinda didn't seem worth dealing with.
| trhway wrote:
| Looking at that ODROID-HC2, I wonder when the drive
| manufacturers will just integrate such a general-purpose
| computer board onto the drive itself.
| Infernal wrote:
| I want to preface this - I don't have a strong opinion here,
| and I'm curious about Ceph. As someone who runs a 6-drive
| raidz2 at home (w/ ECC RAM), does your Ceph config give you
| similar data integrity guarantees to ZFS? If so, what are the
| key points of the config that enable that?
| antongribok wrote:
| When Ceph migrated from Filestore to Bluestore, that enabled
| data scrubbing and checksumming for data (older versions
| before Bluestore were only verifying metadata).
|
| Ceph (by default) does metadata scrubs every 24 hours, and
| data scrubs (deep-scrub) weekly (configurable, and you can
| manually scrub individual PGs at any time if that's your
| thing). I believe the default checksum used is "crc32c", and
| it's configurable, but I've not played with changing it. At
| work we get scrub errors on average maybe weekly now, at home
| I've not had a scrub error yet on this cluster in the past
| year (I did have a drive that failed and still needs to be
| replaced).
|
| My RPi setup certainly does not have ECC RAM as far as I'm
| aware, but neither does my current ZFS setup (also a 6 drive
| RAIDZ2).
|
| Nothing stopping you from running Ceph on boxes with ECC RAM,
| we certainly do that at my job.
| magicalhippo wrote:
| If you take say old i7 4770k's, how many of those along with
| how many disks would you need to get 1GB/s sustained sequential
| access with Ceph?
|
| My single ZFS box does that with ease, 3x mirrored vdevs = 6
| disks total, but I'm curious as the flexibility of Ceph sounds
| tempting.
| antongribok wrote:
| I just set up a test cluster at work to test this for you:
|
| 4 nodes, each node with 2x SAS SSDs, dual 25Gb NICs (one for
| front-end, one for back-end replication). The test pool is 3x
| replicated with Snappy compression enabled.
|
| On a separate client (also with 25Gb) I mounted an RBD image
| with krbd and ran FIO:
|
|       fio --filename=/dev/rbd1 --direct=1 --sync=1 --rw=write \
|         --bs=4096K --numjobs=1 --iodepth=16 --ramp_time=5 \
|         --runtime=60 --ioengine=libaio --time_based \
|         --group_reporting --name=krbd-test --eta-newline=5s
|
| I get a consistent 1.4 GiB/s:
|
|       write: IOPS=357, BW=1431MiB/s (1501MB/s)(83.9GiB/60036msec)
| underwater247 wrote:
| I would love to hear more about your Ceph setup. Specifically
| how you are connecting your drives and how many drives per
| node? I imagine with the Pi's limited USB bus bandwidth, your
| cluster performs as more of an archive data store compared to
| realtime read/write like the backing block storage of VMs. I
| have been wanting to build a Ceph test cluster and it sounds
| like this type of setup might do the trick.
| aloer wrote:
| Considering that this is custom-made for the CM4 form factor, the
| Turing Pi with carrier boards looks much more attractive because
| it's more future-proof. If only it were already available.
|
| It also has SATA and USB 3.0 which is nice
|
| Until I can preorder one I will slowly stock up on CM4s and hope
| I'll get there before pi5 comes out.
|
| Are there other boards like this?
| mhd wrote:
| Can I run a Beowulf cluster on this?
| oblak wrote:
| Credit where it's due - this is some 18 watt awesomeness at idle.
| Is it more "practical" than doing a Mini-ITX (or smaller, like
| one of those super small tomtom with up to 5900HX) build and
| equipping it with one or more NVMe expansion cards? Probably
| not. But it's cool.
|
| Now, if there were a new Pi to buy. Isn't it time for the 5? It's
| been 3 years, for most of which they've been hard to find. Mine
| broke and I really miss it because having a full blown desktop
| doing little things makes no sense, especially during the summer.
| formerly_proven wrote:
| 18 W idle is kinda horrible if you just want a small server
| (granted, this isn't one server, but instead six set-top boxes
| in one). That's recycled entry-level rack server range, which
| comes with ILOM/BMC. Most old-ish fat clients can do <10 W, some
| <5 W, no problem. If you want a desktop that consumes little
| power when idle or not loaded a lot, just get basically any
| Intel system with an IGP since 4th gen (Haswell). Avoid Ryzen
| CPU with dGPU if that's your goal; those are gas guzzlers.
| oblak wrote:
| 1. I would bet at least half of all that wattage is the SSDs.
|
| 2. Buddy, you're spewing BS at someone who used to run a
| Haswell in a really small Mini-ITX case. It was a fine HTPC
| back in 2014. But now everything, bar my dead Pi, is some
| kind of Ryzen. All desktops and laptops. The various
| 4800u/5800u/6800u and lower parts offer tremendous
| performance at 15W nominal power levels. The 5800H I am
| writing this message on is hardly a guzzler, especially when
| compared to Intel's core11/12 parts.
|
| This random drive-by intel shilling really took me by
| surprise.
| formerly_proven wrote:
| > 1. I would bet at least half of all that wattage is the
| SSDs.
|
| SSDs are really good at consuming nearly nothing when not
| servicing I/O.
|
| > 2. Buddy, you're spewing BS ... various 4800u/5800u/6800u
| ... 5800H ...
|
| None of those SKUs are a "Ryzen CPU with dGPU".
| oblak wrote:
| SSD can get to really low power levels, depending on
| state. Does not mean they were all in power sipping 10mW
| mode during measurement.
|
| > just get basically any Intel system with an IGP since
| 4th gen (Haswell). Avoid Ryzen CPU with dGPU
|
| Was your advice to me. So I took it at face value and
| compared them, like you suggested, to the relevant
| models.
| technofiend wrote:
| I will once again lament the fact that WD Labs built SBCs that
| sat with their hard drives to make them individual Ceph nodes but
| never took the hardware to production. It seems to me there's
| still a market for SBCs that could serve a Ceph OSD on a per-
| device basis, although with ever-increasing density in the
| storage and hyperconverged space that's probably more of a small
| business or prosumer solution.
| hinkley wrote:
| I think there's something to be said for sizing a raspberry pi or
| a clone to fit into a hard drive slot.
|
| I also think the TuringPi people screwed up with the model 2.
| Model 2 of a product should not have fewer slots than the
| predecessor, and in the case of the Turing Pi, orchestrating 4
| devices is not particularly compelling. It's not that difficult
| to wire 4 Pi's together by hand. I had 6 clones wired together
| using risers and powered by my old Anker charging station and an
| 8 port switch, with a few magnets to hold the whole thing
| together.
| rabuse wrote:
| If only Raspberry Pi's weren't damn near $200 now...
| greggyb wrote:
| Unless you are constrained in space to a single ITX case as in
| this example, you can get whole x86 machines for <$100 with RAM
| and storage included.
|
| There is a lot of choice in the <$150 range. You could get
| eight of these and a cheap 10-port switch for any kind of
| clustering lab you want to set up.
|
| Here is an example:
| https://www.aliexpress.com/item/3256804328705784.html?spm=a2...
| dpedu wrote:
| Same cpu, half the ram, quarter the price if you don't mind
| going used: https://www.ebay.com/itm/154960426458
|
| These are thin clients, but flip an option in the BIOS and
| it's a regular PC.
| greggyb wrote:
| Yes. I just figured I would compare new for new. I love
| eBay for electronics shopping.
| criddell wrote:
| Would one of the boards from Pine work for this application?
|
| https://pine64.com/product/pine-a64-lts/
| RL_Quine wrote:
| No, those are nasty slow.
| Asdrubalini wrote:
| What is the power consumption tho? It easily adds up over
| time.
| greggyb wrote:
| The linked machine uses a 2W processor.
|
| The successor product on the company's site uses a 12 volt,
| 2 amp power adapter: https://www.bee-link.net/products/t4-pro
|
| Here is a YouTube review of the linked model with input
| power listed at 12 volt, 1.5 amp (link to timestamp of
| bottom of unit): https://youtu.be/56UA2Uto1ns?t=129
| belval wrote:
| A low-end x86 CPU will perform better than the RasPis. My
| current NAS is an Intel G4560 with 40GB of RAM and 4 HDDs,
| and it barely does over ~40W on average. The article's
| cluster does 18W, which is better, but even over a year
| that's only a 192kWh difference (assuming it runs all
| the time), which would amount to about $40 at $0.20/kWh.
|
| It's not really worth comparing further as the
| configurations are significantly different, but if your goal
| is doing 110MB/s R/W, even when accounting for power
| consumption the product in the article is much more
| expensive.
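|
| For reference, the arithmetic behind those figures (a sketch
| using the wattages above):
|
|       watts_x86, watts_cluster = 40, 18
|       delta_kwh = (watts_x86 - watts_cluster) * 24 * 365 / 1000
|       print(f"{int(delta_kwh)} kWh/year")                  # ~192 kWh
|       print(f"${delta_kwh * 0.20:.0f}/year at $0.20/kWh")  # ~$39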
| SparkyMcUnicorn wrote:
| The HP 290 idles around 10W.
|
| Picked one up off craigslist for ~$50 and use it as a
| plex transcoder since it has QuickSync and can
| simultaneously transcode around 20 streams of 1080p
| content.
| pbhjpbhj wrote:
| I don't know much about NAS and thought they were just a
| bundle of drives with some [media] access related apps on
| a longer cable ... 40G RAM? What's that for, is it normal
| for a NAS to be so loaded? I was looking at NAS and
| people were talking about 1G as being standard (which
| conversely seemed really low).
|
| G4560 suggests you're not processing much, is the NAS
| caching a lot?
| belval wrote:
| 40G is purely overkill and is not utilized. Initial build
| had 8G and then I had 32G lying around so I added it.
|
| 4G is probably enough, though Nextcloud does use a lot of
| memory for thumbnail generation.
|
| As for the G4560, I can stream 1080p with Jellyfin, so it
| packs a surprising punch for its power envelope.
| formerly_proven wrote:
| Even for mainstream x86 Intel chips, idle power consumption
| is mostly down to peripherals, power supply (if you build
| a small NAS that idles on 2-3 W on the 12 V rail and
| can't pull more than 50 W, don't use a 650 W PSU),
| cooling fans, and whether someone forgot to enable power
| management.
| jotm wrote:
| I hear it's still possible, through heretic magic, to limit
| a CPU's power draw and, most importantly, it will not affect
| speed on any level (load will increase).
|
| There's even people selling their souls to the devil for
| the ability to control the _actual voltage_ of their chips,
| increasing performance per watt drawn!
|
| But only Gods and top OSS contributors can control the
| power draw of chips integrated into drives/extension
| cards/etc
| Rackedup wrote:
| Adafruit had some in stock a few minutes ago:
| https://twitter.com/rpilocator ... I think every Wednesday
| around 11am ... I almost got one this time, but because they
| had me set up 2FA I couldn't check out in time.
| mmastrac wrote:
| Is that just on the secondary market(s)? I'm still seeing them
| available <$100 in various models, but not always in-stock.
| lathiat wrote:
| Try https://rpilocator.com/ - no promises though.
| snak wrote:
| Wow, just checked and Pi 3 is over 100EUR, and Pi 4 over
| 200EUR.
|
| What happened? I remember buying a Pi 3B+ in 2019 for less than
| 50EUR.
| kordlessagain wrote:
| pastel-mature-herring~> Is this where compute is going?
|
| awesome-zebra*> There is no definitive answer to this question,
| as the direction of compute technology is always changing and
| evolving. However, the trend in recent years has been towards
| smaller, more powerful devices that are able to pack more
| processing power into a smaller form factor. The DeskPi Super6c
| is an example of this trend, as it offers a trim form factor and
| six individual Raspberry Pi Compute Module 4s, each of which
| offers a high level of processing power.
| mgarfias wrote:
| now if only we could get compute modules
| rustdeveloper wrote:
| This looks really cool! There was a tutorial posted on HN about
| building a mobile proxy pool with an RPi that had obvious
| limitations:
| https://scrapingfish.com/blog/byo-mobile-proxy-for-web-scrap...
| It seems this could be a solution to scale the capabilities of a
| single RPi.
| marcodiego wrote:
| It is a shame we have nothing as simple as the old OpenMOSIX.
| sschueller wrote:
| If you're trying to find a Pi, you can try the Telegram bot I
| made for rpilocator.com. It will notify you as soon as there is
| stock, with filters for specific Pis and your location/preferred
| vendor.
|
| The bot is here: https://t.me/rpilocating_bot
|
| source: https://github.com/sschueller/rpilocatorbot
| 3np wrote:
| This could be neat for a 3xnomad + 3xvault cluster. Just add
| runners and an LB.
| marshray wrote:
| Quite the bold design choice to put the removable storage on the
| underside of the motherboard.
| pishpash wrote:
| 18W at idle seems like a lot of power.
| justsomehnguy wrote:
| Divide it across six Pis, six NVMe drives, and one switch.
| pishpash wrote:
| Why divide? You don't divide by how many cores are on a
| regular PC, to which this has comparable power.
| cptnapalm wrote:
| Oh my God, I want this. I have no use for it, whatsoever, but oh
| my God I want it anyway.
| cosmiccatnap wrote:
| I love fun projects like this. I would love to know if I could
| make one a router since the NIC has two ports.
|
| I have a PowerEdge now which works fine, but it's nowhere close to
| 20 watts and I barely use a quarter of its CPU and memory.
| antod wrote:
| Is it bad that my first thought was "Imagine a Beowulf cluster of
| these..."
___________________________________________________________________
(page generated 2022-08-17 23:01 UTC)