Post AXK7uGC0dLsd91uYhU by feld@bikeshed.party
(DIR) Post #AXIdjyJd3PQx1Mh3Ls by feld@bikeshed.party
2023-07-02T23:51:49.188911Z
0 likes, 0 repeats
> You then have a ZFS-style volume system, where you can specify things like "I want these objects replicated 2-3x, persistent / expiring, etc..."
It's called CEPH
(DIR) Post #AXIfRtyodRNMGnTAmG by feld@bikeshed.party
2023-07-03T00:10:55.061899Z
0 likes, 0 repeats
Running a cluster over the internet will be the real challenge but otherwise CEPH has the features for a giant object store where you can change parity on the fly etc
(DIR) Post #AXJuFTbaYwALSx6A6a by feld@bikeshed.party
2023-07-03T14:31:12.307472Z
0 likes, 0 repeats
IPFS doesn't fit the requirements of being able to accurately monitor and control the replication (will need a complex pinning service involved)
Plus all the existing software speaks S3, not IPFS, so CEPH's S3 would be a perfect drop-in replacement
(DIR) Post #AXK5zR6zGTWKY5u1Zo by feld@bikeshed.party
2023-07-03T16:43:00.483654Z
0 likes, 0 repeats
It's basically the best S3 / object storage system that exists outside of Amazon's S3.
But it's very flexible and has an incredible storage architecture called RADOS[1] (Reliable Autonomic Distributed Object Store) and a distribution algorithm called CRUSH[2].
CEPH storage can be mounted as a POSIX filesystem and as NFS as well.
I cannot speak to the specific requirements for keeping a cluster online across a wide area network, but I'm pretty certain it could be done by a group of volunteers contributing their resources, and then offering S3 access to the wider community.
I'm not certain if CEPH deduplicates the storage across buckets, but if it does... this would really shine as a community-run solution. (hopefully the people involved here know Anycast)
CEPH supports any size storage. Add a 12TB drive today, a 4TB drive tomorrow, and it will do the right thing to keep things balanced and redundant throughout the cluster.
[1] https://ceph.com/assets/pdfs/weil-rados-pdsw07.pdf
[2] https://ceph.com/assets/pdfs/weil-crush-sc06.pdf
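To make the "12TB drive today, 4TB drive tomorrow" point concrete, here is a rough sketch of capacity-weighted placement. It is not CRUSH and not anything taken from CEPH's code: it uses weighted rendezvous hashing as a much simpler stand-in that shares the two properties discussed here (bigger drives get proportionally more objects, and adding a drive moves only the objects that land on it). The drive names, sizes, and the `place` and `unit_hash` helpers are made up for illustration.

```python
import hashlib
import math


def unit_hash(key: str, node: str) -> float:
    """Deterministically map (key, node) to a float strictly inside (0, 1)."""
    digest = hashlib.sha256(f"{node}/{key}".encode()).digest()
    top52 = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (top52 + 0.5) / 2**52


def place(key: str, capacity_tb: dict[str, float], replicas: int = 1) -> list[str]:
    """Return the `replicas` drives with the highest weighted score for `key`."""
    def score(node: str) -> float:
        # log() of a value in (0, 1) is negative, so the score is positive
        # and grows with the drive's capacity.
        return -capacity_tb[node] / math.log(unit_hash(key, node))
    return sorted(capacity_tb, key=score, reverse=True)[:replicas]


# Hypothetical cluster: one 12TB drive and one 4TB drive.
drives = {"drive-12tb": 12.0, "drive-4tb": 4.0}
keys = [f"object-{i}" for i in range(4000)]
before = {k: place(k, drives)[0] for k in keys}
share_12tb = sum(1 for d in before.values() if d == "drive-12tb") / len(keys)

# A volunteer adds an 8TB drive; see which objects change location.
grown = {**drives, "drive-8tb": 8.0}
after = {k: place(k, grown)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"12TB share: {share_12tb:.2f}, moved: {len(moved)} of {len(keys)}")
```

In this sketch every object that moves ends up on the new drive and nothing gets shuffled between the existing drives. The real CRUSH algorithm additionally handles failure domains (hosts, racks, sites), which this toy version ignores.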
(DIR) Post #AXK645zBEnziTZTN20 by promovicz@chaos.social
2023-07-03T16:41:45Z
1 likes, 0 repeats
@emc2 @feld Yes - CEPH is intended for clusters.
(DIR) Post #AXK6BtvLy0W9NrP9GK by promovicz@chaos.social
2023-07-03T16:44:24Z
0 likes, 0 repeats
@feld @emc2 S3 front-ends for IPFS also exist.
(DIR) Post #AXK6BuV9ot9pAu9k4O by feld@bikeshed.party
2023-07-03T16:45:28.008771Z
0 likes, 0 repeats
yeah but IPFS has a ton of other complexities that make it suboptimal for this specific use case. It's really, really hard to know how many copies of things are out there...
(DIR) Post #AXK73smCRwvNShrbAu by promovicz@chaos.social
2023-07-03T16:49:46Z
0 likes, 0 repeats
@feld @emc2 True. I don't think that either is a truly good fit. Architectures based on distributed hash tables could work to some degree, but some access latency would usually remain. I would also prefer something with crypto - because ActivityPub is already very insecure and brittle.
(DIR) Post #AXK73tM0IpZ3FkcByy by feld@bikeshed.party
2023-07-03T16:55:02.258408Z
0 likes, 0 repeats
CEPH is already designed for distributed heterogeneous storage, though, which is one of the hardest things to get right. The question is whether or not it could work well with a large number of cluster members spread out so far geographically.
It already has the algorithm to make sure that changes require as few data moves/copies as possible.
With any other system I worry about two cases: some volunteer's data goes offline and we're left in a dangerously degraded state, or new storage comes online and the system stupidly starts an automatic replication of data to repair the lost redundancy or rebalance, which chokes the links between the nodes and causes massive performance issues for the end users.
CEPH is already designed to avoid that.
(DIR) Post #AXK7MxzvGFor9qNlJ2 by feld@bikeshed.party
2023-07-03T16:58:28.086740Z
0 likes, 0 repeats
in either case, this is an incredibly hard problem to solve and I don't know any software other than CEPH or IPFS that would be a good starting point to build off of.
(DIR) Post #AXK7fEo3VQXGV8Xmbo by promovicz@chaos.social
2023-07-03T16:59:39Z
0 likes, 0 repeats
@feld @emc2 Yes, but CEPH is not intended to be run in public - which basically means that you have internal adversaries.
(DIR) Post #AXK7fFTsztzoas7BoG by feld@bikeshed.party
2023-07-03T17:01:41.547861Z
0 likes, 0 repeats
That's why I suggest that it not be "public" as such. It shouldn't be some open system anyone can join without permission.
It should be coordinated around a group of volunteers who know what they're doing, have access to the datacenter space required, and run as a non-profit.
(DIR) Post #AXK7uGC0dLsd91uYhU by feld@bikeshed.party
2023-07-03T17:04:28.720362Z
0 likes, 0 repeats
Otherwise the best option we got is something like what Filecoin is trying to create on top of IPFS
(DIR) Post #AXK8naEloI8PEraCCu by promovicz@chaos.social
2023-07-03T17:08:18Z
0 likes, 0 repeats
@feld @emc2 A fully distributed approach has different issues too, but I have seen it work several times.
Non-profits are difficult in many countries. They can't be run base-democratic in the US. They can be in Europe, but social issues tend to pop up at certain sizes.
(DIR) Post #AXK8nbThCHfT5RZd6u by feld@bikeshed.party
2023-07-03T17:14:26.378854Z
0 likes, 0 repeats
> but social issues tend to pop up at certain sizes.
A DAO would be perfect but the fediverse is inherently allergic to anything from the crypto/web3 community
(DIR) Post #AXK9VEMSVQ8u8JzXsW by feld@bikeshed.party
2023-07-03T17:21:56.127006Z
0 likes, 0 repeats
not all of us are so crazy 🥲