[HN Gopher] A Storage Crisis
       ___________________________________________________________________
        
       A Storage Crisis
        
       Author : vikrum
       Score  : 43 points
       Date   : 2021-07-14 19:49 UTC (3 hours ago)
        
 (HTM) web link (blogs.harvard.edu)
 (TXT) w3m dump (blogs.harvard.edu)
        
       | bethecloud wrote:
       | I recommend looking into decentralized cloud storage solutions,
       | which are really the next generation of AWS S3-like services.
       | 
       | Check out storj.io
        
         | acdha wrote:
         | Decentralization adds a good deal of risk and operational overhead
         | but it doesn't really change the core problem that people are
         | generating more data than many want to pay for. If you want to
         | store a non-trivial amount of data, someone needs to get paid
         | to maintain storage pools and validate multiple copies.
        
       | splittingTimes wrote:
       | Maybe OP's data retention policy is the real problem. Sounds
       | like a data hoarder/digital packrat to me.
       | 
       | Whenever I take pictures on a trip or vacation, I go through
       | them at the end of each day, delete most of them and keep maybe 2
       | or 3 at most. I beautify those and the rest goes into the bin. No
       | matter how long the trip, I try to keep at most the 15 best
       | pictures. All killer, no filler. That way I am comfortable showing
       | others pictures of a trip without boring them, and I also like to
       | look at them from time to time as I know those are the best
       | moments.
       | 
       | At the end of each year we create a calendar with photo collages
       | of 4 to 6 pictures per month. The calendar goes to relatives and
       | we create a photo book from the printouts. That is what gets
       | archived.
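       A minimal Python sketch of the "keep 2 or 3 per day, at most 15 per trip" rule described above. It assumes each photo already carries a numeric rating assigned while reviewing; the `Photo` type, the `curate` function and the ratings are hypothetical, since the comment describes a manual process rather than a tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Photo:
    name: str
    day: int        # which day of the trip the photo was taken
    rating: float   # hypothetical score you assign while reviewing


def curate(photos, per_day=3, per_trip=15):
    """Keep the best `per_day` photos from each day, then the best `per_trip` overall."""
    by_day = defaultdict(list)
    for photo in photos:
        by_day[photo.day].append(photo)

    # Daily pass: everything outside the top few of each day goes into the bin.
    daily_keepers = []
    for day in sorted(by_day):
        ranked = sorted(by_day[day], key=lambda p: p.rating, reverse=True)
        daily_keepers.extend(ranked[:per_day])

    # Trip pass: cap the whole trip, no matter how many days it lasted.
    return sorted(daily_keepers, key=lambda p: p.rating, reverse=True)[:per_trip]


if __name__ == "__main__":
    trip = [Photo(f"day{d}_{i}.jpg", d, d * 10 + i) for d in range(10) for i in range(5)]
    for photo in curate(trip):
        print(photo.name)
```

       The two passes mirror the comment: the daily cull happens first, so a single great day cannot crowd the whole trip cap on its own unless later days are weaker.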
        
       | the_third_wave wrote:
       | Assuming you have family living elsewhere but reachable through a
       | fast internet connection you can do what I do by making a deal
       | with them: they hang your backup box off their net and you will
       | do the same for them. The backup box is some piece of computing
       | equipment with storage media attached, e.g. a single board
       | computer hooked up to a JBOD tower. Depending on the level of
       | trust between you and your family you can use the thing as an
       | rsnapshot target - giving you fine-grained direct access to time-
       | based snapshots (I use 4-hour intervals for my rsnapshot targets
       | which are located on-premises in different buildings spread over
       | the farm) or as a repository of encrypted tarballs, or something
       | in between. Allow the drives to spin down to save power; they'll
       | be active only a fraction of the day. The average power
       | consumption of the whole contraption does not need to exceed
       | 10-15W, making electricity costs negligible. You can have as much
       | storage capacity as you want/can afford at the moment, and keep it
       | for as long as you want or until it breaks without having to pay
       | any fees (other than hosting their contraption on your network,
       | possibly including building it for them if they're not that
       | computer-savvy).
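       The "10-15W makes electricity negligible" claim is easy to check with a little arithmetic. The sketch below assumes the box draws its average wattage around the clock; the electricity price of 0.15 per kWh is a placeholder, not a figure from the thread, so substitute your own tariff.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_energy_kwh(avg_watts: float) -> float:
    """Energy used in one year by a device with the given average draw."""
    return avg_watts * HOURS_PER_YEAR / 1000


def annual_cost(avg_watts: float, price_per_kwh: float) -> float:
    """Yearly electricity cost at a flat price per kWh."""
    return annual_energy_kwh(avg_watts) * price_per_kwh


if __name__ == "__main__":
    ASSUMED_PRICE_PER_KWH = 0.15  # placeholder tariff, not from the thread
    for watts in (10, 15):
        print(f"{watts} W: {annual_energy_kwh(watts):.1f} kWh/year, "
              f"about {annual_cost(watts, ASSUMED_PRICE_PER_KWH):.2f} per year")
```

       At the placeholder price this comes to roughly 13 to 20 per year in the local currency, in line with the commenter's "negligible".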
        
         | amarshall wrote:
         | Sure, but the OP is not talking about backup, they're talking
         | about primary storage.
        
           | ahnick wrote:
           | Anyone who needs storage space for pictures cares about
           | backup, they just may not be aware that they care about it or
           | they may just conflate the two needs.
        
         | yjftsjthsd-h wrote:
         | > Depending on the level of trust between you and your family
         | you can use the thing as an rsnapshot target - giving you fine-
         | grained direct access to time-based snapshots (I use 4-hour
         | intervals for my rsnapshot targets which are located on-
         | premises in different buildings spread over the farm) or as a
         | repository of encrypted tarballs, or something in between.
         | 
         | These days, OpenZFS native encryption is the best of all
         | worlds, I think.
        
           | the_third_wave wrote:
           | In that case, use that. ZFS seems to be a somewhat touchy
           | subject with some people using it as widely as possible while
           | others - myself included - prefer more modular storage
           | systems where the tasks of volume manager,
           | striping/slicing/raid management, encryption layers and file
           | systems are performed by discrete software layers. Both
           | systems work and both have their pros and cons; in the end it
           | comes down to what you value the most.
        
             | ahnick wrote:
             | What are the cons of the ZFS approach?
        
         | sillysaurusx wrote:
         | I think I'm missing something obvious, but why is this better
         | than keeping your tower at home?
         | 
         | Is this to guard against house fires?
        
           | the_third_wave wrote:
           | House fires, electrical mayhem (lightning strikes have
           | released more magical smoke than I care to mention here),
           | burglary, flooding, law enforcement coming by to take your
           | things because of _$reason_, earthquake damage or any other
           | localised threat that cannot touch remote backups.
        
       | wmf wrote:
       | The solution is either NAS or cloud archive (not backup) such as
       | SmugMug or ExpanDrive.
        
       | wyager wrote:
       | Spend a few hundred to a few thousand bucks and build a chunky
       | NAS. Get a bunch of 12TB SSDs and put them in a RAIDZn.
       | 
       | Also, what shitty phone takes 108MP photos? That's guaranteed to
       | be some stupid Android phone gimmick. There's no way having that
       | many pixels with a teeny optical path and a teeny sensor is
       | useful. I'd only want above 100MP on a medium format sensor.
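       For sizing a RAIDZ pool like the one suggested above, a common rule of thumb is that a single RAIDZ vdev of n equal drives with p parity drives (p = 1, 2 or 3 for RAIDZ1/2/3) gives about (n - p) drives' worth of usable space. The sketch below encodes only that rule; it ignores ZFS metadata, padding overhead and the free space you should leave, and the drive counts in the example are hypothetical.

```python
def raidz_usable_tb(num_drives: int, drive_tb: float, parity: int) -> float:
    """Rough usable capacity of one RAIDZ vdev: (drives - parity) * drive size.

    Real usable space is lower (metadata, padding, recommended headroom),
    and drive vendors quote decimal TB while tools often report binary units.
    """
    if parity not in (1, 2, 3):
        raise ValueError("RAIDZ parity must be 1, 2 or 3")
    if num_drives <= parity:
        raise ValueError("a RAIDZ vdev needs more drives than parity drives")
    return (num_drives - parity) * drive_tb


if __name__ == "__main__":
    # Hypothetical pools built from the 12TB drives mentioned above.
    for drives, parity in ((4, 1), (6, 2), (8, 2)):
        print(f"{drives} x 12TB RAIDZ{parity}: ~{raidz_usable_tb(drives, 12, parity)} TB usable")
```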
        
       | JohnJamesRambo wrote:
       | Almost everything I create doesn't really need to be saved and I
       | could never find it if I needed to. I just let Google Photos keep
       | them. It's good enough.
        
       | lordnacho wrote:
       | What are the requirements? Large capacity, redundancy, reasonable
       | access speed, always on but maintenance downtime of even a few
       | hours a year tolerable?
       | 
       | Easy but your data isn't yours: sync your data to GDrive or Apple
       | or whatever, and sync a NAS to that.
       | 
       | A little harder but still doable: rent a Hetzner server, set it up
       | as your storage with your own access, and sync it to a local NAS.
       | A Hetzner box is also really useful for running a load of other
       | services, so for 50 bucks or so it seems pretty reasonable.
       | 
       | You could just buy a huge disk and run a server in your house,
       | but it gets annoying in various ways. Kids unplug the power, it
       | creates heat, maybe noise, multiple disks end up needing
       | management, that kind of thing.
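       The "sync a NAS to that" step is normally handled by a dedicated tool such as rsync, rclone or the NAS vendor's sync app. Purely to illustrate the idea, here is a minimal one-way mirror in Python that copies files which are missing from the destination or differ in size or modification time. The `mirror` function is hypothetical, never deletes anything on the destination, and is not a substitute for a real sync tool.

```python
import os
import shutil
import tempfile
from pathlib import Path


def mirror(src: Path, dst: Path) -> list:
    """Copy files from src to dst when missing there or when size/mtime differ.

    One-way only: files deleted from src are left in place in dst.
    Returns the destination paths that were (re)copied.
    """
    copied = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            source = Path(root) / name
            target = dst / source.relative_to(src)
            if target.exists():
                s, t = source.stat(), target.stat()
                # Compare whole seconds: filesystems store mtime at different precisions.
                if s.st_size == t.st_size and int(s.st_mtime) == int(t.st_mtime):
                    continue
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)  # copy2 also copies mtime, so the next run can skip it
            copied.append(target)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        (Path(a) / "2021").mkdir()
        (Path(a) / "2021" / "beach.jpg").write_bytes(b"not really a jpeg")
        print(mirror(Path(a), Path(b)))  # first run copies the file
        print(mirror(Path(a), Path(b)))  # second run finds nothing to do
```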
        
       | ur-whale wrote:
       | The real question being, of course, given that your life is
       | finite ... how often will you actually look at any of these pics.
       | 
       | The more pix there are, the less likely it is you'll ever even
       | open any of these.
        
         | tablespoon wrote:
         | > The real question being, of course, given that your life is
         | finite ... how often will you actually look at any of these
         | pics.
         | 
         | > The more pix there are, the less likely it is you'll ever
         | even open any of these.
         | 
         | Well, there are also future generations to think about, but
         | their interest will fall off too (until you reach the
         | genealogical profile level, which maxes out at a few portrait-
         | type pictures of any regular individual).
         | 
         | It's pretty essential to aggressively curate and organize data
         | like this.
        
       | darklighter3 wrote:
       | Amazon Snowball? https://aws.amazon.com/snowball/
        
       | jcoq wrote:
       | A few people have raised this question in the thread already, but
       | why are we compelled to store such unreasonable numbers of
       | images?
       | 
       | I'm definitely not immune and I find that the satisfaction I get
       | from my photo collection is inversely proportional to the size of
       | my collection. At this point, I'm just lugging around this huge
       | mass of data "just in case". There's no way I will ever have time
       | to sort my images. There are probably 2000 wedding images alone,
       | let alone the tens of thousands of random snapshots
       | that may or may not be something I ever care to see again.
       | 
       | At this point, I would almost prefer a smart opt-in solution,
       | similar to what Google Photos provides for smart albums:
       | "We found these images that would be good long-term. Save them
       | indefinitely?"
        
       | ilamont wrote:
       | Isn't this the business model that Dropbox, iCloud and Google
       | Drive, etc. are based upon?
        
       | D13Fd wrote:
       | I back up about 15 TB of photos with Arq to Wasabi cloud storage.
       | 
       | It has been running with zero maintenance (other than occasional
       | partial restores) since late 2018.
       | 
       | I transitioned off of AWS cloud storage when they raised the
       | prices.
       | 
       | I'm not sure if Wasabi is still the cheapest and fastest, but
       | they have been great to deal with. And Arq is an excellent set-
       | it-and-forget-it encrypted cloud backup app.
       | 
       | I also run a server with a 16 TB RAID 1 array and a set of local
       | backup drives. Sadly it is almost full, and the volume of data
       | makes it a hassle to upgrade (not to mention the cost).
       | 
       | I've found standard gigabit Ethernet to be just barely fast enough
       | for editing photos over the local network. However, for my own
       | sanity, I usually do the initial editing on a local drive before
       | sending the files to the server (and from there to the cloud
       | backup).
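       To put the gigabit remark in numbers: an ideal link moves size x 8 bits at its line rate, so the sketch below gives a best-case lower bound. Real SMB/NFS throughput over gigabit Ethernet is lower because of protocol overhead and disk speed, and the 100 MB raw-file size in the example is hypothetical.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Best-case time to move size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second


GIGABIT = 1e9  # 1 Gb/s, i.e. at most 125 MB/s of payload

if __name__ == "__main__":
    # A hypothetical 100 MB raw file, and the 15 TB archive mentioned above.
    print(f"100 MB raw file: {transfer_seconds(100e6, GIGABIT):.1f} s")
    print(f"15 TB archive: {transfer_seconds(15e12, GIGABIT) / 3600:.1f} h")
```

       Even at the theoretical maximum that is under a second per raw file but more than a day for the full archive, which is one reason editing on a local drive first and pushing to the server afterwards is a reasonable workflow.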
        
         | wmf wrote:
         | Note that he's looking for primary storage not backup.
        
       | PaulHoule wrote:
       | I think $6 a month for "unlimited data" is going to end badly.
       | 
       | The AWS or Azure price is high, but it's a scalable price, a real
       | price.
        
         | wgjordan wrote:
         | > I think $6 a month for "unlimited data" is going to end
         | badly.
         | 
         | Backblaze has been offering their 'unlimited data' for over a
         | decade [1], and it hasn't ended badly for them yet.
         | 
         | It's fully sustainable because they only lose money on a few
         | customers (like that one customer storing 430TB for $6/month
         | [2]), while most customers use much less storage than that, so
         | the service remains profitable overall.
         | 
         | The soft limit of its 'unlimited' comes from it being a
         | personal-backup mirroring service and not a fully-external
         | cloud storage, so the service only fits certain use cases.
         | 
         | [1] https://www.backblaze.com/blog/all-in-on-unlimited-backup/
         | 
         | [2] https://www.reddit.com/r/IAmA/comments/b6lbew/were_the_backb...
        
           | res0nat0r wrote:
           | Backblaze is cool, but their limitations to make $6/month
           | affordable on their end essentially eliminate any large
           | backups. You must have any data locally connected to your PC
           | or it will be purged if not seen within 30 days (so you can
           | connect a bunch of USB drives locally, but this is a hack).
           | 
           | Also their upload speed is capped so you can't just upload at
           | 1GB/s or max out your connection to ingest data into their
           | system, like something you can do with S3.
           | 
           | Glacier Deep Archive is still really the cheapest option at
           | about $1/TB per month, but the retrieval times and the egress
           | data charges are a big catch.
        
             | secabeen wrote:
             | Yeah, they also don't have a Linux client for the backup
             | product. While you certainly can build a Windows-based
             | storage server, and there's even some interesting storage
             | tech in Windows, most data-hoarders store their data on
             | Linux.
             | 
             | Glacier/Deep is good, but with the 180-day minimum object
             | lifetime, you want to be sure that the data is ready to go
             | into the archive before pushing it there. (You can use
             | tiered storage, but then you're storing all data in
             | standard S3 for 30 days before it gets into Glacier, and
             | that one month of storage in standard S3 will cost you the
             | same as 10 months of Glacier storage.)
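       The 180-day minimum matters because an object deleted early is still billed as if it had been stored for the full 180 days. The sketch below models only that rule. The price constant is an assumption (roughly the 2021 us-east-1 list price of about $0.00099 per GB-month), the sketch treats a month as 30 days, and retrieval, request and egress fees are ignored.

```python
ASSUMED_USD_PER_GB_MONTH = 0.00099  # approximate 2021 Deep Archive list price; check current pricing
MIN_STORAGE_DAYS = 180              # Deep Archive minimum storage duration
DAYS_PER_MONTH = 30                 # simplification for this sketch


def deep_archive_storage_cost(gb: float, days_stored: int,
                              usd_per_gb_month: float = ASSUMED_USD_PER_GB_MONTH) -> float:
    """Storage charge for `gb` kept `days_stored` days, applying the 180-day minimum."""
    billable_days = max(days_stored, MIN_STORAGE_DAYS)
    return gb * usd_per_gb_month * billable_days / DAYS_PER_MONTH


if __name__ == "__main__":
    for days in (10, 180, 365):
        print(f"1 TB kept {days} days: ${deep_archive_storage_cost(1000, days):.2f}")
```

       Deleting after 10 days costs the same as keeping the data for 180, which is why it pays to push only data that is genuinely ready to be archived.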
        
             | Causality1 wrote:
             | That would be super attractive to me if my ISP didn't have
             | data caps. My personal archive is ~40TB of drives in my
             | primary desktop. Even disregarding empty space and
             | duplicates, it would take four years or more to upload it
             | onto a cloud service without running over my data cap.
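       The "four years or more" estimate follows from dividing the archive size by whatever part of the monthly cap is left over for uploads. The commenter does not say what the cap is, so the spare-cap values in this sketch are hypothetical.

```python
import math


def months_to_upload(total_tb: float, spare_tb_per_month: float) -> int:
    """Whole months needed if only `spare_tb_per_month` of the cap can go to uploads."""
    if spare_tb_per_month <= 0:
        raise ValueError("need some unused cap each month")
    return math.ceil(total_tb / spare_tb_per_month)


if __name__ == "__main__":
    for spare in (1.0, 0.75, 0.5):  # hypothetical TB of cap left over each month
        months = months_to_upload(40, spare)
        print(f"{spare} TB/month spare: {months} months (~{months / 12:.1f} years)")
```

       With about 0.8 TB or less of spare cap per month, 40 TB takes 50 months or more, consistent with the commenter's estimate.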
        
               | res0nat0r wrote:
               | I have symmetric gigabit ethernet and have a few TB
               | backed up to deep archive, but if I wanted to backup and
               | restore everything the data transfer pricing is insane,
               | it would be $1800 for 20TB. This really is the only thing
               | AWS is keeping artificially high to facilitate lock-in.
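       The $1800 figure matches a flat internet-egress rate of $0.09 per GB applied to 20 TB counted as 20,000 GB. The sketch below reproduces that arithmetic. The flat rate is an assumption (AWS's published data-transfer pricing is tiered by monthly volume, so the real bill would differ somewhat), and Deep Archive retrieval and request fees would come on top.

```python
ASSUMED_EGRESS_USD_PER_GB = 0.09  # flat rate assumed here; real pricing is tiered


def egress_cost_usd(tb: float, usd_per_gb: float = ASSUMED_EGRESS_USD_PER_GB) -> float:
    """Cost of downloading `tb` terabytes (decimal, 1 TB = 1000 GB) at a flat rate."""
    return tb * 1000 * usd_per_gb


if __name__ == "__main__":
    print(f"20 TB restore: ${egress_cost_usd(20):,.2f}")
```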
        
         | barbazoo wrote:
         | It's not "unlimited" in a practical sense because as the post
         | said, they're just mirroring the data so you yourself have to
         | have the storage capacity you're asking them for.
        
           | 908B64B197 wrote:
           | There might be a way to trick the client syncing the data
           | into believing that it's still on disk by looking at how it
           | checks if the data is still there and writing a filter driver
           | to intercept and modify the result of the call.
        
             | inetknght wrote:
             | Good luck getting a checksum for a partial file that you
             | don't have
        
               | amelius wrote:
               | True, but perhaps the checksum doesn't need to be
               | computed if the timestamp didn't change. Or perhaps the
               | checksum is always computed in the same way so you can
               | just store the checksum.
        
               | 908B64B197 wrote:
               | Store the checksums? You still need local storage but way
               | less.
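       The checksum-caching idea from the two replies above can be sketched generically: store each file's digest together with its size and modification time, and only re-read the file when either changes. The `ChecksumCache` class is hypothetical; whether any particular backup client would accept cached checksums in place of real files is unknown, and this sketch does nothing to trick one.

```python
import hashlib
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos are never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class ChecksumCache:
    """Recompute a file's SHA-256 only when its size or mtime has changed."""

    def __init__(self):
        self._entries = {}        # path -> (size, mtime_ns, hex digest)
        self.hashes_computed = 0  # how many times a file was actually read

    def checksum(self, path: str) -> str:
        st = os.stat(path)
        key = (st.st_size, st.st_mtime_ns)
        cached = self._entries.get(path)
        if cached is not None and cached[:2] == key:
            return cached[2]      # unchanged since last time: reuse the stored digest
        value = sha256_file(path)
        self.hashes_computed += 1
        self._entries[path] = (*key, value)
        return value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "photo.jpg")
        with open(path, "wb") as f:
            f.write(b"pixels")
        cache = ChecksumCache()
        cache.checksum(path)
        cache.checksum(path)
        print("files actually hashed:", cache.hashes_computed)
```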
        
         | nateroling wrote:
         | Yup, I'm currently experimenting with using an AWS bucket
         | alongside [CloudMounter](https://cloudmounter.net) to create an
         | Archive disk for things I don't really need day-to-day, but
         | would like to hang onto.
         | 
         | I'm currently not backing that data up, though. It's not
         | critical, and the chances of AWS losing it are low enough that
         | I'm not too worried. The biggest risk would be myself
         | accidentally deleting it.
         | 
         | One thing I have decided for sure, though, is that archiving
         | lots of small files is awful. Much better to wrap them up in a
         | tarfile, uncompressed.
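       Bundling many small files into one uncompressed tar file, as suggested above, can be done with Python's standard `tarfile` module; mode `"w"` writes without compression, which suits photos that are already compressed. The `bundle_directory` helper and the paths in the example are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle_directory(src_dir: str, archive_path: str) -> int:
    """Pack every file under src_dir into one uncompressed tar; return how many files were added."""
    src = Path(src_dir)
    added = 0
    # Mode "w" means plain tar with no compression ("w:gz", "w:xz", etc. would compress).
    with tarfile.open(archive_path, "w") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so the archive unpacks cleanly anywhere.
                tar.add(str(path), arcname=str(path.relative_to(src)))
                added += 1
    return added


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        photos = Path(d) / "photos"
        photos.mkdir()
        for i in range(3):
            (photos / f"img_{i}.jpg").write_bytes(b"x" * 10)
        archive = Path(d) / "photos.tar"
        print(bundle_directory(str(photos), str(archive)), "files ->", archive.name)
```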
        
           | terpimost wrote:
           | Amazon Drive is fine. There's no need to have a mirror of all
           | that data on my computer; with it I can choose selective sync
           | or no sync at all. Plus unlimited photo storage: Amazon Photos
           | and Amazon Drive are essentially a single product.
        
         | charcircuit wrote:
         | The bandwidth cost makes me cry.
        
       | TheDudeMan wrote:
       | If you don't want to pay for the cloud, then there is only one
       | sane option: You keep upgrading your local storage so that you
       | never have more than a few devices. Every six months, buy a new
       | giant disk and decommission as many smaller disks as you can.
        
         | arthurcolle wrote:
         | This sounds extremely expensive
        
           | amarshall wrote:
           | Cloud storage is _really_ expensive, though. Putting 4 TB in
           | S3 costs US$94.21/month before any transfer costs. A 4 TB
           | CMR HDD is ~$140. That doesn't include power costs, but over
           | a year you get $990 to spend on that and other things from
           | the cost difference vs. S3 (plus you can sell your old drives
           | when you're done with them).
           | 
           | (I use B2 for cloud storage backup since it's a lot cheaper
           | than S3 at US$20 for 4 TB, but that still is ~$105/year more
           | than local)
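       The figures in this comment can be reproduced with a few lines of arithmetic. The prices below are assumptions that happen to match the numbers quoted (S3 Standard at $0.023 per GB-month and B2 at $0.005 per GB-month, with 4 TB counted as 4096 GB); like the comment, the comparison ignores power, redundancy and transfer costs.

```python
GB_PER_TB = 1024             # 4 TB = 4096 GB reproduces the US$94.21 figure
ASSUMED_S3_STANDARD = 0.023  # USD per GB-month, assumed
ASSUMED_B2 = 0.005           # USD per GB-month, assumed


def monthly_cost(tb: float, usd_per_gb_month: float) -> float:
    """Monthly storage bill for `tb` terabytes at a flat per-GB price."""
    return tb * GB_PER_TB * usd_per_gb_month


def first_year_difference(tb: float, usd_per_gb_month: float, drive_price: float) -> float:
    """How much more 12 months of cloud storage costs than buying one local drive."""
    return 12 * monthly_cost(tb, usd_per_gb_month) - drive_price


if __name__ == "__main__":
    for name, price in (("S3", ASSUMED_S3_STANDARD), ("B2", ASSUMED_B2)):
        print(f"{name}: ${monthly_cost(4, price):.2f}/month, "
              f"${first_year_difference(4, price, 140):.2f} more than a $140 drive in year one")
```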
        
             | dragontamer wrote:
             | > 4 TB CMR HDD is ~$140
             | 
             | And that's with bad pricing! Last year, 4TB was $120 and
             | 6TB was ~$150.
             | 
             | Building out a NAS for $1000 ($600 in 4x hard drives, $400
             | for other components) is very reasonable. Last year that
             | was 4x6TB == 12TB storage + 12TB redundancy, but this year
             | prices are worse so you "only" get 8TB + 8TB redundancy.
             | 
             | $400 can buy a Synology or various other NAS devices. It can
             | also buy a new desktop that you can install FreeNAS or
             | whatever onto.
             | 
             | -----------
             | 
             | Eventually, when the 8TB is not enough, just buy a new HBA
             | card and shove 4x more hard drives in there for a 2nd
             | storage on the same NAS. Maybe 8TB x 4 == 16TB usable +
             | 16TB redundancy.
             | 
             | And there's no need to copy everything over: just keep the
             | old 8TB cluster working and start writing to the new 16TB
             | storage.
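       The capacities in this comment are consistent with mirrored pairs (for example ZFS mirrors or RAID 10), where half of the raw space holds redundant copies; that layout is an inference from the numbers quoted, not something the comment states. The sketch below encodes the assumption, plus the comment's point that a second pool simply adds capacity without migrating data.

```python
def mirrored_usable_tb(num_drives: int, drive_tb: float) -> float:
    """Usable space when drives are arranged in two-way mirrors: half of the raw capacity."""
    if num_drives % 2:
        raise ValueError("two-way mirrors need an even number of drives")
    return num_drives // 2 * drive_tb


def total_usable_tb(pools) -> float:
    """Sum usable space across independent pools given as (num_drives, drive_tb) pairs."""
    return sum(mirrored_usable_tb(n, size) for n, size in pools)


if __name__ == "__main__":
    print(mirrored_usable_tb(4, 6))           # last year's 4 x 6TB build
    print(mirrored_usable_tb(4, 4))           # this year's 4 x 4TB build
    print(total_usable_tb([(4, 4), (4, 8)]))  # old pool kept, new 4 x 8TB pool added
```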
        
               | rhacker wrote:
               | This is the true answer to this entire thread.
        
       ___________________________________________________________________
       (page generated 2021-07-14 23:02 UTC)