[HN Gopher] A Storage Crisis
___________________________________________________________________
A Storage Crisis
Author : vikrum
Score : 43 points
Date : 2021-07-14 19:49 UTC (3 hours ago)
(HTM) web link (blogs.harvard.edu)
(TXT) w3m dump (blogs.harvard.edu)
| bethecloud wrote:
| I recommend looking into decentralized cloud storage solutions,
| which are really the next generation of AWS S3-like services.
|
| Check out storj.io
| acdha wrote:
| Decentralization adds a good deal of risk and operational
| overhead, but it doesn't really change the core problem that
| people are generating more data than many want to pay for. If you
| want to store a non-trivial amount of data, someone needs to get
| paid to maintain storage pools and validate multiple copies.
| splittingTimes wrote:
| Maybe the data retention policy of OP is the real problem. Sounds
| like a data hoarder/virtual messy to me.
|
| Whenever I take pictures on a trip or vacation, I go through
| them at the end of each day, delete most of them and keep maybe 2
| or 3 max, beautify them and the rest goes into the bin. No matter
| how long the trip, I try to keep at most the 15 best pictures.
| All killer, no filler. That way I am comfortable showing
| others pictures of a trip without boring them and I also like to
| look at them from time to time as I know those are the best
| moments.
|
| At the end of each year we create a calendar with photo collages
| of 4 to 6 pictures per month. The calendar goes to relatives and
| we create a photo book from the print outs. That is what gets
| archived.
| the_third_wave wrote:
| Assuming you have family living elsewhere but reachable through a
| fast internet connection you can do what I do by making a deal
| with them: they hang your backup box off their net and you will
| do the same for them. The backup box is some piece of computing
| equipment with storage media attached, e.g. a single board
| computer hooked up to a JBOD tower. Depending on the level of
| trust between you and your family you can use the thing as an
| rsnapshot target - giving you fine-grained direct access to time-
| based snapshots (I use 4-hour intervals for my rsnapshot targets
| which are located on-premises in different buildings spread over
| the farm) or as a repository of encrypted tarballs, or something
| in between. Allow the drives to spin down to save power, they'll
| be active only a fraction of the day. The average power
| consumption of the whole contraption does not need to exceed
| 10-15W making electricity costs negligible. You can have as much
| storage capacity as you want/can afford at the moment, keep it
| for as long as you want or until it breaks without having to pay
| any fees (other than hosting their contraption on your network -
| possibly including building it for them if they're not that
| computer-savvy).
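
A rough sketch of the time-based snapshot idea described above. rsnapshot is driven by its own configuration file, which the thread does not show; this Python helper instead builds a plain rsync command using `--link-dest`, so files unchanged since the previous snapshot are hard-linked rather than copied. The function name, the paths and the timestamp format are assumptions made for illustration, not the commenter's setup.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import List, Optional


def snapshot_command(source: str, dest_root: str, now: datetime,
                     previous: Optional[str] = None) -> List[str]:
    """Build an rsync command that writes one timestamped snapshot directory."""
    # Hypothetical naming scheme: one directory per run, e.g. 2021-07-14T1600
    target = PurePosixPath(dest_root) / now.strftime("%Y-%m-%dT%H%M")
    cmd = ["rsync", "-a", "--delete"]
    if previous is not None:
        # Unchanged files become hard links into the previous snapshot,
        # so each new snapshot only costs the space of what changed.
        cmd.append(f"--link-dest={previous}")
    cmd += [source, str(target)]
    return cmd


if __name__ == "__main__":
    cmd = snapshot_command(
        "/home/me/photos/",            # trailing slash: copy the contents
        "/mnt/backup",                 # assumed mount point of the backup box
        datetime(2021, 7, 14, 16, 0),
        previous="/mnt/backup/2021-07-14T1200",
    )
    print(" ".join(cmd))
```

Running the command (for example via `subprocess.run(cmd, check=True)`) every four hours would approximate the interval mentioned in the comment; pruning old snapshots is left out of this sketch.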
| amarshall wrote:
| Sure, but the OP is not talking about backup, they're talking
| about primary storage.
| ahnick wrote:
| Anyone who needs storage space for pictures cares about
| backup, they just may not be aware that they care about it or
| they may just conflate the two needs.
| yjftsjthsd-h wrote:
| > Depending on the level of trust between you and your family
| you can use the thing as an rsnapshot target - giving you fine-
| grained direct access to time-based snapshots (I use 4-hour
| intervals for my rsnapshot targets which are located on-
| premises in different buildings spread over the farm) or as a
| repository of encrypted tarballs, or something in between.
|
| These days, openzfs native encryption is the best of all
| worlds, I think.
| the_third_wave wrote:
| In that case, use that. ZFS seems to be a somewhat touchy
| subject with some people using it as widely as possible while
| others - myself included - prefer more modular storage
| systems where the tasks of volume manager,
| striping/slicing/raid management, encryption layers and file
| systems are performed by discrete software layers. Both
| systems work, both have their pros and cons; in the end it
| comes down to what you value the most.
| ahnick wrote:
| What are the cons of the ZFS approach?
| sillysaurusx wrote:
| I think I'm missing something obvious, but why is this better
| than keeping your tower at home?
|
| Is this to guard against house fires?
| the_third_wave wrote:
| House fires, electrical mayhem (lightning strikes have
| released more magical smoke than I care to mention here),
| burglary, flooding, law enforcement coming by to take your
| things because of _$reason_ , earthquake damage or any other
| localised threat which can not touch remote backups.
| wmf wrote:
| The solution is either NAS or cloud archive (not backup) such as
| SmugMug or ExpanDrive.
| wyager wrote:
| Spend a few hundred to a few thousand bucks and build a chunky
| NAS. Get a bunch of 12TB SSDs and put them in a RAIDZn.
|
| Also, what shitty phone takes 108MP photos? That's guaranteed to
| be some stupid Android phone gimmick. There's no way having that
| many pixels with a teeny optical path and a teeny sensor is
| useful. I'd only want above 100MP on a medium format sensor.
| JohnJamesRambo wrote:
| Almost everything I create doesn't really need to be saved and I
| could never find it if I needed to. I just let Google Photos keep
| them. It's good enough.
| lordnacho wrote:
| What are the requirements? Large capacity, redundancy, reasonable
| access speed, always on but maintenance downtime of even a few
| hours a year tolerable?
|
| Easy but your data isn't yours: sync your data to GDrive or Apple
| or whatever, and sync a NAS to that.
|
| A little harder but still doable: get a Hetzner and set that up
| as your storage, set up your own access, sync to local NAS. A
| Hetz is also really useful for running a load of other services,
| so for 50 bucks or so it seems pretty reasonable.
|
| You could just buy a huge disk and run a server in your house,
| but it gets annoying in various ways. Kids unplug the power, it
| creates heat, maybe noise, multiple disks end up needing
| management, that kind of thing.
| ur-whale wrote:
| The real question being, of course, given that your life is
| finite ... how often will you actually look at any of these pics.
|
| The more pix there are, the less likely it is you'll ever even
| open any of these.
| tablespoon wrote:
| > The real question being, of course, given that your life is
| finite ... how often will you actually look at any of these
| pics.
|
| > The more pix there are, the less likely it is you'll ever
| even open any of these.
|
| Well, there are also future generations to think about, but
| their interest will fall off too (until you reach the
| genealogical profile level, which maxes out at a few portrait-
| type pictures of any regular individual).
|
| It's pretty essential to aggressively curate and organize data
| like this.
| darklighter3 wrote:
| Amazon Snowball? https://aws.amazon.com/snowball/
| jcoq wrote:
| A few people have raised this question in the thread already, but
| why are we compelled to store such unreasonable numbers of
| images?
|
| I'm definitely not immune and I find that the satisfaction I get
| from my photo collection is inversely proportional to the size of
| my collection. At this point, I'm just lugging around this huge
| mass of data "just in case". There's no way I will ever have time
| to sort my images. There are probably 2000 wedding images alone,
| let alone the tens of thousands of random snapshots
| that may or may not be something I ever care to see again.
|
| At this point, I would almost prefer a smart opt-in solution
| similar to what Google Photos provides for smart albums:
| "We found these images that would be good long-term. Save them
| indefinitely?"
| ilamont wrote:
| Isn't this the business model that Dropbox, iCloud and Google
| Drive, etc. are based upon?
| D13Fd wrote:
| I back up about 15 TB of photos with Arq to Wasabi cloud storage.
|
| It has been running with zero maintenance (other than occasional
| partial restores) since late 2018.
|
| I transitioned off of AWS cloud storage when they raised the
| prices.
|
| I'm not sure if Wasabi is still the cheapest and fastest, but
| they have been great to deal with. And Arq is an excellent set-
| it-and-forget-it encrypted cloud backup app.
|
| I also run a server with a 16 TB RAID 1 array and a set of local
| backup drives. Sadly it is almost full, and the volume of data
| makes it a hassle to upgrade (not to mention the cost).
|
| I've found standard 1Gig Ethernet to be just barely fast enough
| for editing photos over the local network. However, for my own
| sanity, I usually do the initial editing on a local drive before
| sending the files to the server (and from there to the cloud
| backup).
| wmf wrote:
| Note that he's looking for primary storage not backup.
| PaulHoule wrote:
| I think $6 a month for "unlimited data" is going to end badly.
|
| The AWS or Azure price is high, but it's a scalable price, a real
| price.
| wgjordan wrote:
| > I think $6 a month for "unlimited data" is going to end
| badly.
|
| Backblaze has been offering their 'unlimited data' for over a
| decade [1], and it hasn't ended badly for them yet.
|
| It's fully sustainable because they only lose money on a few
| customers (like that one customer storing 430TB for $6/month
| [2]), while most customers use much less storage than that, so
| the service remains profitable overall.
|
| The soft limit of its 'unlimited' comes from it being a
| personal-backup mirroring service and not a fully-external
| cloud storage, so the service only fits certain use cases.
|
| [1] https://www.backblaze.com/blog/all-in-on-unlimited-backup/
|
| [2]
| https://www.reddit.com/r/IAmA/comments/b6lbew/were_the_backb...
| res0nat0r wrote:
| Backblaze is cool, but the limitations that make $6/month
| affordable on their end essentially rule out any large
| backups. All data must stay locally connected to your PC
| or it will be purged if not seen within 30 days (so you can
| connect a bunch of USB drives locally, but this is a hack).
|
| Also their upload speed is capped, so you can't just upload at
| 1GB/s or max out your connection to ingest data into their
| system, as you can with S3.
|
| Glacier Deep Archive is still the cheapest thing really at
| $1/TB, but the retrieval times and also egress data charges
| are a big catch.
| secabeen wrote:
| Yeah, they also don't have a Linux client for the backup
| product. While you certainly can build a Windows-based
| storage server, and there's even some interesting storage
| tech in Windows, most data-hoarders store their data on
| Linux.
|
| Glacier/Deep is good, but with the 180-day minimum object
| lifetime, you want to be sure that the data is ready to go
| into the archive before pushing it there. (You can use
| tiered storage, but then you're storing all data in
| standard S3 for 30 days before it gets into Glacier, and
| that one month of storage in standard S3 will cost you the
| same as 10-months of Glacier storage.)
| Causality1 wrote:
| That would be super attractive to me if my ISP didn't have
| data caps. My personal archive is ~40TB of drives in my
| primary desktop. Even disregarding empty space and
| duplicates, it would take four years or more to upload it
| onto a cloud service without running over my data cap.
| res0nat0r wrote:
| I have symmetric gigabit ethernet and have a few TB
| backed up to Deep Archive, but if I wanted to back up and
| restore everything the data transfer pricing is insane,
| it would be $1800 for 20TB. This really is the only thing
| AWS is keeping artificially high to facilitate lock-in.
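
The figures in the two comments above can be checked with a few lines of arithmetic. This is only a sketch: the flat rate of $0.09 per GB is an assumption chosen because it reproduces the quoted $1800 for 20 TB, and the function names are made up for illustration. The data-cap comment does not state the cap itself, so the second function only derives the average monthly upload implied by "40 TB in four years".

```python
GB_PER_TB = 1000  # decimal terabytes, as cloud providers bill them


def egress_cost_usd(terabytes: float, usd_per_gb: float = 0.09) -> float:
    """Cost of downloading `terabytes` at a flat per-GB rate (assumed rate)."""
    return terabytes * GB_PER_TB * usd_per_gb


def average_upload_tb_per_month(total_tb: float, years: float) -> float:
    """Average monthly upload needed to move `total_tb` in `years`."""
    return total_tb / (years * 12)


if __name__ == "__main__":
    print(f"20 TB egress: ${egress_cost_usd(20):,.0f}")
    print(f"40 TB over 4 years: "
          f"{average_upload_tb_per_month(40, 4):.2f} TB/month")
```

Under these assumptions, restoring 20 TB comes to about $1800, and uploading 40 TB over four years averages a little over 0.8 TB per month.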
| barbazoo wrote:
| It's not "unlimited" in a practical sense because as the post
| said, they're just mirroring the data so you yourself have to
| have the storage capacity you're asking them for.
| 908B64B197 wrote:
| There might be a way to trick the client synching the data
| into believing that it's still on disk by looking at how it
| checks if the data is still there and writing a filter driver
| to intercept and modify the result of the call.
| inetknght wrote:
| Good luck getting a checksum for a partial file that you
| don't have
| amelius wrote:
| True, but perhaps the checksum doesn't need to be
| computed if the timestamp didn't change. Or perhaps the
| checksum is always computed in the same way so you can
| just store the checksum.
| 908B64B197 wrote:
| Store the checksums? You still need local storage but way
| less.
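
The "store the checksums" idea from the last few comments can be sketched as a small cache keyed on modification time and size. This illustrates the general technique only; nothing here reflects how the Backblaze client actually verifies files, and the function and type names are hypothetical.

```python
import hashlib
import os
from typing import Dict, Tuple

# Maps path -> (mtime in nanoseconds, size in bytes, SHA-256 hex digest)
Cache = Dict[str, Tuple[int, int, str]]


def cached_sha256(path: str, cache: Cache) -> str:
    """Return the file's SHA-256, re-reading it only if mtime or size changed."""
    st = os.stat(path)
    entry = cache.get(path)
    if entry is not None and entry[0] == st.st_mtime_ns and entry[1] == st.st_size:
        # Timestamp and size are unchanged: trust the stored checksum.
        return entry[2]
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    cache[path] = (st.st_mtime_ns, st.st_size, digest.hexdigest())
    return cache[path][2]
```

The cache is tiny compared with the data it describes, which is the point made in the thread: local storage is still needed, but far less of it. A file modified without its timestamp or size changing would go unnoticed by this shortcut.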
| nateroling wrote:
| Yup, I'm currently experimenting with using an AWS bucket
| alongside [CloudMounter](https://cloudmounter.net) to create an
| Archive disk for things I don't really need day-to-day, but
| would like to hang onto.
|
| I'm currently not backing that data up, though. It's not
| critical, and the chances of AWS losing it are low enough that
| I'm not too worried. The biggest risk would be myself
| accidentally deleting it.
|
| One thing I have decided for sure, though, is that archiving
| lots of small files is awful. Much better to wrap them up in a
| tarfile, uncompressed.
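
A minimal sketch of the "wrap small files in an uncompressed tarfile" approach, using Python's standard `tarfile` module. The function name and directory layout are assumptions; uploading the resulting archive to a bucket is not shown.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir: str, tar_path: str) -> int:
    """Pack every file under src_dir into one uncompressed tar.

    Returns the number of files written. tar_path should live outside
    src_dir so the archive does not try to include itself.
    """
    src = Path(src_dir)
    count = 0
    # Mode "w" writes a plain tar with no compression.
    with tarfile.open(tar_path, "w") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir, with forward slashes.
                tar.add(str(path), arcname=path.relative_to(src).as_posix())
                count += 1
    return count
```

One large object per album or per year keeps the object count low; the trade-off is that retrieving a single photo means fetching the archive that contains it.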
| terpimost wrote:
| Amazon Drive is fine. No need to have a mirror of all that
| data on my computer. With it I can have selective sync or
| no sync at all, plus unlimited photo storage. Amazon Photos and
| Amazon Drive are like a single product.
| charcircuit wrote:
| The bandwidth cost makes me cry.
| TheDudeMan wrote:
| If you don't want to pay for the cloud, then there is only one
| sane option: You keep upgrading your local storage so that you
| never have more than a few devices. Every six months, buy a new
| giant disk and decommission as many smaller disks as you can.
| arthurcolle wrote:
| This sounds extremely expensive
| amarshall wrote:
| Cloud storage is _really_ expensive, though. Putting 4 TB in
| S3 costs US$94.21 /month before any transfer costs. A 4 TB
| CMR HDD is ~$140. That doesn't include power costs, but over
| a year you get $990 to spend on that and other things from
| the cost difference vs. S3 (plus you can sell your old drives
| when you're done with them).
|
| (I use B2 for cloud storage backup since it's a lot cheaper
| than S3 at US$20 for 4 TB, but that still is ~$105/year more
| than local)
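
The cost comparison above works out as follows. The S3 and drive prices are taken from the comment; reading the B2 figure of US$20 as a monthly price is an assumption, and power, transfer and resale value are ignored, as the comment itself notes.

```python
S3_USD_PER_MONTH = 94.21  # 4 TB in S3, from the comment
B2_USD_PER_MONTH = 20.00  # 4 TB in B2, assumed to be a monthly price
DRIVE_USD = 140.00        # one 4 TB CMR HDD, from the comment


def first_year_gap(cloud_usd_per_month: float, drive_usd: float) -> float:
    """Extra cost of a year of cloud storage over buying one drive outright."""
    return cloud_usd_per_month * 12 - drive_usd


if __name__ == "__main__":
    print(f"S3 vs local: ${first_year_gap(S3_USD_PER_MONTH, DRIVE_USD):,.2f}")
    print(f"B2 vs local: ${first_year_gap(B2_USD_PER_MONTH, DRIVE_USD):,.2f}")
```

With these inputs the S3 gap is about $990 in the first year, matching the comment. The B2 gap comes out at $100, close to the quoted ~$105; the exact inputs behind that figure are not given in the thread.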
| dragontamer wrote:
| > 4 TB CMR HDD is ~$140
|
| And that's with bad pricing! Last year, 4TB was $120 and
| 6TB was ~$150.
|
| Building out a NAS for $1000 ($600 in 4x hard drives, $400
| for other components) is very reasonable. Last year that
| was 4x6TB == 12TB storage + 12TB redundancy, but this year
| prices are worse so you "only" get 8TB + 8TB redundancy.
|
| $400 can afford a Synology or various NAS devices. It can
| also afford a new desktop that you can install FreeNAS or
| whatever onto.
|
| -----------
|
| Eventually, when the 8TB is not enough, just buy a new HBA
| card and shove 4x more hard drives in there for a 2nd
| storage on the same NAS. Maybe 8TB x 4 == 16TB usable +
| 16TB redundancy.
|
| Except no need to copy everything over, just keep the old
| 8TB cluster working, and just start writing to the new 16TB
| storage.
| rhacker wrote:
| This is the true answer to this entire thread.
___________________________________________________________________
(page generated 2021-07-14 23:02 UTC)