[HN Gopher] LTO Tape data storage for Linux nerds
___________________________________________________________________
LTO Tape data storage for Linux nerds
Author : detaro
Score : 240 points
Date : 2022-01-27 12:10 UTC (10 hours ago)
(HTM) web link (blog.benjojo.co.uk)
(TXT) w3m dump (blog.benjojo.co.uk)
| CharleFKane wrote:
| I would like to thank the author for bringing back memories. Not
| all of which are good...
|
| (I used to work for a four letter computer corporation doing
| enterprise technical support, mostly on tape-based products.)
| cassepipe wrote:
| "Unlike most block devices these are devices that do not enjoy
| seeking of any kind. So you generally end up writing streaming
| file formats to tape, unsurprisingly this is exactly what the
| Tape ARchive (.tar) is actually for. "
|
| Haha! moment
| magicalhippo wrote:
| Nice calculator. Crossover point here for LTO-8 seems to be
| around 250TB. I think I'll stick with my HDDs for now.
| paulmd wrote:
| I picked up an LTO-5 drive a couple years ago and one thing I
| found (this is probably a good place to bring it up!) is that the
| software documentation for tape utilities and high-level
| overviews of the strategies employed to build and manage
| libraries of data on this model is pretty thin on the ground at
| this point. Completely understandable given how few people have
| tapes these days, but it also makes it a little tougher to pick
| up from scratch.
|
| (And in particular the high-level overviews are important because
| tapes are wear items: you only get on the order of a hundred or
| two (I don't remember the exact figures) full tape reads before
| the tape wears out, so this is something you want to go into with
| a strategy rather than making it up as you go!)
|
| Since it's complementary to this discussion I'll link a few:
|
| https://www.cyberciti.biz/hardware/unix-linux-basic-tape-man...
|
| https://databasetutorialpoint.wordpress.com/to-know-more/how...
|
| https://sites.google.com/site/linuxscooter/linux/backups/tap...
|
| https://access.redhat.com/documentation/en-us/red_hat_enterp...
|
| https://access.redhat.com/solutions/68115
|
| That is, unfortunately, essentially the apex of LTO tape
| documentation in 2022, as far as I can tell.
|
| Do note that in terms of tape standards, LTO-5 is an important
| threshold because that's where LTFS support was added, and
| that's the closest thing to a "normal" filesystem abstraction
| that's available for tape (sort of like packet-formatted
| CD-RWs, I guess, in the sense of presenting an abstraction over
| the raw block device). There is also very little documentation
| on the init, care, and feeding of LTFS IIRC - and again, it
| would be nice to go in knowing any pitfalls that might cause
| shoeshining and tape death.
| Although I suppose in practice it's mostly going to get used in
| a "multi-session" scenario where you mostly aren't deleting
| files: you write until the tape is full and then maybe wipe the
| whole thing at once, and LTFS is just a nice abstraction that
| gives you "files" rather than sequential records (tape
| archives/TARs, in fact!) along an opaque track with no
| contextualization.
| MayeulC wrote:
| The tech doesn't seem to be too complex, is there an open
| hardware project?
|
| Seems like one could go quite far in terms of performance with
| just some basic HW and an FPGA. Is there a significant
| difference between the multiple generations of the tapes
| themselves, or is it just the data encoding patterns that
| change?
|
| More specifically, I was a bit appalled by the "magnetic erasing"
| bit. Seems like DRM to me, on a medium that is conceptually
| extremely simple.
|
| One could probably take a VHS drive and convert it to a data
| drive, unless I'm being naively optimistic about it?
| justsomehnguy wrote:
| > One could probably take a VHS drive and convert it to a data
| drive
|
| https://en.wikipedia.org/wiki/ArVid
|
| > More specifically, I was a bit appalled by the "magnetic
| erasing"
|
| Nobody laments that there is no 'low level format' for HDDs
| anymore.
| zaarn wrote:
| A VHS drive uses a different encoding pattern, the head of the
| VHS player is physically incapable of moving like the head of
| an LTO tape. Additionally it lacks precision as an LTO tape is
| much more densely packed. Lastly, LTO drives use different
| magnetic materials and signalling, so in all likelihood the
| VHS head is only going to pick up noise.
| tssva wrote:
| Back in the day there were a few backup products available
| which connected to standard VHS VCRs.
| dark-star wrote:
| PSA: Don't do a cleaning run unless your tape drive tells you
| to (there are SCSI sense codes for that). The drives can assess
| the need (or not) for cleaning pretty well, and excessive
| cleaning can negatively affect the lifetime of the r/w head
| (the cleaning tapes are abrasive).
| wolfgang42 wrote:
| The mt(1) manpage describes seeking on files, records, and file
| marks, but doesn't explain what any of them are. What's the
| difference between these options? (It sounds like file marks
| are stored on the tape on a special track or something, but I
| can't seem to find any discussion of the others.)
| StillBored wrote:
| So, all of this is from a few-years-old memory, and it's a
| complex interwoven mess.
|
| Let's start with this: tape has two types of head positioning
| commands, locate and space. Locate is absolute (and mt calls it
| seek), and space is relative. Mt generally uses space (although
| one can read the current position with tell and then do a
| relative space) for all the commands that aren't "seek". Hence
| the mt commands are things like "fsf", which is forward space
| file (mark), or "bsf" for back space file (mark). At some point
| in the past someone thought that each "file" would fit in a
| tape block, but then reality hit, because there are limits on
| how large the blocks can actually be (in Linux it's generally
| the number of scatter-gather entries that can fit in a page
| reliably). So there are filemarks, which are like "special"
| tape blocks without any data in them. Instead, if you attempt
| to read over a filemark, the drive returns a soft error telling
| you that you just tried to read a filemark. There is also "fsr"
| for forward space record, which moves over the individual
| blocks forming a "file".
|
| So back to seeking. If you man st, you will notice that each
| tape drive gets a bunch of /dev/st* aliases, which control the
| close behavior etc., as well as some ioctls that match the mt
| commands. The important close behavior to remember is that if
| the tape is at EOD due to the last command being a write, the
| driver will write a filemark and then rewind the tape, unless
| a no-rewind /dev/nstX device is being used, in which case it
| will leave the head position just past the FM (this is
| actually a bit more complex because IIRC there may be two
| filemarks at EOD, and the tape position gets left between
| them).
|
| This allows one to do something like `for x in *.txt; do cat
| "$x" >> /dev/nst0; done` and write a bunch of files separated
| by filemarks (at the default block size, which will be slow
| (probably 10k); replace the cat with tar to control
| blocking/etc). Or, if you want to read the previous file, `mt
| -f /dev/nst0 bsf 2` to back space 2 filemarks.
|
| Now, the actual data format on tape is going to be dictated by
| the backup utility used to write it. Some never use filemarks,
| some use them only as a volume separator (e.g. tar), and old
| ones actually put FMs between files, but that tends to be slow
| because it kills read perf by taking the drive out of
| streaming mode whenever you read over a filemark (note the
| part in man st about reading a filemark).
|
| Now you can pick which file to read via "mt -f /dev/nst0
| rewind; mt -f /dev/nst0 fsf X; cat /dev/nst0 > restore.file"
|
| There are also tape partition control commands, tape set
| marks, and various other options which may or may not apply to
| a given type of tape. Notably, there are also density flags on
| the special file (on some unixes) and via mt. LTO, for
| example, doesn't have settable densities because density is
| fixed by the physical tape in the drive. Some drives (STK
| T10K, IBM TS11X0/3592) can upgrade the tape density/capacity
| when the tape is used in a newer drive.
|
| That got long...
| kortex wrote:
| Is there a unix-style streaming tool, like tar/zstd/age, that
| does forward error correction? I'd love to stick some ECC in that
| pipeline, data>zstd>age>ecc>tape, cause I'm paranoid about
| bitrot. I search for such a thing every few months and haven't
| scratched the itch.
|
| The closest is things like inFECtious, which is more of just a
| library.
|
| I would prefer something in go/rust, since these languages
| have shown really high backwards compatibility over time. The
| last thing you want is to find out, 10 years later, that you
| can't build your recovery tool. I'll also accept some dusty C
| util with a toolpath that hasn't changed in decades.
|
| https://github.com/vivint/infectious
|
| Ok I just dug up blkar, this looks promising, but the more the
| merrier.
|
| https://github.com/darrenldl/blockyarchive
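| Just to illustrate the principle those tools build on (this is
| not any particular tool's format): the simplest forward error
| correction is striping data into groups of k blocks plus one
| XOR parity block, so any single lost block per stripe can be
| rebuilt from the survivors. Real tools like par2, blkar and
| infectious use Reed-Solomon, which tolerates multiple losses
| per stripe; this Python sketch only shows the idea:

```python
def xor_blocks(blocks):
    # XOR equal-length blocks together, byte by byte
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

def encode(data, k=4, blocksize=8):
    # pad to a whole number of stripes, then split each stripe
    # into k data blocks and append one XOR parity block
    data += b"\0" * (-len(data) % (k * blocksize))
    stripes = []
    for off in range(0, len(data), k * blocksize):
        blocks = [data[off + i * blocksize: off + (i + 1) * blocksize]
                  for i in range(k)]
        stripes.append(blocks + [xor_blocks(blocks)])
    return stripes

def recover(stripe, lost_index):
    # any single missing block (data or parity) is the XOR of the rest
    return xor_blocks([b for i, b in enumerate(stripe) if i != lost_index])

stripe = encode(b"backup data that must survive bitrot")[0]
assert recover(stripe, 2) == stripe[2]  # a "rotted" block comes back
```

| (With Reed-Solomon you get m recoverable losses per stripe for
| m parity blocks, at the cost of real math instead of XOR.)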
| StillBored wrote:
| So, while others have pointed out that the media blocks are
| ECC protected/etc, I think what you are really looking for is
| application/fs control. LTO supports "Logical Block
| Protection", which is metadata (CRCs) tracked and checked
| alongside the transport-level ECC/etc on Fibre Channel and in
| the drive itself.
|
| Check out section 4.9 in
| https://www.ibm.com/support/pages/system/files/inline-files/....
|
| To be clear, this is a "user" level function that basically
| says "here is a CRC I want the drive to check and store
| alongside the data I'm giving it". It needs to be supported by
| the backup application stack if one isn't writing the drive
| with SCSI passthrough or similar. It's sorta similar to adding
| a few bytes to a 4K HD sector (something some FC/SCSI HDs can
| do too), turning it into a 4K+X byte sector on the media that
| gets checked by the drive along the way, versus just running
| in variable block mode and adding a few bytes to the
| beginning/end of the block being written (something that's
| possible too, since tape drives can support blocks of
| basically any size).
|
| The problem with these methods is that one should really be
| encoding a "block id" which describes which/where the block is
| as well, since it's entirely possible to get a file with the
| right ECC/protection information that is nonetheless the wrong
| (version of the) file.
|
| So, while people talk about "bitrot", no modern piece of HW
| (except Intel desktops/laptops without ECC RAM) is actually
| going to return a piece of data that is partially wrong,
| because there are multiple layers of ECC protecting the data.
| If the media bit-rots and the ECC cannot correct it, then you
| get read errors.
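| At the application level, what Logical Block Protection
| amounts to can be sketched like this (nothing here is the
| actual SCSI interface; zlib.crc32 stands in for the CRC32C the
| real feature uses, and the check function stands in for the
| drive's verification):

```python
import zlib

def protect(block: bytes) -> bytes:
    # append a 4-byte CRC so corruption is detectable on read-back
    return block + zlib.crc32(block).to_bytes(4, "big")

def check(protected: bytes) -> bytes:
    # verify the trailing CRC before handing the data back
    block, crc = protected[:-4], protected[-4:]
    if zlib.crc32(block).to_bytes(4, "big") != crc:
        raise IOError("CRC mismatch: block is corrupt")
    return block

stored = protect(b"some record destined for tape")
assert check(stored) == b"some record destined for tape"

# flip one bit: the corruption is reported, not silently returned
corrupt = bytes([stored[0] ^ 1]) + stored[1:]
try:
    check(corrupt)
    raise AssertionError("corruption went undetected")
except IOError:
    pass
```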
| eternityforest wrote:
| There's gotta be an API to get the raw data even if it's
| wrong, right?
| StillBored wrote:
| Not usually; it's the same with HDs. You can't get the raw
| signal data from the drive unless you have special firmware,
| or find a hidden read command somewhere.
|
| The drive can't necessarily even pick "wrong" data to send
| you, because there are a lot more failure cases than "I got
| a sector but the ECC/CRC doesn't match". Embedded servo
| errors can mean it can't even find the right place, and then
| there are likely head positioning and amp tuning parameters
| which generally get dynamically adjusted on the fly. This
| AFAIK is a large part of why reading a "bad" sector can take
| so long: the drive is repeatedly rereading it, trying to
| adjust/bias those tuning parameters in order to get a clean
| read. And there are multiple layers of signal
| conditioning/coding/etc, usually in a feedback loop. The
| data has to get really trashed before it's not recoverable,
| but when that happens it's good and done. (Think about even
| CDs, which can get massively scratched/damaged before they
| stop playing.)
| dmitrybrant wrote:
| If I'm not mistaken, the tape drive automatically adds ECC to
| each written block, and then uses it to verify the block next
| time you read it. So if there's bit rot on the tape (i.e. too
| much for ECC to fix), it will just be reported as a bad block
| with no data, and there wouldn't be any point in adding
| "second-order" ECC from the user end.
| metabagel wrote:
| You're exactly right. There is substantial ECC in the LTO
| format. If the drive can recover the data, then it's valid.
| BenjiWiebe wrote:
| There might be a point if you interleaved data and/or had a
| much higher amount of error correction, such that you could
| recover from isolated bad blocks.
| c0l0 wrote:
| It may not _exactly_ be what you are looking for, but if you
| want to protect a stable data set from bit-rot after it's been
| created, make sure to take a look at Parchive/par2:
|
| https://en.wikipedia.org/wiki/Parchive
|
| https://github.com/Parchive/par2cmdline/
| genewitch wrote:
| Parity archives used to be extremely popular back when dialup
| was king. I've often wondered if there's a filesystem that
| has that sort of granular control over how much parity there
| is. I'd use it, for sure.
| uniqueuid wrote:
| ZFS is probably closest to what you want.
|
| It allows you to choose the amount of parity at the disk
| level (as in: 1, 2, or 3 disks of parity in raidz1, raidz2,
| and raidz3). You can also keep multiple copies of data around
| with copies=N (but note that when the entire pool fails,
| those copies are gone - this just protects you by storing
| multiple copies in different places, potentially on the
| same disk).
|
| [edit] To add another neat feature that allows for
| granularity: ZFS can set attributes (compression, record
| size, encryption, hash algorithm, copies etc.) on the level
| of logical data sets. So you can have arbitrarily many data
| stores on a single pool with different settings. Sadly,
| parity is not one of those attributes - that's set per
| pool, not per dataset.
| Notanothertoo wrote:
| ZFS is king imo. Btrfs is the more liberally licensed OSS
| competitor and ReFS is the m$ solution.
| JustFinishedBSG wrote:
| Still extremely popular (as in _the norm_ ) on Usenet
| dmitrygr wrote:
| man par2
| amelius wrote:
| With these prices for drives the market seems ripe for
| disruption.
| dsr_ wrote:
| It already is, by spinning disks. Cheaper at the low end,
| faster the whole way through, random access beats linear access
| for end user expectations.
| zozbot234 wrote:
| SMR spinning disks are also being widely repurposed as
| "archival", somewhat tape-like media since they turned out to
| be quite low-performance for the most common use scenarios
| (which means they were getting dropped from soft-RAID arrays,
| etc.).
| amelius wrote:
| You are too focused on read speed. I just want to write huge
| amounts of data at a low cost, and don't mind waiting a day
| for retrieval - i.e., how one normally uses backups.
| lazide wrote:
| Most of the time when people think backups, they need a
| faster-than-24-hour turnaround to restore - because it
| usually takes about that long to figure out they even need a
| backup, and most people don't plan far enough ahead for a
| 2-day recovery time to be useful for most use cases nowadays.
|
| If their local snapshots are dead too, or they look for
| something and realize they can't find a copy of a thing they
| thought they had, it's often because they needed that data
| right away and it wasn't there when they went to get it.
| Hence 'user expectations'.
|
| That's not the catastrophic case (which rarely happens);
| that's the 'Bob just realized he deleted the folder
| containing the key customer presentation last Friday' or
| 'Mary just tried to open the contract copy she needed and
| it's corrupted' case.
|
| If it's a once-in-10-or-100-years event, a 1-2 day
| turnaround is not unexpected, and everything else is
| probably broken too. The deleted file or something getting
| screwed up happens more often, and slow response there
| grinds things to a halt - and causes a lot of stress
| knowing it's not 'solved'.
| amelius wrote:
| I bet most companies who are confronted with ransom
| demands would die for tape backup even if restoration
| took a week (which is the amount of time they need anyway
| to get the whole mess sorted out).
| TheCondor wrote:
| And durability. I've had a portable USB hard drive fall
| over on my desk and it had major problems after that. Solid
| state fixes that, but it's expensive, and I've heard SSDs
| can lose data if not plugged in regularly.
| lazide wrote:
| Yeah, SSD is not good for long-term storage (like a copy
| of your tax documents from last year that you might need
| in 5 years). The cost per size also makes it infeasible to
| keep ongoing rolling copies of everything, which is one
| way of solving that.
| KaiserPro wrote:
| Kinda, but not quite. The problem with spinny disks is that
| you have to allocate space for them. You can't quickly swap
| out drives to take offsite.
|
| What's grand about tape is that it's still faster to dump to
| your library, eject the magazines and store them off site.
|
| Whilst you can do that with HDDs (think snowballs but bigger)
| it's a lot more expensive and error prone.
|
| Tape serves a purpose, but that's pretty niche by today's
| standards.
| wglb wrote:
| It would appear that Google backs up the internet on tape:
| https://www.youtube.com/watch?v=eNliOm9NtCM
|
| Or at least did at one time.
| fishnchips wrote:
| It probably still does. I was on the gTape SRE team until 2014
| and we had lots and lots of tapes and tape libraries back then,
| most of them giant beasts with 8 robots each. With the capacity
| of new LTO generations constantly growing and the existing
| investment in hardware and software it would be unusual to
| discard that.
| cassepipe wrote:
| Apart from archiving huuuuuge amounts of data, does it make
| sense for any business to invest in these when you weigh the
| qualified work time they require against the halved price
| they provide? Plus the constant reinvestment in hardware.
| Plus the fact that to get the data back you actually need a
| human to fetch it for you and operate a machine.
|
| Who uses this ?
| motoboi wrote:
| Everyone.
|
| It's much easier to store tapes in a fireproof and
| water-resistant safe than to find fire- and water-resistant
| storage.
|
| So you can keep your backups on disk, but last-resort
| disaster recoveries should be on tape somewhere.
|
| Gmail has tapes[1]. And they saved my ass at least once. This
| can give you a hint of how important tapes are and how much
| use they get.
|
| 1 -
| https://www.datacenterknowledge.com/archives/2011/03/01/goog...
| madduci wrote:
| A lot of companies, trust me
| archi42 wrote:
| Something not mentioned by the author, but which I was told
| here on Hacker News some years ago: if your drive has too
| much wear (or misalignment of the drive head?) you might end
| up with tapes that you can only read with exactly that drive.
| detaro wrote:
| That's something I've seen mentioned too, but I never could
| verify whether it's actually true with modern tape standards
| or not (i.e. last time I asked on HN I was told it isn't a
| concern anymore). If the drive needs to adjust to get precise
| enough positioning anyhow, misalignment seems way less
| likely.
| StillBored wrote:
| That was true before embedded servo tracks (which is why the
| author mentions you can't bulk erase LTO tapes); it's not
| been true for ~20 years unless one was using DLT, DAT, etc.
| op00to wrote:
| It's absolutely true. There is a LOT more to tape storage
| than meets the eye.
|
| Let's say you're using LTO tapes as an archive. Did you know
| LTO tape itself is abrasive, but that abrasive is meant to
| wear over time with the intended use of the cartridges, which
| was backups?
|
| If you use new tapes a single time, the abrasive doesn't wear
| and destroys the tape heads. You will go through a drive head
| a month, running the drives 24/7. I had a library used as a
| genomic storage archive with 8 drives (always write, almost
| never read), and two were constantly out of service, as we
| averaged two head replacements from IBM a week.
|
| This is much less of a factor on used tapes that have been
| run through a drive a few times.
| detaro wrote:
| But that's different than "drive will produce tapes that it
| can read, but other drives don't"? Because sure, drives can
| fail and need service/replacement, but that's less
| insidious than a drive producing tapes that are silently
| unusable in other drives.
| KaiserPro wrote:
| It used to be true with DAT tapes.
|
| I've not seen it on LTO. Where I work we had very large tape
| libraries, with 25+ drives in each. We didn't have drive
| affinity, so if that happened I would get an alert.
|
| The other team used to import bulk data by receiving tapes
| from all over London and beyond; there must have been
| thousands of drives writing and reading that data. Plus we
| didn't buy fresh tapes, and they were dropped, thrown, left
| in the cold/sun, all sorts.
|
| I think LTO is pretty solid.
| eternityforest wrote:
| I wonder why the head has to touch the tape at all? Does
| the hard drive thing where you float a few nm away not
| apply?
| metabagel wrote:
| I worked for an LTO tape drive manufacturer for 20 years,
| and I never heard about this. I think something else was at
| play here, although I could be wrong. The drives are often
| used just as you did, although perhaps not always as
| intensively. Data is written to tapes, and they are shipped
| offsite. Basically, WORN (write once, read never). The
| backups are for an absolute emergency, such as a 9/11-type
| event where a whole building comes down or a data center
| burns to the ground.
|
| A few factors which may have influenced what you
| experienced:
|
| * The quality of the tapes could be variable. In my
| experience, some branded tapes were significantly inferior
| to others.
|
| * If the drive ran hot, then that may have contributed.
| IIRC, IBM's LTO-3 drive ran very hot.
|
| * If you don't write data to the tape fast enough, it won't
| stream. It'll shoe-shine back and forth, as it runs out of
| data, repositions backwards on the tape, and resumes
| writing. I think this might affect the tape head life.
| op00to wrote:
| These were IBM drives in a QualStar XLS connected to
| systems running FileTek StorEdge. I don't remember if
| these were Fuji or Sony tapes, but I think Fuji-branded.
|
| We did have shoeshining issues in testing, but increasing
| the amount of caching fixed that. Never heard of any
| throughput issues in production, but... .edu, so you know
| how well we monitored. That was a software issue anyway.
|
| I think it was LTO5 era, but I don't rightly remember.
|
| The IBM dude who handled all the hardware support would
| take a look at everything, nod, and replace the drive. I
| took him out for beer once and that's when he told me
| about the issues with the tapes. I left for greener
| pastures before that was solved, but it was going on for
| a good year.
|
| Maybe he liked the food trucks outside the building, or
| maybe it was cheaper for them to replace the drives than
| actually help us fix the problem. Anyway, thanks for the
| insight! Glad I don't work on hardware anymore.
| MayeulC wrote:
| I'm wondering what would be the best way to store archival
| data?
|
| A disk image plus compressed, encrypted, then
| forward-error-corrected `btrfs send` snapshots sounds quite
| efficient to me. Take your hourly etc. snapshots to a regular
| disk, write monthly ones to the tape until it fills up, then
| take another tape and repeat. The downside is that you need
| to replay multiple diffs.
|
| Or would it be a good idea to make more frequent writes? I'm
| not sure what best practices are when it comes to tape and
| backup.
| einpoklum wrote:
| > LTO Tape is ... much cheaper than hard drives ... a 12TB
| SATA drive costs around £18.00 per TB ... an LTO-8 tape that
| has the same capacity costs around £7.40 per TB ... That's a
| significant price difference.
|
| Actually, it isn't very significant: a price factor of about
| 2.4. I had thought tape storage was cheaper than that. And
| then there are the drives: a drive to write (£3,000 for
| LTO-8), and at least a couple more drives for reading tapes.
|
| At this price ratio, I would say that ease-of-use and
| safety/robustness of the backed-up material are more important
| considerations.
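| For what it's worth, the break-even point is easy to sketch
| with the article's figures (£18/TB for HDD, £7.40/TB for
| tape, £3,000 for the LTO-8 drive; extra read drives, energy
| and replacement cycles are ignored here):

```python
# Capacity at which tape (media plus one drive) undercuts HDD.
hdd_per_tb = 18.00    # GBP per TB, 12TB SATA drive
tape_per_tb = 7.40    # GBP per TB, LTO-8 tape
drive_cost = 3000.00  # GBP, one LTO-8 drive

# drive_cost + tape_per_tb * t = hdd_per_tb * t  =>  solve for t
break_even_tb = drive_cost / (hdd_per_tb - tape_per_tb)
print(f"break-even at roughly {break_even_tb:.0f} TB")  # ~283 TB
```

| Which is in the same ballpark as the ~250TB crossover others
| report from the article's calculator.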
| shellac wrote:
| Yes, this doesn't sound quite right to me but it may be an
| economies-of-scale thing. I work on an HPC system and we budget
| an order of magnitude less for tape storage, and that has held
| for quite a few years.
| AshamedCaptain wrote:
| I am also worried about the long term. If there are new
| generations so frequently and backwards compatibility is
| limited or not guaranteed, I wonder whether you'd be able to
| find a working-condition tape reader for your 20-year-old
| tape...
|
| At least it's likely I can find a USB port 20 years from now,
| or a DVD reader (they are still being manufactured today,
| when more than 20 years have passed since their introduction,
| and they are even compatible with much older CDs...).
| ktpsns wrote:
| What was actually ignored in that comparison is energy costs,
| which can add up if you have all your disks running 24/7 and
| do not use power saving functions (which are frequently
| turned off in server contexts). Costs are in the ballpark of
| 5W per drive; given a contemporary 16TB drive this means
| ~0.3W/TB, and with 0.25 EUR/kWh (a typical consumer price in
| Germany), this is roughly 0.7 EUR per TB per year. However,
| the replacement costs for these always-on disk drives will
| probably be even higher.
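| The arithmetic behind that estimate, spelled out with the
| same assumptions (5 W per always-on drive, 16 TB per drive,
| 0.25 EUR/kWh):

```python
watts = 5.0          # idle draw of one spinning drive
capacity_tb = 16.0
eur_per_kwh = 0.25   # German consumer price assumed above

kwh_per_year = watts / 1000 * 24 * 365        # ~43.8 kWh
eur_per_year = kwh_per_year * eur_per_kwh     # ~11 EUR per drive
eur_per_tb_year = eur_per_year / capacity_tb
print(f"~{eur_per_tb_year:.2f} EUR per TB per year")  # ~0.68
```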
| q3k wrote:
| Another consideration related to this is that tapes, being
| usually offline, are much more secure against accidental (or
| malicious!) erasure when compared to always-on hard drives.
|
| Also related is that tapes can easily be transported
| around/offsite, literally thrown in the back of a truck as
| they are. Try doing that with hard drives and see how many
| start throwing bad sectors after a round trip.
| einpoklum wrote:
| HDDs can be taken offline. But beyond that - if you're
| using HDDs as backup, you'll probably be using an HDD
| drawer, e.g. something like one of these:
|
| https://www.newegg.com/global/p/pl?d=hot+swap+hard+drive+bay
|
| ... and the actual disks will usually be stored offline.
| So, no accidental erasure. But I agree that tapes are
| probably less sensitive to transportation.
| piaste wrote:
| If you want tape-like offline storage on HDDs, you can use a
| SATA docking station. Keep the 'active' backup drives plugged
| in, store full drives wherever you like.
|
| As a bonus, they can generally be used to offline clone
| drives.
| archi42 wrote:
| Tapes are offline and even require manual loading, so I think
| it's feasible to mitigate this by just powering down the
| backup system. At least that's what I do (with my primary
| NAS). But yeah, disk idle power draw should not be
| underestimated.
|
| Also, some nit-picking: energy prices in Germany are
| currently MUCH higher than that. We moved and had to get a
| new contract: close to 40c/kWh. This makes your point a bit
| stronger.
|
| //edit: Also, when doing the math I realized I should first
| transcode suitable content to H.265 (per TB saved, the
| necessary power is cheaper than a new disk), and as a second
| step replace my four or five remaining 1 TB HDDs with a
| single bigger drive to reduce the idle power draw (the NAS
| is on a btrfs mixed-size RAID1).
| pessimizer wrote:
| > and at least a couple more drives for reading tapes.
|
| Why?
| op00to wrote:
| Even in an "enterprise+++ class", multi-petabyte, multi-drive,
| totally integrated top-to-bottom tape archive for scientific
| data, there would be all kinds of errors found by our data
| validation process that would have failed an archive restore.
| It's not just cache overruns; sometimes the tapes or drives
| just screwed up silently.
| benjojo12 wrote:
| If you have 500TB of tape, the chances are that you are
| reading at least one tape while also needing to write stuff.
|
| I've personally never experienced that scale, I'm sure the
| industry has some recommended ratio of drives to tapes.
| op00to wrote:
| When I evaluated this, it was all about read and write
| access patterns: so much data coming in over so much time,
| needing so much validation, to be restored so many times in
| the next few years, etc. It's pretty easy if you know your
| data flows, but when it's a big question mark, you just kind
| of throw hardware at it and fix the bottlenecks when they
| come up. We usually wrote more than we read, but we
| absolutely needed to keep read capacity open.
| dale_glass wrote:
| Backups are there so that they can be restored. If your only
| drive is dedicated to writing, then you may never bother
| reading anything, and that's bad because you should verify
| your backups.
|
| Also, tape is slow. The MB/s is pretty nice on the latest
| tech, but a tape is pretty big, so if you have a lot of
| stuff it'll take a good while. Google says it takes 9.25
| hours to write a full 12TB LTO-8 tape, which means that if
| you have a sizable backup, a full restore might well take a
| whole week of reading tapes.
|
| And that's not accounting for that something might suddenly
| break, and the time where that becomes important is right
| when you need something restored urgently.
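| Those numbers are easy to sanity-check; the 100 TB backup
| size below is a made-up example, the per-tape figures are the
| ones quoted above (12 TB in 9.25 hours):

```python
import math

tape_tb = 12.0
hours_per_tape = 9.25
backup_tb = 100.0  # hypothetical total backup size

throughput_mb_s = tape_tb * 1e12 / (hours_per_tape * 3600) / 1e6
tapes = math.ceil(backup_tb / tape_tb)
total_hours = tapes * hours_per_tape

print(f"~{throughput_mb_s:.0f} MB/s sustained")
print(f"{tapes} tapes, ~{total_hours / 24:.1f} days")  # 9 tapes, ~3.5 days
```

| And that's with a single drive reading flat out, with no
| failures and no operator time in between tapes.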
| connorgutman wrote:
| I recently purchased an LTO-5 drive for my Gentoo-based NAS
| and have a few key takeaways for those who are interested.
| Don't buy an HP tape drive if you want to use LTFS on Linux!
| HPE Library & Tape Tools is pretty much dead on modern Linux:
| official support is only for RHEL 7.x and a few versions of
| SUSE, and building from source is a dependency nightmare that
| will leave you pulling your hair out. IBM drives have much
| better Linux support thanks to
| https://github.com/LinearTapeFileSystem/ltfs. That being
| said, IMO you should consider ditching LTFS for good ol' tar!
| It's been battle-tested since 1979 and can be installed on
| basically anything. Tar is easy to use, well documented, and
| makes way more sense for linear storage. While drag & drop is
| nice and all, it really does not make sense for a linear
| medium.
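| For anyone taking the tar route, a minimal sketch (a plain
| file stands in for the tape here so the commands can be tried
| anywhere; on a real drive you would point tar at the
| no-rewind device /dev/nst0 instead):

```shell
# Create a small tree and archive it with a large blocking factor.
# -b 512 means 512 x 512-byte records = 256 KiB per write, which
# keeps an LTO drive streaming far better than tar's 10 KiB default.
mkdir -p demo
echo "hello tape" > demo/file.txt
tar -b 512 -cf archive.tar demo
# List the contents back, using the same blocking factor
tar -b 512 -tf archive.tar
# On a real drive this would be: tar -b 512 -cf /dev/nst0 demo
```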
| smackeyacky wrote:
| Upvote for tar! LTFS seems like an overly complex solution to a
| relatively simple problem that tar already solved. Treating
| tapes like disks and trying to run a file system on them
| ignores the way they work.
| wazoox wrote:
| Hum, that reminds me that I've written a somewhat more complete
| user guide for LTO tapes, but in French:
| https://blogs.intellique.com/tech/2021/08/20#BandesCLI
|
| Let me know if you'd like an English version :)
| wazoox wrote:
| I did it anyway:
| http://blogs.intellique.com/tech/2022/01/27#TapeCLI
| cbm-vic-20 wrote:
| > Unlike most block devices these are devices that do not
| enjoy seeking of any kind.
|
| Old-school DECtapes were actually random-access, seekable
| block devices! They held 578 blocks of data, each block being
| 512 bytes (or, to be more period-correct, 256 16-bit words),
| for about 144K words total. They could be read/written in
| both directions. When mounted on a tape drive, the OS (like
| DEC RT-11) would treat it just like a PC DOS computer treats
| a floppy: you could get a directory listing, work with files,
| etc. The random-access nature caused the tape to move quickly
| back and forth across the tape head, a process known as "shoe
| shining".
|
| https://youtu.be/ZGBS8mBAfYo?t=579
| rbanffy wrote:
| I've seen AIX being installed from a DDS tape, after booting
| from said tape.
|
| Fun times.
| StillBored wrote:
| Tape can do random seeks, but it's generally append-only.
| LTO, though, supports partitioning, which is utilized by LTFS
| (https://www.lto.org/linear-tape-file-system/) to provide a
| mountable filesystem abstraction. It works just like any
| other filesystem, but one has to remember that seeks are much
| slower than on HDs and that overwriting/updating a file
| basically works like a versioned FS where the old data is
| still being stored.
|
| Edit: Also, tape formats tend to come in two scan methods since
| they are generally wider than the tape heads (which frequently
| are actually multiple heads). Helical scan (think VHS/DAT) and
| serpentine. LTO is serpentine which means it writes a track
| from beginning to end, then writes the next track in "reverse"
| from end to beginning, then the next track again from beginning
| to end. Back and forth until it hits its track limit.
|
| So basically just about every modern drive reads and writes in
| both forward and reverse.
|
| Although shoe shining (backing up to start the next read/write)
| is still a thing despite variable speed drives which try to
| speed match to the data rate the host is reading/writing at.
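| For reference, the LTFS route looks roughly like this with the
| open-source IBM tools (the device path /dev/sg3 is an example;
| find yours with lsscsi -g). The sketch guards on device presence,
| so treat it as an outline rather than a drop-in script:

```shell
# Format + mount an LTO cartridge as LTFS (mkltfs ERASES the tape).
DEV=${LTFS_DEV:-/dev/sg3}
if [ -e "$DEV" ]; then
    mkltfs -d "$DEV"                   # lay down the two-partition LTFS layout
    mkdir -p /mnt/ltfs
    ltfs -o devname="$DEV" /mnt/ltfs   # now usable like any filesystem
    STATUS=mounted
else
    STATUS=skipped                     # no drive on this machine
fi
echo "ltfs: $STATUS"
```

| Keep the caveats above in mind: seeks are far slower than on a
| hard disk, and overwriting a file only appends a new version.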
| EvanAnderson wrote:
| This makes me think about the Stringy Floppy:
| https://en.wikipedia.org/wiki/Exatron_Stringy_Floppy
| tssva wrote:
| The Coleco Adam home computer also had tape drives which were
| random-access seekable block devices. 2 tracks with 128 1k
| blocks per track for a total capacity of 256k. Coleco called
| their tapes digital data packs. They were standard compact
| cassette tapes with some additional holes. If you drilled the
| appropriate holes you could use standard tapes instead of
| paying the Coleco premium.
|
| CP/M required booting from a block device and as far as I know
| the Coleco Adam was the only computer which could boot CP/M
| from a tape. Once booted to CP/M the tape drives were treated
| just as floppies.
| tlamponi wrote:
| Interesting read, as with most of Ben's blog. And yeah,
| buffering is definitely required to get acceptable speed out of
| tape tech.
|
| If you want a LTO Tape solution with more bells and whistles you
| could check out Proxmox Backup Server's tape support:
|
| https://pbs.proxmox.com/docs/tape-backup.html
|
| We also rewrote mt and mtx (for robots/changers) in rust, well
| the relevant parts:
|
| https://pbs.proxmox.com/docs/command-syntax.html#pmt
|
| https://pbs.proxmox.com/docs/command-syntax.html#pmtx
|
| The introduction/main feature section of the docs contain more
| info, if you're interested:
| https://pbs.proxmox.com/docs/introduction.html If you have your
| non-Linux workload contained in VMs and maybe even already use
| Proxmox VE for that it's really covering safe and painless self-
| hosted backup needs.
|
| Disclaimer: I work there, but our projects are 100% open source,
| available under the AGPLv3: https://git.proxmox.com/
| azalemeth wrote:
| Do you run a service where I can give you data and reasonable
| money, and you store it on tapes for me? Low cost cloud storage
| prices seem very distant from this, because presumably it's
| usually spinning rust and not tapes that are doing the storage.
| I'd be into a cheaper, larger storage service where this was
| offered.
| tlamponi wrote:
| No, we don't provide hosting services - only the software:
| Proxmox VE as hypervisor (VMs and Linux containers) with
| clustering and hyper-converged storage (Ceph and ZFS integrated
| directly, and most other Linux storage somewhat too); then
| Proxmox Backup Server with PVE integration, which can do
| deduplicated, incremental backups and save them to any Linux FS
| or, well, LTO tapes; and lastly (at least currently, we've got
| more in the pipeline) there's Proxmox Mail Gateway, the oldest
| project and a bit of a niche, but there's not much else like it
| available today anymore.
|
| > and you store it on tapes for me?
|
| I mean, we can do client-side encryption and efficient remote
| syncs, so such a service would be possible to pull off with
| PBS, but no, we don't have the bunker or dungeon to shelve all
| those LTO tapes at the moment :-)
| Johnny555 wrote:
| What is reasonable money? AWS Glacier Deep Archive is around
| $1/TB/month. Since it includes Multi-AZ replication for
| "free", you'd have to store multiple tapes in multiple
| facilities to get the same durability with tapes.
|
| Retrieval costs are additional of course, and depend on how
| quickly you need access to the data, but if you just want to
| store data long term in case of disaster, $1/TB for multi-AZ
| replicated data seems like pretty reasonable pricing.
|
| LTO-6 tapes hold 2.5TB of data (uncompressed), assuming you
| store 2 for redundancy, you'd need to find a place that will
| store them for $1.25/tape/month to break even, plus you're
| paying $25 for the tape itself, so over 3 years, that's
| almost another $1/month/tape. Plus the tape drive itself is
| around $1500.
|
| You can use newer tape technology for better economies of
| scale, but your buy-in cost is higher due to the higher price
| of the tape drive, so you'd need a pretty high volume of data
| to break even.
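| The break-even arithmetic above can be sanity-checked in a few
| lines (all figures are this comment's assumptions, not vendor
| quotes):

```shell
# Break-even offsite storage fee per tape vs. Glacier Deep Archive.
out=$(awk 'BEGIN {
    glacier = 2.5 * 1.00          # 2.5TB (LTO-6 native) at ~$1/TB/month
    tapes   = 2                   # two copies for redundancy
    media   = tapes * 25 / 36     # $25/tape amortized over 3 years
    printf "naive: %.2f  with-media: %.2f", \
        glacier / tapes, (glacier - media) / tapes
}')
echo "$out"
```

| This reproduces the $1.25/tape/month figure, and shows it drops
| to about $0.56 once the media cost is folded in (still ignoring
| the ~$1500 drive).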
| terafo wrote:
| Glacier cost in the cheapest region is $3.60/TB/month. Plus
| at least $50 to download that terabyte once it's needed (if
| the hardware that you're backing up is not in AWS), and I
| don't even factor in retrieval costs. You can get HDD storage
| cheaper than this (twice as cheap with some providers) if
| you are willing to use dedicated servers. And they come
| with unlimited traffic. And you can use hardware there for
| something. Glacier is expensive AF.
| Johnny555 wrote:
| _S3 Glacier Deep Archive_ is the closest equivalent to
| off site tape storage.
|
| From their pricing page:
|
| S3 Glacier Deep Archive - For long-term data archiving
| that is accessed once or twice in a year and can be
| restored within 12 hours - us-east-2 (Ohio)
|
| All Storage / Month $0.00099 per GB
|
| https://aws.amazon.com/s3/pricing/
| terafo wrote:
| Sorry, I confused it with Archive access tier. Still, you
| need to spend at least $50 to download it from AWS.
| Johnny555 wrote:
| This is deep archive offsite tape storage, not something
| you'd need to restore often.
|
| When I last managed offsite tape backups, I never planned
| on really needing to retrieve the data -- I had the data
| on disk and on the most recent tapes. (I did do periodic
| restore tests)
|
| If I had to restore the data, I wouldn't care how much it
| costs (within reason).
| [deleted]
| aperrien wrote:
| That is really impressive! Are the Proxmox tape utilities
| separate from Proxmox itself? I have a Synology NAS that I'd
| like to back up to tape. I actually have a tape library, but I
| haven't seen anything that looks like a simple solution for
| this until now.
| tlamponi wrote:
| Well, the CLI tools are not really coupled to Proxmox Backup
| Server and could be built for most somewhat modern Linux
| distros, quite possibly also other *nix like systems.
|
| The whole tape management is in the common PBS API, so that'd
| be a bit harder to port, but not impossible. For example, I
| made some effort to get it all to compile on AArch64 (Arm),
| and while we do not officially support that currently, there
| are some community members that run it just fine.
|
| So, maybe, but could require a bit more hands-on approach. If
| you run into trouble you could post in the community forum
| (<https://forum.proxmox.com>).
| epilys wrote:
| What I never see explained is exactly which PCIe card I should
| get to make a full-sized SAS drive work with my desktop PC.
| Looking at server component stores, I see three-digit prices
| for SAS controllers, yet the author mentions they are cheap.
| c0l0 wrote:
| My advice: On eBay (or any other platform that makes it easy to
| buy used hardware components), go look for "sas2008 4e", and
| check out the offers. You should be able to get a decent HBA
| driven by mpt2sas/mpt3sas for around US$40 to 60.
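| Once the HBA and drive are cabled up, lsscsi -g shows the tape
| device alongside its generic (sg) node. The sample output below
| is canned so the parsing runs anywhere; on a real box, pipe the
| actual lsscsi -g output in instead:

```shell
# Canned `lsscsi -g` output (illustrative; your rows will differ).
sample='[0:0:0:0]   disk    ATA      Samsung SSD 860  1B6Q  /dev/sda   /dev/sg0
[7:0:1:0]   tape    IBM      ULT3580-HH5      H971  /dev/st0   /dev/sg4'

# Pick the generic node of the first tape-type device.
tape_sg=$(printf '%s\n' "$sample" | awk '$2 == "tape" { print $NF; exit }')
echo "tape generic node: $tape_sg"
```

| The st node (or /dev/nst0, its non-rewinding twin) is what tar
| and mt talk to; the sg node is what the LTFS tools usually want.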
| albertzeyer wrote:
| I'm interested specifically for long term archiving. So these
| tapes claim 30 years. I have read that some types of CD, DVD or
| Blue-rays can last much longer.
|
| https://superuser.com/a/71239/37009
|
| For example the M-DISC (https://en.wikipedia.org/wiki/M-DISC).
|
| > Millenniata claims that properly stored M-DISC DVD recordings
| will last 1000 years.
| buttonpusher wrote:
| Yes, but storing many TBs on several low volume disks is a PITA
| unless you can invest in a robotic library.
|
| I wonder if Sony's ODA format could ever become more popular in
| the consumer market. I've never heard anybody mention it
| before.
|
| Alternatively, I wonder if there could even be a "prosumer"
| robotic library system for common optical disks, something like
| a desktop archival data jukebox...
| albertzeyer wrote:
| There are many examples of cheap self-build robotic systems
| (basically robotic CD changers). E.g.:
|
| https://hackaday.com/tag/cd-changer/
|
| http://hackalizer.com/jack-the-ripper-is-an-automated-diy-
| di...
|
| http://hackedgadgets.com/2006/06/07/cd-changing-lego-robot/
|
| Yes, you definitely want something like that, and to extend
| it further.
| londons_explore wrote:
| If I had a large amount of data I needed to archive long term
| and cost effectively, I would archive it to 12 different media
| with a 4-of-12 erasure code (4 data plus 8 parity shares), such
| that if any 4 of the 12 media are readable, I can recover the
| data. I'd choose
| media like a few types of hard disk (different vendors), DVD's,
| SD cards, USB memory sticks, tapes.
|
| I would then store those bits of media geographically and
| politically distributed. And I'd store it with paper documents
| describing the encoding, the file formats, the compression, any
| encryption, etc. I'd also include a few physical computers (eg.
| a raspberry pi or laptop) that has all necessary software to
| read, decode, and display the data. Set it up to be usable by a
| non-expert - in 1000 years time, there may be nobody who knows
| how to use a shell or open a file!
|
| And I'd have a 2nd copy of the whole lot on hard drives
| connected to the internet for day to day serving of the data to
| people who need to see it. All the stuff above is only needed
| in case of organisational failure, war, civilisation collapse,
| etc.
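| One concrete tool for a k-of-n split like that is zfec, which
| implements exactly this kind of erasure coding (whether it is
| the right choice for millennium-scale archiving is a separate
| question). Guarded here since the tool may not be installed:

```shell
printf 'archive payload' > /tmp/archive.bin

if command -v zfec >/dev/null 2>&1; then
    # -k: shares needed to reconstruct, -m: total shares produced.
    ( cd /tmp && zfec -k 4 -m 12 archive.bin )
fi

# Count the share files actually produced (0 if zfec is absent).
shares=$(ls /tmp/archive.bin*fec 2>/dev/null | wc -l)
echo "shares: $shares"
```

| Recovery goes through the companion zunfec tool, fed any four of
| the twelve shares.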
| albertzeyer wrote:
| That sounds all nice... but do you actually do that? I'm sure
| you have some amount of data (maybe not so large) that you
| want to backup long-term? As most of us do?
|
| If you do that, I would really love to read some more details
| on how you actually organize that.
| raron wrote:
| Maybe Github's "code vault" would be interesting for you:
| https://github.com/github/archive-
| program/blob/master/GUIDE....
| https://archiveprogram.github.com/
| londons_explore wrote:
| My personal data I have no need to keep beyond my own
| lifespan, and I don't have much of it, so it's easy.
|
| The above is what I have set up for some organisations who
| want to keep data for thousands of years.
|
| There are other bits to the process, like every 10-30
| years, repeat the process with the new data _and_ the old
| data. This time, the 'old' data will be much smaller
| compared to the storage mediums, so keep that data
| uncompressed, preferably unencrypted, and un-erasure coded
| in every geographic location. That removes many barriers to
| accessing the data, and increases the chance that someone
| who finds it in 200 years bothers to recover it.
|
| Sadly in the future world there is a high chance some of
| the data will be copyrighted, illegal knowledge, or
| GDPR-impacted, and all records will need to be erased. There
| isn't really a
| good solution to that. It's almost impossible to protect
| against future humans _wanting_ your data gone.
| c0balt wrote:
| IIRC the cost per TB, compared to tape, made discs unviable for
| most backup/archival applications.
| paulmd wrote:
| Depends on who's asking. Amazon Glacier never formally
| disclosed their storage medium (at least as of a few years
| ago) and one of the theories on what it might be was actually
| a robotic optical disc changer library based on BD-XL, and
| the cost/capacity actually does math out. Yeah, discs might
| be $15 a pop (for a quad layer/128GB disc) for you as a
| consumer, but when you're Amazon and you'll be buying the
| complete output of at least one optical disc factory, the
| economies of scale kick in. It's just expensive because
| there's no market for 128GB media for consumers (and honestly
| these days hardly any market for WORM media at all as a
| consumer), it's not inherently that expensive to make the
| discs.
|
| (I believe the final consensus pointed to arrays of HDDs
| where most of them are powered off, and the number of "live"
| drives per rack is bounded to allow high density/low cost,
| hence the need for access time/service level bounds, but the
| BD-XL idea is still intriguing!)
|
| With the consumer discs, even considering cost per GB, the
| amount of effort required to handle a large library of low-
| capacity discs is just too great even if the cost is a little
| bit better. 128GB discs would have been very usable 5 years
| ago but again, those discs were never affordable to
| consumers, and the 25GB was still some effort at that time.
| Today even 128GB is not all that much, as data has grown. As
| far as I know there is nothing realistic on the horizon to
| replace blu-ray with higher capacity either, if movie content
| started being released in 8K it probably would be something
| like BD-XL with AV1 encoding (or maybe H265 again), not a
| fundamentally new iteration like DVD->BD.
|
| The future for consumer storage seems to be SSDs and hard
| drives for fast and slow/bulk storage, and cloud for nearline
| storage. Tape is still relevant for enterprises though
| especially in automatic libraries.
| c0balt wrote:
| Interesting, I didn't know about Glacier.
|
| The theory of hard drives being shut off/powered on
| dynamically in a rack sounds intriguing. Sounds simple and
| yet difficult because of the rare use case, i.e., no
| commodity hardware available. Maybe something to test out
| for colo backups to keep power usage down and prolong disk
| health.
| numpad0 wrote:
| Capacity per disc too. Blu-ray discs top out at 128GB and
| there's no cheap and easy way to automate disc handling to
| work around that.
| at_a_remove wrote:
| I am also interested in some long-term archiving: in
| particular, .ISOs of various Blu-Ray, DVD, and CD media
| releases.
|
| Still, aside from it being prohibitively expensive (LTO-8 seems
| like something of a floor given the size of Blu-Rays), tape
| backups seem to be a hard area to get into. I did some crappy
| little DLTs in the 1990s but nothing since, so "what software?"
| and the like questions are all new to me. And this would be
| with just a single drive, not even a library.
| Robotbeat wrote:
| Magneto-optical disks using glass media (instead of plastic)
| have a rated stable media lifetime of at least 50 years and can
| probably last a century or longer. Glass DVDs are a thing and
| often can be read in regular DVD drives.
| dehrmann wrote:
| > Magneto-optical disks using glass media
|
| Are there any commercial products that use this technology?
| eternityforest wrote:
| Not magneto, but m-disk makes a blu ray that lasts 1k years
| supposedly. Some people don't trust it though.
| EvanAnderson wrote:
| I believe mag-op has fallen out of fashion. I worked with
| HP-branded mag-op "platters" and drives back in the late
| 2000's. Plasmon and Sony both had offerings in that space
| too.
| c0l0 wrote:
| I recently started looking into using an LTO-7 tape drive that I
| got handed down, along with a few dozen pristine LTO-6 tapes,
| for archiving purposes. I got to play around a bit with SAS HBAs,
| and was kinda shocked how much of a difference that can make in
| the user (or shall I say sysadmin?) experience: LTO-6 tapes are
| spec'd to transfer rates of around 150MB/s, so well within the
| reach of even the first SAS gear generation. However, the
| very first SAS HBA with external SFF-8088 connector I managed to
| get my hands on (an LSI SAS1068e) topped out at a disappointing
| 80MiB/s, no matter what I tried in terms of blocking and
| buffering. Switching to a more modern (but still old) LSI
| SAS2008-based HBA got me close to the theoretical maximum.
|
| Then there's the (to me, still open) question of how to best use
| the actual tape storage capacity... Since my hardware is newer
| than LTO-5, LTFS (https://github.com/LinearTapeFileSystem/ltfs)
| is an option for convenient access, especially listing tape
| contents, but that could make it hard for other people down the
| line to restore data from the tapes I create.
|
| It's probably safest to assume that tar will always be there, at
| least wherever there's tape, too. GNU tar also handles multi-
| volume/-tape archives, which seems like a necessity if you need
| to back up amounts of data that exceed a single tape's capacity.
| Then again, if you want to use encryption with actual tar
| (important for the kind of data I need to archive), your only
| option seems to be piping the whole archive through something to
| encrypt the stream, which will make accessing individual records
| in the archive opaque to the drive itself... and you can't just
| dispose of individual keys to make select parts of the archived
| data go away for good, either.
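| The pipeline in question, sketched with gzip and openssl purely
| so it runs anywhere (swap in zstd and age, and point the output
| at /dev/nst0 through a buffering tool, for the real thing; the
| passphrase is obviously a placeholder):

```shell
mkdir -p /tmp/project /tmp/restore
echo "payload" > /tmp/project/data.txt

# Archive -> compress -> encrypt, in that order (compressing after
# encryption is pointless, since ciphertext doesn't compress).
tar -C /tmp -c project \
  | gzip \
  | openssl enc -aes-256-cbc -pbkdf2 -pass pass:CHANGEME \
  > /tmp/project.tar.gz.enc

# Restore: decrypt -> decompress -> untar. The whole stream must be
# read back to reach any single file -- the opacity trade-off.
openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:CHANGEME \
  < /tmp/project.tar.gz.enc | gzip -d | tar -C /tmp/restore -x
```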
|
| Also, I would like to conserve as much tape as (conveniently)
| possible in my archiving adventure. There's "projects" (i.e.,
| top-level directories of directory trees) that consume more than
| one tape of their own, and then there's smaller projects that you
| can bin-pack together onto tapes that can fit more than one such
| project.
|
| I've started implementing a small python wrapper around GNU tar
| to solve a number of these problems by bin-packing projects into
| "tape slots" and also keeping track of tape-to-file mappings in a
| small sqlite database, but a workable solution for the encryption
| problem(s) is not something I managed to come up with yet... If
| someone has an idea (or better yet, a complete and free
| implementation of what I am trying to hack together :)), please
| be so kind and let me know!
| TheCondor wrote:
| LTFS has been reasonably well supported and it's fairly open.
| (I think it's totally open and published, but I haven't drilled
| deep into it.) I haven't manually restored files, but I have
| switched vendors and it was transparent. It makes tape almost
| shockingly good, if you can identify by name what you want to
| recover you can recover it quite quickly.
|
| I had previously used Blu-ray for backups, I think they are
| fairly durable if you have a dry, cool place to store them, but
| if you have to find data spread over 20 discs, it's quite a
| pain. Now it would feel better if Red Hat or SUSE or somebody
| cooked LTFS into their products as a first-class thing. I
| think the catastrophic recovery process would involve building
| enough of a system to download and install ltfs to access the
| tapes. I could also create a "recovery system" and then just
| tar that on to a tape too.
|
| My strategy has been to keep things relatively warm, and
| when LTFS starts to feel like a liability I'm going to
| move the whole archive to something else, fortunately it's not
| 100s of br discs, it's tens of tapes so it will take some hours
| but it's mostly waiting on data to stream.
| [deleted]
| ndespres wrote:
| I think what you are describing is a feature of the Amanda
| backup system, which might be worth a look. It supports writing
| to a library of "virtual tapes" which can then be backed by
| real tapes, tape libraries, hard disks, etc. and will handle
| the splitting/overflow problem that you are dealing with.
|
| https://www.zmanda.com/downloads/
| abbbi wrote:
| If one wants to play with virtual tape libraries, QuadStor VTL
| is a nice solution for that:
|
| https://quadstor.com/
|
| Unfortunately they don't seem to have an open VCS for the
| source (other than really old versions on GitHub).
|
| Other than that, there is mhVTL:
|
| http://www.mhvtl.com
| Synaesthesia wrote:
| That's a lot of storage! Can't really think of a use for this
| (200TB plus) personally, but it is appealing.
| throw0101a wrote:
| Tape makes more sense the larger you go, as it help amortize
| the fixed/upfront costs. The incremental costs of buying more
| tapes (that are re-usable) isn't that much at scale. It's often
| relatively cheap insurance against data loss for many
| organizations.
|
| A lot of 'enterprise' backup software is also now coming with
| hooks into cloud storage (e.g., S3 APIs), but then you have to
| worry about bandwidth and the time it takes to get the bits
| offsite at "x" bits/second.
|
| Of course you also have to worry about retrieving the data in
| case of disaster per the Recovery Time Objective:
|
| *
| https://en.wikipedia.org/wiki/Disaster_recovery#Recovery_Tim...
|
| Also: a backup has not happened until you try and succeed your
| recovery process.
| metabagel wrote:
| > but then you have to worry about bandwidth and the time it
| takes to get the bits offsite at "x" bits/second.
|
| Reminds me of the saying that the fastest throughput is
| achieved by a 747 full of hard drives.
|
| > Also: a backup has not happened until you try and succeed
| your recovery process.
|
| A thousand times this.
| simcop2387 wrote:
| For me it's a lot about just being a data hoarder and never
| _having_ to delete something because I'm low on storage. About
| half of my system, though, is taken up by system backups and
| virtual machines. I should do a cleanup of those, but the
| freedom of just being able to spin up something new or put a
| new backup on there without ever asking "do I have enough space
| for this?" is rather nice.
| organsnyder wrote:
| I also rarely/never delete anything, but my ~2TB NAS still
| has plenty of room. I guess it makes a difference that the
| only media I store is my own photos and videos.
| Spooky23 wrote:
| Backups are a really interesting business. I helped out a
| colleague a few years ago with a project in a big data center and
| it was like a whole world that nobody knew existed.
|
| Because of the RTOs and backup windows, the supporting
| infrastructure was _fast_. The caching layer stuff was the
| fastest disk in the data center by far, and the team was a small,
| tight group of people who basically honed their craft by meeting
| auditor and other requirements. The management left them alone
| and they did their thing.
|
| That was about a decade ago now; those guys have all moved on to
| really big things.
| StillBored wrote:
| It's still that way; the Netflix guys get a lot of press for
| their bandwidth numbers, but plenty of backup systems were
| getting similar (or greater) bandwidth numbers years ago, since
| many of the caching stacks are basically PCIe or memory
| bandwidth limited. The 300MB/sec number the author lists is
| really slow, and likely appropriate for LTO-3/4 (IIRC, the
| Wikipedia numbers are understated). LTO-7+ can peak at over
| 1GB/sec, with modern drives going even faster if compression
| is left enabled.
| So, given a library with a few dozen drives, the bandwidth gets
| insane. (ex: SL8500)
| trasz wrote:
| Someone should ask Spectra Logic folks for their numbers :-)
|
| (Spectra Logic's tape libraries run FreeBSD too.)
| monocasa wrote:
| SpectraLogic's code isn't in the data plane, you hook up to
| the drives directly, and the drives can forward changer
| requests to the internals of the library. So it's however
| fast the drives are (which are all third party).
|
| Also, last I checked, FreeBSD was used for their disk
| product, not tape.
| johnklos wrote:
| LTO has been around for more than twenty years, true, but not
| quite thirty, so we can't test the claim of thirty years of shelf
| life. But DLT, which is surprisingly similar, came out in 1984,
| and lots of thirty-year-old and older DLT media has been shown
| to be readable.
|
| The tape drives themselves are much more of an issue than the
| tapes. It's a shame, because it necessitates moving data on older
| tapes to newer generation tapes after a few generations (which
| reminds me I have to do that with some LTO-3 tapes).
| wheybags wrote:
| My one experience of digital magnetic tape is mini-dv
| cassettes. I recently ripped a bunch of old home videos from
| some cassettes from the 2000s, and quite a few were fairly
| damaged. Compared to the VHS tapes from the same time and even
| older, they were way worse.
| jgrahamc wrote:
| Speaking of tape lifetimes, my old cassette CrO2 tapes seem to
| have survived my parents' house:
| https://blog.jgc.org/2009/08/in-which-i-switch-on-30-year-ol...
| grapescheesee wrote:
| Many clients I have seen using tapes for archive or onsite backup
| keep them in a humidity and temperature controlled device (looks
| like a mini fridge). Seems the emphasis is on humidity for the
| onsite backup rotations.
| watersb wrote:
| Everyone who cares about backups chooses a backup system design.
|
| Anyone who cares about their stuff needs to practice a full
| emergency RESTORE.
|
| I have met very few people who actually do that. For most systems
| I've seen, the first full test of the restore process is a very
| scary first production usage of the restore process.
|
| Which is very exciting, sure. I don't want excitement in my data
| management life.
|
| (I actually see weekly test of onsite backup power at the local
| banks, and at some large commercial kitchens. Those diesel
| generators are very loud. I've never seen systematic test of UPS
| or generators in a front-office environment.)
| smackeyacky wrote:
| There is one recommendation there I find a bit questionable, and
| that's encryption. If you are out of options and restoring from
| tape, it might be better to have it uncompressed and not
| encrypted. It's possible, after some physical disaster, that you
| are on somebody else's infrastructure, and having encryption on
| your data doubles the problems you might have.
|
| I use an ancient LTO-2 drive for last-resort backups that are
| off cloud and off premises. It's more peace of mind than
| practical on
| a daily basis but I did find myself restoring a few files a
| couple of weeks ago as I had fat fingered an rm command. It was
| quicker than getting them from S3 glacier.
| El_RIDO wrote:
| I'd like to suggest two arguments that made me use software
| encryption on my tapes instead: 1. You don't have to trust the
| hardware and can use a tool you trust and have the sources
| for. 2. If you encrypt yourself, you can combine it with
| something like par2 to generate error detection and recovery
| data, letting you restore the encrypted file off a damaged
| tape.
|
| A downside of encrypting yourself is that you can't benefit
| from the hardware compression either, hence the article's
| suggestion to compress in software before encrypting as well.
|
| Personally, my tape-writing workflow is: dar (per-file
| compression that skips incompressible MIME types, plus
| encryption), followed by par2cmdline with 30% redundancy. For
| comparison: CD-ROMs have 33% redundancy information (8 bits per
| 24 bits, CIRC encoding).
| op00to wrote:
| The tapes compress themselves. There's no real need for file
| compression.
| benjojo12 wrote:
| I agree somewhat. Encryption is more critical on tape because
| there is no easy path to wiping a tape, and in a company
| situation if you need to erase something in your backups too
| (think GDPR erasure), then encryption is reasonably critical
| unless you want to go though all of your cold backups.
|
| For my archival use (the reason why I got into this in the
| first place) I do not encrypt nor compress the data going to
| tape. For server/desktop backups. they are compressed and
| encrypted.
| rowanG077 wrote:
| It's trivial to wipe data on a tape with a degausser. You
| destroy the tape in the process, since it also wipes out the
| factory-written servo tracks.
| kortex wrote:
| Is there a way to restore the servo tracks? This sounds
| like the kind of hack a dedicated nerd could pull off with
| an arduino and duct tape.
| ansible wrote:
| Without looking into the specs, at the very least, you'd
| need to modify the LTO drive firmware. The drive itself
| isn't designed to operate without the servo tracks. Those
| are written to newly-manufactured tapes with special
| equipment at the factory.
|
| So, it would take a very dedicated nerd indeed.
| rowanG077 wrote:
| Not that I know of. But the positioning on recent-gen
| LTO is pretty tight. I don't think it's out of the realm
| of possibility for a dedicated nerd, but it won't be
| trivial.
| throw0101a wrote:
| > [...] _nor compress the data going to tape._
|
| Just to note that tape drives have built-in compression that
| generally is done transparently in the background. So while
| using something like _zstd_ (per the article) may get more
| bits on a given tape, there is some compression that one gets
| "for free" without doing anything at all.
|
| * https://en.wikipedia.org/wiki/Linear_Tape-
| Open#Optional_tech...
|
| * https://en.wikipedia.org/wiki/Magnetic_tape_data_storage#Da
| t...
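| If you do compress in software, it can make sense to switch the
| drive's transparent compression off; with mt-st that is one
| command (sketch guarded on drive presence; the device path is
| the usual Linux default, adjust to taste):

```shell
DEV=${TAPE:-/dev/nst0}
if [ -e "$DEV" ]; then
    mt -f "$DEV" compression 0   # 0 = off, 1 = on (drive-level)
    result=toggled
else
    result=no-drive              # nothing to do on this machine
fi
echo "compression: $result"
```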
| benjojo12 wrote:
| I mention this in the post itself
| lights0123 wrote:
| You mention that they're advertised in the amount of
| compressed data that can be stored, not that they
| actually compress data themselves. I thought you meant
| that they assume you use a compression algorithm
| yourself.
| benjojo12 wrote:
| Ah, ok fair enough! I should have pointed that out more
| clearly!
| throw0101a wrote:
| You wrote:
|
| > _Drives above LTO-4 have built-in hardware encryption,
| however I would steer away from using it and instead just
| encrypt data yourself (possibly with the tool I helped
| make called age!). Like most things, you should also
| consider compressing your data before encrypting and
| writing it to tape. LTO tape capacities are often quoted
| in their "compressed capacity" which is a little cheeky
| since it assumes basically over a 50% compression ratio,
| this is not at all likely to be true if you are writing
| video or other lossy mediums like images etc to the tape.
| I generally run my data through zstd to compress and then
| age to encrypt. Zstd and age are quite fast and I've not
| found them to impede performance noticeably._
|
| If someone is not familiar with tape drives, I think it
| would be easy not to realize that the compression is
| built into drives like the explicitly called out "built-
| in hardware encryption".
| lostapathy wrote:
| > Encryption is more critical on tape because there is no
| easy path to wiping a tape.
|
| I used to work for a government agency. We ran backup tapes
| that rotated out through a degaussing machine that spun them
| around for like 10 minutes to wipe them. A degausser is not
| common to have, but wiping is definitely easy.
| amelius wrote:
| Would love to see an article of someone taking a drive apart, and
| hooking an oscilloscope to the read head of a tape drive.
| dmitrybrant wrote:
| Funny, I just recently did a similar thing: found an LTO-4 tape
| drive on eBay for $40, and a few used cartridges (2TB each) for
| $20.
|
| But before writing my backup to the cartridges, I tried reading
| their contents, and found that they actually came from a major
| film studio, with backups of raw animated film content on them!
| paulmd wrote:
| one thing to emphasize is that the quoted LTO capacity numbers
| are usually including transparent device compression - if your
| data is not compressible, such as ZIP/RAR files or compressed
| audio/video, that's not the number you will get!
|
| Home users will really want to think in terms of the "raw
| capacity" imo. This is normally half of the advertised capacity
| for the older standards (I believe the newer ones have stronger
| compression that squeezes a bit more). LTO-5 tapes are 1.5TB
| raw, for example.
|
| Maybe you'll get a little bit out of it, but a lot of the
| things you'd want to back up (and especially the bulkier stuff
| that really eats space) are already compressed. Family photo
| library, audio/video storage? JPGs are compressed, H264/H265 or
| MP3/FLAC/etc are already compressed. System images? A lot of
| application files are already compressed. Home user scenarios
| are not outlook mailboxes and database backups like the
| "official" scenarios.
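To make the raw-vs-advertised gap concrete, here is a small table of the commonly published per-generation figures (2:1 assumed compression up to LTO-5, 2.5:1 from LTO-6 onward) and what you actually get at your data's real ratio; double-check the numbers against your drive's datasheet:

```python
# (native TB, "compressed" marketing TB) per LTO generation
LTO = {
    5: (1.5, 3.0),
    6: (2.5, 6.25),
    7: (6.0, 15.0),
    8: (12.0, 30.0),
}

def effective_tb(gen: int, real_ratio: float) -> float:
    """Capacity you actually get at your data's true compression ratio."""
    raw, _ = LTO[gen]
    return raw * real_ratio

# Already-compressed media (JPEG, H.264/H.265, MP3/FLAC) sit near 1.0:
print(effective_tb(8, 1.0))   # 12.0 TB, not the advertised 30 TB
```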
| nybble41 wrote:
| > Home users will really want to think in terms of the "raw
| capacity" imo.
|
| _Everyone_ would be better off thinking in terms of the raw
| capacity. "Compressed capacity" is nothing but a marketing
| gimmick. Even in enterprise use cases the compression ratios
| will vary, and the drive's transparent compression is
| unlikely to offer the most savings. If your data is at all
| compressible you should compress the backup yourself before
| sending it to the drive.
| dark-star wrote:
| It actually works pretty well. Compression in the tape
| drives is certainly worse than what you could achieve by
| zipping before, but at least it works at line speed (which
| is a couple hundred megabytes per second). Factor in the
| fact that you often write out multiple streams in parallel
| from a single server to multiple tapes, and it'll become
| rather tricky to find a compression algorithm that keeps up
| AND compresses better than the drive.
|
| And most enterprises don't really care if their monthly
| backup requires 10 or 15 tapes. And zipping it all up
| beforehand requires even more space on the primary storage,
| which is even more expensive than a couple dozen tapes.
| nybble41 wrote:
| It's still misleading to market the tapes based on a
| compression factor which will depend in practice on the
| data being stored. The _tape 's_ capacity is one thing;
| the effectiveness of the _drive 's_ hardware-accelerated
| compression algorithm on any given dataset is something
| else entirely. The two should not be mixed.
| NavinF wrote:
| I was looking into the same thing recently. The price is right
| ($10/TB tape vs $13/TB HDD) and it'd be nice to have fewer HBAs
| and SAS cables, but having to swap the tapes manually every 2TB
| (every 6 hours?) kinda ruins it for me. An automatic tape
| library would be ideal, but I couldn't find any in the 100TB
| range that are cheaper than spinning rust.
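The "every 6 hours" guess roughly checks out. Assuming a sustained write rate of about 100MB/s (an illustrative figure, somewhat below LTO-5/6 native speed; only the 2TB capacity comes from the comment):

```python
# Rough check on how often a 2TB cartridge needs swapping.
tape_capacity_bytes = 2 * 10**12   # 2 TB per cartridge, as stated above
sustained_rate = 100 * 10**6       # ~100 MB/s assumed sustained write

hours_per_tape = tape_capacity_bytes / sustained_rate / 3600
print(round(hours_per_tape, 1))    # ~5.6 hours per cartridge
```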
| numpad0 wrote:
| I have a 2U sized LTO2 robot that might have collapsed from
| stuff on top by now, but it seemed to have a standard 5" bay
| drive inside with a passthrough adapter marshaling the drive
| and the loader mechanism. I wonder if a more recent drive can
| just be dropped into those libraries or if they need firmware
| support.
| ChuckNorris89 wrote:
| Damn, that's cool. I wish the second hand market in my country
| was abundant with cheap exotic hardware. Then again, maybe not,
| because I'd probably fill my small apartment from hoarding
| stuff like this.
|
| Still, did you try to recover any of the material and watch it?
| dragontamer wrote:
| I've come to the understanding that tape-drives are for people
| who need to "build a custom-sized storage solution", especially
| if you need capacity but not necessarily read/write speeds.
|
| A tape-drive is your read-heads. The tape is like a platter. The
| tape-library / jukebox is just a robotic mechanism for switching
| tapes into and/or out of the read-head.
|
| ----------
|
| If you need a petabyte of uncompressed storage, you can reach
| it with a tape library of 84 LTO-8 tapes (12TB each). If
| 400MB/s of read/write is enough, a single tape drive will do.
| If you need faster access speeds, you buy a 2nd, 3rd, or 4th
| tape drive.
|
| So let's say you need 2GB/s of read/write speed and a petabyte
| of storage. You get 5x LTO-8 drives (400MB/s each), 84 LTO-8
| tapes, and stick them into a tape library of some kind.
|
| You then buy a certain amount of SSDs + HDDs sufficient for
| caching, so that you can read/write to this tape library at
| sufficient speeds (especially since it could be many minutes
| before a specific byte is accessed).
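A quick ceiling-division sanity check on the sizing above, using the same per-tape and per-drive figures:

```python
import math

TAPE_TB = 12        # LTO-8 native capacity per cartridge
DRIVE_MBPS = 400    # per-drive rate used in the comment

def tapes_for(total_tb: int) -> int:
    return math.ceil(total_tb / TAPE_TB)

def drives_for(target_mbps: int) -> int:
    return math.ceil(target_mbps / DRIVE_MBPS)

print(tapes_for(1000))    # 84 tapes for ~1 PB (84 * 12 TB = 1008 TB)
print(drives_for(2000))   # 5 drives to sustain a full 2 GB/s
```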
| kragen wrote:
| Hey uh
|
| is that a DECTape?
| benjojo12 wrote:
| The header image is the insides of a LTO 5 tape
| watersb wrote:
| I use a cloud storage provider to back up via Arq
| https://arqbackup.com
|
| But I don't expect to restore more than a few gigabytes at a time
| from that.
|
| It would take me a week or more to download a terabyte of data. I
| have very little power over internet connection speed, and there
| are very few alternatives here. I believe there are two different
| vendors providing connectivity to our town, and you can pick
| between four retail resellers.
|
| With those limitations, I have tested a full restore process
| exactly once. That's not good enough.
|
| Data at rest on LTO or offline hard disk is something I can
| control. Distributed offsite storage, too. Restore within 12
| hours, I can do that.
|
| The downside to tape or cold disk is more in the management of
| hourly/daily/weekly backups: you have to provision a media
| rotation schedule, whereas that's sort of built into an online
| cloud storage service.
| robohoe wrote:
| I cut my sysadmin teeth doing tape work in early 2000s. It was
| quite fun but I don't miss changing tapes and ensuring that the
| FC tape loader library properly labeled them.
| PaulHoule wrote:
| I notice that he talks a lot about dealing with malfunctioning
| drives and malfunctioning tapes.
|
| That is my experience too. There is that time I got kicked out of
| the computer lab as an undergraduate because I'd created a number
| of newsgroups and they 'wrote' all my files... to what turned out
| to be an empty SunTape. That time I tried to recover a
| configuration file from an IBM tape robot and it took 14 hours.
| When I was successful with tape I always did a lot of practicing
| and testing. A sysadmin who taught me a lot (esp. how to get
| things done in a place where you need 'social engineering')
| told me "you don't have a backup plan until you've tested it",
| and many people have learned that the hard way.
| ansible wrote:
| > _" you don't have a backup plan until you've tested it"_
|
| Yep. Though that's what makes small-shop disk-to-disk backups
| easy, depending on the backup software used.
|
| We use rsnapshot, which uses rsync and "cp -l" to make backups.
| So restoring is as easy as using cd to go into the appropriate
| directory and copying out the files. No special utilities
| needed. Yes, we encrypt the backup drives using cryptfs / LUKS.
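The reason "cp -l" snapshots stay cheap is that a hard link is just another directory entry pointing at the same inode, so an unchanged file costs no extra space per snapshot. A minimal demonstration of that property (not rsnapshot itself):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "snapshot.0")
    linked = os.path.join(d, "snapshot.1")
    with open(original, "w") as f:
        f.write("unchanged data")
    os.link(original, linked)   # what cp -l does for each file

    a, b = os.stat(original), os.stat(linked)
    # Same inode, two names, one copy of the data on disk:
    print(a.st_ino == b.st_ino, a.st_nlink)
```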
___________________________________________________________________
(page generated 2022-01-27 23:00 UTC)