[HN Gopher] Is stuff online worth saving?
       ___________________________________________________________________
        
       Is stuff online worth saving?
        
       Author : Brajeshwar
       Score  : 48 points
       Date   : 2024-12-17 14:31 UTC (4 days ago)
        
 (HTM) web link (rubenerd.com)
 (TXT) w3m dump (rubenerd.com)
        
       | pabs3 wrote:
       | If you're interested in that sort of thing, come hang out with
       | ArchiveTeam:
       | 
       | https://wiki.archiveteam.org/
        
       | underseacables wrote:
       | _I suppose it comes down to what the purpose of such archiving
       | is._
       | 
       | I think it's the preservation of information, but I also believe
       | 90% is absolutely pointless. There is just so much of it, and
       | data storage so cheap, that it makes sense to just save
       | everything.
        
         | sigio wrote:
         | Well... storage is cheap, but not cheap enough to save
         | everything, with just usenet being in the 400TB/day range these
         | days. Sure, it's cheap enough to save every webpage you visit
         | during your life, but probably not cheap enough to save every
         | video you click on youtube or watch on a streaming-service, and
         | all the music you listen to all day.
         | 
         | Though just the music compressed in opus at 128kbit might work
         | ok, 60 years of 24/7 128kbit is 30TB, so that would fit on 1
         | large HDD currently.
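
The 30TB figure above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 kbit = 1000 bits, 1 TB = 10^12 bytes) and an average year of 365.25 days; the function name `stream_bytes` is made up for illustration.

```python
# Rough storage estimate for a constant-bitrate audio stream.
# Assumptions (not from the thread): decimal units and 365.25-day years.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def stream_bytes(bitrate_kbit_per_s: float, years: float) -> float:
    """Bytes needed to store `years` of nonstop audio at the given bitrate."""
    bits = bitrate_kbit_per_s * 1000 * years * SECONDS_PER_YEAR
    return bits / 8


total_tb = stream_bytes(128, 60) / 1e12
print(f"{total_tb:.1f} TB")  # prints "30.3 TB"
```

Under those assumptions, 60 years of round-the-clock 128kbit audio comes to roughly 30.3 TB, which agrees with the estimate in the comment.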
        
           | saulpw wrote:
           | Music is actually an ideal candidate. I don't listen to music
           | all day, and when I do listen to it, it's often something
           | I've listened to before. My current collection is about 200GB
           | and that includes a ton of stuff I've never listened to; it
           | seems reasonable that a full life's worth of music could fit
           | in 1TB, easily.
        
         | dreamcompiler wrote:
         | That data storage is also ephemeral. None of it will last as
         | long as a paper note, unless some human goes to the trouble of
         | copying it all onto new drives with new software every ten
         | years or so.
        
           | Atreiden wrote:
           | With a proper NAS and RAID10 for double parity, it's a bit
           | like the Ship of Theseus. Just keep swapping out drives when
           | they become unhealthy and you never have to rebuild or
           | migrate.
        
             | ninalanyon wrote:
             | Eventually the controller will die and eventually
             | compatible ones will no longer be produced or will at least
             | be inconvenient to obtain or commission and hence
             | expensive.
             | 
             | Paper lasts for centuries without any attention beyond
             | keeping it moderately dry and away from things that eat it.
        
               | emptiestplace wrote:
               | No sane person uses hardware RAID in 2024, if that's what
               | you're referring to.
        
               | zamadatix wrote:
               | Whether you're using hardware RAID or not you still need
               | a hardware storage controller of some type which accepts
               | the new disks you can buy and works with the NAS. What
               | they are saying is eventually that'll be more $ and time
               | than just migrating off the system would be. From ENIAC
               | to now could fit in one lifespan, would you still be
               | maintaining a home floppy drive backup system in the
               | 2040s or just save the time and effort with a migration?
        
         | danielbln wrote:
         | Data rots though, you can't just save it once and be done with
         | it. You have to migrate it across storage mediums, formats etc.
         | It's a recurrent effort/cost.
        
           | bdhcuidbebe wrote:
           | More planning for less effort.
           | 
             | Do your research first. Use standards.
             | 
             | E.g.: html, pdf, h264/h265/av1 in mp4 container, chd, zip and
           | so on depending on what you are storing.
        
             | HeatrayEnjoyer wrote:
             | On what physical medium?
             | 
             | I have 1 terabyte of data in 1860, how do I make sure the
             | storage medium is still intact in 2024?
        
       | JKCalhoun wrote:
       | One hundred twenty-three years ago my great grandmother's first
       | husband died in a hotel in Kansas City from asphyxiation from the
       | gas having been left on over night (the hotel did not yet have
       | electric lighting). A letter was hastily written on a piece of
       | hotel stationery to be delivered to his wife in the neighboring
       | farming community where she lived.
       | 
       | It is fortunate to me that someone thought to hang on to that
       | note since I have become interested in genealogy and this was a
       | fairly significant event in family history (had he not died I
       | don't suppose I would be around since it was her second marriage
       | that gave me my grandfather).
       | 
       | I long for scraps of _anything_ that my dead relatives wrote,
       | created, etc. It connects me better to the past -- the lives they
       | lived, how they lived them. It somehow grounds me a little better
       | ... well, it's rather hard to explain the draw of genealogy.
       | 
       | Sadly very little of the ephemera of everyday life was kept. I
       | get it. It might have seemed like hanging on to junk mail -- like
       | you were a hoarder or whatever, but in this digital era we should
       | be able to hold terabytes of what may appear to be ephemera.
       | 
       | I'm doing what I can - not for ego, I think, but for future
       | generations that may find a connection to their past interesting.
        
         | willis936 wrote:
         | 30 years ago there was no digital world. Nearly all information
         | was in physical artifacts. The things worth saving haven't
         | really changed, but the amount of noise they are buried in has.
         | Imagine if that letter was kept in a two ton pile of ad fliers.
         | Sure, someone would find some of those fliers interesting, but
         | you'd have been much less likely to even know about the letter.
        
           | palmfacehn wrote:
           | >...a two ton pile of ad fliers
           | 
           | Alamy is selling scans of ad prints from the 1850s.
           | 
           | https://www.alamy.com/stock-photo/1850s-advert.html
        
             | chgs wrote:
             | Because they are rare
        
               | chefandy wrote:
               | I don't think that's true? Tons of stuff from that era
               | had been digitized, even before newer more relevant stuff
               | and older rarer stuff, because the acid paper had a short
               | shelf life and there were so many ads in printed stuff
               | then. I might have a skewed perspective from working in
               | the digitization world for quite some time. I think
               | they're selling what they sell with all their other
               | content-- discovery, curation, preparation, and easy
               | delivery.
        
             | chefandy wrote:
             | Ads range from a (necessary, in a capitalist society)
             | nuisance to a scourge, and people justly put up
             | increasingly thick boundaries to shield themselves from
             | their influence. When waning cultural relevance or whatever
             | dilutes that influence, you can more easily see the ads for
             | what they are-- often manipulative marketing tactics
             | implemented through often genuinely beautiful art and
             | design. Both aspects are fascinating to consider and the
             | art can be quite enjoyable. Early modernist posters from
             | Paris are _beautiful_. Watching collections of mid century
             | television ads in the prelinger archives is fun, and tells
             | us a lot about the ways we are influenced by modern ads
             | speaking to current perspectives, fashions, and concerns.
        
             | zamadatix wrote:
             | A selection of 74 items over a 10 year period is a different
             | proposal compared to e.g. keeping two tons of ad fliers
             | from November 17th 1907 (and every other thing, every other
             | day, all the time).
        
           | qwertox wrote:
           | What about robots reading each flier and checking if
           | something is odd about that particular one? It could find the
           | letter and report it to you. Even easier if it was all
           | digital information.
        
           | jonhohle wrote:
           | An aside about ad spam from companies that I occasionally buy
           | from:
           | 
           | Often ad spam comes from the same mailbox as order receipts
           | and includes words like "order", while messages with receipts
           | never include the word "receipt". When inundated with daily
           | or sometimes multiple-times-a-day ad spam from the same
           | company, it becomes very difficult to filter for only the
           | non-receipts when cleaning a neglected inbox.
           | 
           | After I'm gone, I fully expect my family just to delete it
           | all because the signal to noise is so low.
        
             | sdenton4 wrote:
             | Sorting through twenty years of spammy email is one of
             | those things that seem like an LLM would actually be good
             | for.
        
             | justsomehnguy wrote:
             | I don't have anyone to do anything after I'm gone, so I
             | just delete the emails myself. I do keep the notable ones,
             | like registration information and _some_ payment receipts
             | but otherwise everything goes to the trash.
             | 
             | Bonus points:
             | 
             | I don't need a 30/50/100 GB mailbox (and the associated
             | mailbox cost nowadays).
             | 
             | Search is not only fast, but if I don't find something,
             | then there is nothing of this something in the mailbox.
             | 
             | It's mentally pleasurable to log in once in a while and
             | throw a bunch of _unneeded stuff_ into the trash bin, quite
             | similar to a real life room cleaning.
        
           | eesmith wrote:
           | A two-ton pile of ad fliers? Sounds like Ted Nelson's Junk
           | Mail collection,
           | https://archive.org/details/tednelsonjunkmail .
        
           | bongodongobob wrote:
           | If only we had search algorithms...
        
         | kerkeslager wrote:
         | Sure, there are a ton of reasons to archive. And if it's cheap
         | to do (in terms of money, yes, but also in terms of time,
         | effort, mental health, etc.) then I am of the mind that we
         | should archive _everything_.
         | 
         | But, it often _isn't_ cheap to do, and in that case, it makes
         | sense to prioritize. The high priority items for me are the
         | things that I might want to share, the ideas I want to amplify
         | for my contemporaries and future generations that might examine
         | my life. Stuff like [1] [2] and [3] which has influenced my
         | thinking fundamentally, that I hope to build upon so that
         | others can build upon what I have built.
         | 
         | I'd argue that you do this intuitively: you're mentioning a
         | letter from your family's past _because it is a high priority
         | item_ --it's relevant because it was the last written words of
         | your great-grandmother's first husband.
         | 
         | But, there's a lot that _isn't_ worth keeping. My first form
         | of archiving as a teenager was keeping ticket stubs for movies
         | and concerts--a decade later I was going through my pile and
         | found that I didn't even remember most of them. The better
         | movies, I remembered--and I had them on DVD. The better
         | concerts, I remembered--and I also had journal entries and CDs
         | to remember the experience and the music. It's not important to
         | me where/when I saw _Everything, Everywhere, All At Once_ in
         | theaters, but I have it on DVD and I _can't wait_ to show it
         | to my niece when she's older. And sure, I saw Amigo the Devil
         | live, but frankly, he's not an artist you need to see in
         | concert--the greatest impact of _Cocaine and Abel_ [4] on me
         | was when I listened to it alone in my room. The ticket stubs
         | simply don't matter to me.
         | 
         | [1]
         | https://www.viridiandesign.org/notes/451-500/the_last_viridi...
         | 
         | [2]
         | https://www.ted.com/talks/brene_brown_the_power_of_vulnerabi...
         | 
         | [3] https://digital.wpi.edu/pdfviewer/wm117p10z
         | 
         | [4] https://www.youtube.com/watch?v=ZzjtLm0G49E
         | 
         | EDIT: All the things linked above, I have backed up in one form
         | or another. Notably, the Schutt paper isn't at its original
         | URL.
        
       | asimpletune wrote:
       | It reminds me of the cool links page I see now and then.
        
       | smitelli wrote:
       | > I got a picture of my great grandfather, thing took six hours
       | to take your picture. [...] Every guy had one picture back then.
       | And it's just him like, "[grimacing] I gotta get back, feed them
       | hogs!" Now, in the future of course it'll be different. 50 years
       | from now, people will be going like, "Hey! You wanna see a
       | hundred thousand pictures of my great grandfather? I got 'em
       | right here plus everything he did every day of his life." --Norm
       | Macdonald[1]
       | 
       | There is certainly a quantity of stuff online that is absolutely
       | worth saving, but there's a considerably larger proportion that's
       | just redundant to the point of being unremarkable and pointless.
       | The trick is filtering, which can be capital-H Hard. That's why
       | some may want to err on the side of over-collecting to reduce the
       | possibility of missing something that will actually be important
       | someday.
       | 
       | [1]: https://www.youtube.com/watch?v=sY6SjMITHrQ
        
         | nytesky wrote:
         | Another funny take from Macfarlan
         | 
         | Definitely no smiling:
         | 
         | https://youtu.be/8SslNMLO0tw
        
         | diggan wrote:
         | Yeah, this is a good point. Isn't it better we save too much,
         | as tooling for filtering stuff out will always get better,
         | rather than saving too little? The latter has no workaround
         | (today at least).
        
       | nilamo wrote:
       | Personally, I like that the internet is ephemeral. It matches
       | real life in that way. I would rather see the internet as a means
       | of connecting people over large distances (across space, Mars,
       | etc.); maintaining 20,000 copies of every irrelevant thing is
       | just silly.
        
         | qwertox wrote:
         | > Personally, I like that the internet is ephemeral.
         | 
         | It is not. It is only for us normal people. But the companies
         | which log our lives in order to then capitalize on it, for them
         | the internet is not ephemeral. They have copies of videos,
         | pages, podcasts, whatever it is that can be found there.
         | 
         | Why would you want those companies to know more about yourself
         | than you do?
        
           | zamadatix wrote:
           | Archive.org or Google can cache more of the internet than I
           | do while still having the majority of the content be
           | ephemeral.
           | 
           | I'd also hazard to guess most people in this camp would want
           | these companies to also not store these things the same as
           | they don't want people to.
        
         | lxgr wrote:
         | The problem is that not everything it has replaced was
         | originally ephemeral.
         | 
         | In a way, the Internet is both too ephemeral (self-hosted blogs
         | disappear, Youtube videos get taken down) and too persistent at
         | the same time; I don't think that most Twitter posts of non-
         | public figures would need to remain public forever by default,
         | for example, and I don't think I need to mention various data
         | breaches.
         | 
         | The Internet Archive somewhat mitigates the first issue, but it
         | makes me pretty nervous that there's essentially just one
         | organization doing what used to be much more distributed to
         | various physical libraries.
         | 
         | For the second one, I hope we'll see better solutions (both
         | technical and social) as the technology and our interactions
         | with it mature.
        
       | paulpauper wrote:
       | Digital storage is free; yes, save it all
        
         | lxgr wrote:
         | Please do share where I can reliably store my backups for free!
        
           | fragmede wrote:
           | > Backups are for wimps. Real men upload their data to an FTP
           | site and have everyone else mirror it.
           | 
           | -- Linus Torvalds
        
             | LinuxBender wrote:
             | This does still happen. Microsoft may nuke a git repo and
             | someone has to figure out who has the latest version of the
             | entire repo with all the latest commits of every branch.
        
             | theandrewbailey wrote:
             | The vast majority of people aren't privileged enough to
             | have anyone mirror their data.
        
             | lxgr wrote:
             | But how do I get everyone to mirror my gigabytes of
             | encrypted photo backups?
        
               | paulpauper wrote:
               | Just upload them to social media accounts. AFAIK Twitter,
               | Facebook, and YouTube do not have storage limits. No
               | deletion for inactivity either.
        
             | paulpauper wrote:
             | Dump it on Wikipedia. AFAIK wiki never removes anything; it
             | just gets buried in an edit history. Or Wikimedia image
             | files.
        
       | Viktoire wrote:
       | When I save things, I try to make sure that it'll be immediately
       | useful to me once I find it again.
       | 
       | I'll highlight, summarise and take notes of what I save. Or some
       | combination of those. If I don't find anything new or directly
       | applicable to my life, I'll let it pass by.
       | 
       | This approach isn't good for archival purposes, but I hesitate to
       | save a lot of things that I'll never read again.
        
         | ghaff wrote:
         | I'm going through my file cabinets right now. I'll keep a few
         | things that catch my eye but I'll likely throw out most of it.
         | The odd 25 year old computer magazine is probably interesting
         | but not all of them collectively for the most part. And I'm
         | certainly not going to index them in a way that they'd be
         | useful to me.
        
           | galleywest200 wrote:
           | You can probably sell or donate those old magazines to a
           | collector, or a kid interested in that stuff. At the very
           | least drop them off at a thrift store instead of just dumping
           | them.
        
             | ghaff wrote:
             | Thrift stores don't want a ton of old paper. There are a
             | lot of things that someone somewhere would probably like
             | but I'm not going to track them down or get them there.
             | Mostly it's not magazines anyway. It's a bunch of articles
             | I ripped out over the years.
             | 
             | The one thing I have in my garage I know _someone_ would
             | want is a big pile of laserdiscs. But, again, a thrift shop
             | (or my library) wouldn't want them and I live pretty far
             | out from a major city. Probably will try Craigslist post-
             | winter though as I'm trying to declutter.
        
       | stared wrote:
       | I often find myself revisiting old posts and stories. As with any
       | human artifacts, most things aren't worth revisiting or are only
       | meaningful in the moment. If they're gone, few people miss them.
       | 
       | I'm a link hoarder myself (over 13k links on Pinboard:
       | https://pinboard.in/u:pmigdal/). While I don't revisit most of
       | them, some have proven invaluable for re-reading and sharing. I'm
       | not sure about the typical half-life of internet content, but a
       | lot disappears--whether because people stop paying for domains,
       | official websites get reorganized (or their content removed), or
       | other reasons.
       | 
       | This is where the Internet Archive steps in, doing the essential
       | work of a digital librarian. I often share links from its Wayback
       | Machine, which has been a link-saver more times than I can count.
        
       | swayvil wrote:
       | Curve smoothing. Chaikin's algorithm and Jarek's tweak etc. Very
       | clever and nice way of making angular geometry curvy.
       | Constructive geometry stuff.
       | 
       | There were like a dozen algs. I kept links to nice papers with
       | diagrams. Then they started disappearing. Now I'd be pressed to
       | find 2.
       | 
       | This is really useful info that is apparently disappearing. So
       | yes, it happens, and maybe you should save that stuff.
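
For readers who have not seen it, Chaikin's corner cutting is short enough to sketch here. This is a minimal illustration, not a reconstruction of any of the lost papers: it covers only the plain algorithm (not Jarek's tweak), and keeping the endpoints fixed on open polylines is a common variant chosen here as an assumption. The function name `chaikin` and its parameters are made up for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def chaikin(points: List[Point], iterations: int = 1,
            closed: bool = False) -> List[Point]:
    """Smooth a polyline by repeatedly cutting its corners.

    Each edge (P, Q) is replaced by two points located 1/4 and 3/4 of
    the way along it. For open polylines the first and last points are
    kept so the curve still starts and ends where the input did.
    """
    for _ in range(iterations):
        n = len(points)
        if n < 2:
            break
        edge_count = n if closed else n - 1
        refined: List[Point] = []
        if not closed:
            refined.append(points[0])
        for i in range(edge_count):
            (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
            refined.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            refined.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        if not closed:
            refined.append(points[-1])
        points = refined
    return points


# One pass over a right-angle corner cuts the corner at (4, 0).
print(chaikin([(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)]))
```

Each extra iteration roughly doubles the point count, and the result converges toward a quadratic B-spline, so a handful of passes is usually enough for display purposes.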
        
       | RajT88 wrote:
       | Stuff online is absolutely worth saving. It is a window into the
       | past - what people concerned themselves with, what they loved and
       | hated.
       | 
       | Scholars will write papers on this era, speculating what it was
       | like and how it fit into what came after.
       | 
       | The web documents the massive societal changes underway which do
       | not relate to the internet directly. Things like changes in
       | transportation technology, medicine, sexuality and gender, and
       | how your average people felt about all of it. Scholars will data
       | mine those opinions to understand who felt what ways and why,
       | with the benefit of hindsight. New knowledge will come of it.
       | 
       | So yeah! It is all worth saving.
        
       | greatgib wrote:
       | Sometimes you have strange obsessions or a strange mindset
       | related to your technological habits. And you might easily think
       | that it is only you that is weird, not thinking straight. If you
       | are the only one doing something, you are probably wrong.
       | 
       | And then, hopefully, there are nice personal blog posts like this
       | one, showing you that you are not alone having some peculiar
       | habits and so that it might make sense even if most people don't
       | even think about it.
       | 
       | I have the exact same feeling when I discover through hn, blog
       | posts and events that I'm not the only one having my web browsers
       | full of tabs. Literally having thousands of tabs.
        
       | thefaux wrote:
       | There are many things in life that have immense personal value
       | and zero value to nearly everyone else. This creates a lot of
       | misunderstanding and incentive misalignment.
        
       | impure wrote:
       | The rise of LLMs has really devalued saving stuff online. What
       | is the point of saving an article if I could just ask ChatGPT to
       | create it and it would probably do a pretty good job? It's still
       | worth keeping notes and stuff that may be hard to find but the
       | majority of things online can easily be reproduced and are not
       | worth saving.
        
       ___________________________________________________________________
       (page generated 2024-12-21 18:00 UTC)