[HN Gopher] Digital Archivists: Protecting Public Data from Erasure
___________________________________________________________________
Author : rbanffy
Score : 127 points
Date : 2025-04-02 16:03 UTC (6 hours ago)
(HTM) web link (spectrum.ieee.org)
| Teever wrote:
| I made this related submission[0] recently but it was flagged.
|
| This stuff is very important to talk about so I hope that this
| submission by rbanffy isn't also flagged.
|
| [0] https://news.ycombinator.com/item?id=43543075
| hsuduebc2 wrote:
| I agree. I do not understand how this is perceived as a
| political issue and thus got flagged.
|
| Climate change is, for some reason, perceived as political too,
| and it doesn't get flagged so often.
| donnachangstein wrote:
| No it isn't. It's merely a cause du jour for data hoarders to
| justify their hobby in light of this Chicken Little hysteria.
|
| 30 years ago it was thought collecting every issue of magazines
| like TV Guide was important. No one even knows what that is
| anymore.
|
| No one is ever going to look at 99% of this data. In the
| meantime, send more hard drives for my NAS!!
| dreamworld wrote:
| It might be of some interest to cultural historians in the
| future. But I think it makes more sense to keep a sampled,
| curated subset of the data. But in any case, if we can afford
| it, eh, why not.
| rbanffy wrote:
| We don't know now what to curate for the future. We should
| preserve as much of everything as we can; we don't know what
| will be important in 50 or 500 years.
|
| Case in point: retrocomputing is my hobby. I buy, restore,
| preserve, and use old computers. Most of them are home
| computers, because business computers go directly from the
| office to the recycling facility or the landfill. Unless
| someone deliberately preserved, say, a Burroughs B-25
| desktop, or a similar machine from Data General, it is gone.
| Suppafly wrote:
| My son is into retrocomputing, mostly using older
| hardware I have from when I was younger, and we have a
| stack of old Compaq desktops where you can't access the
| BIOS because it requires a specific floppy that is nearly
| impossible to find online. This is 486/Pentium-era stuff;
| the older stuff is even harder to find.
| peppermill wrote:
| I think the data being discussed is quite a bit different
| than old TV Guides...
| NoMoreNicksLeft wrote:
| I was, believe it if you wish, thinking about old TV guides
| just this morning and wondering how one would even go about
| archiving those. Most of the stumbling blocks for taking
| apart the glued binding for scanning have been figured out,
| of course, but for any given week there may have been as
| many as 60 or 70 editions (for each television market, I
| think). None of these have proper ISSN numbers as far as
| I'm aware, and other than the listings they can be visually
| indistinguishable. Then there is the challenge of finding
| those, and not knowing whether this or that edition is
| missing (from time to time, the company would create new
| editions for new regions, or fold old ones back into some
| other area), and then there is the problem of parsing the
| content at all. Many of these TV shows aren't on themoviedb
| or thetvdb, and even when the shows are, there won't be
| episode listings (there were 6,000 Donahue talk show
| episodes, after all). On top of all of that, you can't
| necessarily know what was on TV at a given time and day,
| given federal government preemptions, commercials,
| unreported last-minute rescheduling, etc.
|
| But I can also see why people might want to keep more
| interesting data, like when the Federal Cheese-Sniffing
| Agency moved offices back in 1982 and they have meticulous
| records of the 483 filing cabinets that had to be moved
| from the original location to their new home in Furrytown,
| Pennsylvania.
| zorpner wrote:
| I wonder if those would be useful in identifying the
| potential contents of specific Marion Stokes tapes (my
| understanding is that they're sorted, but are only labeled
| with channel and date/time and are being archived slowly):
| https://libwww.freelibrary.org/blog/post/5393
| hermannj314 wrote:
| My wife takes thousands of photos every year; when my
| daughter was young, she took even more.
|
| When we were moving out of our apartment there was damage to
| a door hinge that we never noticed when we moved in but that
| had definitely been there from the onset of our two years of
| living in that apartment.
|
| Guess what? I had a photo from the day after we moved in of
| that door hinge in a state of damage! Not because we took the
| photo for that intention, but because my daughter was playing
| in the hallway and my wife snapped a photo and it just
| happened to capture the damage. That saved me several hundred
| dollars in repair costs from my landlord.
|
| You are right, 99% of the data will never be looked at. But
| do you know what the 1% is today? I'm guessing you don't.
| donnachangstein wrote:
| Your example of personal family photos is in no way
| comparable to storing terabytes of essentially unindexed
| data that one has no detailed knowledge of, under the
| notion that the government is somehow lighting a match to
| everything and they're going to save it.
|
| The government doesn't delete _anything_. It might be moved
| or inaccessible to the public but that data is _somewhere_
| in perpetuity.
|
| It's one of the most deranged LARPs I've ever seen, and then
| they pat each other on the back on Bluesky, desperately
| wanting to be a part of something.
|
| These people envision themselves as folk heroes when what
| they really need to do is go outside and touch grass.
| nancyminusone wrote:
| If it's inaccessible to the public, it might as well be
| deleted. What's the difference? If you can't get it, you
| don't have it.
| alnwlsn wrote:
| Patently false. https://www.archives.gov/personnel-records-center/fire-1973
| squarefoot wrote:
| Among the deleted data was the police accountability
| database. You probably won't have to deal with thugs now
| feeling omnipotent and immune from prosecution because of
| this.
|
| https://www.police1.com/federal-law-enforcement/national-law...
| thowawatp302 wrote:
| I've had the idea of recreating TV channels on my Plex server
| using TV guide data from the late '90s and early '00s.
|
| The insurmountable part of that project would be getting the
| guide data.
|
| You don't know what other people will want in the future.
| badlibrarian wrote:
| There's a lot of panic and overlap in the space; a way to
| coordinate these efforts would be helpful.
|
| Internet Archive et al. made noise and promises but told
| volunteers to stop because they couldn't actually handle the
| ingest.
|
| https://www.reddit.com/r/Archiveteam/comments/1jbgycm/us_gov...
|
| These folks made a notable effort.
|
| https://webrecorder.net/blog/2025-03-25-govarchive-us-and-mi...
| nla wrote:
| Best thing I ever heard from the head of archives at the BBC:
|
| Once you format shift, you will always be format shifting.
|
| Keep your originals whenever you can.
| dmillar wrote:
| Many criminal records, petty or otherwise, are public record.
| Once archived, expunged or dismissed infractions never truly
| disappear. A traffic violation or other petty misdemeanor from
| 20 years ago that has been expunged from the official record can
| still show up on a background check because companies archive
| public data. So, there is a flip side to this.
| overfeed wrote:
| Public data is incompatible with secrecy. Expunged records
| still appear in newspaper archives if the local reporter on
| the crime beat captured the proceedings. IMO, "expunged" means
| removed from _official court records_, not from the public
| memory, including newspapers, archived websites, police
| blotters and prosecutors' files.
| Damogran6 wrote:
| Hypothetically:
| - Government leader says they're nuking data.
| - Mad rush to back up the data through other means.
| - Government leader declares they've 'transferred the cost of
|   maintaining data out of government, thus making for a
|   smaller, more efficient, government'.
|
| I hate everything about this.
| krunck wrote:
| There is inherent inefficiency in government accountability
| efforts. I'm ok with that.
| riku_iki wrote:
| In general, it makes sense to shift this part to business: if
| the data is valuable, there will be a market and services for
| it. The problem is probably how fast they nuked it, without a
| grace period.
| tehjoker wrote:
| I'm okay with data being hosted for free or cheaply by the
| government, and with not being price-gouged for access to
| public data.
| mikrl wrote:
| How does this relate to dox?
|
| Let's say an individual posted identifying or incriminating
| information online, inadvertently or intentionally, in a public
| place.
|
| Then a third party decides to store it, and possibly make it
| accessible to others.
|
| If the original self-doxxing user then pulled the original dox,
| but was unable to scrub the rest, would that information still be
| considered public, or would it be private? Was it ever truly
| public? Or private for that matter?
| sixothree wrote:
| Which data set are you thinking this might apply to?
| calebio wrote:
| That's a really good question.
|
| In my head, I'm imagining someone early in the morning posting
| a flyer up on a bulletin board downtown.
|
| Throughout the day, many folks walk by and take photos of the
| flyer with their cell phones.
|
| At the end of the day, the original person comes back and
| removes the flyer.
|
| IMO, at the time that the folks took the photo of the flyer,
| that flyer was public information. It remains public
| information even after the flyer is removed[0].
|
| This isn't a great analogy, and it has plenty of holes, but
| it was interesting to think about after reading your comment. I
| know your comment was about doxxing, but I think the question
| is pretty interesting philosophically.
|
| I think something similar applies to photos taken of other
| people in public spaces. Both the person who took the photo and
| the subject of the photo are no longer in that physical public
| space, but the actions took place within that space.
|
| I think something similar applies to digital "public spaces".
| But what does "public space" even mean in the context of walled
| gardens[1], etc.?
|
| [0] You then run into the question of what happens if someone
| posts non-public information publicly.
|
| [1] Are digital walled garden communities that different from
| physical communities that gate access, whether free or paid?
| Whether information shared within those contexts is public or
| private is an interesting thread as well.
| ziddoap wrote:
| If you intentionally post something publicly, it's public. Full
| stop.
|
| The tricky part is dealing with inadvertent or malicious (e.g.
| by some other party) posting of private information to a public
| space. That's really hard to deal with on multiple levels.
|
| For one, the archives would retain the information and
| scrubbing it is effectively impossible.
|
| Secondly, legitimate things which _should_ remain public (e.g.
| were posted publicly, are of public interest, etc.) can be
| argued to have been inadvertently or maliciously posted. So you
| need some way to moderate and create rulings for each
| individual case, which quickly becomes untenable due to the
| sheer volume of information being posted and the inordinate
| amount of time required to investigate vs. post.
| hsuduebc2 wrote:
| I wonder: maybe this is where blockchain would actually be a
| useful technology?
___________________________________________________________________
(page generated 2025-04-02 23:00 UTC)