Post AZKDaPiApr2OFEypbU by NatureMC@mastodon.online
 (DIR) Post #AZIVby2s4uelsTdF0i by gerrymcgovern@mastodon.green
       2023-08-31T08:20:06Z
       
       0 likes, 1 repeats
       
       If we deleted 90% of the data we have, everything would work better and we'd need 90% fewer data centers.
       After 90 days, the probability of a piece of content being reused is 5%
       Data Is the New Oil, and That Makes It an Environmental Hazard, Bill Tolson, Spice Works, 2021
       https://www.toolbox.com/tech/big-data/guest-article/data-is-the-new-oil-and-that-makes-it-an-environmental-hazard/
       91% of content gets no traffic from Google
       https://ahrefs.com/blog/search-traffic-study/
       95% of apps are unused after 90 days.
       https://andrewchen.com/new-data-shows-why-losing-80-of-your-mobile-users-is-normal-and-that-the-best-apps-do-much-better/
       
 (DIR) Post #AZIVbzt1EVxpaV2Ovo by gerrymcgovern@mastodon.green
       2023-08-31T09:30:48Z
       
       0 likes, 0 repeats
       
       86% of data is duplicative, redundant or Dark Data: nobody knows what it is.
       http://info.veritas.com/databerg_report
       95% of all corporate data is not managed at all by the central corporate authority.
       https://www.toolbox.com/tech/big-data/guest-article/data-is-the-new-oil-and-that-makes-it-an-environmental-hazard/
       90% of bank data never accessed again
       https://www.computerworld.com/article/2482219/snw--90--of-bank-data-never-accessed-again-.html
       Over 80% of data is not actively accessed and is cold.
       https://venturebeat.com/2021/07/22/why-unstructured-data-is-the-future-of-data-management/
       
 (DIR) Post #AZIVc1i6S4Q9FDwiC8 by gerrymcgovern@mastodon.green
       2023-08-31T13:52:29Z
       
       0 likes, 0 repeats
       
       "Of all the estimated 1,500 terabytes of the advertising footage stored by Film Locker, less than 2% has ever been accessed for a recut or revision."
       Film Locker, https://filmlocker.com/
       "I have similar figures with my clients," was the reply from Thibault Joubert, a product manager at Microsoft 365, when I told him that in my experience only 5-10% of data is accessed 90 days after it's first stored.
       https://www.thijoubert.com/2023-03/Why-important-to-limit-infobesity-in-Microsoft365/
       
 (DIR) Post #AZIVc4b7iTcWCRhgES by gerrymcgovern@mastodon.green
       2023-08-31T13:54:47Z
       
       0 likes, 0 repeats
       
       "A huge percentage of the data that gets processed is less than 24 hours old. By the time data gets to be a week old, it is probably 20 times less likely to be queried than from the most recent day. After a month, data mostly just sits there."
       https://motherduck.com/blog/big-data-is-dead/
       In 2021, Google analyzed the aggregate data from all customers across its platform, and over 600,000 gross kgCO2e was associated with projects that it recommended for cleanup or reclamation.
       https://cloud.google.com/blog/topics/sustainability/new-tools-to-measure-and-reduce-your-environmental-impact
       
 (DIR) Post #AZK1NsawwA82bvrFnU by gerrymcgovern@mastodon.green
       2023-09-01T10:06:45Z
       
       0 likes, 0 repeats
       
       Digital is physical.
       2009: 1 zettabyte of data created that year
       Trees required to print 1 zettabyte: 20 trillion
       Trees on Earth: 3.5 trillion
       2022: 100 zettabytes created
       About 10% of data is stored every year
       70 million servers required to store data in 2022
       Making one server causes 1-2 tons of CO2, multiple tons of toxic mining waste, and hundreds of thousands of liters of wastewater
       2022: 20 million servers became e-waste. Fastest growing waste stream in the world. Causes 70% of the toxicity in dumps.
       
 (DIR) Post #AZK1NtSTj8wlHwu0ES by gerrymcgovern@mastodon.green
       2023-09-01T10:12:52Z
       
       0 likes, 1 repeats
       
       2035: 2,100 zettabytes per year
       1.5 billion servers required
       400 million servers of e-waste every year
       People say: We don't have a data problem. Or, we have to store all this because you never know …
       We have a data crisis. We are literally destroying the environment to build a digital world. 90% of data is waste. If we cleaned up and made better decisions about when to create, what to keep, and what to delete, we could have a positive impact. But no, we store everything because it's "cheap".
       Cheap?
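       A rough back-of-the-envelope check of the figures in the two posts above. This is only a sketch: it assumes that server counts and e-waste scale linearly with the amount of data created each year, which the posts do not state, and the variable and function names are illustrative.

```python
# Figures quoted in the posts above (2009, 2022 and the 2035 projection).
TREES_TO_PRINT_1_ZB = 20e12        # trees required to print 1 zettabyte
TREES_ON_EARTH = 3.5e12            # trees on Earth
ZB_CREATED_2022 = 100              # zettabytes created in 2022
SERVERS_2022 = 70_000_000          # servers required to store data in 2022
EWASTE_SERVERS_2022 = 20_000_000   # servers that became e-waste in 2022
ZB_CREATED_2035 = 2_100            # projected zettabytes per year in 2035


def scale_linearly(baseline: float, factor: float) -> float:
    """Assumption of this sketch: the quantity grows in proportion to data created."""
    return baseline * factor


# How many Earths' worth of trees one printed zettabyte would need.
earths_of_trees = TREES_TO_PRINT_1_ZB / TREES_ON_EARTH

# Growth in yearly data creation between 2022 and 2035.
growth = ZB_CREATED_2035 / ZB_CREATED_2022

servers_2035 = scale_linearly(SERVERS_2022, growth)
ewaste_2035 = scale_linearly(EWASTE_SERVERS_2022, growth)

print(f"Earths of trees per printed zettabyte: {earths_of_trees:.1f}")
print(f"Data growth 2022 -> 2035: {growth:.0f}x")
print(f"Servers needed in 2035: {servers_2035 / 1e9:.2f} billion")
print(f"E-waste servers per year in 2035: {ewaste_2035 / 1e6:.0f} million")
```

       Under that linear assumption the 21x growth in data gives about 1.47 billion servers and 420 million e-waste servers a year, close to the 1.5 billion and 400 million quoted in the post.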
       
 (DIR) Post #AZKDaNioEWME4dGJJg by mistakenotmy@universeodon.com
       2023-08-31T08:53:59Z
       
       0 likes, 0 repeats
       
       
 (DIR) Post #AZKDaOdsoK0kvdxtHE by gerrymcgovern@mastodon.green
       2023-08-31T09:28:44Z
       
       0 likes, 0 repeats
       
       @mistakenotmy It's only a strange argument in the weird world of digital and IT. A typical archive will keep no more than 5% of available objects. Libraries are always getting rid of old books to make way for new.
       We are facing a data waste crisis. Data growth is out of control. Data center growth is exploding, and we will require up to 20 times more data centers by 2035 to keep up with data growth.
       
 (DIR) Post #AZKDaPiApr2OFEypbU by NatureMC@mastodon.online
       2023-08-31T10:07:17Z
       
       0 likes, 0 repeats
       
       @gerrymcgovern 1/2 How did you arrive at these numbers? Archives and national libraries are working hard on digital conservation. It is not waste, it's the conservation of cultural heritage, science, knowledge etc. They don't throw away these data; on the contrary, they research how the data can be migrated in the future. Only one example: In archaeology, 3D scans of buildings and findings enable research when the original has long since fallen victim to erosion or war. @mistakenotmy
       
 (DIR) Post #AZKDaRflXmIeKLrw80 by gerrymcgovern@mastodon.green
       2023-08-31T10:25:51Z
       
       0 likes, 0 repeats
       
       @NatureMC @mistakenotmy We are having a massive impact on the environment storing all this stuff. Manufacturing a server can cause 1-2 tons of CO2, multiple tons of toxic mining waste and hundreds of thousands of liters of wastewater. In 2022, we had 70 million servers storing stuff, and we were destroying 20 million a year. To meet 2035 data needs, we'll need over a billion servers. It's not remotely sustainable. Digital is physical.
       
 (DIR) Post #AZKDaSQYjnjKfTlJ44 by NatureMC@mastodon.online
       2023-08-31T10:12:35Z
       
       0 likes, 0 repeats
       
       @gerrymcgovern 2/2 I work in a very tiny museum and it may not be typical. We only throw objects out of our archives if they are damaged beyond repair. We are investing a lot in digital capture (not fully done so far). This allows us to also record for posterity the objects that cannot be conserved or repaired. And we'll keep the data even for scientists who come along later than 90 days from now.😉 @mistakenotmy
       
 (DIR) Post #AZKDaT1QWjDkVp0kWu by NatureMC@mastodon.online
       2023-08-31T10:45:51Z
       
       0 likes, 0 repeats
       
       @gerrymcgovern Yes, but so you advocate throwing away our #history and #culturalHeritage, our #knowledge? I hope that I misunderstood you?
       Talking about #librairies and #archives: cultural assets are specially protected (Hague Convention et al.). #UNESCO is involved in the preservation of #digitalHeritage: https://en.unesco.org/about-us/legal-affairs/charter-preservation-digital-heritage
       So, please don't lump all data together. These are part of our civilisation! Wouldn't it make more sense to start with companies or social media?! @sohkamyung
       
 (DIR) Post #AZKDaVRjVtS41zKmoK by gerrymcgovern@mastodon.green
       2023-08-31T10:54:01Z
       
       0 likes, 0 repeats
       
       @NatureMC @sohkamyung What I am saying is that digital has huge and growing costs. Every piece of data we store places a demand on materials, water, energy. It is not cost free. We need to make wise decisions about what to store and not store.
       Right now, we are destroying the civilization of indigenous people all over the world, to mine the lithium and other metals needed for the explosive growth of digital. Our digital lifestyle has massive consequences for the environment.
       
 (DIR) Post #AZKDaWnkTWekEYdslU by NatureMC@mastodon.online
       2023-08-31T11:21:57Z
       
       0 likes, 0 repeats
       
       @gerrymcgovern We are in complete agreement that we have to do something and why! Nevertheless, I don't understand why you seem to be so obsessed with cultural heritage. Imagine what the world would have become if indigenous peoples at least had been able to digitally preserve their cultural heritage, which the colonialists destroyed. Do you realise what such destruction can mean? It's called cultural genocide. These mines are not created primarily for libraries, but for cars, etc. @sohkamyung
       
 (DIR) Post #AZKDade51s57SbAlAO by NatureMC@mastodon.online
       2023-08-31T10:52:07Z
       
       0 likes, 0 repeats
       
       @gerrymcgovern Mind you, my criticism is not against the fact that we have to do something in general.
       But I think that important archives and national libraries are not the lion's share. And we certainly cannot and do not want to fall behind our UNESCO achievements etc.!
       Take the Taliban's iconoclasm: we have only digital remains today! https://www.getty.edu/publications/cultural-heritage-mass-atrocities/part-1/03-parzinger/ @sohkamyung
       
 (DIR) Post #AZKDaeDAvO9dDRamrw by gerrymcgovern@mastodon.green
       2023-08-31T11:30:08Z
       
       0 likes, 0 repeats
       
       @NatureMC @sohkamyung Actually, my focus wasn't at all on cultural heritage in the original post, just about how we store so much crap data. :-)
       
 (DIR) Post #AZKDagIZAJefgk87Y8 by FreePietje@x0f.org
       2023-09-01T13:52:41Z
       
       0 likes, 0 repeats
       
       @gerrymcgovern @NatureMC @sohkamyung
       > we store so much crap data
       It seems to me that got lost in the discussion.
       UNESCO: "Not all digital materials are of enduring value"
       And a museum makes a *pre*selection of what they think is worth saving.
       But with digital data we tend to keep everything and not make a choice whether it's worth saving, assuming there's no cost to it.
       How many people go through the photos they've taken and delete all the crappy ones or otherwise not worth saving?
       
 (DIR) Post #AZKG1zXTwAc15aFzhw by tomcrinstam@universeodon.com
       2023-09-01T11:50:05Z
       
       0 likes, 0 repeats
       
       @gerrymcgovern If we targeted duplicated data and companies that collect and store personal data they don't need except for customer manipulation, how would that change these numbers?
       
 (DIR) Post #AZKG20Nwn6ZziIntU8 by gerrymcgovern@mastodon.green
       2023-09-01T12:06:06Z
       
       0 likes, 0 repeats
       
       @tomcrinstam It would be a start, for sure. There's an incredible amount of unnecessary duplication.
       
 (DIR) Post #AZKG2120OAcdiXXsvI by tomcrinstam@universeodon.com
       2023-09-01T12:18:57Z
       
       0 likes, 1 repeats
       
       @gerrymcgovern I totally understand your points, but seeing "we need to delete 90% of data" causes about the same reaction in me that I have when I see people advocating for the burning of books. Mostly because I can see this being used to target 'uncomfortable info'.
       Just as a real world example, Musk deleting information about the Arab Spring that was solely on Twitter servers. "It's not my fault, server storage costs to the environment made it necessary."
       It's like a library saying they have to destroy 90% of their books with no regard to what that 90% is, allowing the wholesale destruction of knowledge under the guise of environmental protection.
       I'm almost positive that isn't your intent, but I'm almost positive it would be used that way.
       
 (DIR) Post #AZKHtN2JPCSdGah7Lc by tyx@lor.sh
       2023-09-01T14:41:01Z
       
       0 likes, 0 repeats
       
       @gerrymcgovern
       I personally will store each and every piece of potentially valuable data I get my hands on: hundreds of thousands of field photos, sequencing data, papers used for projects, news I read, tons of UA war footage, social network screenshots. Even software distributions, unless I'm 99% sure that they'll be available in 20 years from noncommercial public repositories.
       Why?
       Because I have tried many times to verify statements about past political events just to find that there are no publicly available sources left other than one manipulative paragraph in a wiki.
       More than once I tried to reproduce an analysis from a 10-year-old paper just to find that the legal ways to obtain the software the authors used aren't there anymore.
       I have realized that a sample I processed a decade ago was misidentified.
       And your reasoning has one flaw here:
       >70 million servers required to store data in 2022
       To store data you need only a piece of magnetized material. You need a server for a very different reason: to access/process the data.
       And often it's your personal data for someone else's profit.
       Data storage is fine. (consider donating to @internetarchive BTW) Clouds and marketing-purpose data mining are not.
       
 (DIR) Post #AZKIXgxojcK0ebopQe by tyx@lor.sh
       2023-09-01T14:48:19Z
       
       0 likes, 0 repeats
       
       @gerrymcgovern >91% of content gets no traffic from Google
       And this one makes me particularly furious - ever tried to find a repair manual for something from the 90s, or an IC datasheet from 2001, for example? I have tried desperately many times. And thanks to all the people who kept these 91% up and running. And I hope the techno-gods will send a good GW-scale atmospheric zap to everyone who thought, reading this, "ofc google should just drop this 91% from the database, no big deal".
       #repair #data #storage #archive #internetarchive