[HN Gopher] A Face Is Exposed for AOL Searcher No. 4417749 (2006)
___________________________________________________________________
A Face Is Exposed for AOL Searcher No. 4417749 (2006)
Author : acqbu
Score : 108 points
Date : 2024-02-25 09:22 UTC (13 hours ago)
(HTM) web link (www.nytimes.com)
(TXT) w3m dump (www.nytimes.com)
| acqbu wrote:
| https://archive.is/sfAMv
| kleiba wrote:
| Couldn't this functionality be automated somehow? Every time
| there's a link on HN to a paywalled article, I have to do the
| same dance:
|
| 1. Click on the link
|
| 2. Find out it's behind a paywall
|
| 3. Go back in the browser
|
| 4. Click on the "comments" link.
|
| 5. Look for the post that has the archive.is version of it.
|
| 6. Click on that.
|
| Surely that could somehow be collapsed into just a single
| click?
| hoherd wrote:
| Open the current page in archive.li with this bookmarklet
| (URL-decoded here for readability):
|
|   javascript:(function(){window.location = "https://archive.li/" + window.location;})()
| rsaarelm wrote:
| You can go to the browser URL field, type "archive.is/" in
| front of the URL, and press enter after step 2. It'll either
| redirect you to an existing archive page or let you create one
| if one doesn't exist.
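| For example, for https://example.com/article you'd end up
| navigating to:
|
|   archive.is/https://example.com/article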
| Zambyte wrote:
| For some reason it isn't loading for me, but if you use a
| search engine that supports bangs in your URL bar (DDG or
| Kagi), you can prefix the URL with !ais and just search that.
| Same goes for !wbm or !ia for the Wayback Machine.
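| E.g. searching
|
|   !ais https://example.com/article
|
| jumps straight to the archive.today copy of that page.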
| extraduder_ire wrote:
| I get good use out of this browser extension:
| https://github.com/arantius/resurrect-pages
|
| Sites are usually archived already.
| danjc wrote:
| It can't be a first-party feature.
| rsaarelm wrote:
| Here's how this could be done as an HN-side feature with zero
| interaction with archive.is servers (sketched below):
|
| * Compile a list of domains like nytimes.com that have soft
| paywalls.
|
| * When a link like https://example.com/ is submitted and its
| domain is on the paywall list, insert
| [archive](https://archive.is/timegate/https://example.com/)
| after it in the title area. Just prefix the timegate part and
| it's a working link.
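|
| A minimal sketch of that insertion logic (the domain list and
| function name are hypothetical; the timegate prefix is the
| real one described above):
|
|   // Hypothetical soft-paywall domain list.
|   const PAYWALLED = new Set(["nytimes.com", "example.com"]);
|
|   // Return an archive link for a submitted URL, or null if
|   // its domain isn't on the soft-paywall list.
|   function archiveLink(submittedUrl) {
|     const host = new URL(submittedUrl).hostname
|       .replace(/^www\./, "");
|     if (!PAYWALLED.has(host)) return null;
|     return "https://archive.is/timegate/" + submittedUrl;
|   }
|
|   // archiveLink("https://example.com/story")
|   //   -> "https://archive.is/timegate/https://example.com/story"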
| c22 wrote:
| I usually just start with the comments. If I see an archive
| link I'll use that (assuming I've determined that the source
| article is worth reading at all).
| DicIfTEx wrote:
| There was also a theatre production based on AOL User 927:
| https://arstechnica.com/uncategorized/2008/05/uare-what-u-se...
|
| And a documentary series about User 711391:
| https://www.imdb.com/title/tt1455044/
| samwillis wrote:
| This is important to look back on in the context of what's
| happening now with AI tools. This story is obviously about the
| public leak of the data, but what it shows is the profiling
| that is available to corporations.
|
| Search has exposed so much data about ourselves to the services
| we use with very little regulation on what they are permitted to
| do with it inside their own walls.
|
| My fear with AI is that we are moving toward sending even more
| data to third-party services. Tools such as Copilot (which I
| enjoy using) are a gold mine for behavioural analysis. The
| profiling that will be possible with these tools is
| extraordinary, and we don't yet fully understand the
| implications.
|
| It's because of this that I'm a massive proponent of "Local AI".
| We need to be pushing for the industry to adopt a local inference
| architecture asap. It needs to become the standard pattern as
| early as possible to reduce the risk of the AI revolution being a
| repeat of the invasive internet search and advertising industry.
| Spooky23 wrote:
| The issue with AI is that people are going to be creating work
| product with it, and it doesn't require the extensive
| infrastructure that search has.
|
| Google knows a lot about your behavior - they can and have been
| able to correlate online behavior with health and meatspace
| actions to identify budding extremists or people at risk of
| addiction, etc. AI will bring that capability to business
| processes.
|
| With the number of little companies that are springing up, it
| will become much easier for outside parties to figure out how
| institutions work. This capability exists, but it's gated by
| Google and Microsoft, and they have drawn lines to protect the
| overall business. Some jackass will install a creepy AI tool
| to scrape Outlook, and sales guys will be able to get a
| profile of who makes what decisions in a company, for example.
| dartos wrote:
| AI does take a ton of infrastructure. You need data
| collection and curation. Massive amounts of training
| hardware.
|
| And a large infrastructure to ensure you can scale. No easy
| feat with the current GPU stack.
| hunter2_ wrote:
| Are there not relatively tiny workloads that fall under the
| umbrella of AI? Or does the term itself inherently refer to
| intense workloads, like how the phrase "big data" refers to
| sets that can't be processed by typical means?
| tomoyoirl wrote:
| Training AIs is almost always done with massive data, as
| small data usually doesn't have sufficient information
| content, statistically speaking, to build a good model
| from it. Certainly this is the case for the generative
| models.
| dartos wrote:
| I think AI refers to large models.
|
| Smaller probabilistic models, like linear regression, I'd
| call machine learning.
| Spooky23 wrote:
| It takes a ton of infrastructure to maintain a sustained
| misinformation campaign.
|
| Yet cloud providers happily sell resources and APIs to
| unethical companies. They rightfully don't insert
| themselves into most legal business matters, with
| exceptions.
| paulmd wrote:
| > Google knows alot about your behavior - they can and have
| been able to correlate online behavior with health and
| meatspace actions to identify budding extremists or people at
| risk of addiction, etc. AI will bring that capability to
| business processes.
|
| not in the EU it won't.
|
| if you can de-anonymize people from the data it's not
| anonymous, and collecting this data at all would be illegal
| in the EU without user consent, unless it's being used solely
| for the purpose of delivering the service.
| whoisthemachine wrote:
| I fully agree. If local AI gains traction, then we actually
| have a unique opportunity to take away some of that massive
| profiling, as some of what you use search engines for today
| can be done by AI. This may be why there's some fear-mongering
| around truly open-source AI and the like from the big players.
| akira2501 wrote:
| Noise generation is the only answer the common person has
| against the giant corporate machine.
|
| You need a personal "AI" that just does random searches
| unconnected with your life, constantly, in the background, and
| then injects this data into all the portals that are watching
| you.
|
| Ultimately, their data will become dominated by noise and
| useless, to the point of destroying the value of the entire
| enterprise and its data collection mechanisms in the first
| place.
|
| No matter how many tools you make "local only" you're only a
| forgotten "send telemetry back to the mothership" checkbox away
| from being right where you started.
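|
| A toy sketch of the noise generator described above (the
| decoy list is hypothetical and the DDG HTML endpoint is just
| one example target; real trackers may well filter naive noise
| like this):
|
|   // Hypothetical pool of decoy queries unrelated to the user.
|   const DECOYS = ["weather radar", "pasta recipe",
|                   "used cars", "hiking boots"];
|
|   async function injectNoise() {
|     const q = DECOYS[Math.floor(Math.random() * DECOYS.length)];
|     // Fire a background search the watchers will record.
|     await fetch("https://html.duckduckgo.com/html/?q=" +
|                 encodeURIComponent(q));
|     // Wait 1-10 minutes so the traffic doesn't look scripted.
|     setTimeout(injectNoise, (1 + 9 * Math.random()) * 60_000);
|   }
|   injectNoise();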
| cj wrote:
| This might work at an individual level, but it isn't scalable
| to the general population.
|
| It's hard to imagine any population-scale solution that
| doesn't involve regulation.
|
| The biggest problem with regulation (in my eyes) is how it
| plays into competition between countries. E.g. if the US
| imposes restrictions on technology, innovation is incentivized
| to happen elsewhere. The EU has been bold on the privacy
| regulatory front with GDPR and the like, and has probably
| lost out on immeasurable monetary gains as a result. There's
| a huge cost to regulation, but it works.
| jll29 wrote:
| That episode (releasing the AOL search query log file for
| research purposes, and the aftermath) led to some firings at
| the company, but some information retrieval researchers used
| this log to conduct important experiments.
|
| The "60s lady with the dog that kept peeing on her sofa" got
| her hour of fame, and the whole thing became a case study in
| de-anonymization.
|
| A few pointers:
|
| https://en.wikipedia.org/wiki/AOL_search_log_release
|
| https://www.researchgate.net/publication/233390862_Privacy_P...
|
| https://github.com/wasiahmad/aol_query_log_analysis
|
| https://www.technologyreview.com/2006/08/15/100592/who-benef...
|
| https://www.sciencedirect.com/science/article/abs/pii/S00200...
|
| https://isquared.wordpress.com/2014/04/24/mining-search-logs...
| elzbardico wrote:
| 2006... If they only knew....
| rvnx wrote:
| Now people create accounts to get their searches directly
| tied to their profile.
| HeatrayEnjoyer wrote:
| Like that new search engine that's better than Google?
| alwa wrote:
| You mean Kagi? They're pretty transparent about how they
| approach query data, and it's as privacy-friendly as any
| I've seen.
|
| https://kagi.com/privacy
| warkdarrior wrote:
| > Kagi [...] [is] as privacy-friendly as any I've seen
|
| For now
| rvnx wrote:
| No, I wasn't referring to Kagi (which is built by a hard-
| working guy, btw), just in general to the trend that the
| internet is now completely different when seen from a 2006
| perspective.
|
| Storage was expensive, and data wasn't seen as a goldmine the
| way it is now, so most long-term logs went to /dev/null.
|
| The norm now is to ask users to create an account, to employ
| data scientists (whose goal is precisely to find needles in
| haystacks), etc.
| piperswe wrote:
| Kagi makes it very clear in their privacy policy and in
| their settings that search queries are not saved. If they
| save them regardless, that's a clear-cut violation of the
| law.
|
| From their settings page:
|
| > Save My Search History
| > Currently this option can not be turned on. Kagi does not
| > save any searches by default. In the future we may add
| > features that will utilize your search history and then we
| > will allow you to enable this.
|
| It sure seems like it will always be opt-in, even if they
| add query saving in the future.
| rvnx wrote:
| On tons of search engines and AI services, you are nudged
| or required to have an account (which forces you to de-
| anonymize yourself and/or link your activity to a specific
| account).
|
| 20 years ago, when this leak happened, the situation wasn't
| like that.
|
| Gmail was barely born, so Google accounts didn't make
| sense.
|
| The article was like "wow we managed to deanonymize a
| search query", but that's actually the norm now.
|
| Essentially, this scandalous AOL leak became a legitimized
| everyday routine (sadly).
|
| Today, when you type anything into a ChatGPT-like AI app
| (this is also the case for many search engines), there are
| tons of contractors, workers and partners who have access to
| that dataset:
|
| researchers, engineers, support, advertising platforms,
| technical intermediaries, legal, etc.
|
| Though the future isn't all gloomy; in the short term, with
| the advent of LLMs, we may actually see a really good
| solution privacy-wise: fully local answers.
|
| Which means that, for the first time, queries and questions
| need not leave the device or be sent to anyone you'd have to
| trust.
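|
| A minimal sketch of that local pattern, assuming a locally
| running Ollama server on its default port (the port and the
| model name here are assumptions):
|
|   async function askLocally(prompt) {
|     // The prompt is answered on-device; nothing leaves
|     // localhost.
|     const res = await fetch("http://localhost:11434/api/generate", {
|       method: "POST",
|       body: JSON.stringify({
|         model: "llama3",   // assumed locally pulled model
|         prompt,
|         stream: false,
|       }),
|     });
|     return (await res.json()).response;
|   }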
| karaterobot wrote:
| Re-identification of supposedly anonymous data was a problem
| twenty years ago, and is a bigger problem today. Soon it may
| become a crisis, as the tools needed to do it become more and
| more turnkey, effective, and commodified. Now we dox as in a
| glass, darkly, etc.
| neonate wrote:
| https://web.archive.org/web/20170715075814/https://www.nytim...
___________________________________________________________________
(page generated 2024-02-25 23:01 UTC)