[HN Gopher] Search tool that only returns content created before...
___________________________________________________________________
Search tool that only returns content created before ChatGPT's
public release
Author : dmitrygr
Score : 822 points
Date : 2025-12-01 04:06 UTC (18 hours ago)
(HTM) web link (tegabrain.com)
(TXT) w3m dump (tegabrain.com)
| johng wrote:
| I don't know how this works under the hood but it seems like no
| matter how it works, it could be gamed quite easily.
| cryzinger wrote:
| If it's just using Google search "before <x date>" filtering I
| don't _think_ there's a way to game it... but I guess that
| depends on whether Google uses the date that it indexed a page
| versus the date that a page itself declares.
| madars wrote:
| Date displayed in Google Search results is often the self-
| described date from the document itself. Take a look at this
| "FOIA + before Jan 1, 1990" search: https://www.google.com/se
| arch?q=foia&tbs=cdr:1,cd_max:1/1/19...
|
| None of these documents were actually published on the web by
| then, including a Watergate PDF bearing a date of Nov 21, 1974 -
| almost 20 years before the PDF format was released. Of course,
| the WWW itself started in 1991.
|
| Google Search's date filter is useful for finding documents
| _about_ historical topics, but unreliable for proving when
| information actually became publicly available online.
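|
| For reference, a minimal sketch of how such a date-restricted
| query URL can be assembled (the tbs=cdr/cd_min/cd_max fields are
| the ones visible in the link above; the Python wrapper itself is
| just illustrative):
|
|     from urllib.parse import urlencode
|
|     def date_restricted_search(query, max_date, min_date=None):
|         # Google's custom date range filter: cdr:1 enables it,
|         # cd_min/cd_max take M/D/YYYY dates.
|         tbs = f"cdr:1,cd_max:{max_date}"
|         if min_date:
|             tbs += f",cd_min:{min_date}"
|         return ("https://www.google.com/search?"
|                 + urlencode({"q": query, "tbs": tbs}))
|
|     # e.g. the "FOIA before 1990" search mentioned above
|     print(date_restricted_search("foia", "1/1/1990"))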
| littlestymaar wrote:
| Are you sure it works the same way for documents that
| Google indexed at the time of publication? (Because
| obviously for things that existed before Google, they had
| to accept the publication date at face value).
| madars wrote:
| Yes, it works the same way even for content Google
| indexed at publication time. For example, here are
| chatgpt.com links that Google displays as being from
| 2010-2020, a period when Google existed but ChatGPT did
| not:
|
| https://www.google.com/search?q=site%3Achatgpt.com&tbs=cd
| r%3...
|
| So it looks like Google uses inferred dates over its own
| indexing timestamps, even for recently crawled pages from
| domains that didn't exist during the claimed date range.
| littlestymaar wrote:
| Interesting, thanks.
|
| I wonder why they do that when they could use time of
| first indexing instead.
| qwertygnu wrote:
| True, but there are probably many ways to do this, and unless AI
| content starts falsifying tons of its metadata (which I'm sure
| would have other consequences), there's definitely a way.
|
| Plus other sites that link to the content could also give away
| its date of creation, which is out of the control of the AI
| content.
| layman51 wrote:
| I have heard of a forum (I believe it was Physics Forums)
| which was very popular in the older days of the internet
| where some of the older posts were actually edited so that
| they were completely rewritten with new content. I forget
| what the reasoning behind it was, but it did feel shady and
| unethical. If I remember correctly, the impetus behind it was
| that the website probably went under new ownership and the
| new owners felt that it was okay to take over the accounts of
| people who hadn't logged on in several years and to
| completely rewrite the content of their posts.
|
| I believe I learned about it through HN, and it was this blog
| post: https://hallofdreams.org/posts/physicsforums/
|
| It kind of reminds me of why some people really covet older
| accounts when they are trying to do a social engineering
| attack.
| joshuaissac wrote:
| > website probably went under new ownership
|
| According to the article, it was the founder himself who
| was doing this.
| CGamesPlay wrote:
| "Gamed quite easily" seems like a stretch, given that the
| target is definitionally not moving. The search engine is
| fundamentally searching an immutable dataset that "just" needs
| to be cleaned.
| johng wrote:
| How? They have an index from a previous date and nothing new
| will be allowed since that date? A whole copy of the
| internet? I don't think so.... I'm guessing, like others,
| it's based on the date the user/website/blog lists in the
| post. Which they can change at any time.
| fragmede wrote:
| Yes they do. It's called common crawl, and is available
| from your chosen hyperscaler vendor.
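|
| For the curious, the Common Crawl index can also be queried
| directly to see when a URL was first captured; a minimal sketch
| (the CC-MAIN id below is just one example crawl, collinfo.json
| lists the rest, and error handling is omitted):
|
|     import json
|     import urllib.parse
|     import urllib.request
|
|     # One pre-ChatGPT crawl, chosen as an example.
|     INDEX = "https://index.commoncrawl.org/CC-MAIN-2022-40-index"
|
|     def earliest_capture(url):
|         q = INDEX + "?" + urllib.parse.urlencode(
|             {"url": url, "output": "json"})
|         with urllib.request.urlopen(q) as resp:
|             lines = resp.read().splitlines()
|         records = [json.loads(line) for line in lines]
|         if not records:
|             return None
|         # timestamps are YYYYMMDDhhmmss strings
|         return min(r["timestamp"] for r in records)
|
|     print(earliest_capture("example.com/"))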
| 1gn15 wrote:
| Does this filter out traditional SEO blogfarms?
| JKCalhoun wrote:
| Yeah, might prefer AI-slop to marketing-slop.
| al_borland wrote:
| They are the same. I was looking for something and tried AI.
| It gave me a list of stuff. When I asked for its sources, it
| linked me to some SEO/Amazon affiliate slop.
|
| All AI is doing is making it harder to know what is good
| information and what is slop, because it obscures the source,
| or people ignore the source links.
| venturecruelty wrote:
| I've started just going to more things in person, asking
| friends for recommendations, and reading more books
| (should've been doing all of these anyway). There are some
| niche communities online I still like, and the fediverse is
| really neat, but I'm not sure we can stem the Great Pacific
| Garbage Patch-levels of slop, at this point. It's really
| sad. The web, as we know and love it, is well and truly
| dead.
| swyx wrote:
| somebody said once we are mining "low-background tokens" like we
| are mining low-background (radiation) steel post WW2 and i
| couldnt shake the concept out of my head
|
| (wrote up in https://www.latent.space/i/139368545/the-concept-of-
| low-back... - but ironically repeating something somebody else
| said online is kinda what i'm willingly participating in, and
| it's unclear why human-origin tokens should be that much higher
| signal than ai-origin ones)
| jeffchuber wrote:
| that was me swyx
| rollulus wrote:
| Multiple people have coined the idea repeatedly, way before
| you. The oldest comment on HN I could find was in December
| 2022 by user spawarotti:
| https://news.ycombinator.com/item?id=33856172
| threeducks wrote:
| Here is an even older comment chain about it from 2020:
| https://news.ycombinator.com/item?id=23895706
|
| Apparently, comparing low-background steel to pre-LLM text
| is a rather obvious analogy.
| rollulus wrote:
| Oh wow, great find! That's really early days.
| pseidemann wrote:
| Also, people often do think alike.
|
| If you have a thought, it's likely not new.
| jeffchuber wrote:
| i didnt claim to invent it.
|
| i claimed swyx heard it through me - which he did
| swyx wrote:
| you did!!
| jrjfjgkrj wrote:
| every human generation built upon the slop of the previous one
|
| but we appreciated that, we called it "standing on the
| shoulders of giants"
| rebuilder wrote:
| That's because the things we built on weren't slop
| walrusted wrote:
| the only structure you can build with slop is a burial mound
| Dilettante_ wrote:
| What is unhardened concrete but slop?
| hoppp wrote:
| You can't build on slop because slop is a slippery slope
| Dilettante_ wrote:
| Maybe we'll have to slurp the slop so we don't slip on the
| slope.
| kgwgk wrote:
| Nothing conveys better the idea of a solid foundation to
| build upon than the word 'slop'.
| teiferer wrote:
| Because the pyramids, the theory of general relativity and
| the Linux kernel are all totally comparable to ChatGPT
| output. /s
|
| Why is anybody still surprised that the AI bubble made it
| that big?
| jrjfjgkrj wrote:
| for every theory of relativity there is the religious nonsense
| and superstitions of the medieval ages or today
| JumpCrisscross wrote:
| > _for every theory of relativity there is the religious
| nonsense and superstitions of the medieval ages or
| today_
|
| If Einstein came up with relativity by standing on "the
| religious non-sense and superstitions of the medieval
| ages," you'd have a point.
| scotty79 wrote:
| If we have billions of AIs one might pick the correct
| learning materials. Same way human Einstein did.
| flir wrote:
| I know we're just pointlessly abusing the analogy here,
| but... mediaeval cathedrals are a greater work of
| artifice than pyramids.
| bigiain wrote:
| > we called it "standing on the shoulders of giants"
|
| We do not see nearly so far though.
|
| Because these days we are standing on the shoulders of giants
| that have been put into a blender and ground down into a
| slippery pink paste and levelled out to a statistically
| typical 7.3mm high layer of goo.
| _kb wrote:
| The secret is you then have to heat up that goo. When the
| temperature gets high enough things get interesting again.
| pseidemann wrote:
| Just simulate some evolution here and there.
| gilleain wrote:
| You get Flubber?
| groestl wrote:
| We have two optimization mechanisms though which reduce noise
| with respect to their optimization functions: evolution and
| science. They are implicitly part of "standing on the
| shoulders of giants", you pick the giant to stand on (or it
| is picked for you).
|
| Whether or not those optimization functions align with human
| survival, and thus whether our whole existence is slop or not,
| we're about to find out.
| shevy-java wrote:
| This sounds like an Alan Kay quote. He meant that in regards
| to useful inventions. AI-generated spam just decreases the
| quality. We'd need a real alternative to this garbage from
| Google but all the other search engines are also bad. And
| their UI is also horrible - not as bad as Google, but also
| bad. Qwant just tries to copy/paste Google for instance
| (though interestingly enough, sometimes it has better results
| than Google - but also fewer in general, even ignoring false
| positive results).
| visarga wrote:
| Deep Research reports, I think, are above average internet
| quality: they collect hundreds of sources, synthesize and
| contrast them, and provide backlinks. Almost like a generative
| Wikipedia.
|
| I think all we can expect from internet information is a
| good description of the distribution of materials out
| there, not truth. This is totally within the capabilities
| of LLMs. For additional confidence run 3 reports on
| different models.
| ben_w wrote:
| There's a reason this is comedy:
|
|     Listen, lad. I built this kingdom up from nothing. When I
|     started here, all there was was swamp. Other kings said I
|     was daft to build a castle on a swamp, but I built it all
|     the same, just to show 'em. It sank into the swamp. So, I
|     built a second one. That sank into the swamp. So, I built
|     a third one. That burned down, fell over, then sank into
|     the swamp, but the fourth one... stayed up! And that's
|     what you're gonna get, lad: the strongest castle in these
|     islands.
|
| While this is religious:
|
|     [24] "Everyone then who hears these words of mine and does
|     them will be like a wise man who built his house on the
|     rock. [25] And the rain fell, and the floods came, and the
|     winds blew and beat on that house, but it did not fall,
|     because it had been founded on the rock. [26] And everyone
|     who hears these words of mine and does not do them will be
|     like a foolish man who built his house on the sand. [27]
|     And the rain fell, and the floods came, and the winds blew
|     and beat against that house, and it fell, and great was
|     the fall of it."
|
| Humans build not on each other's slop, but on each other's
| success.
|
| Capitalism, freedom of expression, the marketplace of ideas,
| democracy: at their best these things are ways to bend the
| wisdom of the crowds (such as it is) to the benefit of all;
| and their failures are when crowds are not wise.
|
| The "slop" of capitalism is polluted skies, soil and water,
| are wage slaves and fast fashion that barely lasts one use,
| and are the reason why workplace health and safety rules are
| written in blood. The "slop" of freedom of expression
| includes dishonest marketing, libel, slander, and propaganda.
| The "slop" of democracy is populists promising everything to
| everyone with no way to deliver it all. The "slop" of the
| marketplace of ideas is every idiot demanding their own un-
| informed rambling be given the same weight as the considered
| opinions of experts.
|
| None of these things contributed to our social, technological,
| or economic advancement; they are simply things which
| happened at the same time.
|
| AI has stuff to contribute, but using it to make an endless
| feed of mediocrity is not it. As for the flood of low-effort
| GenAI stuff filling feeds and drowning signal with noise, as
| others have said: just give us your prompt.
| pseidemann wrote:
| You may have one point.
|
| The industrial age was built on dinosaur slop, and they were
| giant.
| Mistletoe wrote:
| How to make fire or kill a woolly mammoth was not slop, come
| on.
| mwidell wrote:
| Low background steel is no longer necessary.
|
| "...began to fall in 1963, when the Partial Nuclear Test Ban
| Treaty was enacted, and by 2008 it had decreased to only 0.005
| mSv/yr above natural levels. This has made special low-
| background steel no longer necessary for most radiation-
| sensitive uses, as new steel now has a low enough radioactive
| signature."
|
| https://en.wikipedia.org/wiki/Low-background_steel
| juvoly wrote:
| Interesting. I guess that analogously, we might find that X
| years after some future AI content production ban, we could
| similarly start ignoring the low background token issue?
| actionfromafar wrote:
| We used a rather low number of atmospheric bombs, while we
| are carpet bombing the internet every day with AI marketing
| copy.
| MadnessASAP wrote:
| The eternal September has finally ended. We've now
| entered the AI winter. It promises to be long, dark, and
| full of annoyances.
| embedding-shape wrote:
| "Winter" in AI (or cryptocurrency, or any at all)
| ecosystems denote a period of low activity, and a focus
| on fundamentals instead of driven by hype.
|
| What we're seeing now is something more like the peak of
| summer. If it ends up being a bubble, and it burtst, some
| months after that will be "AI Winter" as investors won't
| want to continue chucking money at problems anymore, and
| it'll go back to "in the background research" again, as
| it was before.
| MadnessASAP wrote:
| It was a continuation of the nuclear analogy: a nuclear
| winter following a large-scale nuclear exchange.
|
| Also, winter comes after September (fall).
| SecretDreams wrote:
| We're bombing the internet into extinction. But we were doing
| that way before AI. It got real bad during the
| SEO/monetization phase. AI was just the final nail.
| piker wrote:
| What's the half-life of a viral meme?
| doe88 wrote:
| Can't wait, in fifty years we will have our data clean again.
| alansaber wrote:
| Since synthetic data for training is pretty ubiquitous, this
| seems like a novelty.
| anticensor wrote:
| You should call it Predecember, referring to the eternal
| December.
| unfunco wrote:
| September?
| littlestymaar wrote:
| ChatGPT was released exactly 3 years ago (on the 30th of
| November) so December it is in this context.
| permo-w wrote:
| surely that would be eternal November then
| littlestymaar wrote:
| No, being released on Nov 30th means November was still
| before the slop era.
| retsibsi wrote:
| In the end the analogy doesn't really work, because
| 'eternal September' referred to what used to be a
| regular, temporary thing (an influx of noobs disrupting
| the online culture, before eventually leaving or
| assimilating) becoming the new normal. 'Eternal {month
| associated with ChatGPT}' doesn't fit because LLM-
| generated content was never a periodic phenomenon.
| hackable_sand wrote:
| AI R&D certainly _was_ periodic. Good thing we put a stop
| to that!
| permo-w wrote:
| to be honest, GPT-3, which was pretty solid and extremely
| capable of producing webslop, had been out for a good
| while before ChatGPT, and even GPT-2 had been used
| for blogslop years before that. maybe ChatGPT was the
| beginning of when the public became aware of it, but it
| was going on well beforehand. and, as the sibling
| commenter points out, the analogy doesn't quite fit
| structurally either
| AlecSchueler wrote:
| Yes, and this site is for everything before the slop era,
| hence eternal November.
| 123malware321 wrote:
| everything is dead after november passes
| anticensor wrote:
| aka 0 December
| themanmaran wrote:
| The low-background steel of the internet
|
| https://en.wikipedia.org/wiki/Low-background_steel
| HelloUsername wrote:
| As mentioned half a year ago at
| https://news.ycombinator.com/item?id=44239481
| thm wrote:
| As mentioned 7 months ago
| https://news.ycombinator.com/item?id=43811732
| Ginger-Pickles wrote:
| As mentioned in this thread :P
| https://news.ycombinator.com/item?id=46103662
| tkgally wrote:
| Somewhat related, the leaderboard of em-dash users on HN before
| ChatGPT:
|
| https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...
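|
| For anyone curious how a count like that can be reproduced for
| a single user, here is a rough sketch against the public Algolia
| HN Search API (the username is a placeholder, comment_text is
| HTML so &mdash; entities are not counted, and the cutoff is
| ChatGPT's release date):
|
|     import json
|     import urllib.parse
|     import urllib.request
|     from datetime import datetime, timezone
|
|     CUTOFF = int(datetime(2022, 11, 30,
|                           tzinfo=timezone.utc).timestamp())
|
|     def em_dash_count(username, pages=10):
|         total = 0
|         for page in range(pages):
|             params = urllib.parse.urlencode({
|                 "tags": f"comment,author_{username}",
|                 "numericFilters": f"created_at_i<{CUTOFF}",
|                 "hitsPerPage": 1000,
|                 "page": page,
|             })
|             url = ("https://hn.algolia.com/api/v1/search_by_date?"
|                    + params)
|             with urllib.request.urlopen(url) as resp:
|                 hits = json.load(resp).get("hits", [])
|             if not hits:
|                 break
|             for h in hits:
|                 text = h.get("comment_text") or ""
|                 total += text.count("\u2014")
|         return total
|
|     print(em_dash_count("someuser"))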
| maplethorpe wrote:
| They should include users who used a double hyphen, too -- not
| everyone has easy access to em dashes.
| venturecruelty wrote:
| Oof, I feel like you'll accidentally capture a lot of
| getopt_long() fans. ;)
| Kinrany wrote:
| Excluding those with asymmetrical whitespace around might
| be enough
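|
| Something along those lines could work; a tiny sketch (the
| patterns are just my guess at what "asymmetrical whitespace"
| would mean here):
|
|     import re
|
|     # "a--b" and "a -- b" look like deliberate em-dash stand-ins;
|     # "a --b" / "a-- b" (asymmetric spacing) look accidental.
|     ASYMMETRIC = re.compile(r"\S--\s|\s--\S")
|
|     for s in ["a -- b", "a--b", "a --b", "a-- b"]:
|         print(repr(s), bool(ASYMMETRIC.search(s)))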
| gblargg wrote:
| Does AI use double hyphens? I thought the point was to find
| who wasn't AI that used proper em dashes.
| jader201 wrote:
| Anytime I do this -- and I did it long before AI did --
| they are always em dashes, because iOS/macOS translates
| double dashes to em dashes.
|
| I think there may be a way to disable this, but I don't
| care enough to bother.
|
| If people want to think my posts are AI generated, oh well.
| teiferer wrote:
| There is also the difference in using space around em-
| dashes.
| JumpCrisscross wrote:
| > _Anytime I do this -- and I did it long before AI did
| -- they are always em dashes_
|
| It depends if you put the space before and after the
| dashes--that, to be clear, are meant to be there--or if
| you don't.
| oniony wrote:
| I cannot remember ever reading a book where there was a
| space around the dashes.
| kuschku wrote:
| That depends on the language -- whereas German puts
| spaces around --, English afaik usually doesn't.
|
| Similarly, French puts spaces _before_ and after ? !
| while English and German only put spaces afterwards.
|
| [EDIT: I originally wrote that French treats . , ! ?
| specially. In reality, French only treats ? and !
| specially.]
| iLoveOncall wrote:
| French doesn't put one before the period.
| bratwurst3000 wrote:
| French does "," and "." like the British and Germans; the
| rest is space before, space after.
| greenicon wrote:
| In German you use en-dashes with spaces, whereas in
| English it's em-dashes without spaces. Some people
| dislike em-dashes in English though and use en-dashes
| with spaces as well.
| JumpCrisscross wrote:
| > _whereas in English it's em-dashes without spaces_
|
| Didn't know! Woot, I win!
|
| Why does AI have a preference for doing it differently?
| dragonwriter wrote:
| In English, typically em-dashes are set without spaces or
| with thin spaces when used to separate
| appositives/parentheticals (though that style isn't
| universal even in professional print, there are places
| that set them open, and en-dashes set open can also be
| used in this role); when representing an interruption,
| they generally have no space before but frequently have
| space following. And other uses have other patterns.
| bloak wrote:
| In British English en-dashes with spaces is more common
| than em-dashes without spaces, I think, but I don't have
| any data for that, just a general impression.
| LoganDark wrote:
| Technically, there are supposed to be _hair spaces_
| around the dashes, not regular spaces. They're small
| enough to be sometimes confused for kerning.
| cachius wrote:
| Em dashes used as parenthetical dividers, and en dashes
| when used as word joiners, are usually set continuous
| with the text. However, such a dash can optionally be
| surrounded with a hair space, U+200A, or thin space, U+2009,
| or the HTML named entities &hairsp; and &thinsp;. These
| spaces are much thinner than a normal space (except in a
| monospaced (non-proportional) font), with the hair space
| in particular being the thinnest of horizontal whitespace
| characters.
|
| https://en.wikipedia.org/wiki/Whitespace_character#Hair_s
| pac...
|
| Typographers usually add space to the left side of the
| following marks: : ; " ' ! ? / ) ] } * » @ ® ™ ° + = - – —
|
| And they usually add space to the right of these:
| " ' / ( [ { « £ $ ¢ € # @ + = - – —
|
| https://www.smashingmagazine.com/2020/05/micro-
| typography-sp...
|
| 1. (letterpress typography) A piece of metal type used to
| create the narrowest space. 2. (typography, US) The
| narrowest space appearing between letters and
| punctuation.
|
| https://en.wiktionary.org/wiki/hair_space
|
| Now I'd like to see what the metal type looks like, but
| ehm... it's difficult googling it. Also a whole
| collection of space types and what they're called in
| other languages.
| fragmede wrote:
| What, no love for our friend the en-dash?
|
| - vs – vs —
| chickensong wrote:
| I once spent a day debugging some data that came from an
| English doc written by someone in Japan that had been
| pasted into a system and caused problems. Turned out to
| be an en-dash issue that was basically invisible to the
| eye. No love for en-dash!
| ben_w wrote:
| Similar.
|
| Compiler error while working on some ObjC. Nothing
| obviously wrong. Copy-pasted the line, same thing on the
| copy. Typed it out again, no issue with the re-typed
| version. Put the error version and the ok version next to
| each other, apparently identical.
|
| I ended up discovering I'd accidentally leant on the
| option key while pressing the "-"; monospace font, Xcode,
| em-dash and minus looked identical.
| 1718627440 wrote:
| This issue also exists with (so called) "smart" quotes.
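|
| A quick way to surface those invisible troublemakers in source
| files; a minimal sketch (the character list is just the usual
| suspects mentioned here, not exhaustive):
|
|     import sys
|     import unicodedata
|
|     # Characters that look like ASCII punctuation but aren't:
|     # en dash, em dash, minus sign, and the "smart" quotes.
|     SUSPECTS = set("\u2013\u2014\u2212\u2018\u2019\u201c\u201d")
|
|     def report(path):
|         with open(path, encoding="utf-8") as f:
|             for lineno, line in enumerate(f, 1):
|                 for col, ch in enumerate(line, 1):
|                     if ch in SUSPECTS:
|                         name = unicodedata.name(ch)
|                         print(f"{path}:{lineno}:{col}: {name}")
|
|     if __name__ == "__main__":
|         for path in sys.argv[1:]:
|             report(path)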
| fragmede wrote:
| Which, the iOS keyboard "helpfully" uses for you.
| withinboredom wrote:
| Especially when you're sending some quick scratch code in
| a slack message.
| mh- wrote:
| Pretty much the first thing I turn off on a new laptop
| (it's in the keyboard settings on iOS too.)
| bigiain wrote:
| That would false positive me. I have used double dashes to
| delimit quote attribution for decades.
|
| Like this:
|
| "You can't believe everything you read on the internet." --
| Abraham Lincoln, personal correspondence, 1863
| dragonwriter wrote:
| That's literally a standard use of em-dash being
| approximated by a double hyphen, though.
| SoftTalker wrote:
| Double-hyphen is an en-dash. Triple-hyphen is an em-dash.
| dragonwriter wrote:
| Double hyphen is replaced in some software with an en-dash
| (and in those, a triple hyphen is often replaced with an
| em-dash), and in some with an em-dash; it's usually used
| (other than as input to one of those pieces of software) in
| places where an em-dash would be appropriate, but in
| contexts where both an em-dash set closed and an en-dash
| set open might be used, it is often set open.
|
| So, it's not unambiguously a substitute for either; it is
| essentially its own punctuation mark used in ASCII-only
| environments, with some influence from both the use of
| em-dashes and that of en-dashes in more formal environments.
| a5c11 wrote:
| Apparently, it's not only the em-dash that's distinctive. I
| went through the comments of the leader and spotted that he
| also uses the backtick "'" instead of the apostrophe.
| kuschku wrote:
| I (~100 in the leaderboard, regardless of how you sort) also
| frequently use ' (unicode apostrophe) instead of ' :D
| baiwl wrote:
| Just to be clear this is done automatically by macOS or iOS
| browsers when configured properly.
| lxgr wrote:
| Amazing! But no love for en dashes?
| GaryBluto wrote:
| Why use this when you can use the before: syntax on most search
| engines?
| aDyslecticCrow wrote:
| Doesn't actually do anything anymore in Google or Bing.
| Thorrez wrote:
| Searching Google for
|
| chatgpt
|
| vs
|
| chatgpt before:2022-01-01
|
| gives me quite different results. In the 2nd query, most
| results have a date listed next to them in the results page,
| and that date is always prior to 2022. So the date filtering
| is "working". However, most of the dates are actually Google
| making a mistake and misinterpreting some unimportant date it
| found on the page as the date the page was created. At least
| one result is a YouTube video posted before 2022 whose title
| was edited after ChatGPT was released to mention ChatGPT.
|
| Disclosure: I work at Google, but not on search.
| progman32 wrote:
| Not affiliated, but I've been using kagi's date range filter to
| similar effect. The difference in results for car maintenance
| subjects is astounding (and slightly infuriating).
| permo-w wrote:
| besides training future models, is this really such a big
| deal? most of the AI-gened text content is just replacing
| content-farm SEO-spam anyway. the same stuff that any half-aware
| person wouldn't have read in the past is now slightly better
| written, using more em dashes and instances of the word "delve".
| if you're consistently being caught out by this stuff then
| likely you need to improve your search hygiene, nothing so
| drastic as this
|
| the only place I've ever had any issue with AI content is
| r/chess, where people love to ask ChatGPT a question and then
| post the answer as if they wrote it, half the time seemingly
| innocently, which, call me racist, but I suspect is mostly due to
| the influence of the large and young Indian contingent. otherwise
| I really don't understand where the issue lies. follow the exact
| same rules you do for avoiding SEO spam and you will be fine
| system2 wrote:
| Yes indeed, it is a problem. Now the good old sites have turned
| into AI-slop sites because they can't fight the spammers by
| writing slowly with humans.
| pajamasam wrote:
| SEO-spam was often at least somewhat factual and not complete
| generated garbage. Recipe sites, for example, usually have a
| button that lets you skip the SEO stuff and get to the actual
| recipe.
|
| Also, the AI slop is covering almost every sentence or phrase
| you can think of to search. Before, if I used more niche search
| phrases and exact searches, I was pretty much guaranteed to get
| specific results. Now, I have to wade through pages and pages
| of nonsense.
| Cadwhisker wrote:
| In the past, I'd find one wrong answer and I could easily spot
| the copies. Now there's a dozen different sites with the same
| wrong answer, just with better formatting and nicer text.
| finaard wrote:
| The trick is to only search for topics where there are no
| answers, or only one answer leading to that blog post you
| wrote 10 years ago and forgot about.
| zwnow wrote:
| Yes it is a big deal. I can't find new artists without having a
| fear of their art being AI generated, same for books and music.
| I also can't post my stuff to the internet anymore because I
| know it's going to be fed into LLM training data. The internet
| is dead to me mostly, and thankfully I've lost almost all
| interest in being on my computer as much as I used to be.
| darkwater wrote:
| > besides for training future models, is this really such a big
| deal? most of the AI-gened text content is just replacing
| content-farm SEO-spam anyway.
|
| Yes, it is, because of the other side of the coin. If you are
| writing human-generated, curated content, previously you would
| just do it in your small patch of the Internet, and SEs
| (Google...) would probably pick it up anyway because it was
| good quality content. You just didn't care about SEO-driven
| shit anyway. Now your nicely hand-written content is going to
| be fed into LLM training and it's going to be used - whether
| you want it or not - in the next generation of AI slop content.
| visarga wrote:
| It's not slop if it is inspired by good content. Basically
| you need to add your original spices into the soup to make it
| not slop, or have the LLM do deep-research kind of work to
| contrast among hundreds of sources.
|
| Slop did not originate from AI itself, but from the feed
| ranking algorithms which set the criteria for visibility.
| They "prompt" humans to write slop.
|
| AI slop is just an extension of this process, and it started
| long before LLMs. Platforms optimizing for their own interest
| at the expense of both users and creators is the source of
| slop.
| never_inline wrote:
| A colleague sent me a confident ChatGPT formatted bug report.
|
| It misidentified what the actual bug was.
|
| But the tone was so confident, and he replied to my later
| messages using ChatGPT itself, which insisted I was wrong.
|
| I don't like this future.
| blitzar wrote:
| I have dozens of these over the years - many of the people
| responsible have "Head of ..." or "Chief ..." job titles now.
| artursapek wrote:
| Did you call his ass out for being lazy and wasting your
| time?
| crazygringo wrote:
| It's not the future. Tell him not to do that. If it happens
| again, bring it to the attention of his manager. Because
| that's not what he's being paid for. If he continues to do
| it, that's grounds for firing.
|
| What you're describing is not the future. It's a fireable
| offense.
| Aurornis wrote:
| > the only place I've ever had any issue with AI content is
| r/chess, where people love to ask ChatGPT a question and then
| post the answer as if they wrote it, half the time seemingly
| innocently
|
| Some of the science, energy, and technology subreddits receive
| a lot of ChatGPT repost comments. There are a lot of people who
| think they've made a scientific or philosophical breakthrough
| with ChatGPT and need to share it with the world.
|
| Even the /r/localllama subreddit gets constant AI spam from
| people who think they've vibecoded some new AI breakthrough.
| There have been some recent incidents where someone posted
| something convincing and then others wasted a lot of time until
| realizing the code didn't accomplish what the post claimed it
| did.
|
| Even on HN some of the "Show HN" posts are AI garbage from
| people trying to build portfolios. I wasted too much time
| trying to understand one of them until I realized they had
| (unknowingly?) duplicated some commits from an upstream project
| and then let the LLM vibe-code a README that sounded like an
| amazing breakthrough. It was actually good work, but it wasn't
| theirs. It was just some vibecoding tool eventually arriving at
| the same code as upstream and then putting the classic
| LLM-written, emoji-filled bullet points in the README.
| tobr wrote:
| For images, https://same.energy is a nice option that, having
| been abandoned but still functioning for a few years now, seems
| naturally not to have crawled any AI images. And it's all
| around a great product.
| voiper1 wrote:
| Of course my first thought was: Let's use this as a tool for AI
| searches (when I don't need recent news).
| ricardo81 wrote:
| FWIW Mojeek (an organic search engine in the classic sense) can
| do this with the before: operator.
|
| https://www.mojeek.com/search?q=britney+spears+before%3A2010...
| pknerd wrote:
| Something generated by humans does not mean high quality.
| Krssst wrote:
| Yes, but AI-generated is always low quality so it makes sense
| to filter it out.
| IshKebab wrote:
| I wouldn't say _always_... Especially because you probably
| only noticed the bad slop. Usually it is crap though.
| josephjrobison wrote:
| Grokipedia would like a word
| a5c11 wrote:
| At least when reading human-made material you can spot the
| author's uncertainty on some topics. Usually, when someone
| doesn't have knowledge of something, they don't try to describe
| it. AI, however, will try to convince you that pigs can fly.
| cryptozeus wrote:
| technically you can ask chatgpt to return the same result by
| asking it to filter by year
| RomanPushkin wrote:
| For that purpose I do not update my book about Ruby on LeanPub.
| I just know that one day people are gonna read it more, because
| human-written content will be gold.
| zkmon wrote:
| Most college courses and school books haven't changed in
| decades. Some reputed colleges keep courses on Pascal and
| Fortran instead of Python or Java, just because changing might
| affect their reputation of being classical or pure, or to match
| their campus buildings' style.
| fastasucan wrote:
| Or because the core knowledge stays the same no matter how it
| is expressed.
| defraudbah wrote:
| ChatGPT also returns only content created before ChatGPT's
| release, which is why I still have to google, damn it!
| fragmede wrote:
| Click the globe icon below the input box to enable web
| searching by ChatGPT.
| stinos wrote:
| Is that still the case? And even if so how is it going to avoid
| keeping it like that in the future? Are they going to stop
| scraping new content, or are they going to filter it with a
| tool which recognizes their own content?
| defraudbah wrote:
| it's a known problem in ML; I think Grok solved it partially,
| and ChatGPT uses another model on top to search the web, like
| suggested below. Hence the MLOps field appeared, to handle
| model management.
|
| I find it a bit annoying to navigate between hallucinations
| and outdated content. Too much invalid information to filter
| out.
| ETH_start wrote:
| I'm grateful that I published a large body of content pre-ChatGPT
| so that I have proof that I'm not completely inarticulate without
| AI.
| phplovesong wrote:
| The slop is getting worse: there is so much LLM-generated shit
| online that new models are now getting trained on the slop.
| Slop training slop, training slop. We have gone full circle in
| just a matter of a few years.
| muixoozie wrote:
| I was replaying Cyberpunk 2077 and trying to think of all the
| ways one might have dialed up the dystopia to 11 (beyond what
| the game does). And pervasive AI slop was never on my radar.
| Kinda reminds me of the foreword in Neuromancer bringing
| attention to the fact the book was written before cellphones
| became popular. It's already fucking with my mind. I recently
| watched Frankenstein 2025 and 100% thought gen ai had a role in
| the CGI only to find out the director hates it so much he
| rather die than use it. I've been noticing little things in old
| movies and anime where I thought to myself (if I didn't know
| this was made before gen ai, I would have thought this was
| generated for sure). One example
| (https://www.youtube.com/watch?v=pGSNhVQFbOc&t=412) cityscape
| background in this a outro scene with buildings built on top of
| buildings gave me ai vibes (really the only thing in this whole
| anime), yet this came out ~1990. So I can already recognize a
| paranoia / bias in myself and really can't reliably tell what's
| real.. Probably also other people have this and why some non-
| zero number of people always thinks every blog post that comes
| out was written by gen ai.
| Cyan488 wrote:
| I had the same experience, watching a nature documentary on a
| streaming service recently. It was... not so good, at least
| at the beginning. I was wondering if this was a pilot for AI
| generated content on this streaming service.
|
| Actually, it came out in 2015 and was just low budget.
| dinkblam wrote:
| google results were already 90% SEO crap long before ChatGPT
|
| just use Kagi and block all SEO sites...
| paweladamczuk wrote:
| How do we (or Kagi) know which ones are "SEO sites"? Is there
| some filter list or other method to determine that?
| Jolter wrote:
| If you took Google of 2006, and used that iteration of the
| pagerank algorithm, you'd probably not get most of the SEO
| spam that's so prevalent in Google results today.
| joshvm wrote:
| It seems like a mixture of heuristics, explicit filtering and
| user reports.
|
| https://help.kagi.com/kagi/features/slopstop.html
|
| That's specifically for AI generated content, but there are
| other indicators like how many affiliate links are on the
| page and how many other users have downvoted the site in
| their results. The other aspect is network effect, in that
| everyone tunes their sites to rank highly on Google. That's
| presumably less effective on other indices?
| shevy-java wrote:
| > This is a search tool that will only return content created
| before ChatGPT's first public release on November 30, 2022.
|
| The problem is that Google's search engine - and, oddly enough,
| ALL search engines - got worse before that already. I noticed
| that search engines got worse several years before 2022. So, AI
| further decreased the quality, but the quality had a downward
| trend already, as it was. There are some attempts to analyse this
| on youtube (also owned by Google - Google ruins our digital
| world); some explanations made sense to me, but even then I am
| not 100% certain why Google decided to ruin google search.
|
| One key observation I made was that the YouTube search was
| copied onto Google's regular search, which makes no sense for
| Google search. If I casually search for a video on YouTube, I
| may be semi-interested in unrelated videos. But if I search on
| Google search for specific terms, I am not interested in crap
| such as "others also searched for xyz" - that just ruins the UI
| with irrelevant information. This is not the only example:
| Google made the search results worse here and tries to trick
| the user into clicking on things. Plus the placement of ads.
| The quality really worsened.
| justinclift wrote:
| Are you aware of Kagi (kagi.com)?
|
| With them, at least the AI stuff can be turned off.
|
| Membership is presently about 61k, and seems to be growing
| about 2k per month: https://kagi.com/stats
| amelius wrote:
| Be aware of:
|
| https://www.reddit.com/r/SearchKagi/comments/1gvlqhm/disappo.
| ..
| justinclift wrote:
| Damn. I didn't know that.
|
| Now we need a 2nd Kagi, so we can switch to that one
| instead. :(
| smusamashah wrote:
| There are a few other powerful countries, with countless web
| services, that freely wage war(s) on other countries and
| support wars in many different ways. Is there a way to
| avoid their products?
| jwr wrote:
| Whataboutism doesn't get us anywhere -- saying "but what
| about X" (insert anything for X here) usually results in
| doing nothing.
|
| Some of us would rather take a stand, imperfect as it is,
| than just sit and do nothing. Especially in the very
| clear case of someone (Kagi) doing business with a
| country that invaded a neighboring country for no reason,
| and keeps killing people there.
| graemep wrote:
| Why this particular stand? Is doing nothing any better
| than taking what are essentially random stands? Obviously
| if you are Ukrainian this will be an important stand to
| you, but otherwise doing things based on a mix of what
| the media you like focuses on or whatever is not really
| very different from doing nothing.
| baconbrand wrote:
| Doing something is literally the opposite of doing
| nothing. This is complete gibberish.
| clucas wrote:
| I think "no wars of conquest" is a bright line that was
| crossed by Russia, that hasn't been crossed by other
| nations in a long time. And I think it's important for
| the whole world to take a stand on that, not just the
| nation that was invaded. It's not a "random stand."
| Alex2037 wrote:
| >"no wars of conquest"
|
| how about "no wars of genocide"? you know, like the one
| the collective West had enthusiastically supported for a
| while now?
| AlecSchueler wrote:
| Plenty of people boycott Israeli goods and there's an
| increasing trend of moving away from reliance on American
| services also.
| DrammBA wrote:
| did you just "but what about X" to the previous comment
| which is the whole point of this thread?
| jwr wrote:
| I am amused that my (unpopular and downvoted by now)
| comment about the scourge of "whataboutism" sparked a
| discussion where comments begin with "how about" :-)
|
| That is exactly my point! Saying "but what about" is akin
| to saying "you shouldn't do anything, because there is
| another unrelated $thing happening elsewhere". I refuse
| to follow this line of thinking.
| clucas wrote:
| I find it much easier to take a strong stand on
| Russia/Ukraine than on Israel/Palestine. The history of
| Israel/Palestine is much more of a gray area. Palestine
| has used plenty of aggressive actions and rhetoric that
| make Israel's actions more understandable (if not
| justified).
|
| Example of actions: Gaza invaded Israel and killed,
| raped, and kidnapped civilians on October 7. Ukraine had
| no such triggering event that caused Russia to invade.
|
| Example of rhetoric: Gaza's political leaders have said
| they want to destroy Israel. I don't think anyone in
| power in Ukraine has said they want to destroy the
| Russian state.
| donkyrf wrote:
| "enthusiastic support"
|
| https://yougov.co.uk/international/articles/52279-net-
| favour...
|
| https://www.pewresearch.org/politics/2025/10/03/how-
| american...
|
| etc etc....
|
| I'm not sure what collective West you're referring to;
| but apparently it excludes every major Western European
| nation, America, and Canada.
| jwr wrote:
| > Why this particular stand?
|
| First, _any_ stand is better than whataboutism and just
| sitting there doing nothing.
|
| Second, this stand results from my thoughts. It is my
| stand. There are many like it, but this one is mine.
|
| Third, in the history of the modern world there were very
| few black&white situations where there was one side which
| was clearly the aggressor. This is one of them.
| graemep wrote:
| > First, any stand is better than whataboutism and just
| sitting there doing nothing.
|
| I definitely disagree with this. There are many cases
| where you might take the wrong stand, especially where
| you do not have detailed knowledge of the issue you're
| taking a stand over.
| artursapek wrote:
| "whataboutism" is the reddit word for calling out
| hypocrisy
| mcv wrote:
| As a European, I'm also increasingly in favour of
| avoiding American companies. Especially the big
| corrupting near-monopolists.
|
| It's worth pointing out the flaws of all bad actors. The
| more info we have, the more effectively we can act.
| eirini1 wrote:
| I don't agree with this logic. It implies that people who
| use Google, Bing and a million other products made by US-
| based companies are supportive of the huge number of
| atrocities committed or aided by the United States. Or
| other countries. It feels very odd to single out Russia's
| invasion of Ukraine but to minimize the Israeli genocide of
| palestinians in Gaza, the multiple unjust wars waged by the
| United States all over the world etc.
| kortilla wrote:
| Google doesn't censor those atrocities for the US
| government. That's the key difference.
| ssl-3 wrote:
| It's often fairly easy to find US government-centric news
| and criticism with Google.
|
| But as one counterexample: The end of the US penny was
| formed and announced not with public legislative
| discourse, nor even with an executive order, but with a
| brief social media post by the president.
|
| And I don't mean that it's atrocious or anything, but I
| wanted to see that social media post myself. Not a report
| about it, or someone's interpretation of it, but -- you
| know -- the actual utterance from the horse's mouth.
|
| Which should be a simple matter. After all, it's the WWW.
|
| And I've been Googling for as long as there has been a
| Google to Google with. I'd like to think that I am
| proficient at getting results from it.
|
| But it was like pulling teeth to get Google to
| eventually, kicking and screaming, produce a link to the
| original message on Truth Social.
|
| If that kind of active reluctance isn't censorship on
| Google's part, then what might it be described as
| instead?
|
| And if they're seeking to keep me away from the root of
| this very minor issue, then what else might they also be
| working to keep me from?
| baconbrand wrote:
| It doesn't imply any of that at all.
|
| There certainly is a huge army of people ready to spout
| this sort of nonsense in response to anyone talking about
| doing anything.
|
| Hard to know what percentage of these folks are trying to
| assuage their own guilt and what percentage are state
| actors. Russia and Israel are very chronically online,
| and it behooves us internet citizens to keep that in
| mind.
| super256 wrote:
| Yandex has the best image search, and others are years
| behind it. Furthermore, Nebius has sold all of the group's
| businesses in Russia and certain international markets. They
| have been completely divested from Russia for 1.5 years
| already: https://nebius.com/newsroom/ynv-announces-
| successful-complet...
|
| The post you linked was posted when the divestment was
| already underway, so it is at least dishonest if not
| malicious.
| cluckindan wrote:
| I wouldn't trust a divorce where one party still provides
| for the other.
| _heimdall wrote:
| You don't "trust" a divorce if alimony was part of the
| settlement?
| kortilla wrote:
| Yep, when the party paying can decide not to pay and
| there are no teeth to extract payment, that gives immense
| power to the payer.
| _heimdall wrote:
| At least in my area, there are legal avenues if alimony
| goes unpaid. Assets can be seized to pay off late
| payments and wages can be garnished.
|
| It's a different story if the payer truly can't afford to
| pay the alimony, but at that point they wouldn't have the
| immense power you are concerned with.
| varjag wrote:
| Yandex is the government approved search engine in
| Russia, which is impossible without the state exerting
| control over it. I wouldn't pay much attention to
| divestment, it's not how any of that works.
|
| For instance here you can learn that Yandex NV is fully
| controlled by a group of Russian investors: https://www.r
| bc.ru/business/06/03/2024/65e7a0f29a7947609ea39...
| oh_fiddlesticks wrote:
| The governments where the offices of a software company
| are physically located exert control over them. To follow
| this logic to its end and apply it even-handedly results
| in nation-based NIH syndrome, surely?
| varjag wrote:
| You are talking about an entity whose ownership is 99.8%
| Russian nationals and state companies; whose employees
| for the most part are Russian nationals, whose main
| market is Russia and with very little tangible assets
| that can be arrested in the Netherlands. The only reason
| for this "divestment" is sanctions evasion.
| tryauuum wrote:
| you clearly don't know anything about nebius
|
| They have a lot of hardware in e.g. Finland. I don't
| think they provide GPU access to the russian companies,
| feel free to correct me
| varjag wrote:
| We were talking search engines here, but interesting
| indeed! What's the name of Neibus CEO?
| stopthe wrote:
| Some clarification. In 2024 Yandex NV split into
| Nebius (an NL-registered, NASDAQ-listed company, no longer
| a search engine) and Russia-based Yandex. The latter is
| fully controlled by Russian investors.
| Ylpertnodi wrote:
| https://news.ycombinator.com/item?id=42349797 (11 months
| ago)
|
| https://som.yale.edu/story/2022/over-1000-companies-have-
| cur...
|
| You pays your money, you takes your choice.
| hopelite wrote:
| You are mistaken to think that zealots can be reasoned
| with. They have been conditioned to react upon anything
| "Russia" like a Pavlovian cue, a command of the trained
| animal. They are a herd that moves as a herd, based on
| cues of lead animals. No amount of proof or evidence will
| ever dissuade them from a position that the herd is
| moving in. They cannot reason on their own and lack the
| courage to separate, let alone say something that the
| herd disapproves of, lest they be expelled from the herd
| and ganged up on.
| spIrr wrote:
| Thank you. Didn't know that and was, until now, considering
| paying for a Kagi subscription.
| scotty79 wrote:
| > "We do not discriminate based on current geopolitical
| issues."
|
| That's one way of phrasing it.
| artursapek wrote:
| based Vlad tbh
| Ferret7446 wrote:
| I find this amusing, because it seems like Kagi's target
| audience dislikes this (politically polarized), and I as
| someone who is not Kagi's target audience likes this
| (politically neutral).
| embedding-shape wrote:
| Wait, what? Their choice is specifically a politically
| neutral one, wouldn't that mean their target audience is
| a politically neutral one? Why is your impression that
| Kagi's target audience is politically polarized users?
| Been a paying user of Kagi for years, never got that
| impression.
|
| FWIW, I don't think Kagi should remove or avoid indexing
| content from countries that invade others, because a lot
| of the times websites in those countries have useful
| information on them. If Kagi were to enact such a block,
| it would mean it would no longer surface results from HN,
| reddit and a bunch of other communities, effectively
| making the search engine a lot less useful.
| bawolff wrote:
| Politics is not just a 1 dimensional line.
| saturnite wrote:
| Yeah, it's two dimensional. One axis goes from good to
| evil. The other axis, chaotic to lawful.
| Dilettante_ wrote:
| There's a secret third dimension you can ascend to
| through a hole in the neutral middle where the forces of
| the other two axes cancel out. 'The Elites' doesn't want
| you to know this.
|
| /hj?
| brendoelfrendo wrote:
| Why is supporting Yandex, who are involved in Russian
| politics and linked to the ruling regime, a neutral
| decision? That is very much a political decision, in the
| same way that working with US tech companies is a
| political decision. You need to decide what you're
| willing to tolerate and where your ethical lines are
| drawn; the alternative isn't neutrality, it's nihilism.
| lostlogin wrote:
| Solution: Kagi as it is, but with a 'remove Yandex'
| toggle. Even if it was a paid upgrade, I'd take it.
| duxup wrote:
| Yeah I kept thinking "man I should try kagi" and then that
| :(
| akie wrote:
| Try it anyway.
| alessioalex wrote:
| He probably doesn't want to support genocide.
| richwater wrote:
| Hope he doesn't pay his taxes then considering where US
| aid ends up
| duxup wrote:
| I pay my taxes, that's not optional. Paying search engine
| is.
| duxup wrote:
| Naw, the well is poisoned and I question the company's
| decision making at this point.
| buellerbueller wrote:
| Imo, Kagi is _still_ the better option, because it isn't
| supporting the global surveillance mechanism we call
| advertising. All these people, missing the forest for the
| single Yandex tree.
| stronglikedan wrote:
| Meh. Most people, including myself, couldn't care less, and
| Yandex image search is very capable.
| troyvit wrote:
| So if America invades Venezuela should we all stop using
| google? Should we have stopped using google when the U.S.
| invaded Iraq and killed 150,000 people[1]?
|
| Should we stop using products imported from China for the
| cultural genocide they've perpetrated against the
| Uyghurs?[2]
|
| Is Yandex Russia?
|
| [1]
| https://en.wikipedia.org/wiki/Casualties_of_the_Iraq_War
|
| [2] https://en.wikipedia.org/wiki/Persecution_of_Uyghurs_in
| _Chin...
| brendoelfrendo wrote:
| Honest answers are yes, yes, and yes. It may be
| unavoidable for the average person to avoid imported
| goods from China, but we should remain aware of our place
| in the world and try where we can. If the US does invade
| Venezuela, I sincerely hope that individuals and business
| owners try to cut as many ties with complicit US tech
| companies as possible. Honestly, with this clusterfuck of
| war crimes going on over "drug boats," I hope they're
| already starting.
| alessioalex wrote:
| You can take whatever stand you want. When there's a
| country that killed, raped and tried to exterminate most
| of Eastern Europe we can choose to cut any and all ties
| with it and consider them for all intents and purposes
| ..terrorists.
| mcv wrote:
| And the fact that there are other countries that should
| also be considered terrorists, doesn't mean we shouldn't
| boycott this one. It means we should boycott them all.
| But boycotting a few is still better than nothing.
| troyvit wrote:
| I sort-of see where you're coming from, but to me it also
| involves a double standard. Don't buy search from a
| company that uses an API from another company that is (or
| was? unclear) based in a country that invaded another
| country and completely upended the world order. For some
| people that's a line that they don't want to cross and I
| get it.
|
| However if that's the case how can they continue buying
| Chinese products when China has done the same thing, but
| worse, and for longer, to their own population? Because
| it's less convenient to stop? _That_ to me lands squarely
| in the "take whatever stand you want" category with the
| addendum of, "and don't worry if it doesn't make sense."
|
| Is it because it's within their own borders and therefore
| isn't our problem?
| phantasmish wrote:
| I _directly_ use Yandex sometimes, because there are huge
| blind spots for all the US-based engines I'm aware of, and
| it fills some of them in.
|
| If someone can point me to a better index for that purpose,
| I'd love to avoid Yandex. Please inform me.
| devmor wrote:
| Kagi is based in the United States, as is YC.
|
| If you are concerned about heinous war crimes and the
| slaughter of civilians to the point that you don't want to
| use private services from countries that conduct such acts,
| you should avoid both already.
| immibis wrote:
| Why's that something to be aware of? Yandex is actually a
| good search engine, so I'm told, as long as you don't
| search for things related to Russian politics. Kagi
| presumably knows this and won't use their results related
| to Russian politics.
|
| Feels more like a scare campaign to me - someone doesn't
| want you to use Kagi, and points to Yandex as a reason for
| that.
| Seattle3503 wrote:
| I'm surprised this is possible given the sanctions on
| Russia.
| dncornholio wrote:
| How does Kagi know what is AI stuff? I don't see how they can
| 'just turn it off'
| Zambyte wrote:
| It's driven by community ratings.
|
| https://news.ycombinator.com/item?id=45919067
| pratyahava wrote:
| so it is like humans vs robots started? robots ask humans
| questions to verify they are not robots. humans mark
| content as robot-generated to filter it out.
| pvdebbe wrote:
| My first instinct is that users abuse it like they do any
| other report/downvote mechanism. They see something they
| just plain don't like, they report it as AI slop.
| justinclift wrote:
| By "turn it off" I mostly mean that Kagi have their own AI
| driven tools available, but a toggle in your user settings
| disables it completely.
|
| ie it's not forced down your throat, nor
| mysteriously/accidentally/etc turned back on occasionally
| mebizzle wrote:
| Haven't looked back since I signed up.
| tempacct2cmmnt wrote:
| I've had much better results with Kagi than with Google in
| the past few months. I'd trialed them a couple times in the
| past and been disappointed, but that's no longer the case.
| PaulDavisThe1st wrote:
| The AI stuff in google search can be turned off.
| https://www.google.com/search?udm=14&q=kagi
|
| My default browser search tool is set to google with ?udm=14
| automatically appended.
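|
| For anyone wanting to replicate that setup, the custom search
| engine template in the browser settings would look something
| like this (%s is where the browser substitutes the query):
|
|     https://www.google.com/search?udm=14&q=%s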
| nailer wrote:
| What is UDM? Presumably the U is Urchin but what's the
| rest?
| PaulDavisThe1st wrote:
| Never seen it documented.
| Maken wrote:
| There is also the fact that automatically generated content
| predates ChatGPT by a lot. By around 2020 most Google searches
| already returned lots of SEO-optimized pages made from scraped
| content or keyword soups made by rudimentary language models or
| Markov chains.
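|
| As an aside, the "rudimentary" end of that spectrum really was
| tiny; a toy bigram Markov generator of the kind such
| keyword-soup pages appeared to use (purely illustrative):
|
|     import random
|     from collections import defaultdict
|
|     def build_chain(text):
|         # Map each word to the words observed to follow it.
|         chain = defaultdict(list)
|         words = text.split()
|         for a, b in zip(words, words[1:]):
|             chain[a].append(b)
|         return chain
|
|     def babble(chain, start, length=30):
|         out, word = [start], start
|         for _ in range(length):
|             followers = chain.get(word)
|             if not followers:
|                 break
|             word = random.choice(followers)
|             out.append(word)
|         return " ".join(out)
|
|     corpus = ("the best cheap hosting is the best cheap vpn "
|               "for the best seo")
|     print(babble(build_chain(corpus), "the"))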
| black3r wrote:
| Well, there's also the fact that the GPT-3 API was released in
| June 2020 and its writing capabilities were essentially on
| par with ChatGPT's initial release. It was just a bit harder to
| use, because it wasn't yet trained to follow instructions, it
| only worked as a very good "autocomplete" model, so prompting
| was a bit "different" and you couldn't do stuff like "rewrite
| this existing article in your own words" at all, but if you
| just wanted to write some bullshit SEO spam from scratch it
| was already as good as ChatGPT would be 2 years later.
| wongarsu wrote:
| Also the full release of GPT-2 in late 2019. While GPT-2
| wasn't really "good" at writing, it was more than good
| enough to make SEO spam
| Maken wrote:
| I didn't remember that, but it would explain the exponential
| growth of spam back then.
| gield wrote:
| And 10 years ago, Reddit was already experimenting with auto-
| generated subreddits:
| https://www.reddit.com/r/SubredditSimulator.
| PunchyHamster wrote:
| It was popular way before 2020, but Google managed to keep up
| with the SEO tricks for a good decade+ before. Guess it got to
| a breaking point.
| bratwurst3000 wrote:
| The main theory is that with bad results you have to search
| more and engage more with ads, which means more revenue for
| Google. It's enshittification.
| master-lincoln wrote:
| I think this is about trustworthy content, not about a good
| search engine per se
| trinix912 wrote:
| But it's not necessarily trustworthy content; we had
| autogenerated listicles and keyword-list sites before
| ChatGPT.
| GTP wrote:
| Sure, but I think that the underlying assumption is that,
| after the public release of ChatGPT, the amount of
| autogenerated content on the web became significantly
| bigger. Plus, the auto-generated content was easier to spot
| before.
| robot-wrangler wrote:
| > Google made the search results worse here
|
| Did you mean:
|
| worse results near me
|
| are worse results worth it
|
| worse results net worth
|
| best worse results
|
| worse results reddit
| d-lisp wrote:
| search: Emacs Did you mean vim ?
| (vice-versa)
| ganzsz wrote:
| Tbh, this sounds like a Google Easter egg.
| mghackerlady wrote:
| Because it is
| zipy124 wrote:
| Honestly the biggest failing is just that SEO spam sites got
| too good at defeating the algorithm. The amount of bloody
| listicles, Quora nonsense, and backlink-farming websites that
| come up in search is crazy.
| Nextgrid wrote:
| This is bullshit the search engines want you to believe. It's
| trivial to detect sites that "defeat" the algorithm; you
| simply detect their incentives (ads/affiliate links) instead.
|
| Problem is that no mainstream search engine will do it
| because they happen to _also_ be in the ad business and
| wouldn't want to reduce their own revenue stream.
| AznHisoka wrote:
| For most commercially related terms, I suspect that if you got
| rid of all "spammy" results you would be left with almost
| nothing. No independent blogger is gonna write about the best
| credit card with travel points.
| eszed wrote:
| I agree with your point, but you picked a poor example.
| Have you _met_ any credit reward min-maxers?
| baconbrand wrote:
| I had a coworker who kept up a blog about random purchases
| she'd made, where she would earn some money via affiliate
| links. I thought it was horrendously boring and weird, and
| the money made was basically pocket change, but she seemed
| to enjoy it. You might be surprised, people write about all
| sorts of things.
| asdff wrote:
| People used to do it in the early internet days, before
| affiliate marketing really took it over. Certainly it was more
| genuine and products were bemoaned for their compromises
| in one dimension as much as praised for their performance
| in another. Everything is a glowing review now and
| comparisons are therefore meaningless.
| strbean wrote:
| Sites like Credit Karma / NerdWallet exist. While I think
| they are rife with affiliate link nonsense and paid
| promotion masquerading as advice, I'm also pretty sure they
| have paid researchers and writers generating genuine
| content. Not sure that quite falls into the bucket of SEO
| blogspam.
| asdff wrote:
| It still counts because they would only ever recommend
| affiliate partnered products.
| watwut wrote:
| Afaik they did not lose the fight. They stopped trying
| because it was good for short-term earnings.
| masfuerte wrote:
| Yes, this is true. It was revealed in Google emails
| released during antitrust hearings. Google absolutely made
| a deliberate decision to enshittify their search results
| for short term gains.
|
| Though maybe it's a long term gain. I know many normal
| (i.e. non-IT) people who've noticed the poor search
| results, yet they continue to use Google search.
| duxup wrote:
| I feel like google gave up the fight at some point. I think
| HN had some good articles that indicated that.
| strbean wrote:
| Certainly seems that way if you observed the waves of
| usability Google search underwent in the first 15 years.
| There were several distinct cycles where the results were
| great, then garbage, then great again. They would be
| flooded with SEO spam, then they would tweak and penalize
| the SEO spam heavily, then SEO would catch up.
|
| The funny thing is that it seems like when they gave up it
| wasn't because of some new advancement in the arms race. It
| was well before LLMs hit the scene. The SEO spam was still
| incredibly obvious to a human reader. Really seems like
| some data-driven approach demonstrated that surrendering on
| this front led to increased ad revenue.
| groundzeros2015 wrote:
| Significant changes were made to Google and YouTube in 2016 and
| 2017 in response to the US election. The changes provided more
| editorial and reputation-based filtering over best content
| matching.
| benterix wrote:
| > if I search on Google search for specific terms, I am not
| interested in crap such as "others also searched for xyz" -
| that is just ruining the UI with irrelevant information
|
| You assume the aim here is for you to find relevant
| information, not increase user retention time. (I just love the
| corporate speak for making people's lives worse in various
| ways.)
| mcv wrote:
| You finding relevant information used to be the aim.
| Enshittification started when they let go of that aim.
| 123malware321 wrote:
| ML and AI killed it somewhere between 2011 and 2016.
| https://en.wikipedia.org/wiki/Dead_Internet_theory
| juujian wrote:
| The problem is that before Nov 30, 2022 we also had plenty of
| human-generated slop bearing down on the web. SEO content
| specifically.
| ForHackernews wrote:
| Goodhart's law applies to links, too. Google monetized them and
| destroyed their value as a signal.
| jollyllama wrote:
| > The problem
|
| That's a separate problem. The search algorithm applied on top
| of the underlying content is a separate problem from the
| quality or origin of the underlying content, in aggregate.
| 0xEF wrote:
| > I am not 100% certain why Google decided to ruin google
| search.
|
| Ask Prabhakar Raghavan. Bet he knows.
| xnx wrote:
| Counterpoint: The experience of quickly finding succinct
| accurate responses to queries has never been better.
|
| Years ago, I would consider a search "failed" if the page with
| related information wasn't somewhere in the top 10. Now a
| search is "failed" if the AI answer doesn't give me exactly
| what I'm looking for directly.
| codyb wrote:
| I've been using DuckDuckGo for the last... decade or so. And it
| still seems to return fairly relevant documentation towards the
| top.
|
| To be fair, most of what I use search for these days is
| "<<Programming Language | Tool | Library | or whatever>>
| <<keyword | function | package>>" then navigate to the
| documentation, double check the versions align with what I'm
| writing software in, read... move on.
|
| Sometimes I also search for "movie showtimes nyc" or for a
| specific venue or something.
|
| So maybe my use cases are too specific to screw up, who knows.
| If not, maybe DDG is worth a try.
| geldedus wrote:
| DuckDuckGo uses Bing search results.
| EGreg wrote:
| Can't we just append "before:2021-01-01" to Google?
|
| I use this to find old news articles for instance.
| theodric wrote:
| This tool has no future. We have that in common with it, I fear.
|
| What we really need to do is build an AI tool to filter out the
| AI automatically. Anybody want to help me found this company?
| keiferski wrote:
| Projects like this remind me of a plot point in the Cyberpunk
| 2077 game universe. The "first internet" got too infected with
| dangerous AIs, so much so that a massive firewall needed to be
| built, and a "new" internet was built that specifically kept out
| the harmful AIs.
|
| (Or something like that: it's been awhile since I played the
| game, and I don't remember the specific details of the story.)
|
| It makes me wonder if a new human-only internet will need to be
| made at some point. It's mostly sci-fi speculation at this point,
| and you'd really need to hash out the details, but I am thinking
| of something like a meatspace-first network that continually
| verifies your humanity in order for you to retain access. That
| doesn't solve the copy-paste problem, or a thousand other ones,
| but I'm just thinking out loud here.
| jascha_eng wrote:
| The problem really is that it is impossible to verify that the
| content someone uploads came from their mind and not a computer
| program. And at some point probably all content is at least
| influenced by AI. The real issue is also not that I used
| ChatGPT to look up a synonym or asked a question before writing
| an article; the problem is when I copy-paste the content and
| claim I wrote it.
| Ylpertnodi wrote:
| > The problem really is that it is impossible to verify that
| the content someone uploads came from their mind and not a
| computer program.
|
| Er...digital id.
| _heimdall wrote:
| Ignoring the privacy and security issues for a moment, how
| would having a digital ID prove that the blog post I put on
| my site came only out of my own mind and I didn't use an
| LLM for it?
| visarga wrote:
| > the problem is when I copy paste the content and claim I
| wrote it
|
| Why is this the problem and not the reverse - using AI
| without adding anything original into the soup? I could
| paraphrase an AI response in my own words and it will be no
| better. But even if I used AI, if it writes my ideas, then it
| would not be AI slop.
| immibis wrote:
| There doesn't need to be any difference in treatment between
| AI slop and human slop. The point isn't to keep AI out - it's
| to keep spam and slop out. It doesn't matter whether it's
| produced by a being made of carbon or silicon.
|
| If someone can consistently produce high-quality content with
| AI assistance, so be it. Let them. Most don't, though.
| jascha_eng wrote:
| I think the main issue is that when content is hand-written
| you can be certain someone put at least the effort it takes
| to write into it. And while some people write fast, I would
| assume that at least means they have read their own writing
| once.
|
| AI slop you can produce faster than you're able to read it.
| This makes it incredibly costly to filter out in
| comparison. It just messes so much with the signal-to-noise
| ratio on the web.
| fao_ wrote:
| > And at some point probably all content is at least
| influenced by AI.
|
| [citation needed]
|
| (I see absolutely no reason why that should be the case)
| asdff wrote:
| The issue is most things being derivative along with AI now
| representing an increasing share of "most things" from
| which to derive.
| lukebuehler wrote:
| Arguably this is already happening, with many human-to-human
| interactions moving to private groups on Signal, WhatsApp,
| Telegram, etc.
| SonnyTark wrote:
| I share an opinion with Nick Bostrom: once a civilization-
| disrupting idea (like LLMs) is pulled out of the bag, there is
| no putting it back. People in isolation will recreate it simply
| because it's now possible. All we can do is adapt.
|
| That being said, the idea of a new, freer internet is already
| a reality. Mastodon is a great example. I think private havens
| like Discord/Matrix/Telegram are an important step on the way.
| ionwake wrote:
| how does one keep ai out of private havens? thorough
| verification? is that the future? private havens on
| platforms?
| embedding-shape wrote:
| An in-person web of trust in order to join any private
| community. It'll suck and be hard in the beginning, but
| once you reach a threshold, it'll be OK. Ban entire trees
| of users when you discover bots/puppets, to set an example.
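|
| A toy sketch of the "ban entire trees" part (Python; the
| invite data is made up, just to show the shape of the idea -
| each member records who vouched for them in person, and
| discovering one bot takes out its whole invite subtree):
|
|     from collections import defaultdict
|
|     # member -> the existing member who vouched for them
|     invited_by = {"bob": "alice", "carol": "bob",
|                   "dave": "bob", "eve": "carol"}
|
|     def ban_subtree(bad_actor):
|         children = defaultdict(list)
|         for member, voucher in invited_by.items():
|             children[voucher].append(member)
|         banned, stack = set(), [bad_actor]
|         while stack:                 # walk the invite subtree
|             member = stack.pop()
|             banned.add(member)
|             stack.extend(children[member])
|         return banned
|
|     print(ban_subtree("bob"))  # bob, carol, dave, eve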
| visarga wrote:
| So we expect either 1. people using AI and copy pasting
| into the human-only network, or 2. other people claiming
| your text sounds like AI and ostracizing you for no good
| reason. It won't be a happy place - I know from anti-
| generative AI forums.
| immibis wrote:
| Yep and then you deperson them
| visarga wrote:
| > a new human-only internet
|
| Only if those humans don't take their leads from AI. If they
| read AI and write, not much benefit.
| pavel_lishin wrote:
| There were also similar plot points mentioned in Peter Watts'
| Starfish trilogy, and Neal Stephenson's Anathem.
| Roritharr wrote:
| I hope there's an uncensored version of the Internet Archive
| somewhere, I wish I could look at my website ca. 2001, but I
| think it got removed because of some fraudulent DMCA claim
| somewhere in the early 2010s.
| lxgr wrote:
| > This is a search tool that will only return content created
| before ChatGPT's first public release on November 30, 2022.
|
| How does it do that? At least Google seems to take website
| creation date metadata at face value.
| erikpukinskis wrote:
| Interesting concept. As a side benefit this would allow you to
| make steady progress fighting SEO slop as well, since there can
| be no arms race if you are ignoring new content.
|
| You could even add options for later cutoffs... for example, you
| could use today's AIs to detect yesterday's AI slop.
| softwaredoug wrote:
| The other day I was researching with ChatGPT.
|
| * ChatGPT hallucinated an answer
|
| * ChatGPT put it in my memory, so it persisted between
| conversations
|
| * When asked for a citation, ChatGPT found 2 AI created articles
| to back itself up
|
| It took a while, but I eventually found human written
| documentation from the organization that created the technical
| thingy I was investigating.
|
| This happens A LOT for topics on the edge of knowledge easily
| found on the Web, where you have to do true research, evaluate
| sources, and make good decisions about what you trust.
| fireflash38 wrote:
| AI reminds me of combing through stackoverflow answers. The
| first one might work... Or it might not. Try again, find a
| different SO problem and answer. Maybe third time's the charm...
|
| Except it's all via the chat bot and it isn't as easy to get it
| to move off of a broken solution.
| visarga wrote:
| Simple solution - run the same query on 3 different LLMs with
| different search integrations; if they concur, the chances of
| hallucination are low.
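|
| Roughly (Python sketch; ask_gpt/ask_claude/ask_gemini are
| placeholders for whatever clients you'd actually wire up, not
| real APIs):
|
|     from collections import Counter
|
|     def normalize(text):
|         # crude: lowercase and collapse whitespace so trivially
|         # different phrasings still match
|         return " ".join(text.lower().split())
|
|     def consensus_answer(question, backends, threshold=2):
|         answers = [normalize(ask(question)) for ask in backends]
|         best, count = Counter(answers).most_common(1)[0]
|         return best if count >= threshold else None
|
|     # usage: consensus_answer("some factual question",
|     #                         [ask_gpt, ask_claude, ask_gemini])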
| baconbrand wrote:
| Or you could just... not use LLMs
| asdff wrote:
| Or they've converged on the same bullshit
| vertnerd wrote:
| Just the other evening, as my family argued about whether some
| fact was or was not fake, I detached from the conversation and
| began fantasizing about whether it was still possible to buy a
| paper encyclopedia.
| stopthe wrote:
| In hindsight, that would've been a real utility use case for
| NFTs. A decentralized cryptographic proof that some content
| existed in a particular form at a particular moment.
| Barathkanna wrote:
| I didn't know "eccentric engineering" was even a term before
| reading this. It's fascinating how much creativity went into
| solving problems before large models existed. There's something
| refreshing about seeing humans brute force the weird edges of a
| system instead of outsourcing everything to an LLM.
|
| It also makes me wonder how future kids will see this era. Maybe
| it will look the same way early mechanical computers look to us.
| A short period where people had to be unusually inquisitive just
| to make things work.
| hombre_fatal wrote:
| Maybe like how I view my dad and the punchcard era: cool and
| endearing that he went through that, but thankful that I don't
| have to.
| dpedu wrote:
| I mean I get it, but it seems a bit silly. What's next - an image
| search engine that only returns images created before photoshop?
| josephjrobison wrote:
| The real gold is content created before the internet!
| javaskrrt wrote:
| This is such a great idea
| audiala wrote:
| It doesn't really work. I tried my website and it shows up,
| despite definitely being built after 2023. There is a mistake
| in the metadata of the page that shows it as from 2011.
|
| https://audiala.com/changelog
| potato-peeler wrote:
| You don't need an extension to do this. Simply add a "before:"
| search filter to your search query, eg -
| https://www.google.com/search?q=Happiness+before%3A2022
| dwa3592 wrote:
| so it's a filter by date and you chose the chatgpt's public
| release?
| Bad_Initialism wrote:
| How about a search engine that only returns what you searched
| for, and not a million other unrelated things that it hopes you
| might like to buy?
|
| This goes for you, too, website search.
| diavarlyani wrote:
| We now need an extension to hide 3 years of the internet because
| it was written by robots. This timeline is undefeated.
| throwawayk7h wrote:
| I noticed AI-generated slop taking over google search results
| well before ChatGPT. So I don't agree with the premise on this
| site that "you can be sure that it was written or
| produced by the human hand."
| 1vuio0pswjnm7 wrote:
| "This browser extension uses the Google search API to only return
| content published before Nov 30th, 2022 so you can be sure that
| it was written or produced by the human hand."
| micromacrofoot wrote:
| What kind of heuristics does it use to determine age? a lot of
| content on Google actually backdates for some reason...
| presumably some sort of SEO scam?
| stocksinsmocks wrote:
| I really thought this was going to be the Dewey Decimal system.
| Exclude sources from this century. It's the only way to be sure.
| ris wrote:
| For a while I've been saying it's a pity we hadn't been regularly
| trusted-timestamping everything before that point as a matter of
| course.
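|
| A sketch of the mechanical part (Python; hashing is the easy
| half - the "trusted" half means sending the digest to an
| independent RFC 3161 timestamping authority or a service like
| OpenTimestamps, which isn't shown here):
|
|     import hashlib, json, time
|
|     def timestamp_record(path):
|         # hash the content now; a trusted timestamp over this
|         # digest later proves it existed, unmodified, back then
|         with open(path, "rb") as f:
|             digest = hashlib.sha256(f.read()).hexdigest()
|         observed = time.strftime("%Y-%m-%dT%H:%M:%SZ",
|                                  time.gmtime())
|         # a real scheme submits `digest` to a timestamping
|         # authority instead of trusting the local clock
|         return {"file": path, "sha256": digest,
|                 "observed_at": observed}
|
|     # "article.html" is just an example local file
|     print(json.dumps(timestamp_record("article.html"), indent=2))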
| 2OEH8eoCRo0 wrote:
| low-background information
___________________________________________________________________
(page generated 2025-12-01 23:01 UTC)