[HN Gopher] Ask HN: What are these low quality "code snippet" si...
       ___________________________________________________________________
        
       Ask HN: What are these low quality "code snippet" sites?
        
       Whenever I am trying to google a code issue I have, there are
       countless low-quality sites just showing SO threads with no added
       value whatsoever. It is so annoying it actually drives me mad.
       Does anyone know what's up with that?  I am really disappointed
       because the guys creating these sites (I guess for some kind of
       monetization) must have some relation to coding. But I feel this is
       an attack against all of us. Every programmer should be grateful
       for the opportunity to find good quality content quickly. Now my
       search results are flooded with copy & paste from SO. They are
       killing that.  Am I the only one experiencing this or being that
       annoyed by it?  P.S: I don't name URLs because if you don't know
       what I am talking about already, you probably don't have that
       issue.
        
       Author : endofreach
       Score  : 396 points
       Date   : 2021-12-01 14:34 UTC (8 hours ago)
        
       | viktorcode wrote:
       | The issue is actually pretty old. There was a time when Google
       | introduced blacklisting of search results and revenue of those
       | sites dived. Sadly, later Google rolled back the blacklist.
        
         | matt_heimer wrote:
         | Any idea why they rolled it back?
        
           | datenarsch wrote:
           | Because Google is not interested in serving the best possible
           | search results but rather in serving those that will make
           | them the most money.
        
           | russh wrote:
           | It was effective and those affected were able to get the
           | feature axed.
        
       | mkl95 wrote:
       | I believe Google have hit a sweet spot (for them) where they can
       | keep you browsing a specific topic for a long time while still
       | showing you mildly interesting results. Since the results are
       | consistently on topic, you are shown ads that are interesting to
       | you time and time again, which results in a lot of clicks and a
       | lot of revenue.
        
         | Taylor_ wrote:
         | I really hope that is not true but I guess if you are
         | optimizing for a metric then that could make sense.
        
       | dvirsky wrote:
       | One thing that makes SO an easy target for this is that they let
       | you download all their data and you don't even need to crawl and
       | scrape the content from the website. Just download a dump, put it
       | in a database, slap an HTML template on top of it, splash a few
       | ads, and boom.
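       | 
       | A minimal sketch of the "download a dump, put it in a database"
       | step, just to show how little work it is. It assumes the posts
       | file is XML made of <row .../> elements with Id, PostTypeId,
       | Title and Body attributes; the sample data, the table layout and
       | the load_posts name are made up for illustration.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Tiny stand-in for a posts file from the dump (assumed layout, fake data).
SAMPLE_DUMP = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Title="How do I split a string?" Body="&lt;p&gt;Question text&lt;/p&gt;" />
  <row Id="2" PostTypeId="2" ParentId="1" Body="&lt;p&gt;Use str.split()&lt;/p&gt;" />
</posts>
"""


def load_posts(xml_file, conn):
    """Stream <row> elements into SQLite without loading the whole file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY, post_type INTEGER, title TEXT, body TEXT)"
    )
    count = 0
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag != "row":
            continue
        conn.execute(
            "INSERT INTO posts (id, post_type, title, body) VALUES (?, ?, ?, ?)",
            (
                int(elem.attrib["Id"]),
                int(elem.attrib["PostTypeId"]),
                elem.attrib.get("Title"),  # answers carry no title
                elem.attrib.get("Body", ""),
            ),
        )
        elem.clear()  # release the element once it has been stored
        count += 1
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
inserted = load_posts(io.BytesIO(SAMPLE_DUMP), conn)
questions = conn.execute(
    "SELECT title FROM posts WHERE post_type = 1"
).fetchall()
print(inserted, questions)
```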
        
       | dmortin wrote:
       | Search spam sites can be reported to google:
       | https://developers.google.com/search/docs/advanced/guideline...
        
         | beefield wrote:
         | Quoting: "While Google does not use these reports to take
         | direct action against violations"
         | 
         | Given e.g. Pinterest in the Google results, I find it difficult
         | to imagine a surer way to waste your time than to report SEO
         | spam sites to Google. It is obvious they do not care the
         | slightest bit.
        
       | ScottWRobinson wrote:
       | For years now I've run a programming site (stackabuse.com) and
       | have closely followed the state of Google SERPs when it comes to
       | programming content. A few thoughts/ramblings:
       | 
       | - The search results for programming content have been very
       | volatile the last year or so. Google has released a lot of core
       | algorithm updates in the last year, which has caused a lot of
       | high-quality sites to either lose traffic or stagnate.
       | 
       | - These low-quality code snippet sites have always been around,
       | but their traffic has exploded this year after the algorithm
       | changes. Just look at traffic estimates for one of the worst
       | offenders - they get an estimated 18M views each month now, which
       | has grown almost 10x in 12 months. Compare that to SO, which has
       | stayed flat or even dropped in the same time-frame.
       | 
       | - The new algorithm updates seem to actually hurt a lot of high-
       | quality sites as it seemingly favors code snippets, exact-match
       | phrases, and lots of internal linking. Great sites with well-
       | written long-form content, like RealPython.com, don't get as much
       | attention as they deserve, IMO. We try to publish useful content,
       | but consistently have our traffic slashed by Google's updates,
       | which end up favoring copy-pasted code from SO, GitHub, and even
       | our own articles.
       | 
       | - The programming content "industry" is highly fragmented
       | (outside of SO) and difficult to monetize, which is why so many
       | sites are covered in ads. Because of this, it's a land grab for
       | traffic and increasing RPMs with more ads, hence these low-
       | quality snippet sites. Admittedly, we monetize with ads but are
       | actively trying to move away from it with paid content. It's a
       | difficult task as it's hard to convince programmers to pay for
       | anything, so the barrier to entry is high unless you monetize
       | with ads.
       | 
       | - I'll admit that this is likely a difficult problem because of
       | how programmers use Google. My guess is that because we often
       | search for obscure errors/problems/code, their algorithm favors
       | exact-match phrases to better find the solution. They might then
       | give higher priority to pages that seem like they're dedicated to
       | whatever you searched for (i.e. the low-quality snippet sites)
       | over a GitHub repo that contains that snippet _and_ a bunch of
       | other unrelated code.
       | 
       | Just my two cents. Interested to hear your thoughts :)
        
         | andai wrote:
         | >it's hard to convince programmers to pay for anything
         | 
         | Offtopic but I'm curious why this is the case? Is the Free
         | Software movement responsible for this mindset?
        
           | TheDudeMan wrote:
           | It's hard to convince anyone to pay for anything.
        
           | john-tells-all wrote:
           | I train programmers, and strongly recommend they buy books or
           | do other types of money/time investment to make themselves
           | better (and more highly-paid programmers).
           | 
           | They won't do it.
           | 
           | I've had multiple programmers act literally shocked and avoid my
           | outstretched book. Once I got a question, said "I just read
           | the answer to this in this book right here"... and the
           | programmer refused to read the book to answer his question.
           | 
           | I don't get it.
           | 
           | This, coupled with companies' lack of investment in their
           | expensive engineers, is mystifying.
           | 
           | None of the above has anything to do with the FSF.
        
             | tobyhinloopen wrote:
             | Maybe books are too low density. Like the "quantity of
             | information" per amount of words is lower than a blog post
             | for example.
             | 
             | I dunno, I'm not really a reading type but I do own
             | programming-related books. It's the only type of book I
             | own. I learned a lot from books like The Clean Coder, The
             | Pragmatic Programmer and some '90s book about Design
             | Patterns with C++ examples, and I don't even write C++.
        
           | ahepp wrote:
           | The content I want already exists. It's provided for no
           | charge on stackoverflow, reddit, Wikipedia, cppreference, and
           | a handful of other high quality sources. All of which do not
           | charge users a fee, most of which obtain the content from
           | users in the first place.
           | 
           | So as far as I see it, the problem is not that the content is
           | uneconomical to produce. The problem is that searching
           | doesn't discover the content I want. It brings up
           | geeksforgeeks or worse.
        
           | mdasen wrote:
           | It's been a hard habit for me to break, but when you know
           | that you _could_ do something and theoretically do it better,
           | it can bias you against paying for something. None of that is
           | meant to sound arrogant - every company could do a better job
           | given more time and resources, just as I could. But my time
           | isn't infinite and I've found that paying for solutions in
           | my life is a good thing. Sure, I could run my own email, but
           | I'll just pay for it. Sure, I could create an app that does
           | exactly what I want, but this one is close enough and it's
           | $10.
           | 
           | With knowledge, the problem can be worse. You can't even
           | evaluate if it's any good until it's been given to you. At
           | that point, you don't need to pay for it because you have it.
           | The number of paid knowledge things that I see that don't
           | really have good information and avoid all the real problems
           | that they promised they were going to solve for you can be
           | high.
           | 
           | I think sites can build trust with users, but it can mean
           | creating a lot of free content in the hopes of converting
           | some people to paid users. Of course, if that model is too
           | successful, then there will be an army of people dumping free
           | content hoping to do the same, but then your paid content is
           | competing with the free content available from many trying to
           | do the same business model. If Bob creates a business doing
           | this, do Pam, Sam, Bill, James, and Jess try to do that a few
           | months later which then means that the amount of free content
           | is now 5x what it was and there's no need to pay for it
           | because it'll be a freebie on one of those sites?
        
           | tobyhinloopen wrote:
           | Most developers would rather spend 2 days building something
           | than pay $60 for something better.
        
         | charlesdaniels wrote:
         | I wonder if we'll see a comeback of hand-curated directories of
         | content? I feel like the "awesome list" trend is maybe the
         | start of something there.
         | 
         | I would be willing to pay an annual fee to have access to well-
         | curated search results with all the clickbait, blogspam, etc.
         | filtered out.
         | 
         | Until then, I recommend uBlacklist[0], which allows you to hide
         | sites by domain in the search results page for common search
         | engines.
         | 
         | 0 - https://github.com/iorate/uBlacklist
        
           | notJim wrote:
           | > hide sites by domain
           | 
           | This gives me the idea to build a search engine that only
           | contains content from domains that have been vouched for.
           | Basically, you'd have an upvote/downvote system for the
           | domains, perhaps with some walls to make sure only trusted
           | users can upvote/downvote. It seems like in practice, many
           | people do this anyway. This could be the best of both worlds
           | between directories and search engines.
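           | 
           | A rough sketch of the vouching idea, assuming a fixed set of
           | trusted users and a simple score threshold. The DomainVotes
           | class, the user names and the snippet-farm.example domain are
           | all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


class DomainVotes:
    """Tracks up/down votes per domain, counting trusted users only."""

    def __init__(self, trusted_users):
        self.trusted_users = set(trusted_users)
        self.votes = defaultdict(dict)  # domain -> {user: +1 or -1}

    def vote(self, user, domain, up):
        if user not in self.trusted_users:
            return False  # the "wall": votes from untrusted users are ignored
        # One vote per user per domain; voting again overwrites the old vote.
        self.votes[domain.lower()][user] = 1 if up else -1
        return True

    def score(self, domain):
        return sum(self.votes.get(domain.lower(), {}).values())

    def is_vouched(self, domain, threshold=1):
        return self.score(domain) >= threshold


def filter_results(urls, votes):
    """Keep only the URLs whose host has been vouched for."""
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if votes.is_vouched(host):
            kept.append(url)
    return kept


votes = DomainVotes(trusted_users={"alice", "bob"})
votes.vote("alice", "stackoverflow.com", up=True)
votes.vote("mallory", "snippet-farm.example", up=True)  # ignored, not trusted
votes.vote("bob", "snippet-farm.example", up=False)

results = filter_results(
    [
        "https://stackoverflow.com/questions/1",
        "https://snippet-farm.example/copied-answer",
    ],
    votes,
)
print(results)
```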
        
           | freediver wrote:
           | You can access one without paying a dime.
           | 
           | http://teclis.com
           | 
           | Problem is people usually want one general search engine, not
           | a collection of niche ones.
        
         | new_guy wrote:
         | > I'll admit that this is likely a difficult problem because of
         | how programmers use Google
         | 
         | It's beyond simple for Google to fix. Just drop those sites
         | from the search index. But Google won't do that because it's in
         | their interests to send you to those shit holes because they're
         | littered with Google ads.
        
         | Terry_Roll wrote:
         | I've seen programming newsgroups, those things from the 90's,
         | with what can best be described as MITM attacks having taken
         | place when coders have been looking for solutions to problems
         | and the solutions have not been correct. Most newsgroups were
         | never secure so vulnerable to MITM from day 1 and what is being
         | reported today is just the latest variation in that attack
         | process.
         | 
         | I've also seen Bing & Google citing Stack Overflow, and the
         | replies in SO awarding or agreeing on a solution come straight
         | from this "text book", "The Gentleman's Guide To Forum Spies":
         | https://cryptome.org/2012/07/gent-forum-spies.htm
         | 
         | Perhaps it would be useful to dig into a poster's history on a
         | site and then decide who you trust instead of just trusting a
         | random on the internet?
         | 
         | How many people have downloaded code from SO into VS and found it
         | doesn't even do what it's purported to do? I've seen plenty of
         | that.
         | 
         | Resource-burning the population, in this case programmers, is a
         | perfectly valid technique for a variety of reasons, the main
         | one being that you are stuck in front of a computer, and that
         | means you can't get into mischief away from work. Religions have
         | been using that technique for hundreds of years, and
         | colloquially it's known as "The devil makes work for idle hands
         | to do" or something to that effect.
         | 
         | Choose carefully what you want to believe and trust.
        
           | webmaven wrote:
           | _> I've seen programming newsgroups, those things from the
           | 90's, with what can best be described as MITM attacks having
           | taken place when coders have been looking for solutions to
           | problems and the solutions have not been correct. Most
           | newsgroups were never secure so vulnerable to MITM from day 1
           | and what is being reported today is just the latest variation
           | in that attack process._
           | 
           | Well, that's also the side effect of taking Cunningham's Law
           | to heart, which says "the best way to get the right answer on
           | the Internet is not to ask a question, it's to post the wrong
           | answer."
        
         | jrumbut wrote:
         | Recently I've just punted and begun searching SO and Github
         | directly.
         | 
         | One thing Google has gotten really good at lately is knowing
         | that when I search for "John Smith" I mean John Smith the
         | baseball player not the video game character or vice-versa.
        
           | dhimes wrote:
           | I've always just searched 'John Smith baseball' and that
           | works well in DDG too.
        
           | nunez wrote:
           | you can also add 'site:github.com' or
           | 'site:stackoverflow.com' to your search
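           | 
           | If you do this a lot, it is easy to script. A small sketch
           | that appends the site: operator and builds a search URL; the
           | helper names are made up, and the base URL with its q
           | parameter is an assumption you can swap for your own engine.

```python
from urllib.parse import urlencode


def site_query(terms, site):
    """Append a site: restriction to a plain search query."""
    return f"{terms} site:{site}"


def search_url(terms, site, base="https://www.google.com/search"):
    """Build a search URL whose q parameter carries the restricted query."""
    return base + "?" + urlencode({"q": site_query(terms, site)})


url = search_url("python string split", "stackoverflow.com")
print(url)
```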
        
             | jrumbut wrote:
             | It's not as good, especially for Github.
             | 
             | GH also breaks down the results into types which is very
             | helpful when you only want code or are looking for
             | documentation or discussion.
        
         | mritchie712 wrote:
         | Who is the offender?
        
         | Nextgrid wrote:
         | > It's a difficult task as it's hard to convince programmers to
         | pay for anything
         | 
         | I wonder if there's an opportunity for a paid bundle of
         | programming-related sites. I indeed will not pay for a single
         | site (nor for a news site for that matter), but a $10-20/month
         | subscription that covers quality programming websites could be
         | interesting.
        
         | btown wrote:
         | There's an interesting dilemma here - if the algorithm were to
         | favor "pages with lots of content alongside the exact-match
         | phrase you're looking for" then it would incentivize content
         | farms that _combine_ lots of StackOverflow answers together.
         | And if you favor the opposite, where there's less content
         | alongside the exact-match phrase, you incentivize content farms
         | that _strip away_ content from StackOverflow before displaying
         | it. Ideally, of course, you'd rely on site-level reputation -
         | is Google just having trouble recognizing that these are using
         | stolen content?
        
           | ScottWRobinson wrote:
           | > is Google just having trouble recognizing that these are
           | using stolen content?
           | 
           | It's very possible. In general Google will penalize you for
           | duplicate content, but that might not apply to code snippets
           | since code is often re-used within projects or between
           | projects.
           | 
           | The code snippet sites also typically have 2 or more snippets
           | on the same page. When combined, it might then look unique to
           | Google since their algorithm probably can't understand code
           | as well as it understands natural text. Just a guess
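           | 
           | To illustrate the guess (this is NOT how Google works, just
           | a textbook near-duplicate check): compare token shingles with
           | Jaccard similarity. A page that glues two copied snippets
           | together scores much lower against either source than a
           | straight copy does. The snippets and function names are made
           | up.

```python
def shingles(text, k=3):
    """Return the set of k-token windows (shingles) in whitespace-split text."""
    tokens = text.split()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


snippet_a = "for item in items : print ( item )"
snippet_b = "with open ( path ) as f : data = f . read ( )"
combined = snippet_a + " " + snippet_b  # a page holding both copied snippets

straight_copy = jaccard(shingles(snippet_a), shingles(snippet_a))
mixed_page = jaccard(shingles(snippet_a), shingles(combined))
print(straight_copy, mixed_page)
```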
        
       | bschwindHN wrote:
       | The answer is, if someone can make money by doing something
       | shitty but not illegal, they will do it.
       | 
       | Almost everything on the web is some scheme to put ads in your
       | face so someone can make some money.
        
         | alx__ wrote:
         | Yeah there's a whole mini scam network that copies programming
         | how-to videos on YouTube and puts new thumbnails on them. So
         | you watch the ad and then get 30 seconds into the video before
         | realizing it's garbage, and they've made a few stupid pennies
         | :(
        
         | [deleted]
        
       | jonas21 wrote:
       | All user-contributed content on Stack Overflow is under a CC-BY-
       | SA license. So what the sites are doing is allowed under the
       | license, as long as they're providing attribution.
       | 
       | Is it annoying? Sure. But neither Stack Overflow nor the authors
       | of the content can do anything about it since they gave away a
       | license to do it.
       | 
       | One of the things you have to accept when you release something
       | under an open-source or Creative Commons license is that other
       | people can take it and use it in ways that you don't like.
        
       | leifdenby wrote:
       | Does anyone know of a good list of these copy sites? I just came
       | across this Firefox extension which makes it possible to filter
       | sites from search results:
       | https://addons.mozilla.org/en-US/firefox/addon/hohser/. Would be
       | great with a community blocklist like those for Pi-hole.
        
       | ruffrey wrote:
       | I suspect they were always there, but google and ddg are getting
       | gamed more now. The quality of results has dropped quite a bit in
       | the past 4 or 5 years in this regard.
        
       | sam0x17 wrote:
       | It's quite simple. SO has a huge easily indexable database of
       | answers, and SEO scammers can make a quick buck by copying it all
       | and making it seem like they have an answer for unanswered
       | questions. Nothing to see here, blame your search engine.
        
       | bigtex wrote:
       | If I recall the Stackoverflow dataset is open source or at least
       | made available to download so I assume all these sites just
       | download that information regularly.
        
       | ljm wrote:
       | My pet peeve is ApiDock, which has managed to SEO itself so high
       | up the rankings when searching for anything connected to Ruby or
       | Rails that it is actually quite difficult to get to the
       | legitimate, official documentation.
       | 
       | What's worse is most of the results are outdated so you're
       | looking at web-scraped API docs for Rails 3 or something.
       | 
       | Really frustrating.
        
         | pricci wrote:
         | And the site is so slooow for me.
        
       | TYPE_FASTER wrote:
       | People are crawling content that is searched for frequently, then
       | using SEO to rank higher in the results than the original content
       | to make money from the ad revenue.
       | 
       | Code and recipes are two examples.
       | 
       | I'm also seeing politicians posting Tweets containing a link to
       | their personal website, which has ads.
        
       | austinhutch wrote:
       | Here are some of the sites that have gotten traction in my SERPs
       | lately that I can't stand:
       | 
       | https://pretagteam.com/
       | 
       | https://www.codegrepper.com/
       | 
       | https://issueexplorer.com/
       | 
       | They are all scraping Stack Overflow / Github in some fashion
        
       | cheriot wrote:
       | I wonder if there's a market for a software engineering specific
       | search engine. Skip the shitty content farms, include code from
       | open source projects, and potentially be smarter about finding
       | package uses.
        
       | shmde wrote:
       | Just to chip in with a very minor annoyance, I hate how google
       | puts up w3schools results above MDN for anything related to
       | JS/HTML.
        
         | a_e_k wrote:
         | Hell, yesterday I searched for `python string split`.
         | 
         | Stupid w3schools was the first hit. The official library
         | reference documentation on python.org (which was what I was
         | looking for) didn't turn up until nearly the bottom of the 2nd
         | page of search results!! Beyond frustrating.
        
           | aaron695 wrote:
           | w3schools is better than python.org in your example.
           | 
           | So that issue needs to be addressed first.
           | 
           | I don't give a fuck about what library it's in blah blah a
           | massive text wall.
           | 
           | I want a quick example at the top. Most of the time Fini.
           | 
           | Then more info about arguments, with quick examples.
           | 
           | Then more deeper info.
           | 
           | Fuck documentation as it stands in the IT world.
           | 
           | Also set your Google results to 100. What are the chances what
           | you want will be in the top 10? Sure you are not a unique
           | snowflake, but you should be better than the top 10 some of
           | the time if you have any sort of expertise.
        
         | Kiro wrote:
         | I prefer W3Schools to MDN.
        
         | projproj wrote:
         | Check out tomdn.com. I created it to make it easier to get to
         | MDN. E.g., instead of googling array.map, just type
         | tomdn.com?array.map. It will take you straight to MDN docs on
         | Array.prototype.map. Other things work as well, like
         | tomdn.com?object.keys, tomdn.com?css.color,
         | tomdn.com?htmlel.button, etc.
         | 
         | If there is a pattern it doesn't recognize, you end up on the
         | MDN search results page.
         | 
         | edit: More info. about how it works is available here:
         | https://github.com/tayler/tomdn
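         | 
         | Roughly, the idea is a pattern lookup with a search fallback.
         | The sketch below is not the actual tomdn code (see the repo
         | for that); the GLOBAL_OBJECTS table and resolve() are
         | hypothetical and only cover a few JS built-ins.

```python
from urllib.parse import quote

MDN = "https://developer.mozilla.org/en-US"

# Hypothetical lookup table: lowercase object name -> MDN page name.
GLOBAL_OBJECTS = {
    "array": "Array",
    "object": "Object",
    "string": "String",
}


def resolve(query):
    """Map a query like 'array.map' to an MDN docs URL, else to MDN search."""
    obj, _, member = query.strip().partition(".")
    page = GLOBAL_OBJECTS.get(obj.lower())
    if page and member:
        return (
            f"{MDN}/docs/Web/JavaScript/Reference/Global_Objects/"
            f"{page}/{quote(member)}"
        )
    # Unrecognized pattern: fall back to the MDN search results page.
    return f"{MDN}/search?q={quote(query)}"


print(resolve("array.map"))
print(resolve("css.color"))
```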
        
           | hnarn wrote:
           | What value does this provide over simply Googling or DDG'ing
           | "array.map mdn"? It's shorter and the first result for me is
           | (unsurprisingly) MDN on both, and I don't have to run my
           | searches through an unknown third party.
           | 
           | DDG also has !bangs for this specific purpose and !mdn does,
           | as expected, lead you to MDN.
        
             | projproj wrote:
             | Valid questions. I just think it's faster. One action on
             | tomdn.com instead of: 1. DDG "array.map mdn", 2. Find MDN
             | result (usually first), 3. Click result, 4. See docs.
             | 
             | Bangs are great, just shorter again to use tomdn.com:
             | 
             | - `!mdn array.map` leads me to
             | https://developer.mozilla.org/en-US/search?q=array.map,
             | where I have to scan the results and click on one.
             | 
             | - tomdn.com?array.map leads me to the exact page in the docs:
             | https://developer.mozilla.org/en-US/docs/Web/Javascript/Refe...
             | 
             | This may or may not be valuable, but I like saving the
             | extra steps for something I do many times a week. Basically
             | this calculation (https://xkcd.com/1205/) works out for me
             | + the fun of working on a new project.
             | 
             | edit: >I don't have to run my searches through an unknown
             | third party
             | 
             | Very sympathetic to this concern. Feel free to host
             | yourself -- it's a single html file.
             | https://github.com/tayler/tomdn
        
               | striking wrote:
               | `! mdn array.map` takes you right to MDN
        
               | projproj wrote:
               | Totally had no idea. Thanks for the tip. I guess
               | tomdn.com is for non-ddg users only.
        
         | bbarn wrote:
         | I've seen several developers google "w3schools someDomEvent"
         | because they are conditioned by those search results. Has it
         | accidentally become a halfway useful site?
        
           | kilburn wrote:
           | > Has it accidentally become a halfway useful site?
           | 
           | It has. It is nowhere near the quality of MDN, but also
           | nowhere nearly as bad as it used to be.
           | 
           | W3Schools has less information than MDN. This is bad for the
           | experienced, but a relief for people who are entering the
           | trade.
           | 
           | As an example, consider an inexperienced web developer that
           | wants to do something when a button is clicked. They google
           | for it and find these two pages:
           | 
           | - https://www.w3schools.com/jsref/event_onclick.asp . Right
           | there in the middle of the page is an example snippet that
           | will get the job done. If I scroll down I see other ways to
           | do it, all of them fully contained (i.e.: I can copy-paste it
           | to some website and they will work).
           | 
           | - https://developer.mozilla.org/en-US/docs/Web/API/Element/cli... .
           | There's a table of things I don't understand (Bubbles?
           | Cancellable?), two lengthy discussions about weird edge cases
           | in some browsers, and (finally!) a single not-so-simple
           | example that is not even enclosed in <script> tags.
           | 
           | What website do you think they'll pick?
        
           | blahyawnblah wrote:
           | I do the same, but with "mdn" instead of "w3schools"
        
           | 5e92cb50239222b wrote:
           | Are you positive the first term wasn't prefixed by a hyphen-
           | minus?
        
           | vlunkr wrote:
           | IMO it is a legitimately useful site, but strictly for
           | beginners, and probably only for html/css. It teaches you the
           | basics and has the sandbox to test things out. It's fine,
           | just not what you want to see when you're looking for actual
           | documentation.
        
             | myohmy wrote:
             | I used it to learn CSS way back in the day and it still
             | seems to be a good balance between too little and too much
             | for the beginner, so I still send my newbies there.
             | 
             | Not surprised it shows higher than the actual spec. I think
             | the HN community is just getting to the greybeard stage
             | (I'm still cutting out grays), and this is our version of
             | JUST READ THE MAN FILE
        
           | valenaut wrote:
           | It's fine for simple things. I referenced it a lot when I was
           | first learning JS and HTML because it was usually at the top
           | of Google results. I think MDN is strictly better.
        
       | whatsakandr wrote:
       | Recently I've found duckduckgo to provide better coding results
       | than Google, which really surprised me. I was only using
       | duckduckgo at home for privacy, and Google at work because "best
       | tool for the job", but I think that might not be the case
       | anymore.
        
       | firemelt wrote:
       | This shit is just the same as Quora, Pinterest, W3Schools, and
       | ApiDock.
        
       | steve_avery wrote:
       | I am with you on this. Lately I have noticed that I google for
       | an issue, find a low-quality site with relevant results, and
       | later discover that it's just a copy of the GitHub issues page
       | from the original project. Why didn't the GitHub issue link make
       | it to the first page of Google, while this crappy knockoff, with
       | no link back to the source material, beat it? So frustrating.
        
       | terafo wrote:
       | Sites that auto-translate original SO threads and pretend
       | that it's their original content are the worst. Google sometimes
       | prefers to give me those results instead of the actual SO thread
       | because I'm not in an English-speaking country. I have to waste
       | some time to realize that it's just a stolen SO thread. And it's
       | not even that useful because some of them _AUTO TRANSLATE CODE_.
        
         | prox wrote:
         | DDG allows you to set your own country or any country easily
         | (without needing an account)
        
         | bserge wrote:
         | Lol. Google's absolute insistence that you must want local
         | language results has been a pain in the ass for years.
         | 
         | It's impossible for them to understand. Clearly they have zero
         | people working and living in countries other than their own. /s
        
       | kristaps wrote:
       | I wonder if Google as an organization cares enough to start a new
       | spam crusade, but here's what happened the last time:
       | https://googleblog.blogspot.com/2011/01/google-search-and-se...
        
         | diveanon wrote:
         | Based on the current state of GMail I don't think anyone at
         | Google gives a fuck about spam.
        
       | lloydjones wrote:
       | As others have said: SO content is ripped-off (poorly) and
       | mirrored. The page games Google's algorithm and shows up as a
       | 'legit' result.
       | 
       | Probably more complex than simple keyword stuffing, which isn't
       | supposed to work these days.
        
         | marginalia_nu wrote:
         | I don't understand how these sites rank so highly though, they
         | can't get much organic traffic at all. Most people who click on
         | them will bounce immediately.
        
           | Nasrudith wrote:
           | Generally they show up as one of the few options if you look
           | for very specific subqueries - that is when I tend to find
           | them.
        
       | nightfeather wrote:
       | Definitely. I even use an extension mainly to hide this stuff.
       | 
       | They mirror not only SO but also its siblings (like serverfault
       | and askubuntu), and others like GitHub.
       | 
       | But the most annoying part is that Google keeps showing those
       | mirrored and machine-translated pages, which offer little to no
       | benefit to me, and I've already been forced to train myself to
       | identify them at first glance.
       | 
       | They even show up when I'm already searching in other
       | languages, argh.
       | 
       | edit: formatting
        
       | pupppet wrote:
       | Really ticks me off that Google allows itself to be so easily
       | gamed, it's your core business for christ's sake.
        
         | UweSchmidt wrote:
         | We have to assume that Google devs see these crappy spammy code
         | websites too on a daily basis, right? What's up with the
         | incredibly strong programmer culture over there, gone already
         | after 20 years? You'd think someone would take these weak
         | search results personally...
        
           | Hnaomyiph wrote:
           | As someone who knows a handful of devs and hears their
           | near-daily grievances about Google, it seems like the culture
           | over there has died and it's just a bog standard Big Corp(tm)
           | now.
           | By how they describe it, a lot of the people, (but not all of
           | course), who deeply care about the 'craft' of development
           | have retired or moved onto more interesting roles at newer
           | and more exciting companies and were replaced by people whose
           | main driving force is climbing the corporate ladder at a
           | FAANG. As a result the perf cycles/promotions seem to be
           | heavily based on high impact projects/launches, leaving the
           | "little things" like bad search results or other maintenance
           | work forever on the back burner, because working on those
           | things won't get you a promotion/more money.
        
         | tenebrisalietum wrote:
         | Desktop web search is probably less the core business these
         | days than making search work well on phones/mobile devices with
         | specific applications, such as Maps, etc.
        
           | jjallen wrote:
           | Why would they do that though? Why would they show you a low
           | quality result if there was a better one available? I have
           | pondered some of these sites as well. I think the answer is
           | that they are actually helpful to some people.
        
             | tenebrisalietum wrote:
             | Businesses are profit seeking entities run by human beings
             | who tend to prefer conservation of energy. If a business
             | goal can be met with X effort, X+1 effort is a waste and
             | the resources can be directed elsewhere or kept as profit.
        
             | nickthesick wrote:
             | I don't think they track 'quality' they just see people
             | clicking and use that as a proxy for 'quality'. (Quoted
             | because what does quality even mean?)
        
               | nitrogen wrote:
               | I seem to recall reading that they (used to?) track
               | abandonment of the search results page as a successful
               | answer to your question from the embedded page snippets
               | or info boxes. So if you get a search result page that's
               | complete garbage and just give up, they increase the
               | ranking of those sites.
        
         | pixelgeek wrote:
         | No it isn't. Ads are their core business. Those crap sites
         | almost all run ads from Google
        
           | dgb23 wrote:
           | That's a shortsighted view because it would imply that the
           | quality of search doesn't have any bearing on their success.
           | I highly doubt any Googler would agree with that. Especially
           | not the ones who have been working on search specifically.
        
             | throw_m239339 wrote:
             | > That's a shortsighted view because it would imply that
             | the quality of search doesn't have any bearing on their
             | success. I highly doubt any Googler would agree with that.
             | Especially not the ones who have been working on search
             | specifically.
             | 
             | Google is the go-to search engine today, the competition is
             | virtually nonexistent, so they don't have to compete
             | anymore. Hence the quality of their search engine going
             | downhill, conflicts of interest and whatnot. It's not
             | shortsighted, it would take billions to compete with Google
             | search and even Bing isn't trying anymore.
             | 
             | When I look at a Google search page result today I see
             | loads of ads, then their shopping stuff, image search
             | stuff, a list of questions related or not to my search and
             | their answers extracted from third party web pages, then
             | only when I scroll 1 page I see actual search results.
             | Google search wasn't like that 15 years ago.
             | 
             | Googlers are working so that Google makes more money, and
             | Google is an ad company first and foremost.
        
             | _jal wrote:
             | Extrapolating from an employee's feelings to the behavior
             | of a huge public company is a terrible way to model
             | reality.
        
               | spiralx wrote:
               | This whole topic is extrapolating from people's feelings
               | to the behaviour of a huge public company. I've not seen
               | a single piece of actual data anywhere so far.
        
         | tomcooks wrote:
         | Search hasn't been Google's core business for decades, they're
         | into selling advertising space.
        
         | dehrmann wrote:
         | I used to work for a company that gamed Google like this, but
         | with product search results (that were, themselves, CPC ads).
         | The company got bought for scrap because Google got good at
         | downranking their pages in search results and finally taking
         | shopping seriously.
         | 
         | A lot has changed since then, but I doubt that Google has
         | decided these kinds of content farms are suddenly OK. It should
         | also give pause to people asking for "open ranking algorithms."
         | Without some amount of secrecy, content farms will pop up for
         | literally every keyword they can sell ads against, and they'll
         | know how to rank more often rather than just blindly guessing.
        
           | WhisperingShiba wrote:
           | It does give me pause, but do you really think there is no
           | way to have both an open algorithm and good results? What if
           | users assessed sites as they used a search engine, and then
           | this was broadcast back to the engine?
           | 
           | I have a feeling there is a way to introduce randomness to
           | this search engine in a way that smooths out the game
           | theoretic arms race of gaming an algorithm. This goes against
           | our natural urge to have a fully deterministic algorithm
           | which always brings the most pertinent sites right at the
           | top. However, because humans are trying to bring order to
           | things for their own benefit, throwing in chaos could prevent
           | people from even trying to game the algorithm
        
             | Nasrudith wrote:
             | Neural network weighting loops and adversarial inputs say
             | yes, it is impossible.
        
             | motoxpro wrote:
             | Markets don't work when everyone knows what I am THINKING,
             | i.e. if I know with 100% certainty you are going to buy
             | something for $1, I just bid $1.0001. Same with Google
             | search. If I know with 100% certainty that you rank pages
             | in a certain way, I can game them. Markets only work when
             | all data is available but not all opinions.
             | 
             | Introducing random elements just makes it worse for
             | everyone.
        
       | michele127 wrote:
       | I made this chrome extension because of this exact issue.
       | 
       | https://chrome.google.com/webstore/detail/search-noise-filte...
        
       | mypastself wrote:
       | The biggest problem is that they waste your time even when the
       | content is ostensibly helpful, since the search result is usually
       | listed after the Stack Overflow page it crawled from anyway. Each
       | click steals a few seconds of developers' time, which adds up,
       | given how frequently these types of results pop up on Google
       | search lately. That makes them worse than useless, they actively
       | subtract value.
        
       | bryanrasmussen wrote:
       | I have this problem, and contrary to a lot of people I don't
       | protect a lot of my PI from Google. It used to be Google was good
       | at giving me stuff I wanted in ads, especially in gmail but they
       | don't really anymore. You would think the more they knew about
       | you they would be able to give you better results, so maybe a
       | large scale test should be done - if Google knows your PI do you
       | get better or worse results, or doesn't it matter at all.
        
       | diveanon wrote:
       | This is more of an issue with google results than the content
       | itself.
       | 
       | Google is a shit product and you get shit results when you use
       | it.
        
         | cally wrote:
         | I have many problems with google as a company, but as a
         | product, google search is fantastic. To say it's shit is a
         | huge, huge, huge over-reaction.
        
       | pronik wrote:
       | Important tidbit: SO's content is CC-licensed and this is
       | probably completely legal (apart from those who fail to add a
       | link to the original). Not that I don't want those sites to burn
       | in hell, but they are not even in a grey zone legally.
        
       | endofreach wrote:
       | Well, as i wrote i understand that they try to monetize it.
       | 
       | But: why the sudden explosion? I feel there is more of these
       | sites going live regularly. Many times they make up 80% of the
       | first pages on search results, just repeating the SO threads
       | listed before. So it's really getting difficult.
       | 
       | Something must be done...
        
         | bellyfullofbac wrote:
         | Maybe someone figured out The Google Algorithm (TM/all hail the
         | algorithm...) is allowing source code on dodgy sites, so
         | they've created and marketed to spammers a script/bot to create
         | these kinds of sites (or all the sites just belong to them) to
         | farm that ad money.
         | 
         | A bit like how when "hoverboards" got popular, Chinese
         | factories started churning them out. And then when fidget
         | spinners happened, same thing:
         | https://www.buzzfeednews.com/article/josephbernstein/how-to-...
        
         | coldpie wrote:
         | Everyone should install an ad blocker so these types of
         | "businesses" become unprofitable. Installing an ad blocker is
         | the ethical choice.
        
         | lozenge wrote:
         | Maybe these sites share some code? If you've already captured
         | 2% of searches with one site, why not capture 1.5+1.5=3% of
         | searches by running two sites.
        
       | vegancap wrote:
       | I really hate these. Especially when I'm trying to figure
       | something out and I'm struggling to find answers, I end up
       | haplessly finding the exact same wrong answer on three different
       | sites.
        
         | brootstrap wrote:
         | I've 100% noticed it in the past months. Have a new employee
         | that i'm trying to train up a bit with python and googling has
         | gotten more annoying. fucking codegrepper.com & shit. FUG EM .
         | hilarious when you find the exact same snippet over and over.
         | Even more hilarious those asswipes are probably making $$
         | (potentially lots) with all the stupid clicks n that.
        
       | throwaway_dcnt wrote:
       | I spoke to a VP at Google in 2006 in London and discussed using a
       | combination of curation and entropy to flush out duplicates. He
       | seemed pretty excited by the idea but I don't think anything
       | materialized. Which is another way of saying this is not new - in
       | those days these sites were copying newsgroups too.
        
       | tomaszs wrote:
       | It is just Google, with its amazing algorithms that rank
       | established websites way higher than a random spam website.
        
       | reliable wrote:
       | Is it bad that I've actually found my answer on some of these
       | sites haha. But yeah, they're pretty low quality in general.
        
         | didericis wrote:
         | Yes; if the content were original or the different context was
         | increasing discoverability, that'd be ok, but duplicates of
         | content posted elsewhere almost always just obscure the
         | original, better post with better context and real users to
         | follow up with.
         | 
         | I've googled things and had an answer pop up from one of
         | these types of sites too, so I'm not saying it's something to
         | beat yourself up about, but if you're playing devil's advocate
         | and suggesting there's discoverability value added outweighing
         | the other value lost, I disagree.
        
       | dadboddilf2 wrote:
       | Baeldung is the worst offender for the Java ecosystem.
        
         | mhzsh wrote:
         | I actually saw a job posting by them on SO a while back for
         | "part-time article writing". My guess is if they legitimately
         | hire people to do it, those people just end up stealing content
         | to meet their writing demands.
        
       | dlandis wrote:
       | FYI, people have been posting about this on stackoverflow meta
       | going back at least 8 years:
       | https://meta.stackexchange.com/questions/200177/a-site-or-sc...
        
       | hawthornio wrote:
       | I've had to switch to !py on ddg because the official Python docs
       | never make the first page. It's really frustrating. :/
        
       | distortedsignal wrote:
       | There was an SO outage this last year, and I only found that out
       | when I tried to go to SO and couldn't get to it. I checked page 2
       | of Google and found one of the mirrors that you're talking about.
       | I grabbed the content from there and continued with work.
       | 
       | I think it's a matter of perspective. If you _know_ that you want
       | a specific site, use google's 'site:' specifier. If you're
       | looking and find something that is from SO, redo the search and
       | get to the SO Q/A. As for me, I'm moderately grateful for the
       | decentralized backups.
        
         | remuskaos wrote:
         | This is a rather contrived argument. In the rare instances that
         | SO was down, I had to make do looking up the original
         | documentation for whatever language I was using.
         | 
         | But these scraper sites add nothing of value at all, not even
         | as an involuntary backup/archive, since they are usually laden
         | with ads.
        
       | mattwad wrote:
       | It's for all kinds of sites lately. The uBlacklist extension has
       | solved it for me - one click and you can remove an entire domain
       | from future results.
        
       | vanusa wrote:
       | They get fed into a web crawler and then into a giant hopper
       | whence they become the backbone of that shiny "No Code"
       | technology you've been hearing about.
        
       | temp8964 wrote:
       | The worst of all are those websites that only show the content in
       | the search engine. When you click the link to their webpages, you
       | can only see random text that has nothing to do with the search
       | result at all. There must be some really narcissistic programmers
       | behind these.
        
       | radiantttt wrote:
       | Maybe code snippets are "enriched" in these sites?
        
         | scollet wrote:
         | They are not. All the context and conversations are stripped.
        
       | system2 wrote:
       | Why don't you name the URL? Share so we know to avoid. It is not
       | like we are going to dox the guy or something.
        
       | null_object wrote:
       | I've got into the habit of clicking the 3-dot icon next to the
       | search entry (often number 1 in Google's results) and reporting
       | these sites as scrapers, stealing content from SO.
       | 
       | Maybe if we all did this, google might eventually take notice?
        
       | MiddleEndian wrote:
       | uBlacklist is a great tool to block sites you don't want from
       | showing up on your search results on google (and a couple others
       | including bing and duckduckgo). It supports regular expressions
       | as well, I use /pinterest\..*/ to block all pinterest-related
       | content.
       | 
       | https://addons.mozilla.org/en-US/firefox/addon/ublacklist/
       | 
       | https://chrome.google.com/webstore/detail/ublacklist/pncfbmi...
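       | 
       | As a rough illustration of how such a pattern filters results,
       | here is a minimal Python sketch. It assumes the pattern is
       | written as pinterest\..* (one backslash escaping the dot) and
       | is searched for anywhere in the URL. Python's `re` module only
       | stands in for the extension's own matcher, which may behave
       | differently, and the sample URLs are made up.

```python
import re

# Assumption: the comment's pattern with a single escaped dot.
PINTEREST = re.compile(r"pinterest\..*")


def is_blocked(url: str) -> bool:
    """Return True if the pattern matches anywhere in the URL."""
    return PINTEREST.search(url) is not None


# Made-up sample URLs, for illustration only.
urls = [
    "https://www.pinterest.com/pin/123/",
    "https://pinterest.co.uk/ideas/",
    "https://stackoverflow.com/questions/1",
]

# Keep only the results the pattern does not match.
kept = [u for u in urls if not is_blocked(u)]
print(kept)
```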
        
         | pixelgeek wrote:
         | If you are using Safari, the DevonAgent app[1] that was linked
         | to in another thread also has a denylist feature
         | 
         | [1]:https://www.devontechnologies.com/apps/devonagent
        
         | shever73 wrote:
         | That's a great tool, thanks for the recommendation. Pity it
         | doesn't work with Ecosia's image search. I wrote my own
         | extension to remove Pinterest from image results, and I've been
         | thinking of extending it to block Instagram and Facebook too. I
         | hate how these companies point their sewer pipe at search
         | engines, so you get into a walled garden where you have to sign
         | up to see the indexed content.
        
         | jetrink wrote:
         | I wish Google would add that feature natively. Google could
         | even use it to help detect spam sites.
        
           | enobrev wrote:
           | Google used to make it possible to filter domains from
           | results. I was sad to see it disappear
        
             | leephillips wrote:
             | You can still do "-site:spam.com".
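             | 
             | For several domains the operator can simply be repeated.
             | A small Python sketch of building such a query, where
             | `exclude_sites` is a hypothetical helper and the domains
             | are placeholders:

```python
def exclude_sites(query: str, domains: list[str]) -> str:
    """Append one -site: operator per domain to a search query."""
    exclusions = " ".join(f"-site:{d}" for d in domains)
    # strip() drops the trailing space when there is nothing to exclude.
    return f"{query} {exclusions}".strip()


# Placeholder domains, for illustration only.
q = exclude_sites("python read file", ["spam.com", "example.org"])
print(q)
```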
        
               | enobrev wrote:
               | For sure! Thanks for the reminder. There was a point (as
               | you may know) when the domains were saved into your
               | profile, so you didn't have to add it.
        
               | leephillips wrote:
               | I didn't know.
        
           | bserge wrote:
           | Oh dear God, no. Don't need Google to have even more control.
           | Imagine blocking something and it shows up anyway because
           | Google decided so.
           | 
           | Oh wait, you don't have to imagine, just use Gmail on
           | Android. "Spam? That's not spam" - Google.
           | 
           | Or just Android with its fucking "notifications" from all
           | the garbage news sites. Tbf, my fault for using Google apps.
           | 
           | Honestly, blocking shit you don't want to see locally is the
           | best solution. Decentralized, under your control, no need to
           | trust BigCorp number 294.
        
         | BizarroLand wrote:
         | There's also Personal Blocklist, which appends a small "block
         | site.etc" under every google search result.
         | 
         | If there is a site that is clogging up your search results,
         | just click it and it's gone.
         | 
         | https://addons.mozilla.org/en-US/firefox/addon/personal-bloc...
        
           | riezebos wrote:
           | To me, this comment made it sound like this "block site.etc"
           | was a feature that Personal Blocklist has and uBlacklist
           | doesn't, but if I open the ublacklist page the first
           | screenshot of the extension shows "Block this site" next to
           | each search result. Is the "block site.etc" in Personal
           | Blocklist different from the one in uBlacklist?
        
             | BizarroLand wrote:
             | I haven't used uBlacklist, this was a solution I had
             | independently stumbled into and I wanted to share it.
             | 
             | Based on the dialogue from the posters I was responding to,
             | it seemed like uBlacklist would require some ongoing
             | maintenance or memorization of input fields to work.
             | 
             | If it also offers a single-click permanent block of all
             | sites in a domain from a Google search then that's just as
             | cool, but that feature wasn't obvious from the
             | conversation.
        
         | nerdponx wrote:
         | It'd be pretty neat if DDG let you configure this as a setting.
         | 
         | They don't have "accounts" as such, but they do provide users
         | with a one-time password that lets you save and restore
         | settings.
         | 
         | Alternatively, if they at least supported "do not include"
         | filters in regular searches, you could build a client that
         | stored those kinds of search defaults locally. The best you can
         | do now is strip out those results on the client side, which
         | isn't bad but seems like a hack.
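         | 
         | Stripping results on the client side could look roughly like
         | this in Python. It is only a sketch: the blocklist, the
         | `strip_results` helper and the URLs are hypothetical, and
         | fetching the results from a search engine is left out.

```python
from urllib.parse import urlsplit

# Hypothetical locally stored blocklist of domains.
BLOCKED = {"spam.com", "example.org"}


def is_blocked(url: str, blocked=BLOCKED) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


def strip_results(urls, blocked=BLOCKED):
    """Drop every result whose host is on the blocklist, keeping order."""
    return [u for u in urls if not is_blocked(u, blocked)]


# Made-up result URLs, for illustration only.
results = [
    "https://spam.com/a",
    "https://www.spam.com/b",
    "https://notspam.com/c",
    "https://docs.python.org/3/",
]
print(strip_results(results))
```

         | Matching on the host name with a leading dot keeps a domain
         | like notspam.com from being caught by a rule for spam.com.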
        
           | Kiro wrote:
           | They can't do that since they are getting their results from
           | Bing.
        
       | slownews45 wrote:
       | Google search is going downhill in my experience.
       | 
       | My question: What is the alternative?
        
         | randomluck040 wrote:
         | I'm using DDG most of the time but when I have to use Google
         | hoping the results will be better, I try to specify words that
         | need to be in there. However, the more niche your problems are,
         | the less you will find.
        
           | slownews45 wrote:
           | DDG is a gimmick.
           | 
           | They claim to be running their own search engine, never
           | checked out for me.
           | 
           | More covered here:
           | 
           | https://news.ycombinator.com/item?id=4817576
           | 
           | Anyone seeing them crawling your site?
           | 
           | Is Bing the best alternative? I've seen some pretty
           | compelling evidence DDG copies / uses / has a deal with Bing.
        
       | speby wrote:
       | Reminds me a little of an oldie but a goodie: expertsexchange.com
        
       | brianwawok wrote:
       | It's an easy way to make money. Scrape a popular site like Stack
       | overflow or Wikipedia and add a bunch of advertisements.
       | 
       | One of the many ways that scum ruin the web.
        
         | didericis wrote:
         | Wonder if there's any way to get advertisers to veto scraped
         | content sites. Not really in their incentive to boycott
         | eyeballs, but if advertisers were actively avoiding those
         | sites, it'd dry up the incentive to make them.
         | 
         | This could backfire, but fining advertisers that show up on
         | those sites might work. The difficulty would be all the claim
         | verification and process of determining what exactly is a
         | "scraped site", and backfire scenario could be another hurdle
         | for non established sites/more centralization of content. But
         | if you targeted the advertisers rather than the sites
         | themselves, the advertising networks would be incentivized to
         | do that identification themselves.
        
           | penteract wrote:
           | Contacting the advertisers to tell them that being shown on
           | those sites makes you want to avoid their products might
           | work, although that would involve not blocking the ads.
        
       | Rastonbury wrote:
       | Somewhat tangential but I believe Google Search is going
       | downhill, they seem to be losing the content junk spam SEO fight.
       | Recently, I've had to append wiki/reddit/hn to queries I search
       | for because everything near the top is shallow copied content
       | marketing.
        
         | fsflover wrote:
         | > Google Search is going downhill
         | 
         | Discussion: https://news.ycombinator.com/item?id=29392702.
        
         | stevesimmons wrote:
         | > they seem to be losing the content junk spam SEO fight
         | 
         | Google clearly isn't even trying.
         | 
         | So many of these sites are polluting search results for months.
         | It isn't a case of sites that pop up for a few hours until
         | Google notices and blacklists them.
         | 
         | Google Search has gone so far downhill. I'm not sure what
         | they're optimising for. Long-term irrelevance, it seems.
        
           | testudovictoria wrote:
           | > _Google Search has gone so far downhill. I'm not sure what
           | they're optimising for. Long-term irrelevance, it seems._
           | 
           | I assume they're optimizing for the most common denominator:
           | standard, non-power users. Early search didn't return great
           | results for human-like questions such as _What are the
           | wavelengths of the colors on the visible spectrum_? The
           | result might be within the first two pages, but that query
           | had too many irrelevant search terms. A better query would
           | have been _wavelengths color visible spectrum_. That query
           | only has the necessary key terms. Sometimes queries required
           | the user to know search operators (e.g. exact match, date
           | range, synonyms) just to get relevant results.
           | 
           | The average person probably didn't know that early searches
           | gave better results when constructed in the second way.
           | Google changed search to adapt to how normal people search.
           | Now the human-like query will return good results. Combine
           | that with locales, search history, and personal interests,
           | and even the most basic user can get worthwhile results from
           | asking Google a question. The cost is that power users who
           | understand operators and the power of key terms get results
           | that are less relevant, though likely still correct.
        
             | stevesimmons wrote:
             | If Google actually used its search history, it would see I
             | was a sw eng power user, and return the types of results
             | that users like me want to see.
             | 
             | Then again, I have been using Google Maps for 16 years and
             | it still cannot show me distances in km rather than miles.
             | A boolean setting ffs that I've had to manually switch
             | probably a thousand times.
             | 
             | If billions of dollars of AI R&D can't even figure that
             | out...
        
               | Kiro wrote:
               | Funny how people complain about Google tracking them and
               | then get angry when they are not being tracked enough.
        
           | vadfa wrote:
           | Is it legally a good idea for Google to go and blacklist
           | individual sites?
        
             | GhettoComputers wrote:
             | Child porn is the extreme. www.chillingeffects.org is the
             | other.
        
             | briffle wrote:
             | Exactly, change the word blacklist to censor, and then it
             | seems wrong.
        
               | labster wrote:
               | Change the flag button on this post to censor and I bet
               | people wouldn't click it. But people want some level of
               | censorship, so long as they convince themselves they're
               | not against free speech.
        
               | SrslyJosh wrote:
               | Blacklisting by a private entity is not censorship. It's
               | even sillier when you consider that what's being removed
               | are _copies_ and the original content is still there.
        
               | SrslyJosh wrote:
               | But that's not censorship. Google is not a government,
               | and no one's speech is being suppressed by filtering out
               | literal copies of something and leaving the original.
        
               | snerbles wrote:
               | Censorship can be conducted by private or public
               | entities.
               | 
               | https://www.aclu.org/other/what-censorship
        
             | kevin_thibedeau wrote:
             | They killed off the Wikipedia clones that way. It's funny
             | that they haven't come back even though the SO clones do
             | the exact same thing.
        
               | SrslyJosh wrote:
               | Wikipedia clones probably got enough traffic to show up
               | on someone's dashboard/report. SO clones are probably a
               | small fraction of all traffic, even though they seem to
               | show up in a large percentage of the searches we perform.
        
           | Nasrudith wrote:
           | Isn't that a bit narcissistic to think that your one problem
           | in particular out of millions is a sign that they aren't
           | doing anything at all?
        
           | gwittel wrote:
           | Pretty much this. It's likely down to the lack of incentives
           | for leaders and line workers, i.e. search spam volume isn't a
           | measure by which search team performance is graded.
        
           | MrWiffles wrote:
           | Well I think you already nailed it with "not even trying".
           | You don't need to try or optimize when you're a monopoly.
           | Competition forces that function, but a lack thereof enables
           | mediocrity and complacency.
        
             | rhdunn wrote:
             | One of the problems with things like this is how do you
             | know which site has copied from another, especially if you
             | don't want a list of hard-coded exceptions.
             | 
             | Related is if you have a lot of content from git
             | repositories that are mirrored from different locations
             | (GitHub, GitLab, etc.), all of which are showing the same
             | content. Or if different sites are hosting versions of
             | public domain texts. You don't want to derank those
             | results, even if they are similar to the "copy a popular
             | site" websites.
        
               | lazyeye wrote:
               | Google knows instantly how much time you spend on a page
               | and how quickly you return to the search results because
               | it is a useless copy/paste page from SO.
        
               | cutemonster wrote:
               | And probably they track content age too? Since they
               | somehow can look up by content (they can detect duplicated
               | content), there's probably a date remembered too.
               | 
               | So they could know if a code snippet was already X years
               | old? And from where, originally. (Unless it got edited a
               | lot but that's not the case here?)
        
               | MrWiffles wrote:
               | > One of the problems with things like this is how do you
               | know which site has copied from another, especially if
               | you don't want a list of hard-coded exceptions.
               | 
               | The core function of that is actually pretty simple:
               | 
               | 1. Strip all X/HTML tags
               | 2. Run `diff`
               | 
               | Sure, it's not perfect, but an organization that pursues
               | academic quantum computing research can sure as hell
               | afford to run the results of the above against an AI to
               | check for similarities.
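
The two steps described above (strip the markup, then diff the remaining text) can be sketched roughly as follows. This is only an illustration using Python's standard library, not anything a search engine is known to run; the names `strip_tags` and `similarity` are made up for this sketch, and a word-level `difflib` comparison stands in for the `diff` command.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring tags and script/style bodies."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the text nodes and collapse all runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_tags(html):
    """Step 1: reduce an HTML page to its visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def similarity(html_a, html_b):
    """Step 2: word-level diff ratio between two pages (0.0 to 1.0)."""
    words_a = strip_tags(html_a).split()
    words_b = strip_tags(html_b).split()
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()
```

A ratio close to 1.0 only says the two pages share most of their text; it does not say which one is the original, which is the harder question raised earlier in the thread.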
        
             | kristiandupont wrote:
             | Oh come on! I am no fan of Google and their privacy
             | policies etc., and I agree that results like the sites
             | mentioned in this thread are annoying but I am not so cocky
             | as to think I have any idea what Google is fighting.
             | 
             | There are a lot of very smart people from all over the
             | world putting everything they have into getting to the top
             | of that site. It's not a trivial task for them.
        
               | MrWiffles wrote:
               | Consider it from a management/budgetary point of view.
               | Super smart people? Oh hell yeah, and I don't fault them
               | for this one bit. But bean counters? Why the hell invest
               | in hiring those people, or budgeting for search quality
               | improvements, when you have essentially no competition to
               | speak of? Makes more sense to put those same smart people
               | on projects that correlate people's activities outside of
               | Google with activities they can observe either directly
               | or indirectly to create a more holistic picture of the
               | consumer for advertisers. Those are much harder problems
               | to solve, especially now that the public is finally
               | becoming slowly more aware of the massive abuses of
               | privacy the bean counters like I described above have
               | been pushing for years.
               | 
               | At the end of the day Google doesn't exist for the good
               | of the public, they exist to return the highest possible
               | investment for shareholders. It's literally the opposite
               | of their job to improve search quality when competition
               | doesn't force them to (because investors see that as a
               | waste of money that could have been spent on further
               | eroding people's privacy to enable higher and higher ad
               | revenues.)
        
               | kizer wrote:
               | Fair point. But again, that would be less of an issue if
               | there were like 3 or 4 big search engines with different
               | algorithms, all changing and adapting on their own
               | schedules. Would be harder to game the system then.
        
               | motoxpro wrote:
               | You're so right. It's sad that even if you remove 99% of
               | something, to the outside it looks like you are removing
               | 0% because you have no reference.
        
               | Nextgrid wrote:
               | I disagree. Google has absolutely dropped the ball - I'm
               | not saying that it's easy to algorithmically rank
               | good/bad content, but they're not even trying.
               | 
               | A stop-gap solution would be to simply penalize anything
               | with ads. Legitimate websites will still have enough "SEO
               | juice" (for lack of a better term) to offset the
               | penalty, but brand new copycats with otherwise no inbound
               | links to them from other legitimate websites (and no
               | other business model beyond scammy ads) will never be
               | able to rank high, essentially removing the incentive
               | from setting up these sites in the first place.
               | 
               | Not to mention, these problems can be identified,
               | prioritized and then tackled manually one-by-one.
               | Stackoverflow copycats can be dealt with by downloading
               | the SO data dump, parsing it and then severely penalizing
               | any website where the bulk of the content matches the
               | dump. Pinterest can be penalized by simply excluding it
               | from image searches until they actually display the
               | searched image without asking for login. So on and so
               | forth.
               | 
               | You can't win them all, and you can't train an algorithm
               | to win every time either, but you can manually observe
               | what's happening and deal with the biggest problems.
               | 
               | The problem however is that the current status-quo is
               | good for Google. They've got the monopoly on search
               | (every other one is typically even worse when it comes to
               | these issues), and spammy copycat websites happen to have
               | Google ads or analytics so Google actually benefits from
               | them too.
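
The data-dump idea above (penalize pages whose content mostly matches the SO dump) could be approximated with word shingles. The sketch below is a toy under stated assumptions: the dump is already parsed into plain-text posts, and `shingles`, `build_index`, and `copied_fraction` are hypothetical helper names, not part of any real tool.

```python
def shingles(text, k=5):
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def build_index(dump_posts, k=5):
    """Collect every shingle that appears anywhere in the dump."""
    index = set()
    for post in dump_posts:
        index |= shingles(post, k)
    return index


def copied_fraction(page_text, index, k=5):
    """Fraction of the page's shingles that also occur in the dump."""
    page = shingles(page_text, k)
    if not page:
        return 0.0
    return len(page & index) / len(page)
```

A page scoring near 1.0 is almost entirely dump content; the threshold for a penalty would be a judgment call.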
        
               | Kiro wrote:
               | How do you know Google is not already doing all those
               | things? You said it yourself, you can't win them all and
               | we're witnessing the result of that. If Google truly did
               | nothing it would be an unimaginable chaos.
        
               | Nextgrid wrote:
               | Because I'm not seeing the results. Let's ignore SO
               | copycats for now because that solution requires at least
               | a slight amount of effort and look at the elephant in the
               | room - Pinterest. Dealing with it is a simple "if
               | image_search and domain == pinterest.com then skip" and
               | they're not even doing _that_.
               | 
               | They have a site that's been polluting image search
               | results for years, isn't even trying to hide (unlike the
               | SO copycats which could technically churn out an infinite
               | amount of domains to work around bans) and they can't
               | even deal with _that_.
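
The one-line rule quoted above ("if image_search and domain == pinterest.com then skip") might look like this when written out. It is a sketch only: `filter_results` and `BLOCKED_IMAGE_DOMAINS` are invented names, and a real list would likely need more entries than the single domain mentioned in the comment.

```python
from urllib.parse import urlsplit

# Assumption for illustration: only the domain named in the comment.
BLOCKED_IMAGE_DOMAINS = {"pinterest.com"}


def host_matches(url, domain):
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


def filter_results(urls, image_search):
    """Drop blocked domains, but only for image searches."""
    if not image_search:
        return list(urls)
    return [
        url
        for url in urls
        if not any(host_matches(url, d) for d in BLOCKED_IMAGE_DOMAINS)
    ]
```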
        
         | prox wrote:
         | I also see a lot of strange domains (.it, .so) and tangentially
         | related domain names. Clicking one gives a really weird content
         | scraped jumble that seems AI generated. My malware extension
         | also went on alert. And this is in the top ten in a search
         | result!
        
         | ok_dad wrote:
         | The other day, I searched for something and got a whole HN
         | thread copied to some crap site but it had just enough veneer
         | to seem legit. I figured it out when I searched for something
         | from that thread and found the actual original HN thread that
         | had way more context. Google has been good to me overall, but
         | in the past few weeks I've really noticed the SEO spam, and I'm
         | not sure what to do about it so that I don't find those dodgy
         | sites. I think Google search is a marvel of modern software,
         | but I'm a firm believer that their advertising business has
         | warped their other products in a negative way, though I'm not
         | the person to suggest a fix.
        
         | trumpeta wrote:
         | ddg has the same issue though
        
         | gfiorav wrote:
         | Here's an alternative explanation: the quality of the content
         | has gotten worse. Everyone and their mother is doing SEO now.
         | They're trying to maximize clicks, not quality content.
        
         | BLanen wrote:
         | They seem to keep putting more and more confidence in their
         | language semantics AI.
         | 
         | Quoted words in the search should return only pages with that
         | actually on the page. That used to work. Now it often shows
         | pages that don't contain it at all, not in the search results page
         | itself, nor on the page when you go there. So you'd think it
         | just ignored the quoted text and gave me results without it
         | instead, right? Wrong. Removing the quoted text to either
         | unquoted text or removing it entirely both result in different
         | results. So it DOES process the quotes SOMEHOW. But it's not a
         | clear RULE because probably it's also just processed as an
         | input to the semantics engine. Just tell me you don't have any
         | pages with the text instead, please. Which it actually also
         | does sometimes...
         | 
         | It's an unfixable mess. And I don't think this can be turned
         | back. Building a competitor costs hundreds of billions. A
         | competitor will likely end up taking the same approach anyway.
         | 
         | I just wanted a list of google's big search engine updates but
         | even searching this I get SEO-d blogspam ABOUT THE SEO IMPACT
         | OF THE UPDATES.
        
           | 1_player wrote:
           | Google Search has been going downhill _fast_ over the past 5
           | years or so. Since it's probably incompetence rather than
           | malice, what the heck is going on at Google Search? It seems
           | to me the "let's use AI for everything" camp has taken over
           | the entire place even though it's making Search worse than it
           | ever was.
           | 
           | And please spare me the excuse that now Google can answer
           | questions. It can't, it just answers with snippets extracted
           | from websites it deems relevant, and often the answer is flat
           | out wrong or irrelevant.
           | 
           | I just hope the ML/AI mania that has taken over these big
           | tech companies proves to be a fad that just goes nowhere like
           | it did in the 70s and we return to plain old algorithms and
           | good software engineering.
        
             | Clubber wrote:
             | >I just hope the ML/AI mania that has taken over these big
             | tech companies proves to be a fad that just goes nowhere
             | like it did in the 70s and we return to plain old
             | algorithms and good software engineering.
             | 
             | It's now being used by law enforcement. Sponsored by an
             | errant swat raid near you. Don't worry, they'll prosecute
             | you anyway to cover their ass. Can't risk losing their
             | pensions, ya know. /scared
             | 
             | https://www.policechiefmagazine.org/product-feature-
             | artifici...
        
             | dorkwood wrote:
             | If I search "who gave away Anne Frank's hiding place?",
             | Google confidently gives me the answer "Miep Gies".
             | 
             | I don't know why Google would even suggest this -- Miep was
             | one of Anne's helpers. Imagine all the other people out
             | there having their names unfairly smeared by Google's
             | algorithm.
        
             | tekstar wrote:
             | Maybe some PM realized:
             | 
             | Worse search results -> more Google searches -> more ad
             | views
        
               | ahepp wrote:
               | That backfired then (well, for me). I use DDG now because
               | if my results are going to suck either way, I'd rather
               | have privacy.
        
               | stavros wrote:
               | I've been using DDG for years, and its quality has
               | recently gone down too. I was accustomed to using !g to
               | search Google, but now I'm very surprised to find that
               | the Google results are _worse_!
               | 
               | Google has completely lost the plot if DDG is the engine
               | with the better results.
        
               | ahepp wrote:
               | Yeah, it's at the point where search engines are now
               | closer to shortcut bars for me. It's rare that I don't
               | have to append "wiki" or "Reddit" to my search in order
               | to find a decent result (which I assumed existed before I
               | even searched it).
               | 
               | I wonder what the consequences are for discoverability on
               | the web. It can't be good. Maybe I'm waxing nostalgic,
               | but it seemed like there was a time I discovered new and
               | cool web sites. Now all I discover is ad spam clogging up
               | the tubes.
        
               | stavros wrote:
               | > I wonder what the consequences are for discoverability
               | on the web. It can't be good. Maybe I'm waxing nostalgic,
               | but it seemed like there was a time I discovered new and
               | cool web sites. Now all I discover is ad spam clogging up
               | the tubes.
               | 
               | I feel the exact same way too. I can't even discover
               | blogs with solutions any more (even ones I know exist
               | because I've seen them before), it's all just spam.
        
               | rafale wrote:
               | Good point. Users only know about Google at this point.
               | And won't even consider another search engine.
        
               | [deleted]
        
               | handrous wrote:
               | And if the spammy, garbage pages now appearing on results
               | page #1 also use Google ads....
        
             | alex_c wrote:
             | >It can't, it just answers with snippets extracted from
             | websites it deems relevant, and often the answer is flat
             | out wrong or irrelevant.
             | 
             | I recently searched the name of a YouTuber I was watching,
             | the top "People also ask" suggestion was "Is [name of
             | youtuber] married?"
             | 
             | I didn't particularly care but clicked out of curiosity. It
             | expanded to show a snippet from an old (1980s) NY Times
             | article about a completely different person getting married
             | long before this particular person was even born.
             | 
             | Google AI using other companies' content to provide wrong
             | answers to questions I didn't even ask... That says it all.
        
             | shuntress wrote:
             | Just like pretty much every other thing online Google
             | (search) is constantly in danger of being run over by
             | people acting in bad-faith to game the system.
             | 
             | Of course their actions have not been perfect but it is a
             | mistake to say that their search would be better as a
             | "plain old algorithm" no matter how well-engineered it is.
             | 
             | I'm certain that search results would be worse than they
             | are now if the algorithm was just "grep but for the whole
             | internet". Or that, in that case, the careful complexity
             | necessary in each search to exclude all the SEO garbage
             | would be unbearable.
        
             | handrous wrote:
             | The plummet began around '08 or '09, to my recollection. It
             | was as if, over a period of a few months, they just
             | completely _gave up_ trying to fight webspam shit. Simply
             | surrendered.
        
           | christkv wrote:
           | I just want a search engine that will let me mark sites and
           | links and let me search in that subset when I want so I can
           | add curated sources to my little index.
        
             | jiggawatts wrote:
             | "Engagement with the site is up!"
             | 
             | Translates to: "The users can't find what they're looking
             | for, so they're clicking around a lot."
        
             | UnFleshedOne wrote:
             | Or good support for search result block lists, then
             | users can deal with SEO spam same way they deal with ads.
             | Pinterest will be sooo gone...
        
         | mritchie712 wrote:
         | I have the Algolia HN search bookmarked. For specific types of
         | searches I go straight to that
        
         | systemvoltage wrote:
         | Algolia should look into becoming the next search engine.
         | 
         | I want it to absolutely destroy Google/Microsoft.
        
       | analog31 wrote:
       | Not related to coding, but I've noticed a lot of "best of" and
       | "top ten" sites that appear to be of the same ilk, possibly
       | automated, that just combine pictures and paragraphs ranging from
       | ad copy to pure drivel. On topics ranging from bicycles to Linux
       | distros.
        
       | hidden-spyder wrote:
       | An index of these sites will be helpful to mass blacklist them
       | with the uBlacklist extension.
       | 
       | Anyone up for creating one so everyone can contribute to it?
       | 
       | The extension allows subscribing to blacklists via links, so a
       | single txt file will be enough.
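
A shared list like the one proposed above could be generated from a plain list of domains. The sketch assumes one match pattern per line of the form `*://*.example.com/*`, and it uses the two mirror domains named elsewhere in this thread purely as sample input; `build_blocklist` is a made-up helper name.

```python
def to_match_pattern(domain):
    """Turn a bare domain into a match pattern covering it and its subdomains."""
    return f"*://*.{domain.strip().lower()}/*"


def build_blocklist(domains):
    """Return the text of a subscription file: one pattern per line, deduplicated."""
    unique = sorted({d.strip().lower() for d in domains if d.strip()})
    return "\n".join(to_match_pattern(d) for d in unique) + "\n"


if __name__ == "__main__":
    # Sample input taken from the Google results quoted in this thread.
    print(build_blocklist(["githubmemory.com", "issueexplorer.com"]), end="")
```

Sorting and deduplicating keeps diffs small when several people contribute to the same txt file.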
        
       | thesunkid wrote:
       | > Every programmer should be grateful for the opportunity to find
       | good quality content quickly
       | 
       | Totally. There should be a better way to index SO.
       | 
       | you.com seems to try doing it that way. For most code issues,
       | it's easier to navigate and decide what's worth reading from You
       | than from google IMO.
        
       | tsrand0m wrote:
       | Scrape-paste is one of the easiest ways to make significant money,
       | if it takes off, and that's why these sites are made.
       | 
       | I think Google does well in general with coding or SO questions,
       | but will show you these low quality sites when the questions are
       | new or very specific and difficult to answer. Maybe, time to
       | apply your head more.
       | 
       | *been on both sides
        
       | notreallyserio wrote:
       | I've found that Bing does a better job at detecting spam like
       | this. Not perfect, just better.
        
       | lykahb wrote:
       | The content farms get ahead of the organic results for many other
       | areas too. Search for the programming questions isn't so bad. At
       | least, the garbage is easy to recognize. Queries about products
       | and services probably have the worst results.
        
       | nickthesick wrote:
       | Not only SO threads, I particularly hate the ones that mirror
       | GitHub Issues. They don't even link back to the original thread,
       | for Christ's sake!
        
         | yumraj wrote:
         | And as I had mentioned in another thread, DDG so far ignores
         | them, at least in my 1 case, but Google has those GitHub issues
         | mirror sites as the first results. So they are clearly gaming
         | Google SEO.
        
           | ziml77 wrote:
           | My experience has been finding more garbage on DDG, not less.
           | I ended up getting so tired of those results being above the
           | good ones that I switched back to Google.
        
             | tomtheelder wrote:
             | Wow total opposite experience for me. I find Google much
             | better for casual searches, but DDG oceans better for
             | anything technical.
        
             | yumraj wrote:
             | Try this, the exact query I was referring to. I leave it to
             | the reader to decide which is better. The top 2 Google
             | results are scraped clones (or whatever they should be
             | called) of the 1st result from DDG (Github issues #162 for
             | docker-calibre-web)
             | 
             | The closest Google comes in top 4 links is a link to the
             | entire Github issues for docker-calibre-web
             | 
             | DDG: https://duckduckgo.com/?t=ffab&q=FakeUserAgentError+ca
             | libre-...
             | 
             | Google: https://www.google.com/search?hl=en&q=FakeUserAgent
             | Error%20c...
             | 
             | DDG Results:
             | 
             | 1) https://github.com/linuxserver/docker-calibre-
             | web/issues/162
             | 
             | 2) https://www.nas-forum.com/forum/topic/67746-tuto-
             | calibre-web...
             | 
             | 3) https://pypi.org/project/fake-useragent/
             | 
             | 4) https://github.com/linuxserver/docker-calibre-
             | web/issues/
             | 
             | Google Results:
             | 
             | 1) https://githubmemory.com/repo/linuxserver/docker-
             | calibre-web...
             | 
             | 2) https://issueexplorer.com/issue/linuxserver/docker-
             | calibre-w...
             | 
             | 3) https://github.com/linuxserver/docker-calibre-web/issues
             | 
             | 4) https://github.com/janeczku/calibre-web/issues/1527
        
           | coryrc wrote:
           | I'm getting lots of these in DDG too. Right now I'm doing
           | searches about home renovation and I see lots of pages which
           | are just a rip off of an online forum page (which I find
           | elsewhere in my results, usually). Which, unlike SO, is not
           | licensed for people to copy.
        
           | deadbunny wrote:
           | Sample of 1 as well but I do see them on DDG, not nearly as
           | much as Google though.
        
           | znpy wrote:
           | I'd blame Google more than the website, since Google search
           | results are generally getting worse and less relevant
        
             | monkeybutton wrote:
             | Totally. This is exactly what I was thinking about
             | yesterday in the thread where someone was asking if Google
             | results were getting worse.
             | 
             | There's no reason the official docs for Python should be
             | lower in the results than a shitty docs clone / spam site
             | when searching for a common package/function in the
             | standard library.
        
               | kikokikokiko wrote:
               | "There's no reason the official docs for Python should be
               | lower in the results than a shitty docs clone"
               | 
               | If Google still used/respected the original page rank
               | algo, the official docs would never be ranked lower than
               | a spammy clone site. I just wish they reverted back to
               | the "power of the crowd" algo, let each node vote with
               | their links and reputation.
               | 
               | Nowadays I almost stopped using Google completely. I
               | first noticed it with torrent/streaming censorship ~5
               | years ago, then the political/ideological censorship
               | started to feel unbearable. I just want my search engine
               | to show me the most relevant results BASED ON THE QUERY I
               | SUBMITTED. No moral judgement crap.
               | 
               | My search stack nowadays is a mix of Bing, ddg, Yandex and
               | if everything else failed, Google. It's a sad reality.
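
The link-voting idea mentioned above ("let each node vote with their links and reputation") is the core of PageRank as originally published. Below is a minimal power-iteration sketch on a toy graph; the page names are invented, the damping factor 0.85 is the commonly cited default, and nothing here reflects how Google ranks pages today.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores for a dict mapping page -> list of linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    toy_web = {
        "blog_a": ["official_docs"],
        "blog_b": ["official_docs"],
        "forum": ["official_docs"],
        "official_docs": [],
        "clone": [],  # nobody links to the clone
    }
    for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

In this toy graph the page that others link to ends up with the highest score, while the clone, which nobody links to, stays near the baseline.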
        
         | johnchristopher wrote:
         | Just found out about github clones today. It threw me off.
         | 
         | At least for SO you can get back to SO to search but github
         | issues search engine is something else.
        
       | slater wrote:
       | Spammers. They mirror SO stuff on their own sites and put google
       | ads on them
        
       | legrande wrote:
       | Just whitelist Stack Overflow in your head and avoid splogs /
       | spamblogs?
        
       | grinchygrinch wrote:
       | Same thing with various GitHub issues lookalikes.
        
       | edmcnulty101 wrote:
       | I use personal blocklist extension.
       | 
       | It lets you remove entire sites from your search.
       | 
       | https://chrome.google.com/webstore/detail/personal-blocklist...
       | 
       | I've also removed a bunch of trash news sites and W3Schools, and
       | it's really sanitized my search.
        
       | howmayiannoyyou wrote:
       | IMHO the biggest offender:
       | https://www.geeksforgeeks.org/
       | 
       | They are making me insane with the modal login demand. I wonder
       | if Google has downgraded the authoritative standing of
       | StackOverflow?
        
         | dmurray wrote:
         | These definitely aren't what the OP is talking about. They
         | focus on original content but the quality is not great.
         | 
         | The other sites, every page is a mirror of an SO page, with
         | worse CSS. Maybe you haven't encountered them yet!
        
         | chakkepolja wrote:
         | you can use UBO element zapper, but i recommend staying away
         | from that site except for leetcode type questions. The content
         | quality is mixed at best.
        
         | phgn wrote:
         | Do they not have original content? I'm always frustrated when I
         | click too fast and land on geeksforgeeks.org, because their
         | code examples are so low quality and without explanations.
         | 
         | I didn't notice SO being this bad.
        
         | 0des wrote:
         | This site has fooled me once or twice. I think there are other
         | domains that point to the same data-set, because that's not the
         | only domain serving up this exact content.
        
         | jer0me wrote:
         | I've just disabled JavaScript. The site works fine and the
         | prompt never appears.
        
         | S6eUL wrote:
         | See this reddit answer on how to circumvent the login prompt:
         | https://old.reddit.com/r/programming/comments/q0oqai/what_is...
        
         | sireat wrote:
         | geeksforgeeks actually has humanly created articles. The
         | quality varies (kind of like W3Schools does).
         | 
         | With ublock origin I have not seen any modal logins.
         | 
         | For example this article on red black trees:
         | https://www.geeksforgeeks.org/red-black-tree-set-1-introduct...
         | does not seem that horrible with proper references to CLRS and
         | everything
         | 
         | OP is talking about ugly looking automated Stack Overflow
         | copies. No idea how those end up so high in rankings.
        
       | x86_64Ubuntu wrote:
       | It will probably take some time, but sites such as roseindia and
       | expertsexchange also clouded search results in a similar manner.
       | They are now history because Google and others deranked them to
       | the depths of hell.
        
       | pawelduda wrote:
       | site:stackoverflow.com <query>
       | 
       | That's what I do
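
For anyone who wants that `site:` prefix as a bookmarklet-style helper, here is a small sketch that builds the search URL. The function name `site_search_url` is made up; the only assumption about Google is the `q` query parameter, which also appears in the search links quoted elsewhere in this thread.

```python
from urllib.parse import urlencode


def site_search_url(query, site="stackoverflow.com"):
    """Build a Google search URL restricted to one site via the site: operator."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:{site} {query}"})


if __name__ == "__main__":
    print(site_search_url("python sort dict by value"))
```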
        
       | dgb23 wrote:
       | This is an obvious case of SEO spam. But there are tons and tons
       | of other examples worth mentioning.
       | 
       | For example many news sites have soft paywalls that can easily be
       | circumvented with a few clicks. The reason they don't have an
       | _actual_ paywall is likely to come up in search results. So
       | essentially they spam search results and obscure the content for
       | technically illiterate users instead of just paying for ads. They
       | want to have their cake and eat it too.
       | 
       | Now this whole dynamic is super weird. We often talk about these
       | issues as if Google was some kind of public service that should
       | make useful and fair search suggestions. Sure they have the
       | incentive to do so, but they have conflicting interests at the
       | same time.
        
       | gompertz wrote:
       | This is why I often search solutions directly on Stack Overflow,
       | and not via Google. Or I add "site:stackoverflow.com" to my
       | search. Generally SO has all I ever need... I find vendor forums
       | to be a total wasteland for help (ie. Power BI forums) so don't
       | need them as part of Google results.
        
       ___________________________________________________________________
       (page generated 2021-12-01 23:01 UTC)