[HN Gopher] Search engines and SEO spam
       ___________________________________________________________________
        
       Search engines and SEO spam
        
       Author : iamjbn
       Score  : 438 points
       Date   : 2022-01-03 16:08 UTC (6 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | NmAmDa wrote:
       | One example: the website called gitmemory, which crawls GitHub
       | data regularly and has better SEO than GitHub, so you will
       | usually find its results above the original GitHub links.
        
       | PaulHoule wrote:
       | I have wondered about this.
       | 
       | When I run web sites I frequently look at the log and find a
       | large fraction of the traffic is from search engines. This is a
       | problem because it costs me money to serve that traffic. It might
       | not be initially obvious but it costs more than serving real
       | users because the search engines will scan everything and break
       | the cache.
       | 
       | Google sends a significant amount of traffic. Bing sends a
       | detectable amount of traffic. Baidu's crawler might be more
       | active than the two of those together but I never get hits from
       | Baidu. Other crawlers deliver me trouble instead of value: even
       | if I'm not interested in hosting pirate or plagiarized content, a
       | crawler that is looking for trouble is only going to bring me
       | trouble.
       | 
       | I hate doing it but I turn off crawlers other than Google and
       | Bing both at the robots.txt and web server level because I just
       | can't afford to serve Baidu queries.
       | 
       | I'd like to sign an exclusivity contract with a search engine
       | such that they get exclusive access to crawl it and in turn I get
       | a privileged position in search results. This would give the
       | search engine and myself an incentive to deliver end-to-end
       | quality results.
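
The robots.txt half of the policy described above (allow Google and Bing, refuse everyone else) can be sketched as follows. This is a minimal illustration, not the commenter's actual configuration: the rules are an assumption about what such a file might contain, and the check uses Python's standard `urllib.robotparser`. Note that robots.txt is advisory only, which is presumably why the comment also mentions blocking at the web server level.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: Googlebot and Bingbot may crawl everything,
# every other user agent is asked to stay out.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /
"""

_parser = RobotFileParser()
_parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent, path="/"):
    """Return True if the rules above permit user_agent to fetch path."""
    return _parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("Googlebot", "bingbot", "Baiduspider"):
        print(agent, allowed(agent, "/some/page"))
```

Well-behaved crawlers honour these rules voluntarily; anything that ignores them still has to be rejected by the server itself.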
        
       | snth wrote:
       | Several people mention DuckDuckGo in that Twitter thread. I use
       | DuckDuckGo for my main search engine, and it's not obviously any
       | better than Google regarding SEO spam.
        
         | Kiro wrote:
         | I don't think they apply much spam fighting to the results they
         | get from the underlying search index (Bing), but I could be
         | wrong.
        
       | ttiurani wrote:
       | Is there a search engine for programming? One that not only
       | searches stackoverflow, github, relevant subreddits and the other
       | big sites, but also finds programming articles in personal blogs?
       | 
       | That would be valuable to me.
        
       | anovikov wrote:
       | Sadly the only way of fixing it is making search results
       | unattractive for cracking. Ranking of a page in search results is
       | a metric, and every metric is a hackable metric. Only way they
       | won't be hacked is if there's no incentive to.
       | 
       | Sure, a search engine that specialises in a narrow area of
       | knowledge without much money in it can be very relevant and
       | bullshit-free.
       | 
       | But there's no way to make it work for the general web search.
       | People hack things. If they didn't we'd have Communism built by
       | now (yes the "good" - classless, stateless one).
        
       | deltarholamda wrote:
       | The quote-Tweeted thread mentioned recipes as one of the things
       | that has been SEObliterated. It's a great example of the problem,
       | and also a great example of the problems any solution will
       | encounter.
       | 
       | Recipes have become a bellwether Internet problem. In the past,
       | your great-grandmother had a card file with a bunch of 3x5 index
       | cards with the ingredients and instructions on how to make
       | everything, and they pretty much all fit on one side. There was a
       | great deal of domain knowledge required (e.g. "whip to stiff
       | peaks"), but these things reveled in their terseness.
       | 
       | Internet recipes all begin with 9 paragraphs of the author's
       | first time encountering the dish in a Moroccan bazaar in 1997, and
       | the life story of the chef. There are two embedded 10-minute
       | videos of the lifecycle of the vanilla bean. And then you get to
       | the ingredient list. Then two more 10-minute videos, then
       | instructions.
       | 
       | The drive to make recipes full-contact Internet content has
       | changed what it means to be a recipe. This is similar to how
       | cooking shows evolved from Julia Child working on a sound stage
       | to a carnival barker presentation with vivid personalities
       | dominating the scene.
       | 
       | I'm not sure there is any technological solution to a problem
       | that has fundamentally changed what it means to be a recipe,
       | short of establishing a new informational silo in the form of a
       | new Web site devoted to recipes only. You could encourage an RSS-
       | like format for recipes, but that requires buy-in from places
       | that profit from the new evolution. This new status quo may be
       | good or bad--you can make the argument either way--but it is what
       | it is. A cultural change is required more than tweaking
       | algorithms.
       | 
       | (Unless tweaking algorithms can be foundational to cultural
       | change, in which case we really, really, really need to take a
       | hard look at the corporate behemoths and their algorithms, and
       | sooner better than later.)
        
         | max49 wrote:
         | >I'm not sure there is any technological solution
         | 
         | The technological solution would be to stop rewarding them for
         | these monstrosities. One of the main motivators for turning a
         | short recipe into a 19-page essay about the chef's life is
         | that more words = better ranking.
        
           | notreallyserio wrote:
           | And funny enough, it's obvious that Google's engineers know
           | this because they're adding more small, self-hosted featured
           | results to the top of the page all the time.
        
           | Nextgrid wrote:
           | And the end game is that ads pay for it. Nuke ads or downrank
           | them and the incentive goes away, making space for enthusiast
           | non-profit-driven websites.
        
           | deltarholamda wrote:
           | But the technology is solely controlled by a single
           | multinational advertising corporation. The motivations are
           | controlled by that same advertising corporation.
           | 
           | Which is the same as there being no technological solutions.
        
         | foobarian wrote:
         | The recipe problem is mostly because actual recipes are not
         | copyrightable. See e.g.
         | https://www.plagiarismtoday.com/2015/03/24/recipes-copyright...
        
           | leobg wrote:
           | Neither are ideas. Hence article spinning and book summaries.
           | 
           | We need a semantic dupe filter: If it doesn't add new facts
           | or new ideas, treat it as an identical copy.
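
A real "semantic" dupe filter is an open problem, but the simpler lexical version of the idea can be sketched. The example below is an assumption-laden illustration, not what any search engine is known to run: it compares documents by overlapping word triples (shingles) and Jaccard similarity, so it catches lightly reworded copies but not genuine paraphrases. The function names, `k=3`, and the 0.5 threshold are all hypothetical choices.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word windows in text, lowercased."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, threshold=0.5, k=3):
    """Treat two documents as copies if their shingle sets overlap enough."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold


if __name__ == "__main__":
    original = ("Whisk the eggs and sugar until pale, then fold in "
                "the flour and bake for twenty minutes.")
    spun = ("Whisk the eggs and sugar until pale, then fold in "
            "the flour and bake for thirty minutes.")
    print(is_near_duplicate(original, spun))
```

Deciding whether a page "adds new facts or ideas" would need something well beyond word overlap; this only shows where such a filter would sit.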
        
       | lifeisstillgood wrote:
       | Google is not important because it has all the information - it's
       | important because it has hardly any.
       | 
       | A major complaint is that there used to be good free reviews of
       | commercial products that could be easily found.
       | 
       | That is not "all the information". Information about the current
       | round of commercially advertised products is something like 5-10%
       | of all commerce (or less).
       | 
       | And we are entering a world where "all the information" is what
       | we do all day, what we say, how we react to different stimuli.
       | 
       | Those are the real review sites - why do people take this train
       | and not that, why is that park safe and this one full of
       | muggings.
       | 
       | We need to solve the Google problem not because we want blogging
       | like it's 2009 but because epidemiology is about to open
       | humanity's eyes. And it's going to hurt if we don't make it free
       | and open.
        
       | [deleted]
        
       | streamofdigits wrote:
       | Eventually search will become a decentralized activity (No, not a
       | web3/crypto/coin type decentralization, I am talking about the
       | useful type).
       | 
       | Is there any particular reason why internet search has to have a
       | distorting gatekeeper to the global commons (one that pretends to
       | play Maxwell's demon)? For chrissake, the stuff being indexed
       | is _public_.
        
         | imranhou wrote:
         | Not all that Google indexes is public. Paywalls and login
         | walls, primarily, let Google's IPs crawl information
         | unhindered, while ordinary users are blocked, so a new search
         | engine would have to reach enough scale and popularity for
         | others to open up to it. Quick example: Google can index many
         | news sites or LinkedIn profiles that a regular user with no
         | account cannot see.
        
           | streamofdigits wrote:
           | That's true, but probably something that can be tackled later
           | and in any case it would not be a show-stopper for creating a
           | valuable alternative (There are similar thorny related issues
           | around IP e.g. for news sites)
        
         | mrkramer wrote:
         | >Eventually search will become a decentralized activity (No,
         | not a web3/crypto/coin type decentralization, I am talking
         | about the useful type).
         | 
         | People care about UX, not about technology. Remember that,
         | unless people are willing to sacrifice good UX in order to
         | have greater security and privacy. These things are tricky and
         | there is no right formula.
        
       | tester756 wrote:
       | How about Bing?
       | 
       | Is it viable competition?
        
       | anfilt wrote:
       | Another thing that does not help is how some sites gate content
       | from being scraped. Also, forums are not as popular today, again
       | reducing the amount of indexable content. Think about how some
       | sites have migrated from forums to something like Discord.
        
       | bretpiatt wrote:
       | This is already happening for a bunch of verticals:
       | 
       | Travel - Expedia, Hotels.com, Kayak, etc.
       | 
       | Consumer Goods - Amazon, WalMart, EBay, Etsy, etc.
       | 
       | Automobile Purchase - Cars.com, Autotrader, etc.
       | 
       | Career/Job - Indeed, LinkedIn, etc.
       | 
       | As Google continues to lose search volume on these big revenue
       | categories it is going to make spam much more difficult as they
       | are working to sort out long tail spam. Way harder.
        
         | jeffbee wrote:
         | Wow so that's completely opposite of my feelings on this topic.
         | I would never, ever use Expedia for travel search over Google
         | flights/hotels. Google travel _is_ the meta-search engine for
         | this vertical. Expedia etc. are all-in on spam and scams,
         | trying at every opportunity to take an extra dollar from you.
         | 
         | The same with Amazon. You'd _think_ that with all the armchair
         | search quality experts cropping up lately there might be more
         | vocal complaints about the fact that Amazon's own search can't
         | find basic consumer products sold by Amazon itself. If I want
         | to find stuff on Amazon, I search Google for it.
        
         | bradyo wrote:
         | Yeah the comments in this thread are baffling (or they didn't
         | read the 100 character tweet lol). The tweet is just describing
         | domain specific database-like websites. Have people not heard
         | of allrecipes.com? Yummly? Or one of the other thousands of
         | recipe db sites? No blogspam, just structured recipe search.
         | You can even search by ingredient!
         | 
         | Mayo Clinic, Harvard Health, and PubMed do a great job with
         | health info. IMDb for movies, Goodreads for books, *gearlab.com
         | for reviews, booking.com for accommodations.
         | 
         | I think the biggest threat to Google isn't a better general
         | search engine, it's user behavior switching to more domain-
         | specific websites as the top of the funnel. E.g. people going
         | directly to Amazon to search for products instead of first
         | searching Google.
         | 
         | To some extent, Google has figured this out, which is why they
         | now have a dedicated flight search, hotel search, product
         | search (Google shopping still exists and it's pretty good!),
         | etc.
        
       | cpeterso wrote:
       | Amazon's search results and scammy third party sellers are a
       | similar trust problem. When possible I try to purchase directly
       | from the product manufacturer's website. Similarly, I don't
       | search Google for product reviews, I go directly to trustworthy
       | review websites.
        
       | throwaway14356 wrote:
       | ah, so everyone wanted to move from carefully crafted personal
       | websites where every detail counts and low effort publications
       | are harshly punished to platforms with guaranteed readership and
       | now we have a curation problem?
       | 
       | Someone (who probably doesn't have a website) said that comment
       | moderation on your own website is too much work. Perhaps the
       | whole internet is too much work?
       | 
       | But I like the spam search engine by and for spammers as a way of
       | finding the latest and greatest affiliate marketing and
       | blockchain swindle.
        
       | leoc wrote:
       | https://twitter.com/mwseibel/status/1477707884632834049
       | 
       | > I'm pretty sure the engineers responsible for Google Search
       | aren't happy about the quality of results either. I'm wondering
       | if this isn't really a tech problem but the influence of some
       | suit responsible for quarterly ad revenue increases.
       | 
       | Please no more of this. Two men, Page and Brin, together have
       | basically unfettered control over Google.* If Google does
       | something bad then, unless it's genuinely something small enough
       | that those two could not be expected to hear about it, it's
       | happening with--at the very least--their acquiescence. And low
       | overall search quality is not something that some "suit" is
       | successfully hiding from Good Czar Larry. They could fire the
       | "suit", or command him or her to make other decisions. This is--
       | again, at the very least--something that they have chosen not to
       | do. The responsibility lies with them.
       | 
       | * There _is_ the risk of lawsuits from the minority shareholders,
       | I assume. But IIUC this is not realistically that big a restraint
       | on what shareholders with a majority of votes can do. However
       | IANAL.
        
       | wslh wrote:
       | In 2013 I elaborated about this topic:
       | http://blog.databigbang.com/letters-from-the-future-challeng... I
       | would add that in 2021 we can easily do Natural Language
       | Understanding (NLU) and Natural Language Generation (NLG) and can
       | build zillions of web pages that don't follow the original page
       | ranking concept of Google. Probably important sites share less
       | low rank pages and there are many more link rings and clusters.
       | More decentralized blogs seem a thing of the past (expecting to
       | be rebooted in the future).
        
       | ape4 wrote:
       | One approach would be to have moderators from the community who
       | are allowed to make decisions about results.
        
       | cblconfederate wrote:
       | Good. In fact, if we want people to visit websites other than
       | google.com (and then read the answer in the snippet or the box in
       | the sidebar) then it's good that google results are crap. Use
       | google less.
        
       | beefield wrote:
       | Okay, given that we have pretty successful examples of wikipedia
       | as a general crowdsourced information storage and stackoverflow
       | as a specialized domain crowdsourced Q&A site, would it be
       | impossible to build a crowdsourced search engine? Not even
       | scraping the web, but I would just type my search term, if that
       | is already searched and results voted, I would see those. If it
       | was a completely new search term, I would get no immediate
       | results, but my search would be displayed in "new searches page",
       | which some voluntary people would be following and trying to add
       | relevant results.
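
The workflow proposed above (voted results for known queries, a "new searches" queue for unknown ones) can be sketched as a small in-memory data structure. Everything here is hypothetical: the `CrowdIndex` class, its method names, and the example URLs are illustrative assumptions, and the sketch ignores the abuse and moderation problems a real service would face.

```python
from collections import defaultdict


class CrowdIndex:
    """Toy crowdsourced index: query -> volunteer-suggested, voted URLs."""

    def __init__(self):
        self.votes = defaultdict(dict)  # query -> {url: vote total}
        self.pending = []               # the "new searches" page

    @staticmethod
    def _norm(query):
        # Collapse case and whitespace so trivial variants share results.
        return " ".join(query.lower().split())

    def search(self, query):
        """Return URLs ranked by votes, or queue the query if unknown."""
        q = self._norm(query)
        ranked = self.votes.get(q)
        if not ranked:
            if q not in self.pending:
                self.pending.append(q)
            return []
        return sorted(ranked, key=lambda url: (-ranked[url], url))

    def suggest(self, query, url):
        """A volunteer attaches a candidate result to a query."""
        q = self._norm(query)
        self.votes[q].setdefault(url, 0)
        if q in self.pending:
            self.pending.remove(q)

    def vote(self, query, url, delta):
        """Apply an upvote (+1) or downvote (-1) to a result."""
        q = self._norm(query)
        self.votes[q][url] = self.votes[q].get(url, 0) + delta


if __name__ == "__main__":
    index = CrowdIndex()
    print(index.search("sourdough starter"))  # unknown query: queued
    index.suggest("sourdough starter", "https://a.example/")
    index.vote("sourdough starter", "https://a.example/", 1)
    print(index.search("Sourdough  Starter"))
```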
        
       | usrusr wrote:
       | "You might need to do a lot of manual spam fighting initially"
       | 
       | How would this be limited to "initially"? Wouldn't it be a lot,
       | initially, and then only get worse?
        
       | thr0wawayf00 wrote:
       | I'm honestly not trying to take a potshot against PG or YC here,
       | but it's kinda funny to see him saying this after I worked for a
       | YC-backed startup years ago that built its core revenue streams
       | around generating SEO spam, we just marketed it as something
       | else. Just to be clear, I don't think PG or YC are responsible
       | for all or even most SEO spam, but I know firsthand that they've
       | profited from it through at least one of their incubated
       | companies.
       | 
       | I never considered the possibility that an incubator would
       | support a specific product, then later on call for alternatives
       | that would essentially freeze out the original product that they
       | supported. I'm sure this very rarely happens, but it's
       | interesting to see a real-world example in action.
        
         | vlovich123 wrote:
         | Don't know how rarely it happens. After all, weapons dealers
         | frequently arm both sides of a conflict. They don't really need
         | to care who wins - they're just making twice the amount of
         | money.
        
         | awillen wrote:
         | I don't think it's as counterintuitive as it sounds - just
         | because they're playing the game doesn't mean that they think
         | it's right. If your options are to not be successful at SEO or
         | to do the SEO spam thing, I don't think it's necessarily wrong
         | to do the latter - it's your job to make your startup
         | successful, not to make a stand against the way Google does
         | things.
         | 
         | I view it as something like the rich folks who call for
         | additional taxation of the rich. They're not going to just pay
         | extra money that they don't have to under the current tax
         | rules, both because it's not particularly fair and because one
         | person paying extra taxes, even if they're very wealthy, isn't
         | going to make a big impact. That doesn't mean they can't lobby
         | to change the rules and be totally fine with it if everyone is
         | paying additional taxes.
        
       | jeffbee wrote:
       | All of the amateur search quality experts forget to mention the
       | regulatory environment. Obviously, Google could nuke Pinterest
       | from orbit, dramatically improving image search results. Clearly,
       | Google could effectively take down Statista, technically. But
       | various Eurocracies have shown an extreme willingness to take the
       | side of Yelp, Pinterest, and whatever other spam/scam mills are
       | able to form a shadow alliance with Microsoft's astroturf
       | campaigns like "fairsearch" and whatever.
        
         | wakiza33 wrote:
         | big part of it. and the scrutiny is only going to get more
         | intense
        
         | notreallyserio wrote:
         | If this is a real concern they can side step it by simply
         | allowing users to block specific domains, like they have in the
         | past.
        
       | coffeeroach wrote:
        
       | Covzire wrote:
       | Could this issue be related to Gmail's spam filtering? For
       | approximately 2 years now it's been downright porous, I'm getting
       | on average 1 obvious spam message in my inbox that is something
       | like:
       | 
       | c0nGrats-You_HaVe_Won_ThE_Pr1ze!
       | 
       | ..Or some silly variation of this that takes literally 0.1 ms for
       | a human to discern that it's spam. Yet something happened to
       | Gmail's spam algorithm in the last couple years that has been
       | consistently letting these through. To be fair, it does catch
       | most spam but it's only batting something like 75% and the spam
       | it does catch is often times much less obvious to human eyes than
       | the stuff it lets through.
        
       | igammarays wrote:
       | Interestingly, Google Maps doesn't suffer as much from the issues
       | with Google Search. Maybe because it has those community-driven
       | curation features that PG is talking about? Google Maps is
       | fantastic at finding places to go to (and getting you there).
       | 
       | Also, why hasn't Apple built a search engine yet? It baffles me
       | that they chose to go head-to-head with Google on Maps, yet
       | outsourced their search engine. I would've liked it the other way
       | around: Google Maps and Apple Search.
        
         | nostromo wrote:
         | Apple makes 10+ billion a year from Google by setting it as the
         | default search engine on iPhone.
         | 
         | I agree they should make their own search engine. But currently
         | they're being paid a ton of money not to.
        
         | techdragon wrote:
         | It's much harder to get SEO spam style content onto maps given
         | the geographic region limits involved... But it happens. My
         | favourite example is searching for suppliers of something
         | basic, say structural aluminium extrusions: big and heavy, and
         | you ideally don't want to ship it far, so it's an ideal thing
         | to search for a local supplier of. In Australian results it's
         | basically a given that I'll get results for my city, because
         | they will list it as a delivery area they supply to. However,
         | when you actually try to find them on the map as a pin, nope:
         | it's either not there or it's just a sales office, not a
         | warehouse or workshop. So they have tricked the system into
         | listing them as local to my area in a way that pollutes my
         | maps search results.
         | 
         | But this only works for certain industries. It's much less
         | common to see this kind of tactic if you're searching for, say,
         | a coffee shop, because the sheer number of local results lets
         | Google be "hyper local" with these kinds of results.
        
         | Nextgrid wrote:
         | Google Maps is spammed by fake locksmith or other trades that
         | make it look like they're local but all route to the same
         | boiler room (probably right next to the tech support or IRS
         | scammers) from where they dispatch a crooked & most likely
         | unlicensed tradesman that will do a poor job and overcharge you
         | (destroying the lock so they can sell you an overpriced
         | replacement instead of picking it, etc).
         | 
         | For licensed trades the solution is to go to your official
         | trade licensing body (for the UK it's the NICEIC for
         | electricians and the Gas Safe Register for gas/HVAC
         | technicians), for unlicensed ones it's more difficult. There
         | are "review" sites that claim to provide good results but their
         | business incentives & vulnerability to spam/fake reviews are
         | unknown.
        
       | ben7799 wrote:
       | It'd be really interesting if Google allowed upvote/downvote on
       | search results... but it'd be super hard to imagine them ever
       | taking the votes into account much versus ad revenue.
       | 
       | And the upvote/downvote would be very tricky to implement in a
       | way that the SEO crowd couldn't just game it horribly.
        
         | jeffbee wrote:
         | Clicking a result is essentially an upvote.
         | 
         | Immediately returning to the results page is essentially a
         | downvote.
         | 
         | You can't really crowdsource this stuff, because the problem of
         | brigading and other forms of abuse is way too high. Just
         | imagine what the crowdsourced results for "trump won" or "trump
         | lost" would look like, or hydroxychloroquine, or ivermectin, or
         | to go with some older cults of personality, Hitler or Ataturk.
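
The implicit-vote idea above (a click counts as an upvote, a quick bounce back to the results page as a downvote) can be sketched from a click log. This is an illustration under stated assumptions, not a description of how Google scores results: the event shape `(url, dwell_seconds)` and the 10-second cutoff are hypothetical.

```python
from collections import defaultdict

# Hypothetical cutoff: returning to the results page sooner than this
# is treated as a "downvote" for the clicked result.
SHORT_DWELL_SECONDS = 10.0


def implicit_votes(events):
    """Aggregate (url, dwell_seconds) click events into net vote totals."""
    scores = defaultdict(int)
    for url, dwell in events:
        scores[url] += -1 if dwell < SHORT_DWELL_SECONDS else 1
    return dict(scores)


if __name__ == "__main__":
    log = [
        ("a.example", 120.0),  # stayed on the page: upvote
        ("a.example", 2.0),    # bounced straight back: downvote
        ("a.example", 45.0),
        ("b.example", 1.5),
    ]
    print(implicit_votes(log))
```

Like explicit voting, a signal of this kind can be gamed by automated clicks, so it does not remove the brigading concern raised in the comment.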
        
       | 1024core wrote:
       | ... and the moment you gain some traction, the SEO monster will
       | train its eye on you like Sauron; and without a billion dollar
       | budget, you will be toast.
        
       | nabla9 wrote:
       | When the search engine is funded by ads, there is incentive to
       | produce results that people who click ads like.
        
       | Veen wrote:
       | > Why not try writing a search engine specifically for some
       | category dominated by SEO spam?
       | 
       | Back in the olden days, there were lots of organizations that
       | collated high quality content from the best writers. They
       | nurtured expert writers and paid them well. They fact-checked the
       | content and employed diligent editors and proofreaders so it was
       | accurate and well-written. Over the years, they'd build a
       | reputation for reliability and trustworthiness that kept people
       | coming back for more. If you wanted to learn about fitness, or
       | cars, or cooking, or science, you'd find a reputable author and
       | publisher and buy their magazines or books.
       | 
       | But then, in the early 2000s, the geniuses from SV "disrupted"
       | the publishing industry and its financial model. They brought us
       | a much better way to find content, the search engine. Because
       | they were so much better than the old-fashioned publishers,
       | search engines gobbled up the advertising money and became the
       | dominant gateway to content. Publishers had to abandon expensive
       | high-quality writing because rankings and eyeballs now mattered
       | more than quality and trustworthiness. Instead of investing in
       | writers, they invested in marketers and SEO specialists.
       | 
       | The result: worthless content, writers banging out garbage for
       | peanuts, and useless search engines.
       | 
       | Two decades later, looking at the barren wasteland they had
       | created, the SV geniuses thought: I know what we need, more
       | search engines, but smaller ones that collate high-quality
       | content from the best writers. There must be money in that,
       | right?
        
         | mistermann wrote:
         | Something else that has largely disappeared is that there used
         | to be a fair amount of organization of content, whereas now a
         | lot of content is just thrown into a big pile and the user is
         | left to go fishing on their own with search engines, whose
         | ability to search seems to be declining (ie: Google often seems
         | to no longer support mandatory include/exclude search
         | parameters). Generally speaking, the result seems to be
         | decreasing order and increasing chaos.
         | 
         | Of course, the massive volume of content creates a fundamental
         | problem, but user curation & categorization on sites like
         | Youtube would be possible, were Google to provide the software
         | support so people could do that. Whether this and similar
         | decisions are deliberate or accidental is likely one of those
         | things that we will never know.
        
           | allochthon wrote:
           | > Google often seems to no longer support mandatory
           | include/exclude search parameters
           | 
           | I've noticed this, and it's frustrating. I have assumed it's
           | intentional. I am left to guess as to what a change in this
           | behavior would accomplish.
        
         | Volker_W wrote:
         | What does SV stand for?
        
           | smk_ wrote:
           | Silicon Valley
        
           | [deleted]
        
           | allochthon wrote:
           | Silicon Valley, i.e., the California tech scene.
        
         | salt-thrower wrote:
         | It's been really sad to grow up and watch the cool techie
         | optimism of the 2000s internet get sucked dry by profit
         | motives and left to rot. The change has occurred pretty much
         | entirely within my adult lifetime (I'm only 27 and I still
         | remember when Google was the cool new thing on the internet).
         | 
         | It went from "search engines and the web will usher in a new
         | era of wisdom and democracy" to "useful content is dying at the
         | hands of monetization schemes, and also the internet will be
         | the death of liberal democracy, woe unto us all" in about 15
         | years.
        
         | renewiltord wrote:
         | Those guys covered literally nothing compared to what I can get
         | recommendations for with "product type Reddit". No thanks.
        
           | Veen wrote:
           | You may not be aware, but the written word can be used for
           | more than product reviews.
        
             | renewiltord wrote:
             | Oh, they were just usually wrong on everything else.
             | Fortunately, these days we have individuals debunking the
             | nonsense. Back then, people just uncritically believed
             | total horseshit.
             | 
             | The invariant has always been: find people who make
             | falsifiable predictions and improve. Back then the pool was
             | small and you had no choice. Now, fortunately we have a
             | choice.
        
         | GarlicToum wrote:
         | I tried creating a search engine for recipes. It works well and
         | people like it, but the struggle is no one remembers that it
         | exists and Google is just their default for search.
         | 
         | So from an individual developer perspective, it's very hard to
         | get people to change their habits. And Google/duck/Bing is the
         | one stop shop.
         | 
         | It's still out there, but I haven't worked on it much lately. I
         | always think that if I had some good advertisers, a better UI,
         | and a salary coming in, maybe it could take over some of
         | Google's usage!
        
           | Volker_W wrote:
           | > I tried creating a search engine for recipes. It works well
           | and people like it, but the struggle is no one remembers that
           | it exists and Google is just their default for search.
           | 
           | Link please
        
             | GarlicToum wrote:
             | https://garlictoum.com/
        
               | imilk wrote:
               | Always nice to see other sites using Svelte!
        
               | kevingrahl wrote:
               | Just an idea but what about making it easier for folks to
               | remember to use your search somehow!?
               | 
               | I like and use Duck Duck Go's !bangs [1] all the time,
               | maybe try to add your site with a rememberable name.. may
               | I suggest !garlic ?
               | 
               | [1] - https://duckduckgo.com/bang
               | [2] - https://duckduckgo.com/newbang
        
               | GarlicToum wrote:
               | Just submitted it, let's see what happens!
        
           | citizenkeen wrote:
           | (1) What is it?
           | 
           | (2) Do you have a bang on DuckDuckGo? I'm pretty aggressive
           | with bangs, and I suspect a lot of DDG users end up being
           | aggressive with them as well.
        
             | GarlicToum wrote:
             | Linked above, I didn't know you could just submit a random
             | site to DDG to be included in bangs
        
             | [deleted]
        
         | dehrmann wrote:
         | I think you're unfairly putting blame on Silicon Valley.
         | Publishers were only able to produce high-quality content
         | because, with no conversion metrics, advertisers were willing
         | to overpay for placement. Tech undermined publishers' revenue,
         | but what it revealed was that people don't actually want high-
         | quality journalism, they want entertainment, and they're
         | definitely not going to pay a premium for it. This was hidden
         | behind publishers' business model.
        
           | nemothekid wrote:
           | > _Publishers were only able to produce high-quality content
           | because, with no conversion metrics, advertisers were willing
           | to overpay for placement._
           | 
           | This implies that big budget advertisers (the CPGs, like Coke
           | and P&G), are buying Google/FB because they have better
           | conversion metrics. That isn't true today; only SMBs and
           | gaming companies care about conversion metrics. There are
           | interns in LA/NY probably collectively spending millions on
           | FB for P&G and only reporting the number of likes back to
           | their bosses. Google and FB have never meaningfully delivered
           | on conversions past anything like app downloads.
           | 
           | Tech undermined publishers' revenue because the internet
           | cratered distribution costs. Advertising revenues for big
           | media crashed because the eyeballs moved away, not because it
           | was any less efficient.
        
             | dehrmann wrote:
             | > the CPGs, like Coke and P&G
             | 
             | Did any of these heavily buy newspaper ads before 2000?
             | Definitely TV, possibly magazine, but newspaper? I just
             | don't remember seeing ads for Tide in newspapers.
        
         | jayd16 wrote:
         | I guess there's also the rise of influencers in the mix here.
         | The commoditization of publishing means content creators can
         | more easily work independently.
        
       | blunte wrote:
       | Google search results are garbage, at least from a developer's
       | perspective.
       | 
       | Most of the results are poorly formatted content "gathered" from
       | stackoverflow, github, quora, etc.
       | 
       | And from a "person who wants to see an image" perspective, Google
       | is purely a gateway to Pinterest or Gettyimages.
        
       | liveoneggs wrote:
       | do google search engineers use ad blockers?
        
       | noduerme wrote:
       | Hey, smart people: It's called _CURATION BY HUMANS_.
        
       | bombcar wrote:
       | Just give users the ability to blacklist domains when searching;
       | pretty soon you'll have a decent list of what users consider
       | worthless.
       | 
         | And Pinterest would die.
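         | 
         | A minimal sketch of the per-user blocklist idea, in Python. The
         | function names and sample URLs are illustrative assumptions, not
         | any search engine's actual API: results are dropped when their
         | host matches a blocked domain or one of its subdomains, and the
         | per-user lists are tallied to surface the most-blocked domains.

```python
from __future__ import annotations

from collections import Counter
from urllib.parse import urlparse


def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()


def is_blocked(url: str, blocklist: set[str]) -> bool:
    """True if the host equals a blocked domain or is a subdomain of one."""
    host = host_of(url)
    return any(host == d or host.endswith("." + d) for d in blocklist)


def filter_results(urls: list[str], blocklist: set[str]) -> list[str]:
    """Keep only the result URLs that the user's blocklist does not cover."""
    return [u for u in urls if not is_blocked(u, blocklist)]


def most_blocked(user_blocklists: list[set[str]], top_n: int) -> list[tuple[str, int]]:
    """Count how many users block each domain; return the top_n domains."""
    counts = Counter(d for blocklist in user_blocklists for d in blocklist)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical result URLs, for illustration only.
    results = [
        "https://www.pinterest.com/pin/1",
        "https://example.org/recipe",
    ]
    print(filter_results(results, {"pinterest.com"}))
```

         | The suffix check requires a leading dot, so blocking
         | pinterest.com does not also block an unrelated host such as
         | notpinterest.com.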
        
         | krono wrote:
         | uBlock Origin static filters to the rescue!
         | 
         | Block results from specific domains on Google or DDG:
         | google.*##.g:has(a[href*="thetopsites.com"])
         | duckduckgo.*##.results > div:has(a[href*="thetopsites.com"])
         | 
         | And it's even possible to target element content with regex
         | with the `:has-text(/regex/)` selector.
         | google.*##.g:has(*:has-text(/bye topic of noninterest/i))
         | duckduckgo.*##.results > div:has(*:has-text(/bye topic of noninterest/i))
         | 
         | Bonus content: Ever tried getting rid of Medium's obnoxious
         | cookie notification? Just nuke it from orbit:
         | *##body>div:has(div:has-text(/To make Medium work.*Privacy Policy.*Cookie Policy/i))
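         | 
         | If you keep a longer list of domains, the two domain filters
         | above can be generated rather than typed by hand. A small
         | Python sketch: the templates are copied from this comment, the
         | helper names are made up for illustration, and whether the `.g`
         | and `.results > div` selectors still match the current result
         | markup is an assumption you would need to check yourself.

```python
from __future__ import annotations


def google_filter(domain: str) -> str:
    """uBlock Origin filter hiding Google results that link to `domain`."""
    return f'google.*##.g:has(a[href*="{domain}"])'


def ddg_filter(domain: str) -> str:
    """uBlock Origin filter hiding DuckDuckGo results that link to `domain`."""
    return f'duckduckgo.*##.results > div:has(a[href*="{domain}"])'


def filter_lines(domains: list[str]) -> list[str]:
    """Return one Google and one DuckDuckGo filter line per domain."""
    lines = []
    for domain in domains:
        lines.append(google_filter(domain))
        lines.append(ddg_filter(domain))
    return lines


if __name__ == "__main__":
    # Paste the printed lines into uBlock Origin's "My filters" pane.
    print("\n".join(filter_lines(["thetopsites.com"])))
```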
        
         | li2uR3ce wrote:
         | Google used to have such a feature.
         | 
         | It would be nice if Google would ask you the simple question:
         | "Did you find what you're looking for?" Instead they rely on
         | the assumption that users only stop looking when they've found
         | what they're looking for.
         | 
         | These days, there's a reasonably high chance that I quit
         | looking because I gave up in futility--not because I found what
         | I was looking for.
         | 
         | It's also the case that there's no way to train Google not to
         | omit search terms or generalize them to the point of
         | uselessness.
         | 
         | I really wish abusive SEO were the only problem but it's far
         | from the case. Search results being crappy is a cumulative
         | effect. You could solve SEO spam and I'll still not be able to
         | find a USB SuperSpeed cable because it gets generalized to "usb
         | cable" and there are a gazillion more charging cables than
         | there are SuperSpeed cables.
         | 
         | Used to be that you could quote things to indicate that you
         | really meant it. That's fuzzy now too. Every time we figure out
         | how to circumvent the bad results, features are removed.
        
       | [deleted]
        
       | yumraj wrote:
       | If I remember correctly, it used to be called About.com - with
       | categorized and human curated links.
       | 
       | It was big during the dot com days, but withered after Google.
       | 
       | Interestingly, I do think that that model may need to be
       | revisited.
       | 
       | Edit: I feel that Reddit is filling some of this need, at least
       | for things like vacuum and espresso machines with dedicated
       | spaces.
        
         | celestialcheese wrote:
         | It's funny you mentioned About.com. Or as it's known today
         | DotDash AKA IAC.
         | 
         | It's easily one of the largest SEO players out there, and
         | they've been on a buying spree recently with their purchase of
         | Meredith. The quality of the content has gotten better, but
         | it's still a monster of an SEO optimized content machine.
        
           | yumraj wrote:
           | Yes, I know nothing about what they're doing today. In fact
           | my pihole blocks them, must be in some list. So I'm pretty
           | sure they're crap today.
           | 
           | I just remember the concept from way back.
        
       | mrkramer wrote:
       | Good idea, Paul. I had a similar one, but there's no way you
       | could do it manually. Machine learning algorithms need to detect
       | spam, not people, because a search engine can't scale that way.
       | If people were marking what's good content and what's not, such
       | a search engine would be reduced to content curation, not organic
       | search and discovery.
        
       | lubesGordi wrote:
       | How about a platform for curation. Curators who know a subject
       | well can link to content that looks good to them. Search goes
       | through the curators, people can favorite certain curators. Lots
       | of people like to curate. This is a better idea than trying to go
       | after spam.
        
         | li2uR3ce wrote:
         | 301 Battlefield moved: curator spam.
        
       | nanna wrote:
       | If you were to start a search engine, what stack would you use?
        
         | aantix wrote:
         | Typesense looks easy to use.
         | 
         | But with 3x memory needed for the indexes, the server costs
         | probably aren't going to be bootstrap'able.
         | 
         | Especially for a "small" crawl of a billion web pages, even at
         | just 10k per page.
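         | 
         | A back-of-the-envelope check of those figures in Python. It
         | assumes "10k per page" means 10 kB of stored text per page and
         | "3x memory" means RAM equal to three times the raw data size;
         | both are readings of this comment, not measured values.

```python
PAGES = 1_000_000_000      # "a billion web pages"
BYTES_PER_PAGE = 10_000    # assumption: "10k per page" read as 10 kB (decimal)
MEMORY_FACTOR = 3          # assumption: "3x memory" is relative to raw data size
TB = 10**12                # decimal terabyte

raw_bytes = PAGES * BYTES_PER_PAGE
index_ram_bytes = raw_bytes * MEMORY_FACTOR

print(f"raw crawl: {raw_bytes / TB:.0f} TB")
print(f"index RAM: {index_ram_bytes / TB:.0f} TB")
```

         | Under those assumptions the raw crawl is 10 TB and the in-memory
         | index is 30 TB, which is the scale behind the server-cost worry.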
        
           | marginalia_nu wrote:
           | Eh, I run a 100M-index off consumer hardware in my living
           | room. Very doable if you avoid bloated off the shelf
           | solutions.
        
             | aantix wrote:
             | What search software do you run?
             | 
             | What sort of memory and space do you have on the single
             | server?
             | 
             | What's the average document size that you index?
             | 
             | Genuinely curious on how doable a modern search engine is
             | on modern hardware.
        
               | marginalia_nu wrote:
               | I run all custom software; I feel most off-the-shelf
               | solutions aren't very resource-efficient.
               | 
               | The server has 128 GB of RAM and the index currently fits
               | on a single 1 TB SSD + an Optane drive of 480 GB.
               | 
               | I find the average document to clock in at 7 KB, in terms
               | of raw HTML. In the index that's, dunno, probably less
               | than 1 KB/doc.
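               | 
               | Rough arithmetic on those numbers in Python, assuming
               | "100M-index" means 100 million documents and taking the
               | "less than 1 KB/doc" guess as an upper bound. Decimal
               | units throughout; these are estimates from the figures in
               | this comment, not measurements.

```python
DOCS = 100_000_000           # assumption: "100M-index" = 100 million documents
RAW_BYTES_PER_DOC = 7_000    # "7 KB" of raw HTML per document
INDEX_BYTES_PER_DOC = 1_000  # "less than 1 KB/doc", used as an upper bound
GB = 10**9                   # decimal gigabyte

raw_html_gb = DOCS * RAW_BYTES_PER_DOC / GB
index_gb_upper = DOCS * INDEX_BYTES_PER_DOC / GB
shrink_factor = RAW_BYTES_PER_DOC / INDEX_BYTES_PER_DOC

print(f"raw HTML: {raw_html_gb:.0f} GB")
print(f"index (upper bound): {index_gb_upper:.0f} GB")
print(f"index is at least {shrink_factor:.0f}x smaller than the raw HTML")
```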
        
       | jrockway wrote:
       | To some extent, I worry that the problem with search engines is
       | that there isn't any data worth returning. Yesterday's thread
       | talked a lot about reviews. Writing a review is hard work that
       | requires deep domain expertise, experience with similar products,
       | and months of testing. If you want a review for something that
       | came out today, there is no way that work could have been done,
       | so there simply isn't anything to find. Instead you'll get a list
       | of "Best TVs 2021" or whatever, with some blurb and an affiliate
       | link, not an actual review. That's what people can make for free
       | with a day's notice, so if you write a search engine that
       | discards those sites, that's fine, you'll just return "no
       | results" for every interesting query.
       | 
       | I guess what I'm saying is that if you want better reviews, you
       | probably want to start writing reviews and figuring out how to
       | sell them for money. Many have tried, few have succeeded. But
       | there probably isn't some Javascript that will fix this problem.
        
         | winternett wrote:
         | The main driver of SEO spam, and online scams in general are
         | countries that have little to no opportunity for economic
         | growth. There are literally millions of Internet savvy people
         | who would be able to survive on what we would consider barely
         | anything profit-wise in adsense revenue, which also usually
         | pays out in US dollars. In this currently terrible global
         | economy, desperation turns the most intelligent minds bound
         | into poverty into bootleg SEO engineers, online catfishers,
         | scammers, and ransomware creators, and God bless their
         | creativity...
         | 
         | Instead of creating income opportunity and crowdsourcing people
         | in foreign countries for common (more positive) good, companies
         | rarely create opportunities for the people who would normally
         | turn into spammers and scammers, and that's what creates an
         | endless army of people that constantly destroy online
         | communities like Soundcloud, FaceBook, Twitter, and TikTok with
         | stolen content, trend scams, fake news, and spam messages.
         | 
         | Google search has been invalidating and subverting their most
         | accurate search results based on abstract SEO rules for quite
         | some time now. It was likely done so that they could implant
         | paid ads first into content, because that makes them the most
         | profit. Doing that has destroyed their reliability and
         | reputation as a search service leader, and they're never going
         | to admit it, but payola is the undertone that is ruining their
         | search results... There is a certain type of corruption that
         | occurs when a company turns away from upholding customer
         | service and value towards a monopolistic "profit-first economic
         | stranglehold" business model... That strategy never ultimately
         | works out well for both companies AND users in the long run.
         | The next leader will likely be a search that avoids the same
         | pitfalls until they themselves become a profit-driven monopoly.
         | 
         | There is no algorithm that will usefully and fairly counter
         | spam based on desperation, companies need to realize that
         | creating opportunity for people to operate equally on their
         | platforms is the best move, otherwise, spam will drive any
         | community of rule abiding users away or into madness.
        
           | mrkramer wrote:
           | >The main driver of SEO spam, and online scams in general are
           | countries that have little to no opportunity for economic
           | growth.
           | 
           | Not quite right because cybercrime aka hacking, cracking,
           | spamming etc. originated in US not in East Europe, Russia and
           | third world countries which are dominating hacking and
           | spamming scene today. Main motivation of cybercriminals is
           | quick money and ease of getting away with it since you are
           | not physically committing a crime but
           | digitally/electronically.
        
             | adventured wrote:
             | It is quite right. Those are the main drivers and it's due
             | to a lack of economic opportunities.
             | 
             | Hacking heavily originated in the US because the US
             | practically built the entire modern tech universe from the
             | ground up. The US was far out in front when it came to
             | utilizing the Internet and the Web, so of course unethical
             | people in the US pioneered various types of online crime,
             | the US was the early adopter.
             | 
             | If you're an elite engineer in the US, you can make
             | millions of dollars doing legal work for big tech. It helps
             | in a big way to drain the labor pool as it pertains to
             | criminal activity online. You generally can't do that today
             | in the countries that dominate SEO spam, online scams, etc.
             | In those countries elite engineers suffer terrible wages
             | doing legal work compared to what they should be able to
             | earn for their abilities; commonly they can earn a lot more
             | doing illegal work instead, it's a very potent lure.
             | 
             | You're an elite engineer in Russia, top ~1%-3% globally.
             | What do you do? Earn several thousand dollars per month
             | doing legit software development in Russia (with either
             | zero or little consequential equity compensation); flee
             | Russia for a more affluent market; or do illegal work where
             | the rewards can be dramatically greater. It would be
             | difficult to resist if you were unable or unwilling to
             | leave Russia.
        
               | mrkramer wrote:
               | >You're an elite engineer in Russia, top ~1%-3% globally.
               | What do you do? Earn several thousand dollars per month
               | doing legit software development in Russia
               | 
               | Become software entrepreneur?
               | 
               | And many international software companies have software
               | development teams and presence in Russia.
        
               | emerongi wrote:
               | > Become software entrepreneur?
               | 
               | Exactly. Hacking for hire, making cheats, botnets, SEO
               | farms, selling exploits and hacked social media accounts;
               | practically anything you can think of that US software
               | engineers can't be bothered with, as they already earn a
               | healthy salary. That _is_ entrepreneurism.
        
               | mrkramer wrote:
               | I wasn't speaking about that kind of entrepreneurism but
               | about making legal software and legal web services that
               | solve problems and are useful. So many Russian hackers
               | got arrested when they travelled somewhere outside Russia
               | and now they are serving 10 or 20 year sentences in US
               | jails.
        
               | PeterisP wrote:
               | > So many Russian hackers got arrested when they
               | travelled somewhere outside Russia
               | 
               | How many? 20? 30? IMHO the cases are rare (and get widely
               | publicized whenever that happens, creating a
               | disproportional visibility), you get a couple captures
               | per year but the number is just a tiny fraction of the
               | actual participants, more like an exception than the
               | rule.
        
               | PeterisP wrote:
               | In places with a less-established legal system it's
               | harder to make money by above-board entrepreneurship and
               | keep it instead of handing it over to local strongmen
               | (two colorful examples that have stayed in my memory and
               | unlike many others have become public and have also been
               | described in non-Russian media -
               | https://www.independent.co.uk/news/world/europe/valery-
               | pshen... and
               | https://abcnews.go.com/International/wireStory/us-
               | embassy-ru... , but of course those are the exceptions
               | because the usual result is complying with threats and
               | handing over your business or most of it). But it's not
               | really about Russia, it's a general issue with parallels
               | in other countries as well. And of course, there's the
               | issue of the local market; the financial advantages for a
               | skilled tech person going towards entrepreneurship
               | legitimately are less attractive in most places compared
               | to USA; heck, even EU potential tech entrepreneurs often
               | just go 'across the pond' to start their business.
               | 
               | If you can't get a work visa to a first world country,
               | you do have less options than someone already living
               | there; and the salaries offered by first-world
               | "international software companies" in their remote
               | subsidiaries tend to be 'according to local market rates'
               | (the same "several thousand dollars per month" mentioned
               | by the parent poster is a decent rate) and thus not as
               | competitive with "black entrepreneurship" which pays
               | according to global standards.
        
         | ALittleLight wrote:
         | Maybe there's a hardware engineer out there with a decade of
         | experience shipping and reviewing TVs publishing his thoughts
         | to his blog. He's heard about the latest and greatest and he's
         | offering his expectations based on the promotional material,
         | his friends at the company, history from the brand - whatever.
         | Maybe, if he's built a reputation of good reviews he's got a
         | big audience. Big audience? TV brands give him an early review
         | model.
         | 
         | Modern Google actually makes the content problem worse. When
         | our notional TV blogger is starting out in our world he
         | publishes two or three essays, nobody reads them, he stops
         | putting in so much effort, posts occasionally, dwindles off. In
         | a world with a perfect search engine his early essays get some
         | attention to encourage him to post more, a feedback loop
         | starts, and before you know it he's a full time TV reviewer.
        
         | mc32 wrote:
         | Reviews are a special category. It suffers from a couple of
         | issues:
         | 
         | 1. You need to have enthusiastic reviewers (people who care
         | enough about a product category to review them semi-thoroughly.)
         | 
         | 2. Proper reviews can take time and may need domain knowledge.
         | 
         | 3. Competition. When there were one or two people doing reviews
         | on some category of products, maybe the economics worked out.
         | Once you have hundreds or thousands competing with you, the
         | time demand may be overwhelming and not worth it.
         | 
         | 4. If you are a trusted reviewer or site, you will get economic
         | pressure to review a particular thing or brand you may not like
         | very much but the money may be good. So you will begin to
         | experience conflicts of interest.
         | 
         | 5. If reviews are just a hobby and not a way to make money,
         | eventually you will slow down or move on, opening a hole that
         | gets filled up by spammers.
         | 
         | 6. Some things are timeless (a pipewrench, let's say) and some
         | are seasonal (consumer electronics, toys, etc). The former
         | deserves a thorough review, but the latter doesn't deserve as
         | much, though it may get the bulk of interest due to seasonal
         | demand. Does it really matter if the latter's latest iteration
         | has 2% increased battery life to discuss?
         | 
         | I'm sure there is a lot I didn't think of. But it's a doomed
         | category, unless people are willing to pay for professional
         | reviews (Consumer Reports types and other independents).
        
           | eatonphil wrote:
           | I pay for Consumer Reports. I'd encourage more people to too.
           | I don't trust it completely but it's a good companion to
           | manual searches on Reddit/HN/car forums, etc.
        
             | aemreunal wrote:
             | Someone pointed out yesterday on that other search thread
             | that [most?] libraries provide free access to Consumer
             | Reports through a membership. I just looked at the San
             | Francisco Public Library and it does indeed give me access
             | to the magazine and a searchable database.
        
           | chongli wrote:
           | _If reviews are just a hobby and not a way to make money,
           | eventually you will slow down or move on, opening a hole that
           | gets filled up by spammers._
           | 
           | In my experience, the best reviewers are hobbyists. The thing
           | is, it's not _reviews_ that are their hobby. Rather, they
           | review the products that go along with their hobby.
           | 
           | So, for my hobbies (espresso and aquariums), there are tons
           | of easily accessible reviews on all kinds of aquarium gear
           | and coffee machines, grinders, etc. On the other hand, nobody
           | does plumbing or HVAC as a hobby (that I know of) so it's
           | very difficult to find high quality reviews of water
           | softeners or furnaces. It takes a very special rare sort of
           | person who would install these things just to review them.
           | The closest thing I could find was this video [1] on a DIY
           | water filtration system by an RV/off the grid type hobbyist
           | (from what I can tell).
           | 
           | [1] https://www.youtube.com/watch?v=WCC4TOYYGF8
        
             | treis wrote:
             | People like to talk about their work too. There's plenty of
             | those sort of reviews out there. Mostly on reddit because
             | like others have mentioned organic search results are
             | completely gamed.
        
           | giantrobot wrote:
           | > Reviews are a special category. It suffers from a couple of
           | issues
           | 
           | Review sites suffer from a singular problem. They are
           | overwhelmingly SEO spam content farms. People go find some
           | product niche and pay some Fiverr/whatever people to write
           | literally fake reviews of products. Because they're pulling
           | all the SEO tricks and are in a niche category they shoot to
           | the top of search results for that niche.
           | 
           | Their reviews _sound_ realistic and viable but they're pure
           | fantasy. The writers never touch the products being reviewed.
           | Many times they'll pull details from Amazon listings
           | (including factual errors) and even other "review" sites.
           | 
           | Once they get established in their niche they'll accept paid
           | placement from product manufacturers without marking it as
           | such. A single scammer might own dozens of these sites, even
           | supposedly competing ones.
        
         | snth wrote:
         | > If you want a review for something that came out today, there
         | is no way that work could have been done, so there simply isn't
         | anything to find.
         | 
         | That's not strictly true, given that reviewers are often sent
         | pre-release versions of things in order to do that work before
         | release day.
        
           | loceng wrote:
           | Not sure why you're being downvoted, as you're correct -
           | however to point out there seems to be a trend where
           | reviewers are only given pre-release versions if they
           | practically always give favourable reviews to the products
           | they list, especially if they're provided the product for
           | free; there doesn't have to be an express relationship or
           | contract between a reviewer and a company either, it's the
           | reverse of how Bill Gates apparently has given $200 million+
           | to different news channels/media organizations - and so
           | they're less likely going to as freely share negative news
             | about him or perhaps his organizations; this makes
           | me think, similarly to how stocks being sold by CEOs (etc)
           | must be pre-planned to avoid shenanigans like market
           | manipulation, that anyone giving large sums of money to any
           | media/journalism organization must divide the amount up over
             | 20-40+ years, so that organization at least has a runway and
             | is not dependent on larger "dopamine hits" at shorter intervals.
        
             | nradov wrote:
             | I trust DC Rainmaker's reviews of fitness tech products
             | because he always returns products back to the
             | manufacturers after writing reviews. So there's no conflict
             | of interest based on free products.
             | 
             | https://www.dcrainmaker.com/product-reviews
        
               | ceejayoz wrote:
               | If companies don't like his reviews, they'll stop sending
               | review units. That hits both in the pocketbook and the
               | race to be one of the earlier reviewers of a new product.
               | Reduced conflict, perhaps, but not none.
        
               | whakim wrote:
               | This presupposes that companies think their products are
               | bad. If you have (what you believe to be) a good product,
               | you definitely want DC Rainmaker to review it. I think
               | this is a reasonably general point across industries -
               | companies _want_ to get their products into the hands of
               | the most reputable reviewers.
        
               | bluGill wrote:
               | Depends, if someone is popular you can't afford not to
               | have them review your things. At a certain point a bad
               | review will still generate more money than no review at
               | all. Few reach that level though, most reviewers don't
               | have that much following.
        
               | tchvil wrote:
               | In DC Rainmaker's case it is probably the opposite. A
               | fitness product not reviewed by him is a bad signal.
        
               | nradov wrote:
               | If companies don't send him review units then he just
               | buys them retail. He has already done this for many
               | products.
        
               | ceejayoz wrote:
               | Yes, I'm aware. That's less money in his pocket, and less
               | ability to have the review be available on or before the
               | product launch. There's still some conflict of interest,
               | even if it's lessened.
               | 
               | Only purchasing review units at retail would remove this
               | conflict.
        
             | _delirium wrote:
             | Yeah, that's been a problem with reviews for a long time.
             | In fact it's what Consumer Reports used initially to
             | differentiate themselves: their "thing" was that they only
             | reviewed products bought anonymously at retail (no free
             | samples or manufacturer-provided review items) and didn't
             | accept any advertising from manufacturers either.
             | 
             | Sites that receive free review samples and are supported by
             | affiliate links are kind of the exact opposite model.
        
         | fassssst wrote:
         | Companies like Sweetwater do this right. They have "sales
         | engineers" that help you find what you're looking for over the
         | phone or text message or email. It probably doesn't scale but
         | as a customer, I don't care as it saves me so much research
         | time and I consistently get what I'm looking for.
        
         | xondono wrote:
         | There's a lot of good quality reviews on YT on launch date of
         | pretty much anything these days.
         | 
         | It's not a problem of doing the review, it's that there's not
         | much of a market for written reviews, most people would rather
         | watch a video instead.
        
           | fauigerzigerk wrote:
           | Interesting. I never watch video reviews. They're painfully
           | slow and impossible to search.
        
           | midasuni wrote:
           | There's not much money in written reviews, and people can't
           | find them amongst the automatically written SEO/affiliate
           | crap
        
           | Karrot_Kream wrote:
           | > It's not a problem of doing the review, it's that there's
           | not much of a market for written reviews, most people would
           | rather watch a video instead.
           | 
           | I'd say it's more that YouTube offers a clearer path to
           | content monetization than text does. YT is a much more
           | lucrative platform for the same level of effort as SEO for
           | their text blogs.
        
           | basch wrote:
           | How do you compare 3 or 4 videos before watching them?
           | Watching video reviews of the reviewers?
        
             | xondono wrote:
             | Like most things, it's a reputation ladder.
             | 
             | There's top channels like LTT and if what you are looking
             | for is out of their niche, you look for the biggest
             | channels in that niche and go mostly by association (who
             | they have made collaborations with, etc.).
             | 
             | EDIT: of course the big win of video reviews is that you
             | can _see_ the thing working.
        
         | verve_rat wrote:
         | This resonates with me a lot. A few months back I upgraded my
         | desktop's insides. New motherboard, CPU, graphics card, etc.
         | That was the first time in about seven years I've gone looking
         | for reviews of that sort of stuff.
         | 
         | I remember doing the exact same thing in the past and being
         | overwhelmed with information. The detail and data in reviews
         | would take a long time to collate and make sense of. But this
         | time even the big name sites seem to be much shallower. Fewer
         | models reviewed, less testing and benchmarking, more
         | regurgitated press releases and other news.
         | 
         | Last time it took me a while to sort out all of the
         | information, this time all my time was spent trying to find any
         | that wasn't 100% fluff.
        
         | NumberCruncher wrote:
         | This resonates with my experience. A couple of years ago I
         | invested more time than I'm proud of into buying the right
         | bluetooth headset for me. I have found a site with pretty
         | detailed reviews and tested their reviews standing in stores
         | and trying dozens of headsets out. I also bought 3 headsets on
         | Amazon and sent back all of them later. My impression was that
         | the reviews on this particular site are 100% unbiased, where
         | all other reviews I read just want to sell whatever product is
         | in focus.
         | 
         | I wonder how a search engine could distinguish between "honest
         | & professional" and "fake & amateur" headset reviews without
         | having a head and two ears?
        
         | grvdrm wrote:
         | Well said - it is among my biggest annoyances with the web.
         | Reviews are almost always packaged into best-of or top-X lists.
         | The quality of the Wirecutter is gradually trending down but it
         | is still the website I use to find the "best" of something. I
         | don't have to waste time sorting through hundreds of list-spam
         | sites.
        
         | thereddaikon wrote:
         | It's a double-edged sword. Reviews take effort so you want to
         | make them easier for customers to write. But making them easier
         | for your users also makes them easier for those trying to game
         | the system. This is why Amazon's product reviews are useless as
         | well as pretty much any other community based review system.
         | 
         | But on top of that you do have the problem of whether or not
         | someone is really qualified to write a review. So Joe User
         | thinks product X is good. What is their metric for good? It
         | reminds me of an LTT review of the Amazon TV from a few months
         | ago. They gave it an awful rating but noted that the reviews on
         | the product page were generally very positive. And their
         | reasoning was that the people buying these TVs and reviewing
         | them didn't have a good comparison point for what a good TV
         | actually is. They are probably comparing it to a much older and
         | less advanced product not to a contemporary one.
         | 
         | So then you think the answer must be get reviews from industry
         | related media. But then you fall into the classic problems of
         | unethical journalists or simply ones that are out of touch.
        
           | Spivak wrote:
           | It's not a question of whether someone is qualified or not,
           | everybody is more than qualified to write about their own
           | feelings toward a product they bought or service they used.
           | In fact no one is more qualified to talk about their own
           | feelings and experience. How useful that review is to you is
           | a combination of the writing quality and depth, and how
           | similar the reviewer is to your own experience and
           | preferences.
           | 
           | Professional critics usually try to distinguish themselves by
           | producing well written in-depth reviews but not from their
           | own perspective but that of a hypothetical everyman who,
           | ideally, is similar enough to a critical mass of their
           | audience.
           | 
           | So it always interests me when people complain about popular
           | gaming review sites being out of touch because almost always
           | it's the reader that's out of touch but doesn't realize their
           | bubble. It's not an absolute rule but I'm in enough niche
           | hobbies to realize that my desires for products are way out
           | of whack.
        
         | mminer237 wrote:
         | There are still good reviews. For TVs, RTINGS produces high-
         | quality reviews (although they're not listed super
         | straightforwardly). For computer internals, AnandTech does even
         | better. You don't have to talk about the absolute latest
         | product out that same day for you to have quality reviews of
         | other options in the meantime.
         | 
         | Everyone just makes blogspam because it's far less work than
         | actually buying products and developing expertise and testing
         | them and writing out a whole thorough review. Google's
         | algorithms just can't tell a quality review from a surface-
         | level, uneducated take.
        
         | mikeryan wrote:
         | I always liked the Wirecutter for just kind of cutting through
         | the crap and saying "this is the one". I wonder if we need some
         | sort of thing for reviews where humans filter out the sites
         | that are credible.
         | 
         | It's a bit funny because this was sort of done by Jason
         | Calacanis' Mahalo back in the day - but maybe he was just ahead
         | of the SEO curve.
        
         | loceng wrote:
         | Developing common standards/protocols for everything required
         | for a quality review vs. a "candy" or shallow hype review would
         | be a good place to start, making it culture that everyone
         | educated knows about to follow - and then they can only go to
         | or support reviewers who list what testing protocols they
         | follow.
         | 
         | Industry has already done this with the "food pyramid" -
         | influencing, capturing governments to make the food pyramid
         | more based on economic reasons and much less on science - with
         | the government putting it out and distributing it into
         | schooling of different levels, giving it an unearned or
         | undeserved authority which then people blindly trust/follow -
         | not understanding that or when systems and their output or
         | oversight have been captured; why the pandemic bringing the
         | classroom home via Zoom, so parents could see/hear the learning
         | material has outraged many parents - an example I've heard,
         | where white children are being taught to feel guilty about
         | their 'white privilege', or parents being upset their children
         | are being taught at a very young age that they can decide what
         | gender they are; I'm not stating what I believe here, just
         | giving examples I've heard of.
         | 
         | This capturing of the government is why I think ultimately the
         | government should be developing and maintaining such platforms,
         | as per law, and requiring individuals and organizations to in
         | real-time add and update their data (a simple example being
         | restaurants, their menu's ingredients, their open hours) - in
         | part to de-risk the government having an unnatural power as
         | "the single arbiter" of truth, perhaps instead to de-risk
         | capture that the government funds multiple independent
         | organizations at the federal level - that States can decide
         | which ones they follow, if necessary, part of why States exist
         | - to de-risk the potential capture of the Federal umbrella;
         | however the system is in an imbalanced, broken, captured state
         | - with the duopoly evolving to be more extreme, led or formed
         | by the establishment, with a broken voting system in arguably
         | most countries of the world, and mainstream media being
         | captured by for-profit industrial complexes that fund MSM
         | through ad revenues - which further develop or mould our
         | culture and narratives/talking points and beliefs, whether
         | truthful or not; without fixing these the other
         | platforms/systems excelling won't be possible.
        
         | frenchyatwork wrote:
         | I think one of the fundamental things that made search work
         | well about 1-2 decades ago was that web sites would link to
         | each other, and that those links could vaguely correlate with
         | reputation. There were link spammers, but there was actually
         | some decent organic content as well.
         | 
         | What's happened since then is that almost all the normal
         | "people linking to things they like" has gone behind walled
         | gardens (chiefly Facebook), and the vast majority of what remains
         | on the open web are SEO spammers.
        
           | prepend wrote:
           | I wish FB would be more open, but since they have all this
           | walled garden info, are they well placed to start a competing
           | search engine? Would be interesting if their activity could
           | help filter out seo hackers.
        
             | kritiko wrote:
             | FB search seems to have gotten worse and worse. Unless I
             | can remember the specific Group where I saw something, it's
             | very unlikely that I can find it again. And they know which
             | posts I've been highly engaged on...
        
           | Mezzie wrote:
           | Yup, early Google relied on a lot of unpaid, unseen human
           | intervention and choices. I ran some weblinks and curatorial
           | sites during the search wars, and PageRank could only work
           | because there were people behind the sites choosing links
           | based on their usefulness to their audience.
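           | 
           | A minimal sketch of why that matters, assuming a textbook
           | power-iteration PageRank (a simplification, not Google's
           | actual implementation). The tiny link graph and page names
           | are hypothetical; the damping factor 0.85 is the commonly
           | cited default.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over everyone.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: human-curated links next to a two-page link farm.
web = {
    "hobby-blog": ["good-review"],
    "forum": ["good-review", "hobby-blog"],
    "good-review": ["hobby-blog"],
    "spam": ["spam-mirror"],
    "spam-mirror": ["spam"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:12s} {score:.3f}")
```

           | In this toy graph the pages that people chose to link to
           | come out ahead, but the two spam pages still hold on to a
           | sizeable score just by linking to each other, which is the
           | weakness link farms exploit.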
        
           | wodenokoto wrote:
           | Why have blogs and articles stopped linking to things? I'm
           | reading a restaurant review site, and they won't link to the
           | restaurant. The chef name is a link to a list of all articles
           | tagged with the chef's name, rather than a Wikipedia link or
           | something useful that can tell me who that person is.
        
             | ufo wrote:
             | There aren't as many blogs now as there used to be.
        
               | adventured wrote:
               | That will get worse yet most likely. Younger people no
               | longer produce public text to the extent they did prior
               | to the smartphone-heavy era. Supply of that blog
               | style content will continue to dwindle as the producers
               | age out. I'm sure there's a stability point it may reach,
               | of course, because some tiny percentage of people will
               | always want to write long-form.
               | 
               | Younger people TikTok, they Instagram, they chat in
               | private conversations with each other, they occasionally
               | post short messages in walled gardens like Facebook, they
               | YouTube, they listen to music, they watch Netflix & Co.
               | That's what they do. They do not persistently write
               | LiveJournals, Tumblrs, blogs. That pre video/audio-
               | focused era is over and it's not coming back (even if
               | there's occasionally a bubbling up of hipster fakery
               | centered around how cool it is to write text).
        
               | nkrisc wrote:
               | I find that claim surprising considering how many more
               | people there are simply using the internet at all.
               | 
               | Fewer unique blog domains due to "blogging" sites that
               | aggregate users? Sounds plausible. Fewer people blogging
               | overall? I'm not convinced yet.
        
               | lethologica wrote:
               | I wonder if the majority are moving to vlogging instead?
        
               | ufo wrote:
               | I think the bigger issue now is that more content is
               | inside social media "silos" like twitter, instagram or
               | youtube. I don't have the numbers though.
        
               | Volker_W wrote:
               | Why is this a problem? Can't google index social media
               | silos?
        
               | bluGill wrote:
               | Which ones? They can index their own, but for the others
               | only the public stuff. Facebook has a lot of things
               | private so nobody can see them except your friends. (they
               | are by no means perfect, but a lot of things are private
               | and only seen by friends - most of it isn't of interest
               | to a search engine anyway but comments of the form "I
               | love X product" could in a perfect world be indexed as a
               | sign of what people find good)
        
               | Baeocystin wrote:
               | I'd believe it. As an IT consultant, I interact with a
               | lot of people who are semi-techs themselves- mostly small
               | business owners who are used to wearing a lot of hats,
               | and also the type to have been motivated to run their own
               | personal blogs about
               | diving/photography/conlangs/quilting/gardening/whatever
               | their personal hobbies are.
               | 
               | Ten years ago, the majority(!) had at least something up
               | and running, where they would post essays, thoughts,
               | whatever came to mind.
               | 
               | Nowadays? All gone. All! When asked why, the answer
               | almost always is along a mix of ever-increasing negative
               | feedback and harassment from randos, and aggressive
               | automated spamming of their forums. Loss of the pseudo-
               | anonymity plays a large role as well. Many have deleted
               | years' worth of work, simply because they are afraid of
               | someone trolling through their posts to find something to
               | harass them with.
               | 
               | I was never a blogger myself, but I am sad about the
               | change. There was a lot of good stuff out there for a
               | while, and sometimes it just plain made me happy to read
               | someone joyfully nerding out on a favorite subject of
               | theirs.
        
               | nkrisc wrote:
               | I think a lot of people are still writing this kind of
               | content, but you have to look elsewhere for it: Reddit,
               | Facebook, Twitter; to name the obvious ones. It's also
               | harder to find, but you can find all kinds of personal
               | content written in comments and posts on these sites.
        
               | Baeocystin wrote:
               | I realize that this is a hard thing to 'prove', but I am
               | personally certain that the amount and quality of such
               | things has dropped significantly from a decade ago.
               | 
               | Not to zero. You can still find things tucked away in a
               | post on reddit or the like. Almost never, as far as I
               | have experienced, on Facebook or its ilk, as the
               | affordances are different. I genuinely think there has
               | been a loss.
        
               | kritiko wrote:
               | I frequently append site:reddit.com to searches for a
               | niche search term these days. I think a lot of people who
               | would have blogged or commented on blogs are posting
               | there instead.
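               | 
               | A small sketch of that habit, assuming Google's public
               | search URL and its `site:` operator. The helper name
               | `site_search_url` is made up for illustration.

```python
from urllib.parse import urlencode


def site_search_url(query, site, base="https://www.google.com/search"):
    """Build a search URL that restricts results to one site via `site:`."""
    # urlencode escapes spaces and the colon so the URL is safe to paste.
    return f"{base}?{urlencode({'q': f'{query} site:{site}'})}"


url = site_search_url("bluetooth headset review", "reddit.com")
print(url)
```

               | Swapping the `site` argument (for example to a niche
               | forum's domain) gives the same effect for other
               | communities.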
        
               | prox wrote:
               | I wonder if it would be possible to have a big filter
               | button "commercial" or "non-profit" or something along
               | those lines. So you get results that are not deemed
               | commercial or are.
               | 
               | Don't know how hard it would be to know which is which.
               | Maybe non-commercial : don't run ads, don't sell a
               | product or service and provide information only.
        
               | freeflight wrote:
               | _> I find that claim surprising considering how many more
               | people there are simply using the internet at all._
               | 
               | Most of these many more people are mobile users, where
               | creating long-style text content can be quite bothersome.
               | 
               | What ain't bothersome, with a smartphone, is taking
               | pictures and videos to slap filters over them, alas
               | that's why we are where we are with TikTok, Instagram and
               | Twitter dominating large parts of the web.
               | 
               | It's even noticeable in a lot of online discussions with
               | text outside of these communities; The average length of
               | forum posts feels like it's gotten way shorter over the
               | decades. People have less attention to read anything that
               | looks longer than a few sentences, often declaring it a
               | "wall of text" based on quantity of text alone.
               | 
               | Imho it's a big part of what drives misinformation; Doing
               | any kind of online research on a small phone screen is
               | extremely bothersome compared to the workspace an actual
               | computer/laptop, particularly with multi-monitor, gives.
               | 
               | There's also the difference in attention; When I sit down
               | at my laptop/desktop, I actively decide to spend and
               | focus my attention on that task and device.
               | 
               | While smartphone usage is mostly dominated by short
               | bursts of "can't do anything else right now", I don't
               | choose to take out my phone and surf the web, it's
               | something I do when I'm stuck in some place with nothing
               | else to do and no access to an actual computer.
               | 
               | But for the majority of web-users [0], that smartphone
               | access to the web is all they know, which then ends up
               | heavily shaping the ways they consume and contribute to
               | it.
               | 
               | [0] https://techjury.net/blog/what-percentage-of-
               | internet-traffi...
        
               | lmkg wrote:
               | I heard an interesting theory the other day: blog
               | viability declined because Google killed Reader. Which
               | indirectly ends up poisoning Google's biggest well, since
               | blogs are an important source of relevant cross-domain
               | links.
               | 
               | I'm somewhat skeptical, it seems a little _too_ poetic to
               | blame Google's ultimate downfall on a decision that was
               | notably hated at the time. But it's plausible. If you
               | want it to be a conspiracy theory, you can posit that
               | killing off independent blogs was the _intent_, to
               | convince bloggers to migrate to Google Plus.
        
             | mtgx wrote:
        
             | orcasushi wrote:
             | The average website's goal is now to keep you on it as long as
             | possible. According to some metric folks, the longer you
             | stay on a website the more money you spend there. Linking
             | to another website destroys that metric.
             | 
             | Also if you are going to make a purchase somewhere, any
             | website would try to get a cut of the money you spend by
             | actually sending referral links to the product. So small
             | websites that do not allow this service will not get linked
             | so much.
             | 
             | On a metalevel it is thus that links or connections between
             | items are information. Information is money. And as soon as
             | that became evident links and connections also became more
             | scarce.
        
             | ijidak wrote:
             | Because, years ago, linking to lower reputation sites would
             | drain your page rank.
             | 
             | So everyone worried about SEO became afraid to link to
             | anything except:
             | 
             | 1) Their own website
             | 2) High reputation sites like NYTimes, etc.
             | 
             | It's sad. Makes it harder to navigate the web.
        
               | mrkramer wrote:
               | Wouldn't it be reasonable for Google to show how their
               | ranking algorithms work so all webmasters and content
               | creators know how to behave on the web? Now we have a black
               | box that's causing confusion and is misdirecting websites
               | and web users.
        
               | freeflight wrote:
               | _> It's sad. Makes it harder to navigate the web._
               | 
               | Some would even say it killed the web by centralizing all
               | the content in the hands of a few [0].
               | 
               | Which is the direct consequence of everybody optimizing
               | to better show up on Google/Facebook/Amazon/Microsoft and
               | ultimately even migrating all their hosting to these
               | companies.
               | 
               | [0] https://staltz.com/the-web-began-dying-in-2014-heres-
               | how.htm...
        
               | chongli wrote:
               | Bang on. Saying that "there isn't anything out there
               | anymore" is missing the point: Google's algorithms
               | _created this situation_, intentionally or not. Before
               | Google, people linked to what they wanted and communities
               | would naturally cluster around topics of interest. Google
               | came in and made reputation into a currency which
               | effectively destroyed all these communities through
               | incentivizing selfishness.
        
               | wussboy wrote:
               | Surely there is just a different algo that could bring
               | about better communities?
        
               | jonathankoren wrote:
               | Different, but not better.
               | 
               | The incentives to game the algo remain. People adapt to
               | the environment.
        
               | amelius wrote:
               | > The incentives to game the algo remain. People adapt to
               | the environment.
               | 
               | Perhaps it could work if the algorithm changed its
               | algorithm all the time.
        
               | kilburn wrote:
               | That's why mechanism design [1] exists as a field of
               | study. The whole idea of that field is to provide the
               | proper incentives to steer the participants towards your
               | objective. Yes, considering they will try to "game" the
               | system however they can.
               | 
               | I'm pretty sure google could do strictly better (i.e.:
               | better in all reasonable accounts) than they do now if
               | they focused on the users' experience instead of revenue
               | for a couple terms.
               | 
               | [1] https://en.wikipedia.org/wiki/Mechanism_design
        
               | marcosdumay wrote:
               | Only if implemented by the monopolist.
               | 
               | People's best chance is stopping using Google and pushing
               | for it to be broken-up.
        
               | Beldin wrote:
               | "When a measure becomes a target, it ceases to be a good
               | measure"
               | 
               | -- Goodhart's Law.
               | 
               | Google's algorithms didn't create this situation; people
               | chasing high Google rankings did. Had Google used
               | completely different algorithms yet become equally
               | dominant, people still would have poured their hearts and
               | souls into getting higher rankings.
               | 
               | Basically, an application of the tragedy of the commons.
               | Or: "why we can't have nice things".
        
               | chongli wrote:
               | But that's taking for granted that Google would have
               | become dominant. Perhaps if they hadn't chosen the
               | algorithm they did then they wouldn't have been as
               | overwhelmingly successful. Instead, I could imagine a
               | world in which there are multiple search engines and none
               | of them are all that good. In fact, that's the world I
               | remember from before Google existed. Search was bad but
               | communities were strong and life was good.
               | 
               | Then Google came along and we all found it a lot more
               | convenient than the bad search engines we were used to.
               | And of course, we all know where that led. In some sense,
               | Google built an 8-lane superhighway and bypassed all the
               | small towns.
               | 
               | We all traded away paradise in exchange for convenience.
               | Now we have neither.
        
               | Beldin wrote:
               | On the glass-half-full side of this: we're getting those
               | communities again! Here on HN, on reddit, for certain
               | topics on various social media (there are pearls there
               | too), on Mastodon, various blog authors, Ars Technica,
               | Quanta, etc. [1]
               | 
               | It's just fragmented - i.e., catering to a specific
               | group. Because if it isn't, it's awesome for 5 minutes
               | and then monetization rot sets in.
               | 
               | [1] None of these work for everyone; conversely, all of
               | these are seen as great things by some and have people
               | who prefer that one thing over others for its quality.
        
               | mrkramer wrote:
               | >Google's algorithms didn't create this situation; people
               | chasing high Google rankings did.
               | 
               | But lowkey Google incentivized such behaviour by not
               | being open and transparent on how exactly their
               | algorithms work.
        
               | B-Con wrote:
               | That would have allowed people to artificially chase
               | rankings even faster and more efficiently. It makes the
               | problem worse, not better.
        
           | inlined wrote:
           | Does this mean that Facebook is the only company well poised
           | to take on google search?
        
           | dpeck wrote:
           | I agree very much with this. It seems that between the walled
           | gardens and also people being so reluctant to have "their"
           | audience leave their site/page/etc the discoverability of the
           | web has dropped dramatically.
        
           | ZetaZero wrote:
           | That's an interesting observation. IMO, we stopped linking to
           | good content because Google was good at finding it. Now
           | Google is suffering, and we need to go back to doing more
           | links.
        
         | hubraumhugo wrote:
         | Search engines are pretty good at solving the problem they were
         | designed to solve, which is "finding pages which contain all
         | the query words". But they are pretty bad at solving the much
         | harder problem of rating the trustworthiness & authenticity,
         | intentions of the owner, monetization of the site, etc.
         | 
         | One possible solution to this could be:
         | 
         | - Let the community vote on the most trusted sources
         | 
         | - Include results from enthusiasts that have little incentive
         | to write biased reviews (Reddit, HN, expert forums)
         | 
         | - Look at the ownership of the site and how transparent they
         | are about it
         | 
         | - Regularly reassess these criteria
         | 
         | This wouldn't scale for a generic search engine, but I'm
         | working on a service that does this for many product
         | verticals/niches.
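         | 
         | A rough sketch of how the criteria above could be combined
         | into one score. Everything here is an assumption for
         | illustration: the field names, the weights, and the sample
         | sources are invented, and a real service would need far
         | better signals.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    community_votes: float         # 0..1, share of positive community votes
    enthusiast_forum: bool         # e.g. Reddit, HN, expert forums
    ownership_transparency: float  # 0..1, how openly the owner is disclosed


# Hypothetical weights; the comment above does not specify any.
WEIGHTS = {"votes": 0.5, "enthusiast": 0.2, "transparency": 0.3}


def trust_score(source):
    """Weighted sum of the three signals, in the range 0..1."""
    return (
        WEIGHTS["votes"] * source.community_votes
        + WEIGHTS["enthusiast"] * (1.0 if source.enthusiast_forum else 0.0)
        + WEIGHTS["transparency"] * source.ownership_transparency
    )


def rank_sources(sources):
    """Most trusted first; call again whenever the inputs are reassessed."""
    return sorted(sources, key=trust_score, reverse=True)


sources = [
    Source("top-10-listicle", 0.2, False, 0.1),
    Source("test-lab-site", 0.9, False, 0.9),
    Source("forum-thread", 0.8, True, 0.6),
]
for s in rank_sources(sources):
    print(f"{s.name:16s} {trust_score(s):.2f}")
```

         | The "regularly reassess" step is just re-running the ranking
         | with fresh inputs, so scores never become a permanent label.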
        
           | darkwizard42 wrote:
           | Agreed here, but in your second bullet, people have great
           | incentive to write good quality reviews on Reddit, HN, expert
           | forums... karma/recognition etc. It just so happens that
           | these "forums" have built in voting systems that they spend
           | time preventing from being gamed so the search engine doesn't
           | have to.
           | 
           | Not sure if this is a good model for a search engine, but it
           | does work to a small degree in those forums.
        
             | onion2k wrote:
             | _people have great incentive to write good quality reviews
             | on Reddit, HN, expert forums... karma/recognition_
             | 
             | Internet points are a terrible reason to write anything.
             | They're completely meaningless. We should all judge
             | comments on their own merit and not because the author has
             | a lot of karma. Apart from mine, obvs.
        
         | sam0x17 wrote:
         | > If you want a review for something that came out today, there
         | is no way that work could have been done, so there simply isn't
         | anything to find.
         | 
         | I think in practice this is actually largely untrue -- with
         | technology products, video games, movies, and just about
         | anything I can think of, most well known reviewers are given
         | early access to the product so that reviews can come out on or
         | before day 1 of general availability. That said this does
         | create a dearth of 100% trustworthy reviews on day 1 since
         | companies are naturally disincentivized from giving early
         | access to reviewers who they know are going to write a negative
         | review.
        
         | ColinHayhurst wrote:
         | "SEO" spam is a "Google SEO" problem. So SE ranking Optimization
         | is not (yet) so much a problem for other Crawler/index SEs
         | (Bing, Mojeek, Gigablast). You might say that Amazon (in
         | eCommerce) and TRIP (in Travel) have cracked the problem of
         | combining good/deep Content/Reviews and Category expertise with
         | Search.
         | 
         | We regularly see partnership opportunities with customers
         | interested in our API [0]. I presume Bing see the same, though
         | their terms are more fixed and require you sharing more data.
         | Definitely big opportunities in other categories, which are
         | often squandered through a naive, if understandable route, of
         | choosing a Scrape and index route.
         | 
         | [0] https://www.mojeek.com/support/api/
        
       | deadalus wrote:
       | I also consider Paywalls to be spam. Clicking on a link and
       | finding out that it is paywalled, is a massive waste of time.
        
         | imranhou wrote:
         | Agree that is annoying, but if you start excluding such results
         | then how does one find that type of content?
        
           | deadalus wrote:
           | You clearly label paywalled content with a symbol or image.
        
             | imranhou wrote:
             | I think the only issue there might be that Google might be
             | unaware that the content is paywalled, since many sites
             | allow crawlers access to content but not users (based on
             | crawler IP ranges). Agreed, such a flag would save time
             | when available, or even a search filter option to skip
             | those results.
        
       | pictur wrote:
       | People don't want to search anymore. they want to see well-
       | categorized data. For example, instead of searching for cheap
       | vacuum cleaners, I think they want a site that lists vendors that
       | sell cheap vacuums.
        
       | [deleted]
        
       | Marazan wrote:
       | Google's ranking algorithm shapes the web.
       | 
       | And the web now looks like a 1500-2000 word listicle with 3
       | images because that is what the ranking algorithm favours.
       | 
       | If you find the info you need and leave quickly that actually
       | down ranks the page. That is idiotic. Pages that give you what
       | you want quickly are punished!
        
       | jefftk wrote:
       | I'm seeing a lot of comments along the lines of, "Google shows
       | ads on the SEO-gamed sites that show up in results, so their
       | incentive is to give spammy results". But wouldn't this predict
       | that results would be much better on Bing and other search
       | engines that don't have much presence in the "put ads on random
       | sites" market?
       | 
       | (Disclosure: I work on ads at Google, speaking only for myself)
        
         | going_ham wrote:
         | > But wouldn't this predict that results would be much better
         | on Bing and other search engines that don't have much presence
         | in the "put ads on random sites" market?
         | 
         | Honestly what I think is every search engine sucks these days,
         | and Google manages to suck a little bit less.
         | 
         | The reason is how easy it has become to publish low quality
         | content. It's rare to find high quality content. The issue
         | with search engines is that they don't show this rare
         | content. It isn't recommended by default. It is hidden.
         | 
         | What happened is the recommendation system is broken! If there
         | weren't any neural networks making decisions, it would have
         | been a different issue. But with modern search engines
         | deploying recommendation systems, I think it is all a
         | rich-get-richer scheme. You can't recommend new or fresh but
         | quality content because it was never visited! So, when the
         | entire backend is relying on user data, the system is being
         | fed crappy data because users don't care and those who do are
         | few in number.
         | 
         | As long as the system is making revenue, it will be this way.
         | Most people never care at all and would never be bothered
         | because they only care about simple queries. If anyone
         | deviates from the norm, Google search results are pretty bad,
         | like every other search engine's.
        
         | [deleted]
        
       | Quenhus wrote:
       | For developers, you can remove some spam websites from Google and
       | other search engines, with these uBlock filters:
       | https://github.com/quenhus/uBlock-Origin-dev-filter
        
       | RichardHeart wrote:
       | His suggestion basically is to become DMOZ.org, if you are old
       | enough to remember it.
        
       | hnbad wrote:
       | I guess Paul's definition of "beating Google" is "creating a
       | startup without clear revenue path aiming to be acquired by
       | Google or a competitor" as I can't think of any meaningful way a
       | niche search engine would provide a good enough value proposition
       | against existing Google competitors or embeddable search engines
       | (as well as SaaS like Algolia).
        
       | aaron695 wrote:
        
       | abakker wrote:
       | I have a version of fixing this that I would personally enjoy a
       | lot. Leave google alone, let it crawl the web, prioritize what it
       | wants to via algorithms. But, give me a version of that which
       | ONLY surfaces results from discussion forums (including SO,
       | Reddit, HN, etc). For most of the stuff where I am actively
       | _searching_ and not just looking stuff up, discussion forums of
       | motivated, self-selected contributors have the stuff I need with
       | the context I need. It used to be that blogs had answers, but
       | that media has been categorically ruined by SEO.
       | 
       | Now, one of the deficiencies here has been examples. Try this:
       | "best miter saw". You will not find any websites that actually
       | discuss the answer to this question, despite it being a product
       | category with a lot of price variability and performance
       | tradeoffs (weight, capacity, power, cord vs cordless, accuracy).
       | 
       | Nearly any product reviews for large purchases follow the same
       | pattern unless consumer reports has decided to dig deep (e.g.
       | washing machines).
       | 
       | How about guitar strings? Sandpaper? Printers? google's algorithm
       | has allowed profit motivated websites to displace the commons to
       | too great an extent.
        
         | techdragon wrote:
         | How can they tell it's a discussion forum? Does the scraped
         | search spam that copies stack overflow content look enough like
         | a discussion it fools their heuristics? Is it a manual process
         | (in which case you can bet it "doesn't scale" and won't be
         | built) this is the problem they face. Literally nothing is
         | simple given the size of their dataset, the scope of their user
         | base, and the adversarial nature of the very world around them
         | generating new data they must work with in order to do the job.
         | 
         | There's definitely an element of "we got our profit so fuck it"
         | with respect to the search engine advertising business and
         | Google's incentives to make search quality better, but that
         | doesn't change the difficulty of the underlying problem. If I
         | wanted to pay Google $5 per search for super high quality
         | results, even $50, they can't just make this product better to
         | get my money, they are fighting an ongoing war against
         | adversarial SEO which prevents this from ever being better than
         | stalemate at best or more likely due to economics, the slow
         | slide into declining quality we see due to the SEO side having
         | more money with which to pay for engineering brainpower.
        
         | eitland wrote:
         | https://search.marginalia.nu has a very interesting approach to
         | this:
         | 
         | Use tracking especially and JS generally as a weight in ranking
         | so sites that contain much of either need to be exceptionally
         | high quality to float to the top.
         | 
         | This means sites with limited ads and tracking, typically
         | enthusiast driven pages float to the top.
         | 
         | Now always when someone discusses a novel way to combat webspam
         | someone will immediately counter: if this becomes popular SEO
         | hackers will immediately start doing this.
         | 
         | Well - if reducing page size and removing tracking becomes a
         | leading SEO trick I can deal with a bit of SEO hacking :-)
         | 
         | Yet for some reason I feel Google won't start using this very
         | simple metric :-)
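         | 
         | A minimal sketch of the "tracking and JS as a ranking weight"
         | idea described above. This is not Marginalia's actual code:
         | the Page fields, the weights, and rank_score are hypothetical
         | names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    relevance: float    # hypothetical text-match score in [0, 1]
    script_count: int   # number of <script> tags found on the page
    tracker_count: int  # number of known third-party trackers found


def rank_score(page: Page, script_weight: float = 0.05,
               tracker_weight: float = 0.25) -> float:
    """Divide relevance by a penalty that grows with JS and tracking.

    Trackers are weighted more heavily than plain scripts, so a page
    full of both has to be far more relevant to outrank a lean one.
    """
    penalty = (1.0
               + script_weight * page.script_count
               + tracker_weight * page.tracker_count)
    return page.relevance / penalty


pages = [
    Page("https://example.com/listicle", 0.9, 40, 12),
    Page("https://example.org/hobby-page", 0.6, 0, 0),
]
# Highest score first: the lean page wins despite lower raw relevance.
ranked = sorted(pages, key=rank_score, reverse=True)
```

         | With these assumed weights the script-heavy page is divided
         | by 6 while the lean page keeps its full relevance, so the
         | lean page sorts first.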
        
           | marginalia_nu wrote:
           | Another big factor in what I do is prioritizing the opinions
           | of certain indieweb sites when ranking domains, basically a
           | segment of the graph consisting of humans with a particular
           | dislike for seo spam. This makes ranking manipulation much
           | less effective.
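           | 
           | One common way to express "prioritize the opinions of a
           | trusted set of sites" is personalized PageRank, where the
           | random surfer only ever restarts from the trusted seeds.
           | The sketch below is an illustration under that assumption;
           | the function name, parameters, and example domains are
           | hypothetical, and it is not a description of the real
           | implementation.

```python
def personalized_pagerank(links, trusted, damping=0.85, iterations=50):
    """Rank domains by link flow that originates from trusted seeds.

    links:   dict mapping a domain to the list of domains it links to.
    trusted: set of seed domains that receive all teleport probability.
    """
    nodes = set(links) | {d for targets in links.values() for d in targets}
    seeds = trusted & nodes
    if not seeds:
        raise ValueError("at least one trusted domain must be in the graph")

    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for src in nodes:
            targets = links.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
            else:
                # Dangling domain: hand its weight back to the seeds.
                for seed in seeds:
                    nxt[seed] += damping * rank[src] / len(seeds)
        rank = nxt
    return rank


links = {
    "indie-blog.example": ["hobby-site.example"],
    "hobby-site.example": ["indie-blog.example"],
    "spam-a.example": ["spam-b.example"],
    "spam-b.example": ["spam-a.example"],
}
rank = personalized_pagerank(links, trusted={"indie-blog.example"})
```

           | In this toy graph the two spam domains only link to each
           | other, so no weight from the trusted seed ever reaches
           | them, however many links they trade.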
        
             | ffhhj wrote:
             | Good to see you in this thread! I just added Marginalia to
             | the recommended search engines in a new search tool I'm
             | building, to get programming answers faster. The search
             | assistant builds queries for specific sites with
             | "site:targetsite.com , programming question" (that comma is
             | not a typo). When doing a query like that I get no results
             | but these warnings:
             | 
             | /!\ The term "," contains characters that are not currently
             | supported
             | 
             | /!\ Try rephrasing the query, changing the word order or
             | using synonyms to get different results. Tips.
             | 
             | sample: https://search.marginalia.nu/search?query=site%3Ast
             | ackoverfl...
             | 
             | Please make your engine ignore the comma, it shouldn't
             | affect the search.
             | 
             | Either ignore the site:... expression or filter sites
             | accordingly.
             | 
             | Thanks a lot for creating Marginalia!
        
               | marginalia_nu wrote:
               | Ignoring comma seems doable, I'll have it fixed in a few
               | days, currently away from my work computer.
               | 
               | site:-queries are supported, but only at the first domain
               | level. (e.g. site:marginalia.nu; not
               | site:search.marginalia.nu). I might tune it so that it
               | strips subdomains automatically, that is pretty trivial.
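               | 
               | A rough sketch of the two changes discussed here:
               | dropping commas and reducing a site: filter to its
               | first domain level. normalize_query is a hypothetical
               | helper, not the engine's real parser, and the
               | "last two labels" rule is a naive assumption.

```python
from typing import List, Optional, Tuple


def normalize_query(raw: str) -> Tuple[Optional[str], List[str]]:
    """Return (site filter, search terms) for a raw query string."""
    site = None
    terms = []
    # Treat commas as whitespace so a stray "," never becomes a term.
    for token in raw.replace(",", " ").split():
        if token.lower().startswith("site:") and len(token) > len("site:"):
            host = token[len("site:"):].lower().strip(".")
            # Naive: keep only the last two labels. A real implementation
            # would consult the Public Suffix List (e.g. for .co.uk).
            site = ".".join(host.split(".")[-2:])
        else:
            terms.append(token)
    return site, terms


site, terms = normalize_query(
    "site:search.marginalia.nu , programming question")
```

               | Replacing every comma also splits "a,b" into two
               | terms, which may or may not be what a given engine
               | wants.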
        
         | mjr00 wrote:
         | My current solution for this is to just tag `site:reddit.com`
         | to the beginning of Google searches. A Google search for
         | `site:reddit.com best miter saw` has a lot of relevant results.
         | 
         | Marketers/SEO people are starting to infiltrate this as well,
         | but since they can't control and SEO the content on Reddit
         | nearly as much, this still works pretty well for now.
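         | 
         | The prefixing trick can be scripted. A small sketch, assuming
         | the standard google.com/search?q= URL; forum_search_url is a
         | hypothetical helper, and combining several site: filters
         | with OR inside parentheses is an assumption about how the
         | operators are parsed.

```python
from typing import List
from urllib.parse import urlencode


def forum_search_url(query: str, sites: List[str]) -> str:
    """Build a Google search URL restricted to the given sites."""
    site_filter = " OR ".join(f"site:{s}" for s in sites)
    if len(sites) > 1:
        # Group the alternatives so the OR does not swallow the query.
        site_filter = f"({site_filter})"
    q = f"{site_filter} {query}".strip()
    return "https://www.google.com/search?" + urlencode({"q": q})


single = forum_search_url("best miter saw", ["reddit.com"])
several = forum_search_url(
    "best miter saw", ["reddit.com", "news.ycombinator.com"])
```

         | Saving a URL like this as a browser keyword search avoids
         | typing the site: prefix every time.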
        
           | abakker wrote:
           | Of course! This is a tip I got from HN years ago. It gets old
           | to add Reddit, PracticalMachinist, Fine Woodworking, etc. I
           | really want a proxy for user generated content where nobody
           | got paid to write it.
           | 
           | Fun story: I knew a person who worked for a home building
           | website part time. She got paid to write stories on home
           | renovations. She had never done _anything_ she wrote about.
           | Mostly, she gathered up other blogspam and recycled and
           | rewrote it without citation. Sometimes she went to forums,
           | sometimes reddit, sometimes youtube. But, the universal part
           | of it was that she had to produce 2x pieces of content per
           | week endlessly. Just for a local LA builder. Most of the
           | content wasn't "wrong" but, it also wasn't exactly incisive
           | and didn't include any details that would have been useful.
           | Instead it was just filler. The worst part is that it
           | consistently improved that company's search ranking.
           | 
           | Content farming needs to die.
        
         | ringworld wrote:
         | You may be interested in searX - it refers to data sources as
         | Engines; you have the ability to run your own instance (or use
         | a public shared one) and only enable engines you want results
         | from (reddit, stackoverflow etc.). Build your own meta-engine
         | recipe, basically.
         | 
         | https://searx.space to learn / get started. Find one and visit
         | it, click Preferences upper right then follow your schnoz.
        
           | ffhhj wrote:
           | The results include all the SEO spam that infected Google,
           | i.e. SO clones. How is it better?
        
             | ringworld wrote:
             | The GP commented about using curated sources. One can
             | disable all those (google, bing, etc.) and choose to only
             | enable results from reddit, wikipedia and so forth in
             | searX, which directly queries based on a config inside the
             | project.
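             | 
             | Besides the Preferences page, the engine selection can
             | be passed per request. A sketch that only builds the
             | URL, assuming the instance exposes the usual /search
             | endpoint with q and engines parameters; the instance
             | address is a placeholder, and the engine names depend
             | on what that instance has enabled.

```python
from typing import List
from urllib.parse import urlencode


def searx_url(instance: str, query: str, engines: List[str]) -> str:
    """Build a search URL limited to a comma-separated engine list."""
    params = {"q": query, "engines": ",".join(engines)}
    return instance.rstrip("/") + "/search?" + urlencode(params)


# "searx.example.org" is a placeholder, not a real public instance.
url = searx_url("https://searx.example.org/", "best miter saw",
                ["reddit", "wikipedia"])
```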
        
       | mg wrote:
       | Why not try writing a search engine specifically for some
       | category dominated by SEO spam?
       | 
       | I like to compare search engine results and wrote this tool to
       | make it easy:
       | 
       | https://www.gnod.com/search
       | 
       | There in fact are many vertical search engines. You can click on
       | "more engines" to see the whole list.
        
         | Debug_Overload wrote:
         | The fact that you included Reddit, SO, Google Scholar etc is
         | awesome (I thought it was only for main search engines). Thanks
         | for sharing. Bookmarked.
        
         | noduerme wrote:
         | Okay that's sort of what the !bang in DDG is for, and why it's
         | a meta-engine. What's the blue sky ideal for a real, no-
         | bullshit, everything search engine that doesn't fall prey to
         | the constant flood of garbage?
         | 
         | I have an exterminator who comes to my house every couple
         | months, and sets up traps here, poison there. I don't have any
         | rats in my house. I do see rats running across the yard
         | sometimes. The exterminator explains it like this: Rat
         | pressure. The rats overpopulate and there's "pressure" (like,
         | uh, "memory pressure", which is also a fluid concept) so they
         | try to get into your house more, through smaller holes, as a
         | function of how much outside drama is going on, how many they
         | are and how overpopulated, how scarce their food supply is, how
         | cold it is outside, and whatever else drives rats into your
         | house. (I love the dude who's my exterminator).
         | 
         | Anyway, this is the same problem every search engine faces. The
         | more surface area they expose, the more pressure they have
         | building, the more ways people have to fake out their systems.
         | 
         | We _have_ to go back to the 1990s Yahoo! model. Curated
         | content. A list of websites that are reputable. _1990s Yahoo is
         | the future_.
        
           | loceng wrote:
           | Creating a "trustless" search crawler, where anybody can
           | participate, and then applying an algorithm to determine
           | trust or value feels like it'd be a never-ending arms race -
           | that'd require AI and extensive/expensive resources that is
           | likely better invested in developing real trust networks and
           | curation; curators are corruptible and regulatory capture of
           | policy is possible if the organization is infiltrated or
           | poorly overseen.
           | 
           | Carte blanche opening your system up to anyone to inject data
           | seems like the wrong foot to start off on, whereas my
           | curating a moderator, someone I personally know and feel good
           | about, trust to whatever level, and hiring them - ideally
           | making sure they're someone you respect and you're someone
           | they respect, pay them well, and at scale will be able to pay
           | for itself; this did just bring to mind however big pharma
           | and pharmaceutical trials structure and how that system can
           | be/is/has been captured - and so perhaps the pressures when
           | dealing with multi-billion dollar market categories will
           | always lead to shenanigans if ever trying to centralize too
           | much, not allowing for de-risking and broader resource
           | distribution via sales/profits to more parties than the
           | "5-star" rated products.
           | 
           | In a thread on HN, I think it was yesterday, a few people
           | posted about review sites where some product reviews are free
           | - but others you had to pay for. A system to facilitate such
           | organizations could allow a highly competitive environment,
           | where organizations develop/build a brand - build trust for
           | their brand as being competent and thorough - and so then
           | over the lifetime of a reader/customer, perhaps they'll spend
           | $1,000 buying reviews (say 333 big purchases over 40 years
           | that you're willing to pay $3 a hit for) to make sure they're
           | ; mind you there will be organizations that could be captured to
           | say promote one conglomerate of products over another,
           | perhaps even regionally, but I'm beginning to think it's a
           | necessary layer to combat the shit show that is Amazon (et
           | al) reviews. Ideally these systems and how the reviews
           | present the information, and how thorough - the technical
           | depth and breadth and testing done - will help educate those
           | who dive into using this system, which will sharpen
           | themselves while keeping reviewers on their toes and arguably
           | strengthening their organizations and competency as well.
        
             | fsflover wrote:
             | > Creating a "trustless" search crawler, where anybody can
             | participate, and then applying an algorithm to determine
             | trust or value feels like it'd be a never-ending arms race
             | - that'd require AI and extensive/expensive resources
             | 
             | Not necessarily: https://yacy.net
        
             | noduerme wrote:
             | Legitimate question; if this is a real business model (and
             | I believe it could be) then why the fuck does Yahoo.com
             | look like a dead clickbait aggregator instead of, yknow,
             | what it used to look like? i.e. FINANCE ___ [Stocks] [ etc]
             | ENTERTAINMENT ___ [Movies] [TV] Where's _that_ site?
        
               | loceng wrote:
               | Visionary leadership left the company? And arguably it
               | lost its soul and excitement. I genuinely thought when
               | Marissa Mayer was brought in as CEO and announced they
               | bought Tumblr for $1.1 billion in cash, I thought she
               | could actually turn Yahoo! around - that she understood
               | platforms and holistic systems; perhaps she did but her
               | hands were tied, and then they made terrible decisions
               | like banning porn on Tumblr - so bureaucracy, politics,
               | and arguably the ad industrial complex and "mainstream"
               | pressures (perhaps like billing/financial system being
               | used as a tool to suppress freedom/sexuality/porn because
               | they've not been successful politically) were pressures
               | her or the Board of Directors couldn't counter.
               | 
               | Then Google came along and was a better search engine,
               | for a time, that was a traffic leak for Yahoo! - and then
               | Google has now devolved; I also thought Google had a good
               | shot at competing with Facebook, but whomever's pulling
               | the strings there, the launch of various platforms, they
               | don't seem to understand it can take 5-10 years after the
               | MVP of a product is launched for it to mature - but for
               | whatever reason their executives or managers haven't been
               | comfortable pulling the trigger, arguably because anyone
               | with that entrepreneurial spirit just takes their idea
               | and gets funding and owns a large portion of whatever
               | they've done; but then you can never develop a full
               | breadth, holistic ecosystem, that can grow into every
               | crevice, nor as broadly, or nuanced as possible - so
               | they're stuck being Search, Gmail, Calendar, etc.
               | 
               | I'm quite certain I've figured out the foundational MVP
               | facilitating an "infinitely" growing system and that
               | would allow 3rd parties to integrate, however I have
               | severe chronic pain that messes up my executive function,
               | so it's difficult for me to actually self-direct and
               | execute - I'm stuck mostly in a low activity, stream of
               | consciousness and go-with-the-flow life of routine -
               | otherwise I would try to launch my plans, which I've done
               | plenty of UX/UI for, as that is simple enough that
               | somehow bypasses higher executive function (moving a
               | pixel and then responding via visual feeling of it isn't
               | complex) - but organizing to turn that into adequate
               | specs to get solid estimates or fixed price quotes for
               | work is extremely difficult to me.
               | 
               | On January 11th I do have a surgery that may or may not
               | reduce my pain by 50%+, may or may not improve my
               | executive function, ... I've even attempted to write
               | draft "Show HN:" posts to explain what I am doing, the
               | starting feature sets, the reasoning behind the design
               | decisions I've made - but it just gets too complicated
               | too quickly for me mentally then to be able to organize
               | further or polish it. I think I have the perfect domain
               | name for it too: ENGN (engine), what makes me smile every
               | time I notice it in my layout/mockup of it is in the
               | search input box it says "Search ENGN". My username on HN
               | actually is an older incarnation of a plan I had, loceng
               | being a short form of "local engine" - and ENGN being
               | from engine, a name I brainstormed after Tumblr sold for
               | $1.1 billion - and I realized that eventually I'd want to
               | try to do my "local engine" idea but that that was too
               | long for a brand name. Fortunately engn.com was for sale
               | at the time, I can't remember if it was $2,000 USD or
               | $4,000 USD - either way, not a bad price for a 4-letter
               | .com that's pronounceable to something with meaning.
               | 
               | I've wanted to write a book too - on health, health
               | systems, and on these systems we're talking about here.
               | I'm 38 now and I taught myself to program when I was 11,
               | learned SEO at 15, evolved to design as I'm more creative
               | and programming became mindnumbing to me, and eventually
               | thought I'd need (or want) VC money - so I started
               | engaging on Fred Wilson's of USV.com's blog - AVC.com -
               | so I have plenty of self-taught experience. The problem
               | is even going back to my shorter or longer writings, or
               | comments of mine on HN or other, it's nearly impossible
               | for me to try to do the organization of it all - to
               | compile parts, etc.
               | 
               | Maybe this surgery goes well and I can begin to do more,
               | or maybe it doesn't; I've tried to hire people or get
               | help over the years but 1) no one has been willing to
               | engage enough as I'd need due to my executive
               | dysfunction, and maybe that's a moot point as 2) it's
               | extremely difficult for me to even manage someone or an
               | ongoing project - whereas I could explain things and
               | direct if people are initiating, if others are directing
               | the conversation, then I could respond - but otherwise I
               | can't do normal oversight and management any longer.
               | 
               | The most accurate odds I can give that this surgery will
               | help (piriformis syndrome, my sciatic nerve goes through
               | the piriformis muscles, rather than around it - so
               | there's constant compression + that's worsened with
               | use/engagement of the muscle) is 50/50. There's a high
               | probability that this surgery is not related to the
               | primary source of my pain, which is from LASIK eye
               | surgery I did 7 years ago - I got arguably the worst of
               | the worst symptoms: central sensitization and
               | hyperalgesia - a hypersensitivity to pain, where all
               | sensations, pain especially, is amplified to as what
               | seems as strongly as possible; and why I must highly
               | limit my activity level as any little stresses on body,
               | likely even normal natural muscle use which causes micro-
               | tears, then compounds the problem and will take many days
               | of very low activity to return to a still difficult-
               | dysfunctional baseline. But perhaps there's a high
               | probability that the sciatic nerve having been compressed
               | for most of my life, my mind, nervous system, could
               | handle that level of pain/sensitization - but then the
               | damage to the cornea that happens in 100% of LASIK
               | surgeries was what finally broke the camel's back.
               | 
               | If this surgery doesn't go well, doesn't help - which it
               | took me 1.5 years to even find a surgeon who does this
               | type of specialized surgery - then I'm afraid I may end
               | my life because this pain, the lack of productivity, of
               | being stuck, of quite little social interaction overall -
               | HN is likely the most stimulated my mind gets, only
               | possible to write this fluidly when it's been at least 3
               | days of eating a very low inflammatory diet and very low
               | activity level - and only if I've been mostly inactive,
               | primarily sitting, since waking and getting out of bed -
               | to not trigger any pain in my body. Anyway, it gets
               | boring, repetitive.
               | 
               | I've thought of trying to find an
               | Elixir/Phoenix/React/etc developer or agency on Upwork
               | before surgery and try to struggle to get them on at
               | least developing the initial foundation of ENGN, but
               | aside from the struggles I listed above that I'll
               | encounter, it also will cost additional money - and I've
               | not worked in 5 years, I've spent $250,000+ on stem cell
               | treatments to heal old high school football injuries,
               | that I didn't even know I mostly had and only weren't
               | tolerable after LASIK made my nervous system super
               | hypersensitive - and to pay for this surgery my mother is
               | taking $27,000+ USD out of her retirement; I'm in
               | Ontario, Canada, but the healthcare system has been
               | practically useless to me. Even if the foundation for
               | ENGN would cost just $5,000 to $10,000 to get the ball
               | rolling in terms of starting to get users to signup and
               | bringing in revenues - it's more money, but even thinking
               | about that additional stress would put on my mother, then
               | adds to my already overwhelmed nervous system - so
               | there's plenty of resistance there to overcome on its
               | own. There's also always the potential I'd somehow hire a
               | bad contractor or agency, bad in one or many ways, and
               | then the MVP wouldn't get finished - primarily because of
               | my own incompetency-dysfunction, and then I will just be
               | reminded, again, of how stuck my life is and how it
               | barely moves forward - personally or professionally.
               | 
               | I'm living a version of the Groundhog Day movie that
               | keeps repeating itself, except where I'm in pain, and so
               | far where I can tell people my story and ask as many
               | people as possible to help and nothing happens. It's why
               | last thing I try is this surgery, though I am supposed to
               | do another PICL stem cell treatment - where they treat
               | tissues inside of my neck - the first one did reduce my
               | neck pain and migraine some - for whiplash related
               | issues, in part from football - because they only treat
               | one side of the tissues, not all of the tissues, the
               | first treatment - and so the second treatment they target
               | the remaining high yield tissues. But I'm certain I'll
               | know after the surgery if there's any improvement or hope
               | that my life can start to become different, and even
               | though there is a stromal stem cell treatment that was
               | developed at University of Pittsburgh - that had very
               | successful human clinical trials, that were fast tracked
               | under compassionate grounds in India, to heal/regenerate
               | deeper corneal tissue for severe scarring and chemical
               | burns - the ETA for it being FDA approved was 5 years to
               | be clinically available in the US, perhaps less time
               | before available in India, but I'll have nothing else
               | significant treatment wise for further pain reduction to
               | look forward to in the near term after getting this
               | surgery - and so why I'm afraid I won't be around much
               | longer if it doesn't help much.
        
               | noduerme wrote:
               | Lot to digest.
               | 
               | Listen, first of all, do not consider ending your life.
               | Seriously, you're way too smart for that. I'm sure you've
               | got it worse, but I've had enormous sciatic problems in
               | my life, I've had 3 herniated discs; they're behaving for
               | the moment after massive doses of cortisone and without
               | any painkillers, but I know what it feels like to cough
               | them all out of my back at the same time. Not to be able
               | to put a foot in front of another or turn your neck for
               | weeks. (I'm a huuuuge fan of intramuscular cortisone
               | injections, though. Like 5 or 6 large cortisone over a
               | week, with some B-12. Every couple years. Not in the
               | spine... fuck that. Alternating butt cheeks. You won't
               | feel any benefit until the third day at least. If you can
               | convince a doctor to give you that for a week, you will
               | be fucking superman. They won't do this in America unless
               | you know a doctor personally, but they'll do it in Mexico
               | or Spain. I had it the last time my discs went out and
               | it's been 6 years and the inflammation has not come back.
               | They thought it would).
               | 
               | Anyway, before you off yourself, do try a fuckton of
               | intramuscular steroids. The fifth day I levitated off a
               | bed in the hospital; I hadn't walked in a week; I felt so
               | good I went to a club; I got drunk and spent the night on
               | a beach drinking and making out with an 18 year old model
               | from Denmark. Seriously. There were wild cats walking
               | around; it was winter on the Spanish coast. If you do one
               | thing before you die, go get five cortisone shots in your
               | ass, in a week.
               | 
               | I also got the hiccups for 24 hours and couldn't sleep,
               | but that's neither here nor there. And I got temporary
               | blindness in my left eye from fluid behind the retina,
               | caused probably by too much testosterone. But. Goddamn
               | it, I'm ok. You can be okay.
               | 
               | Enough about that.
               | 
               | About Yahoo and Google. That entrepreneurial spirit is,
               | in my experience, way too often just about getting the
               | funding and fucking off. We all know why these companies
               | go downhill, but somehow it's always such a shock when
               | they actually deteriorate in front of our eyes, huh?
               | Google's search results, for instance. I would have
               | expected their core business to stay more or less fine,
               | not collapse a couple years after all the competition was
               | eliminated.
               | 
               | It would be fine if they didn't grow into every crevice.
               | Get search right, that's all we ask. I don't want Google
               | to be my chat room or my shopping site. Why do they need
               | to? Search is huge. They own 90% of the market.
               | 
               | >> but organizing to turn that into adequate specs to get
               | solid estimates or fixed price quotes for work is
               | extremely difficult to me.
               | 
               | That's always the worst. The business side. I've always
               | just built things and hoped for the best. It sounds like
               | you've got something interesting going there, although I
               | have no damn clue what you're building, that's an
               | exciting feeling. ENGN is killer. If you own ENGN.com,
               | hell, money well spent.
               | 
               | I don't understand what you mean about "executive
               | function", since you obviously have the capacity to write
               | well-crafted email and think pretty clearly; perhaps I
               | lack the executive function to discern your lack of
               | executive function (I'm a brutally self-punishing
               | alcoholic, but otherwise a damn good programmer)
               | 
               | Anyway I don't know if you're trying to ask for pointers
               | to workers for this concept, I'm probably not it; I'm
               | $200/hr and I'm already covered for the next year. This,
               | however, should be your symphony. And I think you know
               | how to do it.
        
       | nojito wrote:
       | The issue is and will always be monetizing. Anyone competing with
       | Google will need to have a robust monetizing strategy to survive.
        
       | SavantIdiot wrote:
       | > And boy would Google find it hard to follow you down that road.
       | 
       | This is a good perspective. Where can Google not go? Places that
       | don't lead to profit. They will try (cough Wave cough) but will
       | give up.
        
         | stanleydrew wrote:
         | But isn't profit the point of any company? If you're going down
         | a path that doesn't lead to profit, you'll fail whether Google
         | follows you or not.
        
           | SavantIdiot wrote:
           | No. That's why Non-Profits, 501(c) corps in the US, exist.
           | 
           | E.g., Linus wasn't looking for profit, and Linux ate the
           | world.
        
       | james-redwood wrote:
       | www.neeva.com and www.kagi.com: two privacy-oriented search
       | engines with results and features that surpass Google's (did I
       | mention that they're ad-free?)
        
       | aronpye wrote:
       | A lot of the spam results just seem to be copy pasted content.
       | 
       | I wonder how difficult it would be to compare the main body of
       | text in search results and, if a page is over a 95% match with
       | another site (i.e. it has been copy-pasted), demote it in the
       | search results. If a site generates too many of these demotions,
       | it gets blacklisted from the index.
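
A minimal sketch of the comparison described above, assuming the main body text of each page has already been extracted. It measures overlap as Jaccard similarity over word shingles, which is one common choice and not necessarily what any search engine does. The names `shingles`, `jaccard`, `is_copy`, the shingle size `k`, and the reading of "95% match" as a Jaccard score of 0.95 are all assumptions made for illustration.

```python
import re


def shingles(text, k=5):
    """Return the set of overlapping k-word windows ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_copy(page_a, page_b, threshold=0.95, k=5):
    """True if the two page bodies overlap at or above the (assumed) threshold."""
    return jaccard(shingles(page_a, k), shingles(page_b, k)) >= threshold
```

Note that a single changed word alters up to `k` shingles, so a 0.95 threshold on shingles is stricter than "95% of the words are the same". Comparing every page against every other page this way is quadratic, so it would only be practical as a second pass over a small set of candidates.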
        
         | nikanj wrote:
         | How would you avoid throwing the original site out with the
         | bathwater?
        
           | aronpye wrote:
           | Maybe try to timestamp the page; presumably the earliest
           | page is the original source. Could also combine it with a
           | site reputation rating or something similar.
        
         | neoneye2 wrote:
         | I have experimented using LSH (Locality Sensitive Hashing) for
         | identifying similar documents, among 50k documents in total.
         | 
         | My LSH implementation is here:
         | https://github.com/loda-lang/loda-rust/blob/develop/script/t...
         | 
         | Example of the 100 most similar documents:
         | https://github.com/neoneye/loda-identify-similar-programs/bl...
         | 
         | There can be false positives, so follow LSH with a more
         | in-depth comparison.
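
For readers unfamiliar with the technique, here is a rough, standard-library-only sketch of MinHash with LSH banding. It is not the linked implementation. The function names, the 3-word shingles, the 64 hash functions, and the 16 bands are all illustrative assumptions. Documents whose signatures agree on every row of at least one band become candidate pairs, which would then go through the more in-depth comparison mentioned above.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Set of overlapping k-word strings; short texts yield a single shingle."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of token, salted with seed to get a hash family."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(tokens, num_hashes=64):
    """For each hash function, keep the minimum hash over all tokens."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def candidate_pairs(signatures, bands=16):
    """Group documents that share an identical band; return candidate pairs."""
    if not signatures:
        return set()
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

More bands with fewer rows each catch less similar pairs at the cost of more false positives; fewer bands with more rows do the opposite.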
        
       | nobbis wrote:
       | 10 years ago, the original engineer of Google's search engine
       | told me what he now wanted was asynchronous, human-powered search
       | with curated results, e.g. a Google-like interface, but queries
       | cost $5 and take 15 minutes.
       | 
       | Money's no object for him, so he wanted to outsource the
       | filtering, ranking, and interpreting of results. Would be even
       | more useful today (albeit a tiny TAM.)
        
         | Nuzzerino wrote:
         | I would have told him not to quit his day job. That sounds
         | ridiculous.
        
           | nobbis wrote:
           | I don't agree, plus he's a billionaire.
        
             | leobg wrote:
             | Why's he not building it now, then? He has the means.
        
               | nobbis wrote:
               | Building costs more than money.
        
               | leobg wrote:
               | True. I'm wondering though if he hasn't given up on the
               | idea. It's been ten years. Maybe he doesn't believe
               | anymore that a human could do better, even if given 15
               | minutes of time?
               | 
               | If such a demand exists, should we not be seeing an
               | active "secondary marketplace" for people offering to do
               | 15-minute human meta search & research tasks?
        
               | nobbis wrote:
               | I said "wanted it" not "wanted to build/finance it."
               | Doubt he's given up wanting it.
               | 
               | A human can always do better: take Google's results, then
               | remove SEO spam/duplicates, extract more relevant
               | snippets, combine results from multiple nearby queries,
               | etc.
               | 
               | Demand exists, but someone has to build it. And it's
               | unclear how big the market is.
        
         | mroll wrote:
         | > the original engineer of Google's search engine
         | 
         | you mean Larry Page?
        
           | nobbis wrote:
           | No, the guy who re-wrote Larry's research code into Python
           | and put it in production.
        
       | PaulHoule wrote:
       | For medical search the answer is pubmed. Not only is the
       | collection of documents clean (of low-grade scammers, pharma
       | companies have to pay big $ to play) but the NIH has done a large
       | amount of search quality and ontology work -- the system knows
       | "Tylenol" is synonymous with "Paracetamol", "Acetaminophen", etc.
        
         | titzer wrote:
         | > the system knows "Tylenol" is synonymous with "Paracetamol",
         | "Acetaminophen", etc.
         | 
           | This is exactly the kind of thing that Google cannot fathom
         | manually doing. As if entering facts into a computer were
         | morally wrong somehow. They'd much rather launch the equivalent
         | of a shell script that harnesses face-melting amounts of
         | computational power, processing literally trillions of webpages
         | in bulk, signal and noise together, junk, spam, and
         | misdirection alike, to learn bad associations and then serve
         | them up with no human review and then put the full force of
         | their reputation behind a results page that apparently people
         | never check and certainly can't correct because of the
         | inscrutability of a machine-learned model that has few to no
         | levers to adjust.
        
           | leobg wrote:
           | Isn't that kind of easy to do? I mean, you can do that kind
           | of thing as a one-man-show on consumer hardware using GloVe
           | or fastText.
        
           | PaulHoule wrote:
           | Google has long lied about what they do.
           | 
           | I had a chance to debrief people who had left their relevance
           | team and they told me things that were outright contradictory
           | to what rank-and-file Google employees have told me. (What
           | they told me did make sense in terms of my experience as an
           | IR system developer, SEO publisher, etc.)
           | 
           | Microsoft bought a company called PowerSet that had extracted
           | a large database of entities and relationships from Wikipedia
           | and used the technology to make the "Bing" search engine.
           | 
           | Earlier Microsoft engines were a joke, but Bing was so good
           | that Google saw it as a threat so they bought Freebase to get
           | a similar kind of database, then they killed it to
           | incorporate it into the "Google Knowledge Graph".
           | 
           | For all of their hating on semantics note that they hired R.
           | V. Guha as their chief scientist, who worked with Doug Lenat
           | on the notorious
           | 
           | https://en.wikipedia.org/wiki/Cyc
        
       | coding123 wrote:
       | > Lots of people want to be amateur police. (pg)
       | 
       | This is very true. How many times have I clicked on a site met
       | with ads so bad that the browser slows down, and after 10 seconds
       | the page gets covered up by more and more crap and then a paywall
       | shows up sometimes too. Now here's the thing - a competitor to
       | Google might detect you clicking back and then pop-up a special
       | set of controls near the search result that lets you say: "too
       | many ads" or "paywall".
       | 
       | However, if such an engine were to start beating Google, I'm sure
       | Google would implement it in their own way: automatically detect
       | why you clicked back in such a short timespan.
        
         | numpad0 wrote:
         | Google already detects immediate returns and knocks off that
         | link for you. What's problematic to me is that I tend to
         | reactively mash back and forward, and the link just goes away.
        
         | DenisM wrote:
         | Sooner or later you'll have to deal with ballot-stuffing -
         | companies trying to bury their competitors by casting lots of
         | negative votes.
         | 
         | Perhaps ML will help in detecting such campaigns.
        
         | cinntaile wrote:
         | > However, if such an engine were to start beating Google, I'm
         | sure Google would implement it in their own way: automatically
         | detect why you clicked back in such a short timespan.
         | 
         | Do you seriously believe that Google doesn't use that as a
         | datapoint already?
        
       | birken wrote:
       | The funny thing is that _if_ the people who worked on spam at
       | Google were free to talk about it, I'm sure it would become
       | evident that they know more about spam and anti-spam efforts than
       | anybody else in existence. It's a ridiculously hard problem,
       | especially when people are targeting you directly. But they
       | aren't free to talk about it, because if they did it would just
       | give more assistance to the spammers, and make the problem worse.
       | 
       | I'm not saying that curated search results for particular
       | verticals are a terrible idea (though I'm sure, like anything, the
       | devil is in the details), but on the whole Google search is very,
       | very good considering the constant assault they are under from
       | spammers (which most other search engines are not, at least
       | directly).
        
         | throwaway6845 wrote:
         | > I'm sure it would become evident that they know more about
         | spam and anti-spam efforts than anybody else in existence
         | 
         | Really?
         | 
         | I can point you to Hard Problems that have been solved better
         | at little startups than at Google - or, indeed, at any other
         | bigco. That's why acquisitions happen.
         | 
         | Why does Google having 1000 engineers working on a problem
         | automatically mean they are the smartest?
        
         | colordrops wrote:
         | You talk about this "constant assault from spammers" like it's
         | not Google's fault and it's an intractable problem. That is not
         | a correct characterization. There is plenty of low-hanging
         | fruit that could easily be detected and deranked, for instance
         | scraped Stack Overflow spam. But Google chooses not to
         | deprioritize these results. The reason they don't is that they
         | make money on ad clicks, which many responses have already
         | elaborated on.
        
         | [deleted]
        
         | tibbar wrote:
         | But why is Google even dealing with spam? What if they (or
         | someone else) curated top websites for a given category? For
         | instance, when I search for a programming-related term, I
         | already know that I want to see the answer on either Stack
         | Overflow or one of a few reference documentation sites. It is
         | possible that some other site could have the answer instead,
         | but in practice the random sites that often show up at the top
         | of the results are usually SEO spam. A search engine that
         | figured out or let you select the semantic space you are in and
         | then promoted known websites - maybe ones you curate yourself!
         | - would be a big improvement.
         | 
         | Of course you can always hardcode the site you want in the
         | Google search results but this is hacky and not very
         | expressive.
        
         | phsource wrote:
         | This 100%. In travel, we see Google constantly tweaking its
         | algorithms, and compared to Bing, Google surfaces a ton more
         | small, well-written travel blogs [1]
         | 
         | Not only that, Paul and Michael have seen plenty of startups,
         | and at least in recent memory, the number of vertical search
         | and consumer startups that Y Combinator has funded hasn't been
         | that high
         | 
         | As a consumer startup, I know this issue firsthand. Paul and
         | Michael assume that if you build a better product, they will
         | come! That's simply not true these days.
         | 
         | Instead, you need to:
         | 
         | - Build a better product
         | 
         | - Option 1: Figure out a channel with enough growth on an
         | existing platform. This likely means you're doing SEO for your
         | new search engine
         | 
         | - Option 2: Get your customer lifetime value high enough so you
         | can pay for ads. This is tough, since it's a bit of a chicken
         | and the egg problem since most search engines are monetized
         | with ads
         | 
         | As the founder of Wanderlog (YC W19; https://wanderlog.com), a
         | consumer vacation planning app [1], I definitely remember the
         | idealistic days when I thought the best consumer product on its
         | own would win! But growth doesn't just come, and the same can
         | be said of vertical-specific search engines.
         | 
         | [1] Try searching "[your city] itinerary" on Google vs. Bing:
         | it's much more likely you'll find a small blog rather than
         | Lonely Planet or the local travel bureau as the top result
        
           | noduerme wrote:
           | >> - Option 2: Get your customer lifetime value high enough
           | so you can pay for ads. This is tough, since it's a bit of a
           | chicken and the egg problem since most search engines are
           | monetized with ads
           | 
           | nonoonononooonono. No. Don't monetize anything for the first
           | 10 years. That's the only way it can work. Then you can go
           | monetize it and buy an island and not give a shit if you
           | destroy what you created.
           | 
           | Oh but don't worry. You'll have investors.
        
           | judge2020 wrote:
           | [1]: both signed in, but with the profile image removed
           | 
           | Bing:
           | https://i.judge.sh/ShareX/2022/01/www.bing.com_search_q%3Dat...
           | 
           | Google:
           | https://i.judge.sh/ShareX/2022/01/www.google.com_search_q%3D...
           | 
           | Interestingly Google didn't have a top-result ad and the
           | google.com/travel carousel is 4th from the bottom.
           | 
           | For the actual results, both thefearlessforeigner.com and
           | paigemindsthegap.com seem to be actual travel blogs (the
           | pictures didn't appear in a reverse image search, so they are
           | probably organic), but they're clearly geared towards being a
           | 'faq' for visiting the city and have affiliate links where
           | appropriate. Bing went straight for discoveratlanta.com, and
           | frommers.com is well-thought-out but not a personal travel
           | blog.
        
           | padastra wrote:
           | Hi! I used Wanderlog to plan a recent month-long group trip,
           | which was definitely the most complex vacation I've had to
           | plan. For context I am very active when traveling (e.g.
           | multiple activities each day); so not sure how my experiences
           | map to others.
           | 
           | The best part of it was (going to a foreign country) being
           | able to find / identify all the attractions relative to each
           | other, so I could go to cluster A on Monday, cluster B, on
           | Tuesday, etc.
           | 
           | The hardest part of it (and why I needed to create a separate
           | google sheets anyways) was--once I figured out opening hours
           | of different locations, hard-to-book activities with limited
           | reservations--the ease of moving things around more fluidly
           | e.g. cluster B on Monday, cluster A on Tuesday, etc. and
           | having a more information-dense view so I could see larger
           | portions of the itinerary at once.
           | 
           | It would be cool to have an "input everything" --> "input
           | time restrictions / unmovable things" --> output planned
           | activity cluster type workflow.
        
         | dougmwne wrote:
         | I'm sure they know all about it, but are prevented from doing
         | anything by the business model. Pinterest has been spamming up
         | my search results for years. Maybe other people find it
         | helpful, but I do not. It's obvious I am never going to get
         | value from Pinterest. Let me click a button to add it to my
         | block list. One single click would have given me years of
         | massively improved results.
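
As a client-side illustration of the block list described above, here is a small sketch that filters a list of result URLs against a per-user set of blocked domains. The function names and the example URLs are hypothetical, and this says nothing about how a search engine would implement the feature internally.

```python
from urllib.parse import urlsplit


def is_blocked(url, blocklist):
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocklist)


def filter_results(urls, blocklist):
    """Drop every result whose host matches the user's block list."""
    return [u for u in urls if not is_blocked(u, blocklist)]


results = [
    "https://www.pinterest.com/pin/1",
    "https://example.org/a",
    "https://notpinterest.com/x",
]
print(filter_results(results, {"pinterest.com"}))
```

Matching on the exact domain plus a leading dot avoids blocking unrelated hosts that merely end in the same string. Country-specific domains would each need their own entry in this simple version.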
         | 
         | The fact that this feature does not exist shows that there is
         | something deep within Google's core that is preventing them
         | from addressing SEO spam, just like there is something deep
         | within Airbnb that makes it difficult to filter out Airbnbs
         | with problem reviews.
         | 
         | Google has been coasting for a good long time and now major
         | players are realizing they are wide open for disruption.
        
         | ocdtrekkie wrote:
         | I think the problem is just that the solution isn't in Google's
         | wheelhouse: There is no algorithmic ranking system that can't
         | be gamed. Human moderation and curation is the only way to
         | provide true quality, and Google is allergic to solutions that
         | don't automate and scale.
         | 
         | I think a really good search engine would still algorithmically
         | search its index, but the content library should be human-
         | curated with a goal of ingesting content via author, not via
         | platform. Once a given author was human-approved as a quality
         | source of information, content they produce could be
         | automatically ingested going forwards, and conditionally re-
         | reviewed by a human if there were reports the quality had
         | decreased.
        
           | specialp wrote:
           | This was Yahoo in the late 90s and early 2000s. They had a
           | human-curated directory search where one could look up
           | something like "kayaking" and find a bunch of sites on
           | kayaking. Then if you wanted to search by keyword, it was
           | outsourced to AltaVista and later Google. AltaVista results
           | were terrible and were almost nothing more than a keyword
           | search (i.e. the word you were searching for appeared on the
           | page). Google got much better at general search, and the
           | rest was history.
           | 
           | I think the death of the directory search dramatically
           | dropped the number of self-curated, informative sites from a
           | domain expert that were common in the early internet. Now
           | instead of making a website, many people are on content silos
           | like Reddit/FB
        
             | ocdtrekkie wrote:
             | I do still think we could adapt this model on top of
             | content silos... assuming we can index them! Consider that,
             | rather than just ingesting Reddit content, one could ingest
             | new posts from particular users who write quality posts on
             | Reddit.
             | 
             | Assuming a method also existed for an author to
             | authenticate themselves with the search engine, one could
             | also enable an author to help identify their content across
             | multiple platforms, as well as suggest other quality
             | authors to consider.
        
         | prepend wrote:
         | I think the issue is that these crappy results are kind of good
         | for revenue. It's not just organic results impacted but all the
         | affiliate ads.
         | 
         | Google is smart so I assume they crunched the numbers and
         | figured out they make more money from people filtering through
         | crappy results that include viewing and clicking ads than by
         | surfacing good content.
         | 
         | I think Google is optimizing for ad revenue, not for good
         | search.
        
         | numbsafari wrote:
         | The problem isn't that Google doesn't employ these people or
         | invest in their activities.
         | 
         | It's that Google has destroyed their own search results in
         | order to continue to expand their revenue opportunities.
         | 
         | If Google:
         | 
         | - Enabled downvoting on results, like YT videos. (Has its own
         | spam problems, just like YT)
         | 
         | - Allowed you to block certain domains from your search
         | results, like YT videos. (If they added some kind of
         | "coordinated network detection" and down-ranked domains
         | coordinating with ones you've blocked, that'd be pretty cool).
         | 
         | - Allowed you to create your own custom search engines, like
         | "Programmable Search Engine".
         | 
         | That would be incredibly valuable. They already have most of
         | the tech. They could even create a subscription service around
         | custom search engines if they really wanted. Plenty of people
         | would find something like that incredibly valuable.
         | 
         | Anyhow, buried in there is your startup idea. Remember: your
         | startup doesn't have to generate the same revenue or profit as
         | the incumbent on day one to be successful.
        
           | fragmede wrote:
           | How do you fight brigading, the organization of groups
           | elsewhere to collectively vote on something? Eg white
           | supremacist groups get together and vote down everything by
           | people of color, and vote up their pages about how great they
           | are?
        
             | numbsafari wrote:
             | How does Google already handle this exact problem on YT?
        
               | duskwuff wrote:
               | They don't. There's a lot of pretty obvious manipulation
               | that goes on in YouTube recommendations and search
               | results.
        
             | giantrobot wrote:
             | Randomly select votes that are actually recorded. Then add
             | in metavoting that votes on the votes with random sampling.
             | At Google's scale with a sufficiently random sampling you'd
             | be extremely hard pressed to successfully brigade or spam
             | the voting.
             | 
             | Google could easily use its current fingerprinting to
             | constrain (to an extent) multiple votes. Even knowing only
             | a portion of the population will participate in the voting
             | they can use a Wilson confidence interval[0] or similar to
             | properly weight votes.
             | 
             | Random sampling works here since you're not guaranteed one
             | vote per user per page and the outcome is binomial, seen
             | and downvoted or seen and not downvoted.
             | 
             | [0]
             | https://www.mikulskibartosz.name/wilson-score-in-python-exam...
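
For reference, the lower bound of the Wilson score interval can be computed as in the sketch below. The function name and the default `z = 1.96` (roughly a 95% interval) are illustrative choices. In the setting above, `downvotes` would be the "seen and downvoted" count and `views` the total number of sampled impressions.

```python
import math


def wilson_lower_bound(downvotes, views, z=1.96):
    """Lower bound of the Wilson score interval for the true downvote rate."""
    if views == 0:
        return 0.0
    p = downvotes / views
    z2 = z * z
    centre = p + z2 / (2 * views)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * views)) / views)
    return (centre - margin) / (1 + z2 / views)
```

Ranking by this lower bound instead of the raw ratio means a page with 1 downvote out of 1 view scores far below a page with 90 out of 100, so a handful of votes cannot move a result much on their own.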
        
             | kleer001 wrote:
             | easy, voting blocs, you assign yourself to the results of
             | people who vote similarly to you. additionally there'd be
             | local and regional blocs too. I can't think of a reason
             | that the naive everyone sees everything everyone else is
             | doing would work in the long run. That's Twitter, and it's
             | garbage.
        
             | metabagel wrote:
             | This is a great point. I would think Google could rank
             | users from low quality to high quality in terms of the
             | quality of the websites which they recommend or downvote.
             | Tricky business and could be difficult to control, but
             | basically the same thing they currently do for websites,
             | but extended to humans.
        
           | bogwog wrote:
           | Those shitty SEO spam sites exist only to serve ads, and
           | Google has a monopoly on internet ads. So there is no real
           | incentive for them to solve the problem.
        
             | [deleted]
        
             | kevinmchugh wrote:
             | Google had the same incentives in 2011, 2012 when they
             | built and released Panda and Penguin.
        
             | freyr wrote:
             | Google has 28.9% share, Facebook 25.2%, and Amazon has 10%
             | and growing fast. Not a monopoly, and the incentive is
             | there: if search results are consistently bad, people will
             | stop searching as much, and revenue and market share
             | decline.
        
             | mminer237 wrote:
             | Real review sites serve ads too. I don't think Google has
             | any incentive to make things worse, and they still want
             | people to google reviews instead of just asking friends or
             | people on reddit for reviews.
        
           | Rastonbury wrote:
           | The biggest perverse incentive for Google is that making
           | better search results can mean less clicks to ads (clicking
           | an ad because results are crap, going thru more pages of
           | results means more ads). Clicks are revenue which is much
           | easier to optimise for.
           | 
           | Internal Search owners can push for better algos, but what if
           | the algo causes revenue to fall? Are there forces strong
           | enough within the organisation to ensure that search quality
           | prevails?
           | 
           | If this is the case, the problem is existential. It can only
           | be arrested at the very top.
           | 
           | https://en.wikipedia.org/wiki/Perverse_incentive
        
             | Closi wrote:
             | > The biggest perverse incentive for Google is that making
             | better search results can mean less clicks to ads
             | 
             | This is also something that Google can control if
             | competitors come along.
             | 
             | i.e. If a reasonable competitor comes along that is willing
             | to sacrifice ad revenue for better search result quality
             | than Google, google can just adjust their search quality
             | upwards to knock them out (and then adjust it back once the
             | competitive threat is gone).
             | 
             | Perverse incentives from Google are all over the place -
             | Searching for the delivery business "Just Eat" in the UK
             | for instance returns an ad for their competitor Deliveroo
             | above the legitimate organic search result for me - and I
             | can also see that JustEat are trying to pay for their own
             | brand name just to compete - and IMO this sort of behaviour
             | is anti-competitive, borderline extortion considering
             | Google is the de-facto way of searching for a business, and
             | shocking from a search-quality perspective (where the wrong
             | result is intentionally shown at the top because they paid
             | more money).
        
             | yabones wrote:
             | If I had to pay $10/month for good search results, I
             | absolutely would. I think most people would. Get rid of the
             | ads and spam, and you have a service worth a premium. The
             | solution is to make it user-centric instead of
             | advertiser(spammer)-centric.
        
               | visarga wrote:
               | Some kind of browser or extension that re-ranks and
               | filters search results on the web.
        
             | nitrogen wrote:
             | _The biggest perverse incentive for Google is that making
             | better search results can mean less clicks to ads_
             | 
             | This gets close to the real root of the issue -- attention
             | is monetizable independently of the quality of content.
             | There would be much less incentive to create SEO spam if
             | search engines negatively weighted pages with ads and
             | affiliate links, and if manufacturers were barred (e.g. by
             | the FTC) from owning or imitating reviewers.
        
           | ErikVandeWater wrote:
           | > Enabled downvoting on results, like YT videos. (Has its own
           | spam problems, just like YT)
           | 
           | Are there any search engines that do this? It's a great,
           | simple idea.
        
             | mad182 wrote:
             | Not really that simple, I see a lot of potential for abuse
             | - using bots and brigading to mass downvote your
             | competitors or political opponents.
             | 
             | Couple of positions up or down in google results for
             | somewhat popular and valuable keywords can mean the
             | difference in thousands of dollars per day of ad or
             | affiliate revenue. I suspect it would get pretty wild if
             | google launched something like this. There already are
             | black-hat seo methods and services, but something so simple
             | and direct would turn it up to 11.
        
               | numbsafari wrote:
               | > I see a lot of potential for abuse
               | 
               | They already have the tech to fight this on YT. They, in
               | theory, are supposed to be doing the same thing to detect
               | inauthentic behavior on ad placement and click abuse.
        
               | ErikVandeWater wrote:
               | Downvotes could apply just to future recommendations for
               | search results you see, and not apply to advertisements.
        
           | willhinsa wrote:
           | > - Enabled downvoting on results, like YT videos. (Has its
           | own spam problems, just like YT)
           | 
           | They're going the OPPOSITE DIRECTION from this!! They
           | recently removed all downvote visibility of YouTube videos
           | from the user, so now downvotes only feed into their
           | algorithm. So in the last line of defense of me ending up
           | watching a shitty video, one of the most valuable tools has
           | been removed by my betters. It's preposterous that people
           | think that Google is doing a good job. They're actively
           | getting worse, and ignoring everyone saying so.
        
             | asiachick wrote:
             | They're doing a great job. I'm so happy dislike visibility
             | has been removed. It removes the effectiveness of pile-on
             | coordinated harassment which many youtubers have fallen
             | victim to.
        
           | endisneigh wrote:
           | > That would be incredibly valuable. They already have most
           | of the tech. They could even create a subscription service
           | around custom search engines if they really wanted. Plenty of
           | people would find something like that incredibly valuable.
           | 
           | Why would they do this? Google's customers are the
           | advertisers, not the end-users. And no one is going to pay
           | for a search engine, it's been tried and has failed.
        
             | slt2021 wrote:
             | if you think about it, Google provides advertisers a
             | customized search engine to find customers. So it is not
             | you searching the web, it is the web's advertisers
             | searching for leads
        
             | lumost wrote:
             | You only need ~1/10000th of Google's revenue to be a
             | financially successful startup. 1/1000th and you'll have a
             | great business, and at 1/100th you'll be somewhere between
             | a unicorn and a decacorn.
        
               | endisneigh wrote:
               | sure, but you'd need a better search engine if people are
               | going to pay for it.
               | 
               | A company with an objectively superior search engine
               | could make even more money with ads so now you're back to
               | the beginning
        
             | numbsafari wrote:
             | I think you have to look at it more like Amazon Prime.
             | 
             | Nobody is going to pay for /just/ a search engine. But they
             | might pay for, say, a /better/ search engine, plus
             | additional features around gmail/gcal/gdrive.
             | 
             | Think of it more as subscribing "to google" and less as
             | subscribing to "google search".
             | 
             | Regardless, the point isn't to "fix" google. It's to
             | highlight a possible path for a new market entrant.
             | 
             | ... If an existing player wanted to make a move here, I
             | would say that both Mozilla and Apple are well positioned
             | to add "personalized search" to a subscription service.
             | Same with Microsoft. DDG could also make moves here if they
             | expanded beyond search.
        
             | wakiza33 wrote:
             | there is a smaller niche market. SEMrush is a tool used in
             | the digital marketing industry that is now public and has a
             | multi-billion dollar market cap. It originally started as a
             | search engine. When they didn't gain traction they used the
             | tech to monitor Google and interface it for customers who
             | are tracking their performance in search results (and much
             | more).
        
               | yawaworht1978 wrote:
               | Can I ask what you mean by public?
               | 
               | It's not open source as far as I know and there's only the
               | free trial way to try it.
        
             | roughly wrote:
             | > And no one is going to pay for a search engine, it's been
             | tried and has failed.
             | 
             | Always curious about things like this. I certainly would
             | pay for this; it sounds like many other people here as well
             | would. I'm curious if the constraint is that there aren't
             | enough people to actually pay for the investment required
             | for the service, or if there aren't enough people willing
             | to pay to meet the standard VC notions of success. We seem
             | to have a problem with building and supplying services for
             | niche (read: "not expressible as an integer percent of the
             | world's population") customer bases, and I'm never sure if
             | that's a business problem or a cultural problem.
        
               | SamoyedFurFluff wrote:
               | The people most able to pay for a service like this are
               | the people that advertisers most want because they're the
               | people with enough discretionary budget to spend on things
               | like better Google search results. Allowing someone to
               | buy something like this also reduces your attractiveness
               | to your advertising clients.
        
           | ancarda wrote:
           | > Allowed you to block certain domains from your search
           | results
           | 
           | I would love for Google to build this in. Until they do,
           | there is a WebExtension that does this:
           | https://addons.mozilla.org/en-US/firefox/addon/hohser/
           | ("Block or Highlight Search Engine Results"). I use it to
           | block stuff like W3Schools so when I search for something,
           | MDN is always #1. Saves me a lot of time having to add "MDN"
           | to the end of every query.
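
Editor's sketch: the per-user domain blocking described above can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not how the linked extension or any search engine actually works; the function names `is_blocked` and `filter_results` and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def is_blocked(url, blocked_domains):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Match "example.com" itself and any "*.example.com", but not "notexample.com".
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def filter_results(urls, blocked_domains):
    """Drop blocked results while preserving the original ranking order."""
    blocked = {d.lower() for d in blocked_domains}
    return [u for u in urls if not is_blocked(u, blocked)]


# Hypothetical result list, used only for illustration.
results = [
    "https://www.w3schools.com/js/js_arrays.asp",
    "https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array",
]
print(filter_results(results, ["w3schools.com"]))
```

Matching on the parsed hostname rather than on a substring of the URL avoids accidentally blocking unrelated sites whose names merely contain a blocked domain.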
        
           | PaulHoule wrote:
           | The custom search engine is harder than you'd think.
           | 
           | Google's search algorithm is tuned up for searching the whole
           | web. It turns out the heuristics you need are very different
           | depending on the size of the collection.
           | 
           | When Gerard Salton was doing IR experiments with punched
           | cards he was working with collections of as little as 70
           | documents and in that case you are going to be very concerned
           | about recall and not precision. Maybe there is 1 relevant
           | document and if you miss it you failed.
           | 
           | If you had 70 billion documents you might have 10,000
           | relevant documents and if you lost 60% of them you still have
           | 4,000 documents. The end user gets more results than they can
           | sift through.
           | 
           | Thus I always groan when I see a site is using "Google Site
           | Search" because the relevance is usually worse than you'd get
           | with the alternatives.
           | 
           | Connected with that is the tuning work: Google has sufficient
           | data to tune up a big model for everybody but true
           | personalized search eludes them because they don't have
           | enough data from you to tune up a model for you.
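
Editor's sketch: the recall arithmetic in the comment above can be made concrete. The numbers below are the ones given in the comment (10,000 relevant documents with 60% missed, versus a tiny collection with a single relevant document); the `recall` helper is a hypothetical name for the standard definition, relevant documents retrieved divided by relevant documents in the collection.

```python
def recall(relevant_retrieved, relevant_total):
    """Fraction of the relevant documents that the search actually returned."""
    return relevant_retrieved / relevant_total


# Web-scale collection: 10,000 relevant documents, 60% of them missed.
web_relevant = 10_000
web_found = web_relevant - 6_000
print(web_found, recall(web_found, web_relevant))

# Tiny collection: one relevant document, and the search misses it.
small_relevant = 1
small_found = 0
print(small_found, recall(small_found, small_relevant))
```

At web scale, a recall of 0.4 still leaves thousands of usable results, so tuning can favor precision. In the tiny collection, one miss drops recall to zero and the search has failed outright, which is why the two settings call for different heuristics.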
        
             | mrkramer wrote:
             | I agree with you that "true personalized search eludes them
             | because they don't have enough data from you to tune up a
             | model for you". That's what Larry Page said as well: "Google
             | doesn't know what you know". His ultimate goal is an Answer
             | Machine powered by AI but that's not happening anytime
             | soon. I think internet search engines that we are using
             | today are primitive compared to what we will have in the
             | future.
        
           | edrxty wrote:
           | The problem with all of this is it would help us greatly, but
           | it would be useless to the 99% that the internet is
           | increasingly being designed for. Modern UI trends are
           | becoming obsessed with removing as many options and features
           | as possible so the dumbest humans bordering on smartest
           | vegetables can still use the service.
        
             | saalweachter wrote:
             | And customization breaks caching.
        
           | asiachick wrote:
           | > - Enabled downvoting on results, like YT videos. (Has its
           | own spam problems, just like YT)
           | 
           | Not convinced this would help. The spammers would just hire
           | people to dislike competitors
           | 
           | > - Allowed you to block certain domains from your search
           | results
           | 
           | This I would use. Never show me results from collider,
           | watchmojo, ranker,
           | 
           | > - Allowed you to create your own custom search engines,
           | like "Programmable Search Engine".
           | 
           | I think this would lead to people writing highly polarized
           | engines. The Red Pill engine for example and we'd have a new
           | problem, the proliferation of popular highly biased results.
           | Of course that's not to say Google's results aren't already
           | biased but they certainly are trying to cover everyone.
        
           | PaulHoule wrote:
           | Some kinds of "spam" can improve search results.
           | 
           | Things have changed in the past few years, now that Google
           | has developed advanced transformer models, but for a long
           | time Google's question answering facility has been: "let
           | spammers make 10^8 pages where the title is the question and
           | the answer is in the page".
           | 
           | The trouble is that there's a fine line between "answer is in
           | the page" and "word salad!"
        
           | propogandist wrote:
           | >Enabled downvoting on results, like YT videos.
           | 
           | you mean the dislike counter they just disabled to force
           | people to sit through more low quality content and pre-roll
           | ads to claim increase in platform engagement and viewership?
           | 
           | The only thing that matters is revenue and Google had increases
           | in acquisition costs in prior revenue reports. Expect to see the
           | data points for the latter metrics to be highlighted on the
           | earnings announcement, and a record quarter for YT coming out
           | of the change.
        
         | xnx wrote:
         | A lot of people forget that one of the inputs to the Google
         | ranking algorithm is input from human quality raters who work
         | off of an extensive, 172 page, guide that Google publishes and
         | updates for anyone to read:
         | https://static.googleusercontent.com/media/guidelines.raterh...
        
           | visarga wrote:
           | Apparently the "human quality raters" never found the sites
           | reported in this thread.
        
         | whiplash451 wrote:
         | I understood PG's point differently. My understanding is that
         | he is suggesting an angle of attack in which carefully crafted
         | manual reviews (that do not scale) can be used to bootstrap a
         | product that does scale thanks to something else (e.g.
         | collaborative filtering). All of this being on a niche domain
         | where you can drive a wedge into the mediocre performance of
         | Google (online shopping probably being the worst possible
         | choice, but there are many others).
        
         | acdha wrote:
         | Your point is good but I'm not sure I'd say very good given how
         | easily the same SEO spam domains can stay at the top of search
         | results for ages simply by scraping someone else's content.
         | What I'd be most interested in knowing is what their success
         | metrics are defined as -- for example, how much of a problem
         | does Google's management consider it if someone searches, finds
         | the answer they were looking for on someone's Stack Overflow
         | rip-off, and stops searching? I could easily believe that a
         | significant amount of what we're seeing here is that they're
         | focused on some kind of user frustration metric which doesn't
         | include things like damage to other businesses.
        
           | paulgb wrote:
           | Yes, I've noticed this particularly with technical results. A
           | lot of sites seem to have scraped StackOverflow and GitHub
           | issues, put a crappy ad-loaded interface around them, and
           | somehow out-rank the original SO/GitHub content.
           | 
           | It's like the bad-old-days of ExpertsExchange, which somehow
           | was never delisted by Google for its shady SEO tactics.
        
             | stevenally wrote:
             | They outrank the original content because google is
             | corrupt.
        
             | wott wrote:
             | > A lot of sites seem to have scraped StackOverflow and
             | GitHub issues, put a crappy ad-loaded interface around
             | them, and somehow out-rank the original SO/GitHub content.
             | 
             | Some even made slideshows of SO screen captures and put
             | them on YouTube, with a fake video or spoken intro to make
             | you believe actual content will be discussed... A number of
             | shameless people would go to any length to grab bits of
             | money anywhere and anyhow, and I've hit those links a
             | couple of times.
        
             | coldpie wrote:
             | You just have to look at Google's profit motive here. Their
             | motive isn't to provide quality search results. Their
             | motive is to show users ads, either in the search results
             | themselves or on the destination sites via their ad
             | network. The SEO spam sites aren't a bug, they are a
             | feature of Google's profit algorithm. Google's search
             | quality will never improve so long as their motivation is
             | to show you ads. Why should it? Competition may help here,
             | either by an outsider like the OP suggests, or via breaking
             | Google up with anti-trust enforcement, or both (my
             | preference).
             | 
             | As a user, your best personal and ethical move is to
             | install an ad-blocker, to make ad-based business models
             | less viable, which will help promote business models that
             | don't abuse the customer.
        
               | the_other wrote:
               | The core problem, I guess, is that search engines view
               | all their results as ads. That's why they got into the ad
               | business in the first place.
        
               | nickff wrote:
               | > _"The core problem, I guess, is that search engines
               | view all their results as ads. That's why they got into
               | the ad business in the first place."_
               | 
               | This seems a bit overly cynical. Some search engines only
               | served ads, but they're long gone. The survivors are
               | those who dedicated themselves to finding links which
               | were responsive to people's search intent. They seem to
               | have gotten into ads because it was the best business
               | model in this market.
        
             | acdha wrote:
             | > It's like the bad-old-days of ExpertsExchange, which
             | somehow was never delisted by Google for its shady SEO
             | tactics.
             | 
             | This is really what made me suspect that Google was
             | teetering on the edge of the MBA death spiral: these
             | problems run for years when they'd be easy to block, which
             | suggests to me that whatever metric gets you a bonus /
             | promoted doesn't include things like that which are long-
             | term threats to their core business even if it's selling a
             | lot of ads short-term.
        
         | cft wrote:
         | The search results markedly worsened in the last 5 years. Why
         | could they keep up with SEO spam until 5 years ago, and now
         | they can't? Their revenue has been growing dramatically, so
         | they could proportionally increase the allocation. It's
         | probably because the focus of their HR/changing workforce is
         | now elsewhere: maybe fighting "disinformation": both COVID and
         | political. Those efforts were non-existent 5 years ago.
        
           | specialp wrote:
           | I think it is also no longer in their interest. If you look
           | at their mobile results now, there are sometimes no search
           | results for webpages, just ads, and their automatically
           | extracted data. So, it is in their interest now for the
           | search results for non-advertisers to be bad. Eventually
           | people will consider those results junk and just use the
           | Google-extracted data/people who paid to go up.
        
         | behnamoh wrote:
         | Most comments focus on the technical side of things, whereas
         | I'm sure there are also legal restrictions involved in this. If
         | Google delists a website on the grounds that it's a copycat of
         | stack overflow, or because they have low quality content
         | according to Google's taste, there might be lawsuits filed
         | against Google, claiming that the company is discriminating.
        
           | bell-cot wrote:
           | In which countr[y|ies] does Google _not_ have the discretion
           | to decide that certain sites  / pages / etc. "belong
           | considerably further down" in Google's search results pages?
           | Seems to me that sorting the search results to #1, #2, #3,
           | etc. is pretty well baked into their basic product.
        
         | 8ytecoder wrote:
         | I agree it's a hard problem. I don't agree it's "really really
         | good". I regularly encounter obviously scammy websites. With
         | Google's js execution capabilities I'd assume they can detect
         | that. I'm talking about the VPN install pop ups and so on.
         | Right now there's a whole bunch of github.io-hosted sites
         | that are doing that. It's not even porn. It's home decoration
         | stuff.
        
         | noduerme wrote:
         | How hard is spam, really, if you're Google? Here's what I would
         | do as a heuristic (uh, not evil?): We know everything about you
         | and everywhere you've visited and everyone you've talked to in
         | the last 60 days. We know all their phone numbers and email
         | addresses. We even know the girl's phone number you met at the
         | bar, who didn't give you her phone number. So if any of those
         | people email you, we'll categorize that as "not spam". Also, if
         | it's your boss or a coworker, "not spam". If it's a major
         | company that's existed for more than ten years, not spam.
         | Everyone else, spam. Done.
         | 
         | This is hyperbolic, right? But they can solve spam in a split
         | second, if they just admit they're watching you all the time.
         | 
         | [edit] /s thx for reading to the end, folks.
        
         | dageshi wrote:
         | I said this in a similar thread yesterday, but I think this is
         | an unsolvable problem because much of the content either no
         | longer exists in website form or is old.
         | 
         | To put it simply, a new generation of the people who used to
         | make the reliable niche websites that not just answered your
         | questions but also helped you learn a particular topic have
         | moved to youtube instead.
         | 
         | Google search is hollowing out as a result, with the meat
         | going and leaving the SEO'd fluff that kinda answers the
         | question, but ONLY the direct question being asked, with none
         | of the wider expertise that people more educated in what you
         | were searching for would provide.
         | 
         | Of course google owns youtube as well.. so perhaps they just
         | see it as an inevitable transition.
        
           | numpad0 wrote:
           | Is that...essentially a Proof-of-Work system...
        
           | neom wrote:
           | Just a note on that, youtube search is finally getting
           | better, yesterday I noticed it was able to find key words in
           | the middle of a lecture that had nothing in the title or
           | comments. I always wonder about their AI transcription
           | service, it's gotten so good, if they're storing all that
           | audio as text, I guess their search is going to get
           | excellent?
        
         | JohnJamesRambo wrote:
         | I don't doubt it is hard but I'm forced to sign into Google now
         | pretty much, just let me rate results and ban domains again
         | etc. You will solve the seo problem really quick and start
         | giving me results I want.
        
         | tablespoon wrote:
         | > The funny thing is that if the people who worked on spam at
         | Google were free to talk about it, I'm sure it would become
         | evident that they know more about spam and anti-spam efforts
         | than anybody else in existence.
         | 
         | That may be true, but I think one of the good points made on
         | the OP is that it might actually be cultural constraints that
         | keep them from solving the problem:
         | 
         | https://twitter.com/paulg/status/1477761335412809729:
         | 
         | > You might need to do a lot of manual spam fighting initially.
         | That could be both the thing-that-doesn't-scale, and the thing
         | that differentiates you by being alien to Google's DNA. (They
         | must hate manual interventions; so inelegant).
         | 
         | Google has some very smart and knowledgeable people, but the
         | things they do have to fit into certain boxes, which means
         | there are some problems they just can't fix, e.g.
         | 
         | * Everything has to be automated at scale, which leads to
         | consistently poor user experience (unappealable account closures
         | initiated by inscrutable algorithms, SEO spam).
         | 
         | * You get promoted by building new products, not maintaining
         | existing ones, which leads to self-defeating churn outside of
         | core areas (e.g. abandoning Google Talk and squandering their
         | position in the messenger market).
         | 
         | * etc.
        
         | wakiza33 wrote:
         | Spam, yes, but Google has also made meaningful shifts that are
         | clearly directed from the top-down. It's much harder now (imo)
         | to get specific results, they've overall started looping SERPs
         | into broad answers.
         | 
         | This is def a user-engagement strategy -- but it has cons as
         | well.
         | 
         | Part of the complaints in the thread were spam related, others
         | were something deeper
        
         | RealityVoid wrote:
         | The problem, IMO, might be the monoculture we have around
         | search. Because Google is so big, it's enough for spammers to
         | target it and they have the vast majority of the search
         | visibility. If we had better, more diverse competition, that
         | might manifest as a tradeoff, presumably, they would have
         | competing and diverse criteria so you would probably not be the
         | top result on _all_ dominant search engines. SEO spam needs
         | upkeep and attention to latest algos, else it decays. Competing
         | algos would yield better results for everyone. Maybe Google is
         | just ripe for a shakeup.
        
           | jefftk wrote:
           | Doesn't your model predict that Bing would have substantially
           | less SEO-gamed results?
           | 
           | (Disclosure: I work at Google, but not on search)
        
             | RealityVoid wrote:
             | Well... Yes, it should. But, no, it seems it does not. I
             | thought about this when typing it but did it anyway, maybe
             | because I thought there is still something worthwhile
             | there.
             | 
             | I still think the model could work if the algorithm is
             | sufficiently different than Google's. Ideally, people would
             | go "I did not find anything I cared about on Google, I
             | know, I'll use Bing!" - but nobody does this, because the
             | results are consistently worse.
             | 
             | Don't get me wrong, I like G as a company, I think they do
             | worthwhile things! But they have let things slip and need
             | competition in this field, I mean real competition, then
             | maybe they would actually address issues.
             | 
             | Maybe the issue is also on the incentive level as well. I
             | mean more searches means more eyeballs and more money for
             | Google. If someone searches one thing and they are done
             | that is less interaction! I hope they don't work like this,
             | but it's possible.
             | 
             | And another possible problem is the opposite. Maybe Google
             | is optimizing search for what it thinks people want, but it
             | uses the wrong metric. Or it gives people what they want
             | but not what they need.
        
         | zozbot234 wrote:
         | Legitimate sites could help a lot by adding machine-readable
         | descriptions of their content, per the schema.org spec. The
         | richness of these descriptions means that this is effectively a
         | "hard", non-forgeable claim to being a worthwhile, non-spam
         | source (quite unlike the old META tags that got abused to death
         | pre-Google). Of course spam sites could simply _lie_ in their
         | schema.org tags, but the lies are easy to spot (with combined
         | machine- and human-review) and then they just get banned. It
         | makes it a _lot_ harder (and hopefully infeasible) to SEO-spam
         | by just copying random content.
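
Editor's sketch: for readers unfamiliar with schema.org markup, the following Python builds a JSON-LD description of a review of the kind the comment above has in mind. `Review`, `Product`, `Person`, `Rating`, `itemReviewed`, `reviewRating`, `ratingValue`, `bestRating`, and `reviewBody` are schema.org vocabulary; the helper names `review_jsonld` and `as_script_tag` and all sample values are hypothetical. The sketch only shows what such markup looks like; it makes no claim about how any search engine weighs or verifies it.

```python
import json


def review_jsonld(item_name, author_name, rating, body):
    """Build a schema.org Review description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Review",
        "itemReviewed": {"@type": "Product", "name": item_name},
        "author": {"@type": "Person", "name": author_name},
        "reviewRating": {"@type": "Rating", "ratingValue": rating, "bestRating": 5},
        "reviewBody": body,
    }
    return json.dumps(data, indent=2)


def as_script_tag(jsonld):
    """Wrap JSON-LD in the script element used to embed it in an HTML page."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


# Sample values are made up for illustration.
print(as_script_tag(review_jsonld("Example Bike", "A. Reviewer", 4, "Rode it for 500 km.")))
```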
        
           | frenchyatwork wrote:
           | A lot of what counts as spam these days isn't something like
           | "I search for bicycle reviews and get penis enlargement
           | pills", it's more like "I search for bicycle reviews and get
           | some blog who searched Amazon for the 5 most popular bikes
           | and posted links to them with a little blurb and called it a
           | 'Review'".
           | 
           | These sort of things are easy to spot, but only if you
           | actually have a basic amount of familiarity with the topic.
           | It's hard to spot with "AI" or super-cheap labor.
        
         | rightbyte wrote:
         | Really? I don't think the bureaucratic bloat at Google cares
         | and the original authors of the search engine in its current
         | incarnation are probably long gone. It is maintenance mode and
         | I don't think they dare touch too much.
         | 
         | It takes time and effort to build up a spam site's ranking but
         | it is trivial to blacklist those who get to the top.
        
         | Sebb767 wrote:
         | I agree. You can say a lot of bad things about Google, but they
         | definitely have some of the smartest and highest paid engineers
         | working on their search. Plus there are already a lot of people
         | trying to compete with Google and so far, no one seems to
         | provide consistently better results.
         | 
         | The only advantage a startup might have is that they could do
         | completely new concepts, such as specifying what area you
         | search in, allow you to modify their classification of your
         | query and/or moderating sites you include - which is probably
         | necessary anyway, since you'll hardly have the budget to fully
         | index the web. I'm not saying it's impossible, but it's not
         | going to be easy at all.
         | 
         | And after all of that, you still need a way to make some money.
        
         | goatherders wrote:
         | This x100000.
         | 
         | There is no scenario - none - where thousands of engineers at
         | Google working on search wake up in the morning and say "we
         | sure have made it good enough wrt SPAM. I think I'll have
         | another Danish."
        
           | compiler-guy wrote:
           | When the cafes were open, you can bet they said, "I'll have
           | another Danish, and then get back to work on this problem
           | that never seems to go away."
        
             | techdragon wrote:
             | I sure wish I had problems that were totally unsolvable,
             | they are so easy to measure progress on. /sarcasm
             | 
             | I think it's more likely that because they are just
             | building hundreds of tiny tweak experiments and it's
             | someone else who decides what to build and if it even
             | worked. Search quality is such a meta-problem that it goes
             | beyond any real hope of simply working on it in anything
             | beyond piecemeal trial and error fashion on their dataset.
        
               | siva7 wrote:
               | What is a danish?
        
           | UncleMeat wrote:
           | Especially since "made a change that improved search result
           | relevance by X%" is an _extremely_ compelling story for
           | promotions. If indeed there is a launch-driven culture for
           | promos at Google then there'd be extra incentive for new
           | mechanisms to reduce low quality search results.
        
           | ericbarrett wrote:
           | I agree with this and the grandparent comment wholeheartedly.
           | That said, there's a kind of institutional blindness that can
           | build up in companies--especially ones that dominate their
           | sector. It may have roots in intransigent upper management,
           | ossified and inflexible process, wide-scale burnout, a
           | culture of passing the buck, or any number of other
           | pathologies.
           | 
           | I don't claim that Google has any of these and certainly have
           | no insight into their search group. But I've personally been
           | at powerful companies with best-of-the-best talent that were
           | blind to the decay in their own living room, so I would
           | caution against immediate dismissal of PG's take.
        
         | judge2020 wrote:
         | Also, I'd be very surprised if they didn't have tens of
         | thousands of workers aiding in spam review already.
         | 
         | The hard part in all of this isn't finding and stopping spam -
         | it's defining what spam is. Are all the pie recipes where
         | there's a 2000 word essay about their grandma at the top
         | 'spam'? They still have the recipe, and Google Home devices
         | pick up the recipe instructions just fine so people end up not
         | reading it, but many people would still consider that spam
         | since it adds such an obstacle to getting the information you
         | want. Same for cnet articles like "Best smart home devices to
         | buy in 2022" - it's a reputable brand with a list of smart home
         | devices, but it's hardly a review and exists to funnel people
         | to their Amazon affiliate link.
        
           | awillen wrote:
           | AFAIK the 2000 word essays in recipes are Google's fault - it
           | prioritizes pages with a lot of content, so you have to add
           | that junk to the top in order to rank highly. While I'm sure
           | there's more going on behind the scenes than I'm aware of, it
           | does seem like the rules could be altered on a category-
           | specific basis where a lot of text isn't necessarily a
           | positive.
        
             | kevinmchugh wrote:
             | Recipe intro text is useful for contextualizing the recipe
             | and copyright purposes. In RSS days, it was a way to get
             | readers to click through, so the author got the ad views.
             | Also people who write recipes like to write about food.
        
             | sct202 wrote:
             | While I think the essays are excessive, I appreciate that
             | some of them document that the blogger actually made the
             | recipe with progress pictures. With the more basic recipes
             | websites, I wonder if anyone's actually made it before or
             | if the recipe is from some scraped database of unknown
             | origin and quality.
        
             | dredmorbius wrote:
             | This reminds me of the page inflation that struck tech
             | books during the late 1990s / early aughts. The Marketing
             | Wisdom was that fat books sold (or took up more shelf
             | space), so texts got padded with weak writing, gratuitous
             | puffery, and other elements, which (much as the recipe
             | essays) simply got in the way of delivering actual
             | informative content.
             | 
             | (The fact that many of these books were rushed out with
             | very poor quality control also didn't help.)
        
             | UncleMeat wrote:
             | This one is hard because it does actually seem to be the
             | case that the cruft around the recipe is valuable if the
             | content is right. Most of the recipe blog stuff is garbage,
             | but if you look at youtube it is clear that creators who
             | add extra flair around the recipe are a powerful force.
        
             | behnamoh wrote:
             | Years ago I wanted to pursue micro blogging, but this
             | "feature" of Google search stopped me from doing it.
             | 
             | What's the point of writing succinct, to-the-point mini
             | articles about problems and solutions if nobody finds them
             | on Google?
        
               | skilled wrote:
               | This is largely because micro-blogging means less
               | content, and less content means you could write five
               | 300-word blog posts instead of one 1,500-word post.
               | 
               | I've done blogging for the last 10+ years, and many of
               | those I spent as a freelancer working with
               | startups/brands/editorials. Everyone is after "word
               | count" and I absolutely hate it.
               | 
               | Whenever I work on articles for my own blog, I just don't
               | consider word-count at all. I think if your content is
               | great and informative, then readership will be natural.
        
               | vntok wrote:
               | This is a very interesting approach. Do you have traffic
               | data collection on your blog?
        
               | skilled wrote:
               | I collect post views, but not using Google Analytics or
               | anything like that. I built a pretty substantial
               | developer blog (tips, resources, etc.) back in 2014. I
               | think it peaked at around 350,000 monthly visitors after
               | 12 months.
               | 
               | Later on, I sold it because I needed the money. Not so
               | much that I didn't want to keep working on it.
               | Unfortunately, the new owners didn't have any idea how to
               | maintain a "healthy" content blog, and it has plummeted
               | down to around 30,000 monthly visitors. All the content
               | they're publishing now is some thin headline-clickbait
               | bullshit.
               | 
               | I even gave them free advice on how to fix it, but I
               | think that for a lot of people, they just don't care and
               | will mindlessly pump out as many pieces of content as
               | possible. And such blogs can be identified from a mile
               | away.
               | 
               | And therein lies the problem with Google SEO at the
               | moment. Even myself, someone who has done SEO work for
               | more than a decade, I can see that results are getting
               | worse. In some niches, the same crappy articles that
               | dominated 6-7 years ago are still dominant today.
               | 
               | I guess we're stuck in time, or so Google thinks.
        
               | behnamoh wrote:
               | Could it also be due to reduction of public interest in
               | blogs over the past few years? Most stuff is now
               | published in the form of vlogs instead of blogs. I do
               | miss the good old blogs era, tho, and I wish there were
               | still high quality blogs around.
        
             | throwawayboise wrote:
             | It's two-fold. If Google prioritizes pages with a lot of
             | content that's one thing, but longer content also means
             | more space for ads, or more scroll events to trigger ads,
             | etc.
             | 
             | Incidentally, prioritizing long content seems odd to me, in
             | my experience the best pages are short and get right to the
             | point, at least in the context of something like a recipe
             | or other "how to" resources.
        
             | wakiza33 wrote:
             | prioritizes is correct, but in some ways it's not the best
             | descriptor.
             | 
             | Google's algos, while advanced, still rely a ton on text to
             | actually tell what the page is about. They need it.
             | 
             | If they just relied on other factors (title, links,
             | website, etc.) they would end up with worse results for
             | users. I'm sure they've tested it.
             | 
             | Google's core algo in a lot of ways is much simpler than
             | people think (in other ways of course it's very complex).
        
           | giaour wrote:
           | Compounding the problem, the 2000 word essay is sometimes
           | really useful if it's describing a technique used in the
           | recipe (cf Stella Parks' recipe for homemade bagels on
           | Serious Eats:
           | https://www.seriouseats.com/homemade-bagels-recipe). But
           | somehow only spammy blogs with plagiarized
           | recipes, AI-generated "essays," and affiliate links for every
           | ingredient and tool used make it into the first page of
           | results on Google (or DDG, for that matter).
           | 
           | At some point, Google must have moved away from using site-
           | level reputation in search rankings, as I almost never see
           | recipes from reputable sources like King Arthur Baking,
           | Serious Eats, or Food52 in the first page of results.
        
           | IshKebab wrote:
           | Yeah, the newest nuisance seems to be sites that clone Github
           | Issues and StackOverflow with a crappier interface. Somehow
           | they rank higher than the original sources. I'd say it's spam
           | but it's definitely not traditional spam.
        
             | joshuaissac wrote:
             | And the strange Wikipedia mirrors that are shown in Google
             | Verbatim searches instead of the original. If I disable
             | Verbatim, they disappear and I get regular Wikipedia
             | instead.
        
             | notreallyserio wrote:
             | I'm not going to say solving spam programmatically is easy,
             | but the gitmemory garbage site (for one example) has been
             | around long enough that there's no excuse for not
             | downranking or removing it. How hard could it possibly be
             | for humans to spot these few sites and nuke em? I'm sure
             | Google engineers see them all the time.
        
             | behnamoh wrote:
             | Somehow you got downvoted by their creators here :)
        
           | joshuaissac wrote:
           | > The hard part in all of this isn't finding and stopping
           | spam - it's defining what spam is.
           | 
           | This is one area where Google could use personalised results
           | to provide a better experience for the user. Let me decide
           | what spam is for me. Let me mark results as good or bad, so
           | that the algorithm knows what kind of pages should be
           | prioritised or filtered out the next time. Google SearchWiki
           | was a step towards this but they killed it off.
        
             | nathanyz wrote:
             | Is conservative leaning info spam or not spam? What about
             | liberal leaning info?
             | 
             | We have seen what this leads to inside the social networks
             | as well as YouTube, and at a macro scale I think we might
             | want to have a shared concept of what constitutes a good
             | search result for a given query.
             | 
             | At micro scale, it can seem more optimized to get exactly
             | the type of result you want, but if we take an absurd
             | example like an Apple Pie recipe shouldn't we all have
             | shared understanding of what types of ingredients would
             | make for an Apple Pie?
             | 
             | The shared understanding, I believe, is core to
             | communication. If all of us have our own specific ideas of
             | Apple Pie, then who is actually right on what an Apple Pie
             | really is? What happens when your search results insist
             | that an Apple Pie doesn't actually have apples in it, but
             | instead pears?
        
           | ehnto wrote:
           | > and Google Home devices pick up the recipe instructions
           | just fine so people end up not reading it
           | 
           | I think this isn't entirely related, but that's perhaps the
           | beginning of a bias you might end up having that everyone
           | experiences technology in the same way as it marches on. I've
           | yet to encounter a Google Home in the wild, I imagine far
           | more people are consuming recipes on phones, tablets and PCs.
        
           | thewarrior wrote:
           | Let's have niches where the content is hand curated by human
           | beings instead of pure statistics by machines.
           | 
           | Hmm why stop there let's actually make the users do the
           | curating and even the content creation by rewarding them with
           | social validation. Let's have hard working moderators who
           | work on the community full time.
           | 
           | Then we could just build a search engine over it. We could
           | call it Reddit. Or HackerNews.
           | 
           | Maybe the users aren't all as good as professionals at
           | curating the information. Let's hire professionally trained
           | curators pay them well and we could call them newspapers.
           | Then we can come in disrupt them and replace them with an
           | algorithmic marketplace that eventually becomes infested with
           | click bait.
        
         | hardtke wrote:
         | The curated search results business model doesn't work. Google
         | gives "aggregators" and other search engines the death sentence
         | for organic search traffic from economically meaningful
         | queries, so you'd get no free traffic. This is one of the major
         | antitrust complaints against Google in the EU. Since you get no
         | organic search traffic, you need to build a brand using
         | advertising, and once you start down that road you need to
         | monetize the first click which compromises the quality of your
         | site.
        
           | magicalist wrote:
           | > _This is one of the major antitrust complaints against
           | Google in the EU._
           | 
           | The complaints I've read are from exactly the kind of
           | generated content farms people are complaining about in this
           | thread.
        
         | Alex3917 wrote:
         | > But they aren't free to talk about it, because if they did it
         | would just give more assistance to the spammers, and make the
         | problem worse.
         | 
         | The reality is more that some Google engineer will come up with
         | an algorithm change that makes the result 40% better, but it
         | will come at the expense of making that search 3ms slower so
         | the change won't get merged. Or it will make the results worse
         | for some niche set of queries that the business team really
         | cares about, so again it won't get merged.
         | 
         | There are lots of consumers who would gladly pay $1 a month or
         | whatever in order to use a couple extra milliseconds of compute
         | power per search in exchange for drastically better
         | results, so there is lots of room for a startup to compete.
        
           | zozbot234 wrote:
           | > There are lots of consumers who would gladly pay $1 a month
           | or whatever in order to use a couple extra milliseconds of
           | compute power per search in exchange for drastically
           | better results
           | 
           | Google has a paid-for Search API, so they _could_ do that if
           | they chose to pursue it. And then they could let Google One
           | users opt-in to the same thing via ordinary Search. I'm not
           | sure whether Bing has anything equivalent.
        
         | stoicjumbotron wrote:
         | Highly OT, but if a technical person (not at a managerial
         | level) involved in tackling spam at Google were to leave the
         | team, are they allowed to work on the similar problem space at
         | a different company?
        
       | baby-yoda wrote:
       | search ads responsible for the rise of google search[1], content
       | ads (seo spam) responsible for google search's fall?
       | 
       | my guess is the rate of spam content production far outpaces the
       | rate of original content creation. so the power law concentrates
       | even further in the tiny percentage of OC and a moat forms around
       | them (highest ad $, highest authority/authenticity).
       | 
       | where do we end up 5 years from now? further consolidation and
       | the continued return to aol style portals (telco/media giants and
       | fast-lane to own content?) pay-to-access silos dominating the
       | internet?
       | 
       | [1] oversimplifying a bit of course, there was a novel ranking
       | method that was more than accurate enough, and it scaled, which
       | allowed for the search ad business to go gangbusters.
        
       | gomox wrote:
       | I believe that the only moat protecting $100B of AdWords revenue
       | is the quality of the Google Search results. There is no
       | meaningful switching cost to using a new search engine, and the
       | spend inertia in ad spend is not very significant (i.e. any
       | online marketing manager will happily spend 5% of their budget in
       | a different search engine adwords-like program if they get better
       | ROI, there is no incentive to be a "Google Ads only shop").
       | 
       | On the other hand, Google needs to maintain the ballistic
       | trajectory of its revenue growth. So how can they fix search
       | quality when they've minmax'd themselves into this situation in
       | the first place? If they were to make the ads background yellow
       | again, that would have negative short term effects that I doubt
       | any career exec can stomach.
        
         | dredmorbius wrote:
         | No.
         | 
         | The other moats are lock-in to advertising networks, website
         | metrics, and effective control over Web standards through the
         | Chrome browser.
         | 
         | An alternative search platform might provide better search. It
         | would be fighting Google on at least three other fronts. It
         | might have some success, but it would be challenging. (As
         | history largely demonstrates.)
         | 
         | Even a rival tech monopolist, Microsoft, _barely_ holds even
         | with its own search offering (I use that indirectly via DDG),
         | and scrapped its own web-browser development.
        
           | gomox wrote:
           | I mean, Google is big and it has that advantage like any big
           | enterprise, but the search engine market is very permeable
           | compared to what people traditionally refer to as a moat
           | (say, trying to compete with YouTube as a video hosting
           | platform or with Salesforce as a CRM).
           | 
           | If you have a good search engine, people will flock to it,
           | and search ads will be valuable. That's it. That's how Google
           | became Google.
           | 
           | The fact that Microsoft couldn't do it honestly doesn't mean
           | much. Microsoft also couldn't do a phone OS, a portable music
           | player, and many other things. They have a complex web of
           | conflicting interests that $SEARCH_ENGINE_STARTUP does not.
        
             | dredmorbius wrote:
             | Just for the record, I'm in at least mild agreement that
             | search is looking increasingly vulnerable. Google are
             | falling down here.
             | 
             | It's just that "search" is really a web of interrelated
             | services, capabilities, and revenue streams, and they tend
             | to reinforce each other strongly. I'd like to see the
             | monopoly disrupted.[1] But I don't think it's _just_ a
             | matter of "build a better mousetrap^Wsearch engine."
             | Attack one corner, and Google will snipe at you from the
             | others.
             | 
             | And with the AdWords cash cow, they've got an immense
             | revenue stream.
             | 
             | ________________________________
             | 
             | Notes:
             | 
             | 1. Well, mostly. Google's acquired so goddamned much
             | personal data that the premise is frankly kind of
             | terrifying as well --- a weakened Google with neither the
             | revenues nor talent to defend that pile.... And I'd really
             | like to see the toppling occur _without_ simply raising a
             | new monopoly in its place.
        
         | pythux wrote:
         | "There is no meaningful switching cost to using a new search
         | engine."
         | 
         | Unfortunately, defaults matter and Google is spending billions
         | of dollars yearly to make sure they are the default search
         | engine wherever they can. Most people don't switch from
         | default.
        
           | gomox wrote:
           | The way I see it that's only a problem if you want to be
           | bigger than Google. But you can get to $1B in revenue with
           | just the deliberate adopters (i.e. under 1% of the market).
        
       | yuliyp wrote:
       | Was this linked for the irony of everyone spamming replies
       | advertising their startups which don't solve the problem but
       | kinda-sorta do, resulting in something hard to read and
       | understand?
        
       | dang wrote:
       | This was in response to mwseibel's thread, which had a big
       | discussion yesterday:
       | 
       |  _Google no longer producing high quality search results in
       | significant categories_ -
       | https://news.ycombinator.com/item?id=29772136 - Jan 2022 (1167
       | comments, spread over multiple pages - note the "X more comments"
       | links at the bottom)
        
       | fuckcensorship wrote:
       | Go read any default subreddit on Reddit to see what this idea
       | would look like long-term, especially the "amateur police" part.
        
       | baby wrote:
       | Searching code is also impossible on Google. If there's a
       | competing search engine for that I'll use it at least for this
       | use case.
        
         | darinf wrote:
         | Give Neeva a try. We have improved ranking and some nice
         | features around tech queries.
        
           | baby wrote:
           | hot take: why would I need to enter my email to do a search
           | online? You already lost me :o
        
       | hammock wrote:
       | _> This may not just be a problem with Google but possibly also
       | the recipe for beating Google. A startup usually has to start
       | with a niche market. Why not try writing a search engine
       | specifically for some category dominated by SEO spam?
       | 
       | >You might need to do a lot of manual spam fighting initially.
       | That could be both the thing-that-doesn't-scale, and the thing
       | that differentiates you by being alien to Google's DNA. (They
       | must hate manual interventions; so inelegant)._
       | 
       | Is he describing...Yahoo circa 1994? A manually curated directory
       | service.
        
         | [deleted]
        
         | tonyedgecombe wrote:
         | I'm starting to think Yahoo circa 1994 might be better than
         | Google today.
        
           | behnamoh wrote:
           | I wouldn't just complain about Google. Google search results
           | mostly reflect a deeper problem with the web today. I do miss
           | the simplicity of the 2000s.
        
         | edoceo wrote:
         | dmoz!
         | 
         | https://en.m.wikipedia.org/wiki/DMOZ
        
         | matt_heimer wrote:
         | I always used DMOZ more than Yahoo! Directory. It looks like
         | [dmoz](https://en.wikipedia.org/wiki/DMOZ) became
         | https://curlie.org/ which is still active.
        
         | [deleted]
        
         | loceng wrote:
         | And makes me think that StumbleUpon had a similar curation
         | ability, in that the value qualifier is how often [hopefully]
         | real people interact with content - tracked by who's using SU
         | and agreed to allow tracking; can't remember if sharing that
         | was optional or not?
         | 
         | The gamification of the system then would have to come through
         | onboarding fake users, pretending/mimicking real user behaviour
         | to send that signal into the system; not sure if SU ever ran
         | into that problem or was actively paying attention to trying to
         | identify and remove fake or suspicious signals from their
         | output?
         | 
         | I feel a much better system is easily within reach, it's simply
         | getting the right structure to it, the right foundation, and
         | then it will quickly take off due to the quality difference.
         | I've already figured out a design pattern that Twitter and
         | Facebook have indoctrinated us with, making us think it is
         | normal - and keeping us blind to an actual normal way of
         | organizing or communicating, but that isn't conducive to
         | control or ad revenues - and so extending my future plans to
         | include a better search-directory system would fit snugly into
         | my efforts.
        
           | visarga wrote:
           | SU was a great way to surface random interesting stuff. I bet
           | most blog entries today could be picked from Twitter, even if
           | they are unlinked.
        
         | gorbachev wrote:
         | Some of the examples used in the Twitter thread Paul was
         | referring to would be better served by a manually curated
         | directory service with a possible addition of a search engine
         | only surfacing content from the sites in the directory.
         | 
         | For health information and recipes in particular there are only
         | a handful of really high quality sites that have quality
         | content for 95% of the information most people need. I bet if
         | you wanted to increase the coverage to 99%, that list would
         | expand to less than a thousand sites. At those numbers manually
         | curating the information would be easily achievable.
         | 
         | How to get people to use your top notch Google replacement
         | instead of Google, however. That's the hard problem.
        
           | basch wrote:
           | Isn't that what google Programmable Search is?
           | 
           | https://cse.google.com/cse?cx=dc408db269da4e769 (try
           | searching for something you want a review of)
           | 
           | Make a search, whitelist the domains. Every time you run into
           | a good review site, add it to the searchable list.
        
             | DangitBobby wrote:
             | That's all fine and dandy, but the goal isn't to just make
             | some good sites a bit easier to find, it's to keep the top
             | of your search results from being interspersed or
             | superseded by SEO spam. Unless I misunderstood your
             | suggestion.
        
           | llaolleh wrote:
           | It's really hard to get people to use something other than
           | Google. If you were to launch such a product, it would have
           | to be so much better that people recommend it organically to
           | other people.
        
             | visarga wrote:
             | Should be able to run on top of Google in a browser extension
             | to insert itself only when the topic allows.
        
             | Rastonbury wrote:
             | That was what Google was to Yahoo/Altavista back in the
             | day, a 10x improvement. Reading this thread, people feel
             | enough pain to do all sorts of hacky stuff - appending
             | 'reddit' or 'forum' to queries, blacklisting spam domains,
             | switching search engines depending on topic. If G keeps
             | declining and a new product does things better, the penny
             | will drop and people will swap.
             | 
             | Seibel and PG see blood in the water no doubt, they see G's
             | market share and want to fund companies to take some of
             | this.
        
         | noduerme wrote:
         | He's right as often as he's wrong
        
           | sillysaurusx wrote:
           | What has he been wrong about?
        
             | [deleted]
        
             | noduerme wrote:
             | Okay. I was gonna respond with something snarky about what
             | a crappy mod he was, or how he was a Saint and you don't
             | deserve to worship at his functional feet. But I'll tell
             | you what he was wrong about: He was, as a leader and a
             | human and a mod, _petty_. He pied pipered himself into a
             | sweet spot and no one would deny he's a good coder, but
             | there the ego took off and forever left behind a skidmark.
             | The cool exterior, the sense of self-importance, the
             | punching down, above all the love of spreading one's
             | revelatory wisdom to the poor little guy; you can love that
             | sort of thing too much, and he did. Perhaps you weren't
             | here or didn't interact with him directly. In my view he
             | became dismissive and derogatory toward people who
             | worshiped him (like you) once he acquired a small degree of
             | fame.
        
               | Jorengarenar wrote:
               | What in the world are you babbling about? Who here
               | worships whom?
        
       | wenbin wrote:
       | FYI - Google hires 10,000+ search result raters [1], who are
       | contractors, to evaluate search result quality.
       | 
       | In an ideal world, you build a thing, and it's done. It runs
       | automatically and prints out money.
       | 
       | In reality, you still need human labor to do manual tasks, even
       | in the tech industry.
       | 
       | [1] https://www.searchenginejournal.com/google-eat/quality-rater...
        
         | Ros2 wrote:
         | Thanks for posting this. An acquaintance of mine did this job 6
         | years ago and I wasn't sure if it still existed.
         | 
         | Crowd-sourced humans are making Google appear more intelligent
         | than they actually are. I always envisioned that spam efforts
         | would just immediately set off an alarm that would be handled
         | by a bot to blacklist you without a human even knowing your
         | site existed, but there still seem to be at least a few ways to
         | game Google's search ratings.
        
       | heisenbit wrote:
       | Affiliate links are environmental toxic waste and it would be
       | only logical to tax such affiliate payments to fund cleanup and
       | mitigation efforts.
        
         | cblconfederate wrote:
         | Who would write good or even decent content for free?
        
       | waynesonfire wrote:
       | what a great idea.
        
       | donio wrote:
       | What I am looking for is control over the results. Personalized
       | blacklists and lists of sites to be (de)prioritized and also the
       | ability to subscribe to community curated versions of the same.
       | 
       | And to be clear I want to be able to control these myself, not
       | have an algorithm trying to guess my preferences. No guessing,
       | just do what I tell you to.
       | 
       | Multiple search profiles with different priorities would be nice
       | too.
       | 
       | I would like the search algorithm to be transparent, I should be
       | able to tell why I got a certain result and how I can avoid such
       | results in the future.
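       | 
       | A minimal sketch of that kind of user-controlled ranking, assuming
       | the engine hands back (url, base_score) pairs. SearchProfile,
       | domain_of and rerank are hypothetical names, not any real search
       | engine's API. Every result carries a plain-text reason, so it is
       | visible why it ranked where it did:

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class SearchProfile:
    """Explicit, user-written rules (hypothetical structure): no guessing."""
    blocked: set = field(default_factory=set)    # domains never shown
    weights: dict = field(default_factory=dict)  # domain -> score multiplier


def domain_of(url):
    """Return the host part of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def rerank(results, profile):
    """Apply a profile to (url, base_score) pairs.

    Returns (url, final_score, reason) tuples, best first.
    """
    ranked = []
    for url, base_score in results:
        domain = domain_of(url)
        if domain in profile.blocked:
            continue  # personal blacklist: drop the result entirely
        weight = profile.weights.get(domain, 1.0)
        reason = f"base={base_score} x weight={weight} for {domain}"
        ranked.append((url, base_score * weight, reason))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

       | Multiple search profiles would then just be several SearchProfile
       | objects, and a community-curated list could be merged into the
       | blocked set before calling rerank.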
        
       | curiousllama wrote:
       | Good idea. You could start with fitness. Lots of high-quality
       | information out there that's entirely, 100% inaccessible via
       | google.
       | 
       | Over COVID, I did the whole fitness thing from a few different
       | angles (overhauled diet, trained for a marathon, now lifting
       | weights a lot). I found I could only find good info by going
       | directly to a trusted source - literally, typing http://www. like
       | I'm in the 90s or something. This is the exact issue a search
       | engine should solve, but Google doesn't.
        
         | thewarrior wrote:
         | Could you share your trusted sources ? Human search engine :P
        
           | curiousllama wrote:
           | JPG on Tik Tok, and Geoffrey Verity Schofield on
           | Instagram/Quora/buy his book.
        
           | ok123456 wrote:
           | https://www.boards2go.com/boards/board.cgi?user=tfannon
        
         | paulcole wrote:
         | Is your trusted source the same as my trusted source which is
         | the same as my neighbor's trusted source which is the same as
         | my Australian cousin's trusted source?
         | 
         | If not, at least one of us is going to hate this new search
         | engine.
        
         | nefitty wrote:
         | This is a problem I'm working on.
         | 
         | What sorts of unique things did you do that Google failed at?
         | Maybe you read through discussion sites or got tips from books
         | or something like that?
        
           | curiousllama wrote:
           | Social media and trial and error. I knew a bit, and asked
           | some friends for advice. I used that to find people who said
           | stuff I thought made sense on Quora, Tik Tok, and Instagram
           | (ie they agreed with what I knew to be true and false, so I
           | could assume the other stuff they said was likely to be true
           | as well). I tried what they said, found what worked, and went
           | all in when I saw results.
           | 
           | Importantly, this was bottom up: it was largely
           | recommendation engines suggesting people I then filtered
           | through for what I wanted (running, bodybuilding) vs didn't
           | (traditional weight loss). I couldn't specify what I wanted,
           | or it would be garbage SEO spam.
        
             | nefitty wrote:
             | That's a new angle to me. I rely on pseudonymous content
             | like Reddit and HN. It does make sense to look for people
             | or groups focused on a larger topic like fitness.
             | 
             | That helps a lot! Thank you.
        
       | djoldman wrote:
       | Google knows how to surface relevant results and they choose not
       | to because they aren't optimizing for relevant results, they're
       | optimizing for revenue or profit within some constraints (don't
       | lose too many users, privacy, avoid actually terrible or
       | completely irrelevant results).
       | 
       | All the various suggestions in this thread plus far more complex
       | and insightful solutions are known to Google. Most of it boils
       | down to using automated user feedback to improve or measure
       | search result relevancy.
       | 
       | Google doesn't need to solicit user upvotes / downvotes to
       | improve rankings. They can monitor user clicks on results in
       | addition to analytics on the sites the users visit to determine
       | which sites are relevant to which searches.
       | 
       | Google doesn't optimize for search relevancy.
        
         | [deleted]
        
         | lubesGordi wrote:
         | Likewise YouTube doesn't optimize for relevant results, only
         | engagement to maximize ad exposure. The side effect of this is
         | polarizing content gets returned more than relevant content
         | (polarizing content being more engaging than relevant content
         | apparently).
        
       | Jenk wrote:
       | Single-page thread:
       | https://threadreaderapp.com/thread/1477760548787920901.html
        
       | notananthem wrote:
       | We still need a search engine that actually blacklists everything
       | serving ads. Google beat Altavista, now we need to beat Google.
       | 
       | I mean no mincing about - recipe sites that are ads are blocked.
       | Results with pixel trackers etc. are blocked. Hell, results that
       | are paywalled are blocked because they're useless.
        
       | rhtgrg wrote:
       | I think pg is missing something important here. The reason Google
       | was able to beat Yahoo, Altavista, Ask, etc. was not just because
       | they had a better formula -- it was also because they started in
       | the era where 'search' was still seen as secondary to 'portals'
       | by the big guys. Had these companies known how important search
       | is to the internet back then, they would've copied Google's
       | secret sauce and crushed it long before it could suck up their
       | traffic.
       | 
       | This isn't going to happen again. Google isn't going to sit
       | around twiddling its thumbs while a competitor develops a better
       | algorithm.
       | 
       | You have to attack the problem from a different angle entirely
       | (make something that looks nothing like a search engine), I don't
       | think a niche market is going to be enough.
       | 
       | Perhaps you just want to make something that scares Google into
       | acquiring you, rather than actually bettering the situation. If
       | that's the case, I implore you to think of better ways to spend
       | your life.
        
         | titzer wrote:
         | > I don't think a niche market is going to be enough.
         | 
         | To displace a giant gobbling 180 billion dollars a year? Yeah,
         | no kidding. But nobody is asking for that. They are just asking
         | for decent search results.
        
           | [deleted]
        
         | jart wrote:
         | Altavista was the only search engine in the same league as
         | Google. So Google hired the guy who built it. Altavista
         | infrastructure couldn't scale beyond a single server, because
         | that's how DEC was, so it was a smart move for him.
        
       | legohead wrote:
       | It would work until you got big enough, then you'd end up
       | following the same path as Google, as that's where the money is.
        
       | hooande wrote:
       | What are some search categories that are so dominated by spam
       | that they are unusable?
       | 
       | I'll start: "how to rent a car" [0]
       | 
       | [0] worth noting that I personally get somewhat reasonable
       | results for this, with a 3rd result from nerdwallet.com and a 4th
       | from wikihow.com, both of which seem to answer the question in an
       | unbiased way
        
         | hammock wrote:
         | Nerdwallet and wikiHow are both SEO spam content farms. They
         | just happen to have above-average quality content.
         | 
         | They don't exist without a search engine.
        
           | hooande wrote:
           | Not sure how you're differentiating "seo spam content farm"
           | from "website"
        
             | hammock wrote:
             | That is part of the problem
        
       | imranhou wrote:
       | I believe Google tracks click-throughs from search results pages,
       | which should in theory provide plenty of insight into which links
       | are working for specific keywords and which aren't... thus
       | helping improve or reduce rankings of SEO-laden sites.
       | 
       | Wonder if someone can shed light on why this isn't effective.
        
         | [deleted]
        
       | pwdisswordfish9 wrote:
       | > Lots of people want to be amateur police. And boy would Google
       | find it hard to follow you down that road.
       | 
       | Kinda like they tried with YouTube Heroes?
       | 
       | But then, who's to say you won't get the same kind of backlash?
        
       | rickdeveloper wrote:
       | I think a lot of this is due to Google both owning search and the
       | ads on the websites (AdSense). There's an incentive for them to
       | prioritize click farms (and other sites filled with their ads). I
       | think in general there may be a correlation between the number of
       | ads on a site and its usefulness to me, which is inverse to its
       | usefulness to google.
       | 
       | I'm curious what would happen if those products were split up
       | into 2 separate companies.
        
         | thomasmarcelis wrote:
         | I also can't help but wonder this. I'm certain people at google
         | search want to provide the best quality search results and do
         | this with integrity. But at some point in the business
         | hierarchy you are at a level where people set objectives for
         | both these departments ( search & ads ) and are trying to
         | optimise for things like total revenue/profit.
        
         | tonyedgecombe wrote:
         | Yes. In fact if you wanted to cut out the SEO spam then
         | delisting anything with Adsense would probably be a good start
         | for a competitor.
        
       | canyonero wrote:
       | I've been troubled by the just plain awful results being
       | delivered by Google search over the last few years. I think these
       | are just plain hard problems to solve and that Google is not
       | incentivized to solve. Google wants you to click on ads at the
       | end of the day, full-stop.
       | 
       | Often times I find myself searching for "best
       | ($product|$thing_to_do)" which I think many other people do as
       | well because we all want the best. Other times I'm looking for a
       | music or a book recommendation with some depth. This of course
       | nearly always leads to SEOd trash. There is no relevance nor is
       | there trust. So I, like others, use keywords like "reddit" or
       | "forum" to get to real humans whom I trust and whose intentions
       | are not to sell via affiliate links.
       | 
       | These issues often lead to the need to find trust in real
       | human-centered recommendations that stem from real human
       | interests and needs. I've never found an algorithmic solution to
       | this problem. This is why I think college radio stations or those
       | south-of-the-dial end up being so, so much better. And why beer
       | recommendations from your local brew-shop owner are better than
       | anything you can find on the net.
       | 
       | I think building search verticals that are hand-curated would be
       | very interesting to see. But I also think we need to build more
       | communities which allow recommendations to be shared without an
       | incentive to get hits via search and aren't paid for by large
       | corporations and where community impact/quality _is_
       | incentivized. I do worry that those days may be gone and there
       | may just not be enough folks (not in tech) willing to spend
       | so much time online and contributing to niche communities. A lot
       | of folks spend much of their time in walled-gardens like
       | Facebook, Instagram or Twitter, so it'll be challenging, to be
       | sure.
        
         | allochthon wrote:
         | > I think building search verticals that are hand-curated would
         | be very interesting to see.
         | 
         | That was my inspiration behind a side project I made a few
         | years ago -- a decentralized, hand curated "search engine" [0].
         | Never got beyond the side project stage. But I see promise in
         | this in the future. Eventually we'll figure out that moderated
         | crowd-sourced curation is better than the best machine
         | learning. The filtering capabilities have to be pretty
         | sophisticated to make it work, though.
         | 
         | [0] https://github.com/emwalker/digraph
        
         | cblconfederate wrote:
         | > So I, like others, use keywords like "reddit" or "forum" to
         | get to real humans whom I trust and whose intentions are not to
         | sell via affiliate links.
         | 
         | And therein lies the problem. Reddit makes very little money.
         | Forums probably make negative money nowadays. Google has
         | decided to demonetize the organic internet and subsidizes SEO
         | crap and AMP or whatever dumb thing their signals consider
         | valuable. We get what we incentivize, and right now the
         | incentives in almost all of tech are pretty atrocious.
        
         | visarga wrote:
         | > Often times I find myself searching for "best
         | ($product|$thing_to_do)" which I think many other people do as
         | well because we all want the best.
         | 
         | I do too. I'm wondering why Google didn't invest some effort
         | into "best X" searches? I bet they could extract such
         | information from the web and correlate various sources. They
         | already answer all sorts of semantic knowledge questions.
        
         | ijidak wrote:
         | In some ways paid search disincentivizes Google from delivering
         | quality organic results.
         | 
         | The larger the gap between paid results vs organic results, the
         | more users click the paid results.
         | 
         | Not sure how to solve this problem.
        
           | topicseed wrote:
           | But paid results do not always, if ever, answer the search
           | query better in any shape or form.
           | 
           | So this would end up with displeased users and bounce backs.
        
       | mirekrusin wrote:
       | The whole thing seems "simple" to me - graph of identities with
       | url vetting/liking/approve-this-message-like actions, you don't
       | need anything else.
       | 
       | Reputation, non-fakeness etc. can be derived from it for anybody
       | - you just list identities you trust/follow (with weights?) and
       | anything you look at can be scored.
       | 
       | Virtual identities can also be created, ie. identity listing all
       | links mentioned on HN (with positive sentiment only?), links from
       | wikipedia etc. so people can follow those to create their reality
       | graphs.
       | 
       | The interesting part is that it doesn't claim universal truth -
       | depending on who you follow your results will be skewed towards
       | their opinion of the world. I.e. if you follow MIT, Wikipedia and
       | E. Musk you'll see a different view of the truth than somebody
       | following FOX News and Flat Earth Society for example.
       | 
       | It could be interesting to focus on "dislike" marking (only?) as
       | it may be much more lightweight to approach it from blacklisting
       | side.
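       | 
       | A rough sketch of the scoring, under the assumption that each
       | identity publishes +1 (approve) or -1 (dislike) votes on URLs and
       | each user keeps a weighted list of identities they follow. The
       | score_url name and the data layout are illustrative only:

```python
def score_url(url, follows, votes):
    """Score a URL from one user's point of view.

    follows: identity -> trust weight chosen by that user
    votes:   identity -> {url: +1 (approve) or -1 (dislike)}
    Identities the user does not follow contribute nothing.
    """
    total = 0.0
    for identity, weight in follows.items():
        total += weight * votes.get(identity, {}).get(url, 0)
    return total


# Hypothetical data: two identities disagree about the same page.
votes = {
    "source_a": {"https://page.example/1": 1, "https://page.example/2": -1},
    "source_b": {"https://page.example/2": 1},
}
alice = {"source_a": 1.0}
bob = {"source_b": 1.0}
```

       | With this data the same URL scores negatively for alice and
       | positively for bob, which is the "no universal truth" property:
       | the ranking depends entirely on whom you chose to follow.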
        
       | floatingatoll wrote:
       | He's just describing Webrings, except in a reactive tense
       | ("filter out spam sites") rather than a proactive tense
       | ("associate your site with other worthwhile sites"). Google's
       | ranking algorithm only works when someone is proactively
       | curating, and only SEO spammers do so these days. Reactive
       | curation is not a viable way to manage information.
       | 
       | The simplest way to compete with Google is to create a DIY
       | Webrings site that disallows harvesting of data by Google. Charge
       | curators to create a webring, and let curators select three
       | hashtags and a description that represent their list of fifty or
       | fewer sites. Use the revenue to pay a human to curate the list of
       | hashtags, and let users tip a webring curator in gratitude with
       | an Apple Pay button.
       | 
       | This is how to make a million dollars, Pinboard-style, out of the
       | ashes of the original curated Yahoo idea and the information
       | structures of hashtagging. It doesn't work if you allow free-for-
       | all infinite-sized lists, it doesn't work if you allow free-for-
       | all hashtags, but with clear limits and moderation of tags
       | (instead of webrings), it would thrive. By moderating tags, users
       | can keep the webring they paid for, and SEO rings will stick
       | out for having no shared network with any other rings, which
       | allows for easier detection and culling of malicious non-
       | participatory actors. Plus, with the curation networks in place,
       | it becomes possible to bubble up rings that have unusual content
       | for _positive_ human moderation activity.
       | 
       | I tried to find some good podcast lists yesterday and each site I
       | visited had a really interesting cross-section, but there were so
       | many duplicates. I wish the ring site existed, so that it could
       | remember what it had shown me already, and I could say "show me
       | rings that intersect with this podcast and have something new I
       | haven't seen before".
       | 
       | That's where the theory of pagerank and the practice of curation
       | and the capabilities of search align, and given that moderation
       | of hashtags scales very cheaply, it is a billion dollar
       | opportunity that Google and Amazon cannot compete with if handled
       | properly.
       | It's not about trying to get a cut of every visit's revenue
       | potential. It's about giving human beings a directory that
       | respects their time and remembers what they've seen.
        
         | jart wrote:
         | If I understand correctly, you're saying you'd create an
         | exclusive webring, but the rule to joining is that you have to
         | Disallow: Google, Bing, etc. in your robots.txt file. That
         | sounds outrageous, but speaking as a content creator, I
         | wouldn't be giving up much. My blog gets about 3% of its
         | traffic from search engines. I have no idea who these visitors
         | are or what they searched for, since browsers no longer send
         | referral links. If a webring offered me the benefit of a positive,
         | regularly engaged community, then having my blog part ways with
         | search engines would be a no-brainer. That is after all the
         | Facebook model, except tailored for the open web. Believe me
         | when I say that we bloggers are waiting to be rescued.
        
           | floatingatoll wrote:
           | No, you can join the webring and still be indexed by Google,
           | but the sitelist of the webring cannot be. That's all. It
           | prevents Google from leeching off the human curation and not
           | paying a fair market value for it. Given that Google makes
           | billions of dollars a year on pagerank data, webring curators
           | got screwed over pretty hard already once by Google twenty
           | years ago, so no reason to allow it again.
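           | 
           | As an illustration only: a robots.txt for the ring site that
           | keeps Googlebot away from the sitelists while leaving the
           | rest open, checked with Python's standard urllib.robotparser.
           | The /rings/ path and the webring.example host are made up,
           | and robots.txt is advisory, so this only works for crawlers
           | that honor it:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the webring site itself, not for member sites.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /rings/

User-agent: *
Allow: /
"""


def can_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

           | Member sites keep their own robots.txt untouched, so they
           | stay indexed as before.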
        
             | jart wrote:
             | So the webring would use rel="nofollow" hyperlinks? Blog
             | authors might see that as an insult.
        
       | anyfactor wrote:
       | I have seen several Google alternative search engine projects
       | being posted on HN every other week. You have your privacy
       | focused open source google alternative search engine for "insert
       | niche here" with big hopes of disruption.
       | 
       | I will give you my two cents. I have used duckduckgo, bing, searx
       | etc. for extended periods of time and hated every one of those
       | things. The problem is that what you search seems to be
       | essentially the gateway to the wild west of the internet. I
       | understand the proposition of spam control in search engines, but
       | at least to me, I think the early days of Google without DMCA and
       | copyright bans made Google the best.
       | 
       | I fear SEO spam control will only bring the worst of the
       | moderated internet. It will not be the first time big tech tried
       | to douse a gasoline fire with more gasoline because they thought
       | the more fire meant the previous fire will get suffocated by the
       | lack of oxygen(?). Rather than using "AI" as a crutch to solve
       | SEO as a problem, I want to see an option that is true to 2005
       | era google.
        
       | AtNightWeCode wrote:
       | I think it will be hard to create a great search engine while
       | the web works as it does today. Maybe there could be like a
       | sitemap but for text content that has the content structured,
       | indexed, and signed by a trusted party in a way that makes it
       | easy to analyze for plagiarism and so on.
        
       | mrlanderson69 wrote:
       | If anyone has the skills to work on something like this please
       | email me (email address in my profile.)
       | 
       | I can show you a demo. Just to show I am not screwing around: if
       | you don't like the demo I will pay you $500.
        
       | freediver wrote:
       | Google's job is to serve its customers, and it does that really,
       | really well.
       | 
       | The problems being discussed today (and yesterday in the similar
       | thread) come from the fact that for Google user != customer.
       | 
       | When you have incentives that are misaligned like this, you can
       | only go so far! We seem to have reached that point with Google,
       | where there is not much more that can be done on the search
       | experience front without jeopardizing customer experience (ad
       | revenue).
       | 
       | Disclosure: I'm working on a paid search engine to solve this
       | problem on a fundamental level, by aligning the incentives and
       | making user also the customer so we can best serve them and their
       | needs. It is called Kagi and is currently in closed beta
       | accepting beta-testers.
       | 
       | https://kagi.com
        
       | krono wrote:
       | Just a minute ago, I made a small typo in a non-obscure
       | programming-related search term.
       | 
       |     Showing results for searchterm
       |     No results found for searchterm
       | 
       | Followed by an unending list of random celebrities I don't know
       | nor care about, businesses I've never been to that sell items I
       | have absolutely no use for, and random foreign news articles.
       | 
       | Failure to recognise the typo is unexpected but forgivable. But
       | then, rather than helping me with my search, they attempt to
       | distract and lead me away from it - using triggers that you'd
       | think they should have known wouldn't work.
       | 
       | I really don't understand how this is even possible, and it's not
       | a rare occurrence.
        
       | mitchtbaum wrote:
       | How will these search engines interoperate?
        
       | new_here wrote:
       | > _Maybe ultimately you open up spam fighting to your users. If
       | you managed this well, you could harness a lot of energy._
       | 
       | Doesn't Google already consider that if a user returns to the
       | results page (or clicks a second link) then the first link
       | visited was not satisfactory? Seems like a pretty elegant
       | solution.
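       | 
       | A small sketch of that signal, assuming a click log where each
       | session is the ordered list of results a user clicked for one
       | query. Whether Google actually computes anything like this is
       | not something this thread establishes; return_rates is a
       | made-up name:

```python
from collections import defaultdict


def return_rates(sessions):
    """Estimate how often a click on each URL was followed by another click.

    sessions: list of click sequences, each a list of URLs in click order
    for one query. A click counts as a "return" when the user went on to
    click a different result afterwards in the same session.
    """
    clicks = defaultdict(int)
    returns = defaultdict(int)
    for clicked_urls in sessions:
        last = len(clicked_urls) - 1
        for position, url in enumerate(clicked_urls):
            clicks[url] += 1
            if position < last:
                returns[url] += 1  # user came back and tried something else
    return {url: returns[url] / clicks[url] for url in clicks}
```

       | A high return rate hints that the result did not satisfy the
       | query, though a user comparing several good pages would look
       | the same, so it is at best a noisy signal.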
        
         | tonyedgecombe wrote:
         | That's to Google's benefit though, they get another chance to
         | present some adverts to the user.
        
       | EamonnMR wrote:
       | Being able to flag Fandom, Quora, and Pinterest results would
       | bring me great joy.
        
       | cassianoleal wrote:
       | I'm not very well versed in SEO but isn't this just good old
       | Goodhart's Law?
       | 
       | Come up with criteria to determine which websites are "better
       | quality". Measure them, rank them, put the ones that fit the
       | criteria best at the top.
       | 
       | On the other side, there are the people promoting their websites.
       | Do what you can to get as close to Google's ideal as possible
       | through whatever means. Profit.
       | 
       | At this point the criteria become useless for any real quality
       | analysis.
        
       | pkamb wrote:
       | A search engine that only indexed Reddit, Stack Exchange,
       | Wikipedia, and a small number of other "good" sites would get 80%
       | of the way there.
       | 
       | No, DDG bang operators don't let you do this. I want an SERP, not
       | a shortcut to a single site's on-site search.
        
         | marban wrote:
         | For business news, I do this with https://yup.is
        
       | Jenk wrote:
       | > What would a paid version of Google Search results look like -
       | where Google can just try to give me the best possible results
       | and not be worried about generating revenue?
       | 
       | God please no. YouTube premium shows what Google would do, i.e.,
       | they would further ruin the free experience by ramping up the
       | amount of ads you see to "incentivize" the premium search.
        
         | judge2020 wrote:
         | Premium offerings like that are amazing simply for the fact
         | that you can 'return' to the days where you obtained services
         | by paying for them directly, not by looking at ads and paying
         | with your mindshare. Google and YT aren't free services, and
         | it's a miracle they continue to be accessible with ad blockers
         | enabled.
        
           | Jenk wrote:
           | Orthogonal to my point. Deliberately worsening the free
           | experience after you introduced a premium service is a dark
           | pattern for UX.
        
       | netcan wrote:
       | I'm almost certain that pg gave " _compete with Google by
       | competing in some niche_ " advice 10+ years ago.
       | 
       | In any case, I'm not sure that competing in search is a very
       | attractive notion. AdWords is the only meaningfully profitable
       | search and business. Even if you steal 10% of Google's market,
       | that absolutely doesn't translate into 10% of the revenue.
       | 
       | That said, recipes. Someone make a search engine where the top
       | results don't start with 500 words on the history & etymology of
       | butter, because that's what Google wants.
        
         | colordrops wrote:
         | It's usually more than 500 words lol
        
       ___________________________________________________________________
       (page generated 2022-01-03 23:00 UTC)