[HN Gopher] Search engines and SEO spam
___________________________________________________________________
Search engines and SEO spam
Author : iamjbn
Score : 438 points
Date : 2022-01-03 16:08 UTC (6 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| NmAmDa wrote:
| One example: the website called gitmemory, which crawls github
| data regularly and has better SEO than github, so you will
| usually find its results above the original github links.
| PaulHoule wrote:
| I have wondered about this.
|
| When I run web sites I frequently look at the log and find a
| large fraction of the traffic is from search engines. This is a
| problem because it costs me money to serve that traffic. It might
| not be initially obvious but it costs more than serving real
| users because the search engines will scan everything and break
| the cache.
|
| Google sends a significant amount of traffic. Bing sends a
| detectable amount of traffic. Baidu's crawler might be more
| active than the two of those together but I never get hits from
| Baidu. Other crawlers deliver me trouble instead of value: even
| if I'm not interested in hosting pirate or plagiarized content, a
| crawler that is looking for trouble is only going to bring me
| trouble.
|
| I hate doing it but I turn off crawlers other than Google and
| Bing both at the robots.txt and web server level because I just
| can't afford to serve Baidu queries.
|
| I'd like to sign an exclusivity contract with a search engine
| such that they get exclusive access to crawl it and in turn I get
| a privileged position in search results. This would give the
| search engine and myself an incentive to deliver end-to-end
| quality results.
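The robots.txt half of the blocking described above can be sketched and checked with Python's standard `urllib.robotparser`. The policy text below is an assumption about what such a file might look like, not the commenter's actual configuration; `Googlebot`, `Bingbot`, and `Baiduspider` are the user-agent tokens those crawlers use, and `example.com` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: allow only Google and Bing, refuse everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /
"""


def allowed_crawlers(robots_txt, agents, url="https://example.com/page"):
    """Return {agent: bool} saying whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    crawlers = ["Googlebot", "Bingbot", "Baiduspider"]
    for agent, ok in allowed_crawlers(ROBOTS_TXT, crawlers).items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

robots.txt is only advisory, which is presumably why the comment also mentions blocking at the web server level: a crawler that ignores the file has to be refused there.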
| snth wrote:
| Several people mention DuckDuckGo in that Twitter thread. I use
| DuckDuckGo for my main search engine, and it's not obviously any
| better than Google regarding SEO spam.
| Kiro wrote:
| I don't think they apply much spam fighting to the results they
| get from the underlying search index (Bing), but I could be
| wrong.
| ttiurani wrote:
| Is there a search engine for programming? One that not only
| searches stackoverflow, github, relevant subreddits and the other
| big sites, but also finds programming articles in personal blogs?
|
| That would be valuable to me.
| anovikov wrote:
| Sadly the only way of fixing it is making search results
| unattractive for cracking. Ranking of a page in search results is
| a metric, and every metric is a hackable metric. Only way they
| won't be hacked is if there's no incentive to.
|
| Sure a search engine that specialises on narrow area of knowledge
| without much money in it, can be very relevant and bullshit-free.
|
| But there's no way to make it work for the general web search.
| People hack things. If they didn't we'd have Communism built by
| now (yes the "good" - classless, stateless one).
| deltarholamda wrote:
| The quote-Tweeted thread mentioned recipes as one of the things
| that has been SEObliterated. It's a great example of the problem,
| and also a great example of the problems any solution will
| encounter.
|
| Recipes have become a bellwether Internet problem. In the past,
| your great-grandmother had a card file with a bunch of 3x5 index
| cards with the ingredients and instructions on how to make
| everything, and they pretty much all fit on one side. There was a
| great deal of domain knowledge required (e.g. "whip to stiff
| peaks"), but these things reveled in their terseness.
|
| Internet recipes all begin with 9 paragraphs of the author's
| first time encountering the dish in a Moroccan bazaar in 1997, and
| the life story of the chef. There are two embedded 10-minute
| videos of the lifecycle of the vanilla bean. And then you get to
| the ingredient list. Then two more 10-minute videos, then
| instructions.
|
| The drive to make recipes full-contact Internet content has
| changed what it means to be a recipe. This is similar to how
| cooking shows evolved from Julia Child working on a sound stage
| to a carnival barker presentation with vivid personalities
| dominating the scene.
|
| I'm not sure there is any technological solution to a problem
| that has fundamentally changed what it means to be a recipe,
| short of establishing a new informational silo in the form of a
| new Web site devoted to recipes only. You could encourage an RSS-
| like format for recipes, but that requires buy-in from places
| that profit from the new evolution. This new status quo may be
| good or bad--you can make the argument either way--but it is what
| it is. A cultural change is required more than tweaking
| algorithms.
|
| (Unless tweaking algorithms can be foundational to cultural
| change, in which case we really, really, really need to take a
| hard look at the corporate behemoths and their algorithms, and
| sooner better than later.)
| max49 wrote:
| >I'm not sure there is any technological solution
|
| The technological solution would be to stop rewarding them for
| these monstrosities. One of the main motivators for turning a
| short recipe into a 19-page essay about the chef's life is
| that more words = better ranking.
| notreallyserio wrote:
| And funny enough, it's obvious that Google's engineers know
| this because they're adding more small, self-hosted featured
| results to the top of the page all the time.
| Nextgrid wrote:
| And the end game is that ads pay for it. Nuke ads or downrank
| them and the incentive goes away, making space for enthusiast
| non-profit-driven websites.
| deltarholamda wrote:
| But the technology is solely controlled by a single
| multinational advertising corporation. The motivations are
| controlled by that same advertising corporation.
|
| Which is the same as there being no technological solutions.
| foobarian wrote:
| The recipe problem is mostly because actual recipes are not
| copyrightable. See e.g.
| https://www.plagiarismtoday.com/2015/03/24/recipes-copyright...
| leobg wrote:
| Neither are ideas. Hence article spinning and book summaries.
|
| We need a semantic dupe filter: If it doesn't add new facts
| or new ideas, treat it as an identical copy.
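A much weaker, purely lexical cousin of the "dupe filter" idea above can be sketched with word shingles and Jaccard similarity. This is an illustration under stated assumptions, not the semantic filter the comment asks for: it only catches copies and lightly spun text that share most of their wording, and the function names, the shingle size `k=3`, and the `0.5` threshold are all arbitrary choices made here. Detecting that two differently worded texts carry the same facts would need something beyond this.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, threshold=0.5):
    """Treat two documents as copies when their shingle sets overlap enough."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold
```

For example, two recipe sentences that differ in a single word score well above the threshold, while an unrelated sentence shares no shingles and scores zero.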
| lifeisstillgood wrote:
| Google is not important because it has all the information - it's
| important because it has hardly any.
|
| A major complaint is that there used to be good free reviews of
| commercial products that could be easily found.
|
| That is not "all the information". Information about the current
| round of commercially advertised products is something like 5-10%
| of all commerce (or less).
|
| And we are entering a world where "all the information" is what
| we do all day, what we say, how we react to different stimuli.
|
| Those are the real review sites - why do people take this train and
| not that, why is that park safe and this one full of muggings.
|
| We need to solve the Google problem not because we want blogging
| like it's 2009 but because epidemiology is about to open
| humanity's eyes. And it's going to hurt if we don't make it free
| and open.
| [deleted]
| streamofdigits wrote:
| Eventually search will become a decentralized activity (No, not a
| web3/crypto/coin type decentralization, I am talking about the
| useful type).
|
| Is there any particular reason why internet search has to have a
| distorting gatekeeper to the global commons (that pretends
| playing Maxwell's demon). For chrissake, the stuff being indexed
| is _public_.
| imranhou wrote:
| Not all that Google indexes is public. Paywalled and login-walled
| sites allow Google's IPs to crawl their content unhindered, but
| ordinary users cannot see it, so a new search engine will have to
| become big and popular enough for those sites to open up to it.
| Quick example: Google can index many news sites and LinkedIn
| profiles that a regular user with no account cannot view.
| streamofdigits wrote:
| That's true, but probably something that can be tackled later
| and in any case it would not be a show-stopper for creating a
| valuable alternative (There are similar thorny related issues
| around IP e.g. for news sites)
| mrkramer wrote:
| >Eventually search will become a decentralized activity (No,
| not a web3/crypto/coin type decentralization, I am talking
| about the useful type).
|
| People care about UX, not about technology. Remember that, unless
| people are willing to sacrifice good UX in order to have
| greater security and privacy. These things are tricky and there
| is no right formula.
| tester756 wrote:
| How about Bing?
|
| Is it viable competition?
| anfilt wrote:
| Another thing that does not help is how some sites gate content
| from being scraped. Also, forums are not as popular today, again
| reducing the amount of indexable content. Think about how some
| sites have migrated from forums to something like Discord.
| bretpiatt wrote:
| This is already happening for a bunch of verticals:
|
| Travel - Expedia, Hotels.com, Kayak, etc.
|
| Consumer Goods - Amazon, WalMart, EBay, Etsy, etc.
|
| Automobile Purchase - Cars.com, Autotrader, etc.
|
| Career/Job - Indeed, LinkedIn, etc.
|
| As Google continues to lose search volume on these big revenue
| categories it is going to make spam much more difficult as they
| are working to sort out long tail spam. Way harder.
| jeffbee wrote:
| Wow so that's completely opposite of my feelings on this topic.
| I would never, ever use Expedia for travel search over Google
| flights/hotels. Google travel _is_ the meta-search engine for
| this vertical. Expedia etc. are all-in on spam and scams,
| trying at every opportunity to take an extra dollar from you.
|
| The same with Amazon. You'd _think_ that with all the armchair
| search quality experts cropping up lately there might be more
| vocal complaints about the fact that Amazon's own search can't
| find basic consumer products sold by Amazon itself. If I want
| to find stuff on Amazon, I search Google for it.
| bradyo wrote:
| Yeah the comments in this thread are baffling (or they didn't
| read the 100 character tweet lol). The tweet is just describing
| domain specific database-like websites. Have people not heard
| of allrecipes.com? Yummly? Or one of the other thousands of
| recipe db sites? No blogspam, just structured recipe search.
| You can even search by ingredient!
|
| Mayo clinic, Harvard health, and PubMed do a great job with
| health info. IMDb for movies, Goodreads for books, *gearlab.com
| for reviews, booking.com for accommodations.
|
| I think the biggest threat to Google isn't a better general
| search engine, it's user behavior switching to more domain-
| specific websites as the top of the funnel. E.g. people going
| directly to Amazon to search for products instead of first
| searching Google.
|
| To some extent, Google has figured this out, which is why they
| now have a dedicated flight search, hotel search, product
| search (Google shopping still exists and it's pretty good!),
| etc.
| cpeterso wrote:
| Amazon's search results and scammy third party sellers are a
| similar trust problem. When possible I try to purchase directly
| from the product manufacturer's website. Similarly, I don't
| search Google for product reviews, I go directly to trustworthy
| review websites.
| throwaway14356 wrote:
| ah, so everyone wanted to move from carefully crafted personal
| websites where every detail counts and low effort publications
| are harshly punished to platforms with guaranteed readership and
| now we have a curation problem?
|
| Someone (who probably doesn't have a website) said that comment
| moderation on your own website is too much work. Perhaps the whole
| internet is too much work?
|
| But i like the spam search engine by and for spammers as a way of
| finding the latest and greatest affiliate marketing and
| blockchain swindle.
| leoc wrote:
| https://twitter.com/mwseibel/status/1477707884632834049
|
| > I'm pretty sure the engineers responsible for Google Search
| aren't happy about the quality of results either. I'm wondering
| if this isn't really a tech problem but the influence of some
| suit responsible for quarterly ad revenue increases.
|
| Please no more of this. Two men, Page and Brin, together have
| basically unfettered control over Google.* If Google does
| something bad then, unless it's genuinely something small enough
| that those two could not be expected to hear about it, it's
| happening with--at the very least--their acquiescence. And low
| overall search quality is not something that some "suit" is
| successfully hiding from Good Czar Larry. They could fire the
| "suit", or command him or her to make other decisions. This is--
| again, at the very least--something that they have chosen not to
| do. The responsibility lies with them.
|
| * There _is_ the risk of lawsuits from the minority shareholders,
| I assume. But IIUC this is not realistically that big a restraint
| on what shareholders with a majority of votes can do. However
| IANAL.
| wslh wrote:
| In 2013 I elaborated about this topic:
| http://blog.databigbang.com/letters-from-the-future-challeng... I
| would add that in 2021 we can easily do Natural Language
| Understanding (NLU) and Natural Language Generation (NLG) and can
| build zillions of web pages that don't follow the original page
| ranking concept of Google. Probably important sites share less
| low rank pages and there are many more link rings and clusters.
| More decentralized blogs seem a thing of the past (expecting to
| be rebooted in the future).
| ape4 wrote:
| One approach would be to have moderators from the community who
| are allowed to make decisions about results.
| cblconfederate wrote:
| Good. In fact, if we want people to visit websites other than
| google.com (and then read the answer in the snippet or the box in
| the sidebar) then it's good that google results are crap. Use
| google less.
| beefield wrote:
| Okay, given that we have pretty successful examples of wikipedia
| as a general crowdsourced information storage and stackoverflow
| as a specialized domain crowdsourced Q&A site, would it be
| impossible to build a crowdsourced search engine? Not even
| scraping the web, but I would just type my search term, if that
| is already searched and results voted, I would see those. If it
| was a completely new search term, I would get no immediate
| results, but my search would be displayed in "new searches page",
| which some voluntary people would be following and trying to add
| relevant results.
| usrusr wrote:
| "You might need to do a lot of manual spam fighting initially"
|
| How would this be limited to "initially"? Wouldn't it be a lot,
| initially, and then only get worse?
| thr0wawayf00 wrote:
| I'm honestly not trying to take a potshot against PG or YC here,
| but it's kinda funny to see him saying this after I worked for a
| YC-backed startup years ago that built its core revenue streams
| around generating SEO spam, we just marketed it as something
| else. Just to be clear, I don't think PG or YC are responsible
| for all or even most SEO spam, but I know firsthand that they've
| profited from it through at least one of their incubated
| companies.
|
| I never considered the possibility that an incubator would
| support a specific product, then later on call for alternatives
| that would essentially freeze out the original product that they
| supported. I'm sure this very rarely happens, but it's
| interesting to see a real-world example in action.
| vlovich123 wrote:
| Don't know how rarely it happens. After all, weapons dealers
| frequently arm both sides of a conflict. They don't really need
| to care who wins - they're just making twice the amount of
| money.
| awillen wrote:
| I don't think it's as counterintuitive as it sounds - just
| because they're playing the game doesn't mean that they think
| it's right. If your options are to not be successful at SEO or
| to do the SEO spam thing, I don't think it's necessarily wrong
| to do the latter - it's your job to make your startup
| successful, not to make a stand against the way Google does
| things.
|
| I view it as something like the rich folks who call for
| additional taxation of the rich. They're not going to just pay
| extra money that they don't have to under the current tax
| rules, both because it's not particularly fair and because one
| person paying extra taxes, even if they're very wealthy, isn't
| going to make a big impact. That doesn't mean they can't lobby
| to change the rules and be totally fine with it if everyone is
| paying additional taxes.
| jeffbee wrote:
| All of the amateur search quality experts forget to mention the
| regulatory environment. Obviously, Google could nuke Pinterest
| from orbit, dramatically improving image search results. Clearly,
| Google could effectively take down Statista, technically. But
| various Eurocracies have shown an extreme willingness to take the
| side of Yelp, Pinterest, and whatever other spam/scam mills are
| able to form a shadow alliance with Microsoft's astroturf
| campaigns like "fairsearch" and whatever.
| wakiza33 wrote:
| big part of it. and the scrutiny is only going to get more
| intense
| notreallyserio wrote:
| If this is a real concern they can side step it by simply
| allowing users to block specific domains, like they have in the
| past.
| coffeeroach wrote:
| Covzire wrote:
| Could this issue be related to Gmail's spam filtering? For
| approximately 2 years now it's been downright porous, I'm getting
| on average 1 obvious spam message in my inbox that is something
| like:
|
| c0nGrats-You_HaVe_Won_ThE_Pr1ze!
|
| ..Or some silly variation of this that takes literally 0.1 ms for
| a human to discern that it's spam. Yet something happened to
| Gmail's spam algorithm in the last couple years that has been
| consistently letting these through. To be fair, it does catch
| most spam but it's only batting something like 75% and the spam
| it does catch is often times much less obvious to human eyes than
| the stuff it lets through.
| igammarays wrote:
| Interestingly, Google Maps doesn't suffer as much from the issues
| with Google Search. Maybe because it has those community-driven
| curation features that PG is talking about? Google Maps is
| fantastic at finding places to go to (and getting you there).
|
| Also, why hasn't Apple built a search engine yet? It baffles me
| that they chose to go head-to-head with Google on Maps, yet
| outsourced their search engine. I would've liked it the other way
| around: Google Maps and Apple Search.
| nostromo wrote:
| Apple makes 10+ billion a year from Google by setting it as the
| default search engine on iPhone.
|
| I agree they should make their own search engine. But currently
| they're being paid a ton of money not to.
| techdragon wrote:
| It's much harder to get SEO spam style content onto maps given
| the geographic region limits involved... But it happens. My
| favourite example is searching for suppliers of something
| basic, say structural aluminium extrusions: big and heavy, and
| you ideally don't want to ship it far, so it's an ideal thing to
| search for a local supplier of. In Australian results it's
| basically a given that I'll get results for my city because
| they will list it as a delivery area they supply to. However,
| when you actually try to find them on the map as a pin, nope:
| it's either not there or is just a sales office, not a warehouse
| or workshop. So they have tricked the system into listing them
| as local to my area in a way that pollutes my maps search
| results.
|
| But this only works for certain industries. It's much less
| common to see this kind of tactic if you're searching for, say, a
| coffee shop, because the sheer number of local results lets
| Google be "hyper local" with these kinds of results.
| Nextgrid wrote:
| Google Maps is spammed by fake locksmith or other trades that
| make it look like they're local but all route to the same
| boiler room (probably right next to the tech support or IRS
| scammers) from where they dispatch a crooked & most likely
| unlicensed tradesman that will do a poor job and overcharge you
| (destroying the lock so they can sell you an overpriced
| replacement instead of picking it, etc).
|
| For licensed trades the solution is to go to your official
| trade licensing body (for the UK it's the NICEIC for
| electricians and the Gas Safe Register for gas/HVAC
| technicians), for unlicensed ones it's more difficult. There
| are "review" sites that claim to provide good results but their
| business incentives & vulnerability to spam/fake reviews are
| unknown.
| ben7799 wrote:
| It'd be really interesting if Google allowed upvote/downvote on
| search results... but it'd be super hard to imagine them ever
| taking the votes into account much versus ad revenue.
|
| And the upvote/downvote would be very tricky to implement in a
| way that the SEO crowd couldn't just game it horribly.
| jeffbee wrote:
| Clicking a result is essentially an upvote.
|
| Immediately returning to the results page is essentially a
| downvote.
|
| You can't really crowdsource this stuff, because the problem of
| brigading and other forms of abuse is way too high. Just
| imagine what the crowdsourced results for "trump won" or "trump
| lost" would look like, or hydroxychloroquine, or ivermectin, or
| to go with some older cults of personality, Hitler or Ataturk.
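The click-as-upvote, quick-return-as-downvote idea above can be sketched as a small aggregation over a click log. Everything here is an assumption made for illustration: the log format, the function name, and the 10-second bounce cutoff are hypothetical, and the thread does not establish how any real search engine actually weights such signals.

```python
from collections import defaultdict

# Assumed cutoff: coming back to the results page faster than this is a bounce.
BOUNCE_SECONDS = 10.0


def implicit_votes(click_log, bounce_seconds=BOUNCE_SECONDS):
    """Aggregate (url, dwell_seconds) click events into a net score per URL.

    dwell_seconds is the time before the user returned to the results page,
    or None if they never returned.
    """
    scores = defaultdict(int)
    for url, dwell in click_log:
        if dwell is not None and dwell < bounce_seconds:
            scores[url] -= 1  # quick return: implicit downvote
        else:
            scores[url] += 1  # stayed, or never came back: implicit upvote
    return dict(scores)
```

Signals like these are collected passively rather than solicited, but automated clicks could still skew them, so the brigading concern in the comment does not disappear.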
| 1024core wrote:
| ... and the moment you gain some traction, the SEO monster will
| train its eye on you like Sauron; and without a billion dollar
| budget, you will be toast.
| nabla9 wrote:
| When the search engine is funded by ads, there is incentive to
| produce results that people who click ads like.
| Veen wrote:
| > Why not try writing a search engine specifically for some
| category dominated by SEO spam?
|
| Back in the olden days, there were lots of organizations that
| collated high quality content from the best writers. They
| nurtured expert writers and paid them well. They fact-checked the
| content and employed diligent editors and proofreaders so it was
| accurate and well-written. Over the years, they'd build a
| reputation for reliability and trustworthiness that kept people
| coming back for more. If you wanted to learn about fitness, or
| cars, or cooking, or science, you'd find a reputable author and
| publisher and buy their magazines or books.
|
| But then, in the early 2000s, the geniuses from SV "disrupted"
| the publishing industry and its financial model. They brought us
| a much better way to find content, the search engine. Because
| they were so much better than the old-fashioned publishers,
| search engines gobbled up the advertising money and became the
| dominant gateway to content. Publishers had to abandon expensive
| high-quality writing because rankings and eyeballs now mattered
| more than quality and trustworthiness. Instead of investing in
| writers, they invested in marketers and SEO specialists.
|
| The result: worthless content, writers banging out garbage for
| peanuts, and useless search engines.
|
| Two decades later, looking at the barren wasteland they had
| created, the SV geniuses thought: I know what we need, more
| search engines, but smaller ones that collate high-quality
| content from the best writers. There must be money in that,
| right?
| mistermann wrote:
| Something else that has largely disappeared is that there used
| to be a fair amount of organization of content, whereas now a
| lot of content is just thrown into a big pile and the user is
| left to go fishing on their own with search engines, whose
| ability to search seems to be declining (ie: Google often seems
| to no longer support mandatory include/exclude search
| parameters). Generally speaking, the result seems to be
| decreasing order and increasing chaos.
|
| Of course, the massive volume of content creates a fundamental
| problem, but user curation & categorization on sites like
| Youtube would be possible, were Google to provide the software
| support so people could do that. Whether this and similar
| decisions are deliberate or accidental is likely one of those
| things that we will never know.
| allochthon wrote:
| > Google often seems to no longer support mandatory
| include/exclude search parameters
|
| I've noticed this, and it's frustrating. I have assumed it's
| intentional. I am left to guess as to what a change in this
| behavior would accomplish.
| Volker_W wrote:
| What does SV stand for?
| smk_ wrote:
| Silicon Valley
| [deleted]
| allochthon wrote:
| Silicon Valley, i.e., the California tech scene.
| salt-thrower wrote:
| It's been really sad to grow up and watch the cool techie
| optimism of the 2000's internet get sucked dry by profit
| motives and left to rot. The change has occurred pretty much
| entirely within my adult lifetime (I'm only 27 and I still
| remember when Google was the cool new thing on the internet).
|
| It went from "search engines and the web will usher in a new
| era of wisdom and democracy" to "useful content is dying at the
| hands of monetization schemes, and also the internet will be
| the death of liberal democracy, woe unto us all" in about 15
| years.
| renewiltord wrote:
| Those guys covered literally nothing compared to what I can get
| recommendations for with "product type Reddit". No thanks.
| Veen wrote:
| You may not be aware, but the written word can be used for
| more than product reviews.
| renewiltord wrote:
| Oh, they were just usually wrong on everything else.
| Fortunately, these days we have individuals debunking the
| nonsense. Back then, people just uncritically believed
| total horseshit.
|
| The invariant has always been: find people who make
| falsifiable predictions and improve. Back then the pool was
| small and you had no choice. Now, fortunately we have a
| choice.
| GarlicToum wrote:
| I tried creating a search engine for recipes. It works well and
| people like it, but the struggle is no one remembers that it
| exists and Google is just their default for search.
|
| So from an individual developer perspective, it's very hard to
| get people to change their habits. And Google/duck/Bing is the
| one stop shop.
|
| It's still out there, but I haven't worked on it much lately. I
| always think that if I had some good advertisers, a better UI,
| and a salary coming in, maybe it could take over some of
| Google's usage!
| Volker_W wrote:
| > I tried creating a search engine for recipes. It works well
| and people like it, but the struggle is no one remembers that
| it exists and Google is just their default for search.
|
| Link please
| GarlicToum wrote:
| https://garlictoum.com/
| imilk wrote:
| Always nice to see other sites using Svelte!
| kevingrahl wrote:
| Just an idea but what about making it easier for folks to
| remember to use your search somehow!?
|
| I like and use Duck Duck Go's !bangs [1] all the time,
| maybe try to add your site with a rememberable name.. may
| I suggest !garlic ?
|
| [1] - https://duckduckgo.com/bang [2] -
| https://duckduckgo.com/newbang
| GarlicToum wrote:
| Just submitted it, let's see what happens!
| citizenkeen wrote:
| (1) What is it?
|
| (2) Do you have a bang on DuckDuckGo? I'm pretty aggressive
| with bangs, and I suspect a lot of DDG users end up being
| aggressive with them as well.
| GarlicToum wrote:
| Linked above. I didn't know you could just submit a random
| site to DDG to be included in bangs
| [deleted]
| dehrmann wrote:
| I think you're unfairly putting blame on Silicon Valley.
| Publishers were only able to produce high-quality content
| because, with no conversion metrics, advertisers were willing
| to overpay for placement. Tech undermined publishers' revenue,
| but what it revealed was that people don't actually want high-
| quality journalism, they want entertainment, and they're
| definitely not going to pay a premium for it. This was hidden
| behind publishers' business model.
| nemothekid wrote:
| > _Publishers were only able to produce high-quality content
| because, with no conversion metrics, advertisers were willing
| to overpay for placement._
|
| This implies that big budget advertisers (the CPGs, like Coke
| and P&G), are buying Google/FB because they have better
| conversion metrics. That isn't true today; only SMBs and
| gaming companies care about conversion metrics. There are
| interns in LA/NY probably collectively spending millions on
| FB for P&G and only reporting the number of likes back to
| their bosses. Google and FB have never meaningfully delivered
| on conversions past anything like app downloads.
|
| Tech undermined publishers' revenue because the internet
| cratered distribution costs. Advertising revenues for big
| media crashed because the eyeballs moved away, not because it
| was any less efficient.
| dehrmann wrote:
| > the CPGs, like Coke and P&G
|
| Did any of these heavily buy newspaper ads before 2000?
| Definitely TV, possibly magazine, but newspaper? I just
| don't remember seeing ads for Tide in newspapers.
| jayd16 wrote:
| I guess there's also the rise of influencers in the mix here.
| The commoditization of publishing means content creators can
| more easily work independently.
| blunte wrote:
| Google search results are garbage, at least from a developer's
| perspective.
|
| Most of the results are poorly formatted content "gathered" from
| stackoverflow, github, quora, etc.
|
| And from a "person who wants to see an image" perspective, Google
| is purely a gateway to Pinterest or Gettyimages.
| liveoneggs wrote:
| do google search engineers use ad blockers?
| noduerme wrote:
| Hey, smart people: It's called _CURATION BY HUMANS_.
| bombcar wrote:
| Just give users the ability to blacklist domains when searching;
| pretty soon you'll have a decent list of what users consider
| worthless.
|
| And Pinterest would die.
| krono wrote:
| uBlock Origin static filters to the rescue!
|
| Block results from specific domains on Google or DDG:
| google.*##.g:has(a[href*="thetopsites.com"])
| duckduckgo.*##.results > div:has(a[href*="thetopsites.com"])
|
| And it's even possible to target element content with regex
| with the `:has-text(/regex/)` selector.
| google.*##.g:has(*:has-text(/bye topic of noninterest/i))
| duckduckgo.*##.results > div:has(*:has-text(/bye topic of noninterest/i))
|
| Bonus content: Ever tried getting rid of Medium's obnoxious
| cookie notification? Just nuke it from orbit:
| *##body>div:has(div:has-text(/To make Medium work.*Privacy Policy.*Cookie Policy/i))
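For anyone keeping a longer personal blocklist, the two domain-blocking filters above can be generated rather than typed by hand. This is a minimal sketch: the selectors `.g` and `.results > div` are copied from the comment and depend on each site's page markup at the time, so they may need updating, and `block_filters` is a hypothetical helper name.

```python
def block_filters(domains):
    """Build uBlock Origin cosmetic filters hiding results that link to each domain."""
    # Filter templates taken from the comment above; {domain} is substituted.
    templates = (
        'google.*##.g:has(a[href*="{domain}"])',
        'duckduckgo.*##.results > div:has(a[href*="{domain}"])',
    )
    return [t.format(domain=d) for d in domains for t in templates]


if __name__ == "__main__":
    # Paste the printed lines into uBlock Origin's "My filters" pane.
    for line in block_filters(["thetopsites.com"]):
        print(line)
```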
| li2uR3ce wrote:
| Google used to have such a feature.
|
| It would be nice if Google would ask you the simple question:
| "Did you find what you're looking for?" Instead they rely on
| the assumption that users only stop looking when they've found
| what they're looking for.
|
| These days, there's a reasonably high chance that I quit
| looking because I gave up in futility--not because I found what
| I was looking for.
|
| It's also the case that there's no way to train Google not to
| omit search terms or generalize them to the point of
| uselessness.
|
| I really wish abusive SEO were the only problem but it's far
| from the case. Search results being crappy is a cumulative
| effect. You could solve SEO spam and I'll still not be able to
| find a USB SuperSpeed cable because it gets generalized to "usb
| cable" and there are a gazillion more charging cables than
| there are SuperSpeed cables.
|
| Used to be that you could quote things to indicate that you
| really meant it. That's fuzzy now too. Every time we figure out
| how to circumvent the bad results, features are removed.
| [deleted]
| yumraj wrote:
| If I remember correctly, it used to be called About.com - with
| categorized and human curated links.
|
| It was big during the dot com days, but withered after Google.
|
| Interestingly, I do think that that model may need to be
| revisited.
|
| Edit: I feel that Reddit is filling some of this need, at least
| for things like vacuum and espresso machines with dedicated
| spaces.
| celestialcheese wrote:
| It's funny you mentioned About.com. Or as it's known today
| DotDash AKA IAC.
|
| It's easily one of the largest SEO players out there, and
| they've been on a buying spree recently with their purchase of
| Meredith. The quality of the content has gotten better, but
| it's still a monster of an SEO optimized content machine.
| yumraj wrote:
| Yes, I know nothing about what they're doing today. In fact
| my pihole blocks them, must be in some list. So I'm pretty
| sure they're crap today.
|
| I just remember the concept from way back.
| mrkramer wrote:
| Good idea, Paul. I had a similar one, but there's no way you could do
| it manually. Machine learning algorithms need to detect spam, not
| people, because otherwise the search engine can't scale. If people were
| marking what's good content and what's not, such a search engine
| would be reduced to content curation, not organic search and
| discovery.
| lubesGordi wrote:
| How about a platform for curation. Curators who know a subject
| well can link to content that looks good to them. Search goes
| through the curators, people can favorite certain curators. Lots
| of people like to curate. This is a better idea than trying to go
| after spam.
| li2uR3ce wrote:
| 301 Battlefield moved: curator spam.
| nanna wrote:
| If you were to start a search engine, what stack would you use?
| aantix wrote:
| Typesense looks easy to use.
|
| But with 3x memory needed for the indexes, the server costs
| probably aren't going to be bootstrap'able.
|
| Especially for a "small" crawl of a billion web pages, even at
| just 10k per page.
| marginalia_nu wrote:
| Eh, I run a 100M-index off consumer hardware in my living
| room. Very doable if you avoid bloated off the shelf
| solutions.
| aantix wrote:
| What search software do you run?
|
| What sort of memory and space do you have on the single
| server?
|
| What's the average document size that you index?
|
| Genuinely curious on how doable a modern search engine is
| on modern hardware.
| marginalia_nu wrote:
| I run all custom software, I feel most off the shelf
| solutions aren't very resource effective.
|
| The server has 128 GB RAM and the index currently fits on
| a single 1 TB SSD + an Optane drive of 480 GB.
|
| I find the average document to clock in at 7 KB, in terms
| of raw HTML. In the index that's, dunno, probably less
| than 1 KB/doc.
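| The sizes quoted in this sub-thread can be checked with
| back-of-envelope arithmetic. This is a sketch under assumptions:
| decimal units (1 KB = 1000 bytes), "10k per page" read as 10 KB,
| and the "3x memory" figure read as three times the raw crawl size,
| which the thread does not actually specify. All variable names are
| made up for illustration.

```python
# Decimal units are an assumption; the thread does not say which it means.
KB = 10**3
GB = 10**9
TB = 10**12


def corpus_bytes(doc_count, bytes_per_doc):
    """Total size of doc_count documents of bytes_per_doc each."""
    return doc_count * bytes_per_doc


# "a billion web pages ... at just 10k per page"
billion_page_raw = corpus_bytes(1_000_000_000, 10 * KB)
# "3x memory needed for the indexes" (assumed to be relative to raw size)
billion_page_ram = 3 * billion_page_raw

# "100M-index", "average document ... 7 KB", "less than 1 KB/doc" in the index
living_room_raw = corpus_bytes(100_000_000, 7 * KB)
living_room_index_max = corpus_bytes(100_000_000, 1 * KB)

if __name__ == "__main__":
    print(f"1B pages, raw HTML:      {billion_page_raw / TB:.0f} TB")
    print(f"1B pages, 3x in memory:  {billion_page_ram / TB:.0f} TB")
    print(f"100M pages, raw HTML:    {living_room_raw / GB:.0f} GB")
    print(f"100M pages, index bound: {living_room_index_max / GB:.0f} GB")
```

| Under these assumptions the billion-page case needs tens of
| terabytes of memory, while the 100M-document index stays under the
| 128 GB of RAM mentioned above, which is consistent with both
| commenters' conclusions.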
| jrockway wrote:
| To some extent, I worry that the problem with search engines is
| that there isn't any data worth returning. Yesterday's thread
| talked a lot about reviews. Writing a review is hard work that
| requires deep domain expertise, experience with similar products,
| and months of testing. If you want a review for something that
| came out today, there is no way that work could have been done,
| so there simply isn't anything to find. Instead you'll get a list
| of "Best TVs 2021" or whatever, with some blurb and an affiliate
| link, not an actual review. That's what people can make for free
| with a day's notice, so if you write a search engine that
| discards those sites, that's fine, you'll just return "no
| results" for every interesting query.
|
| I guess what I'm saying is that if you want better reviews, you
| probably want to start writing reviews and figuring out how to
| sell them for money. Many have tried, few have succeeded. But
| there probably isn't some Javascript that will fix this problem.
| winternett wrote:
| The main driver of SEO spam, and online scams in general are
| countries that have little to no opportunity for economic
| growth. There are literally millions of Internet savvy people
| who would be able to survive on what we would consider barely
| anything profit-wise in adsense revenue, which also usually
| pays out in US dollars. In this currently terrible global
| economy, desperation turns the most intelligent minds bound
| into poverty into bootleg SEO engineers, online catfishers,
| scammers, and ransomware creators, and God bless their
| creativity...
|
| Instead of creating income opportunity and crowdsourcing people
| in foreign countries for common (more positive) good, companies
| rarely create opportunities for the people who would normally
| turn into spammers and scammers, and that's what creates an
| endless army of people that constantly destroy online
| communities like Soundcloud, FaceBook, Twitter, and TikTok with
| stolen content, trend scams, fake news, and spam messages.
|
| Google search has been invalidating and subverting their most
| accurate search results based on abstract SEO rules for quite
| some time now. It was likely done so that they could implant
| paid ads first into content, because that makes them the most
| profit. Doing that has destroyed their reliability and
| reputation as a search service leader, and they're never going
| to admit it, but payola is the undertone that is ruining their
| search results... There is a certain type of corruption that
| occurs when a company turns away from upholding customer
| service and value towards a monopolistic "profit-first economic
| stranglehold" business model... That strategy never ultimately
| works out well for both companies AND users in the long run.
| The next leader will likely be a search that avoids the same
| pitfalls until they themselves become a profit-driven monopoly.
|
| There is no algorithm that will usefully and fairly counter
| spam based on desperation, companies need to realize that
| creating opportunity for people to operate equally on their
| platforms is the best move, otherwise, spam will drive any
| community of rule abiding users away or into madness.
| mrkramer wrote:
| >The main driver of SEO spam, and online scams in general are
| countries that have little to no opportunity for economic
| growth.
|
| Not quite right because cybercrime aka hacking, cracking,
| spamming etc. originated in US not in East Europe, Russia and
| third world countries which are dominating hacking and
| spamming scene today. Main motivation of cybercriminals is
| quick money and ease of getting away with it since you are
| not physically committing a crime but
| digitally/electronically.
| adventured wrote:
| It is quite right. Those are the main drivers and it's due
| to a lack of economic opportunities.
|
| Hacking heavily originated in the US because the US
| practically built the entire modern tech universe from the
| ground up. The US was far out in front when it came to
| utilizing the Internet and the Web, so of course unethical
| people in the US pioneered various types of online crime,
| the US was the early adopter.
|
| If you're an elite engineer in the US, you can make
| millions of dollars doing legal work for big tech. It helps
| in a big way to drain the labor pool as it pertains to
| criminal activity online. You generally can't do that today
| in the countries that dominate SEO spam, online scams, etc.
| In those countries elite engineers suffer terrible wages
| doing legal work compared to what they should be able to
| earn for their abilities; commonly they can earn a lot more
| doing illegal work instead, it's a very potent lure.
|
| You're an elite engineer in Russia, top ~1%-3% globally.
| What do you do? Earn several thousand dollars per month
| doing legit software development in Russia (with either
| zero or little consequential equity compensation); flee
| Russia for a more affluent market; or do illegal work where
| the rewards can be dramatically greater. It would be
| difficult to resist if you were unable or unwilling to
| leave Russia.
| mrkramer wrote:
| >You're an elite engineer in Russia, top ~1%-3% globally.
| What do you do? Earn several thousand dollars per month
| doing legit software development in Russia
|
| Become software entrepreneur?
|
| And many international software companies have software
| development teams and presence in Russia.
| emerongi wrote:
| > Become software entrepreneur?
|
| Exactly. Hacking for hire, making cheats, botnets, SEO
| farms, selling exploits and hacked social media accounts;
| practically anything you can think of that US software
| engineers can't be bothered with, as they already earn a
| healthy salary. That _is_ entrepreneurism.
| mrkramer wrote:
| I wasn't speaking about that kind of entrepreneurism but
| about making legal software and legal web services that
| solve problems and are useful. So many Russian hackers
| got arrested when they travelled somewhere outside Russia
| and now they are serving 10 or 20 year sentences in US
| jails.
| PeterisP wrote:
| > So many Russian hackers got arrested when they
| travelled somewhere outside Russia
|
| How many? 20? 30? IMHO the cases are rare (and get widely
| publicized whenever that happens, creating a
| disproportional visibility), you get a couple captures
| per year but the number is just a tiny fraction of the
| actual participants, more like an exception than the
| rule.
| PeterisP wrote:
| In places with a less-established legal system it's
| harder to make money by above-board entrepreneurship and
| keep it instead of handing it over to local strongmen
| (two colorful examples that have stayed in my memory and
| unlike many others have become public and have also been
| described in non-Russian media -
| https://www.independent.co.uk/news/world/europe/valery-
| pshen... and
| https://abcnews.go.com/International/wireStory/us-
| embassy-ru... , but of course those are the exceptions
| because the usual result is complying with threats and
| handing over your business or most of it). But it's not
| really about Russia, it's a general issue with parallels
| in other countries as well. And of course, there's the
| issue of the local market; the financial advantages for a
| skilled tech person going towards entrepreneurship
| legitimately are less attractive in most places compared
| to USA; heck, even EU potential tech entrepreneurs often
| just go 'across the pond' to start their business.
|
| If you can't get a work visa to a first world country,
| you do have less options than someone already living
| there; and the salaries offered by first-world
| "international software companies" in their remote
| subsidiaries tend to be 'according to local market rates'
| (the same "several thousand dollars per month" mentioned
| by the parent poster is a decent rate) and thus not as
| competitive with "black entrepreneurship" which pays
| according to global standards.
| ALittleLight wrote:
| Maybe there's a hardware engineer out there with a decade of
| experience shipping and reviewing TVs publishing his thoughts
| to his blog. He's heard about the latest and greatest and he's
| offering his expectations based on the promotional material,
| his friends at the company, history from the brand - whatever.
| Maybe, if he's built a reputation of good reviews he's got a
| big audience. Big audience? TV brands give him an early review
| model.
|
| Modern Google actually makes the content problem worse. When
| our notional TV blogger is starting out in our world he
| publishes two or three essays, nobody reads them, he stops
| putting in so much effort, posts occasionally, dwindles off. In
| a world with a perfect search engine his early essays get some
| attention to encourage him to post more, a feedback loop
| starts, and before you know it he's a full time TV reviewer.
| mc32 wrote:
| Reviews are a special category. It suffers from a couple of
| issues:
|
| 1. You need to have enthusiastic reviewers (people who care
| enough about a product category to review them semi-thoroughly.)
|
| 2. Proper reviews can take time and may need domain knowledge.
|
| 3. Competition. When there were one or two people doing reviews
| on some category of products, maybe the economics worked out.
| Once you have hundreds or thousands competing with you, the
| time demand may be overwhelming and not worth it.
|
| 4. If you are a trusted reviewer or site, you will get economic
| pressure to review a particular thing or brand you may not like
| very much but the money may be good. So you will begin to
| experience conflicts of interest.
|
| 5. If reviews are just a hobby and not a way to make money,
| eventually you will slow down or move on, opening a hole that
| gets filled up by spammers.
|
| 6. Some things are timeless (a pipewrench, let's say) and some
| are seasonal (consumer electronics, toys, etc). The former
| deserves a thorough review but the latter doesn't deserve as
| much, though it may get the bulk of interest due to seasonal
| demand. Does it really matter if the latter's latest iteration
| has 2% increased battery life to discuss?
|
| I'm sure there is a lot I didn't think of. But it's a doomed
| category, unless people are willing to pay for professional
| reviews (Consumer Reports types and other independents).
| eatonphil wrote:
| I pay for Consumer Reports. I'd encourage more people to too.
| I don't trust it completely but it's a good companion to
| manual searches on Reddit/HN/car forums, etc.
| aemreunal wrote:
| Someone pointed out yesterday on that other search thread
| that [most?] libraries provide free access to Consumer
| Reports through a membership. I just looked at the San
| Francisco Public Library and it does indeed give me access
| to the magazine and a searchable database.
| chongli wrote:
| _If reviews are just a hobby and not a way to make money,
| eventually you will slow down or move on, opening a hole that
| gets filled up by spammers._
|
| In my experience, the best reviewers are hobbyists. The thing
| is, it's not _reviews_ that are their hobby. Rather, they
| review the products that go along with their hobby.
|
| So, for my hobbies (espresso and aquariums), there are tons
| of easily accessible reviews on all kinds of aquarium gear
| and coffee machines, grinders, etc. On the other hand, nobody
| does plumbing or HVAC as a hobby (that I know of) so it's
| very difficult to find high quality reviews of water
| softeners or furnaces. It takes a very special rare sort of
| person who would install these things just to review them.
| The closest thing I could find was this video [1] on a DIY
| water filtration system by an RV/off the grid type hobbyist
| (from what I can tell).
|
| [1] https://www.youtube.com/watch?v=WCC4TOYYGF8
| treis wrote:
| People like to talk about their work too. There's plenty of
| those sort of reviews out there. Mostly on reddit because
| like others have mentioned organic search results are
| completely gamed.
| giantrobot wrote:
| > Reviews are a special category. It suffers from a couple of
| issues
|
| Review sites suffer from a singular problem. They are
| overwhelmingly SEO spam content farms. People go find some
| product niche and pay some Fiverr/whatever people to write
| literally fake reviews of products. Because they're pulling
| all the SEO tricks and are in a niche category they shoot to
| the top of search results for that niche.
|
| Their reviews _sound_ realistic and viable but they're pure
| fantasy. The writers never touch the products being reviewed.
| Many times they'll pull details from Amazon listings
| (including factual errors) and even other "review" sites.
|
| Once they get established in their niche they'll accept paid
| placement from product manufacturers without marking it as
| such. A single scammer might own dozens of these sites, even
| supposedly competing ones.
| snth wrote:
| > If you want a review for something that came out today, there
| is no way that work could have been done, so there simply isn't
| anything to find.
|
| That's not strictly true, given that reviewers are often sent
| pre-release versions of things in order to do that work before
| release day.
| loceng wrote:
| Not sure why you're being downvoted, as you're correct -
| however to point out there seems to be a trend where
| reviewers are only given pre-release versions if they
| practically always give favourable reviews to the products
| they list, especially if they're provided the product for
| free; there doesn't have to be an express relationship or
| contract between a reviewer and a company either, it's the
| reverse of how Bill Gates apparently has given $200 million+
| to different news channels/media organizations - and so
| they're less likely going to as freely share negative news
| about him or perhaps his organizations; this makes
| me think, similarly to how stocks being sold by CEOs (etc)
| must be pre-planned to avoid shenanigans like market
| manipulation, that anyone giving large sums of money to any
| media/journalism organization must divide the amount up over
| 20-40+ years, so that organization at least has a runway and
| not dependant on larger "dopamine hits" at shorter intervals.
| nradov wrote:
| I trust DC Rainmaker's reviews of fitness tech products
| because he always returns products back to the
| manufacturers after writing reviews. So there's no conflict
| of interest based on free products.
|
| https://www.dcrainmaker.com/product-reviews
| ceejayoz wrote:
| If companies don't like his reviews, they'll stop sending
| review units. That hits both in the pocketbook and the
| race to be one of the earlier reviewers of a new product.
| Reduced conflict, perhaps, but not none.
| whakim wrote:
| This presupposes that companies think their products are
| bad. If you have (what you believe to be) a good product,
| you definitely want DC Rainmaker to review it. I think
| this is a reasonably general point across industries -
| companies _want_ to get their products into the hands of
| the most reputable reviewers.
| bluGill wrote:
| Depends, if someone is popular you can't afford not to
| have them review your things. At a certain point a bad
| review will still generate more money than no review at
| all. Few reach that level though, most reviewers don't
| have that much following.
| tchvil wrote:
| In DC Rainmaker's case it is probably the opposite. A
| fitness product not reviewed by him is a bad signal.
| nradov wrote:
| If companies don't send him review units then he just
| buys them retail. He has already done this for many
| products.
| ceejayoz wrote:
| Yes, I'm aware. That's less money in his pocket, and less
| ability to have the review be available on or before the
| product launch. There's still some conflict of interest,
| even if it's lessened.
|
| Only purchasing review units at retail would remove this
| conflict.
| _delirium wrote:
| Yeah, that's been a problem with reviews for a long time.
| In fact it's what Consumer Reports used initially to
| differentiate themselves: their "thing" was that they only
| reviewed products bought anonymously at retail (no free
| samples or manufacturer-provided review items) and didn't
| accept any advertising from manufacturers either.
|
| Sites that receive free review samples and are supported by
| affiliate links are kind of the exact opposite model.
| fassssst wrote:
| Companies like Sweetwater do this right. They have "sales
| engineers" that help you find what you're looking for over the
| phone or text message or email. It probably doesn't scale but
| as a customer, I don't care as it saves me so much research
| time and I consistently get what I'm looking for.
| xondono wrote:
| There's a lot of good quality reviews on YT on launch date of
| pretty much anything these days.
|
| It's not a problem of doing the review, it's that there's not
| much of a market for written reviews, most people would rather
| watch a video instead.
| fauigerzigerk wrote:
| Interesting. I never watch video reviews. They're painfully
| slow and impossible to search.
| midasuni wrote:
| There's not much money in written reviews, and people can't
| find them amongst the automatically written SEO/affiliate
| crap
| Karrot_Kream wrote:
| > It's not a problem of doing the review, it's that there's
| not much of a market for written reviews, most people would
| rather watch a video instead.
|
| I'd say it's more that YouTube offers a clearer path to
| content monetization than text does. YT is a much more
| lucrative platform for the same level of effort as SEO for
| their text blogs.
| basch wrote:
| How do you compare 3 or 4 videos before watching them?
| Watching video reviews of the reviewers?
| xondono wrote:
| Like most things, it's a reputation ladder.
|
| There's top channels like LTT and if what you are looking
| for is out of their niche, you look for the biggest
| channels in that niche and go mostly by association (who
| they have made collaborations with,..).
|
| EDIT: of course the big win of video reviews is that you
| can _see_ the thing working.
| verve_rat wrote:
| This resonates with me a lot. A few months back I upgraded my
| desktop's insides. New motherboard, CPU, graphics card, etc.
| That was the first time in about seven years I've gone looking
| for reviews of that sort of stuff.
|
| I remember doing the exact same thing in the past and being
| overwhelmed with information. The detail and data in reviews
| would take a long time to collate and make sense of. But this
| time even the big name sites seem to be much shallower. Less
| models reviewed, less testing and benchmarking, more
| regurgitated press releases and other news.
|
| Last time it took me a while to sort out all of the
| information, this time all my time was spent trying to find any
| that wasn't 100% fluff.
| NumberCruncher wrote:
| This resonates with my experience. A couple of years ago I
| invested more time than I'm proud of into buying the right
| bluetooth headset for me. I have found a site with pretty
| detailed reviews and tested their reviews standing in stores
| and trying dozens of headsets out. I also bought 3 headsets on
| Amazon and sent back all of them later. My impression was that
| the reviews on this particular site are 100% unbiased, where
| all other reviews I read just want to sell whatever product is
| in focus.
|
| I wonder how a search engine could distinguish between "honest
| & professional" and "fake & amateur" headset reviews without
| having a head and two ears?
| grvdrm wrote:
| Well said - it is among my biggest annoyances with the web.
| Reviews are almost always packaged into best-of or top-X lists.
| The quality of the Wirecutter is gradually trending down but it
| is still the website I use to find the "best" of something. I
| don't have to waste time sorting through hundreds of list-spam
| sites.
| thereddaikon wrote:
| It's a double-edged sword. Reviews take effort so you want to
| make them easier for customers to write. But making them easier
| for your users also makes them easier for those trying to game
| the system. This is why Amazon's product reviews are useless as
| well as pretty much any other community based review system.
|
| But on top of that you do have the problem of whether or not
| someone is really qualified to write a review. So Joe User
| thinks product X is good. What is their metric for good? It
| reminds me of an LTT review of the Amazon TV from a few months
| ago. They gave it an awful rating but noted that the reviews on
| the product page were generally very positive. And their
| reasoning was that the people buying these TVs and reviewing
| them didn't have a good comparison point for what a good TV
| actually is. They are probably comparing it to a much older and
| less advanced product not to a contemporary one.
|
| So then you think the answer must be get reviews from industry
| related media. But then you fall into the classic problems of
| unethical journalists or simply ones that are out of touch.
| Spivak wrote:
| It's not a question of whether someone is qualified or not,
| everybody is more than qualified to write about their own
| feelings toward a product they bought or service they used.
| In fact no one is more qualified to talk about their own
| feelings and experience. How useful that review is to you is
| a combination of the writing quality and depth, and how
| similar the reviewer is to your own experience and
| preferences.
|
| Professional critics usually try to distinguish themselves by
| producing well written in-depth reviews but not from their
| own perspective but that of a hypothetical everyman who,
| ideally, is similar enough to a critical mass of their
| audience.
|
| So it always interests me when people complain about popular
| gaming review sites being out of touch because almost always
| it's the reader that's out of touch but doesn't realize their
| bubble. It's not an absolute rule but I'm in enough niche
| hobbies to realize that my desires for products are way out
| of whack.
| mminer237 wrote:
| There are still good reviews. For TVs, RTINGS produces high-
| quality reviews (although they're not listed super
| straightforwardly). For computer internals, AnandTech does even
| better. You don't have to talk about the absolute latest
| product out that same day for you to have quality reviews of
| other options in the meantime.
|
| Everyone just makes blogspam because it's far less work than
| actually buying products and developing expertise and testing
| them and writing out a whole thorough review. Google's
| algorithms just can't tell a quality review from a surface-
| level, uneducated take.
| mikeryan wrote:
| I always liked the wire cutter for just kind of cutting through
| the crap and saying "this is the one". I wonder if we need some
| sort of thing for reviews where humans filter out the sites
| that are credible.
|
| It's a bit funny because this was sort of done by Jason
| Calacanis' Mahalo back in the day - but maybe he was just ahead
| of the SEO curve.
| loceng wrote:
| Developing common standards/protocols for everything required
| for a quality review vs. a "candy" or shallow hype review would
| be a good place to start, making it culture that everyone
| educated knows about to follow - and then they can only go to
| or support reviewers who list what testing protocols they
| follow.
|
| Industry has already done this with the "food pyramid" -
| influencing, capturing governments to make the food pyramid
| more based on economic reasons and much less on science - with
| the government putting it out and distributing it into
| schooling of different levels, giving it an unearned or
| undeserved authority which then people blindly trust/follow -
| not understanding that or when systems and their output or
| oversight have been captured; why the pandemic bringing the
| classroom home via Zoom, so parents could see/hear the learning
| material has outraged many parents - an example I've heard,
| where white children are being taught to feel guilty about
| their 'white privilege', or parents being upset their children
| are being taught at a very young age that they can decide what
| gender they are; I'm not stating what I believe here, just
| giving examples I've heard of.
|
| This capturing of the government is why I think ultimately the
| government should be developing and maintaining such platforms,
| as per law, and requiring individuals and organizations to in
| real-time add and update their data (a simple example being
| restaurants, their menu's ingredients, their open hours) - in
| part to de-risk the government having an unnatural power as
| "the single arbiter" of truth, perhaps instead to de-risk
| capture that the government funds multiple independent
| organizations at the federal level - that States can decide
| which ones they follow, if necessary, part of why States exist
| - to de-risk the potential capture of the Federal umbrella;
| however the system is in an imbalanced, broken, captured state
| - with the duopoly evolving to be more extreme lead or formed
| by the establishment, with a broken voting system in arguably
| most countries of the world, and mainstream media being
| captured by for-profit industrial complexes that fund MSM
| through ad revenues - which further develop or mould our
| culture and narratives/talking points and beliefs, whether
| truthful or not; without fixing these the other
| platforms/systems excelling won't be possible.
| frenchyatwork wrote:
| I think one of the fundamental things that made search work
| well about 1-2 decades ago was that web sites would link to
| each other, and that those links could vaguely correlate with
| reputation. There were link spammers, but there was actually
| some decent organic content as well.
|
| What's happened since then is that almost all the normal
| "people linking to things they like" has gone behind walled
| gardens (chiefly Facebook), and the vast majority of what remains
| on the open web are SEO spammers.
| prepend wrote:
| I wish FB would be more open, but since they have all this
| walled garden info, are they well placed to start a competing
| search engine? Would be interesting if their activity could
| help filter out seo hackers.
| kritiko wrote:
| FB search seems to have gotten worse and worse. Unless I
| can remember the specific Group where I saw something, it's
| very unlikely that I can find it again. And they know which
| posts I've been highly engaged on...
| Mezzie wrote:
| Yup, early Google relied on a lot of unpaid, unseen human
| intervention and choices. I ran some weblinks and curatorial
| sites during the search wars, and PageRank could only work
| because there were people behind the sites choosing links
| based on their usefulness to their audience.
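| To make the dependence on human-chosen links concrete, here is a
| toy power-iteration version of the textbook PageRank idea. It is a
| simplified sketch, not Google's production ranking; the function
| name, the damping value of 0.85, the iteration count, and the page
| names in the example graph are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump" term).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical hand-curated web: three blogs chose to link to one review.
example_links = {
    "blog_a": ["good_review"],
    "blog_b": ["good_review"],
    "blog_c": ["good_review"],
    "good_review": ["blog_a"],
    "unlinked_page": ["good_review"],  # nobody chose to link here
}

if __name__ == "__main__":
    ranks = pagerank(example_links)
    for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(f"{page:15s} {score:.3f}")
```

| The page that people chose to link to ends up on top, and a page
| nobody links to stays at the baseline. The scheme has nothing else
| to go on, so it degrades once the links stop reflecting human
| judgment.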
| wodenokoto wrote:
| Why have blogs and articles stopped linking to things? I'm
| reading a restaurant review site, and they won't link to the
| restaurant. The chef name is a link to a list of all articles
| tagged with the chef's name, rather than a Wikipedia link or
| something useful that can tell me who that person is.
| ufo wrote:
| There aren't as many blogs now as there used to be.
| adventured wrote:
| That will get worse yet most likely. Younger people no
| longer produce public text to the extent they did prior
| to the smartphone heavy era. Supply of that blog
| style content will continue to dwindle as the producers
| age out. I'm sure there's a stability point it may reach,
| of course, because some tiny percentage of people will
| always want to write long-form.
|
| Younger people TikTok, they Instagram, they chat in
| private conversations with each other, they occasionally
| post short messages in walled gardens like Facebook, they
| YouTube, they listen to music, they watch Netflix & Co.
| That's what they do. They do not persistently write
| LiveJournals, Tumblrs, blogs. That pre video/audio-
| focused era is over and it's not coming back (even if
| there's occasionally a bubbling up of hipster fakery
| centered around how cool it is to write text).
| nkrisc wrote:
| I find that claim surprising considering how many more
| people there are simply using the internet at all.
|
| Fewer unique blog domains due to "blogging" sites that
| aggregate users? Sounds plausible. Fewer people blogging
| overall? I'm not convinced yet.
| lethologica wrote:
| I wonder if the majority are moving to vlogging instead?
| ufo wrote:
| I think the bigger issue now is that more content is
| inside social media "silos" like twitter, instagram or
| youtube. I don't have the numbers though.
| Volker_W wrote:
| Why is this a problem? Can't google index social media
| silos?
| bluGill wrote:
| Which ones. They can index their own, but for the others
| only the public stuff. Facebook has a lot of things
| private so nobody can see them except your friends. (they
| are by no means perfect, but a lot of things are private
| and only seen by friends - most of it isn't of interest
| to a search engine anyway but comments of the form "I
| love X product" could in a perfect world be indexed as a
| sign of what people find good)
| Baeocystin wrote:
| I'd believe it. As an IT consultant, I interact with a
| lot of people who are semi-techs themselves- mostly small
| business owners who are used to wearing a lot of hats,
| and also the type to have been motivated to run their own
| personal blogs about
| diving/photography/conlangs/quilting/gardening/whatever
| their personal hobbies are.
|
| Ten years ago, the majority(!) had at least something up
| and running, where they would post essays, thoughts,
| whatever came to mind.
|
| Nowadays? All gone. All! When asked why, the answer
| almost always is along a mix of ever-increasing negative
| feedback and harassment from randos, and aggressive
| automated spamming of their forums. Loss of the pseudo-
| anonymity plays a large role as well. Many have deleted
| years' worth of work, simply because they are afraid of
| someone trolling through their posts to find something to
| harass them with.
|
| I was never a blogger myself, but I am sad about the
| change. There was a lot of good stuff out there for a
| while, and sometimes it just plain made me happy to read
| someone joyfully nerding out on a favorite subject of
| theirs.
| nkrisc wrote:
| I think a lot of people are still writing this kind of
| content, but you have to look elsewhere for it: Reddit,
| Facebook, Twitter; to name the obvious ones. It's also
| harder to find, but you can find all kinds of personal
| content written in comments and posts on these sites.
| Baeocystin wrote:
| I realize that this is a hard thing to 'prove', but I am
| personally certain that the amount and quality of such
| things has dropped significantly from a decade ago.
|
| Not to zero. You can still find things tucked away in a
| post on reddit or the like. Almost never, as far as I
| have experienced, on Facebook or its ilk, as the
| affordances are different. I genuinely think there has
| been a loss.
| kritiko wrote:
| I frequently append site:reddit.com to searches for a
| niche search term these days. I think a lot of people who
| would have blogged or commented on blogs are posting
| there instead.
| prox wrote:
| I wonder if it would be possible to have a big filter
| button "commercial" or "non-profit" or something along
| those lines. So you get results that are not deemed
| commercial or are.
|
| Don't know how hard it would be to know which is which.
| Maybe non-commercial : don't run ads, don't sell a
| product or service and provide information only.
| freeflight wrote:
| _> I find that claim surprising considering how many more
| people there are simply using the internet at all._
|
| Most of these many more people are mobile users, where
| creating long-style text content can be quite bothersome.
|
| What ain't bothersome, with a smartphone, is taking
| pictures and videos to slap filters over them, alas
| that's why we are where we are with TikTok, Instagram and
| Twitter dominating large parts of the web.
|
| It's even noticeable in a lot of online discussions with
| text outside of these communities; The average length of
| forum posts feels like it's gotten way shorter over the
| decades. People have less attention to read anything that
| looks longer than a few sentences, often declaring it a
| "wall of text" based on quantity of text alone.
|
| Imho it's a big part of what drives misinformation; Doing
| any kind of online research on a small phone screen is
| extremely bothersome compared to the workspace an actual
| computer/laptop, particularly with multi-monitor, gives.
|
| There's also the difference in attention; When I sit down
| at my laptop/desktop, I actively decide to spend and
| focus my attention on that task and device.
|
| While smartphone usage is mostly dominated by short
| bursts of "can't do anything else right now", I don't
| choose to take out my phone and surf the web, it's
| something I do when I'm stuck in some place with nothing
| else to do and no access to an actual computer.
|
| But for the majority of web-users [0], that smartphone
| access to the web is all they know, which then ends up
| heavily shaping the ways they consume and contribute to
| it.
|
| [0] https://techjury.net/blog/what-percentage-of-
| internet-traffi...
| lmkg wrote:
| I heard an interesting theory the other day: blog
| viability declined because Google killed Reader. Which
| indirectly ends up poisoning Google's biggest well, since
| blogs are an important source of relevant cross-domain
| links.
|
| I'm somewhat skeptical, it seems a little _too_ poetic to
| blame Google's ultimate downfall on a decision that was
| notably hated at the time. But it's plausible. If you
| want it to be a conspiracy theory, you can posit that
| killing off independent blogs was the _intent_, to
| convince bloggers to migrate to Google Plus.
| orcasushi wrote:
| The average website's goal is now to keep you on it as long as
| possible. According to some metric folks, the longer you
| stay on a website the more money you spend there. Linking
| to another website destroys that metric.
|
| Also if you are going to make a purchase somewhere, any
| website would try to get a cut of the money you spend by
| actually sending referral links to the product. So small
| websites that do not allow this service will not get linked
| so much.
|
| On a meta level, links or connections between items are
| information. Information is money. And as soon as that became
| evident, links and connections also became more scarce.
| ijidak wrote:
| Because, years ago, linking to lower reputation sites would
| drain your page rank.
|
| So everyone worried about SEO became afraid to link to
| anything except:
|
| 1) Their own website 2) High reputation sites like NYTimes,
| etc.
|
| It's sad. Makes it harder to navigate the web.
| mrkramer wrote:
| Wouldn't it be reasonable for Google to show how their
| ranking algorithms work so all webmasters and content
| creators know how to behave on the web? Now we have a black
| box that's causing confusion and is misdirecting websites
| and web users.
| freeflight wrote:
| _> It's sad. Makes it harder to navigate the web._
|
| Some would even say it killed the web by centralizing all
| the content in the hands of a few [0].
|
| Which is the direct consequence of everybody optimizing
| to better show up on Google/Facebook/Amazon/Microsoft and
| ultimately even migrating all their hosting to these
| companies.
|
| [0] https://staltz.com/the-web-began-dying-in-2014-heres-
| how.htm...
| chongli wrote:
| Bang on. Saying that "there isn't anything out there
| anymore" is missing the point: Google's algorithms
| _created this situation_, intentionally or not. Before
| Google, people linked to what they wanted and communities
| would naturally cluster around topics of interest. Google
| came in and made reputation into a currency which
| effectively destroyed all these communities through
| incentivizing selfishness.
| wussboy wrote:
| Surely there is just a different algo that could bring
| about better communities?
| jonathankoren wrote:
| Different, but not better.
|
| The incentives to game the algo remain. People adapt to
| the environment.
| amelius wrote:
| > The incentives to game the algo remain. People adapt to
| the environment.
|
| Perhaps it could work if the algorithm changed its
| algorithm all the time.
| kilburn wrote:
| That's why mechanism design [1] exists as a field of
| study. The whole idea of that field is to provide the
| proper incentives to steer the participants towards your
| objective. Yes, considering they will try to "game" the
| system however they can.
|
| I'm pretty sure google could do strictly better (i.e.:
| better in all reasonable accounts) than they do now if
| they focused on the users' experience instead of revenue
| for a couple terms.
|
| [1] https://en.wikipedia.org/wiki/Mechanism_design
| marcosdumay wrote:
| Only if implemented by the monopolist.
|
| People's best chance is stopping using Google and pushing
| for it to be broken-up.
| Beldin wrote:
| "When a measure becomes a target, it ceases to be a good
| measure"
|
| -- Goodhart's Law.
|
| Google's algorithms didn't create this situation; people
| chasing high Google rankings did. Had Google used
| completely different algorithms yet became equally
| dominant, people still would have poured their hearts and
| souls into getting higher rankings.
|
| Basically, an application of the tragedy of the commons.
| Or: "why we can't have nice things".
| chongli wrote:
| But that's taking for granted that Google would have
| become dominant. Perhaps if they hadn't chosen the
| algorithm they did then they wouldn't have been as
| overwhelmingly successful. Instead, I could imagine a
| world in which there are multiple search engines and none
| of them are all that good. In fact, that's the world I
| remember from before Google existed. Search was bad but
| communities were strong and life was good.
|
| Then Google came along and we all found it a lot more
| convenient than the bad search engines we were used to.
| And of course, we all know where that led. In some sense,
| Google built an 8-lane superhighway and bypassed all the
| small towns.
|
| We all traded away paradise in exchange for convenience.
| Now we have neither.
| Beldin wrote:
| On the glass-half-full side of this: we're getting those
| communities again! Here on HN, on reddit, for certain
| topics on various social media (there are pearls there
| too), on Mastodon, various blog authors, Ars Technica,
| Quanta, etc. [1]
|
| It's just fragmented - i.e., catering to a specific
| group. Because if it isn't, it's awesome for 5 minutes
| and then monetization rot sets in.
|
| [1] None of these work for everyone; conversely, all of
| these are seen as great things by some and have people
| who prefer that one thing over others for its quality.
| mrkramer wrote:
| >Google's algorithms didn't create this situation; people
| chasing high Google rankings did.
|
| But lowkey Google incentivized such behaviour by not
| being open and transparent on how exactly their
| algorithms work.
| B-Con wrote:
| That would have allowed people to artificially chase
| rankings even faster and more efficiently. It makes the
| problem worse, not better.
| inlined wrote:
| Does this mean that Facebook is the only company well poised
| to take on google search?
| dpeck wrote:
| I agree very much with this. It seems that between the walled
| gardens and also people being so reluctant to have "their"
| audience leave their site/page/etc the discoverability of the
| web has dropped dramatically.
| ZetaZero wrote:
| That's an interesting observation. IMO, we stopped linking to
| good content because Google was good at finding it. Now
| Google is suffering, and we need to go back to doing more
| links.
| hubraumhugo wrote:
| Search engines are pretty good at solving the problem they were
| designed to solve, which is "finding pages which contain all
| the query words". But they are pretty bad at solving the much
| harder problem of rating the trustworthiness & authenticity,
| intentions of the owner, monetization of the site, etc.
|
| One possible solution to this could be:
|
| - Let the community vote on the most trusted sources
|
| - Include results from enthusiasts that have little incentive
| to write biased reviews (Reddit, HN, expert forums)
|
| - Look at the ownership of the site and how transparent they
| are about it
|
| - Regularly reassess these criteria
|
| This wouldn't scale for a generic search engine, but I'm
| working on a service that does this for many product
| verticals/niches.
| darkwizard42 wrote:
| Agreed here, but in your second bullet, people have great
| incentive to write good quality reviews on Reddit, HN, expert
| forums... karma/recognition etc. It just so happens that
| these "forums" have built in voting systems that they spend
| time preventing from being gamed so the search engine doesn't
| have to.
|
| Not sure if this is a good model for a search engine, but it
| does work to a small degree in those forums.
| onion2k wrote:
| _people have great incentive to write good quality reviews
| on Reddit, HN, expert forums... karma/recognition_
|
| Internet points are a terrible reason to write anything.
| They're completely meaningless. We should all judge
| comments on their own merit and not because the author has
| a lot of karma. Apart from mine, obvs.
| sam0x17 wrote:
| > If you want a review for something that came out today, there
| is no way that work could have been done, so there simply isn't
| anything to find.
|
| I think in practice this is actually largely untrue -- with
| technology products, video games, movies, and just about
| anything I can think of, most well known reviewers are given
| early access to the product so that reviews can come out on or
| before day 1 of general availability. That said this does
| create a dearth of 100% trustworthy reviews on day 1 since
| companies are naturally disincentivized from giving early
| access to reviewers who they know are going to write a negative
| review.
| ColinHayhurst wrote:
| "SEO" spam is a "Google SEO" problem. So SE ranking Optimization
| is not (yet) so much a problem for other Crawler/index SEs
| (Bing, Mojeek, Gigablast). You might say that Amazon (in
| eCommerce) and TRIP (in Travel) have cracked the problem of
| combining good/deep Content/Reviews and Category expertise with
| Search.
|
| We regularly see partnership opportunities with customers
| interested in our API [0]. I presume Bing see the same, though
| their terms are more fixed and require you sharing more data.
| Definitely big opportunities in other categories, which are
| often squandered through a naive, if understandable route, of
| choosing a Scrape and index route.
|
| [0] https://www.mojeek.com/support/api/
| deadalus wrote:
| I also consider Paywalls to be spam. Clicking on a link and
| finding out that it is paywalled, is a massive waste of time.
| imranhou wrote:
| Agree that is annoying, but if you start excluding such results
| then how does one find that type of content?
| deadalus wrote:
| You clearly label paywalled content with a symbol or image.
| imranhou wrote:
| I think the only issue there might be that google might be
| unaware that it is a paywalled content due to how many
| sites allow crawlers access to content but not to users
| (based on crawler ip ranges). Agree such a flag would save
| time when available or even a search filter option to skip
| those results.
| pictur wrote:
| People don't want to search anymore. They want to see
| well-categorized data. For example, instead of searching for cheap
| vacuum cleaners, I think they want a site that lists vendors that
| sell cheap vacuums.
| [deleted]
| Marazan wrote:
| Google's ranking algorithm shapes the web.
|
| And the web now looks like a 1500-2000 word listicle with 3
| images because that is what the ranking algorithm favours.
|
| If you find the info you need and leave quickly that actually
| down ranks the page. That is idiotic. Pages that give you what
| you want quickly are punished!
| jefftk wrote:
| I'm seeing a lot of comments along the lines of, "Google shows
| ads on the SEO-gamed sites that show up in results, so their
| incentive is to give spammy results". But wouldn't this predict
| that results would be much better on Bing and other search
| engines that don't have much presence in the "put ads on random
| sites" market?
|
| (Disclosure: I work on ads at Google, speaking only for myself)
| going_ham wrote:
| > But wouldn't this predict that results would be much better
| on Bing and other search engines that don't have much presence
| in the "put ads on random sites" market?
|
| Honestly what I think is every search engine sucks these days,
| and Google manages to suck a little bit less.
|
| The reason is how easy it has become to publish low
| quality content. It's rare to find high quality content. The
| issue with search engines is that they don't show this rare
| content. It isn't recommended by default. It is
| hidden.
|
| What happened is the recommendation system is broken! If there
| weren't any neural networks making decisions, it would have been
| a different issue. But with modern search engines deploying
| recommendation systems, I think it is all about a rich-get-richer
| scheme. You can't recommend new or fresh but quality content
| because it was never visited! So, when the entire backend is
| relying on user data, the system is being fed crappy data
| because users don't care and those who do are few in number.
|
| As long as the system is making revenue, it will be this way. Most
| people never care at all and would never be bothered because
| they only care for simple queries. If anyone deviates from the
| norm, Google search results are pretty bad, like every other
| search engine's.
| [deleted]
| Quenhus wrote:
| For developers, you can remove some spam websites from Google and
| other search engines, with these uBlock filters:
| https://github.com/quenhus/uBlock-Origin-dev-filter
| RichardHeart wrote:
| His suggestion basically is to become DMOZ.org, if you are old
| enough to remember it.
| hnbad wrote:
| I guess Paul's definition of "beating Google" is "creating a
| startup without clear revenue path aiming to be acquired by
| Google or a competitor" as I can't think of any meaningful way a
| niche search engine would provide a good enough value proposition
| against existing Google competitors or embeddable search engines
| (as well as SaaS like Algolia).
| abakker wrote:
| I have a version of fixing this that I would personally enjoy a
| lot. Leave google alone, let it crawl the web, prioritize what it
| wants to via algorithms. But, give me a version of that which
| ONLY surfaces results from discussion forums (including SO,
| Reddit, HN, etc). For most of the stuff where I am actively
| _searching_ and not just looking stuff up, discussion forums of
| motivated, self-selected contributors have the stuff I need with
| the context I need. It used to be that blogs had answers, but
| that media has been categorically ruined by SEO.
|
| Now, one of the deficiencies here has been examples. Try this:
| "best miter saw". You will not find any websites that actually
| discuss the answer to this question, despite it being a product
| category with a lot of price variability and performance
| tradeoffs (weight, capacity, power, cord vs cordless, accuracy).
|
| Nearly any product reviews for large purchases follow the same
| pattern unless consumer reports has decided to dig deep (e.g.
| washing machines).
|
| How about guitar strings? Sandpaper? Printers? Google's algorithm
| has allowed profit motivated websites to displace the commons to
| too great an extent.
| techdragon wrote:
| How can they tell it's a discussion forum? Does the scraped
| search spam that copies stack overflow content look enough like
| a discussion it fools their heuristics? Is it a manual process
| (in which case you can bet it "doesn't scale" and won't be
| built)? This is the problem they face. Literally nothing is
| simple given the size of their dataset, the scope of their user
| base, and the adversarial nature of the very world around them
| generating new data they must work with in order to do the job.
|
| There's definitely an element of "we got our profit so fuck it"
| with respect to the search engine advertising business and
| Google's incentives to make search quality better, but that
| doesn't change the difficulty of the underlying problem. If I
| wanted to pay Google $5 per search for super high quality
| results, even $50, they can't just make this product better to
| get my money, they are fighting an ongoing war against
| adversarial SEO which prevents this from ever being better than
| stalemate at best or more likely due to economics, the slow
| slide into declining quality we see due to the SEO side having
| more money with which to pay for engineering brainpower.
| eitland wrote:
| https://search.marginalia.nu has a very interesting approach to
| this:
|
| Use tracking especially and JS generally as a weight in ranking
| so sites that contain much of either need to be
| exceptionally high quality to float to the top.
|
| This means sites with limited ads and tracking, typically
| enthusiast driven pages float to the top.
|
| Now always when someone discusses a novel way to combat webspam
| someone will immediately counter: if this becomes popular SEO
| hackers will immediately start doing this.
|
| Well - if reducing page size and removing tracking becomes a
| leading SEO trick I can deal with a bit of SEO hacking :-)
|
| Yet for some reason I feel Google won't start using this very
| simple metric :-)
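A rough illustration of the idea described above, weighting scripts and trackers against a page in ranking. This is only a sketch under stated assumptions and not Marginalia's actual algorithm: the `Page` fields, the `adjusted_score` function, and both weight values are hypothetical, and `relevance` is assumed to come from some upstream text-matching step.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    relevance: float    # text-match score from an upstream ranker (assumed input)
    script_count: int   # number of scripts found on the page
    tracker_count: int  # number of known tracking scripts or pixels


def adjusted_score(page: Page,
                   script_weight: float = 0.05,
                   tracker_weight: float = 0.25) -> float:
    """Divide relevance by a penalty that grows with scripts and trackers.

    Trackers are weighted more heavily than plain scripts, so a heavily
    instrumented page needs much higher relevance to reach the top.
    The weights are illustrative guesses, not tuned values.
    """
    penalty = (1.0
               + script_weight * page.script_count
               + tracker_weight * page.tracker_count)
    return page.relevance / penalty


def rank(pages: list[Page]) -> list[Page]:
    """Return pages ordered from best to worst adjusted score."""
    return sorted(pages, key=adjusted_score, reverse=True)


if __name__ == "__main__":
    farm = Page("https://example.com/top-10-saws", 0.9, 20, 4)
    blog = Page("https://example.org/my-saw-notes", 0.6, 0, 0)
    for page in rank([farm, blog]):
        print(f"{adjusted_score(page):.2f}  {page.url}")
```

With these made-up numbers the lightweight enthusiast page outranks the script-heavy one even though its raw relevance is lower, which is the effect the comment describes.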
| marginalia_nu wrote:
| Another big factor in what I do is prioritizing the opinions
| of certain indieweb sites when ranking domains, basically a
| segment of the graph consisting of humans with a particular
| dislike for seo spam. This makes ranking manipulation much
| less effective.
| ffhhj wrote:
| Good to see you in this thread! I just added Marginalia to
| the recommended search engines in a new search tool I'm
| building, to get programming answers faster. The search
| assistant builds queries for specific sites with
| "site:targetsite.com , programming question" (that comma is
| not a typo). When doing a query like that I get no results
| but these warnings:
|
| /!\ The term "," contains characters that are not currently
| supported
|
| /!\ Try rephrasing the query, changing the word order or
| using synonyms to get different results. Tips.
|
| sample: https://search.marginalia.nu/search?query=site%3Ast
| ackoverfl...
|
| Please make your engine ignore the comma, it shouldn't
| affect the search.
|
| Either ignore the site:... expression or filter sites
| accordingly.
|
| Thanks a lot for creating Marginalia!
| marginalia_nu wrote:
| Ignoring comma seems doable, I'll have it fixed in a few
| days, currently away from my work computer.
|
| site:-queries are supported, but only at the first domain
| level. (e.g. site:marginalia.nu; not
| site:search.marginalia.nu). I might tune it so that it
| strips subdomains automatically, that is pretty trivial.
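The two fixes discussed above (ignoring stray commas, and reducing a `site:` filter to its first domain level) could look roughly like the sketch below. It is not the engine's real code: `normalize_query` is a hypothetical helper, and keeping only the last two host labels is a naive assumption that would be wrong for suffixes such as `co.uk`.

```python
def normalize_query(query: str) -> str:
    """Clean up a search query before it reaches the index.

    - Commas attached to a term, or standing alone, are dropped.
    - A `site:` filter is reduced to its last two host labels,
      e.g. site:search.marginalia.nu -> site:marginalia.nu.
      (Naive assumption: multi-part public suffixes are not handled.)
    """
    terms = []
    for raw in query.split():
        term = raw.strip(",")
        if not term:
            continue  # the token consisted only of commas
        if term.lower().startswith("site:"):
            host = term[len("site:"):].lower().strip(".")
            labels = host.split(".")
            term = "site:" + ".".join(labels[-2:])
        terms.append(term)
    return " ".join(terms)


if __name__ == "__main__":
    print(normalize_query("site:search.marginalia.nu , programming question"))
```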
| mjr00 wrote:
| My current solution for this is to just tag `site:reddit.com`
| to the beginning of Google searches. A Google search for
| `site:reddit.com best miter saw` has a lot of relevant results.
|
| Marketers/SEO people are starting to infiltrate this as well,
| but since they can't control and SEO the content on Reddit
| nearly as much, this still works pretty well for now.
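For anyone who wants to script the `site:` trick, here is a minimal sketch that builds the query string and a search URL. The function names and the forum list are hypothetical; combining several `site:` filters with `OR` inside parentheses is assumed to behave as it usually does in Google's query syntax, and the result quality is of course not guaranteed.

```python
from urllib.parse import urlencode


def forum_query(terms: str, sites: list[str]) -> str:
    """Prefix the search terms with one or more site: filters."""
    site_filter = " OR ".join(f"site:{site}" for site in sites)
    if len(sites) > 1:
        # Group the alternatives so OR does not swallow the search terms.
        site_filter = f"({site_filter})"
    return f"{site_filter} {terms}"


def search_url(terms: str, sites: list[str],
               base: str = "https://www.google.com/search") -> str:
    """Return a URL that runs the restricted query in the browser."""
    return base + "?" + urlencode({"q": forum_query(terms, sites)})


if __name__ == "__main__":
    print(forum_query("best miter saw", ["reddit.com"]))
    print(search_url("best miter saw", ["reddit.com", "news.ycombinator.com"]))
```

Keeping the site list in one place avoids retyping each forum by hand for every search.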
| abakker wrote:
| Of course! This is a tip I got from HN years ago. It gets old
| to add Reddit, PracticalMachinist, Fine Woodworking, etc. I
| really want a proxy for user generated content where nobody
| got paid to write it.
|
| Fun story: I knew a person who worked for a home building
| website part time. She got paid to write stories on home
| renovations. She had never done _anything_ she wrote about.
| Mostly, she gathered up other blogspam and recycled and
| rewrote it without citation. Sometimes she went to forums,
| sometimes reddit, sometimes youtube. But, the universal part
| of it was that she had to produce 2x pieces of content per
| week endlessly. Just for a local LA builder. Most of the
| content wasn't "wrong" but, it also wasn't exactly incisive
| and didn't include any details that would have been useful.
| Instead it was just filler. The worst part is that it
| consistently improved that company's search ranking.
|
| Content farming needs to die.
| ringworld wrote:
| You may be interested in searX - it refers to data sources as
| Engines; you have the ability to run your own instance (or use
| a public shared one) and only enable engines you want results
| from (reddit, stackoverflow etc.). Build your own meta-engine
| recipe, basically.
|
| https://searx.space to learn / get started. Find one and visit
| it, click Preferences upper right then follow your schnoz.
| ffhhj wrote:
| The results include all the SEO spam that infected Google,
| i.e. SO clones. How is it better?
| ringworld wrote:
| The GP commented about using curated sources. One can
| disable all those (google, bing, etc.) and choose to only
| enable results from reddit, wikipedia and so forth in
| searX, which directly queries based on a config inside the
| project.
| mg wrote:
| Why not try writing a search engine specifically for some
| category dominated by SEO spam?
|
| I like to compare search engine results and wrote this tool to
| make it easy:
|
| https://www.gnod.com/search
|
| There in fact are many vertical search engines. You can click on
| "more engines" to see the whole list.
| Debug_Overload wrote:
| The fact that you included Reddit, SO, Google Scholar etc is
| awesome (I thought it was only for main search engines). Thanks
| for sharing. Bookmarked.
| noduerme wrote:
| Okay that's sort of what the !bang in DDG is for, and why it's
| a meta-engine. What's the blue sky ideal for a real, no-
| bullshit, everything search engine that doesn't fall prey to
| the constant flood of garbage?
|
| I have an exterminator who comes to my house every couple
| months, and sets up traps here, poison there. I don't have any
| rats in my house. I do see rats running across the yard
| sometimes. The exterminator explains it like this: Rat
| pressure. The rats overpopulate and there's "pressure" (like,
| uh, "memory pressure", which is also a fluid concept) so they
| try to get into your house more, through smaller holes, as a
| function of how much outside drama is going on, how many they
| are and how overpopulated, how scarce their food supply is, how
| cold it is outside, and whatever else drives rats into your
| house. (I love the dude who's my exterminator).
|
| Anyway, this is the same problem every search engine faces. The
| more surface area they expose, the more pressure they have
| building, the more ways people have to fake out their systems.
|
| We _have_ to go back to the 1990s Yahoo! model. Curated
| content. A list of websites that are reputable. _1990s Yahoo is
| the future_.
| loceng wrote:
| Creating a "trustless" search crawler, where anybody can
| participate, and then applying an algorithm to determine
| trust or value feels like it'd be a never-ending arms race -
| that'd require AI and extensive/expensive resources that are
| likely better invested in developing real trust networks and
| curation; curators are corruptible and regulatory capture of
| policy is possible if the organization is infiltrated or
| poorly overseen.
|
| Carte blanche opening your system up to anyone to inject data
| seems like the wrong foot to start off on, whereas my
| curating a moderator, someone I personally know and feel good
| about, trust to whatever level, and hiring them - ideally
| making sure they're someone you respect and you're someone
| they respect, pay them well, and at scale will be able to pay
| for itself; this did just bring to mind however big pharma
| and pharmaceutical trials structure and how that system can
| be/is/has been captured - and so perhaps the pressures when
| dealing with multi-billion dollar market categories will
| always lead to shenanigans if ever trying to centralize too
| much, not allowing for de-risking and broader resource
| distribution via sales/profits to more parties than the
| "5-star" rated products.
|
| In a thread on HN, I think it was yesterday, a few people
| posted about review sites where some product reviews are free
| - but others you had to pay for. A system to facilitate such
| organizations could allow a highly competitive environment,
| where organizations develop/build a brand - build trust for
| their brand as being competent and thorough - and so then
| over the lifetime of a reader/customer, perhaps they'll spend
| $1,000 buying reviews (say 333 big purchases over 40 years
| that you're willing to pay $3 a hit for) to make sure they're
| ; mind you there will be organizations that could be captured to
| say promote one conglomerate of products over another,
| perhaps even regionally, but I'm beginning to think it's a
| necessary layer to combat the shit show that is Amazon (et
| al) reviews. Ideally these systems and how the reviews
| present the information, and how thorough - the technical
| depth and breadth and testing done - will help educate those
| who dive into using this system, which will sharpen
| themselves while keeping reviewers on their toes and arguably
| strengthening their organizations and competency as well.
| fsflover wrote:
| > Creating a "trustless" search crawler, where anybody can
| participate, and then applying an algorithm to determine
| trust or value feels like it'd be a never-ending arms race
| - that'd require AI and extensive/expensive resources
|
| Not necessarily: https://yacy.net
| noduerme wrote:
| Legitimate question; if this is a real business model (and
| I believe it could be) then why the fuck does Yahoo.com
| look like a dead clickbait aggregator instead of, yknow,
| what it used to look like? i.e. FINANCE ___ [Stocks] [ etc]
| ENTERTAINMENT ___ [Movies] [TV] Where's _that_ site?
| loceng wrote:
| Visionary leadership left the company? And arguably it
| lost its soul and excitement. I genuinely thought when
| Marissa Mayer was brought in as CEO and announced they
| bought Tumblr for $1.1 billion in cash, I thought she
| could actually turn Yahoo! around - that she understood
| platforms and holistic systems; perhaps she did but her
| hands were tied, and then they made terrible decisions
| like banning porn on Tumblr - so bureaucracy, politics,
| and arguably the ad industrial complex and "mainstream"
| pressures (perhaps like billing/financial system being
| used as a tool to suppress freedom/sexuality/porn because
| they've not been successful politically) were pressures
| her or the Board of Directors couldn't counter.
|
| Then Google came along and was a better search engine,
| for a time, that was a traffic leak for Yahoo! - and then
| Google has now devolved; I also thought Google had a good
| shot at competing with Facebook, but whoever's pulling
| the strings there, the launch of various platforms, they
| don't seem to understand it can take 5-10 years after the
| MVP of a product is launched for it to mature - but for
| whatever reason their executives or managers haven't been
| comfortable pulling the trigger, arguably because anyone
| with that entrepreneurial spirit just takes their idea
| and gets funding and owns a large portion of whatever
| they've done; but then you can never develop a full
| breadth, holistic ecosystem, that can grow into every
| crevice, nor as broadly, or nuanced as possible - so
| they're stuck being Search, Gmail, Calendar, etc.
|
| I'm quite certain I've figured out the foundational MVP
| facilitating an "infinitely" growing system and that
| would allow 3rd parties to integrate, however I have
| severe chronic pain that messes up my executive function,
| so it's difficult for me to actually self-direct and
| execute - I'm stuck mostly in a low activity, stream of
| consciousness and go-with-the-flow life of routine -
| otherwise I would try to launch my plans, which I've done
| plenty of UX/UI for, as that is simple enough that
| somehow bypasses higher executive function (moving a
| pixel and then responding via visual feeling of it isn't
| complex) - but organizing to turn that into adequate
| specs to get solid estimates or fixed price quotes for
| work is extremely difficult to me.
|
| On January 11th I do have a surgery that may or may not
| reduce my pain by 50%+, may or may not improve my
| executive function, ... I've even attempted to write
| draft "Show HN:" posts to explain what I am doing, the
| starting feature sets, the reasoning behind the design
| decisions I've made - but it just gets too complicated
| too quickly for me mentally then to be able to organize
| further or polish it. I think I have the perfect domain
| name for it too: ENGN (engine), what makes me smile every
| time I notice it in my layout/mockup of it is in the
| search input box it says "Search ENGN". My username on HN
| actually is an older incarnation of a plan I had, loceng
| being a short form of "local engine" - and ENGN being
| from engine, a name I brainstormed after Tumblr sold for
| $1.1 billion - and I realized that eventually I'd want to
| try to do my "local engine" idea but that that was too
| long for a brand name. Fortunately engn.com was for sale
| at the time, I can't remember if it was $2,000 USD or
| $4,000 USD - either way, not a bad price for a 4-letter
| .com that's pronounceable to something with meaning.
|
| I've wanted to write a book too - on health, health
| systems, and on these systems we're talking about here.
| I'm 38 now and I taught myself to program when I was 11,
| learned SEO at 15, evolved to design as I'm more creative
| and programming became mindnumbing to me, and eventually
| thought I'd need (or want) VC money - so I started
| engaging on Fred Wilson's of USV.com's blog - AVC.com -
| so I have plenty of self-taught experience. The problem
| is even going back to my shorter or longer writings, or
| comments of mine on HN or other, it's nearly impossible
| for me to try to do the organization of it all - to
| compile parts, etc.
|
| Maybe this surgery goes well and I can begin to do more,
| or maybe it doesn't; I've tried to hire people or get
| help over the years but 1) no one has been willing to
| engage enough as I'd need due to my executive
| dysfunction, and maybe that's a moot point as 2) it's
| extremely difficult for me to even manage someone or an
| ongoing project - whereas I could explain things and
| direct if people are initiating, if others are directing
| the conversation, then I could respond - but otherwise I
| can't do normal oversight and management any longer.
|
| The most accurate odds I can give that this surgery will
| help (piriformis syndrome, my sciatic nerve goes through
| the piriformis muscles, rather than around it - so
| there's constant compression + that's worsened with
| use/engagement of the muscle) is 50/50. There's a high
| probability that this surgery is not related to the
| primary source of my pain, which is from LASIK eye
| surgery I did 7 years ago - I got arguably the worst of
| the worst symptoms: central sensitization and
| hyperalgesia - a hypersensitivity to pain, where all
| sensations, pain especially, is amplified to as what
| seems as strongly as possible; and why I must highly
| limit my activity level as any little stresses on body,
| likely even normal natural muscle use which causes micro-
| tears, then compounds the problem and will take many days
| of very low activity to return to a still difficult-
| dysfunctional baseline. But perhaps there's a high
| probability that the sciatic nerve having been compressed
| for most of my life, my mind, nervous system, could
| handle that level of pain/sensitization - but then the
| damage to the cornea that happens in 100% of LASIK
| surgeries was what finally broke the camel's back.
|
| If this surgery doesn't go well, doesn't help - which it
| took me 1.5 years to even find a surgeon who does this
| type of specialized surgery - then I'm afraid I may end
| my life because this pain, the lack of productivity, of
| being stuck, of quite little social interaction overall -
| HN is likely the most stimulated my mind gets, only
| possible to write this fluidly when it's been at least 3
| days of eating a very low inflammatory diet and very low
| activity level - and only if I've been mostly inactive,
| primarily sitting, since waking and getting out of bed -
| to not trigger any pain in my body. Anyway, it gets
| boring, repetitive.
|
| I've thought of trying to find an
| Elixir/Phoenix/React/etc developer or agency on Upwork
| before surgery and try to struggle to get them on at
| least developing the initial foundation of ENGN, but
| aside from the struggles I listed above that I'll
| encounter, it also will cost additional money - and I've
| not worked in 5 years, I've spent $250,000+ on stem cell
| treatments to heal old high school football injuries,
| that I didn't even know I mostly had and only weren't
| tolerable after LASIK made my nervous system super
| hypersensitive - and to pay for this surgery my mother is
| taking $27,000+ USD out of her retirement; I'm in
| Ontario, Canada, but the healthcare system has been
| practically useless to me. Even if the foundation for
| ENGN would cost just $5,000 to $10,000 to get the ball
| rolling in terms of starting to get users to signup and
| bringing in revenues - it's more money, but even thinking
| about that additional stress would put on my mother, then
| adds to my already overwhelmed nervous system - so
| there's plenty of resistance there to overcome on its
| own. There's also always the potential I'd somehow hire a
| bad contractor or agency, bad in one or many ways, and
| then the MVP wouldn't get finished - primarily because of
| my own incompetency-dysfunction, and then I will just be
| reminded, again, of how stuck my life is and how it
| barely moves forward - personally or professionally.
|
| I'm living a version of the Groundhog Day movie that
| keeps repeating itself, except where I'm in pain, and so
| far where I can tell people my story and ask as many
| people as possible to help and nothing happens. It's why
| last thing I try is this surgery, though I am supposed to
| do another PICL stem cell treatment - where they treat
| tissues inside of my neck - the first one did reduce my
| neck pain and migraine some - for whiplash related
| issues, in part from football - because they only treat
| one side of the tissues, not all of the tissues, the
| first treatment - and so the second treatment they target
| the remaining high yield tissues. But I'm certain I'll
| know after the surgery if there's any improvement or hope
| that my life can start to become different, and even
| though there is a stromal stem cell treatment that was
| developed at University of Pittsburgh - that had very
| successful human clinical trials, that were fast tracked
| under compassionate grounds in India, to heal/regenerate
| deeper corneal tissue for severe scarring and chemical
| burns - the ETA for it being FDA approved was 5 years to
| be clinically available in the US, perhaps less time
| before available in India, but I'll have nothing else
| significant treatment wise for further pain reduction to
| look forward to in the near term after getting this
| surgery - and so why I'm afraid I won't be around much
| longer if it doesn't help much.
| noduerme wrote:
| Lot to digest.
|
| Listen, first of all, do not consider ending your life.
| Seriously, you're way too smart for that. I'm sure you've
| got it worse, but I've had enormous sciatic problems in
| my life, I've had 3 herniated discs; they're behaving for
| the moment after massive doses of cortisone and without
| any painkillers, but I know what it feels like to cough
| them all out of my back at the same time. Not to be able
| to put a foot in front of another or turn your neck for
| weeks. (I'm a huuuuge fan of intramuscular cortisone
| injections, though. Like 5 or 6 large cortisone over a
| week, with some B-12. Every couple years. Not in the
| spine... fuck that. Alternating butt cheeks. You won't
| feel any benefit until the third day at least. If you can
| convince a doctor to give you that for a week, you will
| be fucking superman. They won't do this in America unless
| you know a doctor personally, but they'll do it in Mexico
| or Spain. I had it the last time my discs went out and
| it's been 6 years and the inflammation has not come back.
| They thought it would).
|
| Anyway, before you off yourself, do try a fuckton of
| intramuscular steroids. The fifth day I levitated off a
| bed in the hospital; I hadn't walked in a week; I felt so
| good I went to a club; I got drunk and spent the night on
| a beach drinking and making out with an 18 year old model
| from Denmark. Seriously. There were wild cats walking
| around; it was winter on the Spanish coast. If you do one
| thing before you die, go get five cortisone shots in your
| ass, in a week.
|
| I also got the hiccups for 24 hours and couldn't sleep,
| but that's neither here nor there. And I got temporary
| blindness in my left eye from fluid behind the retina,
| caused probably by too much testosterone. But. Goddamn
| it, I'm ok. You can be okay.
|
| Enough about that.
|
| About Yahoo and Google. That entrepreneurial spirit is,
| in my experience, way too often just about getting the
| funding and fucking off. We all know why these companies
| go downhill, but somehow it's always such a shock when
| they actually deteriorate in front of our eyes, huh?
| Google's search results, for instance. I would have
| expected their core business to stay more or less fine,
| not collapse a couple years after all the competition was
| eliminated.
|
| It would be fine if they didn't grow into every crevice.
| Get search right, that's all we ask. I don't want Google
| to be my chat room or my shopping site. Why do they need
| to? Search is huge. They own 90% of the market.
|
| >> but organizing to turn that into adequate specs to get
| solid estimates or fixed price quotes for work is
| extremely difficult to me.
|
| That's always the worst. The business side. I've always
| just built things and hoped for the best. It sounds like
| you've got something interesting going there, although I
| have no damn clue what you're building, that's an
| exciting feeling. ENGN is killer. If you own ENGN.com,
| hell, money well spent.
|
| I don't understand what you mean about "executive
| function", since you obviously have the capacity to write
| well-crafted email and think pretty clearly; perhaps I
| lack the executive function to discern your lack of
| executive function (I'm a brutally self-punishing
| alcoholic, but otherwise a damn good programmer)
|
| Anyway I don't know if you're trying to ask for pointers
| to workers for this concept, I'm probably not it; I'm
| $200/hr and I'm already covered for the next year. This,
| however, should be your symphony. And I think you know
| how to do it.
| nojito wrote:
| The issue is and will always be monetizing. Anyone competing with
| Google will need to have a robust monetizing strategy to survive.
| SavantIdiot wrote:
| > And boy would Google find it hard to follow you down that road.
|
| This is a good perspective. Where can Google not go? Places that
| don't lead to profit. They will try (cough Wave cough) but will
| give up.
| stanleydrew wrote:
| But isn't profit the point of any company? If you're going down
| a path that doesn't lead to profit, you'll fail whether Google
| follows you or not.
| SavantIdiot wrote:
| No. That's why Non-Profits, 501(c) corps in the US, exist.
|
| E.g., Linus wasn't looking for profit, and Linux ate the
| world.
| james-redwood wrote:
| www.neeva.com www.kagi.com Two privacy oriented search engines
| with results and features better than and surpassing Google (did
| I mention that they're ad free?)
| aronpye wrote:
| A lot of the spam results just seem to be copy pasted content.
|
| I wonder how difficult it is to compare the main body of text in
| search results, then say if it is over a 95% match with another
| site (I.e. it has been copy-pasted), demote it in the search
| results. If a site generates too many of these demotions then it
| gets blacklisted from the index.
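|
| A minimal sketch of the comparison described above, assuming the
| main body text has already been extracted from each page. It
| splits text into overlapping word "shingles" and measures overlap
| with Jaccard similarity. The function names and the 5-word
| shingle size are illustrative assumptions; only the 95% threshold
| comes from the comment.

```python
import re


def shingles(text, k=5):
    """Return the set of overlapping k-word shingles in text (k is an assumed default)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or nothing at all).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as not similar."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def is_near_copy(text_a, text_b, threshold=0.95, k=5):
    """True if the two texts share at least `threshold` of their shingles."""
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold
```

| Comparing every page against every other page this way is
| quadratic in the number of pages, so at index scale it would need
| a candidate-selection step in front of it.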
| nikanj wrote:
| How would you avoid throwing the original site out with the
| bathwater?
| aronpye wrote:
| Maybe try and time stamp the page, presumably the earliest
| page is the original source. Could also combine it with a
| site reputation rating or something similar.
| neoneye2 wrote:
| I have experimented using LSH (Locality Sensitive Hashing) for
| identifying similar documents, among 50k documents in total.
|
| My LSH implementation is here:
| https://github.com/loda-lang/loda-rust/blob/develop/script/t...
|
| Example of the 100 most similar documents:
| https://github.com/neoneye/loda-identify-similar-programs/bl...
|
| There can be false positives, so after LSH then do a more in-
| depth comparison.
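|
| A rough sketch of the two-stage idea (LSH to find candidates, then
| an in-depth check), assuming MinHash signatures with banding. This
| is not the linked implementation; the constants, function names,
| and the 0.8 threshold are all hypothetical choices made for
| illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 64               # signature length (assumed)
BANDS = 16                    # number of LSH bands (assumed)
ROWS = NUM_HASHES // BANDS    # signature rows per band


def shingles(text, k=3):
    """Set of overlapping k-word shingles; short texts yield one shingle."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(seed, shingle):
    """Deterministic 64-bit hash of a shingle, salted by the seed."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(shingle_set):
    """For each seed, keep the smallest hash value over all shingles."""
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(NUM_HASHES)]


def candidate_pairs(docs):
    """Documents sharing an identical band of their signature become candidates."""
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text))
        for band in range(BANDS):
            key = (band, tuple(sig[band * ROWS:(band + 1) * ROWS]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty shingle sets."""
    return len(a & b) / len(a | b)


def similar_pairs(docs, threshold=0.8):
    """Verify each LSH candidate exactly, which filters out false positives."""
    result = []
    for a, b in sorted(candidate_pairs(docs)):
        score = jaccard(shingles(docs[a]), shingles(docs[b]))
        if score >= threshold:
            result.append((a, b, score))
    return result
```

| More bands with fewer rows each catch less-similar pairs at the
| cost of more false positives for the verification step to reject.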
| nobbis wrote:
| 10 years ago, the original engineer of Google's search engine
| told me what he now wanted was asynchronous, human-powered search
| with curated results, e.g. a Google-like interface, but queries
| cost $5 and take 15 minutes.
|
| Money's no object for him, so he wanted to outsource the
| filtering, ranking, and interpreting of results. Would be even
| more useful today (albeit a tiny TAM.)
| Nuzzerino wrote:
| I would have told him not to quit his day job. That sounds
| ridiculous.
| nobbis wrote:
| I don't agree, plus he's a billionaire.
| leobg wrote:
| Why's he not building it now, then? He has the means.
| nobbis wrote:
| Building costs more than money.
| leobg wrote:
| True. I'm wondering though if he hasn't given up on the
| idea. It's been ten years. Maybe he doesn't believe
| anymore that a human could do better, even if given 15
| minutes of time?
|
| If such a demand exists, should we not be seeing an
| active "secondary marketplace" for people offering to do
| 15-minute human meta search & research tasks?
| nobbis wrote:
| I said "wanted it" not "wanted to build/finance it."
| Doubt he's given up wanting it.
|
| A human can always do better: take Google's results, then
| remove SEO spam/duplicates, extract more relevant
| snippets, combine results from multiple nearby queries,
| etc.
|
| Demand exists, but someone has to build it. And it's
| unclear how big the market is.
| mroll wrote:
| > the original engineer of Google's search engine
|
| you mean Larry Page?
| nobbis wrote:
| No, the guy who re-wrote Larry's research code into Python
| and put it in production.
| PaulHoule wrote:
| For medical search the answer is pubmed. Not only is the
| collection of documents clean (of low-grade scammers, pharma
| companies have to pay big $ to play) but the NIH has done a large
| amount of search quality and ontology work -- the system knows
| "Tylenol" is synonymous with "Paracetamol", "Acetaminophen", etc.
| titzer wrote:
| > the system knows "Tylenol" is synonymous with "Paracetamol",
| "Acetaminophen", etc.
|
| This is exactly the kind of thing that Google cannot fathom
| manually doing. As if entering facts into a computer were
| morally wrong somehow. They'd much rather launch the equivalent
| of a shell script that harnesses face-melting amounts of
| computational power, processing literally trillions of webpages
| in bulk, signal and noise together, junk, spam, and
| misdirection alike, to learn bad associations and then serve
| them up with no human review and then put the full force of
| their reputation behind a results page that apparently people
| never check and certainly can't correct because of the
| inscrutability of a machine-learned model that has few to no
| levers to adjust.
| leobg wrote:
| Isn't that kind of easy to do? I mean, you can do that kind
| of thing as a one-man-show on consumer hardware using GloVe
| or fastText.
| PaulHoule wrote:
| Google has long lied about what they do.
|
| I had a chance to debrief people who had left their relevance
| team and they told me things that were outright contradictory
| to what rank-and-file Google employees have told me. (What
| they told me did make sense in terms of my experience as an
| IR system developer, SEO publisher, etc.)
|
| Microsoft bought a company called PowerSet that had extracted
| a large database of entities and relationships from Wikipedia
| and used the technology to make the "Bing" search engine.
|
| Earlier Microsoft engines were a joke, but Bing was so good
| that Google saw it as a threat so they bought Freebase to get
| a similar kind of database, then they killed it to
| incorporate it into the "Google Knowledge Graph".
|
| For all of their hating on semantics note that they hired R.
| V. Guha as their chief scientist, who worked with Doug Lenat
| on the notorious
|
| https://en.wikipedia.org/wiki/Cyc
| coding123 wrote:
| > Lots of people want to be amateur police. (pg)
|
| This is very true. How many times have I clicked on a site met
| with ads so bad that the browser slows down, and after 10 seconds
| the page gets covered up by more and more crap and then a paywall
| shows up sometimes too. Now here's the thing - a competitor to
| Google might detect you clicking back and then pop-up a special
| set of controls near the search result that lets you say: "too
| many ads" or "paywall".
|
| However, if such an engine were to start beating Google, I'm sure
| Google would implement it in their own way: automatically detect
| why you clicked back in such a short timespan.
| numpad0 wrote:
| Google already detects immediate returns and knocks off that
| link for you. What's problematic to me is that I tend to
| reactively mash back and forward, and the link just goes away.
| DenisM wrote:
| Sooner or later you'll have to deal with ballot-stuffing -
| companies trying to bury their competitors by casting lots of
| negative votes.
|
| Perhaps ML will help in detecting such campaigns.
| cinntaile wrote:
| > However, if such an engine were to start beating Google, I'm
| sure Google would implement it in their own way: automatically
| detect why you clicked back in such a short timespan.
|
| Do you seriously believe that Google doesn't use that as a
| datapoint already?
| birken wrote:
| The funny thing is that _if_ the people who worked on spam at
| Google were free to talk about it, I'm sure it would become
| evident that they know more about spam and anti-spam efforts than
| anybody else in existence. It's a ridiculously hard problem,
| especially when people are targeting you directly. But they
| aren't free to talk about it, because if they did it would just
| give more assistance to the spammers, and make the problem worse.
|
| I'm not saying that curated search results for particular
| verticals is a terrible idea (though I'm sure like anything the
| devil is in the details), but on the whole Google search is very,
| very good considering the constant assault they are under from
| spammers (which most other search engines are not, at least
| directly).
| throwaway6845 wrote:
| > I'm sure it would become evident that they know more about
| spam and anti-spam efforts than anybody else in existence
|
| Really?
|
| I can point you to Hard Problems that have been solved better
| at little startups than at Google - or, indeed, at any other
| bigco. That's why acquisitions happen.
|
| Why does Google having 1000 engineers working on a problem
| automatically mean they are the smartest?
| colordrops wrote:
| You talk about this "constant assault from spammers" like it's
| not Google's fault and it's an intractable problem. That is not
| a correct characterization. There are plenty of low hanging
| fruit that could easily be detected and deranked, for instance
| scraped stack overflow spam. But google chooses not to
| deprioritize these results. The reason they don't is that they
| make money on ad clicks, which many responses have already
| elaborated on.
| [deleted]
| tibbar wrote:
| But why is Google even dealing with spam? What if they (or
| someone else) curated top websites for a given category? For
| instance, when I search for a programming-related term, I
| already know that I want to see the answer on either Stack
| Overflow or one of a few reference documentation sites. It is
| possible that some other site could have the answer instead,
| but in practice the random sites that often show up at the top
| of the results are usually SEO spam. A search engine that
| figured out or let you select the semantic space you are in and
| then promoted known websites - maybe ones you curate yourself!
| - would be a big improvement.
|
| Of course you can always hardcode the site you want in the
| Google search results but this is hacky and not very
| expressive.
| phsource wrote:
| This 100%. In travel, we see Google constantly tweaking its
| algorithms, and compared to Bing, Google surfaces a ton more
| small, well-written travel blogs [1]
|
| Not only that, Paul and Michael have seen plenty of startups,
| and at least in recent memory, the number of vertical search
| and consumer startups that Y Combinator has funded hasn't been
| that high
|
| As a consumer startup, I know this issue firsthand. Paul and
| Michael assume that if you build a better product, they will
| come! That's simply not true these days.
|
| Instead, you need to:
|
| - Build a better product
|
| - Option 1: Figure out a channel with enough growth on an
| existing platform. This likely means you're doing SEO for your
| new search engine
|
| - Option 2: Get your customer lifetime value high enough so you
| can pay for ads. This is tough, since it's a bit of a chicken
| and the egg problem since most search engines are monetized
| with ads
|
| As the founder of Wanderlog (YC W19; https://wanderlog.com), a
| consumer vacation planning app [1], I definitely remember the
| idealistic days when I thought the best consumer product on its
| own would win! But growth doesn't just come, and the same can
| be said of vertical-specific search engines.
|
| [1] Try searching "[your city] itinerary" on Google vs. Bing:
| it's much more likely you'll find a small blog rather than
| Lonely Planet or the local travel bureau as the top result
| noduerme wrote:
| >> - Option 2: Get your customer lifetime value high enough
| so you can pay for ads. This is tough, since it's a bit of a
| chicken and the egg problem since most search engines are
| monetized with ads
|
| nonoonononooonono. No. Don't monetize anything for the first
| 10 years. That's the only way it can work. Then you can go
| monetize it and buy an island and not give a shit if you
| destroy what you created.
|
| Oh but don't worry. You'll have investors.
| judge2020 wrote:
| [1]: both signed in, but with the profile image removed
|
| Bing:
| https://i.judge.sh/ShareX/2022/01/www.bing.com_search_q%3Dat...
|
| Google:
| https://i.judge.sh/ShareX/2022/01/www.google.com_search_q%3D...
|
| Interestingly Google didn't have a top-result ad and the
| google.com/travel carousel is 4th from the bottom.
|
| For the actual results, both thefearlessforeigner.com and
| paigemindsthegap.com seem to be actual travel blogs (the
| pictures didn't appear in a reverse image search, so they are
| probably organic), but they're clearly geared towards being a
| 'faq' for visiting the city and have affiliate links where
| appropriate. Bing went straight for discoveratlanta.com, and
| frommers.com is well-thought-out but not a personal travel
| blog.
| padastra wrote:
| Hi! I used Wanderlog to plan a recent month-long group trip,
| which was definitely the most complex vacation I've had to
| plan. For context I am very active when traveling (e.g.
| multiple activities each day); so not sure how my experiences
| map to others.
|
| The best part of it was (going to a foreign country) being
| able to find / identify all the attractions relative to each
| other, so I could go to cluster A on Monday, cluster B, on
| Tuesday, etc.
|
| The hardest part of it (and why I needed to create a separate
| google sheets anyways) was--once I figured out opening hours
| of different locations, hard-to-book activities with limited
| reservations--the ease of moving things around more fluidly
| e.g. cluster B on Monday, cluster A on Tuesday, etc. and
| having a more information-dense view so I could see larger
| portions of the itinerary at once.
|
| It would be cool to have an "input everything" --> "input
| time restrictions / unmovable things" --> output planned
| activity cluster type workflow.
| dougmwne wrote:
| I'm sure they know all about it, but are prevented from doing
| anything by the business model. Pinterest has been spamming up
| my search results for years. Maybe other people find it
| helpful, but I do not. It's obvious I am never going to get
| value from Pinterest. Let me click a button to add it to my
| block list. One single click would have given me years of
| massively improved results.
|
| The fact that this feature does not exist shows that there is
| something deep within Google's core that is preventing them
| from addressing SEO spam, just like there is something deep
| within Airbnb that makes it difficult to filter out Airbnbs
| with problem reviews.
|
| Google has been coasting for a good long time and now major
| players are realizing they are wide open for disruption.
| ocdtrekkie wrote:
| I think the problem is just that the solution isn't in Google's
| wheelhouse: There is no algorithmic ranking system that can't
| be gamed. Human moderation and curation is the only way to
| provide true quality, and Google is allergic to solutions that
| don't automate and scale.
|
| I think a really good search engine would still algorithmically
| search its index, but the content library should be human-
| curated with a goal of ingesting content via author, not via
| platform. Once a given author was human-approved as a quality
| source of information, content they produce could be
| automatically ingested going forwards, and conditionally re-
| reviewed by a human if there were reports the quality had
| decreased.
| specialp wrote:
| This was Yahoo in late 90s early 2000s. They had a human
| curated directory search where one could look up something
| like "kayaking" and find a bunch of sites on kayaking. Then
| if you wanted to search on keyword it was outsourced to
| AltaVista and later Google. Altavista results were terrible
| and were almost nothing more than a keyword search (IE the
| word you were searching appeared on this page). Google got
| much better at the general search and this was history.
|
| I think the death of the directory search dramatically
| dropped the number of self-curated, informative sites from a
| domain expert that were common in the early internet. Now
| instead of making a website, many people are on content silos
| like Reddit/FB
| ocdtrekkie wrote:
| I do still think we could adapt this model on top of
| content silos... assuming we can index them! Consider that,
| rather than just ingesting Reddit content wholesale, one
| could ingest new posts from particular users who write
| quality posts on Reddit.
|
| Assuming a method also existed for an author to
| authenticate themselves with the search engine, one could
| also enable an author to help identify their content across
| multiple platforms, as well as suggest other quality
| authors to consider.
| prepend wrote:
| I think the issue is that these crappy results are kind of good
| for revenue. It's not just organic results impacted but all the
| affiliate ads.
|
| Google is smart so I assume they crunched the numbers and
| figured out they make more money from people filtering through
| crappy results that include viewing and clicking ads than by
| surfacing good content.
|
| I think Google is optimizing for ad revenue, not for good
| search.
| numbsafari wrote:
| The problem isn't that Google doesn't employ these people or
| invest in their activities.
|
| It's that Google has destroyed their own search results in
| order to continue to expand their revenue opportunities.
|
| If Google:
|
| - Enabled downvoting on results, like YT videos. (Has its own
| spam problems, just like YT)
|
| - Allowed you to block certain domains from your search
| results, like YT videos. (If they added some kind of
| "coordinated network detection" and down-ranked domains
| coordinating with ones you've blocked, that'd be pretty cool).
|
| - Allowed you to create your own custom search engines, like
| "Programmable Search Engine".
|
| That would be incredibly valuable. They already have most of
| the tech. They could even create a subscription service around
| custom search engines if they really wanted. Plenty of people
| would find something like that incredibly valuable.
|
| Anyhow, buried in there is your startup idea. Remember: your
| startup doesn't have to generate the same revenue or profit as
| the incumbent on day one to be successful.
| fragmede wrote:
| How do you fight brigading, the organization of groups
| elsewhere to collectively vote on something? Eg white
| supremacist groups get together and vote down everything by
| people of color, and vote up their pages about how great they
| are?
| numbsafari wrote:
| How does Google already handle this exact problem on YT?
| duskwuff wrote:
| They don't. There's a lot of pretty obvious manipulation
| that goes on in YouTube recommendations and search
| results.
| giantrobot wrote:
| Randomly select votes that are actually recorded. Then add
| in metavoting that votes on the votes with random sampling.
| At Google's scale with a sufficiently random sampling you'd
| be extremely hard pressed to successfully brigade or spam
| the voting.
|
| Google could easily use its current fingerprinting to
| constrain (to an extent) multiple votes. Even knowing only
| a portion of the population will participate in the voting
| they can use a Wilson confidence interval[0] or similar to
| properly weight votes.
|
| Random sampling works here since you're not guaranteed one
| vote per user per page and the outcome is binomial: seen
| and downvoted or seen and not downvoted.
|
| [0]
| https://www.mikulskibartosz.name/wilson-score-in-python-exam...
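|
| A small sketch of the Wilson interval mentioned above, applied to
| the binomial "seen and downvoted / seen and not downvoted"
| outcome. It covers only the weighting step, not the random
| sampling or fingerprinting. The function name and the 95%
| z-value default are assumptions for illustration.

```python
import math


def wilson_interval(downvotes, impressions, z=1.96):
    """Wilson score interval for the true downvote rate.

    downvotes:   sampled views that ended in a downvote
    impressions: all sampled views of the result
    z:           normal quantile (1.96 is roughly 95% confidence)
    """
    if impressions == 0:
        # No data: the rate could be anything.
        return (0.0, 1.0)
    p = downvotes / impressions
    z2 = z * z
    denom = 1 + z2 / impressions
    centre = (p + z2 / (2 * impressions)) / denom
    margin = z * math.sqrt(
        p * (1 - p) / impressions + z2 / (4 * impressions ** 2)
    ) / denom
    return (max(0.0, centre - margin), min(1.0, centre + margin))
```

| Ranking by the lower bound rather than the raw ratio means a page
| with 5 downvotes in 10 views is penalized less than one with 50 in
| 100, because the smaller sample supports a wider range of true
| rates.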
| kleer001 wrote:
| easy, voting blocs, you assign yourself to the results of
| people who vote similarly to you. additionally there'd be
| local and regional blocs too. I can't think of a reason
| that the naive everyone sees everything everyone else is
| doing would work in the long run. That's Twitter, and it's
| garbage.
| metabagel wrote:
| This is a great point. I would think Google could rank
| users from low quality to high quality in terms of the
| quality of the websites which they recommend or downvote.
| Tricky business and could be difficult to control, but
| basically the same thing they currently do for websites,
| but extended to humans.
| bogwog wrote:
| Those shitty SEO spam sites exist only to serve ads, and
| Google has a monopoly on internet ads. So there is no real
| incentive for them to solve the problem.
| [deleted]
| kevinmchugh wrote:
| Google had the same incentives in 2011, 2012 when they
| built and released Panda and Penguin.
| freyr wrote:
| Google has 28.9% share, Facebook 25.2%, and Amazon has 10%
| and growing fast. Not a monopoly, and the incentive is
| there: if search results are consistently bad, people will
| stop searching as much, and revenue and market share
| decline.
| mminer237 wrote:
| Real review sites serve ads too. I don't think Google has
| any incentive to make things worse, and they still want
| people to google reviews instead of just asking friends or
| people on reddit for reviews.
| Rastonbury wrote:
| The biggest perverse incentive for Google is that making
| better search results can mean less clicks to ads (clicking
| an ad because results are crap, going thru more pages of
| results means more ads). Clicks are revenue which is much
| easier to optimise for.
|
| Internal Search owners can push for better algos, but what if
| the algo causes revenue to fall? Are there forces strong
| enough within the organisation to ensure that search
| quality prevails?
|
| If this is the case, the problem is existential. It can only
| be arrested at the very top
|
| https://en.wikipedia.org/wiki/Perverse_incentive
| Closi wrote:
| > The biggest perverse incentive for Google is that making
| better search results can mean less clicks to ads
|
| This is also something that Google can control if
| competitors come along.
|
| i.e. If a reasonable competitor comes along that is willing
| to sacrifice ad revenue for better search result quality
| than Google, google can just adjust their search quality
| upwards to knock them out (and then adjust it back once the
| competitive threat is gone).
|
| Perverse incentives from Google are all over the place -
| Searching for the delivery business "Just Eat" in the UK
| for instance returns an ad for their competitor Deliveroo
| above the legitimate organic search result for me - and I
| can also see that JustEat are trying to pay for their own
| brand name just to compete - and IMO this sort of behaviour
| is anti-competitive, borderline extortion considering
| Google is the de-facto way of searching for a business, and
| shocking from a search-quality perspective (where the wrong
| result is intentionally shown at the top because they paid
| more money).
| yabones wrote:
| If I had to pay $10/month for good search results, I
| absolutely would. I think most people would. Get rid of the
| ads and spam, and you have a service worth a premium. The
| solution is to make it user-centric instead of
| advertiser(spammer)-centric.
| visarga wrote:
| Some kind of browser or extension that re-ranks and
| filters search results on the web.
| nitrogen wrote:
| _The biggest perverse incentive for Google is that making
| better search results can mean less clicks to ads_
|
| This gets close to the real root of the issue -- attention
| is monetizable independently of the quality of content.
| There would be much less incentive to create SEO spam if
| search engines negatively weighted pages with ads and
| affiliate links, and if manufacturers were barred (e.g. by
| the FTC) from owning or imitating reviewers.
| ErikVandeWater wrote:
| > Enabled downvoting on results, like YT videos. (Has its own
| spam problems, just like YT)
|
| Are there any search engines that do this? It's a great,
| simple idea.
| mad182 wrote:
| Not really that simple, I see a lot of potential for abuse
| - using bots and brigading to mass downvote your
| competitors or political opponents.
|
| Couple of positions up or down in google results for
| somewhat popular and valuable keywords can mean the
| difference in thousands of dollars per day of ad or
| affiliate revenue. I suspect it would get pretty wild if
| google launched something like this. There already are
| black-hat seo methods and services, but something so simple
| and direct would turn it up to 11.
| numbsafari wrote:
| > I see a lot of potential for abuse
|
| They already have the tech to fight this on YT. They, in
| theory, are supposed to be doing the same thing to detect
| inauthentic behavior on ad placement and click abuse.
| ErikVandeWater wrote:
| Downvotes could apply just to future recommendations for
| search results you see, and not apply to advertisements.
| willhinsa wrote:
| > - Enabled downvoting on results, like YT videos. (Has its
| own spam problems, just like YT)
|
| They're going the OPPOSITE DIRECTION from this!! They
| recently removed all downvote visibility of YouTube videos
| from the user, so now downvotes only feed into their
| algorithm. So in the last line of defense of me ending up
| watching a shitty video, one of the most valuable tools has
| been removed by my betters. It's preposterous that people
| think that Google is doing a good job. They're actively
| getting worse, and ignoring everyone saying so.
| asiachick wrote:
| They're doing a great job. I'm so happy dislike visibility
| has been removed. It removes the effectiveness of pile-on
| coordinated harassment which many youtubers have fallen
| victim to.
| endisneigh wrote:
| > That would be incredibly valuable. They already have most
| of the tech. They could even create a subscription service
| around custom search engines if they really wanted. Plenty of
| people would find something like that incredibly valuable.
|
| Why would they do this? Google's customers are the
| advertisers, not the end-users. And no one is going to pay
| for a search engine, it's been tried and has failed.
| slt2021 wrote:
| if you think about it, Google provides advertisers a
| customized search engine to find customers. So it is not
| you searching the web, it is the web's advertisers searching
| for leads
| lumost wrote:
| You only need ~1/10000th of Google's revenue to be a
| financially successful startup. 1/1000th and you'll have a
| great business, and at 1/100th you'll be somewhere between
| a unicorn and a decacorn.
| endisneigh wrote:
| sure, but you'd need a better search engine if people are
| going to pay for it.
|
| A company with an objectively superior search engine
| could make even more money with ads so now you're back to
| the beginning
| numbsafari wrote:
| I think you have to look at it more like Amazon Prime.
|
| Nobody is going to pay for /just/ a search engine. But they
| might pay for, say, a /better/ search engine, plus
| additional features around gmail/gcal/gdrive.
|
| Think of it more as subscribing "to google" and less as
| subscribing to "google search".
|
| Regardless, the point isn't to "fix" google. It's to
| highlight a possible path for a new market entrant.
|
| ... If an existing player wanted to make a move here, I
| would say that both Mozilla and Apple are well positioned
| to add "personalized search" to a subscription service.
| Same with Microsoft. DDG could also make moves here if they
| expanded beyond search.
| wakiza33 wrote:
| there is a smaller niche market. SEMrush is a tool used in
| the digital marketing industry that is now public and has a
| multi-billion dollar market cap. It originally started as a
| search engine. When they didn't gain traction they used the
| tech to monitor Google and interface it for customers who
| are tracking their performance in search results (and much
| more).
| yawaworht1978 wrote:
| Can I ask what you mean by public?
|
| It's not open source as far as I know and there's only the
| free trial way to try it.
| roughly wrote:
| > And no one is going to pay for a search engine, it's been
| tried and has failed.
|
| Always curious about things like this. I certainly would
| pay for this; it sounds like many other people here as well
| would. I'm curious if the constraint is that there aren't
| enough people to actually pay for the investment required
| for the service, or if there aren't enough people willing
| to pay to meet the standard VC notions of success. We seem
| to have a problem with building and supplying services for
| niche (read: "not expressible as an integer percent of the
| world's population") customer bases, and I'm never sure if
| that's a business problem or a cultural problem.
| SamoyedFurFluff wrote:
| The people most able to pay for a service like this are
| the people that advertisers most want because they're the
| people with enough discretionary budget to spend on things
| like better Google search results. Allowing someone to
| buy something like this also reduces your attractiveness
| to your advertising clients.
| ancarda wrote:
| > Allowed you to block certain domains from your search
| results
|
| I would love for Google to build this in. Until they do,
| there is a WebExtension that does this:
| https://addons.mozilla.org/en-US/firefox/addon/hohser/
| ("Block or Highlight Search Engine Results"). I use it to
| block stuff like W3Schools so when I search for something,
| MDN is always #1. Saves me a lot of time having to add "MDN"
| to the end of every query.
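|
| A minimal sketch of the same idea outside the browser: drop any
| result whose host is on a personal blocklist. The names below
| (`BLOCKED`, `is_blocked`, `filter_results`) are hypothetical;
| this is not how the extension or Google implements it.

```python
from urllib.parse import urlsplit

# Assumption: a user-maintained set of domains to hide.
BLOCKED = {"w3schools.com"}

def is_blocked(url, blocked=BLOCKED):
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)

def filter_results(urls, blocked=BLOCKED):
    """Keep the original ranking, minus blocked hosts."""
    return [u for u in urls if not is_blocked(u, blocked)]
```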
| PaulHoule wrote:
| The custom search engine is harder than you'd think.
|
| Google's search algorithm is tuned up for searching the whole
| web. It turns out the heuristics you need are very different
| depending on the size of the collection.
|
| When Gerard Salton was doing IR experiments with punched
| cards he was working with collections of as little as 70
| documents and in that case you are going to be very concerned
| about recall and not precision. Maybe there is 1 relevant
| document and if you miss it you failed.
|
| If you had 70 billion documents you might have 10,000
| relevant documents and if you lost 60% of them you still have
| 4,000 documents. The end user gets more results than they can
| sift through.
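|
| The arithmetic above as a rough sketch, taking recall to mean
| relevant documents returned divided by relevant documents that
| exist. The function name is illustrative only.

```python
def recall(found_relevant, total_relevant):
    """Fraction of the relevant documents that the search returned."""
    return found_relevant / total_relevant

# Tiny collection: one relevant document, and the search missed it.
small = recall(0, 1)

# Huge collection: 10,000 relevant documents, 60% of them lost.
total = 10_000
kept = total - total * 60 // 100
large = recall(kept, total)
```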
|
| Thus I always groan when I see a site is using "Google Site
| Search" because the relevance is usually worse than you'd get
| with the alternatives.
|
| Connected with that is the tuning work: Google has sufficient
| data to tune up a big model for everybody but true
| personalized search eludes them because they don't have
| enough data from you to tune up a model for you.
| mrkramer wrote:
| I agree with you that "true personalized search eludes them
| because they don't have enough data from you to tune up a
| model for you". That's what Larry Page said as well "Google
| doesn't know what you know". His ultimate goal is Answer
| Machine powered by AI but that's not happening anytime
| soon. I think internet search engines that we are using
| today are primitive compared to what we will have in the
| future.
| edrxty wrote:
| The problem with all of this is it would help us greatly, but
| it would be useless to the 99% that the internet is
| increasingly being designed for. Modern UI trends are
| becoming obsessed with removing as many options and features
| as possible so the dumbest humans bordering on smartest
| vegetables can still use the service.
| saalweachter wrote:
| And customization breaks caching.
| asiachick wrote:
| > - Enabled downvoting on results, like YT videos. (Has its
| own spam problems, just like YT)
|
| Not convinced this would help. The spammers would just hire
| people to dislike competitors
|
| > - Allowed you to block certain domains from your search
| results
|
| This I would use. Never show me results from collider,
| watchmojo, ranker,
|
| > - Allowed you to create your own custom search engines,
| like "Programmable Search Engine".
|
| I think this would lead to people writing highly polarized
| engines. The Red Pill engine for example and we'd have a new
| problem, the proliferation of popular highly biased results.
| Of course that's not to say Google's results aren't already
| biased but they certainly are trying to cover everyone.
| PaulHoule wrote:
| Some kinds of "spam" can improve search results.
|
| Things have changed in the past few years, now that Google
| has developed advanced transformer models, but for a long
| time Google's question answering facility has been: "let
| spammers make 10^8 pages where the title is the question and
| the answer is in the page".
|
| The trouble is that there's a fine line between "answer is in
| the page" and "word salad!"
| propogandist wrote:
| >Enabled downvoting on results, like YT videos.
|
| you mean the dislike counter they just disabled to force
| people to sit through more low quality content and pre-roll
| ads to claim increase in platform engagement and viewership?
|
| The only thing that matters is revenue and Google had increases in
| acquisition costs in prior revenue reports. Expect to see the
| data points for the latter metrics to be highlighted on the
| earnings announcement, and a record quarter for YT coming out
| of the change.
| xnx wrote:
| A lot of people forget that one of the inputs to the Google
| ranking algorithm is input from human quality raters who work
| off of an extensive, 172 page, guide that Google publishes and
| updates for anyone to read:
| https://static.googleusercontent.com/media/guidelines.raterh...
| visarga wrote:
| Apparently the "human quality raters" never found the sites
| reported in this thread.
| whiplash451 wrote:
| I understood PG's point differently. My understanding is that
| he is suggesting an angle of attack in which carefully crafted
| manual reviews (that do not scale) can be used to bootstrap a
| product that does scale thanks to something else (e.g.
| collaborative filtering). All of this being on a niche domain
| where you can drive a wedge into the mediocre performance of
| Google (online shopping probably being the worst possible
| choice, but there are many others).
| acdha wrote:
| Your point is good but I'm not sure I'd say very good given how
| easily the same SEO spam domains can stay at the top of search
| results for ages simply by scraping someone else's content.
| What I'd be most interested in knowing is what their success
| metrics are defined as -- for example, how much of a problem
| does Google's management consider it if someone searches, finds
| the answer they were looking for on someone's Stack Overflow
| rip-off, and stops searching? I could easily believe that a
| significant amount of what we're seeing here is that they're
| focused on some kind of user frustration metric which doesn't
| include things like damage to other businesses.
| paulgb wrote:
| Yes, I've noticed this particularly with technical results. A
| lot of sites seem to have scraped StackOverflow and GitHub
| issues, put a crappy ad-loaded interface around them, and
| somehow out-rank the original SO/GitHub content.
|
| It's like the bad-old-days of ExpertsExchange, which somehow
| was never delisted by Google for its shady SEO tactics.
| stevenally wrote:
| They outrank the original content because google is
| corrupt.
| wott wrote:
| > A lot of sites seem to have scraped StackOverflow and
| GitHub issues, put a crappy ad-loaded interface around
| them, and somehow out-rank the original SO/GitHub content.
|
| Some even made slideshows of SO screen captures and put
| that on Youtube, with a fake video or spoken intro to make
| believe actual content will be discussed... A number of
| shameless people would go to any length to grab bits of money
| anywhere and anyhow, and I've hit those links a couple of
| times.
| coldpie wrote:
| You just have to look at Google's profit motive here. Their
| motive isn't to provide quality search results. Their
| motive is to show users ads, either in the search results
| themselves or on the destination sites via their ad
| network. The SEO spam sites aren't a bug, they are a
| feature of Google's profit algorithm. Google's search
| quality will never improve so long as their motivation is
| to show you ads. Why should it? Competition may help here,
| either by an outsider like the OP suggests, or via breaking
| Google up with anti-trust enforcement, or both (my
| preference).
|
| As a user, your best personal and ethical move is to
| install an ad-blocker, to make ad-based business models
| less viable, which will help promote business models that
| don't abuse the customer.
| the_other wrote:
| The core problem, I guess, is that search engines view
| all their results as ads. That's why they got into the ad
| business in the first place.
| nickff wrote:
| > _" The core problem, I guess, is that search engines
| view all their results as ads. That's why they got into
| the ad business in the first place. "_
|
| This seems a bit overly cynical. Some search engines only
| served ads, but they're long gone. The survivors are
| those who dedicated themselves to finding links which
| were responsive to people's search intent. They seem to
| have gotten into ads because it was the best business
| model in this market.
| acdha wrote:
| > It's like the bad-old-days of ExpertsExchange, which
| somehow was never delisted by Google for its shady SEO
| tactics.
|
| This is really what made me suspect that Google was
| teetering on the edge of the MBA death spiral: these
| problems run for years when they'd be easy to block, which
| suggests to me that whatever metric gets you a bonus /
| promoted doesn't include things like that which are long-
| term threats to their core business even if it's selling a
| lot of ads short-term.
| cft wrote:
| The search results markedly worsened in the last 5 years. Why
| could they keep up with SEO spam until 5 years ago, and now
| they can't? Their revenue has been growing dramatically, so
| they could proportionally increase the allocation. It's
| probably because the focus of their HR/changing workforce is
| now elsewhere: maybe fighting "disinformation": both COVID and
| political. Those efforts were non-existent 5 years ago.
| specialp wrote:
| I think it is also no longer in their interest. If you look
| at their mobile results now, there are sometimes no search
| results for webpages, just ads, and their automatically
| extracted data. So, it is in their interest now to have the
| search for non-advertisers to be bad. Eventually people will
| consider those results junk and just use the google extracted
| data/people who paid to go up.
| behnamoh wrote:
| Most comments focus on the technical side of things, whereas
| I'm sure there are also legal restrictions involved in this. If
| Google delists a website on the grounds that it's a copycat of
| stack overflow, or because they have low quality content
| according to Google's taste, there might be lawsuits filed
| against Google, claiming that the company is discriminating.
| bell-cot wrote:
| In which countr[y|ies] does Google _not_ have the discretion
| to decide that certain sites / pages / etc. "belong
| considerably further down" in Google's search results pages?
| Seems to me that sorting the search results to #1, #2, #3,
| etc. is pretty well baked into their basic product.
| 8ytecoder wrote:
| I agree it's a hard problem. I don't agree it's "really really
| good". I regularly encounter obviously scammy websites. With
| Google's js execution capabilities I'd assume they can detect
| that. I'm talking about the VPN install pop ups and so on.
| Right now there's a whole bunch of github.io-hosted sites
| that are doing that. It's not even porn. It's home decoration
| stuff.
| noduerme wrote:
| How hard is spam, really. If you're Google? Here's what I would
| do as a heuristic (uh, not evil?): We know everything about you
| and everywhere you've visited and everyone you've talked to in
| the last 60 days. We know all their phone numbers and email
| addresses. We even know the girl's phone number you met at the
| bar, who didn't give you her phone number. So if any of those
| people email you, we'll categorize that as "not spam". Also, if
| it's your boss or a coworker, "not spam". If it's a major
| company that's existed for more than ten years, not spam.
| Everyone else, spam. Done.
|
| This is hyperbolic, right? But they can solve spam in a split
| second, if they just admit they're watching you all the time.
|
| [edit] /s thx for reading to the end, folks.
| dageshi wrote:
| I said this in a similar thread yesterday, but I think this is
| an unsolvable problem because much of the content either no
| longer exists in website form or is old.
|
| To put it simply, a new generation of the people who used to
| make the reliable niche websites that not just answered your
| questions but also helped you learn a particular topic have
| moved to youtube instead.
|
| Google search is hollowing out as a result, with the meat going
| and the SEO'd fluff remaining: it kinda answers the question,
| but ONLY the direct question being asked, with none of the wider
| expertise that would leave people more educated about what they
| were searching for.
|
| Of course google owns youtube as well.. so perhaps they just
| see it as an inevitable transition.
| numpad0 wrote:
| Is that...essentially a Proof-of-Work system...
| neom wrote:
| Just a note on that, youtube search is finally getting
| better, yesterday I noticed it was able to find key words in
| the middle of a lecture that had nothing in the title or
| comments. I always wonder about their AI transcription
| service, it's gotten so good, if they're storing all that
| audio as text, I guess their search is going to get
| excellent?
| JohnJamesRambo wrote:
| I don't doubt it is hard but I'm forced to sign into Google now
| pretty much, just let me rate results and ban domains again
| etc. You will solve the seo problem really quick and start
| giving me results I want.
| tablespoon wrote:
| > The funny thing is that if the people who worked on spam at
| Google were free to talk about it, I'm sure it would become
| evident that they know more about spam and anti-spam efforts
| than anybody else in existence.
|
| That may be true, but I think one of the good points made on
| the OP is that it might actually be cultural constraints that
| keep them from solving the problem:
|
| https://twitter.com/paulg/status/1477761335412809729:
|
| > You might need to do a lot of manual spam fighting initially.
| That could be both the thing-that-doesn't-scale, and the thing
| that differentiates you by being alien to Google's DNA. (They
| must hate manual interventions; so inelegant).
|
| Google has some very smart and knowledgeable people, but the
| things they do have to fit into certain boxes, which means
| there are some problems they just can't fix, e.g.
|
| * Everything has to be automated at scale, which leads to
| consistently poor user experience (unappealable account closures
| initiated by inscrutable algorithms, SEO spam).
|
| * You get promoted by building new products, not maintaining
| existing ones, which leads to self-defeating churn outside of
| core areas (e.g. abandoning Google Talk and squandering their
| position in the messenger market).
|
| * etc.
| wakiza33 wrote:
| Spam, yes, but Google has also made meaningful shifts that are
| clearly directed from the top-down. It's much harder now (imo)
| to get specific results, they've overall started looping SERPs
| into broad answers.
|
| This is def a user-engagement strategy -- but it has cons as
| well.
|
| Part of the complaints in the thread were spam related, others
| were something deeper.
| RealityVoid wrote:
| The problem, IMO, might be the monoculture we have around
| search. Because Google is so big, it's enough for spammers to
| target it and they have the vast majority of the search
| visibility. If we had better, more diverse competition, that
| might manifest as a tradeoff, presumably, they would have
| competing and diverse criteria so you would probably not be the
| top result on _all_ dominant search engines. SEO spam needs
| upkeep and attention to latest algos, else it decays. Competing
| algos would yield better results for everyone. Maybe Google is
| just ripe for a shakeup.
| jefftk wrote:
| Doesn't your model predict that Bing would have substantially
| less SEO-gamed results?
|
| (Disclosure: I work at Google, but not on search)
| RealityVoid wrote:
| Well... Yes, it should. But, no, it seems it does not. I
| thought about this when typing it but did it anyway, maybe
| because I thought there is still something worthwhile
| there.
|
| I still think the model could work if the algorithm is
| sufficiently different than Google's. Ideally, people would
| go "I did not find anything I cared about on Google, I
| know, I'll use Bing!" - but nobody does this, because the
| results are consistently worse.
|
| Don't get me wrong, I like G as a company, I think they do
| worthwhile things! But they have let things slip and need
| competition in this field, I mean real competition, then
| maybe they would actually address issues.
|
| Maybe the issue is also on the incentive level as well. I
| mean more searches means more eyeballs and more money for
| Google. If someone searches one thing and they are done
| that is less interaction! I hope they don't work like this,
| but it's possible.
|
| And another possible problem is the opposite. Maybe Google
| is optimizing search for what it thinks people want, but it
| uses the wrong metric. Or it gives people what they want
| but not what they need.
| zozbot234 wrote:
| Legitimate sites could help a lot by adding machine-readable
| descriptions of their content, per the schema.org spec. The
| richness of these descriptions means that this is effectively a
| "hard", non-forgeable claim to being a worthwhile, non-spam
| source (quite unlike the old META tags that got abused to death
| pre-Google). Of course spam sites could simply _lie_ in their
| schema.org tags, but the lies are easy to spot (with combined
| machine- and human-review) and then they just get banned. It
| makes it a _lot_ harder (and hopefully infeasible) to SEO-spam
| by just copying random content.
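|
| For anyone who hasn't seen it, a rough sketch of what such a
| description looks like when emitted as JSON-LD. "Article",
| "headline", "author" and "datePublished" are real schema.org
| terms; the helper name and its arguments are made up for
| illustration, and nothing here says how a search engine would
| weigh or verify the claims.

```python
import json

def article_jsonld(headline, author_name, date_published, url):
    """Build a schema.org Article description as a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,  # ISO 8601 date string
        "url": url,
    }
    # Escape "</" so the payload cannot close the script tag early;
    # "\/" is a valid JSON escape for "/".
    payload = json.dumps(data).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"
```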
| frenchyatwork wrote:
| A lot of what counts as spam these days isn't something like
| "I search for bicycle reviews and get penis enlargement
| pills", it's more like "I search for bicycle reviews and get
| some blog who searched Amazon for the 5 most popular bikes
| and posted links to them with a little blurb and called it a
| 'Review'".
|
| These sort of things are easy to spot, but only if you
| actually have a basic amount of familiarity with the topic.
| It's hard to spot with "AI" or super-cheap labor.
| rightbyte wrote:
| Really? I don't think the bureaucratic bloat at Google cares
| and the original authors of the search engine in its current
| incarnation are probably long gone. It is maintenance mode and
| I don't think they dare touch too much.
|
| It takes time and effort to build up a spam site's ranking but
| it is trivial to blacklist those who get to the top.
| Sebb767 wrote:
| I agree. You can say a lot of bad things about Google, but they
| definitely have some of the smartest and highest paid engineers
| working on their search. Plus there are already a lot of people
| trying to compete with Google and so far, no one seems to
| provide consistently better results.
|
| The only advantage a startup might have is that they could do
| completely new concepts, such as specifying what area you
| search in, allow you to modify their classification of your
| query and/or moderating sites you include - which is probably
| necessary anyway, since you'll hardly have the budget to fully
| index the web. I'm not saying it's impossible, but it's not
| going to be easy at all.
|
| And after all of that, you still need a way to make some money.
| goatherders wrote:
| This x100000.
|
| There is no scenario - none - where thousands of engineers at
| Google working on search wake up in the morning and say "we
| sure have made it good enough wr2 SPAM. I think I'll have
| another Danish."
| compiler-guy wrote:
| When the cafes were open, you can bet they said, "I'll have
| another Danish, and then get back to work on this problem
| that never seems to go away."
| techdragon wrote:
| I sure wish I had problems that were totally unsolvable,
| they are so easy to measure progress on. /sarcasm
|
| I think it's more likely because they are just
| building hundreds of tiny tweak experiments and it's
| someone else who decides what to build and if it even
| worked. Search quality is such a meta-problem that it goes
| beyond any real hope of simply working on it in anything
| beyond piecemeal trial and error fashion on their dataset.
| siva7 wrote:
| What is a danish?
| UncleMeat wrote:
| Especially since "made a change that improved search result
| relevance by X%" is an _extremely_ compelling story for
| promotions. If indeed there is a launch-driven culture for
| promos at Google then there'd be extra incentive for new
| mechanisms to reduce low quality search results.
| ericbarrett wrote:
| I agree with this and the grandparent comment wholeheartedly.
| That said, there's a kind of institutional blindness that can
| build up in companies--especially ones that dominate their
| sector. It may have roots in intransigent upper management,
| ossified and inflexible process, wide-scale burnout, a
| culture of passing the buck, or any number of other
| pathologies.
|
| I don't claim that Google has any of these and certainly have
| no insight into their search group. But I've personally been
| at powerful companies with best-of-the-best talent that were
| blind to the decay in their own living room, so I would
| caution against immediate dismissal of PG's take.
| judge2020 wrote:
| Also, I'd be very surprised if they didn't have tens of
| thousands of workers aiding in spam review already.
|
| The hard part in all of this isn't finding and stopping spam -
| it's defining what spam is. Are all the pie recipes where
| there's a 2000 word essay about their grandma at the top
| 'spam'? They still have the recipe, and Google Home devices
| pick up the recipe instructions just fine so people end up not
| reading it, but many people would still consider that spam
| since it adds such an obstacle to getting the information you
| want. Same for cnet articles like "Best smart home devices to
| buy in 2022" - it's a reputable brand with a list of smart home
| devices, but it's hardly a review and exists to funnel people
| to their Amazon affiliate link.
| awillen wrote:
| AFAIK the 2000 word essays in recipes are Google's fault - it
| prioritizes pages with a lot of content, so you have to add
| that junk to the top in order to rank highly. While I'm sure
| there's more going on behind the scenes than I'm aware of, it
| does seem like the rules could be altered on a category-
| specific basis where a lot of text isn't necessarily a
| positive.
| kevinmchugh wrote:
| Recipe intro text is useful for contextualizing the recipe
| and copyright purposes. In RSS days, it was a way to get
| readers to click through, so the author got the ad views.
| Also people who write recipes like to write about food.
| sct202 wrote:
| While I think the essays are excessive, I appreciate that
| some of them document that the blogger actually made the
| recipe with progress pictures. With the more basic recipes
| websites, I wonder if anyone's actually made it before or
| if the recipe is from some scraped database of unknown
| origin and quality.
| dredmorbius wrote:
| This reminds me of the page inflation that struck tech
| books during the late 1990s / early aughts. The Marketing
| Wisdom was that fat books sold (or took up more shelf
| space), so texts got padded with weak writing, gratuitous
| puffery, and other elements, which (much as the recipe
| essays) simply got in the way of delivering actual
| informative content.
|
| (The fact that many of these books were rushed out with
| very poor quality control also didn't help.)
| UncleMeat wrote:
| This one is hard because it does actually seem to be the
| case that the cruft around the recipe is valuable if the
| content is right. Most of the recipe blog stuff is garbage,
| but if you look at youtube it is clear that creators who
| add extra flair around the recipe are a powerful force.
| behnamoh wrote:
| Years ago I wanted to pursue micro blogging, but this
| "feature" of Google search stopped me from doing it.
|
| What's the point of writing succinct, to-the-point mini
| articles about problems and solutions if nobody finds them
| on Google?
| skilled wrote:
| This is largely because micro-blogging means less
| content, and less content means you could write five
| 300-word blog posts instead of one 1,500-word post.
|
| I've done blogging for the last 10+ years, and many of
| those I spent as a freelancer working with
| startups/brands/editorials. Everyone is after "word
| count" and I absolutely hate it.
|
| Whenever I work on articles for my own blog, I just don't
| consider word-count at all. I think if your content is
| great and informative, then readership will be natural.
| vntok wrote:
| This is a very interesting approach. Do you have traffic
| data collection on your blog?
| skilled wrote:
| I collect post views, but not using Google Analytics or
| anything like that. I built a pretty substantial
| developer blog (tips, resources, etc.) back in 2014. I
| think it peaked at around 350,000 monthly visitors after
| 12 months.
|
| Later on, I sold it because I needed the money. Not so
| much that I didn't want to keep working on it.
| Unfortunately, the new owners didn't have any idea how to
| maintain a "healthy" content blog, and it has plummeted
| down to around 30,000 monthly visitors. All the content
| they're publishing now is some thin headline-clickbait
| bullshit.
|
| I even gave them free advice on how to fix it, but I
| think that for a lot of people, they just don't care and
| will mindlessly pump out as many pieces of content as
| possible. And such blogs can be identified from a mile
| away.
|
| And therein lies the problem with Google SEO at the
| moment. Even myself, someone who has done SEO work for
| more than a decade, I can see that results are getting
| worse. In some niches, the same crappy articles that
| dominated 6-7 years ago are still dominant today.
|
| I guess we're stuck in time, or so Google thinks.
| behnamoh wrote:
| Could it also be due to reduction of public interest in
| blogs over the past few years? Most stuff is now
| published in the form of vlogs instead of blogs. I do
| miss the good old blogs era, tho, and I wish there were
| still high quality blogs around.
| throwawayboise wrote:
| It's two-fold. If Google prioritizes pages with a lot of
| content that's one thing, but longer content also means
| more space for ads, or more scroll events to trigger ads,
| etc.
|
| Incidentally, prioritizing long content seems odd to me; in
| my experience the best pages are short and get right to the
| point, at least in the context of something like a recipe
| or other "how to" resources.
| wakiza33 wrote:
| prioritizes is correct, but in some ways it's not the best
| descriptor.
|
| Google's algos, while advanced, still rely a ton on text to
| actually tell what the page is about. They need it.
|
| If they just relied on other factors (title, links,
| website, etc.) they would end up with worse results for
| users. I'm sure they've tested it.
|
| Google's core algo in a lot of ways is much simpler than
| people think (in other ways of course it's very complex).
| giaour wrote:
| Compounding the problem, the 2000 word essay is sometimes
| really useful if it's describing a technique used in the
| recipe (cf. Stella Parks' recipe for homemade bagels on Serious
| Eats: https://www.seriouseats.com/homemade-bagels-recipe). But
| somehow only spammy blogs with plagiarized
| recipes, AI-generated "essays," and affiliate links for every
| ingredient and tool used make it into the first page of
| results on Google (or DDG, for that matter).
|
| At some point, Google must have moved away from using site-
| level reputation in search rankings, as I almost never see
| recipes from reputable sources like King Arthur Baking,
| Serious Eats, or Food52 in the first page of results.
| IshKebab wrote:
| Yeah, the newest nuisance seems to be sites that clone Github
| Issues and StackOverflow with a crappier interface. Somehow
| they rank higher than the original sources. I'd say it's spam
| but it's definitely not traditional spam.
| joshuaissac wrote:
| And the strange Wikipedia mirrors that are shown in Google
| Verbatim searches instead of the original. If I disable
| Verbatim, they disappear and I get regular Wikipedia
| instead.
| notreallyserio wrote:
| I'm not going to say solving spam programmatically is easy,
| but the gitmemory garbage site (for one example) has been
| around long enough that there's no excuse for not
| downranking or removing it. How hard could it possibly be
| for humans to spot these few sites and nuke em? I'm sure
| Google engineers see them all the time.
| behnamoh wrote:
| Somehow you got downvoted by their creators here :)
| joshuaissac wrote:
| > The hard part in all of this isn't finding and stopping
| spam - it's defining what spam is.
|
| This is one area where Google could use personalised results
| to provide a better experience for the user. Let me decide
| what spam is for me. Let me mark results as good or bad, so
| that the algorithm knows what kind of pages should be
| prioritised or filtered out the next time. Google SearchWiki
| was a step towards this but they killed it off.
| nathanyz wrote:
| Is conservative leaning info spam or not spam? What about
| liberal leaning info?
|
| We have seen what this leads to inside the social networks
| as well as YouTube, and at a macro scale I think we might
| want to have a shared concept of what constitutes a good
| search result for a given query.
|
| At micro scale, it can seem more optimized to get exactly
| the type of result you want, but if we take an absurd
| example like an Apple Pie recipe, shouldn't we all have a
| shared understanding of what types of ingredients would
| make for an Apple Pie?
|
| The shared understanding, I believe, is core to
| communication. If all of us have our own specific ideas of
| Apple Pie, then who is actually right on what an Apple Pie
| really is? What happens when your search results insist
| that an Apple Pie doesn't actually have apples in it, but
| instead pears?
| ehnto wrote:
| > and Google Home devices pick up the recipe instructions
| just fine so people end up not reading it
|
| I think this isn't entirely related, but that's perhaps the
| beginning of a bias you might end up having that everyone
| experiences technology in the same way as it marches on. I've
| yet to encounter a Google Home in the wild, I imagine far
| more people are consuming recipes on phones, tablets and PCs.
| thewarrior wrote:
| Let's have niches where the content is hand curated by human
| beings instead of pure statistics by machines.
|
| Hmm, why stop there? Let's actually make the users do the
| curating and even the content creation by rewarding them with
| social validation. Let's have hard working moderators who
| work on the community full time.
|
| Then we could just build a search engine over it. We could
| call it Reddit. Or HackerNews.
|
| Maybe the users aren't all as good as professionals at
| curating the information. Let's hire professionally trained
| curators, pay them well, and we could call them newspapers.
| Then we can come in, disrupt them, and replace them with an
| algorithmic marketplace that eventually becomes infested with
| click bait.
| hardtke wrote:
| The curated search results business model doesn't work. Google
| gives "aggregators" and other search engines the death sentence
| for organic search traffic from economically meaningful
| queries, so you'd get no free traffic. This is one of the major
| antitrust complaints against Google in the EU. Since you get no
| organic search traffic, you need to build a brand using
| advertising, and once you start down that road you need to
| monetize the first click which compromises the quality of your
| site.
| magicalist wrote:
| > _This is one of the major antitrust complaints against
| Google in the EU._
|
| The complaints I've read are from exactly the kind of
| generated content farms people are complaining about in this
| thread.
| Alex3917 wrote:
| > But they aren't free to talk about it, because if they did it
| would just give more assistance to the spammers, and make the
| problem worse.
|
| The reality is more that some Google engineer will come up with
| an algorithm change that makes the result 40% better, but it
| will come at the expense of making that search 3ms slower so
| the change won't get merged. Or it will make the results worse
| for some niche set of queries that the business team really
| cares about, so again it won't get merged.
|
| There are lots of consumers who would gladly pay $1 a month or
| whatever in order to use a couple extra milliseconds of compute
| power per search in exchange for drastically better
| results, so there is lots of room for a startup to compete.
| zozbot234 wrote:
| > There are lots of consumers who would gladly pay $1 a month
| or whatever in order to use a couple extra milliseconds of
| compute power per search in exchange for drastically
| better results
|
| Google has a paid-for Search API, so they _could_ do that if
| they chose to pursue it. And then they could let Google One
| users opt in to the same thing via ordinary Search. I'm not
| sure whether Bing has anything equivalent.
| stoicjumbotron wrote:
| Highly OT, but if a technical person (not at a managerial
| level) involved in tackling spam at Google were to leave the
| team, are they allowed to work on a similar problem space at
| a different company?
| baby-yoda wrote:
| search ads responsible for the rise of google search[1], content
| ads (seo spam) responsible for google search's fall?
|
| my guess is the rate of spam content production far outpaces the
| rate of original content creation. so the power law concentrates
| even further in the tiny percentage of OC and a moat forms around
| them (highest ad $, highest authority/authenticity).
|
| where do we end up 5 years from now? further consolidation and
| the continued return to aol style portals (telco/media giants and
| fast-lane to own content?) pay-to-access silos dominating the
| internet?
|
| [1] oversimplifying a bit of course, there was a novel ranking
| method that was more than accurate enough, and it scaled, which
| allowed for the search ad business to go gangbusters.
| gomox wrote:
| I believe that the only moat protecting $100B of AdWords revenue
| is the quality of the Google Search results. There is no
| meaningful switching cost to using a new search engine, and the
| inertia in ad spend is not very significant (i.e. any
| online marketing manager will happily spend 5% of their budget on
| a different search engine's AdWords-like program if they get better
| ROI; there is no incentive to be a "Google Ads only shop").
|
| On the other hand, Google needs to maintain the ballistic
| trajectory of its revenue growth. So how can they fix search
| quality when they've minmax'd themselves into this situation in
| the first place? If they were to make the ads background yellow
| again, that would have negative short term effects that I doubt
| any career exec can stomach.
| dredmorbius wrote:
| No.
|
| The other moats are lock-in to advertising networks, website
| metrics, and effective control over Web standards through the
| Chrome browser.
|
| An alternative search platform might provide better search. It
| would be fighting Google on at least three other fronts. It
| might have some success, but it would be challenging. (As
| history largely demonstrates.)
|
| Even a rival tech monopolist, Microsoft, _barely_ holds even
| with its own search offering (I use that indirectly via DDG),
| and scrapped its own web-browser development.
| gomox wrote:
| I mean, Google is big and it has that advantage like any big
| enterprise, but the search engine market is very permeable
| compared to what people traditionally refer to as a moat
| (say, trying to compete with YouTube as a video hosting
| platform or with Salesforce as a CRM).
|
| If you have a good search engine, people will flock to it,
| and search ads will be valuable. That's it. That's how Google
| became Google.
|
| The fact that Microsoft couldn't do it honestly doesn't mean
| much. Microsoft also couldn't do a phone OS, a portable music
| player, and many other things. They have a complex web of
| conflicting interests that $SEARCH_ENGINE_STARTUP does not.
| dredmorbius wrote:
| Just for the record, I'm in at least mild agreement that
| search is looking increasingly vulnerable. Google are
| falling down here.
|
| It's just that "search" is really a web of interrelated
| services, capabilities, and revenue streams, and they tend
| to reinforce each other strongly. I'd like to see the
| monopoly disrupted.[1] But I don't think it's _just_ a
| matter of "build a better mousetrap^Wsearch engine."
| Attack one corner, and Google will snipe at you from the
| others.
|
| And with the AdWords cash cow, they've got an immense
| revenue stream.
|
| ________________________________
|
| Notes:
|
| 1. Well, mostly. Google's acquired so goddamned much
| personal data that the premise is frankly kind of
| terrifying as well --- a weakened Google with neither the
| revenues nor talent to defend that pile.... And I'd really
| like to see the toppling occur _without_ simply raising a
| new monopoly in its place.
| pythux wrote:
| " There is no meaningful switching cost to using a new search
| engine."
|
| Unfortunately, defaults matter and Google is spending billions
| of dollars yearly to make sure they are the default search
| engine wherever they can. Most people don't switch from
| default.
| gomox wrote:
| The way I see it that's only a problem if you want to be
| bigger than Google. But you can get to $1B in revenue with
| just the deliberate adopters (i.e. under 1% of the market).
| yuliyp wrote:
| Was this linked for the irony of everyone spamming replies
| advertising their startups which don't solve the problem but
| kinda-sorta do, resulting in something hard to read and
| understand?
| dang wrote:
| This was in response to mwseibel's thread, which had a big
| discussion yesterday:
|
| _Google no longer producing high quality search results in
| significant categories_ -
| https://news.ycombinator.com/item?id=29772136 - Jan 2022 (1167
| comments, spread over multiple pages - note the "X more comments"
| links at the bottom)
| fuckcensorship wrote:
| Go read any default subreddit on Reddit to see what this idea
| would look like long-term, especially the "amateur police" part.
| baby wrote:
| Searching code is also impossible on Google. If there's a
| competing search engine for that I'll use it at least for this
| use case.
| darinf wrote:
| Give Neeva a try. We have improved ranking and some nice
| features around tech queries.
| baby wrote:
| hot take: why would I need to enter my email to do a search
| online? You already lost me :o
| hammock wrote:
| _> This may not just be a problem with Google but possibly also
| the recipe for beating Google. A startup usually has to start
| with a niche market. Why not try writing a search engine
| specifically for some category dominated by SEO spam?
|
| >You might need to do a lot of manual spam fighting initially.
| That could be both the thing-that-doesn't-scale, and the thing
| that differentiates you by being alien to Google's DNA. (They
| must hate manual interventions; so inelegant)._
|
| Is he describing...Yahoo circa 1994? A manually curated directory
| service.
| [deleted]
| tonyedgecombe wrote:
| I'm starting to think Yahoo circa 1994 might be better than
| Google today.
| behnamoh wrote:
| I wouldn't just complain about Google. Google search results
| mostly reflect a deeper problem with the web today. I do miss
| the simplicity of the 2000s.
| edoceo wrote:
| dmoz!
|
| https://en.m.wikipedia.org/wiki/DMOZ
| matt_heimer wrote:
| I always used DMOZ more than Yahoo! Directory. It looks like
| [dmoz](https://en.wikipedia.org/wiki/DMOZ) became
| https://curlie.org/ which is still active.
| [deleted]
| loceng wrote:
| And makes me think that StumbleUpon had a similar curation
| ability, in that the value qualifier is how often [hopefully]
| real people interact with content - tracked by who's using SU
| and agreed to allow tracking; can't remember if sharing that
| was optional or not?
|
| The gamification of the system then would have to come through
| onboarding fake users, pretending/mimicking real user behaviour
| to send that signal into the system; not sure if SU ever ran
| into that problem or was actively trying to
| identify and remove fake or suspicious signals from their
| output?
|
| I feel a much better system is easily within reach, it's simply
| getting the right structure to it, the right foundation, and
| then it will quickly take off due to the quality difference.
| I've already figured out a design pattern that Twitter and
| Facebook have indoctrinated us with, making us think it is
| normal - and keeping us blind to an actual normal way of
| organizing or communicating, but that isn't conducive to
| control or ad revenues - and so extending my future plans to
| include a better search-directory system would fit snugly into
| my efforts.
| visarga wrote:
| SU was a great way to surface random interesting stuff. I bet
| most blog entries today could be picked from Twitter, even if
| they are unlinked.
| gorbachev wrote:
| Some of the examples used in the Twitter thread Paul was
| referring to would be better served by a manually curated
| directory service with a possible addition of a search engine
| only surfacing content from the sites in the directory.
|
| For health information and recipes in particular there are only
| a handful of really high quality sites that have quality
| content for 95% of the information most people need. I bet if
| you wanted to increase the coverage to 99%, that list would
| expand to less than a thousand sites. At those numbers manually
| curating the information would be easily achievable.
|
| How to get people to use your top notch Google replacement
| instead of Google, however. That's the hard problem.
| basch wrote:
| Isn't that what google Programmable Search is?
|
| https://cse.google.com/cse?cx=dc408db269da4e769 (try
| searching for something you want a review of)
|
| Make a search, whitelist the domains. Every time you run into
| a good review site, add it to the searchable list.
| DangitBobby wrote:
| That's all fine and dandy, but the goal isn't to just make
| some good sites a bit easier to find, it's to keep the top
| of your search results from being interspersed or
| superseded by SEO spam. Unless I misunderstood your
| suggestion.
| llaolleh wrote:
| It's really hard to get people to use something other than
| Google. If you were to launch such a product, it would have
| to be so much better that people recommend it organically to
| other people.
| visarga wrote:
| Should be able to run on top of Google in a browser extension
| to insert itself only when the topic allows.
| Rastonbury wrote:
| That was what Google was to Yahoo/Altavista back in the
| day, a 10x improvement. Reading this thread, people feel
| enough pain to do all sorts of hacky stuff - appending 'reddit'
| or 'forum' to queries, blacklisting spam domains,
| switching search engines depending on topic. If G keeps
| declining and a new product does things better, the penny
| will drop and people will swap.
|
| Seibel and PG see blood in the water no doubt, they see G's
| market share and want to fund companies to take some of
| this.
| noduerme wrote:
| He's right as often as he's wrong
| sillysaurusx wrote:
| What has he been wrong about?
| [deleted]
| noduerme wrote:
| Okay. I was gonna respond with something snarky about what
| a crappy mod he was, or how he was a Saint and you don't
| deserve to worship at his functional feet. But I'll tell
| you what he was wrong about: He was, as a leader and a
| human and a mod, _petty_. He pied pipered himself into a
| sweet spot and no one would deny he's a good coder, but
| there the ego took off and forever left behind a skidmark.
| The cool exterior, the sense of self-importance, the
| punching down, above all the love of spreading one's
| revelatory wisdom to the poor little guy; you can love that
| sort of thing too much, and he did. Perhaps you weren't
| here or didn't interact with him directly. In my view he
| became dismissive and derogatory toward people who
| worshiped him (like you) once he acquired a small degree of
| fame.
| Jorengarenar wrote:
| What in the world are you babbling about? Who here does
| worship who?
| wenbin wrote:
| FYI - Google hires 10,000+ search result raters [1], who are
| contractors, to evaluate search result quality.
|
| In an ideal world, you build a thing, and it's done. It runs
| automatically and prints out money.
|
| In reality, you still need human labor to do manual tasks, even
| in the tech industry.
|
| [1] https://www.searchenginejournal.com/google-eat/quality-
| rater...
| Ros2 wrote:
| Thanks for posting this. An acquaintance of mine did this job 6
| years ago and I wasn't sure if it still existed.
|
| Crowd-sourced humans are making Google appear more intelligent
| than they actually are. I always envisioned that spam efforts
| would just immediately set off an alarm that would be handled
| by a bot to blacklist you without a human even knowing your
| site existed, but there still seem to be at least a few ways to
| game Google's search ratings.
| heisenbit wrote:
| Affiliate links are environmental toxic waste and it would be
| only logical to tax such affiliate payments to fund cleanup and
| mitigation efforts.
| cblconfederate wrote:
| Who would write good or even decent content for free?
| waynesonfire wrote:
| what a great idea.
| donio wrote:
| What I am looking for is control over the results. Personalized
| blacklists and lists of sites to be (de)prioritized and also the
| ability to subscribe to community curated versions of the same.
|
| And to be clear I want to be able to control these myself, not
| an algorithm trying to guess my preferences. No guessing, just do
| what I tell you to.
|
| Multiple search profiles with different priorities would be nice
| too.
|
| I would like the search algorithm to be transparent, I should be
| able to tell why I got a certain result and how I can avoid such
| results in the future.
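|
| A minimal sketch of what such user-controlled filtering might look
| like, assuming the engine hands back (url, score) pairs. The names
| SearchProfile and rerank are hypothetical, not any real search API.
| Every rule is set explicitly by the user, and each result carries a
| plain-text reason so it is clear why it ranked where it did.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class SearchProfile:
    """Explicit, user-set rules; nothing here is inferred (hypothetical type)."""
    blocked: set = field(default_factory=set)    # domains to drop entirely
    weights: dict = field(default_factory=dict)  # domain -> score multiplier

    def subscribe(self, community: "SearchProfile") -> None:
        """Merge a community-curated profile; the user's own weights win."""
        self.blocked |= community.blocked
        self.weights = {**community.weights, **self.weights}


def domain_of(url: str) -> str:
    """Return the host name of url without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def rerank(results, profile):
    """Apply the profile to (url, base_score) pairs.

    Returns (url, score, reason) tuples sorted by score, highest first.
    Blocked domains are dropped; all others are scaled by their weight.
    """
    ranked = []
    for url, base in results:
        domain = domain_of(url)
        if domain in profile.blocked:
            continue  # removed by an explicit user rule
        weight = profile.weights.get(domain, 1.0)
        reason = f"{domain}: base {base} x weight {weight}"
        ranked.append((url, base * weight, reason))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

| Multiple search profiles would simply be several SearchProfile
| objects, and a community-curated list is just another profile
| passed to subscribe().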
| curiousllama wrote:
| Good idea. You could start with fitness. Lots of high-quality
| information out there that's entirely, 100% inaccessible via
| google.
|
| Over COVID, I did the whole fitness thing from a few different
| angles (overhauled diet, trained for a marathon, now lifting
| weights a lot). I found I could only find good info by going
| directly to a trusted source - literally, typing http://www. like
| I'm in the 90s or something. This is the exact issue a search
| engine should solve, but Google doesn't.
| thewarrior wrote:
| Could you share your trusted sources ? Human search engine :P
| curiousllama wrote:
| JPG on Tik Tok, and Geoffrey verity schofield on
| instagram/quora/buy his book.
| ok123456 wrote:
| https://www.boards2go.com/boards/board.cgi?user=tfannon
| paulcole wrote:
| Is your trusted source the same as my trusted source which is
| the same as my neighbor's trusted source which is the same as
| my Australian cousin's trusted source?
|
| If not, at least one of us is going to hate this new search
| engine.
| nefitty wrote:
| This is a problem I'm working on.
|
| What sorts of unique things did you do that Google failed at?
| Maybe you read through discussion sites or got tips from books
| or something like that
| curiousllama wrote:
| Social media and trial and error. I knew a bit, and asked
| some friends for advice. I used that to find people who said
| stuff I thought made sense on Quora, Tik Tok, and Instagram
| (ie they agreed with what I knew to be true and false, so I
| could assume the other stuff they said was likely to be true
| as well). I tried what they said, found what worked, and went
| all in when I saw results.
|
| Importantly, this was bottom up: it was largely
| recommendation engines suggesting people I then filtered
| through for what I wanted (running, bodybuilding) vs didn't
| (traditional weight loss). I couldn't specify what I wanted,
| or it would be garbage SEO spam.
| nefitty wrote:
| That's a new angle to me. I rely on pseudonymous content
| like Reddit and HN. It does make sense to look for people
| or groups focused on a larger topic like fitness.
|
| That helps a lot! Thank you.
| djoldman wrote:
| Google knows how to surface relevant results and they choose not
| to because they aren't optimizing for relevant results, they're
| optimizing for revenue or profit within some constraints (don't
| lose too many users, privacy, avoid actually terrible or
| completely irrelevant results).
|
| All the various suggestions in this thread plus far more complex
| and insightful solutions are known to Google. Most of it boils
| down to using automated user feedback to improve or measure
| search result relevancy.
|
| Google doesn't need to solicit user upvotes / downvotes to
| improve rankings. They can monitor user clicks on results in
| addition to analytics on the sites the users visit to determine
| which sites are relevant to which searches.
|
| Google doesn't optimize for search relevancy.
| [deleted]
| lubesGordi wrote:
| Likewise Youtube doesn't optimize for relevant results, only
| engagement to maximize ad exposure. The side effect of this is
| polarizing content gets returned more than relevant content
| (polarizing content being more engaging than relevant content
| apparently).
| Jenk wrote:
| Single-page thread:
| https://threadreaderapp.com/thread/1477760548787920901.html
| notananthem wrote:
| We still need a search engine that actually blacklists everything
| serving ads. Google beat altavista, now we need to beat google.
|
| I mean no mincing about- recipe sites that are ads are blocked.
| Results with pixel tracker etc are blocked. Hell, results that
| are paywalled are blocked because they're useless.
| rhtgrg wrote:
| I think pg is missing something important here. The reason Google
| was able to beat Yahoo, Altavista, Ask, etc. was not just because
| they had a better formula -- it was also because they started in
| the era where 'search' was still seen as secondary to 'portals'
| by the big guys. Had these companies known how important search
| is to the internet back then, they would've copied Google's
| secret sauce and crushed it long before it could suck up their
| traffic.
|
| This isn't going to happen again. Google isn't going to sit
| around twiddling its thumbs while a competitor develops a better
| algorithm.
|
| You have to attack the problem from a different angle entirely
| (make something that looks nothing like a search engine), I don't
| think a niche market is going to be enough.
|
| Perhaps you just want to make something that scares Google into
| acquiring you, rather than actually bettering the situation. If
| that's the case, I implore you to think of better ways to
| spend your life.
| titzer wrote:
| > I don't think a niche market is going to be enough.
|
| To displace a giant gobbling 180 billion dollars a year? Yeah,
| no kidding. But nobody is asking for that. They are just asking
| for decent search results.
| [deleted]
| jart wrote:
| Altavista was the only search engine in the same league as
| Google. So Google hired the guy who built it. Altavista
| infrastructure couldn't scale beyond a single server, because
| that's how DEC was, so it was a smart move for him.
| legohead wrote:
| It would work until you got big enough, then you'd end up
| following the same path as Google, as that's where the money is.
| hooande wrote:
| What are some search categories that are so dominated by spam
| that they are unusable?
|
| I'll start: "how to rent a car" [0]
|
| [0] worth noting that I personally get somewhat reasonable
| results for this, with a 3rd result from nerdwallet.com and a 4th
| from wikihow.com, both of which seem to answer the question in an
| unbiased way
| hammock wrote:
| Nerdwallet and wikiHow are both SEO spam content farms. They
| just happen to have above-average quality content.
|
| They don't exist without a search engine.
| hooande wrote:
| Not sure how you're differentiating "seo spam content farm"
| from "website"
| hammock wrote:
| That is part of the problem
| imranhou wrote:
| I believe google tracks click throughs from search results pages,
| which should provide in theory plenty of insight into what links
| aren't really working for specific keywords and what are... thus
| helping improve or reduce rankings of SEO laden sites.
|
| Wonder if someone can shed light on why this isn't effective.
| [deleted]
| pwdisswordfish9 wrote:
| > Lots of people want to be amateur police. And boy would Google
| find it hard to follow you down that road.
|
| Kinda like they tried with YouTube Heroes?
|
| But then, who's to say you won't get the same kind of backlash?
| rickdeveloper wrote:
| I think a lot of this is due to Google both owning search and the
| ads on the websites (AdSense). There's an incentive for them to
| prioritize click farms (and other sites filled with their ads). I
| think in general there may be a correlation between the number of
| ads on a site and its usefulness to me, which is inverse to its
| usefulness to google.
|
| I'm curious what would happen if those products were split up
| into 2 separate companies.
| thomasmarcelis wrote:
| I also can't help but wonder this. I'm certain people at google
| search want to provide the best quality search results and do
| this with integrity. But at some point in the business
| hierarchy you are at a level where people set objectives for
| both these departments ( search & ads ) and are trying to
| optimise for things like total revenue/profit.
| tonyedgecombe wrote:
| Yes. In fact if you wanted to cut out the SEO spam then
| delisting anything with Adsense would probably be a good start
| for a competitor.
| canyonero wrote:
| I've been troubled by the just plain awful results being
| delivered by Google search over the last few years. I think these
| are just plain hard problems to solve and that Google is not
| incentivized to solve. Google wants you to click on ads at the
| end of the day, full-stop.
|
| Often times I find myself searching for "best
| ($product|$thing_to_do)" which I think many other people do as
| well because we all want the best. Other times I'm looking for a
| music or a book recommendation with some depth. This of course
| nearly always leads to SEOd trash. There is no relevance nor is
| there trust. So I, like others, use keywords like "reddit" or
| "forum" to get to real humans whom I trust and whose intentions are
| not to sell via affiliate links.
|
| These issues often lead to a need to find trust in real
| human-centered recommendations that stem from real human
| interests and needs. I've never found an algorithmic solution to
| this problem. This is why I think college radio stations or those
| south-of-the-dial end up being so, so much better. And why beer
| recommendations from your local brew-shop owner are better than
| anything you can find on the net.
|
| I think building search verticals that are hand-curated would be
| very interesting to see. But I also think we need to build more
| communities which allow recommendations to be shared without an
| incentive to get hits via search and aren't paid for by large
| corporations and where community impact/quality _is_
| incentivized. I do worry that those days may be gone and there
| may just not be enough folks (not in tech) willing to spend
| so much time online and contributing to niche communities. A lot
| of folks spend much of their time in walled-gardens like
| Facebook, Instagram or Twitter, so it'll be challenging to be
| sure.
| allochthon wrote:
| > I think building search verticals that are hand-curated would
| be very interesting to see.
|
| That was my inspiration behind a side project I made a few
| years ago -- a decentralized, hand curated "search engine" [0].
| Never got beyond the side project stage. But I see promise in
| this in the future. Eventually we'll figure out that moderated
| crowd-sourced curation is better than the best machine
| learning. The filtering capabilities have to be pretty
| sophisticated to make it work, though.
|
| [0] https://github.com/emwalker/digraph
| cblconfederate wrote:
| > So I, like others, use keywords like "reddit" or "forum" to
| get to real humans whom I trust and whose intentions are not to sell
| via affiliate links.
|
| And therein lies the problem. Reddit makes very little money.
| Forums probably make negative money nowadays. Google has
| decided to demonetize the organic internet and subsidizes SEO
| crap and AMP or whatever dumb thing their signals consider
| valuable. We get what we incentivize, and right now the
| incentives in almost all of tech are pretty atrocious.
| visarga wrote:
| > Often times I find myself searching for "best
| ($product|$thing_to_do)" which I think many other people do as
| well because we all want the best.
|
| I do too. I'm wondering why didn't Google invest some effort
| into "best X" searches? I bet they could extract such
| information from the web and correlate various sources. They
| already answer all sorts of semantic knowledge questions.
| ijidak wrote:
| In some ways paid search disincentivizes Google from delivering
| quality organic results.
|
| The larger the gap between paid results vs organic results, the
| more users click the paid results.
|
| Not sure how to solve this problem.
| topicseed wrote:
| But paid results do not always, if ever, answer the search
| query better in any shape or form.
|
| So this would end up with displeased users and bounce backs.
| mirekrusin wrote:
| The whole thing seems "simple" to me - graph of identities with
| url vetting/liking/approve-this-message-like actions, you don't
| need anything else.
|
| Reputation, non-fakeness etc. can be derived from it for anybody
| - you just list identities you trust/follow (with weights?) and
| anything you look at can be scored.
|
| Virtual identities can also be created, ie. identity listing all
| links mentioned on HN (with positive sentiment only?), links from
| wikipedia etc. so people can follow those to create their reality
| graphs.
|
| The interesting part is that it doesn't claim universal truthness
| - depending on who you follow your results will be skewed towards
| their opinion of the world. Ie. if you follow MIT, Wikipedia and
| E. Musk you'll see a different view of truthness than somebody
| following FOX News and Flat Earth Society for example.
|
| It could be interesting to focus on "dislike" marking (only?) as
| it may be much more lightweight to approach it from blacklisting
| side.
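|
| A minimal sketch of the scoring idea above, assuming votes are
| stored as (identity, url, value) triples and each user keeps a
| dict of followed identities with weights. The names score_urls,
| follows and votes are made up for illustration, and the weighted
| sum is only one of many ways such a score could be derived.

```python
from collections import defaultdict


def score_urls(follows, votes):
    """Score URLs from the point of view of a single user.

    follows: {identity: weight}, the identities this user trusts
             (weights are an assumption, e.g. between 0 and 1)
    votes:   iterable of (identity, url, value), where value is
             +1 for a like and -1 for a dislike
    """
    scores = defaultdict(float)
    for identity, url, value in votes:
        # Identities the user does not follow contribute nothing.
        weight = follows.get(identity, 0.0)
        scores[url] += weight * value
    return dict(scores)


# Hypothetical identities and URLs, including a "virtual identity"
# that collects links mentioned on HN.
votes = [
    ("wikipedia", "https://example.org/a", +1),
    ("hn-links", "https://example.org/a", +1),
    ("hn-links", "https://spam.example/b", -1),
    ("seo-farm", "https://spam.example/b", +1),
]

alice = {"wikipedia": 1.0, "hn-links": 0.5}
bob = {"seo-farm": 1.0}

alice_scores = score_urls(alice, votes)
bob_scores = score_urls(bob, votes)
```

| The same votes give alice and bob different rankings, which is
| the "no universal truthness" property: the score depends only on
| who you follow.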
| floatingatoll wrote:
| He's just describing Webrings, except in a reactive tense
| ("filter out spam sites") rather than a proactive tense
| ("associate your site with other worthwhile sites"). Google's
| ranking algorithm only works when someone is proactively
| curating, and only SEO spammers do so these days. Reactive
| curation is not a viable way to manage information.
|
| The simplest way to compete with Google is to create a DIY
| Webrings site that disallows harvesting of data by Google. Charge
| curators to create a webring, and let curators select three
| hashtags and a description that represent their list of fifty or
| fewer sites. Use the revenue to pay a human to curate the list of
| hashtags, and let users tip a webring curator in gratitude with
| an Apple Pay button.
|
| This is how to make a million dollars, Pinboard-style, out of the
| ashes of the original curated Yahoo idea and the information
| structures of hashtagging. It doesn't work if you allow free-for-
| all infinite-sized lists, it doesn't work if you allow free-for-
| all hashtags, but with clear limits and moderation of tags
| (instead of webrings), it would thrive. By moderating tags, users
| can keep the webring they paid for, and SEO rings will stick
| out for having no shared network with any other rings, which
| allows for easier detection and culling of malicious non-
| participatory actors. Plus, with the curation networks in place,
| it becomes possible to bubble up rings that have unusual content
| for _positive_ human moderation activity.
|
| I tried to find some good podcast lists yesterday and each site I
| visited had a really interesting cross-section, but there were so
| many duplicates. I wish the ring site existed, so that it could
| remember what it had shown me already, and I could say "show me
| rings that intersect with this podcast and have something new I
| haven't seen before".
|
| That's where the theory of pagerank and the practice of curation
| and the capabilities of search align, and given that moderation
| of hashtags scales very cheaply, is a billion dollar opportunity
| that Google and Amazon cannot compete with if handled properly.
| It's not about trying to get a cut of every visit's revenue
| potential. It's about giving human beings a directory that
| respects their time and remembers what they've seen.
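|
| A rough sketch of the "show me rings that intersect with this
| podcast and have something new" query, assuming each ring is
| just a set of site identifiers and the site remembers a per-user
| set of things already shown. suggest_rings and the sample data
| are hypothetical, not part of any existing service.

```python
def suggest_rings(rings, anchor, seen):
    """Return (ring_name, unseen_sites) pairs for rings that contain
    `anchor` and still have something the user has not seen.

    rings:  {ring_name: set of site ids}
    anchor: the site the user is currently interested in
    seen:   set of site ids already shown to this user
    """
    suggestions = []
    for name, sites in rings.items():
        if anchor not in sites:
            continue  # ring does not intersect with the anchor
        unseen = sites - seen - {anchor}
        if unseen:
            suggestions.append((name, sorted(unseen)))
    # Most novel rings first, ties broken by name for stable output.
    suggestions.sort(key=lambda item: (-len(item[1]), item[0]))
    return suggestions


rings = {
    "history-pods": {"pod-a", "pod-b", "pod-c"},
    "science-pods": {"pod-a", "pod-d"},
    "cooking-pods": {"pod-e", "pod-f"},
    "repeat-pods": {"pod-a", "pod-b"},
}
seen = {"pod-a", "pod-b"}

result = suggest_rings(rings, "pod-a", seen)
```

| Rings that only repeat what was already seen drop out, which is
| the "remembers what they've seen" part.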
| jart wrote:
| If I understand correctly, you're saying you'd create an
| exclusive webring, but the rule to joining is that you have to
| Disallow: Google, Bing, etc. in your robots.txt file. That
| sounds outrageous, but speaking as a content creator, I
| wouldn't be giving up much. My blog gets about 3% of its
| traffic from search engines. I have no idea who these visitors
| are or what they searched for, since browsers no longer send
| referral links. If a webring offered me the benefit of a positive,
| regularly engaged community, then having my blog part ways with
| search engines would be a no brainer. That is after all the
| Facebook model, except tailored for the open web. Believe me
| when I say that we bloggers are waiting to be rescued.
| floatingatoll wrote:
| No, you can join the webring and still be indexed by Google,
| but the sitelist of the webring cannot be. That's all. It
| prevents Google from leeching off the human curation and not
| paying a fair market value for it. Given that Google makes
| billions of dollars a year on pagerank data, webring curators
| got screwed over pretty hard already once by Google twenty
| years ago, so no reason to allow it again.
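|
| One way the "sitelist is not indexable, member sites are" rule
| could look in practice, as a sketch: a robots.txt on the webring
| site itself that keeps Googlebot out of the list pages, checked
| here with Python's standard urllib.robotparser. The host
| webring.example and the /rings/ path are assumptions; member
| blogs keep their own robots.txt untouched.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served by the webring site (not by members).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /rings/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The curated site lists are off limits to Googlebot...
list_allowed = parser.can_fetch(
    "Googlebot", "https://webring.example/rings/podcasts"
)
# ...while the rest of the webring site stays crawlable.
about_allowed = parser.can_fetch("Googlebot", "https://webring.example/about")
# Crawlers without a matching rule fall through to the "*" entry.
other_allowed = parser.can_fetch(
    "SomeOtherBot", "https://webring.example/rings/podcasts"
)

print(list_allowed, about_allowed, other_allowed)
```

| robots.txt is only a request to well-behaved crawlers, so this
| expresses the policy rather than enforcing it.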
| jart wrote:
| So the webring would use rel="nofollow" hyperlinks? Blog
| authors might see that as an insult.
| anyfactor wrote:
| I have seen several google alternative search engine projects
| being posted on HN every other week. You have your privacy
| focused open source google alternative search engine for "insert
| niche here" with big hopes of disruption.
|
| I will give you my two cents. I have used duckduckgo, bing, searx
| etc. for extended periods of time and hated every one of those
| things. The problem is that what you search seems to be
| essentially the gateway to the wild west of the internet. I understand
| the proposition of spam control in search engines, but at least to
| me I think the early days of google without DMCA and copyright
| bans made google the best.
|
| I fear SEO spam control will only bring the worst of the
| moderated internet. It will not be the first time big tech tried
| to douse a gasoline fire with more gasoline because they thought
| the more fire meant the previous fire will get suffocated by the
| lack of oxygen(?). Rather than using "AI" as a crutch to solve
| SEO as a problem, I want to see an option that is true to 2005
| era google.
| AtNightWeCode wrote:
| I think it will be hard to create a great search engine while
| the web works as it does today. Maybe there could be like a
| sitemap but for text content that has the content structured,
| indexed, and signed by a trusted party in a way that makes it
| easy to analyze for plagiarism and so on.
| mrlanderson69 wrote:
| If anyone has the skills to work on something like this please
| email me (email address in my profile.)
|
| I can show you a demo. Just to show I am not screwing around: if
| you don't like the demo I will pay you $500.
| freediver wrote:
| Google's job is to serve its customers, and it does that really,
| really well.
|
| The problems being discussed today (and yesterday in the similar
| thread) come from the fact that for Google user != customer.
|
| When you have incentives that are misaligned like this, you can
| only go so far! We seem to have reached that point with Google,
| where there is not much more that can be done on the search
| experience front without jeopardizing customer experience (ad
| revenue).
|
| Disclosure: I'm working on a paid search engine to solve this
| problem on a fundamental level, by aligning the incentives and
| making user also the customer so we can best serve them and their
| needs. It is called Kagi and is currently in closed beta
| accepting beta-testers.
|
| https://kagi.com
| krono wrote:
| Just a minute ago, I made a small typo in a non-obscure
| programming-related search term.
|
| Showing results for searchterm
| No results found for searchterm
|
| Followed by an unending list of random celebrities I don't know
| nor care about, businesses I've never been to that sell items I have
| absolutely no use for, and random foreign news articles.
|
| Failure to recognise the typo is unexpected but forgivable. But
| then, rather than helping me with my search, they attempt to
| distract and lead me away from it - using triggers that you'd
| think they should have known wouldn't work.
|
| I really don't understand how this is even possible, and it's not
| a rare occurrence.
| mitchtbaum wrote:
| How will these search engines interoperate?
| new_here wrote:
| > _Maybe ultimately you open up spam fighting to your users. If
| you managed this well, you could harness a lot of energy._
|
| Doesn't Google already consider that if a user returns to the
| results page (or clicks a second link) then the first link
| visited was not satisfactory? Seems like a pretty elegant
| solution.
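|
| For illustration only, the signal described here could be
| computed from a click log along these lines. Whether and how
| Google actually uses it is not established in this thread; the
| log format, the function name quick_return_rate and the
| 10-second threshold are all assumptions.

```python
from collections import defaultdict


def quick_return_rate(clicks, threshold_seconds=10.0):
    """Fraction of clicks on each URL that bounced straight back.

    clicks: iterable of (url, dwell_seconds), where dwell_seconds is
            the time before the user came back to the results page,
            or None if they never came back.
    """
    totals = defaultdict(int)
    quick = defaultdict(int)
    for url, dwell in clicks:
        totals[url] += 1
        if dwell is not None and dwell < threshold_seconds:
            quick[url] += 1
    return {url: quick[url] / totals[url] for url in totals}


# Hypothetical click log.
clicks = [
    ("https://spam.example/best-x", 3.0),
    ("https://spam.example/best-x", 4.0),
    ("https://spam.example/best-x", None),
    ("https://spam.example/best-x", 120.0),
    ("https://forum.example/thread", 90.0),
    ("https://forum.example/thread", None),
]

rates = quick_return_rate(clicks)
```

| A high rate suggests the page did not answer the query, though
| it cannot tell a bad page from a quick, successful lookup.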
| tonyedgecombe wrote:
| That's to Google's benefit though, they get another chance to
| present some adverts to the user.
| EamonnMR wrote:
| Being able to flag Fandom, Quora, and Pinterest results would
| bring me great joy.
| cassianoleal wrote:
| I'm not very well versed in SEO but isn't this just good old
| Goodhart's Law?
|
| Come up with criteria to determine which websites are "better
| quality". Measure them, rank them, put the ones that fit the
| criteria best at the top.
|
| On the other side, there's the people promoting their websites.
| Do what you can to get as close to Google's ideal as possible
| through whatever means. Profit.
|
| At this point the criteria become useless for any real quality
| analysis.
| pkamb wrote:
| A search engine that only indexed Reddit, Stack Exchange,
| Wikipedia, and a small number of other "good" sites would get 80%
| of the way there.
|
| No, DDG bang operators don't let you do this. I want an SERP, not
| a shortcut to a single site's on-site search.
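|
| A sketch of the allowlist idea, assuming you already have a list
| of result URLs from some index and only want to keep those on
| "good" domains (subdomains included). The ALLOWED set and the
| function names are placeholders, not a real service's
| configuration.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of registrable domains.
ALLOWED = {"reddit.com", "stackexchange.com", "wikipedia.org"}


def is_allowed(url, allowed=ALLOWED):
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


def filter_results(urls, allowed=ALLOWED):
    """Keep only results on allowlisted sites, preserving rank order."""
    return [u for u in urls if is_allowed(u, allowed)]


results = [
    "https://en.wikipedia.org/wiki/Butter",
    "https://notreddit.com/r/cooking",
    "https://old.reddit.com/r/cooking/comments/abc",
    "https://spam.example/best-butter-2022",
    "https://cooking.stackexchange.com/questions/1",
]

kept = filter_results(results)
```

| Matching on ".domain" suffixes keeps lookalike hosts such as
| notreddit.com out while still accepting subdomains.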
| marban wrote:
| For business news, I do this with https://yup.is
| Jenk wrote:
| > What would a paid version of Google Search results look like -
| where Google can just try to give me the best possible results
| and not be worried about generating revenue?
|
| God please no. YouTube premium shows what Google would do, i.e.,
| they would further ruin the free experience by ramping up the
| amount of ads you see to "incentivize" the premium search.
| judge2020 wrote:
| Premium offerings like that are amazing simply for the fact
| that you can 'return' to the days where you obtained services
| by paying for them directly, not by looking at ads and paying
| with your mindshare. Google and YT aren't free services, and
| it's a miracle they continue to be accessible with ad blockers
| enabled.
| Jenk wrote:
| Orthogonal to my point. Deliberately worsening the free
| experience after you introduced a premium service is a dark
| pattern for UX.
| netcan wrote:
| I'm almost certain that pg gave " _compete with Google by
| competing in some niche_ " advice 10+ years ago.
|
| In any case, I'm not sure that competing in search is a very
| attractive notion. AdWords is the only meaningfully profitable
| search ad business. Even if you steal 10% of Google's market,
| that absolutely doesn't translate into 10% of the revenue.
|
| That said, recipes. Someone make a search engine where the top
| results don't start with 500 words on the history & etymology of
| butter, because that's what Google wants.
| colordrops wrote:
| It's usually more than 500 words lol
___________________________________________________________________
(page generated 2022-01-03 23:00 UTC)