[HN Gopher] Ask HN: Let's build an HN uBlacklist to improve our ...
___________________________________________________________________
Ask HN: Let's build an HN uBlacklist to improve our Google search
results?
For the unaware, uBlacklist [0] is a browser extension that lets
you blacklist sites from the google search results page. It lets
you blacklist sites right from the results page, by regex, or by
linking lists hosted somewhere. The low quality of results has
been a problem for a while now and has become worse lately thanks
to all those StackOverflow and Github clones. So I was wondering if
we could come together and contribute to a single blacklist hosted
somewhere and then import it into each of our browsers. Who knows?
We might end up improving the quality of the results we all get.
Lists to get rid of the StackOverflow and Github clones already
exist. [1] I would love to contribute to a project like this, but
won't be able to be a maintainer due to time constraints. Would
greatly appreciate it if someone could host this. A simple txt file
on github would do. What do you say, HN? [0]:
https://github.com/iorate/ublacklist [1]:
https://github.com/rjaus/awesome-ublacklist
Author : sanketpatrikar
Score : 328 points
Date : 2022-01-04 13:19 UTC (9 hours ago)
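
To make the proposal concrete, here is a minimal sketch of how a shared plain-text list could be applied to result URLs. It handles only a simplified subset of the two rule styles the post mentions (match patterns such as `*://*.example.com/*`, and regular expressions wrapped in slashes); the `#` comment handling, the function names, and the `.example` domains are assumptions made for illustration and are not taken from uBlacklist's own code.

```python
import re
from urllib.parse import urlsplit


def compile_rule(rule):
    """Turn one rule line into a predicate over URLs (simplified subset)."""
    rule = rule.strip()
    # Regex rule: /pattern/ is searched anywhere in the full URL.
    if len(rule) > 2 and rule.startswith("/") and rule.endswith("/"):
        pattern = re.compile(rule[1:-1])
        return lambda url: pattern.search(url) is not None

    # Match-pattern rule: scheme://host/path, where * is a wildcard.
    scheme, _, rest = rule.partition("://")
    host, _, path = rest.partition("/")
    path_re = re.compile(re.escape("/" + path).replace(r"\*", ".*"))

    def matches(url):
        parts = urlsplit(url)
        if scheme == "*":
            if parts.scheme not in ("http", "https"):
                return False
        elif parts.scheme != scheme:
            return False
        hostname = parts.hostname or ""
        if host.startswith("*."):
            base = host[2:]
            if hostname != base and not hostname.endswith("." + base):
                return False
        elif host != "*" and hostname != host:
            return False
        return path_re.fullmatch(parts.path or "/") is not None

    return matches


def load_rules(text):
    """Parse a shared list: one rule per line, blank lines and # comments skipped."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rules.append(compile_rule(line))
    return rules


def is_blocked(url, rules):
    """Return True if any rule in the list matches the URL."""
    return any(rule(url) for rule in rules)
```

A contributor workflow could then be as small as appending one line per spam domain to the shared file and letting every subscriber re-import it.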
| hammock wrote:
| Google's existing blacklist is half the reason (for me) that the
| results quality has declined so much. So this is not a great
| solution.
| colesantiago wrote:
| Or just use search.brave.com
|
| No problems there.
|
| Comparison [0]
|
| [0] https://brave.com/search/
| pluc wrote:
| Use a search engine that doesn't fuck up your results instead of
| trying to unfuck the results it gives you. Why go through this
| much effort to still give your money to google?
| littlecranky67 wrote:
| Fully agree, it's an endless cats'n'mice game, as SEO spammers
| create new pages on new domains with automated wordpress
| deployments in no-time.
|
| Instead of a blacklist, a whitelist like a curated list of
| links a la Yahoo would be a better approach, or something with
| upvote/reputation/karma as on HN.
| sanketpatrikar wrote:
| I'm interested in that alternative too. Does a whitelist like
| that exist?
| chiefalchemist wrote:
| Funny enough. Years back I poked around Google Custom search.
| As a laugh I used it to index and search Quora (which was
| still a high quality site then). But I contemplated setting
| up a series of specialty search engines (e.g., JavaScript)
| such that I would feed Custom Search a list of URLs and/or
| rules for what to index.
|
| I never took it any further than "what a cool idea".
| jpalomaki wrote:
| Wonder if the alternatives are really fixing the SEO spam
| problem by improving their algorithms or manually pinning sites
| like StackOverflow to the top.
| Mk2000 wrote:
| I don't like Brave Browser at all with its integrated crypto
| ads, but I've been using Brave Search for a couple of days and
| it seems nice, much better than DDG:
|
| https://search.brave.com
| chasebank wrote:
| Considering this method would be through an ad-block extension,
| it doesn't seem as if they would be giving their money to
| google.
|
| Side question, is there a max number of sites you can block on
| that list?
|
| Edit: Freudian misread. I thought this was through ublock
| origin.
| Cort3z wrote:
| I use ecosia.org which is like ddg, but a part of the proceeds
| go to charity. Not affiliated with them, just like it a lot.
|
| Like ddg it's just Bing with a gimmick, but it's decent.
| ffhhj wrote:
| > ddg it's just Bing with a gimmick
|
| Is that true? I've read images are served by Bing, not sure
| about the web search.
| Closi wrote:
| Web search is served by the Bing Web Search API.
| sct202 wrote:
| Their ads are also from Microsoft
| https://help.duckduckgo.com/duckduckgo-help-
| pages/company/ad...
| alberth wrote:
| DDG is like "Dr Pepper" of search engines.
|
| Coke/Pepsi create, market, manufacture, distribute their
| soda.
|
| Dr. Pepper just creates and markets their soda. They don't
| manufacture or distribute their own soda (believe it or
| not, Pepsi manufactures & distributes Dr Pepper soda).
| omnicognate wrote:
| Any recommendation? I use ddg but it's Bing based and the
| results are no better.
| javajosh wrote:
| My experience with ddg has been very positive over the years.
| Although I think my eyes have gotten good at skipping over
| the noise in the search results. It might be nice to offload
| that effort to software.
| fowkswe wrote:
| DDG has no idea of the intent of your search based on your
| previous searches and unfortunately this generates abysmal
| results for me when searching for something like `ruby
| xxx`. Those terms rarely return results about the
| programming language. I unfortunately end up using Google
| in these situations, always to immediate success.
| jasonjayr wrote:
| One of the biggest reasons to switch away from Google is
| to shed that "previous search" bias, and refine +
| refilter results till you hone in on what you want.
|
| Leaving it to "the algorithm" to decide what to show you
| based on what it "thinks you want" from previous
| interactions turns the tool into no better than Twitter
| or Facebook optimizing engagement by showing you what
| they think you should see.
| judge2020 wrote:
| It's safe to say that most people searching for 'ruby
| <term>' have absolutely no idea what the ruby programming
| language is, and didn't mean to find results for it. If
| you're running a non-personalized search engine like ddg,
| which results do you show?
|
| For reference, Ruby only has 6% on the Stack Overflow
| dev survey
| https://insights.stackoverflow.com/survey/2021#most-
| popular-...
| nemosaltat wrote:
| Pretty sure you're using "xxx" as "some generic search
| term" but if you include literal xxx as a search term,
| you're almost definitely going to get some skewed search
| results.
| fouc wrote:
| xxx ** ??? same thing, it's obvious by context
| nemosaltat wrote:
| As I indicated in my comment, I _was_ "pretty sure" from
| context, but fowkswe enclosed their whole search term in
| backticks. I'm not familiar enough with ruby to know if
| there's some valid package called "xxx." Based on
| fowkswe's response to another comment, it's clear they
| meant "some variable search term" and not literal "xxx."
| Out of curiosity, I tried a few $language xxx searches on
| Google & DDG. I get mostly porn results for Rust, Ruby,
| Julia, Bash, Java in both engines with xxx appended.
| Interestingly, on Google, Python xxx returns mostly
| programming results.
| ryanmcbride wrote:
| I'll take having to write verbose and specific searches
| over google guessing what I really want any day.
| skrtskrt wrote:
| Google seems to have changed to using NLP + "what would
| the average person potentially mean by this", meaning
| anything actually specific is nearly impossible to find.
| I'll have half my search terms completely ignored
| javajosh wrote:
| I really like that it doesn't "know" me. I've come to
| appreciate starting from a well-known shared state, often
| quite empty, and then ranging out from there in my own
| unique way. I don't want my view of the world to change;
| I want my behavior to change, which ultimately yields a
| view of the world that is under my indirect control.
| na85 wrote:
| >ruby xxx
|
| Not sure if the X's were to denote wildcards or if you're
| after porn actresses named Ruby, but either way it might
| be worth adding a term to the query to clarify
| peddling-brink wrote:
| That's literally the conversation though. Google has
| enough context to know that you want programming language
| results, not porn or gemstones.
| fowkswe wrote:
| Oof - muscle memory failure....
| franczesko wrote:
| After trying a few, I currently test Brave Search. It misses
| DDG's bangs, but results seem to be better.
| andrewnc wrote:
| You.com has actually been really good for me.
| stevenleeg wrote:
| I've been using Kagi over the last few days and have actually
| been pleasantly surprised with it performing better than both
| Google and DDG for my use cases. It's still free during the
| beta so might be worth giving it a shot.
| Semaphor wrote:
| Worth mentioning, that they actually have the blacklist
| feature for domains that google removed years ago.
| littlecranky67 wrote:
| Thanks for sharing this link! I described my odyssey to
| finding craft printers in this thread [0], and just
| repeated the search on kagi.com. The results are better
| IMHO, even though the first hit is a link to some kind of
| Minecraft-clone game that lets you craft printers. The
| second link directs at least to Canon's craft landing page,
| and there are far fewer spam sites and more manufacturers'
| links within the first 20 results. Those sites that are
| spam are at least a bit more disguised/offer more content
| than usual so I will let kagi get away with that.
|
| [0]: https://news.ycombinator.com/item?id=29772136
| JonathanBuchh wrote:
| I've also been using https://kagi.com for the past couple
| of days and have found the search results to be
| superior to both Google and DuckDuckGo. I haven't come
| across any SEO spam and the results seem to be of better
| quality overall. They have all of the same bangs DDG uses
| and get results from Bing, Google, and a bespoke index.
| Because they query Bing and Google, it will be pretty
| expensive when it launches (it's in private beta) since it
| costs $12 per 1000 queries, but since search is so
| important, it will probably be worth it. They say they're
| privacy focused too, but it's hard to be sure when a
| product isn't open source.
|
| Kagi isn't trying to take significant market share from
| Google or DDG, but find a niche (probably Hacker News type
| people). The one thing I find they're lacking is maps and
| location based results (e.g. you can't search for "lunch
| near me"). If you're interested, I'd recommend checking out
| their FAQ: https://kagi.com/faq
| Melatonic wrote:
| How long did you have to wait for the invite?
| Semaphor wrote:
| 2 days in my case
| technofiend wrote:
| Signed up for the beta, but their signup page has one
| annoyance... clicking back doesn't take you back one page
| in the signup process, it ejects you to the top but your
| answers are saved. Fine if it's an honest mistake, weird
| if it's an aesthetic choice, not a good idea if it's an
| indicator of how they plan to be different. It would be
| much better to prove how smart they are by fixing
| google's search than "fixing" well-established UI
| patterns.
| antihero wrote:
| Hmm, annoyingly some bloody invite based thing instead of
| just being able to try the damn thing. What is this,
| 2004?
| eproxus wrote:
| Apparently their invite, once you receive it, is free to
| share with anyone. Enjoy:
|
| https://kagi.com/signup?invite_code=adfreesearchengine
| Melatonic wrote:
| Might be a limit on the number of users - this link now
| gets an "expired" message after attempting to signup.
| Anybody else have one I can try?
| omnicognate wrote:
| I'm on the waitlist. AIUI they intend this to be a paid
| service eventually, which is highly desirable to me. If
| it can be made to work, I would very much like a search
| service that is directly paid for by its users, rather
| than funded via side channels such as advertising or paid
| results placement that create skewed incentives.
|
| If they're not ready to charge for use yet, a limited
| preview makes more sense than a public free service.
| fallat wrote:
| Signed up; I will gladly pay for better results. Google
| has sold its soul long ago to the capitalism gods. DDG
| and Brave are no better.
| u2077 wrote:
| They also have a browser that's still in beta. Here's the
| direct link to test flight:
| https://testflight.apple.com/join/DeC8ZDnu (got this
| after a month on the waitlist)
| freediver wrote:
| It is in closed, invite-only beta.
| littlecranky67 wrote:
| Just tried and it seems we have an underdog here that
| could be useful to avoid spammy results. I can totally
| see why they want this to be invite only, as they
| probably need to find out how to scale well. If it went
| viral suddenly, they might collapse and the whole
| effect would backfire.
|
| Edit: This is really promising and interesting. They show
| a summary of the link when you hover over the crystal
| ball icon next to it, together with a button for "Block"
| and "Boost" - i.e. you can vote with your account. These
| metrics could be used for ranking and kill affiliate
| marketers.
| hellgas00 wrote:
| I have been using Brave Search for the last half year,
| results seem to be better than DDG and you can append g! to
| your searches like DDG for Google results.
| TrueDuality wrote:
| DuckDuckGo is not Bing based.
| [deleted]
| matheweis wrote:
| It does in fact use the Bing backend for the majority of
| its web searches:
|
| > DuckDuckGo gets its results from over four hundred
| sources. These include hundreds of vertical sources
| delivering niche Instant Answers, DuckDuckBot (our crawler)
| and crowd-sourced sites (like Wikipedia, stored in our
| answer indexes). We also of course have more traditional
| links in the search results, which we also source from
| multiple partners, though most commonly from Bing (and none
| from Google).
|
| https://help.duckduckgo.com/duckduckgo-help-
| pages/results/so...
| TrueDuality wrote:
| Of their additional sources, the one they most commonly use
| is Bing, but that is only among their additional
| sources. They use the results from there, sure, but by no
| means are they "Bing based".
| pb7 wrote:
| Their intention is to confuse you, so they're successful
| on that front.
|
| Their "hundreds of sources" are for very niche topics and
| appear in separate boxes. The 10 links per page come from
| Bing which would make them Bing-based.
| [deleted]
| _aavaa_ wrote:
| classified wrote:
| My method of improving search results is to not use Google.
| sheeeep86 wrote:
| marpstar wrote:
| I don't think OP is the author of the extension which has
| blacklist in the name...
| slingnow wrote:
| hu3 wrote:
| Related: git, GitHub and GitLab switched the default branch
| name from master to main.
|
| - https://github.blog/changelog/2020-10-01-the-default-
| branch-...
|
| - https://about.gitlab.com/blog/2021/03/10/new-git-default-
| bra...
| bauerd wrote:
| With the consequence that the default branch now differs
| between locally and remotely initiated repositories
| hu3 wrote:
| For old versions of git, yes.
|
| However, recent versions come with "main" as the default.
|
| And since 2.28, released more than a year ago, we have the
| option to change the default: git config
| --global init.defaultBranch main
| FerretFred wrote:
| pddpro wrote:
| Interesting. I'm curious as to what you use instead of
| "whitelist" and "greylist"?
| polygloty wrote:
| Can't speak for grey list, but white list has been renamed to
| allow list in a few places I know
| hdjjhhvvhga wrote:
| Whitelisting and greylisting are still fine. The current
| movement (primarily in the USA, but not only) to replace the
| word black where the meaning is associated with something
| negative is quite superficial and happy with simple swaps.
| And this is fine because if we went deeper, we would have many
| problems related to the fact that many positive values are
| associated with light and many negative with darkness. So
| people would either have to rewrite the fundamentals of our
| culture (and maybe even biology), or to accept the fact that
| these two - the pigments in our skin, which is neither white
| nor black btw, and the culturally-influenced meanings of
| various colors - belong to two different semantic planes.
| sanxiyn wrote:
| allowlist has been suggested for whitelist.
| [deleted]
| troon-lover wrote:
| ColinHayhurst wrote:
| This could be really helpful for us at Mojeek, and generally
| helpful for all search engines we think, done as a list rather
| than as an extension.
| rolisz wrote:
| Why not use another search engine, such as Kagi, which has built-
| in support for this? At least for the programming niche, Kagi has
| worked really great for me for a month now.
| sharikous wrote:
| Kagi is still in beta as far as I know. I sure hope someone
| will fill this niche eventually.
| tjungblut wrote:
| Indeed, I'm very happy with kagi so far
| mrkramer wrote:
| Open source software FTW
| AmosLightnin wrote:
| Shouldn't we make a plugin for SearX that learns from the results
| you click, so that the customization and machine learning is on
| the client side? That way search becomes a commodity, but the
| final selection algorithm's behavior is owned by the user.
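
A rough sketch of the client-side idea above: the engine returns results in its own order, and a local component re-ranks them using only data that stays on the user's machine. A plain per-domain click counter stands in here for the "machine learning" part, and the `ClickRanker` class is hypothetical; it is not SearX's plugin API.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain(url):
    """Extract the host name used as the ranking key."""
    return urlsplit(url).hostname or ""


class ClickRanker:
    """Re-rank results locally by how often the user clicked each domain."""

    def __init__(self):
        self.clicks = Counter()  # lives on the client, never sent to the engine

    def record_click(self, url):
        self.clicks[domain(url)] += 1

    def rerank(self, urls):
        # sorted() is stable, so domains with equal click counts keep
        # the order the search engine gave them.
        return sorted(urls, key=lambda u: -self.clicks[domain(u)])
```

Because the counter is the only state, swapping the upstream engine would not lose the user's preferences, which is the "search becomes a commodity" point.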
| quyleanh wrote:
| Could anyone tell me why we don't add these domains to adblock
| filter?
| dgut wrote:
| Shameless plug: I run okeano.com, a privacy friendly search
| engine. We natively support blocklists [0].
|
| [0] https://okeano.com/blocklist
| analognoise wrote:
| Why are we spending the effort to fix Google?
|
| We make $0 doing this, they make...astronomical profits screwing
| it up. So we invest a bunch of time so they can continue to take
| in astronomical amounts of money while abandoning "don't be
| evil"?
|
| Absolutely not.
| short12 wrote:
| Does HN block Google completely? I can't recall any search
| bringing them up even with the wealth of info
| kevinslashslash wrote:
| Ironically, when testing an example Google search from this
| thread, this thread came up. So HN doesn't block Google
| completely, but maybe has poor SEO so it rarely shows.
|
| 4th result is HN for me
| https://www.google.com/search?q=%22code+that+protects+users+...
| aaron695 wrote:
| Abimelex wrote:
| xwdv wrote:
| What reasons?
| Dig1t wrote:
| I can't comment on the parent here because it's dead, but I
| just want to add:
|
| I think the intent behind the original comment is coming from
| a good place and I agree that its sentiment is good, but also
| is it not obvious that trying to police other people's
| language is going to do nothing but increase division and
| resentment amongst your peers? Especially when the intent is
| so obviously not racist or exclusionary. I invite you to ask
| each one of your friends/family to honestly tell you if the
| sentence "I have blacklisted that website" makes them feel
| personally excluded or discriminated against.
| roneoo wrote:
| Describing something negative with the color black is related
| to a racist mindset, in short.
|
| Thank you Abimelex for pointing this out
| wackget wrote:
| This isn't mentioned on Wikipedia:
| https://en.wikipedia.org/wiki/Blacklist
| junon wrote:
| There are no etymological reasons suggesting this.
| xwdv wrote:
| I disagree. It has nothing to do with race. Even in
| societies where blacks didn't exist, black has symbolized
| bad, evil things. It is ridiculous to say a color cannot
| mean something negative just because it has the same name
| as a race of people. I also doubt the rest of the world
| feels using black this way is racist. It seems to be an
| opinion held only by coddled American minds still trying to
| cope with the trauma of learning that everything they enjoy
| about their country was built on the backs of black slave
| labor.
| throwawayboise wrote:
| > everything they enjoy about their country was built on
| the backs of black slave labor.
|
| And that isn't true either, and to the extent it is, you
| can point to any other country and find something
| terrible in its past if you go back far enough. Man's
| inhumanity to man is universal, not exclusive to America.
| xwdv wrote:
| So the point is, using the term blacklist isn't
| problematic.
| Abimelex wrote:
| see: https://inclusivenaming.org/faqs/
| littlecranky67 wrote:
| I wonder why Apple is not starting its own search engine. I mean
| yes, they get >$1Bn per year making Google the default on
| iOS+macOS, but they have plenty of cash so they wouldn't need it.
| They would get immediately ~10% market share when it is launched,
| just because it would be made the default on their devices. From
| there they just need to present better search results than Google
| (which shouldn't be that hard right now) and can only grow
| further.
|
| As another commenter here said "Google does not make money by
| helping you find what you are searching, it makes money by
| keeping you searching". That only works when there is no
| competition. But once Apple would be in the game, people would
| use what presents them with the better results. Right now, I
| don't feel there is real competition.
| paxys wrote:
| Running a search engine is a massive money sink, regardless of
| its popularity. It's the surrounding ad network which makes
| money. Competing with Google and Facebook in that regard is an
| impossible battle, and something Apple has already failed at a
| couple times now. They have since pivoted into creating a
| privacy friendly image, so emulating Google simply does not
| make sense for them.
| ericbarrett wrote:
| Apple is allegedly paid _lots of money_ to not do this:
| https://www.macrumors.com/2021/08/27/google-could-pay-apple-...
| littlecranky67 wrote:
| Wow, $20Bn per _year_. This smells a lot like anti-competitive
| behavior, I wonder what happened to that lawsuit.
| judge2020 wrote:
| It's legally not anti-competitive behavior (as in, both
| Google and Apple's lawyers believe so) because it's just a
| 'default search engine' fee - everyone knows that
| iOS/Safari is the most lucrative platform to be the default
| search provider for, so a large number like $20 billion is
| to-be-expected. I'm sure that, in a lawsuit, Apple would
| argue "if someone else came and paid us $21 billion, we'd
| take it and drop Google/start a yearly auction" but no
| other search engine has that budget.
|
| Apple is surely free to make their own search engine, and
| to an extent, they do - in Safari, the "suggested sites"
| feature is a search engine but one that only returns
| {single result | no result} and only works on iOS/Safari.
| On that same note, you can prove this by tracking search
| engine user agents for your website to look for
| Applebot/0.1 hits (if your site is popular enough).
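
The log check described above can be sketched as follows, assuming the web server writes the common "combined" access-log format, where the user agent is the last quoted field. The function name and the sample paths are made up for illustration; only the `Applebot` substring comes from the comment.

```python
import re
from collections import Counter

# Matches the request, status, size, referer and user-agent fields of a
# combined-format access log line.
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def applebot_hits(lines):
    """Count requests per path whose user agent mentions Applebot."""
    hits = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match and "Applebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits
```

Feeding it the lines of an access log gives a per-path tally of crawler visits; a site that never appears in that tally is unlikely to show up in the suggestions feature.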
| lgats wrote:
| apple already has its own search engine. the crawler is known
| as AppleBot and the results power siri search suggestions.
|
| it's limited to popular queries, so for many searches you may
| get 'no results, search the web (google)'.
|
| i made a bit buggy web front end for siri search so i could
| better play around with the results https://luke.lol/search/
| achtung82 wrote:
| But what would be their incentive to do so? Normally they
| launch products and make it exclusive to their devices so more
| people will buy iPhones, but that is difficult to do with a
| search engine. Otherwise they would have to get into the ad
| business like Google.
| littlecranky67 wrote:
| Apple is a publicly traded company, and every company needs
| to grow into new markets to make more revenue. And they also
| maintain their own browser Safari, even though on macOS they
| could just withdraw from market and leave the field to Chrome
| and Firefox. Even amongst macOS users Safari usage is very
| low and doesn't make Apple any money.
|
| On the other hand you can see how Google is using its
| dominance in Search to push its browser and mobile OS - once
| you login to Google in Chrome on your phone, suddenly they
| can track you when you use their mobile Apps etc. And Apple
| is trying hard to grow in the "Services" field, i.e. through
| Apple Music and Apple TV - both available to Windows and
| Android users too. Just as they made a buttload of money with
| iTunes and the iPod because they also targeted Windows users.
| rubyist5eva wrote:
| I just use duckduckgo with fallback bangs when the results aren't
| satisfactory to me, usually falling back to Bing.
|
| Google doesn't deserve my time or my eyes.
| tagoregrtst wrote:
| I'm afraid this is potentially dangerously political.
|
| Are you only going to filter obvious spam and sites that
| republish others' content, or are you going to block sites that
| are "harmful" or disseminate "disinformation"?
|
| Who will get to decide which media bubble I'm in?
| sharikous wrote:
| It can be made configurable - an option for github clones,
| another for SO clones, etc..., the folks who want to filter
| "harmful" content will add their category
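
One way to read the "configurable" suggestion is as separate per-category lists that each user opts into, with nothing enabled unless chosen. The sketch below assumes that layout; the category names, the `.example` domains, and the `build_blocklist` helper are all hypothetical.

```python
# Hypothetical per-category rule lists; a real project might keep each
# category in its own text file instead.
CATEGORY_LISTS = {
    "github-clones": ["*://*.gh-mirror.example/*"],
    "so-clones": ["*://*.so-clone.example/*", "*://*.qa-copy.example/*"],
    "content-farms": ["*://*.qa-copy.example/*", "*://*.listicle.example/*"],
}


def build_blocklist(enabled):
    """Merge the chosen categories into one de-duplicated rule list."""
    merged = []
    seen = set()
    for category in enabled:
        if category not in CATEGORY_LISTS:
            raise ValueError(f"unknown category: {category}")
        for rule in CATEGORY_LISTS[category]:
            if rule not in seen:  # a domain may sit in several categories
                seen.add(rule)
                merged.append(rule)
    return merged
```

Keeping the categories separate means the contentious lists never have to be agreed on by everyone; whoever wants one subscribes to it.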
| tagoregrtst wrote:
| Who sets the default 90% of people won't touch?
|
| Someone, give me the late 90's cesspool of un-curated
| internet back! Everything is so... trite now.
| Closi wrote:
| > I'm afraid this is potentially dangerously political.
|
| It only took 7 days for that other search engine project on HN
| last week (Mwmbl) to add hard-coded weights for certain news
| websites, so it does show how careful you have to be with this
| stuff.
|
| https://github.com/mwmbl/mwmbl/blob/a41088ca9ad7fdcac952a3be...
| schleck8 wrote:
| What's the issue with that? Doesn't Google do this too, based
| on ssl availability for instance?
| Closi wrote:
| Well Google testified against having hardcoded weights for
| news websites in congress.
|
| And the issue is because it creates filter bubbles and
| introduces bias into how information is discovered.
| beart wrote:
| This is a browser extension that you have to explicitly install
| and configure. You get to decide which media bubble you are in
| (setting aside the fact that google may or may not curate their
| search results.)
| aronpye wrote:
| Wouldn't it be easier to simply invert the problem and come up
| with a whitelist instead? When searching for technical info there
| are only a handful of sites I use, namely stackoverflow and
| wikipedia.
| j1elo wrote:
| This was discussed around a month ago, leading me to this post:
|
| https://news.ycombinator.com/item?id=29546433#29549855
|
| and the resulting uBlock Origin list, which I'm using as
| the best solution so far for this problem:
|
| https://github.com/stroobants-dev/ublock-origin-shitty-copie...
|
| but it will need curation and updates over time, which I'm not
| sure the author is willing or has the time to do.
| jhchabran wrote:
| That's an ambitious goal, I'm not sure I see how that would be
| maintainable in the long run.
|
| On a much smaller scale, if anyone is interested, I maintain a
| black list focused on those code snippet content farms that get
| in the way when you're searching for some error message or
| particular function here:
| https://github.com/jhchabran/code-search-blacklist
| sanketpatrikar wrote:
| It's worth a try! Also, thanks for maintaining those lists!
| anigbrowl wrote:
| Well there's only one way to find out
| nixcraft wrote:
| May I know why my domain (cyberciti.biz) was added to that
| list? I created my site back in 2000, and there was no
| StackOverflow or anything. So much for creating original
| content and then getting labelled as a spammer. In fact, some
| of the top answers on StackOverflow were copied from my work
| without giving any credit to me. Some people do give credit
| tho. But, go ahead block a site that actual humans maintain
| over 20+ years. Also check my About[1] and Twitter[2] page.
| There is no scraping or spamming going on on my part.
|
| [1]https://www.cyberciti.biz/tips/about-us
| [2]https://twitter.com/nixcraft
| zxexz wrote:
| cyberciti.biz is one of the few sites that come up in Google
| search results for anything code/linux related that has
| valuable content. I do wonder why someone would block it.
| ffhhj wrote:
| > some of the top answers on StackOverflow were copied from
| my work without giving any credit to me
|
| That's really frustrating. I'm building a faster search
| engine for programming queries and just added your site
| cyberciti.biz as a recommended and curated source of
| Unix/Linux material. Hope more devs become aware of your work
| and you (and your collaborators) receive the credits
| deserved. Thanks for your work of many years.
| Karsteski wrote:
| I've noticed cyberciti.biz showing up in my DDG search
| results but I've always ignored it because of the initial
| captcha. I will try it now that I've seen your post here!
|
| The .biz definitely does not help, since it hints to me that
| it's just another one of those worthless reposting sites, as
| someone else commented below.
| nitrogen wrote:
| What CDN do you use? I was immediately asked to solve a
| captcha from my phone.
| nixcraft wrote:
| Cloudflare sometimes triggers those when they think IP
| reputation is not good. Typically happens for data centre
| IP ranges as WAF has an anti-bot feature. So I know it is a
| problem for some.
| nitrogen wrote:
| Maybe it's the fact that I don't use one of the three
| major US ISPs. Hopefully CDNs get used to the idea that
| there can be more than one fiber provider.
| nixcraft wrote:
| Would you mind sharing the Cloudflare ray id displayed at
| the bottom of the screen when you see a captcha? I can
| look into it, and maybe I will be able to fix it too.
| Reply here or email me at webmaster@cyberciti.biz. HTH.
| nitrogen wrote:
| Not sure if you changed your Cloudflare settings, or if
| Cloudflare changed something, but I'm no longer getting
| the captcha, so that's good, but sadly I can't help debug
| the original issue.
| pbowyer wrote:
| I wanted to stop by and say thanks for cyberciti.biz! I've
| been using it since 2001-2002 when I got my first Verio
| FreeBSD VPS and had to figure out what was going on.
|
| When I see your site pop up in my search results I know the
| content is going to be more reliable than most of the others.
| Thanks for the effort you've put into it.
| endisneigh wrote:
| Your comment is exactly why spam prevention is difficult.
| Sorry for that.
| mikevin wrote:
| Interesting, I have your site on my mental blocklist as one
| of those scrape and rehost sites.
|
| I'll be honest, I don't remember how I came to that
| conclusion but I suspect I encountered an unsatisfactory
| answer to a question I was looking to answer, saw the .biz
| and drew my conclusions.
|
| The noise to signal ratio for most of my queries is so high
| that I have to start judging a book by its title, not even
| its cover.
| travisporter wrote:
| Not OP but dot biz is associated with spam in my head for
| what it's worth
| nixcraft wrote:
| A couple of years ago, I was at Google's office, and I
| talked with someone who works on search about .biz
| extension. They said domain extension doesn't matter. At
| that time, they said backlinks is one of the most vital
| signals apart from some PR. That was like eight years ago.
| So I never changed the domain name despite owning the .com
| version too. It will break too many backlinks.
| lacksconfidence wrote:
| Sure, Google might be fine with a .biz. As a human
| consuming Google's responses, my eyes typically glaze over
| seeing .biz and jump to the next search result. It's not
| that there is anything particularly wrong with .biz, but
| this is the first legitimately useful site (to me,
| probably plenty for others) I've heard of using .biz.
| burnished wrote:
| At first scan your site looks like one of those automated
| scrape and republish sites. I'm curious what got you on that
| blacklist (misspelling? bad first impression? automated tool
| gone awry?) though.
|
| Glad you said something though, I wouldn't have looked at it
| twice without a human attestation.
| nixcraft wrote:
| I kept it simple on purpose. As a result, it loads faster
| on both desktop and mobile and passes web.dev PageSpeed
| insight test too.
| GreenWatermelon wrote:
| For me personally, the titles are what make it
| suspicious. I've almost never, NEVER found
| something good in an article titled "(top) xy ways to z",
| I've come to immediately avoid any article with such a
| title.
| dredmorbius wrote:
| TL;DR: I strongly suspect that relatively small, personally-
| curated lists will be much more appropriate and highly effective.
| These might be augmented with specific classifications, but
| probably not on a widespread basis.
|
| Though the proposed solution borrows heavily from concepts long
| used in email and Usenet spam, there are a few critical
| distinctions in SEO SERP[1] spam which both make a widely-
| crowdsourced listing less applicable _and_ less necessary.
|
| In the case of email, your inbox is an unlimited resource to the
| spammers --- there's effectively no limit to how much spam they
| can throw at it. As there are also an effectively limitless set
| of source addresses (by either domain name or IPv6 addresses),
| _and_ because email/Usenet spam is itself a quantity/numbers
| game with rapidly shifting origins, collectively-sourced and
| curated blocklists have value.[2]
|
| A SERP is itself a finite resource --- the default is to display
| 10 results, and not making it into the top ten provides little
| reward. Moreover, _high ranking search takes some effort and time
| to achieve_; it's not like in email where a new server can spin
| up and immediately start deluging targets.
|
| My experience with annoyances matching this sort (stream-based
| social media is one example) is that _blocking a relatively small
| number of high-profile annoyances hugely improves signal/noise
| ratios._ And I think that will be the case with SERPs as well.
| There are a half-dozen or so sites which tend to dominate results
| in most cases, and those can be individually blocklisted (if the
| capability exists). If more appear, they can similarly be
| removed.
|
| The other factor is that quite a few sites which some people find
| exceedingly annoying and spammish, others find appealing. Coming
| to agreement on what to block, and classifications of such
| domains / sites, is likely to be difficult and/or contentious.
| There may be exceptions in specific instances (hence: specific
| classifications of unwanted results), but less so in the general
| case.
|
| I might be wrong. The case of DNS adblock, with PiHole as the
| classic example, shows that very large lists _can_ be compiled
| and used. My own Web adblock / malware block configurations have
| typically had from ~10k to ~100k entries. That
| said, the really heavy lifting is typically done by a _much_
| smaller fraction of the total. Power laws and Zipf functions work
| to your advantage here.
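|
| The power-law point above can be illustrated with a minimal
| sketch. It assumes, purely for illustration, that hits on blocked
| domains follow a Zipf distribution with exponent 1; the function
| name and the numbers are hypothetical, not measurements from any
| real blocklist.

```python
def zipf_coverage(total_entries: int, top_k: int, exponent: float = 1.0) -> float:
    """Fraction of all hits absorbed by the top_k entries of a Zipf-ranked list.

    Assumption: the entry at rank r receives hits proportional to 1 / r**exponent.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, total_entries + 1)]
    return sum(weights[:top_k]) / sum(weights)


if __name__ == "__main__":
    total = 100_000  # hypothetical list size
    for k in (10, 100, 1_000, 10_000):
        share = zipf_coverage(total, k)
        print(f"top {k:>6} of {total} entries cover {share:.0%} of hits")
```

| Under that assumption a small head of the list accounts for a
| disproportionate share of the blocking, which is the argument for
| short personal lists.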
|
| ________________________________
|
| Notes:
|
| 1. Search engine results page, that is, what you see in response
| to a query.
|
| 2. Even in the case of email spam, the principal value is largely
| from _curated_ lists, usually by experts, e.g., Spamhaus.
| sanketpatrikar wrote:
| In that case, we'd still need a repo bringing together all
| these individual lists. I couldn't find anything like this.
|
| I suggested a single list to prevent repetition and to limit
| the imports one needs to make to one.
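|
| A minimal sketch of how several personal lists could be merged
| into one deduplicated file for import. It assumes entries are
| bare domains, URLs, or rules already written as match patterns
| (`*://*.example.com/*`) or `/regex/`, and that lines starting
| with `#` are comments; the function names are hypothetical.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlparse


def to_match_pattern(entry: str) -> Optional[str]:
    """Normalize one list entry into a match-pattern rule, or None to skip it."""
    entry = entry.strip()
    if not entry or entry.startswith("#"):
        return None  # blank line or comment (assumed comment syntax)
    if entry.startswith("*://") or entry.startswith("/"):
        return entry  # already a match pattern or a /regex/ rule
    # Accept both "example.com" and "https://www.example.com/page".
    host = urlparse(entry if "://" in entry else "//" + entry).hostname
    if not host:
        return None
    if host.startswith("www."):
        host = host[4:]
    return f"*://*.{host}/*"


def merge_lists(lists: Iterable[Iterable[str]]) -> List[str]:
    """Merge several lists into one sorted list of unique rules."""
    patterns = set()
    for entries in lists:
        for entry in entries:
            pattern = to_match_pattern(entry)
            if pattern:
                patterns.add(pattern)
    return sorted(patterns)


if __name__ == "__main__":
    alice = ["example.com", "# clones", "https://www.example.com/page"]
    bob = ["*://*.foo.biz/*", "example.com"]
    print("\n".join(merge_lists([alice, bob])))
```

| The output could be committed as a single txt file, so each
| browser only needs one import.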
| dredmorbius wrote:
| So, my larger point is that no, that repo _doesn't_ seem to
| be called for.
|
| For malware and the like, _repurposing extant DNS-based
| blocklists_ as for uBlock Origin / uMatrix _should_ be
| viable, and _not_ require an additional curation effort.
|
| Note also that we're looking at a _browser extension_, and
| as such, very large lists and memory load would probably
| carry significant negative impacts.
| kcfantastic wrote:
| I've been building something similar to this with
| https://fantastic.link. Would love to get your feedback!
|
| I think empowering individuals to curate the web would create
| stronger social and financial incentives to improve online
| indexing (i.e., Shopify vs Amazon). 20 years ago we could
| approximate quality from backlinks from credible sites; in the
| age of social media it seems this signal has shifted towards
| what creators, influencers, and online experts endorse.
| unilynx wrote:
| Odds are some of the things some HN users want blocked are startups
| started by other HN users. What if w3schools didn't yet exist but
| applied to YC tomorrow?
| TrueDuality wrote:
| Huzzah, that extension supports other browser engines as well.
| It's not nearly as atrocious an issue on DuckDuckGo but there are
| still some of those re-post heavy sites that aggressively get
| through, as well as some low quality content farms. It's nice to
| have a tool available to do local/personal fine grained
| refinement.
| LinuxBender wrote:
| This is just my own personal preference, but I manage my own list
| of what is blocked or allowed on my systems. I would be concerned
| that a group contributed list for this category of blocking could
| quickly devolve into a group-think censorship dominated by
| whomever is the most devoted to blocking and extending echo
| bubbles to people's browsers.
| throwawayboise wrote:
| That, or it would be gamed by the SEO folks like they do every
| other thing that was once good.
| hayesall wrote:
| Seeing "how other people configure their tools" can be
| interesting. I love seeing how people configure their .bashrc
| with custom commands.
|
| I _don't_ think I'd want to download a list of the most
| blocked sites and plug it into one of my tools though, for some
| of the reasons you mentioned.
| siva7 wrote:
| I think we are attacking the wrong angle here. This should be
| solved at google
| alangibson wrote:
| Are you sure it hasn't been (from their point of view)? I've
| got that old familiar feeling that there has been a change
| internally about what now constitutes a "quality" search
| result. Google has been moving toward optimizing for engagement
| over what we normally think of as relevance for a while.
| rc_mob wrote:
| Google is a lost cause by now. The only solution is a
| competitor. DDG is just as bad since they source results from
| google so that is not the solution.
| nyolfen wrote:
| ddg sources from bing
| tjpnz wrote:
| DDG don't source any results from Google. They rely on Bing
| and increasingly their own bot.
| Kiro wrote:
| Their own bot is only used to crawl stuff for their
| widgets.
| new_guy wrote:
| > and increasingly their own bot
|
| Except they so obviously don't. They lie through their
| teeth and they're good at marketing, that's it. They've zero
| technical ability.
| marginalia_nu wrote:
| Google has precious little incentive to filter out ad-ridden
| spam content, given that their entire business model is the
| very ads these sites are plastered in.
|
| In short, if you find what you are looking for, they get few ad
| impressions and make less money. Meanwhile, if you have to
| click through half a dozen spam results first, there are many
| ad impressions, and they make more money.
| KoftaBob wrote:
| I imagine that Google's incentive is to first prioritize
| results that have the most profitable Google Ads on their page,
| and then for quality of result, not the other way around.
|
| Until that incentive structure changes, Google will not be
| interested in solving this.
| gorbachev wrote:
| How long are you willing to wait?
| lkbm wrote:
| I coincidentally tweeted a reply[0] to your comment a couple
| minutes before seeing it:
|
| > If I end up having to solve a problem that's someone else's
| fault, yeah, sure, it's unfair.
|
| > But if I end up having to _live with_ a problem because you
| insist that we wait for the responsible party to fix it,
| _knowing_ that they never will, that's more unfair.
|
| [0] https://twitter.com/lkbm/status/1478425875578302464
| fsflover wrote:
| This looks like a big, time-consuming project that would rely on
| a private Google API that can change any time. I think it's not
| worth investing your effort in that. I wish more people would
| help to improve FLOSS, peer-to-peer search engine YaCy instead,
| https://yacy.net.
| upbeat_general wrote:
| I'm not sure why you think that a domain blocklist would be
| harder than custom search engine development.
|
| Plus there's no private Google API here, just an extension that
| removes search results from the page. I suppose you could say
| the extension APIs are from Google (Chromium) but they're
| certainly not private and are commonly used.
| fsflover wrote:
| Doesn't this extension depend on how exactly the ads are
| presented on the page? Can't this be changed by Google
| easily?
|
| > I'm not sure why you think that a domain blocklist would be
| harder than custom search engine development.
|
| I didn't say this. The custom search is already created.
| Helping its development is much easier now. AFAIK its main
| problem is the lack of hosted servers.
| TrueDuality wrote:
| This improves other search engines as well, not just the Google
| universe. I'm sure even an opensource, peer-to-peer search
| engine will have similar issues of content farm content and
| gamed pages if it becomes large enough to compare with search
| engines like DuckDuckGo.
|
| On the other hand it is absolutely ridiculous to conflate the
| difficulty of occasionally adding a domain to a local filter,
| and _assisting to build a random unproven search engine_.
| People volunteer their development effort for projects they
| personally find interesting or challenging. If you want more
| developers, advocate for the project; don't try to scold people
| for wanting to spend a small amount of their time refining a
| solution that works for them.
| badrabbit wrote:
| How about support better search engines instead?
| snth wrote:
| Which are better in this regard?
| badrabbit wrote:
| Kagi imo, will be paid though.
| omnicognate wrote:
| And waitlist atm (which I'm on). Having to pay for a search
| engine would be a feature not a bug, IMO.
| o_m wrote:
| It is based on Google and Bing, so you will get the same
| spam results
| badrabbit wrote:
| Their selling point is that you won't. You will get a lot
| less results with it too. Liking it so far, can't say
| I've run into spam with it
| q1w2 wrote:
| It's usually better to support efforts that have a greater
| chance of succeeding, than supporting the ideal solution that
| is unlikely to happen.
|
| ...assuming "support" is a substantive action and thus a finite
| resource, and not just an upvote on a social media page.
| BuyMyBitcoins wrote:
| I am keenly interested in this idea.
|
| I sense that in the near future the paradigm of search engines
| will go from the current "index everything and become a universal
| answer engine" to "index a small subset of the Internet and
| become an answer engine honed towards a specific topic/domain".
| sanketpatrikar wrote:
| The repo exists here:
|
| https://github.com/sanketpatrikar/hn-search-blacklist/
| imglorp wrote:
| See comments in this thread for a number of lists in progress.
|
| https://news.ycombinator.com/item?id=29546433#29549855
| samuelfekete wrote:
| Blocking bad sites is just one side of the coin. You should also
| be able to promote sites that you are more interested in.
|
| This is the goal of Entfer (Show HN thread:
| https://news.ycombinator.com/item?id=29799867)
|
| Entfer will in the future also allow you to bulk export and
| import your personal rankings, so that they can be shared on
| GitHub, for example.
| alangibson wrote:
| A reply to those in this thread saying that Google should/will
| take care of this:
|
| Imagine you had a position in a huge market that was as close to
| unassailable as there has ever been. Imagine also that you have a
| controlling position over the mechanisms that allow people to
| participate in that market.
|
| Now try to make a case _against_ optimizing for squeezing every
| last cent out at the cost of the user experience.
|
| In 10 years we will regard Google the way we regard cable
| companies today. Maybe even worse since we need to be able to
| search for answers more than we ever needed cable TV.
| greenyoda wrote:
| The good news is that competition is much easier in the search
| market than in the cable TV market (since it's hard to run a
| new cable into everyone's house).
|
| If comparatively tiny operations like DuckDuckGo or Brave can
| launch decent search engines, I think there's hope of reducing
| Google's dominant position in the market.
| [deleted]
| runnerup wrote:
| I'd love to contribute. If my small contributions collectively
| have the potential to save many hours of others' time, it may end
| up the most impactful thing I do this year.
| sanketpatrikar wrote:
| I created this repo.
|
| https://github.com/sanketpatrikar/hn-search-blacklist
| zavkz wrote:
| How about a search engine that does this already?
| darekkay wrote:
| uBlock Origin supports blocking search results, so I don't
| require an additional browser extension. I maintain a blocklist
| for myself, targeting Google and DuckDuckGo [1]. Feel free to
| contribute more websites or use this list as a template for your
| own repository.
|
| [1] https://github.com/darekkay/config-
| files/blob/master/adblock...
| laurentlbm wrote:
| I also use uBlock, but with this list:
| https://github.com/stroobants-dev/ublock-origin-shitty-copie...
| dorianmariefr wrote:
| Blocking w3schools, I was not sure but I think you are right,
| MDN is just much better
| iio7 wrote:
| There is also https://www.mojeek.com/, I haven't tested it out in
| a while, so perhaps it has become better, but they should be
| striving to make it what Google used to be.
| calltrak wrote:
| Google gets worse day by day; the sad thing is it has 90% of people's
| mind share.
|
| Here is a great list of alternative search engines.
|
| Try them all, see which one yields better results for you.
|
| https://fabform.io/a/alternative-search-engines
|
| Please share the link on Twitter and Facebook.
|
| Let's bury Google search, every little link share helps.
|
| Give a helping hand to these search underdogs :)
| geuis wrote:
| Nah. Blocking isn't the answer. What we need is a better search
| index.
|
| I find it difficult to believe that relatively beginner NLP
| projects get posted here all the time, yet no one has adapted
| that stuff to create a new search index.
|
| Personally I don't know enough to really do this well, but I can
| tell just blocking sites from Google's results isn't the way.
| aaron695 wrote:
| decebalus1 wrote:
| How about we just let the free market decide and just go with
| another search engine instead of trying to fix the perception of
| a broken product which we don't own?
| jagged-chisel wrote:
| Sounds like we'd just need an HN uBlacklist subscription.
| Sourcing and validating submissions to the blacklist is the
| problematic bit. Perhaps use HN as an OAuth provider (not
| currently an available feature), use rules based on account age
| and karma for allowing or scoring submissions, voting system like
| HN ... sounds like something that might actually do better hosted
| on HN.
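|
| A minimal sketch of the account-age and karma rules described
| above. Every threshold and name here is hypothetical, and it
| assumes account age and karma are already known for each voter;
| no HN OAuth or API is implied.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical thresholds, chosen only for illustration.
MIN_AGE_DAYS = 365
MIN_KARMA = 500
BLOCK_THRESHOLD = 3.0


@dataclass
class Vote:
    domain: str
    account_age_days: int
    karma: int


def vote_weight(vote: Vote) -> float:
    """Young or low-karma accounts count for nothing; others get a capped karma bonus."""
    if vote.account_age_days < MIN_AGE_DAYS or vote.karma < MIN_KARMA:
        return 0.0
    return 1.0 + min(vote.karma / 10_000, 1.0)


def accepted_domains(votes: Iterable[Vote]) -> List[str]:
    """Return domains whose summed vote weight reaches the block threshold."""
    scores: Dict[str, float] = {}
    for vote in votes:
        scores[vote.domain] = scores.get(vote.domain, 0.0) + vote_weight(vote)
    return sorted(d for d, score in scores.items() if score >= BLOCK_THRESHOLD)


if __name__ == "__main__":
    votes = [Vote("clone.example", 2000, 1000) for _ in range(3)]
    votes += [Vote("rival.example", 5, 10) for _ in range(50)]  # fresh accounts
    print(accepted_domains(votes))
```

| Capping the karma bonus keeps a single high-karma account from
| deciding a listing alone.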
| tyingq wrote:
| >become worse lately thanks to all those StackOverflow and Github
| clones
|
| A google search showing some of these leech type sites:
|
| https://www.google.com/search?q=%22code+that+protects+users+...
|
| For me, "farath.com" is outranking stackoverflow.
| tut-urut-utut wrote:
| Just tried your search in both Google and Duck Duck Go. On
| Google first page spam copies are ~80% of the links, on DDG
| maybe 40%. Not good, but much better than Google.
| Siira wrote:
| > farath.com was first indexed by Google more than 10 years ago
|
| This seems pretty suspicious? Is it reporting the first time
| Google crawled the main domain farath.com? How is that relevant
| information?
| judge2020 wrote:
| This is the first time it crawled the domain at all. It's
| been a website since at least 2008[0], but was recently re-
| registered in 2020[1].
|
| 0: https://web.archive.org/web/20080607010730/http://www.fara
| th...
|
| 1: https://who.is/whois/farath.com
| endisneigh wrote:
| This is a great example of why "Google sucks!!11" is mainly
| FUD. Let's say you're looking for the SO link, which is #2 for
| Google. Let's compare:
|
| Google ("code that protects users from accidentally invoking
| the script when they didn't intend to")
|
| Link:
| https://www.google.com/search?q=%22code+that+protects+users+...
|
| SO - #2
|
| Bing ("code that protects users from accidentally invoking the
| script when they didn't intend to")
|
| Link:
| https://www.bing.com/search?q=%22code+that+protects+users+fr...
|
| SO - #2
|
| Brave Search
|
| Link:
| https://search.brave.com/search?q=%22code+that+protects+user...
|
| SO - Not on page
|
| You.com
|
| Link:
| https://you.com/search?q=%22code%20that%20protects%20users%2...
|
| SO - Doesn't load
|
| DuckDuckGo:
|
| Link:
| https://duckduckgo.com/?q=%22code+that+protects+users+from+a...
|
| SO - #2 (seems to depend on refresh)
|
| Basically they're all the same. Google is faster, but the order
| of the results is identical.
|
| If you did a large scale analysis in this manner I doubt Google
| would lose.
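|
| A minimal sketch of how such a comparison could be scored once
| result URLs have been collected for each engine. How the URLs
| are gathered is left out; the helper name and the sample URLs
| are hypothetical and are not real search results.

```python
from typing import List, Optional
from urllib.parse import urlparse


def rank_of_domain(result_urls: List[str], domain: str) -> Optional[int]:
    """1-based position of the first result hosted on `domain` (or a subdomain)."""
    domain = domain.lower()
    for position, url in enumerate(result_urls, start=1):
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return position
    return None  # domain not on the page


if __name__ == "__main__":
    # Made-up result pages, keyed by a placeholder engine name.
    pages = {
        "engine-a": ["https://copycat.example/q/1", "https://stackoverflow.com/questions/1"],
        "engine-b": ["https://copycat.example/q/1", "https://other.example/post"],
    }
    for engine, urls in pages.items():
        print(engine, rank_of_domain(urls, "stackoverflow.com"))
```

| Averaging that rank over many queries would give the large
| scale comparison suggested above.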
| tyingq wrote:
| I'm not sure it's a good example, really. It's an "exact
| phrase search" with quotes, which doesn't happen much in real
| life.
|
| It was helpful solely to show what some of these leech sites
| are.
|
| Searching for (without quotes): What does if __name__ ==
| "__main__": do?
|
| Is probably a better test of which search engine has better
| results for the real-life query. Google might still win, but
| it should do a better job of screening out the spammy sites.
| It used to be better at this.
| endisneigh wrote:
| All the search engines have seen massive increases in SEO
| and consequently, spam. Do you believe Google et al aren't
| working on it?
|
| It's obviously a very difficult problem.
| tyingq wrote:
| I believe they've become more complacent since there is
| no competitive pressure, yes.
| not2b wrote:
| I doubt it. If their results become noticeably worse than
| Bing (and therefore DDG) results, people will start to
| switch. But Bing is having the same problems with the
| crap sites.
|
| So they at least have to keep their quality even with the
| competition for the average person (HN readers are not
| the average person).
| tyingq wrote:
| That matches what I said, really. "Better than current-
| day Bing" is a much lower bar than they used to shoot
| for.
| endisneigh wrote:
| Bing has always been very comparable to Google.
| Especially in certain niches. For example bing was better
| for social in 2010:
|
| https://www.eweek.com/news/microsoft-bing-beat-google-in-
| soc...
| endisneigh wrote:
| I don't know, a lot of people have claimed this but no
| one has any proof.
| tyingq wrote:
| What would proof be? Over a period of time, they have
| clearly de-prioritized organic search results by pushing
| them down the page with ads, widgets, and so on. They
| used to engage more directly with content producers. Matt
| Cutts was often called the "Head of Web Spam". When he
| left, they dissolved his role and spread it around to
| several other people.
| endisneigh wrote:
| Your claim regarding deprioritized search results for
| example - where's the proof?
|
| If what you are saying is true it should be trivial to
| give sample queries so we can test across all search
| engines and see if it's true.
| tyingq wrote:
| "Deprioritized" means moved down the fold of the page.
| Like, for example, the 3-4 ads at the top that used to
| not be there. And various other widgets that used to not
| be there. Ads in SERPS started out in the right hand
| sidebar only. Such that organic results were at the top.
| They were slowly moved down over time by a slow rollout
| of more ads and widgets. There are many queries now with
| ZERO organic results above the page fold.
|
| None of this is in dispute, so I don't feel compelled to
| dredge up old screenshots or examples.
|
| Comparing the page layout to competitors isn't especially
| helpful when Google has 90%+ market share in search.
| Google is defining the standard for others.
| endisneigh wrote:
| Why don't you give any examples of these queries? There's
| no point in comparing to, say, 2008 Google; there are many
| orders of magnitude more sites. You can't expect old
| algorithms to keep up. The main thing that matters is how
| long it takes to find what you're looking for.
|
| Can't say I really get your point. You're willing to
| complain and go on this long tirade, but not to give a
| single query example? The one you gave in your original
| post was already debunked easily enough. You're saying
| these things as if they are a fact - I disagree with you,
| it _is_ in dispute. If we're going to just take random
| claims as facts, then I'll say Google's results are
| better than ever.
|
| Google's market share isn't really relevant. It's very
| easy to just use Bing or any other search engine. I
| actually use Bing half of the time since it offers
| rewards.
| tyingq wrote:
| I didn't give any query to be debunked. I gave an example
| of a query to see the copycat sites. It does that just
| fine. Re-read it with some benefit of the doubt maybe?
| Your rant seems to be based on the idea that my upthread
| post was something that it wasn't. I did note that one
| copycat site was ranking above SO, but that was
| secondary.
|
| As for a query with a shit ton of ads and widgets? Try
| vegas hotels, or anything else with lots of widgets and
| ads. The travel space has a lot of them.
| endisneigh wrote:
| All I'm saying is that claims should have proof
| tyingq wrote:
| Proof of what? That google didn't used to have ads above
| organic results? That they added more and more over time?
| That some queries (vegas hotels) have no organic results
| above the fold? Those are all common knowledge. Burden of
| proof of the opposite would be on you.
|
| I don't happen to have a historical screenshot history of
| SERP results specific to StackOverflow copycats, no. I'll
| concede that.
| endisneigh wrote:
| > Burden of proof of the opposite would be on you.
|
| What, lol. I'm not making any claims. You're the one
| saying Google is becoming complacent. I'm asking for
| proof of this, lol.
|
| >> I believe they've become more complacent since there
| is no competitive pressure, yes.
| tyingq wrote:
| [deleted]
| dang wrote:
| Please don't perpetuate flamewars on HN; especially not
| the tit-for-tat kind where two users butt heads and get
| their horns entangled.
|
| https://news.ycombinator.com/newsguidelines.html
| dang wrote:
| Would you please make your substantive points without
| degenerating into the flamewar style? You did it
| repeatedly in this thread, and worse each time. That's
| the opposite of the direction we want things to go here.
|
| https://news.ycombinator.com/newsguidelines.html
| Terry_Roll wrote:
| I have noticed that Google and Bing seem to present results
| which link to sites like stackoverflow.com where the
| questions and solutions are absolute FUD.
|
| I think someone or an entity has been engaged in a concerted
| effort to manipulate the results, if it's not something more
| nefarious in Google and Bing's domain.
|
| Very few entities have the resources to do this either; it's
| not something a ragtag band of goat herders could do, that's for
| sure!
| ahurmazda wrote:
| I tried you.com[1]. The first few results seem quite relevant.
| Best part is that you can actually personalize the weights to
| assign to your search (your very own bubble)
|
| https://you.com/search?q=code%20that%20protects%20users%20fr...
| tobyjsullivan wrote:
| This isn't the same search. The parent post had quotes around
| the phrase. You.com returns identical copy-cat results if you
| do the same search.
|
| To be fair, not sure what other results we'd expect if we're
| going to search for a specific, plagiarized phrase.
|
| Edit: actually, upon review, you.com does indeed give one
| extra useful result within the top three. So one point to
| gryffindor.
| tyingq wrote:
| >To be fair, not sure what other results we'd expect if
| we're going to search for a specific, plagiarized phrase.
|
| Yeah, I posted an exact search solely so people could see
| examples of the copycat sites. It's not a good example of a
| real life query in any way, and not useful for comparisons.
| It is interesting that Google puts one of the copycats
| above SO though.
| ffhhj wrote:
| I saw you.com displays some Code Complete snippets but the
| lines are too short and don't get language
| highlighting, which makes it harder to read. Nice try anyway.
| hooande wrote:
| The problem with this is illustrated in another comment where
| nixcraft's site, cyberciti.biz, was added to a personal block
| list. The content on the site does seem to be original and
| productive. I'd guess it was added based on the criteria of "I
| haven't heard of this site and the domain looks suspicious". I
| have a feeling that this will be true for other domains on this
| proposed master list. And the owners of those domains will have
| no recourse.
|
| Specifically blocking github clones seems doable. Adding anything
| else needs equally specific criteria or it will quickly become
| subjective and unfair.
| ZeroGravitas wrote:
| Isn't this Google's job? Are developers a small but lucrative
| target and so the suits at Google don't see the benefit of
| improving that experience by cleaning up the spam?
|
| Can we just nudge them to do so under the threat of an
| influential minority leaving due to their use case being
| affected?
| PragmaticPulp wrote:
| I think the disconnect comes from people expecting perfect
| search results as curated by humans, whereas Google necessarily
| must optimize for automated results. Automated results will
| never be perfect.
| nottorp wrote:
| > Isn't this Google's job?
|
| Have you searched anything on Google lately? The answer is
| "no". Their new job seems to be to stuff your results with
| anything even remotely related (and sometimes related in a way
| that only machine learning can see) so you have things to click
| on.
|
| Edit: with the lone exception of "find me this business
| nearby".
| goodlinks wrote:
| If I search for a business name it's normally not the first
| result any more. Usually an advert for another company in the
| same space I don't want to use (basically an offensive /
| scamming result from user perspective) and then also the
| standard "buy _search term_ on amazon".
| mikevin wrote:
| It's very obvious Google is no longer the equivalent of
| grepping the web. There's some ML/NLP interpretation that's
| rewarded for returning the substring/interpretation that
| returns the most/highest ranking results.
|
| It's very noticeable if your search contains a short keyword
| that has to be interpreted in the context of the other keywords.
| As an example, if I search for 'ARM assembly' plus another
| keyword (macro, syntax etc) it will see 'ARM assembly'
| without the extra keyword has way more high ranking results
| and happily show me how much it knows about armchairs that
| don't require assembly. Ignoring the fact that the extra
| keywords are there specifically to limit the search results.
|
| It's tiring, a lot of time I previously spent browsing the
| limited but valuable results it returned I now have to spend
| mangling the keywords enough to outsmart their ML/NLP
| interpretation and get it to admit I am actually asking for
| the thing I am asking for so I can finally get to the part
| where I have to solve the modern captcha: click all the
| results that are:
|
| 1. Not stolen/rehosted
| 2. Not a "Hello World" level Medium blog
| 3. Written by an actual human
| anigbrowl wrote:
| _Can we just nudge them to do so under the threat of an
| influential minority leaving_
|
| No. This is a classic mistake of intellectual types, who are
| impressed by each others' cogent arguments. But there is a much
| wider pool of people who are not, and among whom the
| intellectual types actually have very little influence, due to
| being boring and hard to understand (plus, it has to be said,
| kind of snobbish about how smart they are).
|
| Now, you might reason that Google is full of smart people who
| should care about cogent arguments. But that assumes as an
| unspoken premise that Google's internal goal is to maximize the
| quality of the service and profit from being The Best. They
| passed that goal years ago and are now so awash in money that it's
| cheaper to just squash competition than to innovate. They can
| be moved by threats to advertising revenue got up by angry
| crowds on social media (a market where they have little direct
| power), but Google would probably be delighted if grumpy nerds
| wandered off somewhere else. If they need talent or access to
| some compelling technology they can just throw a pile of cash
| at the problem.
| omnicognate wrote:
| Are you paying them to do it?
| tjpnz wrote:
| >Isn't this Google's job?
|
| Or more fundamentally perhaps this is just the system working
| as Google intended?
| moneywoes wrote:
| No, google wants more clicks so they would prefer poor results
| that keep users searching
| BuyMyBitcoins wrote:
| >" Can we just nudge them to do so under the threat of an
| influential minority leaving due to their use case being
| affected?"
|
| I sense Google is too big to cater to us like this. Despite a
| steady decline in quality, Google is still the dominant search
| engine and the competition isn't even close to its market
| share. Not only would they not notice many of "us" leaving, the
| amount of change they would have to implement in order to
| satisfy our desires would end up changing the product for the
| rest of the market. On some level, the product managers must be
| satisfied with the metrics as they stand since Google is
| continuing with their current course.
| alangibson wrote:
| > cleaning up the spam
|
| I'll be happy to be proven wrong, but I think Google is now
| fully in the 'optimize for engagement' camp. If that's what
| they're doing, it's by definition not spam (from their point of
| view) if people are clicking on it more than the non-spam
| results.
|
| Again, only my guess as to what's going on. I don't see another
| good explanation for them only serving cloned Stackoverflow and
| top X lists for basically everything now.
| onionisafruit wrote:
| From a user point of view, a search engine's job is to link
| you away from the search engine, so how does a search engine
| measure engagement? Is it time on page or maybe number of
| searches performed? When you don't have viable competitors
| both of those are improved by worse search results. Even
| number of ads clicked would be improved with worse search
| results because ads don't have as much competition for your
| attention when there are no relevant results on the page.
| alangibson wrote:
| > From a user point of view, a search engine's job is to
| link you away from the search engine
|
| That hasn't been true for a long while.
|
| Users are there for an answer to their query, not to be
| directed anywhere. Frequently this comes in the form of
| search vertical response that doesn't lead the user
| anywhere.
|
| From Google's point of view, you want the user where you
| can show them ads, full stop. Google does not exist to
| provide you with answers to your queries, it exists to make
| money for shareholders.
|
| > When you don't have viable competitors both of those are
| improved by worse search results.
|
| I don't think "worse search results" is the way to think of
| it. From G's point of view they're better because they make
| them more money.
| MarcelOlsz wrote:
| Worst case scenario if Google drops the ball I just go back to
| the library.
| jstx1 wrote:
| I'm sure they have great books on stackoverflow answers,
| reddit reviews of products and opening times of local stores.
| akho wrote:
| Luckily, stackoverflow, reddit, and your favorite mapping
| thing have their own searches.
| MarcelOlsz wrote:
| Hey it might take me 30 years to find the answer but at
| least my eyes will thank me!
| sanketpatrikar wrote:
| It is Google's job, but they either aren't doing it or are
| failing at it. We could do something about it at least until a
| better alternative or a solution appears.
|
| > Can we just nudge them to do so under the threat of an
| influential minority leaving due to their use case being
| affected?
|
| Many influential people have tried and nothing seems to have
| transpired from it.
|
| Google.com is the most popular website. I don't think the
| leaving of any minority group we manage to create would even
| matter to Google, let alone force them to fix the issue. Not
| that I discourage using alternatives.
| asdfasgasdgasdg wrote:
| Is there a word for this tendency to say, "it's someone else's
| job" as a justification for doing nothing at all to help or
| improve one's own circumstances? I see more and more of it in
| the public discourse over the last years and it kind of bothers
| me. I see it a lot in conversations related to poverty or
| climate change, but it is as we see here by no means exclusive
| to those topics.
|
| To the original replyer: you could wait for Google to do
| something, but if they were going to fix the listicle issue,
| and it were fixable on their end, they'd probably have done it
| by now. I'm disappointed in the situation too but if there is a
| workable solution on our end it would be silly to ignore it
| because fixing the problem is someone else's job.
|
| To the OP: I worry that the number of domains pumping out crap
| might be far greater than we know, and that might hamper the
| effectiveness of this. If the collaborative block list ever got
| big enough you might also have to deal with spam. But I think
| it would be a great thing to try. This is one of those issues
| that annoys me, but it's just below my action potential
| threshold. My biggest objection right now is the spammy recipe
| websites.
| germandiago wrote:
| > Is there a word for this tendency to say, "it's someone
| else's job"
|
| At the risk of sounding pretentious, I call it "socialist",
| since they spend their lives telling others what to do or
| what is good or not for the rest of us but they rarely do
| anything about it. Surprisingly, this is the group that is
| really worried about poverty and climate change and do as
| much as I do for it, with the difference that I do it by
| myself, the few times I do it, not requiring the rest to do
| it.
|
| It is always someone else who will do it. Though the other
| day I had a conversation with a non-socialist person that had
| that same attitude ("other should do it") towards what OTHERS
| should do. I really dislike that attitude, no matter where it
| comes from.
|
| Point at hand: when I want or promote something, I am the
| first one to do it, whether or not others do it. The rest,
| no matter the ideology, is all b*llshit.
|
| As imperfect as I am, I try to do what I think is good (and
| sometimes my imperfection prevents me from doing it) but I do
| not spend my life telling other people why they are worse
| than me and telling them what they should do or not. The most
| I have for someone is good suggestions, never requirements.
| anigbrowl wrote:
| I think you should read some more socialist authors.
| Socialists are very into doing things but would also like
| to have access to the resources that facilitate getting
| things done, rather than being forced into the position of
| constantly petitioning bureaucrats. But socialists who run
| for office on a platform of opening up that bureaucracy to
| the public are often denounced as undermining the
| foundations of society, prosperity etc.
| burnished wrote:
| Oh wow, an uncritical hot-take refers to someone they
| disagree with as being a 'socialist', you're so
| revolutionary. I wonder where you could have gotten the
| idea that all of your problems are the fault of the
| 'socialists'.
| germandiago wrote:
| Oh wow, where did you take from I said all my problems
| come from socialists?
|
| I described an attitude I dislike and I often find, like
| it or not, at that side of the spectrum: we have to
| eliminate poverty, stop the climate change and do all
| good things in the world. At the same time I write from
| my big luxury Iphones, my big house (as I promote
| equality) and, of course, tell the rich to pay the bill
| as if their money belonged to everyone else. Mine no, I
| am no rich... I am socialist to point to who should do
| it, not to do it.
|
| I have never, ever seen a person with a capitalist
| mindset pointing to the wealth of others or saying that
| everyone should have the same or be equal for its own
| sake, and they are much more restrained about telling
| others what to do. Also, they are often much more
| frugal people.
| profunctor wrote:
| This has nothing to do with socialism. Since it is a
| problem under capitalism why don't we call it "capitalist".
| Or maybe we could just say lazy.
| germandiago wrote:
| The irony here about socialism for me is that I find
| plenty of people that would fall in those ideals (not
| only in those, but most of the time) that tell you what
| is good or bad for everyone else but they seldom do what
| they say. That is why I said "everyone else's job" is an
| approximation of "socialism" in my view.
|
| I cannot see how that would be a vice of capitalism since
| capitalism is basically non-intervention and freedom.
| mthoms wrote:
| >That is why I said "everyone else's job" is an
| approximation of "socialism" in my view.
|
| I don't think socialism is the word you're looking for.
| If you're trying to say that the described behaviour
| ("it's everyone else's job") is common in, or a byproduct
| of, a socialist system then it would make more sense
| (though it could be argued).
|
| But that doesn't make it a correct "approximation" of
| socialism. Socialism is:
|
| >a political and economic theory of social organization
| which advocates that the means of production,
| distribution, and exchange should be owned or regulated
| by the community as a whole.
|
| https://www.lexico.com/definition/socialism
| ZeroGravitas wrote:
| I like to think of it as solving the problem in the right
| place.
|
| It's often possible to work around issues in lower layers,
| but it's usually at least worth raising it upstream to get it
| fixed 'properly'.
|
| It'll help me when I don't have a blocklist active, and it'll
| help new programmers who aren't familiar. It'll reward good
| sites with extra traffic and discourage new spammers entering
| the market.
|
| In the worst case, if Google really can't or won't address
| the issue, understanding the upstream problem more fully can
| help make a better workaround.
| renewiltord wrote:
| Internal v External Loci of Control perhaps?
| bryguy32403 wrote:
| Exactly this. I've seen it around on this site a lot
| lately.
| jazzyjackson wrote:
| https://en.wikipedia.org/wiki/Bystander_effect
| geofft wrote:
| I think the question was specifically about declining to
| improve _one's own circumstances_ because someone else
| could, not declining to help someone else. That's different
| from the bystander effect as usually conceived. The
| theoretical bystander effect is something like "Someone is
| being attacked, I could call 911 but someone else will;"
| this is perhaps something like "I am being attacked, I
| could scream for help but other people will notice anyway"
| - but really more like "I can't cross the street in front
| of my house because the storm drains by the crosswalk are
| clogged, I could rake them but the city is supposed to do
| that."
| jazzyjackson wrote:
| Fair enough, maybe Learned Helplessness is a better fit.
| nitrogen wrote:
| I feel like the iterated prisoner's dilemma is part of it
| too -- if you let someone off the hook repeatedly for
| defecting, you just enable them to continue defecting.
|
| Or somewhat related, the "if you touch it you own it"
| problem.
|
| So to overcome this learned helplessness effect, we'd
| need a good strategy to prevent the deterioration of duty
| of whoever is "supposed" to be fixing a problem, and/or a
| way to cut out the derelicts entirely.
| anigbrowl wrote:
| You will also appreciate this:
| https://www.science.org/doi/10.1126/sciadv.1600451
| anigbrowl wrote:
| Skip the abstract and head straight to the caption for
| figure 3, then go back and read the whole thing.
|
| https://www.science.org/doi/10.1126/sciadv.1600451
| sanketpatrikar wrote:
| I suggest this because there can only be so many websites
| that use SEO to game their way to the top and bury the good
| results beneath them.
|
| If we manage to block them, we might be able to get a results
| page with good sites upfront and the other meaningless
| content below it. I assume Google will also surface good
| content along with the bad, so our blacklist might enable the
| good stuff to reach the top.
|
| The spam problem, I'm not sure of yet, but we might either be
| able to block enough of it to be satisfied or it won't pose a
| problem for most searches that are currently giving bad
| results.
| andyjohnson0 wrote:
| > I worry that the number of domains pumping out crap might
| be far greater than we know, and that might hamper the
| effectiveness of this.
|
| I'm sure you're right about the number of spam domains, but
| Pareto suggests that blocking even a small percentage of them
| might provide a large gain.
|
| https://en.wikipedia.org/wiki/Pareto_principle
| asdfasgasdgasdg wrote:
| Thanks, that's a good thing to remember. The other concern
| is installing uBlacklist -- I'm scared to give more
| extensions access to google.com. I wonder if I could fork
| it and restrict its permissions to the SERP.
| sanketpatrikar wrote:
| I ended up creating a repo with blacklist.txt myself and will
| add to it for my own usage. I don't see anyone else who'd
| maintain this. Feel free to use it / contribute to it.
|
| https://github.com/sanketpatrikar/hn-search-blacklist
| tomhallett wrote:
| I just added to your repo:
|
| * code-search-blacklist based on jhchabran's repo [1]
|
| * pinterest.com based on SwiftOnSecurity's interesting SEO
| analysis [2]
|
| Maybe your repo could be an opinionated list of things
| developers find annoying about google search results, where
| others might value sites like pinterest in their results.
|
| [1]: https://github.com/jhchabran/code-search-blacklist
|
| [2]: https://twitter.com/SwiftOnSecurity/status/12588753334
| 467174...
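
A shared list like this tends to collect the same domain from several contributors, so merging and deduplicating the sources is the main chore. The following is a minimal sketch, not the repo's actual tooling. It assumes each source is plain text with one entry per line, and that entries are bare domains, full URLs, or rules already written as match patterns (`*://*.example.com/*`) or `/regex/`. The function names `to_rule` and `merge_lists` are hypothetical.

```python
from typing import List, Optional
from urllib.parse import urlsplit


def to_rule(entry: str) -> Optional[str]:
    """Normalize one list entry into a match-pattern rule (assumed format)."""
    entry = entry.strip()
    if not entry or entry.startswith("#"):
        return None  # skip blank lines and comments
    if "*" in entry or entry.startswith("/"):
        return entry  # already a match pattern or a /regex/ rule; keep as-is
    # Accept both "example.com" and "https://www.example.com/page".
    host = urlsplit(entry if "://" in entry else "//" + entry).hostname
    if not host:
        return None
    if host.startswith("www."):
        host = host[4:]
    return f"*://*.{host}/*"


def merge_lists(*sources: str) -> List[str]:
    """Merge several list files (given as text) into sorted, unique rules."""
    rules = set()
    for text in sources:
        for line in text.splitlines():
            rule = to_rule(line)
            if rule:
                rules.add(rule)
    return sorted(rules)


if __name__ == "__main__":
    mine = "pinterest.com\n# recipe spam\n"
    theirs = "https://www.pinterest.com/pin/1\n*://*.example.org/*\n"
    print("\n".join(merge_lists(mine, theirs)))
```

Sorting the output keeps diffs small when contributors send pull requests, which matters more for a community list than the order of the rules.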
| tut-urut-utut wrote:
| Instead of spending energy to change Google, why not just leave
| them for good?
|
| Start with changing default search engine to DuckDuckGo or
| something else, install uBlockOrigin and Privacy Badger to
| disable tracking, and gradually reduce your use of every
| Google application, starting with Chrome.
|
| Be the change you want to see.
| sanketpatrikar wrote:
| I relate to this opinion. There are two reasons why my
| suggestion might still be useful:
|
| 1. DuckDuckGo too is affected by these SEO-gaming sites, so
| maintaining a blacklist will help us make that experience
| better too.
|
| 2. There are times when only Google can find us what we're
| looking for, so this will prove useful when we go back to it.
| ziggus wrote:
| I think you're wildly overestimating the influence of the
| minority within HN (or other similar communities) that actually
| care enough to switch to another search engine.
|
| This reminds me of the Linux gamers who claim that they can
| influence game development companies by purchasing games with
| Linux ports, but wind up being less than 0.5% of sales of most
| games with Linux ports, which leads manufacturers to ignore
| that customer base almost completely.
| fileeditview wrote:
| Not disagreeing with you.. big companies truly mostly ignore
| Linux but there are more than a few indie devs who support
| Linux as a platform. And I tend to play only indie games
| these days anyways because all the big commercial games have
| been reduced to some kind of click-and-succeed or free-to-
| play-and-milk-some-whales crap.
|
| I personally am kinda happy where Linux gaming has come to
| be. Sure it could always be better but I remember times where
| there were only like 3 games for Linux and you had to compile
| them yourself..
| yellowsir wrote:
| game companies might have ignored us, but in the end it
| created a space for valve and codeweavers to fill.
| beepbooptheory wrote:
| Google has a fiduciary responsibility to shareholders, which is
| so much work as it is! Why are you trying to ask them to do
| more?
| ineedasername wrote:
| Google's goal isn't to create the best possible search engine,
| it's to have a search engine that is good enough that people
| won't actively seek an alternative at the same time that they
| put as much ad content as possible in there, again before it's
| so much that people seek an alternative.
|
| I doubt many advertisers like the status quo very much either.
| They basically have to pay for ad placement to ensure the first
| results for their product aren't ads for _competing_ products.
| On mobile when I search for Boox the first result linking to
| them is an ad. Same for Kobo. In other instances I'll search
| for a company or product and a competitor ad is the first to
| show. So vendors get stuck paying for ads when their own site
| should probably be the first organic result, above the ads.
| reaperducer wrote:
| _Are developers a small but lucrative target and so the suits
| at Google don't see the benefit of improving that experience
| by cleaning up the spam?_
|
| Google doesn't make money from people finding what they're
| searching for. Google makes money by keeping people searching.
___________________________________________________________________
(page generated 2022-01-04 23:01 UTC)