[HN Gopher] Google Broke Image Search for Creative Commons
___________________________________________________________________
Google Broke Image Search for Creative Commons
Author : colinprince
Score : 333 points
Date : 2022-09-28 12:16 UTC (10 hours ago)
(HTM) web link (cogdogblog.com)
(TXT) w3m dump (cogdogblog.com)
| [deleted]
| cloutchaser wrote:
| I've noticed recently almost all broad searches on image search
| return watermarked stock photos.
|
| It's terrible. The entire stock photo industry is so bad at
| creativity you can basically instantly tell if a photo is a stock
| photo, making anyone using them look like a complete fool.
|
| Anyway, they either have some serious legal issues with image
| search or they are becoming precautionary, but it's becoming
| almost impossible to find decent images.
| bityard wrote:
| > you can basically instantly tell if a photo is a stock photo,
| making anyone using them look like a complete fool
|
| I find this to be a highly interesting take...
|
| The whole point of stock photography is that you can buy images
| to use in various projects, quite often one-off or short-term
| ones. Because for the project, taking your own photos or paying
| a photographer to take them would be too expensive or time-
| consuming relative to the estimated value and return of the
| project.
|
| I feel like once someone has made the decision to buy or use a
| stock photo, they have already decided where they want to stand
| on the scale of authenticity and originality. Deliberately
| seeking out "stock photos that don't look like stock photos"
| just sounds too much like trying to be something you're not.
| Spooky23 wrote:
| After they were sued a few years ago, Google seems to have
| neglected this product. Bing is way better.
| londons_explore wrote:
| It's a common pattern... No good engineer wants to work on a
| product with their feet stuck in legal quicksand. So all the
| good engineers leave, and the product stagnates with no
| direction.
|
| Even if the lawsuit is won, the product is still doomed.
| CamelCaseName wrote:
| With any luck DALL-E will undo this.
|
| What if one day every Google search has an "auto-generated"
| panel of images as well?
| sdflhasjd wrote:
| IMO, DALL-E suffers from the same problem because it's been
| trained on the very same boring stock photos.
| Kye wrote:
| >> _" The entire stock photo industry is so bad at creativity
| you can basically instantly tell if a photo is a stock photo,
| making anyone using them look like a complete fool."_
|
| Only the bad, outdated stock photos. There is still a whole
| market for the "obviously corporate corporate website"
| corporate website, but that's falling out of fashion.
|
| What you're thinking of are the bland, white backgrounded
| photos and photos shot in stale, generic office settings so
| they could be worked into any design. That's not really how
| it's done anymore. Modern stuff doesn't look posed and staged,
| and often isn't.
| dylan604 wrote:
| In stock photography, if there are people in it, they can
| only make money on that image if there are signed model
| releases. If you have signed model releases, it is staged and
| posed.
|
| Sure, someone could take a candid image and then after the
| fact attempt to gain releases. However, that's not a workflow
| with a high margin of success. At that point, the "model" has
| all of the power. Also, crowd shots in public streets blah
| blah.
| varispeed wrote:
| I stopped using Image Search like a year ago, simply because it
| is useless.
|
| You cannot find anything.
|
| Personal anecdote - I was at a specialist doctor to discuss the
| results of my tests. I was shocked when he typed a health issue
| into Google Image search, clicked on (what seemed to me) a random
| table matching his search terms and compared it with my results.
| He then said that everything is fine, the parameters are normal.
|
| That got me scared: what if someone intentionally put up a doctored
| table and used SEO to promote it to the top to mislead doctors
| and cause bad outcomes for patients?
| gernb wrote:
| 134k results on flickr
|
| https://flickr.com/search/?text=dog&license=2%2C3%2C4%2C5%2C...
| projproj wrote:
| I made canweimage.com. It can't replace all the features of
| Google, but it can fit the bill if you just need a basic search
| of Creative Commons.
| an1sotropy wrote:
| Thanks! But just to be clear, it isn't somehow searching all
| things under a CC license, but just things within wikimedia
| right?
| input_sh wrote:
| Last time I checked, Creative Commons had their own search
| for things under their licenses. But now apparently that
| project got transferred to WordPress and is now named
| Openverse: https://wordpress.org/openverse/?referrer=creativecommons.or...
|
| Anyways, I'd argue that's the most comprehensive database of
| CC-licensed works.
| an1sotropy wrote:
| Thanks for this.
|
| Also, I just checked that flickr.com still allows you to
| filter their searches by license.
| freediver wrote:
| This looks great. Care to share the API call used for searching?
| projproj wrote:
| https://commons.wikimedia.org/w/api.php?action=query&generat...<search term here>&format=json
|
| (see
| https://en.wikipedia.org/w/api.php?action=help&modules=query
| for reference)
|
| edit: formatting
| freediver wrote:
| Amazing!
|
| I notice a slight difference between
|
| https://commons.wikimedia.org/w/api.php?action=query&generat...
|
| and searching for 'cat' on canweimage.com
|
| Is there some query processing needed?
| projproj wrote:
| I use these additional parameters:
| `&gsrlimit=25&prop=imageinfo|pageimages&iiprop=url|size`.
| I think it just changes how much and what type of data is
| returned, but maybe that could be the difference?
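A minimal Python sketch of the request described above. It assumes the truncated part of the URL uses MediaWiki's `generator=search` with `gsrsearch`, and that results are restricted to the File namespace (`gsrnamespace=6`); those two choices are assumptions, while the `gsrlimit`, `prop` and `iiprop` values come from the comment. The function only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_commons_search_url(term: str, limit: int = 25) -> str:
    """Build a MediaWiki API URL that searches Wikimedia Commons files for `term`."""
    params = {
        "action": "query",
        # Assumption: the elided part of the original URL uses the search generator.
        "generator": "search",
        "gsrsearch": term,
        "gsrnamespace": 6,  # namespace 6 = "File:" pages, i.e. media files (assumption)
        # The additional parameters listed in the comment above:
        "gsrlimit": limit,
        "prop": "imageinfo|pageimages",
        "iiprop": "url|size",
        "format": "json",
    }
    # urlencode percent-encodes the "|" separators as %7C, which the API decodes.
    return COMMONS_API + "?" + urlencode(params)


print(build_commons_search_url("cat"))
```

Dropping `gsrnamespace` or changing `gsrlimit` would be an easy way to test whether those explain the difference in results.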
| eli wrote:
| Unfortunately you can't really be sure that an image with
| metadata claiming it is CC licensed is actually CC licensed or
| that the website offering it has the permission of the author. I
| have been burned by this.
| ficiek wrote:
| You can say the same about anything. I am guessing you don't
| use anything that is open source in any capacity and don't buy
| any proprietary software and libraries in case they are lying
| and they can't actually distribute them?
| ghaff wrote:
| True. Although an additional wrinkle with Creative Commons is
| that, depending upon how conservative you want to be and how
| the copyright owner interprets terms like non-commercial and
| what constitutes appropriate attribution, there are all sorts
| of variations that may or may not be suitable for a given
| use.
|
| Of course, for many casual purposes it's widely ignored and
| for photos of people used for advertising and marketing, you
| need a model release anyway.
| gernb wrote:
| To add to what others are saying: I CC-BY all my photos, but just
| because a photo is CC-BY doesn't mean it's safe to use. I
| don't know all the other "rights", but, for example, a photo I
| took of Mickey Mouse, or a movie poster, or a photo I took of
| some art in a museum may have additional rights issues. Even
| pictures of buildings can.
|
| https://helpx.adobe.com/stock/contributor/help/known-image-r...
|
| Note: I get that adobe might be wrong here. Whether they are
| right or wrong on particulars is beside the point. I just
| linked there because it was clearer than most other search
| results I found.
|
| Here's another
|
| https://mymodernmet.com/eiffel-tower-copyright-law/
| eli wrote:
| Sure. But this actually happened. I've been twice bitten by
| using images that claimed to be CC and then an apparent
| copyright owner appeared and said otherwise. I've never had
| that happen with open source software.
|
| I think copyright trolling is more prevalent with images, and
| I think it's generally easier to determine the canonical
| origin of software. But yes, it's absolutely a risk and a
| reason why many companies have a legal review process before
| any new libraries can be used.
| celestialcheese wrote:
| 100% this. I run a decent sized publisher, and have to make
| sure images are licensed properly, and even with proper
| training, we still get robo-lawyer shakedowns at least
| once a quarter. It's between $400-$1000 per "settlement", so
| still less than Getty licensing costs. Cost of doing business
| :shrug:
| eli wrote:
| I can't prove it, but my theory for one of the images I got
| bitten for is that either the photographer or a coconspirator
| posted the photo to WikiCommons as CC licensed, then later
| the photographer sends a takedown saying it was posted by an
| imposter and isn't authorized. It's deleted from Wiki but
| then they get to hunt down everyone who copied that image and
| send them a bill of $1000. Quite a scam.
| WorldMaker wrote:
| There was an article that trended a few months back too
| about CC "Attribution trolls". (I can't find it in a quick
| search, sorry, but I can paraphrase.) There's a legal "bug"
| in the Attribution clauses of early CC licenses that
| basically says that the copyright owner gets to dictate how
| the Attribution must read down to detailed specifics in
| wording and formatting. They post to WikiCommons as CC
| licensed under specific old versions of CC and rely on the
| fact that most people don't copy and paste the attribution
| strings verbatim to troll for licensing fees.
|
| (So, watch out for CC licenses older than 4.0 for that.)
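The rule of thumb above can be encoded as a small check. This is a sketch that assumes license labels look like `CC BY-SA 3.0` or `cc-by-nc-4.0`; it only flags CC BY-family versions below 4.0 and does not verify the attribution-trolling claim itself. CC0 is deliberately not matched, since it carries no attribution requirement.

```python
import re
from typing import Optional, Tuple

# Assumed label format: "CC BY 3.0", "CC BY-SA 2.5", "cc-by-nc-sa-4.0", ...
_CC_LABEL = re.compile(
    r"^cc[\s-]+by(?:-(?:sa|nc|nd))*[\s-]+(\d+)\.(\d+)$", re.IGNORECASE
)


def cc_license_version(label: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) for a CC BY-family label, or None if unrecognised."""
    match = _CC_LABEL.match(label.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def needs_extra_attribution_care(label: str) -> bool:
    """Flag CC BY-family licenses older than 4.0, per the advice in the comment."""
    version = cc_license_version(label)
    return version is not None and version < (4, 0)


for label in ["CC BY 3.0", "CC BY-SA 4.0", "cc-by-nc-sa-2.5", "CC0 1.0"]:
    print(label, needs_extra_attribution_care(label))
```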
| gwbas1c wrote:
| One thing with any query parameter API like this is that there's
| no guaranteed signal when the API has breaking changes.
|
| I'm going to assume that there are hundreds or thousands of
| products, tools, hobby projects, etc., that direct to Google
| searches, none of which have any mechanism to know and break
| gracefully when the API changes. Furthermore, Google is under no
| obligation to coordinate with anyone who just arbitrarily sends
| queries their way. (I've had a few hobby projects use Google
| Queries.)
|
| Seems like the most we could really ask is to put some kind of
| version stamp into the query parameter; and Google could
| optionally support old parameters or simply return an error.
| Otherwise, we have to accept that sending browsers to other pages
| via query parameters is inherently fragile and has a high
| probability of breaking at any time.
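A sketch of the version-stamp idea, assuming a hypothetical `v` query parameter and a hypothetical v1-to-v2 rename of a license filter from `lic` to `license`; neither reflects Google's real parameters. Known old versions are translated to the current schema, and anything else raises an error instead of silently returning different results.

```python
from typing import Callable, Dict
from urllib.parse import parse_qs, urlparse

CURRENT_VERSION = 2


def _upgrade_v1(params: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical schema change: v1 called the license filter "lic", v2 calls it "license".
    upgraded = dict(params)
    if "lic" in upgraded:
        upgraded["license"] = upgraded.pop("lic")
    return upgraded


# Old versions the server still chooses to translate; anything else is rejected.
UPGRADERS: Dict[int, Callable[[Dict[str, str]], Dict[str, str]]] = {1: _upgrade_v1}


def parse_versioned_query(url: str) -> Dict[str, str]:
    """Return current-schema parameters, or raise ValueError instead of guessing."""
    params = {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}
    try:
        version = int(params.pop("v"))
    except (KeyError, ValueError):
        raise ValueError("missing or malformed version stamp 'v'") from None
    if version == CURRENT_VERSION:
        return params
    upgrade = UPGRADERS.get(version)
    if upgrade is None:
        raise ValueError(f"unsupported query version {version}")
    return upgrade(params)


print(parse_versioned_query("https://search.example/images?v=1&q=dog&lic=cc"))
```

The caller gets either the parameters it meant or an explicit error it can surface, which is the "break gracefully" behavior the comment asks for.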
| Kalanos wrote:
| did not know this was a feature and i use the advanced filters a
| lot
| cf141q5325 wrote:
| It's not just Creative Commons images. In case you haven't noticed,
| those billions of search results have maxed out between page 20 and
| 40 for a while now.
| tssva wrote:
| I'm a little confused by part of this article. The author states
| part of the evidence they used for deciding something was wrong
| is that a search not restricted to Creative Commons licensed
| images included many images with open source licenses and some in
| the public domain. If you restrict your search to Creative
| Commons licensed images why would you expect images under other
| open source licenses or in the public domain to be returned?
| bgro wrote:
| I've been somewhat thoroughly testing Google text/image search by
| coincidence. I'll share my experience about some oddities here.
|
| I do somewhat unusual things, like search for parts of a joke or
| a string of semi-randomly generated words for part of an AI type
| thing I'm working on creating to see if these are either unique
| or original and how they may appear in the context of the
| internet. Often times, if anything, there's something like a
| banned twitter (bot?) account only available through a cached
| backup that said it once in some bizarre context.
|
| I've noticed that search results can change significantly depending
| on whether you're logged in, or the country you're searching from (via
| a VPN). Different countries have different levels of success for
| different types of searches, but I don't have any sort of solid
| guide to map this out in any shareable way.
|
| Boot up a virtual machine on a VPN if you're curious to take a
| look yourself. You may need to manipulate it so you don't bring
| in any suspicious cookies or other identifying information to
| show your actual country. Some VPN IPs are well known, and your
| results may be manipulated anyway.
|
| Some results literally will never show up if you're searching
| from the USA. If you switch to Hungary for example, suddenly
| things could start appearing. Even if the matching result is a
| Chinese site that should have relatively equal relevance to both
| countries.
|
| Sometimes I use Bing. It's not exactly better, but it's also not
| really worse. It's just different. In my non-scientific opinion,
| after seeing so many of these differences, it feels like they
| just forgot to enable (or haven't yet gotten around to enabling)
| the kind of filtering Google has.
|
| If something is no longer showing up in Google search, using Bing
| feels like going back 1 relative year in time before Google
| nerfed your active search. Sometimes I suspect Google is breaking
| down searches to parsable keywords and then sometimes adding
| those to a blacklist.
|
| DuckDuckGo seems to sometimes filter things too, and some of the
| other commonly recommended alternative search engines. I don't
| know if they're actively doing this, or if it's a byproduct of
| forking off some other engine. I don't have much information here
| because I've largely given up bothering with these.
|
| There are some other large engines not widely discussed in the US
| that I am currently looking at as well. I don't have enough
| experience to form a solid opinion yet, and I have suspicions about
| their privacy, so I don't want to be loosely associated with
| recommending them until I know more.
|
| Miscellaneous other thoughts around this topic:
|
| - Google has been heavily pushing some results more than others
| obviously. Pinterest and Quora are always at the top of searches
| now. I think this is pretty common knowledge.
|
| - Chrome has a right click -> Search with Google Lens button now.
| Are they working on AI object detection of images more so than a
| visual match now? Could this factor into image matches?
|
| - TinEye - When looking into this question myself, a lot of
| people recommend TinEye. I've literally never had TinEye actually
| match a picture by the way. Am I using it wrong?
| noidiocyallowed wrote:
| Google Lens is just terrible. Really terrible. I don't know
| what the hell they are doing there :D.
|
| Tineye is also useless. They have a very outdated library or
| not crawling spaces they should be crawling.
| therealmarv wrote:
| Google Image Search has been deteriorating in certain areas for years.
| Namely:
|
| * Reverse Image Search (sometimes no matches although the image is
| for sure out there). I wonder if Reverse Image Search is sometimes
| broken because of copyright?
|
| * NSFW images (I'm really old enough and I don't need to be
| protected by Google or anyone else, I think it's a kind of
| censorship)
|
| * And now also Creative Commons as pointed out by Op
|
| The best alternative (tested them all) is in my opinion
|
| https://yandex.com/images/
|
| Chrome extension for reverse image search supporting Yandex
|
| https://chrome.google.com/webstore/detail/fast-image-researc...
|
| Android app supporting Yandex for Reverse Image Search
|
| https://play.google.com/store/apps/details?id=com.thinkfree....
|
| P.S.: I know Yandex is based in Russia, I only use it for
| specific image searches and I'm happy we have a good alternative
| based outside of the USA... I wish there were more in the world.
| neither_color wrote:
| For image search Bing is actually decent, and doesn't serve you
| webp.
| therealmarv wrote:
| Bing is second best after Yandex and better than Google
| sometimes. But Yandex still outperforms in my subjective view
| especially on Reverse Image search.
| Krasnol wrote:
| https://addons.mozilla.org/de/firefox/addon/reveye-ris/
|
| This one is for Firefox with Google, TinEye, Bing and Yandex.
| jeffbee wrote:
| Hrmm, I tried reverse-searching for the last photo I took - of
| the USNS John Glenn at the port of Oakland - and both Yandex
| and Google return similar results, but the Google result
| returns instantly and Yandex took nearly a minute. What's an
| example of a search where Yandex does much better?
| visarga wrote:
| Yandex is the best image search. It even has something better
| than duplicate search - I believe they do embedding based
| similarity. It works like lexica.art, a recent diffusion image
| search engine.
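For readers unfamiliar with the term, embedding-based similarity ranks images by the cosine similarity of feature vectors rather than by exact or near-duplicate matching. The sketch below uses made-up 3-dimensional vectors and hypothetical file names purely for illustration; whether Yandex works this way is the commenter's belief, not something established here.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3) -> List[str]:
    """Return the k index entries whose vectors point most nearly the same way as query."""
    ranked = sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)
    return ranked[:k]


# Toy "embeddings"; a real system would compute these with an image model.
index = {
    "harbour_ship.jpg": [0.9, 0.1, 0.0],
    "cargo_ship_night.jpg": [0.8, 0.3, 0.1],
    "cat_on_sofa.jpg": [0.0, 0.2, 0.95],
}
print(most_similar([0.85, 0.2, 0.05], index, k=2))
```

Unlike a duplicate search, this can surface a different photo of a similar subject, which matches the "better than duplicate search" description.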
| Wazako wrote:
| It's amazing how they have destroyed reverse image search
| over the last three or four years. It started with the switch to
| ML and the identification of keywords; they don't seem to have
| image hash search anymore.
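As a point of comparison, here is a sketch of one well-known perceptual hash, the difference hash (dHash), on a grayscale image given as a list of pixel rows. It is not a claim about what Google used. The nearest-neighbour downscale is a simplification; practical implementations typically resize with area averaging first. Rescaled or slightly brightened copies keep the same left-versus-right brightness comparisons, so their hashes stay a small Hamming distance apart.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0-255


def _resize(img: Image, width: int, height: int) -> Image:
    """Nearest-neighbour downscale (a simplification of the usual area-averaging resize)."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(img: Image, size: int = 8) -> int:
    """Difference hash: one bit per horizontal neighbour comparison (size * size bits)."""
    small = _resize(img, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small values suggest the same underlying image."""
    return bin(a ^ b).count("1")


grad = [[x * 4 for x in range(64)] for _ in range(64)]          # 64x64 horizontal gradient
brighter = [[min(255, v + 3) for v in row] for row in grad]      # slightly brightened copy
half = [row[::2] for row in grad[::2]]                           # downscaled 32x32 copy
mirrored = [row[::-1] for row in grad]                           # genuinely different image
print(hamming(dhash(grad), dhash(brighter)), hamming(dhash(grad), dhash(mirrored)))
```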
| jrochkind1 wrote:
| Does anyone know how Google Image Search (or Bing Image
| Search[1]) identify images as creative commons or public domain
| in the first place?
|
| If I'm a content provider that wants to maximize the chances that
| an image search will flag my images as CC or public domain, what
| should I do? Are there open graph or other meta tags? Or what?
|
| [1]:
| https://www.bing.com/images/search?q=dogs&qft=+filterui:lice...
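One documented signal, at least for Google: its image metadata documentation describes schema.org `ImageObject` markup (or, alternatively, embedded IPTC photo metadata) with properties such as `license` and `acquireLicensePage`, used for the "Licensable" badge. Whether the Creative Commons usage-rights filter relies on the same signal is not established in this thread. A sketch that emits such a JSON-LD block, where every URL and the credit string are placeholders:

```python
import json


def image_license_jsonld(content_url: str, license_url: str, acquire_page: str, credit: str) -> str:
    """Return a <script> tag carrying schema.org ImageObject licensing metadata."""
    data = {
        "@context": "https://schema.org/",
        "@type": "ImageObject",
        "contentUrl": content_url,
        "license": license_url,
        "acquireLicensePage": acquire_page,
        "creditText": credit,
    }
    # Escape "</" so a value can never close the surrounding <script> element early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


print(image_license_jsonld(
    "https://example.com/photos/dog.jpg",
    "https://creativecommons.org/licenses/by/4.0/",
    "https://example.com/photos/dog-license.html",
    "Example Photographer",
))
```

The block would go in the HTML of the page that displays the image.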
| visarga wrote:
| Probably they have a license classifier trained on thousands of
| pages that have been manually checked.
| jeffwask wrote:
| Google's ad delivery platform is becoming a less and less reliable
| search engine as its sideline.
|
| Support alternate search engines, we used to have a bunch of
| viable options. Let's get back to that.
| cainxinth wrote:
| I read comments and articles all the time about the quality of
| Google's search dropping. I haven't noticed it much in practice,
| but I'm persistent, use copious ad blocking, and my Google-fu is
| strong, so I usually find what I'm looking for.
|
| Image search is another story. I've been less and less satisfied
| with Google Image search results, and visual search has been
| totally neutered. It only returns low res results that rarely
| match the original as well as it used to. I used to be able to
| plug in a 400x400 image and find a dozen copies of it at a usable
| size. No more. Too many copyright complaints, I assume. I've
| started using Bing for image and visual search now. It's not as
| good as old Google Images, but marginally better in some cases
| than the current iteration.
| jackdh wrote:
| If you're interested in a comparison, try Yandex image search.
| Someone here mentioned it a while ago and I now use it as my
| go-to.
| verisimi wrote:
| Yandex is good. I use presearch, which searches
| independently, and provides links to other search engines on
| the side.
|
| https://presearch.com/
| [deleted]
| jrochkind1 wrote:
| I hadn't realized that Bing image search still lets you link
| directly to the Image URL, not just the page it's hosted on.
| Something Google stopped doing a few years ago in response to
| content platform complaints.
|
| OK, I'm definitely switching to Bing Image Search for images.
|
| Per OP... they _do_ seem to have a public domain/CC search
| limit feature too. I don't know how well it works. (I'm not
| sure how either it or Google identify CC/public domain content.
| Is there an opengraph tag?)
|
| https://www.bing.com/images/search?q=dogs&qft=+filterui:lice...
| kwhitefoot wrote:
| Try https://tineye.com/
| chordalkeyboard wrote:
| https://youtu.be/bWbytHBp0zI
|
| Google search is _severely_ broken.
| xaedes wrote:
| I was feeling this as well, but didn't pay attention to _how_
| bad it is. I mean it really is... Is there ANY search engine
| left that returns more than 500 results for any search? Are
| there any community driven search engines? I mean at this
| point it can't be hard to build a better alternative...
| Melatonic wrote:
| Kagi is great so far!
| ars wrote:
| I've definitely noticed worse results from Google search.
|
| I just get page after page after page of "content" that appears
| to be either GPT written or written by somebody who has no idea
| about the topic.
|
| They all seem to follow a pattern, they have a table of
| contents, and they take sentences from real sites regurgitate
| them and put them together into semi-random paragraphs.
|
| If you know nothing about the topic it appears on the surface
| to be legitimate. And I bet to any quality engineers it all
| seems totally legitimate, because they're not experts in these
| fields.
| nabakin wrote:
| Try Yandex. I've found it to work better when trying to find a
| different, higher res version of a certain image and Google to
| work better when trying to find a related image.
| ALittleLight wrote:
| Just a couple days ago I was searching for a funny video I had
| seen. I tried searching on YouTube and Google - typing in
| descriptions, what I remembered of the title, the excerpts of
| dialog I could recall, what was going on in the video -
| couldn't find anything even close to it. Searched on TikTok and
| got it in the first result.
|
| Google search quality is in severe decline in my opinion. I
| have many experiences where I am searching for stuff that I
| know exists and that I know Google of yesteryear would have
| found, and Google comes back with garbage results and spam.
| Personally, I am hopeful that this means a Google-killer will
| be coming along soon.
| idatum wrote:
| > my Google-fu is strong
|
| Initially read this as, well, "FU Google", and thought "yep, FU
| to them too". I guess I will acknowledge I have a bias. Then
| curious about this term, I googled-on-bing "FU Google". Top
| result was Google-fu, not the expletive.
|
| I have no Google-fu.
| logicchains wrote:
| Yandex image search is fairly decent.
| juujian wrote:
| I just noticed more and more copy and pasted content from
| stackoverflow intruding on my searches. Word for word the same
| content. There are so many people out there just creating blogs
| and stealing content and SEOing their way into traffic, it's
| not even funny anymore.
| ghaff wrote:
| And they probably don't even make any meaningful amount of
| money off it. But with enough scammers trying this sort of
| thing, at least for a while, the web gets polluted even if a
| given individual has already moved on to their next scam.
| MereInterest wrote:
| And because something might be answered either on
| stackoverflow or one of the many stackexchange spinoffs,
| restricting your search to either will remove results from
| the other.
| from wrote:
| On some of them I've noticed the text will overall have the
| same idea but a lot of the words will be different. I think
| some of these sites will translate it to Chinese and then
| translate it back to English to give it a lower similarity
| score. It is truly amazing the amount of effort some people
| will expend to avoid adding anything useful to the world.
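To illustrate why word swaps defeat naive duplicate detection, this sketch scores two texts by the Jaccard overlap of their word 3-grams ("shingles"), a classic near-duplicate measure. Nobody in the thread says which similarity measure search engines actually use, and the two example sentences are made up. Swapping one word and inserting another drops the score from 1.0 to 5/16 here.

```python
import re
from typing import Set, Tuple


def shingles(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Lower-cased word n-grams ("shingles") of a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """|A & B| / |A | B| over the two shingle sets (1.0 when both are empty)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


original = "Use the strict equality operator to compare null and undefined in JavaScript."
spun = "Use the strict equality operator to check null and undefined values in JavaScript."
print(jaccard(original, original), jaccard(original, spun))
```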
| MereInterest wrote:
| Not only to avoid adding anything useful, but to actively
| make it worse.
| 3pt14159 wrote:
| I've stopped being able to reliably find an animated gif and
| copy and paste it.
|
| One used to be able to just right click on an image and copy
| it. Now, I only get a still. I try clicking into the website
| that hosts the image and it's click, click, click just to get
| anywhere close to the size I want, and oftentimes it's not even
| an animated gif _anyway_ because they do some sort of media
| query and serve me up an uncopyable movie instead.
|
| The web sucks now. People work around it with bots and the like
| on Reddit, but I feel like the economics have been figured out
| and it's not fun anymore.
|
| Instead, we use /giphy in our Slack and hope the algorithm
| finds something that kinda-sorta was what we were thinking.
| munk-a wrote:
| With /giphy I think half the fun is when it falls flat on its
| face. The original command is always visible and most people
| just read that for the sentiment and then enjoy when the gif
| it found misses the mark by a mile.
| artificial wrote:
| I use Kagi frequently, and just for giggles give Yandex image
| search a try, you can actually specify dimensions (like you
| used to on Google).
| NAG3LT wrote:
| Another recommendation to use Yandex Image search as an
| option. They search differently and also try to find
| similar images.
| redeeman wrote:
| i see putin has his troll army everywhere!!!!
|
| (yes, im kidding)
| kingrazor wrote:
| I made my career using google search and it has changed pretty
| significantly from the early 2010s. There was a point where it
| would seemingly scour the entire web for whatever string of
| text I entered, but now it tries so hard to give me what it
| thinks I'm looking for that I very often get results that are
| completely irrelevant. I'm still usually able to find what I'm
| looking for eventually, but it takes a lot more work on my
| part.
| TrinaryWorksToo wrote:
| It's why I used DDG first, because it does this less in my
| opinion.
| tomcam wrote:
| Wait how did you make your career using Google search?
| kingrazor wrote:
| Google search is how I was able to learn how to do the jobs
| I've been doing for the past few years. Fixing one problem
| at a time by googling it.
| ortusdux wrote:
| I used Tineye.com's reverse image search extensively before
| google added the feature and baked it into chrome. Google's
| results have consistently worsened to the point that I've
| switched back to tineye and installed the chrome extension.
| jasonshaev wrote:
| Sorry if this is off-topic but does anyone know if "tineye"
| is a reference to Brandon Sanderson novels? Specifically the
| Mistborn series?
|
| I couldn't find any reference on tineye.com but it seems like
| it has to be.
| tempest_ wrote:
| https://blog.tineye.com/tineye-whats-in-a-name/
| verisimi wrote:
| Google fu? You are actually trying to hack Google in an
| unauthorised way.
|
| 'More than 1 result is a bug, citizen.'
|
| https://www.youtube.com/watch?v=XeIIpLqsOe4
| NayamAmarshe wrote:
| > I read comments and articles all the time about the quality
| of Google's search dropping. I haven't noticed it much in
| practice, but I'm persistent, use copious ad blocking, and my
| Google-fu is strong, so I usually find what I'm looking for.
|
| The quality of results has gone down, and it's noticeable only
| after you switch to another independent search engine, like
| Brave Search.
|
| For example, search for the term: "javascript undefined vs
| null" on Google and Brave Search. Brave Search gives way more
| information in the sidebar and Google doesn't at all.
|
| The discussions feature on Brave Search is great, you don't
| even need to append queries like 'stackoverflow' or 'reddit'
| for searching discussions.
|
| On top of that, let's say you're trying to search for an npm
| library like 'react-select': if you search that term on Brave
| Search, it gives you a button to copy `npm install react-select`
| right below the npmjs.com link.
|
| It's crazy how good Brave Search is compared to Google
| sometimes, haven't used Google Search in a long time because of
| it.
| judge2020 wrote:
| I thought we were against Google et al. implementing features
| that keep people on their site instead of directing traffic
| to other parts of the web?
| ziml77 wrote:
| I was thinking the same thing. We can't complain about
| Google doing it but then give other search engines a pass.
| MichaelCollins wrote:
| I'm against Google doing it because Google is huge. I
| support Brave doing it because it's a useful feature and
| Brave is inconsequentially small.
|
| Big things are not the same as small things, and should not
| be treated the same.
| ceejayoz wrote:
| Should Brave's useful features be removed at a certain
| threshold if they see significant user growth?
| Dylan16807 wrote:
| As far as the thought experiment goes for making up
| rules, you could probably have a threshold where
| advertising ability gets cut off and it would do enough.
| rrdharan wrote:
| That's how small things become big, and then the cycle
| repeats.
|
| So I guess you're advocating that this dynamic
| equilibrium and "circle of life" is just panglossian
| optimal?
| Dylan16807 wrote:
| The optimal is that nobody ever gets more than 25 (or
| whatever) percent of the market.
| hbn wrote:
| There's also something to be said about how Google should
| be returning information, not answers. Skip to the 1:00
| mark in this Technology Connections video where he
| demonstrates how if you ask Google when the touch lamp was
| invented it'll pop up with a giant, confident answer of
| 1984, despite the fact that if you do the research yourself
| you can find the first patent for it was filed in 1954.
|
| [1] https://youtu.be/TbHBHhZOglw
| falcolas wrote:
| I certainly am. Brave search putting answers in the sidebar
| is not a point in its favor, IMO.
| NayamAmarshe wrote:
| It's certainly controversial so all I can give you is a
| subjective viewpoint but yes, objectively these things make
| search engines more convenient but cut the traffic to the
| original websites by a small margin.
|
| So far, I've noticed Brave Search only shows sidebar
| results for Stackoverflow and a few other popular forums
| that do not advertise on their pages directly, nothing
| else. So they're not really taking any revenue away. As for
| the npmjs thing, I'm not sure if their revenue is hurt in any
| way because to view the package documentation you still
| need to open the link. Brave Search just provides you a
| copy command button extra for convenience.
|
| As for discussions, they do not give full context so you
| always need to click the link so that's great for website
| owners as they get more exposure and traffic too.
|
| At this point, Brave Search features are more on the UX
| side of things than anti-competitive so I personally will
| hold off on the tinfoil hat for now.
| Retric wrote:
| There is no _we_ on stuff like that, just vocal minorities
| and an apathetic majority.
| throw10920 wrote:
| Who's "we"? HN is not a person - not that an individual
| necessarily has a single, self-consistent set of beliefs
| themselves...
| resfirestar wrote:
| Brave Search is pretty good for basic queries but it doesn't
| support most operators, even basic ones like quotes, making
| it less suitable for people who use their google-fu a lot.
| When I use Brave as my default, I have to fall back on DDG or
| Google much more frequently than I was falling back on Google
| with DDG as my default, and it's usually because I need to
| use an operator.
|
| More on topic, they still use Bing's near-useless image
| search, so even if it's getting worse Google Images is
| seemingly the only decent option in that category.
| neither_color wrote:
| The problem is not just google's algorithm but also the fact
| that people are not sharing useful information the same way
| they used to. These days when I'm trying to hack smarthome
| stuff or looking for advice on 3D printing something or
| software limitation workarounds the best I can hope for is a
| subreddit, otherwise the knowledge is hidden on a discord
| server after I "join the community" for the 98343789th time. I
| can't just read a 10 year old forum thread where people talked
| each other through solving it, I have to join the discord and
| figure out which channel to ask my question in and do some
| dance with frog memes and people react to my question with an
| emoji of a toothless man laughing before we can talk shop.
| ladyattis wrote:
| Even popular things such as finding out what are the more
| popular Path of Exile league starter builds is harder to find
| now than in the past. At this point, I'm often just searching
| the PoE forums or subreddit to get an idea for a build.
| noobermin wrote:
| I'm sympathetic, but this is turning into a get-off-my-lawn
| rant. Things weren't easy back then either, depending on what it
| was. Forums and IRC were just like Discords in that you had to
| deal with insular culture and often serious verbal abuse for
| being stupid enough to ask for help in a forum meant to get
| help in.
|
| That said, I do see the sentiment. It would be nice to have
| the old convenience of being able to look up old forum posts
| (especially with summaries in the OP via edit). Stack Overflow
| often fits that role now, although I dread the answers even
| more than old forum posts. I guess what I want is to have my
| cake and eat it too. I don't know why we can't have both the
| ability to search forums and the generally kinder attitude
| that modern media tend to have; they shouldn't be mutually
| exclusive.
| redeeman wrote:
| > serious verbal abuse for being stupid enough to ask for
| help in a forum meant to get help in.
|
| In all honesty, isn't this a bit of "beauty is in the eye of
| the beholder", "sticks and stones", etc.?
|
| Yes, you might be called various words or told to RTFM. But
| seriously, is it really so bad? Do you really want to drag
| actual verbal abuse down to something so absolutely trivial?
| Phrodo_00 wrote:
| Yes, forums and chat channels have always been cesspools
| but at least forums (and IRC logs) were _searchable_
| cesspools
| TomSwirly wrote:
| But you could search them, and still can. If it's in
| Discord, it's a black hole.
| RF_Savage wrote:
| Hiding documentation in some Discord is infuriating.
|
| It's Yahoo Groups all over again, except that open groups
| could have their messages indexed, unlike on Discord.
| rplnt wrote:
| > my Google-fu is strong
|
| The thing is, this doesn't matter anymore. You have very little
| control as Google tries to be smart. It's very hard, if not
| impossible, to find anything older or obscure, or things from
| other regions, languages, etc.
| joshstrange wrote:
| I think "Google-fu" just refers to being able to bend the
| search engine to your will. In the early days it was done
| with operators and special keywords (inurl:, etc.), but today
| most of those are either less useful or actively harmful, so
| "Google-fu" has progressed. It's knowing which terms to drop
| from the error you are searching for, it's knowing how to
| phrase things correctly, it's knowing how to skim the results
| and separate the wheat from the chaff. Or at least that's what
| it means to me and how I use it.
| int_19h wrote:
| It's the opposite now - you increasingly have to use
| "Google-fu" to make sure that your query is not creatively
| reinterpreted in ways that are virtually guaranteed to
| yield irrelevant results (but more of them) - e.g.
| substituting words with "synonyms" (which aren't), or
| removing the most important keyword from the query
| altogether.
| skybrian wrote:
| These days you need -youtube to get rid of the video
| results.
| nvrspyx wrote:
| I don't think they misunderstood, and I think their point
| still stands. Google's search results are becoming ever less
| responsive to the user's intent. The portion of results that
| are SEO spam is increasing for every query. The amount of
| your query being dropped and ignored is increasing. Google
| results are becoming increasingly irrelevant and are on a
| trajectory toward completely ignoring your query, where the
| results are strictly a combination of spam and a random pick
| of websites.
|
| "Google-fu" is not progressing. It's hanging on by an
| ever-decreasing number of threads before it becomes utterly
| ineffective.
| munk-a wrote:
| I had a good laugh yesterday. I was setting up a password
| vault for my mother and had to get the login pages for
| various services, and a few did what Etsy does: search for
| "Etsy Login" and you'll notice their SEO has managed to put
| /search?q=login above /signin in Google's results. It
| amuses me whenever SEO is taken to such an extreme that it
| actually makes the results from your company less useful for
| people actually looking for your company.
| Swizec wrote:
| Google-fu is prompt engineering.
|
| Where we used to say "rome fall why", you'd now write
| "why did the roman empire fall". Because the AI likes
| that phrasing and produces better results.
|
| Soon you'll write a 300 word description of what exactly
| you're looking for, like you would when asking a trusted
| expert, and Google will figure something out. The days of
| keyword searching are long gone.
| munk-a wrote:
| My own google-fu tells me that both of those searches are
| likely to be pretty poor - using the word "fall" instead
| of "collapse" is likely to snap up quite a few weird
| results about autumn tourism in Italy, and including
| "empire" in the second query feels likely to get you a
| batch of other poor results (like, for instance, the
| collapse of Russia, commonly known as the Third Roman
| Empire).
|
| Personally I'd suggest "collapse of rome", which does
| deliver a rich embedded result specific to the fall
| of Rome.
|
| I agree that Google's search parsing peaked a while back,
| though; it seems to be getting weaker and weaker and now
| partially relies on search term autocompletion on mobile
| devices to supplement it by presenting an array of options
| near what you might want.
| GolfPopper wrote:
| That may work for common topics, but it does not appear
| to work for niche ones, at least in my experience.
|
| If I want to find a particular user-run forum on some
| obscure bit of some hobby, "<hobby name> <forum topic>"
| brings it up. But if I type out "Forum for <hobbyists>
| discussing <topic>" I get... a random selection of popular
| fora where someone has mentioned <topic>, often in
| passing or with minimal information.
| nerdponx wrote:
| Except that this isn't objectively an improvement, even
| in a perfect world where AI is substantially better than
| it currently is.
|
| > you'd now write "why did the roman empire fall".
| Because the AI likes that phrasing and produces better
| results.
|
| "the AI likes that phrasing" is exactly the problem here.
| How is anyone supposed to know what the AI "likes", other
| than painstaking trial-and-error in the unbounded and
| arbitrarily high-dimensional search space of human
| language?
|
| Even the people who built the model probably don't know.
| Language models (and deep NNs in general) are
| extraordinarily complicated things, and there are
| problems with pretty much every technique that purports
| to provide visibility into their inner workings. There
| are just too many parameters and too many "information
| paths" in such a thing for regular people to wrap their
| heads around it. The ability to incorporate a high amount
| of complexity is a big part of why those models are so
| effective to begin with, but it also makes them really
| hard to reason about.
|
| "AI" is currently in a weird spot where it's starting to
| kinda-sorta behave like an intelligent human in some
| limited settings, but in general is nowhere near as smart
| as a human. Most models still have a very shallow
| _conceptual_ understanding of anything, even if they're
| becoming uncanny in their ability to match sophisticated
| patterns. It might not even be possible to teach some
| concepts to language models as they currently exist
| today, if only because there is only limited conceptual
| understanding available to be learned from corpora of
| text and images, even huge ones. Humans are still
| tremendously more effective than our best language models
| at understanding meaning and intent. Can an AI ever learn
| about love, regret, fear, or bliss, by reading millions
| of news articles and books and looking at millions of
| images?
|
| Thus AI right now is in a kind of "worst of both worlds"
| situation, where it is complicated enough to be hard to
| reason about precisely, but still mostly unsophisticated
| and therefore highly sensitive to how inputs are crafted.
| Therefore it's hard to formulate inputs that provide
| useful outputs. It's still alpha-level technology at
| best, and there might be one or several _conceptual_
| innovations remaining between what we have today and
| something resembling general intelligence.
|
| Consider also that "AI assistance" is _complementary_ to
| keyword search, not a replacement for it. Google search
| AI is becoming something like a "digital librarian", a
| creature that can understand your queries and guide you
| to a starting place in the relevant literature. But much
| like in a real library, the digital librarian is going to
| be most useful as a starting point. At some point, if you
| already know what you're looking for, you still are going
| to want to search on "structured" criteria, as well as,
| yes, keywords embedded in text.
|
| And finally, do you really _want_ to type a 300-word
| description in order to get good search results? I was
| already getting good results with 3 keywords. I have
| already done the sophisticated pattern-matching and
| concept-graphing in my own brain, and now I know exactly
| what terms I want to look for. Why should I be forced to
| coach an AI on how to redo all that work for itself,
| instead of just letting me do a damn keyword search? Not
| to mention wasting my time and giving me carpal tunnel
| typing it all out.
| munk-a wrote:
| You touched on the fact that the AI right now is
| primitive and unable to parse regular English well - I
| agree that it still has a ways to go in this regard but,
| while all of us complaining here might prefer the old
| google, we are the "in" crowd that actually put in the
| time to learn the old arbitrary rules. It isn't great to
| keep around arbitrary rules purely for the sake of
| consistency if those rules are bad. I think the fact that
| the search results are often unable to clearly
| distinguish different questions (and may return some
| autumn-related results for "fall of rome") is a clearly
| bad thing - but the old format we're used to required a
| lot of learning and adaptation (the aforementioned
| "Google-fu") that shouldn't be a necessary skill for
| future generations.
| extr wrote:
| Not to pick on your quick example, but actually testing
| it, these two terms [1] [2] have nearly identical
| results. Top result in both cases is history.com,
| followed by wikipedia, and the next 4-6 results are the
| same but in slightly different order.
|
| I think this is actually an example of the benefit of
| their AI. Despite the big difference in the "style" of
| phrasing (simple english vs more formally naming the
| subject noun), both seem to map to a very similar
| representation in their embedding space. I've run into
| frustrations with this myself, but for basic questions
| like this it seems like the search works Pretty Good.
|
| [1] https://www.google.com/search?q=why+rome+fall&oq=why+rome+fa...
|
| [2] https://www.google.com/search?q=why+did+the+roman+empire+fal...
| nerdponx wrote:
| Consider that "Why did the Roman Empire fall?" consists
| of 1 word that describes the type of question being asked
| ("why"), 2 useless junk words ("did the"), and 3 "key
| words" ("Roman Empire Fall"), of which 2 should really be
| treated as a single word referring to a single
| concept/entity ("Roman Empire", for which "Rome" is a
| synonym in some cases).
|
| Humans instinctively know this, so we are able to
| construct queries like "why rome fall".
|
| But that "why rome fall" query, which we think of as a
| purely mechanical keyword search, already requires quite
| a bit of sophisticated processing in the search engine.
| The system has to recognize that "fall" is synonymous with
| "collapse" or "wane in power" and not synonymous with
| "autumn". It also has to recognize that "rome" means "the
| (Western) Roman Empire" and not the modern city of Rome
| in Italy or "the Holy Roman Empire" or the city of Rome,
| NY, USA. It furthermore needs to interpret "why" in such
| a way that it emphasizes results with "reasons" or
| "explanations", rather than something like a "timeline"
| or "summary".
|
| Personally I find it really weird that Google is
| interested in pushing users more to interact with its
| digital librarian / AI assistant, instead of continuing
| to improve keyword search.
|
| I have a few guesses as to why they are going this way:
|
| 1. It makes the user interface simpler from an
| engineering perspective (fewer user-facing buttons and
| options to implement and test).
|
| 2. There is strategic benefit to making search more of a
| black box. Maybe they are specifically trying to
| "educate" users to expect and be comfortable with such
| black boxes. Maybe the plan is to get people so
| accustomed to "AI assistant" search that they see keyword
| search as outdated, and thereby secure a competitive
| advantage for the next several years over other search
| engines, by having the biggest and best AI models.
|
| 3. They are trying to increase the amount of rich
| "natural language" user search inputs in their data.
| Making keyword search worse will encourage people to use
| queries that more closely resemble natural language. I
| assume that this has strategic benefit related to Guess 2
| above.
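The query reduction nerdponx describes (note the question type, drop filler words, treat "Roman Empire" and "Rome" as one concept) can be sketched as a toy Python function. This is a minimal illustration under stated assumptions, not how Google parses queries: the stop-word list, the `ENTITIES` synonym table, and the name `reduce_query` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tables; a real engine would learn these rather than hard-code them.
STOP_WORDS = {"did", "the", "a", "an", "of", "is"}
QUESTION_WORDS = {"why", "how", "what", "when", "who"}
ENTITIES: Dict[Tuple[str, ...], str] = {
    ("roman", "empire"): "roman_empire",
    ("rome",): "roman_empire",  # assumed synonym, as discussed in the comment
}
# Try longer phrases first so "roman empire" is matched as one concept.
PHRASES = sorted(ENTITIES.items(), key=lambda kv: -len(kv[0]))


def reduce_query(query: str) -> Tuple[Optional[str], List[str]]:
    """Split a query into (question type, list of key concepts)."""
    tokens = [t.strip("?!.,").lower() for t in query.split()]
    tokens = [t for t in tokens if t]
    intent: Optional[str] = None
    keywords: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if intent is None and tok in QUESTION_WORDS:
            intent = tok  # the first question word sets the question type
            i += 1
            continue
        if tok in STOP_WORDS:
            i += 1  # filler word, dropped
            continue
        for phrase, concept in PHRASES:
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                keywords.append(concept)  # multi-word entity or synonym
                i += len(phrase)
                break
        else:
            keywords.append(tok)  # plain keyword, kept as-is
            i += 1
    return intent, keywords


print(reduce_query("Why did the Roman Empire fall?"))
print(reduce_query("why rome fall"))
```

Both phrasings reduce to the same `('why', ['roman_empire', 'fall'])` pair, which loosely mirrors extr's observation that the two queries returned nearly identical results. The toy still keeps "fall" as a bare keyword; telling "collapse" apart from "autumn" needs context this sketch does not have.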
| shortformblog wrote:
| I noticed. I use this tool all the time. Now Google just made it
| useless.
| O__________O wrote:
| Creative Commons always felt broken to me. Can anyone
| explain why the following scheme is flawed or impossible:
|
| A content thief steals content anonymously, posts it as
| Creative Commons content using free hosting, archives it on
| the Wayback Machine, and then uses the content, claiming it
| is Creative Commons if anyone asks?
|
| Basically it is content laundering - and something that is
| impossible for Creative Commons to address as is.
|
| (If the reasoning is not flawed, I'm also interested in
| possible solutions to the issue.)
| dleslie wrote:
| Your scheme would apply to any sort of licensing, permissive
| or not, and is certainly not constrained to using creative
| commons.
|
| It's also flawed in the same, albeit weak, manner: the source
| it was stolen from predates the stolen copies, and so can show
| its true provenance. Of course, if it wasn't published or
| otherwise registered, no one would know.
| bityard wrote:
| > Basically it is content laundering - and something that
| impossible for Creative Commons to address as is.
|
| CC is just a usage license that authors can apply to their
| works in order to share them freely with the world. It's
| strange to me that you might think it is the job of CC to
| police how people (mis)use it?
| alt227 wrote:
| This is no different from stealing something physical in the
| real world, and then claiming it's yours. Sure, some people will
| believe you and be impressed, but as soon as you attract the
| attention of the original owner, they will use the law to prove
| it is not yours but theirs. This is applicable to anything that
| can be 'owned'. Licensing only dictates how something can be
| used lawfully, the law protects the license from being abused.
| katabasis wrote:
| I encourage anyone looking for freely-reusable images to try the
| new MediaSearch feature on Wikimedia Commons. Also works for
| other types of media files. All images on Commons are free to
| reuse but there is also a license filter if you are looking for
| more specific permissions in terms of attribution, etc.
|
| https://commons.wikimedia.org/wiki/Special:MediaSearch
| crawsome wrote:
| Is there a better creative commons search engine that will only
| return free media?
| katabasis wrote:
| Wikimedia Commons recently introduced a new MediaSearch feature
| (everything there is free by definition):
| https://commons.wikimedia.org/wiki/Special:MediaSearch
| macintux wrote:
| The article and comments promote this alternative:
| https://wordpress.org/openverse/
| shadowgovt wrote:
| The key point here is "... and hardly anyone noticed."
|
| In theory, site reliability engineers and software engineers work
| together at Google to maintain existing functionality and check
| for regressions. In practice, regression checks can only cover
| what they've been told to cover, and if breaking a feature
| doesn't cause a metric to crash, Site Reliability doesn't have
| the information to know something is wrong.
|
| The result is that generally speaking, Google prioritizes
| existing features by how popular they are (i.e. "Maintenance by
| Popularity") modulo how much of a stink someone influential can
| make if they screw it up (i.e. "Maintenance by Twitter."). If
| almost nobody uses the CC filter (and I bet, in the grand scheme
| of things, they don't), Google may not have enough signal to know
| they broke the filter unless someone had the foresight to add a
| metric to check search results on that query separate from the
| rest of the search result data (and at Google's scale, you can't
| just add a metric for free; in addition to eng-hours to build and
| tune it, the data has to be stored somewhere and teams have
| finite budgets for space that can only be grown by negotiation
| with the relevant teams managing the monitoring services or the
| project as a whole).
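shadowgovt's point is that a regression in a rarely used filter only surfaces if someone monitors that filtered query separately. A minimal sketch of such a probe might compare today's result count for each filtered query against a stored baseline and flag large drops. This is not Google's monitoring setup: the names `Probe` and `failing_probes`, the 0.5 threshold, and the counts below are hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Probe:
    query: str           # e.g. "dog"
    filter_name: str     # e.g. a Creative Commons license filter
    baseline_count: int  # result count recorded while the filter was known to work
    current_count: int   # result count observed by today's check


def failing_probes(probes: List[Probe], min_ratio: float = 0.5) -> List[Probe]:
    """Return the probes whose result count fell below min_ratio of baseline."""
    failing = []
    for p in probes:
        if p.baseline_count <= 0:
            continue  # no baseline yet, nothing to compare against
        if p.current_count / p.baseline_count < min_ratio:
            failing.append(p)
    return failing


# Illustrative numbers only, not real measurements.
probes = [
    Probe("dog", "cc_license", baseline_count=400, current_count=3),
    Probe("cat", "cc_license", baseline_count=350, current_count=340),
]
for p in failing_probes(probes):
    print(f"ALERT: {p.query!r} with {p.filter_name} dropped to {p.current_count}")
```

The threshold is a judgment call, and as the comment notes, every probe like this costs storage and engineering time, which is why low-traffic filters may go unmonitored.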
| ajsnigrutin wrote:
| ...or they screw it up, make it worse and worse, and less and
| less people use it, making it less popular, less chance someone
| fixes it, less people use it, less chance, less people, less,
| less, google graveyard.
|
| Both google image search and "normal" google search are
| becoming more and more a pain to use, where you have to use
| quote marks on pretty much everything plus a few excludes to
| find anything at all.
| shadowgovt wrote:
| Yep. This is definitely the flip-side of Google's process.
|
| It makes some sense; take a step back, and the story is "If
| Google software engineers don't care enough to make it good,
| and Google users don't care enough to keep using it in spite
| of its flaws, why is Google throwing money at it at all?"
|
| Google is a weird company because so many priorities are set
| by software engineers, not managers; some projects die
| because they literally run out of passionate engineers to
| work on them and management isn't incentivized to force
| engineers to work on projects they hate, instead asking the
| question "If nobody wants to work on this, is it worth it to
| keep doing it?"
| Thaxll wrote:
| At some point you reach a level of complexity where even the
| engineers don't know how it's supposed to work.
|
| I mean, if someone types something into Google and the result
| is not what they expected, how do you troubleshoot that, or
| even know that it's a regression?
| Hard_Space wrote:
| I go to Yandex for image search first, these days, especially if
| I want to do an image-based search. Whatever you may think of it,
| Yandex uses both facial ID and object recognition, and will
| return side-results that are both visually and semantically
| related to the uploaded image.
|
| Google Images, on the other hand, seems to look for the nearest
| monetizable domain cued by the uploaded image, and gives you
| adjacent results about that. It certainly does no facial
| recognition, etc. (well, none that it will feed back to you in
| results, anyway).
| pGuitar wrote:
| Yandex is better for image search... especially reverse image
| search.
| NelsonMinar wrote:
| I fear this may be a feature no one at Google is actively caring
| about anymore.
|
| Here's Matt Cutts' tweet about it in 2014:
| https://twitter.com/mattcutts/status/422944316458168320
|
| But I think it dates back to 2009?
| https://googleblog.blogspot.com/2009/07/find-creative-common...
| lovelearning wrote:
| I have noticed multiple problems in Google search results. But I
| don't give them any feedback because I don't feel helping Google
| become better, and therefore more dominant, is the right thing to
| do long-term.
| elashri wrote:
| I am using `kagi` as my search engine. While reading the article
| I searched `sur:fmc` [0]. The first result was a site that gives
| information about search URLs [1] and correctly describes what
| `sur:fmc` does. I couldn't find this page in the first 5 pages
| of Google results [2] (and didn't look further), not to mention
| Google trying to correct my query text.
|
| [0] https://kagi.com/search?q=sur%3Afmc
|
| [1] https://sites.google.com/a/arps.org/esresearch/images
|
| [2] https://www.google.com/search?q=sur%3Afmc
| dmonitor wrote:
| $10/mo is a really hard sell for something I can get for free,
| but if it's actually useful I would be very tempted
| tylermenezes wrote:
| Kagi is great! I have to use Google for only about 5% of
| searches with Kagi, vs about 40% with DDG. I was genuinely
| surprised
| ravenstine wrote:
| IMO, it's worth it. People spend that much on Starbucks
| _every day_. The key feature I like about Kagi is that being
| able to downrank or outright block certain domains is built
| right in. I also like that it doesn't throw a bunch of crap
| onto the page above the fold and it doesn't pretend that it
| found results when it finds no matches.
| noidiocyallowed wrote:
| Kagi search? After I exceeded my daily search limit during the
| trial, it said something like: "Go search somewhere else". I
| mean, who are these snotty kids that have the audacity to talk
| down to people? Not gonna spend my money there for sure.
|
| Kagi is good, but get a proper PR person and don't let
| imbeciles ruin the experience.
| elashri wrote:
| This is because they are mainly a paid search engine with a
| limited free trial.
| elashri wrote:
| This is because it is a paid search engine with a free
| limited trial. This trial is limited because the search
| queries cost them money and they don't have VC money to
| throw at growth. Nor do they plan to follow that growth
| path. They focus on quality for their target market, and to
| be honest, they have succeeded at that so far.
| int_19h wrote:
| I think the point is that there are much better ways to
| communicate this.
| rntksi wrote:
| Guessing "ur" in sur means Usage Rights, and M means
| Modification, C means Commercial, F means Free (to reuse)?
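If rntksi's guess is right, the letters after `sur:` act as independent flags. The sketch below decodes a `sur:` code using that guessed mapping and builds an image-search URL carrying it. Both the letter meanings and the `tbm=isch` / `tbs=sur:...` query parameters are assumptions (taken from this thread and from older Google Images filter URLs), not documented behavior, and Google may no longer honor them; `decode_sur` and `image_search_url` are hypothetical helper names.

```python
from typing import List
from urllib.parse import urlencode

# Letter meanings as guessed in the thread; not confirmed by Google.
SUR_FLAGS = {
    "f": "free to reuse",
    "m": "modification allowed",
    "c": "commercial use allowed",
}


def decode_sur(code: str) -> List[str]:
    """Decode a code like 'sur:fmc' into the guessed usage-rights flags."""
    prefix = "sur:"
    if not code.startswith(prefix):
        raise ValueError(f"expected a code starting with {prefix!r}, got {code!r}")
    # An unknown letter raises KeyError, since its meaning is not even guessed.
    return [SUR_FLAGS[letter] for letter in code[len(prefix):]]


def image_search_url(query: str, sur_code: str = "sur:fmc") -> str:
    """Build an image-search URL with a usage-rights filter (assumed parameters)."""
    params = {"q": query, "tbm": "isch", "tbs": sur_code}
    # urlencode percent-encodes ':' as %3A and spaces as '+'.
    return "https://www.google.com/search?" + urlencode(params)


print(decode_sur("sur:fmc"))
print(image_search_url("black dog"))
```

Under that reading, `sur:fmc` would be the most permissive combination: free to reuse, modify, and use commercially.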
| itvision wrote:
| Google image search has been broken for at least two years now -
| looks like Google does not give a damn. I remember it used to
| find 5 to 20 times more pictures than it does now, and you could
| actually trace images to their sources - this has become
| impossible. Why? I've no idea.
|
| Then they disabled the Google web cache, which was hugely useful
| since it let you open dead websites/webpages or browse something
| when your government restricts access to it. I guess the
| copyright lobby and China forced the removal of this feature.
|
| Google search is still unmatched in terms of being able to find
| text but other features have seen a huge cut. :-(
|
| And don't remind me about iGoogle. I loved it.
| https://web.archive.org/web/20160314122329/http://linuxfonts...
| dannysullivan wrote:
| I'm with Google Search. I tweeted to the author of the post, but
| will also share -- this looks to be a bug that we're tracking
| down. Definitely not what we'd have expected or want to have
| happen for queries of this nature.
| WaitWaitWha wrote:
| I don't think there are only three dog pictures because Google
| broke CC. I think this is more Google/ML/etc. "over-helping".
|
| If I click on any of the other dog-picture options in the bar
| ("german shepherd, puppy, baby, rottweiler, police", etc.,
| right below the option to select size, color, type, time,
| licenses), additional CC images of the selected sub-type show
| up. The selections act as additive (AND) filters.
|
| It is more a case of Google deciding what I really want.
| bacchusracine wrote:
| People still use Google Image search?
___________________________________________________________________
(page generated 2022-09-28 23:01 UTC)