[HN Gopher] What software engineers should know about search (2017)
___________________________________________________________________
What software engineers should know about search (2017)
Author : todsacerdoti
Score : 315 points
Date : 2021-10-18 05:57 UTC (17 hours ago)
(HTM) web link (scribe.rip)
(TXT) w3m dump (scribe.rip)
| whakim wrote:
| I want to plug Algolia a little bit. For small teams, the results
| are truly incredible, and it provides a ton of features out-of-
| the-box without a lot of development time. I've seen small teams
| struggle mightily with both ElasticSearch and Solr; things
| usually start off OK, but results go downhill as the search needs
| get more complicated (indexing multiple kinds of documents;
| adding more and more different kinds of data to the index and
| trying to weight things properly; dealing with tradeoffs such as
| relevancy vs recency; etc.). In the future I would shy away from
| such powerful tools unless I knew I had sufficient engineering
| resources to dedicate to search and my problem was complex enough
| to merit it.
| jamesbriggs wrote:
| Being from 2017, the article misses some of the coolest advances
| in semantic search, which is now pretty easy and lets people
| search in (almost) the same way they would ask a shop assistant
| when looking for something specific - "do you know where the
| thing with the cool circles and pointy bits is?" (maybe being a
| little more specific...)
|
| Google do this and they're very good at it, but a lot of
| companies need their own search capabilities - think about those
| internal help pages. They usually seem super outdated compared to
| the semantic search capabilities of Google.
|
| In the end there are only a few components to it: you use some NLP
| model to create what are called 'dense vectors'. Then you put all
| these dense vectors into an 'index' which is optimized for fast
| search (that comes under the umbrella of ANN search). Then given
| a new query you just compare that to the items in the index and
| return the most similar results.
|
| I covered the search part of it (https://www.pinecone.io/learn/)
| with Pinecone, who provide managed-search - although here we
| mainly focus on Faiss (a great engine from Facebook AI). However,
| we're also looking to create some content covering the first half
| too, which is 'how to build dense vectors' using models like
| BERT, we have one post so far on that
| (https://www.pinecone.io/learn/dense-vector-embeddings-nlp/)
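
The pipeline described above (embed documents, put the vectors in an index, compare a query vector against them) can be sketched roughly as follows. This is only an illustration under loud assumptions: `make_toy_embedder` is a hypothetical bag-of-words stand-in, not a real NLP model such as BERT, and `BruteForceIndex` does an exact scan where Faiss or another ANN library would be used at scale.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def make_toy_embedder(corpus):
    """Hypothetical stand-in for an NLP model: one dimension per known word."""
    vocab = {}
    for text in corpus:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))

    def embed(text):
        vec = [0.0] * len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in the corpus are ignored
                vec[vocab[word]] += 1.0
        return normalize(vec)

    return embed

class BruteForceIndex:
    """Exact nearest-neighbour search; an ANN index would replace this scan."""

    def __init__(self, embed):
        self.embed = embed
        self.items = []  # list of (document, unit vector)

    def add(self, doc):
        self.items.append((doc, self.embed(doc)))

    def search(self, query, k=3):
        q = self.embed(query)
        scored = [(sum(a * b for a, b in zip(q, vec)), doc) for doc, vec in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
        return [doc for _, doc in scored[:k]]

docs = ["blue cotton t-shirt", "red leather shoes", "black running shoes"]
index = BruteForceIndex(make_toy_embedder(docs))
for d in docs:
    index.add(d)
print(index.search("running shoes", k=2))
```

Swapping `embed` for a call into a trained model is what would make the search "semantic"; the index and comparison steps keep the same shape.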
| danpalmer wrote:
| The biggest thing I've learned about search is to be very precise
| about WHAT is being searched. In my experience most people are
| very loose with this, but by being really precise you can skip
| past a lot of bad search experiences.
|
| Take clothing search, "blue t-shirt", what is the search space?
| Most engineers would naively shove the name/description into
| Elastic Search and then be surprised when it doesn't work. Why?
| Because most product descriptions are not as explicit as "This is
| a blue t-shirt" (see also YouTube video descriptions).
|
| What we actually found was that by searching categories (where
| blue t-shirts is roughly a category), and then just filtering
| clothing items to that category, search worked far better.
| Understanding what terms people used (categories/colours) and
| what was actually in the data and what data models we should be
| searching (not products!), we built a far more effective search
| experience.
| dotancohen wrote:
| One strategy for this is to parse the search string for facets.
| Presumably there would be a facet for "t-shirt" and another
| facet for "blue", modulo synonyms and typos. Selecting those
| facets should give very good results, even without hitting the
| full-text search.
| danpalmer wrote:
| Yes that's a good option, but it still requires that you've
| got the facets in the index. I'm surprised at how many people
| say "just use Postgres full-text search" without thinking
| through what information will actually be made available to
| search, for example.
| dotancohen wrote:
| No, I'm not saying to have the facets in the index. I'm
| saying to parse the search text for facets, and to search
| on the facets.
|
| If the whole string matches facets, then the full text
| search never gets touched, as in the example above. But if
| the example were e.g. "blue anime t-shirt" then you could
| facet on "blue" and "t-shirt" while FTS only has to contend
| with "anime".
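
A minimal sketch of the facet-parsing idea discussed above, assuming a small hand-maintained table of facet values and synonyms. `FACETS`, `SYNONYMS` and `parse_query` are hypothetical names made up for illustration, and typo handling is left out.

```python
# Hypothetical facet vocabulary; a real shop would derive this from its catalogue.
FACETS = {
    "colour": {"blue", "black", "red"},
    "category": {"t-shirt", "shoes", "jeans"},
}
# Hypothetical synonym table mapping user wording onto canonical facet values.
SYNONYMS = {"tshirt": "t-shirt", "tee": "t-shirt", "navy": "blue"}

def parse_query(query):
    """Split a query into recognised facet values and leftover full-text terms."""
    facets, leftover = {}, []
    for token in query.lower().split():
        token = SYNONYMS.get(token, token)
        for name, values in FACETS.items():
            if token in values:
                facets[name] = token
                break
        else:
            # No facet claimed this token, so it goes to full-text search.
            leftover.append(token)
    return facets, " ".join(leftover)

print(parse_query("blue anime t-shirt"))
# prints ({'colour': 'blue', 'category': 't-shirt'}, 'anime')
```

When the leftover string is empty, the full-text search can be skipped entirely and the facets alone select the results.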
| divbzero wrote:
| HN discussion from 2017 on the same article at a different URL:
|
| https://news.ycombinator.com/item?id=15231302
| amir wrote:
| Not sure who needs to know this but scribe.rip is a relatively
| new alternative reader for Medium. The original article is at:
| https://medium.com/startup-grind/what-every-software-enginee...
|
| HN's special treatment of medium.com links doesn't apply to
| scribe ones.
| mellosouls wrote:
| (2017) as well.
|
| Previous discussion from the time, of this excellent article:
| https://news.ycombinator.com/item?id=15231302
| hnbad wrote:
| That explains why all the link texts are weirdly misaligned. Do
| the authors have any input or does it just scrape Medium for
| content?
| krageon wrote:
| What would they have input about?
| jamesbriggs wrote:
| A lot of websites scrape Medium without the authors having
| any idea, I assume it is another one of those
| zuhayeer wrote:
| Underrated how effective indexOf can be for search if you don't
| have an incredibly large set of data
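
For a small data set, the substring approach can look something like the sketch below. It uses Python's `str.find`, which returns -1 when the needle is absent, much like JavaScript's `indexOf`. Ranking by match position is an added assumption, not something the comment above proposes.

```python
def simple_search(items, query):
    """Case-insensitive substring match, ranked by how early the match occurs."""
    needle = query.lower()
    hits = []
    for item in items:
        pos = item.lower().find(needle)  # -1 when the query does not occur
        if pos != -1:
            hits.append((pos, item))
    hits.sort(key=lambda pair: pair[0])  # earlier matches first, ties keep input order
    return [item for _, item in hits]

titles = ["Intro to search", "Search engines 101", "Cooking basics"]
print(simple_search(titles, "search"))
# prints ['Search engines 101', 'Intro to search']
```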
| jimmytidey wrote:
| Sometimes, search is broken design. The assumption in the article
| is that the search function is trying to return what you are
| looking for, however, Spotify and Netflix have their own ideas
| about what content they want you to consume.
| forwidur wrote:
| Author of the article here: thanks for re-posting this! The
| article seems to be still relevant to many even 4 years later,
| which surprises me, given how quickly everything is changing in
| the field.
|
| However, I personally am now focused (and bullish) on DNN-based
| semantic search. Having built several search experiences based on
| it I'm convinced it is the future.
| marginalia_nu wrote:
| I think more software engineers should do themselves the favor of
| dabbling in search engine design.
|
| It really is the gift that keeps on giving in terms of
| interesting problems, you get everything from large scale graph
| processing, esoteric data structures, down to bit twiddling
| optimization, language processing, dealing with seriously messy
| and sometimes adversarial real world data. Everything is
| challenging, but none of it is impossible, even as a solo
| project.
| jimmyvalmer wrote:
| tldr (4,473 words); Search is a classic NLP problem replete with
| all of natural languages' wide-ranging input and evaluation
| vagaries. I admire Russian get-to-the-point bluntness in most
| contexts, but for whatever cultural reason, their writing is
| always Tolstoyesque.
| rapsey wrote:
| Something completely off topic about the content of the article
| but not when it comes to the title.
|
| Github search is a great resource for any engineer. It has saved
| me so many hours trying to figure some API out when I know there
| must be so many people who have used it before.
|
| "SomeApiCall" extension:.[js|rs|java|cs|etc.]
|
| Or to find projects that use a dependency you are not sure how to
| use
|
| "SomeDependency" extension:.[toml|json|etc.]
| mercora wrote:
| i did not know about these[0] keywords so thanks for that :)
|
| [0] https://docs.github.com/en/search-github/searching-on-github...
| alserio wrote:
| GitHub really needs a feature to skip searching in tests. Or
| maybe there is such a feature but I've never found it.
| amelius wrote:
| Donald Knuth's volume on Searching and Sorting needs a successor.
| karterk wrote:
| A problem close to my heart. Good search is certainly still too
| difficult to pull off for small teams, and this was one of my
| motivations for building and open sourcing Typesense[1].
|
| Most people think of search and immediately think of large data
| sets, but the problems that plague smaller datasets are equally
| interesting. It's less about performance and more about
| relevance. For e.g. searching across multiple fields for a
| compound query like "taylor swift style", requires breaking the
| query into segments (taylor swift | style) before searching for
| the appropriate fields. There is also a class of problems that
| traditional search engines that rely on BM25 or TF-IDF for
| ranking cannot reliably solve (e.g. searching on small texts like
| titles) where you have to consider distance between matching
| words (which TF-IDF and BM25 miss). Lastly, there is also
| personalization which is almost always left as an exercise to the
| reader :)
|
| [1]: https://github.com/typesense/typesense
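
One way to picture the "distance between matching words" signal mentioned above is a minimum-window score: the smallest run of tokens that contains every query term. The sketch below is an assumption-laden illustration, not Typesense's actual ranking; `min_window` and `rank_by_proximity` are hypothetical helpers.

```python
def min_window(text, query):
    """Smallest number of tokens spanning every query term, or None if a term is missing."""
    tokens = text.lower().split()
    terms = set(query.lower().split())
    last_seen = {}  # term -> most recent position
    best = None
    for i, tok in enumerate(tokens):
        if tok in terms:
            last_seen[tok] = i
            if len(last_seen) == len(terms):
                # Shortest window ending at i starts at the oldest last-seen term.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best

def rank_by_proximity(titles, query):
    """Keep titles containing all terms; tighter windows rank first."""
    scored = [(min_window(t, query), t) for t in titles]
    scored = [(span, t) for span, t in scored if span is not None]
    scored.sort(key=lambda pair: pair[0])
    return [t for _, t in scored]

titles = ["Swift programming by Taylor", "Taylor Swift Style", "Style guide"]
print(rank_by_proximity(titles, "taylor swift"))
# prints ['Taylor Swift Style', 'Swift programming by Taylor']
```

Both matching titles contain the same two terms once each, so a pure term-frequency score has little to separate them; the window size is what distinguishes them here.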
| carlreid wrote:
| Typesense has been great for us so far. Easy to set up, works
| great with the simple queries that we need.
|
| Lots of great additional functionality on top of search, for
| example security, with scoped API keys and the likes that we're
| looking forward to making use of.
| karterk wrote:
| Thank you for your kind words. We are just getting started :)
| jimmyvalmer wrote:
| This phrase "Thank you for your kind words" has in the last
| 1-2 years become the standard replacement for what people used
| to say with just "Thanks". It sounds so cringey and mawkish.
| AtNightWeCode wrote:
| For sure it depends on the quality of data and the target of
| the service. Basic knowledge in Elastic search will beat Google
| for confined data sets.
|
| Major search engines have grown to the size where it is
| theoretically more beneficial to train a model and then query
| that model. In theory this will win but in the real world this
| will not work as we already can see with the failing Google
| search engine.
| slimsag wrote:
| As a long time lurker of your work, and someone who works on a
| fair number of search engines myself - I'm curious, how well
| does TypeSense handle code search (punctuation, etc.)?
| karterk wrote:
| Finally added support for indexing and/or separating on
| specific symbols in the ongoing pre-release builds. With
| that, I think Typesense should be able to handle code search,
| but I have never tried to index and search on code myself :)
| slimsag wrote:
| Cool! I'd be keen to try it out sometime if you have docs
| on how I'd get it to index specific symbols
| karterk wrote:
| Yes, please see here:
| https://github.com/typesense/typesense/issues/122#issuecomme...
|
| The latest RC build is `0.22.0.rcs18`.
| bagels wrote:
| First thing would be to see if Elasticsearch works for you,
| because it will for many applications.
| svacko wrote:
| I think it would benefit readers to mention more than the 2
| mentioned groups of search-related software - there are a lot
| of new-generation tools and libraries like MeiliSearch,
| Typesense, Sonic, Toshi, Bleve, ...
| ramraj07 wrote:
| The simplest advice to engineers would be to not think of
| Elasticsearch as some black box that will solve all of your search
| problems. In fact, if you've never implemented search before it's
| the last tool you need. Postgres full text search is all you
| need. The most important thing in search is to surface relevant
| results and no one can quantify relevancy. It's as unquantifiable
| as it gets in technology. You need to understand what results are
| relevant for your users and find a metric that would work to rank
| accordingly.
| ChrisMarshallNY wrote:
| That's a useful and well-written article. It was written 4 years
| ago, though, and NLP has improved significantly in that time (in
| English).
|
| My experience using natural-language search queries has given me
| a set of expectations, as a user, that set a high bar.
|
| I've written a fairly substantial backend for a search, and most
| of my original frontend code has been ripped out and replaced
| with new frontends, by now. I feel that the new frontends are an
| improvement, but we still have a fairly strictly-guided
| "guardrail" search.
| mattbee wrote:
| Since I didn't see this mentioned yet - I was writing a web
| search for a client's support data, and wasn't looking forward to
| setting up ElasticSearch. But I found the FTS5 extension built
| into SQLite. It was trivial to set up, my client was happy with
| the results after very little tuning, and it's SQLite so you've
| probably already got it on your computer:
| https://www.sqlite.org/fts5.html
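
A small sketch of what an FTS5 setup can look like from Python's built-in `sqlite3` module. It assumes the SQLite library bundled with your Python was compiled with FTS5 enabled (common, but not guaranteed), and the `support` table and its rows are made up for illustration.

```python
import sqlite3

def build_index(rows):
    """Create an in-memory FTS5 table and load (title, body) rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE support USING fts5(title, body)")
    conn.executemany("INSERT INTO support (title, body) VALUES (?, ?)", rows)
    return conn

def search(conn, query):
    """Return matching titles, best match first (FTS5's built-in rank column)."""
    sql = "SELECT title FROM support WHERE support MATCH ? ORDER BY rank"
    return [title for (title,) in conn.execute(sql, (query,))]

conn = build_index([
    ("Resetting your password", "Open settings and choose reset password."),
    ("Billing FAQ", "Invoices are sent monthly by email."),
])
print(search(conn, "password"))
# prints ['Resetting your password']
```

Raw user input is passed straight to `MATCH` here; FTS5 has its own query syntax, so real input containing quotes or operators would need escaping or error handling.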
| yewenjie wrote:
| Link to Whoosh is not correct.
| artembugara wrote:
| This article is incredible.
|
| I've been running a news search engine API [1] for about 18
| months, and I found a lot of new things.
|
| The biggest takeaway (IMO) is to use existing SaaS at first.
| We've been using ElasticSearch Service for our v1, and it worked
| great. 0s downtime with no master nodes.
|
| And yes, Elasticsearch itself is just a tool: you'd have to
| write your own search logic on top of it.
|
| Also, start with something "strict" that gives decent results for
| ppl who won't read any tips or docs: 99.99% of people who use
| Google Search don't know about any advanced tips, and still get
| relevant results.
|
| [1] https://newscatcherapi.com/news-api
| denysvitali wrote:
| Friendly reminder to the friends at Atlassian. This could be a
| good starting point to fix your damn Confluence search
| donkeydoug wrote:
| look, if they fix search they might have to stop using
| "Confluence; where information goes to die" as a tagline... so
| not an easy sell.
|
| seriously though... default confluence search-as-you-type box
| is horrible, however sometimes (often ? always ?) there is a
| more traditional search endpoint available that seems to give
| better results. see if either of these works for you:
|
| <your confluence domain>/dosearchsite.action
|
| -or-
|
| <your confluence domain>/search/searchv3.action
| scraplab wrote:
| I often come back to the Relevant Search book, by Doug Turnbull
| and John Berryman. I'm sure some of the examples are a little
| dated now, but most of the advice and approach is still sound and
| it's a great tour through all the things you need to consider to
| build a great search experience.
|
| https://www.manning.com/books/relevant-search
| softwaredoug wrote:
| Doug Turnbull here - Thank you! :)
| qwertox wrote:
| > Merging results of different kinds: e.g. Google showing results
| from Maps, News etc.
|
| Google does have a UX bug in its search/map product.
|
| Let's say I google for "kuka deutschland". In my case, the first
| result is a link to KUKA, with sublinks to different sections of
| their website.
|
| Next there's a map with A-, B-, C-markers, and below them fields
| with A: KUKA Deutschland GmbH [other info, website, route] B:
| KUKA Systems GmbH [...] and so on.
|
| If I now click on A: KUKA Deutschland GmbH or on the map, then
| the map opens completely, with a left sidebar containing more
| related places, a popup with the currently selected item open
| which contains plenty of information about the marker on the
| map.
|
| The issue is that this is not a Google Maps map, it is not hosted
| under maps.google.com, but any normal person would expect to be
| able to interact with this map as if it were a normal map: have a
| button for satellite imagery, 3d-view and all of this, the
| ability to have the normal Google Maps sidebar at my disposal. But
| this does not exist.
|
| The worst thing about this view is that there is _no link_ to
| Google Maps which would open that same view in Google Maps.
| throwaway923840 wrote:
| I don't understand why people still waste so much money on making
| some perfect search engine when they can just filter on tags.
| Every retail product sales website in the world works by
| filtering on tags, not deciphering search terms in the "right"
| way.
|
| Want black shoes? I could search for "black shoes", and receive
| shoes with the brand/product name "Black". Or I could select
| 'category: shoes' and 'color: black' in the drop-down box. Hey,
| look, now I have a list of black shoes.
| gifnamething wrote:
| What do I do if I want burgundy shoes? Do I decide if I think
| it's red or purple or blue? Or do I go further and think what
| the person listing it thought it was? Do I hope the decision
| went the same way each time, and all shoes have their colour
| listed?
|
| I don't do any of this. I filter by shoes and look with my
| eyes. Retail search is a poor example because existing retail
| tagging is poor.
| inertiatic wrote:
| Because:
|
| a) speaking "black shoes" into your phone is a better
| experience than scrolling down an arbitrarily long list of
| categories.
|
| b) Is shoes a category? Is it clothing? Is it boots? Oh I need
| an ontology now? Will users find it intuitive to have to drill
| down on it???
|
| c) Wait, the item comes in with an automatic description from a
| thousand different vendors. How do I decide what part of it
| maps to my "color" field? What if the color isn't in a separate
| field? How do I make sure that this item isn't dead in my
| inventory? How do I make sure that this popular item is
| retrievable for me too and not just my competitors that have a
| better search??? In this case just searching for "black"
| anywhere in the fields of the product, weighted for
| significance, gets me valid results without communicating to
| the user that they filtered on a color.
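
The "search every field, weighted for significance" idea from point c) can be sketched as below. The field names, the weights in `FIELD_WEIGHTS` and the sample products are all hypothetical; a real engine would use per-field boosts on top of its own relevance scoring.

```python
# Hypothetical weights: a hit in the title counts more than one in the description.
FIELD_WEIGHTS = {"title": 3.0, "colour": 2.0, "description": 1.0}

def score(product, query):
    """Add a field's weight for every query term found among that field's words."""
    total = 0.0
    for term in query.lower().split():
        for field, weight in FIELD_WEIGHTS.items():
            if term in product.get(field, "").lower().split():
                total += weight
    return total

def rank(products, query):
    """Return titles of matching products, highest score first."""
    scored = [(score(p, query), p["title"]) for p in products]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]

products = [
    # Vendor left the colour field empty and put it in the description instead.
    {"title": "Oxford shoes", "colour": "", "description": "Classic black leather"},
    {"title": "Running shoes", "colour": "black", "description": "Lightweight trainer"},
    {"title": "Red sandals", "colour": "red", "description": "Summer shoes"},
]
print(rank(products, "black shoes"))
# prints ['Running shoes', 'Oxford shoes', 'Red sandals']
```

The Oxford shoes still surface even though their colour field is empty, which is the case a strict `color: black` filter would drop.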
| inshadows wrote:
| Lucky you. I search eshops with black shoes
| site:https://shittyeshop.com
|
| on Google because of abysmal search on 99% of sites.
| kristopolous wrote:
| Biggest advice I can give is you probably don't need search if
| you're indexable by search bots.
|
| No really. Look over people's shoulders sometime.
|
| They'll just go to Google and type in their search followed by
| terms such as Wikipedia, imdb, Stackoverflow, YouTube, Bandcamp,
| Amazon, eBay, Yelp... all sites that spent a lot of time on their
| search and have done quite a decent job. Oh well.
|
| So unless you really need it for some critical reason where you
| know your users aren't going to do their regular patterns of
| going through one of the general engines, close the ticket as out
| of scope and go home early.
|
| Don't bother implementing hard to do and expensive to maintain
| features that nobody will use. It'll become more headache than
| fun real quick.
| dotancohen wrote:
| > Biggest advice I can give is you probably don't need search
| if you're indexable by search bots.
|
| I don't know how relevant it would be today, but ~2005 or so I
| had a site that used Google search for the internal "search"
| functionality. It was a webmaster feature of Google at the
| time, the google results would be displayed with my site's
| branding, colours, etc. Only results from my site were
| displayed, but there was a clear Google logo and link at the
| top. I think that ads were not shown, but I'm not 100% sure.
|
| A/B testing showed significant (I don't remember the numbers)
| loss of visitors on that page. When presented with our own
| internal search, visitors would stay on the site the vast
| majority of the time. But when presented with the google
| search, visitors would often leave. I would say that the
| results that google returned were no less relevant than those
| our internal search provided, in most cases near identical
| except for ordering.
| outlawBand wrote:
| So as a casual web browser, I have been guilty of this. IMHO
| it just breaks the users experience when they feel like they
| have been swept out to google. Almost like when someone asks
| a question, and someone replies with a link from lmgtfy
| dotancohen wrote:
| I wish that we had spoken a decade and a half ago ))
| BiteCode_dev wrote:
| Yes, but a good embedded search makes all the difference.
|
| VueJS and Tailwind CSS both are indexed, but on those
| particular sites I use the web site search _when I'm looking
| back for the reference of something I know_, because it's
| faster and more accurate than googling.
|
| Granted, it's rare, but if you manage it, it's great.
| kristopolous wrote:
| Maybe. The point in those examples was that they all solved
| search in a specific contextualized manner impressively well
| and despite that people are creatures of habit and they'll
| want to port the same generalized patterns over that they do
| for everything else.
|
| Not consciously. They'll just do it and expect it to work
| which is why effective indexing and "SEO" (as in, the engine
| can scrape the content and crawl around without getting
| confused) is likely the actual work to do to implement
| "search".
| ricksunny wrote:
| From my experience yesterday I can tell you that Zenodo is
| struggling to reliably and repeatably return results for basic
| boolean search queries. I think I'll try parent's
| search-the-site-via-Google approach.
| svcrunch wrote:
| Agree with your sentiment.
|
| I cofounded ZIR AI to provide ML (vector)-based search as a
| PaaS solution, similar to what Algolia or Elasticsearch do
| for keyword searching.
|
| We have a demo (https://zir-ai.com/demo) running over Quanta
| Magazine articles. Not only can it outperform the keyword
| search embedded on quantamagazine.org, but on some queries,
| it even outperforms Google with a site restrict (e.g. how old
| is the universe site:quantamagazine.org).
|
| > Granted, it's rare, but if you manage it, it's great.
|
| ML-powered search will make this commonplace, once the tech
| goes mainstream in the next 2-3 years.
| Noumenon72 wrote:
| That does make you dependent on Google not starting to suck.
| For the first time last week I had the experience of Googling a
| programming question and realizing the results were so bad I
| should probably try searching StackOverflow directly. I did and
| got twice as many hits.
| lessname wrote:
| You might be right in most cases (for example blogs etc) but
| it's just disturbing on apps like spotify if you just can't
| find the music you want to listen to inside the app. Or when
| Netflix gives unavailable movies as search suggestions.
| netcan wrote:
| This is true in principle, but in practice, the use of 3rd
| party search has died down over the last decade. This is not
| the phenomenon of a superior way winning out.
|
| Either search is not an important feature, and a suboptimal,
| DIY implementation that looks OK is good enough. Or, search is
| a primary feature and then you need control over it. IE, if you
| have an online store, travel site or dating app with a search
| based UI, then you'll probably roll your own.
|
| There are cases where google really is the best way. As you
| say, stackoverflow, wikipedia & such. Even so, you'll
| eventually roll your own. Spolsky's original UI concept totally
| leaned on Google for search, but SO still has its own.
|
| At some point, you'll need results to take inventory into
| account or make autocomplete smarter about tags... and now the
| headache is yours anyway.
|
| It would have been cool if the web had really developed into
| the hopeful, "semantic web era" where this kind of approach
| works. That didn't happen. Half the game is over control, and
| controlling UIs matters the most.
|
| Also, the power of pagerank has dwindled as the web itself
| changed. Links aren't what they used to be. That makes Google
| relatively worse at what it was once best at. Google search as
| a whole is much, much richer but most of what makes Google good
| today has less to do with your use case anymore.
|
| TLDR, Maybe google search works for searching wikipedia, which
| is perfectly loyal to the original WWW concept. Even they have
| a DIY search. If you work at Reddit though, and your search
| sucks, google will not fix this. Your search will just suck,
| and your app will be less usable.
| cratermoon wrote:
| > controlling UIs matters the most
|
| Or as former Google design ethicist Tristan Harris wrote,
| "Whoever controls the menu controls the choices."
|
| https://observer.com/2016/06/how-technology-hijacks-peoples-...
| formerly_proven wrote:
| > Either search is not an important feature, and a
| suboptimal, DIY implementation that looks OK is good enough.
| Or, search is a primary feature and then you need control
| over it. IE, if you have an online store, travel site or
| dating app with a search based UI, then you'll probably roll
| your own.
|
| Most stores have garbage search and even worse filtering. >98
| % of stores have only categories for filtering and only trash
| like "price" for sorting. In the 2 % of stores that can
| actually do faceted search the number of facets is too small
| (e.g. shoe stores which only have a "size" facet plus
| categories) or the data in the facets is garbage. This is
| likely a big contributor to sites like Geizhals existing,
| whose sole purpose is to offer decent search and filtering.
| wongarsu wrote:
| Ninety percent of ecommerce sites have terrible search
| because ninety percent of everything is crap [1].
|
| That doesn't mean that having a great search can't help
| your shop get repeat customers, just like good service,
| fast shipping or a good checkout flow can.
|
| 1: https://en.wikipedia.org/wiki/Sturgeon%27s_law
| netcan wrote:
| I don't disagree.
|
| If you have bad search, and search is a primary feature,
| then your ux sucks. I just don't think "just use google" is
| a viable alternative to making search good.
|
| If you're making an online store, search is probably
| important, and a major determinant of how well the software
| works. The best way to handle that is with search that
| doesn't suck. It may be hard, but that's the job. Fair
| point that search is not trivial to do well, but that
| doesn't mean it isn't the job.
| corobo wrote:
| If this is your situation there's a nice site search builder
| for DDG here: https://ddg.patdryburgh.com/
|
| I use that for my static site as it means I don't need to
| introduce any moving parts. Probably not the best search
| experience but it gets the job done
| mjlee wrote:
| I don't see it anymore, but people used to put a google search
| bar on their website that would use the sitesearch parameter to
| restrict results to their domain.
|
| It looks like there is a modern alternative - seems a little
| more complex to get going with which might be why it's not so
| common.
|
| https://developers.google.com/custom-search
| WalterBright wrote:
| We do it at https://dlang.org
|
| It's not the greatest, as it seems to prefer to return
| references in D forums in preference to the manual pages, but
| it works well enough.
| totetsu wrote:
| There used to be a big yellow blackbox server called Google
| search appliance that you could put in your data center and
| get internally indexed white label google search.
| bboreham wrote:
| Before Google figured out what its business model was.
| hinkley wrote:
| I was going to say something similar.
|
| It's not common, but it happens. For small sites it should
| possibly happen more. If you're working hard to achieve
| mediocrity, you should put that energy somewhere else where
| you can at least get to good.
| BatteryMountain wrote:
| This only counts for public sites with semi-static
| information.. I've been on plenty of internal/LOB projects
| where the filtering/searching/slicing of data and exporting to
| various formats are one of the core features of the
| application, where the data will also be different as time
| marches forward (non-static sites). Thousands of non-tech
| people rely on those kinds of features to do their job. Oh, and
| some of them are behind firewalls/proxies/vpn and never see the
| light of day on the public internet, so no search crawlers can
| see them - and even if they could, they are not optimized for
| crawlers and/or too dynamic.
| pc86 wrote:
| Well they said in the _very first sentence_ it only applies
| if you're indexed by search bots. Internal apps (typically)
| aren't accessible at all, and quickly-changing data isn't
| indexed in time to be relevant.
| conceptme wrote:
| It can still be public and indexable but you will still
| miss finer grade filters.
| friedman23 wrote:
| I really do not agree. There are many technical blogs that I
| like and navigating the content on their site is a pain. They
| will have interesting multipart blog posts and finding all the
| connected parts requires clicking through multiple pages of 10
| entry blog posts to find what I want. Sometimes they will have
| links between related posts in the blog post if you are lucky.
| If they are writing about a specific concept, good luck you
| need to start clicking.
| [deleted]
| thombat wrote:
| Or as a slight refinement implement it as feeding
| ("site:mysite.com " + user_query) through to Google, to spare
| your users 2000 Pinterest hits and 1000 scrapes of a very old
| Stack Overflow question that somehow matches.
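
The refinement above boils down to prefixing the user's query with a `site:` restriction before handing it to Google. A minimal sketch, assuming the familiar `https://www.google.com/search?q=...` URL form and a placeholder domain; `site_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

def site_search_url(user_query, site="mysite.com"):
    """Build a Google search URL restricted to one site via the site: operator."""
    restricted = f"site:{site} {user_query}"
    # urlencode percent-escapes the colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": restricted})

print(site_search_url("black shoes"))
# prints https://www.google.com/search?q=site%3Amysite.com+black+shoes
```

A search box on the site could simply redirect to this URL, at the cost of sending the visitor off-site as discussed elsewhere in the thread.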
| kristopolous wrote:
| Indeed, there's above board white label options for embedded
| search from most of the indexers, usually for a nominal fee.
|
| Really depends on what kind of content you're dealing with.
| Search can be as hard as you want it to be.
| forgotmypw17 wrote:
| Third-party search comes at a huge price for me. Crawler bots
| kick the shit out of my websites, introducing tons of
| unnecessary noise and load.
|
| I've chosen to block them all and only spread the word
| organically instead.
|
| I have yet to begin tackling the search problem :)
| woolion wrote:
| Is there any actual evidence that might be true? I encountered
| this argument many times, but only from programmers (who
| thought it was too difficult because of their tech stack), never
| from users.
|
| As a user, there is no site that does not have search, no
| matter how good it is, that does not feel quite horrible
| because of that. Even the search by Google feels horrible --
| romhacking comes to mind. It is also frustrating because some
| instances of title searches are notoriously unindexable, while
| a nice site search lets you narrow the context through tags or
| model specific attributes (system, genre, ...).
| kristopolous wrote:
| The statement is intentionally hedging and qualitative. Some
| people do the usage pattern, some don't.
|
| The advocacy is to push back on search as a difficult and
| often unnecessary problem to solve.
|
| The caveat is my increasingly toxic pattern to push back on
| almost everything as unnecessary. It's probably overly
| antagonistic.
|
| I'm a techno pessimist programmer. I didn't understand this
| attitude when I was younger but then again people often
| become that which they fail most to understand
| woolion wrote:
| Formulated like that, I'm very sympathetic to your point of
| view. Having a small set of well-refined features rather
| than a growing spaghetti of half done ones. I guess my own
| point of contention is that programmers would say "task X
| is hard" when it isn't that much, it is that their
| technical choices make it hard.
|
| For instance, one Front-End engineer was really proud of
| using a very recent and trendy framework to rewrite the
| whole site from scratch. After deployment, people are
| ordering the wrong products. His reply was, "keeping the
| query filters in sync is too hard of a problem". It's
| really not.
| dgb23 wrote:
| I don't want users to leave the site and use google. That's bad
| UX and DX at the same time as it only increases friction and
| uncertainty.
|
| I want them to believe that if it is there, they'll find it
| with search. But that depends on how good it is and how helpful
| the feedback is.
| cerved wrote:
| but if the search isn't good enough I'll just go to Google
| anyway but now you've annoyed me with your own useless site
| search, which is what always happens
| Graffur wrote:
| How is it bad developer experience (DX)?
| dgb23 wrote:
| We don't want to be even more reliant on optimizing for
| something we have little to no control over right? Maybe
| this is more of a mindset/approach thing. But I like making
| things where I can communicate the guarantees and
| assumptions with confidence.
|
| Note I'm not saying we shouldn't care about search engines.
| They are extremely useful and important. I'm saying if you
| have the type of content that benefits from being navigated
| via search, then consider direct control over this.
| amelius wrote:
| Yes, when I'm browsing documentation, e.g. for Python, I
| don't want to leave the context and type "X Python", where X
| is what I'm looking for. Also Google searches in that case
| might give all kinds of irrelevant results, like
| StackOverflow pages (sometimes useful, but not if I want to
| read just the documentation), ads, and such. Typing
| "site:docs.python.org" is also out of the question.
| resonious wrote:
| I think this very much depends on what kind of service your
| site provides, but in many cases I think we can argue that
| the Google fallback is good UX. It means the user doesn't
| need to figure out and navigate a new/different search system
| for each service.
|
| If the GP's right and people really do just drop out and use
| Google every time (even though the target site has a search
| function), then I think it's hard to argue that it's bad UX.
| Unless you want to argue that the specific sites' search
| functions are poorly made.
| dgb23 wrote:
| > Unless you want to argue that the specific sites' search
| functions are poorly made.
|
| You lose trust in the search function very quickly, if you
| gain it at all. Whether yours is effective and helpful does
| not only depend on the search implementation but also on
| whether your site actually has the information (that it
| should or is assumed to have) to be effective and helpful.
| A lot of factors come into this, representation, content,
| design both for search results and for the actual target
| pages.
|
| Now if you have the trust, are helpful and effective, then
| the UX is drastically improved.
|
| For one, search engine results have been degrading. There
| is _so_much_spam_ out there. Countless SEO sites that just
| crawl the web and generate crap output to show you ads,
| some seem to be handwritten as well. It is increasingly
| probable to get top results that are just presenting
| "stolen" content in some form or another. You get low
| effort blog-like posts that are just restating things they
| read in a discussion that is actually not the primary
| source, but a generated site, which is referring to the
| actual information somewhere else.
|
| Secondly if you have in-built, decent search people will
| use it and they will be happy for it, because you'll be
| presenting them much better suggestions in a better way and
| you'll do it faster.
|
| Think of some of the _best_ sites like MDN. Sure, you might
| enter the site via DDG or Google (in the former case you'll
| be searching directly like so: "!mdn [search_term]"). But
| when you're on the site you'll be happily navigating it via
| links and search, because it's just a very good site.
|
| Other examples of this are: wikipedia, tailwind,
| clojuredocs, reactjs, hn.algolia...
|
| They all have rich content, good search, useful links. It's
| a very effective combination.
| jillesvangurp wrote:
| It depends. If your main UI involves search, you might want to
| make sure that it is actually usable, competitive and not
| embarrassing.
|
| If your main business is selling stuff that users find via your
| search features, then doubly so of course since you are
| literally losing cash every time your search ends up not doing
| its job. Easy to measure too and most eCommerce companies that
| survive long enough do that obsessively because it shows in
| their recurring revenue if they don't.
|
| Also, if your competitors have awesome search and you don't,
| your users might realize and jump ship. If you have content
| that is great but nobody is finding it, you might want to fix
| it by allowing them to find it via better search functionality.
|
| If search is actually not critical to your UX or product, then
| by all means, cut corners. Google will happily redirect users
| to your competitors as well. Make sure to give them plenty of
| money for keywords. If you don't, somebody will. Either way,
| your analytics will tell you how people come to your site and
| what they do once they get there.
|
| Either way, it's not that hard to build a decent search
| experience if you know what you are doing. The key point of
| this article is that many engineers kind of don't know what
| they are getting into and mess it up.
| severus_snape wrote:
| I use DuckDuckGo's bangs[0] to search, say Wikipedia (!w, !wf),
| Word Reference (!wref, !wrfe), Hacker News (!hn). They are
| incredibly useful and using Google would be a waste of time for
| such queries. I'm glad these sites implemented search.
|
| [0]: https://duckduckgo.com/bang?q=
| fatnoah wrote:
| >They'll just go to Google and type in their search followed by
| terms such as Wikipedia, imdb, Stackoverflow, YouTube,
| Bandcamp, Amazon, eBay, Yelp... all sites that spent a lot of
| time on their search and have done quite a decent job. Oh well.
|
| I'm someone who does this, and I do it because I've found that
| it either works better than a site's specific search and/or
| it's so much faster to go to Google and search than to try to
| find the search functionality of the specific site.
| roenxi wrote:
| I agree. But I do think there is some room to look in to the
| details. It is possible that these search features are targeted
| at someone other than the general audience.
|
| And for something like eBay, or Amazon, they'd be mad as
| hatters to send any search traffic on their website out to
| Google where competitors are buying ad space. They don't really
| have a choice, they have to try and implement their own search
| even if only a small number of people use it. That is likely
| free money as far as they view the world.
| marginalia_nu wrote:
| Wikipedia actually doesn't want to be indexed by third party
| search engines. Mediawiki is a heap of underoptimized PHP
| garbage so pages are (expensively) rerendered every time you
| visit them.
| mistrial9 wrote:
| > underoptimized PHP garbage
|
| rather uncharitable description - that PHP implements dozens
| of interesting features and recall. It is true that it is a
| mess, however.
| kiryin wrote:
| I will very quickly stop using a given website if it doesn't
| provide a good, on-site search. If I'm looking for information
| that I know exists on a particular website, I see no reason to
| involve a third-party search engine in the equation at all.
|
| Granted, a lot of sites out there that provide a search
| function do not necessarily need one, and would instead benefit
| from an organized, hierarchical index. With these types of
| websites, using the search function only adds unnecessary
| friction.
| ratherbefuddled wrote:
| Given the state of search functionality built into websites I
| use often I can only imagine you have a very heavily curated
| list.
|
| In my experience so many sites have searches that return bad
| results, restrict your ability to see a list of results due
| to some poorly thought through typeahead functionality, or
| simply do not work without third party scripts and cookies
| that I often resort to "site:<domain>" in ddg or worse,
| google.
|
| I find hierarchical indexes only of marginal use when looking
| for information. I can only know where in the hierarchy
| something is if I'm very familiar with the website - because
| many things could logically be in more than one place.
| sriram_malhar wrote:
| Would it bother you if foo.com took your query and sent
| "query site:foo.com" to google? The results would be foo.com
| specific.
| hihihihi1234 wrote:
| I often find this gives better results. Eg I'm _much_
| likely to find what I'm looking for if I search for
| "(query) site:reddit.com" than if I use reddit's own search
| feature.
| hihihihi1234 wrote:
| Much more likely*
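
A minimal sketch of the "query site:foo.com" delegation discussed above, assuming the site simply redirects the user to a general-purpose search engine. The function name `site_search_url` and the default engine URL are assumptions for illustration; foo.com is the placeholder domain from the comment.

```python
from urllib.parse import urlencode


def site_search_url(query: str, domain: str,
                    engine: str = "https://www.google.com/search") -> str:
    """Build a search-engine URL restricted to one domain via the site: operator."""
    restricted = f"{query} site:{domain}"
    # urlencode escapes spaces as '+' and ':' as '%3A'.
    return f"{engine}?{urlencode({'q': restricted})}"


url = site_search_url("xslt tutorial", "foo.com")
```

The trade-offs raised elsewhere in the thread still apply: the results depend on how recently the engine indexed the site, and the query leaves the site's control.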
| matheusmoreira wrote:
| > organized, hierarchical index
|
| Yes!!
|
| So many blogs without a list of posts in chronological order.
| So many sites without a list of pages. So annoying.
| lobocinza wrote:
| I agree with you but many sites use a third-party search
| engine to provide on-site search functionality. e.g.
| Algolia's DocSearch
| bryanrasmussen wrote:
| A free text search on your content via a search input is the
| killer app of a search engine but it is not the only app of a
| search engine.
|
| A search engine (and its backing index), just like a relational
| database, is a technology that allows you to build all sorts of
| solutions on top of it. Nobody says don't use a graph db
| because Facebook already has the best friend of a friend
| solution you could hope to find.
| kristopolous wrote:
| Yeah. Absolutely. There's plenty of merits in the pursuit.
|
| But products are constrained by requirements and most of the
| time the manager thought search was in there, after a little
| talk, search was not in there any more.
|
| There's resources, deadlines, budgets, contracts to satisfy,
| blah blah blah. The insight is this costly feature can
| usually be shaved from the requirements with a little
| conversation.
|
| A big secret of "10x" programming is realizing "90%" of the
| work isn't truly necessary.
|
| But yes, search is totally a fantastic thing to do if you
| have the space and expertise to do it.
| sharikous wrote:
| Ironically I use hackernews's search engine a lot
|
| Also Google search is very "fluid". Sometimes, for example for
| searching documentation, you need something more advanced.
| rdevsrex wrote:
| Yeah, the key part is IF, you are indexable. In the project I'm
| working on everything is behind a paywall.
| albertzeyer wrote:
| Many sites with paywalls still open up their content for
| search bots.
| sorokod wrote:
| "Critical" is subjective but:
|
| * Are Google indexes updated frequently enough to capture
| important changes?
|
| * If you consider certain search terms synonymous, would Google
| do so too?
|
| * Are Google result rankings compatible with yours?
| bjoernbu wrote:
| If I think of myself as a user, that is spot on for
| information/entertainment, but far from the truth for products.
| The most extreme examples being everyone using "$THING
| Wikipedia" but nobody I know using "$THING Amazon" in Google
| search
| tda wrote:
| I have never ever been able to find anything on Wikipedia with
| search, except if I know the title of the article I'm looking
| for. But as I often do want the information from Wikipedia, I
| just search <search term + wiki>, and it takes me straight to
| where I want to be, whether I'm using DDG or Google. Works so
| much better than !w
| chmod775 wrote:
| Lately I had to scroll down on Google search results a lot to
| find the relevant Wikipedia article, it often being somewhere
| below some irrelevant images, followed by a completely
| unasked for and irrelevant map (why in gods name would I care
| for where the nearest factory for a product is?), some random
| blogspam, and ads.
|
| It used to be that you reliably had the Wikipedia article at
| the top of your results to provide context and basic
| information in case you didn't know what your search term
| means, you could _expect_ it to be there if it exists. Now
| you have to waste mental energy hunting it down, which is a
| waste if there's no article at all.
| kristopolous wrote:
| I've lost faith in the ability of the Google approach.
|
| Their results seem to have been turning to trash but then
| again, so has everyone else's.
|
| There's a few explanations. Easiest one is me, I'm getting
| dumber with age or have changed my standards. Second one is
| they are all using similar approaches and SEO'ing has
| ruined search. Third is Google sets "the standard" and the
| other engines tweak themselves to follow the goog,
| regardless of results.
|
| Reality is it's probably all 3 and a few more that I
| haven't thought of
| sydthrowaway wrote:
| An interesting example is the search query "<question about
| anything> reddit".
|
| Without the 'reddit' qualifier, the results are nearly always
| spammy and useless (as much as modern Reddit tries to compete
| here notwithstanding)
| EE84M3i wrote:
| It's pretty often that I do a search, get bad results, then
| add reddit and get good results, so I hope some data analyst
| in Google sees those as opportunities to improve.
| abecedarius wrote:
| Reddit itself has an annoying dark-patterns UI. You have to
| edit the URL into old.reddit.com to get a reasonable page to
| read.
| soheilpro wrote:
| That's not always true.
|
| I run https://volt.fm and search is one of the most used
| features. If it didn't have built-in search, I doubt any of the
| users would use Google instead to find the artists/songs they
| were looking for.
| kristopolous wrote:
| I'm sure you're right and you've done the analysis. What's
| the inbound search engine versus your search endpoint as a
| fraction of each other?
| fnord123 wrote:
| Even further, any ! on DDG is using the site's search so even
| if we go to a search engine (e.g. `foo !w`), it's possible
| that we are using the website's search anyway.
| RicoElectrico wrote:
| You're totally right. Especially forum search engines are
| terrible. This is the best bang-for-buck solution.
|
| If someone has any idea why forum search experience is so bad
| and knows how to improve it, please chime in. I have my hunches,
| but let's not get ahead of myself.
| mro_name wrote:
| or do it like the w3c - make a search form with action
| https://www.w3.org/Help/search?q=xslt.
|
| Privacy may be a thing, however.
| mekster wrote:
| Are you saying a customer who came from a search engine should
| go back to it when in need of searching another product on your
| site?
|
| It's such a bad practice to let the users leave for whatever
| reason.
|
| Sites put every effort into keeping users from leaving, so this
| is an anti-pattern.
|
| Also I'd think the site doesn't even have enough budget to put
| in a search function and the business or the manager has some
| problem.
| ronvoluted wrote:
| I won't repeat what others have said about advances in natural
| language processing since 2017, but it's true that it's a solved
| problem if your problem isn't "perfect search" but a more
| realistic "excellent search".
|
| > Use existing technologies first: As in most engineering
| problems, don't reinvent the wheel yourself. When possible, use
| existing services or open source tools. If an existing SaaS (such
| as Algolia or managed Elasticsearch) fits your constraints and
| you can afford to pay for it, use it.
|
| I work at an AI search company (Relevance AI) and even we see
| that InstantSearch.js is all some people need in terms of UI. We
| created a version of it that uses our NLP-backend but is still
| the same Algolia components on the frontend:
| https://www.npmjs.com/package/@relevanceai/instant-relevance
|
| The reason was because those components work. Think these days
| you'd need to ask a lot of questions before completely rolling
| your own UI or NLP handling.
| andi999 wrote:
| Good read, misleading title. It should be more like "what you
| want to know if you seriously need to implement search
| functionality".
| sne11ius wrote:
| Yes. If I knew everything I "need" to know according to these
| articles ... I would know a lot of stuff I never need.
| patwolf wrote:
| Google ruined search, by which I mean they made it so good that
| everyone expects search to be as good as Google.
|
| When building an inexpensive app, the client will often ask for
| search. The UI designer will oblige and add a search bar to the
| app. Neither will give much thought to what the search will
| actually do except to say "make it work like Google".
| draw_down wrote:
| Seeing that search bar in mock-ups is so triggering. It implies
| 500 follow-up questions about how the search should function in
| different scenarios, none of which has been considered in depth
| by the designer nor the product owner. And if you ask them all
| you'll be considered "difficult".
___________________________________________________________________
(page generated 2021-10-18 23:02 UTC)