hngopher.com

       [HN Gopher] What software engineers should know about search (2017)
       ___________________________________________________________________
        
       What software engineers should know about search (2017)
        
       Author : todsacerdoti
       Score  : 315 points
       Date   : 2021-10-18 05:57 UTC (17 hours ago)
        
 (HTM) web link (scribe.rip)
 (TXT) w3m dump (scribe.rip)
        
       | whakim wrote:
       | I want to plug Algolia a little bit. For small teams, the results
       | are truly incredible, and it provides a ton of features out-of-
       | the-box without a lot of development time. I've seen small teams
       | struggle mightily with both Elasticsearch and Solr; things
       | usually start off OK, but results go downhill as the search needs
       | get more complicated (indexing multiple kinds of documents;
       | adding more and more different kinds of data to the index and
       | trying to weight things properly; dealing with tradeoffs such as
       | relevancy vs recency; etc.). In the future I would shy away from
       | such powerful tools unless I knew I had sufficient engineering
       | resources to dedicate to search and my problem was complex enough
       | to merit it.
        
       | jamesbriggs wrote:
       | Being from 2017, the article misses some of the coolest advances
       | in semantic search, which is now pretty easy and lets people search
       | in (almost) the same way they would ask a shop assistant when
       | looking for something specific - "do you know where the thing
       | with the cool circles and pointy bits is?" (maybe being a little
       | more specific...)
       | 
       | Google do this and they're very good at it, but a lot of
       | companies need their own search capabilities - think about those
       | internal help pages. They usually seem super outdated compared to
       | the semantic search capabilities of Google.
       | 
       | In the end there are only a few components to it: you use some NLP
       | model to create what are called 'dense vectors'. Then you put all
       | these dense vectors into an 'index' which is optimized for fast
       | search (that comes under the umbrella of ANN search). Then given
       | a new query you just compare that to the items in the index and
       | return the most similar results.
       | 
       | I covered the search part of it (https://www.pinecone.io/learn/)
       | with Pinecone, who provide managed-search - although here we
       | mainly focus on Faiss (a great engine from Facebook AI). However,
       | we're also looking to create some content covering the first half
       | too, which is 'how to build dense vectors' using models like
       | BERT; we have one post so far on that
       | (https://www.pinecone.io/learn/dense-vector-embeddings-nlp/)
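
The pipeline described above (embed documents as dense vectors, index them, compare a query vector against the index) can be sketched in a few lines. This is only an illustration under loud assumptions: the three-dimensional vectors are made up by hand rather than produced by an NLP model, and the lookup is an exact brute-force scan, not the approximate (ANN) search that engines like Faiss provide.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical hand-written "embeddings"; a real model such as BERT
# would produce vectors with hundreds of dimensions.
INDEX = {
    "how to reset your password": [0.9, 0.1, 0.0],
    "expense report guidelines": [0.1, 0.9, 0.1],
    "office wifi setup": [0.0, 0.2, 0.9],
}

def search(query_vector, k=2):
    """Return the k documents whose vectors are most similar to the query.

    This is an exact scan over every item; an ANN index trades a little
    accuracy for much faster lookups on large collections.
    """
    scored = [(cosine(query_vector, vec), doc) for doc, vec in INDEX.items()]
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:k]]

# Pretend this vector is the embedding of "I forgot my login".
print(search([0.8, 0.2, 0.1], k=1))
```

The query shares no keywords with the top document; it ranks first only because the assumed vectors point in a similar direction, which is the property semantic search relies on.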
        
       | danpalmer wrote:
       | The biggest thing I've learned about search is to be very precise
       | about WHAT is being searched. In my experience most people are
       | very loose with this, but by being really precise you can skip
       | past a lot of bad search experiences.
       | 
       | Take clothing search, "blue t-shirt", what is the search space?
       | Most engineers would naively shove the name/description into
       | Elasticsearch and then be surprised when it doesn't work. Why?
       | Because most product descriptions are not as explicit as "This is
       | a blue t-shirt" (see also YouTube video descriptions).
       | 
       | What we actually found was that by searching categories (where
       | blue t-shirts is roughly a category), and then just filtering
       | clothing items to that category, search worked far better.
       | By understanding what terms people used (categories/colours),
       | what was actually in the data, and what data models we should be
       | searching (not products!), we built a far more effective search
       | experience.
        
         | dotancohen wrote:
         | One strategy for this is to parse the search string for facets.
         | Presumably there would be a facet for "t-shirt" and another
         | facet for "blue", modulo synonyms and typos. Selecting those
         | facets should give very good results, even without hitting the
         | full-text search.
        
           | danpalmer wrote:
           | Yes that's a good option, but it still requires that you've
           | got the facets in the index. I'm surprised at how many people
           | say "just use Postgres full-text search" without thinking
           | through what information will actually be made available to
           | search, for example.
        
             | dotancohen wrote:
             | No, I'm not saying to have the facets in the index. I'm
             | saying to parse the search text for facets, and to search
             | on the facets.
             | 
             | If the whole string matches facets, then the full text
             | search never gets touched, as in the example above. But if
             | the example were e.g. "blue anime t-shirt" then you could
             | facet on "blue" and "t-shirt" while FTS only has to contend
             | with "anime".
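
A minimal sketch of the facet-parsing idea from this sub-thread: pull known facet values out of the query string and leave only the remainder for full-text search. The facet vocabulary below is hypothetical, and synonyms and typos (which the comment above mentions) are deliberately not handled.

```python
# Hypothetical facet vocabulary; a real shop would derive this from
# its catalogue and add synonym and typo handling.
FACETS = {
    "colour": {"blue", "red", "black"},
    "category": {"t-shirt", "shoes", "jeans"},
}

def parse_query(query):
    """Split a query into recognised facets and leftover free text."""
    facets = {}
    leftover = []
    for token in query.lower().split():
        for name, values in FACETS.items():
            if token in values:
                facets[name] = token
                break
        else:
            # No facet claimed this token, so full-text search gets it.
            leftover.append(token)
    return facets, " ".join(leftover)

print(parse_query("blue t-shirt"))
print(parse_query("blue anime t-shirt"))
```

When the leftover string is empty, as in the first call, the full-text engine never needs to be consulted; in the second call only "anime" is passed on.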
        
       | divbzero wrote:
       | HN discussion from 2017 on the same article at a different URL:
       | 
       | https://news.ycombinator.com/item?id=15231302
        
       | amir wrote:
       | Not sure who needs to know this but scribe.rip is a relatively
       | new alternative reader for Medium. The original article is at:
       | https://medium.com/startup-grind/what-every-software-enginee...
       | 
       | HN's special treatment of medium.com links doesn't apply to
       | scribe ones.
        
         | mellosouls wrote:
         | (2017) as well.
         | 
         | Previous discussion from the time, of this excellent article:
         | https://news.ycombinator.com/item?id=15231302
        
         | hnbad wrote:
         | That explains why all the link texts are weirdly misaligned. Do
         | the authors have any input or does it just scrape Medium for
         | content?
        
           | krageon wrote:
           | What would they have input about?
        
           | jamesbriggs wrote:
           | A lot of websites scrape Medium without the authors having
           | any idea; I assume it is another one of those.
        
       | zuhayeer wrote:
       | Underrated how effective indexOf can be for search if you don't
       | have an incredibly large set of data
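
For small data sets, the substring approach can be as simple as the sketch below. Python's `str.find` plays the role of JavaScript's `indexOf` here (both return -1 when there is no match); the sample documents are invented for illustration.

```python
def substring_search(query, documents):
    """Return the documents that contain the query as a substring.

    Matching is case-insensitive; there is no ranking, stemming or
    typo tolerance, which is often acceptable for small collections.
    """
    needle = query.casefold()
    return [doc for doc in documents if doc.casefold().find(needle) != -1]

docs = ["Blue T-Shirt", "Black Shoes", "Navy blue jeans"]
print(substring_search("blue", docs))
```

Each query scans every document, so the cost grows linearly with the size of the data set.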
        
       | jimmytidey wrote:
       | Sometimes, search is broken design. The assumption in the article
       | is that the search function is trying to return what you are
       | looking for; however, Spotify and Netflix have their own ideas
       | about what content they want you to consume.
        
       | forwidur wrote:
       | Author of the article here: thanks for re-posting this! The
       | article seems to be still relevant to many even 4 years later,
       | which surprises me, given how quickly everything is changing in
       | the field.
       | 
       | However, I personally am now focused (and bullish) on DNN-based
       | semantic search. Having built several search experiences based on
       | it I'm convinced it is the future.
        
       | marginalia_nu wrote:
       | I think more software engineers should do themselves the favor of
       | dabbling in search engine design.
       | 
       | It really is the gift that keeps on giving in terms of
       | interesting problems, you get everything from large scale graph
       | processing, esoteric data structures, down to bit twiddling
       | optimization, language processing, dealing with seriously messy
       | and sometimes adversarial real world data. Everything is
       | challenging, but none of it is impossible, even as a solo
       | project.
        
       | jimmyvalmer wrote:
       | tldr (4,473 words); Search is a classic NLP problem replete with
       | all of natural languages' wide-ranging input and evaluation
       | vagaries. I admire Russian get-to-the-point bluntness in most
       | contexts, but for whatever cultural reason, their writing is
       | always Tolstoyesque.
        
       | rapsey wrote:
       | Something completely off topic with regard to the content of the
       | article, but not with regard to its title.
       | 
       | Github search is a great resource for any engineer. It has saved
       | me so many hours trying to figure some API out when I know there
       | must be so many people who have used it before.
       | 
       | "SomeApiCall" extension:.[js|rs|java|cs|etc.]
       | 
       | Or to find projects that use a dependency you are not sure how to
       | use
       | 
       | "SomeDependency" extension:.[toml|json|etc.]
        
         | mercora wrote:
         | i did not know about these[0] keywords so thanks for that :)
         | 
         | [0] https://docs.github.com/en/search-github/searching-on-github...
        
           | alserio wrote:
           | GitHub really needs a feature to skip searching in tests. Or
           | maybe there is such a feature but I've never found it.
        
       | amelius wrote:
       | Donald Knuth's volume on Searching and Sorting needs a successor.
        
       | karterk wrote:
       | A problem close to my heart. Good search is certainly still too
       | difficult to pull off for small teams, and this was one of my
       | motivations for building and open sourcing Typesense[1].
       | 
       | Most people think of search and immediately think of large data
       | sets, but the problems that plague smaller datasets are equally
       | interesting. It's less about performance and more about
       | relevance. For example, searching across multiple fields for a
       | compound query like "taylor swift style" requires breaking the
       | query into segments (taylor swift | style) before searching
       | the appropriate fields. There is also a class of problems that
       | traditional search engines that rely on BM25 or TF-IDF for
       | ranking cannot reliably solve (e.g. searching on small texts like
       | titles) where you have to consider distance between matching
       | words (which TF-IDF and BM25 miss). Lastly, there is also
       | personalization which is almost always left as an exercise to the
       | reader :)
       | 
       | [1]: https://github.com/typesense/typesense
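
The point about distance between matching words can be illustrated with a small proximity measure: the width of the smallest window of a title that contains every query term. This is a generic sketch, not Typesense's actual ranking; the function name and sample titles are made up for the example.

```python
def min_span(text, query):
    """Width (in tokens) of the smallest window of `text` containing
    every term of `query`, or None if some term is missing."""
    tokens = text.lower().split()
    terms = set(query.lower().split())
    # Positions of tokens that are query terms, in document order.
    positions = [(i, t) for i, t in enumerate(tokens) if t in terms]
    best = None
    for start in range(len(positions)):
        seen = set()
        for end in range(start, len(positions)):
            seen.add(positions[end][1])
            if seen == terms:
                width = positions[end][0] - positions[start][0] + 1
                if best is None or width < best:
                    best = width
                break
    return best

print(min_span("taylor swift style guide", "taylor swift"))
print(min_span("swift programming by taylor otwell", "taylor swift"))
```

Both titles contain both words, so a ranking based only on term counts could have trouble separating them; the smaller span for the first title gives a ranker a signal that the words appear together.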
        
         | carlreid wrote:
         | Typesense has been great for us so far. Easy to set up, works
         | great with the simple queries that we need.
         | 
         | Lots of great additional functionality on top of search, for
         | example security, with scoped API keys and the likes that we're
         | looking forward to making use of.
        
           | karterk wrote:
           | Thank you for your kind words. We are just getting started :)
        
             | jimmyvalmer wrote:
             | This phrase "Thank you for your kind words" has in the last
             | 1-2 years become the standard way to say what people used to
             | express with just "Thanks". It sounds so cringey and mawkish.
        
         | AtNightWeCode wrote:
         | For sure it depends on the quality of data and the target of
         | the service. Basic knowledge of Elasticsearch will beat Google
         | for confined data sets.
         | 
         | Major search engines have grown to the size where it is
         | theoretically more beneficial to train a model and then query
         | that model. In theory this will win, but in the real world it
         | will not work, as we can already see with the failing Google
         | search engine.
        
         | slimsag wrote:
         | As a long time lurker of your work, and someone who works on a
         | fair number of search engines myself - I'm curious, how well
         | does Typesense handle code search (punctuation, etc.)?
        
           | karterk wrote:
           | Finally added support for indexing and/or separating on
           | specific symbols in the ongoing pre-release builds. With
           | that, I think Typesense should be able to handle code search,
           | but I have never tried to index and search on code myself :)
        
             | slimsag wrote:
             | Cool! I'd be keen to try it out sometime if you have docs
             | on how I'd get it to index specific symbols
        
               | karterk wrote:
               | Yes, please see here:
               | https://github.com/typesense/typesense/issues/122#issuecomme...
               | 
               | The latest RC build is `0.22.0.rcs18`.
        
       | bagels wrote:
       | First thing would be to see if Elasticsearch works for you,
       | because it will for many applications.
        
       | svacko wrote:
       | I think it would benefit readers to mention more than the 2
       | mentioned groups of search-related software - there are a lot
       | of new-generation tools and libraries like MeiliSearch,
       | Typesense, Sonic, Toshi, Bleve, ...
        
       | ramraj07 wrote:
       | The simplest advice to engineers would be to not think of
       | Elasticsearch as some black box that will solve all of your search
       | problems. In fact, if you've never implemented search before it's
       | the last tool you need. Postgres full text search is all you
       | need. The most important thing in search is to surface relevant
       | results and no one can quantify relevancy. It's as unquantifiable
       | as it gets in technology. You need to understand what results are
       | relevant for your users and find a metric that would work to rank
       | accordingly.
        
       | ChrisMarshallNY wrote:
       | That's a useful and well-written article. It was written 4 years
       | ago, though, and NLP has improved significantly in that time (in
       | English).
       | 
       | My experience using natural-language search queries has given me
       | a set of expectations, as a user, that set a high bar.
       | 
       | I've written a fairly substantial backend for a search, and most
       | of my original frontend code has been ripped out and replaced
       | with new frontends, by now. I feel that the new frontends are an
       | improvement, but we still have a fairly strictly-guided
       | "guardrail" search.
        
       | mattbee wrote:
       | Since I didn't see this mentioned yet - I was writing a web
       | search for a client's support data, and wasn't looking forward to
       | setting up Elasticsearch. But I found the FTS5 extension built
       | into SQLite. It was trivial to set up, my client was happy with
       | the results after very little tuning, and it's SQLite so you've
       | probably already got it on your computer:
       | https://www.sqlite.org/fts5.html
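
As a rough idea of how little setup FTS5 needs, here is a sketch using Python's built-in `sqlite3` module. It assumes the SQLite library your Python links against was compiled with FTS5 (common, but not guaranteed); the table name, columns and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# An FTS5 virtual table indexes every listed column for full-text search.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("Resetting a password", "Open settings and choose reset password."),
        ("Billing FAQ", "Invoices are sent on the first of the month."),
        ("Two-factor login", "Enable an authenticator app for your account."),
    ],
)

def search(query):
    """Return matching titles, best match first.

    MATCH runs the full-text query; ORDER BY rank sorts by FTS5's
    built-in relevance score.
    """
    rows = conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank",
        (query,),
    )
    return [title for (title,) in rows]

print(search("reset password"))
```

Note that the query string is interpreted using FTS5's own query syntax, so raw user input containing characters such as quotes or hyphens may need escaping before it is passed to `MATCH`.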
        
       | yewenjie wrote:
       | Link to Whoosh is not correct.
        
       | artembugara wrote:
       | This article is incredible.
       | 
       | I've been running a news search engine API [1] for about 18
       | months, and I found a lot of new things.
       | 
       | The biggest takeaway (IMO) is to use existing SaaS at first.
       | We've been using Elasticsearch Service for our v1, and it worked
       | great. 0s downtime with no master nodes.
       | 
       | And yes, Elasticsearch itself is just a tool: you'd have to
       | write your own search logic on top of it.
       | 
       | Also, start with something "strict" that gives decent results for
       | people who won't read any tips or docs: 99.99% of people who use
       | Google Search don't know about any advanced tips, and still get
       | relevant results.
       | 
       | [1] https://newscatcherapi.com/news-api
        
       | denysvitali wrote:
       | Friendly reminder to the friends at Atlassian. This could be a
       | good starting point to fix your damn Confluence search
        
         | donkeydoug wrote:
         | look, if they fix search they might have to stop using
         | "Confluence; where information goes to die" as a tagline... so
         | not an easy sell.
         | 
         | seriously though... default confluence search-as-you-type box
         | is horrible, however sometimes (often ? always ?) there is a
         | more traditional search endpoint available that seems to give
         | better results. see if either of these works for you:
         | 
         | <your confluence domain>/dosearchsite.action
         | 
         | -or-
         | 
         | <your confluence domain>/search/searchv3.action
        
       | scraplab wrote:
       | I often come back to the Relevant Search book, by Doug Turnbull
       | and John Berryman. I'm sure some of the examples are a little
       | dated now, but most of the advice and approach is still sound and
       | it's a great tour through all the things you need to consider to
       | build a great search experience.
       | 
       | https://www.manning.com/books/relevant-search
        
         | softwaredoug wrote:
         | Doug Turnbull here - Thank you! :)
        
       | qwertox wrote:
       | > Merging results of different kinds: e.g. Google showing results
       | from Maps, News etc.
       | 
       | Google does have a UX bug in its search/map product.
       | 
       | Let's say I google for "kuka deutschland". In my case, the first
       | result is a link to KUKA, with sublinks to different sections of
       | their website.
       | 
       | Next there's a map with A-, B-, C-markers, and below them fields
       | with A: KUKA Deutschland GmbH [other info, website, route] B:
       | KUKA Systems GmbH [...] and so on.
       | 
       | If I now click on A: KUKA Deutschland GmbH or on the map, then
       | the map opens completely, with a left sidebar containing more
       | related places, a popup with the currently selected item open
       | which contains plentiful information about the marker on the
       | map.
       | 
       | The issue is that this is not a Google Maps map, it is not hosted
       | under maps.google.com, but any normal person would expect to be
       | able to interact with this map as if it were a normal map: have a
       | button for satellite imagery, 3d-view and all of this, the
       | ability to have the normal Google Maps sidebar at my disposal. But
       | this does not exist.
       | 
       | The worst thing about this view is that there is _no link_ to
       | Google Maps which would open that same view in Google Maps.
        
       | throwaway923840 wrote:
       | I don't understand why people still waste so much money on making
       | some perfect search engine when they can just filter on tags.
       | Every retail product sales website in the world works by
       | filtering on tags, not deciphering search terms in the "right"
       | way.
       | 
       | Want black shoes? I could search for "black shoes", and receive
       | shoes with the brand/product name "Black". Or I could select
       | 'category: shoes' and 'color: black' in the drop-down box. Hey,
       | look, now I have a list of black shoes.
        
         | gifnamething wrote:
         | What do I do if I want burgundy shoes? Do I decide if I think
         | it's red or purple or blue? Or do I go further and think what
         | the person listing it thought it was? Do I hope the decision
         | went the same way each time, and all shoes have their colour
         | listed?
         | 
         | I don't do any of this. I filter by shoes and look with my
         | eyes. Retail search is a poor example because existing retail
         | tagging is poor.
        
         | inertiatic wrote:
         | Because:
         | 
         | a) speaking "black shoes" into your phone is a better
         | experience than scrolling down an arbitrarily long list of
         | categories.
         | 
         | b) Is shoes a category? Is it clothing? Is it boots? Oh I need
         | an ontology now? Will users find it intuitive to have to drill
         | down on it???
         | 
         | c) Wait, the item comes in with an automatic description from a
         | thousand different vendors. How do I decide what part of it
         | maps to my "color" field? What if the color isn't in a separate
         | field? How do I make sure that this item isn't dead in my
         | inventory? How do I make sure that this popular item is
         | retrievable for me too and not just my competitors that have a
         | better search??? In this case just searching for "black"
         | anywhere in the fields of the product, weighted for
         | significance, gets me valid results without communicating to
         | the user that they filtered on a color.
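
Point (c) above, matching a term anywhere in a product's fields but weighting fields by significance, might look roughly like this. The field names, weights and products are all assumptions chosen for illustration; a real engine would tune the weights and use a proper ranking function.

```python
# Hypothetical field weights: a hit in the title counts more than a
# hit buried in the free-text description.
FIELD_WEIGHTS = {"title": 3.0, "colour": 2.0, "description": 1.0}

def score(product, query):
    """Sum the weights of every field that contains each query term."""
    total = 0.0
    for term in query.lower().split():
        for field, weight in FIELD_WEIGHTS.items():
            if term in product.get(field, "").lower().split():
                total += weight
    return total

def search(products, query):
    """Return titles of matching products, highest score first."""
    scored = [(score(p, query), p["title"]) for p in products]
    scored.sort(key=lambda pair: -pair[0])
    return [title for s, title in scored if s > 0]

PRODUCTS = [
    {"title": "Runner 2", "colour": "black",
     "description": "lightweight running shoes"},
    {"title": "Classic Black Shoes",
     "description": "leather oxford in black"},
    {"title": "Red Scarf", "description": "wool scarf"},
]

print(search(PRODUCTS, "black shoes"))
```

The second product has no separate colour field, yet it is still retrievable because "black" appears in its title and description, which is the scenario the comment describes.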
        
         | inshadows wrote:
         | Lucky you. I search eshops with
         | 
         |     black shoes site:https://shittyeshop.com
         | 
         | on Google because of abysmal search on 99% of sites.
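
For anyone wanting to link to such a site-restricted query instead of typing it, the URL can be assembled with the standard library. A small sketch: `example.com` is a placeholder domain and the helper name is made up.

```python
from urllib.parse import urlencode

def site_search_url(terms, site):
    """Build a Google search URL restricted to one site via `site:`."""
    query = f"{terms} site:{site}"
    # urlencode escapes spaces as '+' and the colon as %3A.
    return "https://www.google.com/search?" + urlencode({"q": query})

print(site_search_url("black shoes", "example.com"))
```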
        
       | kristopolous wrote:
       | Biggest advice I can give is you probably don't need search if
       | you're indexable by search bots.
       | 
       | No really. Look over people's shoulders sometime.
       | 
       | They'll just go to Google and type in their search followed by
       | terms such as Wikipedia, imdb, Stackoverflow, YouTube, Bandcamp,
       | Amazon, eBay, Yelp... all sites that spent a lot of time on their
       | search and have done quite a decent job. Oh well.
       | 
       | So unless you really need it for some critical reason where you
       | know your users aren't going to do their regular patterns of
       | going through one of the general engines, close the ticket as out
       | of scope and go home early.
       | 
       | Don't bother implementing hard to do and expensive to maintain
       | features that nobody will use. It'll become more headache than
       | fun real quick.
        
         | dotancohen wrote:
         | > Biggest advice I can give is you probably don't need search
         | if you're indexable by search bots.
         | 
         | I don't know how relevant it would be today, but ~2005 or so I
         | had a site that used Google search for the internal "search"
         | functionality. It was a webmaster feature of Google at the
         | time, the google results would be displayed with my site's
         | branding, colours, etc. Only results from my site were
         | displayed, but there was a clear Google logo and link at the
         | top. I think that ads were not shown, but I'm not 100% sure.
         | 
         | A/B testing showed significant (I don't remember the numbers)
         | loss of visitors on that page. When presented with our own
         | internal search, visitors would stay on the site the vast
         | majority of the time. But when presented with the google
         | search, visitors would often leave. I would say that the
         | results that google returned were no less relevant than those
         | our internal search provided, in most cases near identical
         | except for ordering.
        
           | outlawBand wrote:
           | So as a casual web browser, I have been guilty of this. IMHO
           | it just breaks the users experience when they feel like they
           | have been swept out to google. Almost like when someone asks
           | a question, and someone replies with a link from lmgtfy
        
             | dotancohen wrote:
             | I wish that we had spoken a decade and a half ago ))
        
         | BiteCode_dev wrote:
         | Yes, but a good embedded search makes all the difference.
         | 
         | VueJS and Tailwind CSS both are indexed, but on those
         | particular sites I use the web site search _when I'm looking
         | back for the reference of something I know_, because it's
         | faster and more accurate than googling.
         | 
         | Granted, it's rare, but if you manage it, it's great.
        
           | kristopolous wrote:
           | Maybe. The point in those examples was that they all solved
           | search in a specific contextualized manner impressively well
           | and despite that people are creatures of habit and they'll
           | want to port the same generalized patterns over that they do
           | for everything else.
           | 
           | Not consciously. They'll just do it and expect it to work
           | which is why effective indexing and "SEO" (as in, the engine
           | can scrape the content and crawl around without getting
           | confused) is likely the actual work to do to implement
           | "search".
        
             | ricksunny wrote:
             | From my experience yesterday I can tell you that Zenodo is
             | struggling with reliably and repeatably returning results
             | for basic boolean search queries. I think I'll try parent's
             | search-the-site-via-Google approach.
        
           | svcrunch wrote:
           | Agree with your sentiment.
           | 
           | I cofounded ZIR AI to provide ML (vector)-based search as a
           | PaaS solution, similar to what Algolia or Elasticsearch do
           | for keyword searching.
           | 
           | We have a demo (https://zir-ai.com/demo) running over Quanta
           | Magazine articles. Not only can it outperform the keyword
           | search embedded on quantamagazine.org, but on some queries,
           | it even outperforms Google with a site restrict (e.g. how old
           | is the universe site:quantamagazine.org).
           | 
           | > Granted, it's rare, but if you manage it, it's great.
           | 
           | ML-powered search will make this commonplace, once the tech
           | goes mainstream in the next 2-3 years.
        
         | Noumenon72 wrote:
         | That does make you dependent on Google not starting to suck.
         | For the first time last week I had the experience of Googling a
         | programming question and realizing the results were so bad I
         | should probably try searching StackOverflow directly. I did and
         | got twice as many hits.
        
         | lessname wrote:
         | You might be right in most cases (for example blogs etc) but
         | it's just disturbing on apps like Spotify if you just can't
         | find the music you want to listen to inside the app. Or when
         | Netflix gives unavailable movies as search suggestions.
        
         | netcan wrote:
         | This is true in principle, but in practice, the use of 3rd
         | party search has died down over the last decade. This is not
         | the phenomenon of a superior way winning out.
         | 
         | Either search is not an important feature, and a suboptimal,
         | DIY implementation that looks OK is good enough. Or, search is
         | a primary feature and then you need control over it. IE, if you
         | have an online store, travel site or dating app with a search
         | based UI, then you'll probably roll your own.
         | 
         | There are cases where google really is the best way. As you
         | say, stackoverflow, wikipedia & such. Even so, you'll
         | eventually roll your own. Spolsky's original UI concept totally
         | leaned on Google for search, but SO still has its own.
         | 
         | At some point, you'll need results to take inventory into
         | account or make autocomplete smarter about tags... and now the
         | headache is yours anyway.
         | 
         | It would have been cool if the web had really developed into
         | the hopeful, "semantic web era" where this kind of approach
         | works. That didn't happen. Half the game is over control, and
         | controlling UIs matters the most.
         | 
         | Also, the power of pagerank has dwindled as the web itself
         | changed. Links aren't what they used to be. That makes Google
         | relatively worse at what it was once best at. Google search as
         | a whole is much, much richer but most of what makes Google good
         | today has less to do with your use case anymore.
         | 
         | TLDR, Maybe google search works for searching wikipedia, which
         | is perfectly loyal to the original WWW concept. Even they have
         | a DIY search. If you work at Reddit though, and your search
         | sucks, google will not fix this. Your search will just suck,
         | and your app will be less usable.
        
           | cratermoon wrote:
           | > controlling UIs matters the most
           | 
           | Or as former Google design ethicist Tristan Harris wrote,
           | "Whoever controls the menu controls the choices."
           | 
           | https://observer.com/2016/06/how-technology-hijacks-peoples-...
        
           | formerly_proven wrote:
           | > Either search is not an important feature, and a
           | suboptimal, DIY implementation that looks OK is good enough.
           | Or, search is a primary feature and then you need control
           | over it. IE, if you have an online store, travel site or
           | dating app with a search based UI, then you'll probably roll
           | your own.
           | 
           | Most stores have garbage search and even worse filtering. >98
           | % of stores have only categories for filtering and only trash
           | like "price" for sorting. In the 2 % of stores that can
           | actually do faceted search the number of facets is too small
           | (e.g. shoe stores which only have a "size" facet plus
           | categories) or the data in the facets is garbage. This is
           | likely a big contributor to sites like Geizhals existing,
           | whose sole purpose is to offer decent search and filtering.
        
             | wongarsu wrote:
             | Ninety percent of ecommerce sites have terrible search
             | because ninety percent of everything is crap [1].
             | 
             | That doesn't mean that having a great search can't help
             | your shop get repeat customers, just like good service,
             | fast shipping or a good checkout flow can.
             | 
             | 1: https://en.wikipedia.org/wiki/Sturgeon%27s_law
        
             | netcan wrote:
             | I don't disagree.
             | 
             | If you have bad search, and search is a primary feature,
             | then your ux sucks. I just don't think "just use google" is
             | a viable alternative to making search good.
             | 
             | If you're making an online store, search is probably
             | important, and a major determinant of how well the software
             | works. The best way to handle that is with search that
             | doesn't suck. It may be hard, but that's the job. Fair
             | point that search is not trivial to do well, but that
             | doesn't mean it isn't the job.
        
         | corobo wrote:
         | If this is your situation there's a nice site search builder
         | for DDG here: https://ddg.patdryburgh.com/
         | 
         | I use that for my static site as it means I don't need to
         | introduce any moving parts. Probably not the best search
         | experience but it gets the job done
        
         | mjlee wrote:
         | I don't see it anymore, but people used to put a google search
         | bar on their website that would use the sitesearch parameter to
         | restrict results to their domain.
         | 
         | It looks like there is a modern alternative - seems a little
         | more complex to get going with which might be why it's not so
         | common.
         | 
         | https://developers.google.com/custom-search
        
           | WalterBright wrote:
           | We do it at https://dlang.org
           | 
           | It's not the greatest, as it seems to prefer to return
           | references in D forums in preference to the manual pages, but
           | it works well enough.
        
           | totetsu wrote:
           | There used to be a big yellow blackbox server called Google
           | search appliance that you could put in your data center and
           | get internally indexed white label google search.
        
             | bboreham wrote:
             | Before Google figured out what its business model was.
        
           | hinkley wrote:
           | I was going to say something similar.
           | 
           | It's not common, but it happens. For small sites it should
           | possibly happen more. If you're working hard to achieve
           | mediocrity, you should put that energy somewhere else where
           | you can at least get to good.
        
         | BatteryMountain wrote:
         | This only counts for public sites with semi-static
         | information.. I've been on plenty of internal/LOB projects
         | where the filtering/searching/slicing of data and exporting to
         | various formats are one of the core features of the
         | application, where the data will also be different as time
         | marches forward (non-static sites). Thousands of non-tech
         | people rely on those kinds of features to do their job. Oh, and
         | some of them are behind firewalls/proxies/vpn and never see the
         | light of day on the public internet, so no search crawlers can
         | see them - and even if they could, they are not optimized for
         | crawlers and/or too dynamic.
        
           | pc86 wrote:
           | Well they said in the _very first sentence_ it only applies
           | if you're indexed by search bots. Internal apps (typically)
           | aren't accessible at all, and quickly-changing data isn't
           | indexed in time to be relevant.
        
             | conceptme wrote:
             | It can still be public and indexable but you will still
             | miss finer grade filters.
        
         | friedman23 wrote:
         | I really do not agree. There are many technical blogs that I
         | like and navigating the content on their site is a pain. They
         | will have interesting multipart blog posts and finding all the
         | connected parts requires clicking through multiple pages of 10
         | entry blog posts to find what I want. Sometimes they will have
         | links between related posts in the blog post if you are lucky.
         | If they are writing about a specific concept, good luck you
         | need to start clicking.
        
           | [deleted]
        
         | thombat wrote:
         | Or as a slight refinement implement it as feeding
         | ("site:mysite.com " + user_query) through to Google, to spare
         | your users 2000 Pinterest hits and 1000 scrapes of a very old
         | Stack Overflow question that somehow matches.
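
The ("site:mysite.com " + user_query) idea above can be sketched in a few
lines. This is a minimal illustration, not anything the commenter posted:
the function name `site_search_url` and the example domain are hypothetical,
and it assumes the standard Google results URL with a `q` parameter.

```python
from urllib.parse import urlencode


def site_search_url(domain: str, user_query: str) -> str:
    """Build a Google search URL restricted to one domain via the site: operator."""
    # Prepend the site: operator, i.e. "site:mysite.com " + user_query.
    restricted = f"site:{domain} {user_query.strip()}"
    # urlencode percent-escapes the colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": restricted})


if __name__ == "__main__":
    print(site_search_url("mysite.com", "blue running shoes"))
```

A site's search form could redirect to the returned URL, so the user never
has to type the site: prefix themselves.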
        
           | kristopolous wrote:
           | Indeed, there's above board white label options for embedded
           | search from most of the indexers, usually for a nominal fee.
           | 
           | Really depends on what kind of content you're dealing with.
           | Search can be as hard as you want it to be.
        
         | forgotmypw17 wrote:
         | Third-party search comes at a huge price for me. Crawler bots
         | kick the shit out of my websites, introducing tons of
         | unnecessary noise and load.
         | 
         | I've chosen to block them all and only spread the word
         | organically instead.
         | 
         | I have yet to begin tackling the search problem :)
        
         | woolion wrote:
         | Is there any actual evidence that might be true? I encountered
         | this argument many times, but only from programmers (who
         | thought it was too difficult because of their tech stack), never
         | from users.
         | 
         | As a user, there is no site that does not have search, no
         | matter how good it is, that does not feel quite horrible
         | because of that. Even the search by Google feels horrible --
         | romhacking comes to mind. It is also frustrating because some
         | instances of title searches are notoriously unindexable, while
         | a nice site search lets you narrow the context through tags or
         | model specific attributes (system, genre, ...).
        
           | kristopolous wrote:
           | The statement is intentionally hedging and qualitative. Some
           | people do the usage pattern, some don't.
           | 
           | The advocacy is to push back on search as a difficult and
           | often unnecessary problem to solve.
           | 
           | The caveat is my increasingly toxic pattern to push back on
           | almost everything as unnecessary. It's probably overly
           | antagonistic.
           | 
           | I'm a techno pessimist programmer. I didn't understand this
           | attitude when I was younger but then again people often
           | become that which they fail most to understand
        
             | woolion wrote:
             | Formulated like that, I'm very sympathetic to your point of
             | view. Having a small set of well-refined features rather
             | than a growing spaghetti of half done ones. I guess my own
             | point of contention is that programmers would say "task X
             | is hard" when it isn't that much, it is that their
             | technical choices make it hard.
             | 
             | For instance, one Front-End engineer was really proud of
             | using a very recent and trendy framework to rewrite the
             | whole site from scratch. After deployment, people are
             | ordering the wrong products. His reply was, "keeping the
             | query filters in sync is too hard of a problem". It's
             | really not.
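
One common way to keep query filters in sync is to treat the URL query
string as the single source of truth and derive UI state from it. The sketch
below is an assumption about how that might look, not what the site in the
anecdote did; `filters_to_query` and `query_to_filters` are hypothetical
helper names.

```python
from urllib.parse import parse_qsl, urlencode


def filters_to_query(filters: dict[str, list[str]]) -> str:
    """Serialize filters to a canonical query string (sorted keys and values)."""
    pairs = [(key, value) for key in sorted(filters) for value in sorted(filters[key])]
    return urlencode(pairs)


def query_to_filters(query: str) -> dict[str, list[str]]:
    """Parse a query string back into a filter mapping; repeated keys accumulate."""
    filters: dict[str, list[str]] = {}
    for key, value in parse_qsl(query):
        filters.setdefault(key, []).append(value)
    return filters


if __name__ == "__main__":
    query = filters_to_query({"size": ["10", "9"], "color": ["red"]})
    print(query)
    print(query_to_filters(query))
```

Because serializing and parsing round-trip, the filters shown in the UI and
the filters sent to the backend can both be read from the same string.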
        
         | dgb23 wrote:
         | I don't want users to leave the site and use google. That's bad
         | UX and DX at the same time as it only increases friction and
         | uncertainty.
         | 
         | I want them to believe that if it is there, they'll find it
         | with search. But that depends on how good it is and how helpful
         | the feedback is.
        
           | cerved wrote:
           | but if the search isn't good enough I'll just go to Google
           | anyway but now you've annoyed me with your own useless site
           | search, which is what always happens
        
           | Graffur wrote:
           | How is it bad developer experience (DX)?
        
             | dgb23 wrote:
             | We don't want to be even more reliant on optimizing for
             | something we have little to no control over right? Maybe
             | this is more of a mindset/approach thing. But I like making
             | things where I can communicate the guarantees and
             | assumptions with confidence.
             | 
             | Note I'm not saying we shouldn't care about search engines.
             | They are extremely useful and important. I'm saying if you
             | have the type of content that benefits from being navigated
             | via search, then consider direct control over this.
        
           | amelius wrote:
           | Yes, when I'm browsing documentation, e.g. for Python, I
           | don't want to leave the context and type "X Python", where X
           | is what I'm looking for. Also Google searches in that case
           | might give all kinds of irrelevant results, like
           | StackOverflow pages (sometimes useful, but not if I want to
           | read just the documentation), ads, and such. Typing
           | "site:docs.python.org" is also out of the question.
        
           | resonious wrote:
           | I think this very much depends on what kind of service your
           | site provides, but in many cases I think we can argue that
           | the Google fallback is good UX. It means the user doesn't
           | need to figure out and navigate a new/different search system
           | for each service.
           | 
           | If the GP's right and people really do just drop out and use
           | Google every time (even though the target site has a search
           | function), then I think it's hard to argue that it's bad UX.
           | Unless you want to argue that the specific sites' search
           | functions are poorly made.
        
             | dgb23 wrote:
             | > Unless you want to argue that the specific sites' search
             | functions are poorly made.
             | 
             | You lose trust in the search function very quickly, if you
             | gain it at all. Whether yours is effective and helpful does
             | not only depend on the search implementation but also on
             | whether your site actually has the information (that it
             | should or is assumed to have) to be effective and helpful.
             | A lot of factors come into this, representation, content,
             | design both for search results and for the actual target
             | pages.
             | 
             | Now if you have the trust, are helpful and effective, then
             | the UX is drastically improved.
             | 
             | For one, search engine results have been degrading. There
             | is _so_much_spam_ out there. Countless SEO sites that just
             | crawl the web and generate crap output to show you ads,
             | some seem to be handwritten as well. It is increasingly
             | probable to get top results that are just presenting
             | "stolen" content in some form or another. You get low
             | effort blog-like posts that are just restating things they
             | read in a discussion that is actually not the primary
             | source, but a generated site, which is referring to the
             | actual information somewhere else.
             | 
             | Secondly if you have in-built, decent search people will
             | use it and they will be happy for it, because you'll be
             | presenting them much better suggestions in a better way and
             | you'll do it faster.
             | 
             | Think of some of the _best_ sites like MDN. Sure, you might
             | enter the site via DDG or Google (in the former case you'll
             | be searching directly like so: "!mdn [search_term]"). But
             | when you're on the site you'll be happily navigating it via
             | links and search, because it's just a very good site.
             | 
             | Other examples of this are: wikipedia, tailwind,
             | clojuredocs, reactjs, hn.algolia...
             | 
             | They all have rich content, good search, useful links. It's
             | a very effective combination.
        
         | jillesvangurp wrote:
         | It depends. If your main UI involves search, you might want to
         | make sure that it is actually usable, competitive and not
         | embarrassing.
         | 
         | If your main business is selling stuff that users find via your
         | search features, then doubly so of course since you are
         | literally losing cash every time your search ends up not doing
         | its job. Easy to measure too and most eCommerce companies that
         | survive long enough do that obsessively because it shows in
         | their recurring revenue if they don't.
         | 
         | Also, if your competitors have awesome search and you don't,
         | your users might realize and jump ship. If you have content
         | that is great but nobody is finding it, you might want to fix
         | it by allowing them to find it via better search functionality.
         | 
         | If search is actually not critical to your UX or product, then
         | by all means, cut corners. Google will happily redirect users
         | to your competitors as well. Make sure to give them plenty of
         | money for keywords. If you don't, somebody will. Either way,
         | your analytics will tell you how people come to your site and
         | what they do once they get there.
         | 
         | Either way, it's not that hard to build a decent search
         | experience if you know what you are doing. The key point of
         | this article is that many engineers kind of don't know what
         | they are getting into and mess it up.
        
         | severus_snape wrote:
         | I use DuckDuckGo's bangs[0] to search, say Wikipedia (!w, !wf),
         | Word Reference (!wref, !wrfe), Hacker News (!hn). They are
         | incredibly useful and using Google would be a waste of time for
         | such queries. I'm glad these sites implemented search.
         | 
         | [0]: https://duckduckgo.com/bang?q=
        
         | fatnoah wrote:
         | >They'll just go to Google and type in their search followed by
         | terms such as Wikipedia, imdb, Stackoverflow, YouTube,
         | Bandcamp, Amazon, eBay, Yelp... all sites that spent a lot of
         | time on their search and have done quite a decent job. Oh well.
         | 
         | I'm someone who does this, and I do it because I've found that
         | it either works better than a site's specific search and/or
         | it's so much faster to go to Google and search than to try to
         | find the search functionality of the specific site.
        
         | roenxi wrote:
         | I agree. But I do think there is some room to look in to the
         | details. It is possible that these search features are
         | targeted at someone other than the general audience.
         | 
         | And for something like EBay, or Amazon, they'd be mad as
         | hatters to send any search traffic on their website out to
         | Google where competitors are buying ad space. They don't really
         | have a choice, they have to try and implement their own search
         | even if only a small number of people use it. That is likely
         | free money as far as they view the world.
        
         | marginalia_nu wrote:
         | Wikipedia actually doesn't want to be indexed by third party
         | search engines. Mediawiki is a heap of underoptimized PHP
         | garbage so pages are (expensively) rerendered every time you
         | visit them.
        
           | mistrial9 wrote:
           | > underoptimized PHP garbage
           | 
           | rather uncharitable description - that PHP implements dozens
           | of interesting features and recall. It is true that it is a
           | mess, however.
        
         | kiryin wrote:
         | I will very quickly stop using a given website if it doesn't
         | provide a good, on-site search. If I'm looking for information
         | that I know exists on a particular website, I see no reason to
         | involve a third-party search engine in the equation at all.
         | 
         | Granted, a lot of sites out there that provide a search
         | function do not necessarily need one, and would instead benefit
         | from an organized, hierarchical index. With these types of
         | websites, using the search function only adds unnecessary
         | friction.
        
           | ratherbefuddled wrote:
           | Given the state of search functionality built into websites I
           | use often I can only imagine you have a very heavily curated
           | list.
           | 
           | In my experience so many sites have searches that return bad
           | results, restrict your ability to see a list of results due
           | to some poorly thought through typeahead functionality, or
           | simply do not work without third party scripts and cookies
           | that I often resort to "site:<domain>" in ddg or worse,
           | google.
           | 
           | I find hierarchical indexes only of marginal use when looking
           | for information. I can only know where in the hierarchy
           | something is if I'm very familiar with the website - because
           | many things could logically be in more than one place.
        
           | sriram_malhar wrote:
           | Would it bother you if foo.com took your query and sent
           | "query site:foo.com" to google? The results would be foo.com
           | specific.
        
             | hihihihi1234 wrote:
             | I often find this gives better results. Eg I'm _much_
             | likely to find what I'm looking for if I search for
             | "(query) site:reddit.com" than if I use reddit's own search
             | feature.
        
               | hihihihi1234 wrote:
               | Much more likely*
        
           | matheusmoreira wrote:
           | > organized, hierarchical index
           | 
           | Yes!!
           | 
           | So many blogs without a list of posts in chronological order.
           | So many sites without a list of pages. So annoying.
        
           | lobocinza wrote:
           | I agree with you but many sites use a third-party search
           | engine to provide on-site search functionality. e.g.
           | Algolia's DocSearch
        
         | bryanrasmussen wrote:
         | A free text search on your content via a search input is the
         | killer app of a search engine but it is not the only app of a
         | search engine.
         | 
         | A search engine (and its backing index), just like a relational
         | database, is a technology that allows you to build all sorts of
         | solutions on top of it. Nobody says don't use a graph db
         | because Facebook already has the best friend of a friend
         | solution you could hope to find.
        
           | kristopolous wrote:
           | Yeah. Absolutely. There's plenty of merits in the pursuit.
           | 
           | But products are constrained by requirements and most of the
           | time the manager thought search was in there, after a little
           | talk, search was not in there any more.
           | 
           | There's resources, deadlines, budgets, contracts to satisfy,
           | blah blah blah. The insight is this costly feature can
           | usually be shaved from the requirements with a little
           | conversation.
           | 
           | A big secret of "10x" programming is realizing "90%" of the
           | work isn't truly necessary.
           | 
           | But yes, search is totally a fantastic thing to do if you
           | have the space and expertise to do it.
        
         | sharikous wrote:
         | Ironically I use hackernews's search engine a lot
         | 
         | Also Google search is very "fluid". Sometimes, for example for
         | searching documentation, you need something more advanced.
        
         | rdevsrex wrote:
         | Yeah, the key part is IF, you are indexable. In the project I'm
         | working on everything is behind a paywall.
        
           | albertzeyer wrote:
           | Many sites with paywalls still open up their content for
           | search bots.
        
         | sorokod wrote:
         | "Critical" is subjective but:
         | 
         | * Are Google indexes updated frequently enough to capture
         | important changes?
         | 
         | * If you consider certain search terms synonymous, would Google
         | do so too?
         | 
         | * Are Google result rankings compatible with yours?
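
The synonym question is one place where an in-house search gives direct
control. A minimal sketch of query-time synonym expansion follows; the
`SYNONYMS` table and `expand_query` function are hypothetical and only
illustrate the idea, not how any particular engine implements it.

```python
# Hypothetical, hand-maintained synonym table for a shop's own search.
SYNONYMS: dict[str, set[str]] = {
    "sneakers": {"trainers", "running shoes"},
    "couch": {"sofa"},
}


def expand_query(query: str, synonyms: dict[str, set[str]]) -> list[list[str]]:
    """Return one OR-group per query term: the term plus its known synonyms."""
    groups = []
    for term in query.lower().split():
        # Sort the synonyms so the expansion is deterministic.
        groups.append([term] + sorted(synonyms.get(term, ())))
    return groups


if __name__ == "__main__":
    print(expand_query("Red Sneakers", SYNONYMS))
```

Each inner list would be matched as alternatives (OR), and the groups
combined (AND), by whatever index sits behind the search box.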
        
         | bjoernbu wrote:
         | If I think of myself as a user, that is spot on for
         | information/entertainment, but far from the truth for products.
         | The most extreme examples being everyone using "$THING
         | Wikipedia" but nobody I know using "$THING Amazon" in Google
         | search
        
         | tda wrote:
         | I have never ever been able to find anything on Wikipedia with
         | search, except if I know the title of the article I'm looking
         | for. But as I often do want the information from Wikipedia, I
         | just search <search term + wiki>, and it takes me straight to
         | where I want to be, whether I'm using DDG or Google. Works so
         | much better than !w
        
           | chmod775 wrote:
           | Lately I had to scroll down on Google search results a lot to
           | find the relevant Wikipedia article, it often being somewhere
           | below some irrelevant images, followed by a completely
           | unasked for and irrelevant map (why in gods name would I care
           | for where the nearest factory for a product is?), some random
           | blogspam, and ads.
           | 
           | It used to be that you reliably had the Wikipedia article at
           | the top of your results to provide context and basic
           | information in case you didn't know what your search term
           | means, you could _expect_ it to be there if it exists. Now
           | you have to waste mental energy hunting it down, which is a
           | waste if there's no article at all.
        
             | kristopolous wrote:
             | I've lost faith in the ability of the Google approach.
             | 
             | Their results seem to have been turning to trash but then
             | again, so has everyone else's.
             | 
             | There's a few explanations. Easiest one is me, I'm getting
             | dumber with age or have changed my standards. Second one is
             | they are all using similar approaches and SEO'ing has
             | ruined search. Third is Google sets "the standard" and the
             | other engines tweak themselves to follow the goog,
             | regardless of results.
             | 
             | Reality is it's probably all 3 and a few more that I
             | haven't thought of
        
         | sydthrowaway wrote:
         | An interesting example is the search query "<question about
         | anything> reddit".
         | 
         | Without the 'reddit' qualifier, the results are nearly always
         | spammy and useless (as much as modern Reddit tries to compete
         | here notwithstanding)
        
           | EE84M3i wrote:
           | It's pretty often that I do a search, get bad results, then
           | add reddit and get good results, so I hope some data analyst
           | in Google sees those as opportunities to improve.
        
           | abecedarius wrote:
           | Reddit itself has an annoying dark-patterns UI. You have to
           | edit the URL into old.reddit.com to get a reasonable page to
           | read.
        
         | soheilpro wrote:
         | That's not always true.
         | 
         | I run https://volt.fm and search is one of the most used
         | features. If it didn't have built-in search, I doubt any of the
         | users would use Google instead to find the artists/songs they
         | were looking for.
        
           | kristopolous wrote:
           | I'm sure you're right and you've done the analysis. What's
           | the inbound search engine versus your search endpoint as a
           | fraction of each other?
        
           | fnord123 wrote:
           | Even further, any ! on DDG is using the site's search so even
           | if we go to a search engine (e.g. `foo !w`), it's possible
           | that we are using the website's search anyway.
        
         | RicoElectrico wrote:
         | You're totally right. Especially forum search engines are
         | terrible. This is the best bang-for-buck solution.
         | 
         | If someone has any idea why forum search experience is so bad
         | and knows how to improve it, please chime in. I have my hunches,
         | but let's not get ahead of myself.
        
         | mro_name wrote:
         | or do it like the w3c - make a search form with action
         | https://www.w3.org/Help/search?q=xslt.
         | 
         | Privacy may be a thing, however.
        
         | mekster wrote:
         | Are you saying a customer who came from a search engine should
         | go back to it when in need of searching another product on your
         | site?
         | 
         | It's such a bad practice to let the users leave for whatever
         | reason.
         | 
         | Sites put every effort into keeping users from leaving, and
         | this is an anti-pattern.
         | 
         | Also I'd think the site doesn't even have enough budget to put
         | in a search function and the business or the manager has some
         | problem.
        
       | ronvoluted wrote:
       | I won't repeat what others have said about advances in natural
       | language processing since 2017, but it's true that it's a solved
       | problem if your problem isn't "perfect search" but a more
       | realistic "excellent search".
       | 
       | > Use existing technologies first: As in most engineering
       | problems, don't reinvent the wheel yourself. When possible, use
       | existing services or open source tools. If an existing SaaS (such
       | as Algolia or managed Elasticsearch) fits your constraints and
       | you can afford to pay for it, use it.
       | 
       | I work at an AI search company (Relevance AI) and even we see
       | that InstantSearch.js is all some people need in terms of UI. We
       | created a version of it that uses our NLP-backend but is still
       | the same Algolia components on the frontend:
       | https://www.npmjs.com/package/@relevanceai/instant-relevance
       | 
       | The reason was because those components work. Think these days
       | you'd need to ask a lot of questions before completely rolling
       | your own UI or NLP handling.
        
       | andi999 wrote:
       | Good read, misleading title. Should be more like "what you want
       | to know if you seriously need to implement search functionality".
        
         | sne11ius wrote:
         | Yes. If I knew everything I "need" to know according to these
         | articles ... I would know a lot of stuff I never need.
        
       | patwolf wrote:
       | Google ruined search, by which I mean they made it so good that
       | everyone expects search to be as good as Google.
       | 
       | When building an inexpensive app, the client will often ask for
       | search. The UI designer will oblige and add a search bar to the
       | app. Neither will give much thought to what the search will
       | actually do except to say "make it work like Google".
        
         | draw_down wrote:
         | Seeing that search bar in mock-ups is so triggering. It implies
         | 500 follow-up questions about how the search should function in
         | different scenarios, none of which has been considered in depth
         | by the designer nor the product owner. And if you ask them all
         | you'll be considered "difficult".
        
       ___________________________________________________________________
       (page generated 2021-10-18 23:02 UTC)