[HN Gopher] How MDN's Autocomplete Search Works
___________________________________________________________________
How MDN's Autocomplete Search Works
Author : oedmarap
Score : 140 points
Date : 2021-08-03 17:04 UTC (5 hours ago)
(HTM) web link (hacks.mozilla.org)
(TXT) w3m dump (hacks.mozilla.org)
| muxator wrote:
| Now I am curious if, in the real MDN production site, search-index.json
| loading is triggered by the execution of
| /static/js/autocomplete.js, when their download should really be
| started in parallel by the shim.
|
| Many websites leave a lot of performance on the table because of
| such behaviors.
|
| My hypothesis is that, since this is easier for the developer,
| and works well enough, not many people really care. But these
| things add up, and the web becomes slower and slower.
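A rough sketch of the difference muxator describes, assuming a hypothetical `fetchText` helper injected by the caller. The script path comes from the comment above; the index path is an assumption. In a real page the same effect is usually achieved with browser-level hints rather than application code, so treat this only as an illustration of chained versus parallel requests.

```typescript
// Hypothetical helper type: fetches a URL and resolves with its body.
type Fetcher = (url: string) => Promise<string>;

const SCRIPT_URL = "/static/js/autocomplete.js"; // path quoted in the comment
const INDEX_URL = "/search-index.json"; // assumed location of the index

// Chained: the index request is only issued after the script has arrived.
async function loadChained(fetchText: Fetcher): Promise<[string, string]> {
  const script = await fetchText(SCRIPT_URL);
  const index = await fetchText(INDEX_URL);
  return [script, index];
}

// Parallel: both requests are issued immediately, then awaited together.
function loadParallel(fetchText: Fetcher): Promise<[string, string]> {
  return Promise.all([fetchText(SCRIPT_URL), fetchText(INDEX_URL)]);
}
```

With the chained version the total wait is roughly the sum of both downloads; with the parallel version it is roughly the slower of the two.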
| peterbe wrote:
| MDN has 2 search things: 1. client-side only which downloads a
| complete list of all titles. 2. full-text search on everything
| with Elasticsearch.
| sa3dany wrote:
| I remember doing something similar a few years ago, I needed
| autocomplete for a shipping ports field, the data was too big
| though so I ended up using a CSV file in an AWS Lambda function
| that filters based on the selected country and returns a much
| smaller subset. It lazy loaded after the user selected the
| country. To keep response times low I had to do a binary search
| on the raw csv bytes. It felt like I was reinventing databases
| but I liked the idea of it being self contained in a function.
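A minimal sketch of the idea sa3dany describes, not their actual code. It assumes the CSV is newline-delimited, sorted bytewise by its first column (a country code here), and that the whole file is already in memory as a `Uint8Array`. All function names are hypothetical.

```typescript
const NEWLINE = 0x0a;
const COMMA = 0x2c;

// Offset of the start of the line that contains byte `pos`.
function lineStart(buf: Uint8Array, pos: number): number {
  while (pos > 0 && buf[pos - 1] !== NEWLINE) pos--;
  return pos;
}

// Offset of the newline (or end of buffer) that ends the line starting at `start`.
function lineEnd(buf: Uint8Array, start: number): number {
  let end = start;
  while (end < buf.length && buf[end] !== NEWLINE) end++;
  return end;
}

// Compare the first column of the line at `start` with `key`, byte by byte.
// Returns a negative number, zero, or a positive number.
function compareKey(buf: Uint8Array, start: number, key: Uint8Array): number {
  let i = 0;
  while (true) {
    const b = start + i < buf.length ? buf[start + i] : NEWLINE;
    const fieldDone = b === COMMA || b === NEWLINE;
    const keyDone = i === key.length;
    if (fieldDone && keyDone) return 0;
    if (fieldDone) return -1; // column is a proper prefix of the key
    if (keyDone) return 1; // key is a proper prefix of the column
    if (b !== key[i]) return b - key[i];
    i++;
  }
}

// Return every row whose first column equals `key`, without parsing the file.
function findRows(buf: Uint8Array, key: string): string[] {
  const keyBytes = new TextEncoder().encode(key);
  // Lower-bound search over byte offsets: find the first line with column >= key.
  let lo = 0;
  let hi = buf.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    const start = lineStart(buf, mid);
    if (compareKey(buf, start, keyBytes) < 0) {
      lo = lineEnd(buf, start) + 1; // skip past this line
    } else {
      hi = start;
    }
  }
  // Collect the run of consecutive matching lines.
  const decoder = new TextDecoder();
  const rows: string[] = [];
  let pos = lo;
  while (pos < buf.length && compareKey(buf, pos, keyBytes) === 0) {
    const end = lineEnd(buf, pos);
    rows.push(decoder.decode(buf.subarray(pos, end)));
    pos = end + 1;
  }
  return rows;
}
```

Only the lines the search touches are ever decoded, which is presumably what keeps response times low for a large file.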
| ShrigmaMale wrote:
| I like mosra's search, implemented in m.css for magnum. He wrote
| a blog post on it here:
| https://blog.magnum.graphics/meta/improved-doxygen-documenta...
| and you can try it on the magnum docs site:
| https://doc.magnum.graphics/magnum/#search
|
| Fast and can be served from a static site.
| vdm wrote:
| https://react-spectrum.adobe.com/blog/building-a-combobox.ht...
| thinkloop wrote:
| I thought this was going to be about advanced usage of
| <datalist>: https://developer.mozilla.org/en-
| US/docs/Web/HTML/Element/da...
| peterbe wrote:
| <datalist> is awesome! But I find it works better for short
| options. See https://www.peterbe.com/plog/datalist-looks-great-
| on-mobile-...
| theandrewbailey wrote:
| Typos and fuzzy searches cause <datalist> to break.
| winrid wrote:
| We took a similar approach for our documentation search. [0]
|
| You can see the "inverted index" is rendered inline in the page,
| since everything is generated at build time.
|
| When you type something that matches a key in the index, we fetch
| that index key and add it to the results. [1] [2]
|
| Obviously we could do a lot better in terms of relevancy, but
| it's simple and fast.
|
| [0] https://docs.fastcomments.com/
|
| [1] https://docs.fastcomments.com/index-ublJLBnXgz88.json
|
| [2] https://github.com/FastComments/fastcomments-
| docs/blob/main/...
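A small sketch of a build-time inverted index with exact key lookup, in the spirit of what winrid describes. This is not the FastComments implementation; the `Page` shape, the tokenizer, and the function names are assumptions for illustration.

```typescript
// Hypothetical page record produced at build time.
interface Page {
  url: string;
  title: string;
}

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Map each token to the URLs of the pages whose title contains it.
function buildInvertedIndex(pages: Page[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  pages.forEach((page) => {
    const uniqueTokens = Array.from(new Set(tokenize(page.title)));
    uniqueTokens.forEach((token) => {
      const urls = index.get(token) ?? [];
      urls.push(page.url);
      index.set(token, urls);
    });
  });
  return index;
}

// Look up every query token as an exact key and merge the hits, keeping order.
function lookup(index: Map<string, string[]>, query: string): string[] {
  const seen = new Set<string>();
  const results: string[] = [];
  tokenize(query).forEach((token) => {
    (index.get(token) ?? []).forEach((url) => {
      if (!seen.has(url)) {
        seen.add(url);
        results.push(url);
      }
    });
  });
  return results;
}
```

Because lookups are plain key matches, there is no relevancy ranking at all, which matches the trade-off mentioned in the comment: simple and fast, at the cost of result ordering.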
| peterbe wrote:
| Relevancy is the huge game-changer. MDN uses pageviews
| analytics to determine what a "popular" page is.
| winrid wrote:
| Indeed, that's a great idea.
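To illustrate the popularity idea, here is a hedged sketch that filters titles by substring and orders the matches by a popularity score. The `IndexEntry` fields and the scoring scale are assumptions, not MDN's actual index format, and MDN's real matching goes through FlexSearch rather than a plain substring test.

```typescript
// Hypothetical index entry; `popularity` stands in for a pageview-derived score.
interface IndexEntry {
  title: string;
  url: string;
  popularity: number;
}

// Return up to `limit` entries whose title contains the query, most popular first.
function suggest(entries: IndexEntry[], query: string, limit = 10): IndexEntry[] {
  const q = query.trim().toLowerCase();
  if (q === "") return [];
  return entries
    .filter((entry) => entry.title.toLowerCase().indexOf(q) !== -1)
    .sort((a, b) => b.popularity - a.popularity) // filter() returned a copy, so the input is untouched
    .slice(0, limit);
}
```

Ranking by popularity means a short query such as "map" surfaces the pages most readers want first, rather than whichever title happens to match first.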
| ushakov wrote:
| i'm wondering how much kb it loads before ready to search?
|
| update: 144KB for JSON file
|
| a little bit worrying, given their scale and potential bandwidth
| requirements
| ourcat wrote:
| For content like this, it's much easier to download the entire
| search-index.json and run the auto-complete against that.
|
| Rather that than hitting a search endpoint (after typing a
| certain amount of characters).
| simonw wrote:
| Sadly in 2021 adding 140KB to a page isn't a big deal (given
| how heavy the rest of the page probably is) - but it really
| should be.
|
| A large chunk of the world's population still pays a locally-
| expensive rate for mobile bandwidth, and we're increasingly
| leaving them behind - or worse, pushing them into zero-rating
| internet plans which mean they can only use Facebook and
| WhatsApp while avoiding the rest of the web:
| https://en.wikipedia.org/wiki/Zero-rating
| city41 wrote:
| It's only added if the user shows an intent to search. And
| if you want to search, 144kb is a decent price to pay for
| instant search once it's downloaded
| simonw wrote:
| Oh I'd missed that - yeah loading it on-demand the first
| time they attempt to search is a much better strategy.
| hbcondo714 wrote:
| Yeah I would think this file size will increase well over time.
| Maybe a part 2 of the article can go over how updates to the
| file are made when new content is published and possible
| scaling solutions.
| flerovium wrote:
| I can't wait until FlexSearch reaches 1.0.0. Reading the source
| code is like reading great literature.
| peterbe wrote:
| (author here) We're still on FlexSearch 0.6 and the new 0.7 is
| a big refactor. I hope we can upgrade some time.
| earthboundkid wrote:
| I miss the old search that let me narrow things down by category.
| peterbe wrote:
| What do you miss about it? Can you not find what you're looking
| for?
| bityard wrote:
| They hi-jacked the browser's `/` key to focus the field, which is
| something I hate. As a user, I want `/` to bring up Firefox's
| quick search bar, especially when reading documentation.
|
| They should have just had the search field focused automatically
| but that would have done away with their "clever" hack to lazy-
| load the DB containing every page name.
|
| Also, I'm confused, I thought https://mdn.dev/ was the new thing
| because Mozilla was stepping back from MDN. Is it a fork? They
| both carry Mozilla logos, so what's going on there?
| thrdbndndn wrote:
| I knew about the existence of "/", but never figured out why I should
| use this instead of Ctrl+F. What's the difference (other than
| having fewer features)?
| jannes wrote:
| The only difference I know of is that "/" focuses links. So
| when you press return, it loads the link instead of jumping
| to the next result.
|
| It's quite nice for keyboard-only web navigation.
| polar wrote:
| > "/" focuses links.
|
| It's ' to trigger quick find in links only mode.
| kxrm wrote:
| I had the same confusion with his comment but I think
| what he meant was that when you highlight a result in a
| link, pressing enter causes you to follow that link
| (which is true). You are correct that ' focuses on only
| searching within links though.
|
| Enter never goes to the next result though, so I am not
| sure if that is just something different between his
| setup and mine. I have to use F3 to go to the next
| result.
| [deleted]
| kxrm wrote:
| This seems to be a good introduction to Quick Find.
|
| https://www.tenforums.com/tutorials/120679-enable-disable-
| qu...
| thrdbndndn wrote:
| Ok, so the difference is:
|
| 1. It disappears after a few seconds.
|
| 2. It has no "next/previous/highlight all" etc. buttons (it
| still has these features, just no clickable buttons)
|
| It still makes no sense to me.
|
| I guess maybe a small portion of people would find the
| auto-disappearing thing useful, even though in normal
| Ctrl+F all you need to do is press Esc.
|
| But the second "feature" totally baffles me. It's not like
| Ctrl+F is some expensive GUI to launch, why would I want to
| _not_ have these buttons? Even if you don't need them at
| all (I don't), you can simply not click them, there is no
| downside to having them.
| lol768 wrote:
| Does the usual Ctrl+F GUI support filtering down to links
| only?
| mejutoco wrote:
| You can use the single quote character to search only
| links
| est31 wrote:
| Yeah discourse does the same. Sometimes I want to search
| _within_ a post for some keyword. But ctrl+f redirects you to
| the global search... that global search only helps if you want
| to find interesting posts, but it does not support searching
| inside one, nor does it allow limited search within a thread.
| So I started using / in discourse discussions. Then that one
| was being overridden as well. I've heard the recommendation
| that you turn js off, which gives you a saner experience.
| mikepurvis wrote:
| I hate this behaviour in discourse as well, but it hadn't
| occurred to me to try using it sans JS altogether, since it
| seemed to be pretty dependent on it. Will give that a shot
| for sure.
| polar wrote:
| > But ctrl+f redirects you to the global search
|
| Press ctrl-f twice.
| est31 wrote:
| Oh thanks for that trick. It violates the principle of
| least surprise so much but it does what I want. Thanks
| again!
| jraph wrote:
| GitHub and GitLab do this too. Is there a way to prevent web
| pages from hijacking this key? I almost never want to use their
| search engine and when I do, I'm fine with clicking on the
| input box.
| Santosh83 wrote:
| > Also, I'm confused, I thought https://mdn.dev/ was the new
| thing because Mozilla was stepping back from MDN. Is it a fork?
| They both carry Mozilla logos, so what's going on there?
|
| It seems to me that mdn.dev is intended to be the future home
| of MDN web docs since it is collaborative now, and no longer
| exclusively managed by Mozilla. But they haven't actually made
| the transition yet, as any link on mdn.dev points back to the
| old (current) site at developer.mozilla.org
| jxcl wrote:
| Firefox lets you disable keyboard overrides on a per-site
| basis, if that's something you're interested in
|
| Page Info -> Permissions -> Override Keyboard Shortcuts
| peterbe wrote:
| > They hi-jacked the browser's `/` key to focus the field,
| which is something I hate.
|
| You're not the first one to point it out. Please join
| github.com/mdn/yari to raise your voice. It's an Open Source
| project after all.
|
| > They should have just had the search field focused
| automatically
|
| Why? There's a lot of JS to load to make that work. If you
| never need to do a search (e.g. from a Google search) it would
| be a potential waste.
|
| > Also, I'm confused, I thought https://mdn.dev/ was the new
| thing because Mozilla was stepping back from MDN. Is it a fork?
|
| That domain is just an alias we don't currently use. It's still
| the old MDN from Mozilla. No fork.
| daleharvey wrote:
| > Why? There's a lot of JS to load to make that work. If you
| never need to do a search (e.g. from a Google search) it
| would be a potential waste.
|
| Confused by what this comment is meant to say exactly, but
| just in case it's not known already, seems this situation is
| what the autofocus attribute is for @
| https://developer.mozilla.org/en-
| US/docs/Web/HTML/Global_att..., no JS needed
| ushakov wrote:
| imho, they should've opted for CMD/CTRL + K, which Algolia's
| Doc search uses
|
| > They should have just had the search field auto-focused
| automatically but that would have done away with their "clever"
| hack to lazy-load the DB containing every page name.
|
| this would steal away the focus and is not good for
| accessibility (unless you're building a search engine)
| namanyayg wrote:
| In the code snippet they show the `startAutocomplete()` function
| checks for the "started" variable being true; but never actually
| sets it to true.
| peterbe wrote:
| It's pseudo code. The real code is TypeScript React and looks
| very different and it wouldn't serve the article to take
| snippets from that code to explain how it works.
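For readers puzzled by the same point, here is a minimal sketch of a load-once guard in which the `started` flag is actually set. It is not MDN's code (which, per the reply above, is TypeScript React and looks very different). The `createAutocomplete` wrapper and the injected `loadIndex` function are hypothetical.

```typescript
// Hypothetical loader: resolves with the list of page titles.
type IndexLoader = () => Promise<string[]>;

function createAutocomplete(loadIndex: IndexLoader) {
  let started = false;
  let titles: string[] = [];

  // Safe to call from several events (focus, hover, keypress): only the
  // first call triggers the download.
  async function startAutocomplete(): Promise<void> {
    if (started) return;
    // Set the flag before awaiting so a second event that fires while the
    // download is in flight does not start another one.
    started = true;
    try {
      titles = await loadIndex();
    } catch (err) {
      started = false; // allow a retry on the next user interaction
      throw err;
    }
  }

  function getTitles(): string[] {
    return titles;
  }

  return { startAutocomplete, getTitles };
}
```

In a page, `startAutocomplete` would typically be wired to whatever signals an intent to search, so the index is only fetched for visitors who actually use the field.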
| encryptluks2 wrote:
| I think adding search to the HTML standard makes more sense
| overall. The thing I hate about search like this is that they
| don't work with JS turned off (e.g. terminal browser). Why not
| just add a JSON search component to HTML itself?
| mg wrote:
| My favorite autocomplete library is an ancient version of
| bootstrap-typeahead.js by Twitter. A single file with less than
| 400 lines of Javascript. They don't make these anymore :)
|
| I use it everywhere where I need autocompletion. For example on
| the Music-Map:
|
| https://www.music-map.com
|
| I made a fork of the code which is available here:
|
| https://www.gibney.org/0g-typeahead
| ourcat wrote:
| I did my first autocomplete search UI with that library.
|
| These days, due to the rest of the project, I've been using
| Angular and Material's Autocomplete component, which I've found
| very easy to customise for in-memory indexes or hits to a
| remote ElasticSearch 'suggester' proxy endpoint.
| peterbe wrote:
| Getting accessibility right is hard. We very much care about
| that. One of the strong reasons for why we're using Downshift.
___________________________________________________________________
(page generated 2021-08-03 23:00 UTC)