[HN Gopher] Show HN: Self-Hostable Algolia DocSearch Replacement
       ___________________________________________________________________
        
       Show HN: Self-Hostable Algolia DocSearch Replacement
        
       Interactive demo: https://getcanary.dev/docs/cloud/demo  Canary
       works with local search indexes like Pagefind too:
       https://getcanary.dev/docs/local/demo  For both demos, you'll find
       a small "code" tab showing the actual code used to build the search
       UI.  Self-hosting guide: https://getcanary.dev/docs/cloud/self-host
       Would love to hear any feedback!
        
       Author : yujonglee
       Score  : 95 points
       Date   : 2024-10-12 01:55 UTC (21 hours ago)
        
 (HTM) web link (github.com)
        
       | johntopia wrote:
       | How did you manage to keep the size so small :O
        
         | yujonglee wrote:
         | thanks for noticing!
         | 
         | 1. All UI components are written using Lit (lit.dev).
         | 
         | 2. I put a lot of effort into making the components as
         | composable as possible, so you can load only what you need.
         | 
         | For anyone interested, we have a chart here:
         | https://getcanary.dev/docs/why#tiny-components-that-work-any...
        
       | skeptrune wrote:
       | This is sweet. I do think the styling on the component could be a
       | bit cleaner though.
        
         | yujonglee wrote:
         | Thanks! Could you point out any specific part of the UI that
         | you think could be improved?
        
           | techn00 wrote:
           | Opening up the search needs a softer animation. Take
           | https://ui.shadcn.com/ as an example
        
             | yujonglee wrote:
              | got it. I will add some animation to the default
              | `canary-modal` implementation.
              | 
              | just FYI - it's very easy to implement a custom modal
              | component and swap out the default one.
             | 
             | https://github.com/fastrepl/canary/blob/72723b0/js/apps/doc
             | s...
        
       | Onavo wrote:
       | You should add support for tinkerbird, so the index can be
       | statically generated and queried without a backend.
       | 
       | https://github.com/wizenheimer/tinkerbird
        
         | whilenot-dev wrote:
         | Just played around with tinkerbird on Tinkerboard[0]... it
         | doesn't seem to get good results with the provided example
          | data. Why do you think support for it would be worthwhile?
         | 
         | [0]: https://tinkerboard.vercel.app/
        
           | Onavo wrote:
            | Getting good results involves tuning, good models, and
            | well-defined prompts. The demo not implementing good RAG
            | has nothing to do with tinkerbird's vector search
            | performance. I suggest reading up on how the technology works.
        
       | hackernewds wrote:
       | The name Canary is a bit confusing, since a lot of companies
       | already use Canary to indicate symptoms of issues (re: canary in
       | coal mine). However the app doesn't fulfill this need.
       | 
       | I will give it a try, impressive compression
        
         | yujonglee wrote:
         | that's a fair point. I don't think I can rename it at this
         | point, but I'll keep in mind that some people might be confused
         | by the name.
         | 
         | please do try it out, and come to Discord if you want to chat.
        
       | hackernewds wrote:
       | How does it compare to Glean?
        
         | yujonglee wrote:
         | Glean is used for searching the workspace (AFAIK, for internal
         | use). Canary is used for searching technical documentation,
         | GitHub issues, etc., and is intended for the users of the
         | project.
        
           | detente18 wrote:
           | +1 on the github issues. It's very useful to have this on the
           | litellm docs
        
             | yujonglee wrote:
             | nice! lmk if you have any feedback while using it in the
             | litellm docs.
        
       | ij23 wrote:
       | Canary is awesome! we use Canary for our doc search at LiteLLM
       | (you can see it here: https://docs.litellm.ai/docs/)
       | 
        | It's really useful to be able to specify the search space for a
        | specific query (example: Canary lets you search for the query
        | "sagemaker" on our docs or on our GitHub issues)
        
       | jgalt212 wrote:
       | I have to say Algolia is underwhelming (even after all these
       | years). Perhaps I'm using it wrong, but I often more quickly find
       | the comment or story I'm searching for via a targeted search
        | using Google. I should give Bing a try as I've been getting
       | better finance related results there lately--especially when
       | trying to locate ratings and / or other docs related to newly
       | issued securities.
        
         | lnrdgmz wrote:
         | Agreed. I dread having to use Algolia search on documentation
         | these days. The search results feel pretty naively selected,
         | and the UI is pretty poor. I get that people want to deploy
         | static sites, but can we please find a way to bring back search
         | _pages_?
        
           | yujonglee wrote:
           | > I dread having to use Algolia search on documentation these
           | days.
           | 
           | agreed.
           | 
           | > but can we please find a way to bring back search _pages_?
           | 
            | could you please explain what you mean?
        
         | bryanrasmussen wrote:
          | I had to use Algolia in a recent e-commerce solution. I think
          | e-commerce really is the sweet spot for what Algolia offers:
          | quick setup and not much need to mess around with your
          | rankings, etc., with very simple content.
          | 
          | I'm used to Solr and Elasticsearch for most sites I've run,
          | which tend to be information-dense sites where you need to be
          | able to control rankings to get the best results, which HN is
          | much closer to than to an e-commerce site.
        
           | shooker435 wrote:
           | Have you tried Vertex AI Search for Retail?
        
             | bryanrasmussen wrote:
             | no
        
       | TnS-hun wrote:
       | In Firefox the "Search for anything" input does not get focused
       | after opening the search dialog.
        
         | yujonglee wrote:
          | nice catch! just downloaded Firefox to test it :) will fix it
         | shortly
        
       | uwemaurer wrote:
       | looks interesting. there is a typo in the headline ("techincal
       | docs") on https://getcanary.dev/
        
         | yujonglee wrote:
         | thanks, fixed!
        
       | Fire-Dragon-DoL wrote:
       | Does it have the same API? Have been looking for a way to mock
       | the service in development
        
         | yujonglee wrote:
          | No, it uses a different API from Algolia DocSearch.
         | 
         | > Have been looking for a way to mock the service in
         | development
         | 
          | This is what I wanted too! So we have multiple providers,
          | including `canary-provider-mock`, which you can use to mock
          | the service in dev.
         | 
         | example:
         | 
         | https://github.com/noxify/renoun-docs-template/blob/055b54/s...
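
The mock-provider idea in the comment above can be sketched generically. This is not Canary's API (its providers are web components such as `canary-provider-mock`); the `SearchProvider` interface, `MockProvider` class, and `render_titles` helper below are hypothetical names used only to illustrate swapping a real search backend for canned data during development.

```python
from typing import Dict, List, Protocol


class SearchProvider(Protocol):
    """Hypothetical provider interface; not Canary's real provider API."""

    def search(self, query: str) -> List[Dict[str, str]]: ...


class MockProvider:
    """Serves canned documents so the search UI can be built without a backend."""

    def __init__(self, documents: List[Dict[str, str]]) -> None:
        self._documents = documents

    def search(self, query: str) -> List[Dict[str, str]]:
        needle = query.strip().lower()
        if not needle:
            return []
        # Naive substring match on the title; enough for UI development.
        return [d for d in self._documents if needle in d["title"].lower()]


def render_titles(provider: SearchProvider, query: str) -> List[str]:
    """UI-side code depends only on the interface, so providers are swappable."""
    return [hit["title"] for hit in provider.search(query)]
```

Because `render_titles` only depends on the interface, a development build could pass a `MockProvider` and a production build a provider backed by a real search service, with no change to the UI code.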
        
       | simonw wrote:
       | Took me a little poking around to figure out what the underlying
       | search engine was: it's https://typesense.org/ hosted in a Docker
       | container.
        
         | yujonglee wrote:
          | Yes, we initially started with ParadeDB but moved to Typesense
          | for a search-as-you-type experience. We also have an additional
          | layer for query transformation using an LLM, though (only when
          | the query is "question-like").
         | 
         | e.g:
         | 
         | if you go to 'https://docs.litellm.ai/' and search for 'how to
         | limit API cost,' it will map the query to 'budget.'
        
           | simonw wrote:
           | Oh neat, is that this bit? https://raw.githubusercontent.com/
           | fastrepl/canary/c1f03cbbee...
        
       | pjot wrote:
       | Can you talk about how you implemented search-as-you-type? Doing
        | so with semantic search seems tricky given the roundtrips needed
       | to compute embeddings on the fly (assuming the use of OpenAI
       | embeddings)
        
         | yujonglee wrote:
         | sure - implementing a search-as-you-type experience with an ai-
         | powered feature was what i wanted to do as well. it doesn't use
         | embeddings at the moment. when you type a short query like
         | 'openai,' it simply runs a basic query using Typesense.
         | however, if you enter a question-like query, such as 'how to
          | limit api cost,' it transforms it into multiple queries, like
         | 'budget' and 'limit.'
         | 
          | in the self-hosted version, it uses the CHAT_COMPLETION_MODEL
         | env variable for selecting the llm model. in our cloud version,
         | we use a fine-tuned version of 4o-mini that we will eventually
         | move to a smaller model like llama8b or even 1b.
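
A minimal sketch of the routing described in the comment above, under stated assumptions. The thread does not say how Canary decides that a query is "question-like", so the `is_question_like` heuristic, the `plan_queries` function, the `fake_rewrite` stub, and the `"placeholder-model"` default are all hypothetical. Only the `CHAT_COMPLETION_MODEL` variable name and the example queries come from the thread.

```python
import os
import re
from typing import Callable, List

# Hypothetical heuristic word list; the real detection logic is not given
# in the thread.
QUESTION_WORDS = {"how", "what", "why", "when", "where", "which", "who", "can", "does", "is"}


def is_question_like(query: str) -> bool:
    """Return True for queries that read like a natural-language question."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    if not words:
        return False
    return query.strip().endswith("?") or (len(words) >= 3 and words[0] in QUESTION_WORDS)


def plan_queries(query: str, rewrite: Callable[[str, str], List[str]]) -> List[str]:
    """Return the keyword queries to send to the search engine.

    Short keyword queries pass through unchanged so search-as-you-type stays
    fast; question-like queries go through `rewrite`, a stand-in for the
    LLM call.
    """
    if not is_question_like(query):
        return [query]
    # Assumed default; a self-hosted setup would set this variable itself.
    model = os.environ.get("CHAT_COMPLETION_MODEL", "placeholder-model")
    keywords = rewrite(model, query)
    # Fall back to the raw query if the model returns nothing usable.
    return keywords or [query]


def fake_rewrite(model: str, query: str) -> List[str]:
    """Stub in place of a chat-completion call, so the sketch runs offline."""
    return ["budget", "limit"]
```

With this split, a keystroke-level query such as `plan_queries("openai", fake_rewrite)` never touches the model, and only question-like input pays for the extra round trip.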
        
           | pjot wrote:
            | Got it! I saw this in the code and assumed you were using
            | embeddings:
            | 
            |     def evaluate(input: shared.EvaluationInput):
            |         ds = Dataset.from_list(input.dataset)
            |         metrics = [metric_map[metric] for metric in input.metrics]
            |         llm = ChatOpenAI(
            |             model_name=shared.LANGUAGE_MODEL,
            |             base_url=os.environ["OPENAI_API_BASE"],
            |             api_key=os.environ["OPENAI_API_KEY"],
            |         )
            |         embeddings = OpenAIEmbeddings(
            |             model=shared.EMBEDDING_MODEL,
            |             base_url=os.environ["OPENAI_API_BASE"],
            |             api_key=os.environ["OPENAI_API_KEY"],
            |         )
        
       | alexbouchard wrote:
       | Been looking for something like this! Doc search just hasn't kept
       | up with what's possible now and is such a hassle to get the
       | indexing to work properly. Will try it out!
        
         | yujonglee wrote:
          | please let me know how it goes! we have a Discord link in the top
         | navbar: https://getcanary.dev/
        
       | dclowd9901 wrote:
       | Ignorant question: how difficult would it be to integrate this
       | with docusaurus?
        
         | yujonglee wrote:
         | It is not that difficult :)
         | 
         | If you want pagefind-based local search:
         | 
         | Doc: https://getcanary.dev/docs/local/integrations/docusaurus
         | Example PR: https://github.com/microsoft/fast/pull/7031/files
         | 
          | If you want hosted search:
          | 
          | Doc: https://getcanary.dev/docs/cloud/integrations/docusaurus
          | Example PR: https://github.com/BerriAI/litellm/pull/6160/files
        
       ___________________________________________________________________
       (page generated 2024-10-12 23:01 UTC)