[HN Gopher] Show HN: Self-Hostable Algolia DocSearch Replacement
___________________________________________________________________
Show HN: Self-Hostable Algolia DocSearch Replacement
Interactive demo: https://getcanary.dev/docs/cloud/demo Canary
works with local search indexes like Pagefind too:
https://getcanary.dev/docs/local/demo For both demos, you'll find a
small "code" tab showing the actual code used to build the search UI.
Self-hosting guide: https://getcanary.dev/docs/cloud/self-host Would
love to hear any feedback!
Author : yujonglee
Score : 95 points
Date : 2024-10-12 01:55 UTC (21 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| johntopia wrote:
| How did you manage to keep the size so small :O
| yujonglee wrote:
| thanks for noticing!
|
| 1. All UI components are written using Lit (lit.dev).
|
| 2. I put a lot of effort into making the components as
| composable as possible, so you can load only what you need.
|
| For anyone interested, we have a chart here:
| https://getcanary.dev/docs/why#tiny-components-that-work-any...
| skeptrune wrote:
| This is sweet. I do think the styling on the component could be a
| bit cleaner though.
| yujonglee wrote:
| Thanks! Could you point out any specific part of the UI that
| you think could be improved?
| techn00 wrote:
| Opening up the search needs a softer animation. Take
| https://ui.shadcn.com/ as an example
| yujonglee wrote:
| got it. I will add some animation to the default `canary-modal`
| implementation.
|
| just FYI - it's very easy to implement a custom modal
| component and swap out the default one.
|
| https://github.com/fastrepl/canary/blob/72723b0/js/apps/doc
| s...
| Onavo wrote:
| You should add support for tinkerbird, so the index can be
| statically generated and queried without a backend.
|
| https://github.com/wizenheimer/tinkerbird
| whilenot-dev wrote:
| Just played around with tinkerbird on Tinkerboard[0]... it
| doesn't seem to get good results with the provided example
| data. Why do you think support for it would be worthwhile?
|
| [0]: https://tinkerboard.vercel.app/
| Onavo wrote:
| Getting good results involves tuning, good models, and
| well-defined prompts. The demo not implementing good RAG has
| nothing to do with its vector search performance. I suggest
| reading up on how the technology works.
| hackernewds wrote:
| The name Canary is a bit confusing, since a lot of companies
| already use Canary to indicate symptoms of issues (re: canary in
| coal mine). However the app doesn't fulfill this need.
|
| I will give it a try, impressive compression
| yujonglee wrote:
| that's a fair point. I don't think I can rename it at this
| point, but I'll keep in mind that some people might be confused
| by the name.
|
| please do try it out, and come to Discord if you want to chat.
| hackernewds wrote:
| How does it compare to Glean?
| yujonglee wrote:
| Glean is used for searching the workspace (AFAIK, for internal
| use). Canary is used for searching technical documentation,
| GitHub issues, etc., and is intended for the users of the
| project.
| detente18 wrote:
| +1 on the github issues. It's very useful to have this on the
| litellm docs
| yujonglee wrote:
| nice! lmk if you have any feedback while using it in the
| litellm docs.
| ij23 wrote:
| Canary is awesome! we use Canary for our doc search at LiteLLM
| (you can see it here: https://docs.litellm.ai/docs/)
|
| It's really useful to be able to specify the search space for a
| specific query (example: Canary allows search for the query
| "sagemaker" on our docs or on our github issues )
| jgalt212 wrote:
| I have to say Algolia is underwhelming (even after all these
| years). Perhaps I'm using it wrong, but I often more quickly find
| the comment or story I'm searching for via a targeted search
| using Google. I should give Bing a try as I've been getting
| better finance related results there lately--especially when
| trying to locate ratings and / or other docs related to newly
| issued securities.
| lnrdgmz wrote:
| Agreed. I dread having to use Algolia search on documentation
| these days. The search results feel pretty naively selected,
| and the UI is pretty poor. I get that people want to deploy
| static sites, but can we please find a way to bring back search
| _pages_?
| yujonglee wrote:
| > I dread having to use Algolia search on documentation these
| days.
|
| agreed.
|
| > but can we please find a way to bring back search _pages_?
|
| could you please explain what you mean?
| bryanrasmussen wrote:
| I had to use Algolia in a recent e-commerce solution. I think
| e-commerce really is the sweet spot for what Algolia offers:
| quick setup and little need to mess around with your rankings
| etc., with very simple content.
|
| I'm used to Solr and ElasticSearch for most sites I've run,
| which tend to be information-dense sites where you need to be
| able to control rankings to get the best results, which HN is
| much closer to than to an e-commerce site.
| shooker435 wrote:
| Have you tried Vertex AI Search for Retail?
| bryanrasmussen wrote:
| no
| TnS-hun wrote:
| In Firefox the "Search for anything" input does not get focused
| after opening the search dialog.
| yujonglee wrote:
| nice catch! just downloaded Firefox to test it :) will fix it
| shortly
| uwemaurer wrote:
| looks interesting. there is a typo in the headline ("techincal
| docs") on https://getcanary.dev/
| yujonglee wrote:
| thanks, fixed!
| Fire-Dragon-DoL wrote:
| Does it have the same API? Have been looking for a way to mock
| the service in development
| yujonglee wrote:
| No, it uses a different API from Algolia DocSearch.
|
| > Have been looking for a way to mock the service in
| development
|
| This is what I wanted too! So we have multiple providers,
| including `canary-provider-mock`, which you can use to mock
| the service in development.
|
| example:
|
| https://github.com/noxify/renoun-docs-template/blob/055b54/s...
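|
| To illustrate the idea of swapping in a mock provider during
| development, here is a minimal sketch. Canary's providers are web
| components, so this Python only shows the pattern, not Canary's
| API; every name in it (`Hit`, `SearchProvider`, `MockProvider`,
| `render_results`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    title: str
    url: str


class SearchProvider(Protocol):
    """Anything the search UI can ask for results (hypothetical interface)."""

    def search(self, query: str) -> list[Hit]: ...


class MockProvider:
    """Returns canned hits so the UI can be developed without a backend."""

    def __init__(self, hits: list[Hit]) -> None:
        self._hits = hits

    def search(self, query: str) -> list[Hit]:
        needle = query.strip().lower()
        if not needle:
            return []
        # Case-insensitive substring match on the title only.
        return [hit for hit in self._hits if needle in hit.title.lower()]


def render_results(provider: SearchProvider, query: str) -> list[str]:
    """Stand-in for the UI layer: it depends only on the provider interface."""
    return [f"{hit.title} <{hit.url}>" for hit in provider.search(query)]


# Example data is made up; example.com is a placeholder domain.
mock = MockProvider(
    [
        Hit("Budget limits", "https://example.com/docs/budget"),
        Hit("Getting started", "https://example.com/docs/start"),
    ]
)
print(render_results(mock, "budget"))
```

| Because the UI code only sees the provider interface, a real
| backend can replace the mock later without touching it.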
| simonw wrote:
| Took me a little poking around to figure out what the underlying
| search engine was: it's https://typesense.org/ hosted in a Docker
| container.
| yujonglee wrote:
| Yes, we initially started with Paradedb but moved to Typesense
| for a search-as-you-type experience. We also have an additional
| layer for query transformation using an LLM, though (only when
| the query is "question-like").
|
| e.g:
|
| if you go to 'https://docs.litellm.ai/' and search for 'how to
| limit API cost,' it will map the query to 'budget.'
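|
| For readers unfamiliar with Typesense, a search-as-you-type
| request might be built roughly like this. It is a sketch, not
| Canary's code: the collection name, the `title`/`content` field
| names, the API key, and the helper `build_search_request` are
| assumptions. The search endpoint, the `q`, `query_by`, `prefix`
| and `per_page` parameters, and the `X-TYPESENSE-API-KEY` header
| are standard Typesense. Nothing is sent over the network here.

```python
import urllib.parse
import urllib.request


def build_search_request(
    base_url: str, api_key: str, collection: str, typed: str
) -> urllib.request.Request:
    """Build (but do not send) a Typesense search request for partial input."""
    params = {
        "q": typed,                   # whatever the user has typed so far
        "query_by": "title,content",  # hypothetical field names
        "prefix": "true",             # treat the last token as a prefix
        "per_page": "5",              # keep the dropdown short
    }
    url = (
        f"{base_url}/collections/{urllib.parse.quote(collection)}"
        f"/documents/search?{urllib.parse.urlencode(params)}"
    )
    return urllib.request.Request(url, headers={"X-TYPESENSE-API-KEY": api_key})


# Placeholder host, key, and collection; 8108 is Typesense's default port.
req = build_search_request("http://localhost:8108", "dev-key", "docs", "budg")
print(req.full_url)
```

| Issuing one such request per keystroke (usually debounced) is
| what makes prefix matching feel instant.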
| simonw wrote:
| Oh neat, is that this bit? https://raw.githubusercontent.com/
| fastrepl/canary/c1f03cbbee...
| pjot wrote:
| Can you talk about how you implemented search-as-you-type? Doing
| so with semantic search seems tricky given the roundtrips needed
| to compute embeddings on the fly (assuming the use of OpenAI
| embeddings)
| yujonglee wrote:
| sure - implementing a search-as-you-type experience with an ai-
| powered feature was what i wanted to do as well. it doesn't use
| embeddings at the moment. when you type a short query like
| 'openai,' it simply runs a basic query using Typesense.
| however, if you enter a question-like query, such as 'how to
| limit api cost,' it transforms it into multiple queries, like
| 'budget' and 'limit.'
|
| in the self-hosted version, it uses the CHAT_COMPLETION_MODEL
| env variable for selecting the llm model. in our cloud version,
| we use a fine-tuned version of 4o-mini that we will eventually
| move to a smaller model like llama8b or even 1b.
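|
| The routing described above could be sketched as follows. This is
| an illustration under assumptions, not Canary's implementation:
| the `is_question_like` heuristic, the `plan_queries` helper, the
| word list, and the placeholder default model name are all
| hypothetical, and the LLM call is replaced by a stub. Only the
| CHAT_COMPLETION_MODEL variable name comes from the comment above.

```python
import os
import re
from typing import Callable

# Model name is read from the env variable; the default is a placeholder.
MODEL = os.environ.get("CHAT_COMPLETION_MODEL", "placeholder-model")

QUESTION_WORDS = {"how", "what", "why", "when", "where", "which", "who", "can", "does", "is"}


def is_question_like(query: str) -> bool:
    """Hypothetical heuristic: ends with '?', or has 3+ words and starts with a question word."""
    if query.strip().endswith("?"):
        return True
    words = re.findall(r"[a-z0-9']+", query.lower())
    return len(words) >= 3 and words[0] in QUESTION_WORDS


def plan_queries(query: str, rewrite: Callable[[str, str], list[str]]) -> list[str]:
    """Return the keyword queries to send to the search engine."""
    cleaned = query.strip()
    if not cleaned:
        return []
    if not is_question_like(cleaned):
        return [cleaned]  # short keyword query: search it as typed
    keywords = rewrite(MODEL, cleaned)
    # Fall back to the raw query if the rewriter returns nothing.
    return keywords or [cleaned]


def fake_rewrite(model: str, question: str) -> list[str]:
    """Stub standing in for the LLM call; always returns the same keywords."""
    return ["budget", "limit"]


print(plan_queries("openai", fake_rewrite))
print(plan_queries("how to limit api cost", fake_rewrite))
```

| Keeping the rewrite step behind a plain callable means the
| keyword path never waits on a model, which is what keeps short
| queries fast.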
| pjot wrote:
| Got it! I saw this in the code and assumed you were using
| embeddings
|
|     def evaluate(input: shared.EvaluationInput):
|         ds = Dataset.from_list(input.dataset)
|         metrics = [metric_map[metric] for metric in input.metrics]
|
|         llm = ChatOpenAI(
|             model_name=shared.LANGUAGE_MODEL,
|             base_url=os.environ["OPENAI_API_BASE"],
|             api_key=os.environ["OPENAI_API_KEY"],
|         )
|
|         embeddings = OpenAIEmbeddings(
|             model=shared.EMBEDDING_MODEL,
|             base_url=os.environ["OPENAI_API_BASE"],
|             api_key=os.environ["OPENAI_API_KEY"],
|         )
| alexbouchard wrote:
| Been looking for something like this! Doc search just hasn't kept
| up with what's possible now and is such a hassle to get the
| indexing to work properly. Will try it out!
| yujonglee wrote:
| please let me know how it goes! we have a Discord link in the top
| navbar: https://getcanary.dev/
| dclowd9901 wrote:
| Ignorant question: how difficult would it be to integrate this
| with docusaurus?
| yujonglee wrote:
| It is not that difficult :)
|
| If you want pagefind-based local search:
|
| Doc: https://getcanary.dev/docs/local/integrations/docusaurus
| Example PR: https://github.com/microsoft/fast/pull/7031/files
|
| If you want hosted search:
|
| Doc: https://getcanary.dev/docs/cloud/integrations/docusaurus
| Example PR: https://github.com/BerriAI/litellm/pull/6160/files
___________________________________________________________________
(page generated 2024-10-12 23:01 UTC)