[HN Gopher] MeiliSearch: A Minimalist Full-Text Search Engine
___________________________________________________________________
MeiliSearch: A Minimalist Full-Text Search Engine
Author : crecker
Score : 264 points
Date : 2021-08-15 10:30 UTC (12 hours ago)
(HTM) web link (tech.marksblogg.com)
(TXT) w3m dump (tech.marksblogg.com)
| mrweasel wrote:
| Can you really use the number of lines of code in a comparison of
| MeiliSearch and ElasticSearch, or Sphinx-Search?
|
| Admittedly, I'm not the biggest fan of ElasticSearch; it's way too
| complex to manage and interact with if you just need to add
| search to a product. However, ElasticSearch is also much more than
| just a search engine. I would never use Bleve or Sphinx as a
| primary data store, but ElasticSearch is a perfectly good
| document database.
| rmetzler wrote:
| I would think it's just a rough indicator of complexity.
| Elasticsearch has a lot of features, probably many more than
| MeiliSearch. This can be good or bad, depending on what you're
| looking for.
| ComputerGuru wrote:
| If comparing the same language, _maybe_. But the amount of
| boilerplate differs so much between C/C++, Rust and Java that
| it doesn't work. Even then, some teams prefer more
| dependencies and others fewer.
| catmanjan wrote:
| >ElasticSearch is a perfectly good document database.
|
| I recently asked about this and people replied that it wasn't
| fit for this purpose
| mrweasel wrote:
| It might depend on the amount of data. We used it to store
| 150-200 GB of product data, and at that scale it was completely
| fine, just hard to manage.
| Cyberdog wrote:
| In my experience, it would be possible to use it as a
| document database, and I suppose it would be good in the
| interest of reducing duplication issues if you were initially
| storing the documents in a traditional DBMS or file system.
|
| However, that's not really what it was made for. Especially
| early on when you're planning out your schema and such,
| dropping and re-indexing your documents is a really simple
| task. If the index itself is your primary document store,
| what are you indexing from? Would you have a DBMS or file
| system as your secondary store in that case? That just seems
| so awkward and backwards.
|
| Keep the square pegs in the square holes and use Elastic (and
| the alternatives discussed in this thread) as a search index.
| lloydatkinson wrote:
| Seems like good timing with the ongoing elastic drama
| jatins wrote:
| what's the ongoing elastic drama?
| AkshitGarg wrote:
| Not sure, but I think they are talking about the recent
| licensing changes of the elasticsearch server [0] and
| restricting elasticsearch clients to only be compatible with
| elasticsearch, not any forks [1]
|
| [0]: https://www.elastic.co/pricing/faq/licensing
|
| [1]: https://news.ycombinator.com/item?id=28110610
| softinio wrote:
| Pretty awful, this:
| https://thenewstack.io/this-week-in-programming-the-elastics...
| vtail wrote:
| What are some of the advantages of using MeiliSearch as opposed
| to, say, FTS5 in a SQLite database?
| 1vuio0pswjnm7 wrote:
| "MEILI_NO_ANALYTICS=1"
|
| Looks like it was designed with the user in mind. Telemetry by
| default.
| freediver wrote:
| MeiliSearch is minimalist, fast and easy to deploy (a few minutes
| to get it up and running). I am using it to power full-text
| search at TinyGem.
|
| https://tinygem.org
|
| It is great for getting a concept in place. For more advanced use
| (mainly indexing and search features) I also evaluated
| Typesense, which didn't win me over as a product. I have not tried
| Algolia because of the perception that it is heavier and paid from
| the get-go.
| h1fra wrote:
| This smells like a sponsored post...
| hdjjhhvvhga wrote:
| So? MeiliSearch is open source, so all the better.
| berkes wrote:
| Why does it smell like that to you?
| crecker wrote:
| It's not mine, but I found it interesting.
| philmcp wrote:
| MeiliSearch also works a 4-day week, which is pretty cool
|
| https://4dayweek.io/company/meilisearch/jobs
| ilrwbwrkhv wrote:
| This site is a goldmine.
|
| Finally companies which work in a far more human way.
| chairmanwow1 wrote:
| This is a pretty interesting format for a blog post. I'm not sure
| I've really seen something like this before, but I really enjoyed
| this one~
| spapas82 wrote:
| I couldn't find it from a quick search... Do you know if this
| tool supports non-English languages (specifically Greek)? Also,
| any idea if it supports stemming for these? I.e. I would like
| to search for skulos (= dog) and get documents containing skulon
| (= dogs).
| null_deref wrote:
| In my journey to learn Rust I wanted someone to critique my
| Rust skills, so I searched for a 'good first issue' on GitHub. By
| writing a PR in Rust I could fulfil two dreams of mine: 1. learn
| Rust; 2. have code that I wrote run on a large number of
| devices across the world.
|
| Eventually I ended up on the MeiliSearch repo and fixed an
| interesting bug, and I must say that the maintainers were super
| nice throughout the whole process. A couple of months after my
| contribution they sent handwritten letters and a bunch of
| stickers to all of the project contributors, one of the nicest
| interactions I've ever had on the internet (ironically, the first
| PR I wrote that got accepted involved one line of CSS, which is
| a field I'm already proficient in).
| atonse wrote:
| I have wanted to play around with Rust literally for years and
| just haven't, because I've been intimidated by how strict the
| compiler is (this after 20 years of programming). I just need
| to sit down and get on with it. It's going to be an incredible
| add-on to my toolbox of skills, especially alongside Elixir, my
| usual go-to these days.
| stavros wrote:
| The compiler isn't so much strict in a pedantic sense. It
| more nudges you to avoid bugs, in a "you may want to rethink
| this" way. It's actually quite nice to have someone point out
| potential bugs in your code.
| option_greek wrote:
| The compiler (or in reality the borrow checker) is strict, but
| once the program does compile, it is usually more defect-free
| (even functionally) than one written in other languages.
| After a few months writing Rust, I recently had to go back
| and use JavaScript for another project (I know, drastically
| different worlds), and my own code gave me micro panic attacks
| thinking about all the ways things can go wrong in it (I
| guess I'll probably end up using TypeScript instead in
| future).
|
| Overall well worth the time spent learning Rust as I feel it
| makes me a better programmer overall enforcing the thinking
| about lifetimes, return values, shared data and thread
| safety.
| wongarsu wrote:
| I think people make the Rust compiler out to be much more
| scary than it is. Sure, it's much more strict than say
| Python, but compared to statically typed languages like C++ or
| C# it's not that different in 99% of your code. People just
| talk about that other 1% a lot.
| moralestapia wrote:
| If you're in the market for lightweight but fast search engines,
| I would recommend you take a look at Typesense [1] instead, or
| even Sonic [2] if it fits your use case. MeiliSearch does not
| give you anything on top of them (i.e. it's neither as
| feature-complete as [1] nor as fast as [2]).
|
| And I personally stopped using them after a really bad experience
| I had with their "developers". They don't really care about you
| and it shows; also, they were kind of rude when I reported some
| bugs to them.
|
| I moved to Typesense and it's a whole different world; its
| creators truly enjoy that you're using their product. Same thing
| with Sonic: Valerian is the kind of hacker you'd want as a
| friend, super talented, super easygoing. You could ask a
| completely dumb question on their GH and he'd take the time to
| explain things to you at length. I know it's open source, I know I
| didn't pay a dime, but for me, that kind of attitude makes or
| breaks it. Plus, you actually get a superior product.
|
| 1: https://typesense.org/
|
| 2: https://github.com/valeriansaliou/sonic
| karterk wrote:
| Thank you for your kind words. Made my day. Typesense is 100%
| bootstrapped, and a labor of love[0]. We will certainly do our
| best to keep making it better.
|
| [0]: https://typesense.org/blog/the-unreasonable-effectiveness-of...
| jabo wrote:
| Echoing what @karterk said.
|
| One of my favorite parts of working on Typesense is the
| opportunity to interact with so many developers from around
| the world, getting to know about the product and domain they
| are working on, their tech stacks and how Typesense fits into
| their world. I find these interactions enrich my own worldview
| and help me build valuable context as we design new features.
| I've sometimes been blown away by how the foundational construct
| of a fast, distributed search engine is being used for use cases
| I could not have even imagined!
| ternaryoperator wrote:
| This looks very nice. Do you foresee a downloadable version
| for Windows like the Mac and Linux versions (i.e., not as a
| Docker container)?
| jabo wrote:
| We mainly haven't invested time in this because we've heard
| that some folks were able to get the Linux binary working
| on Windows using WSL. Does that work?
| qdequelen wrote:
| Hello, as CEO of MeiliSearch, I'm really sorry, on behalf of the
| whole team, if we did not satisfy you when solving one of your
| bugs. I don't know exactly which bug you are referring to, but in
| any case, we try to answer our users and contributors with
| maximum transparency and love.
|
| That said, we are certainly a little behind on some features, a
| delay we will more than make up for before the end of the year.
| Our priority until now has been to offer a robust search engine
| accessible to all. For us, developer experience is really
| important, whether in the use of the API or in communication
| with the community.
|
| We will continue to try to do our best for the community. If
| you want to help us improve, I would be happy to hear your
| feedback.
| aarondf wrote:
| You chose to respond with kindness and humility here when you
| could've been really hostile and defensive. Really nice to
| see. You've made me like MeiliSearch even more!
|
| Keep up the good work.
| Cyberdog wrote:
| How well do any of these alternatives work for doing an
| automatic "More like this" list? I implemented this in
| Elasticsearch for a client (although it's been so long that I
| don't recall the specifics of how it works) and as much as I'd
| like to move away from Java stuff if possible, it'd be a non-
| starter if I can't replicate that in the new system.
| aidanhs wrote:
| I've never used or looked at Typesense (I've been perfectly
| happy as a Meilisearch user), but your characterisation of
| interacting with Meilisearch is so alien it makes me wonder if
| we've been looking at the same project.
|
| Across the assortment of Meilisearch repositories, I've raised
| two PRs (one accepted, one rejected), five issues, one feature
| request and pinged one issue for an update.
|
| Every single time the Meilisearch team has been responsive,
| communicative and generally a delight to interact with - there
| are very few projects I would consider better.
|
| Just thought I'd throw in my experience.
| agucova wrote:
| I can back this completely. I'm working on a search engine for
| government transparency records with an NGO and Typesense
| really solved most of our problems with MeiliSearch, and our
| experiments with Sonic have been pretty good too.
| moralestapia wrote:
| >[...] a search engine for government transparency records
| with an NGO
|
| What? Are you me? Haha.
|
| Check your email, brother!
| Labo333 wrote:
| How do disk and RAM usage compare to Elasticsearch and
| Typesense?
|
| That's information that is typically missing, yet very important!
| Kerollmops wrote:
| As we use LMDB rather than RocksDB, we use a lot less real memory
| than key-value stores that use a user-side cache system. LMDB
| is memory-mapped and therefore lets the OS manage memory for it.
| Typesense uses RocksDB, and ElasticSearch a custom key-value
| store used internally by Lucene.
|
| The real advantage of LMDB is that it is a B-tree: key-value
| pairs are ordered and need no computation when retrieved. That
| is not the case for an LSM-tree key-value store like RocksDB,
| which needs to merge/compact pages of key-value pairs before it
| can return them to you. That wastes CPU when the search engine
| needs its CPU for unions/intersections...
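|
| A toy Python illustration of that difference, not RocksDB's or
| LMDB's actual code: an LSM-tree read must merge several sorted
| runs (newest copy of a key wins, tombstones hide deleted keys)
| before answering, while an ordered structure can be searched
| directly. The run layout, the `TOMBSTONE` marker and the sorted
| list standing in for a B-tree are assumptions for illustration.

```python
import bisect
import heapq

TOMBSTONE = None  # hypothetical deletion marker for this sketch


def lsm_scan(runs):
    """Merge sorted runs (index 0 = newest) into one ordered view.

    Each run is a list of (key, value) pairs sorted by key, with unique
    keys per run. The newest run wins on duplicates; tombstones hide keys.
    """
    # Tag entries with their run's age so the newest copy sorts first.
    tagged = (
        [(key, age, value) for key, value in run]
        for age, run in enumerate(runs)
    )
    result = []
    last_key = object()  # sentinel that never equals a real key
    for key, _age, value in heapq.merge(*tagged):
        if key == last_key:
            continue  # older copy of a key already resolved
        last_key = key
        if value is not TOMBSTONE:
            result.append((key, value))
    return result


def ordered_range(sorted_keys, values, lo, hi):
    """Stand-in for a B-tree range read: two binary searches, no merging."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return list(zip(sorted_keys[start:end], values[start:end]))
```

| The merge loop is the extra per-read work the comment refers
| to; real LSM engines also do it in the background via compaction.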
|
| Another advantage of LMDB is that it returns a view of the
| entries directly into the DB itself. RocksDB can't, as it must do
| operations on the entries before returning them to the library
| user, for example decompressing or compacting the values.
| Rochus wrote:
| > _and lives as a 35 MB binary when installed ... it's made up
| of 7,600 lines of Rust_
|
| Wow; how on earth can this blow up to 35 MB? For comparison: the
| CrossLine standalone exe
| (http://software.rochus-keller.info/CrossLine_win32.zip) with
| built-in https://github.com/rochus-keller/Fts and SQLite (all
| written in C/C++) is less than 7 MB. Where do the other ~30 MB
| come from?
| berkes wrote:
| Rust uses static linking (by default). So everything and the
| kitchen sink is compiled in.
|
| https://stackoverflow.com/a/29008355
| Rochus wrote:
| The same applies to my example; the exe includes a statically
| linked version of Qt as well as all the other stuff
| mentioned. So the question remains.
| ComputerGuru wrote:
| Binaries should also be stripped before comparing file size.
| Rochus wrote:
| That might be an explanation for the huge size;
| unfortunately we don't know whether the author used a
| stripped version or not.
| generalizations wrote:
| Apparently stripped binaries aren't the common experience,
| so I don't see why they should be used for comparison.
| ComputerGuru wrote:
| Debug info size depends on the language and the compiler.
| Binary packages installed via package managers are also
| typically stripped. There are too many confounding
| variables; simply running `strip foo` before comparing
| levels the playing field.
| mark_l_watson wrote:
| I haven't tried MeiliSearch, but I spent a little bit of time
| this morning looking at the code. Maybe off topic, but Rust
| really is a nice language to read. I wanted to learn another non-
| Lisp language, and after a few evenings of playing with Rust, I
| settled on Swift for a few small side projects. I slightly regret
| that decision, but both languages fill the same application space
| for me.
| softinio wrote:
| I am really impressed with Swift. I think it's good to have both
| in your toolbelt if you have the time for it.
| xvilka wrote:
| Rust is truly cross-platform, while Swift isn't (even though
| it was initially announced to be).
| mikevm wrote:
| After looking at various alternatives, I'm thinking of trying out
| https://vespa.ai/
| leetrout wrote:
| A reminder about Xapian, which the author did not include (it is
| only a library).
|
| https://xapian.org/
| rvz wrote:
| As seen previously in another post about MeiliSearch, and after
| reading an extensive comparison in [0], I'm sorry, but I'm not
| convinced by it yet, as it is extremely limited and immature.
|
| The only argument being made here is that it is
| 'written in Rust'.
|
| Just use something production-ready like Typesense. [0]
|
| [0] https://typesense.org/typesense-vs-algolia-vs-elasticsearch-...
| ledoublegui wrote:
| Hi, rvz! (Product team member of MeiliSearch here.) This
| comparison table is not accurate and contains incorrect
| information about MeiliSearch.
|
| However, today Typesense indeed has more features than
| MeiliSearch. After a long period of refactoring the engine's
| source code, we now have a solid base to welcome new
| features/improvements, and we hope to evolve quickly to solve
| many more search use cases.
|
| For Q3, we plan to add two new features: sort-by and geo-search.
| Geo-search will come out as a first iteration allowing you to
| sort documents by distance from a geographical point and to
| filter documents within a circle. We will also further improve
| indexing speed (yes, again, because we can do better) and
| support two new formats for data indexing (CSV and NDJSON).
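|
| To make the two geo operations concrete, here is a minimal Python
| sketch of sorting by distance from a point and filtering within a
| circle, using the haversine great-circle formula. This is not
| MeiliSearch's implementation; the `lat`/`lng` document fields and
| the 6,371 km mean Earth radius are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to guard against floating-point drift just above 1.0.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def geo_search(docs, lat, lng, radius_m=None):
    """Sort docs by distance from (lat, lng); if radius_m is given, keep
    only docs inside that circle."""
    with_dist = [(haversine_m(lat, lng, d["lat"], d["lng"]), d) for d in docs]
    if radius_m is not None:
        with_dist = [(dist, d) for dist, d in with_dist if dist <= radius_m]
    with_dist.sort(key=lambda pair: pair[0])
    return [d for _, d in with_dist]
```

| For example, searching around Paris with `radius_m=360_000` keeps
| London (roughly 344 km away) but drops Lyon (roughly 392 km).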
|
| For Q4, we plan to add high availability and solve the multi-
| tenancy use-case.
|
| That is just a preview of the upcoming features we are already
| working on.
|
| The end of the year will be rich in evolution for MeiliSearch.
| We are looking forward to seeing you enjoy using MeiliSearch
| one day!
| jabo wrote:
| Hey Meili team, I work on Typesense and I'm the one who put
| that comparison matrix together. My intention was to provide
| as much factual information as possible based on my reading
| of each search engine's documentation. This is one reason I
| stuck to a feature by feature comparison rather than an
| opinion based comparison for this particular page.
|
| So if you see anything that's wrong in the matrix, I
| apologize. Please do let me know which items are wrong and I
| would love to correct them.
| ledoublegui wrote:
| Hi jabo! Thank you very much. That is very kind of you.
| It's always difficult to put together this kind of matrix
| when you can't necessarily know all the technical details
| and features at the level of detail you wanted to
| emphasize. I'll get back to you with the list of points we
| think were misunderstood in a couple of days! Which medium
| do you prefer?
| jabo wrote:
| Sounds good! Email would be great: jasonb at typesense
| d0t org
| clon wrote:
| Your plans seem spot on for our needs, especially regarding
| geo search! HA and multi tenancy are a must-have for our use
| case, though. Would it be possible to have a single large
| highly available index with tenants of wildly different
| sizes?
|
| I will surely keep a close eye on your developments. Thank
| you!
| yewenjie wrote:
| They have another prototype engine with more advanced features
| and performance too.
|
| https://github.com/meilisearch/milli
| Kerollmops wrote:
| Hey, this is no longer a prototype; it is the internal engine
| under MeiliSearch. I forgot to update the README :)
| [deleted]
| dawnerd wrote:
| I'm using it in production for https://opencoaster.com (very WIP
| site). It's fast.
|
| Their team is fairly responsive to bugs, but I had one negative
| experience when trying to help them fix their instantsearch lib.
| It fetched as many pages as you had set for max pages, all at
| once, and would re-query them on every pagination, a huge waste
| of data transfer. They refused to see the problem, so I made a
| private fork just to get it working, but as far as I know that's
| still a bug.
|
| I need to upgrade the engine itself, but it looks like they've
| added the ability to upgrade without losing all the data. Losing
| the data on upgrades was frustrating but understandable.
|
| Overall, I'm very impressed by how stable it is.
| ledoublegui wrote:
| Hi dawnerd! Sorry to hear that, do you have the issue link so
| that we can take a look at it?
|
| Based on the information I can read here, I think it comes
| from the fact that, for response-time reasons, the engine cannot
| give an exhaustive count of the records matching the query. A
| finite pagination style (with page numbers) on the client side
| is, for now, a pure workaround.
|
| From what I understand, some of our users try to use
| MeiliSearch as a primary datastore or expect classic finite
| pagination as in a SQL database environment, whereas we are here
| to solve search relevancy problems.
|
| Ideally, the search results should be relevant enough that
| end users don't have to click another page-selector button;
| that's why we advocate pagination without page-number
| selection: infinite scroll or prev/next.
|
| Happy to discuss this further with more context!
|
| Thank you for your feedback :)
| dawnerd wrote:
| Yeah, having a cap on the number of results is fine. The problem
| is when it queries for every item at once. I've tested on
| large datasets, and my patched version of instantsearch has no
| performance problems across 100 pages with 30 items per page.
| With the unpatched version, every time you clicked next page it
| would request maxPages * perPage hits, starting from index 0.
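|
| A minimal Python sketch of the difference described above,
| assuming a generic offset/limit search call. The `search` function
| is a hypothetical stand-in, not the instant-meilisearch or
| MeiliSearch API. The buggy pattern fetches `maxPages * perPage`
| hits from index 0 on every page change; the fix requests only the
| current page's window.

```python
def buggy_page_request(page, per_page, max_pages):
    """The pattern described: always fetch the whole window from 0."""
    # `page` is ignored, so every page change re-downloads everything.
    return {"offset": 0, "limit": max_pages * per_page}


def page_request(page, per_page):
    """Fetch only the hits for one (1-based) page."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {"offset": (page - 1) * per_page, "limit": per_page}


def search(hits, offset, limit):
    """Hypothetical stand-in for an offset/limit search endpoint."""
    return hits[offset:offset + limit]
```

| With 100 pages of 30 hits, the buggy request asks for 3,000 hits
| per click; the windowed request asks for 30.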
|
| I'm not using it as a primary data store.
|
| https://github.com/meilisearch/instant-meilisearch/issues/18
| ledoublegui wrote:
| Thanks for your answer dawnerd. I will take a look at it
| with fresh eyes, we may have missed something :)
| cpach wrote:
| Looks very interesting! Anyone here who's tried MeiliSearch?
| ushakov wrote:
| I tried adding search to my markdown blog
|
| MeiliSearch doesn't strip HTML tags, and I had to do that
| manually before adding posts to the index.
| berkes wrote:
| I've run into the same "strip tags" issue. Having used ES
| before, which does sanitizing and stripping for you, I was
| disappointed at first.
|
| However, after thinking about it more, I wrote up this
| issue[0] with some ideas and thoughts so I could either
| implement it as a PR or work around it.
|
| I ended up working around it, because that makes the most sense:
| separation of concerns. MeiliSearch should indeed not get
| involved in stripping or fixing HTML, as that i) ties Meili to
| HTML, ii) requires configuration and complexity to allow
| control, and iii) adds features that become security-critical.
|
| Indeed, my solution is to sanitize, clean and strip HTML
| before sending into the index.
|
| [0]: https://github.com/meilisearch/MeiliSearch/issues/1409
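|
| A minimal sketch of the "strip before indexing" approach, using
| only Python's standard-library `html.parser`. It extracts text;
| it is not a sanitizer and does not make HTML safe to render. The
| skipped tags (`script`, `style`), the block-tag list and the
| document shape in `to_search_document` are assumptions.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> bodies."""

    SKIP = {"script", "style"}
    # Block-level tags get a space so words in adjacent blocks don't merge.
    BLOCK = {"p", "div", "br", "li", "ul", "ol", "pre", "blockquote",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc.
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Collapse whitespace runs left behind by removed tags.
    return " ".join("".join(parser.parts).split())


def to_search_document(post_id, title, body_html):
    """Hypothetical document shape: only plain text goes into the index."""
    return {"id": post_id, "title": title, "content": strip_html(body_html)}
```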
| ledoublegui wrote:
| Hi berkes! (Guillaume from the MeiliSearch team here) I'm
| glad to see you were able to implement a solution for your
| project ;)
| bizzleDawg wrote:
| I'm also using it for a plant species search on hedira.io, it's
| been great for the past 6 months or so, even for a more complex
| faceted search setup. I switched from Algolia (which was easy
| due to instantsearch integration) and have no regrets.
| cies wrote:
| Yes. Small private project. It's quite fast. Its query
| interface is REST+JSON and now has an OpenAPI v3 spec; that said,
| some of the query syntax is embedded in strings, so there you
| are still on your own.
|
| I found the default order of results a bit off. Near-matches
| were ranked above exact matches.
|
| I'm looking for a full-text, typo-tolerant search tool that
| integrates well with Hasura+PG.
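|
| For readers wondering what "typo-tolerant" means in practice, here
| is a minimal Python sketch: a query word matches an indexed word
| if their Levenshtein (edit) distance fits a budget that grows with
| word length. The thresholds (no typos under 5 characters, one up
| to 8, two from 9) are illustrative assumptions. Real engines use
| much faster structures, such as Levenshtein automata, instead of
| this brute-force loop.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insert/delete/substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter word in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def allowed_typos(word):
    # Illustrative thresholds, not any engine's documented defaults.
    if len(word) < 5:
        return 0
    if len(word) < 9:
        return 1
    return 2


def typo_tolerant_matches(query_word, vocabulary):
    """Return indexed words within the typo budget, closest first."""
    budget = allowed_typos(query_word)
    scored = []
    for word in vocabulary:
        distance = levenshtein(query_word, word)
        if distance <= budget:
            scored.append((distance, word))
    return [word for _, word in sorted(scored)]
```

| For example, `typo_tolerant_matches("serch", ["search", "starch"])`
| keeps only "search", because "starch" is two edits away.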
| aidos wrote:
| Also interested in the Hasura+PG options. Have you found
| anything interesting so far? At the moment I'm stringing
| together a few LIKE clauses, which mostly does the job for my needs.
| cies wrote:
| Nope, sadly. I want proper full-text search with typo
| tolerance, like Meili and co. provide, but it comes down to
| my own integration.
|
| There's no nice integration like hasura-backend-plus, which
| combines Hasura with MinIO/S3 and an authentication service.
| adrianvincent wrote:
| Yes, I use it for https://www.comparedial.com/
|
| It's ridiculously easy to use and has faceted search for my
| needs. However, there are some limitations so I have to use it
| in combination with redis, but the developers have a roadmap to
| fix these problems.
| axhl wrote:
| What's your underlying data store, and how do you find the
| experience of continuously synchronising it with MeiliSearch?
| adrianvincent wrote:
| I use postgres as the data store.
|
| Synchronising with MeiliSearch is a bit of an effort
| because of the following limitations:
|
| * When filtering by facet, it doesn't provide counts for
|   disjunctive facets
| * No sort by
| * No where clause (less than 50, for example)
|
| To overcome these problems, I rebuild some parts of the
| database in redis, use code for filtering and query
| MeiliSearch multiple times for different facet counts.
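|
| A minimal Python sketch of the "query multiple times" workaround,
| run over an in-memory list instead of real MeiliSearch calls. The
| document shape and the `filter_docs` helper are hypothetical. For
| disjunctive facets (OR within a facet, AND across facets), the
| count shown next to each value must ignore that facet's own
| selection but still apply every other facet's selection, so each
| facet needs its own pass.

```python
from collections import Counter


def filter_docs(docs, selections, skip_facet=None):
    """Keep docs matching every facet selection (OR within a facet, AND
    across facets), optionally ignoring one facet's selection."""
    result = []
    for doc in docs:
        if all(
            doc.get(facet) in values
            for facet, values in selections.items()
            if facet != skip_facet and values
        ):
            result.append(doc)
    return result


def disjunctive_facet_counts(docs, selections, facets):
    """One pass ("query") per facet, with that facet's own filter removed."""
    counts = {}
    for facet in facets:
        candidates = filter_docs(docs, selections, skip_facet=facet)
        counts[facet] = Counter(doc[facet] for doc in candidates if facet in doc)
    return counts
```

| Each extra facet costs one extra pass, which is why this adds
| up quickly when done as separate search requests.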
|
| Both redis and MeiliSearch are ridiculously fast so the
| performance loss is negligible, but it makes my code quite
| complex. As soon as the developers add these missing
| features, I want to simplify my code and only use redis for
| query caching. Typesense had some of these limitations too,
| but I'm not sure if that's still the case.
| ledoublegui wrote:
| Hi! MeiliSearch product team here! It's super cool to see
| your feedback!
|
| Concerning the disjunctive count of the facets, we are
| thinking about it. It is feasible on the client side by
| making several requests, but we are aware that it is not
| ideal at all from a developer experience point of view.
| We are still thinking about the best way to solve that
| case in one of our future iterations!
|
| The sort feature is coming in v0.22 (string and numeric
| fields) you will be able to easily configure the balance
| between exhaustivity and relevancy at index level through
| the positioning of the ranking rules.
|
| I'm not sure I understand the where clause point so I'd
| love to hear more details!
|
| Thanks for using us and giving us this kind of feedback
| :)
| adrianvincent wrote:
| Thank you for MeiliSearch.
|
| By where clause I mean as in SQL. For example, select
| results where cost <= 50.
| ledoublegui wrote:
| Thanks adrianvincent! Did you see
| https://docs.meilisearch.com/reference/features/filtering.ht...?
| adrianvincent wrote:
| I'll take a look, thanks.
| the_mitsuhiko wrote:
| Yep. I have a small toy app that uses it and I keep monitoring
| its progress. It's already very useful.
| gnur wrote:
| I've done some stuff with it. It works pretty well but isn't
| perfect.
|
| Insertion time grows linearly with index size, up to tens of
| milliseconds with an index of a couple hundred thousand documents.
|
| The Go library is very un-Go-like and doesn't expose all the
| options. It also had a couple of breaking changes without a
| major version bump.
|
| Other than that, the search part works really well.
| tpayet wrote:
| Hello, MeiliSearch team here :) Please, do not hesitate to
| leave an issue on the Golang repository so we can improve it!
| Also, indexing time will be much better with v0.21 planned to
| be released in a couple of days / weeks. You can test the RC
| in the meantime.
| wiradikusuma wrote:
| I had been following its development for a while, but then I
| moved to Typesense after evaluating this matrix:
|
| https://typesense.org/typesense-vs-algolia-vs-elasticsearch-...
| (yes it's hosted by Typesense)
| hitekker wrote:
| Some basic features are missing from that doc. For example,
| typesense doesn't handle periods, underscores, dashes and other
| characters as delimiters:
|
| https://github.com/typesense/typesense/issues/122
|
| https://github.com/typesense/typesense/issues/95
|
| The workaround they recommend is to duplicate your index with
| all those characters removed and then strip out those
| characters from your search queries :/
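|
| A minimal Python sketch of that workaround as described: copy a
| field with `.`, `_` and `-` removed, index the copy alongside the
| original, and apply the same stripping to queries so both sides
| match. The delimiter set and the `<field>_stripped` field name are
| assumptions. This is a client-side sketch, not Typesense's
| tokenizer.

```python
import re

# Characters the comment says aren't treated as delimiters (assumed set).
DELIMITERS = re.compile(r"[._\-]")


def strip_delimiters(text):
    """Remove delimiter characters, e.g. "foo-bar.baz" -> "foobarbaz"."""
    return DELIMITERS.sub("", text)


def with_stripped_copy(doc, field):
    """Duplicate `field` into a hypothetical `<field>_stripped` field."""
    copy = dict(doc)
    copy[f"{field}_stripped"] = strip_delimiters(doc[field])
    return copy


def normalize_query(query):
    """Apply the same stripping to the query before searching."""
    return strip_delimiters(query)
```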
| victor106 wrote:
| How do you handle stop words in Typesense?
| KaoruAoiShiho wrote:
| According to the matrix it doesn't have support for it.
| marcinzm wrote:
| You could remove them from the query text yourself, although
| since they'd still be in the index, I suspect misspellings
| could cause issues.
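|
| A minimal Python sketch of that client-side approach: drop stop
| words from the query before sending it. The stop-word list is a
| tiny illustrative sample. As noted, the words stay in the index,
| so this only changes what gets queried.

```python
STOP_WORDS = frozenset({"a", "an", "and", "of", "the", "to"})  # tiny sample


def remove_stop_words(query):
    words = query.split()
    kept = [w for w in words if w.lower() not in STOP_WORDS]
    # If the query is nothing but stop words, send it unchanged rather
    # than issuing an empty search.
    return " ".join(kept) if kept else query
```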
| KaoruAoiShiho wrote:
| Damn this matrix made me want to use Algolia.
| qatanah wrote:
| We are using it on our PoC products. It's really great and fast!
| It removes all the friction of doing an autocomplete search.
|
| https://correlate.meetglimpse.com/
|
| If you're building some test products and just want search
| that is easier to set up than ES, Meilisearch is a great
| alternative.
| joelp wrote:
| MeiliSearch looks fantastic! I haven't tried it but at least it
| is written in Rust so that should be a good reason to try it out
| for a project of mine.
___________________________________________________________________
(page generated 2021-08-15 23:00 UTC)