[HN Gopher] Postgres full-text search: A search engine in a data...
___________________________________________________________________
Postgres full-text search: A search engine in a database (2021)
Author : craigkerstiens
Score : 126 points
Date : 2022-07-11 18:07 UTC (4 hours ago)
(HTM) web link (www.crunchydata.com)
(TXT) w3m dump (www.crunchydata.com)
| itsthecourier wrote:
| we built a listings site similar to ebay in latin america after
| buying the operation from an european company,
| https://corotos.com
|
| we made use of bson in the listings table for metadata and FTS
| inside postgres. spanish dictionaries, weighting, white-listings
| of words and including metadata inside the tsvectors have been
| awesome. really quick search, you never have unsync documents in
| another part of the stack and you save a lot in storage
|
| glad to help anybody going this path, no regrets.
| cpursley wrote:
| A writeup or gist of this approach would be awesome.
| mixcocam wrote:
| I think SQLite is even more batteries included!
| https://www.sqlite.org/fts5.html
| canadiantim wrote:
| More difficult to do facets with sqlite vs postgres though
| shakezula wrote:
| I used SQLite's built in fuzzy search and it is surprisingly
| powerful. I used it to for full text search over a magic the
| gathering card database and it was amazing.
| cduzz wrote:
| I've often wondered why someone doesn't reimplement
| elasticsearch using sqlite as the back-end datastore, instead
| of lucene.
|
| Or did I just describe foundationDB?
|
| I guess the real problem is that such a beast wouldn't have
| kibana, and people _want_ kibana and put up with
| elasticsearch to get it.
| jfbaro wrote:
| I have seen this presentation from a company that used PG Full
| Text search for a pretty complex use case. Interesting ->
| http://matheusoliveira.s3-website-us-east-1.amazonaws.com/pr...
| (updated thanks to ddevault)
| RowanH wrote:
| Good link thinks. Have to say we're moving to pgSearch after
| implementing SOLR alongside a PG/Rails backend. Makes the stack
| simpler, less components to worry about (less headaches for gem
| versioning dependencies with SOLR). Lot to be said for it after
| reading through what's available now...
| sandGorgon wrote:
| are you planning to do relevance ? this is one of the
| blockers for us - TFIDF vs BM25 or something.
|
| From what i know - PG fulltext seaarch does not implement
| relevance.
| rocmcd wrote:
| This has been my understanding of the state of Postgres
| full-text search. It's great if your search requirements
| are fairly vanilla, but I haven't seen any solutions for
| more advanced search needs, such boosting, relevance,
| scoring, etc.
| mamcx wrote:
| An alternative is to just layer the FTS on top of vanilla
| sql to get the extra stuff (this is what I do for my
| eCommerce backend), so is pretty simple to have something
| alike: SELECT .. -- Get the fts
| IN ( FTS QUERY) ORDER BY -- The relevance is
| hardcoded? mayber in another table that store th rankings?
| ( Products, Inventory,
| Invoices,.. )
|
| I found is much easier and predictable if I code the
| "rankings" based on the business logic instead of let the
| FTS engine guess it. You can store that stuff as normal
| columns or use the "sources" (ie: products, inventory) as
| ways to know what could be more important to pull first.
|
| This have the nice property that our search results are ver
| good and better: Never return non-sensical stuff! (like
| searching for a apple in the store and get and blog post!)
| ddevault wrote:
| Please don't use URL shorteners.
|
| http://matheusoliveira.s3-website-us-east-1.amazonaws.com/pr...
| jfbaro wrote:
| Updated! Thanks
| lettergram wrote:
| I've written some tips on this before:
| https://austingwalters.com/fast-full-text-search-in-postgres...
|
| Basically, it's easy to create triggers that'll update an index
| for you on entry. Then you can create custom indices that have
| key words, etc.
| Latty wrote:
| Is there a benefit to using a trigger over a generated stored
| column?
| lettergram wrote:
| You can technically do way more robust things in a trigger.
| If you don't need it then :shrug: but I often create fairly
| robust updates
| crtxcr wrote:
| I was a bit disappointed by some limitations back then when I
| tried it for a project of mine. When searching phrases where
| ordering matters, phraseto_tsquery() does not quite work for
| larger documents, as the tsvector position values are quite
| limited: https://www.postgresql.org/docs/14/textsearch-
| limitations.ht... Here I had much better success with sqlite's
| FTS implemention.
| cpursley wrote:
| Here's the approach I settled on that allows for fuzzy search
| (also some useful links at the bottom):
|
| https://gist.github.com/cpursley/e3586382c3a42c54ca7f5fef166...
|
| Downside is I haven't found a way to do weighting.
| panabee wrote:
| slightly off-topic, but if the goal is to store vast amounts of
| protein and gene nucleotide sequences and allow wildcard searches
| (e.g., %CTAAACGGG%), would postgres be the recommended approach?
|
| assume 200K proteins/genes with nucleotide sequences ranging from
| 20 base pairs to 2.3 million base pairs.
| giovannibonetti wrote:
| Pg_trgrm based indexes works well to make this type of LIKE
| %...% queries fast
| jonatron wrote:
| Last time I looked, Postgres full text search doesn't really do
| CJK without third party parsers, which can be a bit of a deal
| breaker.
| zkirill wrote:
| I love Postgres but lack of CJK full-text search is the main
| reason why I am going with Lucene/Solr for in-app search.
|
| It's possible to make Postgres do amazing things (e.g. job
| queue with SKIP LOCKED!) but at what point do you draw the line
| and use the right tool for the job?
| blowski wrote:
| Any idea how well CJK works in general with Postgres? Or is
| it just on full text where it's lacking?
| zkirill wrote:
| In general, we have absolutely no problem with CJK in
| Postgres. Search is very application specific and some of
| our users want to do things like kana + kanji search. [1] I
| don't even know where you would start with that in
| Postgres.
|
| We also need stuff like language detection and analysis of
| mixed language data.
|
| A big advantage of going with a dedicated search tool is
| that it teaches you what you don't know about search, and
| it turned out that we knew pretty much nothing.
|
| [1] https://www.elastic.co/blog/implementing-japanese-
| autocomple...
| mistrial9 wrote:
| the Japanese national phone company NTT is a major
| postgresql contributor
| chucky_z wrote:
| When the feature doesn't exist! For existing
| workflows/functionality; I think using a new tool for a
| feature that doesn't (and won't for some time) exist is a
| _perfectly_ good reason to introduce the tool. On top of that
| you're looking at using something that is mature and well-
| used, Lucene/Solr.
|
| If you wanted to be cheeky you could go ask your CTO if they
| would foot the bill for someone like 2nd Quadrant to write
| CJK full-text search into pgsql. ;)
|
| I've done this for some features/issues in random OSS
| projects and if you feel someone else has the same problem I
| always feel paying money to have it implemented is well worth
| it.
| [deleted]
| simonw wrote:
| The hardest part of building any search engine is keeping the
| index up-to-date with changes made to the underlying data store.
|
| It's a solvable problem, but it's always a lot of work to build
| and to keep working as the database schema changes in the future.
|
| This is why I really like PostgreSQL FTS: it's good enough that
| for many projects I don't need to use an external search engine
| any more - and it's way easier to keep the search index up-to-
| date than if the index lives in a separate system.
|
| I wrote this tutorial on implementing faceted search with
| PostgreSQL and Django a while ago:
| https://simonwillison.net/2017/Oct/5/django-postgresql-facet...
| canadiantim wrote:
| Have you ever considered using PGroonga
| (https://pgroonga.github.io/) to further extend FTS in
| postgres? They've got a drilldown feature that is basically
| their implementation of facets. Always wondered what your
| thoughts on that might be?
|
| I've got a django setup with postgres and facets implemented as
| you detailed in your blog (thank you for all the amazing info
| over the years btw!), now I'm looking to extend it with
| PGroonga and potentially considering Hasura to try and further
| extend what I can do while still using Postgres as my single
| source of truth without need for keeping any other service in
| sync.
| almog wrote:
| > The hardest part of building any search engine is keeping the
| index up-to-date with changes made to the underlying data
| store.
|
| That makes sense and I'm curious to know if you're referring to
| the challenges that arise from attempting to implement a
| distributed heterogeneous transaction that includes system of
| record (say RDBMS) and derived data (search engine), or did you
| think about something that's more specific to search engines?
| soperj wrote:
| How hard is this on the database?
| ravirajx7 wrote:
| I am feeling real bad now as I used the like query as OP did
| (although in MySQL) for implementing a custom search over two
| fields ( Name & Description ) in one of my personal projects. I
| thought it worked pretty well though and I was so happy about it.
| giovannibonetti wrote:
| Don't confuse full text search with word part search. If you
| have a list of names, you're probably looking for the latter
| one, where LIKE works fine and can be indexed with pg_trgrm.
| phemartin wrote:
| mongodb has its own full-text implementation with Atlas Search.
| it has full Lucene capability. it's worth looking into.
|
| https://www.mongodb.com/docs/atlas/atlas-search/
| cpursley wrote:
| If you're already running postgres, why would you set up mongo
| for search instead of elastic, etc?
| qohen wrote:
| FYI, this crunchydata.com blog-post was discussed 11 months ago
| on HN[0], in a thread awarded 601 points, with 137 comments.
|
| [0] https://news.ycombinator.com/item?id=27973497
| qohen wrote:
| Among other things in that earlier thread, there's some
| discussion of aspects of ZomboDB with its developer.
|
| _ZomboDB is a Postgres extension that enables efficient full-
| text searching via the use of indexes backed by Elasticsearch._
| (From the project 's website[0]).
|
| Lots more info in readme at the project's github[1].
|
| [0] http://zombodb.com
|
| [1] https://github.com/zombodb/zombodb
| puika wrote:
| Mandatory mention of the RUM extension
| (https://github.com/postgrespro/rum) if this caught your eye.
| Lots of tutorials and conference presentations out there
| showcasing the advantages in terms of ranking, timestamps...
| adhoc_slime wrote:
| Very convenient for me that this article swings around, I'm
| weighing pros and cons of PGSQL and elastic search for full-text
| search for the business's use-case.
| avereveard wrote:
| Also consider full ngram search, so you can do client side
| stemming and use a single index for all languages. It's not out
| of the box, but installing the ngram extension is literally one
| line.
___________________________________________________________________
(page generated 2022-07-11 23:00 UTC)