[HN Gopher] Ask HN: Books about full text search?
___________________________________________________________________
Ask HN: Books about full text search?
I would love to learn more about FTS at a very low level and I'm
looking for books to read more on that topic. Any good suggestions?
Author : sopromo
Score : 86 points
Date : 2022-11-24 17:58 UTC (5 hours ago)
| [deleted]
| pixelmonkey wrote:
| Take a look at my post "Lucene: The Good Parts"--
|
| https://blog.parse.ly/lucene/
|
| The book mentioned there is Lucene in Action.
|
| And then this YouTube presentation by a Lucene/Elasticsearch
| committer will give you a nice overview of some related
| algorithms--
|
| https://youtu.be/eQ-rXP-D80U
| DamonHD wrote:
| Managing Gigabytes
|
| https://books.google.co.uk/books/about/Managing_Gigabytes.ht...
|
| Old but good!
| CoolestBeans wrote:
| Came here to recommend Managing Gigabytes as well. People these
| days are managing far more than gigabytes but the fundamental
| ideas remain useful.
| 100k wrote:
| At a general audience level, "Index" is on my list to read. It
| covers the invention of the index up to digital search engines.
| https://www.nytimes.com/2022/02/09/books/review-index-histor...
|
| "Introduction to Information Retrieval" is a textbook which is
| available online: https://nlp.stanford.edu/IR-book/
| Here's a review:
| http://glinden.blogspot.com/2009/02/book-review-introduction...
|
| Another textbook which IMHO is a bit lower level is "Information
| Retrieval: Implementing and Evaluating Search Engines". The book
| website is down for me right now, but you can find it on Amazon
| here:
| https://www.amazon.com/Information-Retrieval-Implementing-Ev...
|
| Another commenter linked to "Relevant Search", which is great if
| you want to learn how to effectively use a search engine to
| improve relevance (as opposed to how to implement a search
| engine). It's old, but another book in that vein that was really
| helpful for me earlier in my career is Lucene in Action:
| https://www.amazon.com/Lucene-Action-Second-Covers-Apache/dp...
| tgv wrote:
| Check the literature of open courses on Text Retrieval. E.g.
| https://stanford.edu/class/cs276/
| binarymax wrote:
| "Relevant search" by Doug Turnbull and John Berryman, published
| by Manning, is THE best book to get started with tuning search
| engines.
|
| I've been a search engineer for >10 years and this is always the
| first book I recommend.
|
| https://www.manning.com/books/relevant-search
| softwaredoug wrote:
| Aww, thanks Max <3
| francoisprunier wrote:
| Not a book, but this paper from 2019 covers a lot of ground and
| reviews the different topics extensively:
| https://tonellotto.github.io/publication/fntir/fntir_main.pd...
| fiedzia wrote:
| https://www.manning.com/books/relevant-search
|
| Also "Taming Text"
| arooaroo wrote:
| Manning also have a book on Lucene, the library that powers
| Solr and ElasticSearch. IIRC the book covered how Lucene
| actually works under the hood and would therefore act as a good
| reference on the subject in general.
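For readers who want a feel for the "under the hood" part before picking up a book: Lucene is built around an inverted index, a map from each term to the documents that contain it. The following is only a toy sketch of that idea, not Lucene's actual implementation; the function names (`tokenize`, `build_index`, `search`) and the sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query term (AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample corpus, using titles mentioned in this thread.
docs = {
    1: "Lucene in Action",
    2: "Managing Gigabytes",
    3: "Relevant Search with Lucene",
}
index = build_index(docs)
print(search(index, "lucene"))         # [1, 3]
print(search(index, "Lucene search"))  # [3]
```

Real engines add a lot on top of this (compressed postings lists, term positions, ranking), which is the material the books in this thread cover.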
| gardenfelder wrote:
| Taming Text is about building a question-answering system; it
| came out about the time Watson came online; it's not a plan,
| rather a cookbook of experiments using Apache products like
| Solr and OpenNLP, but is a great tutorial on how question
| answering works.
| vdfs wrote:
| Lucene in Action is a good introduction to Lucene, which can be
| helpful for learning ElasticSearch (the most used FTS these days)
| _tom_ wrote:
| Lucene in Action covers Lucene 3.0, and is from 2010. Current
| version is 9.4.2. So much has changed.
| cb321 wrote:
| It's all in the Nim programming language, but if you prefer
| reading code or running diffs then you might get a vague sense of
| (some) low level nuts & bolts from:
| https://github.com/c-blake/nimsearch
| unixhero wrote:
| Just use Postgres full-text search, it's good enough
| http://rachbelaid.com/postgres-full-text-search-is-good-enou...
| ssn wrote:
| Three reference textbooks are available openly:
|
| * Introduction to Information Retrieval,
| http://informationretrieval.org/
|
| * Information Retrieval in Practice,
| http://www.search-engines-book.com/
|
| * Entity-Oriented Search, https://eos-book.org/
|
| Modern Information Retrieval is also a classic reference. Not
| openly available but some contents are (were?) available online.
| Their site seems to be down but the Internet Archive has a copy.
|
| Additional resources here:
|
| * https://nlp.stanford.edu/IR-book/information-retrieval.html
| * http://web.archive.org/web/20220708135205/http://grupoweb.up...
| brudgers wrote:
| Not a book but Hellerstein's CS186 from 2015 starting with
| Lecture 17 gave me a basic understanding (I think).
|
| Playlist
| https://youtube.com/playlist?list=PLhMnuBfGeCDPtyC9kUf_hG_Qw...
|
| Also from that lecture series, the low level is always IO. One
| disk read tends to dwarf n^2 in-memory algorithms.
|
| And IO is all about tuning caches and hardware for the specific
| structural relationships in the data, the way in which it is
| accessed, and the hardware everything runs on.
|
| Good luck.
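The claim above that one disk read tends to dwarf n^2 in-memory work can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes roughly 10 ms for a random read on a spinning disk and roughly 1 ns per simple in-memory operation; both figures are order-of-magnitude assumptions chosen for illustration, not measurements, and `break_even_n` is a hypothetical helper name.

```python
import math

# Assumed order-of-magnitude costs (illustrative, not measured):
DISK_SEEK_SECONDS = 10e-3   # one random read on a spinning disk
MEMORY_OP_SECONDS = 1e-9    # one simple in-memory operation


def break_even_n(seek_seconds=DISK_SEEK_SECONDS, op_seconds=MEMORY_OP_SECONDS):
    """Largest n for which n^2 in-memory operations fit in one disk read."""
    ops_per_seek = round(seek_seconds / op_seconds)
    return math.isqrt(ops_per_seek)


print(break_even_n())  # 3162
```

Under these assumptions, a quadratic in-memory algorithm over about three thousand items costs roughly as much as a single random disk read. Faster storage moves the break-even point down, but the general shape of the argument stays the same.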
___________________________________________________________________
(page generated 2022-11-24 23:00 UTC)