Full-text document indexing - part Ⅰ
=======================================
Introduction
-------------
This article shows a simple way to index and search the full text of Office Open XML and OpenDocument [1] documents using SQLite. It takes three steps:
- prepare the database
- insert documents (indexing)
- query using full-text search
Preparation
------------
Create a virtual table using SQLite’s full-text search feature:
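A minimal sketch of this step, using Python's ``sqlite3`` module and the FTS5 extension. The table name ``documents`` and the columns ``name``, ``size`` and ``body`` are assumptions, picked to match the columns in the search output later in this article, and FTS5 has to be compiled into your SQLite build.

```python
import sqlite3


def create_index(path=":memory:"):
    """Open (or create) the database and make sure the FTS5 table exists."""
    con = sqlite3.connect(path)
    # Hypothetical schema: `name` and `body` are searchable,
    # `size` is stored alongside them but kept out of the index.
    con.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS documents "
        "USING fts5(name, size UNINDEXED, body)"
    )
    return con


con = create_index()
```

Pass a file name instead of the default ``:memory:`` to keep the index between runs.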
Indexing
---------
Because both the Office Open XML and OpenDocument formats are just zip archives containing a bunch of XML, this is easy.
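One way to get plain text out of such an archive is sketched below. The helper name ``extract_text`` is hypothetical, and reading every ``.xml`` member and stripping tags with a regular expression is a deliberately crude assumption: it also picks up metadata and style names, which may or may not matter for your search.

```python
import html
import re
import zipfile


def extract_text(path):
    """Return the text of every XML member in the archive, markup removed."""
    parts = []
    with zipfile.ZipFile(path) as archive:
        for member in archive.namelist():
            if member.endswith(".xml"):
                xml = archive.read(member).decode("utf-8", errors="replace")
                # Replace tags with spaces so adjacent words stay separated.
                parts.append(re.sub(r"<[^>]+>", " ", xml))
    # Decode entities such as &amp; and collapse runs of whitespace.
    return " ".join(html.unescape(" ".join(parts)).split())
```

``extract_text`` accepts a path or an open binary file object, because ``zipfile.ZipFile`` accepts both.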
- Add a single document -
Note that this uses parameterized queries, so adapt it to your needs.
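As a rough illustration of such an insert, here is a sketch in Python. The ``documents`` table, its columns and the helper ``add_document`` are assumptions, and the body text is made up for the demo. The ``?`` placeholders let SQLite bind the values, so quotes in a file name or in the document text cannot break the statement.

```python
import sqlite3


def add_document(con, name, size, body):
    """Index one document; values are bound, never spliced into the SQL."""
    con.execute(
        "INSERT INTO documents (name, size, body) VALUES (?, ?, ?)",
        (name, size, body),
    )
    con.commit()


# Demo against a throwaway in-memory database (hypothetical schema).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE VIRTUAL TABLE documents USING fts5(name, size UNINDEXED, body)"
)
# The body text here is invented sample data.
add_document(con, "fiets.docx", 8814, "een fiets met twee wielen")
```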
Search
-------
A search query could return something like this:
rank               | name         | size
-0.845353371460758 | fiets.docx   | 8814
-0.438947157337124 | products.ods | 8845
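A query that returns rows of this shape could look like the sketch below. It assumes the hypothetical ``documents`` FTS5 table with ``name``, ``size`` and ``body`` columns, and the sample rows are made up. In FTS5 the hidden ``rank`` column holds a BM25 score where a more negative value means a better match, so ordering by ``rank`` puts the best hit first.

```python
import sqlite3


def search(con, query, limit=10):
    """Return (rank, name, size) rows for an FTS5 MATCH, best match first."""
    return con.execute(
        "SELECT rank, name, size FROM documents "
        "WHERE documents MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


# Demo data: invented body text in a throwaway in-memory database.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE VIRTUAL TABLE documents USING fts5(name, size UNINDEXED, body)"
)
con.executemany(
    "INSERT INTO documents (name, size, body) VALUES (?, ?, ?)",
    [
        ("fiets.docx", 8814, "fiets fiets zadel"),
        ("products.ods", 8845, "fiets helm slot lamp bel"),
        ("notes.odt", 100, "nothing relevant here"),
    ],
)

for rank, name, size in search(con, "fiets"):
    print(rank, name, size)
```

The exact scores depend on the indexed text, so they will not match the numbers shown above.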
Next up?
---------
In Full-text document indexing - part Ⅱ we’ll implement a simple frontend for this search engine.
Notes
------
[1] https://en.wikipedia.org/wiki/Comparison_of_Office_Open_XML_and_OpenDocument