PicoSearch - TF-IDF in 50 lines
================================
Introduction
-------------
For a long time I wanted to implement a simple yet effective search using the TF-IDF algorithm [1]. I finally got around to it, and it’s a lot simpler than I thought: both the algorithm and the implementation are straightforward.
To get the line count to 50 I cheated a little bit by adding intermediate variables, comments and whitespace. This doubled the line count exactly from 25 to 50 :-D
Source code
------------
Output for term 'like':
Rank | Document | Score
#1 | i like fruit like oranges | 0.11507
#2 | i like apples | 0.09589
#3 | i like pears | 0.09589
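
The scores above are consistent with the plain form of TF-IDF: term frequency (occurrences divided by document length) multiplied by ln(N / df), where N is the number of documents and df is the number of documents containing the term. For instance, 2/5 × ln(4/3) ≈ 0.11507. The following is a minimal Python sketch of that calculation, not the C# code from PicoSearch.cs. The whitespace tokenizer, the function names and the fourth document in the corpus are all assumptions; the fourth document is there only because the printed scores imply one document that does not contain the term.

```python
import math


def tokenize(text):
    # Assumed tokenizer: lowercase, split on whitespace.
    return text.lower().split()


def tf_idf(term, doc_tokens, corpus_tokens):
    """Score one document for one term as TF * ln(N / df)."""
    if not doc_tokens:
        return 0.0
    # Term frequency: share of the document's words that equal the term.
    tf = doc_tokens.count(term) / len(doc_tokens)
    # Document frequency: number of documents containing the term.
    df = sum(1 for tokens in corpus_tokens if term in tokens)
    if df == 0:
        return 0.0
    # Inverse document frequency with the natural logarithm.
    idf = math.log(len(corpus_tokens) / df)
    return tf * idf


def search(term, corpus):
    """Return (document, score) pairs with a positive score, best first."""
    tokenized = [tokenize(doc) for doc in corpus]
    scored = [
        (doc, tf_idf(term, tokens, tokenized))
        for doc, tokens in zip(corpus, tokenized)
    ]
    hits = [(doc, score) for doc, score in scored if score > 0]
    # sorted() is stable, so equal scores keep their corpus order.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


corpus = [
    "i like apples",
    "i like pears",
    "i like fruit like oranges",
    "the weather is nice",  # hypothetical document without the term
]

for rank, (doc, score) in enumerate(search("like", corpus), start=1):
    print(f"#{rank} | {doc} | {score:.5f}")
```

Run as a script, this prints the same three rows as the table above. Under this formula a term that occurs in every document gets an IDF of zero, so such a term returns no results.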
Performance
------------
Performance is good enough for small corpora, but it’s not the most efficient implementation. It’s a good starting point for understanding the algorithm though.
Source code
------------
- PicoSearch.cs
- PicoSearch.csproj
----------------------------------
[1] Wikipedia: TF-IDF