joriszwart.nl

       
       PicoSearch - TF-IDF in 50 lines 
       ================================
       
       
       Introduction 
       -------------
       
       
       For a long time I wanted to implement simple yet effective search using the TF-IDF[1] algorithm. When I finally got around to it, it turned out to be a lot simpler than I thought: the algorithm is simple and the implementation is straightforward. 
       
       
       To get the line count to 50, I cheated a little by adding intermediate variables, comments and whitespace. That doubled the line count exactly, from 25 to 50 :-D 
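TF-IDF scores a document for a search term by multiplying two factors: the term frequency (how often the term occurs in the document, relative to the document's length) and the inverse document frequency (the logarithm of the number of documents divided by the number of documents that contain the term), so rare terms weigh more than common ones. The author's C# code (PicoSearch.cs) is not reproduced in this text, so what follows is only a minimal Python sketch of the idea, assuming lowercase whitespace tokenization, tf = count / length and idf = ln(N / df). The function names are hypothetical, and so is the fourth document in the usage example, because the post does not show its full corpus.

```python
import math


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase, then split on whitespace.
    return text.lower().split()


def tf(term: str, tokens: list[str]) -> float:
    # Term frequency: occurrences of the term relative to the document length.
    return tokens.count(term) / len(tokens) if tokens else 0.0


def idf(term: str, corpus: list[list[str]]) -> float:
    # Inverse document frequency: ln(N / df), or 0 if no document has the term.
    df = sum(1 for tokens in corpus if term in tokens)
    return math.log(len(corpus) / df) if df else 0.0


def search(term: str, documents: list[str]) -> list[tuple[str, float]]:
    # Score every document as tf * idf and rank the matches, best first.
    corpus = [tokenize(doc) for doc in documents]
    term = term.lower()
    weight = idf(term, corpus)
    scored = [(doc, tf(term, tokens) * weight) for doc, tokens in zip(documents, corpus)]
    # A term present in every document has idf 0, so it yields no hits.
    hits = [hit for hit in scored if hit[1] > 0]
    # sorted() is stable, so tied documents keep their corpus order.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # The fourth document is hypothetical; the post does not show the full corpus.
    docs = ["i like fruit like oranges", "i like apples", "i like pears", "i eat bread"]
    for rank, (doc, score) in enumerate(search("like", docs), start=1):
        print(f"#{rank} | {doc} | {score:.5f}")
    # #1 | i like fruit like oranges | 0.11507
    # #2 | i like apples | 0.09589
    # #3 | i like pears | 0.09589
```

With that hypothetical corpus, the sketch prints the same ranking and scores as the output for 'like' in this post. A term that occurs in every document gets idf = ln(1) = 0 and therefore no hits, which is how TF-IDF is meant to work rather than a bug.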
       
       
       Example output 
       ---------------
       
       
       Output for term 'like': 
       
       Rank | Document                  | Score 
       -----|---------------------------|--------
       #1   | i like fruit like oranges | 0.11507 
       #2   | i like apples             | 0.09589 
       #3   | i like pears              | 0.09589 
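The scores can be checked by hand. They are consistent with tf = occurrences / document length and idf = ln(N / df), assuming a corpus of N = 4 documents of which the three listed contain 'like'. The fourth document does not appear in the output, so this corpus size is inferred from the numbers, not stated in the post:

```latex
\begin{aligned}
\mathrm{idf}(\text{like}) &= \ln\frac{N}{\mathrm{df}(\text{like})} = \ln\frac{4}{3} \approx 0.28768 \\
\mathrm{score}(d_1) &= \frac{2}{5} \times 0.28768 \approx 0.11507 \\
\mathrm{score}(d_2) = \mathrm{score}(d_3) &= \frac{1}{3} \times 0.28768 \approx 0.09589
\end{aligned}
```

The idf factor is the same for all three documents, so the ranking comes entirely from tf: 'like' makes up 2 of the 5 words in the first document but only 1 of 3 in the others, giving a score ratio of 0.4 / (1/3) = 1.2.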
       
       Performance 
       ------------
       
       
       Performance is good enough for small corpora, but this is not the most efficient implementation. It is, however, a good starting point for understanding the algorithm. 
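One common way to make queries cheaper, sketched here as a possible direction and not as something PicoSearch does, is to tokenize the corpus once and keep an inverted index (postings) from each term to the documents that contain it. A query then only scores the matching documents, and the document frequency is simply the length of the term's posting list. The class name `TfIdfIndex` is hypothetical, and the sketch assumes lowercase whitespace tokenization, tf = count / length and idf = ln(N / df).

```python
import math
from collections import Counter


class TfIdfIndex:
    """Hypothetical precomputed TF-IDF index over a fixed list of documents."""

    def __init__(self, documents: list[str]) -> None:
        self.documents = documents
        # Tokenize once (assumption: lowercase + whitespace split) and count terms per document.
        self.counts = [Counter(doc.lower().split()) for doc in documents]
        self.lengths = [sum(counts.values()) for counts in self.counts]
        # Inverted index: term -> ids of the documents that contain it.
        self.postings: dict[str, list[int]] = {}
        for doc_id, counts in enumerate(self.counts):
            for term in counts:
                self.postings.setdefault(term, []).append(doc_id)

    def search(self, term: str) -> list[tuple[str, float]]:
        term = term.lower()
        doc_ids = self.postings.get(term, [])
        if not doc_ids:
            return []
        # df is the length of the posting list; idf = ln(N / df).
        idf = math.log(len(self.documents) / len(doc_ids))
        # Only documents in the posting list are scored: tf = count / length.
        hits = [
            (self.documents[i], self.counts[i][term] / self.lengths[i] * idf)
            for i in doc_ids
        ]
        # Drop zero scores (term in every document) and rank best first.
        return sorted((h for h in hits if h[1] > 0), key=lambda h: h[1], reverse=True)
```

Building the index is a one-off cost proportional to the corpus size; after `index = TfIdfIndex(docs)`, each `index.search(term)` call reuses the cached counts instead of re-tokenizing every document.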
       
       
       Source code 
       ------------
       
       -  PicoSearch.cs
       -  PicoSearch.csproj
       ----------------------------------
       [1] Wikipedia: TF-IDF
       
       
       Related 
       --------
       
       -  Blaztris
       -  Full-text document indexing - part Ⅰ
       -  Colossus PoC
       -  Duotris
       -  KASM
       -  Octettenpletter
       -  JavaScript Tetris in 1.5 kB
       -  Sierpinski fractal in 28 bytes