[HN Gopher] Using sqlite3 as a notekeeping document graph
       ___________________________________________________________________
        
       Using sqlite3 as a notekeeping document graph
        
       Author : harporoeder
       Score  : 268 points
       Date   : 2021-07-01 17:49 UTC (2 days ago)
        
 (HTM) web link (epilys.github.io)
 (TXT) w3m dump (epilys.github.io)
        
       | Johnyma22 wrote:
       | btw you can use Etherpad for this too, bonus points is that it
       | has real-time collabo and a bunch of other powerful tools! :)
        
       | criddell wrote:
       | HN loves notes applications and I always click through and am
       | almost always disappointed. Am I unusual here in that my notes
       | are not all text? If you look in my notebooks (digital or
       | physical) there are sketches and images on almost every page.
        
         | dheera wrote:
         | I suppose you could always paste in a base64 image URL and make
         | a UI for it?
        
         | eitland wrote:
         | > Am I unusual here in that my notes are not all text?
         | 
         | Maybe unusual but not completely alone.
         | 
         | I use a mix but none are complete:
         | 
         | - Notes on iPad lets me start (or continue) a note by tapping
         | the Pencil on the locked screen. Brilliant. Hard to link notes
         | though.
         | 
         | - Pencil planner is totally _brilliant_ It integrates calendar
         | and lets me write on top of it and in a close to magic way it
         | shows day notes in week view and the other way around. It is
         | easy to use but not dumbed down. Lacks photos for now though
         | but I can live with that for now.
         | 
         | - Joplin. For anything that doesn't contain handwritten or
         | images.
        
         | scamify wrote:
         | Same here. I use OneNote and it does everything I need in that
         | regard, and you can link sections to each other if needed.
        
       | fsiefken wrote:
       | Wow, thanks for documenting the idea, this could be a nice
       | alternative to my tiddlywiki markdown setup. I could do all kinds
       | of queries on my knowledge base. Still I could use grep as well
       | for document counts and lists. It's neat that it's all in one
       | compressed sqlite file - but I can tar ball my markdown directory
       | as well. The only benefit I see is that I can use sql to query my
       | documents and it seems like a step up from org-mode tables by
       | using database tables.
       | 
       | It would be nice if there was cgi script that could serve a
       | sorted and paged index with links to the html, md or gmi to a web
       | or/and gemini browser.
        
         | fsiefken wrote:
         | O wow, I didn't notice there's FUSE file system.
        
       | gandalfgeek wrote:
       | This is massive overkill for notes. Simple grep or org mode will
       | do. You don't need an indexing and querying engine at this scale.
        
       | laurent123456 wrote:
       | FTS is great but it's a solution that's designed to work mostly
       | with the English language. For other languages, with accents and
       | so on, you would probably want to normalise the FTS data first by
       | removing the accents.
       | 
       | For Chinese, Arabic and so on you'll need to a custom tokenizer
       | which may or may not be available on your target platform.
        
         | LAC-Tech wrote:
         | How far behind do other big languages (Chinese, Arabic,
         | Spanish, Hindi) lag behind English when it comes to full text
         | search?
        
       | quaintdev wrote:
       | I run a self hosted notes application[1] on raspberry pi at home.
       | I have been thinking about moving from .md files to sql. Both
       | seem to have their own pros/cons. With SQLLite, I get easy
       | search, tagging while with .md files I get easy editing and
       | viewing by mapping networked drive.
       | 
       | [1]: https://github.com/quaintdev/pinotes
        
         | simonw wrote:
         | I suggest having both.
         | 
         | My TILs site runs uses a GitHub repository where the notes live
         | in markdown: https://github.com/simonw/til
         | 
         | Plus a build script running in a GitHub actions workflow that
         | compiles the notes into a SQLite file using my markdown-to-
         | sqlite tool and publishes the resulting SQLite file using
         | Datasette to https://til.simonwillison.net - which gives me
         | search and an Atom feed and suchlike.
         | 
         | The site has custom templates so it Durant look like regular
         | Datasette, but you can run custom queries against it at
         | https://til.simonwillison.net/tils
        
           | sandGorgon wrote:
           | this is very interesting. Would you still use this
           | architecture if u were building something from scratch today
           | ?
           | 
           | I can see the markdown format being powerful. But how do you
           | parse it and create a document graph ? (or rather...what do u
           | use to persist the document graph)
           | 
           | I have been trying to use Firebase on a side project of mine
           | for this markdown -> graph problem
        
             | setr wrote:
             | I don't see the difficulty here -- with SQLite, you'd just
             | store the set of links as an adjacency list; for the major
             | job of linking things, you don't even need to traverse the
             | graph -- you only need to know 1-level of relationships to
             | generate the links. But when you do traverse, you can use
             | recursive WITH.
             | 
             | Presumably for each build, you'd parse the names of each
             | doc to produce the list of nodes (and their paths), and
             | then as you encounter the markdown reference, update it
             | accordingly.
             | 
             | If you start afresh each time, handling updates would be
             | trivial, but you're at risk of destroying existing URLs
             | when changing doc titles/reorganizing -- so you probably
             | want a canonical name (to generate the final URL for), and
             | the symbolic names (to refer by in markdown).
             | 
             | And then eventually you realize you want multiple ways to
             | organize your docs, so you start tagging, and then
             | eventually you realize you want ways to reference groups of
             | notes, so you'll go ahead and implement hierarchal tagging,
             | and you'll finally have be able to treat your graph as a
             | graph, instead of a forest (list of trees)
             | 
             | And finally you can simplify the whole thing to only
             | reference tags (or parent tags), which may happen to link
             | to a single doc, or multiple. Probably in the special case
             | of 1 doc, the link directly takes you to the doc instead of
             | some collection page.
             | 
             | A document store seems like a terrible idea, because you're
             | really not modeling a tree, and that's probably the
             | majority of your modeling problem -- I imagine you're
             | trying to currently stuff a graph into a tree.
        
         | gchamonlive wrote:
         | Could elasticsearch be a no compromise solution for this case?
         | Or maybe meilisearch as a lightweight alternative
        
           | jll29 wrote:
           | Elasticsearch is probably not needed, but a full-text index
           | based search engine - plain vanilla Java Lucene or CLucene -
           | could be very useful, and indeed be more appropriate given
           | that the notes are unstructured text (I actually don't see
           | the need for SQL unless you need to perform very complex
           | operations on the structured meta-data).
           | 
           | The advantage is an easy-to-learn search syntax that is
           | flexible and similar to Google:
           | https://lucene.apache.org/core/2_9_4/queryparsersyntax.html
           | In addition to a document field for your notes, you can
           | manage and query any number of meta-data fields e.g.
           | Tom M* birthday:July
           | 
           | retrieves all notes that mention Tom with a last name
           | starting with "M" as long as the notes also have a meta-data
           | field called birthday where the value must be July. (Try
           | compare doing this with the SQL equivalent version of the
           | query!)
        
       | axiom92 wrote:
       | IMO https://roamresearch.com/ offers a more practical way of
       | maintaining notes using graphs.
        
         | cosmojg wrote:
         | This looks like a proprietary clone of TiddlyWiki[1] with a
         | slower, clunkier interface.
         | 
         | [1] https://tiddlywiki.com/
        
           | hda111 wrote:
           | I love TiddlyWiki. It's the most useful software I've found
           | in the recent years.
        
       | cartoonworld wrote:
       | This ranks anywhere from "nerdy humble brag" to "absurdly
       | impractical" on usage, but what an innovative way to get
       | extremely familiar using sqlite3. I like it.
       | 
       | Sick of people reading your journal? Do your worst, _MOM_.
        
         | epilys wrote:
         | Thank you :) It was just a demo on how I found sqlite's
         | indexing useful for something I thought would require an
         | extension or external tools. I don't keep notes myself, I was
         | working with full text search because I was indexing my PDFs. I
         | hope it's useful for people that do keep notes, however.
        
           | bloopernova wrote:
           | It's very cool, I enjoyed reading the post describing it!
           | 
           | Personally I use Emacs, org-mode, and org-roam for Zettel-
           | style links between documents. Org-roam uses sqlite to store
           | its links and index, but documents remain in plain text .org
           | files.
        
             | grlass wrote:
             | Indeed, thank for the post!
             | 
             | +1 on org-roam, and may I add org-roam-server, which
             | visualises that graph in the browser very nicely, in an
             | interactive way: <https://github.com/org-roam/org-roam-
             | server>
        
       | rzzzt wrote:
       | Slightly off-topic: where/when does it make sense to "spend" the
       | 5-6 bits in a version 4 (random) UUID to indicate which version
       | and variant it is?
        
         | epilys wrote:
         | According to the RFC, even v4 requires the version bits set. Is
         | it worth it? Depends on what you want. For stuff like the
         | original post, even an AUTOINCREMENT integer primary key or
         | even a title/slug text key would do just fine. For interfacing
         | with external stuff that expects UUIDs it's mandatory for
         | compliance. (I hope I didn't confuse many people by mentioning
         | UUIDs)
        
           | rzzzt wrote:
           | Suppose I want a universally unique identifier, which can be
           | generated in a decentralized manner and there is a very small
           | chance that someone else chooses to use the same number in
           | the same system, resulting in a collision. I decide to use a
           | pseudorandom generator for this purpose, picking numbers out
           | of a sufficiently large space, where the chances of an
           | undesired coincidence are precisely dialed in.
           | 
           | What good rises out of including extra paperwork with each
           | number, essentially saying "attention please, a random
           | number", or "this is a timestamp, a monotonic counter and a
           | MAC address", like if it was a newfangled smartphone? The RFC
           | also says that there is no mechanism for validating a UUID,
           | save for checking if its timestamp part is in the future, for
           | versions that employ such a portion. Why can't any 128 bit
           | value be accepted on the receiving end?
        
             | bccdee wrote:
             | I suppose it further reduces the chance of a collision. If
             | you're in a totally random space, there's always a chance
             | someone will generate a collision if you wait long enough.
             | But if you follow the timestamp and MAC address rules, then
             | for a collision to happen it basically needs to be the same
             | computer hitting the same number twice on the RNG within
             | one second, which is less likely.
             | 
             | But, since that only holds true if people follow the rules,
             | it's worth standardizing on either v1 or v4, and so it's
             | good to be able to distinguish between the two for
             | validation purposes.
        
       | Pamar wrote:
       | Here is the direct link to the Bibliothecula project (it might be
       | just me but it was not so easy to find):
       | https://github.com/epilys/bibliothecula#tooling
        
       ___________________________________________________________________
       (page generated 2021-07-03 23:00 UTC)