[HN Gopher] Using sqlite3 as a notekeeping document graph
___________________________________________________________________
Using sqlite3 as a notekeeping document graph
Author : harporoeder
Score : 268 points
Date : 2021-07-01 17:49 UTC (2 days ago)
(HTM) web link (epilys.github.io)
(TXT) w3m dump (epilys.github.io)
| Johnyma22 wrote:
| btw you can use Etherpad for this too, bonus points is that it
| has real-time collabo and a bunch of other powerful tools! :)
| criddell wrote:
| HN loves notes applications and I always click through and am
| almost always disappointed. Am I unusual here in that my notes
| are not all text? If you look in my notebooks (digital or
| physical) there are sketches and images on almost every page.
| dheera wrote:
| I suppose you could always paste in a base64 image URL and make
| a UI for it?
| eitland wrote:
| > Am I unusual here in that my notes are not all text?
|
| Maybe unusual but not completely alone.
|
| I use a mix but none are complete:
|
| - Notes on iPad lets me start (or continue) a note by tapping
| the Pencil on the locked screen. Brilliant. Hard to link notes
| though.
|
| - Pencil planner is totally _brilliant_ It integrates calendar
| and lets me write on top of it and in a close to magic way it
| shows day notes in week view and the other way around. It is
| easy to use but not dumbed down. Lacks photos for now though
| but I can live with that for now.
|
| - Joplin. For anything that doesn't contain handwritten or
| images.
| scamify wrote:
| Same here. I use OneNote and it does everything I need in that
| regard, and you can link sections to each other if needed.
| fsiefken wrote:
| Wow, thanks for documenting the idea, this could be a nice
| alternative to my tiddlywiki markdown setup. I could do all kinds
| of queries on my knowledge base. Still I could use grep as well
| for document counts and lists. It's neat that it's all in one
| compressed sqlite file - but I can tar ball my markdown directory
| as well. The only benefit I see is that I can use sql to query my
| documents and it seems like a step up from org-mode tables by
| using database tables.
|
| It would be nice if there was cgi script that could serve a
| sorted and paged index with links to the html, md or gmi to a web
| or/and gemini browser.
| fsiefken wrote:
| O wow, I didn't notice there's FUSE file system.
| gandalfgeek wrote:
| This is massive overkill for notes. Simple grep or org mode will
| do. You don't need an indexing and querying engine at this scale.
| laurent123456 wrote:
| FTS is great but it's a solution that's designed to work mostly
| with the English language. For other languages, with accents and
| so on, you would probably want to normalise the FTS data first by
| removing the accents.
|
| For Chinese, Arabic and so on you'll need to a custom tokenizer
| which may or may not be available on your target platform.
| LAC-Tech wrote:
| How far behind do other big languages (Chinese, Arabic,
| Spanish, Hindi) lag behind English when it comes to full text
| search?
| quaintdev wrote:
| I run a self hosted notes application[1] on raspberry pi at home.
| I have been thinking about moving from .md files to sql. Both
| seem to have their own pros/cons. With SQLLite, I get easy
| search, tagging while with .md files I get easy editing and
| viewing by mapping networked drive.
|
| [1]: https://github.com/quaintdev/pinotes
| simonw wrote:
| I suggest having both.
|
| My TILs site runs uses a GitHub repository where the notes live
| in markdown: https://github.com/simonw/til
|
| Plus a build script running in a GitHub actions workflow that
| compiles the notes into a SQLite file using my markdown-to-
| sqlite tool and publishes the resulting SQLite file using
| Datasette to https://til.simonwillison.net - which gives me
| search and an Atom feed and suchlike.
|
| The site has custom templates so it Durant look like regular
| Datasette, but you can run custom queries against it at
| https://til.simonwillison.net/tils
| sandGorgon wrote:
| this is very interesting. Would you still use this
| architecture if u were building something from scratch today
| ?
|
| I can see the markdown format being powerful. But how do you
| parse it and create a document graph ? (or rather...what do u
| use to persist the document graph)
|
| I have been trying to use Firebase on a side project of mine
| for this markdown -> graph problem
| setr wrote:
| I don't see the difficulty here -- with SQLite, you'd just
| store the set of links as an adjacency list; for the major
| job of linking things, you don't even need to traverse the
| graph -- you only need to know 1-level of relationships to
| generate the links. But when you do traverse, you can use
| recursive WITH.
|
| Presumably for each build, you'd parse the names of each
| doc to produce the list of nodes (and their paths), and
| then as you encounter the markdown reference, update it
| accordingly.
|
| If you start afresh each time, handling updates would be
| trivial, but you're at risk of destroying existing URLs
| when changing doc titles/reorganizing -- so you probably
| want a canonical name (to generate the final URL for), and
| the symbolic names (to refer by in markdown).
|
| And then eventually you realize you want multiple ways to
| organize your docs, so you start tagging, and then
| eventually you realize you want ways to reference groups of
| notes, so you'll go ahead and implement hierarchal tagging,
| and you'll finally have be able to treat your graph as a
| graph, instead of a forest (list of trees)
|
| And finally you can simplify the whole thing to only
| reference tags (or parent tags), which may happen to link
| to a single doc, or multiple. Probably in the special case
| of 1 doc, the link directly takes you to the doc instead of
| some collection page.
|
| A document store seems like a terrible idea, because you're
| really not modeling a tree, and that's probably the
| majority of your modeling problem -- I imagine you're
| trying to currently stuff a graph into a tree.
| gchamonlive wrote:
| Could elasticsearch be a no compromise solution for this case?
| Or maybe meilisearch as a lightweight alternative
| jll29 wrote:
| Elasticsearch is probably not needed, but a full-text index
| based search engine - plain vanilla Java Lucene or CLucene -
| could be very useful, and indeed be more appropriate given
| that the notes are unstructured text (I actually don't see
| the need for SQL unless you need to perform very complex
| operations on the structured meta-data).
|
| The advantage is an easy-to-learn search syntax that is
| flexible and similar to Google:
| https://lucene.apache.org/core/2_9_4/queryparsersyntax.html
| In addition to a document field for your notes, you can
| manage and query any number of meta-data fields e.g.
| Tom M* birthday:July
|
| retrieves all notes that mention Tom with a last name
| starting with "M" as long as the notes also have a meta-data
| field called birthday where the value must be July. (Try
| compare doing this with the SQL equivalent version of the
| query!)
| axiom92 wrote:
| IMO https://roamresearch.com/ offers a more practical way of
| maintaining notes using graphs.
| cosmojg wrote:
| This looks like a proprietary clone of TiddlyWiki[1] with a
| slower, clunkier interface.
|
| [1] https://tiddlywiki.com/
| hda111 wrote:
| I love TiddlyWiki. It's the most useful software I've found
| in the recent years.
| cartoonworld wrote:
| This ranks anywhere from "nerdy humble brag" to "absurdly
| impractical" on usage, but what an innovative way to get
| extremely familiar using sqlite3. I like it.
|
| Sick of people reading your journal? Do your worst, _MOM_.
| epilys wrote:
| Thank you :) It was just a demo on how I found sqlite's
| indexing useful for something I thought would require an
| extension or external tools. I don't keep notes myself, I was
| working with full text search because I was indexing my PDFs. I
| hope it's useful for people that do keep notes, however.
| bloopernova wrote:
| It's very cool, I enjoyed reading the post describing it!
|
| Personally I use Emacs, org-mode, and org-roam for Zettel-
| style links between documents. Org-roam uses sqlite to store
| its links and index, but documents remain in plain text .org
| files.
| grlass wrote:
| Indeed, thank for the post!
|
| +1 on org-roam, and may I add org-roam-server, which
| visualises that graph in the browser very nicely, in an
| interactive way: <https://github.com/org-roam/org-roam-
| server>
| rzzzt wrote:
| Slightly off-topic: where/when does it make sense to "spend" the
| 5-6 bits in a version 4 (random) UUID to indicate which version
| and variant it is?
| epilys wrote:
| According to the RFC, even v4 requires the version bits set. Is
| it worth it? Depends on what you want. For stuff like the
| original post, even an AUTOINCREMENT integer primary key or
| even a title/slug text key would do just fine. For interfacing
| with external stuff that expects UUIDs it's mandatory for
| compliance. (I hope I didn't confuse many people by mentioning
| UUIDs)
| rzzzt wrote:
| Suppose I want a universally unique identifier, which can be
| generated in a decentralized manner and there is a very small
| chance that someone else chooses to use the same number in
| the same system, resulting in a collision. I decide to use a
| pseudorandom generator for this purpose, picking numbers out
| of a sufficiently large space, where the chances of an
| undesired coincidence are precisely dialed in.
|
| What good rises out of including extra paperwork with each
| number, essentially saying "attention please, a random
| number", or "this is a timestamp, a monotonic counter and a
| MAC address", like if it was a newfangled smartphone? The RFC
| also says that there is no mechanism for validating a UUID,
| save for checking if its timestamp part is in the future, for
| versions that employ such a portion. Why can't any 128 bit
| value be accepted on the receiving end?
| bccdee wrote:
| I suppose it further reduces the chance of a collision. If
| you're in a totally random space, there's always a chance
| someone will generate a collision if you wait long enough.
| But if you follow the timestamp and MAC address rules, then
| for a collision to happen it basically needs to be the same
| computer hitting the same number twice on the RNG within
| one second, which is less likely.
|
| But, since that only holds true if people follow the rules,
| it's worth standardizing on either v1 or v4, and so it's
| good to be able to distinguish between the two for
| validation purposes.
| Pamar wrote:
| Here is the direct link to the Bibliothecula project (it might be
| just me but it was not so easy to find):
| https://github.com/epilys/bibliothecula#tooling
___________________________________________________________________
(page generated 2021-07-03 23:00 UTC)