[HN Gopher] Wddbfs - Mount a SQLite database as a filesystem
       ___________________________________________________________________
        
       Wddbfs - Mount a SQLite database as a filesystem
        
       Author : vitplister
       Score  : 233 points
       Date   : 2024-02-18 09:15 UTC (13 hours ago)
        
 (HTM) web link (adamobeng.com)
 (TXT) w3m dump (adamobeng.com)
        
       | account-5 wrote:
       | This looks pretty cool.
        
       | pletnes wrote:
       | Wrap in an SSH tunnel and you can do some fun stuff over the
       | network, too.
        
         | rickette wrote:
         | Over the network (when using azure or gcp storage) there's also
         | https://sqlite.org/cloudsqlite/doc/trunk/www/index.wiki which
         | doesn't require loading everything in memory.
        
       | eternauta3k wrote:
       | Why webdav instead of making short sqltocsv, sqltojson, etc
       | scripts? You could make completion work (with some enormous time
       | investment).
        
         | dmd wrote:
         | Because those all exist already, and this is cool?
        
       | renonce wrote:
       | > Although for now, the whole table gets read into memory for
       | every read so this won't work well for very large database files.
       | There's also no write support... yet.
       | 
       | At such a set of features I would prefer a tool that converts
       | databases to a directory of real csv and jsonl files, at least
       | there are no performance issues to worry about
        
       | AnyTimeTraveler wrote:
       | I really like the idea of running a watch on an sqlite table with
       | my common cli tools. If I understand correctly, it is being re-
       | read on every request. Does that mean that changes to the sqlite
       | database will be visible on the next read of a csv file?
        
       | blagie wrote:
       | This seems really nice!
       | 
       | If this is posted by the author looking for feedback:
       | 
       | 1) WebDAV is a much better choice than FUSE. FUSE is a good
       | concept, but buggy and poorly-implemented. Things like sshfs can
       | break in very bad ways if e.g. there is a network connectivity
       | issue. Not a hack.
       | 
       | 2) Writes seem like a very bad idea. Keep those out unless you
       | come up with a clean way to handle them (which seems difficult if
       | not impossible given the differences in FS versus relational
       | abstractions, especially with regards to data validation). Not a
       | limitation.
       | 
       | In other words, the "hacks" seem like design choices a good
       | architect would likely have made. Continuing:
       | 
       | 3) The major use-case I have is if I have a small (<1MB)
       | database, and don't know the structure. Lots of tools use small
       | sqlite databases. There is no way to query all tables for
       | something, whereas tools like `find` and `grep` can look through
       | all files. I was recently trying to recover some lost data, and
       | it was a pain to find it.
       | 
       | 4) I think a major theoretical question is how to fuse the two
       | models. I would like to be able to do 'generic' things like the
       | above on databases, while still being able to be relational.
       | 
       | 5) I don't have an answer to the above, but perhaps natural first
       | step might be to allow something like queries or virtual tables
       | to sit on the file system:
       | 
       | wddbfs --anonymous --db-
       | path=/path/to/an/example/database/like/Chinook_Sqlite.sqlite
       | 
       | wddbfs_query myjoin "SELECT * FROM table_1, table_2 WHERE
       | table_1.id=table_2.id"
       | 
       | And voila! A /virtual/myjoin.csv file pops up.
       | 
       | (Even more) half-baked thoughts:
       | 
       | There might be more clever ways to do it too. I'm thinking
       | through half-baked thoughts on how to make files and tab
       | completion work. My half-baked thoughts are moving towards
       | something like:
       | 
       | wddbfs_SELECT * from Customer.tsv\, Employee.tsv WHERE
       | 
       | But I don't like all the potential bugs with escaping. I'm also
       | thinking about when output wants to go to the console versus into
       | a virtual table.
        
         | alexnewman wrote:
         | Don't people use NFS instead of Fuse now? I'd assume it would
         | have much better performance than webdav and it should handle a
         | bunch of the issues around writes`
        
           | blagie wrote:
           | I am moving outside of my zone of expertise, but when I last
           | looked (decades ago), systems like NSF assumed a filesystem
           | backing store and thinks about things like byte ranges. There
           | are a lot of operations which file systems support which work
           | very poorly when this is not true, such as seeks, memory-
           | mapped files, etc.
           | 
           | If I want the 50,004,123,121th byte of a file from a disk,
           | that's very fast. If I want the same for a virtual object
           | from an HTTP server, object store, virtual table, etc. it
           | literally involves creating the whole object, and stepping
           | through it byte-by-byte until I get there.
           | 
           | If the next request is doing the same 1k ahead, on a disk,
           | that's probably in cache, and if not, I can get there
           | quickly. If this was a SQL query, I probably need to redo the
           | whole thing.
           | 
           | You get a natural explosion from O(1) to O(n) in many common
           | cases, and for something like a complex SQL query, it can be
           | much, much worse.
        
             | m-hilgendorf wrote:
             | NFSv4 is stateful, and while it's true that the server has
             | to support stateless reads for bad clients, most clients
             | are ok with keeping the state ids that are returned after
             | an open. Whether you have to step through the object byte-
             | by-byte isn't inherent to the protocol, it depends on how
             | you've stored the data.
        
           | m-hilgendorf wrote:
           | I've implemented both in the last 6 months and the only
           | reason I would expect anyone to use NFS over FUSE is for
           | MacOS without macfuse/fuse-t. fuse-t is a cool project that
           | provides (iirc) the high level fuse interface (libfuse) via
           | NFS on MacOS. Compare to macfuse, which implements the low
           | level interface (/dev/fuse) via a kext.
           | 
           | FUSE is much, much easier to work with. The protocol (even
           | v4) should also be less chatty, and if performance is your
           | goal there are projects like ExtFuse that can move some
           | caching into the kernel via eBPF which I don't think would be
           | possible with NFS.
           | 
           | If you poke around other user space file systems (sshfs,
           | cephfs, objectivefs, etc) they're all FUSE.
        
         | chx wrote:
         | > There is no way to query all tables for something,
         | 
         | Ha. I would bet anything you could adopt my MySQL findall() to
         | SQLite for the purpose.
         | https://stackoverflow.com/a/68915413/308851
        
           | blagie wrote:
           | Nice! I hope I remember this when I next need it.
        
         | Paul-Craft wrote:
         | Huh. I guess I'm one of the few who's had generally good
         | experiences with FUSE, then. I've used several different FUSE
         | filesystems, ranging from disk-based, to network-oriented,
         | to... well, let's just say a bit odd, kind of fun, but probably
         | not particularly useful. I even wrote a couple of the latter
         | myself, just for funsies.
         | 
         | Oh well, there's _my_ FUSE rant for the day, I guess. Now, on
         | to what I came to comment on....
         | 
         | I'm indifferent on the WebDAV vs FUSE thing, actually. But if
         | the author wanted a nicer way to query a bunch of text files,
         | where each line would roughly represent what a db row would,
         | and there's whatever random delimiter character in there
         | (comma, tab, pipe, or even those weird ASCII control characters
         | nobody ever knows or remember exist for the purpose [0]), then
         | I think he gave in and went to a database too soon. Besides the
         | ubiquitous GNU textutils, there are tons of tools out there for
         | working with record-oriented CSV and TSV files. [1] There's
         | even `jq` if you're a real masochist and want to use json as
         | your text-based data storage, query, and interchange format.
         | 
         | Now you've got me wondering if I was lucky in not having any
         | significant issues with FUSE. But, as far as OP and the
         | technical problems they were facing, I think it was a little
         | premature to jump to `sqlite`. All those things they put down
         | to "clunkiness" really just makes me thing that `sqlite` +
         | WebDAV ended up being one of the members of the set of simple,
         | elegant, and wrong solutions most problems seem to have.
         | -\\_(tsu)_/-
         | 
         | ---
         | 
         | [0]: https://stackoverflow.com/a/18782271
         | 
         | [1]: https://stackabuse.com/running-sql-on-csv-data-data-
         | conversi...
        
           | blagie wrote:
           | I don't quite agree. There's a reason relational won, and
           | continues to win more than a half-century after it was
           | invented, despite no lack of wannabe usurpers.
           | 
           | For your model to work, what would be needed is the reverse:
           | a system which allows me to interact with TSV files with SQL.
           | 
           | For 98% of uses of SQLite, performance and storage efficiency
           | don't matter; we're talking about dozens or hundreds of
           | records. It doesn't need to be well-implemented from that
           | perspective. It does need to be robust. For example,
           | databases solve a lot of problems especially around
           | consistency in threaded systems which this would need to
           | handle as well (although a KISS solution, such as handling
           | all access from one thread / process, and simply blocking
           | other requests during transactions, would be fine).
           | 
           | Something like that would likely eat SQLite's lunch.
           | 
           | q seems close. It'd need to support writes, table creation,
           | and all the other stuff, perhaps be more actively maintained,
           | provide documented API (not just command-line) access, and
           | work with things like ORMs.
        
             | gkbrk wrote:
             | > a system which allows me to interact with TSV files with
             | SQL
             | 
             | ClickHouse and DuckDB can both do this. Pretty sure AWS has
             | some built-in stuff for this too.
        
               | blagie wrote:
               | I think those are a very different use case. sqlite is
               | used for things like app settings. Both of those are
               | heavyweight monsters designed for rather large data.
               | 
               | That said, I'm reading their documentation right now, and
               | I think one or the other might slot into a different use-
               | case I have where I do have rather large data. I'd last
               | looked at this space quite a number of years ago, and
               | both are way ahead of what I'd recalled.
        
             | samatman wrote:
             | > _For your model to work, what would be needed is the
             | reverse: a system which allows me to interact with TSV
             | files with SQL._
             | 
             | Several such systems exist, one of which is SQLite.
             | https://www.sqlite.org/csv.html
             | 
             | I don't know if there's a virtual table extension for _tab_
             | separated values, but SQLite imports them just fine.
        
         | hansvm wrote:
         | > WebDAV is a much better choice than FUSE. FUSE is a good
         | concept, but buggy and poorly-implemented. Things like sshfs
         | can break in very bad ways if e.g. there is a network
         | connectivity issue. Not a hack.
         | 
         | I have had problems with particular fuse implementations (sshfs
         | included, s3fs especially), but I've never had trouble writing
         | FUSE implementations that behave correctly. Is there something
         | deficient in the spec that you want to call out, or is the idea
         | just that WebDAV carefully built networking into its core
         | concepts, where you have to do a lot of extra work to squeeze
         | those ideas correctly into a FUSE implementation?
        
         | loeg wrote:
         | > WebDAV is a much better choice than FUSE.
         | 
         | That's a pretty spicy opinion. WebDAV is slow; FUSE works
         | great. Use interrupt mounts for things like sshfs (or NFS, for
         | that matter).
         | 
         | > Writes seem like a very bad idea. Keep those out unless you
         | come up with a clean way to handle them (which seems difficult
         | if not impossible given the differences in FS versus relational
         | abstractions, especially with regards to data validation).
         | 
         | You could imagine supporting appending new records in json
         | format to the json/jsonl files in a pretty clean way, but I
         | agree it seems well out of scope. And removing records seems
         | basically impossible.
        
         | justin66 wrote:
         | > Things like sshfs can break in very bad ways if e.g. there is
         | a network connectivity issue.
         | 
         | That's a peculiar critique of FUSE. You could write the same
         | thing about NFS or SMB (but especially NFS).
        
       | adius wrote:
       | I'm also working on something like this:
       | 
       | https://github.com/Airsequel/SQLiteDAV
       | 
       | My mapping is: table -> dir, row -> dir, cell -> file
        
         | zokier wrote:
         | I wonder if mapping index->directory would be the best match,
         | that way you could at least hypothetically reclaim some of the
         | SQL benefits, and the directory entries could have more natural
         | names. You'd need to have slightly different structure for
         | unique vs non-unique indices, but that seems like minor issue.
        
       | mharig wrote:
       | Nice. I think exposing the tables as csv, tsv, json & jsonl is to
       | much of a cluttering. Format should be a mount option.
        
         | pjerem wrote:
         | It is
        
       | jFriedensreich wrote:
       | I build a couchDB webdav server back in the day, you could also
       | edit json documents or file blobs directly. the problem i
       | discovered was that all OSes totally staled their webdav support
       | and there are also enough differences between oses to be
       | annoying. In the end to build somthing with great performance you
       | would also need to control the webdav client side and probably
       | build a fuse webdav client. I would have loved to see webdav
       | maturing and becoming what the 9P vision was just for the web,
       | but this obviously never happened as all the applications just
       | went into to the web and used rest intead of webdav and
       | everything else moved to sync protocols that sync to local
       | folders.
        
         | paul_h wrote:
         | I'm with you on wishing WebDAV continued its rollout. These
         | days there are great low-drama server-side deployments like
         | https://github.com/sigoden/dufs. It's run relative too - you
         | could habe multiple dufs processes serving up different
         | directories in different ways. But for WebDAV, you can't simply
         | mount that on the client side for every OS that's equally low
         | configutaion. For that reason, I really like sshfs as it can be
         | initiated from the client-side without a lot of config (just a
         | mkdir of the mapped dir), and it's OK most time despite it's
         | lack of speed and multi-day uptime. I'm on a chromebook now and
         | it turns out that Samba is the easiest client-side tech to use
         | for remote file systems. DAv should've been uniquitous.
        
       | nikeee wrote:
       | Does it support mounting a sqlar file?
        
       | loeg wrote:
       | Cute, seems legitimately useful, succinct. Not everything has to
       | be super technically challenging to be valuable. I can see how
       | this would be really handy.
        
         | zokier wrote:
         | I think its neat proof of concept, but I struggle to see any
         | case where this would be particularly useful. Or rather when
         | this would be more handy than what sqlite cli already offers.
         | 
         | like is this really meaningful improvement                   $
         | tail -n 3 Chinook_Sqlite.sqlite/Album.tsv
         | 
         | over this?                   $ sqlite3 -tabs
         | Chinook_Sqlite.sqlite 'select * from Album' | tail -n 3
        
       | Nican wrote:
       | > the SQL syntax for selecting a few records is much more verbose
       | than head -n or tail -n
       | 
       | I use DBeaver to inspect SQLite files, and to also work with
       | Postgres databases.
       | 
       | I kind of miss MySQL Workbench, but MySQL is pretty dead to me.
       | And SQL Server Management Studio is a relic that keeps being
       | updated.
       | 
       | I also sometimes make dashboards from SQLite files using Grafana,
       | but the time functions for SQLite are pretty bad.
        
       | genocidicbunny wrote:
       | Wasn't there an extension that let you mount a filesystem as a
       | table or db in sqlite? I wonder how far you can inception that.
       | Mount a db as a filesystem, then mount that filesystem as a
       | db..etc.
        
         | dvaun wrote:
         | Borrowing concepts from DB2 here
        
       | yellowapple wrote:
       | I was expecting this to be a way to mount so-called SQL Archives
       | (https://sqlite.org/sqlar.html) but this is just as cool.
        
       ___________________________________________________________________
       (page generated 2024-02-18 23:00 UTC)