[HN Gopher] Wddbfs - Mount a SQLite database as a filesystem
___________________________________________________________________
Wddbfs - Mount a SQLite database as a filesystem
Author : vitplister
Score : 233 points
Date : 2024-02-18 09:15 UTC (13 hours ago)
(HTM) web link (adamobeng.com)
(TXT) w3m dump (adamobeng.com)
| account-5 wrote:
| This looks pretty cool.
| pletnes wrote:
| Wrap in an SSH tunnel and you can do some fun stuff over the
| network, too.
| rickette wrote:
| Over the network (when using azure or gcp storage) there's also
| https://sqlite.org/cloudsqlite/doc/trunk/www/index.wiki which
| doesn't require loading everything in memory.
| eternauta3k wrote:
| Why webdav instead of making short sqltocsv, sqltojson, etc
| scripts? You could make completion work (with some enormous time
| investment).
| dmd wrote:
| Because those all exist already, and this is cool?
| renonce wrote:
| > Although for now, the whole table gets read into memory for
| every read so this won't work well for very large database files.
| There's also no write support... yet.
|
| At such a set of features I would prefer a tool that converts
| databases to a directory of real csv and jsonl files, at least
| there are no performance issues to worry about
| AnyTimeTraveler wrote:
| I really like the idea of running a watch on an sqlite table with
| my common cli tools. If I understand correctly, it is being re-
| read on every request. Does that mean that changes to the sqlite
| database will be visible on the next read of a csv file?
| blagie wrote:
| This seems really nice!
|
| If this is posted by the author looking for feedback:
|
| 1) WebDAV is a much better choice than FUSE. FUSE is a good
| concept, but buggy and poorly-implemented. Things like sshfs can
| break in very bad ways if e.g. there is a network connectivity
| issue. Not a hack.
|
| 2) Writes seem like a very bad idea. Keep those out unless you
| come up with a clean way to handle them (which seems difficult if
| not impossible given the differences in FS versus relational
| abstractions, especially with regards to data validation). Not a
| limitation.
|
| In other words, the "hacks" seem like design choices a good
| architect would likely have made. Continuing:
|
| 3) The major use-case I have is if I have a small (<1MB)
| database, and don't know the structure. Lots of tools use small
| sqlite databases. There is no way to query all tables for
| something, whereas tools like `find` and `grep` can look through
| all files. I was recently trying to recover some lost data, and
| it was a pain to find it.
|
| 4) I think a major theoretical question is how to fuse the two
| models. I would like to be able to do 'generic' things like the
| above on databases, while still being able to be relational.
|
| 5) I don't have an answer to the above, but perhaps natural first
| step might be to allow something like queries or virtual tables
| to sit on the file system:
|
| wddbfs --anonymous --db-
| path=/path/to/an/example/database/like/Chinook_Sqlite.sqlite
|
| wddbfs_query myjoin "SELECT * FROM table_1, table_2 WHERE
| table_1.id=table_2.id"
|
| And voila! A /virtual/myjoin.csv file pops up.
|
| (Even more) half-baked thoughts:
|
| There might be more clever ways to do it too. I'm thinking
| through half-baked thoughts on how to make files and tab
| completion work. My half-baked thoughts are moving towards
| something like:
|
| wddbfs_SELECT * from Customer.tsv\, Employee.tsv WHERE
|
| But I don't like all the potential bugs with escaping. I'm also
| thinking about when output wants to go to the console versus into
| a virtual table.
| alexnewman wrote:
| Don't people use NFS instead of Fuse now? I'd assume it would
| have much better performance than webdav and it should handle a
| bunch of the issues around writes`
| blagie wrote:
| I am moving outside of my zone of expertise, but when I last
| looked (decades ago), systems like NSF assumed a filesystem
| backing store and thinks about things like byte ranges. There
| are a lot of operations which file systems support which work
| very poorly when this is not true, such as seeks, memory-
| mapped files, etc.
|
| If I want the 50,004,123,121th byte of a file from a disk,
| that's very fast. If I want the same for a virtual object
| from an HTTP server, object store, virtual table, etc. it
| literally involves creating the whole object, and stepping
| through it byte-by-byte until I get there.
|
| If the next request is doing the same 1k ahead, on a disk,
| that's probably in cache, and if not, I can get there
| quickly. If this was a SQL query, I probably need to redo the
| whole thing.
|
| You get a natural explosion from O(1) to O(n) in many common
| cases, and for something like a complex SQL query, it can be
| much, much worse.
| m-hilgendorf wrote:
| NFSv4 is stateful, and while it's true that the server has
| to support stateless reads for bad clients, most clients
| are ok with keeping the state ids that are returned after
| an open. Whether you have to step through the object byte-
| by-byte isn't inherent to the protocol, it depends on how
| you've stored the data.
| m-hilgendorf wrote:
| I've implemented both in the last 6 months and the only
| reason I would expect anyone to use NFS over FUSE is for
| MacOS without macfuse/fuse-t. fuse-t is a cool project that
| provides (iirc) the high level fuse interface (libfuse) via
| NFS on MacOS. Compare to macfuse, which implements the low
| level interface (/dev/fuse) via a kext.
|
| FUSE is much, much easier to work with. The protocol (even
| v4) should also be less chatty, and if performance is your
| goal there are projects like ExtFuse that can move some
| caching into the kernel via eBPF which I don't think would be
| possible with NFS.
|
| If you poke around other user space file systems (sshfs,
| cephfs, objectivefs, etc) they're all FUSE.
| chx wrote:
| > There is no way to query all tables for something,
|
| Ha. I would bet anything you could adopt my MySQL findall() to
| SQLite for the purpose.
| https://stackoverflow.com/a/68915413/308851
| blagie wrote:
| Nice! I hope I remember this when I next need it.
| Paul-Craft wrote:
| Huh. I guess I'm one of the few who's had generally good
| experiences with FUSE, then. I've used several different FUSE
| filesystems, ranging from disk-based, to network-oriented,
| to... well, let's just say a bit odd, kind of fun, but probably
| not particularly useful. I even wrote a couple of the latter
| myself, just for funsies.
|
| Oh well, there's _my_ FUSE rant for the day, I guess. Now, on
| to what I came to comment on....
|
| I'm indifferent on the WebDAV vs FUSE thing, actually. But if
| the author wanted a nicer way to query a bunch of text files,
| where each line would roughly represent what a db row would,
| and there's whatever random delimiter character in there
| (comma, tab, pipe, or even those weird ASCII control characters
| nobody ever knows or remember exist for the purpose [0]), then
| I think he gave in and went to a database too soon. Besides the
| ubiquitous GNU textutils, there are tons of tools out there for
| working with record-oriented CSV and TSV files. [1] There's
| even `jq` if you're a real masochist and want to use json as
| your text-based data storage, query, and interchange format.
|
| Now you've got me wondering if I was lucky in not having any
| significant issues with FUSE. But, as far as OP and the
| technical problems they were facing, I think it was a little
| premature to jump to `sqlite`. All those things they put down
| to "clunkiness" really just makes me thing that `sqlite` +
| WebDAV ended up being one of the members of the set of simple,
| elegant, and wrong solutions most problems seem to have.
| -\\_(tsu)_/-
|
| ---
|
| [0]: https://stackoverflow.com/a/18782271
|
| [1]: https://stackabuse.com/running-sql-on-csv-data-data-
| conversi...
| blagie wrote:
| I don't quite agree. There's a reason relational won, and
| continues to win more than a half-century after it was
| invented, despite no lack of wannabe usurpers.
|
| For your model to work, what would be needed is the reverse:
| a system which allows me to interact with TSV files with SQL.
|
| For 98% of uses of SQLite, performance and storage efficiency
| don't matter; we're talking about dozens or hundreds of
| records. It doesn't need to be well-implemented from that
| perspective. It does need to be robust. For example,
| databases solve a lot of problems especially around
| consistency in threaded systems which this would need to
| handle as well (although a KISS solution, such as handling
| all access from one thread / process, and simply blocking
| other requests during transactions, would be fine).
|
| Something like that would likely eat SQLite's lunch.
|
| q seems close. It'd need to support writes, table creation,
| and all the other stuff, perhaps be more actively maintained,
| provide documented API (not just command-line) access, and
| work with things like ORMs.
| gkbrk wrote:
| > a system which allows me to interact with TSV files with
| SQL
|
| ClickHouse and DuckDB can both do this. Pretty sure AWS has
| some built-in stuff for this too.
| blagie wrote:
| I think those are a very different use case. sqlite is
| used for things like app settings. Both of those are
| heavyweight monsters designed for rather large data.
|
| That said, I'm reading their documentation right now, and
| I think one or the other might slot into a different use-
| case I have where I do have rather large data. I'd last
| looked at this space quite a number of years ago, and
| both are way ahead of what I'd recalled.
| samatman wrote:
| > _For your model to work, what would be needed is the
| reverse: a system which allows me to interact with TSV
| files with SQL._
|
| Several such systems exist, one of which is SQLite.
| https://www.sqlite.org/csv.html
|
| I don't know if there's a virtual table extension for _tab_
| separated values, but SQLite imports them just fine.
| hansvm wrote:
| > WebDAV is a much better choice than FUSE. FUSE is a good
| concept, but buggy and poorly-implemented. Things like sshfs
| can break in very bad ways if e.g. there is a network
| connectivity issue. Not a hack.
|
| I have had problems with particular fuse implementations (sshfs
| included, s3fs especially), but I've never had trouble writing
| FUSE implementations that behave correctly. Is there something
| deficient in the spec that you want to call out, or is the idea
| just that WebDAV carefully built networking into its core
| concepts, where you have to do a lot of extra work to squeeze
| those ideas correctly into a FUSE implementation?
| loeg wrote:
| > WebDAV is a much better choice than FUSE.
|
| That's a pretty spicy opinion. WebDAV is slow; FUSE works
| great. Use interrupt mounts for things like sshfs (or NFS, for
| that matter).
|
| > Writes seem like a very bad idea. Keep those out unless you
| come up with a clean way to handle them (which seems difficult
| if not impossible given the differences in FS versus relational
| abstractions, especially with regards to data validation).
|
| You could imagine supporting appending new records in json
| format to the json/jsonl files in a pretty clean way, but I
| agree it seems well out of scope. And removing records seems
| basically impossible.
| justin66 wrote:
| > Things like sshfs can break in very bad ways if e.g. there is
| a network connectivity issue.
|
| That's a peculiar critique of FUSE. You could write the same
| thing about NFS or SMB (but especially NFS).
| adius wrote:
| I'm also working on something like this:
|
| https://github.com/Airsequel/SQLiteDAV
|
| My mapping is: table -> dir, row -> dir, cell -> file
| zokier wrote:
| I wonder if mapping index->directory would be the best match,
| that way you could at least hypothetically reclaim some of the
| SQL benefits, and the directory entries could have more natural
| names. You'd need to have slightly different structure for
| unique vs non-unique indices, but that seems like minor issue.
| mharig wrote:
| Nice. I think exposing the tables as csv, tsv, json & jsonl is to
| much of a cluttering. Format should be a mount option.
| pjerem wrote:
| It is
| jFriedensreich wrote:
| I build a couchDB webdav server back in the day, you could also
| edit json documents or file blobs directly. the problem i
| discovered was that all OSes totally staled their webdav support
| and there are also enough differences between oses to be
| annoying. In the end to build somthing with great performance you
| would also need to control the webdav client side and probably
| build a fuse webdav client. I would have loved to see webdav
| maturing and becoming what the 9P vision was just for the web,
| but this obviously never happened as all the applications just
| went into to the web and used rest intead of webdav and
| everything else moved to sync protocols that sync to local
| folders.
| paul_h wrote:
| I'm with you on wishing WebDAV continued its rollout. These
| days there are great low-drama server-side deployments like
| https://github.com/sigoden/dufs. It's run relative too - you
| could habe multiple dufs processes serving up different
| directories in different ways. But for WebDAV, you can't simply
| mount that on the client side for every OS that's equally low
| configutaion. For that reason, I really like sshfs as it can be
| initiated from the client-side without a lot of config (just a
| mkdir of the mapped dir), and it's OK most time despite it's
| lack of speed and multi-day uptime. I'm on a chromebook now and
| it turns out that Samba is the easiest client-side tech to use
| for remote file systems. DAv should've been uniquitous.
| nikeee wrote:
| Does it support mounting a sqlar file?
| loeg wrote:
| Cute, seems legitimately useful, succinct. Not everything has to
| be super technically challenging to be valuable. I can see how
| this would be really handy.
| zokier wrote:
| I think its neat proof of concept, but I struggle to see any
| case where this would be particularly useful. Or rather when
| this would be more handy than what sqlite cli already offers.
|
| like is this really meaningful improvement $
| tail -n 3 Chinook_Sqlite.sqlite/Album.tsv
|
| over this? $ sqlite3 -tabs
| Chinook_Sqlite.sqlite 'select * from Album' | tail -n 3
| Nican wrote:
| > the SQL syntax for selecting a few records is much more verbose
| than head -n or tail -n
|
| I use DBeaver to inspect SQLite files, and to also work with
| Postgres databases.
|
| I kind of miss MySQL Workbench, but MySQL is pretty dead to me.
| And SQL Server Management Studio is a relic that keeps being
| updated.
|
| I also sometimes make dashboards from SQLite files using Grafana,
| but the time functions for SQLite are pretty bad.
| genocidicbunny wrote:
| Wasn't there an extension that let you mount a filesystem as a
| table or db in sqlite? I wonder how far you can inception that.
| Mount a db as a filesystem, then mount that filesystem as a
| db..etc.
| dvaun wrote:
| Borrowing concepts from DB2 here
| yellowapple wrote:
| I was expecting this to be a way to mount so-called SQL Archives
| (https://sqlite.org/sqlar.html) but this is just as cool.
___________________________________________________________________
(page generated 2024-02-18 23:00 UTC)