[HN Gopher] Hosting SQLite databases on GitHub Pages or any stat...
___________________________________________________________________
Hosting SQLite databases on GitHub Pages or any static file hoster
Author : phiresky
Score : 608 points
Date : 2021-05-02 16:43 UTC (6 hours ago)
(HTM) web link (phiresky.github.io)
(TXT) w3m dump (phiresky.github.io)
| CyberRabbi wrote:
| Over high latency links this would be virtually unusable. Why not
| just download the entire database into memory over XHR on page
| load? SQLite databases of pure data usually aren't over 10MB in
| size.
| petters wrote:
| > SQLite databases of pure data usually aren't over 10MB in
| size
|
| Why do you think this?
| [deleted]
| mcculley wrote:
| That is hilariously wrong for a lot of use cases. I will find
| this very handy for some SQLite databases I have that are
| several GBs in size. I am looking right now at using this
| contribution.
| CyberRabbi wrote:
| It's not hilariously wrong, or wrong at all, that over high-
| latency links this would be virtually unusable.
|
| It's certainly possible that people are using SQLite
| databases with sizes on the order of gigabytes but in my
| experience those are the exception not the rule.
| tehbeard wrote:
| Over high latency links most anything interesting is
| virtually unusable, so stop using it as a high horse to
| stand upon.
| simonw wrote:
| The demo here uses a 670MB database file.
| formerly_proven wrote:
| This particular demo page actually makes queries against an
| almost 700 MB SQLite database (enough to fill a CD!). Because
| the amount of data read is almost negligible (a few hundred kB),
| performance is limited by latency (as you say). However, high-
| latency links also tend to have low bandwidth, so downloading
| the entire database up front would almost always be much slower.
|
| For example, on a 1 megabit/s link with 300 ms RTT, one example
| would take about 2 seconds for the data transfer itself while
| spending another 3 seconds or so on waiting. Downloading the
| entire file would take around _an hour and a half_.
|
| For your 10 MB database, transferring it as a whole would take
| 80 seconds. Assuming this solution instead needs to read e.g.
| 250 kB (taking 2 seconds to transfer), it could still bounce
| around 250 times to the database before those 10 MB are fully
| downloaded. (This would be a really odd query, since it would
| only read on average two pages per read request)
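|
| A quick back-of-the-envelope check of those numbers (a sketch in
| TypeScript; the ~10 round trips are an assumption matching the "3
| seconds or so" of waiting above):
|     const linkMbps = 1, rttSec = 0.3;      // 1 megabit/s, 300 ms RTT
|     const fullDbMB = 670, readKB = 250, roundTrips = 10;
|     // downloading everything: ~5360 s, i.e. about an hour and a half
|     const fullDownloadSec = (fullDbMB * 8) / linkMbps;
|     // range-request approach: ~2 s of transfer plus ~3 s of latency
|     const rangeQuerySec =
|       (readKB * 8) / (linkMbps * 1000) + roundTrips * rttSec;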
| CyberRabbi wrote:
| Right, but that is a demo artificially created by the author
| to justify the solution being presented (no offense). The
| question is: how common are ~GB SQLite databases in the
| real world relative to databases that are ~MB in size?
|
| In my experience SQLite databases of millions of rows of raw
| tabular data tend to compress very well into dozens of
| megabytes. Indeed SQLite is often touted as a file format for
| applications.
| dkarras wrote:
| Even if you compress it, you need it in client memory, which
| can also be a dealbreaker. Some people might need it for
| some projects; I know I did.
| CyberRabbi wrote:
| > you need it in client memory which can also be a
| dealbreaker
|
| Most workstations have GBs of available memory. If not,
| you can dump it into IndexedDB as a raw data store.
|
| I never disputed that it would be useful for some use
| cases. I only said it would be unusable with high latency
| links. If you have a low latency link and aren't running
| very complex queries with lots of random seeks, then this
| should work fine for you.
| rakoo wrote:
| I'm running magnetico
| (https://github.com/boramalper/magnetico) on my VPS. I
| currently have an index of 1.6M magnet links stored in a
| 5GB database.
|
| SQLite is most interesting not when the database is small,
| but when there are very few writes and all you do is
| read. You can also look at https://datasette.io/ and see
| how SQLite is perfect for representing a lot of datasets
| and querying them
| brandmeyer wrote:
| My team has a few TB of data in SQLite files that are
| themselves dozens of GB each.
|
| We're using them as a replacement for leveldb's sstables,
| but with the structure of full SQL. It is highly effective.
| CyberRabbi wrote:
| Do you think your team's usage of SQLite is
| representative of the average SQLite user?
| detaro wrote:
| Where has it been suggested that this is the best
| solution for "the average SQLite user", instead of a tool
| you can use if it fits your requirements? To take your
| 10MB number, the article _starts_ by mentioning you can
| probably just download the entire thing if you aren't
| above that exact same number.
| detaro wrote:
| And the article _starts_ by mentioning that you can download
| the entire file if it's not too large. And then goes on to
| present a solution for larger files. What more answer to "Why
| not just download the entire database" do you expect?
| bob1029 wrote:
| Modeling the DOM in SQL... Further evidence that anything we can
| imagine has some stable representation in third normal form. Is
| it fast? Maybe not. But is it correct? Provably.
| crazygringo wrote:
| This is hilariously clever.
|
| Using the "Range" HTTP header to _read_ chunks of the database
| file absolutely works!
|
| But to be clear, there's no _write_ equivalent, is there? You
| can't use "Range" with a PUT request.
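|
| For reference, a read via Range looks like this (a minimal sketch
| using fetch; the URL and byte offsets are made up):
|     const resp = await fetch("https://example.com/db.sqlite3", {
|       headers: { Range: "bytes=4096-8191" },  // request one 4 KiB chunk
|     });
|     // a server that honors the header answers 206 Partial Content
|     const chunk = new Uint8Array(await resp.arrayBuffer());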
| TimTheTinker wrote:
| TA was clear - there's no way to write, since static file
| hosting doesn't support dynamic write to begin with.
|
| However, I imagine a service to support your scenario could be
| written in a standard back-end server language like Go or JS.
| The challenges involved would be much greater, however -- how
| to handle concurrency in particular. I suspect one would do
| better to just run PostgreSQL behind a web API.
| csomar wrote:
| That's basically re-inventing the database, but on the client
| side. We have come a long way, and we are getting closer to
| having the server side be just a data store.
| ianlevesque wrote:
| This is easily the most clever web programming hack I've seen
| this year. Bravo. I had seen this used for video or audio of
| course but it never occurred to me you could use it for
| databases. There are probably a ton of other formats this is good
| for too.
| sroussey wrote:
| The innovation here is getting sql.js to use http and range
| requests for file access rather than all being in memory.
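|
| Rough usage sketch (going from memory of the project's README, so
| treat the exact option names as approximate):
|     import { createDbWorker } from "sql.js-httpvfs";
|     // workerUrl / wasmUrl point at the library's bundled worker and wasm
|     const worker = await createDbWorker(
|       [{ from: "inline",
|          config: { serverMode: "full", url: "/wdi.sqlite3",
|                    requestChunkSize: 4096 } }],
|       workerUrl.toString(), wasmUrl.toString());
|     // SQL runs in the worker; pages are fetched lazily via Range requests
|     const rows = await worker.db.query(`select 1`);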
|
| I wonder when people using next.js will start using this for
| faster builds for larger static sites?
| jhgb wrote:
| Microsoft Access Cloud Edition, basically?
| tyingq wrote:
| Would also be great to add (efficient) search to a static blog.
| edoceo wrote:
| yea, sqlite FTS5 has been pretty amazing for quick search
| solutions (but I use english only)
| misterdata wrote:
| See also https://github.com/bittorrent/sqltorrent, same trick
| but using BitTorrent
| phiresky wrote:
| Yeah, that was one of the inspirations for this. That one
| does not work in the browser though, would be a good project
| to do that same thing but with sqlite in wasm and integrated
| with WebTorrent instead of a native torrent program.
|
| I actually did also implement a similar thing fetching data
| on demand from WebTorrent (and in turn helping to host the
| data yourself by being on the website):
| https://phiresky.github.io/tv-show-ratings/ That uses
| protobufs split into a hashmap instead of SQLite, though.
| westurner wrote:
| This looks pretty efficient. Some chains can be interacted
| with without e.g. web3.js? LevelDB indexes aren't SQLite.
|
| Datasette is one application for views of _read-only_
| SQLite dbs with out-of-band replication.
| https://github.com/simonw/datasette
|
| There are a bunch of *-to-sqlite utilities in the
| corresponding Dogsheep project.
|
| Arrow JS for 'paged' browser client access to DuckDB might
| be possible and faster but without full SQLite SQL
| compatibility and the SQLite test suite.
| https://arrow.apache.org/docs/js/
|
| https://duckdb.org/ :
|
| > _Direct Parquet & CSV querying_
|
| In-browser notebooks like Pyodide and Jyve have local
| filesystem access with the new "File System Access API", but
| downloading/copying all data to the browser for every run
| of a browser-hosted notebook may not be necessary.
| https://web.dev/file-system-access/
| mvanaltvorst wrote:
| I'm curious, in what manner could this method speed up Next.js
| builds? That's all done locally, which negates the effect of
| HTTP range requests, right?
| city41 wrote:
| I'm guessing they mean rather than build a static Next site
| that generates 10k+ pages (or whatever large means in the
| given context), it instead creates one page that just queries
| the data from the client.
|
| I have one Next static site that has about 20k pages and
| takes about 20 minutes to build and deploy. I think that's an
| acceptable build time. But I do know of other people around
| the net who have mentioned having sites with 20k-ish pages
| taking an hour+ to build. For them I could see the desire to
| try this sqlite trick.
| nindalf wrote:
| You should write a post about this if you implement it. My
| humble suggestion for blog post title - 'Speed up your
| Next.js builds with this author's one weird trick! Vercel
| hates him!'
| modeless wrote:
| Super cool. I wonder if an approach like this could ever work for
| a writable database. I guess the issue is that you couldn't have
| fine grained access control without a whole lot of server side
| validation, at which point you might as well just run a regular
| database server.
| arafalov wrote:
| Amazing, especially - for me - that the FTS5 full-text search
| just works. Longer term, I wonder if it would be possible to
| split the DB code into read and write parts and cross-compile
| only the read part for delivery to the browser.
| rzzzt wrote:
| If you are interested in full-text search on the client, Lunr
| is also an option: https://lunrjs.com/docs/index.html
| tehbeard wrote:
| Lunr needs the full index on the client though, right?
|
| Being able to use FTS5 without the penalty of having to pull
| down the whole index makes it work much better at larger
| scales, even with the penalty of additional network requests.
| formerly_proven wrote:
| This could be genuinely interesting for tools like e.g.
| sphinx-doc, which currently has a client-side search that
| does indeed ship the entire index to the client.
| tehbeard wrote:
| There's probably a dataset-size tradeoff for a small enough
| number of documents, but even then this could offer a
| fallback: if the db is under some threshold (< x MB total),
| fetch it all in an async task and use that instead, roughly
| as sketched below.
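|
| A minimal sketch of that fallback (threshold and URL are made up):
|     const head = await fetch("/docs-index.sqlite3", { method: "HEAD" });
|     const size = Number(head.headers.get("Content-Length") ?? 0);
|     const useHttpVfs = size > 5 * 1024 * 1024;  // over ~5 MB: range requests
|     // otherwise fetch the whole file once and query it fully in memory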
| cozzyd wrote:
| If you just want static data, using jsroot or jsfive might be a
| better option.
| BMorearty wrote:
| This is fantastically creative. And the author does a great job
| of starting out by describing why this is useful.
|
| And then using SQLite to insert and update DOM elements? Holy
| cow, icing on the cake. Unlike the first part, there's no
| explanation of why you'd want to do that. But by that point I was
| so drawn in that I didn't care and was just enjoying the ride.
| phiresky wrote:
| Yeaah I felt like at that point the article was already long
| enough so I didn't bother describing the DOM part too much -
| even though I spent more time implementing that than I did
| implementing the rest ;)
|
| Basically SQLite has a virtual table mechanism [1] where you
| have to define a few functions that figure out how to scan your
| "fake" table / which indices to use and then how to read /
| write the actual data. I hook into this mechanism and redirect
| the request to DOM functions like querySelector() etc. Then
| there's the issue about SQLite being fully synchronous, but I
| have to run it in a WebWorker - and the WebWorker can't
| actually access the DOM and it can only communicate
| asynchronously with the main thread... So I have to do some
| weird stuff with SharedArrayBuffer and Atomics.wait to make
| that work [2].
|
| [1] https://www.sqlite.org/vtab.html [2]
| https://github.com/phiresky/sql.js-httpvfs/blob/master/src/v...
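|
| The synchronous-wait trick looks roughly like this (an illustrative
| sketch, not the actual sql.js-httpvfs code):
|     // worker side: ask the main thread to run a DOM query, then block
|     const sab = new SharedArrayBuffer(4 + 65536);
|     const flag = new Int32Array(sab, 0, 1);
|     postMessage({ kind: "dom-read", selector: "#some-el", buf: sab });
|     Atomics.wait(flag, 0, 0);              // sleep until flag becomes non-zero
|     const result = new Uint8Array(sab, 4); // bytes written by the main thread
|     // main thread side (inside its onmessage handler):
|     //   serialize the document.querySelector() result into msg.buf, then
|     //   Atomics.store(flag, 0, 1); Atomics.notify(flag, 0);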
| BMorearty wrote:
| MY GOD. You are my hero.
| simonw wrote:
| Pop open the network pane in the browser tools and try running
| this SQL query for a demo of how clever this is:
| select country_code, long_name from wdi_country order by
| rowid desc limit 100
|
| It fetches just 54.2KB of new data (across 49 small HTTP
| requests) to return 100 results - from a statically hosted
| database file that's 668.8MB!
|
| I have an animated GIF demo of this here:
| https://twitter.com/simonw/status/1388933800445452290
| WJW wrote:
| So assuming no country has a name longer than 98 characters and
| that all country codes are 2 characters, that is over 500%
| overhead? Are you missing a /s in your post?
| phiresky wrote:
| Since random accesses across the internet are really slow,
| for this kind of fairly small table (where SQLite stores the
| row data inline within the B-Tree of the table) it basically
| fetches the whole content for each row - so even if you query
| only the long_name and country_code column, it will in fact
| fetch the data of all 29 columns in that table.
|
| If you want it to fetch less data when querying a subset of
| columns, you could create an index on those columns - then
| SQLite will do a COVERING INDEX scan and thus read only the
| necessary data (with the B-Tree itself and the start / end
| page alignment being the only overhead).
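|
| A minimal sketch of that (assuming a sql.js-style `db` handle; the
| index name is made up, the columns are the demo table's):
|     db.run(`CREATE INDEX wdi_country_code_name
|             ON wdi_country (country_code, long_name)`);
|     db.exec(`EXPLAIN QUERY PLAN
|              SELECT country_code, long_name FROM wdi_country
|              ORDER BY country_code`);
|     // the plan should now report "USING COVERING INDEX wdi_country_code_name"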
| cookguyruffles wrote:
| Nothing to add to the conversation, just wanted to say I
| absolutely adore this. Years ago I used to think it'd be
| nice if search engines simply exposed all their shards to
| clients and let clients do all the results merging and
| suchlike. Of course that's probably a terrible idea in
| practice for performance, but this library is definitely
| implementing the spirit in a very practical way!
|
| It also reminded me of the inverse of this hack. In old
| versions of Qemu (possibly it is still implemented, but I
| have vague memories it got ripped out), you could point
| Qemu at a directory on disk and it'd produce an emulated
| floopy disk drive with a virtual FAT12 image containing the
| directory contents. AFAIK it didn't keep the actual data in
| memory, I guess all it needed was file sizes to know how to
| build a virtual bytearray that contained the filesystem
| metadata + proxied reads from the underlying files for data
| sectors. I look forward to seeing your implementation of
| this concept in a virtualized SQLite file <-> GraphQL proxy
| ;)
| simonw wrote:
| The clever part here is only fetching 50KB out of 680MB.
| ajfjrbfbf wrote:
| > sql.js only allows you to create and read from databases
| that are fully in memory though - so I implemented a virtual
| file system that fetches chunks of the database with HTTP
| Range requests when SQLite tries to read from the filesystem:
| sql.js-httpvfs. From SQLite's perspective, it just looks like
| it's living on a normal computer with an empty filesystem
| except for a file called /wdi.sqlite3 that it can read from.
|
| From this paragraph it should be pretty clear that it's
| actually a great result. The database will obviously need to
| read more data than it presents, so more is fetched.
| watermelon0 wrote:
| Have you actually read the article? SQLite is unmodified, and
| thinks it runs on a virtual file system, which fetches file
| chunks via HTTP range headers.
|
| It's REALLY impressive that you only need to read 54 KB out
| of 700 MB, to fetch the records.
| OOPMan wrote:
| It's impressive on one hand.
|
| On the other it's still a lot of overhead.
| throwawayboise wrote:
| Do most static site hosters support range requests?
| bbkane wrote:
| Would it be possible to use a datasette frontend with this as a
| "backend" and statically host the whole thing?
| simonw wrote:
| Not easily - Datasette is written in Python, so you could try
| running it in WebAssembly (like Mozilla did for Jupyter with
| https://hacks.mozilla.org/2019/03/iodide-an-experimental-
| too... ) but it would be enough work that it might be easier
| to reimplement a subset of Datasette directly in JavaScript.
| bbkane wrote:
| Thank you!
| phiresky wrote:
| Huh, that's actually kind of a worst case I didn't think about:
| Since you're doing a reverse table scan my "sequential access"
| detection doesn't kick in. If you do the same query but with a
| forward scan it should fetch roughly the same amount of data
| but only do like 5 HTTP requests since the request size doubles
| for every sequential access.
|
| e.g.:
|
| select country_code, long_name from wdi_country where rowid >=
| 164 order by rowid asc limit 100;
| vladf wrote:
| I solved a similar problem recently: given a stream of data,
| how should you choose packet size in an online way to
| minimize regret (a linear combination of spare capacity of
| last packet and total packets used).
|
| Turns out doubling isn't the best strategy. The optimal
| solution is actually to add a constant increment to packet
| size. How much depends on relative cost of the terms in the
| regret function.
| brandmeyer wrote:
| > I've set the page size to 1 KiB for this database.
|
| > That's because I implemented a pre-fetching system that
| tries to detect access patterns through three separate
| virtual read heads and exponentially increases the request
| size for sequential reads.
|
| > Since you're doing a reverse table scan my "sequential
| access" detection doesn't kick in.
|
| You know, starting off with the default 4kB page size
| naturally adds some resistance to these kinds of failure
| cases. If the VFS isn't issuing many requests in parallel, I
| would think that setting up a page size near target_bandwidth
| * round_trip_time would be a better initial guess. 1kB would
| be appropriate for a pretty low latency-bandwidth product.
| phiresky wrote:
| That's true, but it also means that random access will
| always use at least that amount of data even if it only has
| to fetch a tiny amount. I did a few (non-scientific)
| benchmarks on a few queries and 1kB seemed like an OK
| compromise.
|
| And note that the request chunk size is bound to the SQLite
| page size, and to change that page size you have to rewrite
| the whole DB. So it can't be set on the fly unless you have
| multiple copies of the database.
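|
| (For reference, changing it afterwards on a local copy looks like
| the following; PRAGMA page_size plus VACUUM is the standard way to
| rewrite the file with a new page size, `db` being a sql.js-style
| handle here:)
|     db.run("PRAGMA page_size = 1024;");
|     db.run("VACUUM;");  // rewrites the whole database using 1 KiB pages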
| pjc50 wrote:
| 1kb fits in most IP MTU sizes, so that seems reasonable.
| simonw wrote:
| Hah, I thought "in reverse order by ID" might be a stress
| test but I was still very impressed by how it performed!
| Gehinnn wrote:
| TL;DR: Compile SQLite to JS with Emscripten, and implement a
| virtual FS in JS that streams chunks of a statically hosted
| (read-only) SQLite database.
|
| If queries make use of indices, only a fraction of the database
| needs to be downloaded.
|
| Also, you can use SQLite to query the DOM.
| [deleted]
| pudmaidai wrote:
| I can't figure out exactly how it knows which chunk to
| download. Does it always download the whole index first? Or
| does it include it in the built JS file itself?
| fulafel wrote:
| B-Tree indexes are designed to work like this, to require a
| low number of IO operations. The index contains pointers to
| other places in the index.
| pudmaidai wrote:
| So the answer is "yes, it has to download the index first"?
| None of these comments answer my question.
| rakoo wrote:
| Everything in SQLite is stored in B-Trees. Data or
| indexes. So you don't need to download the whole index
| first; you only need to download the necessary pages of
| the trees to access data, whether it's part of an index
| or actual data
| tehbeard wrote:
| It just needs to download the pages (x KB chunks) to
| traverse from the header to the particular index and
| table schema, a fraction of the whole data.
| detaro wrote:
| At least the information describing what tables and
| indices there are and where to find them - and then it
| gets what it needs once a query is run. Just like sqlite
| would if running from a local file on disk.
| phiresky wrote:
| The B-Tree is a tree that in this case is perfectly
| balanced. So if you run an indexed query against the
| database, it will fetch a logarithmic amount of data from
| the index and then a constant amount of data from the
| table.
|
| For the example the wdi_data table is 300MB and an index
| on it is 100MB in size. This index has a tree depth of 4
| - which means SQLite has to read exactly 4 pages (4KiB)
| to get to the bottom of it and find the exact position of
| the actual row data.
|
| you can check the depth of the b-trees with
| `sqlite3_analyzer`.
| formerly_proven wrote:
| SQLite has runtime-pluggable VFS support, i.e. you give it a
| struct with functions for opening a file, reading some bytes,
| writing some bytes, synchronizing file contents, closing a
| file. This project provides such a VFS module, that, because
| it actually runs in the browser, performs HTTP requests to
| read data. Emscripten provides a way to run a mix of C/C++
| code in the same environment as some JavaScript code inside
| the browser. The reason SQLite has this pluggable VFS support
| is to properly support embedded systems, different locking
| APIs, and things like database encryption.
|
| https://www.sqlite.org/vfs.html
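|
| The core of such a VFS read, stripped down (an illustrative
| sketch, not the actual sql.js-httpvfs implementation; the
| synchronous XHR only works inside a Web Worker):
|     function vfsRead(url: string, offset: number, length: number): Uint8Array {
|       const xhr = new XMLHttpRequest();
|       xhr.open("GET", url, false);  // synchronous, worker-only
|       xhr.responseType = "arraybuffer";
|       xhr.setRequestHeader("Range",
|         `bytes=${offset}-${offset + length - 1}`);
|       xhr.send();
|       return new Uint8Array(xhr.response as ArrayBuffer);
|     }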
| cookguyruffles wrote:
| Both the index and table data are btrees. These are trees -
| the root node sits in some known location (offset) in the
| file, referenced by the file header and metadata. As SQLite
| traverses the tree, it encounters new descendants it would
| like to visit, presumably identified by their byte offset in
| the file, which is all that's needed for this VFS magic to issue a
| suitable range request.
|
| - SQlite opens the file and reads 4kb worth of header ->
| range request for byte 0-4096
|
| - headers/metadata refers to index table with root node at
| 8192kb
|
| - user issues SELECT * from index WHERE name = 'foo'
|
| - SQLite reads root node from the file (range request for
| 8192kb..)
|
| - Root node indicates left branch covers 'foo'. Left branch
| node at address 12345kb
|
| - Fetch left branch (range request for 12345kb)
|
| - New node contains an index entry for 'foo', row 55 of data
| page at 919191kb
|
| - SQLite reads data page (range request for 919191kb..)
|
| etc etc etc
| mvanaltvorst wrote:
| That's part of SQLite. It has been optimised to reduce disk
| reads, because those can be slow on spinning hard drives.
| Coincidentally, this translates well into an optimised
| algorithm that minimises the amount of HTTP range requests to
| make.
| diveanon wrote:
| I am a huge fan of sqlite and will definitely be giving this a
| shot in the near future.
|
| Incredible work.
|
| I see myself using this in conjunction with a conventionally
| hosted pg db for dynamic content.
| rjeli wrote:
| Nice. Reminds me of Bellard's https://vfsync.org/ . It's file
| system pages served over http, to back his jslinux browser VMs
| [deleted]
| bilater wrote:
| This is awesome. Got my wheels turning :)
| dreix wrote:
| Over the last few months I've tried to think of a clever way to
| set up a legacy site for a dictionary that I currently serve from
| a VM, just because I also need to run sqlite. Since I want to make
| sure it'll keep running longer than I keep paying for the VM, this
| is the best possible solution. At some point no more updates will
| happen and it will just be a static website, so bundling it like
| this is incredible. I can keep multiple backups on different hosts
| at no additional cost.
| simonw wrote:
| Using HTTP range requests like this is just SO clever.
| laurencerowe wrote:
| In the genomics world, Tabix indices enable similar use cases.
| An ordered TSV file is compressed in chunks (bgzip) and a Tabix
| index is created to allow range-based access by mapping from the
| index -> chunk. This lets a browser-based genome browser, zoomed
| into a section of the genome, fetch information from a multi-
| gigabyte file.
|
| http://www.htslib.org/doc/tabix.html
| Ultimatt wrote:
| now if only tabix and most hokey bioinformatics formats would
| die and just be replaced with a formal schema spec in SQLite...
| laurencerowe wrote:
| Honestly I think Tabix's bgzipped TSV is one of the less
| hokey bioinformatics formats, at least compared to the
| various custom binary formats floating around.
|
| For web browser based genome browsers I suspect this (very
| cool!) sqlite hack would require many more http requests.
| spamalot159 wrote:
| This is really cool! I wonder what the restrictions are and if we
| would ever be able to write to a SQLite db like this in the
| future. This could push more to the front end without needing to
| write apis.
| phiresky wrote:
| The main restriction is that the DB really needs well fitting
| indexes, otherwise querying is really slow and fetches a lot of
| data.
|
| Regarding writing:
|
| You could of course implement a writing API with POST requests
| for changing pages of the database - but then you would lose
| most of the benefits of this (not requiring any special kind of
| server).
|
| I also thought about implementing a kind of overlay filesystem,
| where chunks that are written to the file are stored in a local
| storage so the modified data is available locally while still
| reading everything else from the remote database.
|
| Interestingly in SQLite that's already exactly what the WAL
| mode does: It's a second file next to the database that's just
| a set of pages that are overlaid over the main file when read
| queries happen - which allows concurrent readers and writers
| since the database itself isn't in an undefined state even when
| write transactions are happening.
|
| So you could enable WAL mode and disable WAL auto
| checkpointing, then you get a downloadable WAL file that can be
| read by normal SQLite and written back to the main file. It
| would be neat, but I'm not sure what the actual use case would
| be ;)
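|
| For anyone curious, the SQLite side of that idea is just the
| following (the pragmas are standard; the overlay/upload part is
| the hypothetical bit, `db` a sql.js-style handle):
|     db.run("PRAGMA journal_mode = WAL;");
|     db.run("PRAGMA wal_autocheckpoint = 0;");  // never fold the WAL back
|     // writes now accumulate in the separate *.sqlite3-wal file, which
|     // could be stored or shipped independently of the main database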
| WrtCdEvrydy wrote:
| Since you can do static hosting from a git repo, I wonder if
| you could directly push your changes to your git repo and
| have your CI/CD solution just deploy it instead?
|
| There has to be a git.js implementation out there and you
| could move the DB to its own repo and create an HTTPS access
| token (for Github)... the issue there is that someone could
| use that token to commit whatever to your database repo.
| wh33zle wrote:
| Maybe forcing the user to login with GitHub would be an
| option? And the changes to the database could be modelled
| as pull requests?
| stagas wrote:
| > I also thought about implementing a kind of overlay
| filesystem, where chunks that are written to the file are
| stored in a local storage so the modified data is available
| locally while still reading everything else from the remote
| database.
|
| Perhaps adding IPFS to the mix for persisting data would be
| interesting, I'm sure there are use cases in peer to peer
| applications. Anyway, amazing innovation thank you for
| writing this :)
| BlueTemplar wrote:
| This seems cool, but doesn't Fossil (from the SQLite author)
| already do it "out of the box" ?
|
| https://www.fossil-scm.org/home/doc/trunk/www/index.wiki
| detaro wrote:
| Fossil runs a server application.
| wheybags wrote:
| This is a really awesome idea! I have a plan for a static-ish
| site (updated daily) that I was going to use sqlite for anyway,
| but server side. I will definitely look into this approach
| instead!
| drtournier wrote:
| The solution works really well for databases which will not be
| updated frequently, like a standalone site.
|
| Although one should be aware of one very important git behavior -
| git does not diff binary files (like SQLite dbs). That means 2
| things:
|
| 1. Each db update will generate a new file in git, keeping the
| whole old file in history instead of a byte-level diff. This will
| accumulate a lot of clutter in the repo.
|
| 2. As git does not diff binaries, there is a very small risk of
| corruption (especially if you work across multiple OSes, because
| of CRLF conversion).
|
| Ref - https://robinwinslow.uk/dont-ever-commit-binary-files-to-
| git
| phiresky wrote:
| Git does actually diff binaries and stores them very
| efficiently :) If you do a single small UPDATE in your db file
| git will only store the changed information. It's just kinda
| slow, and for most binary files the effort git spends to try
| and compute deltas for binary files is useless - which is why
| it has a bad reputation for binary files.
|
| Note that the diffs that git shows you are completely unrelated
| to the deltas it uses to compress its database - which are
| always "binary deltas" and not line-based diffs.
|
| Also I'm not sure what you mean by the db corruption possibility
| having something to do with whether or not it stores diffs?
| laszlokorte wrote:
| Minor nitpick: Git does not store diffs for any file format but
| always the full file for each version. So it does not really
| matter that it's binary (except for not being able to VIEW the
| diff, but I guess you could even implement a plugin for that),
| just that it's a big file. Even a huge text file would be
| fully stored per version.
|
| /edit: The sibling comment mentions that git can in fact delta-
| compress older commits for storage efficiency. But my point was
| that git commits are not deltas but full snapshots.
| tehbeard wrote:
| You are only gonna encounter corruption if you either a) messed
| up the gitconfig for line endings or b) named the database
| mydbfile.txt
| xucheng wrote:
| You can just store the database in a text format (e.g. CSV) in
| git and turn it into a SQLite db when building the website.
| claytongulick wrote:
| Incredibly clever use of paging and http range queries! Well
| done.
| banana_giraffe wrote:
| In a similar vein, I've mentioned this before, but if you're
| doing Python stuff, you can use the apsw package (not the one in
| PyPi, though) to write a VFS layer that SQLite will use to read
| the database.
|
| I've used this for the same basic idea as this article, only
| letting me store SQLite databases in AWS's S3 that I can access
| with AWS APIs so they don't need to be public. It works well,
| though it's absolutely not for every use case; the overhead is
| considerable.
|
| I even used it once to read SQLite database files in a zip file
| stored in S3 without having any local storage to use. Not one of
| my prouder moments, but hey, I coded my way out of the corner
| someone else designed for me.
| tlb wrote:
| The question I had is answered by this line of code:
| xhr.setRequestHeader("Range", "bytes=" + from + "-" + to);
|
| I am a little surprised you can just do that. In
| https://github.com/phiresky/sql.js-httpvfs/blob/master/src/l...
| yread wrote:
| I've learnt about this by using https://www.biodalliance.org
|
| It's an embedded genome viewer; you can just point it at
| multi-gigabyte reference files and .seg files and it loads super
| quickly
| punnerud wrote:
| Here is direct link to GitHub with the usage: https://github.
| com/dasmoth/dalliance/search?q=bytes%3D%27&ty...
| crazygringo wrote:
| Not all webservers support/enable it, so YMMV.
|
| But as long as you're dealing with a known server that does,
| then gravy!
| dathinab wrote:
| Range headers are a pretty standard tool, used e.g. to continue
| interrupted downloads and the like.
|
| Any well-designed system, especially one serving static content
| behind a server-side cache, _should_ support them.
|
| Surprisingly many web frameworks don't support them out of the
| box, or don't support them well.
|
| Either way, gh-pages is static content, probably with some
| server-side regional caches, so I'm not surprised it works.
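|
| A quick way to check whether a given host honors them (a minimal
| sketch; substitute your own URL):
|     const r = await fetch("https://example.com/db.sqlite3",
|                           { headers: { Range: "bytes=0-0" } });
|     const ok = r.status === 206;  // 206 Partial Content = ranges supported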
| villasv wrote:
| Wicked stuff, loved it. Not sure if this will ever be "production
| ready" or merged by sql.js-like projects, but a cool proof of
| concept.
| ponytech wrote:
| This is truly awesome. Very clever.
___________________________________________________________________
(page generated 2021-05-02 23:00 UTC)