[HN Gopher] SQLite Internals: Pages and B-trees
___________________________________________________________________
SQLite Internals: Pages and B-trees
Author : eatonphil
Score : 433 points
Date : 2022-07-27 13:55 UTC (9 hours ago)
(HTM) web link (fly.io)
(TXT) w3m dump (fly.io)
| nathell wrote:
| I dream of a SQLite-like embeddable database engine based on
| Datomic's data model and queryable with Datalog. Written in
| something like C, Rust, or Zig. I'm toying around with the idea
| of hacking something up, but it'll likely stay in the dream realm
| until I have Heaps of Free Time on My Hands (tm).
|
| Looking into SQLite's innards is a great source of inspiration.
| Thanks for this post.
| klysm wrote:
| I'm surprised something like this doesn't exist yet - I wonder
| if it's possible to build it on top of SQLite somehow?
| packetlost wrote:
| I tried. It's not easy because of how limiting SQLite's
| indexes are. You have to build your own indexes using
| `TRIGGER`s, or with a software wrapper and extra tables.
|
| You can see my prototype here:
| https://git.sr.ht/~chiefnoah/quark
| infogulch wrote:
| Commendable attempt! I've considered writing a datalog
| storage backend on sqlite just like your prototype. Thank
| you for sharing, now I can lazily study your prototype
| instead of doing the hard work myself. :) I'm curious, what
| kinds of limitations of SQLite indexes are you referring
| to?
| packetlost wrote:
| Sparse indexes are pretty limited and it only supports
| B-trees, which makes implementing AVET and VAET difficult.
| Further, efficiently finding the _current_ value for an E +
| A is difficult to do in SQL in a way that doesn't require
| maintaining a whole copy of the data. I actually bumped up
| against what I believe are weird edge-case bugs in the
| SQLite query planner when dealing with sparse indexes as
| well.
|
| I think I gave up when trying to implement one-many
| relationships because the SQL was getting too gnarly.
| packetlost wrote:
| I've been doing something very similar, but based on extendible
| hashing algorithms and LSM trees. I've come to the conclusion
| that a DAG of constraints, triggers and validations on a flat
| KV + entity model is probably the ideal data structure for 99%
| of projects I've worked on. You can get the benefits of the
| Datomic-like history by... skipping compaction and including a
| TX instant next to all records/assertions. I've found SQLite,
| Postgres, the LSM papers, Bitcask, and many other papers to be
| very helpful in terms of inspiration.
|
| Edit: I'm prototyping in Python and implementing in Rust with
| intent to create a C API and embedding a Scheme runtime for
| "server"-side query and constraint parsing.
| ftdyuio9p0iugy wrote:
| I've had some append-only tables in Postgres and only
| recently realised that Postgres' system columns
| (https://www.postgresql.org/docs/14/ddl-system-columns.html)
| already effectively enabled some Datomic-like structure for
| such append-only tables! Specifically the xmin column allows
| me to identify the rows to treat atomically as a unit, and to
| ignore if I'm querying for a historical view.
| packetlost wrote:
| You could probably do it with BRIN indexes similar to how
| TimescaleDB handles their time-series hypertables
| LoriP wrote:
| TimescaleDB is packaged as a Postgres extension; there's
| a GitHub project here if anyone is interested in checking it
| out: https://github.com/timescale/timescaledb
| sauruk wrote:
| Do you have a public repo for that yet? (Assuming you're
| planning to open-source)
| packetlost wrote:
| It's got a LICENSE but not publicly listed yet. It's very
| rough, and has a ton of work left. Once I get it to a
| workable state I'll open it up under AGPL, though probably
| with a CLA because I'd like to turn it into a marketable
| product in the long-run. If I make significant progress on
| it, I'll reply to this thread with updates :)
| sherbondy wrote:
| Obligatory link to Project Mentat:
| https://github.com/mozilla/mentat
|
| No longer actively maintained, but maybe a nice starting point
| for hacking on your dream!
| infogulch wrote:
| mentat was archived by mozilla back in 2017, but there are a
| bunch of forks. Because github is dumb and has a terrible
| interface for exploring forks [0], I used the Active GitHub
| Forks tool [1], which helped me find:
|
| qpdb/mentat [2] seems to be the largest (+131 commits) and
| most recently modified (May this year) fork of
| mozilla/mentat.
|
| [0]: https://github.com/mozilla/mentat/network/members -
| Seriously, how am I supposed to use this? Hundreds of
| entries, but no counts for stars, contributors, or commits,
| no details about recent commits. Just click every one?
|
| [1]: https://techgaun.github.io/active-forks/index.html
|
| [2]: https://github.com/qpdb/mentat
| huahaiy wrote:
| Sounds like Datalevin https://github.com/juji-io/datalevin
|
| Embeddable, check. Datomic data model and Datalog query, check.
| Storage written in C, check.
| packetlost wrote:
| Ooh, this looks good. LMDB backend though, meh.
|
| Edit: it's written in Clojure, so JVM. Extra bleh
| artemisart wrote:
| Embeddable... in the java ecosystem. I often see comments
| about datalog/datomic on HN and it seems interesting but I
| never see it anywhere else, is it because it's mostly known
| and used in the java and clojure ecosystem? Do you know of
| any free database with similar models usable from e.g.
| python?
| krn wrote:
| > I dream of a SQLite-like embeddable database engine based on
| Datomic's data model and queryable with Datalog.
|
| You can have this today by running XTDB[1] on top of SQLite via
| JDBC.
|
| > Written in something like C, Rust, or Zig.
|
| And then compiling your application into native executables
| with GraalVM Native Image[2].
|
| [1] https://xtdb.com/
|
| [2] https://www.graalvm.org/native-image/
| Kinrany wrote:
| I hope using an embeddable database will also free us from
| delegating the choice of query language to the database
| library. It should be possible to have a general purpose low
| level persistence API and many different query engines built on
| top of it.
| packetlost wrote:
| I think the problem with querying is that efficient query
| planning requires understanding the indexes on the dataset,
| so you at least need to be able to expose indexes and their
| properties in an API.
| Kinrany wrote:
| The API should definitely either allow directly managing
| indexes or provide even lower level primitives that let the
| query engine create its own indexes.
| packetlost wrote:
| I mean, if you have a KV-like store that supports
| enumeration, you can pretty much always index the data
| yourself.
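The "index it yourself over an enumerable KV store" point can be sketched as a derived map built by one scan. A toy illustration with invented names; a plain dict stands in for a real KV engine:

```python
def build_index(store, field):
    """Derive a secondary index from any enumerable KV store: map each
    value of `field` to the set of keys whose records contain it."""
    index = {}
    for key, record in store.items():
        if field in record:
            # setdefault keeps one bucket of keys per distinct field value
            index.setdefault(record[field], set()).add(key)
    return index
```

Keeping such an index fresh on writes is the hard part the query-planning discussion below circles around.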
| kjeetgill wrote:
| Imho, 90% of query planning is not that hard in practice.
| If it's your data and your query, you'll probably have a
| pretty good idea which table you'll want to filter first,
| and what to join, the same as you would with any data
| structures.
|
| The hard part is getting all of that consistent with
| concurrent writes. Can rows change while you scan? Can
| indexes? How do you check that your write is valid
| immediately before committing? Things like that.
|
| I think SQL makes that pretty hard already, but in a
| "database-as-a-bag-of-data-structures" mode I think that's
| going to get even harder.
| thesz wrote:
| IMNSHO query planning is pretty hard. I recently found
| exponential behavior in SELECT query processing that
| depends on the depth of subselects. This happened with a
| pretty seasoned database system, let me say.
|
| To have good query optimization, you need to implement,
| at the very least, some form of dynamic programming;
| otherwise you will not be able to optimize queries that
| have more than half a dozen tables in their selects. Then
| you have to implement selection of the best plan, or an
| approximation to it, which makes you implement beam
| search through the space of all solutions you generated,
| and that's the simplest case. For guaranteed optimization,
| you need to implement or utilize a pseudo-Boolean
| optimization engine.
|
| I am a database engine developer right now. ;)
| j-pb wrote:
| If you pick a worst-case optimal join algorithm with a
| limited table cardinality (i.e. triples, as in OP's
| case), so that you can materialise every possible index,
| you can perform a dynamic _instance optimal_ search
| (i.e. you can ignore skew) if you choose the best
| possible variable every time you have to pick a variable.
| This can be done by estimating the cardinality and
| Jaccard index of the variables, which is pretty
| straightforward with the right data structure. If you
| don't want to limit yourself to just three columns, you
| can also go for cutting-edge succinct data-structure
| magic: https://aidanhogan.com/docs/wco-ring.pdf
|
| Either way, using a WCO join combined with a data
| structure that allows for efficient range estimates and
| dynamic variable ordering completely obviates the need
| for query planning.
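The "join everything at once as constraint propagation" idea can be shown with a deliberately naive sketch. All names are invented; there are no real indexes, min-hashes, or cardinality estimates here, just set intersection over materialised triples, with the most constrained variable bound first:

```python
def generic_join(triples, patterns):
    """Tiny worst-case-optimal-style join over (e, a, v) triples.
    Binds one variable at a time, always picking the variable with the
    fewest candidate values (dynamic variable ordering)."""

    def is_var(term):
        return isinstance(term, str) and term.startswith("?")

    def matches(pat, triple, binding):
        # A triple matches a pattern if constants and bound variables agree.
        for p, t in zip(pat, triple):
            if is_var(p):
                if p in binding and binding[p] != t:
                    return False
            elif p != t:
                return False
        return True

    def candidates(var, binding):
        # Intersect, across every pattern mentioning `var`, the values
        # that `var` can take -- the constraint-propagation step.
        sets = []
        for pat in patterns:
            if var in pat:
                sets.append({t[pat.index(var)]
                             for t in triples if matches(pat, t, binding)})
        result = sets[0]
        for s in sets[1:]:
            result &= s
        return result

    variables = {p for pat in patterns for p in pat if is_var(p)}

    def solve(binding, todo):
        if not todo:
            yield dict(binding)
            return
        # Pick the most constrained variable first.
        var = min(todo, key=lambda v: len(candidates(v, binding)))
        for val in candidates(var, binding):
            binding[var] = val
            yield from solve(binding, todo - {var})
            del binding[var]

    return list(solve({}, frozenset(variables)))
```

A production engine would replace the repeated scans with trie walks and real cardinality estimates, but the variable-at-a-time shape is the same.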
| thesz wrote:
| "Estimate the cardinality", "jaccard index", "succinct
| data structure" and, finally, some unknown abbreviation
| "WCO" (I guess, it stands for "worst case optimal").
|
| Yes, of course, having an implementation of all that
| completely obviates the need for query planning. ;)
|
| I can't help being sarcastic for a moment, sorry.
|
| In my opinion, in your comment above you clearly
| demonstrated that even avoidance of query planning is
| hard, using as example (multi)set of triples for which it
| is possible to realize all indices.
|
| If what you described is simpler than query planning,
| then query planning is hard.
| j-pb wrote:
| While the technical terms may be unfamiliar, it's all
| pretty straightforward, and requires between 1-4kloc
| depending on how fast and fancy you want things to be.
| (This includes everything from the basic copy on write
| data-structures to the query language.)
|
| Building an immutable path-compressed radix tree is
| pretty straightforward and requires around 1-2kloc, and
| it's easy to keep track of the n smallest hashes of leaf
| values, as well as the total count of leaves in the nodes.
| The sampling done by the min-hashes gives you a good
| indication of two nodes' overlap, which the Jaccard index
| is just a different name for.
|
| The query engine itself is like 0.5kloc, and is just
| walking the different radix-trie indices simultaneously.
| The basic insight of worst case optimal (WCO) joins, is
| that it's a lot cheaper to join everything at once and
| treat it as a constraint propagation problem.
|
| A LINQ style query parser takes up another 1-2kloc and is
| just a bunch of ol' boilerplate.
|
| In total that's about as much code as your average large
| C++ codebase CMake file.
|
| You could sketch the entire thing on a napkin and build
| it in a week if you've built something like it before.
| ComodoHacker wrote:
| >you'll probably have a pretty good idea which table
| you'll want to filter first, and what to join
|
| Until the day you load a bunch of new data and it gets
| skewed, or you delete a bunch of data without shrinking,
| and oops, your join order and method are not efficient
| anymore.
| awild wrote:
| Have you had a look at Arrow? It has those capabilities.
| EastLondonCoder wrote:
| What about https://xtdb.com/
| michael_j_ward wrote:
| if litestream ever enables logical replication, I think you
| could do `SQLite logical replication --> embedded materialize-
| db --> back to SQLite`
| kebman wrote:
| Your dream sounds very nice.
| chakkepolja wrote:
| There's a bunch of resources on r/databasedesign.
| HyperMassive wrote:
| I can't seem to find this subreddit. Do you have a link?
| benbjohnson wrote:
| I think it's this one:
| https://www.reddit.com/r/databasedevelopment/
| Rochus wrote:
| Interesting article. When I implemented a NoSQL database using
| the SQLite backend for https://github.com/rochus-keller/DoorScope
| and https://github.com/rochus-keller/CrossLine 15 years ago, there
| was little literature about it (the referenced article is also
| rather light on the pages and B-tree implementation, but e.g.
| section 9 of https://www.amazon.com/Definitive-Guide-
| SQLite-Experts-Sourc... was helpful). There was no LMDB or
| LevelDB yet; there was Berkeley DB, though, and I did prototypes
| with it, but the SQLite backend turned out to be much leaner and
| faster for the task at hand.
| alberth wrote:
| Off topic: does it bother anyone else when the author byline and
| publication date are displayed at the very end of a blog/article
| as opposed to the start/top?
|
| I'm not even a journalist / communications person, but I do
| immediately look at those fields to get a quick sense of
| relevance (is this an old article, possibly out of date?) and
| of the author's authority on said topic.
| zasdffaa wrote:
| A date published anywhere at all is fortunate.
| guessmyname wrote:
| I love this.
|
| I have been a fan of SQLite for years, and it makes me genuinely
| happy to see other people have the same enthusiasm for the
| software.
|
| A couple of years ago, during a job search, I came across a
| couple of companies that I thought were a good match. I went
| through the interview process with all of them and, due to
| similarities in the interview process, we ended up talking about
| the same thing: databases. Oh the horror in the interviewers'
| eyes whenever I mentioned SQLite. Experienced software engineers
| from The New York Times, for example, went as far as to mock my
| engineering choices in 3 out of 5 interviews, despite the success
| of the products my team built on top of SQLite.
|
| The experience made me feel awful and I stopped talking about
| SQLite for a couple of years. Instead, whenever someone asked a
| question about databases, I would answer with the PostgreSQL
| equivalent, knowing that the basic features are all the same.
|
| That is why, whenever I see projects like Litestream [1][2] used
| by excellent products like Tailscale [3], it brings me joy.
|
| A friend of SQLite is a friend of mine :-)
|
| [1] https://litestream.io
|
| [2] https://fly.io/blog/all-in-on-sqlite-litestream/
|
| [3] https://tailscale.com/blog/database-for-2022/
| jjoonathan wrote:
| Mockery is never OK.
|
| That said, their experience probably reflects my own: using
| SQLite on a personal project for config or something, loving
| the simplicity, loving the "ditch all that complexity" hype,
| then trying it on a more demanding project that according to
| their mental model it should have excelled at (a central job
| tracking process), watching it completely faceplant under load
| (20TPS? Surely it just needs an index or something? Maybe a
| ramdisk? Err.....), panicking when they realize that perf and
| monitoring and diagnostics and optimized backends were some of
| those "bells and whistles that nobody needs," deciding "fine
| I'll do it myself," throwing far too much time and effort at
| co-debugging python and C, recompiling SQLite with debug
| symbols, sprinkling printfs, running profilers, reading the
| source, and tracking down the issue... only to find that it was
| a bad default which people had been unsuccessfully begging the
| maintainers to change for years, which was guaranteed to
| torpedo perf under a wide variety of common workloads, and
| which the maintainers had been resisting based entirely on
| inertia and the misguided idea that this setting was easily
| discoverable. Yikes.
|
| _That 's_ how someone picks up rules of thumb like "SQLite is
| for preferences, not perf, and definitely not prod."
|
| Now, this experience is a decade stale. SQLite bit me in 2012
| and the particular bug was fixed in 2015 IIRC. The people
| begging to fix the bad default finally won. Time passed, SQLite
| got better, and now, based on these new stories of SQLite doing
| well under load, I think it's time to lift my "once bit, twice
| shy" sanctions regarding perf. I had them in place for a
| reason, though, and if you had YOLO'd when I had YOLO'd, you
| would have faceplanted when I faceplanted. I fully defend the
| use of sanctions like this in general. They are the only tool
| we have against overenthusiastic marketing.
|
| To be fair, the SQLite site is pretty good, it's the fans that
| tend to take things too far.
| jjoonathan wrote:
| Speaking of potentially obsolete sanctions, conferences have
| been on hold so I might be hanging on to a few more of them
| past their due date. Does anyone have fresh experiences with
| these?
|
| ------------------
|
| AMD GPUs (2014): Waiting for GPGPU support from Adobe and
| Blender that was supposed to be operational (Adobe) or
| getting there (Blender) but never landed.
|
| AMD GPUs (2016): I burned an entire semester trying to port
| CUDA code to OpenCL only to run into bug after bug after bug.
| Lockups, black screens (no framebuffers were supposed to be
| involved but bugs find a way), leaks, stalls which were
| difficult to attribute, inscrutable errors leading to
| abandoned threads with pleading users from years past with no
| reply. I finally cracked when after a long day of "I swear
| this should work I have no idea how the changes I make
| correspond to the bugs I see" I ran my ailing OpenCL code on
| a weaker NVidia card, and it Just Worked. Not only did it
| just work, it was 10x faster. Even though the card was like 2
| generations older, and I was using OpenCL, which NVidia had
| every reason to sandbag but was still winning at. Oh,
| and the debugging tools actually worked, too. I was persuaded
| to sell my red cards, eat the ebay tax, eat the nvidia tax,
| and buy green from then on. Time has passed since 2016,
| though. ROCm looks good and AMD has money now, so maybe they
| can afford to pay people to fix bugs. I want to see someone
| running Blender and a modern ML framework or two before I
| retire the sanctions, though. Anyone have fresh intuition
| here?
|
| Numba CUDA (2018): I spent two weeks working through a
| succession of problems that I eventually resolved in two days
| by just using CUDA C++. Numba CUDA only worked with year-
| stale CUDA, it copied float8s over a float4 array despite
| having access to triply redundant type specs, the mechanism
| to pass buffers between kernels always forced copy off GPU,
| and one more that I forget. I made patches for the first
| three and gave up on the last one before fixing. Has anyone
| used it recently? Is it in better shape these days?
| hardwaregeek wrote:
| Never underestimate programmer machismo. So many people seem to
| rule out SQLite because of programmer lore that it's not fast
| enough or doesn't scale. But ofc Postgres is a _real_ database
| and therefore is the automatic correct choice.
| outworlder wrote:
| > went as far as to mock my engineering choices
|
| That's not acceptable.
|
| > Experienced software engineers
|
| Measured in years, probably, not by ability.
|
| I've heard supposedly 'experienced' engineers saying things
| like "No-one uses Lua in production!".
| ok_dad wrote:
| It's amazing how otherwise competent engineers will look down
| on technology that so many use all around the world when
| generally their contribution to society is a few buggy Python
| classes in some internal codebase.
| atonse wrote:
| Crazy to hear that.
|
| Those conversations are my favorite when interviewing
| candidates because it gives you a view into how their brains
| work.
|
| I would've nerded out about how so many macOS apps use SQLite
| under the hood because it's so solid.
|
| SQLite is even used in the Airbus A350, and they signed
| (approximately) a 35-year support contract with Airbus for it, I
| remember hearing in a podcast a long time ago. (Please correct me
| if I'm way wrong here.)
| LAC-Tech wrote:
| I sometimes wonder how much of this objection to SQLite is
| purely technical, and how much is due to the fact that Dr.
| Richard Hipp would not exactly be a great "cultural fit" at
| places like the New York Times or Mozilla.
| mypalmike wrote:
| SQLite is great. But the simple fact that it runs locally
| makes it a poor technical fit in many scenarios.
| bob1029 wrote:
| > Simplicity leads to reliability and I don't know of a more
| reliable database than SQLite.
|
| JSON file on disk might be a reasonable competitor here. This
| scales poorly, but sometimes you don't need to scale up to
| something as crazy as full-blown SQLite.
| benbjohnson wrote:
| Yeah, I like JSON on disk too. There's some "gotchas" to avoid
| to make it safe though like atomic renaming or fsyncing the
| file and its parent directory. But as long as you avoid those
| and don't have a lot of data then JSON-on-disk works great.
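A minimal sketch of that atomic-rename-plus-fsync recipe, with invented names, assuming a POSIX filesystem:

```python
import json
import os
import tempfile


def write_json_atomic(path, obj):
    """Crash-safe JSON write: a reader sees either the old file or the
    new one, never a torn half-write.  Recipe: write a temp file in the
    same directory, fsync it, rename over the target, then fsync the
    parent directory so the rename itself is durable."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())   # data must hit disk before the rename
        os.rename(tmp, path)       # atomic replacement on POSIX
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)       # make the new directory entry durable
        finally:
            os.close(dir_fd)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

This is exactly the class of bookkeeping SQLite does for you behind a single `COMMIT`.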
| yndoendo wrote:
| One of the projects I worked on was an embedded system running
| on a 200MHz CPU with 8MB RAM and a micro SD card for storage,
| single data channel. SQLite was chosen because of the power to
| perform report calculations in SQL, and it is perfect for
| configuration storage in a simple key/value relation, like
| Firefox's about:config.
|
| SQLite is my go-to database for application configuration and
| even non-enterprise web applications.
| vlovich123 wrote:
| Implied is "as simple as possible but no simpler". We used JSON
| on disk at one job BUT.
|
| 1. We knew what we were doing with respect to fsync+rename to
| guarantee transactions. I bet you the vast majority of people
| who go to do that won't (SQLite abstract you from needing to
| understand that)
|
| 2. We ended up switching to SQLite anyway and that was a
| painful migration.
|
| Just pick SQLite to manage data and bypass headaches. There's a
| huge ecosystem of SQL-based tools that can help you manage your
| growth too (eg if you need to migrate from SQLite to Postgres
| or something)
| anonymouse008 wrote:
| At what point does one "need" to move to Postgres / MySQL? It
| feels like much of DBA is all style points - not necessarily
| objective.
| stonemetal12 wrote:
| SQLite is not good at OLTP workloads. If that is what you
| are doing then SQLite runs out of steam pretty quickly.
| infogulch wrote:
| > _multiple-user_ OLTP
|
| It works fine for single user applications.
| dymk wrote:
| If at least one of the following is true (non-exhaustive
| list):
|
| - More than one machine needs to read from the database
|
| - More than one machine needs to write to the database
|
| - The database is too large to hold on one machine
| [deleted]
| jjoonathan wrote:
| Does SQLite have perf tools these days? I got bit badly by
| SQLite perf a decade ago (a bad default tanked performance
| and took a week of autotools, profiling, and mucking around
| in source to fix). Now I just use Postgres whenever I need
| anything resembling performance or scale. A decade is a
| long time, though -- do the tools exist now?
|
| If not, some jobs will benefit from a better perf insurance
| policy than "I could probably get this into a profiler if I
| needed to."
| yread wrote:
| if you see "database locked" more often then you'd like
| zdkl wrote:
| It becomes necessary at some (low) value of concurrent
| write usage basically. Or if you want to lean on logic
| built into one of the "proper" RDBMS (GIS, granular access
| managment, ...).
| masklinn wrote:
| > It becomes necessary at some (low) value of concurrent
| write usage basically.
|
| Much less so since the introduction of WAL mode 12 years
| ago, tho.
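Turning on WAL mode is a single PRAGMA on a file-backed database; readers then stop blocking the writer (and vice versa), which is what raises the concurrent-write ceiling discussed above. A minimal sketch (the filename is arbitrary):

```python
import os
import sqlite3
import tempfile

# WAL mode must be set on a file-backed database; it persists in the
# database file, so subsequent connections inherit it.
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
```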
| justinclift wrote:
| If your database is write heavy, and receives a bunch of
| traffic, then PostgreSQL is probably a better fit.
|
| Also, if the SQL that's needed in an application is more
| "enterprise" style, then SQLite isn't there (yet). ;)
| masklinn wrote:
| Also if you want things like advanced datatypes or
| indexes, or features SQLite doesn't have (of which there
| are many, though in the last few years its support for
| "advanced" SQL has really taken a leap forward, making it
| much more enjoyable).
| schwartzworld wrote:
| JSON is a pretty poor storage format though, isn't it? I use it
| for the Wordle clone I made, but the code would have been much
| simpler with SQL statements. And that was without any kind of
| complicated relational data:
| https://github.com/schwartzworld/schwardle
| conradev wrote:
| The funny thing is that SQLite is faster for storing JSON blobs
| than the file system is:
| https://www.sqlite.org/fasterthanfs.html
|
| I would also say that the second you need any sort of writing
| or mutating in any sort of production environment SQLite is
| also simpler than the file system!
|
| Race conditions in file system operations are easy to hit, and
| files have a lot more moving pieces (directory entries,
| filenames, file descriptors, advisory locks, permissions, the
| read/write API, etc) than a simple SQL schema
|
| Whenever I see "full-blown" next to "SQLite" there is usually
| some sort of misconception. It's often simpler and more
| reliable than the next best option!
| jck wrote:
| It looks like sqlite is not much faster than fs on Linux.
|
| I've also found that ripgrep performs surprisingly well with
| folders containing large numbers of text files - this makes
| raw json on disk more convenient for many simple usecases
| which don't require complex queries.
| samatman wrote:
| It's on write that using SQLite really shines.
|
| Imagine a moderately large JSON object, a lot of them
| actually, none hard to read or write exactly, but they do
| carry some common references.
|
| SQLite can rewrite only the changed fields with a query,
| and transact the update across the objects. The filesystem
| can do atomic swaps of each file, usually, if nothing goes
| wrong. SQLite would also be faster in this case, but I'm
| more motivated by the ACID guarantees.
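The cross-object transaction point can be sketched with Python's built-in sqlite3 module. The table and column names here are invented for illustration:

```python
import sqlite3

# One SQLite file in place of a directory of JSON blobs: touching a
# shared field across several "objects" becomes a single atomic
# transaction instead of N hopefully-atomic file swaps.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE objects (id TEXT PRIMARY KEY, owner TEXT, size INTEGER)")
db.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [("a", "alice", 10), ("b", "alice", 20), ("c", "bob", 30)],
)

# Either every matching row sees the new owner or none do, even if the
# process dies mid-update -- the ACID guarantee in question.
with db:  # transaction: commits on success, rolls back on error
    db.execute("UPDATE objects SET owner = ? WHERE owner = ?", ("carol", "alice"))
```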
| outworlder wrote:
| > something as crazy as full-blown SQLite.
|
| The sqlite library is tiny. Often smaller than some JSON
| parsers!
| CJefferson wrote:
| We did have a big system that used JSON files on disk. The
| problem is that over time we ended up with files containing half
| a JSON document, and we needed a "cleanup" pass. Except of course
| the cleanup might find a file in the process of being written,
| and life slowly got more and more complicated.
| zimpenfish wrote:
| Worked at a place that used on-disk XML as their database
| with multiple (10s of) writers. Started off fragile, ended up
| hideously complex and fragile. Unfortunately it was
| originally written by one of the powers that be and his word
| held sway over every technical decision, good or disastrous.
| eatonphil wrote:
| See also: https://tailscale.com/blog/an-unlikely-database-
| migration/, and https://tailscale.com/blog/database-for-2022/.
| didgetmaster wrote:
| I wrote a new, general-purpose data management system that was
| originally designed to compete with file systems for storing
| large amounts of unstructured data (my background is in disk
| utilities like PartitionMagic and Drive Image). It can be used
| to convert 100M+ files (photos, documents, videos, etc.) into
| data objects and attach searchable, contextual meta-data tags
| to them.
|
| The tags I invented were essentially highly-optimized key-value
| stores that worked really well. I then discovered that these
| same KV stores could also be used to form relational tables (a
| columnar store). It could be used to store both highly
| structured data and the more semi-structured data found in
| things like JSON documents, where every value could be an array.
| Queries were lightning fast and it could perform analytic
| operations while still performing transactions at a high speed.
|
| My hobby turned into a massive project as I figured out how to
| import/export CSV, Json, Json lines, XML, etc. files to/from
| tables easily and quickly. It still has a lot of work to go
| before it is a 'full-blown' database, but it is now a great
| tool for cleaning and analyzing some pretty big tables (tested
| to 2500 columns and 200M rows).
|
| It is in open beta at https://didgets.com/download and there
| are a bunch of short videos that show some of the things it can
| do on my youtube channel:
| https://www.youtube.com/channel/UC-L1oTcH0ocMXShifCt4JQQ
| dymk wrote:
| JSON file (or any file you directly write to) is simple, but
| has few of the guarantees SQLite gives you with regards to
| persistence. SQLite abstracts away all of that for you by
| handling the fancy platform-specific I/O stuff under the hood.
|
| SQLite does increase complexity in the sense that it's a new
| dependency. But as far as dependencies go, SQLite has bindings
| in almost all languages. IMO even for simple usecases, it's
| worth using it from the start. Refactors are easier down the
| line than with a bespoke JSON/yaml/flat encoding.
|
| Read only configuration, sure, JSON (etc) file makes sense. The
| second you start writing to it, probably not.
| fikama wrote:
| Could you elaborate on those things abstracted by
| SQLite, or point to further reading? You've got me interested.
| Lex-2008 wrote:
| Not OP, but I remember reading an interesting article about
| issues that simple write() might face:
| https://danluu.com/deconstruct-files/
|
| And here's what SQLite does to ensure that in case of
| application or computer crash database contains either new
| or old data, but not their mix or some other garbage:
| https://sqlite.org/atomiccommit.html
| under-Peter wrote:
| For people who enjoy this article I recommend reading the two-
| parter "How does SQLite work?" [1] [2] by Julia Evans. It too is
| an exploration of the inner workings, but dives deeper into the
| code. And for those interested in more details on B-trees,
| there's "The Ubiquitous B-Tree" [3] by Douglas Comer, which I
| enjoyed a lot.
|
| [1] https://jvns.ca/blog/2014/09/27/how-does-sqlite-work-
| part-1-... [2] https://jvns.ca/blog/2014/10/02/how-does-sqlite-
| work-part-2-... [3]
| http://carlosproal.com/ir/papers/p121-comer.pdf
| upupandup wrote:
| A bit off topic but fly related:
|
| - How can I forward a range of ports instead of generating 1000s
| of lines? This is a major blocker for me.
|
| edit: not sure why this is being downvoted; I've received no
| response from fly.io about this, and this is the only way to get
| their attention on this shortcoming. You can't really expect
| people to write ports 10,000 to 50,000 individually; that's like
| 40,000 lines. Fly.io really needs to support port ranges like
| 10,000-50,000.
| mrkurt wrote:
| community.fly.io is the best place to ask about this.
|
| We don't support tcp port ranges yet. We will someday:
| https://community.fly.io/t/new-feature-every-public-port-now...
| freecodyx wrote:
| It's always good for this kind of article to refer to source
| code. Provide links to the source code responsible for some of
| the implementation details.
| eatonphil wrote:
| Yeah that would be awesome! I'd rephrase your comment as a
| suggestion rather than a demand though personally. :)
| benbjohnson wrote:
| That's a good idea. There's a careful balance though since
| once you start talking about specific functions in SQLite
| then the post becomes very low-level and starts to become
| less accessible. I'll try to integrate that more in the
| future though.
| mandeepj wrote:
| >
| https://github.com/sqlite/sqlite/blob/master/src/btreeInt.h
|
| Woah, Ben! I'm so glad that link was shared;
| thanks to the poster as well. There's a very good amount of
| in-depth knowledge shared via the comments, and a reference to
| Donald Knuth's book.
| benbjohnson wrote:
| There's a lot of goodies in the comments. I like this one
| about how POSIX locks are broken: https://github.com/sqli
| te/sqlite/blob/master/src/os_unix.c#L...
| mandeepj wrote:
| Thanks! Enough catalyst for me to read through all of
| SQLite's source files.
| eatonphil wrote:
| Yeah maybe it would force you to get too close to the code,
| I don't know SQLite well enough.
|
| When I do surveys of software I try to provide links to
| source (mostly just the relevant file or directory) so
| folks can verify for themselves. For example when looking
| into the parsers behind Postgres-compatible DBs [0] and
| parser techniques in a few major language implementations
| [1]. But both those posts are at a higher-level than what
| you're doing here.
|
| I'm sure you'll have good reason either way for this
| series.
|
| [0]
| https://datastation.multiprocess.io/blog/2022-02-08-the-
| worl...
|
| [1] https://notes.eatonphil.com/parser-generators-vs-
| handwritten...
| samatman wrote:
| GP's account page says From Morocco With Love, and I'm sure
| their Arabic and French are far beyond mine.
|
| This reads to me like an accidental slip into the imperative,
| breaking a sentence in two parts, and not adding a phrase
| such as "For example,".
|
| In fact, changing the period to a colon would defuse the
| imperative. It's subtle!
| photochemsyn wrote:
| Here's a header file that basically mirrors some of what the
| article is talking about: the layout of pages, the B-tree, and
| so on (~lines 100-200).
|
| https://github.com/sqlite/sqlite/blob/master/src/btreeInt.h
|
| The code for the B-tree functions is here. It's a bit over my
| head, TBH, with all the locks and permissions and so on, but I
| think it's a nice example of how to comment code:
|
| https://github.com/sqlite/sqlite/blob/master/src/btree.c
| jvdvegt wrote:
| The SQLite documentation is pretty good as well:
| https://sqlite.org/fileformat.html
| dunham wrote:
| I'll second that. I was able to use that documentation to
| write javascript code that reads and queries a sqlite
| database. (Just as a learning exercise, I don't have a
| practical use for this.)
| CoastalCoder wrote:
| A bit of a tangent, but the Corecursive podcast has a really
| interesting interview with the creator of SQLite: [0]
|
| [0] https://corecursive.com/066-sqlite-with-richard-hipp/
| anewpersonality wrote:
| How much of SQLite is D. Richard Hipp versus companies that are
| contributing to the code?
| azornathogron wrote:
| I can't directly answer your question because I don't know, but
| I will note that the SQLite Copyright page states
|
| > Contributed Code
|
| > In order to keep SQLite completely free and unencumbered by
| copyright, the project does not accept patches.
|
| https://www.sqlite.org/copyright.html
|
| I think the number of people who have worked directly on the
| code for the canonical SQLite distribution is quite small.
| benbjohnson wrote:
| I talked with the SQLite team about a year or two ago and
| IIRC there were four people working on it (including Dr.
| Hipp). I believe they were all full time but I didn't ask.
| LAC-Tech wrote:
| SQLite is 21 years old, and for me it just keeps getting better
| and better. More performant, more relevant features, all while
| being just as easy to embed and administer.
___________________________________________________________________
(page generated 2022-07-27 23:00 UTC)