[HN Gopher] Sensible SQLite Defaults
       ___________________________________________________________________
        
       Sensible SQLite Defaults
        
       Author : thunderbong
       Score  : 109 points
       Date   : 2024-12-06 11:21 UTC (11 hours ago)
        
 (HTM) web link (briandouglas.ie)
 (TXT) w3m dump (briandouglas.ie)
        
       | darthShadow wrote:
       | FWIW, auto_vacuum set to INCREMENTAL doesn't do any vacuuming
       | automatically; it just stores the information needed to vacuum
       | the DB.
       | 
       | From https://www.sqlite.org/pragma.html#pragma_auto_vacuum:
       | 
       | > When the value of auto-vacuum is 2 or "incremental" then the
       | additional information needed to do auto-vacuuming is stored in
       | the database file but auto-vacuuming does not occur automatically
       | at each commit as it does with auto_vacuum=full. In incremental
       | mode, the separate incremental_vacuum pragma must be invoked to
       | cause the auto-vacuum to occur.
        
       | neverartful wrote:
       | Unfortunately, there doesn't seem to be a PRAGMA option for
       | strict tables. Docs at following link say that 'STRICT' must be
       | appended to end of CREATE TABLE statement to disable the flexible
       | typing 'feature'.
       | 
       | https://www.sqlite.org/stricttables.html
        
         | giraffe_lady wrote:
         | I don't think there can be with sqlite's backwards
         | compatibility model. The DB has to assume it may be accessed
         | with multiple versions of the library or clients using
         | different pragma settings. So one client disabling strict (or
         | using a version predating that feature) and inserting a bunch
         | of violating records would break the DB file when used by
         | another client with strict enabled.
         | 
         | With it part of table creation, which IIRC is read for each
         | connection to create the schema, it's part of the DB file not
         | the client configuration. If you open a DB using strict with a
         | version that doesn't support it, it will fail but not cause
         | problems for other users.
        
           | neverartful wrote:
           | Yep, that all makes perfect sense!
        
           | alberth wrote:
           | I wish there was a branch of SQLite that cleaned up
           | everything, set new sane behavior (and broke backward
           | compatibility since it's needed to achieve that).
           | 
           | Eg strict mode by default, WAL2, BEGIN CONCURRENT, etc.
           | 
           | Maybe HC-Tree will become that?
           | 
           | https://sqlite.org/hctree/doc/hctree/doc/hctree/index.html
        
             | silvestrov wrote:
             | It is time for version 4 of SQLite.
        
       | ratorx wrote:
       | I think it would be really useful to have a "Why Not?" section
       | for each option as well.
       | 
       | Some of these feel like tradeoffs and setting them blindly
       | without understanding the downsides seems incorrect.
        
         | TheRealPomax wrote:
         | That's the essence of defaults: they're just better defaults,
         | not an excuse to stop caring about what better settings to use
         | once you have runtime performance metrics to guide whether they
         | need changing and which direction to change them in.
        
           | ratorx wrote:
           | Right, but they are not the official defaults. If you are
           | changing the official defaults, then you might as well do it
           | in a more principled way.
           | 
           | If you are going to need to optimise with performance metrics
           | anyway, then why not stick to just the official defaults
           | (unless the official defaults are non-functional, is that the
           | case?)
           | 
           | I think anyone who is reading a blog post on "better
           | defaults" is front loading some of the optimisation, so you
           | could let them make a principled choice straight away for
           | marginal extra cost.
        
             | nemothekid wrote:
             | A lot of these aren't defaults because of backwards
             | compatibility. IMO there is no reason to not use WAL mode,
             | but it's not default because it came later
        
               | bruce511 wrote:
               | As long as all the processes are on the same machine the
               | wal mode is a good idea.
               | 
               | It's not good in the case where multiple machines are
               | sharing the same database. Like say if you had a shared
               | settings file which allowed multiple VMs to be set in one
               | place.
               | 
               | Obviously the same-machine situation is the most common.
               | But you asked for a reason for when wal is not
               | appropriate.
        
             | TheRealPomax wrote:
             | There is some principle here: these are more sensible
             | defaults (but still _only_ defaults) for a very different
             | purpose when compared to the defaults that make sense for
             | "generic SQLite use" that has to keep those set to
             | something that works across all versions. Having different,
             | domain-specific defaults based on an assumption of "you're
             | setting up a new project using the current version of
             | SQLite" makes a whole lot of sense.
        
             | antisthenes wrote:
             | > If you are going to need to optimise with performance
             | metrics anyway, then why not stick to just the official
             | defaults (unless the official defaults are non-functional,
             | is that the case?)
             | 
             | Well, why don't you do the research and tell us?
             | 
             | In my 10-15 years of dealing with official defaults of many
             | programs is that they do work, but in 90% of cases they are
             | overly-conservative.
        
         | btilly wrote:
         | The note of several of them as being defaults for a web app
         | kind of sets expectations for me. These are defaults for maybe
         | you're writing a web app and want it to be a bit more like
         | MySQL or Postgres. They aren't defaults for using SQLite as an
         | in-memory cache on a complex piece of analysis.
         | 
         | SQLite is an amazingly widely used piece of software. It's
         | impossible for one set of defaults to be perfect for all use
         | cases.
        
         | jotaen wrote:
         | I agree. While it's probably not possible to settle on defaults
         | that work for each and every scenario, my personal preference
         | is that factory defaults should tend to optimise for safety
         | primarily. (Both operational safety, but also in regards to
         | usage.)
         | 
         | For example, OP suggests setting the `synchronous` pragma to
         | `NORMAL`. This can be a performance gain, but it also comes at
         | the cost of slightly decreased durability. So for that setting,
         | I'd feel that `FULL` (the default) makes more sense as factory
         | default for a database.
        
         | jccooper wrote:
         | https://www.sqlite.org/pragma.html does a pretty good job on
         | the tradeoffs. Something like WAL you'd want to look a bit
         | deeper on... but need not go too far afield:
         | https://www.sqlite.org/wal.html
        
       | chasil wrote:
       | I do not agree that WAL mode should be enabled by default. It can
       | break things if used incorrectly.
       | 
       | https://www.vldb.org/pvldb/vol15/p3535-gaffney.pdf
       | 
       | "To accelerate searching the WAL, SQLite creates a WAL index in
       | shared memory. This improves the performance of read
       | transactions, but the use of shared memory requires that all
       | readers must be on the same machine [and OS instance]. Thus, WAL
       | mode does not work on a network filesystem."
       | 
       | "It is not possible to change the page size after entering WAL
       | mode."
       | 
       | "In addition, WAL mode comes with the added complexity of
       | checkpoint operations and additional files to store the WAL and
       | the WAL index."
       | 
       | https://www.sqlite.org/lang_attach.html
       | 
       | "SQLite does not guarantee ACID consistency with ATTACH DATABASE
       | in WAL mode. "Transactions involving multiple attached databases
       | are atomic, assuming that the main database is not ":memory:" and
       | the journal_mode is not WAL. If the main database is ":memory:"
       | or if the journal_mode is WAL, then transactions continue to be
       | atomic within each individual database file. But if the host
       | computer crashes in the middle of a COMMIT where two or more
       | database files are updated, some of those files might get the
       | changes where others might not."
        
         | masklinn wrote:
         | These all sounds like load-specific / ancillary concerns.
         | 
         | I don't think most people are in one of those cases, and most
         | people will _greatly_ benefit from WAL. WAL should absolutely
         | be the default unless you have a good reason to not use it.
        
           | chasil wrote:
           | You may prefer non-acid characteristics in your databases in
           | general.
           | 
           | I do not. There may be specific uses, but standards should be
           | followed when possible.
        
             | giraffe_lady wrote:
             | "When possible" in this case being simply don't enable the
             | WAL if your use requires transactions across multiple
             | attached sqlite DBs.
        
             | masklinn wrote:
             | > You may prefer non-acid characteristics in your databases
             | in general.
             | 
             | No. Quite the opposite. However the multi-database case
             | strikes me as a very limited edge case, I would be
             | surprised if double digit % users of sqlite even knew it's
             | an option let alone used it, and the concern is further
             | lessened because afaik things like FKs do not work cross-
             | db.
        
         | striking wrote:
         | WAL isn't the footgun, SQLite over network attached storage is.
         | Additional files are not a problem. If you need cross-DB file
         | ACID you can just turn off WAL mode in that instance.
        
           | bruce511 wrote:
           | >> WAL isn't the footgun, SQLite over network attached
           | storage is
           | 
           | I agree that using SQLite over a network is not a good idea.
           | But people do do that, and it works tolerably well. On most
           | networks it'll work just fine, especially if writes are
           | sparse.
           | 
           | The point of defaults is that they should cover the "widest
           | range". By all means read the docs, and choose to use Wal.
           | But that's an action you take and understand.
           | 
           | The point of articles like this though is to help alert
           | people to the settings that exist. You should always read the
           | docs and apply them to your context before using any setting.
        
       | masklinn wrote:
       | > Why?: Prevents "database is locked" errors by giving SQLite 5
       | seconds to wait for a locked resource before returning an error,
       | useful for handling multiple concurrent accesses.
       | 
       | Alongside that one of the most important things to do is to have
       | strong read/write segregation: separate readonly and r/w pools,
       | the latter having a single connection which is immediately set to
       | BEGIN IMMEDIATE when it's borrowed out.
        
         | skrebbel wrote:
         | What do you mean by "connection" in the context of SQLite?
        
           | giraffe_lady wrote:
           | How is the meaning not clear? If you're trying for a "there's
           | no connections in sqlite" type gotcha that's not even true.
           | SQLite itself uses the term internally and it shows up in the
           | docs.
           | 
           | https://www.sqlite.org/c3ref/open.html
           | https://www.sqlite.org/c3ref/sqlite3.html
        
       | elchief wrote:
       | why is cache size negative?
        
         | cogman10 wrote:
         | According to the docs, it sets the cache_size based on
         | kilobytes rather than pages. [1]
         | 
         | [1] https://www.sqlite.org/pragma.html#pragma_cache_size
        
       | acuozzo wrote:
       | This will probably get buried, but I'm wondering: Has any
       | scripting language attempted to use SQLite as its memory model?
       | You get GC, memory safety, atomicity guarantees, core dump
       | functionality, and rich data structures for free. If configured
       | to run in-memory, I reckon it wouldn't impose enough overhead to
       | be problematic.
       | 
       | It sounds crazy, but most scripting languages add this stuff
       | anyway.
       | 
       | With rqlite this can be extended to distributed systems
       | programming as well.
        
         | aziis98 wrote:
         | I also think about this sometimes, maybe not as only memory
         | model but a simple scripting language with tight integration
         | with sqlite would be pretty cool.
         | 
         | P.S. Actually if think I saw something similar somewhere in the
         | past, if it comes to mind I'll send it
        
           | Imustaskforhelp wrote:
           | yeah dude send it , I am thinking of writing a scripting
           | language some years later and I have absolutely loved this
           | idea.
        
             | acuozzo wrote:
             | I'm not aziis98, but Smalltalk had something similar which
             | was significantly greater in scope.
             | 
             | IIRC, your application shipped with the entire development
             | environment.
        
         | prisenco wrote:
         | That's interesting. There would be overhead compared to
         | directly managing memory. But it would be wild to be able to
         | declare an array of structs that you could directly query.
         | var people person[5]         people[5] // { name: "Human
         | Person", age: 39 }         people.query("SELECT name FROM a
         | WHERE age = 39") // "Human Person"
         | 
         | If everything in memory was a table structure, you could do
         | joins across variables too.
         | 
         | Would it have to be a scripted language? Seems like you could
         | do it as a compiled language.
         | 
         | I love SQL so if I had the option to use it within in
         | application I'd be hard pressed not to. But from what I've
         | learned from the popularity of ORMs and NoSQL, most don't share
         | that feeling.
        
           | acuozzo wrote:
           | > that you could directly query
           | 
           | To take it even further, the language grammar could be a
           | superset of the SQL grammar.
           | 
           | > Would it have to be a scripted language?
           | 
           | Not at all. It's just the first thing which came to mind. I'm
           | tickled by the idea of having the entire program state (stack
           | frames and all!) in SQLite so that you could ship a snapshot
           | of your program around.
           | 
           | > There would be overhead compared to directly managing
           | memory.
           | 
           | Of course, but if the critical queries were pre-compiled via
           | sqlite3_prepare, I doubt the overhead would so greatly exceed
           | the memory management overhead in language runtimes which
           | ship with a GC that it would be a show-stopper.
        
             | Imustaskforhelp wrote:
             | wow the idea of shipping a snapshot is also crazy good ,
             | bookmarking your comment as well the parent comment as
             | well.
             | 
             | Because snapshotting is a rather crazy thing which have to
             | be managed by criu or qemu
             | 
             | this could even be faster than a vm shipped.
             | 
             | MY goal with snapshots can be literally anything (yet , I
             | like the idea of extraction snapshots for extremely large
             | files very lucrative idea)
        
               | acuozzo wrote:
               | Thank you!
               | 
               | Considering this further... I think Lua might be a good
               | match for this idea. Everything it can do can be pretty
               | easily mapped to SQLite queries and its type system is
               | pretty close to the limited set of types offered by
               | SQLite.
        
               | knome wrote:
               | While not serialized to a database, memory-dump/snapshot
               | based distribution was done with smalltalk and various
               | lisps. Basically dump the gc and the executable just
               | reloads it right into RAM. Wasn't the best from what I've
               | read over time.
               | 
               | Having an SQL based callstack sounds painfully slow.
        
               | acuozzo wrote:
               | > Having an SQL based callstack sounds painfully slow.
               | 
               | It could be, but this aspect of the idea could also be
               | user-togglable at runtime.
        
           | bbatha wrote:
           | You've just described LINQ which has a sqlish dsl as part of
           | C#:
           | 
           | ```
           | 
           | List<int> numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0];
           | 
           | // The query variables can also be implicitly typed by using
           | var
           | 
           | // Query #1.
           | 
           | IEnumerable<int> filteringQuery =                   from num
           | in numbers              where num is < 3 or > 7
           | select num;
           | 
           | // Query #2. IEnumerable<int> orderingQuery =
           | from num in numbers              where num is < 3 or > 7
           | orderby num ascending              select num;
           | 
           | // Query #3. string[] groupingQuery = ["carrots", "cabbage",
           | "broccoli", "beans", "barley"];
           | 
           | IEnumerable<IGrouping<char, string>> queryFoodGroups =
           | from item in groupingQuery              group item by
           | item[0];
           | 
           | ```
           | 
           | https://learn.microsoft.com/en-us/dotnet/csharp/linq/get-
           | sta...
        
             | acuozzo wrote:
             | I'm not prisenco, but does LINQ have its memory model in
             | SQL?
             | 
             | What tickles me about the idea I posted above is that you
             | could take a snapshot of your entire program state (at
             | sequence points) and it's just an SQLite db file.
        
               | knome wrote:
               | Linq uses a bunch of iterators functions that accept
               | lambdas. C# allows them to capture the lambdas as an
               | expression so it can convert the expression into SQL or
               | whatever.
               | 
               | it also has an SQL-like syntax you can use that is
               | compiled into those iterators.
               | 
               | the interface is generic, with builtin support for
               | ienumerable types and SQL data sources.
        
         | Imustaskforhelp wrote:
         | this is such a crazy good take , I was thinking of creating a
         | golang based module loader / code loader directly from a
         | sqllite file and a tool / cli that can help you with this thing
         | where what I intend to do is lets say my cli name is goter (go
         | + motor IDK)
         | 
         | goter init (creates a sqllite and a main.go etc. but all
         | main.go etc. are just a symlink to something inside sqllite I
         | am not sure , its just a really really hypothetical idea)
         | 
         | goter add dep <some dependency>
         | 
         | (which it also then loads in that sqllite file)
         | 
         | goter run main.go (which can run)
         | 
         | goter share (which can output a sqllite file)
         | 
         | goter load .db file (which could even have whole golang
         | versions inside of it)
         | 
         | goter build docker (creates a docker image , if possible)
         | 
         | goter collab ipc (creates a ipc gateway between two servers
         | using something like piping-web if behind nat or directly as
         | well) (to have really easy multi deployments)
         | 
         | and integration with popular tools like go-blueprint / even the
         | ability to create custom templates like go-blueprint if
         | possible since clack has been migrated to go by some guy.
         | 
         | this doesn't have to be golang specific , it can be language
         | agnostic.
         | 
         | My thoughts about this were creating a dead simple cloud editor
         | / code runner (where the code runs inside bubblewrap) and this
         | has the golang code and the whole codebase of a single user is
         | just a sqllite file , insanely simple to transport , Insanely
         | efficient.
         | 
         | I am not sure if this makes sense , but you have guided me in a
         | direction that feels just as enlightening as the day I
         | discovered cosmopolitan project on hackernews / redbean
         | (offtopic) (cross platform executables say what!)
         | 
         | THANKS A LOT , I AM BOOKMARKING THIS , THIS IS SO SO GOOD OF AN
         | IDEA THAT I MIGHT'VE LOST A BIT OF MY MIND AT 1 AM
         | Comprehending how good of an idea this might be.
        
         | dpe82 wrote:
         | Not crazy at all; just uncommon. It would be an implementation
         | of the basic recommendation from the excellent paper, "Out of
         | the Tar Pit".
         | 
         | https://curtclifton.net/papers/MoseleyMarks06a.pdf
        
           | acuozzo wrote:
           | Thank you for the link! Reading now.
        
       | pixelesque wrote:
       | A very similar list (more aimed at high-performance) here:
       | 
       | https://highperformancesqlite.com/articles/sqlite-recommende...
        
       | jmull wrote:
       | > PRAGMA busy_timeout = 5000;
       | 
       | I would go higher than this by default. Probably the best default
       | is something like "amount of time a human would be willing to
       | wait for something they want before giving up" which, if you have
       | to pick one value, is hopefully longer than 5 sec. Perhaps 30000.
       | 
       | Now, there are also cases where fail-fast is best. However 5000
       | isn't good for those either. And that assumes a wider system that
       | handles failures in quite a nice way, which doesn't happen
       | without special effort, so doesn't make sense to target with a
       | default value.
        
       ___________________________________________________________________
       (page generated 2024-12-06 23:01 UTC)