[HN Gopher] Sensible SQLite Defaults
___________________________________________________________________
Sensible SQLite Defaults
Author : thunderbong
Score : 109 points
Date : 2024-12-06 11:21 UTC (11 hours ago)
(HTM) web link (briandouglas.ie)
(TXT) w3m dump (briandouglas.ie)
| darthShadow wrote:
| FWIW, auto_vacuum set to INCREMENTAL doesn't do any vacuuming
| automatically; it just stores the information needed to vacuum
| the DB.
|
| From https://www.sqlite.org/pragma.html#pragma_auto_vacuum:
|
| > When the value of auto-vacuum is 2 or "incremental" then the
| additional information needed to do auto-vacuuming is stored in
| the database file but auto-vacuuming does not occur automatically
| at each commit as it does with auto_vacuum=full. In incremental
| mode, the separate incremental_vacuum pragma must be invoked to
| cause the auto-vacuum to occur.
| neverartful wrote:
| Unfortunately, there doesn't seem to be a PRAGMA option for
| strict tables. Docs at following link say that 'STRICT' must be
| appended to end of CREATE TABLE statement to disable the flexible
| typing 'feature'.
|
| https://www.sqlite.org/stricttables.html
| giraffe_lady wrote:
| I don't think there can be with sqlite's backwards
| compatibility model. The DB has to assume it may be accessed
| with multiple versions of the library or clients using
| different pragma settings. So one client disabling strict (or
| using a version predating that feature) and inserting a bunch
| of violating records would break the DB file when used by
| another client with strict enabled.
|
| With it part of table creation, which IIRC is read for each
| connection to create the schema, it's part of the DB file not
| the client configuration. If you open a DB using strict with a
| version that doesn't support it, it will fail but not cause
| problems for other users.
| neverartful wrote:
| Yep, that all makes perfect sense!
| alberth wrote:
| I wish there was a branch of SQLite that cleaned up
| everything, set new sane behavior (and broke backward
| compatibility since it's needed to achieve that).
|
| Eg strict mode by default, WAL2, BEGIN CONCURRENT, etc.
|
| Maybe HC-Tree will become that?
|
| https://sqlite.org/hctree/doc/hctree/doc/hctree/index.html
| silvestrov wrote:
| It is time for version 4 of SQLite.
| ratorx wrote:
| I think it would be really useful to have a "Why Not?" section
| for each option as well.
|
| Some of these feel like tradeoffs and setting them blindly
| without understanding the downsides seems incorrect.
| TheRealPomax wrote:
| That's the essence of defaults: they're just better defaults,
| not an excuse to stop caring about what better settings to use
| once you have runtime performance metrics to guide whether they
| need changing and which direction to change them in.
| ratorx wrote:
| Right, but they are not the official defaults. If you are
| changing the official defaults, then you might as well do it
| in a more principled way.
|
| If you are going to need to optimise with performance metrics
| anyway, then why not stick to just the official defaults
| (unless the official defaults are non-functional, is that the
| case?)
|
| I think anyone who is reading a blog post on "better
| defaults" is front loading some of the optimisation, so you
| could let them make a principled choice straight away for
| marginal extra cost.
| nemothekid wrote:
| A lot of these aren't defaults because of backwards
| compatibility. IMO there is no reason to not use WAL mode,
| but it's not default because it came later
| bruce511 wrote:
| As long as all the processes are on the same machine the
| wal mode is a good idea.
|
| It's not good in the case where multiple machines are
| sharing the same database. Like say if you had a shared
| settings file which allowed multiple VMs to be set in one
| place.
|
| Obviously the same-machine situation is the most common.
| But you asked for a reason for when wal is not
| appropriate.
| TheRealPomax wrote:
| There is some principle here: these are more sensible
| defaults (but still _only_ defaults) for a very different
| purpose when compared to the defaults that make sense for
| "generic SQLite use" that has to keep those set to
| something that works across all versions. Having different,
| domain-specific defaults based on an assumption of "you're
| setting up a new project using the current version of
| SQLite" makes a whole lot of sense.
| antisthenes wrote:
| > If you are going to need to optimise with performance
| metrics anyway, then why not stick to just the official
| defaults (unless the official defaults are non-functional,
| is that the case?)
|
| Well, why don't you do the research and tell us?
|
| In my 10-15 years of dealing with official defaults of many
| programs is that they do work, but in 90% of cases they are
| overly-conservative.
| btilly wrote:
| The note of several of them as being defaults for a web app
| kind of sets expectations for me. These are defaults for maybe
| you're writing a web app and want it to be a bit more like
| MySQL or Postgres. They aren't defaults for using SQLite as an
| in-memory cache on a complex piece of analysis.
|
| SQLite is an amazingly widely used piece of software. It's
| impossible for one set of defaults to be perfect for all use
| cases.
| jotaen wrote:
| I agree. While it's probably not possible to settle on defaults
| that work for each and every scenario, my personal preference
| is that factory defaults should tend to optimise for safety
| primarily. (Both operational safety, but also in regards to
| usage.)
|
| For example, OP suggests setting the `synchronous` pragma to
| `NORMAL`. This can be a performance gain, but it also comes at
| the cost of slightly decreased durability. So for that setting,
| I'd feel that `FULL` (the default) makes more sense as factory
| default for a database.
| jccooper wrote:
| https://www.sqlite.org/pragma.html does a pretty good job on
| the tradeoffs. Something like WAL you'd want to look a bit
| deeper on... but need not go too far afield:
| https://www.sqlite.org/wal.html
| chasil wrote:
| I do not agree that WAL mode should be enabled by default. It can
| break things if used incorrectly.
|
| https://www.vldb.org/pvldb/vol15/p3535-gaffney.pdf
|
| "To accelerate searching the WAL, SQLite creates a WAL index in
| shared memory. This improves the performance of read
| transactions, but the use of shared memory requires that all
| readers must be on the same machine [and OS instance]. Thus, WAL
| mode does not work on a network filesystem."
|
| "It is not possible to change the page size after entering WAL
| mode."
|
| "In addition, WAL mode comes with the added complexity of
| checkpoint operations and additional files to store the WAL and
| the WAL index."
|
| https://www.sqlite.org/lang_attach.html
|
| "SQLite does not guarantee ACID consistency with ATTACH DATABASE
| in WAL mode. "Transactions involving multiple attached databases
| are atomic, assuming that the main database is not ":memory:" and
| the journal_mode is not WAL. If the main database is ":memory:"
| or if the journal_mode is WAL, then transactions continue to be
| atomic within each individual database file. But if the host
| computer crashes in the middle of a COMMIT where two or more
| database files are updated, some of those files might get the
| changes where others might not."
| masklinn wrote:
| These all sounds like load-specific / ancillary concerns.
|
| I don't think most people are in one of those cases, and most
| people will _greatly_ benefit from WAL. WAL should absolutely
| be the default unless you have a good reason to not use it.
| chasil wrote:
| You may prefer non-acid characteristics in your databases in
| general.
|
| I do not. There may be specific uses, but standards should be
| followed when possible.
| giraffe_lady wrote:
| "When possible" in this case being simply don't enable the
| WAL if your use requires transactions across multiple
| attached sqlite DBs.
| masklinn wrote:
| > You may prefer non-acid characteristics in your databases
| in general.
|
| No. Quite the opposite. However the multi-database case
| strikes me as a very limited edge case, I would be
| surprised if double digit % users of sqlite even knew it's
| an option let alone used it, and the concern is further
| lessened because afaik things like FKs do not work cross-
| db.
| striking wrote:
| WAL isn't the footgun, SQLite over network attached storage is.
| Additional files are not a problem. If you need cross-DB file
| ACID you can just turn off WAL mode in that instance.
| bruce511 wrote:
| >> WAL isn't the footgun, SQLite over network attached
| storage is
|
| I agree that using SQLite over a network is not a good idea.
| But people do do that, and it works tolerably well. On most
| networks it'll work just fine, especially if writes are
| sparse.
|
| The point of defaults is that they should cover the "widest
| range". By all means read the docs, and choose to use Wal.
| But that's an action you take and understand.
|
| The point of articles like this though is to help alert
| people to the settings that exist. You should always read the
| docs and apply them to your context before using any setting.
| masklinn wrote:
| > Why?: Prevents "database is locked" errors by giving SQLite 5
| seconds to wait for a locked resource before returning an error,
| useful for handling multiple concurrent accesses.
|
| Alongside that one of the most important things to do is to have
| strong read/write segregation: separate readonly and r/w pools,
| the latter having a single connection which is immediately set to
| BEGIN IMMEDIATE when it's borrowed out.
| skrebbel wrote:
| What do you mean by "connection" in the context of SQLite?
| giraffe_lady wrote:
| How is the meaning not clear? If you're trying for a "there's
| no connections in sqlite" type gotcha that's not even true.
| SQLite itself uses the term internally and it shows up in the
| docs.
|
| https://www.sqlite.org/c3ref/open.html
| https://www.sqlite.org/c3ref/sqlite3.html
| elchief wrote:
| why is cache size negative?
| cogman10 wrote:
| According to the docs, it sets the cache_size based on
| kilobytes rather than pages. [1]
|
| [1] https://www.sqlite.org/pragma.html#pragma_cache_size
| acuozzo wrote:
| This will probably get buried, but I'm wondering: Has any
| scripting language attempted to use SQLite as its memory model?
| You get GC, memory safety, atomicity guarantees, core dump
| functionality, and rich data structures for free. If configured
| to run in-memory, I reckon it wouldn't impose enough overhead to
| be problematic.
|
| It sounds crazy, but most scripting languages add this stuff
| anyway.
|
| With rqlite this can be extended to distributed systems
| programming as well.
| aziis98 wrote:
| I also think about this sometimes, maybe not as only memory
| model but a simple scripting language with tight integration
| with sqlite would be pretty cool.
|
| P.S. Actually if think I saw something similar somewhere in the
| past, if it comes to mind I'll send it
| Imustaskforhelp wrote:
| yeah dude send it , I am thinking of writing a scripting
| language some years later and I have absolutely loved this
| idea.
| acuozzo wrote:
| I'm not aziis98, but Smalltalk had something similar which
| was significantly greater in scope.
|
| IIRC, your application shipped with the entire development
| environment.
| prisenco wrote:
| That's interesting. There would be overhead compared to
| directly managing memory. But it would be wild to be able to
| declare an array of structs that you could directly query.
| var people person[5] people[5] // { name: "Human
| Person", age: 39 } people.query("SELECT name FROM a
| WHERE age = 39") // "Human Person"
|
| If everything in memory was a table structure, you could do
| joins across variables too.
|
| Would it have to be a scripted language? Seems like you could
| do it as a compiled language.
|
| I love SQL so if I had the option to use it within in
| application I'd be hard pressed not to. But from what I've
| learned from the popularity of ORMs and NoSQL, most don't share
| that feeling.
| acuozzo wrote:
| > that you could directly query
|
| To take it even further, the language grammar could be a
| superset of the SQL grammar.
|
| > Would it have to be a scripted language?
|
| Not at all. It's just the first thing which came to mind. I'm
| tickled by the idea of having the entire program state (stack
| frames and all!) in SQLite so that you could ship a snapshot
| of your program around.
|
| > There would be overhead compared to directly managing
| memory.
|
| Of course, but if the critical queries were pre-compiled via
| sqlite3_prepare, I doubt the overhead would so greatly exceed
| the memory management overhead in language runtimes which
| ship with a GC that it would be a show-stopper.
| Imustaskforhelp wrote:
| wow the idea of shipping a snapshot is also crazy good ,
| bookmarking your comment as well the parent comment as
| well.
|
| Because snapshotting is a rather crazy thing which have to
| be managed by criu or qemu
|
| this could even be faster than a vm shipped.
|
| MY goal with snapshots can be literally anything (yet , I
| like the idea of extraction snapshots for extremely large
| files very lucrative idea)
| acuozzo wrote:
| Thank you!
|
| Considering this further... I think Lua might be a good
| match for this idea. Everything it can do can be pretty
| easily mapped to SQLite queries and its type system is
| pretty close to the limited set of types offered by
| SQLite.
| knome wrote:
| While not serialized to a database, memory-dump/snapshot
| based distribution was done with smalltalk and various
| lisps. Basically dump the gc and the executable just
| reloads it right into RAM. Wasn't the best from what I've
| read over time.
|
| Having an SQL based callstack sounds painfully slow.
| acuozzo wrote:
| > Having an SQL based callstack sounds painfully slow.
|
| It could be, but this aspect of the idea could also be
| user-togglable at runtime.
| bbatha wrote:
| You've just described LINQ which has a sqlish dsl as part of
| C#:
|
| ```
|
| List<int> numbers = [5, 4, 1, 3, 9, 8, 6, 7, 2, 0];
|
| // The query variables can also be implicitly typed by using
| var
|
| // Query #1.
|
| IEnumerable<int> filteringQuery = from num
| in numbers where num is < 3 or > 7
| select num;
|
| // Query #2. IEnumerable<int> orderingQuery =
| from num in numbers where num is < 3 or > 7
| orderby num ascending select num;
|
| // Query #3. string[] groupingQuery = ["carrots", "cabbage",
| "broccoli", "beans", "barley"];
|
| IEnumerable<IGrouping<char, string>> queryFoodGroups =
| from item in groupingQuery group item by
| item[0];
|
| ```
|
| https://learn.microsoft.com/en-us/dotnet/csharp/linq/get-
| sta...
| acuozzo wrote:
| I'm not prisenco, but does LINQ have its memory model in
| SQL?
|
| What tickles me about the idea I posted above is that you
| could take a snapshot of your entire program state (at
| sequence points) and it's just an SQLite db file.
| knome wrote:
| Linq uses a bunch of iterators functions that accept
| lambdas. C# allows them to capture the lambdas as an
| expression so it can convert the expression into SQL or
| whatever.
|
| it also has an SQL-like syntax you can use that is
| compiled into those iterators.
|
| the interface is generic, with builtin support for
| ienumerable types and SQL data sources.
| Imustaskforhelp wrote:
| this is such a crazy good take , I was thinking of creating a
| golang based module loader / code loader directly from a
| sqllite file and a tool / cli that can help you with this thing
| where what I intend to do is lets say my cli name is goter (go
| + motor IDK)
|
| goter init (creates a sqllite and a main.go etc. but all
| main.go etc. are just a symlink to something inside sqllite I
| am not sure , its just a really really hypothetical idea)
|
| goter add dep <some dependency>
|
| (which it also then loads in that sqllite file)
|
| goter run main.go (which can run)
|
| goter share (which can output a sqllite file)
|
| goter load .db file (which could even have whole golang
| versions inside of it)
|
| goter build docker (creates a docker image , if possible)
|
| goter collab ipc (creates a ipc gateway between two servers
| using something like piping-web if behind nat or directly as
| well) (to have really easy multi deployments)
|
| and integration with popular tools like go-blueprint / even the
| ability to create custom templates like go-blueprint if
| possible since clack has been migrated to go by some guy.
|
| this doesn't have to be golang specific , it can be language
| agnostic.
|
| My thoughts about this were creating a dead simple cloud editor
| / code runner (where the code runs inside bubblewrap) and this
| has the golang code and the whole codebase of a single user is
| just a sqllite file , insanely simple to transport , Insanely
| efficient.
|
| I am not sure if this makes sense , but you have guided me in a
| direction that feels just as enlightening as the day I
| discovered cosmopolitan project on hackernews / redbean
| (offtopic) (cross platform executables say what!)
|
| THANKS A LOT , I AM BOOKMARKING THIS , THIS IS SO SO GOOD OF AN
| IDEA THAT I MIGHT'VE LOST A BIT OF MY MIND AT 1 AM
| Comprehending how good of an idea this might be.
| dpe82 wrote:
| Not crazy at all; just uncommon. It would be an implementation
| of the basic recommendation from the excellent paper, "Out of
| the Tar Pit".
|
| https://curtclifton.net/papers/MoseleyMarks06a.pdf
| acuozzo wrote:
| Thank you for the link! Reading now.
| pixelesque wrote:
| A very similar list (more aimed at high-performance) here:
|
| https://highperformancesqlite.com/articles/sqlite-recommende...
| jmull wrote:
| > PRAGMA busy_timeout = 5000;
|
| I would go higher than this by default. Probably the best default
| is something like "amount of time a human would be willing to
| wait for something they want before giving up" which, if you have
| to pick one value, is hopefully longer than 5 sec. Perhaps 30000.
|
| Now, there are also cases where fail-fast is best. However 5000
| isn't good for those either. And that assumes a wider system that
| handles failures in quite a nice way, which doesn't happen
| without special effort, so doesn't make sense to target with a
| default value.
___________________________________________________________________
(page generated 2024-12-06 23:01 UTC)