hngopher.com

       [HN Gopher] Consider SQLite
       ___________________________________________________________________
        
       Consider SQLite
        
       Author : wesleyac
       Score  : 241 points
       Date   : 2021-12-29 18:11 UTC (4 hours ago)
        
 (HTM) web link (blog.wesleyac.com)
 (TXT) w3m dump (blog.wesleyac.com)
        
       | brunoluiz wrote:
       | I love to see that more projects are using SQLite as their main
       | database.
       | 
       | One thing that I always wondered though: does anyone knows a big
       | project/service that uses Golang and is backed by SQLite? This
       | because SQLite would require CGO and CGO generally adds extra
       | complexities and performance costs. I wonder how big Golang
       | applications fare with this.
        
         | nn3 wrote:
         | As a author of a library that got rewritten by another team in
         | go with a similar rationale, I must say that the Go ecosystem's
         | unwillingness (or inability) to interface seamlessly with other
         | languages seems like its greatest drawback.
        
         | koeng wrote:
         | You can use pure Golang SQLite (without requiring CGO) -
         | 
         | https://pkg.go.dev/modernc.org/sqlite
         | 
         | It works well, but the performance is worse than C version. Not
         | a big deal for what I used it for, though. It was approx. 6x
         | worse at inserts.
        
         | anderspitman wrote:
         | arp242 has an excellent post[0] about statically compiling
         | sqlite for Go programs, which may be useful.
         | 
         | Isn't there some issue where SQLite basically has to be single-
         | threaded in Golang programs, at least if you use the stdlib SQL
         | library?
         | 
         | [0]: https://www.arp242.net/static-go.html
        
         | usrbinbash wrote:
         | Using https://github.com/mattn/go-sqlite3 for a pet project
         | right now. So far not a single issue with it, I never had to
         | think about any CGO specifics.
        
         | mholt wrote:
         | Not a "big project/service" but a Go project that uses Sqlite
         | is one of my own, Timeliner[1] and its successor, Timelinize[2]
         | (still in development). Yeah the cgo dependency kinda sucks but
         | you don't feel it in code, just compilation. And it easily
         | manages Timeline databases of a million and more entries just
         | fine.
         | 
         | [1]: https://github.com/mholt/timeliner
         | 
         | [2]: https://twitter.com/timelinize
        
         | hoaljasio wrote:
         | https://github.com/gravitational/teleport/ has the option to
         | use it, but it only uses it as a key value store.
         | 
         | CGO isnt too big a problem and if it is a real dealbreaker
         | something like https://pkg.go.dev/modernc.org/sqlite will work
         | as it transpiled the c into go and passes the sqlite test
         | suite. I think there is performance degradation with writes but
         | reads are still pretty quick.
        
       | reneberlin wrote:
       | Wow. So much tech - sqlite this time, and so many opinions to
       | discuss with words / characters. Maybe i could create my own
       | alphabet to have some peace of mind in the end? I don't think so
       | - somebody would make a story of it - wait ... happened!
       | 
       | Read the creator of sqlite for more information on all running
       | topics that make you go 'Uh' up to this point of time:
       | https://corecursive.com/066-sqlite-with-richard-hipp/
        
       | anderspitman wrote:
       | Am I the only one who thinks SQLite is still too complicated for
       | many programs? Maybe it's just the particular type of software I
       | normally work on, which tends towards small, self-hosted
       | networking services[0] that would often have a single user, or
       | maybe federated with <100 users. These programs need a small
       | amount of state for things like tokens, users accounts, and maybe
       | a bit of domain-specific things. This can all live in memory, but
       | needs to be persisted to disk on writes. I've reached for SQLite
       | several times, and always come back to just keeping a struct of
       | hashmaps[1] in memory and dumping JSON to disk. It's worked great
       | for my needs.
       | 
       | Now obviously if I wanted to scale up, at some point you would
       | have too many users to fit in memory. But do programs at that
       | scale actually need to exist? Why can't everyone be on a
       | federated server with state that fits in memory/JSON? I guess
       | that's more of a philosophical question about big tech. But I
       | think it's interesting that most of our tech stack choices are
       | driven by projects designed to work at a scale most of us will
       | never need, and maybe nobody needs.
       | 
       | As an aside, is there something like SQLite but closer to my use
       | cases? So I guess like the nosql version of SQLite.
       | 
       | [0]: https://boringproxy.io/
       | 
       | [1]:
       | https://github.com/boringproxy/boringproxy/blob/master/datab...
        
         | charrondev wrote:
         | For Vanilla Forums we have a file based configuration interface
         | with a bunch of utility methods for writing/reading too it.
         | This is the default and what you really want in a single
         | environment either hosted or in development.
         | 
         | In our cloud infrastructure though we have 10k+ distributed
         | sites so the interface reads/writes to an API backed by a
         | database.
         | 
         | A major benefit to the confit file is that you can just open it
         | in a text editor and muck around during development.
        
         | aduitsis wrote:
         | Would Redis be a valid solution for your case?
        
           | twicetwice wrote:
           | This adds another service you now need to deploy and
           | maintain. For small personal projects, I definitely would
           | like to avoid that if I can!
        
             | anderspitman wrote:
             | I agree.
        
         | root_axis wrote:
         | If you care about data normalization and data integrity then
         | SQLite is going to be a much better choice.
        
           | anderspitman wrote:
           | At the scale I described in my comment, I do not care about
           | normalization. Can you give an example of where I'm likely to
           | lose data integrity?
        
         | klysm wrote:
         | SQLite hides a ton of complexity that lives in the filesystem.
         | It's incredibly hard to do robust IO correctly with the APIs we
         | have.
         | 
         | I almost always choose SQLite for persisting to disk over JSON
         | files. It essentially removes a large class of bugs and is
         | robust enough that I'm not worried about introducing new
         | problems.
        
           | newlisp wrote:
           | _SQLite hides a ton of complexity that lives in the
           | filesystem_
           | 
           | Since they are using Go, couldn't you say the same thing
           | about the Golang std library? As long as they know how to use
           | a local file as database (do the swap, flush, etc...) I don't
           | see the problem.
        
             | gnabgib wrote:
             | What makes you think SQLite uses go? (It's written in C[0])
             | 
             | [0]: https://www.sqlite.org/index.html
        
               | newlisp wrote:
               | Was saying that Go's std lib also "hides a ton of
               | complexity that lives in the filesystem" for you.
        
             | klysm wrote:
             | No because it doesn't give you IO abstractions as robust as
             | SQLite.
        
           | anderspitman wrote:
           | Do you have an example of a problem I'm likely to encounter
           | at the scale I described in my comment?
        
           | bob1029 wrote:
           | In addition to hiding a ton of complexity, SQLite is actually
           | faster than the filesystem in a surprising range of cases.
        
             | anderspitman wrote:
             | I never questioned whether it was faster or had more
             | features. My whole point is maybe a lot of programs don't
             | need what it offers.
        
         | tyingq wrote:
         | BerkeleyDB maybe? DBM files used to be pretty commonly used on
         | Unix boxes. Sendmail, for example, uses DBM files.
         | 
         | Or LMDB, LevelDB, RocksDB, etc, if performance is important.
        
         | dkjaudyeqooe wrote:
         | Your solution works to a point, but it will not be as robust as
         | SQLite. Its ACID powers are incredibly valuable in the real
         | world where things fail or are unreliable. Also the ability to
         | do complicated queries comes in handy sooner than you would
         | think.
         | 
         | Of course you have to then mess around with SQL but you only
         | have to write it once and encapsulate it somewhere and you're
         | done.
        
           | anderspitman wrote:
           | I never questioned whether it was faster or had more
           | features. My whole point is maybe a lot of programs don't
           | need what it offers.
        
         | interpenetrate wrote:
         | Look up "Prevayler" and the "system prevalence" pattern. It
         | introduces occasional data journaling, and so it exists
         | somewhere in between "serialize and persist the whole thing at
         | once" and a full blown DBMS. But I've experimented with system
         | prevalence in the past and decided that just "serializing and
         | persisting the whole thing at once" like you've mentioned was
         | simply a better solution.
        
         | mftb wrote:
         | > This can all live in memory, but needs to be persisted to
         | disk on writes. I've reached for SQLite several times, and
         | always come back to just keeping a struct of hashmaps[1] in
         | memory and dumping JSON to disk. It's worked great for my
         | needs.
         | 
         | I do a very similar thing for many of my at-home projects. They
         | generally have a single-user, me, and it works great.
        
         | anyfoo wrote:
         | Most relational databases offer ACID, for one thing.
         | 
         | And also a query language, which is easy to write and modify
         | queries for, presenting a great benefit over writing large
         | amounts of boilerplate code for looping over records and
         | accessing the right indices every time (keep abstracting that
         | and you'll end up with your own relational database system).
         | 
         | I often import data into SQLite just to work with it, without
         | necessarily even an application in between. Depending on the
         | nature of the data it's either that or Matlab.
        
         | twicetwice wrote:
         | For a recent project I picked a similar approach, but a
         | directory full of JSON files, where the filename is the key and
         | the value is the contents of the file. This gets me two
         | improvements over the single massive object persisted to disk
         | in a single file: ACIDish for free from the filesystem, and a
         | large reduction in bytes (potentially) written to disk since
         | you don't have to rewrite a large file every time a single key
         | changes, and this service potentially has frequent writes to a
         | subset of keys.
         | 
         | This latter point matters a lot to me since the disk in
         | question is a Raspberry Pi's SD card, which I've heard have a
         | tendency to wear out quickly, so I wanted to avoid writing to
         | it unnecessarily. And yeah, this system is fragile, not
         | actually ACID, etc etc... but I'm the only one who relies on it
         | (it's not even accessible on the public internet, only over my
         | Wireguard VPN), so it doesn't matter if it breaks!
         | 
         | I will probably rewrite this service later, backed by either
         | SQLite or Postgres, but writing it this way saved me a fair bit
         | of thinking and lines of code, which was great for getting it
         | off the ground and usable to me in like two hours.
        
       | bonyt wrote:
       | I've always thought it interesting that there was a time when
       | large(ish) websites were hosted using servers that would struggle
       | to outperform a modern smart toaster or wristwatch, and yet
       | modern web applications tend to demand a dramatic distributed
       | architecture. I like the examples in this article showing what a
       | single modern server can do when you're not scaling to Google's
       | level.
       | 
       | As an aside, what about distributed derivatives of sqlite, like
       | rqlite, as a response to the criticism that sqlite requires your
       | database server to also be your web server. Could something like
       | rqlite also provide a way for an sqlite database to grow into a
       | distributed cluster at a later point?
       | 
       | https://github.com/rqlite/rqlite
        
         | usrbinbash wrote:
         | > and yet modern web applications tend to demand a dramatic
         | distributed architecture
         | 
         | The dirty secret is: Most of them don't really need that
         | architecture. Most, and I mean +95% of websites would run just
         | fine on a single box, running the websrv, db and whatever
         | script-interpreter the backend runs on.
         | 
         | Sure, it may be a bit slower if its not hooked up to a global
         | CDN, and if the box goes down the user gets 404. But its also
         | alot easier for an admin to simply give the one box a good
         | wallop and put it back online, since it so simple.
         | 
         | https://idlewords.com/talks/website_obesity.htm#heavyclouds
        
         | voidfunc wrote:
         | It's mostly because there is a demand for HA which requires
         | multiple replicas. The moment you start going down that route
         | you increase complexity.
         | 
         | Whether most things actually require HA is debatable, but a lot
         | of businesses make it a core requirement and so it gets baked
         | into the architecture. Personally I feel like most stuff would
         | be better suited to having fast fail-over and recovery early
         | on, but my advice rarely gets taken. Instead you end up with
         | complicated HA architectures that nobody totally understands,
         | which then (inevitable still) fall over and take hours to
         | recover.
        
           | lenkite wrote:
           | HA is generally needed because users of a SaaS can crash the
           | instance especially in products that offer customisation
           | abilities.
        
             | voidfunc wrote:
             | Sure, but that becomes a requirement at that particular
             | design junction. A lot of stuff is built with HA that isn't
             | even close to that complex.
        
         | freedomben wrote:
         | I don't disagree with you, a single server can go a really,
         | really long way in scale before you run into problems. I know
         | because I've done it a few times.
         | 
         | The problem to me isn't ability to scale on one server, it's
         | the single point of failure. My biggest site is a wordpress box
         | with one instance on a pretty big VPS. In the last year I've
         | had several outages big enough to require a post-mortem (not
         | complete outages, but periods with high levels of error
         | rates/failures), and every time it has been because of internal
         | data center networking issues at my preferred cloud provider
         | (and thankfully their customer service is amazing and they will
         | tell me honestly what the problem was instead of leaving me to
         | wonder and guess). So the main incentive for me to achieve
         | horizontal scalability in that app is not scaling, it's high
         | availability so I can survive temporary outages because of
         | hardware or networking, and other stuff outside of my control.
        
         | ljm wrote:
         | Not even a decade ago we were hosting our web-facing services
         | and our postgres DB on the same server. A lot of smaller
         | projects had their own db and credentials on the same instance.
         | The idea of having to spin up independent DB servers for each
         | one seemed wasteful to us, since back then each box would cost
         | at least PS10/mo or more on Linode or Rackspace and it meant
         | more secrets to keep, more stuff to secure, more stuff to
         | maintain.
         | 
         | It was only with the advent of heroku and the sudden shift to
         | the cloud (AWS) that the DB stopped being software you ran on
         | your box as a daemon, and became essentially a dedicated
         | resource. Even for services running at a completely trivial
         | scale.
        
         | WesolyKubeczek wrote:
         | Consider, though, that in the days past (when your server would
         | be probably on an equivalent of dual-socket Pentium 166-MMX),
         | most clients would be coming from slow links like 33.4-56.1kbps
         | dialup, and it wouldn't be a problem to serve them at all.
         | Links were slow, users were patient, timeouts were high,
         | webpages were sort of slim. Although if you ask me, they always
         | have been heavy, just within the constraints of their time.
         | 
         | Then, of course, there was ISDN and xDSL, which would give you
         | true to god whopping 128 kbits/s for a while. 64 kpbs if you
         | were cheap. It took a while to get to affordable multiples of
         | Mbits per second.
         | 
         | Now that there's at least 10 Mbps uplink from each residential
         | subscriber, doesn't take long to DoS even a beefy server.
         | 
         | And I'd say that server-side, things improved _vastly_ with
         | advent of FastCGI and its equivalents. Back in that heyday of
         | your P166-MMX server, it was CGI with Perl, spawning a process
         | for each incoming request, or  "blazing-fast" Apache's server-
         | side includes, or other things like that. Maybe mod_perl with
         | its caveats on memory sharing.
         | 
         | Anyway, you're right in that whenever you show them a wider
         | pipe, they will find more stuff to congest it with.
        
         | hinkley wrote:
         | Edge networking should work for SaaS, especially if your
         | tenants tend to be geographically clustered. Stand up stateless
         | front end web servers in half a dozen regions, and have them
         | talk back to a more central region, with caches and circuit
         | breakers to control the amount of cross-region traffic. That
         | geographical separation will also make diurnal traffic patterns
         | more obvious because you need more servers later in the day in
         | PST than in EST.
         | 
         | You don't have to distribute everything to scale pretty high.
        
       | [deleted]
        
       | smitty1e wrote:
       | I've been doing some ETL exploration with opening a sqlite
       | :memory: connection, ingesting small-to-medium data, and then
       | doing
       | 
       | "VACUUM INTO somefile.sqlite;"
       | 
       | to dump the RAM copy to disk.
       | 
       | What a great tool.
        
       | gjvc wrote:
       | sqlite as the data store for a system package manager is possibly
       | the most practical single-host example
        
       | reneberlin wrote:
       | Before you start throwing your opinions into HN about sqlite,
       | please read: https://corecursive.com/066-sqlite-with-richard-
       | hipp/?utm_so...
        
       | bobobob420 wrote:
       | i have so much love for SQLite. Consider me, getting my first
       | internship at a startup. They have a bunch of contractors doing
       | work for them as part of their service. The whitelabel
       | application they got developed would export data in CSV. My job
       | was to take that data and get some meaning from it. data included
       | availability, locations, etc (imagine data about delivery
       | drivers). I had no idea what to do but realized I could
       | definitely parse this CSV through Python.
       | 
       | Once I had this data in Python I needed a way to analyze it. I
       | never worked with Databases but decided to install a local copy
       | of SQLite. The rest is history. I feel like I learned how to use
       | databases in an organic way: by looking for a solution from raw
       | data. A couple of queries later and python was exporting excel
       | sheets with color coded boxes that indicated something based on
       | the analysis I did.
       | 
       | Of course this could be done with any database application but
       | the low weight nature of sqlite allowed me to prototype a
       | solution so easily. We just backed up that native sqlite dump
       | with the cloud and had an easy (super easy) solution to analyze
       | raw data.
        
       | iffycan wrote:
       | I've had great success using SQLite as both a desktop application
       | file format and web server database. I'll mention just one thing
       | I like about it in the desktop application realm: undo/redo is
       | implemented entirely within SQLite using in-memory tables and
       | triggers following this as a starting point:
       | https://www.sqlite.org/undoredo.html
       | 
       | It's not perfect, but it fills the niche nicely.
        
       | hankchinaski wrote:
       | I have tried adopting SQLite in my side projects. The problem I
       | encountered is that using managed PostgreSQL/MySQL is still more
       | convenient and more reliable than using SQLite on a bare metal
       | VPS. I like to use Heroku or Digital Ocean App platform because I
       | want to spend time creating and not managing the infrastructure
       | (ci/cd, ssl certs, reverse proxy, db backup, scaling, container
       | management and what not). I tried looking for a managed SQLite
       | but could not find one. On an unrelated note I found using Redis
       | a good lightweight alternative to the classical psql/MySQL.
       | Although still multi-tier and more difficult to model data, it's
       | initially cheaper and easier to manage than its relational
       | counterparts. Anyone has had similar setup/preference?
        
         | [deleted]
        
         | cheeselip420 wrote:
         | What would a managed sqlite even look like? I can't tell if
         | this is a real response or not...
        
           | Something1234 wrote:
           | Maybe an NFS mount or something that handles back ups
           | automatically? Scripting to handle an automatic restore of
           | the database?
           | 
           | Maybe a heroku that knows about your database file and
           | automatically loads the latest version for you?
           | 
           | I kind of feel like GP is a troll comment, as there's no real
           | value add for a managed SQLite.
        
             | config_yml wrote:
             | Or just use litestream, it's perfect for this and the
             | closest you can get to managed by replicating to S3 or
             | Cloud Storage or the likes.
        
           | DizzyDoo wrote:
           | When hankchinaski says 'managed' I think they really mean
           | that there's some capital-A App dashboard somewhere, on
           | Digital Ocean or wherever, and they log in and click 'new
           | database' and that's it. No ssh-ing to a VPS, and choosing
           | the file location where the sqlite file will sit, figuring
           | out backups and so on. But as you say, while you can wrap
           | postgres or redis in that sort of 'just take care of it for
           | me' approach, given the simplicity of sqlite it doesn't fit
           | that paradigm, so perhaps hankchinaski is just
           | misunderstanding what sqlite fundamentally is and how it
           | works.
        
             | hankchinaski wrote:
             | Exactly. I was expressing the limitation of that single
             | tier paradimg viewed from an angle of someone who doesn't
             | want to ssh into machines to configure and deploy code...
             | :)
        
           | hankchinaski wrote:
           | I clearly did not express myself correctly. Being all these
           | managed solutions (heroku, do app platform) running on an
           | ephemeral fs it is not possible to use disk as long term
           | storage as sqlite requires. So I'm basically forced to manage
           | an attached disk to my container which needs to be backed up,
           | monitored and so on
        
       | aynyc wrote:
       | I used it for ETL process extensive which is great. I still don't
       | know how people use it for concurrent writes like a simple ToDo
       | webapp?
        
         | aidenn0 wrote:
         | With WAL, writes for something like a ToDo app finish in a
         | small fraction of a millisecond so unless your todo webapp is
         | writing to the DB at a rate exceeding 20k writes per second,
         | the fact that writes are not concurrent becomes largely
         | irrelevant.
        
           | aynyc wrote:
           | This can't be right. As far as I can tell, WAL allows
           | concurrent READS and WRITE, not concurrent WRITES. Am I doing
           | this wrong all these years?
        
             | aidenn0 wrote:
             | That's correct, it doesn't allow concurrent writes, but if
             | the writes finish fast enough, that's somewhat academic.
        
       | newlisp wrote:
       | _Postgres is 9.5x slower when running on the same machine as the
       | one doing the query_
       | 
       | I'm surprised by this, sure in-process is always going to be
       | faster but still find it hard to believe that sqlite can be beat
       | postgres in a single machine.
        
       | samatman wrote:
       | This excellent article doesn't even mention rqlite, which will
       | synchronize an arbitrary number of SQLite instances using the
       | Raft protocol.
       | 
       | There must be some scaling limits to encounter using this
       | combination, but wouldn't you love to have that problem?
        
       | zaptheimpaler wrote:
       | I don't doubt the power of SQLite, but its difficult to see why
       | its worth using over Postgres anyways.
       | 
       | This is what it takes to run a basic postgres database on my own
       | PC (in a docker compose file):                 postgres:
       | image: postgres:12.7          container_name: postgres
       | environment:            - PGDATA=/var/lib/postgresql/data/pgdata
       | - POSTGRES_PASSWORD=<pw>          volumes:            -
       | ./volumes/postgres/:/var/lib/postgresql/data/
       | 
       | For someone who's completely allergic to SSH and linux, a managed
       | Postgres service will take care of all that too.
       | 
       | SQLite seems simple in that its "just a file". But its not. You
       | can't pretend a backup is just copying the file while a DB is
       | operating and expect it to be consistent. You can't put the file
       | on NFS and have multiple writers and expect it to work. You can't
       | use complex datatypes or have the database catch simple type
       | errors for you. Its "simple" in precisely the wrong way - it
       | looks simple, but actually using it well is not simple. It
       | doesn't truly reduce operational burden, it only hides it until
       | you find that it matters.
       | 
       | Similarly postgres is not automatically complex simply because it
       | _can_ scale. It really is a good technology that can be simple at
       | small scale yet complex if you need it.
        
         | lenkite wrote:
         | It's unfortunate you are getting downvoted. Nobody in this
         | thread has given clear solutions for addressing issues like
         | fault-tolerance, failover, backup and all the managed goodness
         | you can get from choosing Postgres.
        
         | gravypod wrote:
         | I'm not really sure why this post has downvotes. docker-compose
         | dramatically lowers the barrier for setting up a single machine
         | with multiple services (your service, db, etc). For a similar
         | experience you do the same with AWS RDS or equivalent.
         | Performance will be better and worse in various situations but
         | if your software still fits in one machine you're largely going
         | to be "ok." Backups, restore, monitoring, etc are all important
         | for running software and that's something an sqlite file
         | doesn't really offer the best solutions for. It works great for
         | some things (I've used it many many times) but it's not perfect
         | for everything.
        
           | zaptheimpaler wrote:
           | I call this trend "tech hipster"-ism. Part of the motivation
           | is just to do something different just for the sake of being
           | different. Maybe part of it is a perception that Postgres or
           | Linux are oh-so-scary and difficult things or that using the
           | same technology that Amazon uses makes you evil.
           | -\\_(tsu)_/-.
        
       | srcreigh wrote:
       | More info about WAL mode concurrency [0]
       | 
       | No reader-writer lock. Still only 1 concurrent writer, but write
       | via append to WAL file is cheaper. Can adjust read vs write
       | performance by syncing WAL file more or less often. Can also
       | increase performance with lower durability by not syncing WAL
       | file to disk as often
       | 
       | https://www.sqlite.org/wal.html
        
       | reneberlin wrote:
       | meh https://corecursive.com/066-sqlite-with-richard-
       | hipp/?utm_so...
        
       | errantmind wrote:
       | Also, consider just using the filesystem with binary encoding.
       | Serialize / deserialize directly from / to data structures. It is
       | faster and simpler than any database you'll ever use, assuming
       | you don't need the functionality a database provides.
        
       | chasil wrote:
       | There are some important things that SQLite does not do.
       | 
       | It is not client/server; a process must be able to fopen() the
       | database file. NFS and SMB are options that can convey access to
       | remote systems, but performance will not likely be good.
       | 
       | Only a single process can write to the database at any time; it
       | does not support concurrent writers.
       | 
       | The backup tools do not support point-in-time recovery to a
       | specific past time.
       | 
       | If your application can live with these limitations, then it does
       | have some wonderful features.
        
         | mushufasa wrote:
         | Note -- the single process write at any one time is a killer
         | for most web apps, where for example within SaaS you have many
         | users doing things at the same time.
        
           | js4ever wrote:
           | It's not really an issue if you have 1 db per customer
        
             | rguillebert wrote:
             | If a customer has 1000 employees all using your app, it is.
        
               | chasil wrote:
               | If the app is designed correctly, then the thousand
               | employees would write to their own temporary databases,
               | and a background job would pull their changes into the
               | main database sequentially.
               | 
               | If the app is not specifically designed to do this, then
               | SQLite would not be an option.
        
               | rguillebert wrote:
               | What's the benefit over just using the main database?
        
               | pc86 wrote:
               | Serious question, is this just a "how do I get SQLite to
               | work in this scenario?" thing, or is there actually some
               | other benefit to having this sort of data architecture?
        
               | chasil wrote:
               | This can actually relate to SMTP servers using mbox or
               | maildir formats. Maildir is more resistant to corruption,
               | and doesn't rely (as much) on file locks.
               | 
               | https://serverfault.com/questions/564456/what-are-the-
               | practi...
        
           | blacksmith_tb wrote:
           | Sure, though that's if they're all doing CRUD at the same
           | time. I maintain our company blog on a self-hosted install of
           | Ghost backed by sqlite, and it's a been great (since the only
           | inserting or updating is likely to be one person editing a
           | post, and the frontend is mostly cached).
        
           | benbjohnson wrote:
           | Most writes only take tens to hundreds of microseconds and
           | you can perform those writes from different processes
           | serially. As long as you set WAL mode & a busy timeout then
           | you should be fine:
           | 
           | PRAGMA journal_mode = WAL;
           | 
           | PRAGMA busy_timeout = 30000;
        
           | pc86 wrote:
           | Does an event sourcing architecture help with this? I'm
           | thinking a bunch of client piping events into a queue with a
           | single writer working through that queue. Might be stretching
           | the _eventual_ part of eventual consistency if the latency
           | gets too bad, but I don 't think SQLite is top of mind for
           | most SaaS solutions anyway so more of a thought exercise.
        
           | dahart wrote:
           | Can you elaborate? I'm imagining that the DB web server
           | process would have no problem opening a thread per user and
           | multiple threads can write to a SQLite store. Or, optionally
           | you could buffer multi-threaded requests into a single queue
           | for DB I/O. I'm not seeing why the single process constraint
           | is a major impediment, or maybe I guess I'm not sure I
           | understand why multiple separate processes might be required.
           | Am I misunderstanding your comment?
        
         | boring_twenties wrote:
         | None of those matter to me, but I'm being forced to switch from
         | SQLite simply because of its lack of uint64 support.
        
         | lrem wrote:
         | I don't know what backup tools you have in mind... But since a
         | SQLite database is a single file (modulo write ahead journal
         | and whatnot), making whatever you need is trivial.
        
           | cperciva wrote:
           | Be careful! If you copy a database file while it is being
           | modified, you may end up with an inconsistent view of the
           | database depending on the order in which different parts of
           | the file are modified.
           | 
           | There are ways around this (filesystem snapshots and
           | potentially file locking) but it's not _trivial_ and failing
           | to handle this correctly is a very good way to build a system
           | which passes tests under low load (when the file-copying
           | operation will win all the relevant races) but fail
           | catastrophically under heavy load.
        
             | chasil wrote:
             | The sqlite3 utility can be used to create a transaction-
             | consistent backup of a live database. This interface is the
             | only way that a backup should be taken, either from the
             | utility or the C API.                   .backup ?DB? FILE
             | Backup DB (default "main") to FILE
             | 
             | https://sqlite.org/cli.html
        
           | chasil wrote:
           | In Oracle, I can do this:                   RECOVER DATABASE
           | UNTIL TIME '2021-10-01 02:00:00' USING BACKUP CONTROLFILE;
           | 
           | SQLite does not implement such a feature.
        
             | srcreigh wrote:
             | There is a diff tool [0], and an online backup tool [1].
             | That kinda thing should be pretty easy to cobble together.
             | You'll just need ~3x disk space to store 2 backups at an
             | interval, diff them and discard the older backup, then
             | apply backward diffs on your most recent backup to achieve
             | a specific backup time.
             | 
             | EDIT: or maybe the Session Extension [2]
             | 
             | [0]: https://www.sqlite.org/sqldiff.html
             | 
             | [1]: https://www.sqlite.org/backup.html
             | 
             | [2]: https://www.sqlite.org/sessionintro.html
        
           | DasIch wrote:
           | The state of the art is that you click a few buttons in the
           | UI of the cloud provider of your choice. That takes a few
           | seconds.
           | 
           | What you are proposing may be trivial but in comparison it
           | might just as well be rocket science. Doing more work to use
           | a simpler database that is harder to setup and has less
           | features is not exactly compelling.
        
           | masklinn wrote:
           | > modulo write ahead journal and whatnot
           | 
           | That's like saying immortality is trivial modulo death and
           | whatnot. If you don't integrate with sqlite's locking system,
           | you can easily "backup" a file mid-write, corrupted and
           | unusable.
           | 
           | That's why _sqlite has a built-in backup command_.
        
             | lrem wrote:
             | I guess my secret is having almost no writes (the couple
             | things I deployed with SQLite would see a dozen writes per
             | month).
        
         | eternalban wrote:
         | > Only a single process can write to the database at any time;
         | it does not support concurrent writers.
         | 
         | Potential plus in cloud/container deployments.
         | 
         | > It is not client/server; a process must be able to fopen()
         | the database file. NFS and SMB are options that can convey
         | access to remote systems, but performance will not likely be
         | good.
         | 
         | There are other possibilities given a bit of additional
         | wrappers (over the db process) and federating nodes with some
         | (consensus) protocol. It may actually be a good fit as the
         | 'node-unit' for very large scale distributed metadata store
         | that would benefit from relational DB semantics instead of
         | basic K/V.
        
         | umvi wrote:
         | > Only a single process can write to the database at any time;
         | it does not support concurrent writers.
         | 
         | I think you can do concurrent writing now with Write-Ahead
         | Logging (WAL): https://www.sqlite.org/wal.html
         | 
         | I've never tried it though, so I don't know how suitable it is
         | for web apps that might potentially have multiple processes
         | trying to write to the DB at the same time.
        
           | teraflop wrote:
           | Nope. In the default rollback-journal mode, SQLite supports
           | _either_ multiple readers _or_ a single writer at a time. WAL
           | mode improves this to multiple readers _and_ at most one
           | writer.
           | 
           | But transactions in SQLite are often fast enough that this
           | could still be acceptable for a lot of purposes, especially
           | on an SSD.
        
           | srcreigh wrote:
           | There is no concurrent writing with WAL mode
        
         | teraflop wrote:
         | > NFS and SMB are options that can convey access to remote
         | systems, but performance will not likely be good.
         | 
         | And the safety of your data depends on the quality of your
         | network filesystem's locking implementation. It's not too
         | difficult to design a locking method that works most of the
         | time, but it's a lot harder to build something that guarantees
         | mutual exclusion over an imperfect network.
         | 
         | On a single machine, file locking is generally reliable
         | (mediated by the kernel). Multiple clients can access the same
         | database simultaneously, with readers and writers taking turns,
         | and all that happens is you pay the performance penalty of
         | serialized access.
         | 
         | But if your client _thinks_ it has a lock on a remotely-stored
         | database file, but the server-side lock has actually expired
         | and another client is writing to the database, your data is
         | likely to get hosed.
         | 
         | https://www.sqlite.org/howtocorrupt.html
        
           | chasil wrote:
           | Databases on remote filesystems should be limited to
           | SQLITE3_OPEN_READONLY access, agreed.
        
             | ridiculous_fish wrote:
             | I'm experimenting with using SQLite to store users' history
             | in fish shell, but the remote filesystems problem seems
             | likely to be a showstopper. What can be done about it?
        
               | chasil wrote:
               | I'm just reading this, and learning a few new things:
               | 
               | https://www.sqlite.org/howtocorrupt.html
               | 
               | SQLite has an alternate lock mode with dotfiles that
               | seems to prevent database corruption over NFS. It is
               | important that all SQLite accesses from all connected
               | processes use the same lock mode.
               | 
               | "2.3. Two processes using different locking protocols
               | 
               | "The default locking mechanism used by SQLite on unix
               | platforms is POSIX advisory locking, but there are other
               | options. By selecting an alternative sqlite3_vfs using
               | the sqlite3_open_v2() interface, an application can make
               | use of other locking protocols that might be more
               | appropriate to certain filesystems. For example, dot-file
               | locking might be select for use in an application that
               | has to run on an NFS filesystem that does not support
               | POSIX advisory locking.
               | 
               | "It is important that all connections to the same
               | database file use the same locking protocol. If one
               | application is using POSIX advisory locks and another
               | application is using dot-file locking, then the two
               | applications will not see each other's locks and will not
               | be able to coordinate database access, possibly leading
               | to database corruption."
        
               | ridiculous_fish wrote:
               | The problems with dotfile locking are:
               | 
               | 1. A crash risks leaving the lock file in place; it must
               | be manually removed or else you hang forever.
               | 
               | 2. The same home directory may be local and remote on
               | different machines, meaning different locking protocols.
               | 
               | I am considering a mode where the database is loaded
               | fully into memory, and writes are atomically saved via
               | the classic adjacent-file-renamed mechanism. You risk
               | losing data due to races, but it can't corrupt the
               | database.
        
         | srcreigh wrote:
         | In WAL mode, a write can mean writing some data in memory on
         | the local machine. With Postgres/MySQL/etc, it has to go over
         | the network.
         | 
         | I can't parse [0] right now about the overhead of TCP and round
         | trips with the database server, but it's basically a question
         | of whether one SQLite write is 10x, 100x, 1000x, or more faster
         | than a database server write. That should make a lot of
         | difference.
         | 
         | [0] https://gist.github.com/jboner/2841832
        
           | chousuke wrote:
           | If you have PostgreSQL or MySQL on the same machine as your
           | application, you can use UNIX sockets; they won't have much
           | latency at all (I think Linux also optimizes TCP on localhost
           | to skip all the pointless protocol overhead).
           | 
           | In terms of latency it'll still be difficult to beat a
           | database that lives in the same process as your application,
           | but it won't be as bad as going over the network might be.
        
       | reneberlin wrote:
       | This "little thingie" is so powerful, that words are missing to
       | tell the story.
        
       | pgwhalen wrote:
       | SQLite is great, but it's not a more simple drop in replacement
       | for DB servers like HN often suggests it is.
       | 
       | My team at work has adopted it and generally likes it, but the
       | biggest hurdle we've found is that it's not easy to inspect or
       | fix data in production the way we would with postgres.
        
         | tyingq wrote:
         | >it's not easy to inspect or fix data in production the way we
         | would with postgres.
         | 
         | I assume because you're using a remote socket connection from
         | the client?
         | 
         | I haven't tried it in a serious setting yet, but I did play
         | around with dqlite and was impressed. Canonical uses it as the
         | backing data store for lxd. Basically sqlite with raft
         | clustering and the ability for clients to connect remotely via
         | a wire protocol. https://dqlite.io/
        
           | pgwhalen wrote:
           | > I assume because you're using a remote socket connection
           | from the client?
           | 
           | Yeah, it's common for all developers to connect to and query
           | against prod postgres DBs via DataGrip or similar.
           | 
           | dqlite definitely looks interesting, but I worry it's a bit
           | heavy given that our only use case for remote access is prod
           | troubleshooting. I think I saw something recently where you
           | could spin up a server on top of a sqlite file temporarily -
           | that might be ideal for us.
        
         | srcreigh wrote:
         | Did you try SSH and using SQLite command line tool?
        
           | pgwhalen wrote:
           | That ought to work, but the experience we are trying to
           | emulate is that of using DataGrip, rather than psql.
        
         | brunoluiz wrote:
         | I believe you mean that you can't easily do a "psql ..." or
         | connect using DataGrid and similars, right?
         | 
         | Does this mean that devs need to copy the production database
         | file locally to then inspect it? Or are there tools to
         | connect/bridge to a remote sqlite file?
        
           | pgwhalen wrote:
           | > I believe you mean that you can't easily do a "psql ..." or
           | connect using DataGrid and similars, right?
           | 
           | Yeah.
           | 
           | > Does this mean that devs need to copy the production
           | database file locally to then inspect it? Or are there tools
           | to connect/bridge to a remote sqlite file?
           | 
           | We use "kubectl copy" currently when we want to inspect it,
           | and we haven't actually had to write back to a production
           | file yet. We've explored the "remote" option, but since it's
           | just a file, everything seems to boil back down to "copy
           | locally" then "copy back to remote prod".
           | 
           | It's only a small part of our stack at the moment, so we
           | haven't invested in tooling very much - but I'd be curious if
           | others have solved similar problems.
        
             | brunoluiz wrote:
             | Ah I see! Yeah, I expected your team would have to copy it
             | locally. I wonder tho, in times of data leaks and whatnot,
             | couldn't it be dangerous to have lots of PII (personal
             | identifiable information) copied around many dev laptops?
             | 
             | I mean, devs can do the same with Postgres, but it is more
             | for backups instead of purely querying.
        
               | pgwhalen wrote:
               | That's true - our software does not deal with PII, which
               | makes many of our DB practices sound strange to
               | outsiders.
        
           | tfehring wrote:
           | I think the usual approach would be to SSH into the server
           | and run sqlite3 there. This issue [0] mentions some
           | workarounds for connecting from DBeaver, which I assume would
           | work for other graphical client software. I haven't tried
           | those approaches, they seem pretty hacky and I imagine
           | performance isn't great, but I guess that's to be expected
           | given that SQLite isn't designed for that type of access.
           | 
           | [0] https://github.com/dbeaver/dbeaver/issues/6876
        
         | aflag wrote:
         | Can't you run your queries in a copy of the data (eg.: a
         | backup)? I think that'd be advisable even if you were running
         | postgresql.
        
           | lenkite wrote:
           | One can create read-only users in Postgres.
        
           | pgwhalen wrote:
           | > Can't you run your queries in a copy of the data (eg.: a
           | backup)?
           | 
           | That's essentially what we do: copy the file locally if we
           | need to inspect it. It's slightly more cumbersome though.
           | 
           | > I think that'd be advisable even if you were running
           | postgresql.
           | 
           | Connecting to live prod servers is definitely not a 10/10 on
           | the "best practices" scale, but it works well for our
           | business (trading), where there are small developer teams
           | that also operate, no PII in the database, and critical
           | realtime functionality isn't directly involved with the
           | database anyway.
        
             | aflag wrote:
             | With postgres you could set up a read-only replica or
             | something. At least for me, the small effort is well worth
             | not having to worry about accidentally deleting production
             | data.
             | 
             | I feel like having an easy mechanism to clone the
             | production database somewhere you can play with is well
             | worth the effort. You can even use those clones to run
             | backtests and other integration/regression tests against,
             | which is also a very nice to have.
        
               | pgwhalen wrote:
               | > With postgres you could set up a read-only replica or
               | something. At least for me, the small effort is well
               | worth not having to worry about accidentally deleting
               | production data.
               | 
               | We do do this, and probably should be more disciplined
               | about connecting to it when only reading. Of course that
               | doesn't help if we need to run an update in production,
               | but that isn't _that_ often.
               | 
               | > I feel like having an easy mechanism to clone the
               | production database somewhere you can play with is well
               | worth the effort. You can even use those clones to run
               | backtests and other integration/regression tests against,
               | which is also a very nice to have.
               | 
               | We do actually do this as well (nightly), and it is a
               | _huge_ productivity boost for testing and development. I
               | would recommend to anyone that writes software dealing
               | with persisted data to invest in an easy mechanism to
               | clone from production.
        
       | reneberlin wrote:
       | My 2 cents on sqlite:
       | 
       | https://corecursive.com/066-sqlite-with-richard-hipp/
       | 
       | An interview with one of the creators:Mr. Richard Hipp - for a
       | better and deeper understanding what pitch they took and what
       | industries they were in to. Their approach to overcome the db-
       | world that they saw in front of them. See the obstacles and the
       | solutions and why it came to be that underestimated 'sqlite' that
       | powers a good chunk of all you mobile actions triggered by your
       | apps - but just read that interview - i cannot reproduce the
       | dramatic here in my own words (underestimated).
        
       | mrfusion wrote:
       | I wrote a script to find all csv files in a directory and figure
       | out the best way to load them into SQLite.
       | 
       | It gives me a handy way to run queries on data.
       | 
       | I tried to make it super smart too and really make educated
       | guesses on what data types to use and even linking foreign keys.
        
         | lenkite wrote:
         | A language like R (https://www.r-project.org/) helps more for
         | CSV data-science.
         | https://www.tutorialspoint.com/r/r_csv_files.htm
        
       | pietroppeter wrote:
       | Nim forum uses SQLite as its db since 2012 and it fits perfectly
       | the article's use case. Code is available and it can be used to
       | run a discourse inspired forum (although much less featured).
       | 
       | https://github.com/nim-lang/nimforum
        
       | greatjack613 wrote:
       | I use SQLite exclusively on a high performance crypto sniper
       | project - https://bsctrader.app and I could not be happier with
       | it.
       | 
       | Performs much better then postgres in terms of query latency
       | which is ultra important for the domain we operate in.
       | 
       | I take machine level backups every 2 hours, so in the event of an
       | outage, just boot the disk image on a new vm and it's off.
       | 
       | I would never do this on my professional job due to the stigma,
       | but for this side project, it has been incredible
        
         | rhn_mk1 wrote:
         | > machine level backups
         | 
         | I presume that's block level backups? Or some snapshotting?
         | 
         | As far as I know, block level filesystem copies can get
         | inconsistent (so we have journalling file systems). But
         | assuming it works well, like a filesystem aware snapshot, can
         | sqlite deal with files snapshotted in the middle of an
         | operation?
        
         | jxi wrote:
        
         | dkjaudyeqooe wrote:
         | > I would never do this on my professional job due to the
         | stigma, but for this side project, it has been incredible
         | 
         | Using the right tool for the job does indeed carry a lot of
         | stigma in many professional environments. Instead you use the
         | tool that some VP has been sold by some salesman.
        
         | bob1029 wrote:
         | > I would never do this on my professional job due to the
         | stigma, but for this side project, it has been incredible
         | 
         | I love controversy so I was able to push SQLite through as the
         | sole persistence mechanism for our product. Virtually every
         | constraint lined up with our business perfectly.
         | 
         | We sell a B2B product that gets installed by our customers, so
         | they love it too. No more sweaty hands conversations about why
         | their database server cant talk to their application server
         | over their networks while we are caught in the middle trying to
         | avoid taking ownership of their internal IT problems.
        
         | catillac wrote:
         | The stigma of using the most popular database in existence?
        
           | copperx wrote:
           | For the domain of webapps, where multiple concurrent writers
           | are often expected, yes, I would say it's a stigma.
        
             | usrbinbash wrote:
             | Are they expected, or are the _required_?
             | 
             | Because, serializing db access through a single process
             | only becomes a problem when the number of reads/writes get
             | so large, that the process becomes a bottlenec.
             | 
             | And judging by the test the author of the linked article
             | did, that would have to be a HUGE number.
        
       | chunkyks wrote:
       | In the past I had a website with hundreds of gigs of data that
       | needed updating regularly, but could be read-only from the web
       | server perspective.
       | 
       | I used sqlite for that, and had a mysql server for the user data
       | and stuff that needed to be written to. Performance was
       | fantastic, users were happy, data updates were instantaneous ;
       | copy the new data to the server then repoint a symlink.
       | 
       | Most of my work is modeling and simulation. Sqlite is almost
       | always my output format ; one case per database is really natural
       | and convenient, both for analysis, and run management.
       | 
       | Anyway. Sqlite is amazing.
        
       | matdehaast wrote:
       | Shout out to litestream[0] for backups
       | 
       | [0] https://github.com/benbjohnson/litestream
        
       | gregors wrote:
       | I love this approach. I've ran sqlite in production for a variety
       | of products over the years and for the most part it's great. One
       | "back up" solution I did was dump tables to text - commit the
       | diff in git. Taking file copies while transactions are in process
       | can lead to corrupted db's.
        
       | samwillis wrote:
       | I believe SQLite is about to explode in usage into areas it's not
       | been used before.
       | 
       | SQL.js[0] and the incredible "Absurd SQL"[1] are making it
       | possible to build PWAs and hybrid mobile apps with a local SQL
       | db. Absurd SQL uses IndexedDB as a block store fs for SQLite so
       | you don't have to load the whole db into memory and get atomic
       | writes.
       | 
       | Also I recently discovered the Session Extension[2] which would
       | potentially enable offline distributed updates with eventual
       | consistency!
       | 
       | I can imagine building a SAAS app where each customer has a
       | "workspace" each as a single SQLite db, and a hybrid/PWA app
       | which either uses a local copy of the SQLite db synced with the
       | session extension or uses a serveless backend (like CloudFlare
       | workers) where a lightweight function performs the db operations.
       | I haven't yet found a nice way to run SQLite on CloudFlare
       | workers, it need some sort of block storage, but it can't be far
       | off.
       | 
       | 0: https://sql.js.org/
       | 
       | 1: https://github.com/jlongster/absurd-sql
       | 
       | 2: https://www.sqlite.org/sessionintro.html
        
         | [deleted]
        
         | koeng wrote:
         | The golang equivalent of native SQL can be found here -
         | https://pkg.go.dev/modernc.org/sqlite
         | 
         | Very useful for pure go applications!
        
         | dorian-marchal wrote:
         | How would we deal with conflicts (e.g. syncing back several
         | conflicting offline clients for the same user) with something
         | based on the Session Extension?
        
           | samwillis wrote:
           | You have to handle the merge conflicts yourself, see
           | https://www.sqlite.org/session/sqlite3changeset_apply.html
           | 
           | So you need to be carful how you design your schema, but very
           | possible.
           | 
           | One option is to use something like Yjs and a JSON column to
           | get proper CRDTs for merging.
           | 
           | https://github.com/yjs/yjs
        
         | pfarrell wrote:
         | > I can imagine building a SAAS app where each customer has a
         | "workspace" each as a single SQLite db
         | 
         | I did just that at my (now defunct) startup a few years ago. We
         | were building a collaborative database query tool. The software
         | we built used sqlite to keep a local db for storing user
         | credentials, caching query results, etc. Bonus, we were able to
         | have the local database file be encrypted protecting the data
         | at rest.
        
         | hinkley wrote:
         | When they tried to put SQLite into browsers I was excited but I
         | had no project that needed it. It took years to finally get
         | there, and wouldn't you know it, they had just marked it as
         | deprecated. KV stores are just not the same.
        
         | lenkite wrote:
         | How does one deal with fault tolerance ?
        
           | bsenftner wrote:
           | Using the transaction API for modifications, and ordinary
           | fault tolerance measures for the hosting application. It is
           | no more difficult than any other app-foundation technical
           | feature.
        
         | rmbyrro wrote:
         | You can run it on AWS Lambda and store the SQLite file(s) on
         | EBS [1].
         | 
         | [1] https://aws.amazon.com/blogs/compute/using-amazon-efs-for-
         | aw...
        
           | otterley wrote:
           | That's EFS, not EBS. EFS is file-level storage (i.e., NFS);
           | EBS is block-level storage (SAN). AWS Lambda only supports
           | EFS.
        
         | paulryanrogers wrote:
         | I evaluated sqlite for a web extension but ultimately decided
         | it wasn't worth it. There is no easy way to save the data
         | directly to the file system. And saving the data in other ways
         | meant I was probably better off with IndexDB instead. Still it
         | is a tempting option and one that seems to work well for
         | separate tenancy.
        
           | dorian-marchal wrote:
           | > There is no easy way to save the data directly to the file
           | system.
           | 
           | That's what absurd SQL is for (link in the parent comment).
        
             | paulryanrogers wrote:
             | I read that one and agree it feels absurd. Not something I
             | want to depend on.
        
               | samwillis wrote:
               | I'm 100% sure this is (very nearly) suitable for
               | production. It works and works well.
               | 
               | It is only a matter of time before browsers implement the
               | proposed sandboxed virtual file system apis with block
               | level access and this would be superseded.
        
               | dorian-marchal wrote:
               | I believe the absurd things are: - the idea (i.e. storing
               | a database in a database) - the fact that the resulting
               | storage is faster than vanilla indexed db.
               | 
               | The repo's readme is a bit misleading and it does make it
               | look like a mad scientist experiment but you can read
               | more about it here (really interesting stuff):
               | https://jlongster.com/future-sql-web
               | 
               | Obviously this project is still young, but I find it
               | convincing (disclaimer: I don't use it yet).
        
       | alin23 wrote:
       | I'm exactly at a point where I'm considering SQLite for its
       | _single file db_ advantage, but I 'm struggling to find solutions
       | for my use case.
       | 
       | I need to import some 30k JSONs of external monitor data from
       | Lunar (https://lunar.fyi) into a normalized form so that everyone
       | can query it.
       | 
       | I'd love to get this into a single SQLite file that can be served
       | and cached through CDN and local browser cache.
       | 
       | But is there something akin to Metabase that could be used to
       | query the db file _after_ it was downloaded?
       | 
       | I know I could have a Metabase server that could query the SQLite
       | DB on my server, but I'd like the db and the queries to run
       | locally for faster iteration and less load on my server.
       | 
       |  _Besides, I 'm reluctant to run a public Metabase instance given
       | the log4j vulnerabilities that keep coming._
        
         | dkjaudyeqooe wrote:
         | You could do incremental updates of the local databases using
         | the session extension:
         | 
         | https://sqlite.org/sessionintro.html
        
       | bob1029 wrote:
       | We've been using SQLite in production as our exclusive means for
       | getting bytes to/from disk for going on 6 years now. To this day,
       | not one production incident can be attributed to our choice of
       | database or how we use it.
       | 
       | We aren't using SQLite exactly as intended either. We have
       | databases in the 100-1000 gigabyte range that are concurrently
       | utilized by potentially hundreds or thousands of simultaneous
       | users. Performance is hardly a concern when you have reasonable
       | hardware (NVMe/SSD) and utilize appropriate configuration (PRAGMA
       | journal_mode=WAL).
       | 
       | In our testing, our usage of SQLite vastly outperformed an
       | identical schema on top of SQL Server. It is my understanding
       | that something about not having to take a network hop and being
       | able to directly invoke the database methods makes a huge
       | difference. Are you able to execute queries and reliably receive
       | results within microseconds with your current database setup?
       | 
       | Sure, there is no way we are going to be able to
       | distribute/cluster our product by way of our database provider
       | alone, but this is a constraint we decided was worth it,
       | especially considering all of the other reduction in complexity
       | you get with single machine business systems. I am aware of
       | things like DQLite/RQLite/et.al., but we simply don't have a
       | business case that demands that level of resilience (and
       | complexity) yet.
       | 
       | Some other tricks we employ - We do not use 1 gigantic SQLite
       | database for the entire product. It's more like a collection of
       | microservices that live inside 1 executable with each owning an
       | independent SQLite database copy. So, we would have databases
       | like Users.db, UserSessions.db, Settings.db, etc. We don't have
       | any use cases that would require us to write some complex
       | reporting query across multiple databases.
        
         | nurettin wrote:
         | How do you join?
        
           | nathcd wrote:
           | Presumably ATTACH: https://www.sqlite.org/lang_attach.html
        
         | gunshowmo wrote:
         | This is really interesting. Would you mind sharing a bit about
         | how you keep the data that is shared across services in sync?
         | Or is there a hard separation of concerns so that services
         | would only commuincate with the service that owns that data to
         | obtain it?
        
         | lisper wrote:
         | > So, we would have databases like Users.db, UserSessions.db,
         | Settings.db, etc.
         | 
         | How do you do joins?
        
           | tmp_anon_22 wrote:
           | I would guess they do parallelized calls to the microservices
           | to collect the data.
        
           | em500 wrote:
           | Not the OP, but you can attach and join tables from multiple
           | sqlite databases in the same process.
        
             | lisper wrote:
             | Really??? How? Because I use sqlite, and the query
             | functions that I know take a single DB as an argument.
        
               | dwohnitmok wrote:
               | https://stackoverflow.com/questions/6824717/sqlite-how-
               | do-yo...
        
         | qvrjuec wrote:
         | How is this setup fault tolerant? What happens if there is a
         | hardware failure? How do you partition access in a way that
         | means an extremely active user doesn't impact availability?
        
           | bob1029 wrote:
           | > How is this setup fault tolerant?
           | 
           | It is not.
           | 
           | > What happens if there is a hardware failure?
           | 
           | The product would suffer a total outage until manual
           | intervention takes place. A restore of the VM from snapshot
           | would be carried out by the customer. Some loss of the most
           | recent business data would occur (i.e. between latest
           | snapshot and time of failure). All of this is understood and
           | agreed to by our customers.
           | 
           | > How do you partition access in a way that means an
           | extremely active user doesn't impact availability?
           | 
           | Partitioning is not applicable. Our entire product fits on 1
           | machine and dividing the I/O along any dimension does not add
           | much value to the performance equation.
        
             | vidanay wrote:
             | Wait a minute....are you implying that I don't need Azure
             | and FB level infrastructure for my local recipe
             | application?
             | 
             | /s
        
               | lkxijlewlf wrote:
               | I'm pretty sure you _do_ need FB and Azure level infra
               | for you recipe app. I 've read a couple blog posts,
               | watched a video on YouTube and copied the code from Stack
               | Overflow, so I'm pretty much an expert on this, so trust
               | me.
        
               | vidanay wrote:
               | My recipe application is going to be written in a purely
               | functional language with CQRS and a message bus that
               | provides "guaranteed once" determinism. I should be able
               | to spin up no more than a half dozen containers on two
               | VM's to handle this architecture. The database will
               | probably be redundant MSSQL and Oracle (just in case one
               | technology is fundamentally broken). Database writes will
               | be proxied by a Redis instance. I will use 3x 3080 GPU to
               | train a ML model used for scaling the recipe depending on
               | the number of people you wish to serve. Hmmm...I might
               | need a third VM.
        
               | zbentley wrote:
               | > The database will probably be redundant MSSQL and
               | Oracle (just in case one technology is fundamentally
               | broken)
               | 
               | Only _one_ technology?
        
               | joconde wrote:
               | You can't do any serious neural stuff on 8GB of VRAM. I'd
               | say save yourself some pains and buy the $3000
               | professional models with ECC so you can train
               | transformers.
        
               | kevin_thibedeau wrote:
               | It's never gonna scale. Get on the NoSQL train now to
               | future proof and woo the VCs.
        
               | xcambar wrote:
               | Yes. Make it web scale.
               | 
               | Also, think early on about your compensation packages.
               | You don't want to lose a 10x engineer to a FAANG, do you?
        
             | qvrjuec wrote:
             | Interesting. For an extremely specific use case and with
             | users who understand and accept the caveats of this
             | approach I'm sure it would work well enough. The most
             | confusing thing to me is that there is apparently an
             | intersection of users who are ok with an outage and data
             | loss with users who want a product which can
             | 
             | > execute queries and reliably receive results within
             | microseconds
             | 
             | What is your product? Who are these users?
        
               | gbear605 wrote:
               | I can think of plenty of services that an occasional
               | (once a year? less?) outage is okay. Heck, anything
               | relying on AWS us-east-1 is going to have outages that
               | frequently based on the last few months. Meanwhile,
               | almost any service is better off when its response times
               | are cut drastically. I've seen many instances where a
               | service's response times are more than half waiting for a
               | db to respond.
        
               | qvrjuec wrote:
               | It's not the threat of an outage with data loss that is
               | concerning to me- I just want to understand use case that
               | needs fractions of a second shaved off of query times by
               | using SQLite in this way that is also ok with the
               | possibility of data loss.
        
               | wongarsu wrote:
               | Interactive interfaces. There's a huge difference between
               | moving a slider and seeing the reaction in real time and
               | moving the slider and seeing the reaction a second later.
               | If you define "real time" as 30 fps, you have 33ms to
               | process each query and show the result. That could
               | involve multiple database queries with computation in
               | between, if your business logic isn't easily expressible
               | in SQL.
               | 
               | Come to think about it, that covers most apps with UI.
               | Apps where you are exploring data are definitely more
               | impacted however.
        
               | bob1029 wrote:
               | I agree that 'reliably' would be a poor choice of diction
               | when considered in your context.
               | 
               | The user group of our product is effectively only
               | professional bankers and their managers. No one in the
               | general public has any direct access.
        
               | lkxijlewlf wrote:
        
               | qvrjuec wrote:
               | No need to be condescending when I'm trying to learn. The
               | premise of using SQLite because it was easier and to save
               | fractions of a second didn't make sense to me as a
               | tradeoff for potentially losing data
        
         | twicetwice wrote:
         | Interesting. So I would ordinarily want to put a foreign key
         | constraint on the user_id column of a UserSessions table (or
         | similar). In general, presumably you have relationships across
         | the tables that are in those discrete databases. Do you just
         | enforce these constraints/do joins in code? It seems like
         | splitting related tables across multiple databases loses some
         | (possibly a lot?) of the benefits of relational DBs, so I'm
         | curious how you handle/manage that.
         | 
         | That said, I love the idea this architecture. Might use it for
         | whatever next dumb little web service I cook up! I love how
         | this simplifies a lot of dev/deployment ops, perfect for a side
         | project.
        
           | nathcd wrote:
           | SQLite has ATTACH (https://sqlite.org/lang_attach.html) for
           | cross database operations. I've never tried it for foreign
           | key constraints across databases, but I think it would work?
        
           | stu2b50 wrote:
           | You'd have to do the join on the client. This is a problem in
           | general for scaling SQL - its not impossible, but with
           | sharded SQL servers you have to be very deliberate in how you
           | do joins so as to coincide as much as possible with your
           | sharing strategy.
           | 
           | It would be similar here. According to the author each sqlite
           | DB belongs to a single microservice, which will naturally
           | group together the most common of joins. Anything else will
           | indeed have to be manually joined.
           | 
           | Part of the reason nosql became popular for a bit. That's
           | reversed, and sharded SQL is pretty common now, but it
           | definitely adds more cognitive load to schema design.
        
         | plq wrote:
         | To my knowledge, WAL mode still needs to serialize writes for
         | each database file. I'm assuming this is not a setup where
         | there are too many concurrent writers?
        
           | bob1029 wrote:
           | Correct - We are using the typical SQLite build which
           | serializes all writes by default and we have made no effort
           | to undo this behavior. We actually rely on this to ensure
           | things happen in the right order and take advantage for
           | performance reasons.
           | 
           | Because SQLite is effectively serializing all the writes for
           | us, we have zero locking in our code. We used to have to lock
           | when inserting new items (to get the LastInsertRowId), but
           | the newer version of SQLite supports the RETURNING keyword,
           | so we don't even have to lock on inserts now.
           | 
           | Also, the fact that we have the databases divided across
           | function helps free up some of the lock contention in the
           | provider. We don't really have any mixed workload databases -
           | its either "slow" gigantic operations (JSON blob access), or
           | super quick tiny things (updating session timestamps). So,
           | there is some minor isolation of contention issues on a
           | subsystem basis.
        
           | thunderbong wrote:
           | Testing with JMeter, I got upto 10,000 concurrent writes on
           | SQLite with WAL mode enabled
        
         | scottcodie wrote:
         | As long as you're estimating your future costs correctly then
         | you're golden. If you aren't and your application becomes
         | complex through growth or you need resiliency then you'll need
         | to pay that cost and that cost can be big.
        
         | starik36 wrote:
         | Is this a situation where multiple web servers in a farm are
         | accessing the SQLite databases on a file server?
        
           | bob1029 wrote:
           | This is a situation where a single process owns all of its
           | SQLite databases and handles everything out of 1 machine.
        
         | [deleted]
        
         | gfody wrote:
         | > It is my understanding that something about not having to
         | take a network hop and being able to directly invoke the
         | database methods makes a huge difference. Are you able to
         | execute queries and reliably receive results within
         | microseconds with your current database setup?
         | 
         | with SQL Server you can get a very fast local connection by
         | specify "server=(local)" in the connection string - this uses
         | shared memory protocol bypassing the net stack.
        
           | bob1029 wrote:
           | This is true - The counterpoint is that now you have this
           | leviathan that is the SQL Server process(es) running on the
           | same machine as the application.
           | 
           | If I am constraining my SQL Server usage to fit on 1 box, I
           | might as well use SQLite (assuming no future plans for
           | horizontal scaling).
        
           | chasil wrote:
           | Oracle clients can do the same by setting the ORACLE_SID and
           | ORACLE_HOME environment variables.
           | 
           | For remote databases, setting the TWO_TASK environment
           | variable to the server's TNS descriptor is one way to force a
           | network login.
        
       | frays wrote:
       | Is anyone here running production workloads which perform read
       | and write operations on a remote SQLite database?
       | 
       | Currently using Postgres and I'm open to switching but I haven't
       | seen any libraries or implementations of SQLite being used as a
       | client/server database.
        
         | danielheath wrote:
         | Why switch? Typically the work involved would need to be
         | justified by some benefit, right?
        
       | galaxyLogic wrote:
       | SQLite database can be stored in git which seems like a great
       | benefit. But I wonder would it also be possible to have different
       | "branches" of the database and then merge them at some point?
        
         | pgwhalen wrote:
         | It's not integrated with git the way you're perhaps imagining,
         | but SQLite sessions[0] is adjacent to what you're imagining.
         | 
         | [0] https://www.sqlite.org/sessionintro.html
        
       | davidhariri wrote:
       | Just a note that there are significant features of SQLAlchemy
       | that don't work with SQLite such as ARRAY columns, UUID primary
       | keys and certain types of foreign key constraints.
        
         | zzzeek wrote:
         | well no major database other than PostgreSQL has native support
         | for UUID or ARRAY, you can certainly use string-based types for
         | these things for other databases. the DB agnostic UUID is at
         | https://docs.sqlalchemy.org/en/14/core/custom_types.html?hig...
         | and for ARRAY it's likely most convenient to use the SQLite
         | JSON datatype which is also supported directly.
        
       | reneberlin wrote:
       | What a cruel world- sqlite to the rescue!
        
       | kristianpaul wrote:
        
       | vanilla-almond wrote:
       | Is SQLite suitable for a small-to-medium CMS (Content Management
       | System), or a blog platform e.g. WordPress (MySQL) or Ghost
       | (MySQL)?
        
       ___________________________________________________________________
       (page generated 2021-12-29 23:01 UTC)