[HN Gopher] Consider SQLite
___________________________________________________________________
Consider SQLite
Author : wesleyac
Score : 241 points
Date : 2021-12-29 18:11 UTC (4 hours ago)
(HTM) web link (blog.wesleyac.com)
(TXT) w3m dump (blog.wesleyac.com)
| brunoluiz wrote:
| I love to see that more projects are using SQLite as their main
| database.
|
| One thing that I always wondered though: does anyone knows a big
| project/service that uses Golang and is backed by SQLite? This
| because SQLite would require CGO and CGO generally adds extra
| complexities and performance costs. I wonder how big Golang
| applications fare with this.
| nn3 wrote:
| As a author of a library that got rewritten by another team in
| go with a similar rationale, I must say that the Go ecosystem's
| unwillingness (or inability) to interface seamlessly with other
| languages seems like its greatest drawback.
| koeng wrote:
| You can use pure Golang SQLite (without requiring CGO) -
|
| https://pkg.go.dev/modernc.org/sqlite
|
| It works well, but the performance is worse than C version. Not
| a big deal for what I used it for, though. It was approx. 6x
| worse at inserts.
| anderspitman wrote:
| arp242 has an excellent post[0] about statically compiling
| sqlite for Go programs, which may be useful.
|
| Isn't there some issue where SQLite basically has to be single-
| threaded in Golang programs, at least if you use the stdlib SQL
| library?
|
| [0]: https://www.arp242.net/static-go.html
| usrbinbash wrote:
| Using https://github.com/mattn/go-sqlite3 for a pet project
| right now. So far not a single issue with it, I never had to
| think about any CGO specifics.
| mholt wrote:
| Not a "big project/service" but a Go project that uses Sqlite
| is one of my own, Timeliner[1] and its successor, Timelinize[2]
| (still in development). Yeah the cgo dependency kinda sucks but
| you don't feel it in code, just compilation. And it easily
| manages Timeline databases of a million and more entries just
| fine.
|
| [1]: https://github.com/mholt/timeliner
|
| [2]: https://twitter.com/timelinize
| hoaljasio wrote:
| https://github.com/gravitational/teleport/ has the option to
| use it, but it only uses it as a key value store.
|
| CGO isnt too big a problem and if it is a real dealbreaker
| something like https://pkg.go.dev/modernc.org/sqlite will work
| as it transpiled the c into go and passes the sqlite test
| suite. I think there is performance degradation with writes but
| reads are still pretty quick.
| reneberlin wrote:
| Wow. So much tech - sqlite this time, and so many opinions to
| discuss with words / characters. Maybe i could create my own
| alphabet to have some peace of mind in the end? I don't think so
| - somebody would make a story of it - wait ... happened!
|
| Read the creator of sqlite for more information on all running
| topics that make you go 'Uh' up to this point of time:
| https://corecursive.com/066-sqlite-with-richard-hipp/
| anderspitman wrote:
| Am I the only one who thinks SQLite is still too complicated for
| many programs? Maybe it's just the particular type of software I
| normally work on, which tends towards small, self-hosted
| networking services[0] that would often have a single user, or
| maybe federated with <100 users. These programs need a small
| amount of state for things like tokens, users accounts, and maybe
| a bit of domain-specific things. This can all live in memory, but
| needs to be persisted to disk on writes. I've reached for SQLite
| several times, and always come back to just keeping a struct of
| hashmaps[1] in memory and dumping JSON to disk. It's worked great
| for my needs.
|
| Now obviously if I wanted to scale up, at some point you would
| have too many users to fit in memory. But do programs at that
| scale actually need to exist? Why can't everyone be on a
| federated server with state that fits in memory/JSON? I guess
| that's more of a philosophical question about big tech. But I
| think it's interesting that most of our tech stack choices are
| driven by projects designed to work at a scale most of us will
| never need, and maybe nobody needs.
|
| As an aside, is there something like SQLite but closer to my use
| cases? So I guess like the nosql version of SQLite.
|
| [0]: https://boringproxy.io/
|
| [1]:
| https://github.com/boringproxy/boringproxy/blob/master/datab...
| charrondev wrote:
| For Vanilla Forums we have a file based configuration interface
| with a bunch of utility methods for writing/reading too it.
| This is the default and what you really want in a single
| environment either hosted or in development.
|
| In our cloud infrastructure though we have 10k+ distributed
| sites so the interface reads/writes to an API backed by a
| database.
|
| A major benefit to the confit file is that you can just open it
| in a text editor and muck around during development.
| aduitsis wrote:
| Would Redis be a valid solution for your case?
| twicetwice wrote:
| This adds another service you now need to deploy and
| maintain. For small personal projects, I definitely would
| like to avoid that if I can!
| anderspitman wrote:
| I agree.
| root_axis wrote:
| If you care about data normalization and data integrity then
| SQLite is going to be a much better choice.
| anderspitman wrote:
| At the scale I described in my comment, I do not care about
| normalization. Can you give an example of where I'm likely to
| lose data integrity?
| klysm wrote:
| SQLite hides a ton of complexity that lives in the filesystem.
| It's incredibly hard to do robust IO correctly with the APIs we
| have.
|
| I almost always choose SQLite for persisting to disk over JSON
| files. It essentially removes a large class of bugs and is
| robust enough that I'm not worried about introducing new
| problems.
| newlisp wrote:
| _SQLite hides a ton of complexity that lives in the
| filesystem_
|
| Since they are using Go, couldn't you say the same thing
| about the Golang std library? As long as they know how to use
| a local file as database (do the swap, flush, etc...) I don't
| see the problem.
| gnabgib wrote:
| What makes you think SQLite uses go? (It's written in C[0])
|
| [0]: https://www.sqlite.org/index.html
| newlisp wrote:
| Was saying that Go's std lib also "hides a ton of
| complexity that lives in the filesystem" for you.
| klysm wrote:
| No because it doesn't give you IO abstractions as robust as
| SQLite.
| anderspitman wrote:
| Do you have an example of a problem I'm likely to encounter
| at the scale I described in my comment?
| bob1029 wrote:
| In addition to hiding a ton of complexity, SQLite is actually
| faster than the filesystem in a surprising range of cases.
| anderspitman wrote:
| I never questioned whether it was faster or had more
| features. My whole point is maybe a lot of programs don't
| need what it offers.
| tyingq wrote:
| BerkeleyDB maybe? DBM files used to be pretty commonly used on
| Unix boxes. Sendmail, for example, uses DBM files.
|
| Or LMDB, LevelDB, RocksDB, etc, if performance is important.
| dkjaudyeqooe wrote:
| Your solution works to a point, but it will not be as robust as
| SQLite. Its ACID powers are incredibly valuable in the real
| world where things fail or are unreliable. Also the ability to
| do complicated queries comes in handy sooner than you would
| think.
|
| Of course you have to then mess around with SQL but you only
| have to write it once and encapsulate it somewhere and you're
| done.
| anderspitman wrote:
| I never questioned whether it was faster or had more
| features. My whole point is maybe a lot of programs don't
| need what it offers.
| interpenetrate wrote:
| Look up "Prevayler" and the "system prevalence" pattern. It
| introduces occasional data journaling, and so it exists
| somewhere in between "serialize and persist the whole thing at
| once" and a full blown DBMS. But I've experimented with system
| prevalence in the past and decided that just "serializing and
| persisting the whole thing at once" like you've mentioned was
| simply a better solution.
| mftb wrote:
| > This can all live in memory, but needs to be persisted to
| disk on writes. I've reached for SQLite several times, and
| always come back to just keeping a struct of hashmaps[1] in
| memory and dumping JSON to disk. It's worked great for my
| needs.
|
| I do a very similar thing for many of my at-home projects. They
| generally have a single-user, me, and it works great.
| anyfoo wrote:
| Most relational databases offer ACID, for one thing.
|
| And also a query language, which is easy to write and modify
| queries for, presenting a great benefit over writing large
| amounts of boilerplate code for looping over records and
| accessing the right indices every time (keep abstracting that
| and you'll end up with your own relational database system).
|
| I often import data into SQLite just to work with it, without
| necessarily even an application in between. Depending on the
| nature of the data it's either that or Matlab.
| twicetwice wrote:
| For a recent project I picked a similar approach, but a
| directory full of JSON files, where the filename is the key and
| the value is the contents of the file. This gets me two
| improvements over the single massive object persisted to disk
| in a single file: ACIDish for free from the filesystem, and a
| large reduction in bytes (potentially) written to disk since
| you don't have to rewrite a large file every time a single key
| changes, and this service potentially has frequent writes to a
| subset of keys.
|
| This latter point matters a lot to me since the disk in
| question is a Raspberry Pi's SD card, which I've heard have a
| tendency to wear out quickly, so I wanted to avoid writing to
| it unnecessarily. And yeah, this system is fragile, not
| actually ACID, etc etc... but I'm the only one who relies on it
| (it's not even accessible on the public internet, only over my
| Wireguard VPN), so it doesn't matter if it breaks!
|
| I will probably rewrite this service later, backed by either
| SQLite or Postgres, but writing it this way saved me a fair bit
| of thinking and lines of code, which was great for getting it
| off the ground and usable to me in like two hours.
| bonyt wrote:
| I've always thought it interesting that there was a time when
| large(ish) websites were hosted using servers that would struggle
| to outperform a modern smart toaster or wristwatch, and yet
| modern web applications tend to demand a dramatic distributed
| architecture. I like the examples in this article showing what a
| single modern server can do when you're not scaling to Google's
| level.
|
| As an aside, what about distributed derivatives of sqlite, like
| rqlite, as a response to the criticism that sqlite requires your
| database server to also be your web server. Could something like
| rqlite also provide a way for an sqlite database to grow into a
| distributed cluster at a later point?
|
| https://github.com/rqlite/rqlite
| usrbinbash wrote:
| > and yet modern web applications tend to demand a dramatic
| distributed architecture
|
| The dirty secret is: Most of them don't really need that
| architecture. Most, and I mean +95% of websites would run just
| fine on a single box, running the websrv, db and whatever
| script-interpreter the backend runs on.
|
| Sure, it may be a bit slower if its not hooked up to a global
| CDN, and if the box goes down the user gets 404. But its also
| alot easier for an admin to simply give the one box a good
| wallop and put it back online, since it so simple.
|
| https://idlewords.com/talks/website_obesity.htm#heavyclouds
| voidfunc wrote:
| It's mostly because there is a demand for HA which requires
| multiple replicas. The moment you start going down that route
| you increase complexity.
|
| Whether most things actually require HA is debatable, but a lot
| of businesses make it a core requirement and so it gets baked
| into the architecture. Personally I feel like most stuff would
| be better suited to having fast fail-over and recovery early
| on, but my advice rarely gets taken. Instead you end up with
| complicated HA architectures that nobody totally understands,
| which then (inevitable still) fall over and take hours to
| recover.
| lenkite wrote:
| HA is generally needed because users of a SaaS can crash the
| instance especially in products that offer customisation
| abilities.
| voidfunc wrote:
| Sure, but that becomes a requirement at that particular
| design junction. A lot of stuff is built with HA that isn't
| even close to that complex.
| freedomben wrote:
| I don't disagree with you, a single server can go a really,
| really long way in scale before you run into problems. I know
| because I've done it a few times.
|
| The problem to me isn't ability to scale on one server, it's
| the single point of failure. My biggest site is a wordpress box
| with one instance on a pretty big VPS. In the last year I've
| had several outages big enough to require a post-mortem (not
| complete outages, but periods with high levels of error
| rates/failures), and every time it has been because of internal
| data center networking issues at my preferred cloud provider
| (and thankfully their customer service is amazing and they will
| tell me honestly what the problem was instead of leaving me to
| wonder and guess). So the main incentive for me to achieve
| horizontal scalability in that app is not scaling, it's high
| availability so I can survive temporary outages because of
| hardware or networking, and other stuff outside of my control.
| ljm wrote:
| Not even a decade ago we were hosting our web-facing services
| and our postgres DB on the same server. A lot of smaller
| projects had their own db and credentials on the same instance.
| The idea of having to spin up independent DB servers for each
| one seemed wasteful to us, since back then each box would cost
| at least PS10/mo or more on Linode or Rackspace and it meant
| more secrets to keep, more stuff to secure, more stuff to
| maintain.
|
| It was only with the advent of heroku and the sudden shift to
| the cloud (AWS) that the DB stopped being software you ran on
| your box as a daemon, and became essentially a dedicated
| resource. Even for services running at a completely trivial
| scale.
| WesolyKubeczek wrote:
| Consider, though, that in the days past (when your server would
| be probably on an equivalent of dual-socket Pentium 166-MMX),
| most clients would be coming from slow links like 33.4-56.1kbps
| dialup, and it wouldn't be a problem to serve them at all.
| Links were slow, users were patient, timeouts were high,
| webpages were sort of slim. Although if you ask me, they always
| have been heavy, just within the constraints of their time.
|
| Then, of course, there was ISDN and xDSL, which would give you
| true to god whopping 128 kbits/s for a while. 64 kpbs if you
| were cheap. It took a while to get to affordable multiples of
| Mbits per second.
|
| Now that there's at least 10 Mbps uplink from each residential
| subscriber, doesn't take long to DoS even a beefy server.
|
| And I'd say that server-side, things improved _vastly_ with
| advent of FastCGI and its equivalents. Back in that heyday of
| your P166-MMX server, it was CGI with Perl, spawning a process
| for each incoming request, or "blazing-fast" Apache's server-
| side includes, or other things like that. Maybe mod_perl with
| its caveats on memory sharing.
|
| Anyway, you're right in that whenever you show them a wider
| pipe, they will find more stuff to congest it with.
| hinkley wrote:
| Edge networking should work for SaaS, especially if your
| tenants tend to be geographically clustered. Stand up stateless
| front end web servers in half a dozen regions, and have them
| talk back to a more central region, with caches and circuit
| breakers to control the amount of cross-region traffic. That
| geographical separation will also make diurnal traffic patterns
| more obvious because you need more servers later in the day in
| PST than in EST.
|
| You don't have to distribute everything to scale pretty high.
| [deleted]
| smitty1e wrote:
| I've been doing some ETL exploration with opening a sqlite
| :memory: connection, ingesting small-to-medium data, and then
| doing
|
| "VACUUM INTO somefile.sqlite;"
|
| to dump the RAM copy to disk.
|
| What a great tool.
| gjvc wrote:
| sqlite as the data store for a system package manager is possibly
| the most practical single-host example
| reneberlin wrote:
| Before you start throwing your opinions into HN about sqlite,
| please read: https://corecursive.com/066-sqlite-with-richard-
| hipp/?utm_so...
| bobobob420 wrote:
| i have so much love for SQLite. Consider me, getting my first
| internship at a startup. They have a bunch of contractors doing
| work for them as part of their service. The whitelabel
| application they got developed would export data in CSV. My job
| was to take that data and get some meaning from it. data included
| availability, locations, etc (imagine data about delivery
| drivers). I had no idea what to do but realized I could
| definitely parse this CSV through Python.
|
| Once I had this data in Python I needed a way to analyze it. I
| never worked with Databases but decided to install a local copy
| of SQLite. The rest is history. I feel like I learned how to use
| databases in an organic way: by looking for a solution from raw
| data. A couple of queries later and python was exporting excel
| sheets with color coded boxes that indicated something based on
| the analysis I did.
|
| Of course this could be done with any database application but
| the low weight nature of sqlite allowed me to prototype a
| solution so easily. We just backed up that native sqlite dump
| with the cloud and had an easy (super easy) solution to analyze
| raw data.
| iffycan wrote:
| I've had great success using SQLite as both a desktop application
| file format and web server database. I'll mention just one thing
| I like about it in the desktop application realm: undo/redo is
| implemented entirely within SQLite using in-memory tables and
| triggers following this as a starting point:
| https://www.sqlite.org/undoredo.html
|
| It's not perfect, but it fills the niche nicely.
| hankchinaski wrote:
| I have tried adopting SQLite in my side projects. The problem I
| encountered is that using managed PostgreSQL/MySQL is still more
| convenient and more reliable than using SQLite on a bare metal
| VPS. I like to use Heroku or Digital Ocean App platform because I
| want to spend time creating and not managing the infrastructure
| (ci/cd, ssl certs, reverse proxy, db backup, scaling, container
| management and what not). I tried looking for a managed SQLite
| but could not find one. On an unrelated note I found using Redis
| a good lightweight alternative to the classical psql/MySQL.
| Although still multi-tier and more difficult to model data, it's
| initially cheaper and easier to manage than its relational
| counterparts. Anyone has had similar setup/preference?
| [deleted]
| cheeselip420 wrote:
| What would a managed sqlite even look like? I can't tell if
| this is a real response or not...
| Something1234 wrote:
| Maybe an NFS mount or something that handles back ups
| automatically? Scripting to handle an automatic restore of
| the database?
|
| Maybe a heroku that knows about your database file and
| automatically loads the latest version for you?
|
| I kind of feel like GP is a troll comment, as there's no real
| value add for a managed SQLite.
| config_yml wrote:
| Or just use litestream, it's perfect for this and the
| closest you can get to managed by replicating to S3 or
| Cloud Storage or the likes.
| DizzyDoo wrote:
| When hankchinaski says 'managed' I think they really mean
| that there's some capital-A App dashboard somewhere, on
| Digital Ocean or wherever, and they log in and click 'new
| database' and that's it. No ssh-ing to a VPS, and choosing
| the file location where the sqlite file will sit, figuring
| out backups and so on. But as you say, while you can wrap
| postgres or redis in that sort of 'just take care of it for
| me' approach, given the simplicity of sqlite it doesn't fit
| that paradigm, so perhaps hankchinaski is just
| misunderstanding what sqlite fundamentally is and how it
| works.
| hankchinaski wrote:
| Exactly. I was expressing the limitation of that single
| tier paradimg viewed from an angle of someone who doesn't
| want to ssh into machines to configure and deploy code...
| :)
| hankchinaski wrote:
| I clearly did not express myself correctly. Being all these
| managed solutions (heroku, do app platform) running on an
| ephemeral fs it is not possible to use disk as long term
| storage as sqlite requires. So I'm basically forced to manage
| an attached disk to my container which needs to be backed up,
| monitored and so on
| aynyc wrote:
| I used it for ETL process extensive which is great. I still don't
| know how people use it for concurrent writes like a simple ToDo
| webapp?
| aidenn0 wrote:
| With WAL, writes for something like a ToDo app finish in a
| small fraction of a millisecond so unless your todo webapp is
| writing to the DB at a rate exceeding 20k writes per second,
| the fact that writes are not concurrent becomes largely
| irrelevant.
| aynyc wrote:
| This can't be right. As far as I can tell, WAL allows
| concurrent READS and WRITE, not concurrent WRITES. Am I doing
| this wrong all these years?
| aidenn0 wrote:
| That's correct, it doesn't allow concurrent writes, but if
| the writes finish fast enough, that's somewhat academic.
| newlisp wrote:
| _Postgres is 9.5x slower when running on the same machine as the
| one doing the query_
|
| I'm surprised by this, sure in-process is always going to be
| faster but still find it hard to believe that sqlite can be beat
| postgres in a single machine.
| samatman wrote:
| This excellent article doesn't even mention rqlite, which will
| synchronize an arbitrary number of SQLite instances using the
| Raft protocol.
|
| There must be some scaling limits to encounter using this
| combination, but wouldn't you love to have that problem?
| zaptheimpaler wrote:
| I don't doubt the power of SQLite, but its difficult to see why
| its worth using over Postgres anyways.
|
| This is what it takes to run a basic postgres database on my own
| PC (in a docker compose file): postgres:
| image: postgres:12.7 container_name: postgres
| environment: - PGDATA=/var/lib/postgresql/data/pgdata
| - POSTGRES_PASSWORD=<pw> volumes: -
| ./volumes/postgres/:/var/lib/postgresql/data/
|
| For someone who's completely allergic to SSH and linux, a managed
| Postgres service will take care of all that too.
|
| SQLite seems simple in that its "just a file". But its not. You
| can't pretend a backup is just copying the file while a DB is
| operating and expect it to be consistent. You can't put the file
| on NFS and have multiple writers and expect it to work. You can't
| use complex datatypes or have the database catch simple type
| errors for you. Its "simple" in precisely the wrong way - it
| looks simple, but actually using it well is not simple. It
| doesn't truly reduce operational burden, it only hides it until
| you find that it matters.
|
| Similarly postgres is not automatically complex simply because it
| _can_ scale. It really is a good technology that can be simple at
| small scale yet complex if you need it.
| lenkite wrote:
| It's unfortunate you are getting downvoted. Nobody in this
| thread has given clear solutions for addressing issues like
| fault-tolerance, failover, backup and all the managed goodness
| you can get from choosing Postgres.
| gravypod wrote:
| I'm not really sure why this post has downvotes. docker-compose
| dramatically lowers the barrier for setting up a single machine
| with multiple services (your service, db, etc). For a similar
| experience you do the same with AWS RDS or equivalent.
| Performance will be better and worse in various situations but
| if your software still fits in one machine you're largely going
| to be "ok." Backups, restore, monitoring, etc are all important
| for running software and that's something an sqlite file
| doesn't really offer the best solutions for. It works great for
| some things (I've used it many many times) but it's not perfect
| for everything.
| zaptheimpaler wrote:
| I call this trend "tech hipster"-ism. Part of the motivation
| is just to do something different just for the sake of being
| different. Maybe part of it is a perception that Postgres or
| Linux are oh-so-scary and difficult things or that using the
| same technology that Amazon uses makes you evil.
| -\\_(tsu)_/-.
| srcreigh wrote:
| More info about WAL mode concurrency [0]
|
| No reader-writer lock. Still only 1 concurrent writer, but write
| via append to WAL file is cheaper. Can adjust read vs write
| performance by syncing WAL file more or less often. Can also
| increase performance with lower durability by not syncing WAL
| file to disk as often
|
| https://www.sqlite.org/wal.html
| reneberlin wrote:
| meh https://corecursive.com/066-sqlite-with-richard-
| hipp/?utm_so...
| errantmind wrote:
| Also, consider just using the filesystem with binary encoding.
| Serialize / deserialize directly from / to data structures. It is
| faster and simpler than any database you'll ever use, assuming
| you don't need the functionality a database provides.
| chasil wrote:
| There are some important things that SQLite does not do.
|
| It is not client/server; a process must be able to fopen() the
| database file. NFS and SMB are options that can convey access to
| remote systems, but performance will not likely be good.
|
| Only a single process can write to the database at any time; it
| does not support concurrent writers.
|
| The backup tools do not support point-in-time recovery to a
| specific past time.
|
| If your application can live with these limitations, then it does
| have some wonderful features.
| mushufasa wrote:
| Note -- the single process write at any one time is a killer
| for most web apps, where for example within SaaS you have many
| users doing things at the same time.
| js4ever wrote:
| It's not really an issue if you have 1 db per customer
| rguillebert wrote:
| If a customer has 1000 employees all using your app, it is.
| chasil wrote:
| If the app is designed correctly, then the thousand
| employees would write to their own temporary databases,
| and a background job would pull their changes into the
| main database sequentially.
|
| If the app is not specifically designed to do this, then
| SQLite would not be an option.
| rguillebert wrote:
| What's the benefit over just using the main database?
| pc86 wrote:
| Serious question, is this just a "how do I get SQLite to
| work in this scenario?" thing, or is there actually some
| other benefit to having this sort of data architecture?
| chasil wrote:
| This can actually relate to SMTP servers using mbox or
| maildir formats. Maildir is more resistant to corruption,
| and doesn't rely (as much) on file locks.
|
| https://serverfault.com/questions/564456/what-are-the-
| practi...
| blacksmith_tb wrote:
| Sure, though that's if they're all doing CRUD at the same
| time. I maintain our company blog on a self-hosted install of
| Ghost backed by sqlite, and it's a been great (since the only
| inserting or updating is likely to be one person editing a
| post, and the frontend is mostly cached).
| benbjohnson wrote:
| Most writes only take tens to hundreds of microseconds and
| you can perform those writes from different processes
| serially. As long as you set WAL mode & a busy timeout then
| you should be fine:
|
| PRAGMA journal_mode = WAL;
|
| PRAGMA busy_timeout = 30000;
| pc86 wrote:
| Does an event sourcing architecture help with this? I'm
| thinking a bunch of client piping events into a queue with a
| single writer working through that queue. Might be stretching
| the _eventual_ part of eventual consistency if the latency
| gets too bad, but I don 't think SQLite is top of mind for
| most SaaS solutions anyway so more of a thought exercise.
| dahart wrote:
| Can you elaborate? I'm imagining that the DB web server
| process would have no problem opening a thread per user and
| multiple threads can write to a SQLite store. Or, optionally
| you could buffer multi-threaded requests into a single queue
| for DB I/O. I'm not seeing why the single process constraint
| is a major impediment, or maybe I guess I'm not sure I
| understand why multiple separate processes might be required.
| Am I misunderstanding your comment?
| boring_twenties wrote:
| None of those matter to me, but I'm being forced to switch from
| SQLite simply because of its lack of uint64 support.
| lrem wrote:
| I don't know what backup tools you have in mind... But since a
| SQLite database is a single file (modulo write ahead journal
| and whatnot), making whatever you need is trivial.
| cperciva wrote:
| Be careful! If you copy a database file while it is being
| modified, you may end up with an inconsistent view of the
| database depending on the order in which different parts of
| the file are modified.
|
| There are ways around this (filesystem snapshots and
| potentially file locking) but it's not _trivial_ and failing
| to handle this correctly is a very good way to build a system
| which passes tests under low load (when the file-copying
| operation will win all the relevant races) but fail
| catastrophically under heavy load.
| chasil wrote:
| The sqlite3 utility can be used to create a transaction-
| consistent backup of a live database. This interface is the
| only way that a backup should be taken, either from the
| utility or the C API. .backup ?DB? FILE
| Backup DB (default "main") to FILE
|
| https://sqlite.org/cli.html
| chasil wrote:
| In Oracle, I can do this: RECOVER DATABASE
| UNTIL TIME '2021-10-01 02:00:00' USING BACKUP CONTROLFILE;
|
| SQLite does not implement such a feature.
| srcreigh wrote:
| There is a diff tool [0], and an online backup tool [1].
| That kinda thing should be pretty easy to cobble together.
| You'll just need ~3x disk space to store 2 backups at an
| interval, diff them and discard the older backup, then
| apply backward diffs on your most recent backup to achieve
| a specific backup time.
|
| EDIT: or maybe the Session Extension [2]
|
| [0]: https://www.sqlite.org/sqldiff.html
|
| [1]: https://www.sqlite.org/backup.html
|
| [2]: https://www.sqlite.org/sessionintro.html
| DasIch wrote:
| The state of the art is that you click a few buttons in the
| UI of the cloud provider of your choice. That takes a few
| seconds.
|
| What you are proposing may be trivial but in comparison it
| might just as well be rocket science. Doing more work to use
| a simpler database that is harder to setup and has less
| features is not exactly compelling.
| masklinn wrote:
| > modulo write ahead journal and whatnot
|
| That's like saying immortality is trivial modulo death and
| whatnot. If you don't integrate with sqlite's locking system,
| you can easily "backup" a file mid-write, corrupted and
| unusable.
|
| That's why _sqlite has a built-in backup command_.
| lrem wrote:
| I guess my secret is having almost no writes (the couple
| things I deployed with SQLite would see a dozen writes per
| month).
| eternalban wrote:
| > Only a single process can write to the database at any time;
| it does not support concurrent writers.
|
| Potential plus in cloud/container deployments.
|
| > It is not client/server; a process must be able to fopen()
| the database file. NFS and SMB are options that can convey
| access to remote systems, but performance will not likely be
| good.
|
| There are other possibilities given a bit of additional
| wrappers (over the db process) and federating nodes with some
| (consensus) protocol. It may actually be a good fit as the
| 'node-unit' for very large scale distributed metadata store
| that would benefit from relational DB semantics instead of
| basic K/V.
| umvi wrote:
| > Only a single process can write to the database at any time;
| it does not support concurrent writers.
|
| I think you can do concurrent writing now with Write-Ahead
| Logging (WAL): https://www.sqlite.org/wal.html
|
| I've never tried it though, so I don't know how suitable it is
| for web apps that might potentially have multiple processes
| trying to write to the DB at the same time.
| teraflop wrote:
| Nope. In the default rollback-journal mode, SQLite supports
| _either_ multiple readers _or_ a single writer at a time. WAL
| mode improves this to multiple readers _and_ at most one
| writer.
|
| But transactions in SQLite are often fast enough that this
| could still be acceptable for a lot of purposes, especially
| on an SSD.
| srcreigh wrote:
| There is no concurrent writing with WAL mode
| teraflop wrote:
| > NFS and SMB are options that can convey access to remote
| systems, but performance will not likely be good.
|
| And the safety of your data depends on the quality of your
| network filesystem's locking implementation. It's not too
| difficult to design a locking method that works most of the
| time, but it's a lot harder to build something that guarantees
| mutual exclusion over an imperfect network.
|
| On a single machine, file locking is generally reliable
| (mediated by the kernel). Multiple clients can access the same
| database simultaneously, with readers and writers taking turns,
| and all that happens is you pay the performance penalty of
| serialized access.
|
| But if your client _thinks_ it has a lock on a remotely-stored
| database file, but the server-side lock has actually expired
| and another client is writing to the database, your data is
| likely to get hosed.
|
| https://www.sqlite.org/howtocorrupt.html
| chasil wrote:
| Databases on remote filesystems should be limited to
| SQLITE3_OPEN_READONLY access, agreed.
| ridiculous_fish wrote:
| I'm experimenting with using SQLite to store users' history
| in fish shell, but the remote filesystems problem seems
| likely to be a showstopper. What can be done about it?
| chasil wrote:
| I'm just reading this, and learning a few new things:
|
| https://www.sqlite.org/howtocorrupt.html
|
| SQLite has an alternate lock mode with dotfiles that
| seems to prevent database corruption over NFS. It is
| important that all SQLite accesses from all connected
| processes use the same lock mode.
|
| "2.3. Two processes using different locking protocols
|
| "The default locking mechanism used by SQLite on unix
| platforms is POSIX advisory locking, but there are other
| options. By selecting an alternative sqlite3_vfs using
| the sqlite3_open_v2() interface, an application can make
| use of other locking protocols that might be more
| appropriate to certain filesystems. For example, dot-file
| locking might be select for use in an application that
| has to run on an NFS filesystem that does not support
| POSIX advisory locking.
|
| "It is important that all connections to the same
| database file use the same locking protocol. If one
| application is using POSIX advisory locks and another
| application is using dot-file locking, then the two
| applications will not see each other's locks and will not
| be able to coordinate database access, possibly leading
| to database corruption."
| ridiculous_fish wrote:
| The problems with dotfile locking are:
|
| 1. A crash risks leaving the lock file in place; it must
| be manually removed or else you hang forever.
|
| 2. The same home directory may be local and remote on
| different machines, meaning different locking protocols.
|
| I am considering a mode where the database is loaded
| fully into memory, and writes are atomically saved via
| the classic adjacent-file-renamed mechanism. You risk
| losing data due to races, but it can't corrupt the
| database.
| srcreigh wrote:
| In WAL mode, a write can mean writing some data in memory on
| the local machine. With Postgres/MySQL/etc, it has to go over
| the network.
|
| I can't parse [0] right now about the overhead of TCP and round
| trips with the database server, but it's basically a question
| of whether one SQLite write is 10x, 100x, 1000x, or more faster
| than a database server write. That should make a lot of
| difference.
|
| [0] https://gist.github.com/jboner/2841832
| chousuke wrote:
| If you have PostgreSQL or MySQL on the same machine as your
| application, you can use UNIX sockets; they won't have much
| latency at all (I think Linux also optimizes TCP on localhost
| to skip all the pointless protocol overhead).
|
| In terms of latency it'll still be difficult to beat a
| database that lives in the same process as your application,
| but it won't be as bad as going over the network might be.
| reneberlin wrote:
| This "little thingie" is so powerful, that words are missing to
| tell the story.
| pgwhalen wrote:
| SQLite is great, but it's not a more simple drop in replacement
| for DB servers like HN often suggests it is.
|
| My team at work has adopted it and generally likes it, but the
| biggest hurdle we've found is that it's not easy to inspect or
| fix data in production the way we would with postgres.
| tyingq wrote:
| >it's not easy to inspect or fix data in production the way we
| would with postgres.
|
| I assume because you're using a remote socket connection from
| the client?
|
| I haven't tried it in a serious setting yet, but I did play
| around with dqlite and was impressed. Canonical uses it as the
| backing data store for lxd. Basically sqlite with raft
| clustering and the ability for clients to connect remotely via
| a wire protocol. https://dqlite.io/
| pgwhalen wrote:
| > I assume because you're using a remote socket connection
| from the client?
|
| Yeah, it's common for all developers to connect to and query
| against prod postgres DBs via DataGrip or similar.
|
| dqlite definitely looks interesting, but I worry it's a bit
| heavy given that our only use case for remote access is prod
| troubleshooting. I think I saw something recently where you
| could spin up a server on top of a sqlite file temporarily -
| that might be ideal for us.
| srcreigh wrote:
| Did you try SSH and using SQLite command line tool?
| pgwhalen wrote:
| That ought to work, but the experience we are trying to
| emulate is that of using DataGrip, rather than psql.
| brunoluiz wrote:
| I believe you mean that you can't easily do a "psql ..." or
| connect using DataGrid and similars, right?
|
| Does this mean that devs need to copy the production database
| file locally to then inspect it? Or are there tools to
| connect/bridge to a remote sqlite file?
| pgwhalen wrote:
| > I believe you mean that you can't easily do a "psql ..." or
| connect using DataGrid and similars, right?
|
| Yeah.
|
| > Does this mean that devs need to copy the production
| database file locally to then inspect it? Or are there tools
| to connect/bridge to a remote sqlite file?
|
| We use "kubectl copy" currently when we want to inspect it,
| and we haven't actually had to write back to a production
| file yet. We've explored the "remote" option, but since it's
| just a file, everything seems to boil back down to "copy
| locally" then "copy back to remote prod".
|
| It's only a small part of our stack at the moment, so we
| haven't invested in tooling very much - but I'd be curious if
| others have solved similar problems.
| brunoluiz wrote:
| Ah I see! Yeah, I expected your team would have to copy it
| locally. I wonder tho, in times of data leaks and whatnot,
| couldn't it be dangerous to have lots of PII (personal
| identifiable information) copied around many dev laptops?
|
| I mean, devs can do the same with Postgres, but it is more
| for backups instead of purely querying.
| pgwhalen wrote:
| That's true - our software does not deal with PII, which
| makes many of our DB practices sound strange to
| outsiders.
| tfehring wrote:
| I think the usual approach would be to SSH into the server
| and run sqlite3 there. This issue [0] mentions some
| workarounds for connecting from DBeaver, which I assume would
| work for other graphical client software. I haven't tried
| those approaches, they seem pretty hacky and I imagine
| performance isn't great, but I guess that's to be expected
| given that SQLite isn't designed for that type of access.
|
| [0] https://github.com/dbeaver/dbeaver/issues/6876
| aflag wrote:
| Can't you run your queries in a copy of the data (eg.: a
| backup)? I think that'd be advisable even if you were running
| postgresql.
| lenkite wrote:
| One can create read-only users in Postgres.
| pgwhalen wrote:
| > Can't you run your queries in a copy of the data (eg.: a
| backup)?
|
| That's essentially what we do: copy the file locally if we
| need to inspect it. It's slightly more cumbersome though.
|
| > I think that'd be advisable even if you were running
| postgresql.
|
| Connecting to live prod servers is definitely not a 10/10 on
| the "best practices" scale, but it works well for our
| business (trading), where there are small developer teams
| that also operate, no PII in the database, and critical
| realtime functionality isn't directly involved with the
| database anyway.
| aflag wrote:
| With postgres you could set up a read-only replica or
| something. At least for me, the small effort is well worth
| not having to worry about accidentally deleting production
| data.
|
| I feel like having an easy mechanism to clone the
| production database somewhere you can play with is well
| worth the effort. You can even use those clones to run
| backtests and other integration/regression tests against,
| which is also a very nice to have.
| pgwhalen wrote:
| > With postgres you could set up a read-only replica or
| something. At least for me, the small effort is well
| worth not having to worry about accidentally deleting
| production data.
|
| We do do this, and probably should be more disciplined
| about connecting to it when only reading. Of course that
| doesn't help if we need to run an update in production,
| but that isn't _that_ often.
|
| > I feel like having an easy mechanism to clone the
| production database somewhere you can play with is well
| worth the effort. You can even use those clones to run
| backtests and other integration/regression tests against,
| which is also a very nice to have.
|
| We do actually do this as well (nightly), and it is a
| _huge_ productivity boost for testing and development. I
| would recommend to anyone that writes software dealing
| with persisted data to invest in an easy mechanism to
| clone from production.
| reneberlin wrote:
| My 2 cents on sqlite:
|
| https://corecursive.com/066-sqlite-with-richard-hipp/
|
| An interview with one of the creators:Mr. Richard Hipp - for a
| better and deeper understanding what pitch they took and what
| industries they were in to. Their approach to overcome the db-
| world that they saw in front of them. See the obstacles and the
| solutions and why it came to be that underestimated 'sqlite' that
| powers a good chunk of all you mobile actions triggered by your
| apps - but just read that interview - i cannot reproduce the
| dramatic here in my own words (underestimated).
| mrfusion wrote:
| I wrote a script to find all csv files in a directory and figure
| out the best way to load them into SQLite.
|
| It gives me a handy way to run queries on data.
|
| I tried to make it super smart too and really make educated
| guesses on what data types to use and even linking foreign keys.
| lenkite wrote:
| A language like R (https://www.r-project.org/) helps more for
| CSV data-science.
| https://www.tutorialspoint.com/r/r_csv_files.htm
| pietroppeter wrote:
| Nim forum uses SQLite as its db since 2012 and it fits perfectly
| the article's use case. Code is available and it can be used to
| run a discourse inspired forum (although much less featured).
|
| https://github.com/nim-lang/nimforum
| greatjack613 wrote:
| I use SQLite exclusively on a high performance crypto sniper
| project - https://bsctrader.app and I could not be happier with
| it.
|
| Performs much better then postgres in terms of query latency
| which is ultra important for the domain we operate in.
|
| I take machine level backups every 2 hours, so in the event of an
| outage, just boot the disk image on a new vm and it's off.
|
| I would never do this on my professional job due to the stigma,
| but for this side project, it has been incredible
| rhn_mk1 wrote:
| > machine level backups
|
| I presume that's block level backups? Or some snapshotting?
|
| As far as I know, block level filesystem copies can get
| inconsistent (so we have journalling file systems). But
| assuming it works well, like a filesystem aware snapshot, can
| sqlite deal with files snapshotted in the middle of an
| operation?
| jxi wrote:
| dkjaudyeqooe wrote:
| > I would never do this on my professional job due to the
| stigma, but for this side project, it has been incredible
|
| Using the right tool for the job does indeed carry a lot of
| stigma in many professional environments. Instead you use the
| tool that some VP has been sold by some salesman.
| bob1029 wrote:
| > I would never do this on my professional job due to the
| stigma, but for this side project, it has been incredible
|
| I love controversy so I was able to push SQLite through as the
| sole persistence mechanism for our product. Virtually every
| constraint lined up with our business perfectly.
|
| We sell a B2B product that gets installed by our customers, so
| they love it too. No more sweaty hands conversations about why
| their database server cant talk to their application server
| over their networks while we are caught in the middle trying to
| avoid taking ownership of their internal IT problems.
| catillac wrote:
| The stigma of using the most popular database in existence?
| copperx wrote:
| For the domain of webapps, where multiple concurrent writers
| are often expected, yes, I would say it's a stigma.
| usrbinbash wrote:
| Are they expected, or are the _required_?
|
| Because, serializing db access through a single process
| only becomes a problem when the number of reads/writes get
| so large, that the process becomes a bottlenec.
|
| And judging by the test the author of the linked article
| did, that would have to be a HUGE number.
| chunkyks wrote:
| In the past I had a website with hundreds of gigs of data that
| needed updating regularly, but could be read-only from the web
| server perspective.
|
| I used sqlite for that, and had a mysql server for the user data
| and stuff that needed to be written to. Performance was
| fantastic, users were happy, data updates were instantaneous ;
| copy the new data to the server then repoint a symlink.
|
| Most of my work is modeling and simulation. Sqlite is almost
| always my output format ; one case per database is really natural
| and convenient, both for analysis, and run management.
|
| Anyway. Sqlite is amazing.
| matdehaast wrote:
| Shout out to litestream[0] for backups
|
| [0] https://github.com/benbjohnson/litestream
| gregors wrote:
| I love this approach. I've ran sqlite in production for a variety
| of products over the years and for the most part it's great. One
| "back up" solution I did was dump tables to text - commit the
| diff in git. Taking file copies while transactions are in process
| can lead to corrupted db's.
| samwillis wrote:
| I believe SQLite is about to explode in usage into areas it's not
| been used before.
|
| SQL.js[0] and the incredible "Absurd SQL"[1] are making it
| possible to build PWAs and hybrid mobile apps with a local SQL
| db. Absurd SQL uses IndexedDB as a block store fs for SQLite so
| you don't have to load the whole db into memory and get atomic
| writes.
|
| Also I recently discovered the Session Extension[2] which would
| potentially enable offline distributed updates with eventual
| consistency!
|
| I can imagine building a SAAS app where each customer has a
| "workspace" each as a single SQLite db, and a hybrid/PWA app
| which either uses a local copy of the SQLite db synced with the
| session extension or uses a serveless backend (like CloudFlare
| workers) where a lightweight function performs the db operations.
| I haven't yet found a nice way to run SQLite on CloudFlare
| workers, it need some sort of block storage, but it can't be far
| off.
|
| 0: https://sql.js.org/
|
| 1: https://github.com/jlongster/absurd-sql
|
| 2: https://www.sqlite.org/sessionintro.html
| [deleted]
| koeng wrote:
| The golang equivalent of native SQL can be found here -
| https://pkg.go.dev/modernc.org/sqlite
|
| Very useful for pure go applications!
| dorian-marchal wrote:
| How would we deal with conflicts (e.g. syncing back several
| conflicting offline clients for the same user) with something
| based on the Session Extension?
| samwillis wrote:
| You have to handle the merge conflicts yourself, see
| https://www.sqlite.org/session/sqlite3changeset_apply.html
|
| So you need to be carful how you design your schema, but very
| possible.
|
| One option is to use something like Yjs and a JSON column to
| get proper CRDTs for merging.
|
| https://github.com/yjs/yjs
| pfarrell wrote:
| > I can imagine building a SAAS app where each customer has a
| "workspace" each as a single SQLite db
|
| I did just that at my (now defunct) startup a few years ago. We
| were building a collaborative database query tool. The software
| we built used sqlite to keep a local db for storing user
| credentials, caching query results, etc. Bonus, we were able to
| have the local database file be encrypted protecting the data
| at rest.
| hinkley wrote:
| When they tried to put SQLite into browsers I was excited but I
| had no project that needed it. It took years to finally get
| there, and wouldn't you know it, they had just marked it as
| deprecated. KV stores are just not the same.
| lenkite wrote:
| How does one deal with fault tolerance ?
| bsenftner wrote:
| Using the transaction API for modifications, and ordinary
| fault tolerance measures for the hosting application. It is
| no more difficult than any other app-foundation technical
| feature.
| rmbyrro wrote:
| You can run it on AWS Lambda and store the SQLite file(s) on
| EBS [1].
|
| [1] https://aws.amazon.com/blogs/compute/using-amazon-efs-for-
| aw...
| otterley wrote:
| That's EFS, not EBS. EFS is file-level storage (i.e., NFS);
| EBS is block-level storage (SAN). AWS Lambda only supports
| EFS.
| paulryanrogers wrote:
| I evaluated sqlite for a web extension but ultimately decided
| it wasn't worth it. There is no easy way to save the data
| directly to the file system. And saving the data in other ways
| meant I was probably better off with IndexDB instead. Still it
| is a tempting option and one that seems to work well for
| separate tenancy.
| dorian-marchal wrote:
| > There is no easy way to save the data directly to the file
| system.
|
| That's what absurd SQL is for (link in the parent comment).
| paulryanrogers wrote:
| I read that one and agree it feels absurd. Not something I
| want to depend on.
| samwillis wrote:
| I'm 100% sure this is (very nearly) suitable for
| production. It works and works well.
|
| It is only a matter of time before browsers implement the
| proposed sandboxed virtual file system apis with block
| level access and this would be superseded.
| dorian-marchal wrote:
| I believe the absurd things are: - the idea (i.e. storing
| a database in a database) - the fact that the resulting
| storage is faster than vanilla indexed db.
|
| The repo's readme is a bit misleading and it does make it
| look like a mad scientist experiment but you can read
| more about it here (really interesting stuff):
| https://jlongster.com/future-sql-web
|
| Obviously this project is still young, but I find it
| convincing (disclaimer: I don't use it yet).
| alin23 wrote:
| I'm exactly at a point where I'm considering SQLite for its
| _single file db_ advantage, but I 'm struggling to find solutions
| for my use case.
|
| I need to import some 30k JSONs of external monitor data from
| Lunar (https://lunar.fyi) into a normalized form so that everyone
| can query it.
|
| I'd love to get this into a single SQLite file that can be served
| and cached through CDN and local browser cache.
|
| But is there something akin to Metabase that could be used to
| query the db file _after_ it was downloaded?
|
| I know I could have a Metabase server that could query the SQLite
| DB on my server, but I'd like the db and the queries to run
| locally for faster iteration and less load on my server.
|
| _Besides, I 'm reluctant to run a public Metabase instance given
| the log4j vulnerabilities that keep coming._
| dkjaudyeqooe wrote:
| You could do incremental updates of the local databases using
| the session extension:
|
| https://sqlite.org/sessionintro.html
| bob1029 wrote:
| We've been using SQLite in production as our exclusive means for
| getting bytes to/from disk for going on 6 years now. To this day,
| not one production incident can be attributed to our choice of
| database or how we use it.
|
| We aren't using SQLite exactly as intended either. We have
| databases in the 100-1000 gigabyte range that are concurrently
| utilized by potentially hundreds or thousands of simultaneous
| users. Performance is hardly a concern when you have reasonable
| hardware (NVMe/SSD) and utilize appropriate configuration (PRAGMA
| journal_mode=WAL).
|
| In our testing, our usage of SQLite vastly outperformed an
| identical schema on top of SQL Server. It is my understanding
| that something about not having to take a network hop and being
| able to directly invoke the database methods makes a huge
| difference. Are you able to execute queries and reliably receive
| results within microseconds with your current database setup?
|
| Sure, there is no way we are going to be able to
| distribute/cluster our product by way of our database provider
| alone, but this is a constraint we decided was worth it,
| especially considering all of the other reduction in complexity
| you get with single machine business systems. I am aware of
| things like DQLite/RQLite/et.al., but we simply don't have a
| business case that demands that level of resilience (and
| complexity) yet.
|
| Some other tricks we employ - We do not use 1 gigantic SQLite
| database for the entire product. It's more like a collection of
| microservices that live inside 1 executable with each owning an
| independent SQLite database copy. So, we would have databases
| like Users.db, UserSessions.db, Settings.db, etc. We don't have
| any use cases that would require us to write some complex
| reporting query across multiple databases.
| nurettin wrote:
| How do you join?
| nathcd wrote:
| Presumably ATTACH: https://www.sqlite.org/lang_attach.html
| gunshowmo wrote:
| This is really interesting. Would you mind sharing a bit about
| how you keep the data that is shared across services in sync?
| Or is there a hard separation of concerns so that services
| would only commuincate with the service that owns that data to
| obtain it?
| lisper wrote:
| > So, we would have databases like Users.db, UserSessions.db,
| Settings.db, etc.
|
| How do you do joins?
| tmp_anon_22 wrote:
| I would guess they do parallelized calls to the microservices
| to collect the data.
| em500 wrote:
| Not the OP, but you can attach and join tables from multiple
| sqlite databases in the same process.
| lisper wrote:
| Really??? How? Because I use sqlite, and the query
| functions that I know take a single DB as an argument.
| dwohnitmok wrote:
| https://stackoverflow.com/questions/6824717/sqlite-how-
| do-yo...
| qvrjuec wrote:
| How is this setup fault tolerant? What happens if there is a
| hardware failure? How do you partition access in a way that
| means an extremely active user doesn't impact availability?
| bob1029 wrote:
| > How is this setup fault tolerant?
|
| It is not.
|
| > What happens if there is a hardware failure?
|
| The product would suffer a total outage until manual
| intervention takes place. A restore of the VM from snapshot
| would be carried out by the customer. Some loss of the most
| recent business data would occur (i.e. between latest
| snapshot and time of failure). All of this is understood and
| agreed to by our customers.
|
| > How do you partition access in a way that means an
| extremely active user doesn't impact availability?
|
| Partitioning is not applicable. Our entire product fits on 1
| machine and dividing the I/O along any dimension does not add
| much value to the performance equation.
| vidanay wrote:
| Wait a minute....are you implying that I don't need Azure
| and FB level infrastructure for my local recipe
| application?
|
| /s
| lkxijlewlf wrote:
| I'm pretty sure you _do_ need FB and Azure level infra
| for you recipe app. I 've read a couple blog posts,
| watched a video on YouTube and copied the code from Stack
| Overflow, so I'm pretty much an expert on this, so trust
| me.
| vidanay wrote:
| My recipe application is going to be written in a purely
| functional language with CQRS and a message bus that
| provides "guaranteed once" determinism. I should be able
| to spin up no more than a half dozen containers on two
| VM's to handle this architecture. The database will
| probably be redundant MSSQL and Oracle (just in case one
| technology is fundamentally broken). Database writes will
| be proxied by a Redis instance. I will use 3x 3080 GPU to
| train a ML model used for scaling the recipe depending on
| the number of people you wish to serve. Hmmm...I might
| need a third VM.
| zbentley wrote:
| > The database will probably be redundant MSSQL and
| Oracle (just in case one technology is fundamentally
| broken)
|
| Only _one_ technology?
| joconde wrote:
| You can't do any serious neural stuff on 8GB of VRAM. I'd
| say save yourself some pains and buy the $3000
| professional models with ECC so you can train
| transformers.
| kevin_thibedeau wrote:
| It's never gonna scale. Get on the NoSQL train now to
| future proof and woo the VCs.
| xcambar wrote:
| Yes. Make it web scale.
|
| Also, think early on about your compensation packages.
| You don't want to lose a 10x engineer to a FAANG, do you?
| qvrjuec wrote:
| Interesting. For an extremely specific use case and with
| users who understand and accept the caveats of this
| approach I'm sure it would work well enough. The most
| confusing thing to me is that there is apparently an
| intersection of users who are ok with an outage and data
| loss with users who want a product which can
|
| > execute queries and reliably receive results within
| microseconds
|
| What is your product? Who are these users?
| gbear605 wrote:
| I can think of plenty of services that an occasional
| (once a year? less?) outage is okay. Heck, anything
| relying on AWS us-east-1 is going to have outages that
| frequently based on the last few months. Meanwhile,
| almost any service is better off when its response times
| are cut drastically. I've seen many instances where a
| service's response times are more than half waiting for a
| db to respond.
| qvrjuec wrote:
| It's not the threat of an outage with data loss that is
| concerning to me- I just want to understand use case that
| needs fractions of a second shaved off of query times by
| using SQLite in this way that is also ok with the
| possibility of data loss.
| wongarsu wrote:
| Interactive interfaces. There's a huge difference between
| moving a slider and seeing the reaction in real time and
| moving the slider and seeing the reaction a second later.
| If you define "real time" as 30 fps, you have 33ms to
| process each query and show the result. That could
| involve multiple database queries with computation in
| between, if your business logic isn't easily expressible
| in SQL.
|
| Come to think about it, that covers most apps with UI.
| Apps where you are exploring data are definitely more
| impacted however.
| bob1029 wrote:
| I agree that 'reliably' would be a poor choice of diction
| when considered in your context.
|
| The user group of our product is effectively only
| professional bankers and their managers. No one in the
| general public has any direct access.
| lkxijlewlf wrote:
| qvrjuec wrote:
| No need to be condescending when I'm trying to learn. The
| premise of using SQLite because it was easier and to save
| fractions of a second didn't make sense to me as a
| tradeoff for potentially losing data
| twicetwice wrote:
| Interesting. So I would ordinarily want to put a foreign key
| constraint on the user_id column of a UserSessions table (or
| similar). In general, presumably you have relationships across
| the tables that are in those discrete databases. Do you just
| enforce these constraints/do joins in code? It seems like
| splitting related tables across multiple databases loses some
| (possibly a lot?) of the benefits of relational DBs, so I'm
| curious how you handle/manage that.
|
| That said, I love the idea this architecture. Might use it for
| whatever next dumb little web service I cook up! I love how
| this simplifies a lot of dev/deployment ops, perfect for a side
| project.
| nathcd wrote:
| SQLite has ATTACH (https://sqlite.org/lang_attach.html) for
| cross database operations. I've never tried it for foreign
| key constraints across databases, but I think it would work?
| stu2b50 wrote:
| You'd have to do the join on the client. This is a problem in
| general for scaling SQL - its not impossible, but with
| sharded SQL servers you have to be very deliberate in how you
| do joins so as to coincide as much as possible with your
| sharing strategy.
|
| It would be similar here. According to the author each sqlite
| DB belongs to a single microservice, which will naturally
| group together the most common of joins. Anything else will
| indeed have to be manually joined.
|
| Part of the reason nosql became popular for a bit. That's
| reversed, and sharded SQL is pretty common now, but it
| definitely adds more cognitive load to schema design.
| plq wrote:
| To my knowledge, WAL mode still needs to serialize writes for
| each database file. I'm assuming this is not a setup where
| there are too many concurrent writers?
| bob1029 wrote:
| Correct - We are using the typical SQLite build which
| serializes all writes by default and we have made no effort
| to undo this behavior. We actually rely on this to ensure
| things happen in the right order and take advantage for
| performance reasons.
|
| Because SQLite is effectively serializing all the writes for
| us, we have zero locking in our code. We used to have to lock
| when inserting new items (to get the LastInsertRowId), but
| the newer version of SQLite supports the RETURNING keyword,
| so we don't even have to lock on inserts now.
|
| Also, the fact that we have the databases divided across
| function helps free up some of the lock contention in the
| provider. We don't really have any mixed workload databases -
| its either "slow" gigantic operations (JSON blob access), or
| super quick tiny things (updating session timestamps). So,
| there is some minor isolation of contention issues on a
| subsystem basis.
| thunderbong wrote:
| Testing with JMeter, I got upto 10,000 concurrent writes on
| SQLite with WAL mode enabled
| scottcodie wrote:
| As long as you're estimating your future costs correctly then
| you're golden. If you aren't and your application becomes
| complex through growth or you need resiliency then you'll need
| to pay that cost and that cost can be big.
| starik36 wrote:
| Is this a situation where multiple web servers in a farm are
| accessing the SQLite databases on a file server?
| bob1029 wrote:
| This is a situation where a single process owns all of its
| SQLite databases and handles everything out of 1 machine.
| [deleted]
| gfody wrote:
| > It is my understanding that something about not having to
| take a network hop and being able to directly invoke the
| database methods makes a huge difference. Are you able to
| execute queries and reliably receive results within
| microseconds with your current database setup?
|
| with SQL Server you can get a very fast local connection by
| specify "server=(local)" in the connection string - this uses
| shared memory protocol bypassing the net stack.
| bob1029 wrote:
| This is true - The counterpoint is that now you have this
| leviathan that is the SQL Server process(es) running on the
| same machine as the application.
|
| If I am constraining my SQL Server usage to fit on 1 box, I
| might as well use SQLite (assuming no future plans for
| horizontal scaling).
| chasil wrote:
| Oracle clients can do the same by setting the ORACLE_SID and
| ORACLE_HOME environment variables.
|
| For remote databases, setting the TWO_TASK environment
| variable to the server's TNS descriptor is one way to force a
| network login.
| frays wrote:
| Is anyone here running production workloads which perform read
| and write operations on a remote SQLite database?
|
| Currently using Postgres and I'm open to switching but I haven't
| seen any libraries or implementations of SQLite being used as a
| client/server database.
| danielheath wrote:
| Why switch? Typically the work involved would need to be
| justified by some benefit, right?
| galaxyLogic wrote:
| SQLite database can be stored in git which seems like a great
| benefit. But I wonder would it also be possible to have different
| "branches" of the database and then merge them at some point?
| pgwhalen wrote:
| It's not integrated with git the way you're perhaps imagining,
| but SQLite sessions[0] is adjacent to what you're imagining.
|
| [0] https://www.sqlite.org/sessionintro.html
| davidhariri wrote:
| Just a note that there are significant features of SQLAlchemy
| that don't work with SQLite such as ARRAY columns, UUID primary
| keys and certain types of foreign key constraints.
| zzzeek wrote:
| well no major database other than PostgreSQL has native support
| for UUID or ARRAY, you can certainly use string-based types for
| these things for other databases. the DB agnostic UUID is at
| https://docs.sqlalchemy.org/en/14/core/custom_types.html?hig...
| and for ARRAY it's likely most convenient to use the SQLite
| JSON datatype which is also supported directly.
| reneberlin wrote:
| What a cruel world- sqlite to the rescue!
| kristianpaul wrote:
| vanilla-almond wrote:
| Is SQLite suitable for a small-to-medium CMS (Content Management
| System), or a blog platform e.g. WordPress (MySQL) or Ghost
| (MySQL)?
___________________________________________________________________
(page generated 2021-12-29 23:01 UTC)