[HN Gopher] SQLite or PostgreSQL? It's Complicated
       ___________________________________________________________________
        
       SQLite or PostgreSQL? It's Complicated
        
       Author : makaimc
       Score  : 175 points
       Date   : 2022-06-28 14:33 UTC (8 hours ago)
        
 (HTM) web link (www.twilio.com)
 (TXT) w3m dump (www.twilio.com)
        
       | danaugrs wrote:
       | How about PostgreSQLite? Joke aside, is anyone building something
       | that takes a stab at fixing the issues with both?
        
       | philliphaydon wrote:
       | It's difficult to accept the results without looking at the query
       | plans to see how the queries are running and if they are optimal.
       | Seems like it's just a straight dump of data into PostgreSQL and
       | letting SQLAlchemy handle some queries and indexes but no
       | analysis of the indexes at all.
       | 
        | You could probably squeeze more perf out of both SQLite and
        | PostgreSQL.
        
         | jayski wrote:
         | exactamundo
        
         | miguelgrinberg wrote:
          | The point of this exercise was to determine how these
          | databases perform with the same schema, and whether it makes
          | sense to make the
         | jump from SQLite to PostgreSQL or not. As a side note and since
         | you ask, the indexes on this database are fairly well thought
         | out, I have exhausted all ideas on indexing improvements to
         | help with performance. I do mention a major implementation
         | enhancement which is to precalculate monthly totals.
         | 
         | Also, another important point I tried to make is that this
         | benchmark is useful for this particular deployment. I don't
          | really want to convince you that the numbers I got will also
          | apply to you; my proposal is that you should test your own
          | application and make decisions based on your own results, which
          | can certainly be different from mine.
        
           | philliphaydon wrote:
            | While the indexes /may/ be pretty well thought out, that
            | doesn't mean they are useful.
           | 
           | Indexes don't work the same between SQLite and PostgreSQL so
           | a 1:1 transition between the two databases on indexes
           | probably isn't going to yield the best results.
           | 
           | For example UUIDs were stored incorrectly in PostgreSQL.
           | 
           | SQLite does not enforce varchar lengths, and typically in
           | PostgreSQL you just use text type and not impose any length
           | restrictions unless absolutely necessary.
           | 
           | PostgreSQL can use multiple indexes in a single query which
            | may help within the 2 GB RAM limit.
           | 
           | PostgreSQL supports partitioning which could help with the
           | range queries.
           | 
           | All I'm saying is that I believe the performance you get from
           | SQLite imo could be better, and the performance from
           | PostgreSQL a lot better.
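The point about SQLite not enforcing varchar lengths is easy to demonstrate with Python's stdlib sqlite3 (a minimal sketch; the table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite parses VARCHAR(10) but applies only TEXT affinity:
# the declared length is never enforced.
conn.execute("CREATE TABLE users (name VARCHAR(10))")
conn.execute("INSERT INTO users (name) VALUES (?)", ("x" * 500,))
stored = conn.execute("SELECT length(name) FROM users").fetchone()[0]
print(stored)  # 500 -- the 10-character limit was silently ignored
```

PostgreSQL, by contrast, would reject that insert with a "value too long for type character varying(10)" error, which is exactly why a schema ported 1:1 can behave differently.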
        
             | pphysch wrote:
             | > typically in PostgreSQL you just use text type and not
             | impose any length restrictions unless absolutely necessary.
             | 
             | There was a horror story posted on here a while back about
             | some old API causing an outage because there happened to be
             | a random text field that a user was effectively using as an
             | S3 bucket. Their conclusion was to _always_ set a sane
             | length constraint (even if very large, say 1MiB for the
             | smallest fields) on field sizes.
             | 
             | Hyrum's Law. Yes user input validation, blah blah, but that
             | is infinitely more surface area than having a sane (or
             | absurdly large) limit by default.
        
               | jeffdn wrote:
               | Right, but use a check constraint and not a data type
               | constraint (i.e. `varchar(x)`).
               | 
               | That sounds more like a code review issue -- `blob`
               | would've been an appropriate storage type in that case.
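The CHECK-constraint approach suggested above can be sketched with stdlib sqlite3 (illustrative table name and limit; the same DDL idea applies in PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A CHECK constraint caps the length without baking it into the column
# type, so raising the limit later is a constraint change, not a
# type change.
conn.execute("CREATE TABLE notes (body TEXT CHECK (length(body) <= 20))")
conn.execute("INSERT INTO notes VALUES ('short enough')")
try:
    conn.execute("INSERT INTO notes VALUES (?)", ("y" * 100,))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True -- the over-length row was refused
```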
        
               | pphysch wrote:
               | > That sounds more like a code review issue -- `blob`
               | would've been an appropriate storage type in that case.
               | 
               | No, it was supposed to be a short text field. Not an
               | infinite storage endpoint!
        
               | Pv9KKy3 wrote:
                | I use and prefer varchar(1000) personally; it's extremely
                | easy to change in the future if needed.
        
             | trollied wrote:
             | There are also a shitload more indexing options available
             | to you in Postgres.
        
           | BeefWellington wrote:
           | > As a side note and since you ask, the indexes on this
           | database are fairly well thought out, I have exhausted all
           | ideas on indexing improvements to help with performance. I do
           | mention a major implementation enhancement which is to
           | precalculate monthly totals.
           | 
           | So, this is what Data Warehousing is all about: reporting
           | related optimizations (typically what monthly totals and such
            | are aimed at). The idea being that you trade disk for speed,
           | so that kind of optimization is actually pretty normal if
           | you've identified it as a bottleneck. There's no need
           | necessarily to separate data warehousing constructs into
           | another DB, though it's often good practice because sooner or
           | later you're going to start running into problems on your
           | important system related to storage inflation and non-indexed
            | sorts/filters.
        
             | sodapopcan wrote:
             | I'm surprised you are the only comment mentioning this. I
             | don't have quite enough knowledge to speak up about it here
              | but it seems that while this article is about SQLite vs
              | PostgreSQL, it's really that neither is the ideal answer
             | in this scenario.
        
       | msbarnett wrote:
       | > Once again, SQLite happily stored my UUIDs on this column and
       | even indexed them, but to get these across to PostgreSQL I needed
       | to fix the schema and properly declare these keys as strings.
       | 
        | In PostgreSQL you're leaving performance on the table if you
        | store UUIDs as strings instead of as the built-in UUID type
        | (a 128-bit value under the hood).
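The size difference alone is easy to see from Python's stdlib (a quick illustration, not a benchmark):

```python
import uuid

u = uuid.uuid4()
# A native uuid column stores the raw 128-bit value; a text column
# stores the 36-character hyphenated form plus string overhead, and
# compares as text rather than as a fixed-width binary value.
print(len(u.bytes))  # 16 bytes as a raw value
print(len(str(u)))   # 36 bytes as text
```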
        
         | cryptonector wrote:
         | Plus PG and SQLite3 have fundamentally different approaches to
         | types.
         | 
         | SQLite3 has very few types, you cannot create new types, and is
         | duck-typed.
         | 
         | PG has many built-in types, you can create new types, and is
         | (mostly) statically typed.
         | 
         | You really have to know this going in. If you have to support
         | both, you'll probably have to write two versions of your SQL --
         | or you'll end up using an ORM that supports the two RDBMSes,
         | and then you may get stuck with the ORM.
         | 
         | SQLite3 is amazing, but it's mainly amazing because of how easy
         | it is to embed in apps. If that's what you need, use SQLite3.
         | If you need more power, if you're building a server, then think
         | long and hard about why not just go straight to PG.
         | 
         | EDIT: SQLite3 is amazing for other reasons too.
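The typing difference described above is observable directly with stdlib sqlite3 (a minimal sketch; the table name is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
# Column types are affinities, not constraints: a string that cannot
# be converted to a number is stored as-is in an INTEGER column.
conn.execute("INSERT INTO t VALUES ('not a number')")
value, typ = conn.execute("SELECT n, typeof(n) FROM t").fetchone()
print(value, typ)  # not a number text
```

PostgreSQL would reject the same insert at parse time with a type error, which is the practical consequence of "statically typed" vs "duck-typed" here.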
        
           | m-p-3 wrote:
           | And SQLAlchemy made the process of migrating from one to
           | another quite simple too.
           | 
           | https://www.sqlalchemy.org/
        
           | password4321 wrote:
           | > As of version 3.37.0 (2021-11-27), SQLite provides STRICT
           | tables that do rigid type enforcement
           | 
           | https://www.sqlite.org/datatype3.html
        
             | m-p-3 wrote:
             | Thanks! Here's the link to the actual details
             | 
             | https://www.sqlite.org/stricttables.html
        
               | cryptonector wrote:
                | I'm aware of this, and it's a very welcome addition,
                | though the types supported are still very limited, but...
               | 
               | > though new types might be added in future releases of
               | SQLite
               | 
               | Intriguing! Can't wait for SQLite3 to have `CREATE TYPE`!
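A small sketch of STRICT in action via stdlib sqlite3, guarded for the 3.37.0 requirement mentioned above (table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rejected = None  # stays None if the linked SQLite predates 3.37.0

if sqlite3.sqlite_version_info >= (3, 37, 0):
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT) STRICT")
    try:
        # In a STRICT table this is a datatype error, not a silent store.
        conn.execute("INSERT INTO t (id) VALUES ('not a number')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
print(rejected)
```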
        
           | anothercat wrote:
           | Is it really duck typed if no static type checking is
           | performed on the SQL? Duck typing to me seems to imply
           | checking quacks and waddles at "compile" time. I would think
           | it's at most loosely typed and at worst dynamically typed.
           | 
           | EDIT: Without the new strict mode of course, where it is not
           | duck typed but typed by name.
           | 
           | EDIT 2: Actually maybe duck typing is fine to use for dynamic
           | languages too. The affinity of INT keys and stuff is sort of
           | duck typing I guess.
        
             | kjeetgill wrote:
             | You're getting downvoted because the most well known duck-
             | typed languages, and probably where people first learned
             | the term from, are all examples of very dynamic languages
             | like python, ruby, and even js.
             | 
             | I'm sure there are others but the main language anyone
             | knows static duck-typing for is go. In go at least, the
                | interfaces get checked against implementations based on what
             | instances are bound to what interface variables or
             | parameters at compile time.
        
               | kevin_thibedeau wrote:
               | Just to add more detail. SQLite was designed for Tcl
               | interop. Its typing model reflects that.
        
         | miguelgrinberg wrote:
         | For this particular application I discuss in the article it
          | would make no difference, since there is only one lightly
          | used table that uses UUIDs. All the other tables use integer
         | primary keys.
         | 
         | Also this may be not a very popular opinion, but I prefer to
         | design the database without using database-specific features.
         | Things such as using SQLAlchemy and sticking to basic types.
         | Sure it may affect performance some, but thanks to having done
         | that I was able to port the app from SQLite to PostgreSQL in
         | like an hour.
        
           | lenkite wrote:
           | Prefer this approach too. Makes unit testing easier and makes
           | data more portable. Stick to the commonly supported set of
           | SQL data-types.
        
           | PeterisP wrote:
           | You can't completely avoid using database-specific features -
           | even the most basic types such as strings effectively require
           | database-specific treatment at least if you want strings that
           | are long and/or unicode.
        
           | klysm wrote:
           | > Also this may be not a very popular opinion, but I prefer
           | to design the database without using database-specific
           | features.
           | 
           | I prefer to leverage the DB schema to do as much as possible
           | for me. Generally with PostgreSQL that means using a lot of
           | domain types etc.
        
           | msbarnett wrote:
           | > Also this may be not a very popular opinion, but I prefer
           | to design the database without using database-specific
           | features.
           | 
           | The problem with that is that SQL as a standard is so patchy
           | and inconsistently implemented that effectively everything
           | has database-specific features. You discovered that yourself
           | when you inadvertently relied on SQLite's "feature" of
           | letting you store UUID strings in an integer column. No
           | matter how portable you imagine your schema is, it will work
           | very, very differently between any two SQL implementations.
           | 
           | I can't see any value whatsoever in voluntarily tying one
           | hand behind your back performance-wise, honestly. It's not
           | like changing "text" to "uuid" would have noticeably
           | increased the time this port-job took you.
           | 
           | As a wise man once said, if you're doing a job manually while
           | the tool designed for that job is sitting idle beside you,
           | the tool is smarter than you are.
        
             | sgbeal wrote:
             | > The problem with that is that SQL as a standard is so
             | patchy and inconsistently implemented that effectively
             | everything has database-specific features.
             | 
             | Noting that when it comes to such patchiness, sqlite's
             | developers specifically prefer to follow PG's example. When
             | they write new SQL features, their basis for comparison is
             | literally always "do we get the same answer as PG?" Yes,
             | there are differences between the two, but porting between
             | sqlite and PG, provided no fancy PG features are involved,
             | is very likely simpler than porting between any two other
             | arbitrary SQL engines. (That's ignoring any performance
             | factors - i'm only talking about semantic compatibility.)
        
         | masklinn wrote:
         | There's also no word about indexing or analysis (and postgres
         | is _very_ sensitive to having up-to-date statistics).
         | 
         | And on sqlite you might be leaving memory / storage on the
         | table if you use a UUID as PK and don't remove the implicit
         | ROWID.
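The WITHOUT ROWID point can be sketched with stdlib sqlite3 (illustrative table; the same applies to a UUID-keyed table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# With WITHOUT ROWID, the declared primary key is the b-tree key
# itself, instead of a separate index over a hidden integer rowid.
conn.execute("""
    CREATE TABLE sessions (
        token TEXT PRIMARY KEY,
        user_id INTEGER
    ) WITHOUT ROWID
""")
conn.execute("INSERT INTO sessions VALUES ('abc-123', 1)")

try:
    conn.execute("SELECT rowid FROM sessions")
    has_rowid = True
except sqlite3.OperationalError:
    has_rowid = False
print(has_rowid)  # False -- no hidden rowid is stored
```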
        
           | miguelgrinberg wrote:
           | I did not feel that was a relevant topic for this article. I
           | have spent a good amount of time over the life of this
           | application optimizing indexes, though. But the article is
           | about how I designed a test to benchmark these two databases,
           | not a general discussion on all the optimizations you can
           | make.
        
             | phibz wrote:
              | In postgres there's usually no good reason to use varchar
              | unless you specifically want to enforce size limits. If you
              | don't care, use TEXT; the performance is the same.
        
       | mikece wrote:
        | I really wish WordPress supported using SQLite as its database
       | instead of MySQL as this could make backups as simple as
       | rsync'ing from prod server to backups.
        
         | herbst wrote:
          | To be fair, exporting and importing databases is a simple
          | command on MySQL (or postgres), so it's just one command more
          | than simply rsyncing the SQLite file.
        
         | lazide wrote:
         | Wouldn't you run the (real) risk of a corrupted backup?
        
           | edbaskerville wrote:
           | Yeah. SQLite's backup API is the right way to do this:
           | 
           | https://sqlite.org/backup.html
        
             | ElectricalUnion wrote:
             | And, if you can't use the backup API for some reason, try
             | VACUUM INTO:
             | 
              | https://www.sqlite.org/lang_vacuum.html#vacuum_with_an_into_...
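Python's stdlib sqlite3 exposes the online backup API directly as `Connection.backup`, which makes the safe-backup pattern above a few lines (a sketch; file names are made up):

```python
import os
import sqlite3
import tempfile

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
src.execute("INSERT INTO posts (title) VALUES ('hello')")
src.commit()

# Connection.backup wraps SQLite's online backup API: it copies a
# consistent snapshot even while the source connection stays open.
dest_path = os.path.join(tempfile.mkdtemp(), "backup.db")
dest = sqlite3.connect(dest_path)
src.backup(dest)
dest.close()

count = sqlite3.connect(dest_path).execute(
    "SELECT count(*) FROM posts").fetchone()[0]
print(count)  # 1
```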
        
         | simonw wrote:
         | There's a WordPress plugin that adds support for SQLite as an
         | alternative to MySQL.
         | 
         | The implementation is (to my mind) simply astonishing: they run
         | regular expressions against the SQL to convert it from MySQL
          | dialect to SQLite!
          | https://github.com/aaemnnosttv/wp-sqlite-db/blob/9a5604cce13...
         | 
         | I'm amazed that this works, but apparently it works really
         | well. I would expect it to break the moment it ran into a
         | WordPress plugin that executed custom SQL in a way they hadn't
         | anticipated though.
        
         | evanelias wrote:
         | WordPress became popular at a time when LAMP hosting was easy,
         | widespread, and affordable. It's one reason WordPress largely
         | supplanted Movable Type, which was an early popular blogging
         | platform / CMS / static site generator. Because Movable Type
         | was Perl-based, it was more difficult to host.
         | 
         | Internally, Movable Type had a really nice object persistence /
         | ORM abstraction, with support for MySQL, Postgres, SQLite, and
         | Oracle. Meanwhile, iirc around this time WordPress internally
          | had raw queries strewn about its codebase, which would have
          | made it difficult to support multiple DBs. (I haven't
         | looked at WP since then so I don't know if this is still the
         | case.)
         | 
         | Faced with shrinking market share and being out-executed on
         | features by WordPress/Automattic, Movable Type actually removed
         | Postgres and SQLite support in response! Then again, these DBs
         | weren't nearly as popular back then as they are today.
        
         | forinti wrote:
         | It would be great to have the option to move easily between
         | databases.
         | 
         | SQLite would be best for development, testing, and small sites.
          | MySQL would be better for production. And SQLite would be
          | great for archiving old sites.
        
         | forgotpwd16 wrote:
          | Not only SQLite. I would've expected WordPress to be database
          | agnostic.
        
           | eddieroger wrote:
           | Why? WordPress is from an era of PHP where the ties between
           | PHP and MySQL were tighter than tight. What reason would they
           | have to change away from that? I would love if WP worked with
           | more than just MySQL as well, but I understand why it
           | doesn't.
        
             | forgotpwd16 wrote:
             | It being from an era doesn't mean it has to remain in that
             | era. I also understand why it doesn't. I'm saying it isn't
             | expected to (anymore).
        
       | cryptonector wrote:
       | My advice: go straight to PG. And avoid ORMs.
        
         | heurisko wrote:
          | Hibernate+Spring Data allow me to write native queries, and
          | map them to interfaces for me.
         | 
         | Using an ORM doesn't mean you have to rely on behaviour that
         | generates slow (e.g. n+1) query issues.
        
         | rr888 wrote:
          | I like that, but it's the testing that makes it hard. Being
          | able to use an ORM and then mock your database with SQLite or
          | similar is very useful. You can use testcontainers, but not in
          | my firm.
        
           | heurisko wrote:
            | This is the promise, but in my experience nasty bugs occur when
           | the abstraction inevitably doesn't work.
           | 
            | If it doesn't work with a real-life implementation, e.g. a
           | docker test container, then I don't consider it a robust
           | test.
        
           | cryptonector wrote:
           | You can test with PG. Trivially.
        
         | rco8786 wrote:
         | My advice: advice with no reasoning or context is not useful to
         | anyone.
        
           | cryptonector wrote:
            | Much, much e-ink has been spilled on the subject of ORMs-
            | good-or-bad. I shan't reprise it all here.
        
             | rco8786 wrote:
             | Literally you are the person who brought up ORMs.
        
               | cryptonector wrote:
               | https://news.ycombinator.com/item?id=31909806
        
         | philliphaydon wrote:
         | ORMs are fine but they are not a silver bullet that means you
         | never need to look at the database or the queries generated.
         | 
         | I think too many people use ORMs and rely on them to do
         | everything, then blame the database when things are slow. :(
        
           | cryptonector wrote:
           | Devs start with ORMs because they promise to make everything
           | easy.
           | 
           | Then their apps grow complex enough that the ORMs get in the
           | way.
           | 
           | Then the devs get sad.
        
             | bbkane wrote:
             | Re: no ORMs - do you have any advice on how I could make
                | this raw SQL insertion
                | (https://github.com/bbkane/starghaze/blob/54d255f524f8ef73384...)
                | more readable and less
             | error prone? Right now I'm basically concatenating strings
             | and hoping I don't misorder my parameters when I exec. 200
             | lines of this!! Am I doing raw SQL wrong?
        
               | bestinterest wrote:
                | It takes a lot to say very little in Go sometimes; I'd
                | say this is one of those cases. Mainly the `if err != nil`
                | boilerplate is at issue here.
                | 
                | I don't think you are doing anything wrong, from a
                | noob's perspective.
        
               | cryptonector wrote:
               | > Right now I'm basically concatenating strings
               | 
               | Well, you're using query parameters, but I think I
               | understand what you mean.
               | 
               | What I would do is write a stored procedure that takes
               | whatever parameters, then I'd write PG SQL to generate
               | bindings to that procedure for whatever language you're
               | using (Golang here). I might write the bindings by hand
               | if it's not a lot of work, but definitely would write a
                | little compiler if it's a ton of work. That way you can
               | get type-safety.
        
               | bbkane wrote:
               | Thank you. I'm using SQLite3 to keep deployment simple,
               | which unfortunately doesn't support stored procedures. Do
               | you have any other suggestions to improve this code?
        
               | cryptonector wrote:
               | SQLite3 very much supports something like stored
               | procedures though! Only, instead of calling them that it
               | calls them `INSTEAD OF INSERT` triggers.
               | 
                | So, you'd do something like:
                | 
                |     CREATE VIEW IF NOT EXISTS things AS ...;
                | 
                |     CREATE TRIGGER IF NOT EXISTS create_thing
                |     INSTEAD OF INSERT ON things
                |     FOR EACH ROW
                |     BEGIN
                |         ...
                |     END;
                | 
                | and to invoke such a "stored procedure" you'd
                | 
                |     INSERT INTO things (...)
                |     VALUES (...), ...;
                | 
                | or `INSERT .. SELECT ..;`.
               | 
               | Now, with SQLite3 you don't get reflection via SQL, like
               | you do in PG via the `pg_catalog` schema, but you do get
               | reflection via the API. Also, because it's SQLite3, you
               | don't get strong typing, and you get very few types, so
               | even with reflection via the API, generating type-safe
               | stub functions to invoke the "stored procedure" is not
               | exactly easy or possible.
               | 
               | FYI, SQLite3 is Turing-complete. You can loop via
               | recursive triggers. And you can do conditionals via
               | `WHERE` clauses on the statements in your trigger bodies
               | (and also via `OR IGNORE` and such).
               | 
               | EDIT: I made a post of this:
               | https://news.ycombinator.com/item?id=31913062
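The view-plus-trigger sketch above can be made concrete with stdlib sqlite3 (the view and trigger names here are illustrative, and the trigger body is a trivial stand-in):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE things (id INTEGER PRIMARY KEY, name TEXT);

    -- The "stored procedure": a view plus an INSTEAD OF INSERT trigger.
    -- Inserting into the view runs the trigger body instead.
    CREATE VIEW create_thing AS SELECT name FROM things;

    CREATE TRIGGER create_thing_proc
    INSTEAD OF INSERT ON create_thing
    FOR EACH ROW
    BEGIN
        INSERT INTO things (name) VALUES (upper(NEW.name));
    END;
""")

# "Calling the procedure" is just an INSERT against the view.
conn.execute("INSERT INTO create_thing (name) VALUES ('widget')")
name = conn.execute("SELECT name FROM things").fetchone()[0]
print(name)  # WIDGET
```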
        
         | ormaaj wrote:
          | You do know an ORM is a thing that is used with a database,
          | right? Perhaps you actually meant "don't use a relational
          | database". Either way, LOL
        
         | oh-4-fucks-sake wrote:
          | In the JVM world, jOOQ is a great happy-medium technology
          | between the extremes of string SQL and a full-blown ORM:
         | 
         | - Type-safe DSL generated from your schema that looks like SQL
         | 
         | - (If you can write the SQL, you can write it in jOOQ)
         | 
         | - SQL-injection protection
         | 
         | - Strong multi-dialect and vendor-specific support
         | 
         | - Custom-type mappers
         | 
         | - Composable queries
         | 
         | - Compile-time, schema-aware query safety
         | 
         | - Fine-grained transaction control
         | 
         | - And so much more...
        
           | gmac wrote:
           | There are various non-ORM database libraries for TypeScript
           | too. I develop Zapatos, but Slonik and pg-typed are probably
           | better known.
           | 
           | https://jawj.github.io/zapatos/
           | 
           | https://github.com/gajus/slonik
           | 
           | https://pgtyped.vercel.app/
        
       | xwdv wrote:
        | No one ever got fired for just using PostgreSQL. Maybe burned,
        | though.
        
       | ormaaj wrote:
       | This is totally uncomplicated. The choice of database is a
       | function of your application and its architecture, not a matter
       | of all-things-equal performance comparison.
       | 
        | "I want an extensible, versatile database platform for use as a
       | component in an infrastructure that is part of a larger system" -
       | postgres.
       | 
       | "I want an embeddable lightweight SQL storage backend for a
       | narrow scope, single-purpose, probably single-user, probably
       | single-system application" - sqlite
       | 
       | Postgres should perform well in any role in the hands of an
       | expert - it's a question of whether or not that is overkill.
       | Sqlite can also perform well in a subset of those roles.
        
       | bob1029 wrote:
       | I think it's even more complicated.
       | 
       | The reduction in latency brought on by in-process databases,
       | combined with modern NVMe storage, means that SQLite is a
       | substantially faster approach than any other solution which
       | requires a trip through the network stack.
       | 
       | I've got services in production for several financial
       | institutions right now that are resolving _most_ SQL queries
       | within 100-1000 _micro_ seconds by simply using SQLite on
       | reasonable hardware.
       | 
       | How many more users could you support if the amount of time you
       | had to await IO for each was reduced by ~2 orders of magnitude?
       | 
       | Obvious caveats here being the resilience angle. We addressed
       | this with application-level protocols and additional instances.
        
         | infogulch wrote:
         | Another angle is unbounded horizontal scalability across
         | tenants. If your application can be divided cleanly between
         | different customers and one customer can be served with one
         | instance (see "How many more users could you support if the
         | amount of time you had to await IO for each was reduced by ~2
         | orders of magnitude?") then scaling to more customers is
         | extremely simple and linear.
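A hypothetical sketch of that per-tenant layout in Python (all names here are made up; the point is only that routing is a file-path lookup):

```python
import os
import sqlite3
import tempfile

# One SQLite file per tenant: scaling out is adding files (or
# machines), not sharing one database server.
DATA_DIR = tempfile.mkdtemp()

def tenant_conn(tenant_id: str) -> sqlite3.Connection:
    conn = sqlite3.connect(os.path.join(DATA_DIR, f"{tenant_id}.db"))
    conn.execute("CREATE TABLE IF NOT EXISTS orders (item TEXT)")
    return conn

conn = tenant_conn("acme")
conn.execute("INSERT INTO orders VALUES ('anvil')")
conn.commit()

# Each tenant's data is fully isolated in its own file.
acme = tenant_conn("acme").execute(
    "SELECT count(*) FROM orders").fetchone()[0]
globex = tenant_conn("globex").execute(
    "SELECT count(*) FROM orders").fetchone()[0]
print(acme, globex)  # 1 0
```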
        
         | srcreigh wrote:
         | Can you share more info about resilience, app protocols and
         | additional instances?
        
       | andy_ppp wrote:
       | Why is this limited to four clients? Maybe I missed something but
       | it would be good to see what happens with higher numbers of
       | connections (something both of these dbs are rumoured to be bad
       | at).
        
       | mritchie712 wrote:
       | Do clickhouse next!
        
       | umvi wrote:
       | Can SQLite be used outside of single threaded contexts? Last I
       | checked having multiple concurrent readers and writers is very
       | difficult. Multiple concurrent readers only is probably ok if
       | that's all you need.
        
         | simonw wrote:
         | In practice, most SQLite writes take less than a few ms, so
         | concurrent writes are only likely to be a problem for you if
         | you need to handle thousands of writes per second.
         | 
         | I've implemented the pattern where writes sit in an in-memory
         | queue and are applied to a dedicated write connection in
         | series. I've not run into any problems with this so far, but
         | I've also not tried to handle thousands of writes per second
         | with it yet.
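A minimal sketch of that queue-plus-dedicated-writer pattern using only the stdlib (table and variable names are made up):

```python
import os
import queue
import sqlite3
import tempfile
import threading

# Writes go into an in-memory queue; a single dedicated writer
# connection applies them in series, so there is never more than one
# writer touching the database.
write_queue: queue.Queue = queue.Queue()

def writer(db_path: str) -> None:
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS events (msg TEXT)")
    while True:
        item = write_queue.get()
        if item is None:  # sentinel: drain finished, shut down
            break
        conn.execute("INSERT INTO events (msg) VALUES (?)", (item,))
        conn.commit()
    conn.close()

db_path = os.path.join(tempfile.mkdtemp(), "app.db")
t = threading.Thread(target=writer, args=(db_path,))
t.start()
for msg in ("a", "b", "c"):
    write_queue.put(msg)
write_queue.put(None)
t.join()

count = sqlite3.connect(db_path).execute(
    "SELECT count(*) FROM events").fetchone()[0]
print(count)  # 3
```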
        
         | neverartful wrote:
          | It can, but as the concurrency increases you'll start to see
          | errors indicating that the database is locked.
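The "database is locked" failure mode is easy to reproduce with two connections (a sketch using stdlib sqlite3; setting a busy timeout, via the `timeout` connect argument, is the usual mitigation):

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "shared.db")

# First connection takes and holds the write lock.
w1 = sqlite3.connect(db_path, isolation_level=None)  # autocommit mode
w1.execute("CREATE TABLE t (x INTEGER)")
w1.execute("BEGIN IMMEDIATE")

# A second writer with a zero busy timeout fails immediately with
# "database is locked" instead of waiting for the lock.
w2 = sqlite3.connect(db_path, timeout=0)
try:
    w2.execute("INSERT INTO t VALUES (1)")
    locked = False
except sqlite3.OperationalError:
    locked = True

w1.execute("COMMIT")
print(locked)  # True
```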
        
       | ed25519FUUU wrote:
       | The fact that the performance is so close between a database that
       | requires NO SERVER and one that requires a dedicated server + ops
       | management is a signal to me that all projects should start on
       | SQLite unless there is a compelling reason not to.
        
         | rco8786 wrote:
         | yes!
         | 
         | It seems to me (or at least, I'm hoping this is happening) that
         | the pendulum is swinging back to simple deployments and away
         | from AWS/Cloud/K8s for every tiny little app.
         | 
         | One thing I'd love to see is that the 'default' MVP deployment
         | is your code + SQLite running on a single VM. That will scale
         | well into "traction" for the vast majority of applications
         | people are building.
        
           | namecheapTA wrote:
           | I have a fully working SaaS with 30+ companies paying, with
            | 20 employees each on average. I wrote it as a self-taught
            | programmer via YouTube and Stack Overflow. It is hosted on
            | $2 a month Namecheap shared hosting. PHP back end, jQuery
            | front end. I use cPanel to set things up.
           | 
           | It runs faster than 90% of webapps on the internet.
        
             | lazide wrote:
             | Yup, I believe it. As an industry 90% of companies are
             | spending 90% of their time on crap only the remaining 10%
             | of the companies actually need.
             | 
             | Good resume fodder though I guess.
        
             | Pv9KKy3 wrote:
             | As a former php dev I absolutely believe this. I can't get
             | over how slow and bloated stuff is these days. I can't even
             | do front-end anymore.
             | 
             | We once did an app ages ago where the database was used to
             | create a materialized view of code (PHP) + web pages for
             | everyone and everything. We then rsynced that to 6
             | machines. This is ancient times, but this thing FLEW ->
             | click click click click you could fly through the pages. It
             | was a read heavy workload, but still, even the front-end
             | (just hand coded) was somehow faster than all the latest
             | and greatest going on these days.
        
             | downut wrote:
             | Do you implement something akin to Role Based Access
             | Control for those 30x20 employees? If so where do you store
             | and manage that data?
             | 
             | I have zero experience with this but I am very curious how
             | people do it in sqlite.
        
               | ElectricalUnion wrote:
               | Not OP, but like any other database? Even if the database
               | supports it, you usually should not have users directly
                | logged into the database anyway, so whether the database
                | "has" or "lacks" access control doesn't matter.
               | 
               | What does matter however, is enforcing parametrized
               | queries everywhere. Unless all the db handles you pass to
                | the client-handling code are read-only, chaos will ensue
               | from the DDL permissions.
               | 
               | https://xkcd.com/327/
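A minimal sketch of the parametrized-query point above, using Python's built-in sqlite3 module (the table and the Bobby-Tables-style input are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT NOT NULL)")

# Untrusted input, xkcd-327 style. It is safe here because it is passed
# as a bound parameter and never spliced into the SQL text itself.
evil = "Robert'); DROP TABLE students;--"
conn.execute("INSERT INTO students (name) VALUES (?)", (evil,))

# The value was stored literally; none of the SQL inside it ran.
print(conn.execute("SELECT name FROM students").fetchone()[0])
```

The same placeholder style works against PostgreSQL drivers too, only the placeholder token differs (`%s` in psycopg2 rather than `?`).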
        
               | downut wrote:
               | I start by assuming a server-side bridge between the
               | client and the database, and it's a given the queries are
               | parameterized. So only the bridge is doing the "logging
               | in".
               | 
               | Why is it superior to put all of the (bespoke) access
               | control logic in the server side bridge rather than use
               | what's available in the database (accessed by the bridge,
               | not the client)?
               | 
               | I have been watching like a hawk for 6 months but I
               | haven't stumbled upon a clear reason why this is done,
               | except for "it helps source code db portability".
               | 
               | For a multiorg/multiuser application this seems like the
               | crucial distinction between sqlite and postgresql.
               | 
               | Again I have no experience here, talk to me like I'm
               | stupid (I really am!).
        
               | ElectricalUnion wrote:
               | For a multiorg approach, you probably want isolated DBs
               | anyways.
               | 
               | Within a single org, multiuser approach, there are 2 big
               | problems that I remember with attempting to shoehorn DB
               | auth into application auth:
               | 
               | * assuming you use a connection pool, you might run out
               | of TCP connections/ports if you need to handle too much
               | stuff;
               | 
                | say for example that your load balancer needs 3
               | application nodes behind it - you will need 2
               | (connections per user) x 3 (application nodes)
               | connections just to handle a user - 6 connections/user.
               | That will eat your database connection limit very fast,
               | for no good reason.
               | 
               | * assuming you don't use a connection pool, you now have
               | horrible latency on every connection (bad) and need to
               | handle plain text passwords (assuming you use scram-
               | sha-256), or md5 non-replay-resistant hashes of user
               | passwords in, either sent in every client request, or in
               | a shared session system. No matter what you pick, you
               | have a security disaster in the making (very bad).
        
               | rco8786 wrote:
               | I think OP is saying their customers are mainly companies
               | with ~20 employees, not that their company has 20
               | employees
        
             | rco8786 wrote:
             | I believe it!
        
             | fmakunbound wrote:
             | In my experience the pain will come later as it grows, I
             | believe it.
        
               | rco8786 wrote:
               | And it's a great pain to have
        
         | avinassh wrote:
         | hmm in that case I would probably pick Postgres because despite
         | the added network call, it is still faster than an embedded DB.
        
           | ormaaj wrote:
           | As I understand it the actual low-level database interface
           | may be a shared library mechanism in either case. At least,
           | for a local database I'm pretty sure that's a function of the
           | connection type. You're still dlopen()-ing an ODBC library
           | and the underlying glue between that and the rest of the
           | userspace DB engine parts can be implemented in many ways. If
            | there's a socket involved it would probably be AF_UNIX.
        
         | masklinn wrote:
         | It's funny because to me it says pretty much the opposite:
         | because sqlite has no server it works entirely in-process,
         | whereas the client will be talking to postgres over a unix
         | socket, or possibly a full-blown network socket.
         | 
         | Plus it seems to be a readonly workload, something for which
          | postgres is not ultra relevant (not that it doesn't work,
         | but...).
        
         | lazide wrote:
         | Uh, most distributed environments you have no choice but to use
         | a server, because you'll have clients on a (relatively)
         | uncontrollable set of other machines.
         | 
         | If you're writing something to run in an embedded/client app
         | environment, then yeah why would you use Postgres for your one
         | machine? You could, but it'll add a lot of moving parts you'll
         | never need and probably don't want (like remotely accessible
         | network ports you'll need to secure)
        
         | jayd16 wrote:
         | Postgres doesn't require a _dedicated_ server. If your
         | requirements allow for SQLite but it didn't exist, you could
         | conceivably run postgres on your app server to reduce the ops
         | load if you really felt it was a significant effort.
         | 
         | If you might want to use multiple app instances connected to a
         | shared DB, I would say it's probably easier to just use a local
         | postgres container for dev and a managed cloud DB. Really not
         | that much effort and you get automatic backups.
         | 
         | If you plan to never use a shared DB, SQLite is great though.
        
           | robonerd wrote:
            | I read that as _"dedicated server [process]"_ not _"dedicated
            | server [hardware]"_. While the latter would obviously
            | imply the former, it's really the former that increases the
            | complexity of managing the whole affair. Desktop applications
           | with a dependency on mysql/postgres running on the same
           | machine are, in my experience without exception, a pain in
           | the ass for users.
        
           | riku_iki wrote:
           | > you could conceivably run postgres on your app server to
           | reduce the ops load if you really felt it was a significant
           | effort.
           | 
           | Still, PGSQL requires some level of setup/maintenance: create
           | user, permissions, database, service with config/security
            | effort. If there were a way to run PGSQL as a library and
            | tell it: just store your data in this dir without other
            | steps, I would be very interested to learn.
        
             | urda wrote:
             | You are likely wanting something like RDS [1], Azure [2],
              | or even Linode (which I just discovered) [3]. Sure, you
              | have to pay a larger premium for it, but these products do
              | allow you to focus on the app rather than the DBA stuff.
             | 
             | - [1] https://aws.amazon.com/rds/
             | 
             | - [2] https://azure.microsoft.com/en-
             | us/services/postgresql/#overv...
             | 
             | - [3] https://www.linode.com/products/postgresql/
        
               | riku_iki wrote:
               | > pay a larger premium for it
               | 
                | yeah, the premium is very large if I want to do a lot of
                | data crunching (lots of IOPS, cores and RAM are needed),
                | plus latency/throughput to those services will be much
                | higher than to my local disk through the PCIe bus.
        
             | eropple wrote:
             | You can set up Postgres with three lines of bash and a Unix
             | socket. It is more work; I wouldn't say it's appreciably
             | more work, and if you need more rigorous data modeling (I
             | frequently do) then it's worth considering if this is the
             | mode in which you'd like to run your data stores (I
             | wouldn't, but latency isn't the most critical thing for
             | stuff I do).
        
               | riku_iki wrote:
               | I think it is more than that.
               | 
               | Steps will be:
               | 
                | - sudo apt install postgresql (more steps are needed if
                | you are not satisfied with the OS default version)
               | 
               | - sudo su postgres
               | 
               | - psql -> create db, user, grant permissions
               | 
               | - modify hba.conf file to change connection permissions
               | 
               | - modify postgresql.conf to change settings to your
               | liking because defaults are very out of touch with modern
               | hardware
               | 
                | Another option would be to build your own docker image,
                | but I am not good with this.
        
           | mikepurvis wrote:
           | "never use a shared DB"
           | 
           | I think this is what it comes down to for me too. Yes there
           | might be some use cases that really benefit from Postgres
           | features like GIS and the native JSON type, but ultimately,
           | for most use-cases, it's going to hinge on whether or not you
           | ever expect to need to scale any part of the system
           | horizontally, whether that's the frontends or the DB itself.
        
             | msbarnett wrote:
             | > Yes there might be some use cases that really benefit
             | from Postgres features like GIS and the native JSON type
             | 
             | Or a proper UUID type, materialized views, enforced data-
             | constraints, lateral joins, queries that consider multiple
             | indexes, etc, etc.
             | 
             | It's not like there's just a couple of niches where
             | PostgreSQL's features distinguish it from SQLite. SQLite is
             | appropriate for very simple DB use-cases where you aren't
             | too bothered by silent data inconsistency/corruption, but
             | not a whole lot beyond that. It's a good little piece of
             | software, but it hardly relegates PostgreSQL to "Just GIS
             | things" or "Just JSON use-cases".
        
               | ElectricalUnion wrote:
               | > SQLite is appropriate for very simple DB use-cases
               | where you aren't too bothered by silent data
               | inconsistency/corruption, but not a whole lot beyond
               | that.
               | 
               | SQLite is also appropriate for use cases where you are
               | bothered by silent data inconsistency/corruption, it's
               | just that often the hardware running PostgreSQL (or any
               | "serious" DBMS, relational or not) is usually less prone
                | to random inconsistency/corruption (uninterruptible
                | redundant power supplies, well-cooled environment,
                | running lower but more stable clock speeds on
                | everything, ECC RAM, RAID disk array, corruption-
                | resistant FS).
               | 
               | If you run PostgreSQL on the cheapest consumer hardware,
               | expect random corruption of the database when power runs
               | out several times in a row.
        
               | msbarnett wrote:
               | Sorry, "corruption" was probably misleading there. By
               | "silent data inconsistency/corruption" I mean, "a bug in
               | my application led to me storing strings in an integer
               | column, or too-long data in a too-short column, and
               | SQLite just silently let me do that"-type errors - the
               | data consistency is logically (but not physically)
               | corrupted. The author encountered 2 separate occurrences
               | of this just in their simple use-case, evidently.
        
               | 323 wrote:
               | > _led to me storing strings in an integer column_
               | 
               | Recent SQLite versions have STRICT mode where they forbid
               | that.
               | 
               | https://www.sqlite.org/stricttables.html
               | 
               | > _or too-long data in a too-short column_
               | 
                | You can use a constraint for that:
                | 
                |     CREATE TABLE test (
                |         name TEXT NOT NULL CHECK(length(name) <= 20)
                |     ) STRICT;
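That CHECK constraint can be seen in action from Python's built-in sqlite3 module. A quick sketch (the STRICT keyword is left off here so it also runs on SQLite builds older than 3.37; the CHECK behaves the same either way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same idea as the comment's table, minus STRICT for portability.
conn.execute(
    "CREATE TABLE test (name TEXT NOT NULL CHECK(length(name) <= 20))"
)

conn.execute("INSERT INTO test (name) VALUES ('short enough')")  # ok
try:
    # 32 characters: violates the CHECK, so SQLite rejects it.
    conn.execute(
        "INSERT INTO test (name) VALUES ('this name is definitely too long')"
    )
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```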
        
               | mikepurvis wrote:
               | Neat, will definitely give this a try. Definitely one of
               | my minor peeves about SQLite is having to set a bunch of
               | pragmas on every new connection. Like, especially stuff
               | such as foreign_keys=1... really, enforcement of foreign
               | key constraints is something each connection has to opt
                | into rather than being set once on the whole DB file and
               | never thought about again? Blah.
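The per-connection opt-in being complained about is easy to demonstrate with Python's sqlite3 module (toy parent/child schema made up for the example; autocommit mode so the PRAGMA takes effect immediately):

```python
import sqlite3

# isolation_level=None: autocommit, so no open transaction blocks the PRAGMA.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id)
    );
""")

# Default: foreign keys are NOT enforced -- the orphan insert succeeds.
conn.execute("INSERT INTO child (parent_id) VALUES (999)")

# After opting in on this connection, the same insert is rejected.
conn.execute("PRAGMA foreign_keys = 1")
try:
    conn.execute("INSERT INTO child (parent_id) VALUES (999)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

A fresh connection to the same file starts over with enforcement off again, which is exactly the annoyance described above.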
        
         | Beltiras wrote:
         | There are enough SQL oopsies in this article to show that you
         | should NEVER use sqlite for any production except as filter
         | databases or other sort of throwaways. RDBMSs are there to
         | enforce constraints on data.
        
           | fmajid wrote:
           | That's silly, SQLite is the single most widely deployed
           | database engine in the world, possibly the most deployed
            | software, period. Its unit test suite is second to none.
           | It's reliable enough that SMF, Solaris' boot and service
           | management (sort of like systemd) depends on it.
           | 
            | That does not mean it is applicable to all database use-
            | cases, but production-grade it undoubtedly is.
        
           | ElectricalUnion wrote:
           | > NEVER use sqlite for any production
           | 
           | > RDBMSs are there to enforce constraints on data.
           | 
           | If you did not ask SQLite to enforce type constraints on
           | relations (with STRICT), it will work in the relaxed,
           | backwards-compatible expected behavior of previous versions
           | of SQLite.
           | 
            | That being said, if you want _actual validation_, you
            | probably need more complex CHECK expressions anyways with
            | your business rules, and those work by default on any
            | database*.
            | 
            | https://www.sqlite.org/stricttables.html
            | 
            | * unless you're a madman that runs things with "PRAGMA
            | ignore_check_constraints = true;" enabled or equivalent; in
            | that case, no DB can help you.
        
         | BeefWellington wrote:
         | Sure, you can start there. Once you scale though you're going
         | to have lots of fun.
        
           | neverartful wrote:
           | For many applications you're right. However, in the case of a
           | dashboard for blog articles I don't think it could ever come
           | to that.
        
       | stormdennis wrote:
       | I don't know about now but at one time Firebird made this
       | question moot. It never received the love it deserved in my view.
        
         | kstrauser wrote:
         | Many years ago, I was hired at a web shop running Firebird in
          | production. It had the interesting property that if you ran the
         | client software on your desktop to connect to the server
         | software on another host, and used ^Z to stop the client, then
         | the server process would hang until you resumed the client. I'm
         | not entirely sure how I'd implement that as a feature if I
         | wanted to.
        
       | RedShift1 wrote:
       | Use the right tool for the right job.
        
         | ormaaj wrote:
         | Agreed. Performance comparison is interesting in a synthetic
         | benchmark context but the premise of the title isn't that.
        
       | KronisLV wrote:
       | You know, apart from some of the details in regards to what data
       | types to better use etc., I think that both PostgreSQL and SQLite
       | are excellent choices for most use cases!
       | 
       | Of course, SQLite might be the best choice for embedded or local
       | usage, they even have this lovely page about using it as an
       | application file format: https://sqlite.org/appfileformat.html
       | 
       | And PostgreSQL might scale better for multiple writers and
       | situations where you need it on a separate VM/server/container
       | that might be shared between different apps through a network
       | connection of some sort.
       | 
       | Though it's also curious, because you never hear a lot about
       | MySQL or MariaDB (comparatively, in my subjective opinion) on HN
       | or many other places for that matter. And yet, it's still a
       | decent option, has good tooling (MySQL Workbench is great), is
       | widely supported and backs a significant amount of production
       | systems out there.
       | 
       | Do people just view it as old/legacy, or is there the belief that
       | for many use cases PostgreSQL might simply be the better option,
       | e.g. also because of projects like PostGIS and support for more
       | complex setups, like custom data types, more advanced scripting
       | language for in-database processing and whatnot?
        
         | danenania wrote:
         | I'm very happy with MySQL currently. I started using it because
         | AWS Aurora supports it a bit better than postgres. I haven't
         | had any regrets since.
         | 
         | It's great when you just want simple relational storage without
         | a lot of features or ceremony, but need a bit more than SQLite.
         | It's fast, predictable, and gets the job done.
         | 
         | Postgres is great too and offers more, but sometimes that can
         | tempt you to try to do too much in the DB layer.
        
       | mattxxx wrote:
       | Some context: I've worked with massive PostgreSQL db's,
       | containing millions of daily records spanning multiple years
       | (allowing for potentially massive numbers of returned rows). I
        | also exclusively build MVPs with simple db architectures in
        | SQLite (think: no massive joins or need to partition data).
       | 
       | Best takeaways here:
       | 
       | - SQLite kills it. It appears untuned PostgreSQL is only 2
        | seconds (???) faster than untuned SQLite; always try to kick
        | off a project with SQLite
       | 
       | - PostgreSQL has a performance cliff when query responses leave
       | memory. This is extra true when an index no longer fits in
        | memory. work_mem is critical for pushing back this performance
        | cliff
       | 
       | Side note: Tuning postgres can be really painful, but it's also a
       | really dope tried-and-true db.
        
       | neverartful wrote:
       | Interesting article. It should be noted that the testing done is
       | with 100% read/query workload (not surprising since the
       | application is a website of blogs).
        
         | miguelgrinberg wrote:
         | This is 100% correct. This is a dashboard application that is
         | mostly read-only for users. There is a cron job that imports
         | and writes new data once per day.
        
       | polyrand wrote:
       | Great blog post. A few observations (considering the blog post
       | skips numerous details):
       | 
       | SQLite really shines when compared to running a database in a
       | separate server from the app. This is when the network latency
       | starts being noticeable. Latency between you and the app is
       | usually constant. The latency between the app and the database is
       | what matters. If in this case, Postgres can run in the same host
       | as the main app, I think it's a good thing to do. But very often
       | that's not the case.
       | 
       | Even though not surprising, it's great to see how SQLite handles
       | concurrency. From my experience, SQLite would be able to handle
       | much more read concurrency. It would have been great to see more
       | benchmarks (like 1000 concurrent users, even if the app would
       | never see such traffic).
       | 
       | I was surprised that SQLite didn't beat Postgres on simpler
       | queries. Without seeing the queries, if they contain multiple
       | joins, that could be another reason why SQLite is slower since it
       | can only do "nested loop" JOINs. The workloads also seem mostly
       | analytical, we'll see where DuckDB goes in this space (looking
       | promising for now).
       | 
       | One thing not mentioned in the blog post is "easy to maintain and
        | update". If the app is deployed with Docker, adding Postgres
        | means you go from worrying about:
        | 
        |     1 docker container + 1 docker volume
        | 
        | to:
        | 
        |     2 docker containers (docker-compose?) + 1 docker volume +
        |     internal docker networking
       | 
       | The fact that SQLite is so simple to operate also has some
       | advantages. You can usually go back to an SQLite-based app 1 year
       | later and understand everything. With other databases, you may
       | need to review users, roles, permissions, connection string, etc.
       | 
       | Another cool thing about SQLite is that it scales mostly linearly
       | as you add more CPU cores. It can scale as long as you can also
       | scale the number of processes in your app. In other databases,
       | scaling means adding connection pools and other tools in the
       | middle.
       | 
       | Lastly, and related to the latency topic. The low latency in
       | SQLite means that you can sometimes convert a big query into 5
       | separate, small and reusable queries without any performance hit.
        | It's really remarkable how much cleaner your code can become
       | with this.
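That last point can be sketched in a few lines of Python's sqlite3 (toy posts/views schema invented for the example): with in-process latency, splitting one join into several small, reusable queries costs effectively nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE views (post_id INTEGER, day TEXT, count INTEGER);
    INSERT INTO posts VALUES (1, 'sqlite'), (2, 'postgres');
    INSERT INTO views VALUES (1, 'mon', 10), (1, 'tue', 5), (2, 'mon', 7);
""")

# One big query...
big = conn.execute("""
    SELECT p.title, SUM(v.count) FROM posts p
    JOIN views v ON v.post_id = p.id
    GROUP BY p.id ORDER BY p.id
""").fetchall()

# ...or several small, reusable ones. Over a network each extra round
# trip would hurt; in-process the difference is microseconds.
small = []
for pid, title in conn.execute("SELECT id, title FROM posts ORDER BY id"):
    total = conn.execute(
        "SELECT SUM(count) FROM views WHERE post_id = ?", (pid,)
    ).fetchone()[0]
    small.append((title, total))

print(big == small)  # same result either way
```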
       | 
       | In any case, I love working both with Postgres and SQLite, and we
       | need more blog posts like this one showing real-world cases.
        
       | mcdonje wrote:
       | >If you are going to take one thing away from this article, I
       | hope it is that the only benchmarks that are valuable are those
       | that run on your own platform, with your own stack, with your own
       | data, and with your own software. And even then, you may need to
       | add custom optimizations to get the best performance.
       | 
       | Good advice, contextually.
        
       | rammy1234 wrote:
        | If you are using SQLite, there is always the risk of losing that
        | data file on disk, right? How is that managed? _Is this a dumb
        | question?_ Is this a risk which PostgreSQL does not have?
        
         | lazide wrote:
         | Not a dumb question, but some people don't care for their use
         | case (either the data is cheap, or they don't think it will
         | happen)
        
         | neverartful wrote:
         | The big question here would be whether the filesystem that the
         | data file resides on is mounted on a single-disk volume or a
         | RAID-volume (or even a clustered SDS volume such as Ceph). On a
         | single-disk volume, if the disk dies your DB is gone (at least
         | until you can replace the disk and restore the backup). On a
         | sensibly designed RAID-volume, if one of the disks dies the
         | application's admin probably would never even know. The
         | sysadmin would see it in their logging/alerting infrastructure
         | and replace the dead disk. There would not be any loss of data
         | or availability of the data (just a temporary degradation in
         | the overall resiliency). Same is true for a clustered SDS
         | volume such as Ceph.
        
           | rfoo wrote:
            | Not only that, the easy mode we're all doing now is "run the
            | application in a major cloud". Besides AWS, they all promise
            | 99.999% or above durability for their block storage offering.
           | 
           | Of course you still need backup for other incidents.
        
         | simonw wrote:
         | The best answer I know of to this question is Litestream -
         | https://litestream.io/ - you can use it to inexpensively
         | replicate a backup of your database up to an S3 bucket (or
         | various other storage providers) to give you a very solid
         | robust recovery option should you lose the disk that your
         | SQLite database lives on.
        
         | stonemetal12 wrote:
         | Postgresql stores data to disk, so it has the same risk of
         | loss. Both setups need a backup solution to mitigate the
         | problem of disk failure.
         | 
         | My understanding of the situation is sqlite's solution is a bit
         | more low level than postgresql but should be workable.
        
           | GordonS wrote:
           | In combination with a base backup, you can use streaming WAL
           | backups with Postgres, shipping compressed WAL files to S3 or
           | wherever. I like WAL-G for this.
           | 
           | There are lower-level approaches built-in to SQLite, such as
           | the Backup API, but there is at least one 3rd party project,
           | Litestream, which does streaming WAL backups too.
        
         | procinct wrote:
          | These days, you could replicate it to something like S3 via a
          | tool like Litestream.
        
       | yread wrote:
       | I understand and partially agree with all the criticism of the
       | article (and could add my own like why did you tweak work_mem out
       | of all the hundreds of settings on postgres) but it's a blog
       | post, not a scientific article - I still think it's a useful
       | post. I might be in a similar situation at some point and it's
       | nice to know that you haven't encountered any major gotchas and
       | made it even 2x faster. Thanks for sharing your experience.
        
       | mjw1007 wrote:
       | I had a look at SQLite around the time sqlite3 appeared, and my
       | notes say I decided I didn't like it because of the lax and
       | sometimes inconsistent way it can treat its input.
       | 
       | I thought I'd have another go at it today (sqlite 3.34) and I'm
       | mildly surprised that the things I didn't like are still there:
        | 
        |     sqlite> select '1 apple' + '2 oranges';
        |     3
        |     sqlite> select count(*) where 0.9;
        |     1
        |     sqlite> select count(*) where 1.1;
        |     1
        |     sqlite> select count(*) where not 0.9;
        |     1
        |     sqlite> select count(*) where not 1.1;
        |     0
        
         | simonw wrote:
         | "I'm mildly surprised that the things I didn't like are still
         | there"
         | 
         | That's because SQLite doesn't ship changes that break existing
         | applications.
        
         | infogulch wrote:
         | You may be interested in STRICT Tables [1], introduced just 6
         | months ago:
         | 
         | > Some developers appreciate the freedom that SQLite's flexible
         | typing rules provide and use that freedom to advantage. But
         | other developers are aghast at SQLite's flagrant rule-breaking
         | and prefer the traditional rigid type system found in all other
         | SQL database engines and in the SQL standard. For this latter
         | group, SQLite supports a strict typing mode, as of version
         | 3.37.0 (2021-11-27), that is enabled separately for each table.
         | 
         | [1]: https://www.sqlite.org/stricttables.html
         | 
         | Related past discussions:
         | 
         | Strict Tables - Column type constraints in SQLite - Draft |
         | 2021-08-21 | 331 points, 108 comments |
         | https://news.ycombinator.com/item?id=28259104
         | 
         | SQLite Release 3.37.0 | 2021-11-27 | 333 points, 112 comments |
         | https://news.ycombinator.com/item?id=29363054
        
         | gunapologist99 wrote:
         | The debate over type conversion seems somewhat like arguing
         | language features (such as weak typing vs strong), which has
         | been a spirited debate since forever.
         | 
         | The fact that this is debatable at all means that there are
         | arguments and counter-arguments that serve both sides well,
         | which further implies that there are use cases where some pros
         | outweigh the cons on each side.
         | 
         | (Of course, this entirely neglects the point that it's usually
         | fun and even educational to have a technical argument.)
        
           | mjw1007 wrote:
           | I don't have any great objection to a database that lets you
           | store a value of any type in any column.
           | 
           | And if they want to say '3' + 5 = 8, I could imagine why that
           | might be fine.
           | 
           | But I don't think that's a good reason to have behaviour like
           | the above.
           | 
           | I think it's more likely that they wouldn't choose to do
           | those things again today: I think this is what fell out of
           | the initial TCL-based implementation and they feel they're
           | stuck with keeping it backwards-compatible.
        
       | zinekeller wrote:
       | I have a question that wasn't answered on the blog, and might be
       | silly for me to ask. Is SQLite running in WAL mode or not? I
        | usually found that WAL mode allows better performance
       | even in read-heavy workloads as it doesn't need to exclusively
       | lock the database each time.
       | 
       | Otherwise, the post is a nice reminder for SQLite users that
       | foresee a database switch to use the newly-introduced strict
       | mode.
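For reference, WAL mode is a one-line pragma, and unlike most pragmas it is persisted in the database file itself. A quick sketch with Python's sqlite3 (temp path invented for the example; a real file is needed because `:memory:` databases report a journal_mode of "memory"):

```python
import os
import sqlite3
import tempfile

# WAL is a property of the database file, so use a real file on disk.
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)

# The pragma returns the journal mode now in effect.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # 'wal'
```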
        
         | miguelgrinberg wrote:
         | SQLite isn't running in WAL mode. This application isn't just
          | read-heavy, it is read-only for users, so I wouldn't think
         | this will make a difference. The writes occur while everybody
         | is sleeping, from a cron job that runs once a day to upload the
          | traffic for the previous 24 hours.
        
         | dgb23 wrote:
         | Yes, WAL mode can have a drastic impact.
         | 
          | Also, this is apparently benchmarking Postgres and SQLite
          | _with Python on top_. Some of the overhead here might not have
          | much to do with either SQLite or Postgres at all. Just because
          | one uses an ORM doesn't mean the same code is running, and
          | especially considering how sluggish Python is, I'm not
          | convinced by any of this either way, beyond it being an
          | improvement for the very specific use case presented. But we
          | don't know _why_ that is.
        
           | miguelgrinberg wrote:
            | I explain this in the post. The goal is to optimize the
            | application for users. I really don't care if Postgres is
            | faster than SQLite or vice versa, I just want users to have
            | a snappier application. The experiment was to swap out the
            | database while leaving everything else the same to see if
            | that affects the overall user experience in a positive or
            | negative way.
            | 
            | Also note that I'm not saying Postgres is faster than SQLite
            | in absolute terms. That happened to be the case for this
            | particular application and how it was deployed. If you
            | perform a similar test with a different app or stack you
            | will likely get different results. A main point I'm trying
            | to make in the article is that people should do their own
            | benchmarking instead of making decisions based on stuff they
            | read online.
        
       | jeffdn wrote:
       | > The solution was to grow the slug column to 512 characters and
       | retry.
       | 
       | No, the solution is to always use `text` as your column type, and
       | use a check constraint if you need to enforce a limit. It's much
       | easier and safer to alter a check constraint than it is to change
       | a column type live in production. The `text` type and
       | `varchar(x)` are identical under the hood, `text` takes up no
       | more space on disk.
        
         | masklinn wrote:
         | Note: that's true for postgres and sqlite (which ignores the
         | length limit entirely as they discovered anyway), not
         | necessarily for other database systems.
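
The SQLite behavior mentioned here is easy to demonstrate: a declared VARCHAR(n) gets plain TEXT affinity and the length is never enforced (table and column names below are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (slug VARCHAR(5))")
# SQLite ignores the declared length entirely, so this oversized
# insert succeeds silently instead of raising an error.
conn.execute("INSERT INTO posts (slug) VALUES (?)",
             ("much-longer-than-five",))
print(conn.execute("SELECT length(slug) FROM posts").fetchone()[0])  # 21
```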
        
           | coolsunglasses wrote:
            | How convenient, then, that he's comparing PostgreSQL and
            | SQLite!
        
         | SigmundA wrote:
          | I thought that as of PG 9.2 expanding a varchar column is a
          | metadata-only operation and therefore low overhead.
          | 
          | I know in SQL Server there are two issues with using
          | varchar(max) for everything (increasing a column's size is
          | metadata-only there too). First, indexes have a limit of 900
          | bytes and will fail at runtime if you index a column with no
          | max length and insert a value larger than 900 bytes. PG seems
          | to have this issue as well, but the limit is 2712 bytes.
          | 
          | Second, the query planner uses the declared size to determine
          | how much memory to pre-allocate for the query; with an
          | unlimited-length field it assumes something like 4096 bytes
          | and wastes working memory if your values are not actually that
          | size. Not sure if PG has the second issue. Having a max value
          | defined is valuable information to a database engine, along
          | with other type information such as UUIDs only taking 16 bytes
          | instead of 36.
        
           | masklinn wrote:
           | > PG seems to have this issue as well but the limit is 2712
           | bytes.
           | 
            | Note that this limit only applies to btree indexes. Hash
            | indexes are not so limited.
            | 
            | Though at these sizes it seems unlikely such simple indexes
            | are of much use. I guess the range-matching abilities of
            | btrees could be, but that seems unlikely; a 2700-byte
            | path-like datum is a lot.
        
         | luhn wrote:
          | For Postgres, there's a strong case to be made that VARCHAR(n)
          | is preferable to TEXT+check constraint. As you note, they're
          | stored the same on disk. But for a varchar, you can increase
          | the length with just this:
          | 
          |     ALTER TABLE a ALTER COLUMN b TYPE VARCHAR(n);
          | 
          | This only requires ACCESS EXCLUSIVE for a fraction of a second
          | while the metadata is updated.
          | 
          | Whereas updating a CHECK constraint will require a full scan
          | of the table. If you do it naively, this locks the table for
          | the entire scan. If you want to avoid that, it's three steps:
          | 
          |     ALTER TABLE a
          |         DROP CONSTRAINT b_c,
          |         ADD CONSTRAINT b_c CHECK (length(b) < n) NOT VALID;
          |     COMMIT;
          |     ALTER TABLE a VALIDATE CONSTRAINT b_c;
          | 
          | So as long as you're only increasing the length (which in my
          | own experience is a safe assumption), VARCHAR is much easier
          | to work with.
        
           | singron wrote:
            | Btw this procedure doesn't work for partitioned tables. For
            | some reason, it locks the table and does a table scan
            | anyway.
            | 
            | The workaround we came up with was to add check constraints
            | for instant partition attachment, lock the table, detach all
            | the partitions, update the type on each partition, update
            | the parent table type, then reattach all the partitions.
            | 
            | In our case, we didn't already have those instant-attach
            | check constraints, so we incurred table scans anyway to
            | validate them, but if you were to make multiple updates, you
            | would only have to do that once.
        
           | jeffdn wrote:
           | Yep, that's fair enough. In my experience, however, in the
           | vast, vast majority of cases, folks either don't actually
           | care about the length, or they should be using business logic
           | in code to do the validation, rather than waiting for the
           | database to yell at them, invalidate their transaction, etc.
        
             | layer8 wrote:
             | The benefit is that you can treat the database schema as
             | the single source of truth as to what data is valid.
             | Otherwise what tends to happen is that different code
             | components will have different ideas of what exactly is
             | valid. This is also important because databases tend to be
             | longer-lived than the code that accesses them.
        
             | HideousKojima wrote:
             | >they should be using business logic in code to do the
             | validation
             | 
             | I trust myself to do this, I don't trust my coworkers and
             | future devs writing applications for the same database to
             | follow the same constraints in their code though.
             | Constraints at the DB level keep your data clean(er) even
             | if some other devs tries to shit it up.
        
             | luhn wrote:
              | It's good practice to have a length constraint on all text
              | fields, just to put some cap on field sizes. You don't
              | want somebody uploading 100MB of garbage into the
              | `first_name` field -- a 1000-char max is more than enough.
             | 
             | Constraints generally should be enforced both application-
             | and database-side wherever possible. Application
             | constraints can be more robust and have better UX, but only
             | database constraints can offer you guarantees about what is
             | actually in your database.
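
A cap like that is a one-line CHECK constraint. A sketch using SQLite from Python (the table, column, and limit are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Enforce the 1000-character cap in the schema itself, so the
# guarantee holds no matter which application writes the row.
conn.execute(
    "CREATE TABLE users (first_name TEXT "
    "CHECK (length(first_name) <= 1000))"
)
conn.execute("INSERT INTO users VALUES (?)", ("Ada",))  # accepted
try:
    conn.execute("INSERT INTO users VALUES (?)", ("x" * 100_000,))
except sqlite3.IntegrityError:
    print("rejected")  # the 100 KB value never reaches the table
```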
        
             | msbarnett wrote:
             | If I had a dollar for every time I saw someone doing
             | application-layer-only validation of data and end up being
             | surprised that some data that wasn't supposed to be valid
             | ended up in the database only to later blow up in their
             | face, I'd be a fairly well-off man.
             | 
             | In practice, declaratively creating data constraints at
             | schema-creation time has a much much higher success rate
             | than trying to imperatively enforce data constraints at
             | write time, in my experience. The missed cases and bugs
             | always come back to haunt you.
        
               | jeffdn wrote:
               | I'm not talking about all validation, I'm talking about a
               | length constraint on a string (that isn't something like
               | a US state, which should be a two-character field) --
               | already a pretty uncommon thing. At work, our codebase
               | has 975 instances of a column storing a text value, and
               | the length is material only in a very small handful of
               | those.
        
       | ehutch79 wrote:
       | Is it just me or was testing done with only four concurrent
       | clients? In a mostly read only environment?
       | 
        | Wouldn't moving to a static site generator have been a better
        | direction in that case?
       | 
       | If you're maxing out at four people reading your blog, I can't
       | imagine more infrastructure is better.
        
         | tpetry wrote:
          | They're not benchmarking the blog, they're benchmarking an
          | analytics application that aggregates statistics about the
          | blog.
        
       | Thaxll wrote:
        | I mean, SQLite lives on a single node. What exactly do you do
        | when you have a failure, when more than one process needs to
        | access the DB, when you need RBAC, etc.?
        
         | rco8786 wrote:
         | > what exactly do you do when you have failure
         | 
         | Same thing you have for failure when your pg instance is
         | running on a single node?
        
         | simonw wrote:
         | You tell the users of your internal statistics dashboard to
         | wait a few minutes/hours while you fix it.
        
         | warmwaffles wrote:
         | > what exactly do you do when you have failure
         | 
         | Depends on the failure. What do you have in mind?
         | 
          | > when more than one process needs to access the DB
         | 
         | Easy, you use `journal_mode` set to `WAL` when you connect to
         | the sqlite database.
         | 
         | > when you need RBAC
         | 
         | If you need RBAC, you aren't going to be using sqlite. In
         | theory you could have your application handle the RBAC part
         | since sqlite is embedded in your application.
         | 
         | Another solution is to leverage unix users and groups and set
         | permissions appropriately for the database file and the
         | application running.
        
           | ju-st wrote:
            | >> when more than one process needs to access the DB
           | 
           | > Easy, you use `journal_mode` set to `WAL` when you connect
           | to the sqlite database.
           | 
            | and set the isolation level and check_same_thread and PRAGMA
            | synchronous, and then put all queries in retry loops because
            | the DB may be locked: https://sqlite.org/wal.html#sometimes_queries_return_sqlite_...
            | 
            | I wasted so much time trying to get multiple sqlite queries
            | running at the same time.
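
The retry loops described here can be sketched roughly as follows (an illustration, not the poster's code; SQLITE_BUSY surfaces in Python's sqlite3 as an OperationalError whose message mentions the lock):

```python
import sqlite3
import time

def execute_with_retry(conn, sql, params=(), retries=5, delay=0.1):
    """Retry a statement while another connection holds the write lock."""
    for attempt in range(retries):
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as exc:
            # SQLITE_BUSY shows up as "database is locked"; re-raise
            # anything else, and give up after the final attempt.
            if "locked" not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(delay * (attempt + 1))  # simple linear backoff

conn = sqlite3.connect(":memory:")
execute_with_retry(conn, "CREATE TABLE t (x INTEGER)")
execute_with_retry(conn, "INSERT INTO t VALUES (?)", (1,))
```

Alternatively, `PRAGMA busy_timeout = 5000` makes SQLite itself wait up to five seconds for the lock before reporting SQLITE_BUSY, which removes most of the need for application-level loops.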
        
             | warmwaffles wrote:
             | > I wasted so much time trying to get multiple sqlite
             | queries running at the same time
             | 
             | You were trying to write at the same time right?
        
         | neverartful wrote:
          | It has everything to do with the nature of the application,
          | architecture, and business, and with the business cost of such
          | failures, doesn't it?
         | 
         | If the discussion is around failures for the use case of the
         | article (dashboard for blogs which is read-only), my best guess
         | at 'high availability' would be just to have a separate SQLite
         | DB file on multiple app server nodes and call it done. The
         | author mentions that the DB is updated once per day (probably
         | during the night) by a cron job. Just as easy to then scp a few
         | copies to other app server nodes.
         | 
         | If the discussion is around a 24x7 read-write workload with
         | high business costs for performance, availability, and
         | scalability then it's an entirely different problem.
        
       | xsc wrote:
       | It seems a solid first step would be to use EXPLAIN to understand
       | query planning/scanning issues.
       | 
       | Assuming 6.5M rows/day*1.5yr=~3.5B rows, how large is the dataset
       | in both SQLite and pg? With indexes, is it fitting in RAM?
       | 
        | The Linode 2GB plan uses a shared vCPU, so results will vary
        | depending on which other services share the host with yours.
        
         | miguelgrinberg wrote:
         | This isn't the topic of the article so I haven't included it,
         | but I have taken several "solid steps" to optimize this
         | application throughout its year and a half of being in
         | production.
         | 
         | This article is about one of them only, how I designed a test
         | system to evaluate the two databases head to head.
        
       ___________________________________________________________________
       (page generated 2022-06-28 23:01 UTC)