https://www.cybertec-postgresql.com/en/postgresql-vs-redis-vs-memcached-performance/
PostgreSQL vs Redis vs Memcached performance

Posted on 2021-08-11 by Kaarel Moppel
Tags: nosql, performance, postgresql, redis

When I recently stumbled on an article comparing some main properties, and also the read-write performance, of two very popular pieces of caching software often used to speed up applications heavy on database queries, I immediately felt an itch: how would PostgreSQL vs Redis vs Memcached actually look on the performance side? That is, if one simply skipped the cache and hit the database directly. Especially since, at first glance, I wasn't too impressed with the numbers presented for Redis and Memcached. That could also be because the actual test data, extracted from a linked paper, was from 2016 - surely ages in "internet years". Either way, I was curious to get some kind of comparison point, so I quickly threw together a small benchmarking script in Python and let my workstation get to work.

Why Python, you might ask, if the original test used Java? Well, Java is just one of my least favorite languages for database projects, and I have the impression that Python is very popular both for the web and for ad-hoc database scripting, so it seemed a good fit here - I might of course be wrong on that.

Test setup

Some characteristics of my test setup:

* AMD Ryzen 3600 CPU set to performance mode. My test script is synchronous in nature, i.e. basically using a single CPU core, so single-core speed probably matters a lot.
* Ubuntu 20.04 desktop as OS, swap disabled.
* PostgreSQL v13.3, i.e. the latest at the time of writing.
* Default PostgreSQL configuration except "shared_buffers=512MB", "track_io_timing=on", "shared_preload_libraries='pg_stat_statements'".
* A single NVMe SSD for both the system and the database.
* Reading/writing to Postgres was done using prepared statements to get the best query performance, avoiding parsing on subsequent calls.
* The dataset fit fully into the PostgreSQL managed cache (shared buffers).
* Asynchronous commit mode was used for writing data, meaning a slight loss of the most recent data is possible in case of a server crash/reboot, as a trade-off for better write performance (especially in small transactions). This is consistent with how both Redis and Memcached handle their writes by default. With Redis, though, one can also enable AOF persistence to get PostgreSQL's default behaviour.
* Both the reading and writing tests go through all the inserted randomly generated floating-point data keys from 1 to $rows one by one. The original test didn't mention the exact access pattern, though.
* The key-value table was fetched into the Postgres cache before the read test so that there would be no cold-cache effect as with Redis/Memcached.
* I did not measure the memory usage of Postgres, as it is expected to be very constant for this use case once all data has been inserted and cached.
* Execution time is recorded on the application side with calls to the system clock, to match the original test. Note that I decided to also measure execution time from the server side, to get a better understanding of the amount of "waste" - after all, Python is not really optimal for performance testing, for many reasons I won't delve into here. And jumping ahead a bit: the results were pretty horrifying indeed, and one should rather not use Python for benchmarking databases - a lot of CPU time just disappeared somewhere!
* The full test script is available here. It should take around 10-15 min to run.
Basically something like this:

CREATE UNLOGGED TABLE kv_test(key text, value int);
CREATE INDEX ON kv_test (key);

-- pseudo-code from the Python script
for $ROWS in [1000, 10000, 100000, 1000000]:
    truncate kv_test
    generate $ROWS random values
    for $i := 1 .. $ROWS:
        insert into kv_test values ($rand[i], $rand[i])
    vacuum analyze the table
    for $i := 1 .. $ROWS:
        select * from kv_test where key = $rand[i]

PostgreSQL vs Redis vs Memcached: Write Operation

Represented in a similar way to the original numbers. The calculated time to write all key-value pairs is in milliseconds.

                         NUMBER OF RECORDS
Database               1,000   10,000   100,000   1,000,000
Redis (v3.0.7)            34      214     1,666      14,638
Memcached (v1.4.14)       23      100       276       2,813
PostgreSQL (v13.3)      29.6      304     2,888      31,230

PostgreSQL vs Redis vs Memcached: Read Operation

The calculated time to read key-value pairs (ms).

                         NUMBER OF RECORDS
Database               1,000   10,000   100,000   1,000,000
Redis (v3.0.7)             8        6         8           8
Memcached (v1.4.14)        9       14        14          30
PostgreSQL (v13.3)     0.026    0.028     0.027       0.029

Conclusion

In short - the numbers looked surprisingly/suspiciously good for Postgres on the reading test! I can't imagine how the original test managed to get such high single-digit-millisecond results for random key reads across the whole dataset; in my test, I only saw 1 ms+ worst cases for the biggest rowcount. This data can, by the way, also be looked up in the "pg_stat_statements" snapshots table called "results". So sadly I still can't get too ecstatic, as a lot of information was missing on the details of exactly how the original tests were performed - it might have been a bit of an "apples to oranges" situation in the end, I'm afraid. The average key readout times for Redis/Memcached seemed just way too slow in comparison to Postgres. I suspect they used a remote machine for the cache, although the paper didn't mention it and talked about a single Core i7 node.
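For reference, the pseudo-code above can be fleshed out into a minimal runnable sketch. This is my own illustration, not the original script: it assumes psycopg2 as the driver, a locally reachable cluster, and uses smaller row counts; the gen_keys/bench_* helper names and the connection string are made up for the example.

```python
import random
import time

def gen_keys(n, seed=42):
    """Reproducible random float keys, stored as text like the kv_test.key column."""
    rnd = random.Random(seed)
    return [str(rnd.random()) for _ in range(n)]

def bench_write(cur, keys):
    """Insert all keys via a server-side prepared statement; returns seconds."""
    cur.execute("PREPARE ins AS INSERT INTO kv_test VALUES ($1, $2)")
    t0 = time.monotonic()
    for k in keys:
        cur.execute("EXECUTE ins (%s, %s)", (k, 1))
    return time.monotonic() - t0

def bench_read(cur, keys):
    """Read every key back one by one via a prepared statement; returns seconds."""
    cur.execute("PREPARE sel AS SELECT value FROM kv_test WHERE key = $1")
    t0 = time.monotonic()
    for k in keys:
        cur.execute("EXECUTE sel (%s)", (k,))
        cur.fetchone()
    return time.monotonic() - t0

def run(dsn="dbname=postgres"):
    import psycopg2  # assumed driver, not named in the original post
    conn = psycopg2.connect(dsn)
    conn.autocommit = True  # also needed so VACUUM can run below
    cur = conn.cursor()
    cur.execute("SET synchronous_commit = off")  # the async write mode from the setup
    cur.execute("CREATE UNLOGGED TABLE IF NOT EXISTS kv_test(key text, value int)")
    cur.execute("CREATE INDEX IF NOT EXISTS kv_test_key_idx ON kv_test (key)")
    for rows in (1000, 10000):  # smaller counts than the article, for a quick demo
        cur.execute("TRUNCATE kv_test")
        cur.execute("DEALLOCATE ALL")  # so the PREPAREs don't clash between rounds
        keys = gen_keys(rows)
        w = bench_write(cur, keys)
        cur.execute("VACUUM ANALYZE kv_test")
        r = bench_read(cur, keys)
        print(f"{rows} rows: write {w*1000:.1f} ms, read {r*1000:.1f} ms")

# run()  # uncomment to execute - requires a running local PostgreSQL cluster
```

Note that this measures application-side wall-clock time only; the server-side view would come from pg_stat_statements, as described in the setup.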
But about the writing speed of key-value data - well: Postgres doesn't really compete at higher row counts. This was also more or less expected, though! Why? A full-blown relational database engine like Postgres goes to great lengths to ensure we cannot insert invalid data that violates some constraint, plus there's the WAL writing (minimal for unlogged tables, but still) and the on-disk-format overhead - internal columns, alignment, some index bloat on page splits, etc. All of that amplifies writes a lot! So basically, I think the results still aren't bad in the end: only ca. 2x slower than Redis for the 100k and 1M row counts. The main idea of caches is that they're only useful if we read from them much more than we write/update anyway!

But however the comparison to the other systems looks in absolute numbers, it was good to see that the stability of Postgres response times relative to the growing dataset was very good - even beating Memcached, which deteriorated 2x at 1M rows! This probably shows that the algorithms selected for PostgreSQL are mathematically sound and well implemented.

What can definitely be said: Postgres performance is at least good enough for a "caching use case" with a smaller number of rows. So with your next project it might be worth asking - do you really need another external component? Or maybe just:

1. add some more memory/CPU to the DB and hammer it harder - pretty safe to do for reads, actually;
2. use your HA replicas for load balancing, achieve better total resource utilization, and enjoy the benefit of not having to change the mental model when dealing with relational data on the "real persistence" level anyway.

Other thoughts/notes

* Once again, not to forget: the caching software versions used were by now 5 years old!
But at the same time, I also left a bit on the table even for Postgres - for example, one could probably get slightly better read performance using the relatively new (v11+) "covering" indexes feature (e.g. CREATE INDEX ON kv_test (key) INCLUDE (value), enabling index-only scans), or maybe even hash indexes.
* The thought-provoking article states it quite well: "A performance comparison between in-memory key-value data stores is more of an intellectual exercise than of any practical importance - unless you are deploying systems at such scale that this becomes interesting as a cost-saving measure. This is because such stores are IO bound and usually the network latency might play a bigger role in application perceived latency than the database latency." So in short: don't think too much about that milli- or microsecond cache response time in a local test. Real-life network latencies can and will mostly eat that away!
* Note that the difference between query execution times measured from the "app side" vs the "DB side" is huge - up to 10x in the worst cases with a small number of rows! But again, as the previous point states, for real-life database applications the choice of language usually doesn't matter performance-wise.
* The maximum tested rowcount of 1 million is actually too small in today's context, I think - even in the case of Postgres, with its relatively large disk format, the data together with the index is only about 128 MB. I suspect an average smartwatch holds more data nowadays.
* Note, though, that Redis is certainly much more than a simple volatile cache, as the linked article also mentions. In practice, however, it is primarily used as a key-value store... which Postgres can also be, being a general-purpose management system not particularly tuned for one specific workload.
* The aspect of cost - provisioning a capable and modern DB server with lots of memory usually costs much more than a plain "discardable", albeit optionally memory-optimized, compute node.
So using a cache might still be a perfectly good idea.
* Dedicated caches will always perform better if you need to hammer them really hard - as there is usually no permissions checking, no sessions, no snapshots, no complex locking scenarios, etc.
* Note that in some cases one can also get totally transparent PostgreSQL caching! It's sadly not a built-in feature of Postgres, but, for example, one of the more popular connection pooling packages, pgpool2, provides it as an extra feature - so it might be worth checking out.

You can find more of my posts. Cheers!

Kaarel Moppel

I've been interested in databases for about a dozen years, working the last 9 years exclusively with PostgreSQL. And still I'm constantly surprised by its powerful set of features and the fast pace of development by the globally friendly community. In my spare time I enjoy playing soccer and travelling.
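P.S. For the transparent pgpool2 (pgpool-II) query cache mentioned in the notes above, enabling it is mostly a matter of a few pgpool.conf settings. A hedged sketch - the parameter values here are purely illustrative, check the pgpool-II documentation for your version:

```
# pgpool.conf - in-memory query cache (pgpool-II)
memory_cache_enabled = on       # turn the transparent result cache on
memqcache_method = 'shmem'      # cache in shared memory ('memcached' is the alternative)
memqcache_total_size = 64MB     # total cache size, illustrative value
memqcache_expire = 60           # seconds before a cached SELECT result expires
```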