https://www.percona.com/blog/2021/01/22/postgresql-on-arm-based-aws-ec2-instances-is-it-any-good/

Percona Database Performance Blog

PostgreSQL on ARM-based AWS EC2 Instances: Is It Any Good?

22 Jan 2021
By Jobin Augustine and Sergey Kuzmichev
Categories: Benchmarks, Cloud, Insight for DBAs, PostgreSQL
Tags: AWS, cloud, insight for DBAs, PostgreSQL

The expected growth of ARM processors in data centers has been a hot
topic of discussion for quite some time, and we were curious to see
how they perform with PostgreSQL. The limited availability of
ARM-based servers for testing and evaluation was a major obstacle.
The ice was broken when AWS announced ARM-based processors in its
cloud in 2018. There was not much excitement at first, though, as
many considered the offering "experimental". We were also cautious
about recommending it for critical use and never put enough effort
into evaluating it. But when the second generation, the
Graviton2-based instances, was announced in May 2020, we wanted to
give it serious consideration. We decided to take an independent look
at the price/performance of the new instances from the standpoint of
running PostgreSQL.

Important: While it's tempting to call this a comparison of
PostgreSQL on x86 vs. ARM, that would not be correct. These tests
compare PostgreSQL on two virtual cloud instances, and that includes
many more moving parts than just a CPU. We're primarily focusing on
the price-performance of two particular AWS EC2 instances based on
two different architectures.

Test Setup

For this test, we picked two similar instances. One is the older
m5d.8xlarge, and the other is the new Graviton2-based m6gd.8xlarge.
Both instances come with local "ephemeral" storage that we'll be
using here. Using very fast local drives should help expose
differences in other parts of the system and avoid testing cloud
storage. The instances are not perfectly identical, as you'll see
below, but are close enough to be considered the same grade. We used
the Ubuntu 20.04 AMI and PostgreSQL 13.1 from the PGDG repo. We
performed tests with small (in-memory) and large (IO-bound) database
sizes.

Instances

Specifications and On-Demand pricing of the instances are as per the
AWS Pricing Information for Linux in the Northern Virginia region. At
the currently listed prices, m6gd.8xlarge is 20% cheaper than
m5d.8xlarge (put the other way, m5d.8xlarge costs 25% more).
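
The size of the price gap depends on which instance is taken as the
baseline. As a quick check, the sketch below computes it both ways
from the two hourly prices listed in this post. The constant and
function names are ours, chosen for illustration only.

```python
# On-Demand hourly prices quoted in this post (Linux, Northern Virginia).
M6GD_8XLARGE_USD_PER_HOUR = 1.4464  # Graviton2 (ARM) instance
M5D_8XLARGE_USD_PER_HOUR = 1.808    # x86 instance


def relative_difference(value: float, baseline: float) -> float:
    """Return (value - baseline) / baseline as a fraction of the baseline."""
    return (value - baseline) / baseline


# ARM price relative to the x86 price (negative means cheaper).
arm_vs_x86 = relative_difference(M6GD_8XLARGE_USD_PER_HOUR, M5D_8XLARGE_USD_PER_HOUR)
# x86 price relative to the ARM price (positive means more expensive).
x86_vs_arm = relative_difference(M5D_8XLARGE_USD_PER_HOUR, M6GD_8XLARGE_USD_PER_HOUR)

print(f"m6gd.8xlarge relative to m5d.8xlarge: {arm_vs_x86:+.1%}")
print(f"m5d.8xlarge relative to m6gd.8xlarge: {x86_vs_arm:+.1%}")
```

With these prices, the ARM instance comes out 20% cheaper than the
x86 one, which is the same gap as the x86 instance costing 25% more.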

Graviton2 (ARM) Instance

  Instance     : m6gd.8xlarge
  Virtual CPUs : 32
  RAM          : 128 GiB
  Storage      : 1 x 1900 NVMe SSD (1.9 TiB)
  Price        : $1.4464 per Hour

Regular (x86) Instance

  Instance     : m5d.8xlarge
  Virtual CPUs : 32
  RAM          : 128 GiB
  Storage      : 2 x 600 NVMe SSD (1.2 TiB)
  Price        : $1.808 per Hour

OS and PostgreSQL setup

We selected Ubuntu 20.04.1 LTS AMIs for the instances and didn't
change anything on the OS side. On the m5d.8xlarge instance, two
local NVMe drives were unified in a single raid0 device. PostgreSQL
was installed using .deb packages available from the PGDG repository.

The PostgreSQL version string confirms the OS architecture:

  postgres=# select version();
                                version
  --------------------------------------------------------------------
   PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on aarch64-unknown-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
  (1 row)

** aarch64 stands for 64-bit ARM architecture

The following PostgreSQL configuration was used for testing.

  max_connections = '200'
  shared_buffers = '32GB'
  checkpoint_timeout = '1h'
  max_wal_size = '96GB'
  checkpoint_completion_target = '0.9'
  archive_mode = 'on'
  archive_command = '/bin/true'
  random_page_cost = '1.0'
  effective_cache_size = '80GB'
  maintenance_work_mem = '2GB'
  autovacuum_vacuum_scale_factor = '0.4'
  bgwriter_lru_maxpages = '1000'
  bgwriter_lru_multiplier = '10.0'
  wal_compression = 'ON'
  log_checkpoints = 'ON'
  log_autovacuum_min_duration = '0'

pgbench Tests

First, we ran a preliminary round of tests using pgbench, the
micro-benchmarking tool that ships with PostgreSQL. It lets us test
different combinations of client and job counts, for example:

  pgbench -c 16 -j 16 -T 600 -r

Here, 16 client connections are used, fed by 16 pgbench jobs (worker
threads).
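
To run several such combinations, the command lines can be generated
rather than typed by hand. The following is a minimal sketch, not the
script used for these tests: the helper name and the list of client
counts are hypothetical, and it only builds and prints the commands
(using the same -c, -j, -T and -r options as above) without running
them.

```python
import shlex
from typing import List


def pgbench_command(clients: int, jobs: int, duration_s: int = 600) -> List[str]:
    """Build a pgbench argument list.

    -c: number of client connections
    -j: number of pgbench worker threads (jobs)
    -T: test duration in seconds
    -r: report per-statement latencies at the end of the run
    """
    return [
        "pgbench",
        "-c", str(clients),
        "-j", str(jobs),
        "-T", str(duration_s),
        "-r",
    ]


# Hypothetical client counts; the post only shows the 16/16 combination.
CLIENT_COUNTS = [16, 32, 64]

for clients in CLIENT_COUNTS:
    # One pgbench job per client connection, as in the example above.
    command = pgbench_command(clients, jobs=clients)
    print(shlex.join(command))
```

To actually run a command, each argument list could be passed to
subprocess.run on a host where pgbench is installed and the database
has been initialized.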

Read-Write Without Checksum

The default load that pgbench creates is a TPC-B-like read-write
load. We ran it against a PostgreSQL instance that doesn't have
checksums enabled.
[x86_arm_ReadWrite_WithoutChecksum]
We could see a 19% performance gain on ARM.

+----------------+
|x86 (tps) |28878|
|----------+-----|
|ARM (tps) |34409|
+----------------+
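
For reference, the gain figure follows from the two tps values in the
table above. A small sketch of the arithmetic (the function name is
ours, for illustration only):

```python
def percent_gain(candidate_tps: float, baseline_tps: float) -> float:
    """Percentage by which candidate_tps exceeds baseline_tps."""
    return (candidate_tps - baseline_tps) / baseline_tps * 100.0


# Read-write, without checksums (values from the table above).
X86_TPS = 28878
ARM_TPS = 34409

gain = percent_gain(ARM_TPS, X86_TPS)
print(f"ARM over x86: {gain:.1f}%")
```

The same function can be applied to the other result tables in this
post to compare the two instances under each workload.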

Read-Write With Checksum

We were curious whether checksum calculation has any impact on
performance due to the architecture difference when PostgreSQL-level
checksums are enabled. From PostgreSQL 12 onwards, checksums can be
enabled on a cleanly shut down cluster using the pg_checksums utility
as follows:

  pg_checksums -e -D $PGDATA

[x86_arm_ReadWrite_WithChecksum]

+----------------+
|x86 (tps) |29402|
|----------+-----|
|ARM (tps) |34701|
+----------------+

To our surprise, the results were marginally better! Since the
difference is only around 1.7%, we consider it noise. At least we
feel it is OK to conclude that enabling checksums doesn't cause any
noticeable performance degradation on these modern processors.
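
The with/without-checksum comparison can be reproduced per instance
from the two read-write tables. This is a sketch of the arithmetic
only; the dictionary layout and names are ours. Both changes come out
below 2%, which is why we treat them as noise.

```python
# Read-write tps from the two tables: (without checksums, with checksums).
READ_WRITE_TPS = {
    "x86": (28878, 29402),
    "ARM": (34409, 34701),
}


def percent_change(before: float, after: float) -> float:
    """Percentage change from before to after (positive means faster)."""
    return (after - before) / before * 100.0


checksum_effect = {
    name: percent_change(without, with_checksum)
    for name, (without, with_checksum) in READ_WRITE_TPS.items()
}

for name, change in checksum_effect.items():
    print(f"{name}: {change:+.2f}% tps with checksums enabled")
```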

Read-Only Without Checksum

Read-only loads are expected to be CPU-centric. Since we selected a
database size that fully fits into memory, we could eliminate IO
related overheads.
[x86_arm_ReadOnly_WithoutChecksum]

+-------------------+
|x86 (tps)|221436.05|
|---------+---------|
|ARM (tps)|288867.44|
+-------------------+

The results showed a 30% gain in tps for the ARM instance over the
x86 instance.

Read-Only With Checksum

We wanted to check whether we could observe any tps change if we have
checksum enabled when the load becomes purely CPU centric.
[x86_arm_ReadOnly_WithChecksum]

+---------------------+
|x86 (tps)|221436.0531|
|---------+-----------|
|ARM (tps)|288867.4406|
+---------------------+

The results were very close to the previous one, with 30% gains.

In pgbench tests, we observed that as the load becomes CPU centric,
the difference in performance increases. We couldn't observe any
performance degradation with checksum.

Note on checksums

PostgreSQL calculates and writes a checksum for pages when they are
written out, and verifies it when they are read into the buffer pool.
In addition, hint bits are always logged when checksums are enabled,
increasing the WAL IO pressure. To correctly validate the overall
checksum overhead, we would need longer and larger tests, similar to
the ones we did with sysbench-tpcc.

Testing With sysbench-tpcc

We decided to perform more detailed tests using sysbench-tpcc. We
were mainly interested in the case where the database fits into
memory. On a side note, while PostgreSQL on the ARM server showed no
issues, sysbench was much more finicky there than on the x86 one.

Each round of testing consisted of a few steps:

 1. Restore the data directory of the necessary scale (10/200).
 2. Run a 10-minute warmup test with the same parameters as the large
    test.
 3. Checkpoint on the PG side.
 4. Run the actual test.

In-memory, 16 threads:

[Plot: In-memory, 16 threads]

With this moderate load, the ARM instance shows around 15.5% better
performance than the x86 instance. Here and below, the percentage
difference is based on the mean tps value.

You might be wondering why there is a sudden drop in performance
towards the end of the test. It is related to checkpointing with
full_page_writes. Even though we used a Pareto distribution for
in-memory testing, a considerable number of pages are written out
after each checkpoint. In this case, the faster instance triggered a
WAL-driven checkpoint earlier than its counterpart. These dips are
present across all the tests we performed.

In-memory, 32 threads:

[Plot: In-memory, 32 threads]

When concurrency increased to 32, the difference in performance
reduced to nearly 8%.

In-memory, 64 threads:

[Plot: In-memory, 64 threads]

Pushing instances close to their saturation point (remember, both are
32-cpu instances), we see the difference reducing further to 4.5%.

In-memory, 128 threads:

[Plot: In-memory, 128 threads]

When both instances are past their saturation point, the difference
in performance becomes negligible, although it's still there at 1.4%.
Additionally, we observed a 6-7% drop in throughput (tps) for ARM and
a 4% drop for x86 when concurrency increased from 64 to 128 on these
32-vCPU machines.

Not everything we measured is favorable to the Graviton2-based
instance. In the IO-bound tests (~200G dataset, 200 warehouses,
uniform distribution), we saw less difference between the two
instances, and at 64 and 128 threads, the regular m5d instance
performed better. You can see this on the combined plots below.

[saturation_IO]

A possible reason for this, especially the significant meltdown at
128 threads for m6gd.8xlarge, is that it lacks the second drive that
m5d.8xlarge has. There is currently no perfectly comparable pair of
instances available, so we consider this a fair comparison; each
instance type has an advantage. More testing and profiling are
necessary to correctly identify the cause, as we expected the local
drives to have a negligible effect on the tests. IO-bound testing
with EBS could be performed to try to remove the local drives from
the equation.

More details of the test setup, results of the tests, scripts used,
and data generated during the testing are available from this GitHub
repo.

Summary

There were not many cases where the ARM instance was slower than the
x86 instance in the tests we performed. The results were consistent
throughout the couple of days of testing. While the ARM-based
instance is 20% cheaper, it showed a 15-20% performance gain over the
corresponding x86-based instance in most of the tests. So the
ARM-based instance delivered clearly better price-performance in
these tests. We should expect more and more cloud providers to offer
ARM-based instances in the future. Please let us know if you wish to
see any other type of benchmark tests.
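
To make the price-performance point concrete, the sketch below
combines the hourly prices with the read-write pgbench results
(without checksums) from this post. "tps per dollar-hour" is an
illustrative metric chosen for this example and the class is
hypothetical; other workloads in this post would give different
ratios.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    usd_per_hour: float  # On-Demand price listed in this post
    tps: float           # read-write pgbench result, without checksums

    @property
    def tps_per_dollar_hour(self) -> float:
        """Throughput obtained for each dollar of hourly instance cost."""
        return self.tps / self.usd_per_hour


arm = Instance("m6gd.8xlarge", usd_per_hour=1.4464, tps=34409)
x86 = Instance("m5d.8xlarge", usd_per_hour=1.808, tps=28878)

# Fractional advantage of the ARM instance on this metric.
advantage = arm.tps_per_dollar_hour / x86.tps_per_dollar_hour - 1

for inst in (arm, x86):
    print(f"{inst.name}: {inst.tps_per_dollar_hour:.0f} tps per dollar-hour")
print(f"ARM advantage: {advantage:.1%}")
```

For this workload, the combination of higher throughput and lower
price puts the ARM instance a little under 50% ahead on this metric.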

Author

Jobin Augustine

Jobin Augustine is a PostgreSQL expert and Open Source advocate with
more than 19 years of working experience as a consultant, architect,
administrator, writer, and trainer in PostgreSQL, Oracle, and other
database technologies. He has always been an active participant in
the Open Source communities, and his main focus area is database
performance and optimization. He is a contributor to various Open
Source projects, an active blogger, and loves to code in C++ and
Python. Jobin holds a Masters in Computer Applications and joined
Percona in 2018 as a Senior Support Engineer. Prior to joining
Percona, he worked at OpenSCG for 2 years as an Architect and was
part of the core team of BigSQL, a complete PostgreSQL distribution
offering. Before OpenSCG, Jobin worked at Dell as a Database Senior
Advisor for 10 years and spent 5 years with TCS/CMC.

---------------------------------------------------------------------
Sergey Kuzmichev

Sergey is a support engineer at Percona. Interested in all things
databases, he's currently working mainly with MySQL and PostgreSQL.
He started his career as an Oracle DBA, later moving to a DevOps
engineer role supporting a Java-based trading platform running on
PostgreSQL. After being a jack of all trades for a while, he's now
focusing on what he enjoys most: open source databases, systems
performance, and reliability.

---------------------------------------------------------------------

Comment (1)

  * Yuriy Safris

    One note for comparison:
    m6gd.8xlarge Virtual CPUs : 32 - these are 32 physical cores
    m5d.8xlarge Virtual CPUs : 32 - these are 32 virtual threads or
    16 physical cores
    Thus, you are comparing 32 physical cores against 16.
    Considering that the competitors were selected on the basis of
    comparable value, the comparison can be considered quite correct.
    But it should be borne in mind that with an equal number of
    cores, the solution with Graviton2 will be much slower.

    January 22, 2021 at 2:51 pm

Copyright (c) 2006-2021 Percona LLC.