Percona Database Performance Blog

PostgreSQL on ARM-based AWS EC2 Instances: Is It Any Good?
22 Jan 2021, by Jobin Augustine and Sergey Kuzmichev

The expected growth of ARM processors in data centers has been a hot topic for quite some time, and we were curious to see how they perform with PostgreSQL. The general availability of ARM-based servers for testing and evaluation was a major obstacle. The icebreaker was when AWS announced its ARM-based processor offering in its cloud in 2018. But we didn't see much excitement immediately, as many considered it more of an "experimental" offering. We were also cautious about recommending it for critical use and never put much effort into evaluating it. But when the second-generation, Graviton2-based instances were announced in May 2020, we decided to consider them seriously. We took an independent look at the price/performance of the new instances from the standpoint of running PostgreSQL.

Important: While it's tempting to call this a comparison of PostgreSQL on x86 vs. ARM, that would not be correct. These tests compare PostgreSQL on two virtual cloud instances, which involves many more moving parts than just the CPU. We're primarily focusing on the price-performance of two particular AWS EC2 instances based on two different architectures.

Test Setup

For this test, we picked two similar instances. One is the older m5d.8xlarge, and the other is the new Graviton2-based m6gd.8xlarge. Both instances come with local "ephemeral" storage, which we used for these tests. Using very fast local drives should help expose differences in other parts of the system and avoid testing cloud storage. The instances are not perfectly identical, as you'll see below, but they are close enough to be considered the same grade. We used the Ubuntu 20.04 AMI and PostgreSQL 13.1 from the PGDG repository. We performed tests with small (in-memory) and large (IO-bound) database sizes.
Instances

Specifications and on-demand pricing of the instances, as per the AWS pricing information for Linux in the Northern Virginia region. At the currently listed prices, m6gd.8xlarge is 20% cheaper (equivalently, m5d.8xlarge costs 25% more).

Graviton2 (ARM) Instance

```
Instance     : m6gd.8xlarge
Virtual CPUs : 32
RAM          : 128 GiB
Storage      : 1 x 1900 GB NVMe SSD (1.9 TB)
Price        : $1.4464 per hour
```

Regular (x86) Instance

```
Instance     : m5d.8xlarge
Virtual CPUs : 32
RAM          : 128 GiB
Storage      : 2 x 600 GB NVMe SSD (1.2 TB)
Price        : $1.808 per hour
```

OS and PostgreSQL setup

We selected Ubuntu 20.04.1 LTS AMIs for the instances and didn't change anything on the OS side. On the m5d.8xlarge instance, the two local NVMe drives were combined into a single RAID 0 device. PostgreSQL was installed using the .deb packages available from the PGDG repository. The PostgreSQL version string confirms the architecture:

```
postgres=# select version();
                                                                version
------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on aarch64-unknown-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
(1 row)
```

Here, aarch64 stands for the 64-bit ARM architecture.

The following PostgreSQL configuration was used for testing.
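Whether the price gap reads as 20% or 25% depends only on which instance is taken as the baseline. A minimal bash/awk sketch, using nothing but the two hourly prices listed above, makes both figures explicit:

```shell
#!/usr/bin/env bash
# Relative difference between two hourly on-demand prices.
# $1 = price of the cheaper instance, $2 = price of the more expensive one.
price_diff() {
  awk -v cheap="$1" -v dear="$2" 'BEGIN {
    # Same absolute gap, two baselines: savings relative to the expensive
    # instance, and premium relative to the cheap one.
    printf "%.1f%% cheaper, %.1f%% more expensive\n",
      (dear - cheap) / dear * 100, (dear - cheap) / cheap * 100
  }'
}

# m6gd.8xlarge vs m5d.8xlarge, us-east-1 Linux on-demand prices
price_diff 1.4464 1.808
```

This prints `20.0% cheaper, 25.0% more expensive`: the Graviton2 instance saves 20% of the x86 price, and the x86 instance costs 25% more than the Graviton2 one.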
```
max_connections = '200'
shared_buffers = '32GB'
checkpoint_timeout = '1h'
max_wal_size = '96GB'
checkpoint_completion_target = '0.9'
archive_mode = 'on'
archive_command = '/bin/true'
random_page_cost = '1.0'
effective_cache_size = '80GB'
maintenance_work_mem = '2GB'
autovacuum_vacuum_scale_factor = '0.4'
bgwriter_lru_maxpages = '1000'
bgwriter_lru_multiplier = '10.0'
wal_compression = 'ON'
log_checkpoints = 'ON'
log_autovacuum_min_duration = '0'
```

pgbench Tests

First, a preliminary round of tests was done using pgbench, the micro-benchmarking tool that ships with PostgreSQL. It lets us test different combinations of client and job counts, for example:

```shell
pgbench -c 16 -j 16 -T 600 -r
```

This runs 16 client connections, driven by 16 pgbench worker threads (jobs), for 600 seconds (-T), and reports per-statement latencies (-r).

Read-Write Without Checksum

The default pgbench workload is a TPC-B-like read-write load. We ran it against a PostgreSQL instance without data checksums enabled.

[x86_arm_ReadWrite_WithoutChecksum]

We saw a 19% performance gain on ARM.

+-----------+-------+
| x86 (tps) | 28878 |
|-----------+-------|
| ARM (tps) | 34409 |
+-----------+-------+

Read-Write With Checksum

We were curious whether enabling PostgreSQL-level data checksums has a different performance impact on the two architectures. From PostgreSQL 12 onwards, checksums can be enabled on an existing, cleanly shut down cluster using the pg_checksums utility:

```shell
pg_checksums -e -D "$PGDATA"
```

[x86_arm_ReadWrite_WithChecksum]

+-----------+-------+
| x86 (tps) | 29402 |
|-----------+-------|
| ARM (tps) | 34701 |
+-----------+-------+

To our surprise, the results were marginally better! Since the difference is under 2%, we consider it noise. At the very least, we feel it is fair to conclude that enabling checksums causes no noticeable performance degradation on these modern processors.
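Because pg_checksums refuses to run against a live cluster, enabling checksums on an existing instance amounts to a stop, enable, start sequence. The bash sketch below illustrates it under assumptions that are not from the original setup: pg_ctl, pg_checksums and psql are on PATH, PGDATA contains its own postgresql.conf (an initdb-style layout rather than Debian's /etc split), and the `./pgdata` default and the `run`/`enable_checksums` helpers are hypothetical. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: enable data checksums on an existing PostgreSQL 12+ cluster.
# Assumes pg_ctl, pg_checksums and psql on PATH and an initdb-style PGDATA.
# DRY_RUN=1 (the default) only prints the commands instead of running them.
set -e

PGDATA="${PGDATA:-./pgdata}"   # hypothetical default, override as needed
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

enable_checksums() {
  run pg_ctl -D "$PGDATA" stop -m fast   # pg_checksums needs a cleanly stopped cluster
  run pg_checksums -e -D "$PGDATA"       # rewrites all data files; time grows with size
  run pg_ctl -D "$PGDATA" start -w       # -w: wait until the server is up
  run psql -Atc "SHOW data_checksums"    # should print "on" afterwards
}

enable_checksums
```

Run with `DRY_RUN=0` to actually execute the steps. The dry-run default is a design choice for the sketch: rewriting every data file of a large cluster takes time, so it is worth reviewing the exact commands first.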
Read-Only Without Checksum

Read-only loads are expected to be CPU-bound. Since we chose a database size that fits fully into memory, IO-related overheads are eliminated.

[x86_arm_ReadOnly_WithoutChecksum]

+-----------+-----------+
| x86 (tps) | 221436.05 |
|-----------+-----------|
| ARM (tps) | 288867.44 |
+-----------+-----------+

The ARM instance delivered 30% more tps than the x86 instance.

Read-Only With Checksum

We wanted to check whether enabling checksums changes tps when the load becomes purely CPU-bound.

[x86_arm_ReadOnly_WithChecksum]

+-----------+-------------+
| x86 (tps) | 221436.0531 |
|-----------+-------------|
| ARM (tps) | 288867.4406 |
+-----------+-------------+

The results were very close to the previous ones, again with a 30% gain. In the pgbench tests, we observed that the more CPU-bound the load becomes, the larger the performance difference. We couldn't observe any performance degradation with checksums.

Note on checksums

PostgreSQL calculates a page's checksum when the page is written out and verifies it when the page is read into the buffer pool. In addition, hint bits are always WAL-logged when checksums are enabled, increasing WAL IO pressure. To correctly validate the overall checksum overhead, we would need longer and larger tests, similar to those we did with sysbench-tpcc.

Testing With sysbench-tpcc

We decided to perform more detailed tests using sysbench-tpcc. We were mainly interested in the case where the database fits into memory. On a side note, while PostgreSQL on the ARM server showed no issues, sysbench was much more finicky there than on the x86 instance.

Each round of testing consisted of a few steps:

1. Restore the data directory of the necessary scale (10/200).
2. Run a 10-minute warmup test with the same parameters as the main test.
3. Checkpoint on the PostgreSQL side.
4. Run the actual test.

In-memory, 16 threads

[In-memory, 16 threads]

With this moderate load, the ARM instance shows around 15.5% better performance than the x86 instance.
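The four-step test round listed above can be sketched as a bash script. The exact commands the authors used are not given here, so everything concrete is an assumption: pristine data directories kept under a hypothetical `$BACKUP_ROOT/scale-<N>` layout, an initdb-style PGDATA, sysbench-tpcc's `tpcc.lua` in the current directory with default connection settings, and hypothetical helper names (`run`, `tpcc_run`, `test_round`). It prints the commands unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Sketch of one sysbench-tpcc round: restore, warm up, checkpoint, measure.
# All paths and the tpcc.lua invocation are assumptions, not the original setup.
set -e

PGDATA="${PGDATA:-./pgdata}"              # hypothetical default
BACKUP_ROOT="${BACKUP_ROOT:-./backups}"   # hypothetical backup layout
DRY_RUN="${DRY_RUN:-1}"                   # 1 = print commands only
DURATION=600                              # 10 minutes for warmup and test

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

tpcc_run() {   # $1 = scale (warehouses), $2 = threads
  run ./tpcc.lua --db-driver=pgsql --scale="$1" --threads="$2" \
    --time="$DURATION" --report-interval=1 run
}

test_round() {   # $1 = scale (10 or 200), $2 = threads
  local scale="$1" threads="$2"
  # 1. Restore a pristine data directory for this scale.
  run pg_ctl -D "$PGDATA" stop -m fast || true   # server may already be down
  run rsync -a --delete "$BACKUP_ROOT/scale-$scale/" "$PGDATA/"
  run pg_ctl -D "$PGDATA" start -w
  # 2. Warm up with the same parameters as the measured run.
  tpcc_run "$scale" "$threads"
  # 3. Checkpoint so the measured run does not inherit warmup dirty pages.
  run psql -c "CHECKPOINT"
  # 4. The measured run.
  tpcc_run "$scale" "$threads"
}

test_round 10 16
```

Restoring a copy of the data directory, rather than reloading data, keeps every round starting from byte-identical files, which is the point of step 1.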
Here and below, the percentage difference is based on the mean tps value.

You might be wondering why there is a sudden drop in performance towards the end of the test. It is related to checkpointing with full_page_writes. Even though we used a Pareto distribution for the in-memory tests, a considerable number of pages are written to WAL in full after each checkpoint. In this case, the faster instance generated WAL sooner and so triggered a WAL-driven checkpoint earlier than its counterpart. These dips are present in all the tests performed.

In-memory, 32 threads

[In-memory, 32 threads]

When concurrency increased to 32, the performance difference shrank to nearly 8%.

In-memory, 64 threads

[In-memory, 64 threads]

Pushing the instances close to their saturation point (remember, both are 32-vCPU instances), we see the difference shrink further, to 4.5%.

In-memory, 128 threads

[In-memory, 128 threads]

When both instances are past their saturation point, the difference in performance becomes negligible, although it's still there at 1.4%. Additionally, we observed a 6-7% drop in throughput (tps) for ARM and a 4% drop for x86 when concurrency increased from 64 to 128 on these 32-vCPU machines.

Not everything we measured is favorable to the Graviton2-based instance. In the IO-bound tests (~200 GB dataset, 200 warehouses, uniform distribution), we saw less difference between the two instances, and at 64 and 128 threads, the regular m5d instance performed better. You can see this in the combined plots below.

[saturation_IO]

A possible reason for this, especially the significant meltdown at 128 threads for m6gd.8xlarge, is that it lacks the second drive that m5d.8xlarge has. There is currently no perfectly comparable pair of instances available, so we consider this a fair comparison; each instance type has its own advantage. More testing and profiling are needed to identify the cause correctly, as we expected the local drives to have a negligible effect on the tests.
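Since the checkpoint dips make any single sample misleading, the sysbench-tpcc comparisons use the mean tps over the whole run. A small awk sketch of that calculation follows. It assumes the per-interval tps values of both runs have already been extracted into two whitespace-separated columns, `x86_tps arm_tps`; that input format and the `mean_tps_gain` and `tps_pairs.txt` names are hypothetical, not sysbench's raw output.

```shell
#!/usr/bin/env bash
# Compare two benchmark runs by their mean tps.
# Assumed input: one line per reporting interval, "x86_tps arm_tps".
mean_tps_gain() {
  awk 'NF == 2 { x86 += $1; arm += $2; n++ }
       END {
         if (n == 0) { print "no samples" > "/dev/stderr"; exit 1 }
         mx = x86 / n; ma = arm / n
         # Relative gain of ARM over x86, computed from the means.
         printf "x86_mean=%.1f arm_mean=%.1f arm_gain=%.1f%%\n", mx, ma, (ma - mx) / mx * 100
       }'
}

# Usage (hypothetical file name): mean_tps_gain < tps_pairs.txt
```

Lines without exactly two fields are skipped, so a run that reported one interval more than the other does not skew the means.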
IO-bound tests on EBS could potentially be used to take the local drives out of the equation.

More details of the test setup, results of the tests, scripts used, and data generated during the testing are available from this GitHub repo.

Summary

There were few cases where the ARM instance was slower than the x86 instance in the tests we performed. The test results were consistent throughout the last couple of days of testing. While the ARM-based instance is 20 percent cheaper, it showed a 15-20% performance gain over the corresponding x86-based instance in most of the tests. So, in the tests we ran, the ARM-based instance delivered clearly better price-performance. We should expect more and more cloud providers to offer ARM-based instances in the future.

Please let us know if you would like to see any other types of benchmark tests.
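The price-performance claim can be made concrete as throughput per dollar of hourly on-demand cost. As a sketch, using the pgbench read-write results without checksums and the prices from the instance specifications (the `price_performance` helper is a hypothetical name):

```shell
#!/usr/bin/env bash
# Price-performance as tps per USD/hour of on-demand cost.
# $1/$2 = tps and USD per hour of the baseline instance,
# $3/$4 = tps and USD per hour of the instance being compared.
price_performance() {
  awk -v t1="$1" -v p1="$2" -v t2="$3" -v p2="$4" 'BEGIN {
    v1 = t1 / p1; v2 = t2 / p2   # tps per USD/hour for each instance
    printf "baseline=%.1f other=%.1f advantage=%.1f%%\n", v1, v2, (v2 - v1) / v1 * 100
  }'
}

# m5d.8xlarge (x86) as baseline vs m6gd.8xlarge (ARM),
# pgbench read-write without checksums.
price_performance 28878 1.808 34409 1.4464
```

For this workload, the output is `baseline=15972.3 other=23789.4 advantage=48.9%`: a 19% throughput gain at a 20% lower price gives roughly 49% more transactions per dollar. Other workloads, including the IO-bound runs, would give different figures.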