https://www.interdb.jp/pg/ * Introduction * Table of Contents + Introduction + 1. Database Cluster, Databases, and Tables o 1.1 Logical Structure of Database cluster o 1.2 Physical Structure of Database cluster o 1.3 Internal Layout of a Heap Table File o 1.4 The Methods of Writing and Reading Tuples + 2. Process and Memory Architecture o 2.1 Process Architecture o 2.2 Memory Architecture + 3. Query Processing o 3.1 Overview o 3.2 Cost Estimation in Single-Table Query o 3.3 Creating the Plan Tree of a Single-Table Query o 3.4 How the Executor Performs o 3.5 Join Operations o 3.6 Creating the Plan Tree of Multiple-Table Query + 4. Foreign Data Wrappers (FDW) and Parallel Query o 4.1 Foreign Data Wrappers (FDW) o 4.2 Parallel Query + 5. Concurrency Control o 5.1 Transaction ID o 5.2 Tuple Structure o 5.3 Inserting, Deleting, and Updating Tuples o 5.4 Commit Log (clog) o 5.5 Transaction Snapshot o 5.6 Visibility Check Rules o 5.7 Visibility Check o 5.8 Preventing Lost Updates o 5.9 Serializable Snapshot Isolation o 5.10 Required Maintenance Processes + 6. VACUUM Processing o 6.1 Outline of Concurrent VACUUM o 6.2 Visibility Map o 6.3 Freeze Processing o 6.4 Removing Unnecessary CLOG Files o 6.5 Autovacuum Daemon o 6.6 Full VACUUM + 7. Heap Only Tuple (HOT) and Index-Only Scans o 7.1 Heap Only Tuple (HOT) o 7.2 Index-Only Scans + 8. Buffer Manager o 8.1 Overview o 8.2 Buffer Manager Structure o 8.3 Buffer Manager Locks o 8.4 How the Buffer Manager Works o 8.5 Ring Buffer o 8.6 Flushing Dirty Pages + 9. Write Ahead Logging (WAL) o 9.1 Overview o 9.2 Transaction Log and WAL Segment Files o 9.3 Internal Layout of WAL Segment o 9.4 Internal Layout of XLOG Record o 9.5 Writing of XLOG Records o 9.6 WAL Writer Process o 9.7 Checkpoint Process in PostgreSQL o 9.8 Database Recovery in PostgreSQL o 9.9 Management of WAL Segment Files o 9.10 Continuous Archiving and Archive Logs + 10. Base Backup & Point-in-Time Recovery o 10.1 Base Backup o 10.2 How Point-in-Time Recovery Works o 10.3 timelineId and Timeline History File o 10.4 Point-in-Time Recovery with Timeline History File + 11. Streaming Replication o 11.1 Starting the Streaming Replication o 11.2 How to Work Streaming Replication o 11.3 Managing More Than One Standby Server o 11.4 Detecting Failures of Standby Servers * Copyright The Internals of PostgreSQL for database administrators and system developers Copyright Diego Zazzeo Introduction PostgreSQL is a well-designed open-source multi-purpose relational database system which is widely used throughout the world. It is one huge system with the integrated subsystems, each of which has a particular complex feature and works with each other cooperatively. Although understanding of the internal mechanism is crucial for both administration and integration using PostgreSQL, its hugeness and complexity prevent it. The main purposes of this document are to explain how each subsystem works, and to provide the whole picture of PostgreSQL. This document is based on the book I wrote in Japanese in 2012, and covers version 14 and earlier. The Chinese version of this document was published in June 2019. Contents * Chapter 1. Database Cluster, Databases and Tables * Chapter 2. Process and Memory Architecture * Chapter 3. Query Processing * Chapter 4. Foreign Data Wrappers (FDW) and Parallel Query * Chapter 5. Concurrency Control * Chapter 6. VACUUM Processing * Chapter 7. Heap Only Tuple (HOT) and Index-Only Scans * Chapter 8. Buffer Manager * Chapter 9. Write Ahead Logging (WAL) * Chapter 10. Base Backup and Point-In-Time Recovery (PITR) * Chapter 11. Streaming Replication Guide map Change History Click here to show the change history since 3rd June, 2018. +-------------------------------------------------------------------+ | Date | Description | |----------+--------------------------------------------------------| |2022-01-02|Added "WAL, Backup, and Replication" in Section 9.1.3. | |----------+--------------------------------------------------------| | |Changed the description in the preface of Chapter 5. | |2021-11-12| | | | * Concurrency Control is a mechanism that maintains | | | [S:consistency:S] atomicity and isolation,... | |----------+--------------------------------------------------------| |2021-03-25|Added a command to delete archiving logs in Section 9.10| | |. | |----------+--------------------------------------------------------| |2020-06-28|Changed the color scheme of the all figures. | |----------+--------------------------------------------------------| |2020-06-19|Added Section 3.5.3.3. | |----------+--------------------------------------------------------| |2020-06-06|Updated the table 4.1 in Section 4.1.2. | |----------+--------------------------------------------------------| | |Added "PARALLEL option" in Section 6.1. | |----------+--------------------------------------------------------| | |Added the description about the "cleaning up indexes" | | |phase in Section 6.1. | |----------+--------------------------------------------------------| |2020-06-04|Added a description related to recovery.conf in Section | | |10.2. | |----------+--------------------------------------------------------| |2019-03-27|Added "Indexes Internals" in Section 1.4.2 and Section | | |3.2.2.3. | |----------+--------------------------------------------------------| | |Fixed Section 3.3.2.1. | |2018-12-15| | | | * Based on the cheapest access path obtained here, [S:| | | a query tree:S] a plan tree is generated. | |----------+--------------------------------------------------------| |2018-10-10|Improved Section 8.1.2. | |----------+--------------------------------------------------------| | |Fixed Section 3.3.2.2. | | | | |2018-10-01| * (3) Create a path, estimate the cost of the | | | sequential scan and add the path to the [S:indexlist| | | :S] pathlist of the RelOptInfo. | |----------+--------------------------------------------------------| |2018-08-24|Added Section 4.1. | |----------+--------------------------------------------------------| |2018-08-13|Changed in Section 9.7. | |----------+--------------------------------------------------------| |2018-08-13|Added "How to Maintain AUTOVACUUM" in Section 6.5. | |----------+--------------------------------------------------------| |2018-08-03|Added "Why the checkpointer was separated from the | | |background writer?" in Section 8.6. | |----------+--------------------------------------------------------| | |Updated for Version 11. | | | | | | 1. Updated Section 6.1.1. | |2018-06-03| 2. Added "WAL segment file size" in Section 9.2. | | | 3. Added "Removal of prior checkpoint in PostgreSQL 11"| | | in Section 9.7.2. | | | 4. Updated Section 9.9.2. | | | 5. Updated Section 10.1.1. | +-------------------------------------------------------------------+ Author Hironobu SUZUKI I graduated from graduate school of information engineering (M.S. in Information Engineering), have worked for several companies as a software developer and technical manager/director. I published seven books in the fields of database and system integration (4 PostgreSQL books and 3 MySQL books). In June 2019, the Chinese book of this document was published. As a director of the Japan PostgreSQL Users Group (2010-2016), I organized the largest (non-commercial) technical seminar/lecture of PostgreSQL in Japan for more than six years, and also served as the program committee chair of the Japan PostgreSQL Conference in 2013 and as a member in 2008 and 2009. In 2021, I released pg_plan_inspector that is a framework to monitor and improve the performance of PostgreSQL using Machine Learning methods. Cuando era joven, vivio en Sudamerica por unos anos. Recientemente, a veces vuelve a alli. Now I live in Zurich. I am considering looking for a new job in Swiss or the EU. I love Swiss and like the company I work for, however, I would like to do new challenges, e.g. applying ML and AI technologies to DBMS. I am waiting for contacts from people who want to do new things. Blog Contact Please read this FAQ before sending messages. After reading, send a message to my twitter in public. Copyright (c) Copyright ALL Right Reserved, Hironobu SUZUKI. If you want to use any part of this document and/or any figure, please contact me. If you work at Amazon, you cannot use and refer to this document because of the copyright violation issues. Exception: Educational institutions can use this document freely. (c) Copyright Hironobu SUZUKI All Rights Reserved.