Continuous Archiving and Point-in-Time Recovery with PostgreSQL
Introduction
Imagine this scenario. You just realized you committed a transaction that incorrectly updated millions of rows in a table. Worse still, the affected rows were inserted into the table after you took the last backup of the database, and you have no access to the original values. This can really be a disaster if you don’t have any precautions.
Continuous archiving, which enables point-in-time recovery, is the precaution to take if you are a PostgreSQL user. Depending on the use case, the configurations and operations can be very sophisticated. In this post, however, I will introduce my way of doing it, which I tried to keep simple. I will try to explain each step, so hopefully it’ll be more informative for beginners.
The instructions presented below are based on PostgreSQL 10.1 (which is compiled from source with default configurations), with Ubuntu 16.04 as the operating system.
Some background
When we talk about continuous archiving, we are referring to the process of archiving Write Ahead Log (WAL) files. Here’s a description for the WAL file from PostgreSQL 10 documentation:
At all times, PostgreSQL maintains a write ahead log (WAL) in the pg_wal/ (note that in PostgreSQL 9.6 and earlier, the name of the directory was pg_xlog/) subdirectory of the cluster’s data directory. The log records every change made to the database’s data files. This log exists primarily for crash-safety purposes: if the system crashes, the database can be restored to consistency by “replaying” the log entries made since the last checkpoint. However, the existence of the log makes it possible to use a third strategy for backing up databases: we can combine a file-system-level backup with backup of the WAL files. If recovery is needed, we restore the file system backup and then replay from the backed-up WAL files to bring the system to a current state.
And here’s how point-in-time recovery works.
It is not necessary to replay the WAL entries all the way to the end. We could stop the replay at any point and have a consistent snapshot of the database as it was at that time. Thus, this technique supports point-in-time recovery: it is possible to restore the database to its state at any time since your base backup was taken.
Setting up continuous archiving of Write Ahead Log (WAL)
The server configuration file postgres.conf
resides in the database cluster directory. To enable continuous archiving of WAL for point-in-time recovery, open the file and set the following parameters:
wal_level = replica
archive_mode = on
archive_command = 'if test ! -d "archived_wal"; then mkdir "archived_wal"; fi; test ! -f "archived_wal/%f" && gzip < "%p" > "archived_wal/%f"'
The wal_level
parameter determines how much information is written to WAL files. The default value, replica
, writes enough information to support point-in-time-recovery.
When archive_mode
is on
, a completed WAL file will be archived using the archive_command
, as opposed to being recycled when archive_mode
is off
.
The archive_command
is a shell command, executed by the server to archive a WAL file once it’s completed. The essential function of the command is moving the recently completed WAL file to a specified location for archiving. However, it is often necessary to perform other actions as well, and you can customize the command to tailor to your needs. Note that, in the command string, %p
will be replaced by the path name of the WAL file to archive, while %f
will be replaced by only the file name. Use %%
if you need to embed an actual %
character in the command. Also note that path names are relative to the database cluster directory.
The example archive_command
presented above first creates a directory named archived_wal
(in the database cluster directory), if it doesn’t already exist. Then it tests whether a file with the same name as the recently completed WAL file already exists in the archived_wal
directory. If there is an existing file, the command will not proceed (and hence avoids overwriting), but it will be executed periodically until the condition is dealt with. On the other hand, if there is no such existing file, the command will compress the WAL file and send it to the archived_wal
directory. Note that the command will be executed under the ownership of the same user that the PostgreSQL server is running as. Therefore, you should make sure the user has necessary privileges (e.g., read/write access to the directories/files).
After the changes to postgres.conf
file have been saved, restart the server if it’s running to let the settings take effect.
Making a base backup using pg_basebackup
After enabling continuous archiving, a backup of the database cluster directory should be taken using the pg_basebackup
command. Should a point-in-time recovery be required after an accident happens, the backup (which must be taken before the accident) can be used to replay WAL files and restore the database to a desirable state.
Point-in-time recovery
Suppose an accident happens, and I have to perform a point-in-time recovery on a previously taken backup. Here’s the workflow:
- Stop the server, if it’s running.
- Make a copy of the database cluster directory backup and perform the following steps on the copy, in case I need to experiment with the recovery to find the most desirable state of the database to restore to.
- Remove the
archived_wal
directory from the backup. This directory only contains WAL files generated before the backup was taken. Such WAL files are not needed when restoring the database to any state after the backup was taken. - Copy the
archived_wal
directory from the current database cluster (which just encountered the accident) to the backup. This directory should contain a complete collection of archived WAL files. - It should also be noted that, in addition to the archived WAL files in the
archived_wal
directory, the recovery process may also require unarchived WAL files in thepg_wal
directory. Since the WAL files in thepg_wal
directory from the backup are also probably obsolete, I usually just remove this directory from the backup and replace it with a copy of thepg_wal
directory from the current database cluster. - Create a recovery command file
recovery.conf
in the backup. This file provides the instructions on how the recovery should be performed (see below for an example). - Start the server on the backup. The server will go into recovery mode and proceed to read through the archived WAL files it needs, based on the instructions provided by the
recovery.conf
file. Upon completion of the recovery process, the server will renamerecovery.conf
torecovery.done
to prevent accidentally re-entering recovery mode later.
- Remove the
Here’s what I usually put in a recovery.conf
file:
restore_command = 'gunzip < "archived_wal/%f" > "%p"'
recovery_target_time = 'yyyy-mm-dd hh:mm:ss.sss'
restore_command
is the local shell command to execute to retrieve the archived WAL files. The meanings of %f
and %p
are the same as those in the archive_command
. Note that since the archive_command
presented earlier compresses a WAL file before archiving it, the restore_command
should do the contrary and uncompress the WAL file before restoring it.
recovery_target_time
specifies the timestamp up to which recovery will proceed. For example, if I committed a problematic transaction at 2018-02-10 22:15:30.000 and I want to restore the database to a recent state without the transaction, then I can set recovery_target_time = '2018-02-10 22:15:29.999'
. See the next subsection to learn some useful tips on how to find the exact timestamp to restore to.
Additional considerations
To restore the database to a desirable state, it’s important that you specify a correct recovery target. If you are using the recovery_target_time
parameter to specify the target (which I find is the most useful way), then you need to know the exact time stamp up to which recovery should proceed. Server message logs are great resources for you to find the ideal timestamp. With some additional settings in the postgres.conf
file, you can find the accomplish the goal with much more ease and confidence.
log_destination = 'csvlog'
logging_collector = on
log_statement = 'all'
If log_destination = 'csvlog'
, server message logs will be generated in “comma separated value” (CSV) format which, in my opinion, is much easier to read compared with other formats.
logging_collector
must be enabled to generate CSV-format log output.
log_statement
controls which SQL statements are logged. This setting should be set to 'all'
, in which case data definition statements (e.g., CREATE, ALTER, and DROP statements), data modifying statements (e.g., INSERT, UPDATE, and DELETE statements), as well as transactional control language (e.g., BEGIN, COMMIT, and ROLLBACK statements) will all be logged. These statements provide necessary information to find the problematic transaction.
By default, the log output files are stored in the log
directory under the running database cluster. The first column stores the time stamp (which includes a time zone) when the message is generated. There’s a column that stores SQL statements sent to the server. With these two columns, you should be able to find the time when the problematic transaction is sent to the server, and decide the proper time to restore the database to.
I find that point-in-time recovery doesn’t seem to work properly when a time zone is specified in the recovery_target_time
parameter (e.g., ‘yyyy-mm-dd hh:mm:ss.sss tz’). I installed Windows and Linux on the same machine, and to fix the dual boot time conflict, I made Linux treat hardware clock as local time. Perhaps this setting affects how the server treats the recovery_target_time
parameter with a time zone. Anyway, a time zone is not needed if the database is restored on the same machine where the server has been running.
The recovery_target_inclusive
parameter, which can be specified in the recovery.conf
, determines whether to stop before or after the recovery target. This parameter defaults to true
. However, I found it not true. For example, if I set recovery_target_time = '2018-02-15 09:00:00.000'
, then a transaction committed at the same time (which is indicated by csvlog
) would not be restored. My guess is that the csvlog
truncates some digits of the time, so that for example, 2018-02-15 09:00:00.000001
becomes 2018-02-15 09:00:00.000
. As a result, the restoration is not likely to be inclusive. Adjust the time stamp slightly forward (e.g., 2018-02-15 09:00:00.001
) if you really want to make sure the restoration is inclusive.
Conclusions
This post shows you how to implement continuous archiving and point-in-time recovery with PostgreSQL. However, note that the configurations and workflow do not necessarily represent the best practices. For example, you may prefer to archive WAL files in other location(s) instead of in the database cluster directory. Refer to the official documentation for more information on this topic, and figure out the best practice for yourself!