Imagine this scenario. You just realized you committed a transaction that incorrectly updated millions of rows in a table. Worse still, the affected rows were inserted into the table after you took the last backup of the database, and you have no access to the original values. This can really be a disaster if you don’t have any precautions.
Continuous archiving, which enables point-in-time recovery, is the precaution to take if you are a PostgreSQL user. Depending on the use case, the configurations and operations can be very sophisticated. In this post, however, I will introduce my way of doing it, which I tried to keep simple. I will try to explain each step, so hopefully it’ll be more informative for beginners.
The instructions presented below are based on PostgreSQL 10.1 (which is compiled from source with default configurations), with Ubuntu 16.04 as the operating system.
When we talk about continuous archiving, we are referring to the process of archiving Write Ahead Log (WAL) files. Here’s a description for the WAL file from PostgreSQL 10 documentation:
At all times, PostgreSQL maintains a write ahead log (WAL) in the pg_wal/ (note that in PostgreSQL 9.6 and earlier, the name of the directory was pg_xlog/) subdirectory of the cluster’s data directory. The log records every change made to the database’s data files. This log exists primarily for crash-safety purposes: if the system crashes, the database can be restored to consistency by “replaying” the log entries made since the last checkpoint. However, the existence of the log makes it possible to use a third strategy for backing up databases: we can combine a file-system-level backup with backup of the WAL files. If recovery is needed, we restore the file system backup and then replay from the backed-up WAL files to bring the system to a current state.
And here’s how point-in-time recovery works.
It is not necessary to replay the WAL entries all the way to the end. We could stop the replay at any point and have a consistent snapshot of the database as it was at that time. Thus, this technique supports point-in-time recovery: it is possible to restore the database to its state at any time since your base backup was taken.
Setting up continuous archiving of Write Ahead Log (WAL)
The server configuration file
postgres.conf resides in the database cluster directory. To enable continuous archiving of WAL for point-in-time recovery, open the file and set the following parameters:
wal_level = replica archive_mode = on archive_command = 'if test ! -d "archived_wal"; then mkdir "archived_wal"; fi; test ! -f "archived_wal/%f" && gzip < "%p" > "archived_wal/%f"'
wal_level parameter determines how much information is written to WAL files. The default value,
replica, writes enough information to support point-in-time-recovery.
on, a completed WAL file will be archived using the
archive_command, as opposed to being recycled when
archive_command is a shell command, executed by the server to archive a WAL file once it’s completed. The essential function of the command is moving the recently completed WAL file to a specified location for archiving. However, it is often necessary to perform other actions as well, and you can customize the command to tailor to your needs. Note that, in the command string,
%p will be replaced by the path name of the WAL file to archive, while
%f will be replaced by only the file name. Use
%% if you need to embed an actual
% character in the command. Also note that path names are relative to the database cluster directory.
archive_command presented above first creates a directory named
archived_wal (in the database cluster directory), if it doesn’t already exist. Then it tests whether a file with the same name as the recently completed WAL file already exists in the
archived_wal directory. If there is an existing file, the command will not proceed (and hence avoids overwriting), but it will be executed periodically until the condition is dealt with. On the other hand, if there is no such existing file, the command will compress the WAL file and send it to the
archived_wal directory. Note that the command will be executed under the ownership of the same user that the PostgreSQL server is running as. Therefore, you should make sure the user has necessary privileges (e.g., read/write access to the directories/files).
After the changes to
postgres.conf file have been saved, restart the server if it’s running to let the settings take effect.
Making a base backup using
After enabling continuous archiving, a backup of the database cluster directory should be taken using the
pg_basebackup command. Should a point-in-time recovery be required after an accident happens, the backup (which must be taken before the accident) can be used to replay WAL files and restore the database to a desirable state.
Suppose an accident happens, and I have to perform a point-in-time recovery on a previously taken backup. Here’s the workflow:
- Stop the server, if it’s running.
- Make a copy of the database cluster directory backup and perform the following steps on the copy, in case I need to experiment with the recovery to find the most desirable state of the database to restore to.
- Remove the
archived_waldirectory from the backup. This directory only contains WAL files generated before the backup was taken. Such WAL files are not needed when restoring the database to any state after the backup was taken.
- Copy the
archived_waldirectory from the current database cluster (which just encountered the accident) to the backup. This directory should contain a complete collection of archived WAL files.
- It should also be noted that, in addition to the archived WAL files in the
archived_waldirectory, the recovery process may also require unarchived WAL files in the
pg_waldirectory. Since the WAL files in the
pg_waldirectory from the backup are also probably obsolete, I usually just remove this directory from the backup and replace it with a copy of the
pg_waldirectory from the current database cluster.
- Create a recovery command file
recovery.confin the backup. This file provides the instructions on how the recovery should be performed (see below for an example).
- Start the server on the backup. The server will go into recovery mode and proceed to read through the archived WAL files it needs, based on the instructions provided by the
recovery.conffile. Upon completion of the recovery process, the server will rename
recovery.doneto prevent accidentally re-entering recovery mode later.
- Remove the
Here’s what I usually put in a
restore_command = 'gunzip < "archived_wal/%f" > "%p"' recovery_target_time = 'yyyy-mm-dd hh:mm:ss.sss'
restore_command is the local shell command to execute to retrieve the archived WAL files. The meanings of
%p are the same as those in the
archive_command. Note that since the
archive_command presented earlier compresses a WAL file before archiving it, the
restore_command should do the contrary and uncompress the WAL file before restoring it.
recovery_target_time specifies the timestamp up to which recovery will proceed. For example, if I committed a problematic transaction at 2018-02-10 22:15:30.000 and I want to restore the database to a recent state without the transaction, then I can set
recovery_target_time = '2018-02-10 22:15:29.999'. See the next subsection to learn some useful tips on how to find the exact timestamp to restore to.
To restore the database to a desirable state, it’s important that you specify a correct recovery target. If you are using the
recovery_target_time parameter to specify the target (which I find is the most useful way), then you need to know the exact time stamp up to which recovery should proceed. Server message logs are great resources for you to find the ideal timestamp. With some additional settings in the
postgres.conf file, you can find the accomplish the goal with much more ease and confidence.
log_destination = 'csvlog' logging_collector = on log_statement = 'all'
log_destination = 'csvlog', server message logs will be generated in “comma separated value” (CSV) format which, in my opinion, is much easier to read compared with other formats.
logging_collector must be enabled to generate CSV-format log output.
log_statement controls which SQL statements are logged. This setting should be set to
'all', in which case data definition statements (e.g., CREATE, ALTER, and DROP statements), data modifying statements (e.g., INSERT, UPDATE, and DELETE statements), as well as transactional control language (e.g., BEGIN, COMMIT, and ROLLBACK statements) will all be logged. These statements provide necessary information to find the problematic transaction.
By default, the log output files are stored in the
log directory under the running database cluster. The first column stores the time stamp (which includes a time zone) when the message is generated. There’s a column that stores SQL statements sent to the server. With these two columns, you should be able to find the time when the problematic transaction is sent to the server, and decide the proper time to restore the database to.
I find that point-in-time recovery doesn’t seem to work properly when a time zone is specified in the
recovery_target_time parameter (e.g., ‘yyyy-mm-dd hh:mm:ss.sss tz’). I installed Windows and Linux on the same machine, and to fix the dual boot time conflict, I made Linux treat hardware clock as local time. Perhaps this setting affects how the server treats the
recovery_target_time parameter with a time zone. Anyway, a time zone is not needed if the database is restored on the same machine where the server has been running.
recovery_target_inclusive parameter, which can be specified in the
recovery.conf, determines whether to stop before or after the recovery target. This parameter defaults to
true. However, I found it not true. For example, if I set
recovery_target_time = '2018-02-15 09:00:00.000', then a transaction committed at the same time (which is indicated by
csvlog) would not be restored. My guess is that the
csvlog truncates some digits of the time, so that for example,
2018-02-15 09:00:00.000001 becomes
2018-02-15 09:00:00.000. As a result, the restoration is not likely to be inclusive. Adjust the time stamp slightly forward (e.g.,
2018-02-15 09:00:00.001) if you really want to make sure the restoration is inclusive.
This post shows you how to implement continuous archiving and point-in-time recovery with PostgreSQL. However, note that the configurations and workflow do not necessarily represent the best practices. For example, you may prefer to archive WAL files in other location(s) instead of in the database cluster directory. Refer to the official documentation for more information on this topic, and figure out the best practice for yourself!