PostgreSQL has had replication features for some time now. With each new release the configuration process for replication and failover mechanisms are getting better. However, there are still some rough edges here and there especially for things like recovering a failed master.
In this blog post I'll explain how to setup a master-slave replication strategy for PostgreSQL 9.4 using Replication Manager and PgBouncer. The goal of the setup is to have clients (a web application) seamlessly continue to function in case of a database node failure.
While most of the 9.x series of PostgreSQL have replication support, 9.4 in particular adds a feature which makes replication setup easier. In 9.4 replication slots were introduced, which mean that the Master will keep the WAL segments until all nodes following the master have caught up with the replication. In previous versions a prolonged failure of a slave might cause it to be behind the master far enough that the WAL segments in the master has already been removed. This was usually mitigated by having very large WAL sizes but still required keeping regular database backups to recreate the slave in case the WAL wasn't enough to catchup to the master. For more on replication slots see http://www.postgresql.org/docs/devel/static/warm-standby.html#STREAMING-REPLICATION-SLOTS
Replication Manager (repmgr) 3.0
repmgr is an open-source tool suite to manage replication and failover in a cluster of PostgreSQL servers. It enhances PostgreSQL's built-in hot-standby capabilities with tools to set up standby servers, monitor replication, and perform administrative tasks such as failover or manual switchover operations.
PgBouncer is connection pooler for PostgreSQL databases and acts as gateway for clients to connect to the databases. The main reason for using pgBouncer is so that we don't have to change clients configurations in case of database failover between nodes. pgBouncer allows for backend database configuration changes by just editing its configuration file and reloading the process.
All in a picture
Let's start shall we!
Building the Setup
All the steps below will use the following terms:
dbhost1: first database host running on 192.168.0.100
dbhost2: second database host running on 192.168.0.101
pgbouncer-host: host running pgbouncer on 192.168.0.102
- Install dependencies.
- Configure ssh access between db nodes.
- Configure databases.
- Configure PostgreSQL replication.
- Configure replication manager.
- Clone slave.
- Configure pgBouncer.
Each step title will have between parentheses the hosts that the step applies to.
1) Install Dependences (dbhost1, dbhost2, pgbouncer-host)
The default ubuntu 14.04 installation does not include PostgreSQL 9.4 so we will need to add the repository:
Create the file
/etc/apt/sources.list.d/pgdg.list, and add a line for the repository
deb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main
Import the repository signing key, and update the package lists
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add - sudo apt-get update
Install the dependencies for database hosts (dbhost1,dbhost2)
sudo apt-get install postgresql-9.4 sudo apt-get install postgresql-9.4-repmgr sudo apt-get install postgresql-client-9.4
Install the dependencies for pgBouncer host (pgbouncer-host)
sudo apt-get install postgresql-client-9.4 sudo apt-get install pgbouncer
2) Configure password-less ssh between dbhosts (dbhost1,dbhost2)
by default postgreSQL creates a postgres local linux user which is used to do all database management operations. To create slaves or recreate a master after failure repmgr requires that the user running the commands have password-less ssh access to the source host. Since the postgres user is the user which has permissions on the postrgres directories, we will configure that user for ssh.
switch to the user
$ sudo su postgres
create ssh key pair
copy the publich key
id_rsa.pub contents and add it to the
authorized_keys file in the other host's postgres user ssh directory. For more details see https://github.com/2ndQuadrant/repmgr/blob/REL3_3_STABLE/SSH-RSYNC.md
test that you can ssh between servers before going to the next step.
3) Configure Databases and users (dbhost1)
We will need two databases one will be used by repmgr and the other will be our application database. All commands are to be executed using the postgres user.
for repmgr database
$ createuser -s repmgr $ createdb repmgr -O repmgr
for our application database
$ createuser test_user $ createdb test_db -O test_user
since this user will be used by the applications (clients) we will set a password for it. Run the following command from inside a psql shell with the default postgres user.
postgres=# ALTER USER test_user WITH PASSWORD '<newpassword>';
4) Configure PostgreSQL replication (dbhost1)
We will now configure the master db host (dbhost1) for replication. This is acheived by editing the
postgresql.conf file and updating the follwing items
listen_addresses='*' wal_level = 'hot_standby' archive_mode = on archive_command = 'cd .' max_wal_senders = 10 max_replication_slots = 1 hot_standby = on shared_preload_libraries = 'repmgr_funcs'
We also need to configure the authentication configuration (
pg_hba.conf) to allow repmgr to access replication database and also allow password based access for our application database (test_db). We will add the following lines:
host repmgr repmgr 192.168.0.0/0 trust host replication repmgr 192.168.0.0/0 trust host test_db test_user 192.168.0.0/0 md5
restart the postgres service after the changes are done
$ sudo service postgresql restart
5) Configure Replication Manager (dbhost1,dbhost2)
We will need to create a
repmgr.conf on each database host. The file could be stored anywhere but we will use
The file contents for the dbhost1 should be
cluster=test node=1 node_name=node1 use_replication_slots=1 conninfo='host=dbhost1 user=repmgr dbname=repmgr' pg_bindir=/usr/lib/postgresql/9.4/bin
We are now ready to join our master to the replication topology. Using the postgres user run the command
$ repmgr -f /etc/repmgr/repmgr.conf master register
and similarly for dbhost2 we need to create the file
/etc/repmgr/repmgr.conf with the contents
cluster=test node=2 node_name=node2 use_replication_slots=1 conninfo='host=dbhost2 user=repmgr dbname=repmgr' pg_bindir=/usr/lib/postgresql/9.4/bin
6) Clone Slave (dbhost2)
ssh into our slave server dbhost
stop the postgresql service
$ sudo service postgresql stop
run the following command using the postgres user to clone the master
$ repmgr -f /etc/repmgr/repmgr.conf --force --rsync-only -h dbhost1 -d repmgr -U repmgr --verbose standby clone
start the postgresql server
$ sudo service postgresql start
register the slave as a standby
$ repmgr -f /etc/repmgr/repmgr.conf --force standby register
if all the previous steps ran successfully we should be able to see the state of our replication cluster using the command
$ repmgr -f /etc/repmgr/repmgr.conf cluster show
and it should return an output like this
Role | Connection String * master | host=dbhost1 user=repmgr dbname=repmgr standby | host=dbhost2 user=repmgr dbname=repmgr
7) Configure pgBouncer (pgbouncer-host)
All pgBouncer needs is a configuration file
pgbouncer.ini which can be stored anywhere in the system. The contents of the file should be
[databases] test_db = host=dbhost1 port=5432 dbname=test_db [pgbouncer] listen_port = 6543 listen_addr = 127.0.0.1 auth_type = md5 auth_file = users.txt logfile = pgbouncer.log pidfile = pgbouncer.pid admin_users = test_user
The configuration above instructs pgBouncer to listen for connections on port 6543 and relay connections to the database test_db to our current master dbhost1 on port 5432.
Because pgBouncer uses md5 based authentication with postgreSQL 8.0 style files we need to create a
users.txt file which will have the user credentials for the database users. A sample of this file is below
To run pgBouncer we run the command
$ pgbouncer -d pgbouncer.ini -R
Congratulations!! you now have a running master-slave replicated postgreSQL setup.
8) Failover (dbhost2,pgbouncer-host)
In case of failover due to the master node failure. All that is needed to do is first make sure the master is truely down. Preferably, stopping the postgresql service if it is running. Then we just need to promote our slave to become the new master using the command:
$ repmgr -f /etc/repmgr/repmgr.conf standby promote
and then change the pgBouncer configuration to point to dbhost2 and re run the command
$ pgbouncer -d pgbouncer.ini -R
Update 1 (2/11/2017)
Updated link to password less ssh. Thanks for @darius for the correction.