Migrate PostgreSQL Databases Using pglogical
Introduction
To reduce the latency of database queries, we have decided to migrate our Postgres database from one zone to a zone closer to our application.
TLDR
To minimize application downtime, we have chosen continuous migration using pglogical.
Change config
We need to modify the database configuration, enable the pglogical extension,
and restart the instance.
If you have multiple databases to migrate,
please check worker registration failed
for configuring max_worker_processes
correctly.
You can use SHOW
to check the config parameter:
DDL
To start with pglogical, it's crucial to ensure that the database structures
of both the provider and subscriber are identical.
We employ our goose based migration for this purpose.
However when creating a subscription using pglogical.create_subscription()
,
you can set the synchronize_structure
parameter to true
,
but it requres that the owner of these structures also exists in the subscriber.
Create subscription
On the provider node:
On the subscriber node:
Here we sync the order
database.
Sync sequences
One common mistake people make is forgetting the last values of sequences, which can lead to duplicated key errors.
We write a script to do this. The new last value will be 1.1 times the origin, factoring in subscription latency. The primary SQL statement is:
Check rows
We've crafted a script to verify the equality of both the row count and a randomly selected subset of rows. See: https://gist.github.com/ionling/10f50bf3d77040fa8bb4f6695c23befe
Drop subsription
Once we confirm that the source and target databases are in a consistent state, we can proceed to drop the subscription.
SQL | |
---|---|
Solutions
Here, we will explore several solutions for the migration process.
Dump and restore
Dumping the old database and restoring it to the new database is a viable approach. However, it comes with the drawback of causing a significant downtime, which is undesirable. During this downtime, tasks such as database restoration, data verification, and service restarts are necessary, consuming a considerable amount of time.
Streaming replication
PostgreSQL offers native support for streaming replication, facilitating continuous synchronization of data from a source database to a new one. This approach significantly reduces service downtime. By configuring replication in advance, the migration process only requires restarting services, making it more efficient and minimizing downtime.
Multi-master
The best solution is to combine the old and new databases in a multi-master cluster. This allows us to smoothly migrate any service that relies on the old database to the new without any downtime.
One implementation for achieving this is through the spock extension. However, the performance and potential issues of this extension might be a concern. For more insights, you can refer to the blog post: How to achieve multi-master replication in PostgreSQL with Spock
Summary
To strike a balance between complexity and functionality, we ultimately opted for the streaming replication solution.
AWS has a blog on database upgrading that is also useful for migrating:
- https://aws.amazon.com/cn/blogs/database/part-1-upgrade-your-amazon-rds-for-postgresql-database-comparing-upgrade-approaches/
- https://aws.amazon.com/cn/blogs/database/part-2-upgrade-your-amazon-rds-for-postgresql-database-using-the-pglogical-extension/
Notes
Could not open relation with OID
When debugging the subscription, I mistakenly dropped the pglogical extension, deleted the pglogical schema, and subsequently recreated it. As a consequence, I encounter a error when running certain pg commands:
Text Only | |
---|---|
To resolve it, just reconnect the db.
See https://github.com/2ndQuadrant/pglogical/issues/347
PostgreSQL priveledges
Some priveledges are not intuitive, like this:
SQL | |
---|---|
grants privileges on the database itself, not things within the database. admin can now drop the database, still without being able to create tables in schema public.^1
ERROR: worker registration failed
When creating a subscription, we encountered the following error:
Text Only | |
---|---|
According to the pglogical README:
one process per node is needed on the subscriber node.
After ensuring that we do not exceed the limit, we started searching for related documents and encountered two conflicting pieces of information.
One, from EnterpriseDB, states:
One per database + two per subscription on the subscriber (downstream).
The other, from GitHub Issue #7, mentions:
One worker needed for the main maintenance process, one per subscription, one per replicated database, and during startup, it can need one per any connectable database on top of that.
If we have two databases with one subscription for each database:
- According to quote one, we will need
2*1 + 2*2 = 6
processes. - According to quote two, we will need
2*2 = 4
processes.
Which one is correct? I don't know.