# Synchronous Replication
# Synchronous Replication
Replication is a common technique used in distributed databases to achieve scalable data distribution for better fault tolerance. Multiple replicas of a primary database server are created for higher durability. One of the replication methods is to update each replica as part of a single atomic transaction, also known as synchronous replication. Consensus algorithms apply this approach to achieve strong consistency on a replicated data set. immudb now supports the option for synchronous replication.
# Architecture
In synchronous replication, each commit of a write transaction will wait until there is a confirmation that the commit has been committed to both the primary and quorum of replica server(s). This method minimizes the possibility of data loss.
immudb uses a quorum-based technique to enforce consistent operation in a distributed cluster. A quorum of replicas is used to ensure that synchronous replication is achieved even when replication is not completed across all replica servers. A quorum is a majority of the number of replicas in a cluster setup. The quorum can be set when creating or updating the database on the primary node.
The primary server will wait for acknowledgment from a quorum of replica server(s) that each transaction is durably stored before proceeding. The drawback is that if enough replica server(s) go down or can’t commit a transaction, and the quorum is not reached, the primary server goes into a hung state.
Comparing this to the asynchronous replication mode, the primary server does not need to wait for transaction-completion acknowledgment from the replica server. The replication transactions queue up on the replica server, and the two servers can remain out-of-sync for a specified time until the processing completes.
immudb provides support for synchronous replication by means of a follower approach. There are two grpc endpoint used for replication:
ExportTx
: Used by replicas to fetch precommitted transactions from the primary database server, and also to send the current database state to update the primary server.ReplicateTx
: Used by replicas to commit precommitted transactions (fetched from the primary) on the replica server.
The primary server keeps a record of the current state of each replica. The current state of each replica is updated through the ExportTx
grpc call from the replica server. So when a new transaction request comes to the primary server, it precommits the transaction, and checks if a quorum (on the transaction) has been reached by the replica server(s) by checking their state continuously. If the quorum was reached, the transaction is marked as successful.
# Deciding on number of servers in a cluster
Synchronous replication in a cluster can function only if the majority of servers are up and running. In systems with enabled data replication, it is important to consider the throughput of write operations. Every time data is written to the cluster, it needs to be copied to multiple replicas. Every additional server adds some overhead to complete this write. The latency of data write is directly proportional to the number of servers forming the quorum.
# Settings
Synchronous replication is enabled per database. The following flags in the immuadmin
tool will help in setting up synchronous replication for your database.
Flags:
--replication-allow-tx-discarding allow precommitted transactions to be discarded if the replica diverges from the primary
--replication-commit-concurrency uint32 number of concurrent replications (default 10)
--replication-is-replica set database as a replica
--replication-prefetch-tx-buffer-size uint32 maximum number of prefeched transactions (default 100)
--replication-primary-database string set primary database to be replicated
--replication-primary-host string set primary database host
--replication-primary-password string set password used for replication to connect to the primary database
--replication-primary-port uint32 set primary database port
--replication-primary-username string set username used for replication to connect to the primary database
--replication-sync-acks uint32 set a minimum number of replica acknowledgements required before transactions can be committed
--replication-sync-enabled enable synchronous replication
# Setup
This setup guides you through a simple demonstration of how synchronous replication works in immudb. Starting with a 2-node local cluster, you'll write some data and verify that it replicates in sync.
# Before you begin
Make sure you already have immudb installed.
Since you're running a local cluster, all nodes use the same hostname (
localhost
).
# Step 1. Start the cluster
Run the primary server:
$ immudb --dir data_primary
In a new terminal, start replica server:
$ immudb --dir data_replica \ --port=3324 \ --pgsql-server=false \ --metrics-server=false
In a new terminal, use the
immuadmin
command to create a database on the primary server:Login to immudb
$ immuadmin login immudb
Create a database
db
that requires 1 confirmation from the synchronous replicas to do the commit.Note that the number of confirmations needed (
--replication-sync-acks
option) should be set toceil(number of replicas/2)
to achieve majority-based quorum.$ immuadmin database create primarydb \ --replication-sync-acks 1 \ --replication-sync-enabled
At this point, the
primarydb
has been created on the primary server.Use the
immuadmin
command to create a database on the replica server:Login to immudb
$ immuadmin login immudb -p 3324
Create a database
replicadb
which will sync from the primary server's databaseprimarydb
$ immuadmin database create replicadb -p 3324 \ --replication-is-replica \ --replication-primary-host 127.0.0.1 \ --replication-primary-port 3322 \ --replication-primary-database primarydb \ --replication-primary-username immudb \ --replication-primary-password immudb \ --replication-sync-enabled \ --replication-prefetch-tx-buffer-size 1000 \ --replication-commit-concurrency 100
At this point, the
replicadb
has been created on the replica server to sync with theprimarydb
on primary server.
# Step 2. Send a request
Use the
immuclient
command to commit a transaction on the primary server:Login to immudb
$ immuclient login immudb
Select database
$ immuclient use primarydb
Set a value
$ immuclient safeset foo bar
Verify the transaction on the replica server using the
immuclient
command:Login to immudb
$ immuclient login immudb -p 3324
Select database
$ immuclient use primarydb -p 3324
Verify the key
$ immuclient safeget foo -p 3324
# Step 3. Stop the replica server
Stop the replica server running on port 3325
Send a transaction to the primary server:
Login to immudb
$ immuclient login immudb
Select database
$ immuclient use primarydb
Set a value
$ immuclient safeset foo bar
The client will block. This is because the primarydb requires 1 sync replica, and since the replica server is down, there is no ack from the replica server, hence synchronous transaction is blocked.
# Recovering from a replica loss
The primary node will continue read and write operations as long as the required quorum of replicas can send write confirmation to the primary node. If there are not enough confirmations, write operations will be queued and will wait for enough replicas to synchronize with the cluster. Read operations in such cases will continue to work.
The simplest way to recover the replica is to simply add a new replica into the cluster and setup replication in the same way as during the initial cluster setup, e.g.:
$ immuadmin database create replicadb -p 3324 \
--replication-is-replica \
--replication-primary-host 127.0.0.1 \
--replication-primary-port 3322 \
--replication-primary-database primarydb \
--replication-primary-username immudb \
--replication-primary-password immudb \
--replication-sync-enabled \
--replication-prefetch-tx-buffer-size 1000 \
--replication-commit-concurrency 100
The new replica will start fetching transactions from the primary node and as soon as it synchronizes all transactions it will become a valid member of the quorum for transaction confirmation.
# Speeding up initial replica synchronization
The synchronization process of a new replica may take a lot of time if the database is large or has to handle a lot of normal traffic. Such replica will fetch all transactions performing additional checksum calculations and validations. That way the security of the whole cluster is further hardened revealing tampering attempts in any transaction in the database including those transactions that were not accessed for a very long time.
There are situations however when the speed of recovery is crucial. In such a situations the data of the database may be copied from another cluster node. This should be done while the database is unloaded:
# Step 1. Create replica database
$ immuadmin database create replicadb -p 3324 \
--replication-is-replica \
--replication-primary-host 127.0.0.1 \
--replication-primary-port 3322 \
--replication-primary-database primarydb \
--replication-primary-username immudb \
--replication-primary-password immudb \
--replication-sync-enabled \
--replication-prefetch-tx-buffer-size 1000 \
--replication-commit-concurrency 100
database 'replicadb' {replica: true} successfully created
# Step 2. Unload replica from the database
Once database is unloaded, we can safely work on the files of that database.
$ immuadmin database unload replicadb
database 'replicadb' successfully unloaded
# Step 3. Copy files from other node
$ rsync -ave --delete \
<HEALTHY_REPLICA_HOST>:<HEALTHYREPLICA_DATA_DIR>/replicadb/ \
<NEW_REPLICA_HOST>:<NEW_REPLICA_DATA_DIR>/replicadb/
sending incremental file list
....
sent 590,357,187 bytes received 230 bytes 168,673,547.71 bytes/sec
total size is 590,212,158 speedup is 1.00
Note: if there are writes on the database happening during the sync, it is necessary to unload the source replica before copying files to avoid inconsistencies between database files.
# Step 4. Load database on new replica
$ immuadmin database load replicadb
database 'replicadb' successfully unloaded
# Recovering from a primary loss
Current immudb cluster setup requires the primary node to be always predefined. This means that in case of a primary node loss, it is necessary to manually promote a replica to become the primary node. Generally, electing the new primary depends on the number of available instances, their precommit state, and the replication-sync-acks setting on the primary.
# Step 1. Inspect states of all replicas in the cluster and choose the new primary node
$ immuclient login immudb
Password:
Successfully logged in
$ immuclient use replicadb
Now using replicadb
$ immuclient status
database: replicadb
txID: 734931
hash: 5e2f2feec159bc19c952a7a93832338a178936c5b258d0c906b7c145faf3a4b5
precommittedTxID: 734931
precommittedHash: 5e2f2feec159bc19c952a7a93832338a178936c5b258d0c906b7c145faf3a4b5
It's important to carefully choose the new primary node in order to avoid losing committed transactions.
It is generally a good idea to promote some instance as a primary that has already precommitted the largest
transaction contained in at least replication-sync-acks
instances.
In the following scenario, we consider a three-node cluster with an unreachable primary:
# state in replica1
precommittedTxID: 734931
precommittedHash: 5e2f2feec159bc19c952a7a93832338a178936c5b258d0c906b7c145faf3a4b5
# state in replica2
precommittedTxID: 734920
precommittedHash: 2a4f41c3d5b03ff014ca30b53d23ee3a098936c3b2a8a0d6e9b3b540cac166a1
In the event that the primary node becomes unavailable, a replica with a higher precommittedTxID should be chosen as the primary.
If replication-sync-acks
is 2, both replicas must acknowledge precommit before the primary can commit.
In the scenario above, this would mean 734920 was the most recent committed transaction. Therefore, replica2 could also be selected as the new primary.
# Step 2. Switch the selected replica to become new primary
$ immuadmin database update replicadb -p 3324 \
--replication-sync-enabled \
--replication-sync-acks 1 \
--replication-is-replica=false
database 'replicadb' {replica: false} successfully updated
Note that the number of required sync replicas may be temporarily lowered due to the loss of the primary node.
# Step 3. Switch other replicas to follow new primary
$ immuadmin database update replicadb -p 3325 \
--replication-primary-host 127.0.0.1 \
--replication-primary-port 3324 \
--replication-primary-database replicadb
# Step 4. Truncate precommitted transactions on other replicas if needed
It may happen that the new replica will reject synchronizing with the new primary. In such case immudb will report an error in logs:
immudb 2022/10/11 15:57:42 ERROR: replica precommit state at 'replicadb' diverged from primary's
To fix this issue the replica may need to discard precommited transactions.
This can be easily instructed with the flag replication-allow-tx-discarding
as follows:
$ immuadmin database update replicadb -p 3325 --replication-allow-tx-discarding
In the case immudb instance itself is run a replica, to fix that issue please restart immudb with the --replication-allow-tx-discarding
flag that will discard any transaction on the replica that has not yet been fully committed.
# Step 5. Start a new replica to restore original cluster size
Because the primary node was irrecoverably lost, a new replica should be spawned in its place. Please refer to the previous section dealing with the loss of replica for more details on how to add a replacement replica node.
# Step 6. Point immudb clients to the new primary node
Clients performing write operations should now be switched to the new primary node.
# Changing configuration of a locked primary database
In most cases the primary database can be easily updated and the change will be applied without the need for a restart. That way the primary node can change the number of required confirmations, enable/disable synchronous replication and even be converted to a replica.
There can be a situation though where the database is already blocked with writes waiting for confirmations from replicas. This could happen if replicas became unavailable or as a result of misconfiguration where the replicas quorum value was set to some large value.
In this situation trying to change the configuration of the database will block as well and will be unblocked once the database itself continues committing transactions.
If the database can not be fixed to restore commits (e.g. if it is impossible to add enough synced replicas quickly enough), the following workaround can be used (please note that it requires immudb restart):
- Update database settings, e.g. run
immuadmin database update
command - that operation will block indefinitely but will already persist new database settings - Restart the immudb database instance - upon restart, the configuration of the database is read and applied from persistent settings thus it will apply the configuration set in the previous step.
With this approach, the number of required confirmations can be lowered down to the correct value or disabled to switch to asynchronous replication.