Apache Kafka Replica Followers: Synchronous or Asynchronous?

Are Apache Kafka follower replicas synchronous or asynchronous?

This sounds like a simple question, but it is one of those Kafka topics where different materials often appear to contradict each other. Many Kafka training sessions, certification courses, and introductory explanations say that Kafka replication is asynchronous. At the same time, the Confluent Platform documentation for Multi-Region Clusters and multi-data-center architectures talks about synchronous and asynchronous replicas. In its replica placement terminology, replicas are sync replicas, while observers are async replicas.

So which answer is correct?

The short answer: it depends on what you mean

If we talk about the low-level replication mechanism, Kafka followers replicate by fetching from the leader. They can lag. They can fall out of the ISR. They can catch up later. In this sense, it is understandable that people describe Kafka follower replication as asynchronous.

If we talk about Kafka's commit and durability semantics, followers that are members of the in-sync replica set, or ISR, are on the synchronous write path when producers use acks=all. A message is considered committed only when the ISR requirements are satisfied. In this sense, Kafka followers in the ISR are synchronous replicas.

That distinction becomes especially important in Confluent Platform stretched cluster and 2.5 data center architectures, where the terminology is explicit: normal replicas are synchronous replicas, while observers are asynchronous replicas by default.

The source of the confusion

The confusion usually comes from using the same word, synchronous, to answer two different questions:

  1. How does replication physically happen between brokers?
  2. When is a write considered committed or acknowledged?

Replication mechanics: why Kafka looks asynchronous

If we ask the first question, Kafka replication looks asynchronous. Followers fetch records from the leader. The leader does not push every record to every follower using a lock-step, two-phase-commit-style protocol. Followers may be temporarily behind the leader. A follower can be alive but not sufficiently caught up. A follower can be removed from the ISR and later rejoin after it catches up.

This is why training materials often say that Kafka replication is asynchronous. They are describing the data movement mechanism.
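
The pull-based mechanics can be illustrated with a small sketch. It is a toy model, not Kafka's implementation: the Replica class, the fetch_from method, and the max_records parameter are names invented here to show how a fetching follower can trail the leader and catch up later.

```python
class Replica:
    """Toy replica that holds an in-memory log (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def fetch_from(self, leader, max_records):
        """Pull up to max_records that this replica is still missing."""
        start = len(self.log)
        batch = leader.log[start:start + max_records]
        self.log.extend(batch)
        return len(batch)


def lag(leader, follower):
    """Number of records the follower has not replicated yet."""
    return len(leader.log) - len(follower.log)


leader = Replica("leader")
follower = Replica("follower")

# The leader appends five records without waiting for anyone.
leader.log.extend(["r0", "r1", "r2", "r3", "r4"])

# The follower pulls at its own pace, so it can be behind for a while.
follower.fetch_from(leader, max_records=2)
lag_after_first_fetch = lag(leader, follower)   # 3 records behind

# A later fetch lets it catch up completely.
follower.fetch_from(leader, max_records=10)
lag_after_catch_up = lag(leader, follower)      # 0 records behind
```

Nothing in this loop forces the follower to keep pace with the leader, which is the property people have in mind when they call replication asynchronous.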

Commit semantics: why ISR followers are synchronous

But if we ask the second question, the answer changes. Kafka's own design documentation says that a message is considered committed only when all replicas in the ISR for that partition have applied it to their logs. The Kafka 4.3 documentation also says that messages are visible to consumers only after they are replicated to all in-sync replicas and the min.insync.replicas condition is met. See the Apache Kafka 4.3 docs on message delivery semantics and min.insync.replicas.

So the precise statement is not simply "Kafka followers are asynchronous" or "Kafka followers are synchronous". A better statement is:

Kafka follower replication looks asynchronous in its mechanics, because followers fetch and can lag, but ISR followers participate in synchronous commit semantics when the producer uses acks=all.

What really happens during a Kafka write

For each topic partition, Kafka has one leader replica and zero or more follower replicas. Producers write to the leader. The leader defines the order of records in the partition log. Followers replicate the leader's log.

A simplified write path looks like this:

  1. A producer sends a produce request to the partition leader.
  2. The leader appends the records to its local log.
  3. Follower replicas fetch records from the leader.
  4. Kafka tracks which replicas are sufficiently caught up. These replicas form the ISR.
  5. The high watermark advances when the ISR-based commit requirements are satisfied.
  6. Consumers read only committed records.
  7. With acks=all, the producer receives a successful acknowledgement only after the write satisfies the ISR requirements.
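
The steps above can be sketched as a toy model. This is a simplification under stated assumptions, not broker code: the Partition class and its method names are hypothetical, every replica is assumed to start in the ISR, and ISR shrinking and min.insync.replicas are left out.

```python
class Partition:
    """Toy model of one partition: a leader, its followers, a high watermark."""

    def __init__(self, replicas, leader):
        self.leader = leader
        self.logs = {replica: [] for replica in replicas}
        self.isr = set(replicas)     # assumption: everyone starts in sync
        self.high_watermark = 0      # offsets below this are committed

    def produce(self, record):
        """Steps 1-2: the leader appends the record and returns its offset."""
        log = self.logs[self.leader]
        log.append(record)
        self._advance_high_watermark()
        return len(log) - 1

    def follower_fetch(self, replica):
        """Step 3: a follower pulls whatever it is missing from the leader."""
        leader_log = self.logs[self.leader]
        follower_log = self.logs[replica]
        follower_log.extend(leader_log[len(follower_log):])
        self._advance_high_watermark()

    def _advance_high_watermark(self):
        """Step 5: a record is committed once every ISR member has it."""
        replicated_everywhere = min(len(self.logs[r]) for r in self.isr)
        self.high_watermark = max(self.high_watermark, replicated_everywhere)

    def consumer_view(self):
        """Step 6: consumers see only records below the high watermark."""
        return self.logs[self.leader][:self.high_watermark]

    def acks_all_done(self, offset):
        """Step 7: an acks=all request completes once its offset is committed."""
        return offset < self.high_watermark


partition = Partition(replicas=["b1", "b2", "b3"], leader="b1")
offset = partition.produce("order-1")

acked_before_fetch = partition.acks_all_done(offset)     # False
partition.follower_fetch("b2")
acked_after_one_fetch = partition.acks_all_done(offset)  # False, b3 is behind
partition.follower_fetch("b3")
acked_after_all_fetch = partition.acks_all_done(offset)  # True
```

The followers still pull on their own schedule, yet the acknowledgement and consumer visibility both wait for them. That is the sense in which ISR followers sit on the synchronous path.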

The important point is that not every assigned replica has the same role at every moment. A follower in the ISR is part of the durability guarantee. A follower outside the ISR is still a replica, but it is no longer part of the current commit decision.

This is why min.insync.replicas matters. The Kafka 4.3 broker configuration documentation says that min.insync.replicas specifies the minimum number of in-sync replicas, including the leader, required for a write to succeed when a producer sets acks=all or acks=-1. It also says that, in the acks=all case, every current ISR member must acknowledge a write for it to be considered successful. See min.insync.replicas in the Kafka 4.3 broker configs.

For example, suppose a topic has:

replication.factor=3
min.insync.replicas=2
acks=all (producer setting)

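With all three replicas in the ISR, an acks=all write is acknowledged only after all three have it. If one follower falls out of the ISR, the remaining two still satisfy min.insync.replicas=2, so writes continue and are acknowledged once both have the record. If the ISR shrinks to the leader alone, it is below min.insync.replicas, and the producer receives an error (NotEnoughReplicasException, or NotEnoughReplicasAfterAppendException when the ISR shrinks after the leader has already appended the record) instead of a successful acknowledgement. This is the durability/availability trade-off controlled by acks and min.insync.replicas.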
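
The rule for this example can be written as a small decision function. This is a minimal sketch rather than broker code: the function name and the returned strings are invented for illustration, and only the acks=all case is modelled.

```python
def acks_all_outcome(isr_size, min_insync_replicas):
    """Describe what an acks=all producer sees for a given ISR size."""
    if isr_size < min_insync_replicas:
        # Too few in-sync replicas: the write is refused.
        return "rejected"
    # Enough in-sync replicas: every current ISR member must have the record.
    return f"acknowledged after {isr_size} in-sync replicas have the record"


# replication.factor=3, min.insync.replicas=2
full_isr = acks_all_outcome(isr_size=3, min_insync_replicas=2)
one_follower_out = acks_all_outcome(isr_size=2, min_insync_replicas=2)
leader_only = acks_all_outcome(isr_size=1, min_insync_replicas=2)
```

Note that a larger ISR means more replicas to wait for, not fewer: min.insync.replicas is a floor for accepting the write, while the acknowledgement itself waits for the whole current ISR.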

Why followers are often called asynchronous

Followers are often called asynchronous because Kafka does not require all assigned replicas to be perfectly up to date at every instant.

Several facts support this interpretation:

  • Followers fetch from the leader.
  • Followers can lag behind the leader.
  • Followers can be removed from the ISR if they are not sufficiently caught up.
  • Followers can later catch up and rejoin the ISR.
  • Producers can use weaker acknowledgement modes, such as acks=1, where the leader acknowledges after writing locally and does not wait for followers to replicate the data.

From this perspective, saying "Kafka replication is asynchronous" is understandable. It describes the mechanics of replication.

However, that statement becomes misleading if it ignores the ISR and producer acknowledgements. With acks=all, an in-sync follower is not just a passive background copy. It participates in the condition that must be satisfied before the producer receives a successful acknowledgement and before the record becomes visible as committed data.

So, if we want to be precise:

Kafka followers are not synchronous in the sense of perfect lock-step physical replication. They are synchronous in the sense that ISR followers are part of the commit path for acks=all writes.
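
One way to see the difference between acknowledgement modes is to ask which replicas must hold the record before the producer gets a successful response. The function below is an illustrative sketch with a hypothetical name. It models acks=0 (the producer does not wait for any broker response), acks=1, and acks=all, and it ignores min.insync.replicas.

```python
def replicas_awaited(acks, leader, isr):
    """Return the replicas that must have the record before the ack is sent."""
    if acks == "0":
        return set()        # fire and forget: no broker response is awaited
    if acks == "1":
        return {leader}     # leader's local append only
    if acks in ("all", "-1"):
        return set(isr)     # every current in-sync replica, leader included
    raise ValueError(f"unsupported acks value: {acks!r}")


isr = {"b1", "b2"}          # b3 is assigned but currently out of sync
awaited_acks_1 = replicas_awaited("1", leader="b1", isr=isr)
awaited_acks_all = replicas_awaited("all", leader="b1", isr=isr)
```

With acks=1 the follower b2 is a background copy from the producer's point of view. With acks=all the same follower, replicating by the same fetch mechanism, becomes part of the acknowledgement condition.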

Why Confluent 2.5 DC documentation calls followers synchronous

The terminology becomes clearer in Confluent Platform Multi-Region Clusters.

Confluent introduces a third type of replica called an observer. Historically, Kafka had leaders and followers. Multi-Region Clusters add observers. By default, observers do not join the ISR, but they try to keep up with the leader like followers.

This is the key distinction:

  • A normal replica in the ISR is a synchronous replica.
  • An observer is an asynchronous replica by default.

Confluent's documentation says that observers enable asynchronous replication because they do not join the ISR by default. The leader does not need to wait for observers before acknowledging the request back to the producer. See Confluent's Observers section.

The same terminology appears in replica placement. The replicas field contains constraints for sync replicas. The observers field contains constraints for async replicas. In one of the examples, Confluent describes a topic with three sync replicas in one rack and two observers in another rack. A producer with acks=all receives an acknowledgement after the sync replicas have replicated the message; the observers also replicate the data, but the leader does not wait for them.
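
To make the example concrete, here is a sketch of what such a placement document might look like, built as a Python dictionary and serialized to JSON. The replicas and observers fields come from the description above. The version, count, constraints, and rack keys reflect my reading of Confluent's replica placement format and should be checked against the documentation, and the rack names east and west are placeholders.

```python
import json

# Three sync replicas in one rack, two observers (async replicas) in another.
placement = {
    "version": 1,
    "replicas": [
        {"count": 3, "constraints": {"rack": "east"}},   # placeholder rack name
    ],
    "observers": [
        {"count": 2, "constraints": {"rack": "west"}},   # placeholder rack name
    ],
}

placement_json = json.dumps(placement, indent=2)


def total_count(entries):
    """Sum the replica counts across all constraint entries."""
    return sum(entry["count"] for entry in entries)


sync_replicas = total_count(placement["replicas"])     # eligible for the ISR
async_replicas = total_count(placement["observers"])   # not in the ISR by default
```

Only the three replicas under the replicas field are on the acks=all acknowledgement path. The two observers replicate the same data but are not waited for.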

That is why Confluent materials may say that followers are synchronous while observers are asynchronous. They are not claiming that followers never lag. They are classifying replicas based on whether they are normally part of the ISR-based acknowledgement path.

The 2.5 data center stretched cluster context

A stretched 2.5 data center architecture in Confluent Platform means two fully operational data centers plus one lightweight, or "0.5", data center.

The two full data centers run Confluent Server nodes configured as brokers and controllers. The lightweight third location runs a subset of KRaft controller nodes to maintain controller quorum. It is not a full data center with the same broker capacity as the other two locations.

In this architecture, Confluent describes RPO as configuration-dependent. To achieve RPO=0, the documentation says you need replicas in at least two data centers, and min.insync.replicas must be greater than the number of replicas in any given data center. The documentation also notes that producers must use acks=all to support this. See the Confluent Platform docs on Stretched Cluster 2.5 Data Center.

This is the context where the word "synchronous" becomes business-critical. If a write must be acknowledged only after enough replicas across data centers have persisted it, then those replicas are synchronous from the perspective of RPO=0. The cluster is using ISR-based commit semantics to make sure that a successful write survives the loss of a full data center.
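
The RPO=0 condition is simple arithmetic, and a short check makes it easier to reason about. The function below is a sketch with hypothetical names. It counts only sync replicas (observers are excluded), and it assumes producers use acks=all.

```python
def acked_writes_survive_dc_loss(sync_replicas_per_dc, min_insync_replicas):
    """Check the two stated conditions for RPO=0 in a stretched cluster."""
    populated = [n for n in sync_replicas_per_dc.values() if n > 0]
    in_two_or_more_dcs = len(populated) >= 2
    # If min.insync.replicas exceeds the replica count of every single data
    # center, any acknowledged write must exist in at least two data centers.
    spans_dcs = bool(populated) and min_insync_replicas > max(populated)
    return in_two_or_more_dcs and spans_dcs


two_plus_two = {"dc1": 2, "dc2": 2}

# min.insync.replicas=3 cannot be met by one data center alone.
safe = acked_writes_survive_dc_loss(two_plus_two, min_insync_replicas=3)

# min.insync.replicas=2 could be met by both dc1 replicas alone.
unsafe = acked_writes_survive_dc_loss(two_plus_two, min_insync_replicas=2)
```

With two sync replicas in each full data center, min.insync.replicas=3 guarantees that every acknowledged write has crossed the data center boundary, at the cost of rejecting acks=all writes whenever fewer than three replicas are in sync.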

Observers can still be useful in this architecture, especially with automatic observer promotion. But by default, observers are not in the ISR. They are not part of the normal producer acknowledgement path. They provide an asynchronous copy that can help with failover and recovery scenarios without adding the same latency cost to every write.

Summary

So, are Apache Kafka followers synchronous or asynchronous?

The best answer is:

Kafka followers are asynchronous in the mechanics of replication, but ISR followers are synchronous in the commit semantics of acks=all writes.

This explains the apparent contradiction between common Kafka training materials and Confluent Multi-Region Cluster documentation.

Training materials often say "asynchronous" because followers fetch from the leader, can lag, and can catch up later. That is a statement about how replication physically happens.

Confluent documentation says replicas are synchronous and observers are asynchronous because it classifies replicas by whether they participate in the ISR-based acknowledgement path. That is a statement about commit semantics and multi-region durability.

In a normal Kafka cluster, the key term is ISR. In a Confluent Multi-Region Cluster, the key distinction is replica vs observer.

A concise conclusion is:

Kafka's replication mechanism is pull-based and can show lag, which is why it is often described as asynchronous. But in-sync follower replicas are part of Kafka's synchronous commit path when producers use acks=all. In Confluent stretched cluster and 2.5 DC architectures, this is why followers are treated as synchronous replicas, while observers are treated as asynchronous replicas.
