Security
Apr 17, 2026

CVE-2026-35554: Apache Kafka Producer Message Corruption and Silent Misrouting (Buffer Pool Race Condition)

How a Kafka Producer Race Condition Leads to Undetected Data Corruption and Unauthorized Topic Exposure


Apache Kafka's Java producer client manages message batches through a shared buffer pool. CVE-2026-35554 is a race condition in that buffer pool: when a produce batch expires via delivery.timeout.ms while its network request is still in flight, the batch's ByteBuffer is prematurely returned to the pool. A subsequent batch destined for a different topic can reuse that buffer before the original request completes. When it does, the buffer contents become a mix of both batches' data, and the corrupted payload is delivered to the wrong topic. No error is raised. The send callback reports success. The producer has no idea anything went wrong.
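The failure mode can be sketched in plain Java, independent of Kafka's internals. The pool class, batch names, and the single-threaded "in-flight write" below are illustrative stand-ins, not Kafka's actual BufferPool or RecordAccumulator; the point is that once a buffer is returned to a pool while an earlier writer still holds a reference to it, the next allocation aliases the same memory:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Minimal stand-in for a producer buffer pool (illustrative, not Kafka's BufferPool).
class SimplePool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    ByteBuffer allocate(int size) {
        ByteBuffer b = free.poll();
        return (b != null) ? b : ByteBuffer.allocate(size);
    }
    void release(ByteBuffer b) { b.clear(); free.push(b); }
}

public class BufferAliasingDemo {
    public static void main(String[] args) {
        SimplePool pool = new SimplePool();

        // Batch A acquires a buffer; its "network request" keeps a reference.
        ByteBuffer batchA = pool.allocate(16);
        ByteBuffer inFlightRef = batchA;

        // delivery.timeout.ms fires: failBatch returns the buffer early,
        // while the request holding inFlightRef is still outstanding.
        pool.release(batchA);

        // Batch B, destined for a different topic, reuses the same memory.
        ByteBuffer batchB = pool.allocate(16);
        batchB.put("TOPIC-B-PAYLOAD!".getBytes());

        // The stale in-flight write lands in memory that now belongs to Batch B.
        inFlightRef.position(0);
        inFlightRef.put("TOPIC-A".getBytes());

        System.out.println(batchB == inFlightRef);      // true: same object
        System.out.println(new String(batchB.array())); // TOPIC-A-PAYLOAD! (mix of A and B)
    }
}
```

In the real client the second write comes from a concurrent network thread, which is why the corruption is timing-dependent rather than deterministic as it is in this single-threaded sketch.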

This makes CVE-2026-35554 particularly damaging in environments where Kafka topics carry different security classifications. When data silently crosses a topic boundary, it reaches every consumer authorized on the destination topic, regardless of whether those consumers are authorized to see the source data. The producer's audit log shows clean, successful sends throughout. You find out when consumers start logging deserialization errors on topics that shouldn't be receiving those message formats.

The vulnerability affects kafka-clients >= 2.8.0 and < 3.9.2, >= 4.0.0 and < 4.0.2, and >= 4.1.0 and < 4.1.2. Fix versions are 3.9.2, 4.0.2, 4.1.2, and 4.2.0+. For teams on 2.8.x through 3.8.x, there is no OSS fix: the only remediation path is upgrading to 3.9.2 or a 4.x release.


Who is affected

The vulnerability is in org.apache.kafka:kafka-clients, the Maven artifact used by Java applications to produce messages to Kafka. Any service that publishes messages to a Kafka topic is in scope.

Teams on 2.8.x through 3.8.x have no in-branch fix. The Apache Kafka project published patches only for 3.9.x, 4.0.x, and 4.1.x.

kafka-clients also arrives as a transitive dependency: it is pulled in by spring-kafka, Kafka Streams, the Apache Flink Kafka connectors, Apache Spark Streaming, and most microservice frameworks with Kafka integration. Run mvn dependency:tree | grep kafka-clients (Maven) or gradle dependencies | grep kafka-clients (Gradle) to confirm what is actually on your runtime classpath.

What the vulnerability actually does

The Java producer's buffer pool recycles ByteBuffer allocations across message batches for efficiency. The race condition is triggered by any call to failBatch on an in-flight batch. Two conditions are confirmed to cause this: batch expiry via delivery.timeout.ms and broker disconnection. Importantly, the bug reproduces with the default client configuration. This is not a misconfiguration problem. Any producer on an affected version is exposed under normal operational conditions.

When failBatch fires, the client returns the batch's ByteBuffer to the pool immediately, without waiting for the in-flight network request to complete. If a new batch destined for a different topic allocates that same buffer before the original request finishes, both operations share the same memory. The original request writes Batch A's data into a buffer that now belongs to Batch B. The corrupted buffer is then sent to Batch B's topic.

The misrouted message is a duplicate of a batch the producer already reported as failed. No acknowledged write is lost.

Two structural factors together make this a misrouting bug rather than a duplicate on the intended topic. First, Kafka's per-record CRC validation does not cover the topic name, so a payload arriving on the wrong topic produces no checksum failure at the broker. Second, the 2.8.0 refactor moved topic and partition routing metadata into a separate serialization buffer from the record payload, decoupling content from routing. This is also why the affected range begins at 2.8.0 specifically.

CVSS 8.7 vector:

  • AV:N: network-accessible; any application producing to Kafka can trigger this
  • AC:H: high complexity; requires specific timing overlap between timeout and in-flight request
  • PR:N: no privileges required
  • UI:N: no user interaction required
  • S:C: scope changed; impact extends beyond the producer into the Kafka topic ecosystem and its consumers
  • C:H / I:H: full confidentiality and integrity impact via unauthorized topic delivery and data corruption
  • A:N: no availability impact

How to confirm exposure:

  • Check your kafka-clients version against the affected ranges. Version is your only reliable indicator.
  • Consumer-side deserialization errors and unexpected message formats on the wrong topics are symptoms, but do not rely on their absence as confirmation you have not been affected. The original reproduction of this bug took months to achieve: it required a 5-broker cluster, CPU stress on the producer host, scripted broker restarts, and packet capture to identify the corrupted client-to-broker packet. Single-broker staging setups never triggered it. When it did occur, it happened in bursts separated by hours or days of completely clean operation.
  • A clean monitoring dashboard is not evidence of non-exposure on affected versions. Proactive patching is the only reliable defense. Do not wait for symptoms.
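Because version is the only reliable indicator, the affected ranges can be encoded directly. The helper below is hypothetical (it is not part of any Kafka API, and assumes plain numeric versions without qualifiers like -rc1); it simply restates the advisory's ranges as code:

```java
// Hypothetical helper encoding this advisory's affected ranges for kafka-clients.
// Not part of Kafka's API; assumes plain dotted numeric versions.
public class Cve202635554Check {
    static int cmp(String a, String b) {
        String[] x = a.split("\\."), y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) return Integer.compare(xi, yi);
        }
        return 0;
    }

    static boolean isAffected(String v) {
        return (cmp(v, "2.8.0") >= 0 && cmp(v, "3.9.2") < 0)   // >= 2.8.0, < 3.9.2
            || (cmp(v, "4.0.0") >= 0 && cmp(v, "4.0.2") < 0)   // 4.0.x before 4.0.2
            || (cmp(v, "4.1.0") >= 0 && cmp(v, "4.1.2") < 0);  // 4.1.x before 4.1.2
    }

    public static void main(String[] args) {
        System.out.println(isAffected("3.7.1")); // true: EOL branch, upgrade required
        System.out.println(isAffected("3.9.2")); // false: patched
        System.out.println(isAffected("4.2.0")); // false: not affected
    }
}
```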

Remediation

If you are on a supported Kafka version (3.9.x, 4.0.x, or 4.1.x)

Upgrade to the fix version for your branch:

  • Kafka 3.9.x: upgrade to 3.9.2
  • Kafka 4.0.x: upgrade to 4.0.2
  • Kafka 4.1.x: upgrade to 4.1.2
  • Kafka 4.2.0 and later: not affected

These are patch-level upgrades within your current branch. Since client version 0.10.2, the Java client has been bidirectionally compatible with older brokers, so in most deployments you can upgrade your producer clients independently, without a broker upgrade. Test in a staging environment first, then rolling-deploy to production.
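For Maven builds this is a one-line version bump. The coordinates below are the real kafka-clients artifact; 3.9.2 stands in for whichever fix version matches your branch:

```xml
<!-- pom.xml: pin kafka-clients to the patched release for your branch -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>3.9.2</version>
</dependency>
```

If kafka-clients reaches you transitively (for example via spring-kafka), override the transitive version in dependencyManagement, or upgrade the framework to a release that already depends on a patched client, and re-run the dependency-tree check to confirm the resolved version.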

If you are running EOL Kafka branches (2.8.x through 3.8.x)

There is no in-branch fix for any version before 3.9.x. Upgrading to 3.9.2 is the lower-friction path for teams on 3.0.x through 3.8.x. Upgrading to 4.x is also an option.

If migrating is not feasible on your current timeline, contact HeroDevs for more information about our NES solutions.

Interim risk reduction (not a fix): Increasing delivery.timeout.ms reduces the probability of triggering the race condition. This does not eliminate the vulnerability. Use it only as a temporary measure while you plan the upgrade.
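As a sketch of that interim measure, the standard producer properties below raise delivery.timeout.ms well above its 120000 ms default. The 600000 ms figure is an illustrative choice, not a recommendation from the advisory, and Kafka requires delivery.timeout.ms to be at least request.timeout.ms + linger.ms or the producer will reject the configuration at startup:

```java
import java.util.Properties;

public class InterimMitigation {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // placeholder address
        // Raise batch expiry far above the 120000 ms default so batches
        // rarely expire while a request is in flight. Risk reduction only,
        // not a fix: the race condition is still present in the client.
        props.put("delivery.timeout.ms", "600000");
        // Must satisfy: delivery.timeout.ms >= request.timeout.ms + linger.ms
        props.put("request.timeout.ms", "30000");
        props.put("linger.ms", "5");
        System.out.println(props.getProperty("delivery.timeout.ms"));
    }
}
```

These are the stock producer configuration keys; pass the same Properties object to the KafkaProducer constructor in an actual application.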


Author
Mark Szymanski
Technical Product Manager / Product Owner (Java)