Fix Kafka Broker Log Segment Rolling Latency Spikes

Introduction

Kafka brokers roll log segments when they reach a configured size or time threshold. During rolling, the broker closes the current active segment file and opens a new one. Under peak write throughput, this file operation can cause brief but noticeable write latency spikes that cascade into producer timeouts, especially when segment size is misconfigured for the traffic volume.

Symptoms

Producer request latency spikes coincide with log segment rollover events
Broker logs show Rolling new log segment for partition during peak hours
Producer timeout rate increases at regular intervals matching segment roll frequency
p99 produce latency jumps from single-digit milliseconds to hundreds of milliseconds
Error message: Expiring record batch due to timeout

Common Causes

log.segment.bytes set too small, causing frequent segment rolls under high throughput
log.roll.hours triggering time-based rolls during peak traffic windows
Disk I/O bottleneck during segment file close and index flush operations
Too many partitions per broker amplifying concurrent segment roll operations
File system page cache pressure from simultaneous segment creation and index building

Step-by-Step Fix

1.Check current segment configuration and roll frequency: Identify how often segments are rolling.
2.```bash
3.grep "log.segment" /etc/kafka/server.properties
4.ls -lhS /var/lib/kafka/data/my-topic-0/ | head -20
5.`
6.Increase log segment size to reduce roll frequency during peak hours: Size segments for the workload.
7.```properties
8.log.segment.bytes=1073741824
9.log.roll.hours=168
10.log.roll.jitter.hours=24
11.`
12.Stagger segment roll timing with jitter: Prevent all partitions from rolling simultaneously.
13.```properties
14.log.roll.jitter.hours=24
15.`
16.Optimize disk I/O for segment operations: Ensure the broker disk can handle concurrent segment writes.
17.```bash
18.# Check disk I/O during segment roll
19.iostat -x 1 10 | grep -A1 "kafka-data"
20.`
21.Monitor produce latency around segment roll events: Verify the fix reduces latency spikes.
22.```bash
23.kafka-broker-api-versions.sh --bootstrap-server localhost:9092
24.# Monitor via JMX: kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce
25.`

Prevention

Set log.segment.bytes to 1GB or higher for high-throughput topics to reduce roll frequency
Configure log.roll.jitter.hours to stagger segment rolls across partitions
Monitor disk IOPS and throughput to ensure the storage tier can handle peak write load plus segment roll overhead
Use dedicated fast storage (NVMe SSD) for Kafka data directories to minimize segment roll latency
Set log.roll.hours to 7 days to rely primarily on size-based rolling rather than time-based

Kafka Broker Log Segment Rolling During Peak Write Throughput

Introduction

Symptoms

Common Causes

Step-by-Step Fix

Prevention

Share this guide

More Kafka Troubleshooting Guides

Kafka Schema Registry Backward Compatibility Check Rejecting New Version

Kafka Consumer Offset Commit Failed Group Coordinator Not Available

Kafka SASL SCRAM Authentication Failed During Credentials Rotation

Kafka Compacted Topic Log Cleanup Removing Active Keys

Kafka Producer Idempotence Lost After Broker Crash

Kafka ISR Shrinking Due to Slow Follower Replication Lag