Introduction
The Spring Boot Actuator /actuator/health endpoint aggregates the health status of all registered HealthIndicator beans: database connections, disk space, mail servers, custom services, and so on. The indicators run one after another on the request thread, so when any one of them is slow (e.g., a database connection pool taking 10 seconds to validate a connection), the entire health endpoint blocks until all indicators complete. Load balancer and Kubernetes health checks then time out, leading to unnecessary pod restarts and false alerts in monitoring systems.
Symptoms
The health endpoint takes several seconds:
$ curl -w "%{time_total}s" http://localhost:8080/actuator/health
{"status":"UP","components":{...}}
12.543s
Kubernetes events show health check failures:
Warning Unhealthy 30s kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 30s kubelet Liveness probe failed: Get "http://10.0.1.50:8080/actuator/health": context deadline exceeded
The slow indicator is visible in the response:
{
  "status": "UP",
  "components": {
    "db": {
      "status": "UP",
      "details": {
        "database": "PostgreSQL",
        "validationQuery": "isValid"
      }
    },
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 107374182400,
        "free": 53687091200
      }
    },
    "mail": {
      "status": "UNKNOWN",
      "details": {
        "error": "java.net.SocketTimeoutException: connect timed out"
      }
    }
  }
}
Common Causes
- Database health check validates connection: Default `DataSourceHealthIndicator` calls `Connection.isValid()`, which may block
- Remote service health check with no timeout: Custom `HealthIndicator` calls an external service without a timeout
- Too many health indicators: Each indicator adds latency, and they run sequentially
- Disk space check on a slow filesystem: Default `DiskSpaceHealthIndicator` checks the root path, which may be on a slow NFS mount
- Mail server health check: `MailHealthIndicator` tries to connect to a mail server that is slow or unreachable
- Liveness vs readiness not separated: Kubernetes uses the same endpoint for both probes, which have different timeout requirements
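Because the indicators run sequentially, the worst-case response time is roughly the sum of the individual latencies; once each call is bounded by a timeout, it is at most the sum of the per-indicator caps. The sketch below is plain Java arithmetic rather than Actuator code, and the class name `HealthLatencyModel` and the latency figures are illustrative assumptions.

```java
import java.util.List;

public class HealthLatencyModel {

    // Indicators run one after another, so the endpoint latency is their sum.
    public static long sequentialMillis(List<Long> latencies) {
        return latencies.stream().mapToLong(Long::longValue).sum();
    }

    // With a per-indicator timeout, each call contributes at most timeoutMillis.
    public static long boundedMillis(List<Long> latencies, long timeoutMillis) {
        return latencies.stream().mapToLong(l -> Math.min(l, timeoutMillis)).sum();
    }

    public static void main(String[] args) {
        // Hypothetical latencies in milliseconds: db, diskSpace, mail.
        List<Long> latencies = List.of(40L, 5L, 12_000L);
        System.out.println(sequentialMillis(latencies));      // 12045
        System.out.println(boundedMillis(latencies, 3_000L)); // 3045
    }
}
```

A single unreachable mail server dominates the total in this example, which is why one slow indicator is enough to fail a probe with a timeout of a few seconds.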
Step-by-Step Fix
Step 1: Add timeout to health indicators
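```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.boot.actuate.jdbc.DataSourceHealthIndicator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HealthCheckConfig {

    // A bean named "dbHealthIndicator" replaces the auto-configured database indicator.
    @Bean
    public HealthIndicator dbHealthIndicator(DataSource dataSource) {
        DataSourceHealthIndicator delegate =
                new DataSourceHealthIndicator(dataSource, "SELECT 1");
        // DataSourceHealthIndicator has no timeout setter, so bound the call here.
        return () -> {
            try {
                // 3 second timeout
                return CompletableFuture.supplyAsync(delegate::health)
                        .get(3, TimeUnit.SECONDS);
            } catch (Exception ex) {
                return Health.down(ex).build();
            }
        };
    }
}
```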
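For custom indicators that call external services, one way to enforce a timeout is to run the check on another thread and stop waiting after a deadline. The following is a minimal sketch in plain Java, without Spring types; `BoundedCheck` and `Status` are hypothetical names, and in a real indicator the result would be mapped to `Health.up()` or `Health.down()`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class BoundedCheck {

    public enum Status { UP, DOWN }

    // Runs the check on another thread and reports DOWN if it fails
    // or does not finish within timeoutMillis.
    public static Status run(Supplier<Status> check, long timeoutMillis) {
        CompletableFuture<Status> future = CompletableFuture.supplyAsync(check);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return Status.DOWN;
        } catch (Exception ex) {
            // Timeout or failure. cancel() stops the wait; it does not
            // interrupt a task that is already running.
            future.cancel(true);
            return Status.DOWN;
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(() -> Status.UP, 500));                           // UP
        System.out.println(run(() -> { sleep(2_000); return Status.UP; }, 200)); // DOWN
    }
}
```

The slow task keeps running in the background after the caller gives up, so the underlying client (JDBC driver, HTTP client, mail session) should still have its own connect and read timeouts configured.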
Step 2: Separate liveness and readiness probes
```yaml
# Health groups are declared in application.yml; each group is served
# at /actuator/health/<group>.
management:
  endpoint:
    health:
      group:
        # Readiness: only check database and app initialization
        readiness:
          include: db,ping
        # Liveness: minimal check, just the app itself
        liveness:
          include: ping
```
Then configure Kubernetes probes:
```yaml
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 30
  timeoutSeconds: 3
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 10
  timeoutSeconds: 5
  periodSeconds: 10
```
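Before deploying, it can help to confirm that each probe path answers within the probe's `timeoutSeconds`. The sketch below uses the JDK's `java.net.http.HttpClient` with the same timeouts as the probes above; the class name `ProbeCheck` and the base URL `http://localhost:8080` are assumptions for a locally running application.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class ProbeCheck {

    // Builds a GET request that gives up after the probe's timeoutSeconds.
    public static HttpRequest probeRequest(String baseUrl, String group, int timeoutSeconds) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/health/" + group))
                .timeout(Duration.ofSeconds(timeoutSeconds))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(1))
                .build();
        String baseUrl = "http://localhost:8080"; // assumption: app runs locally
        String[][] probes = {{"liveness", "3"}, {"readiness", "5"}};
        for (String[] probe : probes) {
            HttpRequest request = probeRequest(baseUrl, probe[0], Integer.parseInt(probe[1]));
            long start = System.nanoTime();
            // send() throws HttpTimeoutException if the response is too slow.
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(probe[0] + ": HTTP " + response.statusCode()
                    + " in " + millis + " ms");
        }
    }
}
```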
Step 3: Disable unnecessary health indicators
management:
  health:
    defaults:
      enabled: true
    mail:
      enabled: false # Skip mail server check
    elasticsearch:
      enabled: false # Skip Elasticsearch check
    db:
      enabled: true
    diskspace:
      enabled: true
      path: /app/data # Check specific path, not root
Step 4: Create a fast custom health indicator
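```java
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class AppStartupHealthIndicator implements HealthIndicator {

    // volatile: written by the event thread, read by health check requests
    private volatile boolean initialized = false;

    @EventListener(ApplicationReadyEvent.class)
    public void onApplicationReady() {
        this.initialized = true;
    }

    // No I/O here: the check only reads a flag, so it returns immediately.
    @Override
    public Health health() {
        if (initialized) {
            return Health.up().build();
        }
        return Health.down()
                .withDetail("reason", "Application not yet initialized")
                .build();
    }
}
```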
Prevention
- Always set timeouts on health indicators that call external services
- Use separate health groups for liveness (fast) and readiness (thorough) probes
- Disable health indicators for services that are not critical to application operation
- Monitor health endpoint response time in APM and alert on p99 > 1 second
- Use `management.endpoint.health.show-details=when-authorized` in production
- Add a startup endpoint (`/actuator/health/startup`) for Kubernetes startup probes
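The p99 alert rule above can be sketched as a nearest-rank percentile over recorded response times. This is an illustration only; an APM tool normally computes percentiles itself, and the class name `HealthLatencyAlert` and the sample values are hypothetical.

```java
import java.util.Arrays;

public class HealthLatencyAlert {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    public static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Alert when the 99th percentile exceeds 1 second.
    public static boolean shouldAlert(long[] samplesMillis) {
        return percentile(samplesMillis, 99.0) > 1_000;
    }

    public static void main(String[] args) {
        // Hypothetical data: 98 fast responses and 2 slow ones.
        long[] samples = new long[100];
        Arrays.fill(samples, 50);
        samples[10] = 2_500;
        samples[60] = 2_500;
        System.out.println(percentile(samples, 99.0)); // 2500
        System.out.println(shouldAlert(samples));      // true
    }
}
```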