Introduction

The Spring Boot Actuator /actuator/health endpoint aggregates the health status of all registered HealthIndicator beans (database connections, disk space, mail servers, custom services, and so on). When any one of these indicators is slow (e.g., a database connection pool taking 10 seconds to validate a connection), the entire health endpoint blocks until all indicators complete. Load balancer and Kubernetes health checks then time out, which leads to unnecessary pod restarts and false alerts in monitoring systems.
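The blocking behaviour described above can be illustrated with a toy model in plain Java. This is only a sketch, not Actuator's implementation: `SequentialHealthDemo`, `aggregate`, and `simulated` are hypothetical names, and the sleeps stand in for real checks.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Toy model of a health endpoint that runs its checks one after another. */
final class SequentialHealthDemo {

    /** Runs each check in registration order and collects the statuses. */
    static Map<String, String> aggregate(Map<String, Supplier<String>> indicators) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<String>> entry : indicators.entrySet()) {
            // The loop cannot move on until this check returns.
            result.put(entry.getKey(), entry.getValue().get());
        }
        return result;
    }

    /** Simulates a check that takes the given number of milliseconds. */
    static Supplier<String> simulated(long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                return "UNKNOWN";
            }
            return "UP";
        };
    }

    public static void main(String[] args) {
        Map<String, Supplier<String>> indicators = new LinkedHashMap<>();
        indicators.put("diskSpace", simulated(5));  // fast
        indicators.put("db", simulated(300));       // slow
        indicators.put("ping", simulated(1));       // fast

        long start = System.nanoTime();
        Map<String, String> statuses = aggregate(indicators);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        // Elapsed time is roughly the sum of all checks, dominated by "db".
        System.out.println(statuses + " in " + elapsedMs + " ms");
    }
}
```

Because the checks run in sequence, the total response time is approximately the sum of the individual check times, so a single slow indicator sets a floor for the whole endpoint.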

Symptoms

The health endpoint takes several seconds to respond:

```bash
$ curl -w "%{time_total}s" http://localhost:8080/actuator/health
{"status":"UP","components":{...}}
12.543s
```

Kubernetes events show health check failures:

```text
Warning  Unhealthy  30s  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 503
Warning  Unhealthy  30s  kubelet  Liveness probe failed: Get "http://10.0.1.50:8080/actuator/health": context deadline exceeded
```

The slow indicator is visible in the response:

```json
{
  "status": "UP",
  "components": {
    "db": {
      "status": "UP",
      "details": {
        "database": "PostgreSQL",
        "validationQuery": "isValid()"
      }
    },
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 107374182400,
        "free": 53687091200
      }
    },
    "mail": {
      "status": "UNKNOWN",
      "details": {
        "error": "java.net.SocketTimeoutException: connect timed out"
      }
    }
  }
}
```

Common Causes

  • Database health check validates connection: Default DataSourceHealthIndicator calls Connection.isValid() which may block
  • Remote service health check with no timeout: Custom HealthIndicator calls an external service without timeout
  • Too many health indicators: Each indicator adds latency, and they run sequentially
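  • Disk space check on a slow filesystem: DiskSpaceHealthIndicator queries the free space of its configured path (by default the application's working directory), and that call can stall when the path is on a slow or unresponsive NFS mount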
  • Mail server health check: MailHealthIndicator tries to connect to a mail server that is slow or unreachable
  • Liveness vs readiness not separated: Kubernetes uses the same endpoint for both probes with different timeout requirements
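For the second cause in the list, a remote check with no timeout, the following is a minimal sketch of a bounded probe that uses only the JDK's `java.net.http` client. `RemoteServiceCheck` and its constructor parameters are hypothetical names chosen for illustration; in a Spring application the `isUp()` logic would typically sit inside a custom `HealthIndicator.health()` method.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Probes a remote service with explicit connect and response timeouts. */
final class RemoteServiceCheck {

    private final HttpClient client;
    private final URI target;
    private final Duration requestTimeout;

    RemoteServiceCheck(URI target, Duration connectTimeout, Duration requestTimeout) {
        this.client = HttpClient.newBuilder()
                .connectTimeout(connectTimeout)  // bound connection setup
                .build();
        this.target = target;
        this.requestTimeout = requestTimeout;
    }

    /** Returns true only if the service answers with a 2xx status in time. */
    boolean isUp() {
        HttpRequest request = HttpRequest.newBuilder(target)
                .timeout(requestTimeout)         // bound the wait for the response
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() / 100 == 2;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        } catch (Exception ex) {
            // Covers HttpTimeoutException and connection failures.
            return false;
        }
    }
}
```

With both timeouts set, the worst-case duration of the check is roughly the connect timeout plus the request timeout, which can be kept well below the probe timeout configured in Kubernetes.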

Step-by-Step Fix

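Step 1: Add a timeout to health indicators

DataSourceHealthIndicator has no timeout setting of its own, so wrap it and bound the call:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.boot.actuate.jdbc.DataSourceHealthIndicator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HealthCheckConfig {

    // Daemon threads, so a hung check never blocks JVM shutdown
    private final ExecutorService executor = Executors.newCachedThreadPool(task -> {
        Thread thread = new Thread(task, "db-health-check");
        thread.setDaemon(true);
        return thread;
    });

    // The bean name "dbHealthIndicator" replaces the auto-configured "db" indicator
    @Bean
    public HealthIndicator dbHealthIndicator(DataSource dataSource) {
        DataSourceHealthIndicator delegate = new DataSourceHealthIndicator(dataSource, "SELECT 1");
        return () -> {
            Future<Health> result = executor.submit(() -> delegate.health());
            try {
                return result.get(3, TimeUnit.SECONDS); // 3 second timeout
            } catch (Exception ex) {
                result.cancel(true); // interrupt the stuck check
                return Health.down(ex).build();
            }
        };
    }
}
```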

Step 2: Separate liveness and readiness probes

Declare one health group per probe in application.yml:

```yaml
management:
  endpoint:
    health:
      group:
        readiness:
          include: db,ping    # Readiness: only check database and app initialization
        liveness:
          include: ping       # Liveness: minimal check, just the app itself
```

Then configure Kubernetes probes:

```yaml
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 30
  timeoutSeconds: 3
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 10
  timeoutSeconds: 5
  periodSeconds: 10
```

Step 3: Disable unnecessary health indicators

```yaml
management:
  health:
    defaults:
      enabled: true
    mail:
      enabled: false      # Skip mail server check
    elasticsearch:
      enabled: false      # Skip Elasticsearch check
    db:
      enabled: true
    diskspace:
      enabled: true
      path: /app/data     # Check specific path, not root
```

Step 4: Create a fast custom health indicator

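```java
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class AppStartupHealthIndicator implements HealthIndicator {

    // volatile: written by the event thread, read by request threads
    private volatile boolean initialized = false;

    @EventListener(ApplicationReadyEvent.class)
    public void onApplicationReady() {
        this.initialized = true;
    }

    @Override
    public Health health() {
        // Reads a flag only, so the check returns immediately
        if (initialized) {
            return Health.up().build();
        }
        return Health.down().withDetail("reason", "Application not yet initialized").build();
    }
}
```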
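A related way to keep an indicator fast, shown here as a sketch of a different technique from the startup flag above, is to run a slow check on a background schedule and have the indicator return only the last known result. The example uses plain JDK classes; `CachedCheck`, `status()`, and the string statuses are hypothetical, and in a Spring application `status()` would be called from `HealthIndicator.health()`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Keeps the most recent result of a slow check so that reads never block. */
final class CachedCheck implements AutoCloseable {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "cached-health-check");
                thread.setDaemon(true); // never keeps the JVM alive
                return thread;
            });

    // Written by the refresh thread, read by probe threads.
    private volatile String lastStatus = "UNKNOWN";

    CachedCheck(Supplier<String> slowCheck, long periodMillis) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                lastStatus = slowCheck.get();
            } catch (RuntimeException ex) {
                lastStatus = "DOWN"; // a failing check must not cancel the schedule
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Returns immediately with the last known status. */
    String status() {
        return lastStatus;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The trade-off is staleness: the reported status can lag reality by up to one refresh period plus the duration of the check, so the period should be chosen with the probe interval in mind.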

Prevention

  • Always set timeouts on health indicators that call external services
  • Use separate health groups for liveness (fast) and readiness (thorough) probes
  • Disable health indicators for services that are not critical to application operation
  • Monitor health endpoint response time in APM and alert on p99 > 1 second
  • Use management.endpoint.health.show-details=when-authorized in production
  • Define a dedicated health group (for example, one served at /actuator/health/startup) for Kubernetes startup probes