# Fix AWS CloudWatch MaxConnectionsExceeded Alarm

Your CloudWatch alarm fires at 3 AM:

```text
ALARM: "RDS-High-Connection-Count" in region us-east-1
Threshold: 80% of max_connections
Current: 92%
```

And your application starts returning errors:

```text
FATAL: remaining connection slots are reserved for non-replication superuser connections
```

```text
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.
```

The database has hit its maximum connection limit. New connections are rejected, causing application failures.

## Real Scenario: E-Commerce Site Outage

An e-commerce company experienced intermittent outages during flash sales. Their RDS PostgreSQL instance (db.r5.large) had a default max_connections of 340. During peak traffic, CloudWatch showed connection counts hitting 340, and users couldn't complete purchases.

Initial diagnosis:

```bash
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name DatabaseConnections \
    --dimensions Name=DBInstanceIdentifier,Value=production-db \
    --start-time 2026-04-23T00:00:00Z \
    --end-time 2026-04-23T04:00:00Z \
    --period 300 \
    --statistics Maximum \
    --region us-east-1
```

Output showed connections peaked at 340 (the maximum).

Root cause: They had 8 application instances, each with a HikariCP pool of 50 connections:

```text
Total connections = 8 instances × 50 pool size = 400 connections
But max_connections = 340
```

The math didn't work: the fleet could demand 400 connections against a hard cap of 340.
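The shortfall is easy to reproduce with a few lines of arithmetic (a standalone sketch; the helper name is mine, the numbers come from the scenario above):

```javascript
// Total connection demand across all app instances vs. the database's hard limit.
// A positive result means the fleet can exhaust max_connections.
function connectionShortfall(instances, poolSize, maxConnections) {
  const demanded = instances * poolSize;
  return demanded - maxConnections;
}

// 8 instances × 50 pool size against max_connections = 340
console.log(connectionShortfall(8, 50, 340)); // 60 connections over the limit
```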

## Immediate Diagnosis

### Step 1: Check Current Connection Count

**PostgreSQL RDS:**

```bash
# Get connection count via psql
psql -h production-db.xxxxx.us-east-1.rds.amazonaws.com \
     -U postgres \
     -c "SELECT count(*) as total_connections,
                (SELECT setting FROM pg_settings WHERE name = 'max_connections') as max_connections,
                round(100.0 * count(*) / (SELECT setting::int FROM pg_settings WHERE name = 'max_connections'), 2) as utilization_pct
         FROM pg_stat_activity;"
```

Output:

```text
 total_connections | max_connections | utilization_pct
-------------------+-----------------+-----------------
               312 |             340 |           91.76
(1 row)
```

**MySQL RDS:**

```bash
mysql -h production-db.xxxxx.us-east-1.rds.amazonaws.com \
      -u admin -p \
      -e "SHOW STATUS LIKE 'Threads_connected';
          SHOW VARIABLES LIKE 'max_connections';"
```

Output:

```text
Variable_name: Threads_connected
Value: 285

Variable_name: max_connections
Value: 340
```

### Step 2: Identify Connection Sources

**PostgreSQL - who's connecting?**

```sql
SELECT
    client_addr,
    usename AS username,
    application_name,
    state,
    count(*) AS connection_count,
    array_agg(DISTINCT datname) AS databases
FROM pg_stat_activity
WHERE client_addr IS NOT NULL
GROUP BY client_addr, usename, application_name, state
ORDER BY connection_count DESC;
```

Output:

```text
 client_addr  | username | application_name | state  | connection_count | databases
--------------+----------+------------------+--------+------------------+--------------
 10.0.1.45    | app_user | myapp-api        | active |               48 | {production}
 10.0.1.46    | app_user | myapp-api        | active |               47 | {production}
 10.0.1.47    | app_user | myapp-api        | active |               47 | {production}
 10.0.1.48    | app_user | myapp-api        | active |               48 | {production}
 10.0.2.100   | readonly | analytics        | idle   |               25 | {production}
 10.0.3.50    | admin    | psql             | idle   |                5 | {production}
```

This shows 4 API servers with ~48 connections each, plus analytics and admin connections.

**MySQL - who's connecting?**

```sql
SELECT
    USER,
    HOST,
    DB,
    COMMAND,
    COUNT(*) AS connection_count
FROM information_schema.PROCESSLIST
GROUP BY USER, HOST, DB, COMMAND
ORDER BY connection_count DESC;
```

### Step 3: Find Idle Connections

**PostgreSQL - connections idle for more than 5 minutes:**

```sql
SELECT
    pid,
    usename,
    application_name,
    client_addr,
    now() - query_start AS idle_duration,
    state,
    query
FROM pg_stat_activity
WHERE state = 'idle'
  AND (now() - query_start) > interval '5 minutes'
ORDER BY idle_duration DESC;
```

Output:

```text
  pid  | usename  | application_name | client_addr | idle_duration | state |                query
-------+----------+------------------+-------------+---------------+-------+---------------------------------------
 12345 | app_user | myapp-api        | 10.0.1.45   | 00:15:23      | idle  | SELECT * FROM users WHERE id = 123
 12346 | app_user | myapp-api        | 10.0.1.45   | 00:12:45      | idle  | SELECT * FROM products WHERE id = 456
 12347 | readonly | analytics        | 10.0.2.100  | 00:45:12      | idle  | SELECT COUNT(*) FROM orders
```

Connections idle this long usually point to connection leaks or an oversized pool.

## Root Causes and Solutions

### Cause 1: Connection Pool Too Large

Each application instance opens its own connection pool. If you have many instances, total connections exceed the database limit.

**Calculate maximum pool size:**

```text
Max pool size = (DB max_connections × 0.8) / number_of_instances
```

For db.r5.large (max_connections = 340) with 8 instances:

```text
Max pool size = (340 × 0.8) / 8 = 34 connections per instance
```
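The same rule of thumb can be expressed as a small helper (a sketch; the function name and the 20% headroom default are mine, following the formula above):

```javascript
// Largest safe per-instance pool size, keeping 20% of max_connections in reserve
// for admin sessions, replication, and monitoring.
function maxPoolSize(maxConnections, instances, headroom = 0.8) {
  return Math.floor((maxConnections * headroom) / instances);
}

// db.r5.large (340 max_connections) with 8 app instances
console.log(maxPoolSize(340, 8)); // 34 connections per instance
```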

**Fix HikariCP configuration:**

```yaml
# application.yml
spring:
  datasource:
    hikari:
      maximum-pool-size: 30      # Reduced from 50
      minimum-idle: 5
      connection-timeout: 30000
      idle-timeout: 600000
      max-lifetime: 1800000
      leak-detection-threshold: 60000  # Alert on leaks
```

**Fix Node.js pg-pool:**

```javascript
const { Pool } = require('pg');

const pool = new Pool({
  max: 30,                        // Maximum connections per instance
  min: 5,
  idleTimeoutMillis: 60000,
  connectionTimeoutMillis: 30000
});
```

### Cause 2: Connection Leaks

Application code doesn't properly release connections back to the pool.

**Find leaked connections in PostgreSQL:**

```sql
-- Find connections idle in transaction for > 1 minute
SELECT
    pid,
    usename,
    application_name,
    now() - xact_start AS transaction_duration,
    query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND (now() - xact_start) > interval '1 minute'
ORDER BY transaction_duration DESC;
```

**Fix Java connection leak:**

```java
// WRONG - leaks the connection if an exception is thrown
public User getUser(Long id) throws SQLException {
    Connection conn = dataSource.getConnection();
    PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users WHERE id = ?");
    stmt.setLong(1, id);
    ResultSet rs = stmt.executeQuery();
    rs.next();
    User user = mapUser(rs);
    conn.close(); // Never reached if an exception is thrown above!
    return user;
}

// CORRECT - try-with-resources
public User getUser(Long id) throws SQLException {
    try (Connection conn = dataSource.getConnection();
         PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users WHERE id = ?")) {
        stmt.setLong(1, id);
        try (ResultSet rs = stmt.executeQuery()) {
            rs.next();
            return mapUser(rs);
        }
    } // All resources automatically closed
}
```

**Fix Node.js connection leak:**

```javascript
// WRONG - leaks on error
async function getUser(id) {
  const client = await pool.connect();
  const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
  client.release(); // Never reached if the query throws
  return result.rows[0];
}

// CORRECT - use finally
async function getUser(id) {
  const client = await pool.connect();
  try {
    const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
    return result.rows[0];
  } finally {
    client.release(); // Always executed
  }
}

// BETTER - use pool.query directly for single statements
async function getUser(id) {
  const result = await pool.query('SELECT * FROM users WHERE id = $1', [id]);
  return result.rows[0]; // Connection automatically managed
}
```

### Cause 3: Long-Running Transactions

Transactions holding connections for extended periods (batch jobs, reports).

**Find long-running transactions:**

```sql
SELECT
    pid,
    usename,
    application_name,
    now() - xact_start AS duration,
    state,
    left(query, 100) AS query_preview
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY duration DESC
LIMIT 10;
```

**Fix - set transaction timeouts:**

```ini
# PostgreSQL parameter group (values in milliseconds)
idle_in_transaction_session_timeout = 300000  # 5 minutes
statement_timeout = 60000                     # 1 minute per statement
lock_timeout = 30000                          # 30 seconds for locks
```

### Cause 4: Too Many Application Instances

Horizontal scaling increased instances without adjusting pool sizes.

**Calculate max instances:**

```text
Max instances = (DB max_connections × 0.8) / pool_size_per_instance
```

For db.r5.large (340 max) with pool size 30:

```text
Max instances = (340 × 0.8) / 30 ≈ 9 instances
```

**Solutions:**

1. Use PgBouncer for connection pooling
2. Scale vertically (a larger instance class allows more connections)
3. Add read replicas to offload read traffic
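The instance-count ceiling can be sketched the same way as the pool-size calculation (the helper name and headroom default are mine, mirroring the formula above):

```javascript
// How many app instances a database can support at a given per-instance pool
// size, keeping 20% of max_connections in reserve.
function maxInstances(maxConnections, poolSize, headroom = 0.8) {
  return Math.floor((maxConnections * headroom) / poolSize);
}

// db.r5.large (340 max_connections) with pool size 30
console.log(maxInstances(340, 30)); // 9 instances
```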

## Solution: Deploy PgBouncer

PgBouncer sits between the applications and the database, multiplexing many client connections onto a small pool of server connections:

```text
Application (1000 connections) → PgBouncer (100 connections) → PostgreSQL
```
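The benefit is the fan-in ratio: each server-side connection is shared by many clients (a quick sketch; the helper name is mine, the numbers come from the diagram above):

```javascript
// Fan-in ratio under transaction pooling: how many client connections
// share each server-side connection on average.
function fanIn(clientConnections, serverConnections) {
  return clientConnections / serverConnections;
}

console.log(fanIn(1000, 100)); // each server connection serves ~10 clients
```

This works because in transaction pooling mode a server connection is only occupied for the duration of a transaction, not for the lifetime of the client session.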

**Deploy PgBouncer on ECS:**

```yaml
# pgbouncer-task-definition.yaml
family: pgbouncer
containerDefinitions:
  - name: pgbouncer
    image: edoburu/pgbouncer:latest
    essential: true
    environment:
      - name: DATABASE_URL
        value: "postgres://app_user:password@production-db.xxxxx.rds.amazonaws.com/production"
      - name: POOL_MODE
        value: "transaction"
      - name: MAX_CLIENT_CONN
        value: "1000"
      - name: DEFAULT_POOL_SIZE
        value: "25"
      - name: MIN_POOL_SIZE
        value: "5"
      - name: RESERVE_POOL_SIZE
        value: "5"
      - name: SERVER_IDLE_TIMEOUT
        value: "300"
    portMappings:
      - containerPort: 5432
        hostPort: 5432
```

**PgBouncer configuration file:**

```ini
[databases]
production = host=production-db.xxxxx.rds.amazonaws.com port=5432 dbname=production

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 5432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 300
client_idle_timeout = 0
```

**Update the application to connect to PgBouncer:**

```yaml
spring:
  datasource:
    url: jdbc:postgresql://pgbouncer.internal:5432/production
    hikari:
      maximum-pool-size: 50  # Can be higher now
```

## Solution: Increase Max Connections

A quick fix, but static parameters such as max_connections require an instance reboot to take effect:

```bash
# Create custom parameter group
aws rds create-db-parameter-group \
    --db-parameter-group-name custom-postgres15 \
    --db-parameter-group-family postgres15 \
    --description "Custom PostgreSQL 15 parameters"

# Modify max_connections
aws rds modify-db-parameter-group \
    --db-parameter-group-name custom-postgres15 \
    --parameters "ParameterName=max_connections,ParameterValue=500,ApplyMethod=pending-reboot"

# Apply to instance
aws rds modify-db-instance \
    --db-instance-identifier production-db \
    --db-parameter-group-name custom-postgres15 \
    --apply-immediately

# Reboot instance
aws rds reboot-db-instance \
    --db-instance-identifier production-db
```

**Warning:** more connections require more memory. Each connection uses roughly 10 MB of RAM.
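A rough cost estimate before raising the limit (a sketch using the ~10 MB per connection figure above; the helper name is mine):

```javascript
// Approximate memory consumed if every connection slot is in use,
// assuming ~10 MB per PostgreSQL backend process.
function connectionMemoryMB(maxConnections, mbPerConnection = 10) {
  return maxConnections * mbPerConnection;
}

console.log(connectionMemoryMB(500)); // ~5000 MB (~5 GB) at full utilization
```

On a 16 GB instance, that memory competes directly with shared buffers and the OS page cache, which is why raising max_connections is not free.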

## Connection Limits by Instance Class

| Instance Class | vCPU | Memory | Default max_connections (PostgreSQL) |
|----------------|------|--------|--------------------------------------|
| db.t3.micro    | 2    | 1 GB   | 55                                   |
| db.t3.small    | 2    | 2 GB   | 85                                   |
| db.t3.medium   | 2    | 4 GB   | 170                                  |
| db.t3.large    | 2    | 8 GB   | 170                                  |
| db.r5.large    | 2    | 16 GB  | 340                                  |
| db.r5.xlarge   | 4    | 32 GB  | 680                                  |
| db.r5.2xlarge  | 8    | 64 GB  | 1360                                 |
| db.r5.4xlarge  | 16   | 128 GB | 2720                                 |

Formula: max_connections ≈ (Memory in MB × 0.4) / 10MB per connection

## CloudWatch Alarm Configuration

Set up proactive alerting:

```bash
aws cloudwatch put-metric-alarm \
    --alarm-name "RDS-High-Connection-Count" \
    --alarm-description "Alert when database connections exceed 80% of max" \
    --namespace AWS/RDS \
    --metric-name DatabaseConnections \
    --dimensions Name=DBInstanceIdentifier,Value=production-db \
    --statistic Maximum \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 272 \
    --comparison-operator GreaterThanThreshold \
    --treat-missing-data notBreaching \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts \
    --ok-actions arn:aws:sns:us-east-1:123456789012:alerts
```

Calculate threshold: max_connections × 0.8 = 340 × 0.8 = 272
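The threshold calculation above can be scripted when you manage alarms for several instances (a sketch; the helper name is mine):

```javascript
// Alarm threshold at a given fraction of max_connections (default 80%),
// rounded down so the alarm fires before the limit is reached.
function alarmThreshold(maxConnections, pct = 0.8) {
  return Math.floor(maxConnections * pct);
}

console.log(alarmThreshold(340)); // 272
```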

## Emergency: Kill Idle Connections

If the database is completely blocked:

```sql
-- PostgreSQL: terminate idle connections older than 10 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND (now() - query_start) > interval '10 minutes'
  AND pid <> pg_backend_pid();

-- Or terminate all connections from a specific application
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE application_name = 'problematic-app'
  AND pid <> pg_backend_pid();
```

```sql
-- MySQL: generate KILL statements for connections sleeping > 10 minutes
-- (execute the generated statements to actually terminate them)
SELECT CONCAT('KILL ', id, ';')
FROM information_schema.PROCESSLIST
WHERE Command = 'Sleep'
  AND Time > 600;
```

## Checklist for Connection Issues

1. **Check connection count:**

    ```bash
    aws cloudwatch get-metric-statistics --metric-name DatabaseConnections ...
    ```

2. **Identify connection sources:**

    ```sql
    SELECT client_addr, count(*) FROM pg_stat_activity GROUP BY client_addr;
    ```

3. **Find idle connections:**

    ```sql
    SELECT count(*) FROM pg_stat_activity WHERE state = 'idle';
    ```

4. **Verify pool configuration:**

    ```bash
    # Check HikariCP settings
    curl http://localhost:8080/actuator/configprops | jq '.contexts.application.beans.hikariDataSource'
    ```

5. **Confirm the pool size is appropriate:**

    ```text
    Total connections = instances × pool_size
    Should be < max_connections × 0.8
    ```

6. **Check for connection leaks:**

    ```sql
    SELECT count(*) FROM pg_stat_activity
    WHERE state = 'idle' AND (now() - query_start) > interval '5 minutes';
    ```