# Fix AWS CloudWatch MaxConnectionsExceeded Alarm
Your CloudWatch alarm fires at 3 AM:

```
ALARM: "RDS-High-Connection-Count" in region us-east-1
Threshold: 80% of max_connections
Current: 92%
```

And your application starts returning errors:

```
FATAL: remaining connection slots are reserved for non-replication superuser connections
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.
```

The database has hit its maximum connection limit. New connections are rejected, causing application failures.
## Real Scenario: E-Commerce Site Outage

An e-commerce company experienced intermittent outages during flash sales. Their RDS PostgreSQL instance (db.r5.large) had a default max_connections of 340. During peak traffic, CloudWatch showed connection counts hitting 340, and users couldn't complete purchases.
Initial diagnosis:
```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name DatabaseConnections \
  --dimensions Name=DBInstanceIdentifier,Value=production-db \
  --start-time 2026-04-23T00:00:00Z \
  --end-time 2026-04-23T04:00:00Z \
  --period 300 \
  --statistics Maximum \
  --region us-east-1
```

Output showed connections peaked at 340 (the maximum).
Root cause: They had 8 application instances, each with a HikariCP pool of 50 connections:
```
Total connections = 8 instances × 50 pool size = 400 connections
But max_connections = 340
```

The math didn't work.
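The mismatch above can be caught with a few lines of arithmetic before it causes an outage. A minimal sketch (the 20% headroom for admin, replication, and superuser-reserved slots is an assumption carried through the rest of this article):

```javascript
// Check whether the fleet's combined pool size fits under the database limit,
// keeping ~20% headroom for admin sessions and reserved superuser slots.
function checkConnectionBudget(instances, poolSize, maxConnections, headroom = 0.8) {
  const total = instances * poolSize;
  const budget = Math.floor(maxConnections * headroom);
  return { total, budget, fits: total <= budget };
}

// The failing configuration from the outage: 8 × 50 = 400, over the 272 budget
console.log(checkConnectionBudget(8, 50, 340));
// The corrected configuration: 8 × 30 = 240, within budget
console.log(checkConnectionBudget(8, 30, 340));
```

Running this check in CI whenever instance counts or pool sizes change keeps the math honest.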
## Immediate Diagnosis

### Step 1: Check Current Connection Count
PostgreSQL RDS:
```bash
# Get connection count via psql
psql -h production-db.xxxxx.us-east-1.rds.amazonaws.com \
  -U postgres \
  -c "SELECT count(*) AS total_connections,
        (SELECT setting FROM pg_settings WHERE name = 'max_connections') AS max_connections,
        round(100.0 * count(*) / (SELECT setting::int FROM pg_settings WHERE name = 'max_connections'), 2) AS utilization_pct
      FROM pg_stat_activity;"
```

Output:
```
 total_connections | max_connections | utilization_pct
-------------------+-----------------+-----------------
               312 |             340 |           91.76
(1 row)
```

MySQL RDS:
```bash
mysql -h production-db.xxxxx.us-east-1.rds.amazonaws.com \
  -u admin -p \
  -e "SHOW STATUS LIKE 'Threads_connected';
      SHOW VARIABLES LIKE 'max_connections';"
```

Output:
```
Variable_name: Threads_connected
Value: 285
Variable_name: max_connections
Value: 340
```
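Both outputs reduce to a single utilization number. A small helper that mirrors the rounding the psql query above performs, fed with the sample figures from these outputs:

```javascript
// Percentage of max_connections currently in use, rounded to two decimals.
function utilizationPct(current, maxConnections) {
  return Math.round((100 * current / maxConnections) * 100) / 100;
}

console.log(utilizationPct(312, 340)); // PostgreSQL example → 91.76
console.log(utilizationPct(285, 340)); // MySQL example → 83.82
```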
### Step 2: Identify Connection Sources
PostgreSQL - Who's connecting?
```sql
SELECT
    client_addr,
    usename AS username,
    application_name,
    state,
    count(*) AS connection_count,
    array_agg(DISTINCT datname) AS databases
FROM pg_stat_activity
WHERE client_addr IS NOT NULL
GROUP BY client_addr, usename, application_name, state
ORDER BY connection_count DESC;
```

Output:
```
  client_addr   | username | application_name | state  | connection_count | databases
----------------+----------+------------------+--------+------------------+--------------
 10.0.1.45      | app_user | myapp-api        | active |               48 | {production}
 10.0.1.46      | app_user | myapp-api        | active |               47 | {production}
 10.0.1.47      | app_user | myapp-api        | active |               47 | {production}
 10.0.1.48      | app_user | myapp-api        | active |               48 | {production}
 10.0.2.100     | readonly | analytics        | idle   |               25 | {production}
 10.0.3.50      | admin    | psql             | idle   |                5 | {production}
```

This shows 4 API servers with ~48 connections each, plus analytics and admin connections.
MySQL - Who's connecting?
```sql
SELECT
    USER,
    HOST,
    DB,
    COMMAND,
    COUNT(*) AS connection_count
FROM information_schema.PROCESSLIST
GROUP BY USER, HOST, DB, COMMAND
ORDER BY connection_count DESC;
```

### Step 3: Find Idle Connections
PostgreSQL - Connections idle for > 5 minutes:
```sql
SELECT
    pid,
    usename,
    application_name,
    client_addr,
    now() - query_start AS idle_duration,
    state,
    query
FROM pg_stat_activity
WHERE state = 'idle'
  AND (now() - query_start) > interval '5 minutes'
ORDER BY idle_duration DESC;
```

Output:
```
  pid  | usename  | application_name | client_addr | idle_duration | state | query
-------+----------+------------------+-------------+---------------+-------+--------------------------------------
 12345 | app_user | myapp-api        | 10.0.1.45   | 00:15:23      | idle  | SELECT * FROM users WHERE id = 123
 12346 | app_user | myapp-api        | 10.0.1.45   | 00:12:45      | idle  | SELECT * FROM products WHERE id = 456
 12347 | readonly | analytics        | 10.0.2.100  | 00:45:12      | idle  | SELECT COUNT(*) FROM orders
```

These idle connections are likely from connection leaks.
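The same five-minute filter can also run in a monitoring script against rows fetched from pg_stat_activity. A minimal sketch: the row shape here (`pid`, `state`, `idleMs`) is invented for illustration, with `idleMs` standing in for `now() - query_start`:

```javascript
// Flag connections that have sat idle longer than a threshold (default 5 min),
// worst offenders first. Each row mimics a pg_stat_activity record.
function findLikelyLeaks(rows, thresholdMs = 5 * 60 * 1000) {
  return rows
    .filter((r) => r.state === 'idle' && r.idleMs > thresholdMs)
    .sort((a, b) => b.idleMs - a.idleMs);
}

const sample = [
  { pid: 12345, applicationName: 'myapp-api', state: 'idle', idleMs: 15 * 60 * 1000 },
  { pid: 12350, applicationName: 'myapp-api', state: 'active', idleMs: 0 },
  { pid: 12347, applicationName: 'analytics', state: 'idle', idleMs: 45 * 60 * 1000 },
];
console.log(findLikelyLeaks(sample).map((r) => r.pid)); // → [ 12347, 12345 ]
```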
## Root Causes and Solutions

### Cause 1: Connection Pool Too Large
Each application instance opens its own connection pool. If you have many instances, total connections can exceed the database limit.
Calculate maximum pool size:

```
Max pool size = (DB max_connections × 0.8) / number_of_instances
```

For db.r5.large (max_connections = 340) with 8 instances:

```
Max pool size = (340 × 0.8) / 8 = 34 connections per instance
```

Fix HikariCP configuration:
```yaml
# application.yml
spring:
  datasource:
    hikari:
      maximum-pool-size: 30            # Reduced from 50
      minimum-idle: 5
      connection-timeout: 30000
      idle-timeout: 600000
      max-lifetime: 1800000
      leak-detection-threshold: 60000  # Alert on leaks
```

Fix Node.js pg-pool:
```javascript
const { Pool } = require('pg');

const pool = new Pool({
  max: 30,  // Maximum connections per instance
  min: 5,
  idleTimeoutMillis: 60000,
  connectionTimeoutMillis: 30000
});
```
### Cause 2: Connection Leaks
Application code doesn't properly release connections back to the pool.
Find leaked connections in PostgreSQL:
```sql
-- Find connections idle in transaction for > 1 minute
SELECT
    pid,
    usename,
    application_name,
    now() - xact_start AS transaction_duration,
    query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND (now() - xact_start) > interval '1 minute'
ORDER BY transaction_duration DESC;
```

Fix Java connection leak:
```java
// WRONG - Leaks connection on exception
public User getUser(Long id) {
    Connection conn = dataSource.getConnection();
    PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users WHERE id = ?");
    stmt.setLong(1, id);
    ResultSet rs = stmt.executeQuery();
    rs.next();
    User user = mapUser(rs);
    conn.close(); // Never reached if an exception is thrown above!
    return user;
}

// CORRECT - try-with-resources
public User getUser(Long id) {
    try (Connection conn = dataSource.getConnection();
         PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users WHERE id = ?")) {
        stmt.setLong(1, id);
        try (ResultSet rs = stmt.executeQuery()) {
            rs.next();
            return mapUser(rs);
        }
    } // All resources automatically closed
}
```
Fix Node.js connection leak:
```javascript
// WRONG - Leaks on error
async function getUser(id) {
  const client = await pool.connect();
  const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
  client.release(); // Never reached if query throws
  return result.rows[0];
}

// CORRECT - Use finally
async function getUser(id) {
  const client = await pool.connect();
  try {
    const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
    return result.rows[0];
  } finally {
    client.release(); // Always executed
  }
}

// BETTER - Use pool.query directly
async function getUser(id) {
  const result = await pool.query('SELECT * FROM users WHERE id = $1', [id]);
  return result.rows[0]; // Connection automatically managed
}
```
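Beyond fixing individual call sites, leaks can be surfaced at the pool boundary. This is not a pg API; it's a minimal, library-agnostic sketch of the idea behind HikariCP's leak-detection-threshold, assuming only a pool whose `connect()` resolves to a client with `release()`:

```javascript
// Track outstanding checkouts so long-held (likely leaked) clients can be reported.
function withLeakTracking(pool) {
  const outstanding = new Map(); // client -> checkout timestamp (ms)
  return {
    async connect() {
      const client = await pool.connect();
      outstanding.set(client, Date.now());
      const originalRelease = client.release.bind(client);
      client.release = (...args) => {
        outstanding.delete(client); // returned to the pool: not a leak
        return originalRelease(...args);
      };
      return client;
    },
    // Number of clients checked out for at least thresholdMs.
    leakedCount(thresholdMs, now = Date.now()) {
      let n = 0;
      for (const t of outstanding.values()) if (now - t >= thresholdMs) n++;
      return n;
    },
  };
}
```

Call `leakedCount(60000)` from a periodic health check and log or alert when it is non-zero; any hit points at a code path that skipped `release()`.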
### Cause 3: Long-Running Transactions
Transactions holding connections for extended periods (batch jobs, reports).
Find long-running transactions:
```sql
SELECT
    pid,
    usename,
    application_name,
    now() - xact_start AS duration,
    state,
    left(query, 100) AS query_preview
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY duration DESC
LIMIT 10;
```

Fix - Set transaction timeout:
```
-- PostgreSQL parameter group
idle_in_transaction_session_timeout = 300000  -- 5 minutes
statement_timeout = 60000                     -- 1 minute per statement
lock_timeout = 30000                          -- 30 seconds for locks
```

### Cause 4: Too Many Application Instances
Horizontal scaling increased instances without adjusting pool sizes.
Calculate max instances:

```
Max instances = (DB max_connections × 0.8) / pool_size_per_instance
```

For db.r5.large (340 max) with pool size 30:

```
Max instances = (340 × 0.8) / 30 = 9 instances
```

Solutions:

- Use PgBouncer for connection pooling
- Scale vertically (larger instance = more connections)
- Add read replicas for read traffic
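The two sizing formulas in this article solve the same budget in opposite directions. A small helper capturing both, using the figures from this section:

```javascript
// Solve the sizing equations either way, keeping 20% headroom.
function maxPoolSize(maxConnections, instances, headroom = 0.8) {
  return Math.floor((maxConnections * headroom) / instances);
}

function maxInstances(maxConnections, poolSize, headroom = 0.8) {
  return Math.floor((maxConnections * headroom) / poolSize);
}

console.log(maxPoolSize(340, 8));   // → 34 connections per instance
console.log(maxInstances(340, 30)); // → 9 instances
```

Autoscaling groups should cap their maximum instance count at the `maxInstances` result, or the scale-out that saves your CPU will take down your database.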
## Solution: Deploy PgBouncer

PgBouncer sits between applications and the database, multiplexing connections:

```
Application (1000 connections) → PgBouncer (100 connections) → PostgreSQL
```

Deploy PgBouncer on ECS:
```yaml
# pgbouncer-task-definition.yaml
family: pgbouncer
containerDefinitions:
  - name: pgbouncer
    image: edoburu/pgbouncer:latest
    essential: true
    environment:
      - name: DATABASE_URL
        value: "postgres://app_user:password@production-db.xxxxx.rds.amazonaws.com/production"
      - name: POOL_MODE
        value: "transaction"
      - name: MAX_CLIENT_CONN
        value: "1000"
      - name: DEFAULT_POOL_SIZE
        value: "25"
      - name: MIN_POOL_SIZE
        value: "5"
      - name: RESERVE_POOL_SIZE
        value: "5"
      - name: SERVER_IDLE_TIMEOUT
        value: "300"
    portMappings:
      - containerPort: 5432
        hostPort: 5432
```

PgBouncer configuration file:
```ini
[databases]
production = host=production-db.xxxxx.rds.amazonaws.com port=5432 dbname=production

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 5432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 300
client_idle_timeout = 0
client_lifetime = 0
```
Update application to connect to PgBouncer:
```yaml
spring:
  datasource:
    url: jdbc:postgresql://pgbouncer.internal:5432/production
    hikari:
      maximum-pool-size: 50  # Can be higher now
```

## Solution: Increase Max Connections
Quick fix, but requires instance reboot for some parameters:
```bash # Create custom parameter group aws rds create-db-parameter-group \ --db-parameter-group-name custom-postgres15 \ --db-parameter-group-family postgres15 \ --description "Custom PostgreSQL 15 parameters"
# Modify max_connections aws rds modify-db-parameter-group \ --db-parameter-group-name custom-postgres15 \ --parameters "ParameterName=max_connections,ParameterValue=500,ApplyMethod=pending-reboot"
# Apply to instance aws rds modify-db-instance \ --db-instance-identifier production-db \ --db-parameter-group-name custom-postgres15 \ --apply-immediately
# Reboot instance aws rds reboot-db-instance \ --db-instance-identifier production-db ```
Warning: more connections mean more memory pressure. Each PostgreSQL connection can consume roughly 10MB of RAM.
## Connection Limits by Instance Class
| Instance Class | vCPU | Memory | Default max_connections (PostgreSQL) |
|---|---|---|---|
| db.t3.micro | 2 | 1 GB | 55 |
| db.t3.small | 2 | 2 GB | 85 |
| db.t3.medium | 2 | 4 GB | 170 |
| db.t3.large | 2 | 8 GB | 170 |
| db.r5.large | 2 | 16 GB | 340 |
| db.r5.xlarge | 4 | 32 GB | 680 |
| db.r5.2xlarge | 8 | 64 GB | 1360 |
| db.r5.4xlarge | 16 | 128 GB | 2720 |
Rough rule of thumb: max_connections ≈ (Memory in MB × 0.4) / 10MB per connection. RDS derives the actual default from `DBInstanceClassMemory` in the parameter group, so always confirm with `SHOW max_connections;` on your instance.
## CloudWatch Alarm Configuration
Set up proactive alerting:
```bash
aws cloudwatch put-metric-alarm \
  --alarm-name "RDS-High-Connection-Count" \
  --alarm-description "Alert when database connections exceed 80% of max" \
  --namespace AWS/RDS \
  --metric-name DatabaseConnections \
  --dimensions Name=DBInstanceIdentifier,Value=production-db \
  --statistic Maximum \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 272 \
  --comparison-operator GreaterThanThreshold \
  --treat-missing-data notBreaching \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts \
  --ok-actions arn:aws:sns:us-east-1:123456789012:alerts
```

Calculate threshold: max_connections × 0.8 = 340 × 0.8 = 272
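The threshold should be derived, not hard-coded; a tiny helper so the alarm can be recomputed whenever the instance is resized or the parameter group changes:

```javascript
// 80% of max_connections, rounded down to a whole connection count,
// suitable for the --threshold argument above.
function alarmThreshold(maxConnections, fraction = 0.8) {
  return Math.floor(maxConnections * fraction);
}

console.log(alarmThreshold(340)); // → 272 for the current db.r5.large
console.log(alarmThreshold(500)); // → 400 after raising max_connections
```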
## Emergency: Kill Idle Connections
If the database is completely blocked:
```sql
-- PostgreSQL: Terminate idle connections older than 10 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND (now() - query_start) > interval '10 minutes'
  AND pid <> pg_backend_pid();

-- Or terminate all connections from a specific application
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE application_name = 'problematic-app'
  AND pid <> pg_backend_pid();
```
```sql
-- MySQL: Generate KILL statements for idle connections
SELECT CONCAT('KILL ', id, ';')
FROM information_schema.PROCESSLIST
WHERE Command = 'Sleep'
  AND Time > 600;
```

## Checklist for Connection Issues
1. Check connection count:
   ```bash
   aws cloudwatch get-metric-statistics --metric-name DatabaseConnections ...
   ```
2. Identify connection sources:
   ```sql
   SELECT client_addr, count(*) FROM pg_stat_activity GROUP BY client_addr;
   ```
3. Find idle connections:
   ```sql
   SELECT count(*) FROM pg_stat_activity WHERE state = 'idle';
   ```
4. Verify pool configuration:
   ```bash
   # Check HikariCP settings
   curl http://localhost:8080/actuator/configprops | jq '.contexts.application.beans.hikariDataSource'
   ```
5. Calculate if pool size is appropriate:
   ```
   Total connections = instances × pool_size
   Should be < max_connections × 0.8
   ```
6. Check for connection leaks:
   ```sql
   SELECT count(*) FROM pg_stat_activity
   WHERE state = 'idle' AND (now() - query_start) > interval '5 minutes';
   ```