# Fix AWS CloudWatch MaxConnectionsExceeded Alarm

Your CloudWatch alarm fires at 3 AM:

```text
ALARM: "RDS-High-Connection-Count" in region us-east-1
Threshold: 80% of max_connections
Current: 92%
```

And your application starts returning errors:

```text
FATAL: remaining connection slots are reserved for non-replication superuser connections
```

```text
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.
```

The database has hit its maximum connection limit. New connections are rejected, causing application failures.

## Real Scenario: E-Commerce Site Outage

An e-commerce company experienced intermittent outages during flash sales. Their RDS PostgreSQL instance (db.r5.large) had a default max_connections of 340. During peak traffic, CloudWatch showed connection counts hitting 340, and users couldn't complete purchases.

Initial diagnosis:

```bash
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name DatabaseConnections \
    --dimensions Name=DBInstanceIdentifier,Value=production-db \
    --start-time 2026-04-23T00:00:00Z \
    --end-time 2026-04-23T04:00:00Z \
    --period 300 \
    --statistics Maximum \
    --region us-east-1
```

Output showed connections peaked at 340 (the maximum).

Root cause: They had 8 application instances, each with a HikariCP pool of 50 connections:

```text
Total connections = 8 instances × 50 pool size = 400 connections
But max_connections = 340
```

The math didn't work: the fleet could demand 400 connections against a hard cap of 340.
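The shortfall is easy to reproduce with a few lines of arithmetic (a standalone sketch; the helper name is mine, the numbers come from the scenario above):

```javascript
// Total connection demand across all app instances vs. the database's hard limit.
// A positive result means the fleet can exhaust max_connections.
function connectionShortfall(instances, poolSize, maxConnections) {
  const demanded = instances * poolSize;
  return demanded - maxConnections;
}

// 8 instances × 50 pool size against max_connections = 340
console.log(connectionShortfall(8, 50, 340)); // 60 connections over the limit
```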

## Immediate Diagnosis

### Step 1: Check Current Connection Count

**PostgreSQL RDS:**

```bash
# Get connection count via psql
psql -h production-db.xxxxx.us-east-1.rds.amazonaws.com \
     -U postgres \
     -c "SELECT count(*) as total_connections,
                (SELECT setting FROM pg_settings WHERE name = 'max_connections') as max_connections,
                round(100.0 * count(*) / (SELECT setting::int FROM pg_settings WHERE name = 'max_connections'), 2) as utilization_pct
         FROM pg_stat_activity;"
```

Output:

```text
 total_connections | max_connections | utilization_pct
-------------------+-----------------+-----------------
               312 |             340 |           91.76
(1 row)
```

**MySQL RDS:**

```bash
mysql -h production-db.xxxxx.us-east-1.rds.amazonaws.com \
      -u admin -p \
      -e "SHOW STATUS LIKE 'Threads_connected';
          SHOW VARIABLES LIKE 'max_connections';"
```

Output:

```text
Variable_name: Threads_connected
Value: 285

Variable_name: max_connections
Value: 340
```

### Step 2: Identify Connection Sources

**PostgreSQL - who's connecting?**

```sql
SELECT
    client_addr,
    usename AS username,
    application_name,
    state,
    count(*) AS connection_count,
    array_agg(DISTINCT datname) AS databases
FROM pg_stat_activity
WHERE client_addr IS NOT NULL
GROUP BY client_addr, usename, application_name, state
ORDER BY connection_count DESC;
```

Output:

```text
 client_addr  | username | application_name | state  | connection_count | databases
--------------+----------+------------------+--------+------------------+--------------
 10.0.1.45    | app_user | myapp-api        | active |               48 | {production}
 10.0.1.46    | app_user | myapp-api        | active |               47 | {production}
 10.0.1.47    | app_user | myapp-api        | active |               47 | {production}
 10.0.1.48    | app_user | myapp-api        | active |               48 | {production}
 10.0.2.100   | readonly | analytics        | idle   |               25 | {production}
 10.0.3.50    | admin    | psql             | idle   |                5 | {production}
```

This shows 4 API servers with ~48 connections each, plus analytics and admin connections.

**MySQL - who's connecting?**

```sql
SELECT
    USER,
    HOST,
    DB,
    COMMAND,
    COUNT(*) AS connection_count
FROM information_schema.PROCESSLIST
GROUP BY USER, HOST, DB, COMMAND
ORDER BY connection_count DESC;
```

### Step 3: Find Idle Connections

**PostgreSQL - connections idle for more than 5 minutes:**

```sql
SELECT
    pid,
    usename,
    application_name,
    client_addr,
    now() - query_start AS idle_duration,
    state,
    query
FROM pg_stat_activity
WHERE state = 'idle'
  AND (now() - query_start) > interval '5 minutes'
ORDER BY idle_duration DESC;
```

Output:

```text
  pid  | usename  | application_name | client_addr | idle_duration | state |                query
-------+----------+------------------+-------------+---------------+-------+---------------------------------------
 12345 | app_user | myapp-api        | 10.0.1.45   | 00:15:23      | idle  | SELECT * FROM users WHERE id = 123
 12346 | app_user | myapp-api        | 10.0.1.45   | 00:12:45      | idle  | SELECT * FROM products WHERE id = 456
 12347 | readonly | analytics        | 10.0.2.100  | 00:45:12      | idle  | SELECT COUNT(*) FROM orders
```

Connections idle this long usually point to connection leaks or an oversized pool.

## Root Causes and Solutions

### Cause 1: Connection Pool Too Large

Each application instance opens its own connection pool. If you have many instances, total connections exceed the database limit.

**Calculate maximum pool size:**

```text
Max pool size = (DB max_connections × 0.8) / number_of_instances
```

For db.r5.large (max_connections = 340) with 8 instances:

```text
Max pool size = (340 × 0.8) / 8 = 34 connections per instance
```
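The same rule of thumb can be expressed as a small helper (a sketch; the function name and the 20% headroom default are mine, following the formula above):

```javascript
// Largest safe per-instance pool size, keeping 20% of max_connections in reserve
// for admin sessions, replication, and monitoring.
function maxPoolSize(maxConnections, instances, headroom = 0.8) {
  return Math.floor((maxConnections * headroom) / instances);
}

// db.r5.large (340 max_connections) with 8 app instances
console.log(maxPoolSize(340, 8)); // 34 connections per instance
```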

**Fix HikariCP configuration:**

```yaml
# application.yml
spring:
  datasource:
    hikari:
      maximum-pool-size: 30      # Reduced from 50
      minimum-idle: 5
      connection-timeout: 30000
      idle-timeout: 600000
      max-lifetime: 1800000
      leak-detection-threshold: 60000  # Alert on leaks
```

**Fix Node.js pg-pool:**

```javascript
const { Pool } = require('pg');

const pool = new Pool({
  max: 30,                        // Maximum connections per instance
  min: 5,
  idleTimeoutMillis: 60000,
  connectionTimeoutMillis: 30000
});
```

### Cause 2: Connection Leaks

Application code doesn't properly release connections back to the pool.

**Find leaked connections in PostgreSQL:**

```sql
-- Find connections idle in transaction for > 1 minute
SELECT
    pid,
    usename,
    application_name,
    now() - xact_start AS transaction_duration,
    query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND (now() - xact_start) > interval '1 minute'
ORDER BY transaction_duration DESC;
```

**Fix Java connection leak:**

```java
// WRONG - leaks the connection if an exception is thrown
public User getUser(Long id) throws SQLException {
    Connection conn = dataSource.getConnection();
    PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users WHERE id = ?");
    stmt.setLong(1, id);
    ResultSet rs = stmt.executeQuery();
    rs.next();
    User user = mapUser(rs);
    conn.close(); // Never reached if an exception is thrown above!
    return user;
}

// CORRECT - try-with-resources
public User getUser(Long id) throws SQLException {
    try (Connection conn = dataSource.getConnection();
         PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users WHERE id = ?")) {
        stmt.setLong(1, id);
        try (ResultSet rs = stmt.executeQuery()) {
            rs.next();
            return mapUser(rs);
        }
    } // All resources automatically closed
}
```

**Fix Node.js connection leak:**

```javascript
// WRONG - leaks on error
async function getUser(id) {
  const client = await pool.connect();
  const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
  client.release(); // Never reached if the query throws
  return result.rows[0];
}

// CORRECT - use finally
async function getUser(id) {
  const client = await pool.connect();
  try {
    const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
    return result.rows[0];
  } finally {
    client.release(); // Always executed
  }
}

// BETTER - use pool.query directly for single statements
async function getUser(id) {
  const result = await pool.query('SELECT * FROM users WHERE id = $1', [id]);
  return result.rows[0]; // Connection automatically managed
}
```

### Cause 3: Long-Running Transactions

Transactions holding connections for extended periods (batch jobs, reports).

**Find long-running transactions:**

```sql
SELECT
    pid,
    usename,
    application_name,
    now() - xact_start AS duration,
    state,
    left(query, 100) AS query_preview
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY duration DESC
LIMIT 10;
```

**Fix - set transaction timeouts:**

```ini
# PostgreSQL parameter group (values in milliseconds)
idle_in_transaction_session_timeout = 300000  # 5 minutes
statement_timeout = 60000                     # 1 minute per statement
lock_timeout = 30000                          # 30 seconds for locks
```

### Cause 4: Too Many Application Instances

Horizontal scaling increased instances without adjusting pool sizes.

**Calculate max instances:**

```text
Max instances = (DB max_connections × 0.8) / pool_size_per_instance
```

For db.r5.large (340 max) with pool size 30:

```text
Max instances = (340 × 0.8) / 30 ≈ 9 instances
```

**Solutions:**

1. Use PgBouncer for connection pooling
2. Scale vertically (a larger instance class allows more connections)
3. Add read replicas to offload read traffic
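The instance-count ceiling can be sketched the same way as the pool-size calculation (the helper name and headroom default are mine, mirroring the formula above):

```javascript
// How many app instances a database can support at a given per-instance pool
// size, keeping 20% of max_connections in reserve.
function maxInstances(maxConnections, poolSize, headroom = 0.8) {
  return Math.floor((maxConnections * headroom) / poolSize);
}

// db.r5.large (340 max_connections) with pool size 30
console.log(maxInstances(340, 30)); // 9 instances
```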

## Solution: Deploy PgBouncer

PgBouncer sits between the applications and the database, multiplexing many client connections onto a small pool of server connections:

```text
Application (1000 connections) → PgBouncer (100 connections) → PostgreSQL
```
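The benefit is the fan-in ratio: each server-side connection is shared by many clients (a quick sketch; the helper name is mine, the numbers come from the diagram above):

```javascript
// Fan-in ratio under transaction pooling: how many client connections
// share each server-side connection on average.
function fanIn(clientConnections, serverConnections) {
  return clientConnections / serverConnections;
}

console.log(fanIn(1000, 100)); // each server connection serves ~10 clients
```

This works because in transaction pooling mode a server connection is only occupied for the duration of a transaction, not for the lifetime of the client session.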

**Deploy PgBouncer on ECS:**

```yaml
# pgbouncer-task-definition.yaml
family: pgbouncer
containerDefinitions:
  - name: pgbouncer
    image: edoburu/pgbouncer:latest
    essential: true
    environment:
      - name: DATABASE_URL
        value: "postgres://app_user:password@production-db.xxxxx.rds.amazonaws.com/production"
      - name: POOL_MODE
        value: "transaction"
      - name: MAX_CLIENT_CONN
        value: "1000"
      - name: DEFAULT_POOL_SIZE
        value: "25"
      - name: MIN_POOL_SIZE
        value: "5"
      - name: RESERVE_POOL_SIZE
        value: "5"
      - name: SERVER_IDLE_TIMEOUT
        value: "300"
    portMappings:
      - containerPort: 5432
        hostPort: 5432
```

**PgBouncer configuration file:**

```ini
[databases]
production = host=production-db.xxxxx.rds.amazonaws.com port=5432 dbname=production

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 5432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 300
client_idle_timeout = 0
```

**Update the application to connect to PgBouncer:**

```yaml
spring:
  datasource:
    url: jdbc:postgresql://pgbouncer.internal:5432/production
    hikari:
      maximum-pool-size: 50  # Can be higher now
```

## Solution: Increase Max Connections

A quick fix, but static parameters such as max_connections require an instance reboot to take effect:

```bash
# Create custom parameter group
aws rds create-db-parameter-group \
    --db-parameter-group-name custom-postgres15 \
    --db-parameter-group-family postgres15 \
    --description "Custom PostgreSQL 15 parameters"

# Modify max_connections
aws rds modify-db-parameter-group \
    --db-parameter-group-name custom-postgres15 \
    --parameters "ParameterName=max_connections,ParameterValue=500,ApplyMethod=pending-reboot"

# Apply to instance
aws rds modify-db-instance \
    --db-instance-identifier production-db \
    --db-parameter-group-name custom-postgres15 \
    --apply-immediately

# Reboot instance
aws rds reboot-db-instance \
    --db-instance-identifier production-db
```

**Warning:** more connections require more memory. Each connection uses roughly 10 MB of RAM.
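A rough cost estimate before raising the limit (a sketch using the ~10 MB per connection figure above; the helper name is mine):

```javascript
// Approximate memory consumed if every connection slot is in use,
// assuming ~10 MB per PostgreSQL backend process.
function connectionMemoryMB(maxConnections, mbPerConnection = 10) {
  return maxConnections * mbPerConnection;
}

console.log(connectionMemoryMB(500)); // ~5000 MB (~5 GB) at full utilization
```

On a 16 GB instance, that memory competes directly with shared buffers and the OS page cache, which is why raising max_connections is not free.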

## Connection Limits by Instance Class

| Instance Class | vCPU | Memory | Default max_connections (PostgreSQL) |
|----------------|------|--------|--------------------------------------|
| db.t3.micro    | 2    | 1 GB   | 55                                   |
| db.t3.small    | 2    | 2 GB   | 85                                   |
| db.t3.medium   | 2    | 4 GB   | 170                                  |
| db.t3.large    | 2    | 8 GB   | 170                                  |
| db.r5.large    | 2    | 16 GB  | 340                                  |
| db.r5.xlarge   | 4    | 32 GB  | 680                                  |
| db.r5.2xlarge  | 8    | 64 GB  | 1360                                 |
| db.r5.4xlarge  | 16   | 128 GB | 2720                                 |

Formula: max_connections ≈ (Memory in MB × 0.4) / 10MB per connection

## CloudWatch Alarm Configuration

Set up proactive alerting:

```bash
aws cloudwatch put-metric-alarm \
    --alarm-name "RDS-High-Connection-Count" \
    --alarm-description "Alert when database connections exceed 80% of max" \
    --namespace AWS/RDS \
    --metric-name DatabaseConnections \
    --dimensions Name=DBInstanceIdentifier,Value=production-db \
    --statistic Maximum \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 272 \
    --comparison-operator GreaterThanThreshold \
    --treat-missing-data notBreaching \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts \
    --ok-actions arn:aws:sns:us-east-1:123456789012:alerts
```

Calculate threshold: max_connections × 0.8 = 340 × 0.8 = 272
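The threshold calculation above can be scripted when you manage alarms for several instances (a sketch; the helper name is mine):

```javascript
// Alarm threshold at a given fraction of max_connections (default 80%),
// rounded down so the alarm fires before the limit is reached.
function alarmThreshold(maxConnections, pct = 0.8) {
  return Math.floor(maxConnections * pct);
}

console.log(alarmThreshold(340)); // 272
```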

## Emergency: Kill Idle Connections

If the database is completely blocked:

```sql
-- PostgreSQL: terminate idle connections older than 10 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND (now() - query_start) > interval '10 minutes'
  AND pid <> pg_backend_pid();

-- Or terminate all connections from a specific application
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE application_name = 'problematic-app'
  AND pid <> pg_backend_pid();
```

```sql
-- MySQL: generate KILL statements for connections sleeping > 10 minutes
-- (execute the generated statements to actually terminate them)
SELECT CONCAT('KILL ', id, ';')
FROM information_schema.PROCESSLIST
WHERE Command = 'Sleep'
  AND Time > 600;
```

## Checklist for Connection Issues

1. **Check connection count:**

    ```bash
    aws cloudwatch get-metric-statistics --metric-name DatabaseConnections ...
    ```

2. **Identify connection sources:**

    ```sql
    SELECT client_addr, count(*) FROM pg_stat_activity GROUP BY client_addr;
    ```

3. **Find idle connections:**

    ```sql
    SELECT count(*) FROM pg_stat_activity WHERE state = 'idle';
    ```

4. **Verify pool configuration:**

    ```bash
    # Check HikariCP settings
    curl http://localhost:8080/actuator/configprops | jq '.contexts.application.beans.hikariDataSource'
    ```

5. **Confirm the pool size is appropriate:**

    ```text
    Total connections = instances × pool_size
    Should be < max_connections × 0.8
    ```

6. **Check for connection leaks:**

    ```sql
    SELECT count(*) FROM pg_stat_activity
    WHERE state = 'idle' AND (now() - query_start) > interval '5 minutes';
    ```