# Fix AWS NAT Gateway Not Working

Instances in your private subnet can't reach the internet. They can't download packages, can't connect to external APIs, and can't pull container images. NAT gateways are essential for allowing private subnet instances to access the internet while preventing inbound connections, but when they don't work, your private infrastructure is effectively isolated.

Let's diagnose and fix NAT gateway connectivity issues.

## Diagnosis Commands

First, check the NAT gateway status:

```bash
aws ec2 describe-nat-gateways \
  --nat-gateway-ids nat-1234567890abcdef0 \
  --query 'NatGateways[*].[NatGatewayId,State,SubnetId,VpcId]'
```

Get detailed NAT gateway info:

```bash
aws ec2 describe-nat-gateways \
  --nat-gateway-ids nat-1234567890abcdef0 \
  --query 'NatGateways[*].[NatGatewayId,State,NatGatewayAddresses[*].[AllocationId,NetworkInterfaceId,PublicIp,PrivateIp]]'
```

Check the Elastic IP allocation:

```bash
aws ec2 describe-addresses \
  --filters "Name=association-id,Values=eipassoc-12345" \
  --query 'Addresses[*].[PublicIp,AllocationId,AssociationId,NetworkInterfaceId]'
```

Check the subnet where NAT gateway resides:

```bash
aws ec2 describe-subnets \
  --subnet-ids subnet-public \
  --query 'Subnets[*].[SubnetId,VpcId,CidrBlock,AvailabilityZone,MapPublicIpOnLaunch]'
```

Check route tables for NAT gateway routes:

```bash
# Keep only routes that target a NAT gateway
aws ec2 describe-route-tables \
  --filters "Name=vpc-id,Values=vpc-12345" \
  --query 'RouteTables[*].[RouteTableId,Routes[?NatGatewayId].[DestinationCidrBlock,NatGatewayId,State]]'
```

Verify private subnet's route table:

```bash
aws ec2 describe-route-tables \
  --route-table-ids rtb-private \
  --query 'RouteTables[*].Routes[*].[DestinationCidrBlock,GatewayId,NatGatewayId,State]'
```

Check CloudWatch metrics for NAT gateway:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name BytesOutToDestination \
  --dimensions Name=NatGatewayId,Value=nat-1234567890abcdef0 \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 300 \
  --statistics Sum \
  --output table
```
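
If you run the same command with `--output json`, the result is easy to post-process. The sketch below assumes the response has been loaded into a Python dict. The function name `summarize_datapoints` and the sample values are made up for illustration. It totals the metric and finds the busiest period, which is a quick way to tell "no traffic at all" from "some traffic".

```python
def summarize_datapoints(response, statistic="Sum"):
    """Return (total, peak_value, peak_timestamp) for one metric response."""
    points = response.get("Datapoints", [])
    if not points:
        return 0.0, None, None
    # Datapoints are not guaranteed to arrive in time order, so sort them.
    ordered = sorted(points, key=lambda p: p["Timestamp"])
    total = sum(p[statistic] for p in ordered)
    peak = max(ordered, key=lambda p: p[statistic])
    return total, peak[statistic], peak["Timestamp"]


# Illustrative sample shaped like get-metric-statistics JSON output
sample = {
    "Label": "BytesOutToDestination",
    "Datapoints": [
        {"Timestamp": "2024-01-01T10:05:00+00:00", "Sum": 2048.0, "Unit": "Bytes"},
        {"Timestamp": "2024-01-01T10:00:00+00:00", "Sum": 0.0, "Unit": "Bytes"},
    ],
}

total, peak, when = summarize_datapoints(sample)
print(f"total={total} peak={peak} at {when}")
```

A total of zero over the whole window suggests traffic never reaches the NAT gateway, which usually points at routing rather than at the gateway itself.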

## Common Causes and Solutions

### NAT Gateway in Wrong Subnet

The NAT gateway must be in a public subnet (one whose route table has a default route to an internet gateway):

```bash
# Find the subnet the NAT gateway lives in
NAT_SUBNET=$(aws ec2 describe-nat-gateways \
  --nat-gateway-ids nat-1234567890abcdef0 \
  --query 'NatGateways[0].SubnetId' \
  --output text)

# Show the default route of the route table associated with that subnet.
# An empty result can mean the subnet uses the VPC's main route table.
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=$NAT_SUBNET" \
  --query "RouteTables[*].Routes[?DestinationCidrBlock=='0.0.0.0/0']"
```

If the default route targets an internet gateway (`igw-...`), the subnet is public. If it targets anything else, or is missing, the NAT gateway has no path to the internet.
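
The same check can be scripted against the JSON form of a route table. This is a minimal sketch: it assumes one entry of `RouteTables` from `describe-route-tables --output json` loaded as a dict, and the helper name `default_route_target` and the sample IDs are hypothetical.

```python
def default_route_target(route_table):
    """Classify where a route table sends 0.0.0.0/0: 'igw', 'nat', 'other' or None."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if (route.get("GatewayId") or "").startswith("igw-"):
            return "igw"  # public subnet
        if route.get("NatGatewayId"):
            return "nat"  # private subnet routed through a NAT gateway
        return "other"
    return None  # no default route at all


# Illustrative route tables with placeholder IDs
public_table = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0example", "State": "active"},
]}
private_table = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0example", "State": "active"},
]}

print(default_route_target(public_table))   # igw
print(default_route_target(private_table))  # nat
```

The subnet that hosts the NAT gateway should come back as `igw`. A result of `nat` or `None` for that subnet means the gateway was placed in a private subnet.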

Fix by creating NAT gateway in a public subnet:

```bash
# First, ensure you have a public subnet with an IGW route.
# Then create the NAT gateway there; the command returns the new NatGatewayId.
aws ec2 create-nat-gateway \
  --subnet-id subnet-public \
  --allocation-id eipalloc-12345 \
  --tag-specifications 'ResourceType=natgateway,Tags=[{Key=Name,Value=my-nat}]'

# Wait for it to become available (replace nat-new with the returned ID)
aws ec2 wait nat-gateway-available --nat-gateway-ids nat-new
```

### Missing Route from Private Subnet

Each private subnet's route table needs a default route that points to the NAT gateway:

```bash
# Get private subnet route table
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=subnet-private" \
  --query 'RouteTables[*].[RouteTableId,Routes]'
```

Add route to NAT gateway:

```bash
aws ec2 create-route \
  --route-table-id rtb-private \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-1234567890abcdef0
```

Or replace existing default route:

```bash
aws ec2 replace-route \
  --route-table-id rtb-private \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-1234567890abcdef0
```
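
Whether you need `create-route` or `replace-route` depends on what the route table already contains. The sketch below is one way to decide. It assumes `routes` is the `Routes` list from `describe-route-tables --output json`. The function name and the `check-nat-gateway` label are invented for this example.

```python
def default_route_action(routes, nat_id):
    """Suggest which step fixes the default route of a private route table."""
    default = next(
        (r for r in routes if r.get("DestinationCidrBlock") == "0.0.0.0/0"), None
    )
    if default is None:
        return "create-route"  # no default route yet
    if default.get("NatGatewayId") == nat_id:
        # A non-active (blackhole) route usually means its target is gone.
        return "ok" if default.get("State") == "active" else "check-nat-gateway"
    return "replace-route"  # default route exists but targets something else


nat = "nat-1234567890abcdef0"
print(default_route_action([], nat))
print(default_route_action(
    [{"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-old", "State": "blackhole"}],
    nat,
))
```

With the sample inputs, the first call suggests `create-route` and the second suggests `replace-route`, matching the two commands above.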

### NAT Gateway Not Available

The NAT gateway state must be `available`:

```bash
aws ec2 describe-nat-gateways \
  --nat-gateway-ids nat-1234567890abcdef0 \
  --query 'NatGateways[*].[State,FailureMessage]'
```

States:

- `pending`: being created
- `available`: ready to use
- `failed`: creation failed (check `FailureMessage`)
- `deleting` / `deleted`: being removed or already removed
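
As a rough illustration of how these states map to a next step, here is a small sketch. It assumes one entry of `NatGateways` from `describe-nat-gateways --output json` loaded as a dict. The step labels and the failure text in the example are placeholders, not AWS output.

```python
# Hypothetical mapping from NAT gateway state to a troubleshooting step
NEXT_STEP = {
    "pending": "wait",
    "available": "check-routes",
    "failed": "recreate",
    "deleting": "recreate",
    "deleted": "recreate",
}


def next_step(nat_gateway):
    """Return a suggested next step for a NAT gateway description."""
    state = nat_gateway.get("State")
    step = NEXT_STEP.get(state, "unknown-state")
    if state == "failed":
        # Surface the failure message so the cause is fixed before recreating.
        return f"{step}: {nat_gateway.get('FailureMessage', 'no message')}"
    return step


print(next_step({"State": "available"}))
print(next_step({"State": "failed", "FailureMessage": "placeholder failure text"}))
```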

If failed, check the failure message and recreate:

```bash
# Delete the failed NAT gateway
aws ec2 delete-nat-gateway --nat-gateway-id nat-failed

# Create a new one
aws ec2 create-nat-gateway \
  --subnet-id subnet-public \
  --allocation-id eipalloc-new
```

### No Elastic IP Assigned

The NAT gateway needs an Elastic IP:

```bash
aws ec2 describe-nat-gateways \
  --nat-gateway-ids nat-1234567890abcdef0 \
  --query 'NatGateways[*].NatGatewayAddresses[*].[PublicIp,AllocationId]'
```

If PublicIp is null, the EIP wasn't properly associated.

Allocate a new EIP and create a NAT gateway with it:

```bash
# Allocate EIP
EIP_ALLOC=$(aws ec2 allocate-address --domain vpc --query 'AllocationId' --output text)

# Create NAT gateway with EIP
aws ec2 create-nat-gateway \
  --subnet-id subnet-public \
  --allocation-id "$EIP_ALLOC"
```

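### Security Group Blocking Traffic

A NAT gateway has an ENI (Elastic Network Interface), but you cannot associate a security group with a NAT gateway. Listing the groups on its ENI confirms there is nothing to misconfigure there: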

```bash
# Get NAT gateway ENI
ENI=$(aws ec2 describe-nat-gateways \
  --nat-gateway-ids nat-1234567890abcdef0 \
  --query 'NatGateways[0].NatGatewayAddresses[0].NetworkInterfaceId' \
  --output text)

# List security groups on the ENI (expected to be empty for a NAT gateway)
aws ec2 describe-network-interfaces \
  --network-interface-ids "$ENI" \
  --query 'NetworkInterfaces[0].Groups[*]'
```

The NAT gateway itself doesn't filter traffic. Security groups on your private instances do, and so do the network ACLs on the private and public subnets. Make sure the private instances allow the outbound traffic they need.

Check private instance security groups:

```bash
aws ec2 describe-instances \
  --filters "Name=subnet-id,Values=subnet-private" \
  --query 'Reservations[*].Instances[*].[InstanceId,SecurityGroups[*].GroupId]'
```

Ensure outbound rules allow necessary traffic:

```bash
aws ec2 describe-security-groups \
  --group-ids sg-private \
  --query 'SecurityGroups[0].IpPermissionsEgress'
```

Add outbound rule if missing:

```bash
aws ec2 authorize-security-group-egress \
  --group-id sg-private \
  --ip-permissions '[{"IpProtocol":"-1","IpRanges":[{"CidrIp":"0.0.0.0/0"}]}]'
```
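
Before adding a rule, it helps to know whether the existing egress rules already cover the port you care about. The sketch below assumes `permissions` is the `IpPermissionsEgress` list from `describe-security-groups --output json`. The helper `egress_allows` is hypothetical and deliberately simple: it only matches rules whose CIDR is exactly the one you pass in, and it ignores IPv6 ranges, prefix lists, and group references.

```python
def egress_allows(permissions, port, protocol="tcp", cidr="0.0.0.0/0"):
    """Return True if any egress rule permits protocol/port to the exact CIDR."""
    for rule in permissions:
        cidrs = {r.get("CidrIp") for r in rule.get("IpRanges", [])}
        if cidr not in cidrs:
            continue
        if rule.get("IpProtocol") == "-1":
            return True  # all protocols and ports
        if rule.get("IpProtocol") != protocol:
            continue
        if rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535):
            return True
    return False


# Illustrative rule sets
allow_all = [{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]
https_only = [{
    "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
    "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
}]

print(egress_allows(allow_all, 80))    # True
print(egress_allows(https_only, 443))  # True
print(egress_allows(https_only, 80))   # False
```

If the last case matches your situation, a narrower rule for the missing port may be preferable to opening all outbound traffic.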

### Internet Gateway Missing

The public subnet that hosts the NAT gateway needs a route to an internet gateway, and the VPC needs an internet gateway attached:

```bash
# Confirm the VPC exists (describe-vpcs does not report the internet gateway)
aws ec2 describe-vpcs \
  --vpc-ids vpc-12345 \
  --query 'Vpcs[*].[VpcId,CidrBlock]'

# List internet gateways attached to the VPC
aws ec2 describe-internet-gateways \
  --filters "Name=attachment.vpc-id,Values=vpc-12345" \
  --query 'InternetGateways[*].[InternetGatewayId,Attachments[*].State]'
```

Create IGW if missing:

```bash
aws ec2 create-internet-gateway \
  --tag-specifications 'ResourceType=internet-gateway,Tags=[{Key=Name,Value=my-igw}]'

# Replace igw-new with the InternetGatewayId returned above
aws ec2 attach-internet-gateway \
  --vpc-id vpc-12345 \
  --internet-gateway-id igw-new
```

Add route to IGW in public subnet:

```bash
aws ec2 create-route \
  --route-table-id rtb-public \
  --destination-cidr-block 0.0.0.0/0 \
  --gateway-id igw-new
```

### Source/Destination Check Not Disabled

For NAT instances (not NAT gateways), you must disable source/destination check. NAT gateways handle this automatically.

If you're using a NAT instance instead of NAT gateway:

```bash
aws ec2 modify-instance-attribute \
  --instance-id i-nat-instance \
  --source-dest-check "{\"Value\": false}"
```

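### Wrong Availability Zone

A NAT gateway is created in a single Availability Zone. Instances in other AZs can still route through it, but that traffic crosses AZs, and if the NAT gateway's AZ goes down, every subnet that depends on it loses internet access. Check which AZ your NAT gateway is in: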

```bash
# Find the NAT gateway's subnet
aws ec2 describe-nat-gateways \
  --nat-gateway-ids nat-1234567890abcdef0 \
  --query 'NatGateways[*].[SubnetId]' \
  --output text

# Get the AZ of that subnet (replace subnet-nat with the ID printed above)
aws ec2 describe-subnets \
  --subnet-ids subnet-nat \
  --query 'Subnets[*].AvailabilityZone'
```

Create a NAT gateway in each AZ and route each private subnet to the gateway in its own AZ:

```bash
# Create NAT gateway in second AZ
aws ec2 create-nat-gateway \
  --subnet-id subnet-public-2 \
  --allocation-id eipalloc-new-2

# Route second AZ private subnet to this NAT
aws ec2 create-route \
  --route-table-id rtb-private-2 \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-new-2
```
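
To spot private subnets whose default route points at a NAT gateway in a different AZ than their own, you can join three small lookups. This is a sketch under the assumption that you have already built the dicts from `describe-subnets`, `describe-route-tables`, and `describe-nat-gateways`. The function name and all IDs below are placeholders.

```python
def cross_az_subnets(subnet_az, subnet_nat, nat_az):
    """List subnets whose default route targets a NAT gateway in another AZ.

    subnet_az:  subnet ID -> AZ of the subnet
    subnet_nat: subnet ID -> NAT gateway ID used by its default route
    nat_az:     NAT gateway ID -> AZ of the NAT gateway's subnet
    """
    mismatched = []
    for subnet, nat in sorted(subnet_nat.items()):
        if subnet_az[subnet] != nat_az[nat]:
            mismatched.append((subnet, subnet_az[subnet], nat, nat_az[nat]))
    return mismatched


# Placeholder topology: two private subnets sharing one NAT gateway
subnet_az = {"subnet-private-1": "us-east-1a", "subnet-private-2": "us-east-1b"}
subnet_nat = {"subnet-private-1": "nat-a", "subnet-private-2": "nat-a"}
nat_az = {"nat-a": "us-east-1a"}

for subnet, az, nat, other_az in cross_az_subnets(subnet_az, subnet_nat, nat_az):
    print(f"{subnet} ({az}) routes through {nat} in {other_az}")
```

Each subnet it reports is a candidate for its own NAT gateway and route, as in the commands above.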

### Connection Timeout Issues

A NAT gateway drops connections that stay idle for 350 seconds or more. Long-lived connections that go quiet may time out.

Check if traffic is flowing:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name ActiveConnectionCount \
  --dimensions Name=NatGatewayId,Value=nat-1234567890abcdef0 \
  --start-time $(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Average \
  --output table
```

For applications needing long-lived connections, implement keepalive:

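```python
import socket

import requests

# A requests.Session reuses connections, but it does not send probes on an
# idle connection, so reuse alone does not keep a quiet connection alive.
session = requests.Session()

# TCP keepalive: probe idle connections well before the 350-second timeout.
# TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux option names.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # first probe after 60 s idle
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # then probe every 10 s
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # give up after 5 failed probes
```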
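
The keepalive values only help if the first probe goes out before the idle timeout expires. The arithmetic can be sketched as follows. The function name `keepalive_plan` is hypothetical, and the 350-second figure is the idle timeout stated above.

```python
NAT_IDLE_TIMEOUT = 350  # seconds, the NAT gateway idle timeout stated above


def keepalive_plan(idle, interval, count, nat_timeout=NAT_IDLE_TIMEOUT):
    """Check TCP keepalive settings (in seconds) against the NAT idle timeout."""
    return {
        # The first probe is sent after `idle` seconds without traffic.
        "first_probe_before_timeout": idle < nat_timeout,
        # A dead peer is declared after `count` unanswered probes.
        "dead_peer_detected_after": idle + interval * count,
    }


print(keepalive_plan(60, 10, 5))
# {'first_probe_before_timeout': True, 'dead_peer_detected_after': 110}

print(keepalive_plan(7200, 75, 9)["first_probe_before_timeout"])
# False
```

The second call uses the usual Linux defaults (7200 s idle, 75 s interval, 9 probes). It shows why enabling `SO_KEEPALIVE` without lowering the idle time is not enough behind a NAT gateway.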

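### Port Exhaustion

A NAT gateway supports up to 55,000 simultaneous connections to each unique destination (the same destination IP, destination port, and protocol). Beyond that, new connections to that destination fail with port allocation errors, which the `ErrorPortAllocation` CloudWatch metric counts.

Check connection count (note that `ActiveConnectionCount` covers all destinations combined):

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name ActiveConnectionCount \
  --dimensions Name=NatGatewayId,Value=nat-1234567890abcdef0 \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Maximum
```

If the count approaches 55,000, or `ErrorPortAllocation` is above zero, distribute traffic across multiple NAT gateways: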

```bash
# Create second NAT gateway in same AZ
aws ec2 create-nat-gateway \
  --subnet-id subnet-public-1 \
  --allocation-id eipalloc-new-2

# Use route tables to distribute traffic
aws ec2 replace-route \
  --route-table-id rtb-private-1b \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-new-2
```
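
Deciding which route table should point at which NAT gateway is a small balancing problem. The sketch below uses a simple greedy approach: heaviest route table first, always onto the least-loaded gateway. It assumes you supply your own per-route-table connection estimates, because CloudWatch reports connection counts per NAT gateway, not per route table. All names and numbers are illustrative.

```python
def distribute(route_table_load, nat_ids):
    """Greedily assign route tables to NAT gateways to balance estimated load."""
    totals = {nat: 0 for nat in nat_ids}
    plan = {}
    # Place the heaviest route tables first; break ties by ID for stable output.
    for rtb, load in sorted(route_table_load.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(nat_ids, key=lambda nat: totals[nat])  # least-loaded gateway
        plan[rtb] = target
        totals[target] += load
    return plan, totals


# Estimated peak connections per private route table (made-up numbers)
loads = {"rtb-private-1a": 30000, "rtb-private-1b": 20000, "rtb-private-1c": 15000}
plan, totals = distribute(loads, ["nat-1", "nat-2"])

print(plan)
# {'rtb-private-1a': 'nat-1', 'rtb-private-1b': 'nat-2', 'rtb-private-1c': 'nat-2'}
print(totals)
# {'nat-1': 30000, 'nat-2': 35000}
```

Each entry in `plan` corresponds to one `replace-route` call like the one above.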

## Verification Steps

Test connectivity from private subnet:

```bash
# SSH into private instance
ssh -i my-key.pem ec2-user@private-instance-ip

# Test internet connectivity
curl -I https://www.google.com
ping -c 4 8.8.8.8

# Test package download
sudo yum update --downloadonly

# For container workloads
docker pull nginx
```

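Or use Systems Manager to run commands. This only works if the instance can still reach the Systems Manager endpoints, for example through VPC interface endpoints: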

```bash
aws ssm send-command \
  --instance-ids i-private \
  --document-name AWS-RunShellScript \
  --parameters 'commands=["curl -I https://www.google.com","ping -c 4 8.8.8.8"]' \
  --output text
```

Verify NAT gateway metrics:

```bash
# Check if traffic is flowing
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name BytesOutToDestination \
  --dimensions Name=NatGatewayId,Value=nat-1234567890abcdef0 \
  --start-time $(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Sum
```

NAT gateway diagnostic script:

```bash
#!/bin/bash
NAT_ID="nat-1234567890abcdef0"
VPC_ID="vpc-12345"

echo "NAT Gateway Diagnostics"
echo "======================="

echo "1. NAT Gateway Status:"
aws ec2 describe-nat-gateways \
  --nat-gateway-ids "$NAT_ID" \
  --query 'NatGateways[*].[NatGatewayId,State,SubnetId,NatGatewayAddresses[*].[PublicIp,PrivateIp]]'

echo ""
echo "2. NAT Gateway Subnet (must be public):"
NAT_SUBNET=$(aws ec2 describe-nat-gateways --nat-gateway-ids "$NAT_ID" --query 'NatGateways[0].SubnetId' --output text)
aws ec2 describe-subnets --subnet-ids "$NAT_SUBNET" --query 'Subnets[*].[SubnetId,CidrBlock,MapPublicIpOnLaunch]'

echo ""
echo "3. Route from NAT subnet to Internet Gateway:"
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=$NAT_SUBNET" \
  --query "RouteTables[*].Routes[?DestinationCidrBlock=='0.0.0.0/0']"

echo ""
echo "4. Route tables with NAT gateway route:"
# Double quotes let the shell expand NAT_ID inside the JMESPath string literal
aws ec2 describe-route-tables \
  --filters "Name=vpc-id,Values=$VPC_ID" \
  --query "RouteTables[*].[RouteTableId,Routes[?NatGatewayId=='${NAT_ID}'].[DestinationCidrBlock,NatGatewayId]]"

echo ""
echo "5. Private subnet associations:"
aws ec2 describe-route-tables \
  --filters "Name=vpc-id,Values=$VPC_ID" "Name=route.state,Values=active" \
  --query "RouteTables[*].[RouteTableId,Associations[*].SubnetId,Routes[?NatGatewayId=='${NAT_ID}'].NatGatewayId]"

echo ""
echo "6. NAT Gateway Metrics (last hour):"
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name BytesOutToDestination \
  --dimensions Name=NatGatewayId,Value="$NAT_ID" \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 300 \
  --statistics Sum \
  --output table
```

Set up monitoring:

```bash
aws cloudwatch put-metric-alarm \
  --alarm-name nat-gateway-high-connections \
  --alarm-description "NAT gateway connection count high" \
  --namespace AWS/NATGateway \
  --metric-name ActiveConnectionCount \
  --dimensions Name=NatGatewayId,Value=nat-1234567890abcdef0 \
  --statistic Maximum \
  --period 60 \
  --threshold 50000 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
```