# Fix AWS NAT Gateway Not Working
Instances in your private subnet can't reach the internet. They can't download packages, can't connect to external APIs, and can't pull container images. NAT gateways are essential for allowing private subnet instances to access the internet while preventing inbound connections, but when they don't work, your private infrastructure is effectively isolated.
Let's diagnose and fix NAT gateway connectivity issues.
## Diagnosis Commands

First, check the NAT gateway status:

```bash
aws ec2 describe-nat-gateways \
  --nat-gateway-ids nat-1234567890abcdef0 \
  --query 'NatGateways[*].[NatGatewayId,State,SubnetId,VpcId]'
```

Get detailed NAT gateway info:

```bash
aws ec2 describe-nat-gateways \
  --nat-gateway-ids nat-1234567890abcdef0 \
  --query 'NatGateways[*].[NatGatewayId,State,NatGatewayAddresses[*].[AllocationId,NetworkInterfaceId,PublicIp,PrivateIp]]'
```

Check the Elastic IP allocation:

```bash
aws ec2 describe-addresses \
  --filters "Name=association-id,Values=eipassoc-12345" \
  --query 'Addresses[*].[PublicIp,AllocationId,AssociationId,NetworkInterfaceId]'
```

Check the subnet where the NAT gateway resides:

```bash
aws ec2 describe-subnets \
  --subnet-ids subnet-public \
  --query 'Subnets[*].[SubnetId,VpcId,CidrBlock,AvailabilityZone,MapPublicIpOnLaunch]'
```

Check route tables for NAT gateway routes:

```bash
# Keep only routes that have a NatGatewayId set
aws ec2 describe-route-tables \
  --filters "Name=vpc-id,Values=vpc-12345" \
  --query 'RouteTables[*].[RouteTableId,Routes[?NatGatewayId].[DestinationCidrBlock,NatGatewayId,State]]'
```

Verify the private subnet's route table:

```bash
aws ec2 describe-route-tables \
  --route-table-ids rtb-private \
  --query 'RouteTables[*].Routes[*].[DestinationCidrBlock,GatewayId,NatGatewayId,State]'
```

A route whose `State` is `blackhole` points at a target that no longer exists, such as a deleted NAT gateway.

Check CloudWatch metrics for the NAT gateway:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name BytesOutToDestination \
  --dimensions Name=NatGatewayId,Value=nat-1234567890abcdef0 \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 300 \
  --statistics Sum \
  --output table
```

## Common Causes and Solutions
### NAT Gateway in Wrong Subnet

A NAT gateway must be in a public subnet (one whose route table has a default route to an internet gateway):

```bash
# Look up the subnet the NAT gateway lives in
NAT_SUBNET=$(aws ec2 describe-nat-gateways \
  --nat-gateway-ids nat-1234567890abcdef0 \
  --query 'NatGateways[0].SubnetId' \
  --output text)

# Find the route table associated with that subnet and show its default route.
# An empty result can mean the subnet has no explicit association and
# falls back to the VPC's main route table.
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=$NAT_SUBNET" \
  --query "RouteTables[*].Routes[?DestinationCidrBlock=='0.0.0.0/0']"
```
If the route points to an IGW, it's public. If not, the subnet isn't properly configured.
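
As a rough illustration of that rule, the prefix of the default route's target ID is enough to tell the cases apart. The sketch below is a hypothetical helper (`classify_default_route` is not part of the AWS CLI) that classifies a target ID you have already read from the route table output; the example IDs are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a subnet by the target of its 0.0.0.0/0 route.
# Pass the GatewayId or NatGatewayId value taken from describe-route-tables.
classify_default_route() {
  local target="$1"
  case "$target" in
    igw-*) echo "public" ;;            # default route goes to an internet gateway
    nat-*) echo "private-via-nat" ;;   # default route goes to a NAT gateway
    "")    echo "no-default-route" ;;  # nothing was returned for 0.0.0.0/0
    *)     echo "other" ;;             # e.g. a transit gateway or an instance
  esac
}

# Placeholder IDs for illustration only
classify_default_route "igw-0abc1234"
classify_default_route "nat-1234567890abcdef0"
```

A NAT gateway's own subnet should come back as `public`; if it reports `private-via-nat` or `no-default-route`, the gateway has no path to the internet.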
Fix by creating NAT gateway in a public subnet:
```bash
# First, ensure you have a public subnet with an IGW route

# Create the NAT gateway in the public subnet
aws ec2 create-nat-gateway \
  --subnet-id subnet-public \
  --allocation-id eipalloc-12345 \
  --tag-specifications 'ResourceType=natgateway,Tags=[{Key=Name,Value=my-nat}]'

# Wait for it to become available (nat-new stands for the ID returned above)
aws ec2 wait nat-gateway-available --nat-gateway-ids nat-new
```
### Missing Route from Private Subnet

The private subnet needs a default route pointing to the NAT gateway:

```bash
# Get the private subnet's route table
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=subnet-private" \
  --query 'RouteTables[*].[RouteTableId,Routes]'
```

Add a route to the NAT gateway:

```bash
aws ec2 create-route \
  --route-table-id rtb-private \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-1234567890abcdef0
```

Or replace an existing default route:

```bash
aws ec2 replace-route \
  --route-table-id rtb-private \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-1234567890abcdef0
```

### NAT Gateway Not Available
The NAT gateway state must be `available`:

```bash
aws ec2 describe-nat-gateways \
  --nat-gateway-ids nat-1234567890abcdef0 \
  --query 'NatGateways[*].[State,FailureMessage]'
```

States:
- pending: Being created
- available: Ready to use
- failed: Creation failed (check FailureMessage)
- deleting/deleted: Being deleted
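
The state list above maps naturally onto a next step. The following is a minimal sketch, assuming you have already captured the `State` value as text; the function name `nat_state_action` and the wording of the suggested actions are illustrative, not AWS output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a NAT gateway State value into a suggested action.
nat_state_action() {
  case "$1" in
    pending)          echo "wait" ;;
    available)        echo "ok" ;;
    failed)           echo "read FailureMessage, then recreate" ;;
    deleting|deleted) echo "recreate" ;;
    *)                echo "unknown state: $1"; return 1 ;;
  esac
}

nat_state_action "available"
nat_state_action "failed"
```

In a script, the argument would typically come from the `describe-nat-gateways` query shown earlier, run with `--output text`.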
If failed, check the failure message and recreate:
```bash
# Delete failed NAT gateway
aws ec2 delete-nat-gateway --nat-gateway-id nat-failed

# Create new one
aws ec2 create-nat-gateway \
  --subnet-id subnet-public \
  --allocation-id eipalloc-new
```
### No Elastic IP Assigned

A public NAT gateway needs an Elastic IP:

```bash
aws ec2 describe-nat-gateways \
  --nat-gateway-ids nat-1234567890abcdef0 \
  --query 'NatGateways[*].NatGatewayAddresses[*].[PublicIp,AllocationId]'
```

If `PublicIp` is null, the EIP wasn't properly associated.

Allocate a new EIP and create a NAT gateway that uses it:

```bash
# Allocate EIP
EIP_ALLOC=$(aws ec2 allocate-address --domain vpc --query 'AllocationId' --output text)

# Create NAT gateway with EIP
aws ec2 create-nat-gateway \
  --subnet-id subnet-public \
  --allocation-id "$EIP_ALLOC"
```
### Security Group Blocking Traffic

A NAT gateway uses an ENI (Elastic Network Interface), but you cannot attach a security group to a NAT gateway. You can confirm this by inspecting the ENI, which should report an empty group list:

```bash
# Get NAT gateway ENI
ENI=$(aws ec2 describe-nat-gateways \
  --nat-gateway-ids nat-1234567890abcdef0 \
  --query 'NatGateways[0].NatGatewayAddresses[0].NetworkInterfaceId' \
  --output text)

# Check ENI security groups
aws ec2 describe-network-interfaces \
  --network-interface-ids "$ENI" \
  --query 'NetworkInterfaces[0].Groups[*]'
```

The NAT gateway itself doesn't filter traffic; the security groups on your private instances do. Make sure those instances allow the outbound traffic they need.
Check private instance security groups:

```bash
aws ec2 describe-instances \
  --filters "Name=subnet-id,Values=subnet-private" \
  --query 'Reservations[*].Instances[*].[InstanceId,SecurityGroups[*].GroupId]'
```

Ensure outbound rules allow necessary traffic:

```bash
aws ec2 describe-security-groups \
  --group-ids sg-private \
  --query 'SecurityGroups[0].IpPermissionsEgress'
```

Add an outbound rule if missing:

```bash
aws ec2 authorize-security-group-egress \
  --group-id sg-private \
  --ip-permissions '[{"IpProtocol":"-1","IpRanges":[{"CidrIp":"0.0.0.0/0"}]}]'
```

### Internet Gateway Missing
The public subnet where the NAT gateway sits needs a route to an internet gateway:

```bash
# Confirm the VPC exists and note its CIDR
aws ec2 describe-vpcs \
  --vpc-ids vpc-12345 \
  --query 'Vpcs[*].[VpcId,CidrBlock]'

# describe-vpcs does not report the IGW, so list the IGWs attached to the VPC
aws ec2 describe-internet-gateways \
  --filters "Name=attachment.vpc-id,Values=vpc-12345" \
  --query 'InternetGateways[*].[InternetGatewayId,Attachments[*].State]'
```
Create IGW if missing:
```bash
aws ec2 create-internet-gateway \
  --tag-specifications 'ResourceType=internet-gateway,Tags=[{Key=Name,Value=my-igw}]'

aws ec2 attach-internet-gateway \
  --vpc-id vpc-12345 \
  --internet-gateway-id igw-new
```
Add route to IGW in public subnet:

```bash
aws ec2 create-route \
  --route-table-id rtb-public \
  --destination-cidr-block 0.0.0.0/0 \
  --gateway-id igw-new
```

### Source/Destination Check Not Disabled
For NAT instances (not NAT gateways), you must disable source/destination check. NAT gateways handle this automatically.
If you're using a NAT instance instead of NAT gateway:
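
```bash
aws ec2 modify-instance-attribute \
  --instance-id i-nat-instance \
  --source-dest-check "{\"Value\": false}"
```

### Wrong Availability Zone

A NAT gateway is created in a single availability zone. Instances in other AZs can still route through it, but that traffic incurs cross-AZ data transfer charges, and if the NAT gateway's AZ goes down, every subnet that routes through it loses internet access. Check which AZ the NAT gateway is in and compare it with the AZs of your private subnets.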
```bash
# Find the subnet the NAT gateway is in
aws ec2 describe-nat-gateways \
  --nat-gateway-ids nat-1234567890abcdef0 \
  --query 'NatGateways[*].[SubnetId]' \
  --output text

# Get the AZ of that subnet
aws ec2 describe-subnets \
  --subnet-ids subnet-nat \
  --query 'Subnets[*].AvailabilityZone'
```

Create a NAT gateway in each AZ:

```bash
# Create NAT gateway in second AZ
aws ec2 create-nat-gateway \
  --subnet-id subnet-public-2 \
  --allocation-id eipalloc-new-2

# Route second AZ private subnet to this NAT
aws ec2 create-route \
  --route-table-id rtb-private-2 \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-new-2
```
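
To spot private subnets whose default route leads to a NAT gateway in a different AZ, the AZ values gathered with the commands above can be compared line by line. This is a minimal sketch that assumes a hypothetical input format of `<subnet-id> <subnet-az> <nat-gateway-az>`, one subnet per line; the subnet names and AZs in the example are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "<subnet-id> <subnet-az> <nat-gateway-az>" lines
# from stdin and report subnets whose NAT gateway sits in another AZ.
find_cross_az_routes() {
  while read -r subnet subnet_az nat_az; do
    [ -z "$subnet" ] && continue   # skip blank lines
    if [ "$subnet_az" != "$nat_az" ]; then
      echo "$subnet: in $subnet_az but routes to a NAT gateway in $nat_az"
    fi
  done
}

# Placeholder data for illustration only
find_cross_az_routes <<'EOF'
subnet-private-1 us-east-1a us-east-1a
subnet-private-2 us-east-1b us-east-1a
EOF
```

With this sample input only `subnet-private-2` is reported, since it is the one whose NAT gateway lives in a different AZ.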
### Connection Timeout Issues

A NAT gateway drops connections that have been idle for 350 seconds or more, so long-running connections with quiet periods may time out.
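
One quick sanity check is to compare a host's TCP keepalive idle time against that 350-second window. The sketch below is illustrative only: the function name is hypothetical, and it looks solely at when the first keepalive probe would be sent.

```bash
#!/usr/bin/env bash
# NAT gateway idle timeout in seconds, as stated above
NAT_IDLE_TIMEOUT=350

# Hypothetical helper: does the first keepalive probe fire before the timeout?
keepalive_beats_nat_timeout() {
  local idle_seconds="$1"
  if [ "$idle_seconds" -lt "$NAT_IDLE_TIMEOUT" ]; then
    echo "ok: first probe after ${idle_seconds}s, before the ${NAT_IDLE_TIMEOUT}s timeout"
  else
    echo "too late: first probe after ${idle_seconds}s, after the ${NAT_IDLE_TIMEOUT}s timeout"
    return 1
  fi
}

keepalive_beats_nat_timeout 60
keepalive_beats_nat_timeout 7200 || true   # 7200s is the usual Linux default
```

On Linux, the current system-wide value can be read with `sysctl net.ipv4.tcp_keepalive_time` and passed to the function.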
Check if traffic is flowing:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name ActiveConnectionCount \
  --dimensions Name=NatGatewayId,Value=nat-1234567890abcdef0 \
  --start-time $(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Average \
  --output table
```

For applications needing long-lived connections, implement keepalive:
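```python
import socket

import requests

# HTTP: a requests.Session reuses pooled connections by default, but it sends
# nothing while idle, so it does not by itself keep a NAT mapping alive
session = requests.Session()

# TCP keepalive: probe idle sockets well inside the 350-second window
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# The TCP_KEEP* options below are available on Linux
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # first probe after 60s idle
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # then one probe every 10s
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # give up after 5 failed probes
```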
### Port Exhaustion

A NAT gateway supports up to 55,000 simultaneous connections to each unique destination. Beyond that, new connections to that destination fail and the `ErrorPortAllocation` metric increases.

Check connection count:

```bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name ActiveConnectionCount \
  --dimensions Name=NatGatewayId,Value=nat-1234567890abcdef0 \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Maximum
```

`ActiveConnectionCount` counts connections to all destinations, so treat it as an upper bound for any single destination. If it is near 55,000, distribute traffic across multiple NAT gateways:

```bash
# Create second NAT gateway in same AZ
aws ec2 create-nat-gateway \
  --subnet-id subnet-public-1 \
  --allocation-id eipalloc-new-2

# Use route tables to distribute traffic
aws ec2 replace-route \
  --route-table-id rtb-private-1b \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-new-2
```
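
To judge how close a gateway is to its connection limit, the `Maximum` value from CloudWatch can be turned into a percentage. A minimal sketch, assuming integer inputs; the function name is hypothetical, and the limit is passed as an argument rather than hard-coded (55,000 in the example is the per-destination figure AWS documents).

```bash
#!/usr/bin/env bash
# Hypothetical helper: integer percentage of a connection limit in use.
conn_utilization_pct() {
  local active="$1" limit="$2"
  if [ "$limit" -le 0 ]; then
    echo "limit must be positive" >&2
    return 1
  fi
  echo $(( active * 100 / limit ))   # integer division, rounds down
}

# Placeholder numbers: 50,000 active connections against a 55,000 limit
conn_utilization_pct 50000 55000   # prints 90
```

A value that stays above 80 or 90 for several periods is a reasonable, if arbitrary, point to start planning a second gateway.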
## Verification Steps

Test connectivity from the private subnet:

```bash
# SSH into private instance
ssh -i my-key.pem ec2-user@private-instance-ip

# Test internet connectivity
curl -I https://www.google.com
ping -c 4 8.8.8.8

# Test package download
sudo yum update --downloadonly

# For container workloads
docker pull nginx
```
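
When these checks are run repeatedly, a small wrapper that prints a pass/fail line per check makes the results easier to scan. This is only a sketch: `run_check` is a hypothetical helper, and the commented-out lines show how the connectivity tests above might be wired in on the private instance.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command quietly and print PASS or FAIL with a label.
run_check() {
  local label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    return 1
  fi
}

# Uncomment on the private instance to test real egress:
# run_check "HTTPS egress" curl -sS -I --max-time 10 https://www.google.com
# run_check "ICMP egress"  ping -c 4 8.8.8.8

# Offline demonstration using commands that always succeed or fail
run_check "demo success" true
run_check "demo failure" false || true
```

Because `run_check` returns a non-zero status on failure, it can also gate later steps in a script.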
Or use Systems Manager to run commands:

```bash
aws ssm send-command \
  --instance-ids i-private \
  --document-name AWS-RunShellScript \
  --parameters 'commands=["curl -I https://www.google.com","ping -c 4 8.8.8.8"]' \
  --output text
```

Verify NAT gateway metrics:

```bash
# Check if traffic is flowing
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name BytesOutToDestination \
  --dimensions Name=NatGatewayId,Value=nat-1234567890abcdef0 \
  --start-time $(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Sum
```

NAT gateway diagnostic script:
```bash
#!/bin/bash
NAT_ID="nat-1234567890abcdef0"
VPC_ID="vpc-12345"

echo "NAT Gateway Diagnostics"
echo "======================="

echo "1. NAT Gateway Status:"
aws ec2 describe-nat-gateways \
  --nat-gateway-ids "$NAT_ID" \
  --query 'NatGateways[*].[NatGatewayId,State,SubnetId,NatGatewayAddresses[*].[PublicIp,PrivateIp]]'

echo ""
echo "2. NAT Gateway Subnet (must be public):"
NAT_SUBNET=$(aws ec2 describe-nat-gateways --nat-gateway-ids "$NAT_ID" --query 'NatGateways[0].SubnetId' --output text)
aws ec2 describe-subnets --subnet-ids "$NAT_SUBNET" --query 'Subnets[*].[SubnetId,CidrBlock,MapPublicIpOnLaunch]'

echo ""
echo "3. Route from NAT subnet to Internet Gateway:"
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=$NAT_SUBNET" \
  --query "RouteTables[*].Routes[?DestinationCidrBlock=='0.0.0.0/0']"

echo ""
echo "4. Route tables with NAT gateway route:"
# Double quotes let the shell expand NAT_ID inside the JMESPath string literal
aws ec2 describe-route-tables \
  --filters "Name=vpc-id,Values=$VPC_ID" \
  --query "RouteTables[*].[RouteTableId,Routes[?NatGatewayId=='${NAT_ID}'].[DestinationCidrBlock,NatGatewayId]]"

echo ""
echo "5. Private subnet associations:"
aws ec2 describe-route-tables \
  --filters "Name=vpc-id,Values=$VPC_ID" "Name=route.state,Values=active" \
  --query "RouteTables[*].[RouteTableId,Associations[*].SubnetId,Routes[?NatGatewayId=='${NAT_ID}'].NatGatewayId]"

echo ""
echo "6. NAT Gateway Metrics (last hour):"
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name BytesOutToDestination \
  --dimensions Name=NatGatewayId,Value="$NAT_ID" \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 300 \
  --statistics Sum \
  --output table
```
Set up monitoring:

```bash
aws cloudwatch put-metric-alarm \
  --alarm-name nat-gateway-high-connections \
  --alarm-description "NAT gateway connection count high" \
  --namespace AWS/NATGateway \
  --metric-name ActiveConnectionCount \
  --dimensions Name=NatGatewayId,Value=nat-1234567890abcdef0 \
  --statistic Maximum \
  --period 60 \
  --threshold 50000 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
```
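
The CloudWatch commands in this guide build their timestamps with `date -d`, which is a GNU extension and fails on macOS and other BSD systems. A portable variant might look like the sketch below; `iso_minutes_ago` is a hypothetical helper, and detecting GNU `date` via `--version` is an assumption that may not hold on every platform (BusyBox, for example).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a UTC ISO 8601 timestamp N minutes in the past.
iso_minutes_ago() {
  local minutes="$1"
  local fmt='+%Y-%m-%dT%H:%M:%SZ'
  if date --version >/dev/null 2>&1; then
    date -u -d "${minutes} minutes ago" "$fmt"   # GNU coreutils
  else
    date -u -v-"${minutes}"M "$fmt"              # BSD / macOS
  fi
}

iso_minutes_ago 60
```

The result can be dropped into `--start-time "$(iso_minutes_ago 60)"` in any of the `get-metric-statistics` calls.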