Introduction

An EC2 system status check failure usually means AWS detected a host-level problem such as networking impairment, power issues, or hardware trouble on the underlying infrastructure. That is different from an instance status check failure, which points more toward the guest OS. If the system check is the one failing, rebooting the operating system often does nothing because the problem is below the VM.
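
The distinction between the two checks can be made explicit in a script by reading both statuses and branching on them. The following is a minimal sketch, not an official AWS tool: the function names `classify_status_checks` and `fetch_status_checks` are hypothetical, and it assumes the AWS CLI is installed and allowed to call `ec2:DescribeInstanceStatus`.

```shell
# Hypothetical helper: map the two status-check values to a suggested next step.
# $1 = system status, $2 = instance status (for example: ok, impaired, initializing).
classify_status_checks() {
  system_status="$1"
  instance_status="$2"
  if [ "$system_status" = "impaired" ]; then
    # Host-level problem: an in-OS reboot is unlikely to help.
    echo "system-check-failed: consider stop/start to move hosts"
  elif [ "$instance_status" = "impaired" ]; then
    # Guest-level problem: look at the OS, boot logs, and console output.
    echo "instance-check-failed: investigate the guest OS"
  else
    echo "no-impairment-reported"
  fi
}

# Hypothetical wrapper: fetch both statuses for one instance and classify them.
fetch_status_checks() {
  instance_id="$1"
  # With --output text the two values are printed tab-separated on one line.
  statuses=$(aws ec2 describe-instance-status \
    --instance-ids "$instance_id" \
    --include-all-instances \
    --query 'InstanceStatuses[0].[SystemStatus.Status,InstanceStatus.Status]' \
    --output text) || return 1
  # Left unquoted on purpose so the shell splits the line into two arguments.
  classify_status_checks $statuses
}
```

Calling `fetch_status_checks i-0123456789abcdef0` prints one of the three labels. Keeping the decision in a separate function from the API call makes the branching easy to check without touching AWS.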

Symptoms

  • EC2 shows a failed system status check
  • The instance is unreachable even though the OS looked healthy before the incident
  • Reboot does not clear the impairment
  • CloudWatch (the StatusCheckFailed_System metric) and the EC2 console indicate host or system-level trouble rather than an application issue

Common Causes

  • AWS detected hardware or network impairment on the physical host
  • The instance needs to be moved to new infrastructure
  • The root issue is below the guest OS, so an in-OS restart cannot fix it
  • Recovery is delayed because teams focus on application logs instead of the status check type

Step-by-Step Fix

  1. Confirm that the failing check is the system check, not only the instance check. The remediation differs depending on which status check is impaired.
bash
aws ec2 describe-instance-status \
  --instance-ids i-0123456789abcdef0 \
  --include-all-instances
  2. Collect console output and recent events. Console output can help you rule out a guest panic while still confirming the incident aligns with host-level impairment. Scheduled events for the instance appear in the Events field of the describe-instance-status output.
bash
aws ec2 get-console-output \
  --instance-id i-0123456789abcdef0
  3. Stop and start the instance if the instance type and workload allow it. A stop-start usually moves the instance to new underlying hardware; a reboot keeps it on the same host.
bash
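aws ec2 stop-instances --instance-ids i-0123456789abcdef0
# Wait until the instance is fully stopped; starting it earlier fails
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
aws ec2 start-instances --instance-ids i-0123456789abcdef0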
  4. Validate storage, IP, and recovery implications before stopping the instance. A stop erases data on instance-store volumes, and the auto-assigned public IPv4 address is released and replaced on start unless an Elastic IP is attached.
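
The pre-checks and the stop-start sequence can be combined into one guarded routine. This is a sketch under stated assumptions rather than a definitive runbook: the function names `preflight_stop_start` and `stop_start_instance` are hypothetical, the AWS CLI is assumed to be configured with the needed EC2 permissions, and the pre-flight inspects only the root device type, so data on any additional instance-store volumes still has to be reviewed by hand.

```shell
# Hypothetical pre-flight: refuse to continue when the root volume is
# instance-store backed, because such instances cannot be stopped.
preflight_stop_start() {
  instance_id="$1"
  root_type=$(aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].RootDeviceType' \
    --output text) || return 1
  if [ "$root_type" != "ebs" ]; then
    echo "root device type is '$root_type'; stop/start is not supported" >&2
    return 1
  fi
  # Look for an Elastic IP associated with the instance.
  eip=$(aws ec2 describe-addresses \
    --filters "Name=instance-id,Values=$instance_id" \
    --query 'Addresses[].PublicIp' \
    --output text) || return 1
  if [ -z "$eip" ] || [ "$eip" = "None" ]; then
    echo "warning: no Elastic IP attached; the public IPv4 address may change" >&2
  fi
}

# Hypothetical wrapper: stop, wait, start, then wait for the system check.
stop_start_instance() {
  instance_id="$1"
  preflight_stop_start "$instance_id" || return 1
  aws ec2 stop-instances --instance-ids "$instance_id" >/dev/null || return 1
  # Starting before the instance is fully stopped is rejected by the API.
  aws ec2 wait instance-stopped --instance-ids "$instance_id" || return 1
  aws ec2 start-instances --instance-ids "$instance_id" >/dev/null || return 1
  aws ec2 wait instance-running --instance-ids "$instance_id" || return 1
  # Confirm that the system status check passes on the new host.
  aws ec2 wait system-status-ok --instance-ids "$instance_id" || return 1
  echo "stop/start complete for $instance_id"
}
```

Run it as `stop_start_instance i-0123456789abcdef0`. Each waiter returns a non-zero status if it times out, so the function stops at the first step that does not complete instead of continuing blindly.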

Prevention

  • Distinguish system and instance status checks in monitoring and runbooks
  • Use EBS-backed instances and Elastic IPs when fast host migration is part of the recovery plan
  • Keep backups and AMIs current so replacement is straightforward if stop-start is not enough
  • Alert on system check failure immediately because it usually points to infrastructure, not application code
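
Alerting on the system check can be sketched with a CloudWatch alarm on the StatusCheckFailed_System metric. The example below is illustrative only: the function name `create_system_check_alarm`, the alarm name, and the two-period threshold are assumptions, and the EC2 recover action it attaches is supported only for some instance types and configurations, so check that yours qualifies before relying on it.

```shell
# Hypothetical helper: create an alarm that fires when the system status check
# fails for two consecutive one-minute periods and triggers the EC2 recover action.
# $1 = instance ID, $2 = AWS region (for example us-east-1).
create_system_check_alarm() {
  instance_id="$1"
  region="$2"
  aws cloudwatch put-metric-alarm \
    --region "$region" \
    --alarm-name "system-check-failed-$instance_id" \
    --namespace AWS/EC2 \
    --metric-name StatusCheckFailed_System \
    --dimensions "Name=InstanceId,Value=$instance_id" \
    --statistic Maximum \
    --period 60 \
    --evaluation-periods 2 \
    --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions "arn:aws:automate:$region:ec2:recover"
}
```

For example, `create_system_check_alarm i-0123456789abcdef0 us-east-1` creates the alarm for that instance. The metric is specific to the system check, so a guest OS failure does not trigger this alarm.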