## Introduction

Elasticsearch snapshot repositories can enter a read-only or inaccessible state due to permission changes, expired credentials, S3 bucket policy updates, or filesystem issues. When this happens, snapshot creation fails, either silently or with explicit errors, leaving the cluster without recent backups and at risk of data loss.

## Symptoms

- `PUT /_snapshot/my_repo/snapshot_1` returns `repository_missing_exception` or `repository_verification_exception`
- `GET /_snapshot/my_repo/_status` shows a `FAILED` state
- Snapshot creation hangs indefinitely
- Elasticsearch logs show `IOException: Read-only file system` or `AccessDenied` for S3
- `POST /_snapshot/my_repo/_verify` returns a verification failure
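Before digging into causes, it helps to know how far back the last good backup is. A minimal sketch that flags non-`SUCCESS` snapshots, assuming the repository name `my_repo` and the default `_cat/snapshots` column order (snapshot id first, status second):

```shell
# Flag snapshots that did not complete successfully. Pipe in live data with:
#   curl -s 'localhost:9200/_cat/snapshots/my_repo' | sh flag_failed.sh
# Column order (id, status, ...) is the _cat default; adjust if you pass ?h=.
awk '$2 != "SUCCESS" { print "BAD:", $1, $2; bad = 1 } END { exit bad }'
```

The script exits non-zero when any snapshot is in a bad state, so it can double as a monitoring probe.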

## Common Causes

- S3 access key or secret key expired or was rotated
- S3 bucket policy changed to deny the Elasticsearch IAM role
- NFS mount became read-only due to server issues
- Disk full on the snapshot storage location
- Elasticsearch process lost filesystem permissions after an OS update
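The log signatures from the symptoms list can narrow down which cause applies. A quick sketch that scans the node log for them; the log path is an assumption (the Debian/RPM package default), so adjust it for your install:

```shell
# Scan the Elasticsearch log for repository failure signatures.
# Pass the log path as the first argument, or rely on the package default.
LOG="${1:-/var/log/elasticsearch/elasticsearch.log}"
if grep -En 'Read-only file system|AccessDenied|repository_verification_exception' "$LOG"; then
  echo "repository errors found in $LOG"
else
  echo "no repository errors in $LOG"
fi
```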

## Step-by-Step Fix

1. **Verify the repository status:**

   ```bash
   curl -X POST 'localhost:9200/_snapshot/my_repo/_verify?pretty'
   # Success: { "nodes": { "<node_id>": { "name": "node-1" } } }
   # Failure: { "error": { "type": "repository_verification_exception", ... } }
   ```

2. **Check the S3 repository configuration:**

   ```bash
   curl -s 'localhost:9200/_snapshot/my_repo?pretty'
   # Check: type ("s3"), bucket, base_path, and client settings
   ```

3. **Update the S3 credentials in the keystore:**

   ```bash
   # On each Elasticsearch node
   /usr/share/elasticsearch/bin/elasticsearch-keystore list
   # Look for: s3.client.default.access_key, s3.client.default.secret_key

   # Update the credentials (each command prompts for the value)
   /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.access_key
   /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.secret_key

   # The S3 credentials are reloadable secure settings, so they can be
   # applied without a restart:
   curl -X POST 'localhost:9200/_nodes/reload_secure_settings'
   # Alternatively, restart Elasticsearch on each node
   sudo systemctl restart elasticsearch
   ```

4. **For filesystem repositories, fix permissions and space:**

   ```bash
   # Check disk space
   df -h /mnt/snapshots

   # Check ownership and permissions
   ls -la /mnt/snapshots/
   sudo chown -R elasticsearch:elasticsearch /mnt/snapshots
   sudo chmod 755 /mnt/snapshots

   # Remount if the filesystem went read-only
   sudo mount -o remount,rw /mnt/snapshots
   ```

5. **Recreate the repository if needed.** Deleting a repository only removes the cluster's reference to it; the snapshot data itself stays in the bucket, so re-registering with the same `base_path` restores access to existing snapshots.

   ```bash
   # Delete the broken repository reference
   curl -X DELETE 'localhost:9200/_snapshot/my_repo'

   # Recreate it with correct settings. Note: the "region" repository setting
   # was removed in Elasticsearch 6.x; the region comes from the S3 client
   # configuration instead.
   curl -X PUT 'localhost:9200/_snapshot/my_repo' \
     -H 'Content-Type: application/json' -d '{
     "type": "s3",
     "settings": {
       "bucket": "es-backups-prod",
       "base_path": "elasticsearch",
       "compress": true
     }
   }'
   ```

## Prevention

- Use IAM roles instead of static credentials for S3 repositories
- Set up CloudWatch/cron monitoring for snapshot completion status
- Test snapshot restoration quarterly as part of disaster recovery drills
- Use lifecycle policies on S3 buckets to manage backup retention
- Periodically verify each repository with `POST /_snapshot/<repo>/_verify`
- Implement backup alerts using `GET /_snapshot/my_repo/_status` in monitoring
- Keep at least 3 recent snapshots and verify the oldest can still be restored
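On Elasticsearch 7.4+, snapshot lifecycle management (SLM) covers several of these points natively: scheduled snapshots, retention by age, and a minimum snapshot count. A sketch of a policy body for `PUT _slm/policy/nightly-snapshots`; the policy name, schedule, and retention values here are illustrative assumptions, not recommendations:

```json
{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-{now/d}>",
  "repository": "my_repo",
  "config": { "include_global_state": true },
  "retention": { "expire_after": "30d", "min_count": 3, "max_count": 50 }
}
```

With `min_count: 3`, retention never deletes below three snapshots even once they pass `expire_after`, matching the "keep at least 3 recent snapshots" rule above.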