Fix Elasticsearch Snapshot Restore Errors - Complete Recovery Guide

# How to Fix Elasticsearch Snapshot Restore Failures

Restoring from a snapshot should be straightforward, but various issues can cause failures. Whether you're dealing with repository problems, version incompatibilities, or shard allocation errors, here's how to diagnose and fix snapshot restore issues.

## Recognizing Restore Failures

When a restore fails, you'll see errors like:

```json
{
  "error": {
    "root_cause": [
      {
        "type": "snapshot_restore_exception",
        "reason": "[backup-repo:snapshot-2024-01-15] cannot restore index [logs-2024-01] because an open index with same name already exists in the cluster"
      }
    ],
    "type": "snapshot_restore_exception",
    "reason": "[backup-repo:snapshot-2024-01-15] cannot restore index [logs-2024-01]"
  },
  "status": 400
}
```

Or during partial failure:

```json
{
  "snapshot" : "snapshot-2024-01-15",
  "indices" : [ "logs-2024-01", "logs-2024-02" ],
  "shards" : {
    "total" : 10,
    "failed" : 3,
    "successful" : 7
  }
}
```
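A partial result like this can come back without an `error` object, so it is easy to miss in a script. As a minimal sketch, assuming the response shape shown above and a host without `jq`, the failed shard count can be pulled out with `grep`. The helper name `failed_shards` is made up for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the "failed" shard count from a restore
# response read on stdin. Prints nothing if the field is missing.
failed_shards() {
  grep -o '"failed"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | head -n 1 \
    | grep -o '[0-9][0-9]*$'
}

# Sample input copied from the partial-failure response above.
response='{ "snapshot" : "snapshot-2024-01-15", "shards" : { "total" : 10, "failed" : 3, "successful" : 7 } }'
failed=$(printf '%s\n' "$response" | failed_shards)
if [ "${failed:-0}" -gt 0 ]; then
  echo "restore incomplete: ${failed} shard(s) failed"
fi
```

Pattern matching on JSON text is fragile; where `jq` is available, a real JSON parser is the safer choice.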

## Pre-Restore Checklist

Before attempting a restore, verify your snapshot repository:

```bash
# List registered repositories
curl -X GET "localhost:9200/_snapshot?pretty"

# Verify repository is accessible
curl -X POST "localhost:9200/_snapshot/backup-repo/_verify?pretty"

# List available snapshots
curl -X GET "localhost:9200/_snapshot/backup-repo/_all?pretty"
```

Check snapshot details:

```bash
curl -X GET "localhost:9200/_snapshot/backup-repo/snapshot-2024-01-15?pretty"
```

```json
{
  "snapshots" : [
    {
      "snapshot" : "snapshot-2024-01-15",
      "uuid" : "abc123...",
      "version_id" : 8010199,
      "version" : "8.1.1",
      "indices" : [ "logs-2024-01", "products", "users" ],
      "data_streams" : [ ],
      "include_global_state" : true,
      "state" : "SUCCESS",
      "start_time" : "2024-01-15T00:00:00.000Z",
      "end_time" : "2024-01-15T00:30:00.000Z"
    }
  ]
}
```
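The `state` field is the one to check before going further. The following is a rough sketch of gating a restore on it, assuming the response format shown above; `snapshot_state` and `check_snapshot` are hypothetical helper names, not part of any Elasticsearch tooling.

```bash
# Hypothetical helper: print the first "state" value found in a
# get-snapshot response read on stdin (for example SUCCESS or PARTIAL).
snapshot_state() {
  grep -o '"state"[[:space:]]*:[[:space:]]*"[A-Z_]*"' \
    | head -n 1 \
    | sed -E 's/.*"([A-Z_]*)"$/\1/'
}

# Report whether the snapshot completed cleanly; returns non-zero otherwise.
check_snapshot() {
  local state
  state=$(snapshot_state)
  if [ "$state" = "SUCCESS" ]; then
    echo "ok to restore"
  else
    echo "inspect before restoring: state=${state:-unknown}"
    return 1
  fi
}
```

It could be fed directly from the API, for example `curl -s "localhost:9200/_snapshot/backup-repo/snapshot-2024-01-15" | check_snapshot`.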

## Problem 1: Index Already Exists

The most common error is trying to restore an index that already exists:

```json
{
  "error": {
    "type": "snapshot_restore_exception",
    "reason": "cannot restore index [logs-2024-01] because an open index with same name already exists"
  }
}
```

### Solution A: Close the Existing Index

Restoring over a closed index only works if it has the same number of primary shards as the copy in the snapshot.

```bash
curl -X POST "localhost:9200/logs-2024-01/_close"

curl -X POST "localhost:9200/_snapshot/backup-repo/snapshot-2024-01-15/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "logs-2024-01"
}
'
```

### Solution B: Delete the Existing Index

```bash
# Warning: This deletes all data in the index
curl -X DELETE "localhost:9200/logs-2024-01"

curl -X POST "localhost:9200/_snapshot/backup-repo/snapshot-2024-01-15/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "logs-2024-01"
}
'
```

### Solution C: Restore to a Different Index Name

```bash
curl -X POST "localhost:9200/_snapshot/backup-repo/snapshot-2024-01-15/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "logs-2024-01",
  "rename_pattern": "(.+)",
  "rename_replacement": "$1-restored"
}
'
```

This restores `logs-2024-01` as `logs-2024-01-restored`.
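To see what a pattern will produce before running the restore, the same substitution can be approximated locally. This is only a sketch: `sed -E` stands in for Elasticsearch's Java regular expressions, so `\1` takes the place of `$1`, and the helper name `preview_rename` is invented here.

```bash
# Hypothetical helper: show how a rename pattern would rewrite an index name.
# Arguments: index name, pattern, replacement (using \1 instead of $1).
# Patterns or replacements containing "/" would need a different sed delimiter.
preview_rename() {
  local name="$1" pattern="$2" replacement="$3"
  printf '%s\n' "$name" | sed -E "s/${pattern}/${replacement}/"
}

preview_rename 'logs-2024-01' '(.+)' '\1-restored'
# prints: logs-2024-01-restored
```

Simple capture groups behave the same way in both engines, but more advanced Java regex features may not carry over to `sed`.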

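## Problem 2: Version Incompatibility

Snapshot version mismatch errors look like this (the exact wording varies by release):

```json
{
  "error": {
    "type": "snapshot_restore_exception",
    "reason": "the snapshot was created with Elasticsearch version [8.5.0] which is higher than the version of this node [7.10.0]"
  }
}
```

### Solution: Upgrade Path

A snapshot can't be restored into a cluster older than the one that created it, and an index can generally be read only by the major version that created it or the next one. A snapshot taken on 7.x restores into 8.x, but an index created on 6.x does not. Options:

1. Restore into a cluster on a compatible version, then reindex into a new index so the data can be carried forward to the target version
2. Use the Upgrade Assistant to find and fix indices that block a version migration
3. Export and re-import the data as JSON (for example with reindex-from-remote) for cross-version migration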
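Comparing the snapshot's `version` field with the cluster version up front avoids a failed restore attempt. The sketch below assumes the usual rule that a cluster reads snapshots from its own major version and the one before it, and never from a newer release. The helper name `can_restore` is hypothetical, and the check looks only at version strings.

```bash
# Hypothetical helper: rough compatibility check from version strings alone.
# Assumed rule: the snapshot must not be newer than the cluster, and must be
# at most one major version behind it. Confirm against the official
# compatibility matrix before relying on the answer.
can_restore() {
  local snap="$1" cluster="$2"
  local snap_major="${snap%%.*}" cluster_major="${cluster%%.*}"
  local lowest
  # sort -V orders version strings numerically, so 7.9.0 sorts before 7.10.0.
  lowest=$(printf '%s\n%s\n' "$snap" "$cluster" | sort -V | head -n 1)

  if [ "$snap" != "$cluster" ] && [ "$lowest" = "$cluster" ]; then
    echo "no: snapshot is newer than the cluster"
  elif [ $((cluster_major - snap_major)) -gt 1 ]; then
    echo "no: snapshot is more than one major version behind"
  else
    echo "yes"
  fi
}

can_restore "8.1.1" "8.5.0"    # snapshot version, then cluster version
can_restore "8.5.0" "7.10.0"
```

What ultimately matters is the version that created each index, which can be older than the snapshot itself, so treat a "yes" here as a first filter only.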

```bash
# Check current version
curl -X GET "localhost:9200"

# For minor version differences, upgrade the cluster first
# Then snapshot and restore
```

## Problem 3: Repository Not Accessible

Repository verification fails:

```bash
curl -X POST "localhost:9200/_snapshot/backup-repo/_verify?pretty"
```

```json
{
  "error": {
    "root_cause": [
      {
        "type": "repository_verification_exception",
        "reason": "[backup-repo] path is not accessible"
      }
    ],
    "type": "repository_verification_exception",
    "reason": "[backup-repo] path is not accessible"
  },
  "status": 500
}
```

### Solution: Fix Repository Configuration

Check the repository configuration:

```bash
curl -X GET "localhost:9200/_snapshot/backup-repo?pretty"
```

For file system repositories, verify the path is accessible. The location must be listed under `path.repo` in `elasticsearch.yml` on every master and data node and must resolve to the same shared storage:

```bash
# Check elasticsearch.yml for path.repo setting
grep path.repo /etc/elasticsearch/elasticsearch.yml

# Verify the directory exists and is writable
ls -la /mnt/backups/elasticsearch

# Fix ownership if the elasticsearch user cannot read or write the directory
sudo chown -R elasticsearch:elasticsearch /mnt/backups/elasticsearch
```

Re-register the repository:

```bash
curl -X PUT "localhost:9200/_snapshot/backup-repo" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/elasticsearch",
    "compress": true
  }
}
'
```

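For S3 repositories, check credentials. They are read from the Elasticsearch keystore (`s3.client.default.access_key` and `s3.client.default.secret_key`) or from the instance's IAM role, not from the repository settings:

```bash
curl -X PUT "localhost:9200/_snapshot/s3-backups" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "my-elasticsearch-backups",
    "region": "us-east-1",
    "base_path": "snapshots"
  }
}
'
```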

## Problem 4: Corrupted Snapshot

Snapshot integrity issues:

```json
{
  "error": {
    "type": "snapshot_restore_exception",
    "reason": "[backup-repo:snapshot-2024-01-15] failed to restore snapshot"
  }
}
```

Check snapshot status:

```bash
curl -X GET "localhost:9200/_snapshot/backup-repo/snapshot-2024-01-15/_status?pretty"
```

### Solution: Restore Specific Indices

If only some indices are corrupted, restore only the good ones:

```bash
# List indices in the snapshot
curl -X GET "localhost:9200/_snapshot/backup-repo/snapshot-2024-01-15?pretty"

# Restore specific indices
curl -X POST "localhost:9200/_snapshot/backup-repo/snapshot-2024-01-15/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "products,users",
  "ignore_unavailable": true
}
'
```

## Problem 5: Insufficient Disk Space

Restore fails due to disk space:

```json
{
  "error": {
    "type": "illegal_state_exception",
    "reason": "not enough disk space to restore snapshot"
  }
}
```

### Solution: Free Space or Add Nodes

```bash
# Check current disk usage
curl -X GET "localhost:9200/_cat/allocation?v"

# Delete unnecessary indices (8.x rejects wildcard deletes unless
# action.destructive_requires_name is set to false)
curl -X DELETE "localhost:9200/old-logs-*"

# Or add nodes for more capacity
```
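With many nodes, the allocation table is easier to act on once it is filtered. The sketch below assumes the two-column output of `_cat/allocation?h=node,disk.percent` and node names without spaces; `nodes_over_threshold` is a hypothetical helper, and 85 is used as the default because it matches the default low watermark.

```bash
# Hypothetical helper: list nodes whose disk usage is at or above a threshold.
# Expects "<node> <disk.percent>" lines on stdin, as returned by
#   curl -s "localhost:9200/_cat/allocation?h=node,disk.percent"
# Rows without a numeric percentage (such as UNASSIGNED) are skipped.
nodes_over_threshold() {
  local threshold="${1:-85}"
  awk -v t="$threshold" \
    '$2 ~ /^[0-9]+$/ && $2 + 0 >= t + 0 { print $1 " " $2 "%" }'
}

# Example with sample data in place of a live cluster.
printf 'node-1 72\nnode-2 91\nUNASSIGNED\n' | nodes_over_threshold 85
# prints: node-2 91%
```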

Or temporarily raise the disk watermarks above their defaults (85%, 90%, and 95%), and restore the defaults once the restore finishes:

```bash
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "98%"
  }
}
'
```
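Transient overrides like these stay in place until they are cleared or the whole cluster restarts, so it is worth removing them deliberately. Setting a cluster setting to `null` resets it to its default. A minimal sketch follows, with `reset_watermarks_body` as a made-up helper that only prints the request body.

```bash
# Hypothetical helper: print a request body that clears the three
# transient watermark overrides by setting each one to null.
reset_watermarks_body() {
  cat <<'EOF'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": null,
    "cluster.routing.allocation.disk.watermark.high": null,
    "cluster.routing.allocation.disk.watermark.flood_stage": null
  }
}
EOF
}

# To apply it against a cluster (not run here):
#   reset_watermarks_body | curl -X PUT "localhost:9200/_cluster/settings" \
#     -H 'Content-Type: application/json' --data-binary @-
```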

## Problem 6: Shard Allocation Failures

Shards fail to allocate during restore:

```bash
curl -X GET "localhost:9200/_cat/shards?v&s=state" | grep -v STARTED
```
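On a large cluster, the raw list is long, and a count per state gives a quicker picture. This sketch assumes the four-column output of `_cat/shards?h=index,shard,prirep,state`; the helper name `unstarted_by_state` is invented for illustration.

```bash
# Hypothetical helper: count shards that are not STARTED, grouped by state.
# Expects "<index> <shard> <prirep> <state>" lines on stdin, as returned by
#   curl -s "localhost:9200/_cat/shards?h=index,shard,prirep,state"
unstarted_by_state() {
  awk 'NF >= 4 && $4 != "STARTED" { count[$4]++ }
       END { for (s in count) print s, count[s] }' | sort
}

# Example with sample data in place of a live cluster.
printf '%s\n' \
  'logs-2024-01 0 p STARTED' \
  'logs-2024-01 0 r UNASSIGNED' \
  'logs-2024-01 1 p INITIALIZING' \
  'logs-2024-01 1 r UNASSIGNED' | unstarted_by_state
# prints:
# INITIALIZING 1
# UNASSIGNED 2
```

For any shard that stays `UNASSIGNED`, `GET _cluster/allocation/explain` reports the reason the allocator gives for not placing it.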

### Solution: Check Allocation Settings

```bash
# Check allocation settings
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep allocation

# Ensure allocation is enabled
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
'
```

Check for shard allocation filtering, which can pin shards to nodes that no longer exist or have no room:

```bash
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep -E "allocation\.(include|exclude|require)"
```

## Problem 7: Global State Conflicts

Restoring global state (templates, ILM policies) can conflict:

```json
{
  "error": {
    "type": "snapshot_restore_exception",
    "reason": "cannot restore global state because it already exists"
  }
}
```

### Solution: Exclude or Overwrite Global State

Exclude global state:

```bash
curl -X POST "localhost:9200/_snapshot/backup-repo/snapshot-2024-01-15/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "*",
  "include_global_state": false
}
'
```

Or delete existing conflicting objects first:

```bash
# List templates
curl -X GET "localhost:9200/_cat/templates?v"

# Delete a specific legacy template
# (composable templates live under _index_template/<name> instead)
curl -X DELETE "localhost:9200/_template/my-template"
```

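## Monitoring Restore Progress

The snapshot status API reports on the snapshot itself, not on a restore. Use it to confirm that the snapshot finished cleanly before restoring from it:

```bash
curl -X GET "localhost:9200/_snapshot/backup-repo/snapshot-2024-01-15/_status?pretty"
```

A snapshot that is still being written looks like this:

```json
{
  "snapshots" : [
    {
      "snapshot" : "snapshot-2024-01-15",
      "repository" : "backup-repo",
      "state" : "IN_PROGRESS",
      "shards_stats" : {
        "initializing" : 0,
        "started" : 5,
        "finalizing" : 2,
        "done" : 3,
        "failed" : 0,
        "total" : 10
      }
    }
  ]
}
```

The restore itself runs as ordinary shard recovery, so its progress shows up in `GET _cat/recovery?v&active_only=true` rather than here.

To wait for completion, pass `wait_for_completion=true` on the restore request itself. The status API does not accept that parameter:

```bash
curl -X POST "localhost:9200/_snapshot/backup-repo/snapshot-2024-01-15/_restore?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "indices": "logs-2024-01"
}
'
```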
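Restored shards go through the normal shard recovery process, so the cat recovery API is another place to watch progress from a script. The sketch below assumes the three-column output of `_cat/recovery/<index>?h=index,shard,stage` and treats the restore as finished when every line reports the `done` stage. The helper name `restore_finished` and the index name in the comment are illustrative.

```bash
# Hypothetical helper: succeed only when every recovery line read on stdin
# reports the stage "done". Expects "<index> <shard> <stage>" per line, as from
#   curl -s "localhost:9200/_cat/recovery/restored_logs-2024-01?h=index,shard,stage"
# Empty input counts as "not finished", since no shard has been seen yet.
restore_finished() {
  awk 'NF >= 3 { seen = 1; if ($3 != "done") pending = 1 }
       END { if (seen && !pending) exit 0; exit 1 }'
}

# Example with sample data in place of a live cluster.
if printf 'logs 0 done\nlogs 1 index\n' | restore_finished; then
  echo "restore finished"
else
  echo "restore still running"
fi
# prints: restore still running
```

Wrapped in a loop with a `sleep` between calls, this gives a simple poll that works for restores started without `wait_for_completion`.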

## Best Practices for Successful Restores

### 1. Test Your Backups

Regularly test restore procedures:

```bash
# Create a test restore
curl -X POST "localhost:9200/_snapshot/backup-repo/snapshot-2024-01-15/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "logs-2024-01",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_test_$1"
}
'

# Verify data
curl -X GET "localhost:9200/restored_test_logs-2024-01/_count"

# Clean up (named explicitly, since 8.x rejects wildcard deletes by default)
curl -X DELETE "localhost:9200/restored_test_logs-2024-01"
```

### 2. Use a Restore Configuration File

Create a restore configuration for complex restores:

```bash
curl -X POST "localhost:9200/_snapshot/backup-repo/snapshot-2024-01-15/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "logs-*,products,users",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1",
  "include_aliases": false
}
'
```
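Keeping the same options in a file makes them easy to review and reuse. The following sketch writes the body above to disk and has `curl` read it with the `@` prefix. The file name `restore-config.json` and the helper names `write_restore_config` and `run_restore` are arbitrary choices for this example.

```bash
# Hypothetical helper: write the restore options to a JSON file.
# The quoted heredoc delimiter keeps "$1" literal instead of expanding it.
write_restore_config() {
  local path="${1:-restore-config.json}"
  cat > "$path" <<'EOF'
{
  "indices": "logs-*,products,users",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1",
  "include_aliases": false
}
EOF
}

# Hypothetical helper: send the saved file as the restore request body.
run_restore() {
  local path="${1:-restore-config.json}"
  curl -X POST "localhost:9200/_snapshot/backup-repo/snapshot-2024-01-15/_restore" \
    -H 'Content-Type: application/json' --data-binary "@${path}"
}
```

Calling `write_restore_config` and then `run_restore` performs the same restore as the inline version, with the options now living in a file that can be kept under version control.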

### 3. Verify Index Settings After Restore

```bash
curl -X GET "localhost:9200/restored_logs-2024-01/_settings?pretty"

# Adjust replica count if needed
curl -X PUT "localhost:9200/restored_logs-2024-01/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "number_of_replicas": 1
  }
}
'
```

## Troubleshooting Checklist

When a restore fails, check these in order:

1. Repository accessible? `POST _snapshot/repo/_verify`
2. Snapshot valid? `GET _snapshot/repo/snapshot`
3. Index exists? `GET _cat/indices`
4. Disk space available? `GET _cat/allocation`
5. Allocation enabled? `GET _cluster/settings`
6. Version compatible? Check snapshot vs cluster version
7. Shards allocating? `GET _cat/shards?v&s=state`

## Summary

Snapshot restore failures are usually caused by:

1. Existing indices with same name
2. Version incompatibility
3. Repository access issues
4. Corrupted snapshots
5. Disk space limitations
6. Shard allocation settings
7. Global state conflicts

Resolve by closing or renaming indices, ensuring repository access, freeing disk space, and using proper restore options. Always test your backup restore process before you need it in an emergency.
