Backup and Restore Elasticsearch Clusters with Snapshots

Elasticsearch is all about data, and as you probably already know, data is important to you and to Elasticsearch. However, as much as both you and Elasticsearch love data, failures can occur and lead to data loss.

To help safeguard against data loss, Elasticsearch has several features that keep your data available even when a failure occurs.

Some of the ways Elasticsearch provides data availability include:

Cross-cluster replication, a feature that allows you to replicate data to a set of follower clusters; a follower cluster is a standby cluster used if the leader cluster fails.
Backups, also called cluster snapshots. If the need arises, you can use these snapshots to restore data on a completely new cluster.

This tutorial shows you how to create cluster snapshots, which will help you be ready should an irreversible data failure event occur.

Let’s get started.

What Is an Elasticsearch Snapshot?

As mentioned, an Elasticsearch snapshot is a backup copy of a running Elasticsearch cluster. A snapshot can cover the entire cluster or only specific indices and data streams within it.

As you will soon learn, snapshots are stored in a repository, and repository plugins add support for additional storage backends. Snapshots can be stored on local or shared file systems and on remote systems such as Google Cloud Storage, Amazon S3, Microsoft Azure, and many more.

How to Create an Elasticsearch Snapshot Repository

Before we dive into creating Elasticsearch snapshots, we need to create a snapshot repository, because every snapshot is stored in one. Repositories and snapshots are both managed through the Snapshot API.

Some of the tasks handled by the Snapshot API are:

Put snapshot repository
Verify snapshot repository
Get snapshot repository
Delete snapshot repository
Clean up snapshot repository
Create snapshot
Clone snapshot
Get snapshot
Get snapshot status
Restore snapshot
Delete snapshot

To create a snapshot repository, we use the _snapshot API endpoint followed by the name we want to assign to the snapshot repository. Consider the request below, which creates a repository called backup_repo:

PUT /_snapshot/backup_repo
{
"type": "fs",
"settings": {
"location": "/home/root/backups",
"compress": true
}
}

Here’s a cURL command for the above request:

curl -XPUT "http://localhost:9200/_snapshot/backup_repo" -H 'Content-Type: application/json' -d'{ "type": "fs", "settings": { "location": "/home/root/backups", "compress": true }}'
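The same request can also be prepared from a script. The following is a minimal sketch using only the Python standard library; the helper names build_put_repository and send are hypothetical, and the base URL assumes the same unsecured local node as the cURL example.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: same local node as the cURL example


def build_put_repository(name, location, compress=True, base_url=ES_URL):
    """Build (but do not send) the PUT /_snapshot/<name> request."""
    body = {"type": "fs", "settings": {"location": location, "compress": compress}}
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_put_repository("backup_repo", "/home/root/backups")
print(request.get_method(), request.full_url)
```

Building the request separately from sending it lets you inspect the method, URL, and body before anything reaches the cluster; calling send(request) performs the actual call.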

Before Elasticsearch accepts the location used in this request, you must add that path, or its parent directory, to the path.repo entry in elasticsearch.yml.

The path.repo entry should look similar to:

path.repo: ["/home/root/backups"]

On package-based installations, you can find the Elasticsearch configuration file at /etc/elasticsearch/elasticsearch.yml.

NOTE: path.repo is a static setting, so after adding it you need to restart each Elasticsearch node for the change to take effect. Additionally, the path formats supported for path.repo differ depending on the platform running Elasticsearch.
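As a quick local sanity check before registering a repository, you can confirm that the intended location falls under one of the path.repo entries. The sketch below is only an approximation of that rule for POSIX-style paths; the function name is hypothetical, and it does not read elasticsearch.yml or resolve symlinks or ".." segments.

```python
from pathlib import PurePosixPath


def is_allowed_location(location, repo_paths):
    """Return True if location equals, or lies inside, one of the path.repo entries."""
    target = PurePosixPath(location)
    for entry in repo_paths:
        base = PurePosixPath(entry)
        # A location is acceptable when it is the entry itself or a descendant of it.
        if target == base or base in target.parents:
            return True
    return False


repo_paths = ["/home/root/backups"]  # mirrors the path.repo example above
print(is_allowed_location("/home/root/backups", repo_paths))
print(is_allowed_location("/tmp/backups", repo_paths))
```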

How to View the Snapshot Repository

To confirm the successful creation of the snapshot repository, send a GET request to the _snapshot endpoint:

GET /_snapshot/backup_repo

You can also use the following cURL command:

curl -XGET "http://localhost:9200/_snapshot/backup_repo"

This should display information about the backup repository, for example:

{
"backup_repo" : {
"type" : "fs",
"settings" : {
"compress" : "true",
"location" : """/home/root/backups"""
}
}
}

If you have more than one snapshot repository and do not remember the name, you can omit the repo name and call the _snapshot endpoint to list all the existing repositories:

GET /_snapshot

The cURL command is:

curl -XGET "http://localhost:9200/_snapshot"

How to Create an Elasticsearch Snapshot

Creating an Elasticsearch snapshot for a specific snapshot repository is handled by the create snapshot API. The API requires the snapshot repository name and the name of the snapshot.

NOTE: A single snapshot repository can hold more than one snapshot of the same cluster, as long as each snapshot has a unique name.

Consider the following request to add a snapshot called snapshot_2021 to the backup_repo repository.

PUT /_snapshot/backup_repo/snapshot_2021

To use cURL, use the command:

curl -XPUT "http://localhost:9200/_snapshot/backup_repo/snapshot_2021"

The command should return a response from Elasticsearch with 200 OK and accepted: true:

{
"accepted" : true
}

Since the above request does not specify which data streams and indices to back up, it backs up all the data and the cluster state. To specify which data streams and indices to back up, add them to the request body.

Consider the following request, which backs up the .kibana index (a system index) and records which user took the snapshot and why.

PUT /_snapshot/backup_repo/snapshot_2
{
"indices": ".kibana",
"ignore_unavailable": true,
"include_global_state": true,
"metadata": {
"taken_by": "elasticadmin",
"taken_because": "Daily Backup"
}
}

The cURL command for that is:

curl -XPUT "http://localhost:9200/_snapshot/backup_repo/snapshot_2" -H 'Content-Type: application/json' -d'{ "indices": ".kibana", "ignore_unavailable": true, "include_global_state": true, "metadata": { "taken_by": "elasticadmin", "taken_because": "Daily Backup" }}'

The ignore_unavailable parameter is a Boolean. If false (the default), the request returns an error when any data stream or index specified in the snapshot is missing or closed; if true, missing or closed data streams and indices are skipped.

The include_global_state parameter saves the cluster’s current state if true. Some of the cluster information saved includes:

Persistent cluster settings
Index templates
Legacy index templates
Ingest pipelines
ILM lifecycle policies

NOTE: You can specify more than one index by separating the names with commas.
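When the list of indices comes from a script, joining the names with commas produces the value the indices field expects. The helper below is a hypothetical sketch that builds the same request body as the example above; my_index is borrowed from the sample cluster used in this tutorial, and the body is only printed, not sent.

```python
import json


def snapshot_body(indices, taken_by, taken_because,
                  ignore_unavailable=True, include_global_state=False):
    """Build the JSON body for PUT /_snapshot/<repo>/<snapshot>."""
    return {
        # Multiple index names are passed as one comma-separated string.
        "indices": ",".join(indices),
        "ignore_unavailable": ignore_unavailable,
        "include_global_state": include_global_state,
        "metadata": {"taken_by": taken_by, "taken_because": taken_because},
    }


body = snapshot_body([".kibana", "my_index"], "elasticadmin", "Daily Backup")
print(json.dumps(body, indent=2))
```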

A common query parameter used when creating a snapshot is wait_for_completion, a Boolean value. If false (the default), the request returns as soon as the snapshot initializes; if true, the request waits until the snapshot completes before returning.

For example:

PUT /_snapshot/backup_repo/snapshot_3?wait_for_completion=true
{
"indices": ".kibana",
"ignore_unavailable": true,
"include_global_state": false,
"metadata": {
"taken_by": "elasticadmin",
"taken_because": "Weekly Backup"
}
}

The cURL command is:

curl -XPUT "http://localhost:9200/_snapshot/backup_repo/snapshot_3?wait_for_completion=true" -H 'Content-Type: application/json' -d'{ "indices": ".kibana", "ignore_unavailable": true, "include_global_state": false, "metadata": { "taken_by": "elasticadmin", "taken_because": "Weekly Backup" }}'

When you have the wait_for_completion parameter set to true, you’ll get an output similar to the one shown below:

{
"snapshot" : {
"snapshot" : "snapshot_3",
"uuid" : "tQUHyofIRnGMMtw0AGBACQ",
"version_id" : 7100299,
"version" : "7.10.2",
"indices" : [
".kibana_1"
],
"data_streams" : [ ],
"include_global_state" : false,
"metadata" : {
"taken_by" : "elasticadmin",
"taken_because" : "Weekly Backup"
},
"state" : "SUCCESS",
"start_time" : "2021-01-19T13:36:59.615Z",
"start_time_in_millis" : 1611063419615,
"end_time" : "2021-01-19T13:37:00.433Z",
"end_time_in_millis" : 1611063420433,
"duration_in_millis" : 818,
"failures" : [ ],
"shards" : {
"total" : 1,
"failed" : 0,
"successful" : 1
}
}
}
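If you leave wait_for_completion at its default, a client can instead poll the snapshot's state until it is no longer IN_PROGRESS. The sketch below shows one way to structure such a loop; fetch_state is a hypothetical callable that you would implement yourself (for example, by reading the state field from a get snapshot response), and the interval and timeout values are arbitrary.

```python
import time


def wait_for_snapshot(fetch_state, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll fetch_state() until the snapshot leaves IN_PROGRESS or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state != "IN_PROGRESS":
            return state  # e.g. SUCCESS, PARTIAL, or FAILED
        if time.monotonic() >= deadline:
            raise TimeoutError("snapshot still in progress")
        sleep(interval)


# Simulated states standing in for real responses from the cluster.
states = iter(["IN_PROGRESS", "IN_PROGRESS", "SUCCESS"])
final_state = wait_for_snapshot(lambda: next(states), sleep=lambda seconds: None)
print(final_state)
```

Passing the sleep function as a parameter keeps the loop easy to exercise without real delays.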

How to View Snapshots

The get snapshot API lets you view existing snapshots.

All you need to pass in the request is the snapshot repository and the name of the snapshot whose details you wish to view.

The request should respond with details about the specified snapshot. These details include:

Start and end time values
The version of Elasticsearch that created the snapshot
List of included indices
The snapshot’s current state
List of failures that occurred during the snapshot

For example, to view the details about the snapshot_3 created above, use the request shown below:

GET /_snapshot/backup_repo/snapshot_3

To use cURL, use the command below:

curl -XGET "http://localhost:9200/_snapshot/backup_repo/snapshot_3"

The request should return a response with the details of the snapshot as:

{
"snapshots" : [
{
"snapshot" : "snapshot_3",
"uuid" : "tQUHyofIRnGMMtw0AGBACQ",
"version_id" : 7100299,
"version" : "7.10.2",
"indices" : [
".kibana_1"
],
"data_streams" : [ ],
"include_global_state" : false,
"metadata" : {
"taken_by" : "elasticadmin",
"taken_because" : "Weekly Backup"
},
"state" : "SUCCESS",
"start_time" : "2021-01-19T13:36:59.615Z",
"start_time_in_millis" : 1611063419615,
"end_time" : "2021-01-19T13:37:00.433Z",
"end_time_in_millis" : 1611063420433,
"duration_in_millis" : 818,
"failures" : [ ],
"shards" : {
"total" : 1,
"failed" : 0,
"successful" : 1
}
}
]
}
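A response like this is easy to condense in a script. The following sketch, with a hypothetical summarize helper, pulls out the name, state, duration in seconds, and failed shard count for each snapshot; the sample dictionary is a trimmed copy of the response above rather than live output.

```python
def summarize(response):
    """Return one (name, state, seconds, failed_shards) tuple per snapshot."""
    rows = []
    for snap in response["snapshots"]:
        rows.append((
            snap["snapshot"],
            snap["state"],
            snap["duration_in_millis"] / 1000,  # milliseconds to seconds
            snap["shards"]["failed"],
        ))
    return rows


sample = {
    "snapshots": [
        {
            "snapshot": "snapshot_3",
            "state": "SUCCESS",
            "duration_in_millis": 818,
            "shards": {"total": 1, "failed": 0, "successful": 1},
        }
    ]
}
print(summarize(sample))
```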

You can also add query parameters to the request to control how much detail is returned about a snapshot. However, we will not look into that for now.

Let us say you want to view information about all snapshots in a specific snapshot repository; in that case, you can pass an asterisk wildcard in the request as:

GET /_snapshot/backup_repo/*

The cURL command for that is:

curl -XGET "http://localhost:9200/_snapshot/backup_repo/*"

The response is a detailed dump of all the snapshots in that repository as:

{
"snapshots" : [
{
"snapshot" : "snapshot_2021",
"uuid" : "7CFigHzvRtyZW07c60d2iw",
"version_id" : 7100299,
"version" : "7.10.2",
"indices" : [
"my_index",
"single_index_with_body",
"my_index_2",
"single_index",
".kibana_1",
"test"
],
"data_streams" : [ ],
"include_global_state" : true,
"state" : "SUCCESS",
"start_time" : "2021-01-19T13:28:48.172Z",
"start_time_in_millis" : 1611062928172,
"end_time" : "2021-01-19T13:28:50.831Z",
"end_time_in_millis" : 1611062930831,
"duration_in_millis" : 2659,
"failures" : [ ],
"shards" : {
"total" : 7,
"failed" : 0,
"successful" : 7
}
},
{
"snapshot" : "snapshot_2",
"uuid" : "w58IrYmORAub8VC7cg04Wg",
"version_id" : 7100299,
"version" : "7.10.2",
"indices" : [
".kibana_1"
],
"data_streams" : [ ],
"include_global_state" : false,
"metadata" : {
"taken_by" : "elasticadmin",
"taken_because" : "Daily Backup"
},
"state" : "SUCCESS",
"start_time" : "2021-01-19T13:33:34.482Z",
"start_time_in_millis" : 1611063214482,
"end_time" : "2021-01-19T13:33:35.921Z",
"end_time_in_millis" : 1611063215921,
"duration_in_millis" : 1439,
"failures" : [ ],
"shards" : {
"total" : 1,
"failed" : 0,
"successful" : 1
}
},
{
"snapshot" : "snapshot_3",
"uuid" : "tQUHyofIRnGMMtw0AGBACQ",
"version_id" : 7100299,
"version" : "7.10.2",
"indices" : [
".kibana_1"
],
"data_streams" : [ ],
"include_global_state" : false,
"metadata" : {
"taken_by" : "elasticadmin",
"taken_because" : "Weekly Backup"
},
"state" : "SUCCESS",
"start_time" : "2021-01-19T13:36:59.615Z",
"start_time_in_millis" : 1611063419615,
"end_time" : "2021-01-19T13:37:00.433Z",
"end_time_in_millis" : 1611063420433,
"duration_in_millis" : 818,
"failures" : [ ],
"shards" : {
"total" : 1,
"failed" : 0,
"successful" : 1
}
}
]
}

Wildcards are useful for limiting the response to snapshots whose names match a pattern, for example snapshot_*.
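To illustrate, the sketch below builds a get snapshot URL from one or more names or wildcard patterns, joined with commas. The helper name and the snapshot_2* pattern are hypothetical, and the base URL assumes the same local node as the cURL examples; nothing is sent to the cluster.

```python
from urllib.parse import quote


def snapshots_url(repo, *patterns, base_url="http://localhost:9200"):
    """Build a GET /_snapshot/<repo>/<names> URL, leaving '*' wildcards unescaped."""
    names = ",".join(quote(pattern, safe="*") for pattern in patterns)
    return f"{base_url}/_snapshot/{quote(repo, safe='')}/{names}"


print(snapshots_url("backup_repo", "snapshot_2*"))
print(snapshots_url("backup_repo", "snapshot_2*", "snapshot_3"))
```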

How to Delete a Snapshot

Deleting a snapshot is very simple: all you have to do is use the DELETE request as:

DELETE /_snapshot/backup_repo/snapshot_2021/

The cURL command is:

curl -XDELETE "http://localhost:9200/_snapshot/backup_repo/snapshot_2021/"

The response should be acknowledged:true

{
"acknowledged": true
}

If the snapshot does not exist, you will get a 404 status code and snapshot missing error as:

{
"error" : {
"root_cause" : [
{
"type" : "snapshot_missing_exception",
"reason" : "[backup_repo:snapshot_2021] is missing"
}
],
"type" : "snapshot_missing_exception",
"reason" : "[backup_repo:snapshot_2021] is missing"
},
"status" : 404
}
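A script that deletes snapshots will usually want to tell these two outcomes apart. Here is a minimal sketch, assuming you already have the HTTP status code and the decoded JSON body; the function name and the returned labels are hypothetical.

```python
def delete_outcome(status, body):
    """Classify a delete snapshot response as 'deleted', 'missing', or 'failed'."""
    if status == 200 and body.get("acknowledged") is True:
        return "deleted"
    error_type = body.get("error", {}).get("type")
    if status == 404 and error_type == "snapshot_missing_exception":
        return "missing"
    return "failed"


ok_body = {"acknowledged": True}
missing_body = {
    "error": {
        "type": "snapshot_missing_exception",
        "reason": "[backup_repo:snapshot_2021] is missing",
    },
    "status": 404,
}
print(delete_outcome(200, ok_body))
print(delete_outcome(404, missing_body))
```

Treating a missing snapshot as its own outcome lets a cleanup job continue instead of failing when the snapshot was already removed.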

Conclusion

In this guide, we have discussed how to create Elasticsearch snapshots using the Snapshot API. What you’ve learned should be enough to allow you to create a snapshot repository, view snapshot repositories, and create, view, and delete snapshots. Although there are further customizations you can make with the API, the knowledge in this guide should be enough to get you started.

Thank you for reading.