Add a new section for transitioning indices to data streams #2216

Open · wants to merge 6 commits into base: main
@@ -15,7 +15,7 @@ If you’ve been using Curator or some other mechanism to manage periodic indice
* Reindex into an {{ilm-init}}-managed index.

::::{note}
Starting in Curator version 5.7, Curator ignores {{ilm-init}} managed indices.
Starting in Curator version 5.7, Curator ignores {{ilm-init}}-managed indices.
::::


@@ -103,5 +103,4 @@ To reindex into the managed index:

Querying using this alias will now search your new data and all of the reindexed data.

6. Once you have verified that all of the reindexed data is available in the new managed indices, you can safely remove the old indices.
@@ -11,9 +11,12 @@ products:

When you continuously index timestamped documents into {{es}}, you typically use a [data stream](../../data-store/data-streams.md) so you can periodically [roll over](rollover.md) to a new index. This enables you to implement a [hot-warm-cold architecture](../data-tiers.md) to meet your performance requirements for your newest data, control costs over time, enforce retention policies, and still get the most out of your data.

::::{tip}
[Data streams](../../data-store/data-streams.md) are best suited for [append-only](../../data-store/data-streams.md#data-streams-append-only) use cases. If you need to update or delete existing time series data, you can perform update or delete operations directly on the data stream backing index. If you frequently send multiple documents using the same `_id` expecting last-write-wins, you may want to use an index alias with a write index instead. You can still use [ILM](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md) to manage and [roll over](rollover.md) the alias’s indices. Skip to [Manage time series data without data streams](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-without-data-streams).
::::
To simplify index management and automate rollover, select the scenario that best applies to your situation:

* **Roll over data streams with ILM.** When ingesting write-once, timestamped data that doesn't change, follow the steps in [Manage time series data with data streams](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-with-data-streams) for simple, automated data stream rollover. ILM-managed backing indices are automatically created under a single data stream alias. ILM also tracks and transitions the backing indices through the lifecycle automatically.
* **Roll over time series indices with ILM.** Data streams are best suited for [append-only](../../data-store/data-streams.md#data-streams-append-only) use cases. If you need to update or delete existing time series data, you can perform update or delete operations directly on the data stream backing index. If you frequently send multiple documents using the same `_id` expecting last-write-wins, you may want to use an index alias with a write index instead. You can still use [ILM](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md) to manage and roll over the alias’s indices. Follow the steps in [Manage time series data without data streams](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-time-series-data-without-data-streams) for more information.
* **Roll over general content as data streams with ILM.** If some of your indices store data that isn't timestamped, but you want the benefits of automatic rotation when an index reaches a certain size or age, or you want to delete rotated indices after a certain amount of time, follow the steps in [Manage general content with data streams](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams). These steps include injecting a timestamp field at index time to mimic time series data.


## Manage time series data with data streams [manage-time-series-data-with-data-streams]

@@ -295,3 +298,160 @@ Retrieving the status information for managed indices is very similar to the dat
GET timeseries-*/_ilm/explain
```

## Manage general content with data streams [manage-general-content-with-data-streams]

Data streams are specifically designed for time series data.
If you want to manage general content (data without timestamps) with data streams, you can set up [ingest pipelines](/manage-data/ingest/transform-enrich/ingest-pipelines.md) to transform and enrich your general content by adding a timestamp field at [ingest](/manage-data/ingest.md) time and get the benefits of time-based data management.

For example, search use cases such as knowledge base, website content, e-commerce, or product catalog search might require you to frequently index general content (data without timestamps). As a result, your index can grow significantly over time, which might impact storage requirements, query performance, and cluster health. Following the steps in this procedure (adding a timestamp field and moving to ILM-managed data streams) can help you rotate your indices in a simpler way, based on their size or lifecycle phase.

To roll over your general content from indices to a data stream, you:

1. [Create an ingest pipeline](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-ingest) to process your general content and add a `@timestamp` field.

1. [Create a lifecycle policy](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-policy) that meets your requirements.

1. [Create an index template](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-template) that uses the created ingest pipeline and lifecycle policy.

1. [Create a data stream](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-create-stream).

1. *Optional:* If you have an existing, non-managed index and want to migrate your data to the data stream you created, [reindex with a data stream](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-reindex).

1. *Optional:* To verify that rollover works as expected, you can manually [roll over](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-roll-over) the data stream.

1. [Update your ingest endpoint](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#manage-general-content-with-data-streams-endpoint) to target the created data stream.

1. *Optional:* You can use the [ILM explain API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ilm-explain-lifecycle) to get status information for your managed indices.
For more information, refer to [Check lifecycle progress](/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover.md#ilm-gs-check-progress).


### Create an ingest pipeline to transform your general content [manage-general-content-with-data-streams-ingest]

Create an ingest pipeline that uses the [`set` processor](elasticsearch://reference/enrich-processor/set-processor.md) to add a `@timestamp` field containing the time at which each document is ingested:

```console
PUT _ingest/pipeline/ingest_time_1
{
  "description": "Add an ingest timestamp",
  "processors": [
    {
      "set": {
        "field": "@timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
```
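
Before relying on the pipeline, you might want to confirm that it adds the `@timestamp` field. The following is a minimal sketch that uses the simulate pipeline API; the sample document and its `title` field are hypothetical and only serve as a placeholder for your own content:

```console
POST _ingest/pipeline/ingest_time_1/_simulate
{
  "docs": [
    {
      "_source": {
        "title": "Sample article"
      }
    }
  ]
}
```

If the pipeline works as intended, the `_source` of the simulated document in the response should contain a `@timestamp` field next to `title`.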

### Create a lifecycle policy [manage-general-content-with-data-streams-policy]

In this example, the policy is configured to roll over when the largest primary shard of the write index reaches 10 GB:

```console
PUT _ilm/policy/indextods
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "set_priority": {
            "priority": 100
          },
          "rollover": {
            "max_primary_shard_size": "10gb"
          }
        }
      }
    }
  }
}
```

For more information about lifecycle phases and available actions, refer to [Create a lifecycle policy](configure-lifecycle-policy.md#ilm-create-policy).
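
If you also want rotated indices to be deleted after a certain amount of time, one possible variation is to add a `delete` phase to the same policy. The following sketch assumes a retention of 30 days after rollover; the `30d` value is a hypothetical placeholder, so choose a value that matches your own retention requirements:

```console
PUT _ilm/policy/indextods
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "set_priority": {
            "priority": 100
          },
          "rollover": {
            "max_primary_shard_size": "10gb"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```

Because the hot phase uses the `rollover` action, the `min_age` of the `delete` phase is measured from the time the index rolled over, not from the time it was created.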


### Create an index template to apply the ingest pipeline and lifecycle policy [manage-general-content-with-data-streams-template]

Create an index template that uses the created ingest pipeline and lifecycle policy:

```console
PUT _index_template/index_to_dot
{
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "indextods"
        },
        "default_pipeline": "ingest_time_1"
      }
    },
    "mappings": {
      "_source": {
        "excludes": [],
        "includes": [],
        "enabled": true
      },
      "_routing": {
        "required": false
      },
      "dynamic": true,
      "numeric_detection": false,
      "date_detection": true,
      "dynamic_date_formats": [
        "strict_date_optional_time",
        "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"
      ]
    }
  },
  "index_patterns": [
    "movetods"
  ],
  "data_stream": {
    "hidden": false,
    "allow_custom_routing": false
  }
}
```
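
To check which settings the template would apply, you can optionally use the simulate index API. This is a sketch that assumes the template above is the only one matching the `movetods` name in your cluster:

```console
POST _index_template/_simulate_index/movetods
```

The response should list the resolved settings, including the `indextods` lifecycle policy and the `ingest_time_1` default pipeline. If other templates overlap with the same pattern, they are reported as well.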

### Create a data stream [manage-general-content-with-data-streams-create-stream]

Create a data stream using the [_data_stream API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-create-data-stream):

```console
PUT /_data_stream/movetods
```
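
As an optional check, you can retrieve the data stream to confirm that it was created from the expected template. This sketch uses the get data stream API with the `movetods` name from this example:

```console
GET _data_stream/movetods
```

The response should show the matching index template, the ILM policy, and the first backing index of the data stream.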

### Optional: Reindex your data with a data stream [manage-general-content-with-data-streams-reindex]

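If you want to copy your documents from an existing index (named `indextods` in this example) to the data stream you created, reindex with a data stream using the [_reindex API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-reindex). The destination must use an `op_type` of `create`, because data streams are append-only: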

```console
POST /_reindex
{
  "source": {
    "index": "indextods"
  },
  "dest": {
    "index": "movetods",
    "op_type": "create"
  }
}
```

For more information, refer to [Reindex with a data stream](../../data-store/data-streams/use-data-stream.md#reindex-with-a-data-stream).
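
One simple way to verify the reindex, sketched here with the count API, is to compare the number of documents in the source index and in the data stream. The index names are the ones used in this example:

```console
GET indextods/_count

GET movetods/_count
```

If no new documents were written to either target in the meantime, both requests should report the same `count`.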


### Optional: Roll over the reindexed data stream [manage-general-content-with-data-streams-roll-over]

Use the [_rollover API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-rollover) to create a new write index for the stream. This ensures that the lifecycle policy and ingest pipeline you've created will apply to any new documents that you index.

```console
POST movetods/_rollover
```
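
After the rollover, you can check how {{ilm-init}} is handling the backing indices. The following sketch calls the ILM explain API on the `movetods` data stream from this example:

```console
GET movetods/_ilm/explain
```

The response should contain one entry per backing index, each showing the `indextods` policy along with the current phase, action, and step.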

### Update your ingest endpoint to target the created data stream [manage-general-content-with-data-streams-endpoint]

If you use Elastic clients, scripts, or any other third-party tool to ingest data into {{es}}, make sure you update them to target the created data stream.
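
For illustration, an indexing request that targets the data stream could look like the following sketch. The `title` field is a hypothetical placeholder for your own content, and the document intentionally has no `@timestamp` field because the default ingest pipeline is expected to add it:

```console
POST movetods/_doc
{
  "title": "Sample article"
}
```

Requests that use an explicit document ID need to use the `create` operation, for example `PUT movetods/_create/<id>`, because data streams only accept new documents.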