Skip to content

mariusoe/influx-baselining

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

InfluxDB Baseline Generator

This project is a standalone application which generates baselines on live and historical data stored in InfluxDB. The baselines are written back as series to Influx.

Introduction

Baselines are seasonality based: E.g. a daily baseline is computed by averaging the observed values from the past days at the same hour of the day.

The two core configuration properties for a baselines are precision and seasonality.

The precision defines how many points of the baseline will be computed. For example, given a precision of 30 minutes, the resulting baseline measurement will consist of two points per hour. Each point defines the value and the standard deviation for the baseline for a given time interval of 30 minutes.

The seasonality defines in what pattern the baselines is expected to reoccur. A seasonality of one day therefore means that you expect the data to repeat on a daily pattern. E.g. the value of today at 11 am is expected to correlate with the values of yesterday and the day before at 11 am. Similarly, a seasonality of seven days can be used for weekly baselines: This means you expect teh value of monday at 11 am to correlate with the value on the previous mondays at 11 am.

This application is primarily designed for Prometheus-style metrics:

  • Counters: Series where the value increases monotonic (e.g. the number of HTTP requests)
  • Gauges: Series where the value can go up or down (e.g. the CPU Usage)

For gauges this application will simply use the mean value as baseline.

For counters the increase per second will be baselined. For example given a counter for the HTTP requests, the resulting baseline will denote the expected average requests per second in the interval specified by the precision.

In addition it is possible to baseline response times which are derived from counters: With Prometheus-style metrics, response times are represented using two different counters: The number of requests and the total time spent processing these requests. This means that the response time is the ratio of the two: the total time spent divided by the number of requests. This value can be baselines too, the joining of the series happens within the baseline generator.

Configuration

The application is a Spring-Boot application without any interface. It is configured by placing an application.yml file next to the JAR file.

In the application.yml it is first required that you configure the connection to influx:

influx:
  url: http://localhost:8086
  user: "myuser" # OPTIONAL: username used to connect to influx
  password: "mypw" # OPTIONAL: password used to connect to influx
  
  connect-timeout: 60s # OPTIONAL: timeout to use when connecting to influx
  read-timeout: 60s # OPTIONAL: timeout to use when reading data from influx
  write-timeout: 60s # OPTIONAL: timeout to use when writing data to influx

Next you can configure the actual baselining:

baselining:

  # When starting up, the service will compute baselines based on historical data
  # This defines how far the service should look into the past
  backfill: 30d
  
  # Commonly data takes some time until it actual gets to the influxDB
  # This property tells the service to wait the given amount of time before updating the baselines.
  # E.g. a delay of 30s means that the baselines for 14:00 to 15:00 will be computed at 15:00:30
  update-delay: 30s
  
  #Baselines for gauge metrics
  gauges:
    - precision: 15m
      seasonality: 1d
      input: telegraf.autogen.system_cpu_usage.gauge
      output: baselines.autogen.system_cpu_usage_daily
      
    - precision: 15m
      seasonality: 7d
      input: telegraf.autogen.system_cpu_usage.gauge
      output: baselines.autogen.system_cpu_usage_weekly
      
  # Baselines for counters (increase per second)
  counters:
    - precision: 15m
      seasonality: 7d
      windows: [28d, 56d]
      input: telegraf.autogen.http_requests_count.value
      output: baselines.autogen.http_request_rate_weekly
      tags: [http_path]
  
  # Baselines for ratio between two counters (e.g. response time)    
  counter-ratios:
    - precision: 15m
      seasonality: 1d
      windows: [15d, 30d]
      input: telegraf.autogen.http_requests_time.counter
      divide-by: telegraf.autogen.http_requests_count.counter
      output: baselines.autogen.http_time_daily
      tags: [http_path]

As shown in the examples, each baselines requires you to specify the precision and seasonality which were described above.

In addition, input series are defined in the form <database>.<retention>.<measurement>.<field>. The name of the output baseline is defined as <database>.<retention>.<measurement>.

It is possible to specify time windows for each baseline, which have to be multiples of the seasonality. The time windows define how far the service looks into the past when computing baselines: E.g. a window of 10d on a baseline with seasonality: 1d means that the baseline values will always only take the past 10 days into account.

The defined output measurement name is actually only used as a prefix, because each window results in a separate measurement. In the example above, the response time baseline defines http_time_daily as output with two windows: 15d and 30d. As result, the service will generate two measurements: http_time_daily_15d and http_time_daily_30d. The measurements contain two fields: value, which is the baseline and stddev which is the standard deviation.

By default the baseline service will preserve all tags from the input measurement. When this is not the intended behaviour, it is possible to keep only certain tags (or none). The values of all other tags will be aggregated together.

For example, if we assume that the http_requests_count measurement has two tags (http_path and http_status), we can specify tags: [http_path] as shown above. This means that the baseline will be generated for each http_path individually, however the http_status will not be used for differentiation.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Java 100.0%