Add gitlab-health (fix #670)
markuslf committed Aug 24, 2023
1 parent b1686b7 commit d186286
Showing 12 changed files with 599 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -31,6 +31,7 @@ Icinga Director:

Monitoring Plugins:

* gitlab-health (fix #670)
* gitlab-version
* openstack-nova-list
* postgresql-version
43 changes: 43 additions & 0 deletions assets/icingaweb2-module-director/all-the-rest.json
@@ -9086,6 +9086,49 @@
"object_name": "GitLab Service Set",
"object_type": "template",
"services": {
"GitLab Health": {
"action_url": null,
"apply_for": null,
"assign_filter": null,
"check_command": null,
"check_interval": null,
"check_period": null,
"check_timeout": null,
"command_endpoint": null,
"disabled": false,
"display_name": null,
"enable_active_checks": null,
"enable_event_handler": null,
"enable_flapping": null,
"enable_notifications": null,
"enable_passive_checks": null,
"enable_perfdata": null,
"event_command": null,
"fields": [],
"flapping_threshold_high": null,
"flapping_threshold_low": null,
"groups": [],
"host": null,
"icon_image": null,
"icon_image_alt": null,
"imports": [
"tpl-service-gitlab-health"
],
"max_check_attempts": null,
"notes": null,
"notes_url": null,
"object_name": "GitLab Health",
"object_type": "object",
"retry_interval": null,
"service_set": null,
"template_choice": null,
"use_agent": null,
"use_var_overrides": null,
"uuid": "161fb5d8-24ed-4bdf-b990-1818e9ab57c2",
"vars": {},
"volatile": null,
"zone": null
},
"GitLab Version": {
"action_url": null,
"apply_for": null,
86 changes: 86 additions & 0 deletions check-plugins/gitlab-health/README.rst
@@ -0,0 +1,86 @@
Check gitlab-health
===================

Overview
--------

Checks whether the GitLab application server is running. It does not hit the database or verify that other services are running. Its purpose is to report that the application server is handling requests; an OK state does not mean that the database or other services are ready.

Hints:

* To access monitoring resources, the requesting client IP needs to be included in the allowlist. For details, see `how to add IPs to the allowlist for the monitoring endpoints <https://docs.gitlab.com/ee/administration/monitoring/ip_allowlist.html>`_.
* GitLab Health Checks: https://docs.gitlab.com/ee/administration/monitoring/health_check.html


Fact Sheet
----------

.. csv-table::
    :widths: 30, 70

    "Check Plugin Download", "https://github.com/Linuxfabrik/monitoring-plugins/tree/main/check-plugins/gitlab-health"
    "Check Interval Recommendation", "Once a minute"
    "Can be called without parameters", "Yes"
    "Compiled for", "Linux"


Help
----

.. code-block:: text

    usage: gitlab-health [-h] [-V] [--always-ok] [--severity {warn,crit}]
                         [--test TEST] [--timeout TIMEOUT] [--url URL]

    Checks whether the GitLab application server is running. It does not hit the
    database or verify that other services are running.

    options:
      -h, --help            show this help message and exit
      -V, --version         show program's version number and exit
      --always-ok           Always returns OK.
      --severity {warn,crit}
                            Severity for alerting. Default: warn
      --test TEST           For unit tests. Needs "path-to-stdout-file,path-to-
                            stderr-file,expected-retc".
      --timeout TIMEOUT     Network timeout in seconds. Default: 3 (seconds)
      --url URL             GitLab health URL endpoint. Default:
                            http://localhost/-/health


Usage Examples
--------------

.. code-block:: bash

    ./gitlab-health --severity warn --timeout 3 --url http://localhost/-/health

Output:

.. code-block:: text

    The GitLab application server is processing requests, but this does not mean that the database or other services are ready.


States
------

* Returns OK if the health endpoint responds with ``GitLab OK``.
* Otherwise, depending on the given ``--severity``, returns WARN (default) or CRIT.
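
The state logic can be sketched in a few lines of Python. This is a simplified stand-in under stated assumptions, not the plugin's code: ``evaluate`` and ``fetch_health`` are hypothetical names, and ``fetch_health`` uses ``urllib.request`` from the standard library where the plugin uses ``lib.url.fetch``.

```python
import urllib.request

# Nagios/Icinga return codes used by the check
STATE_OK, STATE_WARN, STATE_CRIT = 0, 1, 2
SEVERITY_TO_STATE = {'warn': STATE_WARN, 'crit': STATE_CRIT}


def evaluate(body, severity='warn'):
    """Map the health endpoint's response body to a (state, message) pair.

    Hypothetical helper: only the exact body 'GitLab OK' counts as healthy,
    mirroring the comparison in the plugin's main().
    """
    if body == 'GitLab OK':
        return STATE_OK, 'The GitLab application server is processing requests.'
    return SEVERITY_TO_STATE[severity], 'The GitLab application server seems to have a problem.'


def fetch_health(url='http://localhost/-/health', timeout=3):
    """Fetch the endpoint with the standard library (assumption: plain HTTP GET, UTF-8 body)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode('utf-8')
```

With a reachable GitLab instance, ``evaluate(fetch_health())`` would yield the state and message; ``evaluate`` on its own can be exercised without any network access.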


Perfdata / Metrics
------------------

.. csv-table::
    :widths: 25, 15, 60
    :header-rows: 1

    Name, Type, Description
    gitlab-health, Number, "The current state (0 = OK, 1 = WARN, 2 = CRIT, 3 = UNKNOWN)."
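
For illustration, the metric above could be rendered in the usual Nagios perfdata layout ``'label'=value[UOM];[warn];[crit];[min];[max]`` as follows. ``format_perfdata`` is a hypothetical helper written for this sketch; it is not ``lib.base.get_perfdata``, whose exact output is not shown in this commit.

```python
def format_perfdata(label, value, uom=None, warn=None, crit=None, minimum=None, maximum=None):
    """Build one perfdata item; fields that are None are left empty."""
    def field(item):
        return '' if item is None else str(item)

    return "'{}'={}{};{};{};{};{}".format(
        label, value, field(uom), field(warn), field(crit), field(minimum), field(maximum),
    )


# The state metric ranges from 0 (OK) to 3 (UNKNOWN) and has no thresholds.
print(format_perfdata('gitlab-health', 0, minimum=0, maximum=3))
# prints: 'gitlab-health'=0;;;0;3
```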


Credits, License
----------------

* Authors: `Linuxfabrik GmbH, Zurich <https://www.linuxfabrik.ch>`_
* License: The Unlicense, see `LICENSE file <https://unlicense.org/>`_.
124 changes: 124 additions & 0 deletions check-plugins/gitlab-health/gitlab-health
@@ -0,0 +1,124 @@
#!/usr/bin/env python3
# -*- coding: utf-8; py-indent-offset: 4 -*-
#
# Author: Linuxfabrik GmbH, Zurich, Switzerland
# Contact: info (at) linuxfabrik (dot) ch
# https://www.linuxfabrik.ch/
# License: The Unlicense, see LICENSE file.

# https://github.com/Linuxfabrik/monitoring-plugins/blob/main/CONTRIBUTING.rst

"""See the check's README for more details.
"""

import argparse # pylint: disable=C0413
import sys # pylint: disable=C0413

import lib.args # pylint: disable=C0413
import lib.base # pylint: disable=C0413
import lib.test # pylint: disable=C0413
import lib.url # pylint: disable=C0413
from lib.globals import (STATE_CRIT, STATE_OK, # pylint: disable=C0413
STATE_UNKNOWN, STATE_WARN)


__author__ = 'Linuxfabrik GmbH, Zurich/Switzerland'
__version__ = '2023082401'

DESCRIPTION = """Checks whether the GitLab application server is running. It does not hit
                 the database or verify that other services are running."""

DEFAULT_SEVERITY = 'warn'
DEFAULT_TIMEOUT = 3
DEFAULT_URL = 'http://localhost/-/health'


def parse_args():
    """Parse command line arguments using argparse.
    """
    parser = argparse.ArgumentParser(description=DESCRIPTION)

    parser.add_argument(
        '-V', '--version',
        action='version',
        version='%(prog)s: v{} by {}'.format(__version__, __author__)
    )

    parser.add_argument(
        '--always-ok',
        help='Always returns OK.',
        dest='ALWAYS_OK',
        action='store_true',
        default=False,
    )

    parser.add_argument(
        '--severity',
        help='Severity for alerting. Default: %(default)s',
        dest='SEVERITY',
        default=DEFAULT_SEVERITY,
        choices=['warn', 'crit'],
    )

    parser.add_argument(
        '--test',
        help='For unit tests. Needs "path-to-stdout-file,path-to-stderr-file,expected-retc".',
        dest='TEST',
        type=lib.args.csv,
    )

    parser.add_argument(
        '--timeout',
        help='Network timeout in seconds. Default: %(default)s (seconds)',
        dest='TIMEOUT',
        type=int,
        default=DEFAULT_TIMEOUT,
    )

    parser.add_argument(
        '--url',
        help='GitLab health URL endpoint. Default: %(default)s',
        dest='URL',
        default=DEFAULT_URL,
    )

    return parser.parse_args()


def main():
    """The main function. This is where the action happens.
    """

    # parse the command line, exit with UNKNOWN if it fails
    try:
        args = parse_args()
    except SystemExit:
        sys.exit(STATE_UNKNOWN)

    # init some vars
    state = STATE_OK

    # fetch and analyze data
    if args.TEST is None:
        result = lib.base.coe(lib.url.fetch(args.URL, timeout=args.TIMEOUT))
    else:
        # do not call the command, put in test data
        result, stderr, retc = lib.test.test(args.TEST)

    # only the exact response body "GitLab OK" counts as healthy
    if result == 'GitLab OK':
        msg = 'The GitLab application server is processing requests, but this does not mean ' \
              'that the database or other services are ready.'
    else:
        msg = 'The GitLab application server seems to have a problem.'
        state = lib.base.str2state(args.SEVERITY)
    perfdata = lib.base.get_perfdata('gitlab-health', state, None, None, None, 0, STATE_UNKNOWN)

    # over and out
    lib.base.oao(msg, state, perfdata, always_ok=args.ALWAYS_OK)


if __name__ == '__main__':
    try:
        main()
    except Exception:   # pylint: disable=W0703
        lib.base.cu()
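
The `--test` branch above swaps the network call for canned data. A rough sketch of what such a fixture loader might look like follows; it assumes the `"path-to-stdout-file,path-to-stderr-file,expected-retc"` format from the help text. `load_fixture` and `demo` are hypothetical names, and the behavior of the real `lib.test.test` is not shown in this commit and may differ.

```python
import os
import tempfile


def load_fixture(spec):
    """Return (stdout, stderr, retc) from a 'stdout-file,stderr-file,retc' string.

    Assumption: an empty path means "no data" and yields an empty string.
    """
    stdout_path, stderr_path, expected_retc = [part.strip() for part in spec.split(',')]

    def read(path):
        if not path:
            return ''
        with open(path, encoding='utf-8') as handle:
            return handle.read()

    return read(stdout_path), read(stderr_path), int(expected_retc)


def demo():
    """Write a canned 'GitLab OK' response to a temporary file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        stdout_file = os.path.join(tmp, 'stdout')
        with open(stdout_file, 'w', encoding='utf-8') as handle:
            handle.write('GitLab OK')
        return load_fixture('{},,0'.format(stdout_file))
```

Feeding the first element of the returned tuple into the `result == 'GitLab OK'` comparison exercises both branches of `main()` without a running GitLab instance.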
115 changes: 115 additions & 0 deletions check-plugins/gitlab-health/grafana/gitlab-health.yml
@@ -0,0 +1,115 @@
apiVersion: grizzly.grafana.com/v1alpha1
kind: Dashboard
metadata:
  folder: linuxfabrik-monitoring-plugins
  name: gitlab-health
spec:
  schemaVersion: 2023041201
  tags:
    - Linuxfabrik
    - Grizzly
    - static
  time:
    from: now-90d
    to: now
  timepicker:
    hidden: false
    refresh_intervals:
      - 1m
  timezone: browser
  title: GitLab Health
  uid: linuxfabrik-monitoring-plugins-gitlab-health
  editable: true
  liveNow: true
  refresh: 1m
  templating:
    list:
      - hide: 2
        label: Command
        name: command
        query: cmd-check-gitlab-health
        type: constant
      - label: Hostname
        name: hostname
        query: SHOW TAG VALUES FROM "cmd-check-gitlab-health" WITH KEY = "hostname"
        refresh: 2
        sort: 1
        type: query

  panels:

    - title: GitLab Health
      type: timeseries
      gridPos:
        h: 8
        w: 12
        x: 12
        y: 8
      fieldConfig:
        defaults:
          color:
            mode: palette-classic
          custom:
            lineInterpolation: smooth
            spanNulls: true
          decimals: 0
          max: 3
          min: 0
          unit: short
        overrides:
          - matcher:
              id: byName
              options: gitlab-health
            properties:
              - id: mappings
                value:
                  - options:
                      '0':
                        text: OK
                      '1':
                        text: WARN
                      '2':
                        text: CRIT
                      '3':
                        text: UNKN
                    type: value
      options:
        legend:
          calcs:
            - min
            - max
          displayMode: table
          placement: bottom
          showLegend: true
        tooltip:
          mode: multi
          sort: none

      targets:

        - alias: gitlab-health
          refId: gitlab-health
          groupBy:
            - params:
                - $interval
              type: time
          measurement: /^$command$/
          resultFormat: time_series
          select:
            - - params:
                  - value
                type: field
              - params: []
                type: mean
          tags:
            - key: hostname
              operator: '=~'
              value: /^$hostname$/
            - condition: AND
              key: service
              operator: '='
              value: GitLab Health
            - condition: AND
              key: metric
              operator: '='
              value: gitlab-health