81 changes: 79 additions & 2 deletions metrics-collector/README.md
@@ -2,6 +2,8 @@

Code Engine job that demonstrates how to collect resource metrics (CPU, memory and disk usage) of running Code Engine apps, jobs, and builds

![Dashboard overview](./images/icl-dashboard-overview.png)

## Installation

### Capture metrics every n seconds
@@ -17,11 +19,11 @@ $ ibmcloud ce job create \
--wait
```

* Submit a daemon job that collects metrics in an endless loop. The daemon job queries the Metrics API every 10 seconds
* Submit a daemon job that collects metrics in an endless loop. The daemon job queries the Metrics API every 30 seconds
```
$ ibmcloud ce jobrun submit \
--job metrics-collector \
--env INTERVAL=10
--env INTERVAL=30
```


@@ -57,6 +59,81 @@ One can use the environment variable `COLLECT_DISKUSAGE=true` to also collect th

Once your IBM Cloud Code Engine project has detected a corresponding IBM Cloud Logs instance, which is configured to receive platform logs, you can consume the resource metrics in IBM Cloud Logs. Use the filter `metric:instance-resources` to filter for log lines that print resource metrics for each detected IBM Cloud Code Engine instance that is running in a project.

### Custom dashboard

Follow the steps below to create a custom dashboard in your IBM Cloud Logs instance that provides insights into resource consumption metrics.

![Dashboard overview](./images/icl-dashboard-overview.png)

**Setup instructions:**

* Navigate to the "Custom dashboards" view, hover over the "New" button, and click "Import dashboard"

![New dashboard](./images/icl-dashboard-new.png)

* In the "Import" modal, select the file [./setup/dashboard-code_engine_resource_consumption_metrics.json](./setup/dashboard-code_engine_resource_consumption_metrics.json) located in this repository, and click "Import"

![Import modal](./images/icl-dashboard-import.png)

* Confirm the import by clicking "Import" again

![Import confirmation](./images/icl-dashboard-import-confirm.png)


### Logs view

Follow the steps below to create a Logs view in your IBM Cloud Logs instance that allows you to drill into individual `instance-resources` log lines.

![Logs overview](./images/icl-logs-view-overview.png)

**Setup instructions:**

* Show only the log lines that carry resource metrics by filtering with the following query
```
app:"codeengine" AND message.metric:"instance-resources"
```

![Query](./images/icl-logs-view-query.png)

* In the left bar, click "Add Filter" and add the following filters
  * `Application`
  * `App`
  * `Label.Project`
  * `Message.Component_name`

![Filters](./images/icl-logs-view-filters.png)

* In the top-right corner, click on "Columns" and configure the following columns:
  * `Timestamp`
  * `label.Project`
  * `message.component_type`
  * `message.component_name`
  * `message.message`
  * `Text`

![Columns](./images/icl-logs-view-columns.png)

* Once applied, adjust the column widths as needed

* In the top-right corner, select `1-line` as view mode

![View](./images/icl-logs-view-mode.png)

* The graph title reads "**Count** all grouped by **Severity**". Click `Severity` and select `message.component_name` instead. Then select `Max` as the aggregation metric and `message.memory.usage` as the aggregation field

![Graph](./images/icl-logs-view-graph.png)

* Save the view

![Save](./images/icl-logs-view-save.png)

* Use the custom logs view to drill into individual `instance-resources` log lines

![Logs overview](./images/icl-logs-view-overview.png)


## IBM Log Analysis setup (deprecated)

### Log lines

Along with a human-readable message, like `Captured metrics of app instance 'load-generator-00001-deployment-677d5b7754-ktcf6': 3m vCPU, 109 MB memory, 50 MB ephemeral storage`, each log line passes specific resource utilization details in a structured way, which allows you to apply advanced filters on them.
Binary file added metrics-collector/images/icl-dashboard-import.png
Binary file added metrics-collector/images/icl-dashboard-new.png
Binary file added metrics-collector/images/icl-logs-view-graph.png
Binary file added metrics-collector/images/icl-logs-view-save.png
19 changes: 13 additions & 6 deletions metrics-collector/main.go
@@ -34,9 +34,12 @@ func main() {
	}

	// If the 'INTERVAL' env var is set then sleep for that many seconds
	sleepDuration := 10
	sleepDuration := 30
	if t := os.Getenv("INTERVAL"); t != "" {
		sleepDuration, _ = strconv.Atoi(t)
		if sleepDuration < 30 {
			sleepDuration = 30
		}
	}

	// In daemon mode, collect resource metrics in an endless loop
@@ -111,10 +114,10 @@ func collectInstanceMetrics() {

	// fetches all pods
	pods := getAllPods(coreClientset, namespace, config)

	// fetch all pod metrics
	podMetrics := getAllPodMetrics(namespace, config)

	var wg sync.WaitGroup

	for _, metric := range *podMetrics {
@@ -258,7 +261,7 @@ func getAllPods(coreClientset *kubernetes.Clientset, namespace string, config *r

// Helper function to obtain the disk usage of a container
func obtainDiskUsage(coreClientset *kubernetes.Clientset, namespace string, pod string, container string, config *rest.Config) float64 {

	// per default, we do not collect disk space statistics
	if os.Getenv("COLLECT_DISKUSAGE") != "true" {
		return 0
@@ -304,12 +307,16 @@ func obtainDiskUsage(coreClientset *kubernetes.Clientset, namespace string, pod

		// Render captured system error messages, in case the stdout stream did not receive any valid content
		if err != nil {
			fmt.Println("obtainDiskUsage of pod:" + pod + "/container:" + container + " failed with a stream err - " + err.Error() + " - stderr: '" + errBuf.String() + "'")
			if err.Error() == "Internal error occurred: failed calling webhook \"validating.webhook.pod-exec-auth-check.codeengine.cloud.ibm.com\": failed to call webhook: Post \"https://validating-webhook-serving.ibm-cfn-system.svc:443/validate/pod-exec?timeout=5s\": EOF" {
				// Do nothing and silently ignore this issue as it is most likely related to pod terminations
			} else {
				fmt.Println("obtainDiskUsage of pod:" + pod + "/container:" + container + " failed with a stream err - " + err.Error() + " - stderr: '" + errBuf.String() + "'")
			}
		}

		return float64(0)
	}

	// Parse the output "4000 /" by splitting the words
	diskUsageOutput := strings.Fields(strings.TrimSuffix(diskUsageOutputStr, "\n"))
	if len(diskUsageOutput) > 2 {