
Update Console/Cortex requirements #602


Open · wants to merge 7 commits into main
26 changes: 21 additions & 5 deletions docs/platform/get-started/installation/hardware.md
@@ -1,12 +1,16 @@
---
sidebar_position: 3
title: System Requirements
description: Conduktor Console is provided as a single Docker container.
description: Conduktor Console consists of two containers, Console and Cortex.
---

# System Requirements

Conduktor Console is provided as a single Docker container.
Conduktor Console consists of two containers: Console and Cortex.

The Console container provides the web interface, while the Cortex container provides the metrics used for monitoring.

**Note:** Running Console without the Cortex container is not supported.

Jump to:
**Collaborator:**

Can we remove this mini-doc entirely? The right-hand navigation is there to help you navigate what's on the page; this seems unnecessary. We've also been removing these from other pages (like SNI routing), so would be good to keep consistency.

- [Production Requirements](#production-requirements)
@@ -22,14 +26,14 @@ To ensure you meet these requirements, you must:
- Set up an [external PostgreSQL (13+) database](/platform/get-started/configuration/database/) with appropriate backup policy
- This is used to store data relating to your Conduktor deployment; such as your users, permissions, tags and configurations
- Note we recommend configuring your PostgreSQL database for [high-availability](#database-connection-fail-over)
- Setup [block storage](/platform/get-started/configuration/env-variables#monitoring-properties) (S3, GCS, Azure, Swift) to store metrics data required for Monitoring
- Set up [block storage](/platform/get-started/configuration/env-variables#monitoring-properties) (S3, GCS, Azure, Swift) to store metrics data required for Cortex (an illustrative sketch follows this list)
- Meet the [hardware requirements](#hardware-requirements) so that Conduktor has sufficient resources to run without issue
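
As a rough illustration of what wiring up these two external dependencies can look like, the sketch below passes a PostgreSQL connection string and S3-compatible bucket credentials to Console as environment variables. The monitoring variable names are assumptions for illustration only; the authoritative names are on the environment variables page linked above.

```bash
# Illustrative sketch only -- confirm variable names against the
# environment variables / monitoring properties documentation.

# External PostgreSQL (13+): stores users, permissions, tags and configurations
export CDK_DATABASE_URL="postgresql://conduktor:change-me@postgres.internal:5432/conduktor"

# S3-compatible block storage for the metrics data required by Cortex
# (names below are placeholders, not the authoritative variable names)
export CDK_MONITORING_STORAGE_S3_BUCKET="conduktor-monitoring"
export CDK_MONITORING_STORAGE_S3_REGION="eu-west-1"
export CDK_MONITORING_STORAGE_S3_ACCESSKEYID="AKIA..."
export CDK_MONITORING_STORAGE_S3_SECRETACCESSKEY="change-me"
```

Whichever provider you use (S3, GCS, Azure, Swift), the intent is the same: Console stores its deployment data in PostgreSQL, while Cortex persists its metrics data to the bucket.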

Note that if you are deploying the [Helm chart](/platform/get-started/installation/get-started/kubernetes/), the [production requirements](/platform/get-started/installation/get-started/kubernetes#production-requirements) are clearly outlined in the installation guide.

## Hardware Requirements

To configure Conduktor Console for particular hardware, you can use container CGroups limits. More details [here](/platform/get-started/configuration/memory-configuration)
### Console

**Minimum**

@@ -43,7 +47,19 @@
- 4+ GB of RAM
- 10+ GB of disk space

See more about [environment variables](/platform/get-started/configuration/env-variables/), or starting the Platform in [Docker Quick Start](/platform/get-started/installation/get-started/docker/).
### Cortex

**Minimum**

- 4 CPU Cores
- 6 GB of RAM
- 10 GB of disk space

**Recommended**

- 8 CPU Cores
- 8+ GB of RAM
- [block storage](/platform/get-started/configuration/env-variables#monitoring-properties)
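
As a sketch of how these figures can be applied at the container level (the older text above pointed at cgroup limits for the same purpose), the commands below cap CPU and memory with Docker's `--cpus` and `--memory` flags. The image names are assumptions and the mandatory environment variables are omitted; take both from the Docker installation guide.

```bash
# Illustrative sketch only: the sizing figures from this page mapped onto
# docker run resource flags. Image names/tags are assumed and required
# environment variables are omitted -- use the installation guide values.

# Console container (recommended: 4+ GB of RAM)
docker run -d --name conduktor-console \
  --memory 4g \
  conduktor/conduktor-console:latest

# Cortex container (recommended: 8 CPU cores, 8+ GB of RAM)
docker run -d --name conduktor-console-cortex \
  --cpus 8 --memory 8g \
  conduktor/conduktor-console-cortex:latest
```

Treat these limits as a starting point: the bigger the cluster and the more metrics Cortex ingests, the more resources it will need.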
**Contributor** commented on lines +54 to +64:

This is weird to me. First, it doesn't match our defaults in the Helm chart, so we're not consistent. Second, for the "recommended", I'd mention it should depend on the amount of metrics we're getting. We wrote a KB article to explain how to properly size it.

**Contributor:**

Also, I believe the disk space should depend on whether we externalize the storage of the metrics or not. This leads to mentioning how long the metrics are persisted locally, how often they're pushed to the storage, and so on.

**Contributor Author (@juma-conduktor), Jan 23, 2025:**

This is a platform consistency problem that we are not trying to solve in this PR. Our Helm charts are also not our official documentation, and while I agree they should be consistent, I can only fix one issue at a time.

In addition, every customer I encounter has different settings for Cortex, which causes a lot of problems/support cases. Based on the data I have gathered, I have made the above recommendations. Will these fit every use case? Certainly not; however, they will greatly reduce the number of issues our customers have been hitting over the past several months.

The KB article you mentioned, unfortunately, is too limited in scope when it comes to sizing. Simply knowing how many metrics you have and calculating the amount of RAM required does not solve the majority of metrics issues I've encountered. It's certainly a good starting point when troubleshooting, but it's not a solution in and of itself in most cases.

The ultimate solution is for engineering to perform benchmarking and provide a sizing guide either in chart form or using a formula, along with the settings to adjust. I don't believe that will happen in the short or even medium term, so the goal here is to stop the bleeding as much as possible today.

I also think we need to completely re-evaluate how we think of and provide metrics in general, but again, that is a much bigger discussion with a lot more stakeholders.

TL;DR: I hear you; however, this PR is meant as a band-aid, as the other solutions will take more time and energy, with buy-in from other stakeholders.

**Contributor:**

> The ultimate solution is for engineering to perform benchmarking and provide a sizing guide either in chart form or using a formula

YES

Completely agree with you; my point is just not to say "with this, it will work", but to be a bit more nuanced and say "these are the recommended resources, but based on your infra you might need more or less; if this is not enough because you have many topics, partitions, etc., please consider adding more resources", or something like that. Wdyt?

**Contributor Author:**

So do you think we should reduce the minimum here and leave the recommended as is, or should we adjust both?

**Contributor:**

Let's reduce the minimum and leave the recommended, but add a sentence to explain that this might depend on their infra, and that the bigger the cluster, the more resources they should give.

**Contributor Author:**

I am happy to reduce the minimum and leave the recommended.

**Contributor:**

@AurelieMarcuzzo - good with the change?


## Deployment Architecture
