2 changes: 1 addition & 1 deletion claude/content/wiki/_index.md
Original file line number Diff line number Diff line change
@@ -31,7 +31,7 @@ The result is collaboration that feels like working with an expert colleague who
The platform consists of two primary architectural systems that work together to transform AI interactions into persistent professional collaboration.

> [!NOTE]
> The platform was designed using **Site Reliability Engineering** methodologies and **behavioral psychology** principles, treating AI collaboration as infrastructure requiring systematic observability, monitoring, and reliability engineering for consistent professional partnership.
> The platform was [designed](/wiki/guide/components/design) using **Site Reliability Engineering** methodologies and **behavioral psychology** principles, treating AI collaboration as infrastructure requiring systematic observability, monitoring, and reliability engineering for consistent professional partnership.

### Platform Components

4 changes: 2 additions & 2 deletions k3s-cluster/content/tutorials/handbook/ansible/_index.md
@@ -57,10 +57,10 @@ ansible_password: !vault |
36613833363662323261373266333565633430643639366435303061313039643637
```

Insert the `ansible_password` encrypted output into [`all.yaml`](https://{{< param variables.repository.cluster >}}/blob/main/inventory/cluster/group_vars/all.yaml) group variables file, while respecting the *existing* indentation.
Insert the `ansible_password` encrypted output into [`all.yaml`](https://{{< param variables.repository.cluster >}}/blob/main/inventory/cluster/group_vars/all.yaml) group variables file, while respecting the _existing_ indentation.

> [!TIP]
> Once all variables have been initially encrypted with the same global password, they can be decrypted or updated with the [Vault](/k3s-cluster/wiki/guide/playbooks/vault) playbook.
> Once all variables have been initially encrypted with the same global password, they can be decrypted or updated with the [Vault](/wiki/guide/playbooks/vault) playbook.
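For illustration, the encrypted value slots into the group variables file as sketched below. The `ansible_user` key is a hypothetical neighbor, and the hex payload is abbreviated to the single line shown earlier; paste the full `ansible-vault` output and keep the file's existing indentation:

```yaml
# inventory/cluster/group_vars/all.yaml (sketch, not the actual file contents)
ansible_user: admin
ansible_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  36613833363662323261373266333565633430643639366435303061313039643637
```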

### Playbook Usage

12 changes: 6 additions & 6 deletions k3s-cluster/content/tutorials/handbook/argocd/_index.md
@@ -13,7 +13,7 @@ This repository uses [ArgoCD](https://argoproj.github.io/cd) to deploy applicati
See the related role variables, defined in the [`main.yaml`](https://{{< param variables.repository.cluster >}}/blob/main/roles/argo-cd/defaults/main.yaml) defaults file. Review the [`README.md`](https://{{< param variables.repository.cluster >}}/tree/main/roles/argo-cd) file for additional details and the advanced configuration settings listed below.

> [!IMPORTANT]
> A [role upgrade](/k3s-cluster/wiki/guide/configuration/roles/argocd/#upgrade) is required, in order to apply any changes related to configuration.
> A [role upgrade](/wiki/guide/configuration/roles/argocd/#upgrade) is required in order to apply any configuration changes.

### Credentials

@@ -23,9 +23,9 @@ While still implemented, the `admin` credentials are disabled by default and `us
argocd_resources:
server:
users:
- name: '{{ argocd_map.credentials.server.user.name }}'
password: '{{ argocd_map.credentials.server.user.password }}'
permissions: 'apiKey, login'
- name: "{{ argocd_map.credentials.server.user.name }}"
password: "{{ argocd_map.credentials.server.user.password }}"
permissions: "apiKey, login"
role: admin
enabled: true
```
@@ -40,15 +40,15 @@ The `name` and `password` keys listed above are defined into [`all.yaml`](https:
Additional configuration parameters can be defined in the [`config_params.j2`](https://{{< param variables.repository.cluster >}}/blob/main/roles/argo-cd/templates/config_params.j2) template.

> [!TIP]
> Perform a [role validation](/k3s-cluster/wiki/guide/configuration/roles/argocd/#validation), to visualize all rendered templates and variables.
> Perform a [role validation](/wiki/guide/configuration/roles/argocd/#validation) to visualize all rendered templates and variables.

### RBAC

Additional RBAC policies can be defined in the [`config_rbac.j2`](https://{{< param variables.repository.cluster >}}/blob/main/roles/argo-cd/templates/config_rbac.j2) template. The role automatically injects the users specified in the [`facts.yaml`](https://{{< param variables.repository.cluster >}}/blob/main/roles/argo-cd/tasks/facts.yaml) tasks file, under the `argocd_resources.server.users` collection.

## Repository Setup

Login into [ArgoCD UI](/k3s-cluster/tutorials/handbook/externaldns/#argocd), navigate to `ArgoCD Settings` > `Repositories` and connect to official project repository:
Log into the [ArgoCD UI](/tutorials/handbook/externaldns/#argocd), navigate to `ArgoCD Settings` > `Repositories` and connect to the official project repository:

| Key | Value |
| :------ | :------------------------------------------------------------ |
5 changes: 3 additions & 2 deletions k3s-cluster/content/tutorials/handbook/cilium/_index.md
@@ -18,7 +18,7 @@ Due to the intricate nature of its requirements, Cilium is deployed in three ste

## Dependencies

See below the required Cilium dependencies, used into chart configuration.
See below the required Cilium dependencies, used in the chart configuration.

### CertManager

@@ -30,6 +30,7 @@ During chart post-install provisioning, Cilium Hubble is configured to take adva

> [!IMPORTANT]
> Cilium details the following instructions in their `cert-manager` [installation](https://docs.cilium.io/en/latest/observability/hubble/configuration/tls) steps:
>
> > Please make sure that your issuer is able to create certificates under the `cilium.io` domain name.
>
> CertManager cannot control a domain not owned by the end-user, therefore the above-listed `Certificate` and `ClusterIssuer` resources are created.
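As a hedged sketch of what such resources can look like, assuming a self-signed issuer (the resource names, namespace, and DNS name pattern below are illustrative; the role's actual templates are authoritative):

```yaml
# Sketch only: a self-signed ClusterIssuer plus a Certificate under cilium.io.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: cilium-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: hubble-server-certs
  namespace: kube-system
spec:
  secretName: hubble-server-certs
  dnsNames:
    - '*.default.hubble-grpc.cilium.io'
  issuerRef:
    name: cilium-issuer
    kind: ClusterIssuer
```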
@@ -42,7 +43,7 @@ Refer to Cilium Hubble [documentation](https://docs.cilium.io/en/stable/gettings

#### Usage Example

This is an example of `Gateway` and `HTTPRoute` resources usage for [Cilium Hubble UI](/k3s-cluster/tutorials/handbook/externaldns/#cilium), as replacement for `Ingress` resource:
This is an example of `Gateway` and `HTTPRoute` resource usage for [Cilium Hubble UI](/tutorials/handbook/externaldns/#cilium), as a replacement for the `Ingress` resource:

- `Gateway` resource template, see [`gateway.j2`](https://{{< param variables.repository.cluster >}}/blob/main/roles/cilium/templates/gateway.j2)
- `HTTPRoute` insecure resource template, see [`http_route_insecure.j2`](https://{{< param variables.repository.cluster >}}/blob/main/roles/cilium/templates/http_route_insecure.j2)
10 changes: 5 additions & 5 deletions k3s-cluster/content/tutorials/handbook/externaldns/_index.md
@@ -13,13 +13,13 @@ This repository uses [ExternalDNS](https://github.com/kubernetes-sigs/external-d
Generate the Cloudflare domain [API token](https://developers.cloudflare.com/fundamentals/api/get-started/create-token/), with the following permissions:

{{< filetree/container >}}
{{< filetree/folder name="ACCOUNT" >}}
{{< filetree/folder name="domain.com - Zone:Read, DNS:Edit" state="closed" >}}
{{< /filetree/folder >}}
{{< /filetree/folder >}}
{{< filetree/folder name="ACCOUNT" >}}
{{< filetree/folder name="domain.com - Zone:Read, DNS:Edit" state="closed" >}}
{{< /filetree/folder >}}
{{< /filetree/folder >}}
{{< /filetree/container >}}

Encrypt the `global_map.credentials.externaldns.cloudflare.api.token` value with [`ansible-vault`](/k3s-cluster/tutorials/handbook/ansible/#vault) and insert it into
Encrypt the `global_map.credentials.externaldns.cloudflare.api.token` value with [`ansible-vault`](/tutorials/handbook/ansible/#vault) and insert it into
[`all.yaml`](https://{{< param variables.repository.cluster >}}/blob/main/inventory/cluster/group_vars/all.yaml) group variables file.
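Assuming the nesting implied by the key path above (the structure is a sketch inferred from the variable name, not copied from the repository, and the hex line is a placeholder for the full `ansible-vault` output), the encrypted token lands in `all.yaml` roughly as:

```yaml
global_map:
  credentials:
    externaldns:
      cloudflare:
        api:
          token: !vault |
            $ANSIBLE_VAULT;1.1;AES256
            62313365396662343061393464336163383764373764613633653634306231
```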

## Front-Ends
126 changes: 63 additions & 63 deletions k3s-cluster/content/tutorials/handbook/k3s-monitor/_index.md
@@ -13,78 +13,78 @@ The **K3s Monitor** tool is a comprehensive Python utility designed to collect,

## Tool Features

* **Cluster Resource Monitoring**: Collects various resource metrics from nodes and pods
* **Component-Specific Monitoring**: Tracks resource usage for all K3s Cluster [components](/k3s-cluster/wiki/guide/configuration/roles/#charts)
* **Log Collection**: Gathers logs from system services and Kubernetes components
* **Automated Analysis**: Identifies high resource consumption and potential issues
* **Comparative Reporting**: Compares current metrics with previous monitoring runs
* **Comprehensive Summary**: Generates detailed reports with recommendations, ready for AI-assisted analysis with tools like [Claude](https://claude.ai)
- **Cluster Resource Monitoring**: Collects various resource metrics from nodes and pods
- **Component-Specific Monitoring**: Tracks resource usage for all K3s Cluster [components](/wiki/guide/configuration/roles/#charts)
- **Log Collection**: Gathers logs from system services and Kubernetes components
- **Automated Analysis**: Identifies high resource consumption and potential issues
- **Comparative Reporting**: Compares current metrics with previous monitoring runs
- **Comprehensive Summary**: Generates detailed reports with recommendations, ready for AI-assisted analysis with tools like [Claude](https://claude.ai)

## Prerequisites

The following dependencies are required to run the **K3s Monitor** tool, automatically deployed with [Provisioning](/k3s-cluster/wiki/guide/playbooks/provisioning) playbook:
The following dependencies are required to run the **K3s Monitor** tool, automatically deployed with the [Provisioning](/wiki/guide/playbooks/provisioning) playbook:

* Python 3.8+
* `python3-kubernetes` library
* `python3-yaml` library
* `kubectl` configured to access the K3s cluster
* `journalctl` for log collection
* `jq` for JSON processing
- Python 3.8+
- `python3-kubernetes` library
- `python3-yaml` library
- `kubectl` configured to access the K3s cluster
- `journalctl` for log collection
- `jq` for JSON processing
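The prerequisites above can be verified up front with a small shell check. This is a sketch: the Python module names (`kubernetes`, `yaml`) are assumptions based on the `python3-kubernetes` and `python3-yaml` package names, not confirmed by the tool itself:

```shell
#!/bin/sh
# Sketch: check the prerequisites listed above before a k3s-monitor run.

have() { command -v "$1" >/dev/null 2>&1; }

# Required command-line tools
for cmd in kubectl journalctl jq python3; do
  have "$cmd" || echo "missing command: $cmd"
done

# Python 3.8+ with the assumed library module names
if have python3; then
  python3 - <<'EOF' || echo "missing: Python 3.8+ with kubernetes/yaml modules"
import sys
assert sys.version_info >= (3, 8)
import kubernetes, yaml
EOF
fi
```

Running this on each node before the playbook run surfaces gaps without touching the cluster.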

## Generated Reports

The following reports are generated:

* **cilium-metrics.log**: Detailed Cilium networking status, endpoints and services information
* **cluster-info.log**: Basic information about the cluster
* **comparison.log**: Comparison with previous monitoring runs
* **component-metrics.csv**: Time-series data for component resource usage
* **summary.log**: Overall resource usage summary and recommendations
* **etcd-metrics.log**: Status of HA clusters, `etcd` cluster health and metrics
* **k3s-monitor.log**: Operational log of the monitoring tool itself, including all actions taken during execution
* **log-summary.txt**: Summary of important log events (errors, warnings)
* **pod-metrics.csv**: Detailed pod-level resource metrics
* **sysctl.txt**: System kernel parameter settings
* **summary.log**: Overall resource usage summary and recommendations
- **cilium-metrics.log**: Detailed Cilium networking status, endpoints and services information
- **cluster-info.log**: Basic information about the cluster
- **comparison.log**: Comparison with previous monitoring runs
- **component-metrics.csv**: Time-series data for component resource usage
- **etcd-metrics.log**: Status of HA clusters, `etcd` cluster health and metrics
- **k3s-monitor.log**: Operational log of the monitoring tool itself, including all actions taken during execution
- **log-summary.txt**: Summary of important log events (errors, warnings)
- **pod-metrics.csv**: Detailed pod-level resource metrics
- **summary.log**: Overall resource usage summary and recommendations
- **sysctl.txt**: System kernel parameter settings
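The CSV reports lend themselves to quick inspection with standard shell tooling. A minimal sketch, assuming a `timestamp,component,cpu_millicores,memory_mib` column layout for `component-metrics.csv` (a hypothetical header; check the real file before relying on it):

```shell
#!/bin/sh
# Sketch: report the peak memory reading per component from a K3s Monitor CSV.
# The four-column layout is an assumption for illustration only.

peak_memory() {
  awk -F, 'NR > 1 { if ($4 + 0 > peak[$2]) peak[$2] = $4 + 0 }   # skip header, track max of col 4 per component
           END { for (c in peak) printf "%s %s\n", c, peak[c] }' "$1" | sort
}

# Demo against an inline sample shaped like the assumed layout:
cat > /tmp/component-metrics-sample.csv <<'EOF'
timestamp,component,cpu_millicores,memory_mib
2025-01-01T00:00:00,cilium,120,410
2025-01-01T00:05:00,cilium,140,455
2025-01-01T00:00:00,coredns,15,60
EOF
peak_memory /tmp/component-metrics-sample.csv
# prints "cilium 455" then "coredns 60"
```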

See below the directory and file structure containing the generated reports.

> [!NOTE]
> Submit the generated tarball to [Claude](https://claude.ai) for AI-assisted analysis. Upload it to a chat with Claude and ask for an analysis of your K3s cluster metrics and performance.

{{< filetree/container >}}
{{< filetree/folder name="/var/log/k3s" >}}
{{< filetree/folder name="YYYYMMDD-HHMMSS (click to expand)" state="closed" >}}
{{< filetree/file name="cilium-metrics.log" >}}
{{< filetree/file name="cluster-info.log" >}}
{{< filetree/file name="comparison.log" >}}
{{< filetree/file name="component-metrics.csv" >}}
{{< filetree/file name="etcd-metrics.log" >}}
{{< filetree/file name="k3s-monitor.log" >}}
{{< filetree/file name="log-summary.txt" >}}
{{< filetree/file name="pod-metrics.csv" >}}
{{< filetree/folder name="service" >}}
{{< filetree/folder name="components" >}}
{{< filetree/file name="argo-cd_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="cert-manager_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="cilium_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="coredns_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="external-dns_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="kured_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="longhorn_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="metrics-server_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="victorialogs_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="victoriametrics_YYYYMMDD-HHMMSS.log" >}}
{{< /filetree/folder >}}
{{< filetree/file name="containerd.log" >}}
{{< filetree/file name="k3s.log" >}}
{{< filetree/file name="kubelet.log" >}}
{{< /filetree/folder >}}
{{< filetree/file name="summary.log" >}}
{{< filetree/file name="sysctl.txt" >}}
{{< /filetree/folder >}}
{{< filetree/file name="k3s-monitor-YYYYMMDD-HHMMSS.tar.gz" >}}
{{< /filetree/folder >}}
{{< filetree/folder name="/var/log/k3s" >}}
{{< filetree/folder name="YYYYMMDD-HHMMSS (click to expand)" state="closed" >}}
{{< filetree/file name="cilium-metrics.log" >}}
{{< filetree/file name="cluster-info.log" >}}
{{< filetree/file name="comparison.log" >}}
{{< filetree/file name="component-metrics.csv" >}}
{{< filetree/file name="etcd-metrics.log" >}}
{{< filetree/file name="k3s-monitor.log" >}}
{{< filetree/file name="log-summary.txt" >}}
{{< filetree/file name="pod-metrics.csv" >}}
{{< filetree/folder name="service" >}}
{{< filetree/folder name="components" >}}
{{< filetree/file name="argo-cd_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="cert-manager_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="cilium_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="coredns_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="external-dns_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="kured_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="longhorn_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="metrics-server_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="victorialogs_YYYYMMDD-HHMMSS.log" >}}
{{< filetree/file name="victoriametrics_YYYYMMDD-HHMMSS.log" >}}
{{< /filetree/folder >}}
{{< filetree/file name="containerd.log" >}}
{{< filetree/file name="k3s.log" >}}
{{< filetree/file name="kubelet.log" >}}
{{< /filetree/folder >}}
{{< filetree/file name="summary.log" >}}
{{< filetree/file name="sysctl.txt" >}}
{{< /filetree/folder >}}
{{< filetree/file name="k3s-monitor-YYYYMMDD-HHMMSS.tar.gz" >}}
{{< /filetree/folder >}}
{{< /filetree/container >}}

## Tool Usage
@@ -111,7 +111,7 @@ options:
-n NAMESPACE, --namespace NAMESPACE
Default namespace (default: kube-system)
-v, --verbose Enable verbose logging (default: False)
```
```

See below various **K3s Monitor** tool usage examples.

@@ -145,9 +145,9 @@ sudo k3s-monitor --duration 600 --interval 60

## Best Practices

* **Regular Monitoring**: Run the tool periodically (e.g., weekly) to establish baseline metrics
* **After Changes**: Run after cluster upgrades or significant workload changes
* **Retention**: Keep monitoring results for trend analysis
* **Size Appropriately**: Adjust duration and interval based on cluster size:
* Small clusters: 1-hour duration, 5-minute intervals
* Large clusters: 6-hour duration, 15-minute intervals
- **Regular Monitoring**: Run the tool periodically (e.g., weekly) to establish baseline metrics
- **After Changes**: Run after cluster upgrades or significant workload changes
- **Retention**: Keep monitoring results for trend analysis
- **Size Appropriately**: Adjust duration and interval based on cluster size:
- Small clusters: 1-hour duration, 5-minute intervals
- Large clusters: 6-hour duration, 15-minute intervals
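The sizing guidance above can be captured in a small wrapper. This is a sketch: the node-count threshold for "small" is an illustrative assumption, while the `--duration`/`--interval` flags come from the tool's help output shown earlier:

```shell
#!/bin/sh
# Sketch: derive k3s-monitor flags from cluster size, per the guidance above.
# The <= 3 node cutoff is an assumption, not project guidance.

monitor_args() {
  nodes="$1"
  if [ "$nodes" -le 3 ]; then
    echo "--duration 3600 --interval 300"    # small: 1-hour run, 5-minute samples
  else
    echo "--duration 21600 --interval 900"   # large: 6-hour run, 15-minute samples
  fi
}

# Example wiring (commented out; requires a live cluster):
# sudo k3s-monitor $(monitor_args "$(kubectl get nodes --no-headers | wc -l)")
monitor_args 3    # prints "--duration 3600 --interval 300"
monitor_args 12   # prints "--duration 21600 --interval 900"
```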
6 changes: 3 additions & 3 deletions k3s-cluster/content/tutorials/handbook/k3s/_index.md
@@ -25,15 +25,15 @@ The end-user can use `kubectl` to operate the cluster via local terminal, or [Le
[![K3s: Lens](k3s-lens.webp)](k3s-lens.webp)

> [!TIP]
> Lens automatically detects and exposes the metrics produced by [VictoriaMetrics](/k3s-cluster/wiki/guide/configuration/roles/victoriametrics) role.
> Lens automatically detects and exposes the metrics produced by the [VictoriaMetrics](/wiki/guide/configuration/roles/victoriametrics) role.

## Upgrade

Upon a new K3s version release, end-user can perform a [role upgrade](/k3s-cluster/wiki/guide/configuration/roles/k3s/#upgrade), which will schedule a [Kured](/k3s-cluster/wiki/guide/configuration/roles/kured) reboot.
Upon a new K3s version release, the end-user can perform a [role upgrade](/wiki/guide/configuration/roles/k3s/#upgrade), which will schedule a [Kured](/wiki/guide/configuration/roles/kured) reboot.

### Manual Upgrade

Once the [role upgrade](/k3s-cluster/wiki/guide/configuration/roles/k3s/#upgrade) performed, end-user can choose to manually upgrade each cluster node. A [node drain](/k3s-cluster/tutorials/handbook/longhorn/#node-drain) must be executed one node at the time, followed by a node reboot. Once the node is up and running, it can be uncordoned with Lens or `kubectl`, via local terminal:
Once the [role upgrade](/wiki/guide/configuration/roles/k3s/#upgrade) is performed, the end-user can choose to manually upgrade each cluster node. A [node drain](/tutorials/handbook/longhorn/#node-drain) must be executed one node at a time, followed by a node reboot. Once the node is up and running, it can be uncordoned with Lens or `kubectl` via the local terminal:

```shell
kubectl uncordon <node>
2 changes: 1 addition & 1 deletion k3s-cluster/content/tutorials/handbook/kured/_index.md
@@ -30,7 +30,7 @@ https://hooks.slack.com/services/<token>/<token>/<token>

### Notify URL

Encrypt the `global_map.credentials.kured.slack.notify.url` value with [`ansible-vault`](/k3s-cluster/tutorials/handbook/ansible/#vault) and insert it into
Encrypt the `global_map.credentials.kured.slack.notify.url` value with [`ansible-vault`](/tutorials/handbook/ansible/#vault) and insert it into
[`all.yaml`](https://{{< param variables.repository.cluster >}}/blob/main/inventory/cluster/group_vars/all.yaml) group variables file. The decrypted Notify URL format is:

```yaml
Expand Down