Skip to content

Commit 4bcc943

Browse files
committed
PR comments and rename.
1 parent ae80642 commit 4bcc943

File tree

2 files changed

+105
-81
lines changed

2 files changed

+105
-81
lines changed
Lines changed: 105 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,105 @@
1+
# `RunConfig` as Architectural Boundary
2+
3+
## Problem Statement
4+
5+
As of the time of this writing, ToolHive has a few interfaces and as released as part of three user facing
6+
systems, namely
7+
8+
* **ToolHive CLI** `thv`
9+
* **ToolHive Operator**, which is used in both Kubernetes and OpenShift
10+
* **ToolHive UI**, which internally uses APIs exposed by `thv serve`
11+
12+
These interfaces differ in their execution environment and "quality of life" features that one might implement. For example, one obvious difference is the way configuration is accessed: CLI must both accept CLI options and read files on file system, the Kubernetes Operator requires all parameters to be fully specified in the main CRD or into "linked" ones (e.g. `MCPToolConfig`), and finally HTTP API that is currently used to back ToolHive Studio also requires all parameters to be fully specified in the body of the request.</b>
13+
Another one is in config reload semantics, which must implemented in an ad-hoc fashion for CLI, while Kubernetes handles it as part of the life cycle of resources. Yet another example is the location of runtime configuration and state. The Kubernetes Operator relies on both being stored "inside the cluster", while the CLI and UI must both rely on file system, yet the configuration might be semantically equivalent.</b>
14+
A final useful use case is exporting `RunConfig` so that the same workload can be "moved" to a different place or shipped as configuration. A similar use case is that implemented by the `thv restart` command, which fetches the serialized version of a run config to restart the workload when necessary.
15+
16+
These differences warrant the introduction of an architectural boundary where workloads are executed, which is currently specified via what we call a RunConfig.
17+
18+
## Goals
19+
20+
* make RunConfig the entrypoint for workloads execution
21+
* stabilize RunConfig so that new user or application interfaces can be built upon it
22+
* clearly specify responsibilities of code above and below this new boundary
23+
24+
## Responsibilities of a RunConfig
25+
26+
A `RunConfig` struct contains either information necessary for a OCI-compatible runner to run a workload, or alternatively the remote URL at which an already running MCP is reachable. In both scenarios, ToolHive aims to "wrap" the workload in a proxy to handle auth and gather telemetry.
27+
28+
Finally, `RunConfig` are currently serialized as JSON and used by `thv restart` command.
29+
30+
## Current Interface
31+
32+
Conceptually, a `RunConfig` contains three things
33+
* details on how to run or reach the Workload
34+
* configuration for the Proxy itself
35+
* Metadata like name pertaining both components that allow ToolHive to refer to them as one
36+
37+
**Metadata details** amount to
38+
* name
39+
* group
40+
* schema version
41+
* debug settings, common to both proxy and workload
42+
43+
**Workload details** for local Workloads amount to
44+
* OCI-compatible container config (image, its command arguments, container name, etc...)
45+
* desired workload name
46+
* host and port to expose
47+
* environment variables to set (literal or file-based)
48+
* secrets to set
49+
* volumes to mount
50+
* container labels
51+
* Kubernetes pod template patch (Kubernetes specific)
52+
* network isolation flag
53+
54+
While when the Workload is remote, details are
55+
* remote URL
56+
* auth configuration
57+
58+
**Proxy details** amount to
59+
* host and port to expose
60+
* workload transport type
61+
* permission profile (literal or file-based)
62+
* OIDC configuration parameters
63+
* authorization config (literal or file-based)
64+
* audit config (literal or file-based)
65+
* proxy headers trust flag
66+
* proxy mode (i.e. transport to expose)
67+
* CA bundle
68+
* JWKS token file
69+
* tools config
70+
* IgnoreConfig (?)
71+
* middleware configuration settings
72+
73+
## Current `RunConfig` users
74+
75+
This is the list of users of `RunConfig` users outside of `pkg/runner` package as of commit [6e18a3c](https://github.com/stacklok/toolhive/tree/6e18a3c967d257e8ff715775beea948fe18230f2)
76+
77+
* [cmd/thv-operator/controllers/mcpserver_runconfig.go](https://github.com/stacklok/toolhive/blob/6e18a3c967d257e8ff715775beea948fe18230f2/cmd/thv-operator/controllers/mcpserver_runconfig.go) which translates the CRD into the equialent run config and accesses several fields for validation
78+
* [cmd/thv-proxyrunner/app/execution.go](https://github.com/stacklok/toolhive/blob/6e18a3c967d257e8ff715775beea948fe18230f2/cmd/thv-proxyrunner/app/execution.go) which is the binary used to run proxy Pods on Kubernetes
79+
* [cmd/thv-proxyrunner/app/run.go](https://github.com/stacklok/toolhive/blob/6e18a3c967d257e8ff715775beea948fe18230f2/cmd/thv-proxyrunner/app/run.go) which loads the JSON representation of a RunConfig from one of three possible standard paths of a Kubernetes Pod
80+
* [cmd/thv/app/run.go](https://github.com/stacklok/toolhive/blob/6e18a3c967d257e8ff715775beea948fe18230f2/cmd/thv/app/run.go) which implements the `thv run --foreground` and uses [cmd/thv/app/run_flags.go](https://github.com/stacklok/toolhive/blob/6e18a3c967d257e8ff715775beea948fe18230f2/cmd/thv/app/run_flags.go) to map CLI flags to run config options
81+
* [pkg/api/v1/workload_service.go](https://github.com/stacklok/toolhive/blob/6e18a3c967d257e8ff715775beea948fe18230f2/pkg/api/v1/workload_service.go) that is used by the HTTP API layer to map a workload creation request into the respective run config
82+
* [pkg/api/v1/workload_types.go](https://github.com/stacklok/toolhive/blob/6e18a3c967d257e8ff715775beea948fe18230f2/pkg/api/v1/workload_types.go) which does the opposite, mapping a run config to a workload creation request returned by the `GET /api/v1beta/workloads/{name}` endpoint
83+
* [pkg/mcp/server/run_server.go](https://github.com/stacklok/toolhive/blob/6e18a3c967d257e8ff715775beea948fe18230f2/pkg/mcp/server/run_server.go) that implements the `thv mcp` command
84+
85+
It's worth mentioning that the RunConfig struct is referenced in [pkg/workloads/manager.go](https://github.com/stacklok/toolhive/blob/6e18a3c967d257e8ff715775beea948fe18230f2/pkg/workloads/manager.go) as well, but that package implements the default workload manager and might be considered part of the runner.
86+
87+
## Responsibility Split
88+
89+
We propose the following split in responsibilities
90+
91+
**RunConfig** data structure and routines will be responsible for holding configuration parameters, basic validation, and serialization, but not storage. Simply put, the package should accept bytes and readers, and return bytes, similarly to how `encoding/json` works.
92+
93+
**CLI** and **Operator** will be responsible for mapping their respective representation of configuration parameters to the representation allowed by `RunConfig`. Specifically, no file-based representation of configuration parameters is allowed in the `RunConfig` struct.
94+
95+
Consequences of changes to configuration parameters must be managed outside the `RunConfig` code. For example, configuration reload for the CLI must be managed within CLI commands, and not within the `Runner` or `RunConfig`. That said, `Runner`s can implement behaviors specific to their execution environment, but they must not rely on references to the "outside world" being stored in the `RunConfig`.
96+
97+
Users of the `RunConfig` package should not build their interfaces using types exposed by `RunConfig` package and should not rely on their stability to guarantee their API contracts. For example, said types should not be used for externally facing interfaces like HTTP API if not for trivial cases.
98+
99+
### Excursus on Middleware Configs and Validation
100+
101+
Middleware configuration is a very important piece of ToolHive since most features are implemented as middleware functions and RunConfig must contain all parameters used to configure middleware in order be able to run and restart an MCP server consistently, but the middleware functions must be composed in the right order, which is logic that hardly belongs to `pkg/runconfig`. In fact, middleware configuration is represented in an opaque way via the [`NewMiddlewareConfig`](https://github.com/stacklok/toolhive/blob/6e18a3c967d257e8ff715775beea948fe18230f2/pkg/transport/types/transport.go#L41-L54) function that accepts a middleware type (string) and an opaque payload representing its configuration.</b>
102+
This function is used by [`PopulateMiddlewareConfigs`](https://github.com/stacklok/toolhive/blob/6e18a3c967d257e8ff715775beea948fe18230f2/pkg/runner/middleware.go#L29-L31) that builds and collects middleware configurations using fields from `RunConfig`.</b>
103+
The function `PopulateMiddlewareConfigs` is effectively responsible for collecting middleware functions in the correct order, which in the implementation is from the outermost to the innermost, but it does not perform any validation on the received parameters outside of null checks. Validation is instead delegated to `MiddlewareFactory` functions implemented for each middleware. All code mentioned is then glued to gether in the implementation of the [Run](https://github.com/stacklok/toolhive/blob/6e18a3c967d257e8ff715775beea948fe18230f2/pkg/runner/runner.go#L92-L127) function of the runner.
104+
105+
This design runs validation of middleware function parameters when the proxy is about to start, which is fine for CLI and UI, but in a Kubernetes environment this happens long after the CRD is created, delaying feedback to the user.

docs/proposals/runconfig-as-architectural-boundary.md

Lines changed: 0 additions & 81 deletions
This file was deleted.

0 commit comments

Comments
 (0)