Skip to content

Conversation

@blkt
Copy link
Contributor

@blkt blkt commented Oct 6, 2025

This change contains a design doc describing an architectural boundary at RunConfig level. Aim of this design doc is

  • listing the current responsibilities of RunConfig and related code (e.g. Runner)
  • propose a new split of responsibilities and a clear golang API for it
  • clarify what responsibilities should be moved in the respective user interfaces, e.g. CLI or Operator

@blkt blkt self-assigned this Oct 6, 2025
@codecov
Copy link

codecov bot commented Oct 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 53.45%. Comparing base (3453db0) to head (6beef0c).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2096      +/-   ##
==========================================
+ Coverage   53.43%   53.45%   +0.01%     
==========================================
  Files         230      230              
  Lines       29385    29385              
==========================================
+ Hits        15703    15707       +4     
+ Misses      12552    12549       -3     
+ Partials     1130     1129       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@blkt blkt force-pushed the docs/runconfig-proposal branch from b7dca4b to fb9c334 Compare October 7, 2025 07:34
@JAORMX
Copy link
Collaborator

JAORMX commented Oct 7, 2025

@claude please review this architectural proposal

@claude
Copy link
Contributor

claude bot commented Oct 7, 2025

Claude finished @JAORMX's task —— View job


Architectural Proposal Review

I've reviewed the RunConfig as architectural boundary proposal. Here's my comprehensive feedback:


Overall Assessment

This is a well-structured and necessary proposal that addresses real architectural concerns in ToolHive. The separation of concerns between RunConfig and the consumer interfaces (CLI, Operator, UI) is architecturally sound and will improve maintainability and extensibility.


Strengths

  1. Clear Problem Statement: The proposal effectively articulates the current challenges with configuration handling across different interfaces (CLI, Operator, UI).

  2. Practical Examples: The use cases (config reload semantics, storage location differences, export/import) clearly demonstrate why this boundary is needed.

  3. Appropriate Scope: Making RunConfig analogous to encoding/json (accepting bytes/readers, returning bytes) is the right level of abstraction.

  4. Current State Documentation: Lines 22-70 provide an excellent inventory of what RunConfig currently contains, split into metadata, workload details, and proxy details.


Areas for Improvement

1. Implementation Guidance Missing

The proposal describes what should change but lacks how to achieve it. Consider adding:

  • Migration Strategy: How do we move from the current state to the proposed state?

    • Which fields need to be removed from RunConfig?
    • What's the deprecation timeline for file-based parameters?
    • How do we handle backward compatibility?
  • API Surface Definition: What should the new public API look like?

    // Example:
    // Instead of exposing RunConfig directly, consider:
    type RunConfig struct { /* ... */ }
    func NewRunConfig(opts ...Option) (*RunConfig, error)
    func (rc *RunConfig) Validate() error
    func (rc *RunConfig) MarshalJSON() ([]byte, error)
    func UnmarshalRunConfig(data []byte) (*RunConfig, error)

2. Concrete Examples of Violations

The proposal would benefit from specific examples of current violations:

Current problematic fields in RunConfig (from pkg/runner/config.go:34-162):

  • EnvFileDir (line 88) - File system reference
  • PermissionProfileNameOrPath (line 79) - File path reference
  • AuthzConfigPath (line 107) - File path reference
  • AuditConfigPath (line 113) - File path reference
  • JWKSAuthTokenFile (line 145) - File path reference
  • Deployer (line 127) - Runtime-specific dependency

These should be called out explicitly as fields that violate the proposed boundary.

3. Validation Responsibility Unclear

Line 75 mentions "basic validation" as a RunConfig responsibility, but doesn't define what qualifies as "basic":

Suggested clarification:

  • Basic validation (RunConfig's responsibility): Structural validity, required fields present, type constraints
  • Contextual validation (CLI/Operator responsibility): File existence, network availability, secret references, registry lookups

4. Serialization Format Not Specified

Line 75 mentions serialization but doesn't specify:

  • What formats should be supported? (JSON only? YAML? Both?)
  • Should the format be versioned? (The struct has SchemaVersion but no migration strategy)
  • What happens when SchemaVersion changes?

5. Missing Discussion of Runner Responsibilities

The proposal mentions Runner in the title and problem statement but doesn't clearly define its responsibilities post-refactor. Consider adding:

Proposed Runner Responsibilities:

  • Accept a validated RunConfig
  • Execute workloads using the provided deployer
  • Manage workload lifecycle (start, stop, monitor)
  • Handle middleware chain construction
  • NOT responsible for: config file reading, secret fetching, registry lookups

6. Impact on Existing Code Not Assessed

The proposal doesn't mention:

  • How many call sites need updates?
  • What's the blast radius of this change?
  • Which packages are affected?

Affected areas (from my analysis):

  • cmd/thv/app/run.go - CLI configuration building
  • cmd/thv-operator/controllers/mcpserver_runconfig.go - Operator CRD mapping
  • pkg/runner/config_builder.go - Builder pattern needs updating
  • pkg/state/runconfig.go - Storage abstraction

7. "No File-Based Representation" Needs Clarification

Line 77 states: "no file-based representation of configuration parameters is allowed in the RunConfig struct"

Question: Does this mean:

  • ❌ No file paths (like EnvFileDir, AuthzConfigPath)?
  • ❌ No references to external files at all?
  • ✅ File contents should be resolved by CLI/Operator before creating RunConfig?

This should be explicit with before/after examples:

// ❌ Current (bad)
type RunConfig struct {
    AuthzConfigPath string
}

// ✅ Proposed (good)  
type RunConfig struct {
    AuthzConfig *authz.Config // already resolved
}

8. External API Stability Concern

Line 81 mentions: "Types exposed by RunConfig package should not be used for externally facing formats like HTTP API"

Concern: The current RunConfig struct (pkg/runner/config.go:34-162) has a comment stating it's "part of ToolHive's API contract" (line 36-37). This proposal doesn't address:

  • Is RunConfig JSON format considered a public API?
  • How do we version it if we need breaking changes?
  • What's the compatibility guarantee?

Specific Technical Concerns

1. Storage Abstraction Already Exists

The proposal treats storage as CLI/Operator responsibility, but pkg/state/runconfig.go already provides a storage abstraction layer. How does this fit into the new design?

Recommendation: Clarify whether pkg/state is above or below the boundary.

2. Current Builder Pattern May Need Redesign

pkg/runner/config_builder.go currently handles file resolution during building. With the proposed changes, this logic moves to CLI/Operator, but the builder pattern might still be useful for constructing valid RunConfig instances.

Recommendation: Provide guidance on whether builders should exist and where.

3. Middleware Configuration Complexity

The MiddlewareConfigs field (pkg/runner/config.go:159-161) is complex and touches both proxy configuration and execution environment. The proposal doesn't clarify if/how middleware configuration should be handled.


Missing Sections

Consider adding:

  1. Non-Goals: What are we explicitly NOT trying to solve?

    • Are we keeping backward compatibility for existing serialized configs?
    • Are we refactoring the Operator CRD structure?
  2. Success Criteria: How do we know when this is complete?

    • All file path references removed from RunConfig?
    • CLI and Operator can construct RunConfig without accessing filesystem during Runner.Run()?
    • Configuration reload is handled at CLI/Operator level?
  3. Testing Strategy: How do we validate the boundary is maintained?

    • Linter rules to prevent file paths in RunConfig?
    • Interface contracts to enforce separation?
  4. Timeline: When do we want this implemented?


Recommendations

Priority 1 (Critical for proposal acceptance):

  1. Add concrete examples of fields that violate the current boundary
  2. Define what "basic validation" means
  3. Clarify the "no file-based representation" rule with examples
  4. Add migration strategy and backward compatibility approach

Priority 2 (Important for implementation):

  1. Define the new public API surface
  2. Clarify Runner responsibilities post-refactor
  3. Specify serialization format and versioning strategy
  4. Assess impact on existing code

Priority 3 (Nice to have):

  1. Add non-goals and success criteria
  2. Include testing strategy
  3. Define timeline

Conclusion

This proposal identifies a legitimate architectural issue and proposes a sensible solution. However, it needs more implementation detail to be actionable. The core idea of treating RunConfig as a pure data structure (like encoding/json) is sound, but the transition path from current state to desired state needs to be mapped out.

Recommendation: Expand the "Responsibility Split" section significantly before implementation begins, or create a follow-up implementation plan document that addresses the gaps identified above.


References

  • Current RunConfig implementation: pkg/runner/config.go:34-162
  • Current storage abstraction: pkg/state/runconfig.go
  • Operator usage: cmd/thv-operator/controllers/mcpserver_runconfig.go
  • CLI usage: cmd/thv/app/run.go

Job Run | Branch: docs/runconfig-proposal

Copy link
Collaborator

@JAORMX JAORMX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An open question I'd like to discuss tomorrow is: Should we have information about the secret provider that's used and a key to retrieve the secret at runtime? This would make it easier to share configurations.

@@ -0,0 +1,81 @@
# `RunConfig` as Architectural Boundary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment worth noting: There are a bunch of fields in the run config right now which are only needed for backwards compatibility. We should clean those up as part of stabilization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we have a VersionSchema field in the RunConfig struct, we should leverage that.
Action Item: list fields that are either no longer used or not desired anymore.

@blkt
Copy link
Contributor Author

blkt commented Oct 8, 2025

An open question I'd like to discuss tomorrow is: Should we have information about the secret provider that's used and a key to retrieve the secret at runtime? This would make it easier to share configurations.

@JAORMX currently, RunConfig contains "how to identify the secret", but not "where to get it from". Additionally, it is only possible to configure a single secrets provider at a time. We're fine keeping this as is for now.

* permission profile (literal or file-based)
* OIDC configuration parameters
* authorization config (literal or file-based)
* audit config (literal or file-based)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these are deprecated.


We propose the following split in responsibilities

**RunConfig** data structure and routines will be responsible for holding configuration parameters, basic validation, and serialization, but not storage. Simply put, the package should accept bytes and readers, and return bytes, similarly to how `encoding/json` works.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: currently, validation for middleware configs is done late in the process and should be done at least where it's currently done as well as when instantiating a RunConfig.
Action Item: document here where validation is implemented and where validation routines should be invoked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a section at the bottom of the file.

@blkt blkt force-pushed the docs/runconfig-proposal branch 2 times, most recently from 4bcc943 to bcd010d Compare October 14, 2025 16:23
blkt added 2 commits October 17, 2025 14:44
This change contains a design doc describing an architectural boundary
at `RunConfig` level. Aim of this design doc is
* listing the current responsibilities of `RunConfig` and related code
  (e.g. `Runner`)
* propose a new split of responsibilities and a clear golang API for
  it
* clarify what responsibilities should be moved in the respective user
  interfaces, e.g. CLI or Operator
@blkt blkt force-pushed the docs/runconfig-proposal branch from bcd010d to 6beef0c Compare October 17, 2025 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants