Skip to content

feat: add request logging and admin audit foundations#48

Merged
wauputr4 merged 16 commits into
mainfrom
feature/issue-7-observability
May 30, 2026
Merged

feat: add request logging and admin audit foundations#48
wauputr4 merged 16 commits into
mainfrom
feature/issue-7-observability

Conversation

@wauputr4
Copy link
Copy Markdown
Contributor

@wauputr4 wauputr4 commented May 19, 2026

Summary

  • Add request tracing and admin audit primitives in the API gateway flow
  • Persist request lifecycle events in the database via migration 0002_request_log_foundations.sql
  • Add shared request logging configuration in mizan-core
  • Extend billing/proxy/provider paths with correlation and trace metadata
  • Expose foundation pieces needed for admin audit workflows
  • Add an OpenAI-compatible model sync helper for upstream /v1/models
  • Harden alpha smoke timing so the first cargo run -p mizan-api build is less likely to false-fail
  • Make gateway request-log persistence asynchronous so response latency is not blocked on the database insert
  • Parameterize request log method/path through request context instead of hardcoding them in the persistence helper
  • Remove the jq dependency from scripts/model-sync.sh by parsing model sync JSON with python3
  • Reduce UUID allocation overhead in logging persistence with stack-backed UUID text helpers

Testing

  • cargo fmt --all -- --check
  • cargo test --workspace
  • zsh -n scripts/model-sync.sh
  • zsh -n scripts/alpha-smoke.sh
  • Manual local smoke against http://127.0.0.1:18180 confirmed:
    • admin login
    • API key creation
    • provider connection creation
    • model route creation
    • credit grant
    • GET /v1/models
    • non-stream POST /v1/chat/completions
    • stream POST /v1/chat/completions
    • GET /v1/usage
    • GET /v1/credits
    • GET /metrics
    • scripts/model-sync.sh against a mock OpenAI-compatible upstream
    • scripts/model-sync.sh now works without jq

Notes

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 045e04d372

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/mizan-api/src/gateway.rs Outdated
Comment on lines +1046 to +1049
app_error_status_code(&error),
Some(&state.route_alias),
state.context.provider.as_deref(),
Some(error_code.as_str()),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep request_log error_code aligned with logged status

When a streamed upstream chunk fails and then billing::record_usage also fails, this call logs status from the billing error (app_error_status_code(&error)) but keeps error_code from the earlier provider error (error_code). That produces inconsistent request log rows (e.g., 500 with provider_error), which can corrupt failure-rate analytics and incident triage for this path. Use an error code derived from the same error used for status, or log the original provider status/code pair together.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements observability foundations by introducing a new logging module for request and admin audit logs, updating the database schema, and integrating these logs into the gateway and provider management flows. The implementation includes new structs for logging inputs and helper functions for database persistence. Feedback suggests optimizing the implementation by offloading synchronous database writes to background tasks to reduce API latency, improving error handling types, optimizing database bindings for Uuid to reduce allocations, increasing the reusability of the logging helper by parameterizing hardcoded values, and adopting Tower middleware to avoid repetitive manual logging calls.

Comment thread crates/mizan-api/src/gateway.rs Outdated
Comment on lines +644 to +686
#[allow(clippy::too_many_arguments)]
async fn record_gateway_request_completion(
database: &AnyPool,
database_backend: DatabaseBackend,
context: &RequestContext,
request_started_at: &Instant,
status: StatusCode,
route_alias: Option<&str>,
provider_alias: Option<&str>,
error_code: Option<&str>,
) {
let route = context
.route
.clone()
.or_else(|| route_alias.map(|value| value.to_string()));
let provider = context
.provider
.clone()
.or_else(|| provider_alias.map(|value| value.to_string()));
let latency_ms = request_started_at.elapsed().as_millis() as u64;

if let Err(error) = record_request_log(
database,
database_backend,
&RequestLogInput {
request_id: context.request_id,
user_id: context.user_id,
api_key_id: context.api_key_id,
provider_id: context.provider_id,
route_id: context.route_id,
method: "POST".to_string(),
path: "/v1/chat/completions".to_string(),
route,
provider,
status_code: status,
latency_ms,
error_code: error_code.map(|value| value.to_string()),
},
)
.await
{
warn!(error = %error, "failed to persist gateway request log");
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Recording request logs synchronously in the gateway handler adds significant latency to every API response, as it blocks on a database INSERT. For a high-performance gateway, consider offloading these writes to a background task (e.g., using tokio::spawn or a dedicated worker channel) to avoid blocking the client response.

Comment thread crates/mizan-api/src/logging.rs Outdated
Comment on lines +50 to +55
AppError::invalid_config(
"request_log.latency_ms",
format!("latency_ms must fit into i64: {error}"),
)
})?;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error type AppError::invalid_config is misleading here. An overflow during u64 to i64 conversion for latency is an internal infrastructure or data validation issue, not a user-provided configuration error. Consider using AppError::infrastructure or simply using as i64 since millisecond latency will not realistically exceed i64::MAX (~292 million years).

    let latency_ms = i64::try_from(input.latency_ms).map_err(|error| {
        AppError::infrastructure(format!("latency_ms overflow: {error}"))
    })?;

Comment thread crates/mizan-api/src/logging.rs Outdated
Comment on lines +76 to +81
.bind(input.request_id.to_string())
.bind(input.user_id.map(|value| value.to_string()))
.bind(input.api_key_id.map(|value| value.to_string()))
.bind(input.provider_id.map(|value| value.to_string()))
.bind(input.route_id.map(|value| value.to_string()))
.bind(&input.method)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Converting Uuid to String before binding to the query causes unnecessary allocations. sqlx supports binding Uuid directly for both SQLite and PostgreSQL backends if the uuid feature is enabled. This improves efficiency in the request logging path.

Suggested change
.bind(input.request_id.to_string())
.bind(input.user_id.map(|value| value.to_string()))
.bind(input.api_key_id.map(|value| value.to_string()))
.bind(input.provider_id.map(|value| value.to_string()))
.bind(input.route_id.map(|value| value.to_string()))
.bind(&input.method)
.bind(Uuid::now_v7())
.bind(input.request_id)
.bind(input.user_id)
.bind(input.api_key_id)
.bind(input.provider_id)
.bind(input.route_id)

Comment thread crates/mizan-api/src/gateway.rs Outdated
Comment on lines +673 to +674
route_id: context.route_id,
method: "POST".to_string(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The HTTP method and path are hardcoded in this helper. While currently only used for chat completions, this limits the reusability of the logging foundation. Consider passing these values as arguments to record_gateway_request_completion or extracting them from the request context.

Suggested change
route_id: context.route_id,
method: "POST".to_string(),
method: method.to_string(),
path: path.to_string(),

Comment on lines +143 to +153
record_gateway_request_completion(
&state.database,
state.database_backend(),
&context,
&request_started_at,
status,
Some(public_model),
None,
Some(&error_code),
)
.await;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Manually calling record_gateway_request_completion and awaiting it in every error and success branch is repetitive and error-prone. It also increases response latency. Consider using a Tower middleware/layer to handle request logging automatically at the end of the request lifecycle.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dcaa7297a2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/alpha-smoke.sh Outdated
curl -fsS "${BASE_URL}/v1/models" -H "authorization: Bearer ${api_key}" >/dev/null
echo "Syncing upstream model list"
synced_models="$(
MODEL_SYNC_BASE_URL="${MOCK_URL}" \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Point smoke model sync at /v1 models

In the default smoke path, MOCK_URL is the unversioned mock origin and the mock only serves models at /v1/models (this script already waits on ${MOCK_URL}/v1/models). Passing ${MOCK_URL} here makes model-sync.sh build ${MOCK_URL}/models, so curl receives a 404 and set -e aborts the alpha smoke before the chat/usage checks. Use ${MOCK_URL}/v1 here or make the sync helper normalize unversioned OpenAI-compatible base URLs.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 98ebf2c919

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/model-sync.sh Outdated
"${url}"
)"

jq -e '.object == "list" and (.data | type == "array")' >/dev/null <<<"${response}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Add jq to smoke prerequisites or avoid it

In environments that follow the documented docs/ALPHA_RUNBOOK.md prerequisites (Rust, curl, python3, Redis) but do not have jq installed, scripts/alpha-smoke.sh now always invokes this helper and this first jq call exits 127 under set -e, aborting the smoke before the chat/usage checks. Either document/install jq as a required dependency or parse with the already-required python3 so the documented alpha smoke path remains reproducible.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7e32c7233c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

latency_ms,
state.route_price,
);
let _ = record_gateway_request_completion(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Record streamed requests when clients disconnect early

For streaming requests, this request-log write is only reached after the upstream stream returns None and the code is about to emit [DONE]. If the client disconnects before that point, Axum drops StreamBuildState and its Drop implementation only releases the limit lease, so those aborted streaming requests never get a request_logs row. That leaves a gap in the new durable request-log coverage for common SSE cancellations; consider logging from the drop/abort path as well.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef8b39353e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +260 to +262
if let Err(error) = record_admin_audit(&state.database, state.database_backend(), &audit).await
{
warn!(error = %error, "failed to record credit grant audit");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Make admin audit writes atomic with mutations

When the admin_audit_logs insert fails here (for example because migrations were disabled or the audit table/schema is unavailable), the credit grant has already committed and this handler still returns success, leaving a balance-changing admin action with no audit row. Since the audit log is being introduced as durable admin audit coverage, record it in the same transaction as the credit ledger/wallet update or fail the mutation when the audit write cannot be persisted.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4644b0973a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

State(state): State<AppState>,
Extension(identity): Extension<ApiKeyIdentity>,
headers: HeaderMap,
Json(payload): Json<ChatCompletionsRequest>,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Support the Responses API schema on /v1/responses

When a client uses the OpenAI Responses API payload (for example { "model": "...", "input": "..." }), this extractor deserializes it as ChatCompletionsRequest, which requires messages, so Axum rejects the request before the gateway can route or request-log it; successful calls also continue through the chat-completions mapper and return a chat.completion object rather than a Responses API object. Since this PR advertises /v1/responses as OpenAI-compatible, add a responses-specific request/response mapping or keep the route disabled until that contract is implemented.

Useful? React with 👍 / 👎.

Comment on lines +353 to +356
if let Err(error) = record_admin_audit(&state.database, state.database_backend(), &audit).await
{
warn!(error = %error, "failed to record provider connection creation audit");
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Make provider audit writes atomic with mutations

If this audit insert fails after the provider connection insert has already committed (for example with migrations disabled or the audit table unavailable), the handler still returns success and leaves an admin credential/configuration change with no audit row. Since this PR is adding durable admin audit coverage, record the audit row in the same transaction as the provider/model-route mutation or fail the mutation when the audit write cannot be persisted; the same post-commit pattern appears in the delete and model-route admin handlers.

Useful? React with 👍 / 👎.

@wauputr4 wauputr4 merged commit a2d8e4b into main May 30, 2026
1 check passed
@wauputr4 wauputr4 deleted the feature/issue-7-observability branch May 30, 2026 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

P2: Add request log and admin audit storage foundations

1 participant