
docs: add native model client dev note #465

Open
nabinchha wants to merge 8 commits into main from nmulepati/docs/native-model-client-dev-notes

@nabinchha
Contributor

📋 Summary

Adds a new dev note covering the native model client layer and its adaptive throttling system (AIMD-based concurrency control).
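For readers new to AIMD (additive increase, multiplicative decrease), the core loop fits in a few lines. This is a toy sketch, not the `ThrottleManager` implementation: the parameter names and defaults (`reduce_factor=0.75`, `additive_increase=1`, `success_window=25`) are the ones quoted for `ThrottleConfig` in the review summary, while the `AimdLimit` class, its `initial`, `minimum`, and `maximum` arguments, and the floor-based rounding are assumptions made for illustration.

```python
import math


class AimdLimit:
    """Toy AIMD concurrency limit (hypothetical; not the ThrottleManager code)."""

    def __init__(self, initial: int = 20, *, reduce_factor: float = 0.75,
                 additive_increase: int = 1, success_window: int = 25,
                 minimum: int = 1, maximum: int = 256):
        self.limit = initial
        self.reduce_factor = reduce_factor
        self.additive_increase = additive_increase
        self.success_window = success_window
        self.minimum = minimum
        self.maximum = maximum
        self._successes = 0

    def on_success(self) -> None:
        # Additive increase: grow by a constant once a full window of successes lands.
        self._successes += 1
        if self._successes >= self.success_window:
            self._successes = 0
            self.limit = min(self.maximum, self.limit + self.additive_increase)

    def on_rate_limited(self) -> None:
        # Multiplicative decrease: shrink proportionally and restart the success window.
        self._successes = 0
        self.limit = max(self.minimum, math.floor(self.limit * self.reduce_factor))
```

Because growth is linear and shrinkage is proportional, the limit creeps up slowly while the provider keeps accepting requests and backs off quickly once it starts returning 429s.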

🔄 Changes

✨ Added

  • New dev note: owning-the-model-stack.md — covers the native HTTP client architecture, AIMD adaptive throttling, ceiling stabilization, cascade dampening, two-level throttle keying, and the retry boundary design
  • Architecture diagrams in docs/devnotes/posts/assets/owning-the-model-stack/ (hero image, layer diagram, AIMD concurrency chart, throttle keying diagram, retry boundary diagram)
  • Author entry for nmulepati in .authors.yml
  • Nav entry in mkdocs.yml
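The two-level throttle keying can be pictured as two nested slot pools. This is a hypothetical sketch: it assumes the two levels are a shared pool per `(provider, model)` and a narrower pool per `(provider, model, domain)`, loosely mirroring the `try_acquire(provider, model, domain)` call in the sequence diagram further down. `TwoLevelKeys`, `Pool`, and the fixed limits are illustrative names only, not the dev note's actual design.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    limit: int
    in_flight: int = 0

    def has_room(self) -> bool:
        return self.in_flight < self.limit


class TwoLevelKeys:
    """Hypothetical two-level keying: a request needs a free slot in both the
    (provider, model) pool and the (provider, model, domain) pool."""

    def __init__(self, model_limit: int, domain_limit: int):
        self.model_limit = model_limit
        self.domain_limit = domain_limit
        self.models: dict[tuple[str, str], Pool] = {}
        self.domains: dict[tuple[str, str, str], Pool] = {}

    def _pools(self, provider: str, model: str, domain: str) -> tuple[Pool, Pool]:
        # Pools are created lazily the first time a key is seen.
        outer = self.models.setdefault((provider, model), Pool(self.model_limit))
        inner = self.domains.setdefault((provider, model, domain), Pool(self.domain_limit))
        return outer, inner

    def try_acquire(self, provider: str, model: str, domain: str) -> bool:
        outer, inner = self._pools(provider, model, domain)
        if not (outer.has_room() and inner.has_room()):
            return False
        outer.in_flight += 1
        inner.in_flight += 1
        return True

    def release(self, provider: str, model: str, domain: str) -> None:
        outer, inner = self._pools(provider, model, domain)
        outer.in_flight -= 1
        inner.in_flight -= 1
```

Under this assumption, one busy domain can saturate its own pool without starving other domains, while the outer pool still caps total traffic to a single provider/model pair.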

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:


🤖 Generated with AI

Made with Cursor

@nabinchha nabinchha requested a review from a team as a code owner March 25, 2026 17:48
@greptile-apps
Contributor

greptile-apps bot commented Mar 25, 2026

Greptile Summary

This PR adds a new dev note, owning-the-model-stack.md, documenting the native model client layer that replaced LiteLLM and the AIMD-based adaptive throttling system introduced in Data Designer v0.5.4. It also adds the author entry for nmulepati and the corresponding mkdocs.yml nav entry.

  • All technical claims were verified against the live source (ThrottleConfig in run_config.py and ThrottleManager in throttle_manager.py): parameter names, defaults (reduce_factor=0.75, additive_increase=1, success_window=25, cooldown_seconds=2.0, ceiling_overshoot=0.10), and all behavioral descriptions match the implementation.
  • The AIMD mechanics (additive increase, multiplicative decrease, ceiling stabilization, cascade dampening, two-level keying) are accurately described and consistent with the code.
  • The cascade dampening math ("collapsed from 20 to 4") is numerically correct: 20 × 0.75⁵ ≈ 4.75, and flooring after each step (20 → 15 → 11 → 8 → 6 → 4) also lands on 4.
  • The retry boundary section accurately reflects the asymmetry between async mode (full AIMD loop) and sync mode (transport-layer retries), with a clear note that the sync codepath is temporary.
  • Previously flagged review concerns (inconsistent model name prefix in log examples, duplicate closing phrase) were resolved in prior commits.
  • Binary image assets for all five diagrams are included.
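The cascade arithmetic can be checked directly. `cascade` applies one multiplicative decrease per 429 with flooring after each step, reproducing the 20 → 4 collapse using the quoted `reduce_factor=0.75`. `dampened` is a hypothetical model of dampening (429s arriving within `cooldown_seconds` of the previous cut are treated as the same burst and ignored); the real `ThrottleManager` logic may differ.

```python
import math
from typing import Iterable, List


def cascade(limit: int, hits: int, reduce_factor: float = 0.75) -> List[int]:
    """Undampened: every 429 in a burst applies its own multiplicative decrease."""
    history = []
    for _ in range(hits):
        limit = max(1, math.floor(limit * reduce_factor))
        history.append(limit)
    return history


def dampened(limit: int, hit_times: Iterable[float], reduce_factor: float = 0.75,
             cooldown_seconds: float = 2.0) -> int:
    """Hypothetical dampening: 429s within cooldown_seconds of the last cut are skipped."""
    last_cut = None
    for t in hit_times:
        if last_cut is not None and t - last_cut < cooldown_seconds:
            continue
        limit = max(1, math.floor(limit * reduce_factor))
        last_cut = t
    return limit


print(cascade(20, 5))                           # [15, 11, 8, 6, 4]
print(dampened(20, [0.0, 0.1, 0.2, 0.3, 0.4]))  # 15
```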

Confidence Score: 5/5

  • Documentation-only PR; no runtime code changes. Safe to merge.
  • All technical claims in the dev note are accurate — verified against the actual ThrottleConfig and ThrottleManager source. The prior review concerns have been resolved. The author entry, nav placement, and image assets are all correctly structured. No blocking issues remain.
  • No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| docs/devnotes/posts/owning-the-model-stack.md | New long-form dev note covering the native model client architecture, AIMD throttling, ceiling stabilization, cascade dampening, two-level keying, and retry boundary. All technical claims verified against source (ThrottleConfig, ThrottleManager): parameter names, defaults, and behavioral descriptions are accurate. |
| docs/devnotes/.authors.yml | Adds nmulepati author entry consistent with the existing format in the file. |
| mkdocs.yml | Adds nav entry for the new dev note in the correct most-recent-first position, consistent with the existing ordering comment. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant CG as ColumnGenerator
    participant MF as ModelFacade
    participant TMC as ThrottledModelClient
    participant TM as ThrottleManager
    participant HMC as HttpModelClient
    participant API as Provider API

    CG->>MF: generate(request)
    MF->>TMC: complete(ChatCompletionRequest)
    TMC->>TM: try_acquire(provider, model, domain)
    TM-->>TMC: slot acquired or wait_seconds
    TMC->>HMC: complete(request)
    HMC->>API: HTTP POST via RetryTransport

    alt 200 OK
        API-->>HMC: 200 OK
        HMC-->>TMC: ChatCompletionResponse
        TMC->>TM: release_success()
        TMC-->>MF: ChatCompletionResponse
        MF-->>CG: result
    else 502/503/504 transient error
        API-->>HMC: server error
        HMC->>API: retry with exponential backoff
        API-->>HMC: 200 OK
        HMC-->>TMC: ChatCompletionResponse
        TMC->>TM: release_success()
    else 429 rate limited
        API-->>HMC: 429
        HMC-->>TMC: ProviderError 429
        TMC->>TM: release_rate_limited(retry_after)
        TMC->>TMC: wait cooldown then re-acquire
    end
```
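A minimal asyncio sketch of the retry boundary in the diagram, assuming a toy single-limit throttle: transient 502/503/504 responses are retried with exponential backoff at the transport layer and never reach the throttle, while a 429 escapes to the throttled client, which shrinks the limit, waits for `retry_after` (or the cooldown), and re-acquires a slot. `ProviderError`, `ToyThrottle`, `transport_send`, and `throttled_complete` are hypothetical stand-ins for `HttpModelClient`, `RetryTransport`, `ThrottledModelClient`, and `ThrottleManager`; only the `release_success` / `release_rate_limited` names follow the diagram.

```python
import asyncio
from typing import Awaitable, Callable, Optional

TRANSIENT_STATUSES = {502, 503, 504}


class ProviderError(Exception):
    """Hypothetical error carrying the HTTP status and an optional Retry-After."""

    def __init__(self, status: int, retry_after: Optional[float] = None):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status
        self.retry_after = retry_after


class ToyThrottle:
    """Toy stand-in for ThrottleManager: one concurrency limit, no keying."""

    def __init__(self, limit: int = 4, reduce_factor: float = 0.75,
                 cooldown_seconds: float = 2.0):
        self.limit = limit
        self.reduce_factor = reduce_factor
        self.cooldown_seconds = cooldown_seconds
        self.in_flight = 0

    def try_acquire(self) -> float:
        """Take a slot and return 0.0, or return how long to wait before trying again."""
        if self.in_flight < self.limit:
            self.in_flight += 1
            return 0.0
        return 0.05

    def release_success(self) -> None:
        # A real AIMD throttle would also count this toward the additive-increase window.
        self.in_flight -= 1

    def release_error(self) -> None:
        """Free the slot without touching the limit (non-rate-limit failure)."""
        self.in_flight -= 1

    def release_rate_limited(self, retry_after: Optional[float]) -> float:
        """Free the slot, shrink the limit, and return the wait before re-acquiring."""
        self.in_flight -= 1
        self.limit = max(1, int(self.limit * self.reduce_factor))
        return retry_after if retry_after is not None else self.cooldown_seconds


async def transport_send(call: Callable[[], Awaitable[str]], max_retries: int = 3,
                         backoff: float = 0.5) -> str:
    """Transport layer: retry transient 5xx with exponential backoff; 429 escapes."""
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except ProviderError as err:
            if err.status not in TRANSIENT_STATUSES or attempt == max_retries:
                raise
            await asyncio.sleep(backoff * 2 ** attempt)
    raise ValueError("max_retries must be non-negative")


async def throttled_complete(throttle: ToyThrottle, call: Callable[[], Awaitable[str]],
                             backoff: float = 0.5) -> str:
    """Throttle layer: hold a slot per attempt and feed 429s back into the limit."""
    while True:
        wait = throttle.try_acquire()
        if wait:
            await asyncio.sleep(wait)
            continue
        try:
            result = await transport_send(call, backoff=backoff)
        except ProviderError as err:
            if err.status != 429:
                throttle.release_error()
                raise
            await asyncio.sleep(throttle.release_rate_limited(err.retry_after))
            continue
        throttle.release_success()
        return result
```

Keeping 5xx retries below the throttle means a flaky gateway does not shrink the concurrency limit; only rate-limit signals feed the AIMD loop, matching the 502/503/504 branch in the diagram, which ends in `release_success()`.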

Reviews (3): Last reviewed commit: "update example model name"
