Greptile Summary
This PR adds a new dev note,
| Filename | Overview |
|---|---|
| docs/devnotes/posts/owning-the-model-stack.md | New long-form dev note covering the native model client architecture, AIMD throttling, ceiling stabilization, cascade dampening, two-level keying, and retry boundary. All technical claims verified against source (ThrottleConfig, ThrottleManager): parameter names, defaults, and behavioral descriptions are accurate. |
| docs/devnotes/.authors.yml | Adds nmulepati author entry consistent with the existing format in the file. |
| mkdocs.yml | Adds nav entry for the new dev note in the correct most-recent-first position, consistent with the existing ordering comment. |
Sequence Diagram
sequenceDiagram
    participant CG as ColumnGenerator
    participant MF as ModelFacade
    participant TMC as ThrottledModelClient
    participant TM as ThrottleManager
    participant HMC as HttpModelClient
    participant API as Provider API
    CG->>MF: generate(request)
    MF->>TMC: complete(ChatCompletionRequest)
    TMC->>TM: try_acquire(provider, model, domain)
    TM-->>TMC: slot acquired or wait_seconds
    TMC->>HMC: complete(request)
    HMC->>API: HTTP POST via RetryTransport
    alt 200 OK
        API-->>HMC: 200 OK
        HMC-->>TMC: ChatCompletionResponse
        TMC->>TM: release_success()
        TMC-->>MF: ChatCompletionResponse
        MF-->>CG: result
    else 502/503/504 transient error
        API-->>HMC: server error
        HMC->>API: retry with exponential backoff
        API-->>HMC: 200 OK
        HMC-->>TMC: ChatCompletionResponse
        TMC->>TM: release_success()
    else 429 rate limited
        API-->>HMC: 429
        HMC-->>TMC: ProviderError 429
        TMC->>TM: release_rate_limited(retry_after)
        TMC->>TMC: wait cooldown then re-acquire
    end
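The 429 branch of the diagram can be sketched as a small acquire/call/release loop. This is a minimal illustration, not the project's implementation: `ToyThrottle`, `RateLimited`, and `throttled_complete` are hypothetical names, and only the method names `try_acquire`, `release_success`, and `release_rate_limited` come from the diagram. Their signatures here are assumed, and the keying by provider, model, and domain is left out.

```python
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Hypothetical stand-in for a provider error carrying HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("429 rate limited")
        self.retry_after = retry_after


class ToyThrottle:
    """Hypothetical single-key throttle (the diagram keys by provider, model, domain)."""

    def __init__(self, limit: int = 2, default_cooldown: float = 0.01) -> None:
        self.limit = limit
        self.in_flight = 0
        self.default_cooldown = default_cooldown
        self.blocked_until = 0.0

    def try_acquire(self) -> float:
        """Return 0.0 if a slot was taken, otherwise the seconds to wait."""
        now = time.monotonic()
        if now < self.blocked_until:
            return self.blocked_until - now
        if self.in_flight >= self.limit:
            return self.default_cooldown
        self.in_flight += 1
        return 0.0

    def release_success(self) -> None:
        self.in_flight -= 1

    def release_rate_limited(self, retry_after: Optional[float]) -> None:
        # Free the slot and start a cooldown before the next acquire.
        self.in_flight -= 1
        cooldown = retry_after if retry_after is not None else self.default_cooldown
        self.blocked_until = time.monotonic() + cooldown


def throttled_complete(
    throttle: ToyThrottle, send: Callable[[], str], max_attempts: int = 5
) -> str:
    """Acquire a slot, call send(), and re-acquire after a cooldown on 429."""
    for _ in range(max_attempts):
        wait = throttle.try_acquire()
        while wait > 0:
            time.sleep(wait)
            wait = throttle.try_acquire()
        try:
            result = send()
        except RateLimited as exc:
            throttle.release_rate_limited(exc.retry_after)
            continue
        except Exception:
            # Other errors propagate; this sketch only frees the slot.
            throttle.in_flight -= 1
            raise
        throttle.release_success()
        return result
    raise RuntimeError("still rate limited after max_attempts")
```

Transient 502/503/504 errors do not appear in this loop because, per the diagram, they are retried one layer down by the HTTP transport and never reach the throttle layer.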
Reviews (3): Last reviewed commit: "update example model name"
📋 Summary
Adds a new dev note covering the native model client layer and its adaptive throttling system (AIMD-based concurrency control).
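For readers unfamiliar with AIMD (additive increase, multiplicative decrease), the core rule can be sketched in a few lines. This is a generic illustration of the technique, not the dev note's code: `AimdLimit` and all of its field names and defaults are hypothetical placeholders rather than the actual `ThrottleConfig` parameters, and ceiling stabilization and cascade dampening are not modeled.

```python
from dataclasses import dataclass


@dataclass
class AimdLimit:
    """Hypothetical AIMD concurrency limit; names and defaults are illustrative."""

    limit: float = 4.0
    min_limit: float = 1.0
    max_limit: float = 64.0
    increase: float = 1.0          # added after each success
    decrease_factor: float = 0.5   # multiplied in after each rate limit

    def on_success(self) -> None:
        # Additive increase: probe for more capacity one step at a time.
        self.limit = min(self.max_limit, self.limit + self.increase)

    def on_rate_limited(self) -> None:
        # Multiplicative decrease: back off sharply when the provider pushes back.
        self.limit = max(self.min_limit, self.limit * self.decrease_factor)

    @property
    def slots(self) -> int:
        """Whole number of concurrent requests currently allowed."""
        return int(self.limit)


aimd = AimdLimit()
for _ in range(4):
    aimd.on_success()      # limit climbs 4 -> 8
aimd.on_rate_limited()     # limit halves 8 -> 4
```

The asymmetry is the point of the design: the limit grows slowly while requests succeed and shrinks quickly on a rate-limit signal.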
🔄 Changes
✨ Added
- owning-the-model-stack.md: covers the native HTTP client architecture, AIMD adaptive throttling, ceiling stabilization, cascade dampening, two-level throttle keying, and the retry boundary design
- docs/devnotes/posts/assets/owning-the-model-stack/ (hero image, layer diagram, AIMD concurrency chart, throttle keying diagram, retry boundary diagram)
- nmulepati in .authors.yml
- mkdocs.yml
🔍 Attention Areas
- docs/devnotes/posts/owning-the-model-stack.md: new long-form technical content; review for accuracy on AIMD behavior, retry boundary semantics, and configuration parameter descriptions
🤖 Generated with AI
Made with Cursor