
Conversation

@kushalthaman

No description provided.

@kushalthaman force-pushed the feat/progress-monitor-and-logging branch from da76445 to 484ab9a on August 23, 2025 16:33
self._bars[model_id] = (req_bar, None, None)
self._next_position += 1

async def update_openai_usage(
Contributor

it's used by other providers too, so maybe don't call it openai_usage?

Author

nice catch, corrected
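
For context, a minimal sketch of what a provider-agnostic rename could look like; the class layout and attribute names here are illustrative, not taken from the PR.

import asyncio
from collections import defaultdict


class ProgressMonitor:
    # Illustrative tracker; only the idea of a provider-neutral method name
    # comes from the review comment above.
    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._tokens: dict[str, int] = defaultdict(int)
        self._requests: dict[str, int] = defaultdict(int)

    async def update_usage(self, model_id: str, tokens: int, requests: int = 1) -> None:
        # One entry point for every provider (OpenAI, Anthropic, Gemini, ...).
        async with self._lock:
            self._tokens[model_id] += tokens
            self._requests[model_id] += requests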

self.model_wait_times.setdefault(response.model_id, []).append(response.duration - response.api_duration)

# Update progress monitor with usage info
if hasattr(self, "progress_monitor") and self.progress_monitor is not None:
Contributor

What is the reason more of this can't go in the class and happen when you call self.progress_monitor.update_openai_usage in each model class?

Author

Good catch. It's centralized now, and the progress update happens only in InferenceAPI.call.

if isinstance(model_class, (AnthropicChatModel, HuggingFaceModel, GeminiModel, GeminiVertexAIModel)):
request_increment = num_candidates

await self.progress_monitor.update_openai_usage(
Contributor

Yeah, I'm confused why this is called again when you also call it in each model class.
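
For reference, a rough sketch of the centralized flow the author describes, where the monitor is updated in one place instead of per model class; apart from InferenceAPI.call and update_openai_usage, the names and signatures below are assumptions, not the PR's actual code.

class InferenceAPI:
    # Providers whose request count scales with num_candidates, e.g.
    # (AnthropicChatModel, HuggingFaceModel, GeminiModel, GeminiVertexAIModel).
    per_candidate_providers: tuple = ()

    def __init__(self, progress_monitor=None) -> None:
        self.progress_monitor = progress_monitor

    async def call(self, model_class, prompt: str, num_candidates: int = 1):
        response = await model_class(prompt)  # provider-specific inference call

        request_increment = (
            num_candidates
            if isinstance(model_class, self.per_candidate_providers)
            else 1
        )

        # The single place progress gets updated; model classes no longer call
        # the monitor themselves.
        if self.progress_monitor is not None:
            await self.progress_monitor.update_openai_usage(
                response.model_id, request_increment
            )
        return response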

@jplhughes
Contributor

This is awesome work! Tysm for implementing this

@jplhughes
Contributor

Can you post a screenshot of what it looks like in the terminal?

@kushalthaman
Author

kushalthaman commented Aug 25, 2025

[Screenshot 2025-08-25 at 12.57.42 PM]

Let me know if you have any cosmetic suggestions!

We read x-ratelimit-limit-tokens (TPM) and x-ratelimit-limit-requests (RPM) from an initial header probe, and (intentionally) run under the cap via openai_fraction_rate_limit (defaults to 0.8). We also seed "consumed" from x-ratelimit-remaining-* so the bars reflect the live window state.
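
A small illustrative sketch of that probe, assuming a plain dict of response headers; the function name and return structure are made up here, and only the header names and openai_fraction_rate_limit come from the comment above.

def parse_rate_limit_probe(headers: dict[str, str], openai_fraction_rate_limit: float = 0.8) -> dict[str, float]:
    tpm_limit = int(headers["x-ratelimit-limit-tokens"])
    rpm_limit = int(headers["x-ratelimit-limit-requests"])

    # Intentionally run under the advertised caps.
    effective_tpm = tpm_limit * openai_fraction_rate_limit
    effective_rpm = rpm_limit * openai_fraction_rate_limit

    # Seed "consumed" from the remaining-* headers so the bars reflect the
    # live window state instead of starting from zero.
    tokens_consumed = tpm_limit - int(headers["x-ratelimit-remaining-tokens"])
    requests_consumed = rpm_limit - int(headers["x-ratelimit-remaining-requests"])

    return {
        "effective_tpm": effective_tpm,
        "effective_rpm": effective_rpm,
        "tokens_consumed": tokens_consumed,
        "requests_consumed": requests_consumed,
    }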

@kushalthaman
Author

@jplhughes bumping this
