
How to capture token usage in @track-decorated LLM calls where response.usage is missing? #1636

Open
maheerarshad-emumba opened this issue Mar 25, 2025 · 3 comments
Labels
Question (Further information is requested)

Comments

@maheerarshad-emumba

Hi Opik team,

I’m using the @track decorator in a hybrid orchestration setup where each step is an agent function (e.g. UserProfileAgent, ReviewAgent, etc.), and inside these functions, I make calls to LLMs (e.g. via llm.acomplete() or FunctionCallingAgent.aquery() from LlamaIndex).

Everything is being tracked fine: spans are visible in the Opik UI with input/output metadata, but I'm not seeing token usage for the LLM calls.

What I’ve Tried:

  • I checked the response object, and response.usage is empty.
  • Adding manual logic to estimate tokens defeats the purpose of using @track as a clean decorator.
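
For reference, the kind of manual estimation I'd like to avoid looks roughly like this (a rough sketch using tiktoken; the model name and helper are mine, not anything from Opik, and it ignores tool/function-call overhead):

```python
import tiktoken


def estimate_usage(prompt: str, completion: str, model: str = "gpt-4o") -> dict:
    """Rough token estimate via tiktoken; not exact for function-calling agents."""
    enc = tiktoken.encoding_for_model(model)
    prompt_tokens = len(enc.encode(prompt))
    completion_tokens = len(enc.encode(completion))
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
```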

Questions:

  • Is there a recommended way to capture token usage inside a @track function if response.usage is not available?
  • If LLM calls are made outside LangChain/OpenAI native interfaces (e.g., via LlamaIndex wrappers), how can we still make Opik show token info?
@jverre
Collaborator

jverre commented Mar 25, 2025

Hi @maheerarshad-emumba

Can you share a couple of small code snippets? That will make it easier to recommend the right solution for you.

@jverre added the Question (Further information is requested) label Mar 25, 2025
@maheerarshad-emumba
Author

I'm currently using Opik's @track decorator across my agents and orchestration layers to trace workflows in a LlamaIndex-based multi-agent system. While the decorator works great for logging prompts, inputs/outputs, and metadata, I'm facing an issue:

Problem: Missing Token Usage

When making LLM calls via LlamaIndex’s FunctionCallingAgent, the response.usage field is empty, so I can’t pass accurate token usage to opik_context.update_current_span.

1. LLM Call (Tracked)

```python
import logging
from typing import Any, Dict

from opik import track, opik_context

logger = logging.getLogger(__name__)


@track
async def _get_agent_analysis(self, prompt: str) -> Dict[str, Any]:
    logger.info("Invoking LLM for analysis")
    try:
        tools = self._create_tools()
        display_fc_agent = self._create_agent(tools=tools)

        response = await display_fc_agent.aquery(prompt)  # llama-index FunctionCallingAgent
        raw_text = response.response

        # ❌ response.usage is None
        result_dict = extract_json_response(raw_text)  # helper defined elsewhere

        opik_context.update_current_span(
            name="_get_agent_analysis",
            input={"prompt_preview": prompt[:200]},
            output={"result_preview": str(result_dict)[:200]},
            usage={"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},  # ← want real usage here
            metadata={"agent_name": self.name},
        )

        return result_dict
    except Exception as e:
        logger.error(f"Error in LLM call: {e}")
        return {"error": str(e)}
```

2. Agent Setup (LlamaIndex)

```python
from llama_index.core.agent import FunctionCallingAgent


def _create_agent(self, tools):
    return FunctionCallingAgent.from_tools(
        tools=tools,
        llm=self.llm,  # OpenAI or Azure OpenAI client
        verbose=False,
        system_prompt=self.system_prompt,
    )
```

What's Happening

  • The Opik @track spans show all metadata correctly.
  • But token usage stays at 0.
  • This seems to happen because FunctionCallingAgent.aquery() does not return usage, or it doesn’t expose the response.usage field like OpenAI clients do.

What I'm Looking For

Is there a way to:

  • Extract or estimate token usage in this context without removing the @track decorator?
  • Or, is there a workaround that integrates well with LlamaIndex agents and Opik tracking?

I’d prefer not to duplicate or wrap every LLM call manually just to inject token usage.
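
The closest workaround I've found so far is LlamaIndex's TokenCountingHandler, wired in through the callback manager and then copied into the span by hand. A rough, untested sketch (the model name is a guess, and it still adds plumbing around every agent call, which is what I'd like to avoid):

```python
import tiktoken
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from opik import track, opik_context

# Count tokens for every LlamaIndex LLM call made after this point.
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4o").encode
)
Settings.callback_manager = CallbackManager([token_counter])


@track
async def get_agent_analysis(agent, prompt: str) -> str:
    token_counter.reset_counts()
    response = await agent.aquery(prompt)  # llama-index FunctionCallingAgent

    # Copy the counts LlamaIndex collected into the Opik span.
    opik_context.update_current_span(
        usage={
            "prompt_tokens": token_counter.prompt_llm_token_count,
            "completion_tokens": token_counter.completion_llm_token_count,
            "total_tokens": token_counter.total_llm_token_count,
        },
    )
    return response.response
```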

@jverre
Collaborator

jverre commented Mar 27, 2025

Hi @maheerarshad-emumba

I'll take a look, but it seems like the issue might be with LlamaIndex, right? If they don't surface the token usage, I don't think we can compute it on our side.
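
In the meantime, one thing that may be worth trying is registering Opik's LlamaIndex callback handler globally, so the LlamaIndex-level LLM calls show up as their own spans. A minimal sketch, assuming the handler is exposed under opik.integrations.llama_index (please double-check the exact import in the docs); note that usage will only be populated if the underlying client reports it back to LlamaIndex:

```python
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from opik.integrations.llama_index import LlamaIndexCallbackHandler

# Register the Opik handler globally so calls made inside agents are logged as spans.
Settings.callback_manager = CallbackManager([LlamaIndexCallbackHandler()])
```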
