Google-Vertex: Support include_thinking in reasoning configuration and extraction of model thoughts. #6259

Closed
Und3rf10w opened this issue May 10, 2025 · 1 comment · Fixed by #6261

Und3rf10w commented May 10, 2025

Description

Vertex AI can now return thinking tokens (model thoughts) for certain Gemini models.

I have opened PR #6261 with a suggested implementation.

A thinking budget is already "technically supported" via providerOptions:

  providerOptions: {
    google: {
      thinkingConfig: {
        thinkingBudget: 2048,
      },
    }
  },

However, actually extracting and using the thinking tokens requires additional logic.

Ideally, you'd send something like:

  providerOptions: {
    google: {
      thinkingConfig: {
        thinkingBudget: 2048,
        includeThoughts: true  // would make Vertex return thinking tokens
      },
    }
  },

This mirrors the shape of the thinkingConfig object in the request that Vertex receives.

The request body that Vertex expects looks something like:

{"generationConfig":{"maxOutputTokens":65535,"temperature":0.7,"frequencyPenalty":0,"presencePenalty":0, "thinkingConfig": {"includeThoughts": true, "thinking_budget": 2048}},"contents":[{"role":"user","parts":[{"text":"Describe the most unusual or striking architectural feature you've ever seen in a building or structure."}]}]}

Currently, when the includeThoughts option is passed to the AI SDK via providerOptions, it is stripped from the request sent to Vertex, so Vertex never returns the thoughts.

Example streamed response with thoughts

Vertex sends thought parts like this:
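To illustrate the pass-through behaviour being requested, here is a minimal sketch. The `ThinkingConfig` and `GenerationConfig` interfaces and the `buildGenerationConfig` helper are hypothetical names made up for this example. They are not the AI SDK's actual internals, and PR #6261 may well implement this differently.

```typescript
// Hypothetical types covering only the fields discussed in this issue.
interface ThinkingConfig {
  thinkingBudget?: number;
  includeThoughts?: boolean;
}

interface GenerationConfig {
  maxOutputTokens?: number;
  temperature?: number;
  thinkingConfig?: ThinkingConfig;
}

// Hypothetical helper: forwards the whole thinkingConfig object
// (including includeThoughts) instead of dropping unknown keys.
function buildGenerationConfig(
  base: GenerationConfig,
  thinkingConfig?: ThinkingConfig,
): GenerationConfig {
  if (thinkingConfig === undefined) {
    return { ...base };
  }
  return { ...base, thinkingConfig: { ...thinkingConfig } };
}

const exampleConfig = buildGenerationConfig(
  { maxOutputTokens: 65535, temperature: 0.7 },
  { thinkingBudget: 2048, includeThoughts: true },
);

// Serialises to a body fragment shaped like the request shown above.
console.log(JSON.stringify({ generationConfig: exampleConfig }));
```

The point of the sketch is only that includeThoughts survives serialisation alongside thinkingBudget.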

{"candidates": [{"content": {"role": "model","parts": [{"text": "Thinking... \n\n","thought": true}]}}],"usageMetadata": {"trafficType": "ON_DEMAND"},"modelVersion": "gemini-2.5-flash-preview-04-17","createTime": "yyyyyy","responseId": "xxxxxx"}

And like this for normal text parts:

{"candidates": [{"content": {"role": "model","parts": [{"text": " form.\n\nWhile historical examples exist (like cliff dwellings), seeing this concept applied in modern, high-design architecture is particularly striking because it feels both primal and cutting-edge simultaneously. It's a feature that grounds the building quite literally and figuratively, making it feel less like an object placed *on* the earth"}]}}],"usageMetadata": {"trafficType": "ON_DEMAND"},"modelVersion": "gemini-2.5-flash-preview-04-17","createTime": "yyyyy","responseId": "xxxxxx"}

For completeness, here's the last data part with token usage metadata:

{"candidates": [{"content": {"role": "model","parts": [{"text": " and more like something emerging *from* it."}]},"finishReason": "STOP"}],"usageMetadata": {"promptTokenCount": 157,"candidatesTokenCount": 417,"totalTokenCount": 1930,"trafficType": "ON_DEMAND","promptTokensDetails": [{"modality": "TEXT","tokenCount": 157}],"candidatesTokensDetails": [{"modality": "TEXT","tokenCount": 417}],"thoughtsTokenCount": 1356},"modelVersion": "gemini-2.5-flash-preview-04-17","createTime": "yyyyyyyy","responseId": "xxxxxxxxx"}
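Given chunks shaped like the three samples above, a consumer could fold them into separate reasoning and answer strings. The following is a rough sketch, assuming the stream has already been split into individual JSON payloads. `StreamChunk`, `StreamPart`, and `accumulateChunks` are hypothetical names for this example, not AI SDK APIs. Only the first candidate is read.

```typescript
// Minimal types covering only the fields visible in the sample chunks.
interface StreamPart {
  text?: string;
  thought?: boolean;
}

interface StreamChunk {
  candidates?: { content?: { parts?: StreamPart[] }; finishReason?: string }[];
  usageMetadata?: { thoughtsTokenCount?: number; totalTokenCount?: number };
}

interface Accumulated {
  reasoning: string;
  text: string;
  thoughtsTokenCount?: number;
}

// Hypothetical helper: parses each JSON payload and appends every part
// to either the reasoning buffer (thought === true) or the text buffer.
function accumulateChunks(payloads: string[]): Accumulated {
  const result: Accumulated = { reasoning: "", text: "" };
  for (const payload of payloads) {
    const chunk = JSON.parse(payload) as StreamChunk;
    const parts = chunk.candidates?.[0]?.content?.parts ?? [];
    for (const part of parts) {
      if (part.text === undefined) continue;
      if (part.thought === true) {
        result.reasoning += part.text;
      } else {
        result.text += part.text;
      }
    }
    // In the samples, thoughtsTokenCount only appears on the final chunk.
    const count = chunk.usageMetadata?.thoughtsTokenCount;
    if (count !== undefined) {
      result.thoughtsTokenCount = count;
    }
  }
  return result;
}

// Shortened versions of the sample chunks above.
const streamed = accumulateChunks([
  JSON.stringify({ candidates: [{ content: { parts: [{ text: "Thinking... \n\n", thought: true }] } }] }),
  JSON.stringify({ candidates: [{ content: { parts: [{ text: " form." }] } }] }),
  JSON.stringify({
    candidates: [{ content: { parts: [{ text: " and more." }] }, finishReason: "STOP" }],
    usageMetadata: { thoughtsTokenCount: 1356, totalTokenCount: 1930 },
  }),
]);
console.log(streamed);
```

In the final sample chunk, totalTokenCount (1930) equals promptTokenCount + candidatesTokenCount + thoughtsTokenCount (157 + 417 + 1356), so in this sample the thought tokens are counted separately from the candidate tokens.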

Non-streamed response with thoughts

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "**My Selection: The Nativity Facade of the Sagrada Familia**...",
            "thought": true
          },
          {
            "text": "Okay, drawing from the vast amount of architectural data I've processed, the most unusual and striking architectural feature I can describe is ..."
          }
        ]
      },
      "finishReason": "STOP",
      "avgLogprobs": -1.2357349219472042
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 19,
    "candidatesTokenCount": 541,
    "totalTokenCount": 1785,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 19
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 541
      }
    ],
    "thoughtsTokenCount": 1225
  },
  "modelVersion": "gemini-2.5-flash-preview-04-17",
  "createTime": "yyyyyyy",
  "responseId": "xxxxxxx"
}

As in the streamed case, reasoning parts carry a thought key set to true, so the thoughts should be straightforward to extract.
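For the non-streamed case, the extraction could look roughly like the sketch below, which maps the parts array to an ordered list of tagged items. The `ContentItem` union and the `toContentItems` function are hypothetical and chosen for illustration. The AI SDK's real content types may differ.

```typescript
// Only the fields shown in the sample response are modelled here.
interface ResponsePart {
  text?: string;
  thought?: boolean;
}

// Hypothetical output shape: one tagged item per part, in original order.
type ContentItem =
  | { type: "reasoning"; text: string }
  | { type: "text"; text: string };

// Hypothetical helper: a part counts as reasoning only when thought === true.
function toContentItems(parts: ResponsePart[]): ContentItem[] {
  const items: ContentItem[] = [];
  for (const part of parts) {
    if (part.text === undefined) continue;
    if (part.thought === true) {
      items.push({ type: "reasoning", text: part.text });
    } else {
      items.push({ type: "text", text: part.text });
    }
  }
  return items;
}

// Parts shaped like the sample response above (texts shortened).
const sampleItems = toContentItems([
  { text: "**My Selection: The Nativity Facade of the Sagrada Familia**...", thought: true },
  { text: "Okay, drawing from the vast amount of architectural data I've processed, ..." },
]);
console.log(sampleItems.map((item) => item.type));
```

Keeping the items in order, rather than concatenating them, leaves the caller free to render or discard the reasoning separately from the answer.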

@ap-inflection

@lgrammel, FYI, ported this to v5 in this PR: #6428

I know v5 is in alpha, but I ran into this today when starting a new project with it, so I thought I'd submit a PR.
