Google-Vertex: Support `include_thinking` in reasoning configuration and extraction of model thoughts. #6259

Und3rf10w · 2025-05-10T00:40:45Z

Description

Vertex now supports extraction of thinking tokens in certain Gemini models.

I have opened PR #6261 with a suggested implementation.

Thinking budget is "technically supported" via:

  providerOptions: {
    google: {
      thinkingConfig: {
        thinkingBudget: 2048,
      },
    }
  },

But actual extraction and usage of the thinking tokens requires additional logic.

Ideally, you'd send something like:

  providerOptions: {
    google: {
      thinkingConfig: {
        thinkingBudget: 2048,
        includeThoughts: true  // This line WOULD make vertex output thinking tokens
      },
    }
  },

This mirrors how the request is shaped on the Vertex side.

The request body that needs to reach Vertex looks something like:

{"generationConfig":{"maxOutputTokens":65535,"temperature":0.7,"frequencyPenalty":0,"presencePenalty":0, "thinkingConfig": {"includeThoughts": true, "thinking_budget": 2048}},"contents":[{"role":"user","parts":[{"text":"Describe the most unusual or striking architectural feature you've ever seen in a building or structure."}]}]}

When the includeThoughts option is passed to the AI SDK via providerOptions, it is stripped from the request sent to Vertex, so Vertex never returns any thoughts.
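To make the requested behavior concrete, here is a minimal sketch of a request-building step that forwards `includeThoughts` instead of dropping it. `buildGenerationConfig` and the two interfaces are hypothetical names used for illustration only; this is not the SDK's actual code, nor necessarily what PR #6261 does.

```typescript
// Hypothetical types: only the two fields discussed in this issue.
interface ThinkingConfig {
  thinkingBudget?: number;
  includeThoughts?: boolean;
}

interface GoogleProviderOptions {
  thinkingConfig?: ThinkingConfig;
}

// Hypothetical helper (an assumption, not the SDK's real function):
// copies thinkingConfig into generationConfig, keeping includeThoughts.
function buildGenerationConfig(
  base: Record<string, unknown>,
  google: GoogleProviderOptions | undefined,
): Record<string, unknown> {
  const thinking = google?.thinkingConfig;
  if (thinking === undefined) {
    return { ...base };
  }

  const thinkingConfig: ThinkingConfig = {};
  if (thinking.thinkingBudget !== undefined) {
    thinkingConfig.thinkingBudget = thinking.thinkingBudget;
  }
  if (thinking.includeThoughts !== undefined) {
    // This is the field that is currently lost before the request is sent.
    thinkingConfig.includeThoughts = thinking.includeThoughts;
  }
  return { ...base, thinkingConfig };
}

const generationConfig = buildGenerationConfig(
  { maxOutputTokens: 65535, temperature: 0.7 },
  { thinkingConfig: { thinkingBudget: 2048, includeThoughts: true } },
);
console.log(JSON.stringify({ generationConfig }));
```

The helper only copies fields that were explicitly set, so requests that do not opt in to thinking are left unchanged.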

Example Streamed response with thought

Vertex sends thought tokens in chunks like this:

{"candidates": [{"content": {"role": "model","parts": [{"text": "Thinking... \n\n","thought": true}]}}],"usageMetadata": {"trafficType": "ON_DEMAND"},"modelVersion": "gemini-2.5-flash-preview-04-17","createTime": "yyyyyy","responseId": "xxxxxx"}

And like this for normal text parts:

{"candidates": [{"content": {"role": "model","parts": [{"text": " form.\n\nWhile historical examples exist (like cliff dwellings), seeing this concept applied in modern, high-design architecture is particularly striking because it feels both primal and cutting-edge simultaneously. It's a feature that grounds the building quite literally and figuratively, making it feel less like an object placed *on* the earth"}]}}],"usageMetadata": {"trafficType": "ON_DEMAND"},"modelVersion": "gemini-2.5-flash-preview-04-17","createTime": "yyyyy","responseId": "xxxxxx"}

For completeness, here's the last data part with token usage metadata:

{"candidates": [{"content": {"role": "model","parts": [{"text": " and more like something emerging *from* it."}]},"finishReason": "STOP"}],"usageMetadata": {"promptTokenCount": 157,"candidatesTokenCount": 417,"totalTokenCount": 1930,"trafficType": "ON_DEMAND","promptTokensDetails": [{"modality": "TEXT","tokenCount": 157}],"candidatesTokensDetails": [{"modality": "TEXT","tokenCount": 417}],"thoughtsTokenCount": 1356},"modelVersion": "gemini-2.5-flash-preview-04-17","createTime": "yyyyyyyy","responseId": "xxxxxxxxx"}

Non-streamed response with thoughts

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "**My Selection: The Nativity Facade of the Sagrada Familia**...",
            "thought": true
          },
          {
            "text": "Okay, drawing from the vast amount of architectural data I've processed, the most unusual and striking architectural feature I can describe is ..."
          }
        ]
      },
      "finishReason": "STOP",
      "avgLogprobs": -1.2357349219472042
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 19,
    "candidatesTokenCount": 541,
    "totalTokenCount": 1785,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 19
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 541
      }
    ],
    "thoughtsTokenCount": 1225
  },
  "modelVersion": "gemini-2.5-flash-preview-04-17",
  "createTime": "yyyyyyy",
  "responseId": "xxxxxxx"
}

Like before, a thought key is included in reasoning parts, so this should be straightforward to extract.
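For the non-streamed case, a minimal sketch of the extraction could look like the following. The type and function names (`VertexResponse`, `extractReasoningAndText`, `usageAddsUp`) are hypothetical. The usage check reflects only what the two samples in this issue show, where `totalTokenCount` equals prompt + candidates + thoughts (19 + 541 + 1225 = 1785 and 157 + 417 + 1356 = 1930); it is not a documented guarantee.

```typescript
// Hypothetical shapes covering only the fields used below.
interface ResponsePart {
  text?: string;
  thought?: boolean;
}

interface UsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  thoughtsTokenCount?: number;
  totalTokenCount?: number;
}

interface VertexResponse {
  candidates?: Array<{ content?: { parts?: ResponsePart[] } }>;
  usageMetadata?: UsageMetadata;
}

// Split the first candidate's parts into reasoning and answer text.
function extractReasoningAndText(response: VertexResponse): {
  reasoning: string;
  text: string;
} {
  let reasoning = "";
  let text = "";
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  for (const part of parts) {
    if (typeof part.text !== "string") continue;
    if (part.thought === true) {
      reasoning += part.text;
    } else {
      text += part.text;
    }
  }
  return { reasoning, text };
}

// Check whether total = prompt + candidates + thoughts, as observed in
// the sample responses in this issue (an observation, not a guarantee).
function usageAddsUp(usage: UsageMetadata): boolean {
  const sum =
    (usage.promptTokenCount ?? 0) +
    (usage.candidatesTokenCount ?? 0) +
    (usage.thoughtsTokenCount ?? 0);
  return usage.totalTokenCount === sum;
}

// Abbreviated copy of the non-streamed sample above.
const sampleResponse: VertexResponse = {
  candidates: [
    {
      content: {
        parts: [
          { text: "**My Selection: The Nativity Facade of the Sagrada Familia**...", thought: true },
          { text: "Okay, drawing from the vast amount of architectural data I've processed, ..." },
        ],
      },
    },
  ],
  usageMetadata: {
    promptTokenCount: 19,
    candidatesTokenCount: 541,
    thoughtsTokenCount: 1225,
    totalTokenCount: 1785,
  },
};

const extracted = extractReasoningAndText(sampleResponse);
console.log(extracted.reasoning.length > 0, extracted.text.length > 0);
console.log(usageAddsUp(sampleResponse.usageMetadata ?? {}));
```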


ap-inflection · 2025-05-22T16:03:43Z

@lgrammel, FYI, ported this to v5 in this PR: #6428

I know v5 is in alpha, but I ran into this today when starting a new project with it, so I thought I'd submit a PR.

Und3rf10w added the support label May 10, 2025

Und3rf10w mentioned this issue May 10, 2025

feat(providers/google): Add reasoning token output support #6261

Merged

5 tasks

lgrammel closed this as completed in #6261 May 11, 2025

lgrammel closed this as completed in fe24216 May 11, 2025
