feat: Add inputAudioTranscription support to Java ADK #463

jinnigu · 2025-09-28T06:22:16Z

Summary

Adds inputAudioTranscription support to the Java ADK to achieve feature parity with Python. When enabled, the live connect config requests model-side transcription of input audio into text, allowing real-time processing of spoken input in live streaming scenarios.

Changes

Core Implementation

RunConfig: Added inputAudioTranscription field with getter/setter and builder support
Basic: Maps RunConfig.inputAudioTranscription to LiveConnectConfig.inputTranscription for model-side transcription
Runner: Auto-enables input/output transcription for live multi-agent scenarios to match Python behavior
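
The Basic flow mapping described above could look roughly like the sketch below. The records here are simplified stand-ins written for illustration only, not the real ADK or GenAI classes (which carry many more fields), and `toLiveConnectConfig` is a hypothetical helper name.

```java
import java.util.Optional;

final class TranscriptionMappingSketch {
  // Hypothetical stand-ins for the real configuration types.
  record AudioTranscriptionConfig() {}

  record RunConfig(Optional<AudioTranscriptionConfig> inputAudioTranscription) {}

  record LiveConnectConfig(Optional<AudioTranscriptionConfig> inputTranscription) {}

  // Copies the run-level setting onto the live connect config, so that
  // transcription is requested only when the caller configured it.
  static LiveConnectConfig toLiveConnectConfig(RunConfig runConfig) {
    return new LiveConnectConfig(runConfig.inputAudioTranscription());
  }

  public static void main(String[] args) {
    RunConfig enabled = new RunConfig(Optional.of(new AudioTranscriptionConfig()));
    RunConfig disabled = new RunConfig(Optional.empty());
    System.out.println(toLiveConnectConfig(enabled).inputTranscription().isPresent());
    System.out.println(toLiveConnectConfig(disabled).inputTranscription().isPresent());
  }
}
```

In this sketch an absent `inputAudioTranscription` stays absent on the live config, which is the behavior the unit tests listed under Testing would be expected to pin down.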

Bug Fix

Runner: Fixed unreachable condition in newInvocationContextForLive() where the outer check !CollectionUtils.isNullOrEmpty(runConfig.responseModalities()) (NOT empty) made the inner check CollectionUtils.isNullOrEmpty(runConfig.responseModalities()) (IS empty) impossible to reach. This prevented the "default to AUDIO modality" logic from ever executing.
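
The unreachable branch can be reproduced in isolation. The sketch below is not the actual Runner code: `before` and `after` are hypothetical helpers that model only the nesting described above, with a plain `List<String>` standing in for `runConfig.responseModalities()` and the live-queue and sub-agent checks left out.

```java
import java.util.List;

final class UnreachableDefaultDemo {
  // Stand-in for CollectionUtils.isNullOrEmpty.
  static boolean isNullOrEmpty(List<String> list) {
    return list == null || list.isEmpty();
  }

  // Models the old nesting: the inner check can never be true, because the
  // outer check has already required a non-empty list.
  static List<String> before(List<String> responseModalities) {
    if (!isNullOrEmpty(responseModalities)) {
      if (isNullOrEmpty(responseModalities)) {
        return List.of("AUDIO"); // dead code
      }
    }
    return responseModalities;
  }

  // Models the intent of the fix: default to AUDIO when nothing is configured.
  static List<String> after(List<String> responseModalities) {
    if (isNullOrEmpty(responseModalities)) {
      return List.of("AUDIO");
    }
    return responseModalities;
  }

  public static void main(String[] args) {
    System.out.println(before(List.of())); // prints []
    System.out.println(after(List.of())); // prints [AUDIO]
  }
}
```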

Behavior Alignment with Python

Auto-sets inputAudioTranscription only for live multi-agent runs (when agent.subAgents() is non-empty)
Auto-sets outputAudioTranscription when response modalities imply audio usage
Leaves transcription settings unchanged for single-agent scenarios
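
The three rules above can be summarized as a small decision function. This is a sketch under assumptions, not the Runner implementation: `Settings` and `applyLiveDefaults` are hypothetical names, booleans stand in for the presence of a transcription config, and the output rule is assumed to sit inside the same multi-agent guard as the input rule.

```java
final class LiveTranscriptionDefaultsSketch {
  // Hypothetical stand-in: true means a transcription config is set.
  record Settings(boolean inputTranscription, boolean outputTranscription) {}

  static Settings applyLiveDefaults(
      boolean live, boolean hasSubAgents, boolean audioResponse, Settings configured) {
    // Non-live and single-agent runs keep whatever the caller configured.
    if (!live || !hasSubAgents) {
      return configured;
    }
    // Live multi-agent runs always get input transcription; output
    // transcription is added only when the response uses audio.
    boolean output = configured.outputTranscription() || audioResponse;
    return new Settings(true, output);
  }

  public static void main(String[] args) {
    Settings none = new Settings(false, false);
    System.out.println(applyLiveDefaults(true, false, true, none));
    System.out.println(applyLiveDefaults(true, true, true, none));
  }
}
```

The single-agent case returning `configured` untouched is exactly the restriction questioned in the review discussion of this PR.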

Testing

Added unit tests for RunConfig transcription field handling
Added unit tests for Basic flow mapping to LiveConnectConfig

vorburger

@jinnigu Thank you for contributing this! Is there any way to show, in this PR, that this really 😆 works end to end? The unit tests are... well, unit tests. Would a full-blown integration test be possible? Or how would you feel about adding, as part of this PR, a minimal "MVP" in tutorials/audio: just a very simple LlmAgent (without any sub-agents) and an AdkWebServer.start(), so that we can "see this work in action"? That would be awesome!

vorburger · 2025-10-07T09:48:47Z

core/src/main/java/com/google/adk/runner/Runner.java

-    if (!CollectionUtils.isNullOrEmpty(runConfig.responseModalities())
-        && liveRequestQueue.isPresent()) {
+    if (liveRequestQueue.isPresent() && !this.agent.subAgents().isEmpty()) {
+      // Parity with Python: apply modality defaults and transcription settings
+      // only for multi-agent live scenarios.


@jinnigu The inline comment and the code don't seem to align here. The comment says "apply modality defaults", but the change removes !CollectionUtils.isNullOrEmpty(runConfig.responseModalities())... is that intentional? (It may well be; I'm not entirely sure why this was originally written like this, but it seems worth double-checking.) Also, why would we limit transcription to multi-agent live scenarios? I would personally love to use this even for a trivial LlmAgent-only use case: you speak to it and get a persistent transcript in your session store, which is very cool! I'd love to use this e.g. in my (personal) https://docs.enola.dev project, but I don't see why it needs to be limited to work only if !this.agent.subAgents().isEmpty().

jinnigu · 2025-10-08

I made the change this way because I want adk-java to behave the same as adk-python (https://github.com/google/adk-python/blob/main/src/google/adk/runners.py#L939-L971). I also agree that we should not limit transcription to multi-agent live scenarios. I will raise an issue in adk-python to gather feedback and then open PRs against both adk-python and adk-java.

vorburger · 2025-10-27

@jinnigu awesome! (It's google/adk-python#3259.)

vorburger · 2025-10-07T09:56:06Z

/gemini review

gemini-code-assist

Code Review

This pull request successfully adds inputAudioTranscription support to the Java ADK, achieving feature parity with the Python version. The changes are well-structured, including updates to RunConfig, the Basic flow, and the Runner. A significant improvement is the fix for an unreachable code block in Runner.java, which enhances correctness. The new functionality is also thoroughly covered by unit tests. I have one suggestion to refactor a small portion of the logic in Runner.java to improve maintainability by removing duplicated code. Overall, this is a solid and valuable contribution.

vorburger

LGTM! (Very sorry for the delay; somehow this fell "through the cracks" on my end.)

This is a port of the Python implementation and part of the "human in the loop" workflow.

FUTURE_COPYBARA_INTEGRATE_REVIEW=#463 from jinnigu:feature/inputAudioTranscription 408913d

PiperOrigin-RevId: 820215719

jinnigu force-pushed the feature/inputAudioTranscription branch from f2d2406 to 2ff85c5 Compare September 28, 2025 06:44

jinnigu marked this pull request as ready for review September 28, 2025 07:48

jinnigu mentioned this pull request Sep 28, 2025

Need support inputAudioTranscription in core/src/main/java/com/google/adk/agents/RunConfig.java #281

Closed

jinnigu force-pushed the feature/inputAudioTranscription branch 4 times, most recently from e19b579 to 2eb051c Compare September 28, 2025 17:12

jinnigu force-pushed the feature/inputAudioTranscription branch 2 times, most recently from e884108 to c1da61c Compare October 7, 2025 07:07

vorburger requested changes Oct 7, 2025

gemini-code-assist bot reviewed Oct 7, 2025

jinnigu force-pushed the feature/inputAudioTranscription branch from c1da61c to 1e030a0 Compare October 9, 2025 06:56

vorburger approved these changes Oct 27, 2025

Add inputAudioTranscription support to Java ADK

408913d

jinnigu force-pushed the feature/inputAudioTranscription branch from 723ed18 to 408913d Compare October 27, 2025 19:17

Poggecci self-requested a review October 27, 2025 19:37

Poggecci approved these changes Oct 27, 2025

Poggecci added the ready to pull label Oct 27, 2025

copybara-service bot merged commit 230f750 into google:main Oct 27, 2025
6 checks passed

copybara-service bot mentioned this pull request Oct 29, 2025

feat: HITL/Wire up tool confirmation support #531

Merged
