-
Notifications
You must be signed in to change notification settings - Fork 171
feat: Add inputAudioTranscription support to Java ADK #463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add inputAudioTranscription support to Java ADK #463
Conversation
f2d2406 to
2ff85c5
Compare
e19b579 to
2eb051c
Compare
e884108 to
c1da61c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jinnigu Thank You for contributing this! Is there any way to illustrate that this actually really 😆 fully works, in this PR? The unit tests are... well, unit tests. Would a full-blown integration test for this be possible? Or, how would you feel about if I invite you to add, as part of this PR, a very (most) simple "MVP" in tutorials/audio with just a super simple LlmAgent (without even any sub-agents), with just an AdkWebServer.start(), which allows us to "see this work in action"? That would be awesome!
| if (!CollectionUtils.isNullOrEmpty(runConfig.responseModalities()) | ||
| && liveRequestQueue.isPresent()) { | ||
| if (liveRequestQueue.isPresent() && !this.agent.subAgents().isEmpty()) { | ||
| // Parity with Python: apply modality defaults and transcription settings | ||
| // only for multi-agent live scenarios. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jinnigu The inline comment and the code don't seem to align, here? The "text" says "apply modality defaults" but then this removes !CollectionUtils.isNullOrEmpty(runConfig.responseModalities()... is that intentional? (It may well be, I'm entirely sure about why this was originally like this; but it seems worth double checking.) Also, why would we limit transcription only for multi-agent live scenarios? I would personally love to use this even for a very simple trivial only-LlmAgent use case... you speak to it, and get a persistent transcript in your session store, that's very cool! I'd love to use this e.g. in my (personal) https://docs.enola.dev project - but don't see why it needs to be limited to work only if !this.agent.subAgents().isEmpty().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why I made the change like this is because I want to make adk-java equivalent to adk-python (https://github.com/google/adk-python/blob/main/src/google/adk/runners.py#L939-L971). I also agree that we should not limit transcription to multi-agent live scenarios only. I will raise an issue in adk-python to gather some feedbacks and make PRs to both adk-python and adk-java.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jinnigu awesome! (It's google/adk-python#3259.)
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request successfully adds inputAudioTranscription support to the Java ADK, achieving feature parity with the Python version. The changes are well-structured, including updates to RunConfig, the Basic flow, and the Runner. A significant improvement is the fix for an unreachable code block in Runner.java, which enhances correctness. The new functionality is also thoroughly covered by unit tests. I have one suggestion to refactor a small portion of the logic in Runner.java to improve maintainability by removing duplicated code. Overall, this is a solid and valuable contribution.
c1da61c to
1e030a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (Very sorry for the delay; somehow this fell "through the cracks" on my end.)
723ed18 to
408913d
Compare
Summary
Adds
inputAudioTranscriptionsupport to the Java ADK to achieve feature parity with Python. When enabled, the live connect config requests model-side transcription of input audio into text, allowing real-time processing of spoken input in live streaming scenarios.Changes
Core Implementation
inputAudioTranscriptionfield with getter/setter and builder supportRunConfig.inputAudioTranscriptiontoLiveConnectConfig.inputTranscriptionfor model-side transcriptionBug Fix
newInvocationContextForLive()where the outer check!CollectionUtils.isNullOrEmpty(runConfig.responseModalities())(NOT empty) made the inner checkCollectionUtils.isNullOrEmpty(runConfig.responseModalities())(IS empty) impossible to reach. This prevented the "default to AUDIO modality" logic from ever executing.Behavior Alignment with Python
inputAudioTranscriptiononly for live multi-agent runs (whenagent.subAgents()is non-empty)outputAudioTranscriptionwhen response modalities imply audio usageTesting
RunConfigtranscription field handlingBasicflow mapping toLiveConnectConfig