preemptive generation feature #783
base: main
isEquivalent(other: ChatContext): boolean {
  // Same object reference
  if (this === other) {
    return true;
  }

  // Different lengths
  if (this._items.length !== other._items.length) {
    return false;
  }

  // Compare each item pair
  for (let i = 0; i < this._items.length; i++) {
    const a = this._items[i]!;
    const b = other._items[i]!;

    // IDs and types must match
    if (a.id !== b.id || a.type !== b.type) {
      return false;
    }
This is nice. Can you also add a unit test for this function, inside chat_context.test.ts?
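A minimal sketch of what such a test could cover, assuming only the three checks visible in the hunk above (same reference, equal length, matching `id` and `type`). `MiniChatContext`, `Item`, and `check` are hypothetical stand-ins so the example runs on its own; the real `ChatContext` may compare more fields after the point where the hunk is cut off, and the project's actual test runner is not assumed here.

```typescript
// Hypothetical stand-in for ChatContext: models only the checks shown in the diff.
interface Item {
  id: string;
  type: string;
}

class MiniChatContext {
  constructor(private readonly _items: Item[] = []) {}

  isEquivalent(other: MiniChatContext): boolean {
    // Same object reference
    if (this === other) return true;
    // Different lengths
    if (this._items.length !== other._items.length) return false;
    // IDs and types must match pairwise
    for (let i = 0; i < this._items.length; i++) {
      const a = this._items[i]!;
      const b = other._items[i]!;
      if (a.id !== b.id || a.type !== b.type) return false;
    }
    // Assumption: the real method may perform further checks before returning.
    return true;
  }
}

// Hypothetical assertion helper, used instead of a specific test framework.
function check(condition: boolean, label: string): void {
  if (!condition) throw new Error(`failed: ${label}`);
}

const base = new MiniChatContext([
  { id: 'm1', type: 'message' },
  { id: 'f1', type: 'function_call' },
]);
const sameItems = new MiniChatContext([
  { id: 'm1', type: 'message' },
  { id: 'f1', type: 'function_call' },
]);
const shorter = new MiniChatContext([{ id: 'm1', type: 'message' }]);
const differentId = new MiniChatContext([
  { id: 'm1', type: 'message' },
  { id: 'f2', type: 'function_call' },
]);
const differentType = new MiniChatContext([
  { id: 'm1', type: 'message' },
  { id: 'f1', type: 'message' },
]);

check(base.isEquivalent(base), 'same reference is equivalent');
check(base.isEquivalent(sameItems), 'equal ids and types are equivalent');
check(!base.isEquivalent(shorter), 'different lengths are not equivalent');
check(!base.isEquivalent(differentId), 'different id is not equivalent');
check(!base.isEquivalent(differentType), 'different type is not equivalent');
```

Each case maps to one early return in the method, so a regression in any single branch fails exactly one check.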
const preemptive = this.preemptiveGeneration;
if (preemptive) {
  // Add the user message to the chat context for comparison
  const validationChatCtx = this.agent.chatCtx.copy();
  if (userMessage) {
    validationChatCtx.insert(userMessage);
  }

  // Validate: transcript matches, context equivalent, tools unchanged, toolChoice unchanged
  const transcriptMatches = preemptive.info.newTranscript === info.newTranscript;
  const contextEquivalent = preemptive.chatCtx.isEquivalent(validationChatCtx);
  const toolsUnchanged = preemptive.tools === this.agent.toolCtx;
  const toolChoiceUnchanged = preemptive.toolChoice === this.toolChoice;

  if (transcriptMatches && contextEquivalent && toolsUnchanged && toolChoiceUnchanged) {
    // Use preemptive generation!
    const speechHandle = preemptive.speechHandle;
    this.preemptiveGeneration = undefined;

    const leadTime = Date.now() - preemptive.createdAt;
    this.logger.info(
      {
        transcript: info.newTranscript,
        leadTimeMs: leadTime,
        confidence: preemptive.info.transcriptConfidence,
      },
      'using preemptive generation',
    );

    // Schedule the preemptive speech
    this.scheduleSpeech(speechHandle, SpeechHandle.SPEECH_PRIORITY_NORMAL);

    // Emit metrics
    const eouMetrics: EOUMetrics = {
Can we make sure this implementation has parity with the Python agent framework? https://github.com/livekit/agents/blob/a9bc03562f498f3666978ad008fc93b2cbbd22a9/livekit-agents/livekit/agents/voice/agent_activity.py#L1384-L1420
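For reference while comparing, the four validation checks in the hunk above can be isolated into a pure function. This is only a sketch: `Snapshot` and `canReusePreemptive` are hypothetical names, and the chat context is reduced to a list of item IDs as a stand-in for `isEquivalent`. It makes no claim about what the linked Python code does.

```typescript
// Hypothetical, simplified shape of the state captured for a preemptive generation.
interface Snapshot {
  transcript: string;
  itemIds: string[]; // stand-in for the chat context comparison
  tools: object;
  toolChoice: string | undefined;
}

// Returns true only if all four conditions from the diff hold.
function canReusePreemptive(preemptive: Snapshot, current: Snapshot): boolean {
  const transcriptMatches = preemptive.transcript === current.transcript;
  const contextEquivalent =
    preemptive.itemIds.length === current.itemIds.length &&
    preemptive.itemIds.every((id, i) => id === current.itemIds[i]);
  // Reference identity, mirroring `preemptive.tools === this.agent.toolCtx` in the diff.
  const toolsUnchanged = preemptive.tools === current.tools;
  const toolChoiceUnchanged = preemptive.toolChoice === current.toolChoice;
  return transcriptMatches && contextEquivalent && toolsUnchanged && toolChoiceUnchanged;
}

const tools = { lookup: () => 'ok' };
const captured: Snapshot = {
  transcript: 'book a table',
  itemIds: ['m1', 'm2'],
  tools,
  toolChoice: 'auto',
};

// Same tools object: the preemptive result can be reused.
const reusable = canReusePreemptive(captured, { ...captured });

// A structurally identical but newly created tools object fails the identity check.
const rebuiltTools = canReusePreemptive(captured, { ...captured, tools: { ...tools } });

// A changed final transcript also invalidates the preemptive result.
const changedTranscript = canReusePreemptive(captured, {
  ...captured,
  transcript: 'book a table for two',
});
```

Because tools are compared by reference, any code path that rebuilds the tool context between the preflight and the final transcript would discard the preemptive generation even when the tools are otherwise identical.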
// Update preflight transcript and confidence
this.audioPreflightTranscript = `${this.audioTranscript} ${preflightTranscript}`.trim();
this.preflightTranscriptConfidence = preflightConfidence;

// Trigger preemptive generation if conditions are met
if (
  this.hooks.onPreemptiveGeneration &&
  (this.turnDetectionMode !== 'manual' || this.userTurnCommitted)
) {
  // Calculate confidence including all final transcripts plus the current preflight
Let's follow the same parameter naming as in the Python agent:
# still need to increment it as it's used for turn detection,
self._last_final_transcript_time = time.time()
# preflight transcript includes all pre-committed transcripts (including final transcript from the previous STT run)
self._audio_preflight_transcript = (self._audio_transcript + " " + transcript).lstrip()
self._audio_interim_transcript = transcript
if not self._vad or self._last_speaking_time == 0:
    # vad disabled, use stt timestamp
    self._last_speaking_time = time.time()
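One small difference worth checking for parity: the Python line quoted above uses `lstrip()`, which removes only leading whitespace, while the TypeScript line in the diff uses `.trim()`, which removes whitespace on both ends. The sketch below shows the difference with two hypothetical helpers (`joinWithTrim`, `joinWithTrimStart`); `String.prototype.trimStart` is the standard ES2019 counterpart to `lstrip()` for whitespace.

```typescript
// Mirrors the TypeScript line in the diff: strips whitespace on both ends.
function joinWithTrim(committed: string, preflight: string): string {
  return `${committed} ${preflight}`.trim();
}

// Mirrors the Python lstrip() behaviour: strips leading whitespace only.
function joinWithTrimStart(committed: string, preflight: string): string {
  return `${committed} ${preflight}`.trimStart();
}

// With a non-empty committed transcript and no trailing space, both agree.
const agreeA = joinWithTrim('hi', 'there');
const agreeB = joinWithTrimStart('hi', 'there');

// With an empty committed transcript and a trailing space, they differ.
const trimmedBoth = joinWithTrim('', 'hello ');
const trimmedLeft = joinWithTrimStart('', 'hello ');

console.log(JSON.stringify({ agreeA, agreeB, trimmedBoth, trimmedLeft }));
```

Whether the trailing-whitespace difference matters depends on whether any STT provider emits transcripts ending in whitespace and on how the transcripts are later compared.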
Description
This allows STTs to send PREFLIGHT_TRANSCRIPT events.
It is "exactly" the same implementation as in the Python library.
See #773
Changes Made
Brings the SpeechEventType PREFLIGHT_TRANSCRIPT to the TypeScript library. Beyond that, it also enables preemptive generation, which starts generating a response early, before VAD end-of-speech or turn detection completes.
Btw: I couldn't find any actual PREFLIGHT_TRANSCRIPT events in the Python version... Am I missing something? I found it ;-) Deepgram STT feature support for preemptive gen is prepared here: simllll/agents-js@feat/preemtive-gen...simllll:agents-js:feat/preemptive-gen-deepgram-stt
Pre-Review Checklist
Testing
restaurant_agent.ts and realtime_agent.ts work properly (for major changes)
Additional Notes
Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.