Live Transcribe

Professional live speech transcription library for TypeScript/JavaScript with multi-provider support


Built by 360labs - We build AI-powered developer tools and APIs.

360labs.dev


Features

| Feature | Description |
| --- | --- |
| Multi-Provider Support | Web Speech API, Deepgram, AssemblyAI, and custom providers |
| Real-time Transcription | Live results with interim and final transcripts |
| 40+ Languages | Extensive language support across all providers |
| Session Management | Full control with start, stop, pause, and resume |
| Voice Activity Detection | Automatic speech detection (VAD) |
| Audio Recording | Built-in recording capabilities |
| Export Formats | JSON, plain text, SRT, VTT, CSV |
| TypeScript First | Complete type definitions and IntelliSense support |
| Event-Driven | Subscribe to transcription events easily |
| Lightweight | ~200 KB package size with zero runtime dependencies |
| Cross-Platform | Works in browsers and Node.js |

Installation

# npm
npm install @360labs/live-transcribe

# yarn
yarn add @360labs/live-transcribe

# pnpm
pnpm add @360labs/live-transcribe

Quick Start

Basic Usage (Web Speech API)

import { createTranscriber, TranscriptionProvider } from '@360labs/live-transcribe';

// Create a transcriber (Web Speech API - no API key required)
const transcriber = createTranscriber({
  provider: TranscriptionProvider.WebSpeechAPI,
  language: 'en-US',
  interimResults: true,
});

// Listen for transcription results
transcriber.on('transcript', (result) => {
  if (result.isFinal) {
    console.log('Final:', result.text);
  } else {
    console.log('Interim:', result.text);
  }
});

// Handle errors
transcriber.on('error', (error) => {
  console.error('Error:', error.message);
});

// Start transcribing
await transcriber.initialize();
await transcriber.start();

// Stop when done
await transcriber.stop();

Using Sessions (Recommended)

import { createSession, TranscriptionProvider } from '@360labs/live-transcribe';

// Create a session for full control
const session = createSession({
  provider: TranscriptionProvider.WebSpeechAPI,
  language: 'en-US',
});

// Access the provider for events
session.provider.on('transcript', (result) => {
  console.log(result.text);

  // Add to session for later export
  if (result.isFinal) {
    session.addTranscript(result);
  }
});

// Lifecycle control
await session.start();
session.pause();    // Pause transcription
session.resume();   // Resume transcription
await session.stop();

// Get results
const transcripts = session.getTranscripts();
const fullText = session.getFullText();
const stats = session.getStatistics();

// Export in various formats
const srtFile = session.export('srt');
const jsonFile = session.export('json');

Providers

Web Speech API (Browser)

The Web Speech API is built into modern browsers and requires no API key. It's perfect for quick prototypes and applications that don't need cloud-based accuracy.

import { createTranscriber, TranscriptionProvider } from '@360labs/live-transcribe';

const transcriber = createTranscriber({
  provider: TranscriptionProvider.WebSpeechAPI,
  language: 'en-US',
  interimResults: true,  // Get real-time interim results
});

// Check browser support
if (transcriber.isSupported()) {
  await transcriber.initialize();
  await transcriber.start();
}

Pros:

  • No API key required
  • Free to use
  • Works offline (in some browsers)
  • Low latency

Cons:

  • Accuracy varies by browser
  • Limited language support compared to cloud providers
  • Requires internet connection in most browsers

Deepgram

Deepgram offers high-accuracy, real-time transcription with advanced features like speaker diarization and custom vocabularies.

import { createTranscriber, TranscriptionProvider } from '@360labs/live-transcribe';

const transcriber = createTranscriber({
  provider: TranscriptionProvider.Deepgram,
  apiKey: 'your-deepgram-api-key',
  language: 'en-US',
  model: 'nova-2',           // Latest model
  punctuate: true,           // Auto-punctuation
  interimResults: true,
});

transcriber.on('transcript', (result) => {
  console.log(result.text);
  console.log('Confidence:', result.confidence);
});

await transcriber.initialize();
await transcriber.start();

Configuration Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `apiKey` | string | (required) | Your Deepgram API key |
| `model` | string | 'nova-2' | Model to use (nova-2, nova, enhanced, base) |
| `language` | string | 'en-US' | Language code |
| `punctuate` | boolean | true | Enable auto-punctuation |
| `interimResults` | boolean | true | Enable interim results |
| `smartFormat` | boolean | false | Enable smart formatting |
| `diarize` | boolean | false | Enable speaker diarization |

AssemblyAI

AssemblyAI provides state-of-the-art transcription with features like automatic language detection and content moderation.

import { createTranscriber, TranscriptionProvider } from '@360labs/live-transcribe';

const transcriber = createTranscriber({
  provider: TranscriptionProvider.AssemblyAI,
  apiKey: 'your-assemblyai-api-key',
  sampleRate: 16000,
});

transcriber.on('transcript', (result) => {
  console.log(result.text);
});

await transcriber.initialize();
await transcriber.start();

Configuration Options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `apiKey` | string | (required) | Your AssemblyAI API key |
| `sampleRate` | number | 16000 | Audio sample rate in Hz |
| `wordBoost` | string[] | [] | Words to boost recognition |

Custom Provider

You can create custom providers by extending the BaseTranscriber class:

import { BaseTranscriber, TranscriptionConfig, SessionState } from '@360labs/live-transcribe';

class MyCustomProvider extends BaseTranscriber {
  private recognition: any;

  constructor(config: TranscriptionConfig) {
    super(config);
  }

  isSupported(): boolean {
    return true; // Check if your provider is available
  }

  async initialize(): Promise<void> {
    // Initialize your provider
    this.setState(SessionState.INITIALIZING);
  }

  async start(): Promise<void> {
    this.setState(SessionState.ACTIVE);
    this.emit('start');
    // Start transcription
  }

  async stop(): Promise<void> {
    this.setState(SessionState.STOPPED);
    this.emit('stop');
    // Stop transcription
  }

  pause(): void {
    this.setState(SessionState.PAUSED);
    this.emit('pause');
  }

  resume(): void {
    this.setState(SessionState.ACTIVE);
    this.emit('resume');
  }

  sendAudio(audioData: ArrayBuffer): void {
    // Send audio to your provider
  }

  async cleanup(): Promise<void> {
    // Clean up resources
  }
}

Session Management

Sessions provide a higher-level API for managing transcription with built-in transcript storage and export capabilities.

import { createSession, SessionManager, TranscriptionProvider } from '@360labs/live-transcribe';

// Single session
const session = createSession({
  provider: TranscriptionProvider.WebSpeechAPI,
  language: 'en-US',
});

// Session properties
console.log(session.id);        // Unique session ID
console.log(session.getState()); // Current state

// Multiple sessions with SessionManager
const manager = new SessionManager();

const session1 = manager.createSession({
  provider: TranscriptionProvider.WebSpeechAPI,
  language: 'en-US',
});

const session2 = manager.createSession({
  provider: TranscriptionProvider.WebSpeechAPI,
  language: 'es-ES',
});

// Get all sessions
const allSessions = manager.getAllSessions();

// Get a session by ID (renamed to avoid redeclaring `session` above)
const existingSession = manager.getSession('session-id');

// Get active sessions
const activeSessions = manager.getActiveSessions();

Session States

import { SessionState } from '@360labs/live-transcribe';

// Available states
SessionState.IDLE         // Initial state
SessionState.INITIALIZING // Provider initializing
SessionState.ACTIVE       // Transcription in progress
SessionState.PAUSED       // Transcription paused
SessionState.STOPPING     // Stopping transcription
SessionState.STOPPED      // Transcription stopped
SessionState.ERROR        // Error occurred
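These states form a simple lifecycle. As a rough illustration (the transition table below is an assumption for the sketch, not the library's documented state machine), a guard for valid transitions could be written as:

```typescript
// Illustrative only: the allowed-transition table is an assumption,
// not the library's internal logic.
type State =
  | 'idle' | 'initializing' | 'active'
  | 'paused' | 'stopping' | 'stopped' | 'error';

const allowed: Record<State, State[]> = {
  idle: ['initializing'],
  initializing: ['active', 'error'],
  active: ['paused', 'stopping', 'error'],
  paused: ['active', 'stopping', 'error'],
  stopping: ['stopped', 'error'],
  stopped: [],
  error: [],
};

// Returns true when moving from `from` to `to` is permitted.
function canTransition(from: State, to: State): boolean {
  return allowed[from].includes(to);
}
```

A guard like this is useful in UI code, e.g. to disable a Pause button unless the current state is `active`.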

Events

Subscribe to events for real-time updates:

const transcriber = createTranscriber({ /* config */ });

// Transcript events
transcriber.on('transcript', (result) => {
  console.log('Text:', result.text);
  console.log('Is Final:', result.isFinal);
  console.log('Confidence:', result.confidence);
  console.log('Timestamp:', result.timestamp);
});

transcriber.on('final', (result) => {
  // Only final transcripts
  console.log('Final transcript:', result.text);
});

transcriber.on('interim', (result) => {
  // Only interim transcripts
  console.log('Interim:', result.text);
});

// Lifecycle events
transcriber.on('start', () => {
  console.log('Transcription started');
});

transcriber.on('stop', () => {
  console.log('Transcription stopped');
});

transcriber.on('pause', () => {
  console.log('Transcription paused');
});

transcriber.on('resume', () => {
  console.log('Transcription resumed');
});

// State changes
transcriber.on('stateChange', (state) => {
  console.log('State changed to:', state);
});

// Language changes
transcriber.on('languageChange', (change) => {
  console.log(`Language: ${change.from} -> ${change.to}`);
});

// Error handling
transcriber.on('error', (error) => {
  console.error('Error code:', error.code);
  console.error('Error message:', error.message);
  console.error('Provider:', error.provider);
});

// Remove listeners
transcriber.off('transcript', myHandler);
transcriber.removeAllListeners();

TranscriptionResult Object

interface TranscriptionResult {
  text: string;           // Transcribed text
  isFinal: boolean;       // Is this a final result?
  confidence?: number;    // Confidence score (0-1)
  timestamp: number;      // Unix timestamp
  speaker?: string;       // Speaker ID (if diarization enabled)
  language?: string;      // Detected language
  words?: Word[];         // Word-level timing
}

interface Word {
  text: string;
  start: number;  // Start time in ms
  end: number;    // End time in ms
  confidence?: number;
}

Export Formats

Export transcripts in multiple formats:

const session = createSession({ /* config */ });

// Add transcripts during session
session.provider.on('transcript', (result) => {
  if (result.isFinal) {
    session.addTranscript(result);
  }
});

// After transcription, export in various formats

// JSON - Full data with metadata
const jsonExport = session.export('json');
console.log(jsonExport.data);     // JSON string
console.log(jsonExport.filename); // 'transcript-{id}.json'
console.log(jsonExport.mimeType); // 'application/json'

// Plain Text - Just the text
const textExport = session.export('text');
// Output: "Hello world. How are you today?"

// SRT - SubRip subtitles
const srtExport = session.export('srt');
// Output:
// 1
// 00:00:01,000 --> 00:00:03,500
// Hello world.
//
// 2
// 00:00:04,000 --> 00:00:06,500
// How are you today?

// VTT - WebVTT subtitles
const vttExport = session.export('vtt');
// Output:
// WEBVTT
//
// 00:00:01.000 --> 00:00:03.500
// Hello world.
//
// 00:00:04.000 --> 00:00:06.500
// How are you today?

// CSV - Spreadsheet format
const csvExport = session.export('csv');
// Output: timestamp,text,confidence,isFinal
// 1234567890,Hello world,0.95,true

// Download in browser
function downloadTranscript(format: string) {
  const exported = session.export(format);
  const blob = new Blob([exported.data], { type: exported.mimeType });
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = exported.filename;
  a.click();
  URL.revokeObjectURL(url);
}
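SRT timestamps use the HH:MM:SS,mmm layout shown above (WebVTT differs mainly in using a dot before the milliseconds). A small, library-independent helper for producing them from millisecond offsets:

```typescript
// Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm).
function msToSrtTimestamp(ms: number): string {
  const pad = (n: number, width: number) => String(n).padStart(width, '0');
  const hours = Math.floor(ms / 3_600_000);
  const minutes = Math.floor(ms / 60_000) % 60;
  const seconds = Math.floor(ms / 1000) % 60;
  const millis = ms % 1000;
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`;
}

// msToSrtTimestamp(3500) -> '00:00:03,500'
```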

Supported Languages

The library supports 40+ languages. Language support varies by provider.

Web Speech API Languages

| Language | Code | Language | Code |
| --- | --- | --- | --- |
| English (US) | en-US | English (UK) | en-GB |
| English (Australia) | en-AU | English (India) | en-IN |
| Spanish (Spain) | es-ES | Spanish (Mexico) | es-MX |
| French (France) | fr-FR | French (Canada) | fr-CA |
| German | de-DE | Italian | it-IT |
| Portuguese (Brazil) | pt-BR | Portuguese (Portugal) | pt-PT |
| Chinese (Simplified) | zh-CN | Chinese (Traditional) | zh-TW |
| Japanese | ja-JP | Korean | ko-KR |
| Hindi | hi-IN | Arabic (Saudi Arabia) | ar-SA |
| Russian | ru-RU | Dutch | nl-NL |
| Polish | pl-PL | Turkish | tr-TR |
| Thai | th-TH | Vietnamese | vi-VN |
| Indonesian | id-ID | Hebrew | he-IL |
| Czech | cs-CZ | Greek | el-GR |
| Swedish | sv-SE | Danish | da-DK |
| Finnish | fi-FI | Norwegian | no-NO |
| Ukrainian | uk-UA | Romanian | ro-RO |
| Hungarian | hu-HU | Malay | ms-MY |

Setting Language

// At creation
const transcriber = createTranscriber({
  provider: TranscriptionProvider.WebSpeechAPI,
  language: 'es-ES', // Spanish (Spain)
});

// Or use session
const session = createSession({
  provider: TranscriptionProvider.WebSpeechAPI,
  language: 'fr-FR', // French
});

Changing Language Mid-Transcript

You can change the language during an active transcription session. The library automatically handles stopping and restarting the transcription with the new language:

const session = createSession({
  provider: TranscriptionProvider.WebSpeechAPI,
  language: 'en-US',
});

await session.start();

// User switches to Spanish mid-conversation
await session.setLanguage('es-ES');
// Transcription continues seamlessly in Spanish

// Switch to French
await session.setLanguage('fr-FR');

// Get current language
console.log(session.getLanguage()); // 'fr-FR'

With Transcriber Directly:

const transcriber = createTranscriber({
  provider: TranscriptionProvider.WebSpeechAPI,
  language: 'en-US',
});

// Listen for language changes
transcriber.on('languageChange', (change) => {
  console.log(`Language changed from ${change.from} to ${change.to}`);
});

await transcriber.initialize();
await transcriber.start();

// Change language while recording
await transcriber.setLanguage('de-DE');

// Get current language
console.log(transcriber.getLanguage()); // 'de-DE'

React Example with Language Selector:

import { useState, useRef } from 'react';
import { TranscriptionSession } from '@360labs/live-transcribe';

function TranscriptionWithLanguageSwitch() {
  const [language, setLanguage] = useState('en-US');
  const sessionRef = useRef<TranscriptionSession | null>(null);

  const changeLanguage = async (newLang: string) => {
    setLanguage(newLang);
    if (sessionRef.current) {
      await sessionRef.current.setLanguage(newLang);
    }
  };

  return (
    <div>
      <select value={language} onChange={(e) => changeLanguage(e.target.value)}>
        <option value="en-US">English</option>
        <option value="es-ES">Spanish</option>
        <option value="fr-FR">French</option>
        <option value="de-DE">German</option>
        <option value="ja-JP">Japanese</option>
      </select>
      {/* Transcription continues without interruption */}
    </div>
  );
}

API Reference

createTranscriber(config)

Creates a new transcriber instance.

function createTranscriber(config: TranscriptionConfig): ITranscriptionProvider;

Config Options:

| Option | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `provider` | TranscriptionProvider | Yes | - | Provider to use |
| `apiKey` | string | For cloud providers | - | API key for cloud providers |
| `language` | string | No | 'en-US' | Language code |
| `interimResults` | boolean | No | true | Enable interim results |
| `punctuation` | boolean | No | true | Enable auto-punctuation |
| `profanityFilter` | boolean | No | false | Filter profanity |

createSession(config)

Creates a new transcription session.

function createSession(config: TranscriptionConfig): TranscriptionSession;

TranscriptionSession

| Method | Returns | Description |
| --- | --- | --- |
| `start()` | `Promise<void>` | Start transcription |
| `stop()` | `Promise<void>` | Stop transcription |
| `pause()` | `void` | Pause transcription |
| `resume()` | `void` | Resume transcription |
| `getState()` | `SessionState` | Get current state |
| `getTranscripts(finalOnly?)` | `TranscriptionResult[]` | Get all transcripts |
| `getFullText()` | `string` | Get concatenated text |
| `getStatistics()` | `SessionStatistics` | Get session stats |
| `addTranscript(result)` | `void` | Add a transcript |
| `export(format)` | `ExportResult` | Export transcripts |
| `setLanguage(language)` | `Promise<void>` | Change language mid-session |
| `getLanguage()` | `string` | Get current language |

SessionStatistics

interface SessionStatistics {
  wordCount: number;
  transcriptCount: number;
  duration: number;
  averageConfidence: number;
}
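As a sketch of how these fields can be derived from stored results (the field names follow the TranscriptionResult interface documented earlier; the library's own computation may differ, e.g. in how words are counted):

```typescript
// Minimal result shape for the sketch (a subset of TranscriptionResult).
interface Result { text: string; confidence?: number; timestamp: number; }

interface Stats { wordCount: number; transcriptCount: number; averageConfidence: number; }

function computeStats(results: Result[]): Stats {
  // Whitespace-split word count; real tokenization may be more careful.
  const words = results.flatMap(r => r.text.split(/\s+/).filter(Boolean));
  // Average confidence over results that actually report one.
  const scores = results
    .map(r => r.confidence)
    .filter((c): c is number => typeof c === 'number');
  return {
    wordCount: words.length,
    transcriptCount: results.length,
    averageConfidence: scores.length
      ? scores.reduce((a, b) => a + b, 0) / scores.length
      : 0,
  };
}
```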

Examples

React Integration

import React, { useState, useEffect, useRef } from 'react';
import { createSession, TranscriptionProvider, TranscriptionSession } from '@360labs/live-transcribe';

function TranscriptionComponent() {
  const [isRecording, setIsRecording] = useState(false);
  const [transcript, setTranscript] = useState('');
  const sessionRef = useRef<TranscriptionSession | null>(null);

  useEffect(() => {
    return () => {
      sessionRef.current?.stop();
    };
  }, []);

  const startRecording = async () => {
    const session = createSession({
      provider: TranscriptionProvider.WebSpeechAPI,
      language: 'en-US',
    });

    session.provider.on('transcript', (result) => {
      if (result.isFinal) {
        setTranscript(prev => prev + ' ' + result.text);
        session.addTranscript(result);
      }
    });

    sessionRef.current = session;
    await session.start();
    setIsRecording(true);
  };

  const stopRecording = async () => {
    await sessionRef.current?.stop();
    setIsRecording(false);
  };

  return (
    <div>
      <button onClick={isRecording ? stopRecording : startRecording}>
        {isRecording ? 'Stop' : 'Start'} Recording
      </button>
      <p>{transcript}</p>
    </div>
  );
}

Vue Integration

<template>
  <div>
    <button @click="toggleRecording">
      {{ isRecording ? 'Stop' : 'Start' }} Recording
    </button>
    <p>{{ transcript }}</p>
  </div>
</template>

<script setup lang="ts">
import { ref, onUnmounted } from 'vue';
import { createSession, TranscriptionProvider, TranscriptionSession } from '@360labs/live-transcribe';

const isRecording = ref(false);
const transcript = ref('');
let session: TranscriptionSession | null = null;

const toggleRecording = async () => {
  if (isRecording.value) {
    await session?.stop();
    isRecording.value = false;
  } else {
    session = createSession({
      provider: TranscriptionProvider.WebSpeechAPI,
      language: 'en-US',
    });

    session.provider.on('transcript', (result: any) => {
      if (result.isFinal) {
        transcript.value += ' ' + result.text;
      }
    });

    await session.start();
    isRecording.value = true;
  }
};

onUnmounted(() => {
  session?.stop();
});
</script>

Node.js with Deepgram

import { createTranscriber, TranscriptionProvider } from '@360labs/live-transcribe';
import { createReadStream } from 'fs';

const transcriber = createTranscriber({
  provider: TranscriptionProvider.Deepgram,
  apiKey: process.env.DEEPGRAM_API_KEY!, // assumes DEEPGRAM_API_KEY is set
  language: 'en-US',
});

transcriber.on('transcript', (result) => {
  console.log(result.text);
});

await transcriber.initialize();
await transcriber.start();

// Send audio data
const audioStream = createReadStream('audio.wav');
audioStream.on('data', (chunk) => {
  transcriber.sendAudio(chunk);
});

audioStream.on('end', async () => {
  await transcriber.stop();
});

Browser Support

| Browser | Web Speech API | WebSocket (Cloud) |
| --- | --- | --- |
| Chrome 33+ | ✅ Full | ✅ Full |
| Edge 79+ | ✅ Full | ✅ Full |
| Safari 14.1+ | ⚠️ Partial | ✅ Full |
| Firefox | ❌ Not supported | ✅ Full |
| Opera 20+ | ✅ Full | ✅ Full |

Note: Web Speech API requires an internet connection in most browsers as it uses cloud-based recognition.
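A defensive feature check in application code might look like the following. It is written against a passed-in global object purely so it can also be exercised outside a browser; `detectSpeechRecognition` is an illustrative helper, not part of this library (the library's own `isSupported()` likely performs a similar check):

```typescript
// Detect the SpeechRecognition constructor, including the webkit-prefixed
// variant used by Chrome and Safari.
function detectSpeechRecognition(globalObj: Record<string, unknown>): boolean {
  return typeof globalObj['SpeechRecognition'] === 'function'
      || typeof globalObj['webkitSpeechRecognition'] === 'function';
}

// In a browser:
// detectSpeechRecognition(window as unknown as Record<string, unknown>)
```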


Error Handling

import { TranscriptionError, ErrorCode } from '@360labs/live-transcribe';

transcriber.on('error', (error: TranscriptionError) => {
  switch (error.code) {
    case ErrorCode.MICROPHONE_ACCESS_DENIED:
      console.log('Please allow microphone access');
      break;
    case ErrorCode.NETWORK_ERROR:
      console.log('Network error - check your connection');
      break;
    case ErrorCode.AUTHENTICATION_FAILED:
      console.log('Invalid API key');
      break;
    case ErrorCode.UNSUPPORTED_BROWSER:
      console.log('Browser not supported');
      break;
    default:
      console.log('Error:', error.message);
  }
});

Error Codes

| Code | Description |
| --- | --- |
| `INITIALIZATION_FAILED` | Provider failed to initialize |
| `AUTHENTICATION_FAILED` | Invalid or missing API key |
| `NETWORK_ERROR` | Network connection error |
| `MICROPHONE_ACCESS_DENIED` | Microphone permission denied |
| `UNSUPPORTED_BROWSER` | Browser doesn't support required APIs |
| `INVALID_CONFIG` | Invalid configuration provided |
| `PROVIDER_ERROR` | Provider-specific error |
| `UNKNOWN_ERROR` | Unknown error occurred |

Audio Processing Utilities

The library includes audio processing utilities:

import { AudioProcessor } from '@360labs/live-transcribe';

// Convert Float32 to Int16 (for sending to APIs)
const int16Data = AudioProcessor.convertFloat32ToInt16(float32Array);

// Convert Int16 to Float32
const float32Data = AudioProcessor.convertInt16ToFloat32(int16Array);

// Resample audio
const resampled = AudioProcessor.resampleBuffer(buffer, 44100, 16000);

// Normalize audio levels
const normalized = AudioProcessor.normalizeBuffer(buffer);

// Apply gain
const amplified = AudioProcessor.applyGain(buffer, 1.5);

// Mix two audio buffers
const mixed = AudioProcessor.mixBuffers(buffer1, buffer2, 0.5);
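For reference, Float32-to-Int16 conversion is a clamp-and-scale operation over [-1, 1] samples. A standalone sketch of the idea (the library's `AudioProcessor.convertFloat32ToInt16` may handle edge cases differently):

```typescript
// Convert [-1, 1] float samples to 16-bit signed PCM,
// clamping out-of-range values.
function float32ToInt16(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    // Negative samples scale toward -32768, positive toward 32767.
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```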

Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.



About 360labs

360labs builds AI-powered developer tools and APIs. We focus on creating simple, powerful libraries that help developers build amazing products faster.


Made with ❤️ by 360labs
