Speech Analysis and Summarization Pipeline

A comprehensive natural language processing pipeline that combines speech recognition, entity analysis, and text summarization. This project was developed as part of a Text Analytics course, focusing on processing and analyzing audio content using state-of-the-art ML models.

Project Overview

This project implements an end-to-end pipeline for processing audio content, with three main components:

1. Speech Recognition: Uses Facebook's Wav2Vec2 model to convert speech to text
2. Entity Analysis: Employs spaCy for named entity recognition and analysis
3. Text Summarization: Leverages FLAN-T5 for generating multi-level summaries

The pipeline was initially developed and tested on the LibriSpeech dataset, where it produced accurate transcriptions along with meaningful entity extraction and summarization results.

Project Structure

├── app.py                    # Streamlit web interface
├── transcribe_audio.py       # Speech recognition module
├── entity_analyzer.py        # Named entity recognition module
├── text_summarizer.py        # Text summarization module
├── text_restorer.py          # Text preprocessing module
├── process_transcripts.py    # Transcript processing utilities
├── data_processor.py         # Data handling utilities
├── requirements.txt          # Python dependencies
└── environment.yml           # Conda environment specification

Core Components

1. Audio Transcription (`transcribe_audio.py`)

- Implements AudioTranscriber class using Wav2Vec2
- Supports multiple audio formats
- Automatic device selection (CUDA/MPS/CPU)
- Includes resampling and audio preprocessing
- Outputs results in CSV format
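
The device selection and resampling steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual `AudioTranscriber` code: the function names `pick_device` and `transcribe_file`, the `facebook/wav2vec2-base-960h` checkpoint, and the 16 kHz target rate are all assumptions made for the example.

```python
TARGET_SAMPLE_RATE = 16_000  # assumption: rate expected by the chosen checkpoint


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def transcribe_file(path: str, model_name: str = "facebook/wav2vec2-base-960h") -> str:
    """Load an audio file, resample it if needed, and return the transcript."""
    # Heavy imports stay local so pick_device() is usable without them.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    device = pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device).eval()

    waveform, sample_rate = torchaudio.load(path)  # shape: (channels, samples)
    waveform = waveform.mean(dim=0)                # downmix to mono
    if sample_rate != TARGET_SAMPLE_RATE:
        waveform = torchaudio.functional.resample(waveform, sample_rate, TARGET_SAMPLE_RATE)

    inputs = processor(waveform.numpy(), sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values.to(device)).logits
    predicted_ids = torch.argmax(logits, dim=-1)   # greedy CTC decoding
    return processor.batch_decode(predicted_ids)[0]
```

Keeping the device choice in a small pure function makes the CUDA/MPS/CPU fallback order easy to check without loading a model.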

2. Entity Analysis (`entity_analyzer.py`)

- EntityAnalyzer class powered by spaCy
- Extracts and categorizes named entities
- Provides detailed entity statistics:
  - Entity type distribution
  - Unique entity counts
  - Overall entity statistics
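
As a rough illustration of the statistics listed above, the sketch below computes a type distribution and unique counts from `(text, label)` pairs. The helper names `summarize_entities` and `spacy_entities` are hypothetical and are not taken from `entity_analyzer.py`; treating entities that differ only in case as duplicates is also an assumption.

```python
from collections import Counter


def summarize_entities(entities):
    """Compute overall, per-type, and unique counts from (text, label) pairs."""
    type_distribution = Counter(label for _, label in entities)
    # Assumption: entities differing only in letter case count as one entity.
    unique_entities = {(text.lower(), label) for text, label in entities}
    unique_per_type = Counter(label for _, label in unique_entities)
    return {
        "total_entities": len(entities),
        "unique_entities": len(unique_entities),
        "type_distribution": dict(type_distribution),
        "unique_per_type": dict(unique_per_type),
    }


def spacy_entities(text, model_name="en_core_web_sm"):
    """Run spaCy NER and return (text, label) pairs."""
    import spacy  # local import: only needed when a model is actually loaded

    nlp = spacy.load(model_name)
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


sample = [("London", "GPE"), ("london", "GPE"), ("Mr. Darcy", "PERSON")]
stats = summarize_entities(sample)
```

With spaCy and the `en_core_web_sm` model installed, `summarize_entities(spacy_entities(transcript))` would produce the same structure for a real transcript.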

3. Text Summarization (`text_summarizer.py`)

- Uses FLAN-T5 for advanced summarization
- Generates multiple summary levels:
  - Short summaries (30-50 words)
  - Long summaries (50-150 words)
- Adaptive length based on input text
- Batch processing support for DataFrames
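
One way the adaptive length could work is sketched below. The rule used here (cap the summary at half the input length, then clamp to the configured range) is an assumption for illustration, as are the function names, the `google/flan-t5-base` checkpoint, and the `summarize: ` prompt prefix; the actual logic in `text_summarizer.py` may differ.

```python
def summary_length_bounds(n_words, low, high):
    """Return (min_len, max_len), shrinking the range for short inputs."""
    max_len = max(1, min(high, n_words // 2))  # never exceed half the input
    min_len = min(low, max_len)                # keep min_len <= max_len
    return min_len, max_len


def summarize(text, low, high, model_name="google/flan-t5-base"):
    """Generate one summary whose length limits adapt to the input."""
    from transformers import pipeline  # local import: downloads the model on first use

    generator = pipeline("text2text-generation", model=model_name)
    min_len, max_len = summary_length_bounds(len(text.split()), low, high)
    # The limits are applied in tokens; word counts are only a rough proxy.
    result = generator(
        "summarize: " + text,
        min_length=min_len,
        max_length=max_len,
        truncation=True,
    )
    return result[0]["generated_text"]
```

Under these assumptions, `summarize(text, 30, 50)` and `summarize(text, 50, 150)` would correspond to the short and long levels described above.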

4. Web Interface (`app.py`)

- Interactive Streamlit dashboard
- Real-time audio processing
- Visualization of entity distribution
- Downloadable results in multiple formats
- Support for various audio input formats
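
The upload, chart, and download features above could be wired together along the following lines. This is a hedged sketch rather than the contents of `app.py`: the helper names, the accepted file types, and the placeholder entity data are assumptions, and a real app would call the transcription, entity, and summarization modules where noted.

```python
import csv
import io
from collections import Counter


def entity_counts(entities):
    """Count entities per label from (text, label) pairs, most frequent first."""
    return Counter(label for _, label in entities).most_common()


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text for a download button."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def main():
    import plotly.express as px
    import streamlit as st

    st.title("Speech Analysis and Summarization")
    uploaded = st.file_uploader("Upload audio", type=["wav", "mp3", "flac"])
    if uploaded is None:
        return

    # Placeholder results: the pipeline modules would be called here.
    entities = [("London", "GPE"), ("Mr. Darcy", "PERSON")]
    counts = entity_counts(entities)
    fig = px.bar(
        x=[label for label, _ in counts],
        y=[n for _, n in counts],
        labels={"x": "Entity type", "y": "Count"},
    )
    st.plotly_chart(fig)

    rows = [{"entity": text, "label": label} for text, label in entities]
    st.download_button(
        "Download CSV",
        rows_to_csv(rows, ["entity", "label"]),
        file_name="entities.csv",
        mime="text/csv",
    )
```

To try the sketch, add a `main()` call at the end of the file and launch it with `streamlit run`.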

Installation

Set up the environment (choose one):

Using conda:

conda env create -f environment.yml
conda activate speech-analysis

Using pip:

pip install -r requirements.txt
python -m spacy download en_core_web_sm
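
After either installation route, a quick check like the one below can confirm that the main packages and the spaCy model are importable. The script is an optional sketch and not part of the repository; the module list is assembled from the dependencies named in this README, and the helper names are hypothetical.

```python
import importlib.util
import sys

# Assumption: top-level module names for the dependencies listed in this README.
REQUIRED_MODULES = [
    "torch", "torchaudio", "transformers", "spacy",
    "streamlit", "pandas", "plotly", "en_core_web_sm",
]


def find_missing(modules):
    """Return the names of modules the import system cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def check_environment():
    """Collect human-readable problems with the current environment."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    problems += [f"missing package: {name}" for name in find_missing(REQUIRED_MODULES)]
    return problems
```

Calling `check_environment()` returns an empty list when everything is in place.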

Usage

Web Interface

streamlit run app.py

Command Line Usage

Transcribe audio:

python transcribe_audio.py path/to/audio.wav --output results.csv
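
For reference, a command of this shape could be parsed with `argparse` as sketched below. The argument name `audio_path` and the default output path are assumptions; the actual parser in `transcribe_audio.py` may define additional options.

```python
import argparse


def build_parser():
    """Build a parser for: transcribe_audio.py AUDIO_PATH [--output CSV]."""
    parser = argparse.ArgumentParser(description="Transcribe an audio file to CSV.")
    parser.add_argument("audio_path", help="path to the input audio file")
    parser.add_argument(
        "--output",
        default="results.csv",  # assumed default, not confirmed by the source
        help="path of the CSV file to write",
    )
    return parser


# Parse the example command line shown above.
args = build_parser().parse_args(["path/to/audio.wav", "--output", "results.csv"])
```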

Process entities and generate summaries:

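from entity_analyzer import EntityAnalyzer
from text_summarizer import TextSummarizer

# Initialize components
analyzer = EntityAnalyzer()
summarizer = TextSummarizer()

# Text to analyze, e.g. a transcript produced by transcribe_audio.py
text = "Replace this string with your transcript."

# Extract named entities and generate both summary levels
entities = analyzer.extract_entities(text)
short_summary, long_summary = summarizer.generate_summaries(text)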

System Requirements

- Python 3.10+
- 4GB+ RAM
- GPU Support:
  - CUDA-compatible GPU (optional)
  - Apple M1/M2 chip (MPS support)
- Storage:
  - ~5GB for models and dependencies
  - Additional space for audio processing

Key Dependencies

torch==2.7.0
transformers==4.30.2
spacy==3.8.7
streamlit==1.24.0
torchaudio (for audio processing)
pandas (for data handling)
plotly (for visualizations)

Performance Notes

- First run downloads required models
- Processing time depends on:
  - Audio file length
  - Selected device (GPU/CPU)
  - Chosen summarization length
- GPU acceleration recommended for batch processing

Future Improvements

- Integration with real-time audio streaming
- Support for additional languages
- Custom model fine-tuning options
- Enhanced entity visualization
- Batch processing optimization

Contributors

Yunze Wei, Lanfeng Zheng, Keyu Shen, Bo Zhao, Kaiyuan Deng
