An advanced audio analysis platform that leverages AI to provide deep insights into conversations. This tool performs speaker diarization, transcription, sentiment analysis, and emotion detection to generate comprehensive conversation summaries.
- Speaker Diarization: Automatically identifies and separates different speakers
- Speech-to-Text: Accurate transcription of conversations
- Sentiment Analysis: Analyzes the sentiment of each speaker's utterances
- Emotion Detection: Identifies emotions in speech using audio features
- Conversation Summary: AI-powered detailed analysis of conversation dynamics
- Batch Processing: Analyze multiple audio files simultaneously
- GPU Acceleration: Optimized performance with CUDA support
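Under the hood, these features chain standard tools: pyannote for diarization, Whisper for transcription, and a transformers sentiment model. The following is a minimal sketch of that chain, not the project's actual code in final.py; the file name, Whisper model size, and truncation length are placeholder choices.

```python
# Minimal sketch of the analysis chain (illustrative; not the project's code).
import whisper
from pyannote.audio import Pipeline
from transformers import pipeline as hf_pipeline

AUDIO = "conversation.wav"

# 1. Speaker diarization: who spoke, and when
diarization = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="your-token-here",  # Hugging Face token (see setup below)
)(AUDIO)

# 2. Speech-to-text
transcript = whisper.load_model("base").transcribe(AUDIO)["text"]

# 3. Sentiment (truncated here; the app scores each speaker's utterances)
sentiment = hf_pipeline("sentiment-analysis")(transcript[:512])

for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s-{turn.end:.1f}s")
print(sentiment)
```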
- Single file analysis interface
- Batch processing capability (up to 10 files)
- Detailed conversation insights
- Downloadable analysis results
- Progress tracking and status updates
- Clone the repository

```bash
git clone https://github.com/Sarthakischill/Conversation_Analysis.git
cd conversation-analysis-platform
```

- Create and activate a virtual environment

```bash
# Windows
python -m venv venv
venv\Scripts\activate

# Linux/Mac
python -m venv venv
source venv/bin/activate
```

- Install required packages

```bash
pip install -r requirements.txt
```

- Set up authentication
  - Create a Hugging Face account
  - Accept the license for pyannote/speaker-diarization-3.1
  - Get your Hugging Face token
  - Replace the token in `final.py`:

```python
# In final.py
self.diarization_pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="your-token-here"
)
```
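To avoid committing the token, you could read it from an environment variable instead. A small sketch; the `HF_TOKEN` variable name is an assumption, not something the project defines:

```python
# In final.py -- read the token from the environment instead of hard-coding it.
# HF_TOKEN is an assumed variable name; export it before launching the app.
import os

self.diarization_pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=os.environ["HF_TOKEN"],
)
```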
- Start the application

```bash
streamlit run Home.py
```

- Access the web interface at `http://localhost:8501`
- Choose between:
  - Single file analysis (Home page)
  - Batch analysis (Batch Analysis page)
- Upload WAV format audio file(s)
- Click "Analyze Conversation" to start processing
- View and download results
```
app/
├── pages/
│   └── 1_Batch_Analysis.py
├── models/
│   └── emotion_detection_model.pkl
├── uploads/
├── results/
└── Home.py
```
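The bundled `emotion_detection_model.pkl` appears to be a scikit-learn model (scikit-learn is a listed dependency). Loading it would look roughly like this; the feature dimensionality below is a placeholder, since the expected input shape is not documented here:

```python
import pickle
import numpy as np

with open("app/models/emotion_detection_model.pkl", "rb") as f:
    emotion_model = pickle.load(f)

# Placeholder feature vector; the real app derives features from the audio,
# and the expected dimensionality depends on how the model was trained.
features = np.zeros((1, 13))
print(emotion_model.predict(features))
```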
- Python 3.10 or later
- CUDA-capable GPU (recommended)
- CUDA Toolkit (for GPU acceleration)
- Minimum 8GB RAM
Major dependencies include:
- streamlit
- torch
- pyannote.audio
- whisper
- transformers
- librosa
- google-generativeai
- scikit-learn
See requirements.txt for the complete list.
- Speaker separation and identification
- High-quality speech-to-text conversion
- Real-time sentiment analysis
- Emotion detection from audio features (see the feature-extraction sketch after this list)
- Speaker interaction patterns
- Emotional tone mapping
- Sentiment progression
- Key topic identification
- Multiple file upload support
- Parallel processing capability
- Combined results in ZIP format
- Progress tracking for each file
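Emotion detection works from features extracted from the raw audio. Here is a minimal sketch using librosa (a listed dependency); MFCCs, energy, and zero-crossing rate are common choices for speech emotion but are not confirmed as this project's exact feature set:

```python
import librosa
import numpy as np

y, sr = librosa.load("conversation.wav", sr=None)

# Features commonly used for speech emotion: MFCCs, energy, zero-crossing rate
mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13), axis=1)
rms = np.mean(librosa.feature.rms(y=y))
zcr = np.mean(librosa.feature.zero_crossing_rate(y))

features = np.hstack([mfcc, rms, zcr])  # 15-dimensional vector
print(features.shape)
```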
The analysis generates a structured summary including:
- Main Topics
- Conversation Dynamics
- Speaker Analysis
- Key Points
- Predominant Emotions
- Sentiments
- Behavior Analysis
- Tone
- Overall Emotional Tone
- Key Insights
- Actionable Suggestions
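The detailed summary is produced with Google's Gemini API (google-generativeai is a listed dependency). A minimal sketch of prompting Gemini for a summary covering these sections; the model name, prompt wording, and `GOOGLE_API_KEY` variable are illustrative, not the project's:

```python
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # model name is a placeholder

transcript = "SPEAKER_00: ...\nSPEAKER_01: ..."  # from the transcription step

prompt = (
    "Analyze this conversation transcript. Cover: main topics, conversation "
    "dynamics, per-speaker analysis, key points, predominant emotions, "
    "sentiments, behavior, tone, overall emotional tone, key insights, and "
    "actionable suggestions.\n\n" + transcript
)
print(model.generate_content(prompt).text)
```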
Common issues and solutions:

- CUDA-related warnings (see the diagnostic snippet after this list):
  - Ensure the CUDA toolkit is installed
  - Update your GPU drivers
  - Check CUDA compatibility with your PyTorch version
- Memory issues:
  - Reduce the batch size
  - Process shorter audio segments
  - Close other GPU-intensive applications
- Model loading errors:
  - Verify your Hugging Face token
  - Check your internet connection
  - Ensure model files are present
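For the CUDA warnings, a quick way to check what PyTorch actually sees (standard PyTorch calls):

```python
import torch

print("PyTorch:", torch.__version__)
print("Built with CUDA:", torch.version.cuda)      # None on CPU-only builds
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```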
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- PyAnnote Audio for speaker diarization
- OpenAI Whisper for transcription
- Hugging Face for transformer models
- Google for Gemini API
- Streamlit for the web interface
Sarthak
- GitHub: [@Sarthakischill](https://github.com/Sarthakischill)
- Email: sarthakshitole@gmail.com
If you encounter any issues or have questions, please:
- Check the troubleshooting section
- Open an issue on GitHub
- Contact the author

