Voice Vision is a Python-based virtual voice assistant that understands spoken commands and replies with relevant information using speech synthesis. It uses real-time speech recognition and natural language processing to perform tasks such as answering queries from Wikipedia and telling the current time, all through voice interaction.

| Technology | Purpose |
| --- | --- |
| Python 3.x | Core programming language |
| SpeechRecognition | Speech-to-text conversion |
| pyttsx3 | Text-to-speech output |
| Wikipedia | Fetching information from the web |
| datetime | Telling current time |
| Streamlit | GUI |

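As a rough illustration of how the components listed above might fit together, the sketch below wires up a listen, answer, speak cycle. It is not taken from `app.py`: the function names (`listen`, `answer`, `speak`, `run_once`), the choice of Google's web recognizer via `recognize_google`, the keyword check for time questions, and the two-sentence summary length are all assumptions made for this example.

```python
from datetime import datetime


def listen() -> str:
    """Capture one utterance from the default microphone and return it as text."""
    import speech_recognition as sr  # installed by the SpeechRecognition package

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:  # sr.Microphone requires PyAudio
        audio = recognizer.listen(source)
    # Assumption: Google's web recognizer; the project may use a different backend.
    return recognizer.recognize_google(audio)


def speak(text: str) -> None:
    """Read the given text aloud with the default pyttsx3 voice."""
    import pyttsx3

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


def answer(command: str, now=None, lookup=None) -> str:
    """Return a reply for a recognized command (hypothetical routing logic)."""
    text = command.lower().strip()
    # Crude keyword check: any command containing "time" is treated as a time query.
    if "time" in text:
        now = now or datetime.now()
        return "The time is " + now.strftime("%H:%M")
    if lookup is None:
        import wikipedia

        # Assumption: a two-sentence summary is enough for a spoken reply.
        def lookup(query):
            return wikipedia.summary(query, sentences=2)

    return lookup(command)


def run_once() -> None:
    """One full cycle: hear a command, work out a reply, say it."""
    speak(answer(listen()))
```

Calling `run_once()` needs a working microphone, PyAudio, and an internet connection. `answer` can be tried on its own by passing a fixed `now` or a stub `lookup`, which keeps the routing logic testable without audio hardware.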
- Clone the repository:
  - `git clone https://github.com/mayank-joshi525/voice-vision.git`
  - `cd voice-vision`
- Install the dependencies:
  - `pip install -r requirements.txt`
  - Alternatively, install the packages directly: `pip install SpeechRecognition pyttsx3 wikipedia streamlit`
  - Note: You may also need to install PyAudio. If you face issues installing it via pip, refer to platform-specific installation instructions.
- Run the app:
  - `streamlit run app.py`
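If the app fails to start, one quick way to see which dependency is missing is to check whether each module can be found. The snippet below is a minimal sketch, not part of the repository. The helper name `missing_packages` and the module list are assumptions based on the packages named in the install step (`speech_recognition` and `pyaudio` are the import names of SpeechRecognition and PyAudio).

```python
from importlib.util import find_spec

# Import name -> pip package name, for the packages named in the install step.
REQUIRED_MODULES = {
    "speech_recognition": "SpeechRecognition",
    "pyttsx3": "pyttsx3",
    "wikipedia": "wikipedia",
    "streamlit": "streamlit",
    "pyaudio": "PyAudio",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names of packages whose modules cannot be located."""
    return [
        pip_name
        for module, pip_name in modules.items()
        if find_spec(module) is None  # None means the module is not installed
    ]
```

Running `print(missing_packages())` in the same environment prints the packages that still need to be installed, or an empty list when everything is available.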
- You will be prompted to speak. Try commands like:
  - “What is Python?”
  - “Tell me about Albert Einstein”
  - “What time is it?”
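Commands like the ones above usually need the lead-in phrase removed before the remainder is used as a Wikipedia search term. The following is a minimal sketch of one way to do that. The function name `extract_topic` and the list of recognized prefixes are hypothetical, and the project may handle this differently.

```python
import re

# Hypothetical lead-in phrases that precede the actual search topic.
_PREFIX = re.compile(r"^(what is|who is|tell me about)\s+(.*)$", re.IGNORECASE)


def extract_topic(command: str) -> str:
    """Strip quotes, trailing punctuation, and a known lead-in phrase."""
    text = command.strip().strip("“”\"'").rstrip("?.!").strip()
    match = _PREFIX.match(text)
    # Fall back to the cleaned command when no known prefix is present.
    return match.group(2) if match else text
```

With this helper, `extract_topic("What is Python?")` yields `"Python"` and `extract_topic("Tell me about Albert Einstein")` yields `"Albert Einstein"`. A command without a known prefix, such as “What time is it?”, is returned with only the punctuation removed.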
- voice-vision/
  - app.py: main script
  - README.md: project documentation
  - requirements.txt: list of dependencies
  - features/
- Testing:
  - Manual testing was conducted using various test phrases.
  - Tested across different accents and noise conditions.
  - Wikipedia search and time announcements were validated with dynamic queries.
- Limitations:
  - Accuracy depends on microphone quality and internet connectivity.
  - The assistant may misinterpret accents or speech in noisy environments.
  - Support is limited to English and basic tasks (not a full-fledged AI assistant).