Communication is vital to human beings: it spreads knowledge and builds relationships between people. We communicate through speech, facial expressions, hand signs, reading, writing, drawing and so on, but speech is the most commonly used mode of communication. People with hearing and speech impairments, however, communicate only through signs, which makes them highly dependent on non-verbal forms of communication. India is a vast country with nearly five million people who are deaf or hearing impaired, yet very limited work has been done in this research area because of its complex nature. Indian Sign Language (ISL) is predominantly used in South Asian countries and is sometimes also called Indo-Pakistani Sign Language (IPSL).
The purpose of this project is to recognize all the alphabets (A-Z) and digits (0-9) of Indian Sign Language using the bag-of-visual-words model and convert them to text/speech. A dual mode of recognition is implemented for better results. The system is tested with various machine learning classifiers such as KNN, SVM and logistic regression, and a convolutional neural network (CNN) is also implemented for the same task. The dataset for this system was created manually with different hand orientations, and a train-test ratio of 80:20 is used.
If you use this in your research, please cite:
Shagun Katoch, Varsha Singh, Uma Shanker Tiwary, Indian Sign Language recognition system using SURF with SVM and CNN, Array, Volume 14, 2022, 100141, ISSN 2590-0056, https://doi.org/10.1016/j.array.2022.100141
Before running this project, make sure you have the following dependencies -
Download the images from here
Some images of the dataset are shown below:
To run the project, perform the following steps -
git clone https://github.com/shag527/Indian-Sign-Language-Recognition.git
conda create --name sign python=3.7.1
conda activate sign
pip install -r requirements.txt
cd into the cloned repository, down to the Predict signs folder.
Command may look like: cd 'C:\Users\.....\Indian-Sign-Language-Recognition\Code\Predict signs\'
python main.py
A tkinter window like this will open.
- Create your account to access the system.
- Now, the main tkinter window will open.
- To create your own dataset, follow the steps given above, go to the create signs panel and create signs.
- Now, divide the dataset into train and test sets by running the Dividing_Dataset.ipynb file in the preprocessing folder.
- To create histograms and save them to .csv files, run create_train_hist.py and create_test_hist.py respectively; these extract the SURF features and cluster them using MiniBatchKMeans.
- Lastly, go to the classification folder and run the different Python files to check the results.
- After saving the model, you can load the model for testing purposes.
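As a rough illustration of the save/load step, here is a minimal sketch using joblib (the usual scikit-learn persistence route); the stand-in model, data and file name are placeholders, not the repository's actual ones.

```python
import joblib
import numpy as np
from sklearn.svm import SVC

# Stand-in model trained on random data, only to demonstrate the round trip.
X, y = np.random.rand(20, 150), np.arange(20) % 4
clf = SVC().fit(X, y)

joblib.dump(clf, "svm_model.pkl")       # save the fitted classifier
model = joblib.load("svm_model.pkl")    # reload it later for testing
print(model.predict(X[:3]))
```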
Two methods of preprocessing are used here. The first is background subtraction using an additive method, in which the first 30 frames are treated as the background and any new object entering the frame is then filtered out. The second uses skin segmentation, which is based on extracting the skin-coloured pixels of the user.
Figure: the mask, the frame after applying the mask, and Canny edge detection.
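A minimal sketch of the skin-segmentation path, assuming OpenCV; the HSV thresholds and the blur/Canny parameters are illustrative, not the exact values used in the project.

```python
import cv2
import numpy as np

# Illustrative HSV skin-colour range; tune for lighting and skin tone.
LOWER_SKIN = np.array([0, 40, 60], dtype=np.uint8)
UPPER_SKIN = np.array([20, 150, 255], dtype=np.uint8)

def segment_hand(frame):
    """Return the skin mask, the masked frame, and its Canny edge map."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_SKIN, UPPER_SKIN)
    mask = cv2.medianBlur(mask, 5)                    # suppress speckle noise
    hand = cv2.bitwise_and(frame, frame, mask=mask)   # keep only skin pixels
    edges = cv2.Canny(mask, 60, 120)                  # edges of the segmented hand
    return mask, hand, edges
```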
The Speeded-Up Robust Features (SURF) technique is used to extract descriptors from the segmented hand gesture images. These descriptors are clustered into visual words, and a histogram of visual words is generated for each image, representing the frequency of occurrence of each clustered feature. There are 36 classes in total (A-Z and 0-9).
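A minimal sketch of this bag-of-visual-words step, assuming opencv-contrib (SURF is patented and absent from default OpenCV builds) and scikit-learn; the vocabulary size and Hessian threshold are illustrative.

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

N_WORDS = 150  # illustrative vocabulary size
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)

def surf_descriptors(gray_images):
    """Extract SURF descriptors (64-D) from a list of grayscale images."""
    per_image = []
    for img in gray_images:
        _, desc = surf.detectAndCompute(img, None)
        per_image.append(desc if desc is not None else np.empty((0, 64), np.float32))
    return per_image

def build_histograms(per_image_desc):
    """Cluster all descriptors into visual words, then build one histogram per image."""
    kmeans = MiniBatchKMeans(n_clusters=N_WORDS, random_state=0)
    kmeans.fit(np.vstack(per_image_desc))
    hists = []
    for desc in per_image_desc:
        if len(desc) == 0:
            hists.append(np.zeros(N_WORDS, dtype=int))
        else:
            hists.append(np.bincount(kmeans.predict(desc), minlength=N_WORDS))
    return np.array(hists), kmeans
```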
In this phase, several classifiers are compared to find the best one for prediction (a training sketch follows the list below). The classifiers used are:
- Naive Bayes
- Logistic Regression
- K-Nearest Neighbours (KNN)
- Support Vector Machine (SVM)
- Convolutional Neural Network (CNN)
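A minimal sketch of training and scoring one of these classifiers (SVM) on the visual-word histograms with the 80:20 split; random stand-in data is used here only so the snippet runs on its own, and the hyper-parameters are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Stand-ins for the visual-word histograms and the 36 sign labels (A-Z, 0-9).
rng = np.random.default_rng(0)
hists = rng.integers(0, 20, size=(360, 150))
labels = np.repeat(np.arange(36), 10)

# 80:20 train-test split, as described in the overview.
X_train, X_test, y_train, y_test = train_test_split(
    hists, labels, test_size=0.2, random_state=42, stratify=labels)

clf = SVC(kernel="rbf", C=10, gamma="scale").fit(X_train, y_train)
print("SVM accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```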
The accuracy rates obtained with the different classifiers are shown below:
The predicted labels are presented both as text and as speech, using the Python text-to-speech library pyttsx3.
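A minimal sketch of speaking a predicted label with pyttsx3:

```python
import pyttsx3

def speak(label):
    """Speak the predicted sign label aloud."""
    engine = pyttsx3.init()
    engine.say(label)
    engine.runAndWait()

speak("A")  # e.g. a label predicted by the classifier
```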
A dual mode of communication is implemented: the spoken word is taken as input, and the corresponding sign images are shown in sequence. The Google Speech API is used for this purpose.
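A minimal sketch of this speech-to-sign direction, assuming the SpeechRecognition package (Google Web Speech backend) and PyAudio for microphone access; the sign-image path is a placeholder.

```python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source)

spoken = recognizer.recognize_google(audio)  # Google Web Speech API

# Map each character of the recognized word to its sign image (path is illustrative).
for ch in spoken.upper():
    if ch.isalnum():
        print(f"signs/{ch}.jpg")
```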