Automatic-Signal-Detector

Automatically detect hand gestures using the laptop camera and opencv

The project was originally developed on Google Colab; I am just putting the code here as well.

University Computer Vision project

Below is a description of the different tasks implemented.

Task 1: In this first task, I detect the face by applying a Haar cascade to the frame converted to grayscale. I use a grayscale image because it carries less information to process, which improves speed and efficiency compared to detection on a color image. The region of interest is then identified and drawn using the coordinates returned by the detect function. It is important to note that image coordinates run from (0, 0) at the top left to the bottom right.
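
A minimal sketch of this step, assuming the stock `haarcascade_frontalface_default.xml` model shipped with OpenCV; the `detect` helper and file names here are illustrative, not the original code.

```python
import cv2

# Load the frontal-face Haar cascade bundled with opencv-python.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect(frame):
    # Convert to grayscale: less data to process than a color image.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces:
        # (0, 0) is the top-left corner; (x + w, y + h) is the bottom-right of the ROI.
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    return frame, faces

cap = cv2.VideoCapture(0)
ret, frame = cap.read()
if ret:
    frame, faces = detect(frame)
    cv2.imwrite("detected.png", frame)
cap.release()
```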

Task 3: For the third task I use face detection once to get the face and compute the region of interest. I then switch to CamShift, which works from the histogram of the face. The algorithm then computes where in the window a face is most probable and searches the region of interest for the area with the same color distribution as the face.
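
A rough sketch of this flow, following the standard OpenCV CamShift recipe (histogram of the face ROI in HSV, back-projection, then CamShift); `detect_face_once` is a hypothetical helper wrapping the Task 1 Haar detection, and the thresholds are the usual tutorial values, not necessarily the original ones.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
ret, frame = cap.read()
x, y, w, h = detect_face_once(frame)          # hypothetical helper: run the Haar cascade once
track_window = (x, y, w, h)

# Histogram of the face ROI in HSV, used as the colour model for tracking.
roi = frame[y:y + h, x:x + w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Back-projection: per-pixel probability of belonging to the face colour distribution.
    prob = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    rot_rect, track_window = cv2.CamShift(prob, track_window, term_crit)
    pts = np.int32(cv2.boxPoints(rot_rect))
    cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
    cv2.imshow("camshift", frame)
    if cv2.waitKey(30) & 0xFF == 27:
        break
cap.release()
```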

Task 4: In the fourth task the goal is to remove the face so we can detect the hands. I remove the face from the probability map, so when the algorithm searches the picture for regions with a color distribution similar to the face, it finds the hands instead.
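
A small sketch of this masking step, under the assumption that the back-projection map and the tracked face window come from the CamShift stage above; the names are illustrative.

```python
import numpy as np

def mask_face(prob, face_window):
    """Zero out the face region of a back-projection map so the remaining
    high-probability (skin-coloured) areas are the hands. `prob` is the map
    from cv2.calcBackProject, `face_window` the (x, y, w, h) box tracked in Task 3."""
    fx, fy, fw, fh = face_window
    masked = prob.copy()
    masked[fy:fy + fh, fx:fx + fw] = 0
    return masked
```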

Task 5: In task 5 the goal is to detect the hand and store it as an image in two different sizes: 16x16 and 224x224. We start by asking how many pictures to take and how many seconds to wait between pictures. We then take the pictures, crop them to just the hand, and save them.
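
A hedged sketch of the capture loop; the prompts, output file names and the `find_hand_roi` helper are illustrative assumptions, not the original code.

```python
import time
import cv2

n_pictures = int(input("How many pictures do you want to take? "))
delay = float(input("How many seconds between each picture? "))

cap = cv2.VideoCapture(0)
for i in range(n_pictures):
    time.sleep(delay)
    ret, frame = cap.read()
    if not ret:
        break
    x, y, w, h = find_hand_roi(frame)      # hypothetical helper returning the hand window
    hand = frame[y:y + h, x:x + w]
    # Store the same crop at the two required sizes.
    cv2.imwrite(f"hand_{i}_16.png", cv2.resize(hand, (16, 16)))
    cv2.imwrite(f"hand_{i}_224.png", cv2.resize(hand, (224, 224)))
cap.release()
```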

Task 6: In task 6 we create our dataset; the letters chosen were M, N and W.

Task 7: In task 7 we build our MLP.
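
A minimal sketch of the kind of MLP this could be, assuming Keras, 16x16 grayscale hand images flattened to 256 inputs (matching the (1, 256) shape used in the test phase) and 3 output classes for M, N and W; the layer sizes and hyperparameters are illustrative.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(256,)),              # 16x16 image flattened
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),   # one output per letter: M, N, W
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# x_train: (n_samples, 256) flattened images scaled to [0, 1]; y_train: one-hot labels.
# model.fit(x_train, y_train, batch_size=32, epochs=20, validation_split=0.3)
```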

Comparison: After training the 3 models, we look at how each model performs on each dataset. Reminder:

Dataset 1: 3 letters with an equal number of pictures and a lot of variability
Dataset 2: 3 letters with an unbalanced number of pictures (50 - 100 - 150) and a lot of variability
Dataset 3: 3 letters with an equal number of pictures, one of them (N) with no variability


-- Model 1 --

dataset1.txt

210 train samples - 90 test samples
Validation loss: 1.4552514553070068
Validation accuracy: 0.6555555462837219

dataset2.txt

244 train samples - 106 test samples
Validation loss: 0.9062689542770386
Validation accuracy: 0.8301886916160583

dataset3.txt

210 train samples - 90 test samples
Validation loss: 0.669061005115509
Validation accuracy: 0.8444444537162781

Model 1 performs best on datasets 2 and 3; this is probably due to the unbalanced number of pictures in dataset 2 and the lack of variability in dataset 3.


-- Model 2 --

dataset1.txt

210 train samples - 90 test samples
Validation loss: 1.7094557285308838
Validation accuracy: 0.7666666507720947

dataset2.txt

244 train samples - 106 test samples
Validation loss: 0.9224103689193726
Validation accuracy: 0.8396226167678833

dataset3.txt

210 train samples - 90 test samples
Validation loss: 1.2521220445632935
Validation accuracy: 0.7555555701255798

Model 2 performs best on its own dataset. Put simply, it is probably easier for the model to guess the right letter when one letter has so many more pictures than the others: predicting the letter with the highest number of pictures is right far more often.


-- Model 3 --

dataset1.txt

210 train samples - 90 test samples
Validation loss: 1.2043957710266113
Validation accuracy: 0.7888888716697693

dataset2.txt

244 train samples - 106 test samples
Validation loss: 1.269263744354248
Validation accuracy: 0.7641509175300598

dataset3.txt

210 train samples - 90 test samples
Validation loss: 1.8944649696350098
Validation accuracy: 0.699999988079071

Model 3 performs best on datasets 1 and 2.

Task 8: Test phase: for the test phase I use model 1. I show my hand to the camera, forming some of the letters the model was trained on. The program finds the hand, generates a grayscale image of the hand probability, resizes it to (1, 256), passes it to the loaded model and predicts the letter. The prediction is shown as text overlaid on the video.
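
A sketch of the prediction step, assuming the Keras MLP from Task 7 and the grayscale hand probability image produced by the earlier back-projection; the file name `model1.h5` and the helper names are assumptions.

```python
import cv2
import numpy as np
from tensorflow import keras

LETTERS = ["M", "N", "W"]

def predict_letter(model, hand_prob):
    """Resize the grayscale hand probability image to 16x16, flatten it to
    (1, 256) and ask the MLP which letter it sees."""
    sample = cv2.resize(hand_prob, (16, 16)).reshape(1, 256).astype("float32") / 255.0
    return LETTERS[int(np.argmax(model.predict(sample)))]

# Inside the video loop: overlay the prediction on the current frame.
# model = keras.models.load_model("model1.h5")   # assumed file name for model 1
# letter = predict_letter(model, hand_prob)
# cv2.putText(frame, letter, (30, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
```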
