PyTorch/Tensorflow #2

Open
ClashLuke opened this issue Nov 24, 2019 · 14 comments
Labels: architectural change, enhancement, good first issue, performance

@ClashLuke

Hello there,
I recently stumbled upon this repository and was interested in trying out your code. However, single-threaded sklearn doesn't seem efficient to me compared to GPU-optimized PyTorch or TF.
Do you have any plans to move to those frameworks, or would you accept a pull request implementing them?
Regards,
Luke

alessiosavi added the enhancement and good first issue labels Nov 25, 2019
@alessiosavi (Owner) commented Nov 25, 2019

Hi Clash,

Right now I'm trying to understand which type of neural network is best suited to recognize the 68 points extracted from the face. So the work you find here is for test/study purposes only, for me and for anyone who needs a base codebase.

I'm currently replacing the KNN with an MLP classifier, which is obviously more efficient (in terms of precision) for this purpose.

Of course, pull requests are welcome!
I've played very little with PyTorch, and I think Tensorflow would be the preferable choice.

@alessiosavi (Owner) commented Nov 25, 2019

I've run some tests.
During the prediction phase, the most time-consuming step is the face encoding.

[image: timing screenshot]

As you can see, encoding two faces costs ~3s on my hardware (GeForce 940MX).
This is because the jitter parameter used during the training/tuning phase has to be equal to the one used when making predictions, and I chose 300 in order to increase the variety of distortions applied to the photos before training/prediction.

Are you talking about the tuning/training phase or prediction?

Be sure to pull from master; I've migrated to an MLP classifier, which is more precise during prediction.

alessiosavi added the architectural change and performance labels Nov 25, 2019
alessiosavi pushed a commit that referenced this issue Nov 25, 2019
Enhancements

Issue #2
 - Model saving mechanism rewritten from scratch (using a timestamp as the name)
  - Every model will now be saved in a different directory
  - All data related to the model (dataset + configuration) will be saved in the same folder
  - Configuration file changed due to the new implementation of the model folder
  - dump_model (dataset) rewritten and migrated to utils
  - dump_model (classifier) rewritten in order to be compliant with the new folder architecture
 - Parallelism migrated from "different person" to "different image, same person"
 - Enabled progress bar during face analysis
 - Response constructor will now accept parameters

Issue #4
 - Created a function to retrieve the dataset from the input HTML form and return it to the tune/train function
 - Standardized and refactored the train/tune logic

BugFix
 - Dump the real classifier (grid.best_estimator_)
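
The "dump the real classifier" fix refers to scikit-learn's grid search: what gets persisted should be the refitted `grid.best_estimator_`, not the `GridSearchCV` wrapper. A minimal sketch, with illustrative parameters and synthetic data standing in for the real face encodings:

```python
# Minimal sketch: persist the estimator refitted by the grid search,
# not the GridSearchCV wrapper itself. Parameters and data are illustrative.
import joblib
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic 128-dim "encodings" standing in for the real training data.
X_train, y_train = make_classification(n_samples=60, n_features=128,
                                       n_informative=10, n_classes=3)

param_grid = {"hidden_layer_sizes": [(128,), (256, 128)], "alpha": [1e-4, 1e-3]}
grid = GridSearchCV(MLPClassifier(max_iter=500), param_grid, cv=3)
grid.fit(X_train, y_train)

# The bug fix: dump the refitted best estimator, not `grid` itself,
# so loading the model does not drag the whole search object along.
joblib.dump(grid.best_estimator_, "model.clf")
```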
@ClashLuke (Author)

First of all, thank you very much for taking the time to reply to this issue regarding training optimization.

Second, you pointed out that most of the time is spent encoding faces. This step could largely be skipped by using a couple of Keras features, such as the ImageDataGenerator.
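
For context, a minimal sketch of what using Keras's ImageDataGenerator for augmentation could look like; the parameter values and the placeholder batch are illustrative, not tuned:

```python
# Minimal sketch: on-the-fly augmentation with Keras instead of dlib's
# jitter-based re-sampling. Parameter values are illustrative only.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=15,        # small random rotations
    width_shift_range=0.1,    # horizontal translation
    height_shift_range=0.1,   # vertical translation
    zoom_range=0.1,           # random zoom
    horizontal_flip=True,     # mirror faces
)

faces = np.random.rand(8, 150, 150, 3)   # placeholder batch of face crops
labels = np.arange(8)

# Each call yields a freshly distorted batch; the labels are untouched.
batches = datagen.flow(faces, labels, batch_size=8)
augmented_faces, same_labels = next(batches)
```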

Why does the jitter have to be equal during training and prediction? Isn't it normally used as a regularization technique, and therefore left out during inference, or am I thinking of something else here?

Lastly, I'd love to know whether you'd be fine with a (backwards-compatible) switch to a CNN, so that we could compare the performance of Inception-v4 against the MLP.

@alessiosavi (Owner) commented Nov 27, 2019

Hi Sir,

Thank you for the interest in the project!
You were completely right!

The problem with the jitter parameter was caused by the KNN, which was not able to recognize faces when trained with a high number of distortions. Unless the values correspond quite strictly between the training and prediction phases, the network is not very precise (I think this would have to be investigated in the architecture of the KNN, but it's out of scope).
With the MLP architecture (which has increased the prediction confidence by a huge factor), this little oddity is no longer relevant, and we can use a different number of jitters during the training and prediction phases.

Of course, the jitter parameter causes a lot of time to be spent on image distortion (from the documentation, jitter=300 -> 300x the time used), so using a different approach to create distortions would be a great performance improvement.
We have to work out which ImageDataGenerator parameters are necessary to preserve the quality of prediction (jitter averages over the augmented data) and increase the speed of the face encodings.

From the architectural POV, we can play as much as we want with the code. So we can try lots of different types of networks, using the same dataset to compare the results.

I expect that (with a higher amount of data) a CNN/RNN will perform better. I suspect, instead, that with very few photos (and the majority of the identities in this dataset have fewer than 10), the MLP will perform slightly better.

While changing the NN base code (Classifier.py), it's important to keep the ability to recognize multiple faces in the same photo.

Before migrating to Tensorflow, I think there is some work on my side to clean the code and standardize the return functions.

alessiosavi added four commits that referenced this issue Jan 3, 2020 (same changelog as above)
@ClashLuke (Author)

Since the ImageDataGenerator has a lot of parameters, it might be better to switch to the new system instead. Unfortunately, I got lost in another repository while trying to figure out what the jitter parameter does. Could you explain it real quick?
I also have to agree that MLPs might perform better than CNNs on tiny datasets. Luckily, the ImageDataGenerator can alleviate this issue quite a bit.

@alessiosavi (Owner) commented Jan 12, 2020

Hi @ClashLuke,

The num_jitters parameter is the number of times to re-sample the face when calculating the encoding.
If num_jitters > 1, each face is randomly jittered slightly num_jitters times, each jittered copy is run through the 128D projection, and the average is used as the face descriptor.
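
For illustration, this is roughly how the parameter is exposed by the face_recognition API (the image path is a placeholder):

```python
# Minimal sketch of the num_jitters parameter in the face_recognition API.
# The image path is a placeholder.
import face_recognition

image = face_recognition.load_image_file("person.jpg")
locations = face_recognition.face_locations(image)

# num_jitters=300 re-samples each face 300 times and averages the 128D
# encodings, which is why encoding dominates the prediction time.
encodings = face_recognition.face_encodings(
    image, known_face_locations=locations, num_jitters=300
)
```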

After some tests, I've realized that cv2 is better at "finding" faces in photos when the image has low quality or the person in the photo has a "not centred" face angle.

I think the first step is to migrate the face detection from dlib (face_recognition uses dlib internally) to cv2.dnn.readNetFromCaffe. This will translate into an increase in face-detection quality. In contexts like CCTV cameras or low-resolution/low-quality photos, we can be sure to have a near-optimal face detection tool. Then we can move on to generating some augmented data that is helpful for the train/tune process. I think I can start working on the cv2 migration next month. I've lost the jupyter-notebook where I started developing the PoC of the migration.
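
A minimal sketch of what the cv2.dnn.readNetFromCaffe detector could look like; the prototxt/caffemodel names are OpenCV's commonly used SSD face-detector sample files and the image path is a placeholder, none of which are part of this repo yet:

```python
# Minimal sketch: face detection with OpenCV's DNN module instead of dlib.
# Model files are OpenCV's standard SSD face-detector samples (assumption).
import cv2

net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")

image = cv2.imread("person.jpg")                      # placeholder path
h, w = image.shape[:2]

# The model expects 300x300 BGR input with mean subtraction.
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0,
                             (300, 300), (104.0, 177.0, 123.0))
net.setInput(blob)
detections = net.forward()

for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:                              # illustrative threshold
        box = detections[0, 0, i, 3:7] * [w, h, w, h]
        x1, y1, x2, y2 = box.astype(int)
        print(f"face at ({x1},{y1})-({x2},{y2}), confidence {confidence:.2f}")
```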

@ClashLuke (Author)

I don't think I understand. You sample the jitter parameter randomly from a uniform distribution u = U[-P, P], where P is a parameter you set somewhere, correct?
Doesn't that mean the sum of 300 jitters follows an Irwin-Hall-style distribution, i.e. approximately a normal distribution with zero mean and σ = sqrt(300 · P²/3) = 10P? Why jitter so many times?

Another thing I don't quite understand is where opencv and dlib come from. I assumed there was an MLP involved in this process?
Lastly, do you know why it's better at figuring out what the jittered faces contain? Are the labels jittered as well?

@ClashLuke (Author)

I can finally say that I know what you're doing when training the model.
We can definitely keep the initial pipeline, even though it would be nice to jitter while training.
Should I take a shot at rewriting the Classifier in a new Tensorflow branch/fork?

I'd have to rewrite the hyperparameter search, though. While I'm at it, I'd also change the architecture to a DenseNet-style one, as dense connectivity is insanely powerful for MLPs (a sketch follows below).
Would that be an issue for you?
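
For concreteness, a minimal sketch of a densely connected MLP in Keras, where every layer receives the concatenation of all previous activations; the depth, widths, and the 128-dim input are illustrative assumptions:

```python
# Minimal sketch: a DenseNet-style MLP in Keras, where every layer
# receives the concatenation of all previous activations.
# Layer count/sizes and the 128-dim input are illustrative.
from tensorflow.keras import Input, Model, layers

def dense_mlp(input_dim=128, num_classes=10, hidden=64, depth=4):
    inputs = Input(shape=(input_dim,))
    features = [inputs]
    for _ in range(depth):
        # Each block sees all earlier outputs (dense connectivity).
        x = layers.Concatenate()(features) if len(features) > 1 else features[0]
        features.append(layers.Dense(hidden, activation="relu")(x))
    x = layers.Concatenate()(features)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return Model(inputs, outputs)

model = dense_mlp()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```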

@alessiosavi (Owner)

Hi @ClashLuke, thank you for the interest and sorry for the late response. I'm very busy these days and can only contribute on weekends.

Of course, you can rewrite every part of the code that you are confident with (:

> Another thing I don't quite understand is where opencv and dlib come from. I assumed there was an MLP involved in this process?

dlib is used (through the face_recognition high-level API) to extract the points related to the face. It uses a custom version of the library, but now we can use the one present in the master branch of the project.
The MLP is delegated to "link" the face encodings to the labels.

> Lastly, do you know why it's better at figuring out what the jittered faces contain? Are the labels jittered as well?

From my understanding, no. The jitter performs data augmentation on the image, so the label stays the same.

I've created a gitter channel to discuss the future changes/roadmap of the project:
https://gitter.im/PyRecognizer/PyRecognizer-Development

Thank you once again for the interest in the project.

@ClashLuke (Author)

Are the hyperparameter search and the architecture search important? If not, I'd postpone them for now. The basic model already exists. What's next are the training loop and regularization.

@alessiosavi (Owner)

Hi Clash!

Thank you for the effort you put into the analysis! I'm here if you need explanations or tips on the code.

Of course, we can tune the hyperparameters in the next phase :D

@ClashLuke (Author) commented Apr 22, 2020

Pretty sure I've got model tuning in a testable state now here. I had to remove balanced accuracy and precision for now, as I wasn't keen on calculating accuracy in buckets.
Now, how would I go about testing this unit?

@ClashLuke (Author)

Any news?

@alessiosavi (Owner)

Hi Clash, I'm going to rewrite the "backend engine" from scratch using dlib and tensorflow. I'm going to update the repo in the next month.

I'm testing the neural network and it has ~97% accuracy on the validation dataset!
I'm completely changing the architecture of the code.
I think the repository will be split into two different parts:

Python NeuralNetwork, a webserver that runs on localhost, delegated to:

  • Load the image
  • Recognize the face bounds
  • Perform data augmentation using ImageDataGenerator
  • Encode the face using shape_predictor_68_face_landmarks.dat and dlib_face_recognition_resnet_model_v1.dat instead of the face_recognition wrapper, in order to get more accuracy (see the sketch after this list)
  • Use a Tensorflow Dense network

Go webservice:

  • A new Go frontend delegated to talking with the Python daemon in order to expose the predict functionality.

Initially, training will run without HTTP interaction, so scripts will be released in order to train the network "offline".
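
A minimal sketch of the direct dlib encoding path mentioned in the list above; the two .dat files are dlib's standard pretrained models, and the image path is a placeholder:

```python
# Minimal sketch: encoding a face with dlib's pretrained models directly,
# bypassing the face_recognition wrapper. Model/image paths are placeholders.
import dlib

detector = dlib.get_frontal_face_detector()
shape_predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
encoder = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

image = dlib.load_rgb_image("person.jpg")

for detection in detector(image, 1):               # 1 = upsample once
    landmarks = shape_predictor(image, detection)  # the 68 facial landmarks
    # compute_face_descriptor returns the 128D encoding; the third
    # argument is the num_jitters discussed earlier in this thread.
    descriptor = encoder.compute_face_descriptor(image, landmarks, 10)
    print(list(descriptor)[:5])                    # first few dims of the encoding
```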
