This is a TensorFlow implementation of the face recognizer described in the paper "FaceNet: A Unified Embedding for Face Recognition and Clustering". The project also uses ideas from the paper "Deep Face Recognition" from the Visual Geometry Group at Oxford.
The code is tested using Tensorflow r1.14 under Windows with Python 3.6.
Date | Update |
---|---|
2019-10-30 | Added changes to work with newer version of TensorFlow and removed scipy deprecated dependencies. |
Model name | LFW accuracy | Training dataset | Architecture |
---|---|---|---|
20180408-102900 | 0.9905 | CASIA-WebFace | Inception ResNet v1 |
20180402-114759 | 0.9965 | VGGFace2 | Inception ResNet v1 |
NOTE: If you use any of the models, please do not forget to give proper credit to those providing the training dataset as well.
The code is heavily inspired by the OpenFace implementation.
The CASIA-WebFace dataset has been used for training. This training set consists of total of 453 453 images over 10 575 identities after face detection. Some performance improvement has been seen if the dataset has been filtered before training. Some more information about how this was done will come later. The best performing model has been trained on the VGGFace2 dataset consisting of ~3.3M faces and ~9000 classes.
One problem with the above approach seems to be that the Dlib face detector misses some of the hard examples (partial occlusion, silhouettes, etc). This makes the training set too "easy" which causes the model to perform worse on other benchmarks. To solve this, other face landmark detectors has been tested. One face landmark detector that has proven to work very well in this setting is the Multi-task CNN. A Matlab/Caffe implementation can be found here and this has been used for face alignment with very good results. A Python/Tensorflow implementation of MTCNN can be found here. This implementation does not give identical results to the Matlab/Caffe implementation but the performance is very similar.
Currently, the best results are achieved by training the model using softmax loss. Details on how to train a model using softmax loss on the CASIA-WebFace dataset can be found on the page Classifier training of Inception-ResNet-v1 and .
A couple of pretrained models are provided. They are trained using softmax loss with the Inception-Resnet-v1 model. The datasets has been aligned using MTCNN.
The accuracy on LFW for the model 20180402-114759 is 0.99650+-0.00252. A description of how to run the test can be found on the page Validate on LFW. Note that the input images to the model need to be standardized using fixed image standardization (use the option --use_fixed_image_standardization
when running e.g. validate_on_lfw.py
).