This is a project to train a CNN to accurately count people in videos.
1- Split each input video into frames to build the image dataset
Input: ./data-videos/[*video_name.avi|mp4]
Output: ./dataset/[*video_name]/[*frame.jpg]
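As a rough sketch, the frame-splitting step could be implemented with OpenCV as below; the `extract_frames` helper and the frame naming are illustrative, not the project's actual code:

```python
from pathlib import Path

import cv2

def extract_frames(video_path: str, out_root: str = "./dataset") -> None:
    """Split one video into JPEG frames under ./dataset/<video_name>/."""
    frame_dir = Path(out_root) / Path(video_path).stem
    frame_dir.mkdir(parents=True, exist_ok=True)

    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:          # end of video (or a read error)
            break
        cv2.imwrite(str(frame_dir / f"{index}.jpg"), frame)
        index += 1
    cap.release()

for video in Path("./data-videos").glob("*"):
    extract_frames(str(video))
```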
2- Generate initial annotations for each frame using a pretrained detection model
Automatically generated annotations are not perfect; consider manually optimizing them in CVAT and re-running the next steps (COCO 1.0 format is used to export/import annotations).
Input: ./dataset/[*video_name]/[*frame.jpg]
Output: ./dataset-annotations/[*video_name]/[*frame_annotation.json]
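A minimal sketch of this step, assuming an Ultralytics YOLOv8 model (suggested by the `yolov8s_trained` name used later in this README); the per-frame JSON layout below is illustrative only, since the project exchanges annotations in COCO 1.0 format:

```python
import json
from pathlib import Path

from ultralytics import YOLO

model = YOLO("yolov8s.pt")  # pretrained COCO weights; class 0 is "person"

def annotate(frame_dir: str, out_root: str = "./dataset-annotations") -> None:
    ann_dir = Path(out_root) / Path(frame_dir).name
    ann_dir.mkdir(parents=True, exist_ok=True)

    for frame in sorted(Path(frame_dir).glob("*.jpg")):
        result = model(str(frame), classes=[0])[0]   # keep only people
        boxes = [
            [x1, y1, x2 - x1, y2 - y1]               # COCO-style [x, y, w, h]
            for x1, y1, x2, y2 in result.boxes.xyxy.tolist()
        ]
        (ann_dir / f"{frame.stem}_annotation.json").write_text(
            json.dumps({"image": frame.name, "person_boxes": boxes})
        )
```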
3- Using the image dataset and the associated annotations, build a custom model on top of the pretrained model to optimize person detection for the dataset's image domain
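For reference, fine-tuning with Ultralytics looks roughly like the sketch below; the dataset YAML name and hyperparameters are hypothetical placeholders, and `train.py` is the project's actual entry point for this step:

```python
from ultralytics import YOLO

model = YOLO("yolov8s.pt")           # start from the pretrained weights
model.train(
    data="people_dataset.yaml",      # hypothetical dataset config
    epochs=50,                       # illustrative hyperparameters
    imgsz=640,
)
# Ultralytics saves the fine-tuned weights under runs/detect/train*/weights/.
```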
Install the dependencies:
pip install -r requirements.txt
test.py:
- Visualizes/benchmarks the currently existing annotations
- Generates the dataset and annotations if they don't exist
- Provides feedback on how well the selected model predicts people
merge_cvat_annotations.py:
When importing annotations from CVAT, a lot of information is lost or overwritten (category_id, info, categories, etc.), so this script takes two annotation files and merges their information correctly.
- Input: two relative filenames, which are expected to be:
  - the file with the correct metadata (auto-generated previously by test.py)
  - the file with the correct labels (which was just manually optimized in CVAT and exported)
- Output: writes the new merged file to the current file path (overwrites the existing file)
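The merge logic might look roughly like the sketch below; the field choices mirror the COCO 1.0 keys named above, the single-category assumption and which of the two files gets overwritten are guesses, and the real script may merge more fields:

```python
import json
import sys

def merge(meta_path: str, labels_path: str) -> None:
    with open(meta_path) as f:
        meta = json.load(f)          # correct info/categories (from test.py)
    with open(labels_path) as f:
        labels = json.load(f)        # correct boxes (downloaded from CVAT)

    person_id = meta["categories"][0]["id"]  # assumes one "person" category
    for ann in labels["annotations"]:
        ann["category_id"] = person_id       # restore the overwritten field

    merged = {
        "info": meta["info"],                 # metadata CVAT loses
        "categories": meta["categories"],
        "images": labels["images"],
        "annotations": labels["annotations"], # manually optimized boxes
    }
    with open(meta_path, "w") as f:           # overwrites an existing file,
        json.dump(merged, f)                  # as described above

if __name__ == "__main__":
    merge(sys.argv[1], sys.argv[2])
```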
main.py:
Main project file, which does the following:
- Test the application in real-time using the current video device
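An illustrative real-time loop (not `main.py` itself), assuming OpenCV for capture and the same Ultralytics model as above; the weight filename is a guess based on the `-model yolov8s_trained` flag:

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolov8s_trained.pt")   # the fine-tuned weights (assumed name)
cap = cv2.VideoCapture(0)            # current (default) video device

while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, classes=[0])[0]    # detect people only
    count = len(result.boxes)
    cv2.putText(frame, f"people: {count}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("people-counter", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):    # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```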
Typical workflow:
- First of all, download some video files and move them to the data-videos folder.
- Run test.py to build the dataset and its annotations, generate the initial predictions, and visualize those predictions.
- Create a task on CVAT and import the annotations to visualize the predicted bounding boxes, then manually optimize them as much as needed.
- Download the updated annotations as "COCO 1.0" and run merge_cvat_annotations.py, passing the original annotations and the downloaded annotations as arguments.
- Use train.py to train the pretrained model.
- Run python test.py -model yolov8s_trained to test the trained model against the labels it was trained on.
- Run main.py to test the trained model against the current video device.