The objective of this project is to support research in Active Learning by studying in detail how effective some of its techniques are at collecting and classifying information about an ongoing event. It examines how these techniques can improve data collection during a dynamic, rapidly changing situation, with the aim of building an accurate and useful information system.
In particular, two variants of the Coreset approach are studied:
- Farthest-First
This algorithm picks a labeled data point at random, searches for the unlabeled data point farthest from it, then labels that point and adds it to the list. The process is repeated until the labeling budget is reached.
- Minimax
This algorithm focuses on the unlabeled elements: in every iteration it searches for the data point that is farthest from its nearest labeled element, then labels that point and adds it to the labeled set.
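
The two selection rules described above can be sketched as follows. This is a minimal illustration and not the code from the notebooks: the function names `farthest_first` and `minimax`, the use of Euclidean distance, the `seed` parameter, and the choice to draw a new random reference point in every Farthest-First iteration are all assumptions made for the example. Points are plain coordinate tuples and are referred to by index; both functions expect at least one labeled index to start from.

```python
import math
import random


def farthest_first(points, labeled, budget, seed=0):
    """Sketch of the Farthest-First rule (assumed reading of the description).

    Each round draws a random labeled point as reference and labels the
    unlabeled point that is farthest from that reference.
    """
    rng = random.Random(seed)
    labeled = list(labeled)
    labeled_set = set(labeled)
    unlabeled = [i for i in range(len(points)) if i not in labeled_set]
    selected = []
    while len(selected) < budget and unlabeled:
        ref = rng.choice(labeled)  # random labeled reference point
        pick = max(unlabeled, key=lambda i: math.dist(points[i], points[ref]))
        unlabeled.remove(pick)
        labeled.append(pick)  # the new point counts as labeled from now on
        selected.append(pick)
    return selected


def minimax(points, labeled, budget):
    """Sketch of the Minimax rule.

    Each round labels the unlabeled point whose distance to its nearest
    labeled point is largest.
    """
    labeled_set = set(labeled)
    # Distance from every unlabeled point to its nearest labeled point.
    nearest = {
        i: min(math.dist(points[i], points[j]) for j in labeled_set)
        for i in range(len(points))
        if i not in labeled_set
    }
    selected = []
    while len(selected) < budget and nearest:
        pick = max(nearest, key=nearest.get)
        del nearest[pick]
        selected.append(pick)
        # The newly labeled point may now be the nearest one for others.
        for i in nearest:
            nearest[i] = min(nearest[i], math.dist(points[i], points[pick]))
    return selected
```

For example, with `points = [(0, 0), (1, 0), (10, 0), (5, 0)]` and index `0` already labeled, `minimax(points, [0], 2)` selects indices `2` and then `3`, since after labeling the point at `(10, 0)` the point at `(5, 0)` is the one left farthest from any labeled point.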
The models are developed in the al_project notebook, while the results are analyzed in the al_analysis one.
The report gives an in-depth explanation of the project; for a more visual description, there is also a presentation.
Finally, the models folder contains the neural networks, each saved in its own .h5 file along with some additional information about the parameters used and the metrics obtained. They can be analyzed further with the Model class inside the al_analysis folder.
N.B. The datasets used for training cannot be shared.