
The project aims to build a proof of concept for evaluating the quality of information collected through social media, in order to configure data preparation pipelines.


Crippius/al_project


al_project

Objective

The objective of this project is to support research in Active Learning through a detailed study of how effective some of its techniques are at collecting and classifying information about an ongoing event. The project examines how these techniques can improve data collection during a dynamic, rapidly changing situation, with the aim of obtaining an accurate and useful information system.

Figure: data pipeline

In particular, the Coreset approach is studied in two variants:

  • Farthest-First

This algorithm picks a labeled data point at random, searches for the unlabeled data point farthest from it, then labels that point and adds it to the labeled set. The process is repeated until the labeling budget is exhausted.

Figure: Farthest-First

  • Minimax

This algorithm focuses on the unlabeled elements: in every iteration it searches for the unlabeled data point that is farthest from its nearest labeled element, then labels it and adds it to the labeled set.

Figure: Minimax
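
The two selection rules described above can be sketched with NumPy as follows. This is a minimal illustration and not the code from the notebooks: the function names `farthest_first` and `minimax`, the use of Euclidean distance, the `seed` parameter, and the choice to draw a new random labeled point in every Farthest-First round are all assumptions made for the example. Both functions also assume that at least one point is already labeled.

```python
import numpy as np


def farthest_first(X, labeled_idx, budget, seed=0):
    """Hypothetical sketch: pick points far from a randomly chosen labeled point."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    labeled = list(labeled_idx)
    available = np.ones(len(X), dtype=bool)  # True while a point is unlabeled
    available[labeled] = False
    selected = []
    for _ in range(budget):
        if not available.any():
            break  # nothing left to label
        # Assumption: a fresh random labeled point is drawn in every round.
        anchor = rng.choice(labeled)
        dist = np.linalg.norm(X - X[anchor], axis=1)
        dist[~available] = -np.inf  # never re-select an already labeled point
        pick = int(np.argmax(dist))
        selected.append(pick)
        labeled.append(pick)
        available[pick] = False
    return selected


def minimax(X, labeled_idx, budget):
    """Hypothetical sketch: pick the point farthest from its nearest labeled point."""
    X = np.asarray(X, dtype=float)
    labeled = list(labeled_idx)
    # Distance from every point to its nearest labeled point.
    diffs = X[:, None, :] - X[labeled][None, :, :]
    nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
    nearest[labeled] = -np.inf  # labeled points are not candidates
    selected = []
    for _ in range(budget):
        pick = int(np.argmax(nearest))
        if nearest[pick] == -np.inf:
            break  # every point is already labeled
        selected.append(pick)
        # The new label can only shrink each point's nearest-labeled distance.
        nearest = np.minimum(nearest, np.linalg.norm(X - X[pick], axis=1))
        nearest[pick] = -np.inf
    return selected
```

For the four points `[[0, 0], [1, 0], [5, 0], [10, 0]]` with only index 0 labeled, `farthest_first(..., budget=1)` returns `[3]` and `minimax(..., budget=2)` returns `[3, 2]`. Minimax updates the nearest-labeled distances incrementally, so each round costs one pass over the data instead of recomputing all pairwise distances.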

Repository information

The models are developed in the al_project notebook, and the results are analyzed in the al_analysis notebook.

The report gives an in-depth explanation of the project; for a more visual description, there is also a presentation.

Finally, the models folder stores each neural network in its own .h5 file, together with additional information about the parameters used and the metrics obtained. The saved models can be analyzed further with the Model class inside the al_analysis folder.

N.B. the datasets used for training can't be shared.
