We want to prototype a method for removing discriminatory bias from data. First, the bias has to be detected and measured. Then the data should be adjusted accordingly, e.g. via undersampling or oversampling. In this way, we want to create unbiased data on which a fair AI can be trained.
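A minimal sketch of the measure-then-adjust idea, assuming pandas and a binary label. The column names `gender` and `high_pay`, the toy data, and both helper functions are hypothetical, and the gap in positive-label rates between groups is only one possible bias measure. The adjustment shown here is plain undersampling: every (group, label) cell is cut down to the size of the smallest cell.

```python
import pandas as pd

def positive_rate_gap(df, group_col, label_col):
    """Largest difference in positive-label rate between any two groups."""
    rates = df.groupby(group_col)[label_col].mean()
    return float(rates.max() - rates.min())

def undersample_to_parity(df, group_col, label_col, seed=0):
    """Shrink every (group, label) cell to the size of the smallest cell."""
    cells = df.groupby([group_col, label_col])
    target = cells.size().min()
    # Sampling without replacement only drops rows, it never duplicates them.
    parts = [cell.sample(n=target, random_state=seed) for _, cell in cells]
    return pd.concat(parts, ignore_index=True)

# Hypothetical toy data: 2 of 6 women and 7 of 10 men are labelled high_pay.
toy = pd.DataFrame({
    "gender": ["f"] * 6 + ["m"] * 10,
    "high_pay": [1, 1, 0, 0, 0, 0] + [1] * 7 + [0] * 3,
})

gap_before = positive_rate_gap(toy, "gender", "high_pay")
balanced = undersample_to_parity(toy, "gender", "high_pay")
gap_after = positive_rate_gap(balanced, "gender", "high_pay")
print(f"gap before: {gap_before:.3f}, gap after: {gap_after:.3f}")
```

Undersampling throws data away, so it is probably only practical when the smallest cell is still reasonably large. The sketch also assumes that every group contains both labels; if a cell is missing entirely, resampling alone cannot restore parity.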
- find nice datasets
- train a model and make predictions (e.g. predict pay)
- run statistics on the predictions to show the bias (e.g. against women)
- adjust the training set based on these statistics
- retrain
- profit.
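The steps above could be prototyped roughly as follows. This is a sketch under several assumptions, not a finished method: it uses scikit-learn's `LogisticRegression` as a stand-in model, synthetic data in which men receive an artificial bonus, a binary `high_pay` target instead of a continuous salary, and oversampling with replacement as the adjustment. All column names and helper functions are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["experience", "is_male"]

def make_toy_data(n=4000, seed=0):
    """Synthetic data: pay depends on experience, plus an unfair bonus for men."""
    rng = np.random.default_rng(seed)
    is_male = rng.integers(0, 2, size=n)
    experience = rng.uniform(0, 20, size=n)
    noise = rng.logistic(scale=3.0, size=n)
    high_pay = (experience + 6 * is_male + noise > 13).astype(int)
    return pd.DataFrame(
        {"experience": experience, "is_male": is_male, "high_pay": high_pay}
    )

def prediction_gap(model, df):
    """Difference in predicted high_pay rate between the two gender groups."""
    pred = pd.Series(model.predict(df[FEATURES]), index=df.index)
    rates = pred.groupby(df["is_male"]).mean()
    return float(rates.max() - rates.min())

def oversample_to_parity(df, seed=0):
    """Grow every (gender, label) cell to the size of the largest cell."""
    cells = df.groupby(["is_male", "high_pay"])
    target = cells.size().max()
    parts = [c.sample(n=target, replace=True, random_state=seed) for _, c in cells]
    return pd.concat(parts, ignore_index=True)

data = make_toy_data()
train, test = data.iloc[:3000], data.iloc[3000:]

# Steps 2 and 3: predict, then measure the bias in the predictions.
baseline = LogisticRegression(max_iter=1000).fit(train[FEATURES], train["high_pay"])
gap_before = prediction_gap(baseline, test)

# Steps 4 and 5: adjust the training set, retrain, and measure again.
adjusted = oversample_to_parity(train)
retrained = LogisticRegression(max_iter=1000).fit(adjusted[FEATURES], adjusted["high_pay"])
gap_after = prediction_gap(retrained, test)

print(f"prediction gap before: {gap_before:.3f}, after: {gap_after:.3f}")
```

On this toy data the gap should shrink but not necessarily vanish, because the model still sees `is_male` as a feature and the resampling only shifts the learned group offsets. Whether the same holds on real salary data is something the prototype would have to show.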
Salary based on gender, experience, education:
Detailed Salary, Gender by name:
- https://www.kaggle.com/datasets/kaggle/sf-salaries
- https://www.kaggle.com/datasets/franjmartin21/gender-by-name
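Since the salary data carries names but no gender column, one option is to infer gender from the first name by joining the two datasets. The sketch below assumes pandas and uses small in-memory frames instead of the real files. The column names (`EmployeeName`, `TotalPay`, `name`, `gender`) are assumptions about the two datasets and should be checked against the actual CSVs; the helper function is hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the two Kaggle files.
salaries = pd.DataFrame({
    "EmployeeName": ["ANNA LEE", "John Smith", "MARIA GARCIA", "Zyx Qwerty"],
    "TotalPay": [90000.0, 100000.0, 80000.0, 70000.0],
})
names = pd.DataFrame({
    "name": ["Anna", "John", "Maria"],
    "gender": ["F", "M", "F"],
})

def add_inferred_gender(salaries, names):
    """Attach a gender column by matching the lower-cased first name."""
    out = salaries.copy()
    out["first_name"] = out["EmployeeName"].str.split().str[0].str.lower()
    lookup = (
        names.assign(first_name=names["name"].str.lower())
        [["first_name", "gender"]]
        .drop_duplicates("first_name")  # keep one gender per name
    )
    # Left join: employees with unknown first names keep gender = NaN.
    return out.merge(lookup, on="first_name", how="left")

labelled = add_inferred_gender(salaries, names)
# groupby drops NaN keys by default, so unmatched names are excluded here.
median_pay = labelled.groupby("gender")["TotalPay"].median()
print(median_pay)
```

Name-based inference is noisy: ambiguous or unlisted names end up unlabelled or mislabelled, which can itself distort the measured pay gap, so the share of unmatched rows is worth reporting alongside any result.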
Other:
- https://www.kaggle.com/datasets/kaggle/sf-salaries/discussion/45482 (a way to evaluate the pay gap by ethnicity)
- https://www.ons.gov.uk/peoplepopulationandcommunity/culturalidentity/ethnicity/datasets/2011censusanalysisethnicityandthelabourmarket (economic activity by gender and ethnicity)