
# Responsible AI

We want to prototype a method for removing discriminatory biases from data. First, the bias has to be detected and measured. Then the data should be adjusted accordingly, e.g. via undersampling or oversampling. In this way, we want to create unbiased data on which a fair AI can be trained.
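As a minimal sketch of the "detect and measure" step, the difference in group means is one simple bias signal (the records, group labels, and salary values below are purely illustrative):

```python
from statistics import mean

# Hypothetical toy records: (gender, salary). Values are made up for illustration.
records = [
    ("F", 48000), ("F", 50000), ("F", 47000),
    ("M", 55000), ("M", 60000), ("M", 52000), ("M", 58000),
]

def mean_gap(rows, group_a="F", group_b="M"):
    """Difference in mean salary between two groups (a crude bias measure)."""
    a = mean(s for g, s in rows if g == group_a)
    b = mean(s for g, s in rows if g == group_b)
    return b - a

gap = mean_gap(records)
print(f"Mean pay gap (M - F): {gap:.0f}")
```

A real prototype would use a proper fairness metric (e.g. demographic parity or equalized odds on model predictions) rather than raw group means, but the structure is the same.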

Plan:

  1. Find suitable datasets
  2. Train a prediction model (e.g. predict pay)
  3. Run statistical analysis to show bias in the predictions (e.g. against women)
  4. Adjust the training set based on those statistics
  5. Retrain
  6. Profit.
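Step 4 (rebalancing the training set) could be done by oversampling, as mentioned above. A minimal sketch, assuming simple `(group, value)` tuples as the data format (the helper name and toy data are ours, not part of the plan):

```python
import random

def oversample(rows, group_key):
    """Duplicate random samples from under-represented groups
    until every group has as many rows as the largest one."""
    groups = {}
    for row in rows:
        groups.setdefault(group_key(row), []).append(row)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # Re-draw the missing samples at random from the same group.
        balanced.extend(random.choices(members, k=target - len(members)))
    return balanced

random.seed(0)
data = [("F", 48000)] * 3 + [("M", 55000)] * 7
balanced = oversample(data, group_key=lambda r: r[0])
counts = {g: sum(1 for r in balanced if r[0] == g) for g in ("F", "M")}
print(counts)  # both groups end up with 7 samples
```

Undersampling would work the same way in reverse (drawing `min(...)` samples from each group); which one is appropriate depends on how much data the chosen dataset provides.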

Possible Data Sets:

Salary based on gender, experience, education:

Detailed Salary, Gender by name:

Other: https://www.kaggle.com/datasets/kaggle/sf-salaries/discussion/45482 — a discussion of ways to evaluate the pay gap across ethnic groups

https://www.ons.gov.uk/peoplepopulationandcommunity/culturalidentity/ethnicity/datasets/2011censusanalysisethnicityandthelabourmarket — economic activity by gender and ethnicity

https://www.ons.gov.uk/peoplepopulationandcommunity/culturalidentity/ethnicity/datasets/ethnicgroupbyeconomicactivitystatusandoccupationenglandandwalescensus2021

https://www.ons.gov.uk/peoplepopulationandcommunity/culturalidentity/ethnicity/datasets/ethnicgroupbyhighestlevelqualificationenglandandwalescensus2021

https://www-genesis.destatis.de/genesis/online?operation=abruftabelleBearbeiten&levelindex=2&levelid=1684845403062&auswahloperation=abruftabelleAuspraegungAuswaehlen&auswahlverzeichnis=ordnungsstruktur&auswahlziel=werteabruf&code=62111-0005&auswahltext=&nummer=7&variable=7&name=ALT035&werteabruf=Werteabruf#abreadcrumb