The White Box Project introduces a range of methods for opening up the black box of machine learning models. It is based on Interpretable Machine Learning by Christoph Molnar [1], and I recommend reading the book first and then working through this project. If you are an R user, you can find the R code used in the examples here.
The Korean translation can be found here. The translation was carried out in agreement with the author.
If you find any mistranslation, please report it to [email protected] or open an issue. Thank you.
The goal is to analyze various datasets with black box models and to build a pipeline of analysis reports using interpretable methods.
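As a rough illustration of one step in such a pipeline, here is a minimal, numpy-only sketch of permutation feature importance. It works with any model exposed as a prediction callable (for example, the `.predict` method of a fitted Random Forest). The names `permutation_feature_importance`, `mse` and `black_box` are hypothetical and are not taken from this project's code. The "black box" in the demo is a toy function that uses only feature 0, standing in for a trained model.

```python
import numpy as np


def permutation_feature_importance(predict, X, y, loss, n_repeats=5, seed=0):
    """Hypothetical helper: mean increase in loss when each feature is shuffled.

    predict: callable mapping an (n, p) array to n predictions.
    loss:    callable loss(y_true, y_pred) -> float, lower is better.
    """
    rng = np.random.default_rng(seed)
    baseline = loss(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column to break its link with the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(loss(y, predict(X_perm)) - baseline)
        importances[j] = np.mean(increases)
    return importances


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


if __name__ == "__main__":
    # Toy stand-in for a trained black box model: only feature 0 matters.
    def black_box(X):
        return 3.0 * X[:, 0]

    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 3))
    y = black_box(X)
    print(permutation_feature_importance(black_box, X, y, mse))
```

With a fitted scikit-learn, XGBoost or LightGBM model, you would pass `model.predict` as `predict`. scikit-learn 0.22 and later also ships `sklearn.inspection.permutation_importance`, but the pinned 0.21.2 predates it.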
numpy == 1.17.3
scikit-learn == 0.21.2
xgboost == 0.90
tensorflow == 1.14.0
- Titanic: Machine Learning from Disaster (Classification) [2]
- Cervical Cancer (Classification) [3]
- House Prices: Advanced Regression Techniques (Regression) [4]
- Bike Sharing (Regression) [5]
- YouTube Spam (Classification & NLP) [6]
The hyperparameters used to train each model can be found here.
- Random Forest (RF)
- XGBoost (XGB)
- LightGBM (LGB)
- Deep Neural Network (DNN)
Model-specific methods [English|Korean]
- Linear Regression [English|Korean]
- Logistic Regression [English|Korean]
- GLM, GAM and more [English|Korean]
- Decision Tree [English|Korean]
- Decision Rules [English|Korean]
- RuleFit [English|Korean]
- Other Interpretable Models [English|Korean]
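Interpretable models are explained through their own parameters. As a minimal sketch for the linear regression case, the code below fits ordinary least squares with numpy. Each weight is the change in the prediction for a one-unit increase in that feature, holding the others fixed. The per-instance effect is weight times feature value, which is the quantity an effect plot displays. The function names are hypothetical and only illustrate the idea. They are not the project's implementation.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Hypothetical helper: OLS fit returning (intercept, weights)."""
    # Prepend a column of ones so the first coefficient is the intercept.
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[0], coef[1:]


def feature_effects(X, weights):
    """Per-instance effects w_j * x_ij, shape (n_instances, n_features)."""
    return X * weights
```

Because the model is linear, these weights fully describe it globally. The effects show how much each feature contributes to one particular prediction.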
Model-agnostic methods [English|Korean]
- Partial Dependence Plot (PDP) [English|Korean]
- Individual Conditional Expectation (ICE) [English|Korean]
- Accumulated Local Effects (ALE) Plot [English|Korean]
- Feature Interaction [English|Korean]
- Permutation Feature Importance [English|Korean]
- Global Surrogate [English|Korean]
- Local Surrogate (LIME) [English|Korean]
- Scoped Rules (Anchors) [English|Korean]
- Shapley Values [English|Korean]
- SHAP (SHapley Additive exPlanations) [English|Korean]
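To show how two of the model-agnostic methods relate, here is a minimal numpy sketch of Individual Conditional Expectation (ICE) curves and the Partial Dependence Plot (PDP). Each ICE curve sets one feature to each grid value for a single instance, keeping the other features at their observed values. The PDP is the average of those curves. The function names are hypothetical, and `predict` is assumed to be any callable that maps an (n, p) array to n predictions.

```python
import numpy as np


def ice_curves(predict, X, feature, grid):
    """Hypothetical helper: ICE curves, shape (n_instances, len(grid))."""
    curves = np.empty((X.shape[0], len(grid)))
    for k, value in enumerate(grid):
        # astype(float) copies X and avoids truncating grid values for int data.
        X_mod = X.astype(float)
        X_mod[:, feature] = value
        curves[:, k] = predict(X_mod)
    return curves


def partial_dependence(predict, X, feature, grid):
    """Partial dependence: the mean of the ICE curves at each grid value."""
    return ice_curves(predict, X, feature, grid).mean(axis=0)
```

Plotting the ICE curves next to their average can reveal heterogeneous effects that the PDP hides. Like the PDP itself, this sketch assumes the chosen feature is roughly independent of the others.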
[1] Molnar, Christoph. "Interpretable Machine Learning: A Guide for Making Black Box Models Explainable", 2019. https://christophm.github.io/interpretable-ml-book/.
[2] Kaggle Competition: Titanic: Machine Learning from Disaster
[3] Kelwin Fernandes, Jaime S. Cardoso, and Jessica Fernandes. "Transfer Learning with Partial Observability Applied to Cervical Cancer Screening." Iberian Conference on Pattern Recognition and Image Analysis. Springer International Publishing, 2017. [Link]
[4] Kaggle Competition: House Prices: Advanced Regression Techniques
[5] Fanaee-T, Hadi, and Gama, Joao, "Event labeling combining ensemble detectors and background knowledge", Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg, doi:10.1007/s13748-013-0040-3. [Link]
[6] Alberto, T.C., Lochter, J.V., and Almeida, T.A. "TubeSpam: Comment Spam Filtering on YouTube." Proceedings of the 14th IEEE International Conference on Machine Learning and Applications (ICMLA'15), pp. 1-6, Miami, FL, USA, December 2015. [Link]
[7] Lundberg, Scott M., and Su-In Lee. "A unified approach to interpreting model predictions." Advances in Neural Information Processing Systems. 2017. (Korean Version)