This is a project where I practiced training several multi-class wine quality classifiers using the one-vs-all (one-vs-rest) method.
The workflow includes EDA (exploratory data analysis and visualization), data preprocessing (feature selection with the chi-square test, oversampling minority classes with synthetic data, and feature scaling), and training several classification models (logistic regression, linear support vector machine (SVM), kernel SVM, and K-NN).
Feel free to click into the .ipynb notebook for detailed analysis.
The dataset is extremely skewed: minority classes (i.e., wine quality scores) such as '3' and '8' each make up less than 1% of the total population. We can see this by plotting a histogram of the 'quality' column.
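For instance, with pandas and matplotlib (the CSV file name here is a placeholder for whichever wine dataset the notebook uses):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder file name; the UCI wine quality CSVs are
# semicolon-separated, so adjust if your copy differs.
df = pd.read_csv("winequality-red.csv", sep=";")

# Bar plot of how many samples fall into each quality score.
df["quality"].value_counts().sort_index().plot(kind="bar")
plt.xlabel("quality")
plt.ylabel("count")
plt.title("Class distribution of wine quality")
plt.show()
```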
Plotting a heatmap gives a clearer visualization of the correlations between the features:
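A minimal sketch with seaborn, reusing the `df` loaded above:

```python
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 8))
# Pairwise Pearson correlations between all numeric columns.
sns.heatmap(df.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Feature correlation heatmap")
plt.show()
```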
We can further visualize the relationship between each feature and wine quality. Notice that features like "pH", "chlorides", and "residual sugar" have almost no impact on classifying the quality of the wine.
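One way to see this (a sketch, not necessarily the plot type used in the notebook; the column names assume the standard UCI wine quality dataset) is to draw a boxplot of each feature grouped by quality:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Features whose distributions barely shift across quality scores
# contribute little to classification.
for col in ["pH", "chlorides", "residual sugar", "alcohol"]:
    sns.boxplot(x="quality", y=col, data=df)
    plt.title(f"{col} by wine quality")
    plt.show()
```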
- Feature selection using chi-square test
- Drop irrelevant features
- Split dataset
- Apply SMOTE to oversample the minority classes by generating synthetic training samples with K-NN. Note that we do not oversample the test data.
- Feature scaling (the full preprocessing pipeline is sketched below)
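A minimal sketch of these preprocessing steps, assuming scikit-learn and imbalanced-learn; the dropped columns and hyperparameters are illustrative, not necessarily the notebook's exact choices:

```python
import pandas as pd
from sklearn.feature_selection import chi2
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE

X = df.drop(columns=["quality"])
y = df["quality"]

# Chi-square test ranks features against the target (it requires
# non-negative feature values); low scores suggest irrelevance.
scores, p_values = chi2(X, y)
print(pd.Series(scores, index=X.columns).sort_values())

# Drop the low-scoring features (illustrative choice).
X = X.drop(columns=["pH", "chlorides", "residual sugar"])

# Split before oversampling so the test set contains no synthetic data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# SMOTE interpolates between a minority sample and its K nearest
# neighbors; k_neighbors is lowered here because the rarest classes
# have very few samples.
X_train, y_train = SMOTE(k_neighbors=3, random_state=42).fit_resample(X_train, y_train)

# Fit the scaler on the training set only, then transform both sets.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```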
Because of the skewed nature of the dataset, we use the F1-score as the performance metric. After applying the synthetic minority oversampling technique, the K-NN model shows a notable increase in its weighted-average F1-score, from 0.52 to 0.67, and its accuracy rises from 51% to 65%. The other models (logistic regression, linear SVM, and kernel SVM) did not improve as expected.
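A sketch of how the models might be trained and scored under this setup (K-NN is natively multi-class, so only the other three need the one-vs-rest wrapper; the hyperparameters here are scikit-learn defaults, not the notebook's tuned values):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC

models = {
    "logistic regression": OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    "linear SVM": OneVsRestClassifier(LinearSVC()),
    "kernel SVM": OneVsRestClassifier(SVC(kernel="rbf")),
    "K-NN": KNeighborsClassifier(n_neighbors=5),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(name)
    # The report shows per-class F1 plus the weighted average, the
    # headline metric for this imbalanced dataset.
    print(classification_report(y_test, y_pred, zero_division=0))
```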