This repository contains the outcome of an "Introduction to Machine Learning" course offered by the University of Houston-Hewlett Packard Data Science Institute.

EARNING CRITERIA

Recipients must complete the earning criteria below to earn this badge.
I obtained a PASS grade (above 90%), per the process outlined in the course syllabus, by demonstrating the following five core competencies:
[Exploratory Data Analysis (EDA) using Python]
Demonstrated the ability to use common Python libraries (e.g., Pandas, NumPy, Matplotlib, Scikit-Learn) to complete EDA tasks, including:
  a. Loading data from flat CSV files containing structured, tabular datasets, covering:
     i. local/remote file locations (e.g., local disk, OpenML.org)
     ii. variables of different types (e.g., categorical, numerical)
  b. Summarizing data, including:
     i. numerical and graphical descriptions (e.g., tables, figures)
     ii. uni- and bivariate descriptive statistics (e.g., centrality and dispersion measures, histograms, scatter plots)
  c. Preprocessing data as appropriate, including:
     i. selecting specific columns/rows based on basic index or value criteria
     ii. transforming (e.g., scaling, centering)
     iii. factorizing (e.g., PCA for dimension reduction)
(A minimal sketch of this workflow follows this list.)
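For illustration only (this is not the course's graded code), a minimal sketch of that workflow, assuming the OpenML "iris" dataset and its column names:

```python
# Minimal EDA sketch: load, summarize, select, scale, and reduce a tabular dataset.
# The dataset name ("iris") and the column choices are illustrative assumptions,
# not taken from the course materials.
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load data from a remote location (OpenML.org); as_frame=True yields a DataFrame
iris = fetch_openml("iris", version=1, as_frame=True)
df = iris.frame

# Summarize: univariate descriptive statistics and a bivariate scatter plot
print(df.describe())                                # centrality and dispersion
df.plot.scatter(x="sepallength", y="sepalwidth")    # bivariate view

# Preprocess: select numeric columns, then center/scale and reduce dimensions
X = df.select_dtypes(include="number")
X_scaled = StandardScaler().fit_transform(X)        # transform (scale, center)
X_2d = PCA(n_components=2).fit_transform(X_scaled)  # factorize (PCA)
print(X_2d.shape)
```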
[Supervised Machine Learning (ML) using Python]
Demonstrated the ability to use common Python libraries (e.g., Pandas, NumPy, Matplotlib, Scikit-Learn) to complete supervised ML tasks, including:
  a. Training models to predict classification/regression targets, including:
     i. basic predictive estimators (e.g., kNN, Logistic Regression, linear SVM, Decision Tree)
     ii. non-linear extensions such as kernels or bagging/boosting ensembles (e.g., Random Forest, Stochastic Gradient Boosting)
     iii. multi-layer perceptron estimators with varying network architectures (e.g., number of layers, neurons per layer) and back-propagation optimization parameters (e.g., solver, epochs, learning rate)
(A training sketch follows this list.)
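A minimal training sketch, assuming scikit-learn's bundled iris data; the estimator choices and the MLP architecture are illustrative, not the course's graded configuration:

```python
# Minimal supervised-learning sketch: fit several estimator families on one dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimators = {
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    # MLP: two hidden layers, Adam solver, fixed learning rate (all illustrative)
    "MLP": MLPClassifier(hidden_layer_sizes=(32, 16), solver="adam",
                         learning_rate_init=0.001, max_iter=500, random_state=0),
}
for name, est in estimators.items():
    est.fit(X_train, y_train)
    print(f"{name}: test accuracy = {est.score(X_test, y_test):.3f}")
```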
  b. Selecting an optimally tuned model by assessing an individual estimator's performance potential, relying on:
     i. a common data and measurement framework (e.g., k-fold cross-validation, fixed randomization seed)
     ii. the exploration of several hyperparameter values (e.g., regularization strength)
     iii. considerations about model generalization (e.g., the bias/variance trade-off within measured performance results)
(A tuning sketch follows below.)
  c. Validating the selected model's performance using scoring methods appropriate for the problem/question, including precision, recall, F1 score, and Cohen's Kappa (see the validation sketch below).
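A minimal tuning sketch, assuming Logistic Regression's C (inverse regularization strength) as the hyperparameter and a hypothetical grid of candidate values:

```python
# Minimal model-selection sketch: tune a regularization hyperparameter with
# k-fold cross-validation and a fixed randomization seed.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold

X, y = load_iris(return_X_y=True)

# Common data/measurement framework: same folds and seed for every candidate
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Explore several values of C; large C risks high variance (overfitting),
# small C risks high bias (underfitting)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # illustrative grid
    cv=cv,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```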
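A minimal validation sketch using the scoring methods listed above; the dataset and estimator are illustrative assumptions:

```python
# Minimal validation sketch: score a selected classifier with precision,
# recall, F1, and Cohen's Kappa on a held-out test set.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, cohen_kappa_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Precision, recall, and F1 score per class
print(classification_report(y_test, y_pred))
# Cohen's Kappa: agreement corrected for chance
print("Cohen's Kappa:", cohen_kappa_score(y_test, y_pred))
```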
[Unsupervised Machine Learning (ML) using Python]
Demonstrated the ability to use common Python libraries (e.g., Pandas, NumPy, Matplotlib, Scikit-Learn) to complete unsupervised ML tasks, including:
  a. Vector Quantization estimators (e.g., k-means) that partition the dataset into a fixed number of clusters
  b. Hierarchical Clustering estimators (e.g., agglomerative) that produce nested, multi-level clusters (e.g., dendrograms) based on different linkages (e.g., single, complete)
(A clustering sketch appears at the end of this README.)

[Knowledge of EDA/ML algorithms' high-level structure and logic]
Demonstrated knowledge of the logic behind, and the high-level structure of, the algorithms implemented by common Python libraries (e.g., Pandas, NumPy, Matplotlib, Scikit-Learn) for EDA/ML tasks (e.g., estimators, preprocessing, and scoring methods)

[Reproducible dissemination of EDA/ML results]
Demonstrated the ability to present and discuss my process and findings by creating interactive documents using the Jupyter Notebook web environment
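A minimal clustering sketch; the dataset and cluster counts are assumptions, and SciPy (not listed among the course libraries) is used here only to draw the dendrogram:

```python
# Minimal unsupervised-learning sketch: k-means vector quantization and
# agglomerative (hierarchical) clustering with a dendrogram.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)

# Vector quantization: partition the data into a fixed number of clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("k-means cluster sizes:", [(kmeans.labels_ == k).sum() for k in range(3)])

# Hierarchical clustering with a chosen linkage (here: complete)
agg = AgglomerativeClustering(n_clusters=3, linkage="complete").fit(X)
print("agglomerative labels (first 10):", agg.labels_[:10])

# Dendrogram of the nested, multi-level cluster structure (via SciPy)
plt.figure()
dendrogram(linkage(X, method="complete"), no_labels=True)
plt.title("Complete-linkage dendrogram (illustrative)")
plt.show()
```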