This repository contains the outcome of an "Introduction to Machine Learning" course offered by the University of Houston-Hewlett Packard Data Science Institute.

EARNING CRITERIA

Recipients must complete the earning criteria to earn this badge.

I obtained a PASS grade (above 90%) as per the process outlined in the course syllabus to demonstrate the following five core competencies:

[Exploratory Data Analysis (EDA) using Python]
Demonstrated the ability to use common Python libraries (e.g., Pandas, NumPy, Matplotlib, Scikit-Learn) to complete EDA tasks, including:

a. Loading data from flat CSV files containing structured, tabular datasets, including:
   i. local/remote file locations (e.g., local disk,
   ii. variables of different types (e.g., categorical, numerical)
b. Summarizing data, including:
   i. numerical and graphical descriptions (e.g., tables, figures)
   ii. univariate and bivariate descriptive statistics (e.g., centrality and dispersion measures, histograms, scatter plots)
c. Preprocessing data as appropriate, including:
   i. selecting specific columns/rows based on basic index or value criteria
   ii. transforming (e.g., scale, center)
   iii. factorizing (e.g., PCA for dimension reduction)

[Supervised Machine Learning (ML) using Python]
Demonstrated the ability to use common Python libraries (e.g., Pandas, NumPy, Matplotlib, Scikit-Learn) to complete supervised ML tasks, including:

a. Training models to predict classification/regression targets, including:
   i. basic predictive estimators (e.g., kNN, Logistic Regression, linear SVM, Decision Tree)
   ii. non-linear extensions such as kernels or bagging/boosting ensembles (e.g., Random Forest, Stochastic Gradient Boost)
   iii. multi-layer perceptron estimators with varying network architectures (e.g., number of layers, neurons per layer) and back-propagation optimization parameter combinations (e.g., solver, epoch, learning rate)
b. Selecting an optimally tuned model by assessing each individual estimator’s performance potential, relying on:
   i. a common data and measurement framework (e.g., k-fold cross-validation, fixed randomization seed)
   ii. the exploration of several hyperparameter values (e.g., regularization)
   iii. considerations about model generalization (e.g., bias/variance trade-off within measured performance results)
c. Validating the selected model’s performance using scoring methods appropriate for the problem/question, including: precision, recall, F1 score, Cohen’s Kappa.

[Unsupervised Machine Learning (ML) using Python]
Demonstrated the ability to use common Python libraries (e.g., Pandas, NumPy, Matplotlib, Scikit-Learn) to complete unsupervised ML tasks, including:

a. Vector Quantization estimators (e.g., k-means) that partition the dataset into a fixed number of clusters
b. Hierarchical Clustering estimators (e.g., agglomerative) that produce nested multi-level clusters (e.g., dendrograms) based on different linkages (e.g., single, complete)

[Knowledge of EDA/ML algorithms’ high-level structure and logic]
Demonstrated knowledge of the logic behind, and the high-level structure of, the algorithms implemented by common Python libraries (e.g., Pandas, NumPy, Matplotlib, Scikit-Learn) for EDA/ML tasks (e.g., estimators, preprocessing and scoring methods).

[Reproducible dissemination of EDA/ML results]
Demonstrated the ability to present and discuss their process and findings by creating interactive documents using the Jupyter Notebook web environment.
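As an illustration of how several of the supervised ML criteria above fit together, here is a minimal sketch, not taken from the course materials. It scales and reduces the data with PCA, tunes a kNN classifier with k-fold cross-validation under a fixed seed, and scores the selected model with precision, recall, F1, and Cohen's Kappa. The synthetic dataset, the `SEED` value, and the parameter grid are assumptions chosen only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import (cohen_kappa_score, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     train_test_split)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42  # assumed fixed randomization seed, for reproducibility

# Synthetic binary classification data (a stand-in for a real CSV dataset).
X, y = make_classification(n_samples=400, n_features=10,
                           n_informative=5, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=SEED)

# Preprocessing (center/scale, PCA) and the estimator in one pipeline,
# so each cross-validation fold fits the transforms on its own training part.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(random_state=SEED)),
    ("knn", KNeighborsClassifier()),
])

# Hypothetical hyperparameter grid to explore.
param_grid = {
    "pca__n_components": [3, 5],
    "knn__n_neighbors": [3, 5, 11],
}

# Common measurement framework: 5-fold stratified CV with a fixed seed.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="f1")
search.fit(X_train, y_train)

# Validate the selected model on the held-out test split.
y_pred = search.predict(X_test)
scores = {
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "kappa": cohen_kappa_score(y_test, y_pred),
}

print("Best parameters:", search.best_params_)
print("Test scores:", scores)
```

Keeping the scaler and PCA inside the pipeline means the held-out fold never influences the preprocessing, which keeps the cross-validated scores honest.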

