diff --git a/episodes/03-ml-1.md b/episodes/03-ml-1.md
index 22e474f..4477eb4 100644
--- a/episodes/03-ml-1.md
+++ b/episodes/03-ml-1.md
@@ -47,14 +47,13 @@ Before diving into ML in HEP, participants should have a basic understanding of:
 ## Data Preparation
 
 ### Cleaning and Preprocessing
-- Handling missing data points and outliers.
-- Normalizing data to ensure consistency across features.
-- Exploratory data analysis (EDA) to understand distributions and correlations.
-
-### Feature Engineering
-- Selecting relevant features for ML models.
-- Creating new features to enhance model performance.
-- Dimensionality reduction techniques (PCA, t-SNE) for visualization and model efficiency.
+- [Handling missing data points and outliers.](https://levelup.gitconnected.com/handling-missing-data-and-outliers-in-machine-learning-challenges-and-solutions-c02b1be2ca36)
+- [Normalizing data to ensure consistency across features.](https://www.markovml.com/blog/normalization-in-machine-learning#)
+- [Exploratory data analysis (EDA) to understand distributions and correlations.](https://medium.com/@avicsebooks/ml-part-7-introduction-to-exploratory-data-analysis-eda-8b781adfce51)
+
+As you dive into the hackathon, keep in mind that feature engineering (selecting relevant features, creating new ones to enhance model performance, and applying dimensionality reduction techniques such as PCA or t-SNE) plays a crucial role in both supervised and unsupervised learning. Mastering these techniques will directly affect how well your models learn from and make sense of your data, so be sure to leverage them effectively in your projects!
+
+![supervised vs unsupervised learning](../fig/s-vs-us.png)
 
 ## Supervised Learning in HEP
 
diff --git a/episodes/fig/s-vs-us.png b/episodes/fig/s-vs-us.png
new file mode 100644
index 0000000..8877313
Binary files /dev/null and b/episodes/fig/s-vs-us.png differ
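For the "Cleaning and Preprocessing" bullets added above, a minimal sketch of those steps is shown below. It assumes Python with pandas and scikit-learn, which the episode does not prescribe, and the toy DataFrame and its column names (`pt`, `eta`, `phi`) are purely hypothetical.

```python
# Minimal preprocessing + dimensionality-reduction sketch (assumes pandas and scikit-learn).
# The DataFrame and column names are hypothetical stand-ins for your own dataset.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical toy data with a missing value and an outlier
df = pd.DataFrame({
    "pt":  [25.1, 40.3, np.nan, 33.7, 250.0],  # 250.0 acts as an outlier
    "eta": [0.5, -1.2, 2.1, 0.3, -0.8],
    "phi": [1.0, 2.9, -2.5, 0.1, 3.0],
})

# Quick EDA: inspect distributions and pairwise correlations
print(df.describe())
print(df.corr())

# Handle missing values (median imputation is robust to outliers)
X = SimpleImputer(strategy="median").fit_transform(df)

# Normalize so every feature has zero mean and unit variance
X = StandardScaler().fit_transform(X)

# Reduce to 2 components for visualization or model efficiency
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)  # (5, 2)
```

Median imputation plus standardization is only one reasonable default; `sklearn.manifold.TSNE` could replace PCA when the goal is purely visualization rather than feeding a downstream model.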