Update content
xaviertintin committed Jul 19, 2024
1 parent 6396d22 commit e8aef1c
Showing 1 changed file with 15 additions and 29 deletions.
44 changes: 15 additions & 29 deletions episodes/03-ml-1.md
@@ -20,13 +20,9 @@ exercises: 0

::::::::::::::::::::::::::::::::::::::::::::::::

# Introduction to Machine Learning in HEP
## Overview

Machine learning (ML) is a powerful tool for extracting insights from complex datasets, making it invaluable in high-energy physics (HEP) research. This lesson introduces participants to applying ML techniques to CMS Open Data, providing a structured approach for beginners with some programming experience.

### Overview

The Machine Learning in High-Energy Physics (HEP) activity bridges the gap between data science and particle physics, utilizing CMS Open Data to explore real-world applications. Participants will learn to leverage ML algorithms to analyze particle collision data, enabling them to classify events, discover new particles, and enhance their understanding of fundamental physics.
Machine learning (ML) is a powerful tool for extracting insights from complex datasets, making it invaluable in high-energy physics (HEP) research. The Machine Learning in High-Energy Physics (HEP) activity bridges the gap between data science and particle physics, utilizing CMS Open Data to explore real-world applications. Participants will learn to leverage ML algorithms to analyze particle collision data, enabling them to classify events, discover new particles, and enhance their understanding of fundamental physics.

### Prerequisites

@@ -39,7 +35,7 @@ Before diving into ML in HEP, participants should have a basic understanding of:

[Machine learning](https://www.ibm.com/topics/machine-learning) (ML) is a branch of artificial intelligence (AI) and computer science that focuses on using data and algorithms to enable AI to imitate the way that humans learn, gradually improving its accuracy. If that is not clear, please watch [this video](https://www.youtube.com/watch?v=4RixMPF4xis).

![AI vs ML](../fig/ml-vs-ai.png)
![Artificial Intelligence (AI) is the broader concept of machines being able to carry out tasks in a way that we would consider "smart". Machine Learning (ML) is a subset of AI that involves training algorithms to learn from and make predictions based on data.](../fig/ml-vs-ai.png)

::::::::::::::::::::::::::::::::::::: callout

@@ -48,74 +44,64 @@ Machine learning, deep learning, and neural networks are all sub-fields of artif
::::::::::::::::::::::::::::::::::::::::::::::::


### Neural Networks

To have an overview of neural networks, visit [3Blue1Brown's basics of neural networks, and the math behind how they learn](https://www.3blue1brown.com/lessons/neural-networks).

## Data Acquisition and Understanding

By now we should have a basic understanding of how machine learning works. To apply it in high-energy physics, we also need the following basics.

### CMS Open Data Overview
**CMS Open Data Overview:**
- Accessing and understanding the CMS Open Data repository.
- Types of datasets available (e.g., AOD, MiniAOD, NanoAOD) and their differences.
- Introduction to the CMS experiment and its detectors.

## Data Preparation
## Data Preparation - Cleaning and Preprocessing

As you dive into the hackathon, keep in mind that feature engineering (selecting relevant features, creating new ones to enhance model performance, and using dimensionality reduction techniques) plays a crucial role in both supervised and unsupervised learning. Mastering these techniques will significantly improve your models' ability to learn from and make sense of your data, so be sure to use them in your projects!

### Cleaning and Preprocessing
- [Handling missing data points and outliers.](https://levelup.gitconnected.com/handling-missing-data-and-outliers-in-machine-learning-challenges-and-solutions-c02b1be2ca36)
- [Normalizing data to ensure consistency across features.](https://www.markovml.com/blog/normalization-in-machine-learning#)
- [Exploratory data analysis (EDA) to understand distributions and correlations.](https://medium.com/@avicsebooks/ml-part-7-introduction-to-exploratory-data-analysis-eda-8b781adfce51)
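
The cleaning steps listed above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the lesson's prescribed workflow: the function name `clean_and_standardize`, the `clip_sigma` parameter, and the muon transverse-momentum values are all made up for the example. It drops missing entries, clips outliers to a band around the mean, and then standardizes the feature to zero mean and unit variance.

```python
import math
import statistics


def clean_and_standardize(values, clip_sigma=3.0):
    """Drop missing entries, clip outliers, then standardize the rest."""
    # Treat None and NaN as missing measurements and drop them.
    present = [v for v in values if v is not None and not math.isnan(v)]
    if not present:
        return []

    mean = statistics.fmean(present)
    std = statistics.pstdev(present)

    # Clip values lying more than clip_sigma standard deviations from the mean.
    low, high = mean - clip_sigma * std, mean + clip_sigma * std
    clipped = [min(max(v, low), high) for v in present]

    # Standardize: subtract the mean and divide by the standard deviation.
    mean_c = statistics.fmean(clipped)
    std_c = statistics.pstdev(clipped)
    if std_c == 0:
        return [0.0] * len(clipped)  # a constant feature carries no information
    return [(v - mean_c) / std_c for v in clipped]


# Hypothetical muon transverse momenta in GeV, with two missing entries.
muon_pt = [45.2, None, 38.7, float("nan"), 51.0, 42.1]
scaled_pt = clean_and_standardize(muon_pt)
print(scaled_pt)
```

Dropping missing values is the simplest choice; depending on the dataset, filling them in (imputation) may be more appropriate. In a real analysis you would likely reach for a data-frame or ML library rather than hand-written loops.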

As you dive into the hackathon, keep in mind that feature engineering (selecting relevant features, creating new ones to enhance model performance, and using dimensionality reduction techniques) plays a crucial role in both supervised and unsupervised learning. Mastering these techniques will significantly improve your models' ability to learn from and make sense of your data, so be sure to use them in your projects!

![supervised vs unsupervised learning](../fig/s-vs-us.png)
![Supervised learning involves training a model on labeled data, where the input comes with corresponding output labels, allowing the model to learn the relationship between inputs and outputs. In contrast, unsupervised learning works with unlabeled data, identifying patterns and structures within the data without predefined labels, often used for clustering and association tasks.](../fig/s-vs-us.png)

You can get a glimpse of the differences in [this video](https://www.youtube.com/watch?v=rHeaoaiBM6Y).

## Supervised Learning in HEP

### Basics of Supervised Learning
#### Basics of Supervised Learning
- Understanding labeled datasets and target variables.
- Classification tasks: distinguishing particle types (e.g., muons, electrons).
- Regression tasks: predicting particle properties (e.g., energy, momentum).
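
To make the idea of a labeled dataset concrete, here is a toy classification sketch. It uses a nearest-centroid classifier, chosen only because it fits in a few lines; it is not an algorithm this lesson prescribes. The two features (a calorimeter energy fraction and a count of muon-chamber hits), the data values, and the helper names `fit_centroids` and `predict` are all hypothetical.

```python
from statistics import fmean


def fit_centroids(features, labels):
    """Training step: average the feature vectors of each labeled class."""
    grouped = {}
    for row, label in zip(features, labels):
        grouped.setdefault(label, []).append(row)
    return {
        label: [fmean(column) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, row):
    """Prediction step: return the label of the closest class centroid."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Made-up training set: [calorimeter energy fraction, muon-chamber hits].
train_features = [[0.05, 4], [0.10, 3], [0.08, 5], [0.90, 0], [0.85, 1], [0.95, 0]]
train_labels = ["muon", "muon", "muon", "electron", "electron", "electron"]

centroids = fit_centroids(train_features, train_labels)
print(predict(centroids, [0.07, 4]))   # a muon-like candidate
print(predict(centroids, [0.88, 0]))   # an electron-like candidate
```

The labels are the target variable: the model only learns the muon/electron distinction because each training row comes with the right answer attached. The two features here sit on different scales, which is one reason normalization matters before training.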

### Model Selection and Training
#### Model Selection and Training
- [Choosing appropriate algorithms](https://www.simplilearn.com/10-algorithms-machine-learning-engineers-need-to-know-article) (e.g., Decision Trees, Random Forests, Neural Networks).
- [Cross-validation techniques](https://www.turing.com/kb/different-types-of-cross-validations-in-machine-learning-and-their-explanations) to optimize model performance.
- [Hyperparameter tuning](https://cloud.google.com/vertex-ai/docs/training/hyperparameter-tuning-overview) to fine-tune model behavior.
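
As a rough illustration of cross-validation, the sketch below splits sample indices into k folds so that every sample is used for validation exactly once. The function name `k_fold_indices` is an assumption made for this example; ML libraries such as scikit-learn ship their own, more complete splitters.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")

    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]

    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

In practice the data should be shuffled before splitting, and a hyperparameter setting is typically judged by its average validation score across all folds rather than by a single split.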

### Model Evaluation
#### Model Evaluation
- [Metrics](https://towardsdatascience.com/metrics-to-evaluate-your-machine-learning-algorithm-f10ba6e38234): accuracy, precision, recall, F1-score.
- Confusion matrices and [ROC curves](https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc) for performance visualization.
- Interpreting results and refining models based on feedback: watch [this video](https://www.youtube.com/watch?v=nt5DwCuYY5c&t) for an explanation of learning curves in machine learning.
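
The metrics above all derive from the four confusion-matrix counts. A minimal sketch for a binary classifier follows; the function name `binary_metrics` and the signal/background labels are illustrative assumptions, with 1 standing for signal and 0 for background.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("need at least one prediction")

    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 1 = signal event, 0 = background event.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision answers "of the events tagged as signal, how many really were?", while recall answers "of the true signal events, how many did we catch?". Which one matters more depends on the analysis.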

![Confusion matrix](../fig/metrics.png)
![A confusion matrix is a table used to evaluate the performance of a classification model. It displays the true positives, true negatives, false positives, and false negatives, providing insight into the accuracy, precision, recall, and overall effectiveness of the model's predictions.](../fig/metrics.png)

## Unsupervised Learning in HEP

### Basics of Unsupervised Learning
#### Basics of Unsupervised Learning
- [Clustering algorithms](https://cloud.google.com/discover/what-is-unsupervised-learning?hl=en#) (K-means, DBSCAN) for grouping similar events.
- Anomaly detection techniques to identify unusual data points.
- [Dimensionality reduction](https://www.ibm.com/topics/dimensionality-reduction) methods (PCA, LDA) for visualizing high-dimensional data.
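
To show what clustering does without labels, here is a small K-means sketch in plain Python. It is a simplified version written for illustration (fixed iteration count, random initial centroids drawn from the data), and the names `k_means`, `nearest_index`, and the toy points are assumptions for this example.

```python
import random
from statistics import fmean


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def k_means(points, k, n_iter=20, seed=0):
    """Alternate between assigning points to centroids and updating centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest_index(point, centroids)].append(point)

        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            [fmean(column) for column in zip(*cluster)] if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]

    labels = [nearest_index(point, centroids) for point in points]
    return centroids, labels


# Two made-up, well-separated groups of events in a 2D feature space.
points = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (5.0, 5.1), (4.9, 5.0), (5.1, 4.9)]
centroids, labels = k_means(points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which events end up grouped together. On less cleanly separated data, K-means can settle into different groupings depending on the initial centroids, so it is common to run it several times.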

### Applications in Particle Physics
#### Applications in Particle Physics
- Searching for signs of new particles through anomaly detection.
- Grouping events based on similar characteristics (clustering).
- Simplifying complex datasets for further analysis.
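
As a very simple stand-in for anomaly detection, the sketch below flags values that sit far from the bulk of a distribution using the modified z-score, which is based on the median and the median absolute deviation (MAD). This is one basic statistical technique among many, not the method used in any particular CMS analysis; the function name `flag_anomalies` and the invariant-mass values are invented for the example.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return [False] * len(values)  # no spread, so nothing stands out

    # The factor 0.6745 makes the score comparable to a standard z-score
    # for normally distributed data; 3.5 is a commonly used cutoff.
    return [abs(0.6745 * (v - center) / mad) > threshold for v in values]


# Made-up dimuon invariant masses in GeV, with one value far from the rest.
masses = [91.0, 90.5, 91.3, 90.8, 91.1, 125.0, 90.9]
flags = flag_anomalies(masses)
print([m for m, is_odd in zip(masses, flags) if is_odd])
```

A flagged event is only a candidate for closer inspection. It may equally well come from detector noise, a reconstruction problem, or an ordinary statistical fluctuation.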

## Advanced Topics and Applications

### Deep Learning and Neural Networks
- Introduction to deep learning architectures (CNNs, RNNs).
- Applications in image analysis (e.g., detector images) and sequential data (e.g., event sequences).

### Reinforcement Learning
- Exploring dynamic decision-making processes in particle physics experiments.
- Potential applications in optimizing experimental setups and data collection strategies.

# Conclusion
## Conclusion

The Machine Learning with Open Data lesson equips participants with fundamental skills to apply ML techniques effectively in high-energy physics research. By mastering data preparation, model training, and evaluation, participants gain insights into both machine learning principles and their practical applications in particle physics.
