Update structure

cms-opendata-workshop · Jul 16, 2024 · bdf198d
1 parent d553dc5
commit bdf198d

Showing 8 changed files with 157 additions and 73 deletions.
diff --git a/config.yaml b/config.yaml
@@ -25,7 +25,7 @@ keywords: 'software, data, CMS, hackathon'
 
 # Life cycle stage of the lesson
 # possible values: pre-alpha, alpha, beta, stable
-life_cycle: 'alpha' # FIXME
+life_cycle: 'beta' # FIXME
 
 # License of the lesson
 license: 'CC-BY 4.0'
@@ -63,10 +63,11 @@ contact: '[email protected]'
 
 # Order of episodes in your lesson
 episodes: 
-- 01-particlediscoverylab.md
-- 02-ppp.md
-- 03-ml.md
-- 04-agc.md
+- 01-ppp.md
+- 02-pdl.md
+- 03-ml-1.md
+- 04-ml-2.md
+- 05-agc.md
 
 # Information for Learners
 learners: 

diff --git a/episodes/02-ppp.md b/episodes/01-ppp.md
@@ -28,6 +28,10 @@ This lesson offers a variety of exercises designed to help undergraduates unders
 
 The Particle Physics Playground provides an engaging and interactive way for participants to delve into the core principles of particle physics. By working with real CMS Open Data, students will enhance their theoretical knowledge through hands-on experience.
 
+### Pre-learning Lesson
+
+Before diving into the Particle Physics Playground, participants are encouraged to review the [Particle Physics Primer pre-learning lesson](https://cms-opendata-workshop.github.io/workshop2024-lesson-particle-physics-primer/instructor/index.html). This foundational lesson will prepare you for the exercises ahead.
+
 ### Fundamental Concepts
 
 Participants will explore key concepts in particle physics, such as the Standard Model, particle interactions, and conservation laws. Understanding these principles is crucial for analyzing and interpreting particle collision data.

diff --git a/episodes/01-particlediscoverylab.md b/episodes/02-pdl.md
@@ -7,16 +7,18 @@ exercises: 60
 :::::::::::::::::::::::::::::::::::::: questions 
 
 - How can we identify different particles in collision data?
-- What are the characteristics of muons and electrons in the dataset?
+- What are the characteristics of muons in the dataset?
 - How do we perform basic and advanced data analysis in particle physics?
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
 ::::::::::::::::::::::::::::::::::::: objectives
 
-- Understand the basics of particle collision data.
-- Learn how to identify different particles such as muons and electrons.
-- Perform both basic and advanced data analysis tasks.
+- Reconstruct decays of an unknown particle X to 2 muons.
+- Use histograms to display the calculated mass of particle X.
+- Learn to fit and subtract background contributions from data.
+- Understand uncertainty propagation throughout the analysis.
+- Identify the discovered particle and compare its properties to known values.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
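The first objective relies on the invariant mass of the muon pair, M² = (E1 + E2)² - |p1 + p2|² in natural units. Below is a minimal Python sketch, assuming each muon is described by its transverse momentum pT in GeV, pseudorapidity eta and azimuthal angle phi, with the muon mass fixed at about 0.1057 GeV. The helper names `four_vector` and `dimuon_mass` are hypothetical and are not taken from the ParticleDiscoveryLab repository.

```python
import math

MUON_MASS = 0.1057  # GeV, approximate muon mass (assumed constant)


def four_vector(pt, eta, phi, mass=MUON_MASS):
    """Hypothetical helper: convert (pt, eta, phi, mass) to Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return energy, px, py, pz


def dimuon_mass(mu1, mu2):
    """Hypothetical helper: invariant mass of two muons given as (pt, eta, phi) tuples."""
    e1, px1, py1, pz1 = four_vector(*mu1)
    e2, px2, py2, pz2 = four_vector(*mu2)
    # Sum the two four-vectors, then take M^2 = E^2 - |p|^2
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    m2 = e**2 - (px**2 + py**2 + pz**2)
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors
```

For example, `dimuon_mass((45.6, 0.0, 0.0), (45.6, 0.0, math.pi))` describes two back-to-back muons and gives a mass close to 91.2 GeV, near the Z boson mass.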
 
@@ -36,16 +38,29 @@ Participants will learn to identify different particles by analyzing their colli
 
 The lab will guide participants through both basic and advanced data analysis tasks. Initially, they will perform simple tasks such as plotting histograms and calculating basic statistics. As they progress, more advanced techniques will be introduced, including fitting data to theoretical models and performing complex statistical analyses.
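The histogram, fit, and background-subtraction steps (together with the uncertainty-propagation objective) can be sketched with NumPy. This sketch assumes a straight-line background fitted only to sideband bins outside a chosen signal window, a Poisson uncertainty of sqrt(N) on each observed bin, and independent sideband and signal-region bins. The function name and the window choice are illustrative assumptions, not the lab's actual procedure, which may use a different background shape.

```python
import numpy as np


def subtract_linear_background(masses, edges, signal_window):
    """Hypothetical helper: histogram masses, fit a line to the sidebands, subtract it.

    Returns (bin_centers, signal_counts, signal_errors).
    """
    counts, edges = np.histogram(masses, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    lo, hi = signal_window
    sideband = (centers < lo) | (centers > hi)
    # Least-squares straight-line fit to the sideband bins only;
    # cov=True also returns the covariance matrix of (slope, intercept)
    coeffs, cov = np.polyfit(centers[sideband], counts[sideband], deg=1, cov=True)
    background = np.polyval(coeffs, centers)
    # Variance of the fitted line at each bin center: J C J^T with J = [x, 1]
    jac = np.vstack([centers, np.ones_like(centers)]).T
    bkg_var = np.einsum("ij,jk,ik->i", jac, cov, jac)
    signal = counts - background
    # Propagate: Poisson variance of the observed bin plus the background-fit variance
    errors = np.sqrt(counts + np.clip(bkg_var, 0.0, None))
    return centers, signal, errors
```

On a toy sample (a flat background plus a Gaussian peak), the summed `signal` inside the window should roughly recover the number of injected peak events, with `errors` quantifying how well it does so.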
 
+### Instructions for the Exercise
+
+To get started with the Particle Discovery Lab, follow these steps:
+
+1. **Set Up the Python Container**: Ensure you are working within the provided Python container environment.
+2. **Clone the Repository**: Open a terminal in the Python container and run the following command to clone the repository:
+   ```bash
+   git clone https://github.com/bethel-physics/ParticleDiscoveryLab
+   ```
+3. **Follow the Instructions**: Navigate to the cloned repository directory and follow the instructions provided in either the Python script or the Jupyter notebook to complete the exercise.
+
+### Visualize with CMS Spy WebGL
+
+To enhance your understanding and visualization of the particle collision events, use the [CMS Spy WebGL visualizer](https://opendata.cern.ch/visualise/events/cms#). This tool provides a 3D visualization of the CMS collision data, allowing you to better grasp the spatial distribution and interactions of particles.
+
 ::::::::::::::::::::::::::::: callout
 ## You Have Choices!
 
 While ROOT and C++ are essential for early-stage analysis of CMS Open Data in the AOD (Run 1) or MiniAOD (Run 2) formats, participants can use other tools and file formats for downstream analysis or for analyzing Run 2 NanoAOD files. Feel free to choose the tools that best suit your needs and preferences.
 :::::::::::::::::::::::::::::
 
-::::::::::::::::::::::::::::::::::::: keypoints 
-
-- Introduction to particle collision data.
-- Techniques for identifying particles such as muons and electrons.
-- Methods for performing both basic and advanced data analysis.
+::::::::::::::::::::::::::::::::::::: keypoints
 
-::::::::::::::::::::::::::::::::::::::::::::::::
+- Introduction to particle collision data.
+- Techniques for identifying particles such as muons.
+- Methods for performing both basic and advanced data analysis.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
diff --git a/episodes/03-ml-1.md b/episodes/03-ml-1.md
@@ -0,0 +1,61 @@
+---
+title: "Machine Learning with Open Data (Part 1)"
+teaching: 5
+exercises: 0
+---
+
+:::::::::::::::::::::::::::::::::::::: questions 
+
+- How can machine learning be applied to particle physics data?
+- What are the steps involved in preparing data for machine learning analysis?
+- How do we train and evaluate a machine learning model in this context?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Learn the basics of machine learning and its applications in particle physics.
+- Understand the process of preparing data for machine learning.
+- Gain practical experience in training and evaluating a machine learning model.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Introduction to Machine Learning in HEP
+
+This lesson guides participants through applying machine learning techniques to CMS Open Data. It's designed for those with some programming experience and an interest in machine learning.
+
+### Overview
+
+The Machine Learning in High-Energy Physics (HEP) activity provides participants with an introduction to the exciting intersection of machine learning and particle physics. By leveraging CMS Open Data, participants will learn how to apply machine learning algorithms to real-world data, enhancing their analytical skills and understanding of both fields. 
+
+Machine learning plays a crucial role in analyzing the vast amounts of data produced by HEP experiments. It enables researchers to extract meaningful insights, classify particle collisions, and search efficiently for new physics phenomena.
+
+## Data Preparation
+
+A crucial step in any machine learning project is data preparation. Participants will learn how to clean and preprocess CMS Open Data to make it suitable for machine learning algorithms. This includes handling missing values, normalizing data, and creating training and test datasets.
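The three preparation steps named above can be sketched in NumPy. This is one common recipe, not necessarily the one used in the workshop notebooks: rows containing missing values (NaN) are dropped (imputation is an alternative), the data are shuffled and split at random, and features are standardized to zero mean and unit variance using statistics from the training set only. The function name `prepare_features` and its defaults are hypothetical.

```python
import numpy as np


def prepare_features(X, y, test_fraction=0.25, seed=42):
    """Hypothetical helper: drop NaN rows, split into train/test, standardize features."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Handle missing values by dropping any row that contains a NaN
    keep = ~np.isnan(X).any(axis=1)
    X, y = X[keep], y[keep]
    # Shuffle, then split into test and training indices
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    # Normalize with training-set statistics only, so no test information leaks in
    mean = X[train_idx].mean(axis=0)
    std = X[train_idx].std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    X_norm = (X - mean) / std
    return X_norm[train_idx], X_norm[test_idx], y[train_idx], y[test_idx]
```

Fitting the normalization on the training set alone keeps the test set a fair stand-in for unseen collision events.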
+
+## Supervised Learning in HEP
+
+- Definition: Supervised learning involves training a model on a labeled dataset, where each input data point is paired with its corresponding target label or output.
+- Objective: The goal is to learn a mapping from inputs to outputs, based on the labeled examples provided during training.
+- Examples: Classification (predicting categories), regression (predicting continuous values), and sequence prediction tasks are common supervised learning problems.
+- Process: Models are trained using algorithms that minimize the error between predicted and actual outputs, adjusting parameters to improve accuracy.
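To make "minimize the error between predicted and actual outputs" concrete, here is a minimal sketch of a supervised binary classifier: logistic regression trained with plain gradient descent on the mean binary cross-entropy. This model is an illustrative choice; the workshop exercises may use other models and libraries, and the function names and hyperparameter defaults are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.1, epochs=1000):
    """Hypothetical helper: fit w, b so that sigmoid(X @ w + b) approximates labels y (0 or 1)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)  # predicted probabilities for the labeled examples
        # Gradient of the mean binary cross-entropy with respect to w and b
        grad_w = X.T @ (p - y) / len(y)
        grad_b = np.mean(p - y)
        # Step against the gradient to reduce the prediction error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(X, w, b, threshold=0.5):
    """Return class labels (0 or 1) by thresholding the predicted probability."""
    return (sigmoid(np.asarray(X, dtype=float) @ w + b) >= threshold).astype(int)
```

Each input row is paired with its label during training, which is exactly what distinguishes this setting from the unsupervised case.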
+
+## Unsupervised Learning in HEP
+
+- Definition: Unsupervised learning involves training a model on an unlabeled dataset, where the algorithm tries to identify patterns, relationships, or structures in the data without explicit guidance.
+- Objective: The goal is to explore the data and extract meaningful insights, such as clusters, associations, or anomalies.
+- Examples: Clustering (grouping similar data points), anomaly detection (identifying unusual patterns), and dimensionality reduction (reducing the number of features while preserving important information) are common unsupervised learning tasks.
+- Process: Algorithms in unsupervised learning rely on statistical properties of the data to uncover patterns or structures. They do not aim to predict specific outputs but rather to understand the inherent structure of the data.
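As a minimal example of the clustering task listed above, a plain k-means loop groups unlabeled points by their distance to cluster centers; no labels are used at any point. k-means is an illustrative choice of algorithm here, and the function name and defaults are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: cluster rows of X into k groups; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points (keep it if empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # assignments have stabilized
        centers = new_centers
    return labels, centers
```

The algorithm only uses distances between data points, which is the "statistical properties of the data" idea in its simplest form.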
+
+## Model Training and Evaluation
+
+Participants will gain hands-on experience in training and evaluating machine learning models. This includes selecting appropriate algorithms, tuning hyperparameters, and assessing model performance using metrics such as accuracy, precision, recall, and F1 score.
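The four evaluation metrics follow directly from counts of true/false positives and negatives. A minimal sketch for binary labels, assuming 1 marks the positive class (for example, signal events); the function name is hypothetical:

```python
def classification_metrics(y_true, y_pred):
    """Hypothetical helper: accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For `y_true = [1, 1, 1, 0, 0, 0, 1, 0]` and `y_pred = [1, 0, 1, 0, 1, 0, 1, 0]` there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so all four metrics equal 0.75.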
+
+
+::::::::::::::::::::::::::::::::::::: keypoints 
+
+- Introduction to machine learning in particle physics.
+- Data preparation for machine learning analysis.
+- Model training and evaluation techniques.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
diff --git a/episodes/03-ml.md b/episodes/03-ml.md
diff --git a/episodes/04-ml-2.md b/episodes/04-ml-2.md
@@ -0,0 +1,55 @@
+---
+title: "Machine Learning with Open Data (Part 2)"
+teaching: 5
+exercises: 0
+---
+
+:::::::::::::::::::::::::::::::::::::: questions 
+
+- How can machine learning be applied to particle physics data?
+- What are the steps involved in preparing data for machine learning analysis?
+- How do we train and evaluate a machine learning model in this context?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Learn the basics of machine learning and its applications in particle physics.
+- Understand the process of preparing data for machine learning.
+- Gain practical experience in training and evaluating a machine learning model.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Practical Application
+
+CNNs (Convolutional Neural Networks) and autoencoders are both types of neural networks, but they serve different purposes and have distinct architectures:
+
+### CNN (Convolutional Neural Network)
+
+- Purpose: CNNs are primarily used for supervised learning tasks such as image classification, object detection, and image segmentation.
+- Architecture: CNNs consist of convolutional layers that apply learnable filters to input data, capturing spatial hierarchies of features. They typically include pooling layers to reduce spatial dimensions and dense (fully connected) layers for final classification or regression.
+- Training: CNNs are trained with labeled data, optimizing parameters to minimize classification error or regression loss.
+- Applications: CNNs are widely used in computer vision tasks where spatial relationships and local patterns in data (such as images) are important.
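In a CNN the "learnable filters" are small kernels slid across the input. Below is a minimal NumPy sketch of one convolution, ReLU and max-pooling stage. The kernel is fixed by hand purely for illustration, whereas in a trained CNN its values are learned; real frameworks implement these operations far more efficiently, and the function names here are hypothetical.

```python
import numpy as np


def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, which deep-learning libraries call convolution."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Weighted sum of the image patch under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Elementwise non-linearity applied after the convolution."""
    return np.maximum(x, 0.0)


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; leftover rows/columns that don't fill a window are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))
```

With the kernel `[[-1.0, 1.0]]`, `max_pool(relu(conv2d(image, kernel)))` responds only where pixel values increase from left to right, showing how a filter picks out a local pattern and pooling shrinks the spatial dimensions.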
+
+
+### Autoencoders
+
+- Purpose: Autoencoders are used for unsupervised learning tasks such as dimensionality reduction, feature learning, and anomaly detection.
+- Architecture: An autoencoder consists of an encoder network that compresses the input data into a latent representation and a decoder network that reconstructs the input from this representation. Convolutional layers can be used in convolutional autoencoders (CAEs) for image data.
+- Training: Autoencoders are trained on unlabeled data, learning to reconstruct the input data effectively. They are optimized based on reconstruction error or other metrics that measure the quality of the reconstructed output.
+- Applications: Autoencoders are applied in tasks where finding underlying patterns in data or reducing its dimensionality is beneficial, such as in denoising data, anomaly detection, and feature extraction.
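The encoder/decoder structure and reconstruction-error idea can be illustrated with the simplest possible case, a linear autoencoder with a small bottleneck. For squared-error loss, the optimal linear bottleneck spans the top principal components, so this sketch obtains the weights from an SVD instead of gradient-descent training; that substitution is a simplification, and the function names are hypothetical. Samples with a large reconstruction error are candidate anomalies.

```python
import numpy as np


def fit_linear_autoencoder(X, latent_dim):
    """Hypothetical helper: return (mean, W) with encoder z = (x - mean) @ W
    and decoder x_hat = z @ W.T + mean.

    W is taken from an SVD, which gives the optimal linear autoencoder
    for squared reconstruction error.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    W = vt[:latent_dim].T  # shape (n_features, latent_dim)
    return mean, W


def reconstruction_error(X, mean, W):
    """Mean squared reconstruction error per sample."""
    X = np.asarray(X, dtype=float)
    z = (X - mean) @ W      # encode into the latent space
    x_hat = z @ W.T + mean  # decode back to the input space
    return ((X - x_hat) ** 2).mean(axis=1)
```

Non-linear and convolutional autoencoders replace the two matrix products with neural networks, but anomaly detection still works the same way: train on typical events, then flag events the model reconstructs poorly.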
+
+### Key Differences
+
+- Supervised vs Unsupervised: CNNs are supervised learning models that require labeled data for training, while autoencoders are unsupervised models that learn from unlabeled data.
+- Output: CNNs produce predictions (class labels or regression values) based on input data, whereas autoencoders reconstruct input data or extract meaningful representations from it.
+- Use Cases: CNNs are suitable for classification or regression on grid-structured data such as images, whereas autoencoders are used for data exploration, anomaly detection, or preprocessing.
+
+::::::::::::::::::::::::::::::::::::: keypoints 
+
+- Introduction to machine learning in particle physics.
+- Data preparation for machine learning analysis.
+- Model training and evaluation techniques.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
diff --git a/episodes/04-agc.md b/episodes/05-agc.md
diff --git a/index.md b/index.md
@@ -42,10 +42,11 @@ During this hackathon, participants will engage in a variety of self-guided less
 
 #### Key Activities
 
-1. **Particle Discovery Lab**: Analyze real particle collision data from the CMS experiment, identify different particles, and perform both basic and advanced data analysis tasks.
-2. **Particle Physics Playground**: Explore fundamental concepts in particle physics through practical analysis of CMS Open Data, including studying particle decay patterns and understanding particle properties.
-3. **Machine Learning with Open Data**: Apply machine learning techniques to classify different types of particle collisions and gain practical experience in data preparation, model training, and evaluation.
-4. **Advanced Generative Challenge**: Tackle generative modeling tasks using older CMS data, create synthetic data, and validate complex models.
+1. **Particle Physics Playground**: Start with the Particle Physics Primer pre-learning lesson, then try out different exercises exploring fundamental concepts in particle physics.
+2. **Particle Discovery Lab**: Working in the Python container, clone the lab repository and follow the instructions in either the Python script or the Jupyter notebook. Analyze real particle collision data from the CMS experiment, identify different particles, and perform both basic and advanced data analysis tasks.
+3. **Machine Learning 1**: An introduction to using ML in HEP, covering the basic concepts and their practical applications.
+4. **Machine Learning 2**: Hands-on supervised and unsupervised learning exercises, with instructions on which container to use and a section to discuss results and next steps. Apply machine learning techniques to classify different types of particle collisions and gain practical experience in data preparation, model training, and evaluation.
+5. **Analysis Grand Challenge**: A short introduction to this larger exercise, which uses up-to-the-moment HEP software tools, with a link to the full challenge.
 
 By the end of the hackathon, participants will have gained valuable skills and insights into high-energy physics, data analysis, and machine learning. They will also have the opportunity to contribute to the open science community by sharing their findings and collaborating with others.