Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #39 from danilyef/readme_branch
README.md
Showing 45 changed files with 405 additions and 438 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,22 +1,84 @@ | ||
# How to start: | ||
# Machine Learning in Production | ||
|
||
#### create virtual environment in the root folder: | ||
![image](intro.jpg) | ||
|
||
|
||
The **Machine Learning in Production Course** is a comprehensive curriculum designed to equip learners with the knowledge and practical skills needed to build, deploy, and manage machine learning systems at scale. The course combines theoretical insights with hands-on assignments to prepare participants for real-world challenges in MLOps (Machine Learning Operations). Below is an overview of the key topics covered in this course: | ||
|
||
### **Course Modules** | ||
|
||
1. **MLOps Introduction** | ||
- Fundamentals of MLOps and its importance in modern machine learning workflows. | ||
|
||
2. **Infrastructure Setup** | ||
- Setting up infrastructure for machine learning projects. | ||
- Focus on tools, cloud platforms, and deployment environments. | ||
|
||
3. **Data Storage and Processing** | ||
- Best practices for managing data at scale. | ||
- Storage strategies, data preprocessing, and pipelines. | ||
|
||
4. **Versioning and Labeling** | ||
- Version control for datasets and models. | ||
- Effective labeling and validation strategies. | ||
|
||
5. **Training and Experimentation** | ||
- Designing robust training pipelines and running experiments. | ||
- Tools for tracking metrics and improving model performance. | ||
|
||
6. **Testing and CI/CD** | ||
- Implementing testing strategies for machine learning systems. | ||
- Continuous Integration and Continuous Deployment for ML projects. | ||
|
||
7. **Orchestration with Kubeflow and Airflow** | ||
- Automating workflows using orchestration tools like Kubeflow and Airflow. | ||
|
||
8. **Orchestration with Dagster** | ||
- Advanced orchestration techniques with Dagster. | ||
|
||
9. **Serving Basics** | ||
- Fundamentals of serving machine learning models via APIs. | ||
|
||
10. **Inference Servers** | ||
- Understanding inference servers and optimizing their performance. | ||
|
||
11. **Advanced Serving Features and Benchmarking** | ||
- Advanced serving techniques and benchmarking model performance. | ||
|
||
12. **Scaling Infrastructure and Models** | ||
- Techniques for scaling machine learning models and infrastructure to handle production workloads. | ||
|
||
13. **Monitoring and Observability** | ||
- Tools and techniques for monitoring ML systems in production. | ||
- Implementing observability to track model health and data quality. | ||
|
||
14. **Tools, LLMs, and Data Moats** | ||
- Exploring state-of-the-art tools and methodologies. | ||
- Leveraging large language models (LLMs) and building competitive data strategies. | ||
|
||
15. **ML Platforms** | ||
- Overview of ML platforms and their role in scaling machine learning operations. | ||
|
||
|
||
### How to start: | ||
|
||
1. **Create a virtual environment in the root folder:** | ||
```bash | ||
cd /path/to/your/root/folder | ||
python -m venv env | ||
``` | ||
|
||
#### activate virtual environment: | ||
2. **Activate the virtual environment:** | ||
```bash | ||
source env/bin/activate | ||
``` | ||
|
||
#### upgrade pip: | ||
3. **Upgrade pip:** | ||
```bash | ||
python -m pip install --upgrade pip | ||
``` | ||
|
||
#### install requirements: | ||
4. **Install requirements:** | ||
```bash | ||
pip install -r main_requirements.txt | ||
``` |
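To confirm that the activation step took effect, one option is to ask the interpreter itself. The sketch below is not part of the repository; the helper name `in_virtualenv` is hypothetical. It relies only on the standard-library fact that `sys.prefix` differs from `sys.base_prefix` inside an environment created with `python -m venv`.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

Running this with the `python` found on `PATH` after `source env/bin/activate` should report that an environment is active; running it with the system interpreter should not.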
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
# Homework 3: Storage and Processing | ||
|
||
## Tasks: | ||
|
||
- PR1: Write README instructions detailing how to deploy MinIO in three ways: locally, with Docker, and on Kubernetes (K8s). | ||
- PR2: Develop a CRUD Python client for MinIO and accompany it with comprehensive tests. | ||
- PR3: Write code to benchmark various Pandas formats in terms of data saving/loading, focusing on load time and save time. | ||
- PR4: Create code to benchmark inference performance using single and multiple processes, and report the differences in time. | ||
- PR5: Develop code for converting your dataset into the StreamingDataset format. | ||
- PR6: Write code for transforming your dataset into a vector format, and utilize VectorDB for ingestion and querying. | ||
|
||
|
||
### PR6: example | ||
|
||
```bash | ||
python main.py create-index | ||
python main.py search-index "Who are you?" --top-n 2 | ||
``` |
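The `main.py` behind the PR6 commands is not shown in this diff. As a rough illustration of how a CLI with `create-index` and `search-index` subcommands and a `--top-n` option could be wired up, here is a minimal sketch using only the standard library. Everything in it is an assumption made for illustration: the toy `DOCUMENTS` corpus, the bag-of-words `embed` function (a stand-in for a real embedding model), and the in-memory index (a stand-in for a real vector database).

```python
import argparse
import math
import re
from collections import Counter

# Hypothetical toy corpus; the homework would index its own dataset.
DOCUMENTS = [
    "I am a retrieval demo built for the homework.",
    "MinIO is an S3-compatible object store.",
    "Who are you and what can you do?",
]


def embed(text):
    """Turn text into a sparse bag-of-words vector (stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def create_index(documents):
    """Build an in-memory index: a list of (document, vector) pairs."""
    return [(doc, embed(doc)) for doc in documents]


def search_index(index, query, top_n):
    """Return the top_n documents ranked by similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_n]]


def build_parser():
    """Mirror the two subcommands used in the example above."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("create-index")
    search = sub.add_parser("search-index")
    search.add_argument("query")
    search.add_argument("--top-n", type=int, default=2)
    return parser


def run(argv):
    """Parse argv and dispatch; a real tool would persist and reload the index."""
    args = build_parser().parse_args(argv)
    index = create_index(DOCUMENTS)
    if args.command == "create-index":
        return [f"indexed {len(index)} documents"]
    return search_index(index, args.query, args.top_n)
```

In this sketch, `run(["search-index", "Who are you?", "--top-n", "2"])` plays the role of the second shell command; a script entry point would pass `sys.argv[1:]` instead.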
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
''' | ||
Before starting the script, create a virtual environment: | ||
1. cd /path/to/your/project | ||
2. python -m venv env | ||
3. source env/bin/activate | ||
4. pip install -r requirements.txt | ||
After these steps, run the script from the command line: | ||
5. python main.py | ||
''' | ||
import time | ||
import multiprocessing as mp | ||
from sklearn.linear_model import LinearRegression | ||
from sklearn.datasets import make_regression | ||
from sklearn.model_selection import train_test_split | ||
import numpy as np | ||
import matplotlib.pyplot as plt | ||
|
||
|
||
# Prepare a sample dataset and model for benchmarking | ||
def create_model_and_data(): | ||
    # Create a synthetic regression dataset with 100 features | ||
    X, y = make_regression(n_samples=200000, n_features=100, noise=0.1, random_state=42) | ||
|
||
    # Hold out 99% of the rows so that most of the data is left for inference | ||
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.99, random_state=42) | ||
|
||
    # Train a simple Linear Regression model | ||
    model = LinearRegression() | ||
    model.fit(X_train, y_train) | ||
|
||
    return model, X_test | ||
|
||
# Single inference task using the sklearn model | ||
def inference_task(args): | ||
    model, data = args | ||
    # Add a fixed 5 ms delay per batch to simulate extra per-request overhead | ||
    time.sleep(0.005) | ||
    # Run the actual model inference on the batch | ||
    return model.predict(data) | ||
|
||
|
||
def single_process_inference(model, batches): | ||
    # perf_counter is a monotonic, high-resolution clock suited to timing | ||
    start_time = time.perf_counter() | ||
|
||
    for batch in batches: | ||
        inference_task((model, batch)) | ||
|
||
    elapsed_time = time.perf_counter() - start_time | ||
    return elapsed_time | ||
|
||
|
||
def multiple_process_inference(model, batches, num_processes=16): | ||
    # The measured time includes pool start-up and pickling the model for every batch | ||
    start_time = time.perf_counter() | ||
|
||
    with mp.Pool(processes=num_processes) as pool: | ||
        pool.map(inference_task, [(model, batch) for batch in batches]) | ||
|
||
    elapsed_time = time.perf_counter() - start_time | ||
    return elapsed_time | ||
|
||
|
||
if __name__ == '__main__': | ||
    model, X_test = create_model_and_data() | ||
|
||
    batch_sizes = [100, 2000] | ||
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 6)) | ||
|
||
    colors = ['#1f77b4', '#ff7f0e'] | ||
|
||
    for i, batch_size in enumerate(batch_sizes): | ||
        # Ceiling division: number of batches needed to cover every row | ||
        num_batches = len(X_test) // batch_size + (1 if len(X_test) % batch_size != 0 else 0) | ||
        data_batches = np.array_split(X_test, num_batches) | ||
|
||
        single_process_time = single_process_inference(model, data_batches) | ||
        multiple_process_time = multiple_process_inference(model, data_batches) | ||
|
||
        methods = ['Single Process', 'Multiple Processes'] | ||
        times = [single_process_time, multiple_process_time] | ||
|
||
        # One subplot per batch size | ||
        ax = ax1 if i == 0 else ax2 | ||
        ax.bar(methods, times, color=colors) | ||
        ax.set_title(f'Inference Time Comparison (Batch Size: {batch_size})') | ||
        ax.set_xlabel('Method') | ||
        ax.set_ylabel('Time (seconds)') | ||
|
||
    plt.tight_layout() | ||
    plt.savefig('inference_time_comparison.jpg') | ||
    plt.close(fig) |
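The script computes `num_batches` by ceiling division and then calls `np.array_split`, which produces near-equal chunks, not chunks of exactly `batch_size` rows. The sketch below reproduces the resulting chunk sizes in plain Python so the effect is easy to inspect without NumPy. The helper name `split_sizes` is hypothetical, and the row counts in the usage note are illustrative inputs, not measured values.

```python
def split_sizes(n_rows, batch_size):
    """Chunk sizes np.array_split yields for the script's num_batches value."""
    if n_rows <= 0 or batch_size <= 0:
        raise ValueError("n_rows and batch_size must be positive")
    # Ceiling division, equivalent to the script's
    # len(X_test) // batch_size + (1 if remainder else 0)
    num_batches = -(-n_rows // batch_size)
    base, extra = divmod(n_rows, num_batches)
    # array_split gives the first `extra` chunks one more row than the rest
    return [base + 1] * extra + [base] * (num_batches - extra)
```

For example, `split_sizes(10, 4)` gives three chunks of 4, 3 and 3 rows, so no chunk exceeds `batch_size` but some can be smaller. When the row count is an exact multiple of the batch size, as with 198,000 rows and a batch size of 2,000, every chunk has exactly `batch_size` rows.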
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
pandas==2.2.2 | ||
numpy==1.26.4 | ||
pyarrow==17.0.0 | ||
matplotlib==3.8.4 | ||
tables==3.9.2 | ||
scikit-learn==1.5.1 |
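After installing from a pinned file like this one, it can be useful to check that the active environment actually matches the pins. The following is a minimal sketch using `importlib.metadata` from the standard library. The helper names `parse_pins` and `check_installed` are hypothetical, and the parser only handles simple `name==version` lines, not the full requirements syntax.

```python
import re
from importlib import metadata

# Matches simple pins such as "pandas==2.2.2"; other specifier forms are ignored.
PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*={2,3}\s*(\S+)\s*$")


def parse_pins(lines):
    """Map package name to pinned version for every simple pin line."""
    pins = {}
    for line in lines:
        match = PIN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def check_installed(pins):
    """Return {name: (wanted, found, matches)}; found is None if not installed."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        report[name] = (wanted, found, found == wanted)
    return report
```

A typical use would be to read the requirements file, pass its lines to `parse_pins`, and print any entry of `check_installed` whose last field is `False`.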
This file was deleted.