During my Data Scientist course at SkillFactory I am solving tasks and working on projects. This repository presents my solutions to those tasks and projects, along with auxiliary code, for potential employers. If you are simply interested in data science, feel free to contact me with any questions.
At the moment, the most illustrative example of my work is the taxi ride duration project, which covers data cleaning, exploratory data analysis, building a regression model and evaluating the quality of the model's predictions: Project-5.Taxi_ride_duration.ipynb
In addition, I am working on a library of reusable data science code: Data Science Helpers
Link: Projects
The goal of the project is to build a regression model that predicts taxi ride duration, making it possible to calculate the ride price. Before modelling, the data is cleaned, basic exploratory data analysis is performed, categorical features are encoded, important features are selected and the data is normalized. Then several regression models are built and compared using the Root Mean Squared Log Error (RMSLE) metric. See Project-5.Taxi_ride_duration.ipynb
For more details see: Project-5 -> readme
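As an illustration, here is a minimal sketch of how the RMSLE metric used to compare the models can be computed with sklearn; the arrays are hypothetical ride durations, not data from the project:

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

# Hypothetical true and predicted ride durations in seconds
y_true = np.array([455.0, 600.0, 1200.0, 310.0])
y_pred = np.array([430.0, 650.0, 1100.0, 350.0])

# RMSLE works on log-scaled values, so it penalizes relative rather
# than absolute errors -- useful for durations spanning several scales
rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))
print(f"RMSLE: {rmsle:.4f}")
```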
The goal of the project is to build a classification model that predicts whether a customer is likely to open a deposit with the bank. This is achieved by going through the steps of the CRISP-DM process: requirements analysis, data analysis, data preparation, model building and model evaluation. See: Project_4.ML.ipynb
For more details see: Project-4 -> readme
The goal of the project was to perform data cleaning and exploratory data analysis, build a regression model and evaluate the quality of the model's predictions. See: Project-3.Hotel_reviews_analysis.ipynb
For more details see: Project-3 -> readme
The goal of the project was to practise SQL on a database containing HeadHunter job openings, using SQL queries to run various analyses and reveal patterns in the data. See: Project-2.HH_Job_openings_analysis.ipynb
For more details see: Project-2 -> readme
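As a sketch of the approach, the snippet below runs a simple aggregate query over such a database with psycopg2; the connection settings and the table and column names are hypothetical, not taken from the project:

```python
import psycopg2

# Hypothetical connection settings -- adjust to the actual database
conn = psycopg2.connect(
    dbname="hh", user="reader", password="secret", host="localhost"
)

# Hypothetical schema: a `vacancies` table with an `employer` column
query = """
    SELECT employer, COUNT(*) AS n_vacancies
    FROM vacancies
    GROUP BY employer
    ORDER BY n_vacancies DESC
    LIMIT 10;
"""

with conn, conn.cursor() as cur:
    cur.execute(query)
    for employer, n_vacancies in cur.fetchall():
        print(employer, n_vacancies)
```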
The goal of the project was to prepare and clean the data before building a machine learning model that predicts a candidate's salary from other candidate attributes. See: Project-1.HH_Resume_analysis.ipynb
For more details see: Project-1 -> readme
Link: Tasks
The goal of the task is to use a naive Bayes classifier to determine whether an email is spam. I used the ComplementNB implementation from the sklearn library. See: Task-9.MATH-ML-7.Naive_bayes_classifier.ipynb
For more details see: Task-9 -> readme
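A minimal, self-contained sketch of this approach with sklearn's ComplementNB; the four toy emails are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import make_pipeline

# Toy corpus for illustration only; 1 = spam, 0 = ham
emails = [
    "win a free prize now", "cheap meds limited time offer",
    "meeting agenda for tomorrow", "please review the attached report",
]
labels = [1, 1, 0, 0]

# Bag-of-words features feeding a complement naive Bayes classifier,
# which handles imbalanced text classes better than plain MultinomialNB
model = make_pipeline(CountVectorizer(), ComplementNB())
model.fit(emails, labels)

print(model.predict(["free prize offer now"]))  # expected: [1]
```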
The goal of the task was to implement coordinate descent and stochastic gradient descent. See: Task-8.MATH-ML-5_optimization.ipynb
For more details see: Task-8 -> readme
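For illustration, a compact numpy sketch of stochastic gradient descent for linear regression (synthetic data, not the task's notebook code):

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=100, seed=42):
    """Fit linear regression weights one randomly ordered sample at a time."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            # Gradient of the squared error on a single sample
            grad = 2 * (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

# Synthetic data: y = 3*x1 - 2*x2 plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + rng.normal(scale=0.1, size=200)
print(sgd_linear_regression(X, y))  # approximately [3, -2]
```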
The goal of the task was to experiment with linear regression: starting at the low level with the Ordinary Least Squares method, proceeding to the standard sklearn linear regression, and adding polynomial features.
For more details see: Task-7 -> readme
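A brief sketch of the progression described above, on synthetic data: OLS via the normal equation, then sklearn's LinearRegression with polynomial features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * x[:, 0] ** 2 - x[:, 0] + rng.normal(scale=0.2, size=100)

# Low-level OLS via the normal equation: w = (X^T X)^(-1) X^T y
X = np.hstack([np.ones_like(x), x])
w = np.linalg.inv(X.T @ X) @ X.T @ y

# The same idea with sklearn, extended with degree-2 polynomial features
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(w, model.score(x, y))
```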
The goal of the task was to apply different hyperparameter optimization methods (Grid Search, Random Search and Tree-structured Parzen Estimators) to two types of models: Logistic Regression and Random Forest. The F1-score is calculated for every method, so the methods can be compared on a given dataset. See: Task-6.ML_predicting_biological_response.ipynb
For more details see: Task-6 -> readme
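As an example of the pattern, a Grid Search over a Random Forest scored with F1 on synthetic data; RandomizedSearchCV and a TPE library such as hyperopt plug into the same workflow:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the biological response dataset
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Exhaustive search over a small grid, scored with F1 as in the task
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [5, 10, None]},
    scoring="f1",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```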
The goal of the task was to apply RFE and filter-based methods for feature selection, identify the important features and train models on those features, then calculate metrics for the models and evaluate which feature selection method is more effective. See: Task-5.ML_feature_selection.ipynb
For more details see: Task-5 -> readme
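A minimal sketch of RFE with sklearn on synthetic data, where only 5 of 20 features are informative:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# 20 features, of which only 5 actually drive the target
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       random_state=0)

# RFE repeatedly drops the weakest feature until 5 remain
selector = RFE(LinearRegression(), n_features_to_select=5)
selector.fit(X, y)
print(selector.support_)  # boolean mask of the selected features
```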
The goal of the task was to explore the following classification models on the bank customer churn dataset: logistic regression, decision tree and random forest; to perform basic tuning of the models (regularization type and C coefficient, maximum depth, maximum number of leaf nodes, respectively); and to identify optimal probability thresholds. See: Task-4.Classification_Of_Bank_Customers_Churn.ipynb
For more details see: Task-4 -> readme
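To illustrate the threshold part, a sketch that sweeps probability cut-offs for a logistic regression on imbalanced synthetic data and picks the one maximizing F1:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data standing in for the churn dataset
X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(C=1.0).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

# Sweep thresholds instead of the default 0.5 cut-off
thresholds = np.linspace(0.1, 0.9, 17)
scores = [f1_score(y_te, proba >= t) for t in thresholds]
print(f"best threshold: {thresholds[int(np.argmax(scores))]:.2f}")
```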
The goal of the task was to perform basic data cleaning and exploratory data analysis and to build a logistic regression model. Diagrams, parameters and metrics were logged to Comet ML. See: Task-3.Introduction-to-comet-ml.ipynb
For more details see: Task-3 -> readme
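The logging itself boils down to a few calls; a minimal sketch (the API key and project name are placeholders):

```python
from comet_ml import Experiment

# Placeholder credentials -- use your own API key and project name
experiment = Experiment(api_key="YOUR_API_KEY",
                        project_name="intro-to-comet-ml")

# Parameters and metrics appear in the Comet ML web dashboard
experiment.log_parameter("C", 1.0)
experiment.log_metric("f1", 0.78)
experiment.end()
```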
The goal of the task was to perform data cleaning and exploratory data analysis (EDA) on a dataset of data science related jobs. The EDA consisted of visual analysis and of selecting and running statistical tests. See: Task-2.Data_Science_job_analysis.ipynb
For more details see: Task-2 -> readme
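As an example of such a test, a Mann-Whitney U comparison of two made-up salary samples with scipy.stats:

```python
from scipy import stats

# Made-up salary samples (thousands) for two job categories
salaries_ds = [95, 110, 105, 120, 98, 115]
salaries_da = [70, 85, 80, 92, 75, 88]

# Mann-Whitney U test: non-parametric, no normality assumption
stat, p_value = stats.mannwhitneyu(salaries_ds, salaries_da,
                                   alternative="two-sided")
print(f"p-value: {p_value:.4f}")  # p < 0.05 -> distributions likely differ
```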
The goal of the task was to analyze the bank customers dataset and, using the visualization library plotly, identify possible reasons why customers are ending their relationship with the bank. See: Task-1.Research_Of_Bank_Customers_Churn.ipynb
For more details see: Task-1 -> readme
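A small sketch of this kind of visual analysis with plotly express; the data frame here is a made-up slice, not the project's dataset:

```python
import pandas as pd
import plotly.express as px

# Made-up churn slice; the real dataset has many more attributes
df = pd.DataFrame({
    "Age": [25, 34, 45, 52, 61, 38, 47, 29],
    "Exited": [0, 0, 1, 1, 1, 0, 1, 0],
})

# Overlay the age distributions of retained vs. exited customers
fig = px.histogram(df, x="Age", color="Exited", barmode="overlay")
fig.show()
```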
Link: Data Science Helpers
In this folder I am collecting auxiliary code for general data science purposes. Currently it contains functions for finding attributes with a given ratio of missing values, low-information attributes, and outliers.
For more details see: Data Science Helpers -> readme
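A condensed sketch of the kind of helpers collected there (these two functions are illustrative rewrites, not the library's actual code):

```python
import pandas as pd

def columns_with_missing(df: pd.DataFrame, threshold: float = 0.5) -> list:
    """Return columns whose share of missing values exceeds the threshold."""
    ratios = df.isna().mean()
    return ratios[ratios > threshold].index.tolist()

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside the Tukey fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)
```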
In the course of my studies I have mastered:
- Python
- SQL
- pandas
- sklearn
- numpy
- plotly express
- scipy.stats
- psycopg2
- BeautifulSoup
- Git
Dmitriy Golubitskiy | LinkedIn Profile | Codewars Profile