This repository contains the code and documentation for the Machine Learning project completed as part of the Machine Learning course at Ghent University. The project explored, and made predictions on, a dataset of 2,500 TripAdvisor listings in East Flanders scraped from the TripAdvisor website. The dataset consisted mainly of tabular, text, and image data.
The project consisted of three sprints:
- Exploratory Data Analysis (EDA): Conducted a thorough EDA, uncovering insights such as which restaurants are recommended for dog walkers and where the expensive restaurants in Ghent are located.
- Traditional ML Methods: Built predictive models with traditional ML methods, including a recommendation system based on image clustering, a "budget filter" model, and an "influencer maker" model that identifies measures a restaurant can take to maximize its reviews.
- Deep Learning Methods: Improved the sprint 2 recommendation system using deep learning, applied anomaly detection to improve the classification models, and built text generation models, such as a fine-tuned GPT-2, to generate restaurant reviews.
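The image-clustering recommendation idea from sprint 2 can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes each restaurant's images have already been converted into a fixed-length feature vector (for example by a pretrained CNN), and the helper names build_clusters and recommend are hypothetical. Restaurants are grouped with scikit-learn's KMeans, and the recommendations for a restaurant are the other members of its cluster, ranked by Euclidean distance.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical helpers for illustration; not taken from the project code.
# Assumes `features` is an (n_restaurants, d) array of precomputed image features.


def build_clusters(features: np.ndarray, n_clusters: int, seed: int = 0) -> np.ndarray:
    """Assign each restaurant's image-feature vector to a k-means cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(features)


def recommend(features: np.ndarray, labels: np.ndarray, query: int, k: int = 5) -> list:
    """Return up to k restaurant indices from the query's cluster, nearest first."""
    same = np.flatnonzero(labels == labels[query])
    same = same[same != query]  # never recommend the query restaurant itself
    dists = np.linalg.norm(features[same] - features[query], axis=1)
    order = np.argsort(dists)[:k]
    return same[order].tolist()


if __name__ == "__main__":
    # Two well-separated synthetic groups standing in for real image features.
    feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                      [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
    labs = build_clusters(feats, n_clusters=2)
    print(recommend(feats, labs, query=0, k=2))
```

Restricting candidates to the query's cluster keeps recommendations visually similar, while ranking by distance inside the cluster orders those candidates when a cluster is large.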
The code for each sprint is organized into separate folders.
To run the code in this repository, you will need to have Python and the following packages installed:
- pandas
- numpy
- scikit-learn
- keras
- tensorflow
- matplotlib
- seaborn
- transformers
You can install these packages with pip by running: pip install -r requirements.txt
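To confirm the installation worked, a small check script along these lines can help. It is a sketch, not part of the repository: the function name missing_packages is hypothetical, and it simply uses importlib.util.find_spec from the standard library to see whether each package's import name can be found. Note that some import names differ from the pip names (scikit-learn is imported as sklearn).

```python
import importlib.util

# pip name -> import name; scikit-learn is imported as "sklearn".
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "keras": "keras",
    "tensorflow": "tensorflow",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "transformers": "transformers",
}


def missing_packages(required: dict) -> list:
    """Hypothetical helper: return pip names whose import name cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```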
To run a sprint's code, navigate to that sprint's folder and run the code files there.
If you would like to contribute to this project, please fork the repository and submit a pull request.
We would like to thank the Machine Learning course staff at Ghent University for their guidance and support throughout the project.
This project is licensed under the MIT License - see the LICENSE file for details.