This repository contains the data and the code used in the Polimi Kaggle competition. The application domain is book recommendation. The dataset provided contains implicit interactions of users with books: an interaction is recorded if the user gave the book a rating of at least 4. The main goal of the competition is to discover which items (books) a user will interact with.
The dataset includes around 600k interactions, 13k users, and 22k items (books). The training-test split is a random holdout: 80% training, 20% test. The goal is to recommend a list of 10 potentially relevant items for each user, and MAP@10 is used for evaluation. Any kind of recommender algorithm written in Python may be used.
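As a reference for the evaluation protocol, this is a minimal sketch of how MAP@10 can be computed (function names are our own, not from the competition code):

```python
import numpy as np

def average_precision_at_k(recommended, relevant, k=10):
    """AP@k for one user: average of precision at each position where a hit occurs."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k)

def map_at_k(recommendations, ground_truth, k=10):
    """MAP@k: mean AP@k over all users in the ground truth.

    recommendations: dict user -> ordered list of recommended items
    ground_truth:    dict user -> list of held-out relevant items
    """
    aps = [average_precision_at_k(recommendations[u], ground_truth[u], k)
           for u in ground_truth]
    return float(np.mean(aps))
```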
Deadline 2 (final):
- Public leaderboard: 2nd
- Private leaderboard: 3rd
Deadline 1:
- Public leaderboard: 2nd
- Private leaderboard: 2nd
There were 63 teams in the competition.
The recommender system we used to achieve third position in the challenge was a hybrid model combining:
- Slim Elastic Net
- Item KNN
- RP3beta

We integrated the similarity matrices by assigning a different weight to each recommender, with the weights found via hyperparameter tuning conducted with Optuna.
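A weighted combination of similarity matrices can be sketched as below. The per-matrix normalization and the weight names (`alpha`, `beta`, `gamma`) are illustrative assumptions, not the exact scheme used in the competition code:

```python
import numpy as np
import scipy.sparse as sps

def combine_similarities(S_slim, S_knn, S_rp3, alpha, beta, gamma):
    """Weighted sum of item-item similarity matrices.

    Each matrix is first scaled by its maximum absolute value so that the
    weights are comparable across models (an assumption of this sketch).
    """
    def normalize(S):
        m = np.abs(S).max()
        return S / m if m > 0 else S
    return (alpha * normalize(S_slim)
            + beta * normalize(S_knn)
            + gamma * normalize(S_rp3))

# The weights can then be searched with Optuna, e.g. (hypothetical objective):
# def objective(trial):
#     alpha = trial.suggest_float("alpha", 0.0, 1.0)
#     beta = trial.suggest_float("beta", 0.0, 1.0)
#     gamma = trial.suggest_float("gamma", 0.0, 1.0)
#     S = combine_similarities(S_slim, S_knn, S_rp3, alpha, beta, gamma)
#     return evaluate_map_at_10(S)  # score the hybrid on a validation split
```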
We used XGBoost to further improve the performance of our model, with the following features:
- Top Popular
- RP3beta
- SLIMen
- SLIMbpr
- ItemKNN
- The best hybrid (SLIMen + ItemKNN + RP3b)
- P3alpha
- User profile length
- Item popularity
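The feature table for the reranker can be assembled along these lines. This is a simplified sketch with hypothetical names (`build_feature_table`, `score_fns`); it uses a dense matrix for brevity and shows the XGBoost ranking step only in comments:

```python
import numpy as np
import pandas as pd

def build_feature_table(candidates, score_fns, urm):
    """Build one row per (user, candidate item) pair.

    candidates: dict user -> list of candidate items from the candidate generator
    score_fns:  dict feature name -> f(user, item) returning that model's score
    urm:        binary user-item interaction matrix (dense numpy array here)
    """
    profile_len = urm.sum(axis=1)  # user profile length feature
    popularity = urm.sum(axis=0)   # item popularity feature
    rows = []
    for user, items in candidates.items():
        for item in items:
            row = {"user": user, "item": item,
                   "profile_len": profile_len[user],
                   "item_pop": popularity[item]}
            for name, fn in score_fns.items():
                row[name] = fn(user, item)  # e.g. RP3beta, SLIMen, hybrid scores
            rows.append(row)
    return pd.DataFrame(rows)

# The table is then fed to a ranker, e.g. (hypothetical parameters):
# import xgboost as xgb
# ranker = xgb.XGBRanker(objective="rank:pairwise")
# groups = df.groupby("user").size().values  # one group of candidates per user
# ranker.fit(df[feature_cols], labels, group=groups)
```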
This is a representation of the architecture:
N.B. the candidate generator we used is strongly optimized for recall rather than MAP.
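For completeness, the recall metric the candidate generator targets can be sketched as (function name is our own):

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of a user's held-out relevant items that appear in the top-k.

    High recall here matters because the reranker can only reorder items the
    candidate generator surfaces; anything it misses is lost for good.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = len(set(recommended[:k]) & relevant)
    return hits / len(relevant)
```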
The hyperparameter tuning was run on:
- Kaggle free GPU plan
- Asus Zenbook
For further information on how the tuning was done and how we structured our work pipeline, you can read the presentation.
This repository is based on Maurizio Ferrari Dacrema's repository.