Introduction
- Zindi is the first data science competition platform in Africa. It hosts an entire data science ecosystem of scientists, engineers, academics, companies, NGOs, governments and institutions.
- The data provided to us is a subset of Zindi user activity.
- Our task is to determine whether a new user will be active in the upcoming month, using their activity data from previous months.
- This helps Zindi track user activity and improve the platform.
Problem Definition
- To build a model that, given a user's activity data for their sign-up month, can predict that user's activity in the upcoming month.
Objectives
- To perform data analysis and identify criteria for what constitutes an active user.
- To build a model to predict whether the user will engage with the Zindi platform in the upcoming month, based on their activity in the previous months.
Proposed Methodology
- Classify a user as active or inactive in a particular month.
  - KMeans clustering
  - Criteria
- Approaches to solve this problem
  - Concatenate each month's user activity as new columns
  - Data as sequence
  - Activity-based grouping
- Model Building
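
The first two steps above can be illustrated with a minimal sketch. This is not the project's actual pipeline: the record layout `(user_id, month, actions)`, the single "number of actions" feature, and the helper names `two_means_threshold` and `to_wide` are all assumptions made for illustration, since the real data schema and activity criteria are not given here. A small hand-written one-dimensional k-means with two clusters stands in for a library KMeans implementation so that the example runs with the standard library only.

```python
from collections import defaultdict

# Hypothetical activity log: (user_id, month, number_of_actions).
# The real Zindi schema is not described in this document.
activity = [
    ("u1", 1, 14), ("u1", 2, 9),
    ("u2", 1, 1),  ("u2", 2, 0),
    ("u3", 1, 11), ("u3", 2, 0),
    ("u4", 1, 0),  ("u4", 2, 2),
]


def two_means_threshold(values, iterations=20):
    """Run 1-D k-means with k=2 and return the midpoint between the centroids."""
    low, high = float(min(values)), float(max(values))
    for _ in range(iterations):
        # Assignment step: each value joins the nearer centroid (ties go to low).
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_group:  # all values identical, nothing to split
            break
        # Update step: move each centroid to the mean of its group.
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group)
        if (new_low, new_high) == (low, high):  # converged
            break
        low, high = new_low, new_high
    return (low + high) / 2


def to_wide(records, months):
    """Concatenate each month's activity as separate columns, one row per user."""
    wide = defaultdict(lambda: {f"actions_m{m}": 0 for m in months})
    for user, month, count in records:
        wide[user][f"actions_m{month}"] = count
    return dict(wide)


# Assumed criterion: a user-month is "active" if its action count falls
# in the higher of the two clusters, i.e. above the midpoint threshold.
threshold = two_means_threshold([count for _, _, count in activity])

# Features come from month 1, the target is the activity label for month 2.
wide = to_wide(activity, months=[1, 2])
dataset = {
    user: {
        "features": [row["actions_m1"]],
        "target": int(row["actions_m2"] > threshold),
    }
    for user, row in wide.items()
}

for user, example in sorted(dataset.items()):
    print(user, example)
```

The resulting `dataset` has one row per user, which is the shape a standard classifier expects. The "data as sequence" approach would instead keep the months in order as a time series per user.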
Conclusion
- We performed data analysis to decide which features help determine user activity and how they affect user activity.
- Using this information, we derived criteria for classifying a user as active or inactive.
- Finally, we devised several approaches to the problem and built models to predict user activity in the upcoming month.