Insights

  1. Data cleaning was crucial: it involved correcting typos and handling missing and duplicated values.
  2. A temporal analysis established the dataset's timeline, with the earliest entry dating from 1931. The dataset also includes a movie that runs for only 45 minutes.
  3. Mithun emerged as the most frequently featured lead actor.
  4. The top-performing and poorly rated movies were identified based on votes and ratings.
  5. The directors with the highest and lowest movie counts were identified.
  6. The distribution of movies over the years exhibited a skewed pattern, with a concentration in the 2015-2019 period.
  7. Movies from 2010 garnered the highest average votes.
  8. Short-duration movies tended to receive higher ratings and votes, suggesting a potential preference for concise films.
  9. Drama consistently maintained popularity, while the Comedy and Action genres first appeared in the dataset in 1953 and 1964, respectively.
  10. Ratings and votes displayed Gaussian-like patterns, with specific peaks and evolving trends over time.
  11. The Random Forest regression outperformed Linear Regression, achieving an R-squared score of 0.79.
  12. The analysis provided a comprehensive understanding of the dataset and its trends, enabling informed decision-making in areas such as movie production and genre selection. Future endeavors could include the development of advanced machine learning models or a more in-depth exploration of specific genres or time periods to unveil additional insights.
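
A few of the insights above (the most frequent lead actor, the number of movies per year, and the R-squared score used to compare the models) can be illustrated with a minimal Python sketch. This is not the project's actual code: the records, the field names (`year`, `lead_actor`) and the helper functions are hypothetical, chosen only to show how such figures could be computed.

```python
from collections import Counter

# Hypothetical records: the field names and values are illustrative only
# and are not taken from the actual dataset.
movies = [
    {"name": "Film A", "year": 2015, "lead_actor": "Actor X"},
    {"name": "Film B", "year": 2017, "lead_actor": "Actor Y"},
    {"name": "Film C", "year": 2017, "lead_actor": "Actor X"},
    {"name": "Film D", "year": 1990, "lead_actor": "Actor X"},
]


def most_frequent_lead(records):
    """Return (actor, count) for the most frequently listed lead actor."""
    counts = Counter(r["lead_actor"] for r in records)
    return counts.most_common(1)[0]


def movies_per_year(records):
    """Return a dict mapping each release year to its number of movies."""
    return dict(Counter(r["year"] for r in records))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

An R-squared of 1.0 means the predictions match the true values exactly, while 0.0 means the model does no better than always predicting the mean rating. The 0.79 reported for the Random Forest model falls between these two reference points.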