This project focuses on sentiment analysis using word clouds and natural language processing (NLP) with supervised learning models.
NLP Tools:
- NLTK - This is the traditional go-to library for NLP. It is used mostly in academic and teaching contexts. Let me stress, this does NOT mean that NLTK is the best NLP library out there. Here is a good article on NLTK vs. spaCy.
- TF-IDF Vectorizer - This vectorizer works well in tandem with K-Means clustering. TF-IDF is often more appropriate than a regular count vectorizer because it down-weights terms that appear in most documents, so common words carry less weight than distinctive ones. TF-IDF analysis is a core component of this project!
- CountVectorizer - Implements both tokenization and occurrence counting in a single class.
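To make the difference between the two vectorizers concrete, here is a minimal sketch using scikit-learn. The three-sentence corpus, the variable names, and the choice of two clusters are illustrative assumptions, not data or settings from this project.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus, used only for illustration.
docs = [
    "the movie was great",
    "the movie was terrible",
    "the plot was great fun",
]

# CountVectorizer tokenizes each document and counts term occurrences.
count_vec = CountVectorizer()
counts = count_vec.fit_transform(docs)  # sparse matrix: documents x terms

# TfidfVectorizer tokenizes the same way, then reweights the counts so that
# terms appearing in every document (such as "the") receive a lower weight.
tfidf_vec = TfidfVectorizer()
tfidf = tfidf_vec.fit_transform(docs)

# vocabulary_ maps each term to its column index in the matrix.
the_col = tfidf_vec.vocabulary_["the"]
great_col = tfidf_vec.vocabulary_["great"]
print("count  'the' vs 'great':", counts[0, the_col], counts[0, great_col])
print("tf-idf 'the' vs 'great':", tfidf[0, the_col], tfidf[0, great_col])

# The TF-IDF matrix can be passed directly to K-Means.
# n_clusters=2 is an arbitrary choice for this toy example.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(tfidf)
print("cluster labels:", labels)
```

In the first document both words occur once, so their raw counts are equal, while the TF-IDF weight of "great" is higher than that of "the" because "the" appears in every document.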
Infrastructure Tools: