Sentiment Analysis on Twitter

Task: Classification

Highlights

Natural Language Processing (NLP)
BERT
Machine Learning
Deep Learning
Exploratory Data Analysis (EDA)

Data Source

The dataset comes from Kaggle and consists of tweets labeled as positive (normal speech) or negative (hate speech).

Size: 50k rows x 2 columns (2/3 for training, 1/3 for testing)
Description: Each row represents a tweet and its corresponding classification as positive or negative.
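The 2/3 to 1/3 split described above can be sketched as follows. This is a minimal illustration only: the README does not say how the split was produced (it may ship pre-split from Kaggle), so the function name `split_rows`, the shuffling, and the seed are all assumptions.

```python
import random


def split_rows(rows, train_fraction=2 / 3, seed=42):
    """Shuffle rows and split them into (train, test) lists.

    Hypothetical helper: the original project may use a different
    splitting method or a pre-split dataset.
    """
    shuffled = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)      # seeded shuffle for reproducibility
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Each row is a (tweet, label) pair, matching the 2-column layout.
rows = [(f"tweet {i}", i % 2) for i in range(50_000)]
train_rows, test_rows = split_rows(rows)
```

With 50k rows this yields roughly 33k training rows and 17k test rows.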

Approaches of Analysis

Task

Vectorize the tweets with a BERT encoder, then classify them with logistic regression, random forest, neural network, and BERT models to distinguish normal speech from hate speech. The goal is to improve the social media environment.

Data Preprocessing

Convert to lowercase
Remove numbers
Remove punctuation
Remove extra whitespace
Remove non-ASCII characters
Remove HTML characters
Tokenization
Remove stopwords
Stemming
Rejoin tokens
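The steps above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the project's actual code: the stopword set is a tiny placeholder, `crude_stem` is a hypothetical stand-in for a real stemmer (such as a Porter stemmer), and HTML entities are decoded first so that punctuation removal does not mangle them.

```python
import html
import re
import string

# Assumption: a tiny illustrative stopword set; a real run would likely
# use a fuller list from an NLP library.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "it"}

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def crude_stem(token: str) -> str:
    """Hypothetical stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet: str) -> str:
    text = html.unescape(tweet)                 # decode HTML entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)        # drop HTML tags
    text = text.lower()                         # convert to lowercase
    text = text.encode("ascii", "ignore").decode("ascii")  # remove non-ASCII characters
    text = re.sub(r"\d+", " ", text)            # remove numbers
    text = text.translate(PUNCT_TABLE)          # remove punctuation
    tokens = text.split()                       # tokenize; also collapses whitespace
    tokens = [t for t in tokens if t not in STOPWORDS]  # remove stopwords
    tokens = [crude_stem(t) for t in tokens]    # stemming (crude approximation)
    return " ".join(tokens)                     # rejoin tokens


cleaned = preprocess("The cats are RUNNING!!! 123 &amp; <b>jumped</b>")
```

Note that aggressive cleaning like this is one design choice among several; BERT tokenizers are usually applied to lightly cleaned text, so the same pipeline may not suit every model in the comparison.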

Visualization

Top 25 Words

Tweet Length Distribution

Word Clouds (General)

Word Clouds (Hate Speech)

Words of Hate Topics
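The word-frequency and tweet-length plots listed above rest on simple counts. A minimal sketch of those counts, assuming the tweets have already been cleaned into space-separated tokens (the function names here are hypothetical, and plotting is left out):

```python
from collections import Counter


def top_words(tweets, k=25):
    """Return the k most frequent tokens as (word, count) pairs."""
    counts = Counter(word for tweet in tweets for word in tweet.split())
    return counts.most_common(k)


def length_distribution(tweets):
    """Map tweet length in tokens to the number of tweets of that length."""
    return Counter(len(tweet.split()) for tweet in tweets)


sample = ["good day", "good good night", "bad"]
most_common_word = top_words(sample, k=1)
lengths = length_distribution(sample)
```

The same counts, restricted to tweets with the hate speech label, would feed the class-specific word cloud.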

Models Used

Tweets are vectorized with a BERT encoder and then classified with the following models:

Logistic regression
Random forest
Neural network
BERT
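The classification stage for two of these models can be sketched with scikit-learn. This is a hedged illustration, not the project's code: random 768-dimensional vectors stand in for BERT embeddings (an assumption, so the sketch runs without downloading a model), the hyperparameters are arbitrary, and the neural network and fine-tuned BERT variants are omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Assumption: synthetic vectors replace real BERT sentence embeddings.
n_per_class, dim = 300, 768
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_per_class, dim)),  # stand-in for label 0
    rng.normal(0.2, 1.0, size=(n_per_class, dim)),  # stand-in for label 1
])
y = np.array([0] * n_per_class + [1] * n_per_class)

# 2/3 training, 1/3 testing, keeping class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]      # probability of label 1
    results[name] = roc_auc_score(y_test, scores)   # area under the ROC curve
```

In the real pipeline, `X` would hold the encoder output for each tweet and `y` the tweet labels; the scores on synthetic data say nothing about performance on the actual dataset.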

Evaluation

Plot the ROC curve and report the area under it (AUC) to evaluate model performance.
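To make the metric concrete, here is a dependency-free sketch of how ROC points and AUC are computed from labels and scores. It assumes labels are encoded as 0/1 with 1 as the class of interest (which class that is in this project is an assumption), and the function names are hypothetical. Libraries such as scikit-learn provide equivalent `roc_curve` and `roc_auc_score` functions.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists, sweeping the threshold from high to low."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])

    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example with this score so ties form a single point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


fpr, tpr = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
area = auc(fpr, tpr)
```

Plotting `fpr` against `tpr` gives the ROC curve; an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.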
