- Natural Language Processing (NLP)
- BERT
- Machine Learning
- Deep Learning
- Exploratory Data Analysis (EDA)
The dataset is sourced from Kaggle and consists of tweets labeled as positive (normal speech) or negative (hate speech).
- Size: 50k rows x 2 columns (2/3 for training, 1/3 for testing)
- Description: Each row represents a tweet and its corresponding classification as positive or negative.
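A minimal sketch of loading the data and making the 2/3 train, 1/3 test split is shown below. The file name `train.csv` and the column names `tweet` and `label` are assumptions about the Kaggle export, not taken from the dataset itself.

```python
# Sketch: load the Kaggle tweets and hold out 1/3 of the rows for testing.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")  # assumed file name from the Kaggle download

X_train, X_test, y_train, y_test = train_test_split(
    df["tweet"],           # assumed column holding the tweet text
    df["label"],           # assumed column holding the positive/negative label
    test_size=1/3,         # 1/3 for testing, 2/3 for training
    random_state=42,
    stratify=df["label"],  # keep the class balance in both splits
)
```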
A BERT encoder is used to vectorize the tweets, and logistic regression, random forest, neural network, and fine-tuned BERT classifiers are then trained to distinguish normal speech from hate speech, with the aim of improving the social media environment.
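The snippet below is a minimal sketch of this vectorize-then-classify step with a logistic regression head. It assumes the `bert-base-uncased` checkpoint, the `transformers` and `scikit-learn` libraries, the train/test splits from the dataset-loading sketch above, and the [CLS] embedding as the tweet vector; the other classifiers can be swapped in the same way, and the preprocessing steps listed next are assumed to have been applied first.

```python
# Sketch: encode tweets with a BERT encoder, then classify with logistic regression.
# X_train, X_test, y_train, y_test come from the dataset-loading sketch above.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

def embed(texts, batch_size=32):
    """Encode a list of tweets into BERT [CLS] vectors."""
    vectors = []
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            batch = tokenizer(texts[i:i + batch_size], padding=True,
                              truncation=True, max_length=64, return_tensors="pt")
            out = encoder(**batch)
            vectors.append(out.last_hidden_state[:, 0, :])  # [CLS] token embedding
    return torch.cat(vectors).numpy()

clf = LogisticRegression(max_iter=1000)
clf.fit(embed(list(X_train)), y_train)
print("test accuracy:", clf.score(embed(list(X_test)), y_test))
```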
- Convert to lowercase
- Remove numbers
- Remove punctuation
- Remove whitespaces
- Remove non-ASCII characters
- Remove HTML characters
- Tokenization
- Remove stopwords
- Stemming
- Rejoin tokens
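One way these cleaning steps could be chained is sketched below with NLTK (an assumption; any tokenizer and stemmer would do). HTML removal is applied before punctuation removal in this sketch so that tags and entities are stripped cleanly.

```python
# Sketch: apply the cleaning steps listed above to a single tweet.
import re
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

STOPWORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def clean_tweet(text: str) -> str:
    text = text.lower()                              # convert to lowercase
    text = re.sub(r"<[^>]+>|&\w+;", " ", text)       # remove HTML tags/entities
    text = re.sub(r"\d+", "", text)                  # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    text = text.encode("ascii", "ignore").decode()   # remove non-ASCII characters
    text = re.sub(r"\s+", " ", text).strip()         # collapse extra whitespace
    tokens = word_tokenize(text)                     # tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]  # remove stopwords
    tokens = [STEMMER.stem(t) for t in tokens]       # stemming
    return " ".join(tokens)                          # rejoin tokens

print(clean_tweet("I LOVED this <b>movie</b> 100%!!!"))  # -> "love movi"
```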
- Top 25 Words
- Tweet Length Distribution
- Word Clouds (General)
- Word Clouds (Hate Speech)
- Words of Hate Topics
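As a sketch, the top-25 word chart and the hate-speech word cloud could be produced as follows, assuming a `clean_tweet` column holding the preprocessed text and the negative label marking hate speech (both are assumptions about the working dataframe, which comes from the loading sketch above).

```python
# Sketch: top-25 word counts and a word cloud for the hate-speech subset.
from collections import Counter
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Top 25 words across all cleaned tweets.
counts = Counter(" ".join(df["clean_tweet"]).split())
words, freqs = zip(*counts.most_common(25))
plt.figure(figsize=(10, 4))
plt.bar(words, freqs)
plt.xticks(rotation=75)
plt.title("Top 25 Words")
plt.tight_layout()
plt.show()

# Word cloud for the hate-speech (negative) tweets; the label encoding is an assumption.
hate_text = " ".join(df.loc[df["label"] == "negative", "clean_tweet"])
cloud = WordCloud(width=800, height=400, background_color="white").generate(hate_text)
plt.figure(figsize=(10, 5))
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.title("Word Cloud (Hate Speech)")
plt.show()
```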