Skip to content
/ oasis_infobyte Public template

Hey folks this is 4 week data science internship programme provide by oasis onfobyte

Notifications You must be signed in to change notification settings

kabeer09/oasis_infobyte

Repository files navigation

Import necessary libraries

import pandas as pd from sklearn.model_selection import train_test_split from sklearn.feature_extraction.text import CountVectorizer from sklearn.naive_bayes import MultinomialNB from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

Load the SMS Spam Collection dataset from UCI ML Repository

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00228/smsspamcollection.zip" df = pd.read_csv(url, compression='zip', sep='\t', names=['label', 'message'])

Preprocess the data

df['label'] = df['label'].map({'ham': 0, 'spam': 1}) # Convert labels to binary (0: ham, 1: spam)

Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(df['message'], df['label'], test_size=0.2, random_state=42)

Convert text data to a matrix of token counts

vectorizer = CountVectorizer() X_train_matrix = vectorizer.fit_transform(X_train) X_test_matrix = vectorizer.transform(X_test)

Train a Naive Bayes classifier

classifier = MultinomialNB() classifier.fit(X_train_matrix, y_train)

Make predictions on the test set

predictions = classifier.predict(X_test_matrix)

Evaluate the model

accuracy = accuracy_score(y_test, predictions) conf_matrix = confusion_matrix(y_test, predictions) classification_rep = classification_report(y_test, predictions)

Print the results

print(f"Accuracy: {accuracy}") print("Confusion Matrix:") print(conf_matrix) print("Classification Report:") print(classification_rep)

About

Hey folks this is 4 week data science internship programme provide by oasis onfobyte

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published