Twitter Sentiment-Analysis

Overview

This repository contains a Twitter sentiment analysis project built with classical machine learning models. The goal is to classify tweets into four sentiment categories: Positive, Negative, Neutral, and Irrelevant. The project covers the full pipeline: preprocessing, feature extraction, model training, evaluation, and saving the final model for deployment.

Dataset

The dataset used for this project is taken from Kaggle: Twitter Entity Sentiment Analysis.

Training set: twitter_training.csv
Validation set: twitter_validation.csv
Columns:
- ID → Tweet ID
- Topic → Topic of the tweet
- Sentiment → Sentiment label (Positive, Negative, Neutral, Irrelevant)
- Text → Original tweet text

Features & Preprocessing

Text Cleaning
- Lowercasing, HTML decoding
- Remove URLs, mentions (@user), hashtags, emojis, and special characters
- Tokenization, stopwords removal, lemmatization
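
The cleaning steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the function name clean_tweet, the regular expressions, and the tiny STOPWORDS set are assumptions, and hashtags are assumed to be dropped whole rather than just stripped of the # sign. Lemmatization is left as an optional callable so that a real lemmatizer (for example NLTK's WordNetLemmatizer) can be plugged in.

```python
import html
import re

# Tiny illustrative stop-word list (assumption); a real run would use a fuller one.
STOPWORDS = frozenset({
    "a", "an", "the", "is", "are", "was", "to", "of",
    "and", "in", "on", "for", "it", "this", "that", "i",
})

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # emojis, digits, punctuation


def clean_tweet(text, stopwords=STOPWORDS, lemmatize=None):
    """Return a cleaned, space-joined token string for one tweet."""
    text = html.unescape(text).lower()   # HTML decoding, then lowercasing
    text = URL_RE.sub(" ", text)         # remove URLs
    text = MENTION_RE.sub(" ", text)     # remove @user mentions
    text = HASHTAG_RE.sub(" ", text)     # remove hashtags (whole tag, by assumption)
    text = NON_ALPHA_RE.sub(" ", text)   # remove emojis and special characters
    tokens = [t for t in text.split() if t not in stopwords]
    if lemmatize is not None:            # optional hook for a real lemmatizer
        tokens = [lemmatize(t) for t in tokens]
    return " ".join(tokens)


print(clean_tweet("I LOVE this game &amp; the new map!! @dev https://t.co/x #hype"))
```

Returning a single string rather than a token list keeps the output directly usable as input to a TF-IDF vectorizer.
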
Label Encoding
- Convert sentiment labels into integers for model training
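
One common way to do this conversion is scikit-learn's LabelEncoder, shown here as a sketch; whether the notebook uses LabelEncoder or a manual mapping is an assumption. The sample labels are made up for illustration.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative labels only; the real values come from the Sentiment column.
labels = ["Positive", "Negative", "Neutral", "Irrelevant", "Positive"]

encoder = LabelEncoder()
y = encoder.fit_transform(labels)  # classes are sorted alphabetically

print(list(encoder.classes_))              # index position = integer code
print(list(y))
print(list(encoder.inverse_transform(y)))  # map codes back to label names
```

Keeping the fitted encoder around makes it possible to turn integer predictions back into readable sentiment names.
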
Feature Extraction
- TF-IDF vectorization with unigrams and bigrams
- Max features: 10,000
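
The feature extraction settings above map onto scikit-learn's TfidfVectorizer roughly as follows. The example texts are invented, and fitting on the training split only (then reusing the fitted vocabulary for validation) is an assumption about how the notebook is organized.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented, already-cleaned example texts.
train_texts = ["love game new map", "hate game lag", "new map great"]
val_texts = ["love new map"]

vectorizer = TfidfVectorizer(
    ngram_range=(1, 2),   # unigrams and bigrams
    max_features=10_000,  # keep at most the 10,000 most frequent terms
)

X_train = vectorizer.fit_transform(train_texts)  # learn vocabulary and IDF weights
X_val = vectorizer.transform(val_texts)          # reuse them on unseen text

print(X_train.shape, X_val.shape)
```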

Models Used

Logistic Regression
Multinomial Naive Bayes
Decision Tree
Random Forest (final model)
Linear SVM
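
As a rough sketch, the five candidates could be set up and compared with scikit-learn like this. The toy texts, the hyperparameters other than the 200 estimators, and the random_state values are assumptions for illustration; the real comparison runs on the Kaggle training and validation files.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for the cleaned tweets and their labels.
train_texts = [
    "love game great map", "amazing update love it",
    "hate game terrible lag", "worst update awful bug",
    "game released today", "patch notes available",
    "buy cheap shoes online", "weather sunny weekend",
]
train_labels = ["Positive", "Positive", "Negative", "Negative",
                "Neutral", "Neutral", "Irrelevant", "Irrelevant"]
val_texts = ["love great update", "terrible awful lag",
             "patch released today", "cheap shoes weekend"]
val_labels = ["Positive", "Negative", "Neutral", "Irrelevant"]

vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=10_000)
X_train = vectorizer.fit_transform(train_texts)
X_val = vectorizer.transform(val_texts)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Multinomial Naive Bayes": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Linear SVM": LinearSVC(),
}

# Fit each candidate on the same features and score it on the validation split.
scores = {}
for name, model in models.items():
    model.fit(X_train, train_labels)
    scores[name] = accuracy_score(val_labels, model.predict(X_val))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```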

Evaluation metrics: Accuracy, Precision, Recall, F1-score, Confusion Matrix, ROC Curve
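
Most of these metrics are available in sklearn.metrics. The sketch below uses hand-written predictions instead of real model output, and macro averaging is an assumption, since the README does not say how per-class scores are combined. The ROC curve is left out because it needs per-class probability or decision scores rather than hard predictions.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_fscore_support,
)

class_names = ["Irrelevant", "Negative", "Neutral", "Positive"]

# Hand-written integer labels for illustration; one Neutral tweet is misclassified.
y_true = [0, 1, 2, 3, 0, 1, 2, 3]
y_pred = [0, 1, 2, 3, 0, 1, 3, 3]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])  # rows: true, cols: predicted

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
print(cm)
print(classification_report(y_true, y_pred, target_names=class_names, zero_division=0))
```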

Final Model: Random Forest with 200 estimators trained on the full dataset.
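
A minimal sketch of training and saving such a model with joblib is shown below. The toy data, the file name sentiment_rf.joblib, the random_state, and the choice to bundle the vectorizer and classifier into one pipeline are all assumptions; the notebook may persist them differently.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy data standing in for the cleaned tweets and their labels.
texts = [
    "love game great map", "amazing update love it",
    "hate game terrible lag", "worst update awful bug",
    "game released today", "patch notes available",
    "buy cheap shoes online", "weather sunny weekend",
]
labels = ["Positive", "Positive", "Negative", "Negative",
          "Neutral", "Neutral", "Irrelevant", "Irrelevant"]

# Vectorizer and classifier travel together so new text can be scored directly.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), max_features=10_000),
    RandomForestClassifier(n_estimators=200, random_state=42),
)
model.fit(texts, labels)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sentiment_rf.joblib"  # hypothetical file name
    joblib.dump(model, path)                  # save for deployment
    restored = joblib.load(path)              # load as a deployment step would

print(restored.predict(["love the new map"]))
```

Saving the whole pipeline avoids a mismatch between the vocabulary used at training time and the one used at prediction time.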

Results

The table below shows the accuracy of different machine learning models on the validation set:

Model	Accuracy
Logistic Regression	0.865
Naive Bayes	0.745
Decision Tree	0.886
Random Forest	0.946
Linear SVM	0.904

Key Insight:

Random Forest achieved the highest validation accuracy (0.946) of the five models.
It also showed the best balance between precision, recall, and F1-score.
Random Forest is therefore selected as the final model for deployment and further predictions.
