Advanced Sentiment Analysis: Naive Bayes Classifier and Rule-Based Methods

Project Overview

This project compares two approaches to sentiment analysis, a Naive Bayes classifier and a rule-based model, on datasets of movie reviews and Nokia product feedback. The comparison focuses on how well each model copes with nuanced language, particularly negations and intensifiers.

Objective

The primary objective is to compare two different sentiment analysis methods:

- A Naive Bayes classifier that uses statistical probabilities.
- An enhanced rule-based classifier that handles negations and intensifiers to improve the accuracy of sentiment predictions.

The aim is to demonstrate NLP techniques in Python for developing and evaluating models that predict sentiment from textual data.

Technical Approach

Data Preparation: Text from each data source is preprocessed with regular expressions and then split into training and test sets.
Feature Engineering: Sentiment scoring is applied to convert words into numerical data using predefined positive and negative word lists.
Model Training: The Naive Bayes classifier is trained on the labelled training sentences, while the rule-based classifier is built from the sentiment dictionary and its negation and intensifier rules.
Model Evaluation: Both models are evaluated on unseen data. Key performance metrics such as accuracy, precision, recall, and F1-score are used to compare their effectiveness.
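
The data preparation and feature engineering steps above can be sketched roughly as follows. This is a minimal illustration, not the code in Sentiment.py: the word lists, the `tokenize`, `dictionary_score`, and `train_test_split` helpers, the 80/20 split, and the sample sentences are all assumptions made for the example.

```python
import random
import re

# Hypothetical stand-ins for the project's positive and negative word lists.
POSITIVE_WORDS = {"good", "great", "excellent", "love"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "hate"}


def tokenize(sentence):
    """Lowercase a sentence and extract word tokens with a regular expression."""
    return re.findall(r"[a-z']+", sentence.lower())


def dictionary_score(tokens):
    """Return (number of positive words) minus (number of negative words)."""
    score = 0
    for token in tokens:
        if token in POSITIVE_WORDS:
            score += 1
        elif token in NEGATIVE_WORDS:
            score -= 1
    return score


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of the items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


# Made-up labelled sentences, for illustration only.
sentences = [
    ("A great film with excellent acting.", "positive"),
    ("Terrible plot and poor pacing.", "negative"),
    ("I love this phone.", "positive"),
    ("Bad battery, bad screen.", "negative"),
    ("Good value overall.", "positive"),
]

train, test = train_test_split(sentences)
print(len(train), len(test))
print(dictionary_score(tokenize("A great film with excellent acting.")))
```

A fixed seed keeps the split reproducible, which makes later comparisons between the two classifiers easier to repeat.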

Detailed Workflow

Reading Data: The data consists of labelled sentences from movie reviews and Nokia product feedback.
Sentiment Dictionary Construction: Positive and negative word lists are compiled into a sentiment dictionary.
Dataset Splitting: Data is partitioned into training and testing sets to validate the models' generalization.
Model Training and Testing:
- Naive Bayes Classifier: Uses probability calculations based on word frequencies.
- Rule-Based Classifier: Applies logical rules to handle negations and intensifiers, enhancing the understanding of context.
Performance Comparison: The outcomes of the Naive Bayes and rule-based classifiers are compared to determine which method better handles complex language nuances.
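
The word-frequency probability calculation behind the Naive Bayes step can be sketched as below. This is an illustrative version under stated assumptions, not the project's implementation: the function names, the add-one (Laplace) smoothing, the choice to skip words unseen in training, and the toy training data are all assumptions.

```python
import math
from collections import Counter


def train_naive_bayes(labelled_tokens):
    """Count word frequencies per class from (tokens, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    class_counts = Counter()  # label -> number of training sentences
    vocabulary = set()
    for tokens, label in labelled_tokens:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocabulary.update(tokens)
    return word_counts, class_counts, vocabulary


def predict_naive_bayes(tokens, model):
    """Return the label with the highest log posterior for the tokens."""
    word_counts, class_counts, vocabulary = model
    total_sentences = sum(class_counts.values())
    best_label, best_log_prob = None, -math.inf
    for label, n_sentences in class_counts.items():
        # Start from the log prior of the class.
        log_prob = math.log(n_sentences / total_sentences)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # assumption: ignore words never seen in training
            count = word_counts[label][token]
            # Add-one smoothing so unseen word/class pairs get non-zero probability.
            log_prob += math.log((count + 1) / (total_words + len(vocabulary)))
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label


# Made-up training sentences, already tokenized.
training_data = [
    (["great", "acting"], "positive"),
    (["loved", "the", "story"], "positive"),
    (["boring", "plot"], "negative"),
    (["terrible", "acting"], "negative"),
]
model = train_naive_bayes(training_data)
print(predict_naive_bayes(["great", "story"], model))
print(predict_naive_bayes(["boring", "terrible"], model))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.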

Technologies & Libraries Used

Python: Main programming language.
Pandas & NumPy: For data manipulation and numerical operations.
Matplotlib & Seaborn: For visualisation of the models’ performance.
Regular Expressions: Essential for text data preprocessing.

Evaluation Metrics

The following table summarizes the performance of the enhanced rule-based classifier on three datasets.

Data	Accuracy (%)	Precision (Positive) (%)	Recall (Positive) (%)	Precision (Negative) (%)	Recall (Negative) (%)	F1 Score (Positive) (%)	F1 Score (Negative) (%)
Film training data	65.30	57.13	73.50	73.50	73.50	57.12	77.50
Film testing data	63.42	53.45	53.45	72.97	72.97	53.45	72.97
Nokia data	79.32	80.11	80.11	77.50	77.50	80.11	77.50

Table 5.1: Accuracy metrics output after the testDictionary() command was used on three separate datasets.
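
For reference, the metrics in the table follow the standard definitions, which can be computed from true and predicted labels as in this sketch. The `classification_metrics` function and the example labels are hypothetical; this is not the project's testDictionary() function, whose internals are not shown here.

```python
def classification_metrics(true_labels, predicted_labels, positive="positive"):
    """Compute accuracy plus per-class precision, recall, and F1 for two classes."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative

    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero when a class is never predicted.
        return numerator / denominator if denominator else 0.0

    def f1(precision, recall):
        return ratio(2 * precision * recall, precision + recall)

    precision_pos, recall_pos = ratio(tp, tp + fp), ratio(tp, tp + fn)
    precision_neg, recall_neg = ratio(tn, tn + fn), ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision_pos": precision_pos,
        "recall_pos": recall_pos,
        "precision_neg": precision_neg,
        "recall_neg": recall_neg,
        "f1_pos": f1(precision_pos, recall_pos),
        "f1_neg": f1(precision_neg, recall_neg),
    }


# Made-up labels for illustration only.
truth = ["positive", "positive", "positive", "negative", "negative"]
guesses = ["positive", "positive", "negative", "negative", "positive"]
metrics = classification_metrics(truth, guesses)
print(metrics)
```

Multiplying each value by 100 gives percentages in the same form as the table.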

Conclusion & Key Takeaways

This analysis highlights the strengths and limitations of probabilistic and rule-based approaches to sentiment analysis. It shows how each method can be adapted to improve performance, especially on linguistic subtleties such as negation ("not great"), which are crucial for accurate sentiment prediction. The comparison also points to possible enhancements for more robust sentiment analysis models.
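
The "not great" case can be made concrete with a small rule-based scorer. This is a sketch under assumptions rather than the project's actual rules: the word lists, the intensifier weights, the two-token look-back window, and the tie-break at zero are all hypothetical choices made for illustration.

```python
import re

# Hypothetical word lists and weights, chosen only for this example.
POSITIVE_WORDS = {"good", "great", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "terrible"}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 2.0, "really": 2.0, "extremely": 3.0}


def rule_based_score(sentence):
    """Sum word polarities, adjusted by negations and intensifiers just before them."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0.0
    for index, token in enumerate(tokens):
        if token in POSITIVE_WORDS:
            polarity = 1.0
        elif token in NEGATIVE_WORDS:
            polarity = -1.0
        else:
            continue
        # Assumption: modifiers affect a sentiment word only if they appear
        # within the two tokens immediately before it.
        for previous in tokens[max(0, index - 2):index]:
            if previous in INTENSIFIERS:
                polarity *= INTENSIFIERS[previous]  # strengthen the polarity
            elif previous in NEGATIONS:
                polarity *= -1.0  # flip the polarity
        score += polarity
    return score


def rule_based_label(sentence):
    """Map a score to a label; treating zero as positive is an arbitrary tie-break."""
    return "positive" if rule_based_score(sentence) >= 0 else "negative"


for example in ["great", "not great", "very good", "not very good", "The plot was not bad"]:
    print(example, rule_based_score(example), rule_based_label(example))
```

A plain dictionary count would score "not great" as positive because it only sees "great"; the look-back rule flips it to negative.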

Note

This project is positioned as a high-value addition to a CV, demonstrating not just technical NLP skills but also the ability to engage in critical analysis and methodology comparison in machine learning.

