This project uses natural language processing and machine learning techniques to analyze book plot descriptions and predict genres. Before the classification task, data preprocessing and exploratory analysis were performed, including text cleaning, lemmatization and visualizations. A recommendation system based on unigram and bigram similarity was also built to suggest similar books.
- Text preprocessing: punctuation removal, lowercasing, stopword removal, lemmatization
- Exploratory analysis with visualizations to understand genre distributions and description patterns
- TF-IDF feature extraction using unigrams and bigrams
- Recommendation system leveraging n-gram similarity for book suggestions
- Classification using Multinomial Naive Bayes, Random Forest and Linear SVC
- Cross-validation and test set evaluation with precision, recall, F1 and accuracy
- Key insights on genre-specific language patterns and feature importance
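
As a rough illustration of the TF-IDF and recommendation steps listed above, here is a minimal standard-library sketch. It is not the project's actual implementation: the `BOOKS` catalogue, the `STOPWORDS` set, and the helper names (`tokenize`, `ngrams`, `tfidf_vectors`, `cosine`, `recommend`) are all hypothetical, lemmatization is skipped, and the smoothed IDF formula is an assumption chosen for simplicity.

```python
import math
import re
from collections import Counter

# Hypothetical toy catalogue; titles and descriptions are made up for illustration.
BOOKS = {
    "Star Voyage": "A starship crew explores a distant galaxy and meets an alien empire.",
    "Galaxy War": "An alien empire attacks and a starship crew defends the galaxy.",
    "Manor Mystery": "A detective investigates a murder at a quiet country manor.",
}

# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"a", "an", "and", "at", "the"}


def tokenize(text):
    """Lowercase, strip punctuation, and drop stopwords (no lemmatization here)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def ngrams(tokens):
    """Return unigrams plus bigrams built from adjacent tokens."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def tfidf_vectors(docs):
    """Map each title to a sparse {term: weight} dict using a smoothed IDF."""
    counts = {title: Counter(ngrams(tokenize(text))) for title, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts.values() for term in c)
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return {
        title: {term: tf * idf[term] for term, tf in c.items()}
        for title, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(title, docs, top_k=1):
    """Rank the other books by cosine similarity to the given title."""
    vectors = tfidf_vectors(docs)
    query = vectors[title]
    scores = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != title]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_k]
```

Calling `recommend("Star Voyage", BOOKS)` ranks "Galaxy War" first, because the two descriptions share unigrams such as "starship" and bigrams such as "alien empire", while "Manor Mystery" shares no terms. In practice a library vectorizer with an n-gram range of one to two would likely replace the hand-written helpers.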