The models are built on PySpark to detect spam emails. On top of the PySpark framework, additional work was done to develop applications that run on HDFS and stream data through Flume and Kafka, enabling real-time detection.
- Data Preprocessing
- Modeling
  - Naive Bayes
  - Naive Bayes + ngram
  - Logistic Regression
  - Random Forest
- Best Model
  - Naive Bayes Classifier
  - Assumptions
- References for Model Introduction and Algorithms
- More Model Introductions
  - Naive Bayes Classifier