This project detects fraudulent credit card transactions using a Random Forest classifier trained on data rebalanced with SMOTE (Synthetic Minority Over-sampling Technique).
Built on the Kaggle credit card fraud dataset, it covers preprocessing, resampling, model training, threshold tuning, evaluation, and feature importance analysis.
Records: 284,807 transactions

Features:
- `V1`–`V28`: PCA-anonymized features
- `Amount`: Transaction amount (scaled)
- `Time`: Seconds since first transaction (dropped)
- `Class`: Target (0 = Legit, 1 = Fraud)

Class counts:
- Legit: 284,315
- Fraud: 492
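To put the imbalance in numbers, the counts above can be turned into a fraud rate and a trivial baseline. This is a small arithmetic sketch with illustrative variable names; it shows why accuracy alone says little on this dataset, since a model that labels every transaction as legitimate already scores very high.

```python
# Class counts as reported above.
legit = 284_315
fraud = 492
total = legit + fraud

fraud_rate = fraud / total            # fraction of transactions that are fraud
legit_per_fraud = legit / fraud       # legitimate transactions per fraud case
baseline_accuracy = legit / total     # accuracy of always predicting "legit"

print(f"Total transactions: {total}")
print(f"Fraud rate: {fraud_rate:.4%}")
print(f"Legit transactions per fraud: {legit_per_fraud:.0f}")
print(f"All-legit baseline accuracy: {baseline_accuracy:.2%}")
```

Because the baseline is this high, the fraud-class precision and recall are the figures to watch.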
Visualized:
- Class distribution
- Amount distribution by class
- Correlation heatmaps (features vs `Class`)
- Boxplots (e.g., `V14` vs `Class`)
- Hourly frequency of transactions
- 2D PCA scatter plot
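The hourly frequency plot needs an hour-of-day value, which the dataset does not provide directly: `Time` is the number of seconds since the first transaction. One plausible derivation is sketched below, assuming the first transaction is treated as hour 0. The notebook's exact approach is not shown here, and `hour_of_day` is a hypothetical helper.

```python
from collections import Counter


def hour_of_day(seconds_elapsed: float) -> int:
    """Map seconds since the first transaction to an hour bucket in 0..23."""
    return int(seconds_elapsed // 3600) % 24


# Toy Time values in seconds (not taken from the dataset).
times = [0, 3599, 3600, 86399, 86400, 90000]

# Count transactions per hour bucket; values past 24h wrap around.
hourly_counts = Counter(hour_of_day(t) for t in times)
print(sorted(hourly_counts.items()))
```

The resulting counts can be passed to any bar plot; the modulo keeps transactions from later days in the same 24 buckets.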
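from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Scale Amount. Fitting on the full dataset leaks test-set statistics;
# for a stricter evaluation, fit the scaler on the training split only.
df['Amount'] = StandardScaler().fit_transform(df[['Amount']])

# Drop Time and separate the features from the target
X = df.drop(['Time', 'Class'], axis=1)
y = df['Class']

# Stratified 70/30 split keeps the fraud ratio the same in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

from imblearn.over_sampling import SMOTE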
sm = SMOTE(random_state=42)
# Oversample the training set only; the test set keeps its real class ratio
X_train_resampled, y_train_resampled = sm.fit_resample(X_train, y_train)

from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_estimators=100, random_state=42, class_weight='balanced')
rfc.fit(X_train_resampled, y_train_resampled)

- 100 decision trees (`n_estimators=100`)
- `class_weight='balanced'` is also set, although the SMOTE-resampled training set is already balanced
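For intuition, SMOTE creates each synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours. The sketch below is a simplified pure-Python illustration of that idea on toy data, not the imbalanced-learn implementation; `smote_like` and `balanced_weights` are hypothetical helpers. It also evaluates the `balanced` class-weight formula, `n_samples / (n_classes * count)`, before and after oversampling to parity.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=42):
    """Generate n_new synthetic points by interpolating between a random
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base among the other minority points
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balanced_weights(counts):
    """Class weights computed as n_samples / (n_classes * count)."""
    n_samples = sum(counts.values())
    return {c: n_samples / (len(counts) * n) for c, n in counts.items()}


# Toy 2-D data: 8 majority points, 3 minority points.
majority = [(float(i), 0.0) for i in range(8)]
minority = [(0.0, 5.0), (1.0, 5.0), (0.0, 6.0)]

# Oversample the minority class until both classes have the same size.
new_points = smote_like(minority, n_new=len(majority) - len(minority))
resampled_minority = minority + new_points

weights_before = balanced_weights({0: len(majority), 1: len(minority)})
weights_after = balanced_weights({0: len(majority), 1: len(resampled_minority)})
print("before:", weights_before)
print("after: ", weights_after)
```

Once the classes are at parity, both weights come out as 1.0, which is the same as using no class weighting at all.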
from sklearn.metrics import precision_recall_curve
y_proba = rfc.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, y_proba)
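optimal_threshold = 0.3  # set by hand, not derived from the curve above
y_pred_adjusted = (y_proba > optimal_threshold).astype(int)

The decision threshold is lowered from the default 0.5 to 0.3 to improve fraud recall, which typically costs some precision.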
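The snippet above fixes the threshold at 0.3 by hand. One way to choose it from data, sketched here in plain Python on toy scores, is to take the highest candidate threshold whose recall still meets a target. The helper names, the candidate grid, and the recall target are all assumptions for illustration. In practice the choice is better made on a separate validation split than on the test set, so that the reported test metrics stay unbiased.

```python
def precision_recall_at(y_true, y_score, threshold):
    """Precision and recall for the positive class at a given threshold."""
    preds = [int(s > threshold) for s in y_score]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(y_true, y_score, candidates, min_recall):
    """Highest candidate threshold whose recall is at least min_recall."""
    ok = [
        t for t in candidates
        if precision_recall_at(y_true, y_score, t)[1] >= min_recall
    ]
    return max(ok) if ok else None


# Toy validation labels and predicted fraud probabilities.
y_val = [0, 0, 0, 0, 1, 0, 1, 1]
p_val = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.90]

candidates = [0.1, 0.2, 0.3, 0.4, 0.5]
chosen = pick_threshold(y_val, p_val, candidates, min_recall=1.0)

print("chosen threshold:", chosen)
print("precision, recall at chosen:", precision_recall_at(y_val, p_val, chosen))
print("precision, recall at 0.5:   ", precision_recall_at(y_val, p_val, 0.5))
```

On this toy data the default 0.5 misses one of the three fraud cases, while the lower threshold catches all of them at the price of an extra false positive.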
| Metric | Value |
|---|---|
| Accuracy | 99.75% |
| ROC AUC Score | 0.99 |
| Precision (Fraud) | ~0.87 |
| Recall (Fraud) | ~0.78 |
| F1-Score (Fraud) | ~0.82 |
Confusion Matrix:
[[85268    27]
 [   32   116]]
Classification Report:
- Class 0: Precision = 1.00, Recall = 1.00
- Class 1: Precision = 0.87, Recall = 0.78
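The fraud-class figures can be recomputed directly from the confusion matrix. The sketch below assumes scikit-learn's layout (rows are actual classes, columns are predicted classes), which matches the reported recall of 0.78. The matrix as printed gives a fraud precision of about 0.81, an F1 of about 0.80, and an accuracy of about 99.93%, which differ from the table values, so the table and the matrix may come from different runs or thresholds.

```python
# Confusion matrix as printed above, read as [[TN, FP], [FN, TP]].
tn, fp = 85268, 27
fn, tp = 32, 116

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
precision = tp / (tp + fp)   # of predicted frauds, how many are real
recall = tp / (tp + fn)      # of real frauds, how many are caught
f1 = 2 * precision * recall / (precision + recall)

print(f"test samples: {total}")
print(f"accuracy={accuracy:.4f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```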
Most important features according to the Random Forest:
`V17`, `V14`, `V12`, `V10`
Visualized with a horizontal bar plot.
creditcard-fraud-detection/
│
├── notebook.ipynb      # Full implementation and analysis
├── README.md           # Project overview
├── images/             # Visuals and plots
└── requirements.txt    # Python dependencies
- Random Forest combined with SMOTE is an effective approach for imbalanced fraud detection
- Threshold tuning improves recall for fraud cases
- Features `V17`, `V14`, `V12`, and `V10` are highly informative
- Easy to interpret, scalable, and reproducible
Try alternative models:
- XGBoost
- LightGBM
- Logistic Regression

Add:
- GridSearchCV for hyperparameter tuning
- Real-time deployment using Flask / Gradio / Streamlit
numpy
pandas
matplotlib
seaborn
scikit-learn
imblearn

MIT License © 2025 Anton Atef

Feel free to fork, clone, and submit pull requests!
Suggestions and issues are welcome anytime!
Email: tony.atef.954@gmail.com