This project focuses on customer segmentation using various clustering techniques. The goal is to categorize customers into different groups based on demographic and financial data to better understand customer behavior and improve business decision-making.
The dataset contains customer information with the following attributes:
- Sex
- Marital Status
- Age
- Education
- Income
- Occupation
- Settlement Size
After preprocessing, the dataset is encoded and standardized for clustering analysis.
- Handling Missing Values: Any missing values were either imputed or removed.
- Encoding Categorical Variables: Applied appropriate encoding (one-hot or label encoding) for categorical features.
- Feature Scaling: Standardized numerical features such as `Age` and `Income` to ensure uniformity in clustering.
- Feature Engineering: Created new meaningful features based on existing data.
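The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy data, the column names, the choice of median/most-frequent imputation, and the use of one-hot encoding are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows for illustration only; column names are assumptions
# loosely based on the attribute list, not the real dataset schema.
df = pd.DataFrame({
    "Age": [67, 22, 49, np.nan, 53, 35],
    "Income": [124670, 150773, 89210, 171565, np.nan, 144848],
    "Occupation": [1, 1, 0, 1, 1, 0],
    "Settlement size": [2, 2, 0, 1, 1, 0],
})

numeric_cols = ["Age", "Income"]
categorical_cols = ["Occupation", "Settlement size"]

# Numeric features: fill gaps with the median, then standardize
# to zero mean and unit variance.
numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical features: fill gaps with the most frequent value,
# then one-hot encode.
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# sparse_threshold=0 forces a dense array, which is convenient
# for the clustering estimators used later.
preprocess = ColumnTransformer(
    [
        ("num", numeric_pipe, numeric_cols),
        ("cat", categorical_pipe, categorical_cols),
    ],
    sparse_threshold=0,
)

X = preprocess.fit_transform(df)
print(X.shape)
```

Wrapping the steps in a `ColumnTransformer` keeps imputation, encoding, and scaling in one object, so the same transformation can be reapplied to new customer records.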
Different clustering techniques were implemented and evaluated:
- Used the Elbow Method and Silhouette Score to determine the optimal number of clusters.
- Applied K-Means to segment customers into groups.
- Created dendrograms to visualize cluster formation.
- Used Agglomerative Clustering with Ward’s linkage method.
- Identified clusters based on density and detected outliers.
- Used probabilistic clustering to determine soft cluster memberships.
- Automatically detected the number of clusters without predefined values.
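As a rough sketch of how the techniques listed above map onto scikit-learn and SciPy, the example below runs each of them on synthetic data. The blobs stand in for the preprocessed customer matrix, and every hyperparameter (`eps`, `min_samples`, the range of `k`, default Mean Shift bandwidth) is an assumption chosen for illustration rather than a value taken from the notebook.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans, MeanShift
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded, standardized customer features.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)
X = StandardScaler().fit_transform(X)

# K-Means: record inertia (for an elbow plot) and silhouette for each k.
inertias, silhouettes = {}, {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X, km.labels_)
best_k = max(silhouettes, key=silhouettes.get)  # k with the highest silhouette

kmeans_labels = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X)

# Hierarchical: Ward linkage matrix (input for a dendrogram plot)
# and the matching agglomerative clustering.
Z = linkage(X, method="ward")
ward_labels = AgglomerativeClustering(n_clusters=best_k, linkage="ward").fit_predict(X)

# DBSCAN: density-based clusters; points labelled -1 are treated as noise.
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# Gaussian mixture: soft memberships, one probability per component.
gmm = GaussianMixture(n_components=best_k, random_state=42).fit(X)
gmm_probs = gmm.predict_proba(X)

# Mean Shift: the number of clusters is inferred from the data.
meanshift_labels = MeanShift().fit_predict(X)

print("best k by silhouette:", best_k)
print("DBSCAN noise points:", int((dbscan_labels == -1).sum()))
print("Mean Shift clusters:", len(np.unique(meanshift_labels)))
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws the dendrogram, and plotting `inertias` against `k` gives the elbow curve.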
- Silhouette Score: Used to assess clustering performance.
- Cluster Visualization: Plotted clusters to understand distributions across different customer features.
- Business Insights: Interpreted clusters to derive actionable insights for marketing strategies and customer engagement.
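One way to carry out the evaluation and interpretation steps above is to compute a silhouette score and then summarize each cluster in the original units. The sketch below uses randomly generated customers with two assumed columns (`Age`, `Income`) and a fixed `k=2`; none of these values come from the project's data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Randomly generated customers for illustration; not the real dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": np.concatenate([rng.normal(28, 3, 50), rng.normal(55, 4, 50)]),
    "Income": np.concatenate(
        [rng.normal(90_000, 8_000, 50), rng.normal(160_000, 10_000, 50)]
    ),
})

# Cluster on standardized features, but profile on the raw values.
X = StandardScaler().fit_transform(df)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Silhouette ranges from -1 to 1; higher means better-separated clusters.
score = silhouette_score(X, labels)

# Per-cluster profile: size and average of each feature.
profile = df.assign(cluster=labels).groupby("cluster").agg(
    customers=("Age", "size"),
    mean_age=("Age", "mean"),
    mean_income=("Income", "mean"),
)

print(f"silhouette: {score:.3f}")
print(profile.round(1))
```

A profile table like this is a convenient starting point for naming segments, and the same `cluster` column can be passed as the `hue` argument to a seaborn scatter plot to visualize them.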
- Gender Distribution: 54.3% Male
- Settlement Size: 49.5% from small cities, 27.2% from mid-sized cities, 23.4% from big cities.
- Marital Status: 50.3% single, 49.6% married.
- Education: 69.3% graduate school, 14.6% university, 14.4% other/unknown.
- Python (pandas, numpy, sklearn, seaborn, matplotlib)
- Jupyter Notebook
- scikit-learn for clustering models
- Clone the repository:
git clone https://github.com/EstherMamai/customer-segmentation.git
cd customer-segmentation
- Install dependencies:
pip install -r requirements.txt
- Run the clustering notebook:
jupyter notebook
Open `customer_segmentation.ipynb` and execute the cells.
- Implement deep learning for customer behavior analysis.
- Integrate clustering results into a business intelligence dashboard.
- Explore supervised learning models for targeted marketing.