🚀 The application is deployed and live
> [!NOTE]
> The web app may take 1-2 minutes to load.

> [!TIP]
> For the best experience, please refer to the Usage Guide section below to learn how to navigate and use the web app effectively.
Customer churn represents the percentage of users discontinuing service within a given period. This project builds a machine learning pipeline to predict customer churn in a telecom business using historical data and deploys the model as a Flask web app with CI/CD integration using GitHub Actions and AWS EKS for scalable production.
- Developed a machine learning model to predict whether a customer of a telecommunication company will churn.
- Followed a modular structure for the entire project.
- Utilized data of over 7000 records to train and develop the model.
- Cleaned and preprocessed the raw data.
- Performed feature transformation, scaled the numerical features and handled imbalance in the dataset.
- Trained the model using various ML algorithms and selected the one with the highest accuracy.
- Deployed the model using a Flask web application for real-time predictions.
- Integrated CI/CD automation using GitHub Actions to build, test, containerize, and deploy the application to AWS EKS (Elastic Kubernetes Service) on every code update.
- Utilized the company's historical data of over 7000 records, which include demographic details, subscribed services, and account information.
- For each customer, the following information is available:
- Gender
- Senior Citizen
- Partner
- Dependents
- Tenure
- Phone Service
- Multiple Lines
- Internet Service
- Online Security
- Online Backup
- Device Protection
- Tech Support
- Streaming TV
- Streaming Movies
- Contract Type
- Paperless Billing
- Payment Method
- Monthly Charges
- Total Charges
- Cleaned and preprocessed the raw data:
- Handled missing values.
- Removed duplicate records.
- Removed outliers using the z-score to avoid overfitting.
- Replaced boolean values with numerical values.
- Binned the values of the tenure column into 12-month ranges for easier interpretation (see the sketch below).
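
The snippet below is a minimal pandas sketch of these preprocessing steps. The column names (`tenure`, `MonthlyCharges`, `TotalCharges`) and the z-score threshold of 3 are assumptions; the actual implementation lives in the modular pipeline under `src/components`.

```python
import pandas as pd
from scipy.stats import zscore

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning steps; column names are assumptions."""
    # Handle missing values and duplicate records
    df = df.dropna().drop_duplicates()

    # Remove outliers in the numeric charge columns using the z-score
    numeric_cols = ["MonthlyCharges", "TotalCharges"]
    df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
    df = df.dropna(subset=numeric_cols)
    z = df[numeric_cols].apply(zscore).abs()
    df = df[(z < 3).all(axis=1)]

    # Replace Yes/No (boolean-like) values with numeric flags
    df = df.replace({"Yes": 1, "No": 0})

    # Bin tenure into 12-month ranges
    edges = list(range(0, 84, 12))                    # 0, 12, ..., 72 months
    labels = [f"{b}-{b + 12}" for b in edges[:-1]]
    df["tenure_group"] = pd.cut(df["tenure"], bins=edges, labels=labels)
    return df
```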
- Once the data was cleaned and preprocessed, I analyzed it to identify hidden patterns and relationships between features.
- Implemented both single-feature and cross-feature analysis to find relationships between features.
- Analyzed and visualized each feature, examining its values and value counts to determine its overall importance.
- Some of the major findings:
- Around 16% of the entire customer base are senior citizens.
- Customers who are more likely to churn have lower monthly and total charges.
- Senior citizen customers have higher churn rates than non-senior-citizen customers.
- The longer a customer stays with the business, the lower the chances of churning.
- Customers with a tenure within 1 year are about equally likely to churn or stay.
- Customers on month-to-month contracts have left the business more often.
- Visualizations (stored in eda_images/): distribution of tenure, imbalance in churn, and monthly and total charges by churn (a plotting sketch follows below).
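
As a rough illustration, the matplotlib sketch below shows how plots like these could be generated. The CSV path and the `Churn`, `tenure`, and `MonthlyCharges` column names are assumptions; the figures shipped with the project are stored in `eda_images/`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical path; the raw file lives in the data/ folder
df = pd.read_csv("data/telco_churn.csv")

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Distribution of tenure
axes[0].hist(df["tenure"], bins=24, color="steelblue")
axes[0].set_title("Distribution of tenure")

# Imbalance in churn
df["Churn"].value_counts().plot(kind="bar", ax=axes[1])
axes[1].set_title("Imbalance in churn")

# Monthly charges split by churn
df.boxplot(column="MonthlyCharges", by="Churn", ax=axes[2])
axes[2].set_title("Monthly charges by churn")

plt.tight_layout()
plt.savefig("eda_images/eda_overview.png")
```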
- Used different classification algorithms to train the model.
- Logistic Regression
- Naive Bayes
- KNN Classifier
- Decision Tree
- Random Forest
- AdaBoost Classifier
- XGBoost Classifier
- Support Vector Classifier
- Performed hyperparameter tuning using GridSearchCV to optimize and improve model performance (sketched after this list).
- Evaluated the models with accuracy score and the confusion matrix (precision, recall, F1 score) and selected the model with the highest accuracy.
- Out of all the algorithms used, the XGBoost classifier had the highest accuracy at 81%.
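
Below is a simplified, hedged sketch of the tuning-and-selection step for the winning XGBoost model. It reuses the hypothetical `preprocess` function and CSV path from the earlier sketch, and the parameter grid values are illustrative rather than the ones actually searched.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

# Assumed path and columns; preprocess() is the sketch shown earlier
df = preprocess(pd.read_csv("data/telco_churn.csv"))
X = pd.get_dummies(df.drop(columns=["Churn"]))
y = df["Churn"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hyperparameter tuning with GridSearchCV (grid values are illustrative)
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    XGBClassifier(eval_metric="logloss"), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

# Evaluate the tuned model on the held-out test set
y_pred = search.best_estimator_.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))   # precision, recall, F1 score
```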
- Developed a Flask web application to deploy the model for real-time predictions.
- Built both front-end and back-end components for the web app.
- Created a custom website where users can enter customer data and receive predictions from the model.
- Deployed the Flask app on a localhost server for easy access (a minimal sketch follows below).
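
A minimal sketch of what the Flask backend in `app.py` can look like is shown below. The model path (`artifacts/model.pkl`), the template name (`index.html`), and the form handling are assumptions; the real app serves the templates in `templates/` and the styles in `static/`.

```python
import pickle

import pandas as pd
from flask import Flask, render_template, request

app = Flask(__name__)

# Load the trained model saved by the training pipeline (path is an assumption)
with open("artifacts/model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/")
def home():
    return render_template("index.html")          # template name is an assumption

@app.route("/predict", methods=["POST"])
def predict():
    # Collect the dropdown values submitted from the form
    features = pd.DataFrame([request.form.to_dict()])
    prediction = model.predict(features)[0]
    result = "Customer is likely to churn" if prediction == 1 else "Customer is likely to stay"
    return render_template("index.html", prediction_text=result)

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000, debug=True)
```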
- Implemented an end-to-end continuous integration and deployment pipeline using GitHub Actions.
- The pipeline performs the following steps:
- Runs unit tests on the web application using pytest to ensure it works as expected (an example test is sketched after this list).
- Builds a Docker image of the application and pushes it to Amazon Elastic Container Registry (ECR).
- Updates the Kubernetes manifests with the latest image and deploys the application to Amazon EKS.
- Verifies deployment health by checking pod and service status.
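
For illustration, a unit test in `test_app.py` might use Flask's built-in test client as sketched below; the exact routes and assertions in the repository may differ.

```python
import pytest
from app import app   # the Flask application object defined in app.py

@pytest.fixture
def client():
    app.config["TESTING"] = True
    with app.test_client() as client:
        yield client

def test_home_page_loads(client):
    """The home page should respond with HTTP 200."""
    assert client.get("/").status_code == 200

def test_predict_route_exists(client):
    """The predict route should be registered (anything but 404)."""
    assert client.post("/predict", data={}).status_code != 404
```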
- ML Pipeline: Data preprocessing, feature engineering, and XGBoost modeling
- Web Interface: Flask-based prediction interface
- CI/CD Automation: GitHub Actions pipeline for testing, Dockerization, and deployment
- Cloud Deployment: Kubernetes-managed scalable infrastructure on AWS EKS
- Modular Codebase: Production-ready Python implementation
```mermaid
graph LR
    A[Code Commit/Trigger] --> B[GitHub Actions]
    B --> C[Build Docker Image]
    C --> D[Push to AWS ECR]
    D --> E[Deploy to EKS]
    E --> F[Production API]
```
| Technology | Description |
|---|---|
| Python | Programming language used |
| Flask | Web framework for UI and API integration |
| HTML & CSS | Frontend design and styling |
| Pandas | Cleaning and preprocessing the data |
| NumPy | Performing numerical operations |
| Matplotlib | Visualization of the data |
| GitHub Actions | Automates build, test, and deployment pipelines |
| Docker | Containerization of the application |
| Amazon ECR | Docker image registry for container storage |
| Amazon EKS | Managed Kubernetes service for production deployment |
| Kubernetes | Orchestration platform for scalable deployment |
```
📁 Customer-Churn-Project
├── 📁 .github                  # GitHub Actions CI/CD workflow
│   └── 📁 workflows
│
├── 📁 k8s                      # Kubernetes deployment manifests
│   ├── deployment.yaml
│   └── service.yaml
│
├── 📁 artifacts                # Model artifacts and intermediate data
│
├── 📁 data                     # Raw and EDA-processed data
│
├── 📁 eda_images               # Visualizations for EDA
│
├── 📁 notebook                 # Jupyter notebooks for experimentation
│
├── 📁 src                      # Source code (modular ML pipeline)
│   ├── 📁 components           # Individual pipeline components
│   └── 📁 pipelines            # Training and prediction pipelines
│
├── 📁 static                   # Static assets for the web app
│   ├── 📁 css
│   └── 📁 images
│
├── 📁 templates                # HTML templates for the Flask frontend
│
├── .dockerignore               # Ignore rules for Docker build
├── Dockerfile                  # Docker image definition
├── test_app.py                 # Unit tests for app functionality
├── .gitignore                  # Git ignore rules
├── README.md                   # Project documentation
├── app.py                      # Flask backend app
├── requirements.txt            # Python dependency list
└── setup.py                    # Setup script for packaging
```
```bash
# Clone the repository
git clone https://github.com/Dhanush-Raj1/Customer-Churn-Project.git
cd Customer-Churn-Project

# Create and activate the environment
conda create -p envi python==3.9 -y
source venv/bin/activate    # On macOS/Linux
conda activate envi         # On Windows

# Install the dependencies
pip install -r requirements.txt

# Run the application
python app.py
```

The app will be available at: http://127.0.0.1:5000/
1️⃣ Open the web app in your browser.
2️⃣ Click "Predict" on the home page of the web app.
3️⃣ Enter the customer details in the respective dropdowns.
4️⃣ Click "Predict" and scroll down; the predicted results will appear.
- Improve model accuracy with advanced fine-tuning
- Real-time prediction system
- Automated retraining pipeline
- Improve UI with a more interactive design
- Customer retention strategy recommender
- Anomaly detection for unexpected churn
💡 Contributions, issues, and pull requests are welcome! Feel free to open an issue or submit a PR to improve this project.
This project is licensed under the MIT License. See the LICENSE file for details.


