Built an end-to-end machine learning pipeline to predict passenger survival in the Titanic dataset. Applied classification models (SVM, Random Forest, Logistic Regression) and achieved strong results using feature engineering, visualization, and evaluation metrics.
This project applies classic ML techniques to predict the survival of Titanic passengers. It involves:
- Cleaning and preprocessing data
- Feature extraction and transformation
- Exploratory data analysis (EDA)
- Applying supervised & unsupervised models
- Model evaluation using classification metrics
The dataset used is the Titanic dataset, which includes information like:
- Passenger ID, Name
- Age, Gender, Class
- Fare, Embarked Port
- Survival status
Data Processing & Feature Engineering
- Handled missing values and outliers
- Extracted features like Title, FamilySize, IsAlone
- One-hot encoded categorical variables
- Correlation heatmap + feature importance
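
The steps in this list can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the sample rows are hand-made, the column names (`Name`, `SibSp`, `Parch`, `Embarked`) are assumed to follow the Kaggle Titanic CSV, and the imputation choices (median age, most frequent port) are one common option rather than a confirmed detail of this project.

```python
import pandas as pd

# Tiny hand-made sample; column names are assumed to match the Kaggle Titanic CSV.
df = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
        "Heikkinen, Miss. Laina",
        "Palsson, Master. Gosta Leonard",
    ],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 2.0],
    "SibSp": [1, 1, 0, 3],
    "Parch": [0, 0, 0, 1],
    "Embarked": ["S", "C", None, "S"],
})

# Fill missing values: median for Age, most frequent value for Embarked (assumed strategy).
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Title is the token between the comma and the first period in Name.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# FamilySize counts siblings/spouses, parents/children, and the passenger.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)

# One-hot encode the categorical columns; Name is dropped once Title is extracted.
encoded = pd.get_dummies(
    df.drop(columns=["Name"]), columns=["Sex", "Embarked", "Title"]
)
```

Deriving `IsAlone` from `FamilySize` keeps the two features consistent with each other, and `encoded` is fully numeric, so it can be passed to a scikit-learn estimator.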
Exploratory Data Analysis (EDA)
- Distribution plots for Age, Fare, Class
- Survival rate by Gender, Class, Embarked
- Cross-tab visualizations
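
The tables behind these plots might be computed along these lines. The data below is a made-up six-row sample, and `Pclass` and `Survived` are assumed column names from the Kaggle CSV. Plotting is left out so the sketch runs without a display.

```python
import pandas as pd

# Made-up sample rows; not taken from the real dataset.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Pclass": [3, 1, 3, 1, 3, 2],
    "Survived": [0, 1, 1, 0, 0, 1],
})

# The mean of a 0/1 column is the survival rate within each group.
rate_by_sex = df.groupby("Sex")["Survived"].mean()

# Cross-tab of passenger class against outcome, as raw counts.
class_vs_survival = pd.crosstab(df["Pclass"], df["Survived"])
```

Either table can then be handed to matplotlib or seaborn for a bar chart or heatmap.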
Model Training & Evaluation
- Supervised Models:
  - Logistic Regression
  - Random Forest Classifier
  - SVM
  - K-Nearest Neighbors (KNN)
  - Naive Bayes
- Unsupervised Models:
  - K-Means Clustering
  - DBSCAN Clustering
- Model Metrics:
  - Accuracy, F1-score, Recall
  - Confusion Matrix
  - ROC-AUC Score
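
As a rough illustration of the training and evaluation loop, the sketch below fits two of the listed supervised models and collects the metrics named above. It uses synthetic data from `make_classification` as a stand-in for the engineered Titanic features. The split ratio, hyperparameters, and the `results` dictionary are assumptions made for the example, not settings taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered Titanic features (assumption).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # Scaling helps the linear model converge; tree ensembles do not need it.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Probability of the positive class, needed for ROC-AUC.
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
        "confusion_matrix": confusion_matrix(y_test, pred),
    }
```

The other listed classifiers (SVM, KNN, Naive Bayes) could be added to the `models` dictionary in the same way, provided each exposes a score usable for ROC-AUC.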
- pandas, numpy: Data manipulation
- matplotlib, seaborn: Data visualization
- scikit-learn: Modeling & preprocessing
- jupyter notebook: Development interface
The models were evaluated on:
- Accuracy
- Precision
- Recall
- F1-score
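
For reference, the four metrics above can all be derived from the counts in a binary confusion matrix. The helper below is a hypothetical illustration of the standard definitions, and the counts passed to it are invented, not results from this project.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted survivors who actually survived.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual survivors the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=40)
# accuracy = 70/100 = 0.70, precision = 30/40 = 0.75, recall = 30/50 = 0.60
```

In practice, `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`, which compute the same quantities directly from label arrays.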
Achieved high predictive performance using Random Forest and SVM models with properly tuned hyperparameters.
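
One common way to tune hyperparameters in scikit-learn is a cross-validated grid search. The sketch below shows it for an SVM on synthetic data. The parameter grid, fold count, and scoring choice are assumptions made for illustration, and the source does not say which tuning method the notebook actually uses.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the engineered Titanic features (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Scaling sits inside the pipeline so each CV fold is scaled on its own training part.
pipeline = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Illustrative grid; the values are not taken from the project.
param_grid = {"svm__C": [0.1, 1, 10], "svm__kernel": ["linear", "rbf"]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_  # mean cross-validated accuracy of the best setting
```

The same pattern applies to a Random Forest by swapping the estimator and using keys such as `n_estimators` and `max_depth` in the grid.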
# Clone the repo
git clone https://github.com/kumarritik24/Titanic-Survival-Prediction-using-Machine-Learning.git
cd Titanic-Survival-Prediction-using-Machine-Learning
# Install required packages
pip install -r requirements.txt
# Run the notebook
jupyter notebook titanic-ml.ipynb