This repository contains the report and supporting code for the project "Sales Prediction in Financial Domain". The project applies a range of machine learning techniques to several datasets, each covering a different aspect of sales prediction.
- Introduction
- Project Structure
- Datasets
- Methods and Techniques
- Results
- Conclusion
- Future Work
- Contact
Sales prediction is crucial in the financial domain because it helps businesses make informed decisions, optimize strategies, and enhance customer satisfaction. This project applies several machine learning models to five datasets from Kaggle, each representing a distinct sales-related challenge.
The project is organized into the following sections:
- Chapter 1: Overview of Sales Prediction
- Chapter 2: Product Recommendation System using the BigBasket Dataset
- Chapter 3: House Price Prediction using Linear Regression and RandomForestRegressor
- Chapter 4: Sentiment Analysis on McDonald's Store Reviews
- Chapter 5: Health Insurance Cross-sell Prediction
- Chapter 6: SuperStore Sales Prediction using Time Series Analysis
Each section includes code (Week3/ folder), results, and discussions on the findings.
The following datasets from Kaggle were used in this project:
- BigBasket Entire Product List - For building a product recommendation system.
- House Price Prediction - For predicting house prices using various regression techniques.
- McDonald's US Store Reviews - For sentiment analysis using text classification models.
- Health Insurance Cross Sell Prediction - For predicting customer interest in vehicle insurance.
- SuperStore Sales Dataset - For time series analysis to predict future sales.
The project employs a variety of machine learning methods, including:
- TF-IDF and Cosine Similarity: For the product recommendation system.
- Linear Regression and RandomForestRegressor: For house price prediction.
- RandomForestClassifier, DecisionTreeClassifier, SVC, MultinomialNB: For sentiment analysis on customer reviews.
- RandomForestClassifier: For predicting customer interest in vehicle insurance.
- ARIMA: For time series analysis and sales prediction.
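To make the first technique in the list above concrete, here is a minimal sketch of a TF-IDF plus cosine similarity recommender. It uses only the Python standard library to show the mechanics and is not the code from the report. The product names, descriptions, and the helper names `tfidf_vectors`, `cosine`, and `recommend` are hypothetical and do not come from the BigBasket dataset.

```python
import math
from collections import Counter

# Hypothetical product descriptions (assumption: not BigBasket data).
PRODUCTS = {
    "green tea bags": "organic green tea bags",
    "black tea leaves": "strong black tea leaves",
    "dish soap": "lemon dish washing liquid soap",
}


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (tf * smoothed idf)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(name, top_n=1):
    """Return the top_n products most similar to `name`, excluding itself."""
    names = list(PRODUCTS)
    vectors = tfidf_vectors([PRODUCTS[n] for n in names])
    query = vectors[names.index(name)]
    scored = [(cosine(query, vec), other)
              for other, vec in zip(names, vectors) if other != name]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]


print(recommend("green tea bags"))
```

Products that share terms in their descriptions score higher, so the tea products are recommended for each other while the unrelated soap scores zero.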
Evaluation metrics such as MAE, R^2 score, accuracy score, and ROC AUC were used to measure the performance of the models.
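As a rough illustration of what these metrics measure, the sketch below computes each one by hand on small made-up values. The function names mirror scikit-learn's metric functions for readability, but these are plain-Python re-implementations written for this example, and none of the numbers are results from the project.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def accuracy_score(y_true, y_pred):
    """Fraction of labels predicted exactly right."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc_score(y_true, y_score):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up regression targets and predictions (illustrative only).
print(mean_absolute_error([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))  # 1.0
print(r2_score([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))             # 0.375

# Made-up classification labels, predictions, and scores.
print(accuracy_score([1, 0, 1, 1], [1, 0, 0, 1]))             # 0.75
print(roc_auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))     # 0.75
```

MAE and R^2 apply to the regression tasks, while accuracy and ROC AUC apply to the classification tasks. ROC AUC uses predicted scores rather than hard labels, which makes it more informative than accuracy when the classes are imbalanced.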
Each section of the report details the results obtained from the applied models, highlighting the strengths and limitations of each approach.
The project demonstrates the importance of machine learning in making informed predictions across different domains. The insights gained from each dataset offer valuable guidance for businesses in optimizing their strategies and improving customer satisfaction.
Further improvements could come from exploring more advanced models such as neural networks, fine-tuning the existing models, and applying them to larger, more diverse datasets.
Feel free to explore the repository. If you have any questions or suggestions, please raise an issue or contact me directly at [minhhieutran181103@gmail.com](mailto:minhhieutran181103@gmail.com).