🧾 Project Overview

This project presents a data-driven approach to pricing optimization using real-world-style product and sales data from a dairy product portfolio that includes multiple yogurt brands and flavor variations.
The analysis was conducted as part of a professional assessment scenario and has been slightly modified and anonymized to protect sensitive business information.

The goal was to explore, model, and simulate the relationship between price and demand, supporting revenue optimization and strategic pricing decisions.
The dataset reflects diverse product attributes such as brand positioning, flavor, volume, and pack size, providing a realistic foundation for demand modeling and price elasticity analysis.

🧩 Workflow

The workflow includes:

Data preparation & integration
- Merging product and sales data
- Cleaning and standardizing features for modeling
Exploratory Data Analysis (EDA)
- Identifying price and sales distributions
- Correlation heatmaps and scatter plots
- K-Means clustering to segment products by price and demand behavior
Modeling Demand & Price Elasticity
- Linear Regression as baseline model
- Quadratic Regression (price, price², log(price)) to capture non-linear demand effects
- Tree-based Models: Random Forest Regression for predictive comparison
Quadratic Price Simulation
- Simulation of predicted sales and revenue across price ranges
- Identification of the revenue-maximizing price point
- Visualization of demand drop vs. revenue increase to illustrate elasticity
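
The simulation step above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic (generated in the snippet, not read from pricing_data.xlsx), the names `prices`, `units`, and `simulate_revenue` are hypothetical, and plain `numpy.polyfit` stands in for whichever regression fit the notebook uses.

```python
import numpy as np

# Synthetic, illustrative data (NOT taken from pricing_data.xlsx):
# units sold observed at 30 different price points, with some noise.
rng = np.random.default_rng(0)
prices = np.linspace(1.0, 4.0, 30)
units = 500 - 60 * prices - 10 * prices**2 + rng.normal(0, 5, size=prices.size)

# Fit units = b2 * price^2 + b1 * price + b0 by least squares.
b2, b1, b0 = np.polyfit(prices, units, deg=2)


def simulate_revenue(price_grid):
    """Return predicted units and revenue for each candidate price."""
    predicted_units = b2 * price_grid**2 + b1 * price_grid + b0
    predicted_units = np.clip(predicted_units, 0, None)  # demand cannot be negative
    return predicted_units, price_grid * predicted_units


# Simulate only inside the observed price range to avoid extrapolation.
price_grid = np.linspace(prices.min(), prices.max(), 301)
predicted_units, revenue = simulate_revenue(price_grid)

best = int(np.argmax(revenue))
best_price = price_grid[best]
print(f"Revenue-maximizing price: {best_price:.2f}")
print(f"Predicted units at that price: {predicted_units[best]:.0f}")
print(f"Predicted revenue: {revenue[best]:.0f}")
```

Restricting the grid to the observed price range is a deliberate choice in this sketch: a quadratic fit can behave badly outside the data it was fitted on.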

📊 Key Insights

The relationship between price and demand follows a non-linear (quadratic) curve, reflecting real-world elasticity behavior.
Brand and flavor variables significantly affect sales; customer preference plays a stronger role than price alone.
The model identifies an optimal mid-range price that balances revenue and volume.
A tree-based ensemble model (Random Forest) confirms similar trends, adding robustness.
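
One way to read the optimal price: for a smooth demand curve, revenue peaks where the point price elasticity equals -1. Below that price demand is inelastic (a price increase raises revenue), above it demand is elastic. The sketch below checks this numerically. The coefficients `B0`, `B1`, `B2` are hypothetical values chosen for illustration, not estimates from this project.

```python
import numpy as np

# Hypothetical quadratic demand coefficients, for illustration only.
B0, B1, B2 = 500.0, -60.0, -10.0


def demand(price):
    """Predicted units sold at a given price."""
    return B0 + B1 * price + B2 * price**2


def elasticity(price):
    """Point price elasticity: (dQ/dP) * P / Q."""
    slope = B1 + 2 * B2 * price
    return slope * price / demand(price)


price_grid = np.linspace(1.0, 4.0, 3001)
revenue = price_grid * demand(price_grid)
best_price = price_grid[np.argmax(revenue)]

print(f"Revenue-maximizing price: {best_price:.3f}")
print(f"Elasticity at that price: {elasticity(best_price):.3f}")
print(f"Elasticity at the lowest price: {elasticity(price_grid[0]):.3f}")
print(f"Elasticity at the highest price: {elasticity(price_grid[-1]):.3f}")
```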

🧩 Tech Stack

Python → Data preprocessing, modeling, and simulation
Pandas, NumPy, Matplotlib, Seaborn → Data analysis and visualization
Scikit-learn → Regression models, Random Forest, K-Means clustering
Statsmodels → Elasticity analysis and regression diagnostics
Jupyter Notebook → Interactive workflow and documentation
Data sources → Product and sales CSV data, already merged and provided as pricing_data.xlsx

