vivusia/pricing-analysis-project

🧾 Project Overview

This project presents a data-driven approach to pricing optimization using real-world-style product and sales data from a dairy product portfolio that includes multiple yogurt brands and flavor variations.
The analysis was conducted as part of a professional assessment scenario and has been slightly modified and anonymized to protect sensitive business information.

The goal was to explore, model, and simulate the relationship between price and demand, supporting revenue optimization and strategic pricing decisions.
The dataset reflects diverse product attributes such as brand positioning, flavor, volume, and pack size, providing a realistic foundation for demand modeling and price elasticity analysis.


🧩 Workflow

The workflow includes:

  1. Data preparation & integration

    • Merging product and sales data
    • Cleaning and standardizing features for modeling
  2. Exploratory Data Analysis (EDA)

    • Identifying price and sales distributions
    • Correlation heatmaps and scatter plots
    • K-Means clustering to segment products by price and demand behavior
  3. Modeling Demand & Price Elasticity

    • Linear Regression as baseline model
    • Quadratic Regression (price, price², log(price)) to capture non-linear demand effects
    • Tree-based Models: Random Forest Regression for predictive comparison
  4. Quadratic Price Simulation

    • Simulation of predicted sales and revenue across price ranges
    • Identification of the revenue-maximizing price point
    • Visualization of demand drop vs. revenue increase to illustrate elasticity
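A minimal sketch of step 1 (data preparation and integration) with pandas. Several things here are hypothetical and may not match `pricing_data.xlsx`:

- the column names (`product_id`, `brand`, `flavor`, `volume_g`, `pack_size`, `price`, `units_sold`);
- the toy rows;
- the derived `price_per_100g` feature.

With the real file, the merged table would instead be loaded with `pd.read_excel("pricing_data.xlsx")`.

```python
import pandas as pd

# Hypothetical product master data (column names are assumptions).
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "brand": ["BrandA ", "BrandB", "BrandA"],
    "flavor": ["Strawberry", "vanilla", "Vanilla"],
    "volume_g": [150, 500, 150],
    "pack_size": [4, 1, 4],
})

# Hypothetical sales records keyed by product_id.
sales = pd.DataFrame({
    "product_id": [1, 1, 2, 3, 3],
    "price": [2.49, 2.29, 3.99, 2.59, 2.39],
    "units_sold": [120, 140, 60, 95, 110],
})

# Left join keeps every sales row; validate raises if a product_id is duplicated in products.
data = sales.merge(products, on="product_id", how="left", validate="many_to_one")

# Standardize categorical text (trim whitespace, lowercase) so "Vanilla" and "vanilla" match.
for col in ["brand", "flavor"]:
    data[col] = data[col].str.strip().str.lower()

# Hypothetical derived feature: price per 100 g, comparable across volumes and pack sizes.
data["price_per_100g"] = data["price"] / (data["volume_g"] * data["pack_size"]) * 100

# One-hot encode brand and flavor for the regression models (first level dropped as baseline).
features = pd.get_dummies(data, columns=["brand", "flavor"], drop_first=True)
print(features.columns.tolist())
```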
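The K-Means segmentation in step 2 can be sketched with scikit-learn. The sketch makes three assumptions, none of them the project's actual settings:

- each product is described by its average price and average weekly units sold;
- the toy values below are invented;
- `n_clusters=2` is hard-coded.

Both features are standardized first so that neither dominates the distance metric.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-product averages: [mean price, mean weekly units sold].
X = np.array([
    [1.99, 210], [2.09, 195], [2.19, 205],  # low price, high demand
    [3.89, 70], [3.99, 62], [4.19, 58],     # premium price, low demand
])

# Scale to zero mean / unit variance so price and volume carry comparable weight.
X_scaled = StandardScaler().fit_transform(X)

# k=2 is an assumption for this toy example; on real data k is usually chosen
# with an elbow plot or silhouette scores.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)
print(labels)
```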
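For step 3, here is a sketch that compares the three model families on synthetic data. The data is generated purely for illustration and does not come from the project.

- The "quadratic" model is ordinary linear regression on the engineered features price, price², and log(price), as listed in the workflow.
- The train/test split, the noise level, and the Random Forest settings are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic demand (illustration only): units fall with price, with curvature, plus noise.
rng = np.random.default_rng(42)
price = rng.uniform(1.0, 8.0, size=300)
units = 500 - 60 * price + 2 * price**2 + rng.normal(0, 5, size=300)


def linear_features(p):
    """Single feature: price."""
    return p.reshape(-1, 1)


def quadratic_features(p):
    """Engineered features [price, price^2, log(price)]."""
    return np.column_stack([p, p**2, np.log(p)])


p_train, p_test, y_train, y_test = train_test_split(
    price, units, test_size=0.25, random_state=0
)

models = {
    "linear": (LinearRegression(), linear_features),
    "quadratic": (LinearRegression(), quadratic_features),
    "random_forest": (RandomForestRegressor(n_estimators=200, random_state=0), linear_features),
}

# Fit each model on the training prices and score it on held-out prices.
scores = {}
for name, (model, make_X) in models.items():
    model.fit(make_X(p_train), y_train)
    scores[name] = r2_score(y_test, model.predict(make_X(p_test)))
    print(f"{name}: test R^2 = {scores[name]:.3f}")
```

In the full analysis, the brand and flavor dummies from the preparation step would typically be appended to each feature matrix. That way, price effects are estimated net of brand and flavor preferences.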
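Step 4 can be sketched as a grid search over candidate prices:

1. Predict units from a fitted quadratic demand curve.
2. Multiply by price to get revenue.
3. Take the argmax.

The coefficients `b0`, `b1`, `b2` and the price bounds below are hypothetical placeholders, not the project's estimates.

```python
import numpy as np

# Hypothetical fitted demand curve q(p) = b0 + b1*p + b2*p^2 (placeholder coefficients).
b0, b1, b2 = 1000.0, -150.0, 5.0


def predicted_units(p):
    # Clip at zero so the simulation never predicts negative sales.
    return np.clip(b0 + b1 * p + b2 * p**2, 0, None)


# Candidate price range (assumed bounds), in steps of 0.01.
prices = np.arange(0.50, 6.00 + 1e-9, 0.01)
units = predicted_units(prices)
revenue = prices * units

# Revenue-maximizing price on the grid.
best = np.argmax(revenue)
best_price = prices[best]

# Point elasticity at the optimum: (dq/dp) * p / q, where dq/dp = b1 + 2*b2*p.
elasticity = (b1 + 2 * b2 * best_price) * best_price / units[best]

print(f"Revenue-maximizing price: {best_price:.2f}")
print(f"Predicted units: {units[best]:.0f}, revenue: {revenue[best]:.0f}")
print(f"Point elasticity at optimum: {elasticity:.2f}")
```

Revenue is R(p) = p·q(p), so an interior maximum satisfies q(p) + p·q'(p) = 0. That condition is the same as a point elasticity of −1, so the printed elasticity serves as a quick sanity check on the simulation.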

📊 Key Insights

  • The relationship between price and demand follows a non-linear (quadratic) curve, reflecting real-world elasticity behavior.
  • Brand and flavor variables significantly impact sales; customer preference plays a stronger role than price alone.
  • The model identifies an optimal mid-range price that balances revenue and volume.
  • The tree-based ensemble model (Random Forest) confirms similar trends, adding robustness.

🧩 Tech Stack

  • Python → Data preprocessing, modeling, and simulation
  • Pandas, NumPy, Matplotlib, Seaborn → Data analysis and visualization
  • Scikit-learn → Regression models, Random Forest, K-Means clustering
  • Statsmodels → Elasticity analysis and regression diagnostics
  • Jupyter Notebook → Interactive workflow and documentation
  • CSV data sources → product and sales data, already merged and uploaded as pricing_data.xlsx