This project aims to analyze customer shopping behavior using transactional, demographic, and purchase data. The goal is to uncover insights on customer segments, purchasing patterns, product preferences, and revenue contributions to inform marketing, sales, and business strategies.
The analysis leverages Python (Pandas, Seaborn, Matplotlib) for data cleaning and exploration, SQL for querying aggregated metrics, and Power BI for interactive visualizations.
| File | Description |
|---|---|
| `customer_shopping_behavior.csv` | Raw dataset containing customer transactions, demographics, and purchase details. |
| `Customer_behavior_analysis.ipynb` | Python notebook performing data cleaning, feature engineering, exploratory data analysis (EDA), and visualizations. |
| `Customer_behavior_analysis.sql` | SQL queries for analyzing revenue, discounts, top products, shipping comparisons, and customer segmentation. |
| `Customer Behavior.pbix` | Power BI report visualizing key metrics such as revenue by segment, purchase trends, and seasonal patterns. |
The dataset customer_shopping_behavior.csv contains the following fields:
- `customer_id`: Unique identifier for each customer
- `age`: Customer age
- `gender`: Customer gender
- `item_purchased`: Name of the purchased item
- `category`: Item category
- `purchase_amount`: Amount spent per transaction
- `location`: Customer location
- `size`: Size of the purchased item
- `color`: Color of the purchased item
- `season`: Season of purchase
- `review_rating`: Customer review rating
- `subscription_status`: Whether the customer has a subscription
- `shipping_type`: Type of shipping selected
- `discount_applied`: Whether a discount was applied
- `previous_purchases`: Number of previous purchases
- `payment_method`: Payment method used
- `age_group`: Derived age group based on `age`
- `purchase_frequency_days`: Derived numeric representation of purchase frequency
- `customer_segement`: Derived segment (`New`, `Returning`, `Loyal`)
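
As a rough illustration of how the derived fields could be built, here is a minimal pandas sketch. The age bins, the age-group labels, the segment thresholds, and the helper `to_segment` are all assumptions made for this example; the notebook may use different cut points and logic.

```python
import pandas as pd

# Tiny made-up frame; only the column names come from the field list above.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [19, 34, 52, 68],
    "previous_purchases": [1, 7, 15, 40],
})

# Assumed age bins and labels (hypothetical cut points).
age_bins = [0, 25, 40, 60, 120]
age_labels = ["Young Adult", "Adult", "Middle-aged", "Senior"]
df["age_group"] = pd.cut(df["age"], bins=age_bins, labels=age_labels)


def to_segment(previous_purchases: int) -> str:
    """Map a purchase count to a segment using assumed thresholds."""
    if previous_purchases <= 1:
        return "New"
    if previous_purchases <= 10:
        return "Returning"
    return "Loyal"


# Column name spelled as in the dataset description.
df["customer_segement"] = df["previous_purchases"].apply(to_segment)

print(df)
```
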
- Data cleaning and preprocessing (handling nulls, renaming columns, creating derived columns like `age_group` and `customer_segement`)
- Exploratory Data Analysis (EDA):
  - Distribution of numeric variables (purchase amount, review rating, age, etc.)
  - Countplots for categorical variables (gender, category, subscription status, shipping type, payment method, etc.)
  - Barplots for revenue by gender, discount usage by age group, and subscription status per customer segment
  - Stacked bar chart of purchase amount by season and category with values annotated
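
The annotated stacked bar chart could be produced along the following lines. This is a sketch on made-up rows, not the notebook's actual code: it assumes the chart shows total `purchase_amount` per season, stacked by category, and it relies on `Axes.bar_label`, which needs Matplotlib 3.4 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows; column names follow the dataset description.
df = pd.DataFrame({
    "season": ["Winter", "Winter", "Summer", "Summer", "Summer"],
    "category": ["Clothing", "Footwear", "Clothing", "Clothing", "Footwear"],
    "purchase_amount": [50, 30, 20, 40, 10],
})

# One row per season, one column per category, cells hold summed purchase amounts.
totals = df.pivot_table(
    index="season",
    columns="category",
    values="purchase_amount",
    aggfunc="sum",
    fill_value=0,
)

# Stacked bars: each category becomes one layer of the bar for its season.
ax = totals.plot(kind="bar", stacked=True)

# Annotate every bar segment with its value, centred inside the segment.
for container in ax.containers:
    ax.bar_label(container, label_type="center")

ax.set_xlabel("Season")
ax.set_ylabel("Total purchase amount")
plt.tight_layout()
```

In a notebook, the figure renders inline once the `matplotlib.use("Agg")` line is removed.
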
- Total revenue by gender
- Customers who used a discount but spent above average
- Top 5 products by average review rating
- Comparison of average purchase amount between standard and express shipping
- Average spend and total revenue for subscribers vs non-subscribers
- Top 5 products with the highest percentage of discounted purchases
- Customer segmentation counts (New, Returning, Loyal)
- Top 3 most purchased products within each category
- Subscription behavior of repeat buyers
- Revenue contribution by age group
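
Two of the queries listed above (total revenue by gender, and discount users who spent above average) can be tried without a PostgreSQL server. The sketch below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the sample rows and the `'Yes'`/`'No'` encoding of `discount_applied` are assumptions, and the statements in `Customer_behavior_analysis.sql` may be written differently.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL `customer` table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER,
        gender TEXT,
        purchase_amount REAL,
        discount_applied TEXT
    )
""")

# Made-up rows; 'Yes'/'No' for discount_applied is an assumed encoding.
rows = [
    (1, "Male", 80.0, "Yes"),
    (2, "Female", 40.0, "No"),
    (3, "Female", 90.0, "Yes"),
    (4, "Male", 30.0, "Yes"),
]
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", rows)

# Total revenue by gender.
revenue_by_gender = conn.execute("""
    SELECT gender, SUM(purchase_amount) AS revenue
    FROM customer
    GROUP BY gender
    ORDER BY gender
""").fetchall()

# Customers who used a discount but spent more than the overall average.
above_avg_discount = conn.execute("""
    SELECT customer_id, purchase_amount
    FROM customer
    WHERE discount_applied = 'Yes'
      AND purchase_amount > (SELECT AVG(purchase_amount) FROM customer)
    ORDER BY customer_id
""").fetchall()

print(revenue_by_gender)
print(above_avg_discount)
```

Both statements use only standard SQL (aggregation, `GROUP BY`, and a scalar subquery), so they should carry over to PostgreSQL unchanged.
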
- Interactive dashboards visualizing:
  - Purchase trends over seasons and categories
  - Revenue distribution across customer segments and age groups
  - Subscription and discount impact on purchases
  - Top products and categories
- Python: Pandas, NumPy, Matplotlib, Seaborn
- SQL: PostgreSQL for aggregation and querying
- Power BI: Interactive dashboards and visualizations
- Revenue is influenced by customer segments, age groups, and gender.
- Discounts increase sales for certain age groups but may not always correlate with higher purchase amounts.
- Subscription customers generally spend more on average.
- Top products can be identified by purchase frequency and review ratings to guide marketing and inventory decisions.
- Seasonal trends impact category-specific purchases.
- Open `Customer_behavior_analysis.ipynb` in Jupyter Notebook to explore the dataset, preprocessing, and visualizations.
- Run SQL queries in `Customer_behavior_analysis.sql` on a PostgreSQL database with the table `customer` loaded.
- Open `Customer Behavior.pbix` in Power BI to view interactive dashboards and insights.