An end-to-end data pipeline built on the Azure cloud to ingest, process, and analyze real-time fleet performance and Environmental, Social, and Governance (ESG) metrics.
The solution delivers actionable insights through a live Power BI dashboard connected to a scalable data lakehouse.
This project addresses the growing need for businesses to monitor not only their operational efficiency but also their environmental impact.
The platform provides a near real-time view of a simulated logistics fleet, answering key business questions about:
- Driver performance
- Truck efficiency
- Regional operations
- CO₂ emissions
- Fuel efficiency
The solution is built on a modern, cloud-native data stack, demonstrating a robust implementation of the Medallion Architecture for a scalable and reliable data lakehouse.
The platform provides insights into:
- Operational KPIs: Total Deliveries, Avg Delivery Duration, Avg Speed, Trip Distance
- ESG Metrics: Total CO₂ Emitted (kg), CO₂ Efficiency (kg/km), Fuel Efficiency (km/L)
- Driver Analytics: Individual driver performance & efficiency
- Fleet Analytics: Truck performance by model, emission standard, maintenance status
- Regional Analysis: KPIs sliced by region & traffic index
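The ESG ratios above follow directly from per-trip totals. The sketch below assumes each completed delivery carries its distance, fuel used and CO₂ emitted (the `Trip` fields are hypothetical, not the project's schema). It computes fleet-level ratios as a ratio of totals, so each trip is weighted by its distance.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Trip:
    """One completed delivery; field names are hypothetical."""
    distance_km: float
    fuel_used_l: float
    co2_kg: float


def fleet_esg_kpis(trips: list[Trip]) -> dict[str, float]:
    """Fleet-level ESG KPIs computed as ratios of totals (distance-weighted)."""
    total_km = sum(t.distance_km for t in trips)
    total_fuel_l = sum(t.fuel_used_l for t in trips)
    total_co2_kg = sum(t.co2_kg for t in trips)
    return {
        "total_co2_kg": total_co2_kg,
        # CO2 efficiency: kg of CO2 per km driven (lower is better)
        "co2_kg_per_km": total_co2_kg / total_km if total_km else 0.0,
        # Fuel efficiency: km driven per litre of fuel (higher is better)
        "fuel_km_per_l": total_km / total_fuel_l if total_fuel_l else 0.0,
    }


trips = [Trip(100.0, 30.0, 80.0), Trip(300.0, 60.0, 160.0)]
print(fleet_esg_kpis(trips))
```

With these two trips the fleet emits 240 kg over 400 km, i.e. 0.6 kg/km. Averaging the per-trip ratios instead (0.8 and about 0.533) would give roughly 0.667 kg/km, overweighting the short trip.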
The Medallion Architecture ensures progressive data refinement:
- Landing stage: `raw_json`
- Bronze table: `raw_converted` (Delta)
- Silver fact table: `delivery_summary`
- Dimension tables: `driver_details`, `truck_details`, `location_details`
- Final Gold table: `enriched_delivery_performance`, a wide, denormalized table with 20+ performance metrics and attributes
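The Gold step amounts to left-joining the `delivery_summary` fact table to the three dimension tables. In PySpark that would be chained calls such as `facts.join(drivers, "driver_id", "left")`. The plain-Python sketch below shows the same row-level logic. It assumes fact rows carry `driver_id`, `truck_id` and `location_id` keys, which are hypothetical names rather than ones taken from the project's scripts.

```python
def enrich_deliveries(facts: list, drivers: dict, trucks: dict, locations: dict) -> list:
    """Build wide, Gold-style rows from delivery_summary facts plus dimensions.

    facts: list of dicts carrying driver_id, truck_id and location_id (hypothetical keys).
    drivers, trucks, locations: dicts mapping those IDs to attribute dicts.
    """
    enriched = []
    for row in facts:
        wide = {
            # Left join: an unknown ID contributes no attributes but keeps the fact row
            **drivers.get(row["driver_id"], {}),
            **trucks.get(row["truck_id"], {}),
            **locations.get(row["location_id"], {}),
            # Fact columns go last so they win on any column-name collision
            **row,
        }
        enriched.append(wide)
    return enriched


facts = [{"delivery_id": 1, "driver_id": "D1", "truck_id": "T1", "location_id": "L9", "co2_kg": 12.5}]
drivers = {"D1": {"driver_name": "Ana"}}
trucks = {"T1": {"truck_model": "X100", "emission_standard": "Euro VI"}}
print(enrich_deliveries(facts, drivers, trucks, locations={}))
```

Using a left join means a delivery whose location is missing from `location_details` still reaches the Gold table, just without the regional attributes.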
The platform follows a streaming + batch processing architecture, moving data through Bronze, Silver, and Gold layers.

Data Flow:
- Ingestion – A Python simulator generates JSON data (fleet deliveries & truck telemetry) → sent to Azure Event Hubs.
- Stream Processing – Azure Stream Analytics reads the events, shapes them (aggregations and joins between telemetry and delivery events), and lands the results in Azure Data Lake Storage Gen2 (ADLS).
- Bronze Layer (Raw) – PySpark adds metadata and uses Delta Lake MERGE for deduplicated, idempotent tables.
- Silver Layer (Cleansed) – Transforms raw data into a single source of truth, calculating KPIs and aggregated summaries.
- Gold Layer (Business-Ready) – Enriches Silver with dimension tables (drivers, trucks, locations) to produce a wide denormalized table.
- Serving & Visualization – A Synapse Serverless SQL view provides a high-performance endpoint for Power BI in DirectQuery mode.
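The Bronze layer's idempotency comes from an insert-only MERGE: rows whose key already exists in the target are skipped, so replaying a batch adds nothing. The sketch below reproduces those semantics in plain Python, keyed on a hypothetical `event_id` column. The delta-spark call in the comment is for orientation only, and `bronze_path` and `batch_df` are placeholder names.

```python
def merge_new_events(target: dict, batch: list, key: str = "event_id") -> int:
    """Insert-only upsert with MERGE ... WHEN NOT MATCHED THEN INSERT semantics.

    target: existing rows indexed by key; batch: incoming rows.
    Returns the number of rows inserted.
    """
    # PySpark/Delta equivalent (bronze_path and batch_df are hypothetical):
    # (DeltaTable.forPath(spark, bronze_path).alias("t")
    #     .merge(batch_df.dropDuplicates(["event_id"]).alias("s"), "t.event_id = s.event_id")
    #     .whenNotMatchedInsertAll()
    #     .execute())
    inserted = 0
    for event in batch:
        k = event[key]
        if k not in target:  # WHEN NOT MATCHED -> INSERT
            target[k] = event
            inserted += 1
    return inserted


bronze = {}
batch = [{"event_id": "a"}, {"event_id": "b"}, {"event_id": "a"}]
print(merge_new_events(bronze, batch))  # first load
print(merge_new_events(bronze, batch))  # replay of the same batch
```

The Python loop also drops duplicates inside a single batch. In Spark, duplicates within the source are typically removed with `dropDuplicates` before the MERGE, as in the comment, because an insert-only MERGE compares source rows against the target, not against each other.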
| Category | Technology | Purpose |
|---|---|---|
| Cloud Platform | Microsoft Azure | Foundational cloud provider for all services |
| Ingestion | Azure Event Hubs, Stream Analytics | Real-time data ingestion & routing |
| Storage | Azure Data Lake Storage Gen2 | Scalable, secure storage for multi-layered data lake |
| Processing | Azure Synapse Analytics, PySpark | ETL & data transformations at scale |
| Lakehouse | Delta Lake | ACID reliability for the data lake |
| Serving | Synapse Serverless SQL | Logical data warehouse over the Gold layer |
| BI & Viz | Microsoft Power BI | Interactive dashboards & reports |
| Languages | Python, SQL | Data processing & query logic |
- Active Azure Subscription
- Deployed resources:
  - Synapse Analytics Workspace
  - ADLS Gen2
  - Event Hubs
  - Azure Stream Analytics
- Synapse Managed Identity with the Storage Blob Data Contributor role on ADLS
- Create an Event Hubs namespace in the Azure Portal (SKU: Standard).
- Add two event hubs:
  - `esg-fleet-events` (delivery events)
  - `esg-fleet-telemetry` (truck telemetry)
- Configure partitions (4–8) and retention (7 days).
- Add a consumer group for Stream Analytics (e.g., `asa_consumer`).
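The simulator's job is to emit JSON events into the two hubs. Below is a minimal sketch, assuming a made-up event schema (`event_id`, `speed_kmh`, `distance_km` and the other fields are illustrative, not the project's actual payload). Sending uses the `azure-eventhub` v5 SDK, imported lazily so the generators run without it.

```python
import json
import random
import uuid
from datetime import datetime, timezone


def _now_iso() -> str:
    # UTC timestamp with explicit offset, e.g. 2024-01-01T08:00:00.123456+00:00
    return datetime.now(timezone.utc).isoformat()


def make_delivery_event(rng: random.Random, delivery_id: str, driver_id: str, truck_id: str) -> dict:
    """Delivery event for the esg-fleet-events hub (hypothetical schema)."""
    return {
        "event_id": str(uuid.uuid4()),
        "delivery_id": delivery_id,
        "driver_id": driver_id,
        "truck_id": truck_id,
        "status": rng.choice(["started", "completed"]),
        "distance_km": round(rng.uniform(5, 400), 1),
        "timestamp": _now_iso(),
    }


def make_telemetry_event(rng: random.Random, truck_id: str) -> dict:
    """Telemetry reading for the esg-fleet-telemetry hub (hypothetical schema)."""
    return {
        "event_id": str(uuid.uuid4()),
        "truck_id": truck_id,
        "speed_kmh": round(rng.uniform(0, 90), 1),
        "fuel_used_l": round(rng.uniform(0, 0.5), 3),
        "timestamp": _now_iso(),
    }


def send_events(conn_str: str, hub_name: str, events: list) -> None:
    """Send events as one batch; requires `pip install azure-eventhub`."""
    from azure.eventhub import EventData, EventHubProducerClient  # imported lazily

    producer = EventHubProducerClient.from_connection_string(conn_str, eventhub_name=hub_name)
    with producer:
        batch = producer.create_batch()
        for event in events:
            batch.add(EventData(json.dumps(event)))  # raises ValueError if the batch is full
        producer.send_batch(batch)
```

A run might call `send_events(conn_str, "esg-fleet-telemetry", [make_telemetry_event(rng, "TRUCK-01") for _ in range(10)])`, with `conn_str` coming from a Send-only shared access policy.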
- Enable hierarchical namespace and soft delete on the storage account.
- Create containers in your ADLS account: `bronze`, `silver`, `gold` (plus an error path, `bronze/errors`).
- In Access control (IAM), assign:
  - Storage Blob Data Contributor role to the Stream Analytics Managed Identity.
  - Storage Blob Data Contributor role to the Synapse Managed Identity.
- In Azure Portal → Stream Analytics job → Create `esg-shaper`.
- Inputs:
  - Event Hub stream inputs (`esg-fleet-events`, `esg-fleet-telemetry`).
- Outputs:
  - ADLS Gen2 (`bronze/processed/`) in JSON.
  - Optional: Power BI for real-time dashboards.
- Query:
  - Define aggregation and joins between telemetry and delivery events (see the SQL script).
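The actual query lives in the project's SQL script. To make the aggregation idea concrete, the Python sketch below reproduces a tumbling-window average, the kind of grouping an ASA query expresses as `GROUP BY truck_id, TumblingWindow(second, 300)`. The 5-minute window and the field names are assumptions, and the join with delivery events is not shown.

```python
from collections import defaultdict
from datetime import datetime


def tumbling_avg_speed(events: list, window_s: int = 300) -> dict:
    """Average speed per (truck_id, window_start_epoch_s) over fixed, non-overlapping windows.

    Timestamps must be ISO 8601 strings with a UTC offset (e.g. "+00:00").
    ASA labels each window by its end time; this sketch keys by window start instead.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of speeds, reading count]
    for e in events:
        ts = datetime.fromisoformat(e["timestamp"]).timestamp()
        window_start = int(ts // window_s) * window_s  # align to window boundary
        acc = totals[(e["truck_id"], window_start)]
        acc[0] += e["speed_kmh"]
        acc[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


readings = [
    {"truck_id": "T1", "timestamp": "2024-01-01T00:00:10+00:00", "speed_kmh": 40.0},
    {"truck_id": "T1", "timestamp": "2024-01-01T00:04:50+00:00", "speed_kmh": 60.0},
    {"truck_id": "T1", "timestamp": "2024-01-01T00:05:00+00:00", "speed_kmh": 80.0},
]
print(tumbling_avg_speed(readings))
```

Because tumbling windows do not overlap, each reading lands in exactly one window. The reading at 00:05:00 opens the second window rather than closing the first.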
- Start the ASA job from the portal.
- Push sample events into Event Hubs using the simulator.
- Validate outputs in ADLS:
  - Raw events in `eventhub-capture` (via Event Hubs Capture, which writes Avro files).
  - Processed/aggregated JSON/Parquet in `bronze/processed/YYYY/...`.
- Verify Event Hub metrics and Stream Analytics diagnostics in the Azure Portal.
- Synapse/PySpark jobs load data from `bronze/processed/` and transform it into Delta tables in the Bronze/Silver/Gold zones.
- Build Serverless SQL external views for Power BI (`vw_enriched_delivery_performance`).
- Connect Power BI via DirectQuery for near real-time reporting.
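In the Silver layer, raw telemetry has to collapse into one `delivery_summary` row per delivery. The plain-Python sketch below assumes each telemetry point carries an ISO 8601 timestamp with UTC offset and a cumulative `odometer_km` reading. Both are hypothetical fields, and the real simulator may report distance differently. In PySpark the same step would be a `groupBy("delivery_id").agg(...)` using `F.min` and `F.max`.

```python
from datetime import datetime


def summarize_delivery(points: list) -> dict:
    """Collapse one delivery's telemetry points into a delivery_summary-style row."""
    # Parse timestamps before sorting so mixed offsets still order correctly
    ordered = sorted(points, key=lambda p: datetime.fromisoformat(p["timestamp"]))
    start = datetime.fromisoformat(ordered[0]["timestamp"])
    end = datetime.fromisoformat(ordered[-1]["timestamp"])
    duration_h = (end - start).total_seconds() / 3600
    # Cumulative odometer: distance is last reading minus first reading
    distance_km = ordered[-1]["odometer_km"] - ordered[0]["odometer_km"]
    return {
        "trip_distance_km": distance_km,
        "delivery_duration_h": duration_h,
        "avg_speed_kmh": distance_km / duration_h if duration_h > 0 else 0.0,
    }


points = [
    {"timestamp": "2024-01-01T09:30:00+00:00", "odometer_km": 1090},
    {"timestamp": "2024-01-01T08:00:00+00:00", "odometer_km": 1000},
    {"timestamp": "2024-01-01T08:45:00+00:00", "odometer_km": 1040},
]
print(summarize_delivery(points))
```

Average speed is derived from total distance over total duration, not from averaging instantaneous speed readings. This keeps it consistent with the Trip Distance and Avg Delivery Duration KPIs.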
- Stream Analytics Managed Identity:
  - Enable under Identity → System assigned → On.
- Assign permissions:
  - Storage account → Access control (IAM) → Add Storage Blob Data Contributor → assign the ASA managed identity.
  - Do the same for the Synapse Managed Identity.
- Event Hubs:
  - Use least-privilege access: a Send-only shared access policy (or the Azure Event Hubs Data Sender role) for the simulator.
- Store secrets in Azure Key Vault for secure use across services.
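A simulator or notebook can then read its Event Hubs connection string from Key Vault instead of hard-coding it. The sketch below assumes a convention where an environment variable takes precedence for local runs. That convention and the secret name `eventhub-conn` are hypothetical. The Key Vault path uses `azure-identity` and `azure-keyvault-secrets`. `DefaultAzureCredential` falls back to a managed identity when the code runs inside Azure.

```python
import os
from typing import Optional


def get_secret(name: str, vault_url: Optional[str] = None) -> str:
    """Return a secret, preferring an environment variable for local runs.

    "eventhub-conn" maps to the env var EVENTHUB_CONN (a hypothetical convention);
    otherwise the secret is read from Key Vault.
    """
    env_key = name.upper().replace("-", "_")
    if env_key in os.environ:
        return os.environ[env_key]
    if vault_url is None:
        raise KeyError(f"{env_key} is not set and no vault_url was given")
    # Requires: pip install azure-identity azure-keyvault-secrets
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())
    return client.get_secret(name).value
```

For example, `get_secret("eventhub-conn", "https://<your-vault>.vault.azure.net")` returns the stored value. The identity running the code needs permission to read secrets in that vault.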
- Event Hubs namespace and event hubs created
- ADLS containers provisioned (`bronze`/`silver`/`gold`/`errors`)
- Stream Analytics job created with Event Hub input + ADLS output
- Managed Identities enabled and assigned correct RBAC roles
- End-to-end pipeline tested: simulator → Event Hubs → ASA → ADLS → Synapse → Power BI
- Data Processing Pipeline
  - Run the Bronze ingestion script (`sample_code_snipets`)
  - Run the Silver transformation script (`sample_code_snipets`)
  - Run the Gold enrichment script (`sample_code_snipets`)
- Power BI Connection
The live dashboard provides a real-time view of fleet performance & ESG KPIs, enabling data-driven decision-making.
- In Power BI Desktop → Get Data > Azure Synapse Analytics SQL.
- Enter the Synapse Serverless SQL endpoint: `your-workspace-ondemand.sql.azuresynapse.net`.
- Connect in DirectQuery mode.
- Select the `vw_enriched_delivery_performance` view.
- Streaming ETL – Convert Silver & Gold jobs to Structured Streaming with Auto Loader for lower latency
- Predictive Analytics – Train ML models on Gold data for:
  - Delivery time predictions
  - Predictive truck maintenance

