I am an Applied Data Scientist and Data Engineer with a foundation in engineering, machine learning, and scalable data systems. I am currently pursuing an M.S. in Applied Data Science at Clarkson University (GPA: 4.0), where my work focuses on building reliable data pipelines, production-ready analytical systems, and decision-support tools for research, operations, and policy.
My background combines:
- Engineering-driven problem solving
- End-to-end data system design (OLTP → ETL → analytics)
- Statistical modeling and machine learning
- Translating complex datasets into actionable insight
Technical skills:
- Data Engineering: SQL, data warehousing, star schemas, SCD Type 2, ETL/ELT pipelines, incremental fact loads
- Analytics & Machine Learning: Python, R, regression, PCA, clustering, classification, feature engineering
- Visualization & Reporting: Tableau, Plotly, Shiny, interactive dashboards
- Applications: Flask, SQLAlchemy, REST-style backends, full CRUD systems
- Tools & Platforms: Git, MySQL, Snowflake, Docker, AWS (EC2, S3, Lambda), Airflow
SQL | Data Engineering
End-to-end OLTP → staging → data warehouse implementation for a retail and rental business. Designed dimensional models with SCD Type 2 handling, intermediate fact tables, incremental and late-arriving fact loads, analytical aggregates, and automated ETL validation.
https://github.com/Thooms-coder/zagi-data-warehouse
Python | Big Data | ETL | Visualization
Large-scale analysis of urban traffic using image and audio data. Built reproducible ETL pipelines and cross-modal validation workflows to detect anomalies and data quality issues across heterogeneous data sources.
https://github.com/Thooms-coder/multimodal-taxi-data-analysis-big-data
R | Statistical Analysis | Policy Analytics
Research-driven analysis of 3,142 U.S. counties examining how SNAP participation relates to structural cost-of-living burdens. Applied PCA, regression, and clustering to uncover regional cost patterns and policy-relevant insights.
https://github.com/Thooms-coder/snap-participation-and-structural-cost-analysis
Python | Flask | Full-Stack Application
Designed and built a full-stack productivity and wellness planning platform with user authentication, task tracking, balance scoring, and a relational SQL backend supporting multi-user workflows.
https://github.com/Thooms-coder/shibui-work-wellness-planner
R | Shiny | Decision Support
Interactive Shiny application enabling engineering teams to compare materials by cost, strength, and sustainability using Ashby-style plots, radar charts, and dynamic filtering.
https://github.com/Thooms-coder/material_selector_shinyApp
Python | Machine Learning
Machine learning pipeline predicting U.S. metro status using engineered cost-of-living features and multiple classification models, with an emphasis on interpretability and feature structure.
https://github.com/Thooms-coder/metro-status-prediction-pipeline
- Research Assistant – Data Analysis (Clarkson University): Analyzed 50,000+ time-series records from wastewater treatment systems, built regression and time-series models, and developed interactive visual analytics to guide operational decisions.
- Software Developer & Database Assistant (Clarkson University): Engineered a SQL-backed system for a 200-member rowing club, designing relational schemas and ETL pipelines that automated reporting and reduced manual effort.
- Location: Potsdam, New York
- Email: mungosmj@clarkson.edu
- GitHub: https://github.com/Thooms-coder
I am interested in roles and collaborations involving data engineering, applied machine learning, analytics systems, and research-driven data work.