  • Clarkson University
  • New York


Mutsa Mungoshi

I am an Applied Data Scientist and Data Engineer with a foundation in engineering, machine learning, and scalable data systems. I am currently pursuing an M.S. in Applied Data Science at Clarkson University (GPA: 4.0), where my work focuses on building reliable data pipelines, production-ready analytical systems, and decision-support tools for research, operations, and policy.

My background combines:

  • Engineering-driven problem solving
  • End-to-end data system design (OLTP → ETL → analytics)
  • Statistical modeling and machine learning
  • Translating complex datasets into actionable insight

Technical Focus

  • Data Engineering: SQL, data warehousing, star schemas, SCD Type 2, ETL/ELT pipelines, incremental fact loads
  • Analytics & Machine Learning: Python, R, regression, PCA, clustering, classification, feature engineering
  • Visualization & Reporting: Tableau, Plotly, Shiny, interactive dashboards
  • Applications: Flask, SQLAlchemy, REST-style backends, full CRUD systems
  • Tools & Platforms: Git, MySQL, Snowflake, Docker, AWS (EC2, S3, Lambda), Airflow

Selected Projects

ZAGI Data Warehouse

SQL | Data Engineering
End-to-end OLTP → staging → data warehouse implementation for a retail and rental business. Designed dimensional models with SCD Type 2 handling, intermediate fact tables, incremental and late-arriving fact loads, analytical aggregates, and automated ETL validation.

https://github.com/Thooms-coder/zagi-data-warehouse
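The SCD Type 2 handling described above can be illustrated with a small, hypothetical Python sketch. It is not taken from the ZAGI repository, which is implemented in SQL. The names `DimRow` and `apply_scd2`, the in-memory lists, and the example product data are assumptions made for illustration. When a tracked attribute changes, the current row is closed with an end date and a new row with a fresh surrogate key is appended, so earlier versions are preserved.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    surrogate_key: int          # warehouse-generated key (hypothetical)
    natural_key: str            # business key from the OLTP source (hypothetical)
    attrs: tuple                # tracked attributes, e.g. (name, category)
    valid_from: date
    valid_to: Optional[date]    # None marks the current version


def apply_scd2(dim: list, source: dict, load_date: date) -> list:
    """Return a new dimension list with SCD Type 2 changes applied.

    `source` maps natural keys to their latest attribute tuples.
    Deletions and late-arriving changes are not handled in this sketch.
    """
    result = list(dim)
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    # Index of the current (open) version for each natural key.
    current = {r.natural_key: i for i, r in enumerate(result) if r.valid_to is None}

    for nk, attrs in source.items():
        idx = current.get(nk)
        if idx is not None and result[idx].attrs == attrs:
            continue  # unchanged: keep the existing current version
        if idx is not None:
            # Changed: close out the old version as of this load.
            result[idx] = replace(result[idx], valid_to=load_date)
        # New or changed: append a fresh current version.
        result.append(DimRow(next_key, nk, attrs, load_date, None))
        next_key += 1
    return result


if __name__ == "__main__":
    dim = apply_scd2([], {"P1": ("Tent", "Camping")}, date(2024, 1, 1))
    dim = apply_scd2(dim, {"P1": ("Tent", "Outdoor")}, date(2024, 2, 1))
    for row in dim:
        print(row)
```

In this example, the product's category change leaves two rows: the original version, closed on the second load date, and a new current version with surrogate key 2. Re-running a load with unchanged data adds no rows.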


Multimodal Taxi Data Analysis

Python | Big Data | ETL | Visualization
Large-scale analysis of urban traffic using image and audio data. Built reproducible ETL pipelines and cross-modal validation workflows to detect anomalies and data quality issues across heterogeneous data sources.

https://github.com/Thooms-coder/multimodal-taxi-data-analysis-big-data


SNAP Participation and Structural Cost Analysis

R | Statistical Analysis | Policy Analytics
Research-driven analysis of 3,142 U.S. counties examining how SNAP participation relates to structural cost-of-living burdens. Applied PCA, regression, and clustering to uncover regional cost patterns and policy-relevant insights.

https://github.com/Thooms-coder/snap-participation-and-structural-cost-analysis


Shibui Productivity Planner

Python | Flask | Full-Stack Application
Designed and built a full-stack productivity and wellness planning platform with user authentication, task tracking, balance scoring, and a relational SQL backend supporting multi-user workflows.

https://github.com/Thooms-coder/shibui-work-wellness-planner


Material Selector Dashboard

R | Shiny | Decision Support
Interactive Shiny application enabling engineering teams to compare materials by cost, strength, and sustainability using Ashby-style plots, radar charts, and dynamic filtering.

https://github.com/Thooms-coder/material_selector_shinyApp


Metro Status Prediction Pipeline

Python | Machine Learning
Machine learning pipeline predicting U.S. metro status using engineered cost-of-living features and multiple classification models, with an emphasis on interpretability and feature structure.

https://github.com/Thooms-coder/metro-status-prediction-pipeline


Applied Experience

  • Research Assistant – Data Analysis (Clarkson University):
    Analyzed 50,000+ time-series records from wastewater treatment systems, built regression and time-series models, and developed interactive visual analytics to guide operational decisions.

  • Software Developer & Database Assistant (Clarkson University):
    Engineered a SQL-backed system for a 200-member rowing club, designing relational schemas and ETL pipelines that automated reporting and reduced manual effort.


Contact

I am interested in roles and collaborations involving data engineering, applied machine learning, analytics systems, and research-driven data work.

Pinned

  1. multimodal-taxi-data-analysis-big-data (Public)

    IA626 Big Data project analyzing multimodal urban traffic data (image and audio) through reproducible ETL pipelines and cross-modal visual analytics to detect anomalies and data quality issues.

    Python

  2. zagi-data-warehouse (Public)

    End-to-end OLTP to data warehouse implementation with SCD Type 2 dimensions, incremental ETL, and analytical aggregates.

    SQL

  3. snap-participation-and-structural-cost-analysis (Public)

    This project analyzes how SNAP participation rates relate to structural cost-of-living patterns across U.S. counties. Using PCA, regression, and clustering on county-level cost shares, income, and …

    R

  4. shibui-work-wellness-planner (Public)

    Shibui Planner is a full-stack productivity and wellness application designed to help users maintain balance between focused work and physical activity. Built with Flask, SQLAlchemy ORM, and a 4-ta…

    Python

  5. material_selector_shinyApp (Public)

    Interactive Shiny app for exploring and comparing construction materials. Includes dynamic filtering, Ashby-style plots, and radar chart comparisons, with a polished themed UI.

    R

  6. metro-status-prediction-pipeline (Public)

    Predicting U.S. metro status using structural cost-of-living shares and machine learning.

    Python