AI enthusiast & Data Engineer | M.S. in Artificial Intelligence @ Northeastern University
I like AI/ML research, enterprise-scale data engineering, and building systems in deep learning, NLP, and computer vision. Please visit: https://mjsushanth.github.io/
- Do visit my portfolio here
- My study notes are categorized and placed in here!
- Credit to Obsidian, an excellent app for taking notes. Here you can expect to find deep dives into the math, clear mental models, intuition, and research notes from my projects.
- FinRAG/FinSights: Production-Grade Financial Intelligence System. A hybrid dual-path architecture that combines structured queries (DuckDB/SQL dimension tables) with semantic retrieval. Reduces 72M sentences to 1M via stratified sampling with temporal weighting across regulatory eras. Check this out!
- Data Engineering Pipeline: DuckDB stratified sampling (30+ SQL scripts) with weighted multi-objective scoring, fuzzy-matched integrations, conditional temporal stratification.
- Advanced RAG Engineering: Sentence-level embeddings, multi-query expansion with window-hopping retrieval, citation provenance via document headers for exact traceability. Polars/Parquet logging, serverless-ready architecture. Achieves $0.017-0.025/query cost, no managed DB overhead.
- Text-to-Pose Diffusion: Built a CLIP-conditioned diffusion model with cross-attention + anatomical loss for 3D pose generation.
- Researched motion/3D data concepts in depth (pose representation, N-joint hierarchical mapping, kinematic chains, pelvis-spine-extremity validation), along with the architecture: hybrid CNN-Transformer diffusion, CLIP semantic encoding and projection, dual-pass CFG, and anatomical constraint enforcement. See Report here. See Design here.
- Multi-View 3D Scene Analysis: Created a 10k+ LOC pipeline with MV scene analysis, pose-guided filtering, occlusion handling, and RANSAC validation on ETH3D. See Design Flow.
- Protein Structure Prediction: Implemented HMM, CRF, BiLSTM; CRF reached 67% accuracy on CB513 using evolutionary + context features. See Report here.
- SocrAItic Circle: Multi-agent LLM debate workflow, designed with multi-phase debate cycles, iterative refinement, YAML-driven orchestration, and judge modules.
- Artist Classification: Compared SVM-SIFT-BoVW, CAEs, VAEs, and CNNs; SVM achieved 89% accuracy on a 50-class dataset.
- Multi-vector ViT+CLIP with LoRA and ColBERT-style MaxSim retrieval. Demo Notebook.
- An example ML-serving workflow using GitHub CI/CD and AWS Lambda with SAM infrastructure. Src Code. Notes here. Study notes.
- Using Optuna and MLflow with a synthetic time-series generator. Src Code.
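
For readers unfamiliar with the ColBERT-style MaxSim retrieval mentioned in the multi-vector ViT+CLIP item, here is a minimal sketch of the scoring idea. It is not the notebook's code: the function names (`dot`, `maxsim_score`, `rank_documents`) and the toy vectors are hypothetical, and plain Python lists stand in for real embeddings. Each query vector is matched to its best-scoring document vector, and those maxima are summed.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query vector, take the maximum
    similarity over all document vectors, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank_documents(
    query_vecs: Sequence[Vector], docs: Dict[str, List[Vector]]
) -> List[str]:
    """Return document ids sorted by descending MaxSim score."""
    return sorted(
        docs,
        key=lambda doc_id: maxsim_score(query_vecs, docs[doc_id]),
        reverse=True,
    )


# Toy example: two query vectors, two documents with two vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "a": [[1.0, 0.0], [0.0, 1.0]],  # covers both query directions
    "b": [[1.0, 0.0], [1.0, 0.0]],  # covers only the first direction
}
ranking = rank_documents(query, docs)
```

In practice the vectors would typically be L2-normalized so the dot product acts as cosine similarity, and the scoring would be batched on a GPU rather than looped in Python.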