Joel M mjsushanth

Hello, I'm Joel M

AI enthusiast & Data Engineer | M.S. in Artificial Intelligence @ Northeastern University

I enjoy AI/ML research, enterprise-scale data engineering, and building systems in deep learning, NLP, and computer vision. Please visit: https://mjsushanth.github.io/

Portfolio:

Do visit my portfolio here

Study Notes:

My study notes are categorized and collected here!
Credit to Obsidian, an excellent app for taking notes. Expect deep dives into the math, clear mental models, intuition, and project research drawn from my work.

Research & Academic Projects

FinRAG/FinSights: Production-Grade Financial Intelligence System. A hybrid dual-path architecture combining structured queries (DuckDB/SQL dimension tables) with semantic retrieval. Reduces a 72M-sentence corpus to 1M sentences via stratified sampling with temporal weighting across regulatory eras. Check this out!
- Data Engineering Pipeline: DuckDB stratified sampling (30+ SQL scripts) with weighted multi-objective scoring, fuzzy-matched integrations, conditional temporal stratification.
- Advanced RAG Engineering: Sentence-level embeddings, multi-query expansion with window-hopping retrieval, citation provenance via document headers for exact traceability. Polars/Parquet logging, serverless-ready architecture. Costs $0.017-0.025 per query, with no managed-database overhead.
Text-to-Pose Diffusion: Built a CLIP-conditioned diffusion model with cross-attention + anatomical loss for 3D pose generation.
- Covers in-depth research on motion/3D data (pose representation, N-joint hierarchical mapping, kinematic chains, pelvis-spine-extremity validation) and on the architecture: Hybrid CNN-Transformer Diffusion, CLIP Semantic Encoding & Projection, Dual-Pass CFG, and Anatomical Constraint Enforcement. See Report here. See Design here.
Multi-View 3D Scene Analysis: Created a 10k+ LOC pipeline with MV scene analysis, pose-guided filtering, occlusion handling, and RANSAC validation on ETH3D. See Design Flow.
Protein Structure Prediction: Implemented HMM, CRF, BiLSTM; CRF reached 67% accuracy on CB513 using evolutionary + context features. See Report here.
SocrAItic Circle: A multi-agent LLM debate workflow with multi-phase debate cycles, iterative refinement, YAML-driven orchestration, and judge modules.
Artist Classification: Compared SVM-SIFT-BoVW, CAEs, VAEs, and CNNs; SVM achieved 89% accuracy on a 50-class dataset.

Other works:

Multi-vector ViT+CLIP with LoRA and ColBERT-style MaxSim retrieval. Demo Notebook.
An example ML-serving workflow using GitHub CI/CD, AWS Lambda, and SAM infrastructure. Src Code. Notes here. Study notes.
Optuna and MLflow usage with a synthetic time-series generator. Src Code.


📫 Connect with Me
