PROJECTS

Every portfolio-ready system, in one place

13 projects across data and platform systems, search and retrieval, applied machine learning research, and MLOps. Each card states what's verified — tests, CI, real metrics — and links straight to source.

Data & Platform Systems (3)

Production-style Prototype
Data & Platform Systems

SignalLake

Backend, data, and platform engineers need fast answers about service latency, failures, and slow endpoints without always requiring a warehouse or managed observability stack.

I designed and built an end-to-end log analytics platform that validates incoming events, preserves raw records in JSONL, transforms them into Hive-partitioned Parquet, and serves operational metrics through DuckDB and FastAPI. On a benchmark of 1 million events across 192 files, partition pruning reduced data scanned from 86.7 MB to 6.3 MB.

FastAPI · Python · Pydantic · DuckDB · Apache Parquet · PyArrow · pytest

1M events · 192 files
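
The partition pruning described for SignalLake can be illustrated with a minimal, standard-library sketch. It is not the project's code: the `date=`/`hour=` partition keys, the file sizes, and the helper names are assumptions chosen for illustration, and a real engine such as DuckDB performs this filtering internally when it reads a Hive-partitioned directory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParquetFile:
    path: str        # Hive-style path, e.g. "date=2024-01-01/hour=03/part-0.parquet"
    size_bytes: int


def partition_values(path: str) -> dict[str, str]:
    """Parse key=value directory segments out of a Hive-style path."""
    return dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)


def prune(files: list[ParquetFile], **wanted: str) -> list[ParquetFile]:
    """Keep only files whose partition values match every requested key."""
    return [
        f for f in files
        if all(partition_values(f.path).get(k) == v for k, v in wanted.items())
    ]


# Hypothetical layout: 2 days x 3 hours, one 1,000-byte file per partition.
files = [
    ParquetFile(f"date=2024-01-0{d}/hour={h:02d}/part-0.parquet", 1_000)
    for d in (1, 2)
    for h in (0, 1, 2)
]
selected = prune(files, date="2024-01-01", hour="01")
scanned = sum(f.size_bytes for f in selected)
total = sum(f.size_bytes for f in files)
print(f"scanned {scanned} of {total} bytes")
```

Applied to the benchmark figures above, going from 86.7 MB to 6.3 MB is roughly a 92.7% reduction in data scanned.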

Proof of Concept
Data & Platform Systems

NYC Subway Foot-Traffic Forecasting

Transit teams need a way to forecast station-level foot traffic and validate streaming ML behavior without depending on a live transit feed. I built a setup that uses historical turnstile records for modeling and a live self-hosted stream of simulated events for runtime evaluation.

I designed and built the real-time streaming infrastructure for a subway foot-traffic forecasting system, including a realistic Kafka producer, a Spark Structured Streaming pipeline into MongoDB, and live Random Forest inference for station entries and exits. On the historical side, a teammate and I analyzed approximately 13 million MTA turnstile records; our Random Forest regression and classification models reached an RMSE of approximately 2,700, an MAE below 4.8%, and 93.36% classification accuracy.

Apache Kafka · Spark Structured Streaming · PySpark · MongoDB · scikit-learn · Python · Pandas

13M historical MTA records
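
A simulated event stream of the kind described above could look like the following sketch. It uses only the standard library and does not talk to Kafka; the field names (`station`, `entries`, `exits`), value ranges, and the `simulate_events` helper are all assumptions for illustration. Each yielded value is a UTF-8 JSON payload, the shape a producer would typically send to a topic.

```python
import json
import random
from datetime import datetime, timedelta


def simulate_events(stations, start, n, seed=0):
    """Yield n JSON-encoded turnstile events with increasing timestamps."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    ts = start
    for _ in range(n):
        ts += timedelta(seconds=rng.randint(1, 30))
        event = {
            "station": rng.choice(stations),
            "timestamp": ts.isoformat(),
            "entries": rng.randint(0, 40),
            "exits": rng.randint(0, 40),
        }
        yield json.dumps(event).encode("utf-8")


messages = list(simulate_events(["A", "B"], datetime(2024, 1, 1), 5))
decoded = [json.loads(m) for m in messages]
print(decoded[0]["station"], decoded[0]["timestamp"])
```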

Prototype
Data & Platform Systems

StreamVault

A streaming-platform backend needs to authenticate two different user roles safely; stay correct under concurrent writes; close off SQL injection everywhere, not just in the obvious form fields; and turn raw catalog data into queryable business insight rather than plain CRUD.

I built the backend for a streaming-platform management system: Flask authentication with bcrypt hashing, role-based access control for customer and employee workflows, deadlock-safe transactions, and a Chart.js analytics dashboard backed by six advanced SQL queries.

Flask · Python · MySQL · SQL · bcrypt · Authentication · Role-Based Access Control

bcrypt + RBAC across customer/employee roles
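
Closing off SQL injection usually comes down to binding every user-supplied value as a parameter instead of formatting it into the SQL string. The sketch below shows that pattern with the standard library's `sqlite3` as a stand-in for MySQL; the `titles` table and `find_titles` helper are hypothetical, and common MySQL drivers use `%s` placeholders where sqlite3 uses `?`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO titles (name) VALUES (?)", [("Alpha",), ("Beta",)])


def find_titles(conn: sqlite3.Connection, name: str) -> list[str]:
    """Look up titles by exact name; the driver binds `name` as data, not SQL."""
    rows = conn.execute("SELECT name FROM titles WHERE name = ?", (name,))
    return [row[0] for row in rows]


print(find_titles(conn, "Alpha"))
# A classic injection string is treated as a literal name and matches nothing.
print(find_titles(conn, "x' OR '1'='1"))
```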

Search, Retrieval & Applied AI (3)

Production-style Prototype
Search, Retrieval & Applied AI

ExpertMatchAI

Keyword search can miss qualified experts whose profiles use different terminology, while semantic search can return conceptually related profiles that omit important specialties or location requirements. ExpertMatchAI combines both retrieval signals with structured filtering to preserve semantic recall, exact-term relevance, and location-aware matching.

During my Software Engineering internship focused on AI/ML and data infrastructure at Global Futures Group, I built the retrieval and ranking infrastructure for an expert-matching platform covering 12,168 profiles. The system combined semantic search, BM25 lexical retrieval, structured filters, and tunable ranking signals, with average search latency under 200 ms, p95 latency under 350 ms, and full index rebuilds in under 60 seconds.

Next.js · TypeScript · FastAPI · Python · PostgreSQL · Prisma · FAISS

12,168 profiles indexed
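
One common way to combine semantic and lexical retrieval with a structured filter is to normalize each score set, filter candidates, and blend with a tunable weight. The sketch below illustrates that idea only; the min-max normalization, the `alpha` weight, and all names and scores are assumptions rather than ExpertMatchAI's actual ranking logic.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rank(semantic, lexical, locations, required_location, alpha=0.6):
    """Filter by location, then blend normalized semantic and lexical scores."""
    sem, lex = min_max(semantic), min_max(lexical)
    candidates = [
        pid for pid in set(sem) | set(lex)
        if locations.get(pid) == required_location
    ]
    blended = {
        pid: alpha * sem.get(pid, 0.0) + (1 - alpha) * lex.get(pid, 0.0)
        for pid in candidates
    }
    # Sort by descending score, breaking ties by id for determinism.
    return sorted(blended, key=lambda pid: (-blended[pid], pid))


# Hypothetical scores: cosine similarity (semantic) and BM25 (lexical).
semantic = {"a": 0.9, "b": 0.7, "c": 0.2}
lexical = {"a": 1.0, "b": 8.0, "c": 5.0}
locations = {"a": "NYC", "b": "NYC", "c": "LA"}
ranking = hybrid_rank(semantic, lexical, locations, "NYC")
print(ranking)
```

Profile `b` outranks `a` here because its strong exact-term match outweighs a slightly lower semantic score, while `c` is excluded by the location filter.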

Production-style Prototype
Search, Retrieval & Applied AI

FuseRank

A single collaborative-filtering signal can miss the broader pattern of what an individual user actually prefers, while item-only similarity has less context for users with sparse history. FuseRank blends user-based and item-based signals so both a user’s nearest neighbors and the items most similar to their preferences influence the same ranking path.

I built and deployed a two-stage hybrid recommender that blends user-based collaborative filtering with embedding-similarity expansion in a single weighted ranking pass. The hybrid pipeline improved relevance over a KNN baseline and reduced p95 latency. FuseRank trains 128-dimensional user and anime embeddings on a five-million-interaction subset of a 70M+ record dataset, with artifact versioning, experiment tracking, automated testing, and deployment automation across Docker, GitHub Actions, Kubernetes, and GKE.

Hybrid Recommender · Search and Ranking · Cosine Similarity · TensorFlow Embeddings · Collaborative Filtering · Matrix Factorization · Weighted Scoring

5M interactions (70M+ source)
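
The blend of user-based and item-based signals can be sketched in a few lines. This is an illustrative toy, not FuseRank's implementation: the 2-dimensional embeddings, the equal weights, the choice of max similarity to the user's history, and every name below are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(user, user_emb, item_emb, history, k_users=1, w_user=0.5, w_item=0.5):
    """Score unseen items with a weighted mix of user- and item-based signals."""
    # User-based signal: unseen items from the k most similar users.
    neighbors = sorted(
        (u for u in user_emb if u != user),
        key=lambda u: -cosine(user_emb[user], user_emb[u]),
    )[:k_users]
    user_score = {}
    for n in neighbors:
        sim = cosine(user_emb[user], user_emb[n])
        for item in history[n] - history[user]:
            user_score[item] = user_score.get(item, 0.0) + sim
    # Item-based signal: best similarity between a candidate and the user's history.
    item_score = {
        cand: max((cosine(item_emb[cand], item_emb[h]) for h in history[user]), default=0.0)
        for cand in set(item_emb) - history[user]
    }
    # Single weighted ranking pass over all unseen items.
    scores = {
        i: w_user * user_score.get(i, 0.0) + w_item * item_score[i]
        for i in item_score
    }
    return sorted(scores, key=lambda i: (-scores[i], i))


user_emb = {"u1": [1.0, 0.0], "u2": [1.0, 0.1], "u3": [0.0, 1.0]}
item_emb = {"i1": [1.0, 0.0], "i2": [0.9, 0.1], "i3": [0.0, 1.0]}
history = {"u1": {"i1"}, "u2": {"i1", "i2"}, "u3": {"i3"}}
recs = recommend("u1", user_emb, item_emb, history)
print(recs)
```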

Proof of Concept
Search, Retrieval & Applied AI

VeriWire

Suspicious wire transfers often require a customer confirmation step, but a voice workflow must verify identity, resist simple replay attempts, and avoid inventing transaction details. I used a simulated bank environment to explore how an AI-assisted call flow could enforce those checks consistently.

I designed and built the end-to-end voice workflow: Twilio Media Streams into a WebSocket bridge, Deepgram’s real-time voice agent for speech-to-text, text-to-speech, and GPT-4o-mini-driven conversation, a FastAPI bank sandbox, an explicit LangGraph decision model, and SQLite persistence for every call event. The system stays inside a simulated bank environment and was tested through live mobile calls using an active Twilio number during the project period.

Voice AI · Conversational AI · LangGraph · LLM orchestration · FastAPI · WebSockets · Twilio Media Streams

pytest · 5 test files
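
An explicit decision model keeps approval logic in deterministic code while the language model handles only the conversation. The sketch below shows the idea as a plain state-transition function; it does not use the LangGraph API, and the states, event names, and retry limit are hypothetical.

```python
from enum import Enum


class State(Enum):
    VERIFY_IDENTITY = "verify_identity"
    CONFIRM_TRANSFER = "confirm_transfer"
    APPROVED = "approved"
    ESCALATED = "escalated"


def next_state(state: State, event: str, attempts: int = 0, max_attempts: int = 2) -> State:
    """Return the next call state; `attempts` counts prior failed identity checks."""
    if state is State.VERIFY_IDENTITY:
        if event == "identity_ok":
            return State.CONFIRM_TRANSFER
        # Retry until the failure budget is spent, then hand off to a human.
        return State.VERIFY_IDENTITY if attempts + 1 < max_attempts else State.ESCALATED
    if state is State.CONFIRM_TRANSFER:
        return State.APPROVED if event == "customer_confirmed" else State.ESCALATED
    return state  # APPROVED and ESCALATED are terminal


print(next_state(State.VERIFY_IDENTITY, "identity_ok"))
print(next_state(State.VERIFY_IDENTITY, "identity_failed", attempts=1))
```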

Machine Learning & Research (4)

Research
Machine Learning & Research

DyT vs. LayerNorm in LLM Fine-Tuning

Transformer normalization affects both training stability and runtime cost. I studied whether Dynamic Tanh could replace LayerNorm in post-training without losing quality across multiple model scales and datasets.

I implemented DyT substitutions across DistilGPT-2 and Pythia 17M/410M, fine-tuned the variants with LoRA via Hugging Face PEFT (training under 1% of parameters) on Alpaca, ShareGPT, and RE-WILD, and measured validation loss, inference time, and MT-Bench-judged output quality across frozen, selectively unfrozen, and full-SFT setups.

PyTorch · Hugging Face · PEFT · LoRA · Dynamic Tanh · LayerNorm · DistilGPT-2

DistilGPT-2 + Pythia 17M/410M
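
For readers unfamiliar with the two operations being compared, the sketch below contrasts them on a single activation vector in plain Python. Dynamic Tanh is usually written as DyT(x) = γ · tanh(αx) + β with a learnable scalar α; here γ and β are simplified to scalars and the value α = 0.5 is an illustrative assumption, so this is not the study's PyTorch code.

```python
import math


def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize by the vector's own mean and variance, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]


def dyt(x, alpha=0.5, gamma=1.0, beta=0.0):
    """Element-wise squashing: no mean or variance statistics are computed."""
    return [gamma * math.tanh(alpha * v) + beta for v in x]


activations = [-100.0, -1.0, 0.0, 1.0, 100.0]
print([round(v, 3) for v in layer_norm(activations)])
print([round(v, 3) for v in dyt(activations)])
```

The runtime argument for DyT rests on the second function: it needs no reduction over the hidden dimension, only an element-wise `tanh`.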

Research
Machine Learning & Research

Adversarial Robustness of ResNet-34 on ImageNet

Small, tightly bounded pixel perturbations can silently break deployed image classifiers. I built a structured evaluation to measure exactly how much accuracy degrades under increasing attack sophistication and whether that degradation transfers across architectures, which is a stronger test of model fragility than single-model accuracy alone.

I designed and implemented five adversarial attack strategies against an ImageNet-pretrained ResNet-34: FGSM, three PGD variants (targeted, random-start, momentum), and a patch-based attack. Together they cut top-1 accuracy from a 70.40% clean baseline to as low as 0.60% under the strongest PGD variant. I then measured how much of that degradation transferred cold to DenseNet-121.

PyTorch · torchvision · Adversarial Machine Learning · FGSM · PGD · Patch Attacks · L∞ Perturbation

70.40% / 87.00% clean baseline (ResNet-34)
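
FGSM, the simplest of the attacks listed, moves each input value by ε in the direction of the sign of the loss gradient, which keeps the perturbation inside an L∞ ball of radius ε. The sketch below applies it to a tiny logistic-regression model where the gradient has a closed form; the weights, input, and ε are made-up values, and the actual study attacked ResNet-34 in PyTorch.

```python
import math


def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """One FGSM step: x + eps * sign(dLoss/dx), clipped to the valid range [0, 1]."""
    p = predict(w, b, x)
    # For binary cross-entropy with a sigmoid output, dLoss/dx = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]


w, b = [2.0, -2.0], 0.0
x, y = [0.6, 0.4], 1
x_adv = fgsm(w, b, x, y, eps=0.2)
print(round(predict(w, b, x), 3), round(predict(w, b, x_adv), 3))
```

PGD variants repeat a step like this several times with a smaller step size, projecting back into the ε-ball after each one.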

Research
Machine Learning & Research

Electricity Market Price Prediction

Evaluating a price-prediction model on accuracy alone says little about whether it adds real value over a trading-relevant baseline. I paired time-series forecasting models directly with baseline comparisons and backtested trading strategies so every prediction could be judged against a concrete reference point instead of in isolation.

I built a time-series research dashboard that forecasts next-day electricity price direction and returns from 20 lagged, rolling, volatility, momentum, and trend features. I trained GRU, LSTM, and Random Forest models on a strictly time-ordered split, evaluated each one against trivial baselines, and exposed the full evaluation and strategy analysis through Streamlit.

Python · PyTorch · scikit-learn · LSTM · GRU · Random Forest · Time-Series Forecasting

0.689 ROC-AUC for next-day direction
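
The two safeguards named above, a strictly time-ordered split and a trivial baseline, can be sketched with the standard library. The price series, the three-lag feature window, and the majority-class baseline are assumptions for illustration; the project used 20 features and its own baselines.

```python
def make_lag_features(prices, n_lags=3):
    """Build (lagged returns, next-day direction) pairs; 1 means the price rose."""
    returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
    X, y = [], []
    for t in range(n_lags, len(returns)):
        X.append(returns[t - n_lags:t])   # only information available before day t
        y.append(1 if returns[t] > 0 else 0)
    return X, y


def time_ordered_split(X, y, train_frac=0.8):
    """Split without shuffling so the test set is strictly later than training."""
    cut = int(len(X) * train_frac)
    return X[:cut], y[:cut], X[cut:], y[cut:]


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    majority = 1 if sum(y_train) * 2 >= len(y_train) else 0
    return sum(1 for v in y_test if v == majority) / len(y_test)


prices = [10, 11, 12, 11, 12, 13, 14, 13, 14, 15, 16, 15, 16]
X, y = make_lag_features(prices)
X_train, y_train, X_test, y_test = time_ordered_split(X, y)
baseline = majority_baseline_accuracy(y_train, y_test)
print(len(X_train), len(X_test), baseline)
```

A model is only worth keeping if it beats a number like `baseline` on the later, held-out period.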

Benchmark Study
Machine Learning & Research

ResNet-18 Training Performance Benchmark

Model accuracy is only one axis of a training setup. I wanted to isolate the systems questions that decide how fast a model actually trains, including which optimizer, how many data loader workers, and CPU versus GPU execution, independent of accuracy tuning.

I benchmarked ResNet-18 training performance on CIFAR-10 across optimizers, data loader worker counts, and CPU versus GPU execution, then profiled the GPU runs on an NVIDIA A100 with the PyTorch Profiler to see exactly where each training step spent its time.

Python · PyTorch · torchvision · CUDA · ResNet-18 · CIFAR-10 · PyTorch Profiler

NVIDIA A100-SXM4-40GB GPU profiling
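
A timing harness for this kind of benchmark typically discards warmup iterations and reports statistics over several measured runs. The sketch below is a generic standard-library version with a stand-in workload; the `benchmark` helper and its defaults are assumptions, not the project's harness.

```python
import statistics
import time


def benchmark(step, warmup=2, iters=5):
    """Time `step` after warmup runs and return summary statistics in seconds."""
    for _ in range(warmup):
        step()  # warm caches and lazy initialization; not measured
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        samples.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(samples),
        "stdev_s": statistics.stdev(samples),
        "iters": iters,
    }


# Stand-in workload; a real run would execute one training step here.
result = benchmark(lambda: sum(i * i for i in range(10_000)))
print(result)
```

When the step runs on a GPU, kernels launch asynchronously, so a wall-clock harness like this needs a device synchronization (for example `torch.cuda.synchronize()`) before each clock read to measure real step time.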

MLOps & Production Engineering (3)

Prototype
MLOps & Production Engineering

DevOps Swarm AI

Reviewing a pull request well means reading the diff, checking code quality, and tracing CI failures back to a cause: three different review tasks that a single general-purpose assistant tends to blend together. Purpose-built agents, each focused on one part of that review, can handle each task more precisely than one assistant trying to do everything at once.

I built a DevOps review assistant on top of Google Cloud's agent-starter-pack scaffold and implemented the same four capabilities through three orchestration patterns: a CrewAI crew, a LangGraph pipeline, and a ReAct-style tool-calling agent. I connected the tool-calling version to FastAPI, added structured failure handling, traced requests through Traceloop and Google Cloud Trace, and added a GitLab CI pipeline for the test suite.

CrewAI · LangGraph · LangChain · Gemini 2.5 Pro · Vertex AI · FastAPI · Python

Three agent architectures: CrewAI crew, LangGraph pipeline, ReAct tool-calling agent
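
Structured failure handling in a tool-calling agent generally means a tool error comes back as data the agent can reason about rather than as an unhandled exception. The sketch below shows one way to do that with a small tool registry; `ToolResult`, `call_tool`, and the example tool are hypothetical and independent of CrewAI, LangGraph, or LangChain.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToolResult:
    ok: bool
    output: Any = None
    error: Optional[str] = None


TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn):
    """Register a function under its own name so the agent can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def count_changed_lines(diff: str) -> int:
    """Count added and removed lines in a unified diff, skipping file headers."""
    return sum(
        1 for line in diff.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


def call_tool(name: str, **kwargs) -> ToolResult:
    """Run a registered tool and convert any failure into a structured result."""
    fn = TOOLS.get(name)
    if fn is None:
        return ToolResult(ok=False, error=f"unknown tool: {name}")
    try:
        return ToolResult(ok=True, output=fn(**kwargs))
    except Exception as exc:
        return ToolResult(ok=False, error=f"{type(exc).__name__}: {exc}")


print(call_tool("count_changed_lines", diff="--- a\n+++ b\n+x\n-y\n z"))
print(call_tool("missing_tool"))
```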

Course Project
MLOps & Production Engineering

Hotel Reservation Cancellation Prediction

Hotels lose revenue to late cancellations. A model that flags a likely cancellation ahead of time can support overbooking strategy, targeted retention offers, and the detection of habitual or fraudulent cancellers.

I built an end-to-end machine learning pipeline that predicts hotel reservation cancellations from booking and guest features. I benchmarked ten classification algorithms, selected the most informative features with Random Forest importance ranking, balanced the training data with SMOTE, tracked training runs in MLflow, and deployed the serving workflow through Jenkins, Docker, and Google Cloud Run.

Python · LightGBM · Random Forest · XGBoost · scikit-learn · SMOTE · MLflow

10-algorithm model comparison
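
SMOTE balances a dataset by creating synthetic minority-class points on the line segment between a real minority sample and one of its nearest minority neighbors. The sketch below generates a single synthetic point in plain Python to show the mechanism; the data, the `smote_sample` helper, and `k=2` are assumptions, and a real pipeline would more likely rely on a library implementation.

```python
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority sample and a near neighbor."""
    if rng is None:
        rng = random.Random(0)
    x = rng.choice(minority)
    # k nearest minority neighbors of x by squared Euclidean distance.
    neighbors = sorted(
        (p for p in minority if p is not x),
        key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
    )[:k]
    neighbor = rng.choice(neighbors)
    lam = rng.random()  # interpolation factor in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_sample(minority, rng=random.Random(42))
print(synthetic)
```

Oversampling belongs on the training split only, as described above; applying it before the split would leak synthetic copies of test-like points into training.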

MLOps Prototype
MLOps & Production Engineering

Colorectal Cancer Survival Prediction

Public healthcare data becomes more useful when preprocessing, feature selection, training, and serving all run through a reproducible MLOps workflow rather than a one-off notebook.

I built a reproducible machine-learning workflow on 167,497 public clinical records that preprocesses data, uses chi-square scoring to select the five most predictive features (healthcare costs, tumor size, treatment type, diabetes status, and a population mortality rate), trains a Gradient Boosting classifier, and serves predictions through a Flask UI that was live during the project period.

Python · scikit-learn · Machine Learning · Survival Prediction · Gradient Boosting · Feature Engineering · Chi-Square Feature Selection

167,497 clinical records
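
Chi-square scoring ranks a feature by how far its joint counts with the label depart from what independence would predict. The sketch below computes the classical contingency-table statistic for categorical features and keeps the top k; the toy columns and helper names are assumptions, and the project may have used a library routine (for example scikit-learn's `SelectKBest` with `chi2`) whose exact computation differs.

```python
from collections import Counter


def chi_square(feature, label):
    """Pearson chi-square statistic between a categorical feature and a label."""
    n = len(feature)
    observed = Counter(zip(feature, label))
    feature_totals, label_totals = Counter(feature), Counter(label)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            expected = f_count * l_count / n  # count expected under independence
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


def top_k_features(columns, label, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square(values, label) for name, values in columns.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))[:k]


label = [1, 1, 0, 0]
columns = {
    "strong": ["a", "a", "b", "b"],   # perfectly aligned with the label
    "weak": ["a", "b", "a", "b"],     # independent of the label
}
print(chi_square(columns["strong"], label), chi_square(columns["weak"], label))
print(top_k_features(columns, label, k=1))
```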