About
I am a Data Scientist and AI/ML Engineer focused on applied Machine Learning, Generative AI (GANs), and scalable Data Engineering. I enjoy taking on messy, high-dimensional data problems and architecting elegant, highly optimized predictive systems.
My recent work centers on large-scale feature engineering, modernizing legacy pipelines for order-of-magnitude speedups, and using deep learning to synthesize and augment data. My toolkit consists primarily of Python, SQL, PyTorch, PySpark, and cloud platforms such as AWS and Databricks.
Experience
Data Scientist
Designed a multi-model regression system that predicts customer purchase behavior across 900+ merchants, reducing feature dimensionality by 99%. Migrated the legacy Python architecture to PySpark, cutting production scoring time from 80 hours to 6 hours. Developed synthetic data pipelines using GANs and CTGANs for data augmentation.
- PySpark
- PyTorch
- GANs
- CTGANs
Data Scientist
Engineered targeted data solutions with direct bottom-line impact. Built a Random Forest classifier that distinguishes business statuses with 88% accuracy. Implemented record linkage using deep string matching and GraphFrame algorithms to unify disjoint datasets into multi-tier corporate hierarchies.
- Random Forest
- GraphFrame
- Data Integration
Project Engineer
Diagnosed and resolved a critical revenue-leakage bug within a major airline booking platform. Contributed to the design and development of the core booking and cancellation API for robust third-party integration.
- API Design
- Software Engineering
Education