Thaju K Habeeb

About

I am a Data Scientist and AI/ML Engineer focused on applied Machine Learning, Generative AI (GANs), and scalable Data Engineering. I enjoy taking on messy, high-dimensional data problems and architecting elegant, highly optimized predictive systems.

My recent work revolves around large-scale feature engineering, modernizing legacy pipelines for order-of-magnitude speedups, and leveraging deep learning to synthesize and augment data. My toolkit primarily consists of Python, SQL, PyTorch, PySpark, and cloud platforms like AWS and Databricks.

Experience

Nov 2024 — Present

Data Scientist

TransUnion

Designed a multi-model regression system predicting customer purchase behavior across 900+ merchants, reducing dimensionality by 99%. Migrated the legacy Python architecture to PySpark, cutting production scoring time from 80 hours to 6 hours. Developed synthetic data pipelines using GANs and CTGANs for advanced augmentation.

PySpark
PyTorch
GANs
CTGANs

2022 — 2024

Data Scientist

EXL Analytics

Engineered targeted data solutions addressing bottom-line impact. Built a Random Forest classification system that reached 88% accuracy in distinguishing business statuses. Implemented record linkage using deep string matching and GraphFrame algorithms to unify disjoint datasets into multi-tier corporate hierarchies.

Random Forest
GraphFrame
Data Integration

2021 — 2022

Project Engineer

Wipro

Diagnosed and resolved a critical revenue-leakage bug within a major airline booking platform. Contributed to the design and development of the core booking and cancellation API for robust third-party integration.

API Design
Software Engineering

Education

2017 — 2021

B.Tech in Mechanical Engineering

National Institute of Technology Calicut