Open to Contract & Full-Time

I find what the data is hiding.

Senior Data Scientist and Analyst with 6+ years building fraud detection systems, ML pipelines, and real-time analytics at government scale. I dig into complex data problems and surface what matters, from flagging $600K+ in fraudulent tax claims to building real-time arbitrage engines on live market data.

  • 6+ years of experience
  • $600K+ in fraud identified at the IRS
  • 50K+ records analyzed with NLP
  • 40% reduction in reporting overhead

Personal Projects
Built Outside Work
Real systems. Real math. Real stakes.
Results dashboard: in progress
Options Flow Trading Bot
Paper Trading

Quantitative trading system that monitors institutional options flow ("whale" activity) through the Unusual Whales API. It scores each signal on a composite of implied volatility, premium size, and volume/open-interest ratio, then sizes positions with the half-Kelly criterion to reduce variance. It runs two concurrent strategies: same-day gamma scalps on SPY and multi-day swing trades that follow high-conviction whale flow.

  • Half-Kelly position sizing: roughly half the volatility of full Kelly while retaining about three-quarters of its long-run growth rate
  • Two concurrent strategies with distinct risk profiles: 0-1 DTE SPY scalps force-closed by 3:45 PM ET, and multi-day swing trades on whale flow
  • Signal scoring composite: IV environment + premium magnitude + vol/OI conviction + sweep flag
  • Pre-signal gating: risk checks before API calls; JSON ledger + CSV audit trail for reproducibility
  • APScheduler cron jobs for EOD sweeps and morning status briefings via Discord webhooks
Python APScheduler Unusual Whales API Kelly Criterion Discord Webhooks
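The score-then-size flow described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the bot's actual code. The component weights, normalization caps, the 5% position ceiling, the `FlowSignal` fields, and the helper names (`score_signal`, `half_kelly`, `contracts_for`, `must_force_close`) are all hypothetical. Full Kelly for a binary bet is f* = p - (1 - p) / b, where p is the win probability and b is the average win/loss ratio. Half-Kelly stakes f* / 2.

```python
from dataclasses import dataclass
from datetime import time

# Hypothetical weights: the four components come from the project
# description, but their relative weights are illustrative assumptions.
WEIGHTS = {"iv": 0.25, "premium": 0.30, "vol_oi": 0.30, "sweep": 0.15}

# 0-1 DTE scalps must be flat by this Eastern Time cutoff.
SCALP_CUTOFF_ET = time(15, 45)


@dataclass
class FlowSignal:
    iv_rank: float       # implied-volatility rank on a 0-100 scale (assumed input)
    premium_usd: float   # total premium paid on the flagged order
    volume: int          # contracts traded today
    open_interest: int   # contracts open before today
    is_sweep: bool       # order routed across multiple exchanges


def _clip01(x: float) -> float:
    """Clamp a value into [0, 1]."""
    return max(0.0, min(1.0, x))


def score_signal(s: FlowSignal, premium_cap: float = 1_000_000.0,
                 vol_oi_cap: float = 5.0) -> float:
    """Weighted composite score in [0, 1]; each component is normalized first."""
    components = {
        # Assumption: a lower IV rank (cheaper options) counts as more favorable.
        "iv": _clip01(1.0 - s.iv_rank / 100.0),
        "premium": _clip01(s.premium_usd / premium_cap),
        # Volume well above open interest can suggest new positioning.
        "vol_oi": _clip01(s.volume / max(s.open_interest, 1) / vol_oi_cap),
        "sweep": 1.0 if s.is_sweep else 0.0,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


def half_kelly(p_win: float, win_loss_ratio: float,
               max_fraction: float = 0.05) -> float:
    """Half of the Kelly fraction f* = p - (1 - p) / b, floored at 0 and capped."""
    if not 0.0 <= p_win <= 1.0 or win_loss_ratio <= 0:
        raise ValueError("p_win must be in [0, 1] and win_loss_ratio > 0")
    full_kelly = p_win - (1.0 - p_win) / win_loss_ratio
    return max(0.0, min(full_kelly / 2.0, max_fraction))


def contracts_for(bankroll: float, fraction: float, contract_cost: float) -> int:
    """Whole contracts affordable with the given fraction of the bankroll."""
    return int(bankroll * fraction // contract_cost)


def must_force_close(now_et: time) -> bool:
    """True once the scalp cutoff has been reached (time is Eastern Time)."""
    return now_et >= SCALP_CUTOFF_ET


if __name__ == "__main__":
    signal = FlowSignal(iv_rank=30, premium_usd=500_000, volume=4_000,
                        open_interest=1_000, is_sweep=True)
    fraction = half_kelly(p_win=0.55, win_loss_ratio=1.5)
    print(f"score={score_signal(signal):.3f}",
          f"fraction={fraction:.3f}",
          f"contracts={contracts_for(10_000, fraction, contract_cost=120)}")
```

With these toy inputs, the composite score works out to 0.715. The half-Kelly fraction of 0.125 is capped at the 5% ceiling, so a $10,000 bankroll buys four $120 contracts. A hard cap on top of half-Kelly guards against overconfident win-probability estimates, which is where Kelly sizing is most fragile.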

Professional Experience
Where I've Applied It
Methodology highlights from past roles (no proprietary data).
IRS · Voyatek
Senior Data Analyst
Oct 2019 – Jan 2025 · 5 years

Detecting fraudulent tax credit claims across millions of annual filings, including Section 45Q carbon capture and Section 30D clean vehicle credits, and identifying identity theft patterns in tax returns. The challenge: high-volume structured data mixed with ~50,000 unstructured filing descriptions, under strict data governance constraints.

Impact & Approach
$600K+ in fraudulent claims flagged: inflated carbon sequestration claims, duplicate subsidiary filings, and ineligible submissions
ML pipeline in scikit-learn: isolation forest + logistic regression + clustering ensemble; PySpark + Databricks for distributed processing
NLP on 50K+ unstructured filings to surface fraud language patterns; Keras models for text classification
40% reduction in manual reporting overhead via SQL automation (CTEs, window functions) and dbt lineage pipelines
Power BI + Tableau investigator dashboards; cut pipeline debugging time by 30% through dbt documentation and governance
Python PySpark Databricks scikit-learn Keras dbt SQL Hadoop Power BI Tableau
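An ensemble like the one above must reconcile detectors whose outputs live on different scales: isolation-forest scores, classifier probabilities, and cluster distances. Rank averaging is one common way to fuse them, sketched below in plain Python. This is an illustrative assumption, not the production pipeline. It expects each model's scores already oriented so that higher means more suspicious. Examples would be negated `IsolationForest.score_samples` output, `LogisticRegression.predict_proba(X)[:, 1]`, and the distance to the nearest `KMeans` centroid. The function names, weights, and toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def percentile_ranks(scores: Sequence[float]) -> List[float]:
    """Map raw scores to [0, 1] ranks; tied scores share their average rank."""
    n = len(scores)
    if n == 0:
        return []
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of scores tied with scores[order[i]].
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 / (n - 1)
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def fuse_scores(model_scores: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-model percentile ranks (higher = more suspicious)."""
    lengths = {len(s) for s in model_scores.values()}
    if len(lengths) != 1:
        raise ValueError("every model must score the same records")
    n = lengths.pop()
    total_weight = sum(weights[name] for name in model_scores)
    fused = [0.0] * n
    for name, scores in model_scores.items():
        for idx, r in enumerate(percentile_ranks(scores)):
            fused[idx] += weights[name] * r / total_weight
    return fused


def top_k(fused: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest fused scores, most suspicious first."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)[:k]


if __name__ == "__main__":
    # Toy scores for four filings; all oriented so higher = more suspicious.
    scores = {
        "isolation_forest": [0.1, 0.9, 0.5, 0.3],
        "logistic_regression": [0.2, 0.8, 0.6, 0.1],
        "cluster_distance": [1.0, 4.0, 2.0, 3.0],
    }
    fused = fuse_scores(scores, weights={name: 1.0 for name in scores})
    print(top_k(fused, k=2))
```

Averaging ranks instead of raw scores keeps any single detector from dominating because of its scale. The trade-off is that it discards how far apart the scores are, so a record that one model finds extreme gets no extra credit beyond rank 1.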
College Board
Junior Data Analyst
Jun 2019 – Sep 2019

Forecasting regional revenue per test administration to support capacity planning, and building ML models to classify unstructured text in datasets used for fraud tagging. This was an early use of the techniques I later applied at scale at the IRS.

Impact & Approach
Built regional revenue forecasting models used for test administration capacity planning
TensorFlow text classification models for unstructured fraud-tagging datasets
Containerized models with Docker for reproducible deployment in cloud pipelines
Python TensorFlow Docker Forecasting

Technical Skills
The Stack
Languages
  • Python
  • SQL
  • JavaScript / TypeScript
  • R · Scala · SAS
ML & Data Science
  • scikit-learn
  • Keras · TensorFlow
  • NLP · Anomaly Detection
  • Causal Inference · Forecasting
Big Data & Cloud
  • PySpark · Databricks
  • Hadoop · Spark
  • AWS (S3, Lambda, EC2)
  • Delta Lake
Pipelines & Storage
  • dbt · Airflow
  • Snowflake · Redshift
  • PostgreSQL · Redis
  • ETL Design · Docker
Visualization
  • Tableau · Power BI
  • Matplotlib · Seaborn
  • React (dashboards)
  • Ad Hoc Reporting
Domains
  • Fraud Detection
  • Financial Analytics
  • Quantitative Methods
  • Data Governance

Let's work together.

I'm open to contract engagements and full-time roles in data science and analytics — particularly fraud detection, financial analytics, and applied ML. Based in Alexandria, VA. Remote-first.