
I’m a biostatistician who builds AI-powered tools for clinical research. Most of my work lives at the intersection of rigorous statistical methodology and modern software: the kind of problems where the math matters and so does the engineering.
My PhD research at Boston University analyzes over a billion observations of smartwatch and wearable sensor data from the Electronic Framingham Heart Study, studying how digital biomarkers connect to cognitive aging in older adults. At Vertex Pharmaceuticals, I’m building a multi-agent system that automates the Tables, Figures, and Listings pipeline for clinical trial submissions, with statistical programmers driving LLM agents through a Shiny interface. Before the PhD, I spent three years at Boston Children’s Hospital doing applied biostatistics across pediatric cardiology, hematology, and critical care, where I built R packages, deployed survival models on registry data, and co-authored peer-reviewed publications.
Outside of work, I ski, write, and am always looking for dogs to pet.
PhD Research, Boston University Biostatistics
Extending super learner methods for survival prediction in complex sampling designs, with applications to longitudinal clinical data. Advised by Dr. Haolin (Leo) Li.
Research Extern at Vertex Pharmaceuticals (Aug 2025 to Present)
Building a multi-agent system that automates the Tables, Figures, and Listings (TFL) pipeline for clinical trial submissions. Statistical programmers drive LLM agents through a Shiny interface to handle SDTM mapping, dataset construction, and output generation.
A multi-agent system for automating Tables, Figures, and Listings generation in clinical trials, with statistical programmers driving LLM agents through a Shiny interface.
Aug 2025
PhD research: statistical analysis of large-scale smartwatch and mobile health data (>1 billion observations) from older adults to examine associations between wearable-derived measures and cognitive function.
Sep 2024
Honors thesis: a weighted hypothesis testing framework for differential variability in scRNA-seq data, with R tools to address the mean-variance relationship in zero-inflated count data.
May 2019
Building a multi-agent automation system for Tables, Figures, and Listings (TFL) generation in clinical trials. Statistical programmers interact with LLM agents through a Shiny interface; the agents handle SDTM mapping, analysis dataset construction, and output generation against provided specifications.
Statistical collaborator in the Biostatistics and Research Department, supervised by Dr. Edie Weller. Contributed to published research across pediatric hematology, cardiology, critical care, and COVID-19 outcomes.
Coursework spanning statistical theory and applied methods:
Advanced · Survival analysis · Package development · Shiny
Advanced · Clinical data analysis · Regulatory reporting
Proficient · Data pipelines · ML workflows · LLM tooling
Proficient · Data querying and management
Proficient · Workflow automation · HPC environments
Familiar · Neural networks · Deep learning
KM · Cox · Competing risks · RSF · Mixed-effects models · GEE
Registry outcomes/EHRs (ELSO) · Large-cohort wearables (eFHS)
Outlier diagnostics · Subgroup analyses · Reproducible R scripts
Translating programming notes into automated outputs
Coursework in study design · Simulation-based model evaluation
Working knowledge of SDTM/ADaM structures · Exposure through TFL tooling