Resume
Jonathan Simkin, PhD
Data and analytics leader and PhD epidemiologist. I build and lead across applied AI and ML, population-scale data systems, and the research and reporting they make possible, mostly in healthcare and other high-stakes, regulated settings.
PhD-trained data and analytics leader with 10+ years in regulated, high-stakes healthcare environments. My recent work is applied AI: NLP and language-model systems running in production. It sits on a longer arc of transforming population-scale data systems and turning them into research and reporting people act on. A player-coach: I lead a 20+ person multidisciplinary pod of data scientists, data engineers, and subject matter experts while staying close to model design, evaluation, and deployment. I pair hands-on technical depth with Responsible AI governance, and I translate technical trade-offs into clear decisions for executives.
GenAI & LLMs
Applied ML & NLP
Production ML & MLOps
Statistics & inference
Responsible AI & governance
Cloud & tooling
Leadership & strategy
Director, BC & Yukon Cancer Registries · Provincial Health Services Authority
May 2023 – Present
- Lead a 20+ person multidisciplinary pod (data scientists, ML engineers, analysts, domain experts) as a hands-on technical leader, owning roadmap, priorities, and quality standards while staying close to model design and production readiness. Manage a $2.5M operating budget.
- Own end-to-end delivery of classification and predictive ML, putting MLOps practices in place: model versioning, gated promotion from PoC to production, and drift and performance monitoring.
- Stood up Azure cloud environments and Copilot agents for secure AI workflows, partnering with platform teams on identity, security, and deployment for ML and generative AI workloads.
- Established Responsible AI and model risk governance covering bias auditing, fairness evaluation across sex and ethnicity subgroups, human-in-the-loop review, and interpretability. Secured $1.3M+ in competitive funding.
Scientific Director, BC & Yukon Cancer Registries · BC Cancer
Oct 2019 – May 2023
- Designed and deployed a transformer-based NLP classification pipeline (PyTorch, HuggingFace) processing 5M+ records per year, replacing legacy rule-based workflows for a 100x throughput improvement in production.
- Architected an ensemble of fine-tuned language models with dual preprocessing pipelines, achieving 98 to 99% recall in production with documented fairness across sex and ethnicity subgroups.
- Built statistical and Bayesian modeling capability for population-level risk estimation and forecasting, including time-series, Poisson, joinpoint, and BYM2 spatial models, supporting resource allocation.
- Led structured model audits when production performance diverged from test metrics, redesigning training data and restoring performance; served as data steward in a privacy-regulated environment.
Cancer Epidemiologist · Government of Yukon
Dec 2015 – Oct 2019
- Advised the Chief Medical Officer of Health, delivering predictive analytics and statistical models to inform public health policy and resource allocation.
- Authored the territory's recurring cancer statistics reports and dashboards, turning complex population data into clear recommendations senior decision-makers acted on.
Production NLP classification pipelines · 5M+ records/yr
Led problem framing, fine-tuning of domain-pretrained transformers, ensemble design, evaluation, and production deployment with drift monitoring and human-in-the-loop review. Embedded fairness evaluation across demographic subgroups to meet regulated-environment requirements. Operational since 2023.
Generative AI pilot with Responsible AI governance
Designed and governed a generative AI pilot for a high-priority workload that manual processes could not keep up with, using retrieval-augmented generation grounded in vetted content to limit hallucination risk. Secured authorization to run generative AI with privileged data access in a privacy-regulated environment.
Real-time diagnostic data platform
Secured $418K and oversaw delivery of a streaming platform that captures every provincial diagnostic imaging report, cutting ingestion latency and enabling ML-based health-system analytics.
Adapting Natural Language Processing Models Across Jurisdictions: A Pilot Study in Canadian Cancer Registries. Simkin J, et al. arXiv. 2026.
Small or Large? Zero-Shot or Finetuned? Guiding Language Model Choice for Specialized Applications in Healthcare. Gondara L, Simkin J, et al. Machine Learning and Knowledge Extraction. 2025.
Classifying Tumor Reportability Status From Unstructured Electronic Pathology Reports Using Language Models. Gondara L, Simkin J, et al. JCO Clinical Cancer Informatics. 2024.
Addressing Inequity in Spatial Access to Lung Cancer Screening. Simkin J, et al. Current Oncology. 2023.
Small Area Disease Mapping of Cancer Incidence Using Bayesian Spatial Models. Simkin J, et al. Frontiers in Oncology. 2022.
Full list on Google Scholar.