Invention: ML Pipeline for Genome-Wide Association Studies (Patent)
Published:

This project was a major technical milestone at Illumina and resulted in a published patent application for a novel machine learning pipeline.
The Problem
Genome-wide association studies (GWAS) are powerful tools for detecting genetic variants associated with disease risk. However, moving from associated variants to the genes that actually cause the association is a notoriously difficult bottleneck in precision medicine.
The Solution
We developed a machine learning framework that predicts causal genes directly from GWAS summary statistics. Compared with traditional methods, it substantially improves both the precision and the recall of causal-gene identification.
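To make the precision/recall claim concrete, here is a minimal sketch of how these two metrics are commonly computed when a set of predicted causal genes is compared against a reference set of known causal genes. It is a generic illustration, not the evaluation protocol described in the patent; the function name and gene symbols are hypothetical.

```python
def precision_recall(predicted, true_causal):
    """Precision and recall of a predicted causal-gene set against a reference set.

    Hypothetical helper for illustration only.
    """
    predicted, true_causal = set(predicted), set(true_causal)
    # True positives: predicted genes that are also in the reference set.
    tp = len(predicted & true_causal)
    # Precision: fraction of predicted genes that are truly causal.
    precision = tp / len(predicted) if predicted else 0.0
    # Recall: fraction of truly causal genes that were predicted.
    recall = tp / len(true_causal) if true_causal else 0.0
    return precision, recall


# Hypothetical example; gene symbols are placeholders, not real results.
predicted = {"GENE_A", "GENE_B", "GENE_C"}
reference = {"GENE_A", "GENE_C", "GENE_D", "GENE_E"}
print(precision_recall(predicted, reference))  # → (0.6666666666666666, 0.5)
```

Improving precision means fewer false leads for follow-up experiments; improving recall means fewer truly causal genes are missed.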
Key Contributions:
- Pipeline Architecture: Designed and implemented reproducible ML pipelines on HPC and AWS infrastructure.
- Scalability: Optimized the system to handle the massive datasets characteristic of modern genomics.
- Innovation: Co-authored the underlying methodology described in the patent application.
Patent Details:
- Title: Machine learning pipeline for genome-wide association studies
- Publication No.: US20240120024A1
- Assignee: Illumina Inc.
