Clinical Trial Matching – Center for Applied AI Hub

on March 17, 2025

Summary

Clinical trial matching is crucial to advancing medical treatments and improving patient outcomes, yet it remains a largely manual and time-consuming process. AI offers a significant opportunity to streamline it, improving patient access to relevant trials while reducing the burden on providers.

Large language models (LLMs) can match patients to clinical trials. An early demonstration of AI-assisted clinical trial matching was TrialGPT, a proof-of-concept system developed by researchers at NIH. TrialGPT reported 87.3% criterion-level accuracy and reduced screening time by 42.6%. These results are promising, and CAAI aims to build on this work by leveraging recent advances in LLMs, particularly enhanced reasoning capabilities, to provide more nuanced eligibility explanations and improve matching accuracy.
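TrialGPT's accuracy figure is measured at the criterion level. Broadly, the model judges each eligibility criterion for a patient, and those judgments are then combined into a trial-level decision. The sketch below shows one way such criterion-level labels could be rolled up into a score for ranking. The label names, the `CriterionJudgment` type, and the scoring rule are all illustrative assumptions. They are not TrialGPT's or CAAI's actual method.

```python
from dataclasses import dataclass

# Illustrative label sets (assumption), loosely modelled on criterion-level judgments.
INCLUSION_LABELS = {"included", "not included", "not enough information", "not applicable"}
EXCLUSION_LABELS = {"excluded", "not excluded", "not enough information", "not applicable"}


@dataclass
class CriterionJudgment:  # hypothetical record of one model judgment
    criterion: str     # criterion text from the trial record
    kind: str          # "inclusion" or "exclusion"
    label: str         # model's label for this patient
    explanation: str   # model's rationale, kept for human review


def trial_score(judgments: list[CriterionJudgment]) -> float:
    """Roll criterion-level labels up into one trial-level score in [0, 1].

    An unmet inclusion or a met exclusion criterion disqualifies the trial (0.0).
    Otherwise the score is the fraction of applicable inclusion criteria confirmed
    as met, so trials with many "not enough information" answers rank lower.
    """
    applicable = 0
    met = 0
    for j in judgments:
        if j.kind == "inclusion":
            if j.label not in INCLUSION_LABELS:
                raise ValueError(f"unknown inclusion label: {j.label!r}")
            if j.label == "not included":
                return 0.0  # an unmet inclusion criterion rules the trial out
            if j.label != "not applicable":
                applicable += 1
                if j.label == "included":
                    met += 1
        elif j.kind == "exclusion":
            if j.label not in EXCLUSION_LABELS:
                raise ValueError(f"unknown exclusion label: {j.label!r}")
            if j.label == "excluded":
                return 0.0  # a met exclusion criterion rules the trial out
        else:
            raise ValueError(f"unknown criterion kind: {j.kind!r}")
    return met / applicable if applicable else 0.0


def rank_trials(results: dict[str, list[CriterionJudgment]]) -> list[tuple[str, float]]:
    """Return (trial_id, score) pairs sorted from most to least relevant."""
    scored = [(trial_id, trial_score(js)) for trial_id, js in results.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The per-criterion `explanation` field is kept alongside the label, so a reviewer can see why a trial scored the way it did rather than only the final number.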

With more powerful models now available to run locally, CAAI has the tools needed to make an impact. Our approach builds on TrialGPT’s framework by incorporating advanced reasoning models, namely DeepSeek-R1. Initial development has focused on three steps: filtering trials by key criteria (location, recruiting status, age, and sex), identifying relevant trials, and ranking those trials by relevance. The AI agent developed in this project is intended to improve on the existing trial matching framework by using reasoning models to explain eligibility decisions. These explanations support better-informed trial recommendations and make human review easier.
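The key-criteria filtering step can be expressed as cheap hard checks. A plausible design runs these before any LLM call, so the reasoning model only sees plausible candidates. This is a minimal sketch under stated assumptions. The `Trial` and `Patient` types are hypothetical. Ages are assumed to use ClinicalTrials.gov-style strings such as "18 Years". Location is reduced to a state-level match. It is not the project's actual filter.

```python
from dataclasses import dataclass, field
from typing import Optional

# Conversion factors for age strings like "18 Years" or "6 Months" (other units raise KeyError).
_UNIT_TO_YEARS = {
    "year": 1.0,
    "month": 1 / 12,
    "week": 7 / 365.25,
    "day": 1 / 365.25,
}


@dataclass
class Trial:  # hypothetical, simplified trial record
    nct_id: str
    status: str                     # e.g. "RECRUITING", "COMPLETED"
    sex: str                        # "ALL", "MALE", or "FEMALE"
    min_age: Optional[str] = None   # e.g. "18 Years"; None means no lower limit
    max_age: Optional[str] = None   # None means no upper limit
    site_states: list[str] = field(default_factory=list)


@dataclass
class Patient:  # hypothetical, simplified patient profile
    age: float    # in years
    sex: str      # "MALE" or "FEMALE"
    state: str


def age_in_years(text: Optional[str]) -> Optional[float]:
    """Convert an age string like '6 Months' to years; missing limits stay None."""
    if not text:
        return None
    value, unit = text.split()
    return float(value) * _UNIT_TO_YEARS[unit.lower().rstrip("s")]


def passes_filters(trial: Trial, patient: Patient) -> bool:
    """Apply hard filters: recruiting status, sex, age range, and site location."""
    if trial.status != "RECRUITING":
        return False
    if trial.sex not in ("ALL", patient.sex):
        return False
    lower = age_in_years(trial.min_age)
    upper = age_in_years(trial.max_age)
    if lower is not None and patient.age < lower:
        return False
    if upper is not None and patient.age > upper:
        return False
    # Require at least one trial site in the patient's state.
    return patient.state in trial.site_states


def filter_trials(trials: list[Trial], patient: Patient) -> list[str]:
    """Return the IDs of trials that survive every hard filter."""
    return [t.nct_id for t in trials if passes_filters(t, patient)]
```

Only the trials that pass these deterministic checks would move on to the slower, more expensive reasoning-model stage for relevance judgments and ranking.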

This effort is a collaboration with Markey’s Molecular Tumor Board, and the project is still in its early stages.

Datasets/Models

The project is currently in development, with initial testing conducted using DeepSeek-R1, an open-source reasoning model available through CAAI’s LLM Factory.
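DeepSeek-R1 writes its chain of thought inside `<think>...</think>` tags before giving its final answer. That reasoning text is what makes an eligibility decision reviewable. The minimal sketch below separates the two parts and then reads a structured verdict from the answer. Some serving setups may omit the opening tag, so the code splits only on the closing tag. The JSON verdict schema (a `label` of `eligible`, `ineligible`, or `uncertain`) is a hypothetical prompt convention. Neither the project nor the model defines it.

```python
import json
import re


def split_reasoning(raw: str) -> tuple[str, str]:
    """Separate an R1-style response into (reasoning, final answer).

    Splits on the closing </think> tag only, so output that lacks the opening
    tag still parses. If no closing tag is present, all text is the answer.
    """
    head, sep, tail = raw.partition("</think>")
    if not sep:
        return "", raw.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


def parse_verdict(answer: str) -> dict:
    """Extract a JSON verdict (hypothetical schema) from the final answer.

    Uses a greedy match from the first '{' to the last '}' in the answer.
    """
    match = re.search(r"\{.*\}", answer, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model answer")
    verdict = json.loads(match.group(0))
    if verdict.get("label") not in {"eligible", "ineligible", "uncertain"}:
        raise ValueError(f"unexpected label: {verdict.get('label')!r}")
    return verdict
```

Keeping the reasoning string next to the parsed verdict lets a coordinator check the model's justification instead of trusting the label alone.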

This work is inspired by TrialGPT, a tool developed by NIH for clinical trial matching (read more about it here: https://www.ncbi.nlm.nih.gov/research/trialgpt/).

To test the automated screening pipeline, we explored synthetic patient personas generated with Synthea, a publicly available tool that outputs synthetic, realistic (but not real) patient data and associated health records in a variety of formats.
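Synthea records need to be reduced to the few fields that the trial filters use. The sketch below assumes Synthea's CSV exporter, whose `patients.csv` file includes `Id`, `BIRTHDATE`, `DEATHDATE`, `GENDER`, and `STATE` columns. Verify these columns against the Synthea version in use. The `PatientProfile` type and the mapping of `GENDER` codes to ClinicalTrials.gov-style sex values are assumptions for illustration.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass
class PatientProfile:  # hypothetical screening profile
    patient_id: str
    age: int      # whole years on the screening date
    sex: str      # "MALE" or "FEMALE", matching ClinicalTrials.gov sex values
    state: str


# Assumed mapping from Synthea's GENDER codes to trial-registry sex values.
_SEX = {"M": "MALE", "F": "FEMALE"}


def age_on(birth: date, today: date) -> int:
    """Whole years elapsed between birth and today."""
    years = today.year - birth.year
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1  # birthday has not happened yet this year
    return years


def load_profiles(csv_text: str, today: date) -> list[PatientProfile]:
    """Build screening profiles from Synthea patients.csv content."""
    profiles = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("DEATHDATE"):
            continue  # skip deceased synthetic patients
        profiles.append(PatientProfile(
            patient_id=row["Id"],
            age=age_on(date.fromisoformat(row["BIRTHDATE"]), today),
            sex=_SEX[row["GENDER"]],
            state=row["STATE"],
        ))
    return profiles
```

Passing `today` explicitly keeps the ages reproducible across test runs, which matters when the same synthetic cohort is screened repeatedly.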

The matching process is built on the ClinicalTrials.gov API, which provides access to an extensive database of clinical trial records.
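Below is a minimal sketch of querying the ClinicalTrials.gov API (version 2, `/api/v2/studies`) for recruiting studies and reducing each record to the fields used for screening. The parameter names (`query.cond`, `query.locn`, `filter.overallStatus`, `pageSize`, `pageToken`) and the response paths (`protocolSection`, `eligibilityModule`, `nextPageToken`) follow the public v2 API as we understand it. Check them against the current API documentation. The function names are hypothetical and are not part of the project's code.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_query_url(condition: str, location: str, page_size: int = 50,
                    page_token: Optional[str] = None) -> str:
    """Build a v2 studies query for recruiting trials matching a condition and location."""
    params = {
        "query.cond": condition,
        "query.locn": location,
        "filter.overallStatus": "RECRUITING",
        "pageSize": page_size,
    }
    if page_token:
        params["pageToken"] = page_token  # continue from a previous page
    return f"{BASE_URL}?{urlencode(params)}"


def summarize_study(study: dict) -> dict:
    """Keep only the fields the screening steps need; missing fields become None."""
    proto = study.get("protocolSection", {})
    ident = proto.get("identificationModule", {})
    status = proto.get("statusModule", {})
    elig = proto.get("eligibilityModule", {})
    return {
        "nct_id": ident.get("nctId"),
        "title": ident.get("briefTitle"),
        "status": status.get("overallStatus"),
        "sex": elig.get("sex"),
        "min_age": elig.get("minimumAge"),
        "max_age": elig.get("maximumAge"),
        "criteria": elig.get("eligibilityCriteria", ""),
    }


def fetch_studies(condition: str, location: str, max_pages: int = 3) -> list[dict]:
    """Fetch up to max_pages of results, following nextPageToken between pages."""
    summaries: list[dict] = []
    token: Optional[str] = None
    for _ in range(max_pages):
        url = build_query_url(condition, location, page_token=token)
        with urlopen(url, timeout=30) as resp:
            payload = json.load(resp)
        summaries.extend(summarize_study(s) for s in payload.get("studies", []))
        token = payload.get("nextPageToken")
        if not token:
            break
    return summaries
```

Keeping `build_query_url` separate from the network call makes each query easy to log and test. `fetch_studies` stops after `max_pages`, so a broad condition term cannot trigger an unbounded crawl.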

Access

This effort is still in its early stages, and updates will follow. With further development, the goal is to establish and validate a user-friendly system that coordinators and clinicians can integrate seamlessly into their existing workflows to augment their screening processes.

Ownership

The exploratory effort is still being developed.

Resources

The project team includes software developers, along with collaborators who offer guidance and expert feedback. This experimental effort is built with CAAI’s LLM Factory. Costs include development time, compute resources, and GPU-powered inference on our LLM Factory server.

Categories:

LLM Project

Tags:

automation, large language models