Drafting Evidence-Based Clinical Practice Guidelines with Human-Gated AI

Clinical Evidence Synthesis AI

Developing a clinical practice guideline is a months-long manual effort, and it cannot keep pace with the evidence. This research asks whether AI can draft an evidence-based guideline far faster while staying trustworthy, by implementing the evidence methodology end to end, verifying the model's own output against source data, refusing any quantitative claim it cannot trace, and keeping a human panel in control of every recommendation.

Multi-Agent, Clinical Practice Guidelines, Evidence Synthesis, Source-Grounded Generation, Human-in-the-Loop, Citation Provenance, Evidence-Based Medicine, Evaluation

The crisis

A single clinical practice guideline takes a panel 6–18 months to produce, so guidance lags the evidence it is meant to summarize, and many questions never get a guideline at all.
The methodology that makes a guideline trustworthy is expert-bound and does not scale to the volume of questions clinicians face.
General-purpose LLMs can draft fluent guidelines that are confidently wrong, and a single hallucinated or direction-flipped effect estimate in a clinical recommendation is a safety failure.
In a clinical setting, a recommendation is usable only if every quantitative claim is traceable to its source and a human panel makes the final call. Speed without auditability is worthless.

About this research

Developing a clinical practice guideline is a slow, expert-bound process that cannot keep pace with the evidence, and many questions never get a guideline at all. This thread investigates whether the evidence-synthesis workflow can be drafted by an agentic pipeline in a fraction of the time while remaining trustworthy enough for clinical use. The organizing idea is distrust: the model's output is treated as untrusted input, every quantitative claim must trace back to source data, unsupported or direction-flipped estimates are rejected before they reach a clinician, and a human panel gates every recommendation. The system is framed throughout as decision support, in which the pipeline drafts and the panel decides, and never as an autonomous guideline author. The work draws on agentic LLM architectures, evidence-synthesis methodology, source-grounded generation, and rigorous evaluation against human-curated ground truth. Faculty-advised.
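
The "distrust" idea can be made concrete with a small sketch. The code below is illustrative only and is not the project's implementation: every name in it (EffectEstimate, Verdict, verify_claim, DraftRecommendation, releasable) is hypothetical, and it assumes ratio-type effect estimates (such as a risk ratio, where 1.0 means no effect) keyed by a source identifier and an outcome name. It shows one way a model-drafted number could be checked against curated source data, rejected when it is untraceable or points in the opposite direction, and held back until a human panel signs off.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "supported"
    UNTRACEABLE = "untraceable"
    DIRECTION_FLIPPED = "direction_flipped"
    VALUE_MISMATCH = "value_mismatch"


@dataclass(frozen=True)
class EffectEstimate:
    """Hypothetical ratio-type estimate (e.g. a risk ratio); 1.0 means no effect."""
    source_id: str
    outcome: str
    value: float


def verify_claim(claim, sources, rel_tol=0.01):
    """Check a model-drafted estimate against curated source data.

    `sources` maps (source_id, outcome) to the curated EffectEstimate.
    The 1% relative tolerance is an arbitrary illustrative choice.
    """
    truth = sources.get((claim.source_id, claim.outcome))
    if truth is None:
        # No source record: the claim cannot be traced, so it is refused.
        return Verdict.UNTRACEABLE
    # On a ratio scale, values below and above 1.0 point in opposite directions.
    if (claim.value - 1.0) * (truth.value - 1.0) < 0:
        return Verdict.DIRECTION_FLIPPED
    if abs(claim.value - truth.value) > rel_tol * abs(truth.value):
        return Verdict.VALUE_MISMATCH
    return Verdict.SUPPORTED


@dataclass
class DraftRecommendation:
    text: str
    claims: list = field(default_factory=list)
    panel_approved: bool = False  # set only by the human panel, never by the model


def releasable(draft, sources):
    """A draft is released only if every claim is supported AND the panel approved it."""
    all_supported = all(
        verify_claim(c, sources) is Verdict.SUPPORTED for c in draft.claims
    )
    return all_supported and draft.panel_approved


if __name__ == "__main__":
    # Made-up example data, not results from any study.
    sources = {("trial-A", "mortality"): EffectEstimate("trial-A", "mortality", 0.80)}
    flipped = EffectEstimate("trial-A", "mortality", 1.25)
    print(verify_claim(flipped, sources))
```

The design choice worth noting is that machine verification and human approval are separate conditions: a fully verified draft is still not releasable until the panel approves it, and panel approval cannot override a failed verification.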
