IPP Symposium

Evidence-Based Medicine, the Clinical Data Deluge, and Machine Learning

Thomas Trikalinos, Brown University Professor, Health Services, Policy & Practice

An unprecedented volume of biomedical evidence is being published today. PubMed, the U.S. National Center for Biotechnology Information's portal to MEDLINE, already indexes more than 600,000 publications on clinical trials in humans (and upwards of 22 million articles in total); in 2010, an average of 75 new clinical trial reports were published every day. This volume of literature imposes a substantial burden on practitioners of Evidence-Based Medicine (EBM), which seeks to answer well-formulated clinical questions by synthesizing the entirety of the relevant published evidence (such a synthesis is called a systematic review). To realize this aim, researchers must identify all of the relevant articles (usually only tens) among the hundreds of thousands of published clinical trials. Compounding the difficulty, the cost of overlooking a relevant abstract is high: in EBM it is imperative that all relevant evidence be included in a given synthesis, or else the validity of the review is compromised. As reviews have become more complex and the indexed literature has exploded in volume, the literature identification step has consumed an ever larger and more costly amount of time. It is not uncommon for clinical researchers to screen (read through) tens of thousands of biomedical abstracts for a single review.

Machine learning (i.e., inducing a model to automatically discriminate relevant from irrelevant articles) is an attractive option for reducing this workload. But 'off-the-shelf' technologies are not sufficient to address the unique challenges imposed by the EBM domain. Specifically, new methods are needed to: mitigate the effects of class imbalance during model induction; exploit the wealth of knowledge that highly skilled domain experts bring to the task; and induce better models with less effort (fewer labels). In this talk we present novel machine learning methods that address these issues. In particular, we develop new perspectives on class imbalance, novel methods for exploiting dual supervision (i.e., labels on both instances and features), and new active learning techniques that address issues inherent to real-world applications (e.g., exploiting multiple experts in tandem). Each of these contributions aims to squeeze better classification performance out of fewer labels, thereby making better use of domain experts' time and expertise. We demonstrate that the developed methods can reduce reviewer workload by more than half without sacrificing the comprehensiveness of reviews (i.e., without missing any relevant published evidence).
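To make two of the ideas in the abstract concrete, the following minimal Python sketch shows textbook versions of class reweighting for imbalanced data and uncertainty sampling for active learning. It is not the speaker's method: the function names, the inverse-frequency weighting, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions chosen for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Rare classes (here, relevant abstracts) get larger weights so that a
    learner does not simply predict "irrelevant" for everything.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def most_uncertain(probabilities, batch_size=1):
    """Return indices of the abstracts whose predicted probability of
    relevance is closest to 0.5 (generic uncertainty sampling)."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:batch_size]


# Hypothetical toy data: 1 relevant abstract among 10 labeled ones.
labels = [1] + [0] * 9
weights = balanced_class_weights(labels)

# Hypothetical model outputs for five unlabeled abstracts.
probs = [0.02, 0.48, 0.91, 0.60, 0.10]
query = most_uncertain(probs, batch_size=2)

print(weights)  # the relevant class (1) receives the larger weight
print(query)    # indices of the abstracts to send to a human reviewer next
```

In a screening loop of this kind, the reviewer would label the queried abstracts, the model would be retrained with the class weights, and the cycle would repeat until a stopping rule is met.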

Tom Trikalinos studied medicine in Greece. In 2006 he joined Joseph Lau's team at Tufts Medical Center in Boston to conduct research in evidence-based medicine. He currently directs the Center for Evidence-based Medicine (CEBM) at Brown University. CEBM faculty and staff work on novel methodologies for comparative effectiveness research, with emphasis on evidence synthesis (by means of systematic review and meta-analysis) and evidence contextualization (by means of decision and economic analysis). Tom and his colleagues strive to modernize and optimize the processes of evidence-based medicine by porting methodologies from computer science and applied mathematics.