The way we do medicine is undergoing a revolution driven by technology. As the modern drive to record, share, and analyse data sweeps across society, healthcare lies squarely in its path. Data generated by everyday clinical practice presents an invaluable view of health and disease at a scale previously unimaginable. However, to benefit from it, we need computational tools to extract meaning, clinical insight, and actionable predictions. This new digital era of medicine is an opportunity not only for healthcare providers, but also for machine learning researchers to develop new methods tailored to the unique demands of this complex domain. The work described here sits in this sphere.
Firstly, we explore representation learning for medical language. With its long-tailed distribution of technical terms, medical language necessitates the development of methods that mitigate data scarcity by exploiting prior information encoded in knowledge graphs. Obtaining semantically meaningful representations of medical concepts and their relationships is vital, and we describe a probabilistic model to learn such representations.
Secondly, we address learning from and implicitly representing long time series using recurrent neural networks. These long sequences are commonplace in medicine, where one's health history is necessarily lengthy, but early events nonetheless provide crucial context. To address vanishing and exploding gradients in the training of these networks, we propose a novel parametrisation exploiting the correspondence between the Lie group of unitary matrices and its Lie algebra.
Next, a method for generating synthetic ICU time series data is described in the framework of adversarial networks. A core challenge for researchers in healthcare is the scarcity of shareable datasets on which to benchmark. Realistic synthetic data is therefore key. Novel methods for evaluating the quality of this synthetic data are proposed, and the model's privacy and memorisation properties are analysed, both heuristically and in terms of differential privacy.
Finally, an ensemble of gradient-boosted decision trees is employed to identify circulatory system deterioration in Swiss ICU patients. As this system has been developed for deployment, we carefully detail the data processing steps, task specification, and evaluation considerations necessary for a real-world, real-time early warning system driven by machine learning.
|Committee:||Elemento, Olivier; Fuchs, Thomas; Siepel, Adam|
|School:||Weill Medical College of Cornell University|
|Department:||Computational Biology and Medicine|
|School Location:||United States -- New York|
|Source:||DAI-B 80/07(E), Dissertation Abstracts International|
|Subjects:||Medicine, Artificial intelligence|
|Keywords:||Generative models, Intensive care, Language embedding, Machine learning, Recurrent neural networks|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved