Publications

Research work from our associates

Explore our latest, cutting-edge research fueling our AI-powered apps and partner projects.

Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli

A significant obstacle in training predictive cell models is the lack of integrated data sources. We develop semi-supervised normalization pipelines and perform experimental characterization (growth, transcriptional, proteome) to create Ecomics, a consistent, quality-controlled multi-omics compendium for Escherichia coli with cohesive metadata. We then use this resource to train a multi-scale model that integrates four omics layers to predict genome-wide concentrations and growth dynamics. The genetic and environmental ontology reconstructed from the omics data is substantially different from, and complementary to, the genetic and chemical ontologies. Integrating the different layers confers an incremental increase in prediction performance, as does information about known gene regulatory and protein-protein interactions. The predictive performance of the model ranges from 0.54 to 0.87 for the various omics layers, which far exceeds various baselines. This work provides an integrative framework of omics-driven predictive modelling that is broadly applicable to guide biological discovery.

Compendium of synovial signatures identifies pathologic characteristics for predicting treatment response in rheumatoid arthritis patients

Rheumatoid arthritis (RA) is therapeutically challenging due to patient heterogeneity and variability. Herein we describe a novel integration of RA synovial genome-scale transcriptomic profiling of different patient cohorts that can be used to provide predictive insights on drug responses.
A normalized compendium consisting of 256 RA synovial samples that cover an intersection of 11,769 genes from 11 datasets was built and compared with similar datasets derived from OA patients and healthy controls. Differentially expressed genes (DEGs) that were identified by three independent methods were fed into functional network analysis, with subsequent grouping of the samples based on a non-negative matrix factorization method. RA-relevant pathway activation scores and four machine learning classification techniques supported the generation of a predictive model of patient treatment response. We identified 876 up-regulated DEGs including 24 known genetic risk factors and 8 drug targets.
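The compendium above is restricted to genes measured in every source dataset. A minimal Python sketch of that intersection step is shown here on toy input; the function name `shared_genes`, the cohort labels, and the gene lists are illustrative assumptions, not taken from the study.

```python
def shared_genes(datasets):
    """Return the sorted genes present in every dataset.

    `datasets` maps a dataset label to an iterable of gene identifiers.
    """
    gene_sets = [set(genes) for genes in datasets.values()]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


# Toy input for illustration only; labels and gene lists are hypothetical.
toy_datasets = {
    "cohort_a": ["TNF", "IL6", "CXCL13", "MMP3"],
    "cohort_b": ["IL6", "MMP3", "TNF"],
    "cohort_c": ["MMP3", "TNF", "STAT1"],
}
common = shared_genes(toy_datasets)
print(common)
```

Taking the intersection keeps the merged expression matrix free of missing values, at the cost of discarding genes that any single dataset did not measure.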

Predicting early risk of chronic kidney disease in cats using routine clinical laboratory tests and machine learning

Background: Advanced machine learning methods combined with large sets of health screening data provide opportunities for diagnostic value in human and veterinary medicine.
Hypothesis/Objectives: To derive a model to predict the risk of cats developing chronic kidney disease (CKD) using data from electronic health records (EHRs) collected during routine veterinary practice.
Animals: A total of 106 251 cats that attended Banfield Pet Hospitals between January 1, 1995, and December 31, 2017.
Methods: Longitudinal EHRs from Banfield Pet Hospitals were extracted and randomly split into 2 parts. The first 67% of the data were used to build a prediction model, which included feature selection and identification of the optimal neural network type and architecture. The remaining unseen EHRs were used to evaluate the model performance.
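A random 67%/33% split of this kind could be sketched as follows. This is a minimal illustration, not the study's pipeline: the function name `split_records`, the fixed seed, and the use of integer record IDs are assumptions made for the example.

```python
import random


def split_records(record_ids, train_fraction=0.67, seed=0):
    """Shuffle record IDs reproducibly and split them into (train, held_out)."""
    ids = list(record_ids)
    # A seeded generator makes the split repeatable across runs.
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Hypothetical record IDs 0..99 stand in for individual health records.
train_ids, test_ids = split_records(range(100))
print(len(train_ids), len(test_ids))
```

With longitudinal records, splitting by animal rather than by visit keeps all of one animal's history on the same side of the split, so the held-out part stays unseen during feature selection and architecture search.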

The computational diet: A review of computational methods across diet, microbiome, and health

Food and human health are inextricably linked. As such, revolutionary impacts on health have been derived from advances in the production and distribution of food relating to food safety and fortification with micronutrients. During the past two decades, it has become apparent that the human microbiome has the potential to modulate health, including in ways that may be related to diet and the composition of specific foods. Despite the excitement and potential surrounding this area, fully understanding the complexity of the gut microbiome, the chemical composition of food, and their interplay in situ remains a daunting task. However, recent advances in high-throughput sequencing, metabolomics profiling, compositional analysis of food, and the emergence of electronic health records provide new sources of data that can contribute to addressing this challenge. Computational science will play an essential role in this effort as it will provide the foundation to integrate these data layers and derive insights capable of revealing and understanding the complex interactions between diet, gut microbiome, and health.

Accelerated knowledge discovery from omics data by optimal experimental design

How to design experiments that accelerate knowledge discovery on complex biological landscapes remains a tantalizing question. We present an optimal experimental design method (coined OPEX) to identify informative omics experiments using machine learning models for both experimental space exploration and model training. OPEX-guided exploration of Escherichia coli populations exposed to biocide and antibiotic combinations led to more accurate predictive models of gene expression with 44% less data. Analysis of the proposed experiments shows that broad exploration of the experimental space followed by fine-tuning emerges as the optimal strategy. Additionally, analysis of the experimental data reveals 29 cases of cross-stress protection and 4 cases of cross-stress vulnerability. Further validation reveals the central role of chaperones, stress response proteins and transport pumps in cross-stress exposure. This work demonstrates how active learning can be used to guide omics...
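The "broad exploration" phase described above can be illustrated with a simple distance-based heuristic: propose the candidate condition that lies farthest from everything already measured. This sketch is not the OPEX method itself, which selects experiments with machine learning models; the function name `next_experiment` and the two-dimensional dose coordinates are assumptions for illustration.

```python
def next_experiment(candidates, measured):
    """Pick the candidate farthest from its nearest measured condition.

    Conditions are tuples of numeric settings (for example, two doses).
    Squared Euclidean distance serves as a crude proxy for how unexplored
    a region of the experimental space is.
    """
    if not measured:
        raise ValueError("need at least one measured condition")

    def distance_to_nearest_measured(candidate):
        return min(
            sum((a - b) ** 2 for a, b in zip(candidate, done))
            for done in measured
        )

    return max(candidates, key=distance_to_nearest_measured)


# Hypothetical normalized (biocide dose, antibiotic dose) pairs.
measured = [(0.0, 0.0)]
candidates = [(0.1, 0.1), (1.0, 1.0), (0.5, 0.0)]
choice = next_experiment(candidates, measured)
print(choice)
```

A model-guided design would replace the distance proxy with a predictive model's uncertainty estimate and, once the space is broadly covered, shift toward conditions that refine the model.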
