Marinka Zitnik

Fusing bits and DNA


New Survey Paper: Machine Learning for Integrating Data in Biology and Medicine

My new survey paper on machine learning for integrating data in biology and medicine is now online.

In this review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. We also discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.

 

Nature Communications: Prioritizing Network Communities

Community detection allows one to decompose a network into its building blocks. While communities can be identified with a variety of methods, their relative importance cannot be easily derived.

In this Nature Communications paper, we introduce an algorithm to identify the modules that are most promising for further analysis. Our method allows for more efficient evaluation of hypotheses brought forward by the analysis of complex networks, thus speeding up the scientific discovery process in experimental network sciences.

 

Bioinformatics: What side effects to expect if taking multiple drugs?

Many patients take multiple drugs at the same time to treat complex diseases, such as heart failure, or co-occurring diseases, such as diabetes and epilepsy. The use of combinations of drugs is a common practice. In fact, 25 percent of people ages 65 to 69 take at least five prescription drugs to treat chronic conditions, a figure that jumps to nearly 46 percent for those between 70 and 79.

However, a major consequence of drug combinations for a patient is a much higher risk of side effects. These side effects emerge because of drug-drug interactions, in which the activity of one drug may change, favorably or unfavorably, if taken with another drug. These side effects are extremely difficult to identify manually because there are combinatorially many ways in which a given combination of drugs clinically manifests, and each combination is valid in only a certain subset of patients. It is also practically impossible to test all possible pairs of drugs and observe side effects in relatively small clinical trials.

In our latest research published in Bioinformatics, we develop an approach for computational screening of drug combinations. The approach predicts what side effects a patient might experience when taking multiple drugs simultaneously.

Technically, this work defines a novel approach that blends deep learning for graphs with network science to achieve benefits from each. See the paper and project website for details!

 

Submit to Frontiers in Genetics: Single-Cell Data Analytics

I am thrilled about an opportunity to co-edit a research topic on single-cell data analytics, resources, challenges and perspectives for Frontiers in Genetics!

With this research topic, we aim to provide a broad coverage of single-cell data analytic studies.

We encourage contributions in the form of original research articles, short communications, reviews, and perspectives, addressing the major needs and challenges in single-cell data analytics, including (but not limited to): statistical models, algorithms, and software packages to analyze single-cell data; visualization tools for interpreting single-cell data; methods to relate single-cell data with disease classification and prognosis; methods and tools to discover spatial/temporal organization of tissues at a single-cell level; models of cell-cell communication; scalable mathematical and computer-science approaches for analysis of mega-scale single-cell data; methods for combining mixed platform data, noise filtering, and robust normalization.

You are cordially invited to submit your research to the Frontiers in Genetics' single-cell data analytics research topic.

 

Tutorial on Representation Learning for Network Biology

I am excited to announce that our tutorial on representation learning for network biology has been accepted at ISMB 2018. I will present the tutorial at the ISMB 2018 conference in Chicago, IL. Stay tuned for more information and tutorial materials.

Networks are ubiquitous in biology where they encode connectivity patterns at all scales of organization, from molecular to the biome. This tutorial investigates key advancements in representation learning for networks over the last few years, with an emphasis on fundamentally new opportunities in network biology enabled by these advancements.

Tutorial website: http://snap.stanford.edu/deepnetbio-ismb.

 

Graph Convolutional Networks for Computational Pharmacology

Our paper on graph convolutional networks for modeling polypharmacy side effects has been accepted to the ISMB conference. Stay tuned for the final version, to be published in the journal Bioinformatics.

We describe a general graph convolutional neural network approach for multirelational link prediction in heterogeneous graphs. In computational pharmacology, this approach creates, for the first time, an opportunity to use large molecular, pharmacological, and patient population data to flag and prioritize polypharmacy side effects for follow-up analysis via formal studies.
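To give a feel for what multirelational link prediction means, here is a minimal sketch. It is not the model from the paper: the embeddings, the relation weights, and the DistMult-style scoring rule are all assumptions chosen for illustration. In a real system the node embeddings would come from a trained graph convolutional encoder rather than being written by hand.

```python
import math

def edge_probability(z_u, z_v, relation_weights):
    """Sigmoid of sum_i z_u[i] * r[i] * z_v[i] (a DistMult-style score)."""
    score = sum(a * r * b for a, r, b in zip(z_u, relation_weights, z_v))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical 3-dimensional drug embeddings and one weight vector per
# side-effect type; none of these values come from real data.
embeddings = {"drug_x": [0.9, 0.1, -0.4], "drug_y": [0.8, -0.2, 0.5]}
relations = {"side_effect_1": [2.0, 0.0, 0.0], "side_effect_2": [0.0, 0.0, 3.0]}

for name, weights in relations.items():
    p = edge_probability(embeddings["drug_x"], embeddings["drug_y"], weights)
    print(name, round(p, 3))
```

The point of the sketch is that the same pair of drugs gets a separate probability for each relation type, which is what distinguishes multirelational from ordinary link prediction.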

Project website: http://snap.stanford.edu/decagon.

 

JMM 2018: Invited Talk on Prioritization of Network Communities

I am giving a talk on prioritization of network communities, a framework that speeds up the scientific discovery process in experimental network sciences.

It is very exciting to be able to present this challenging and important problem at the Joint Mathematics Meetings conference, in the session on Theory, Practice, and Applications of Graph Clustering.

 

PSB 2018: Disease Pathways in the Human Interactome

I am giving a talk on large-scale analysis of disease pathways in the human interactome at PSB.

Check out my slides, poster, and the paper if you want to learn more about disease pathway prediction, learning using biological data, and network biology.

 

Scalable Matrix Tri-Factorization

In our new paper on accelerating matrix tri-factorization we show how to learn factorized representations that scale well on multi-processor and multi-GPU architectures.

The new approach speeds up computations by more than two orders of magnitude without any loss in accuracy and is especially suitable for large-scale biomedical data analytics.

 

ECML PKDD Proceedings Online

The third volume of the ECML PKDD 2017 proceedings is online, describing state-of-the-art machine learning and data mining systems presented at the European Conference on Machine Learning.

I had a great experience co-chairing the demo track.

 

Guest Lecture on Biological Network Analysis

I am giving a guest lecture on biological network analysis in the CS224W Network Analysis course at Stanford.

The lecture introduces biological networks and their analysis to the CS and engineering students. It describes statistical enrichment tests and several important prediction problems in biology, such as disease pathway detection and gene function prediction. It also explains some of the most successful methods for solving these problems.
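As a rough illustration of the kind of statistical enrichment test mentioned above, here is a minimal sketch of a one-sided hypergeometric test. The function name and the toy numbers are assumptions made for this example and are not taken from the lecture materials.

```python
from math import comb

def hypergeom_enrichment_pvalue(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn without replacement
    from `population` genes, of which `annotated` carry the annotation."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Toy numbers (hypothetical): 20,000 genes, 200 annotated with a function,
# a gene set of 50, of which 5 carry the annotation.
p = hypergeom_enrichment_pvalue(20000, 200, 50, 5)
print(f"p-value = {p:.3g}")
```

A small p-value suggests the gene set contains more annotated genes than random sampling would explain. When many annotations are tested, a multiple-testing correction is usually applied on top.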

Slides and class notes.

 

Nature Communications: Mapping Biological Functions of NUDIX Enzymes

Our new study published in Nature Communications explores the NUDIX hydrolases in human cells and provides attractive opportunities for expanding the use of this enzyme family as biomarkers and potential novel drug targets. The NUDIX enzymes are involved in several cellular processes, yet their biological role has remained largely unclear.

In a collaborative study with Karolinska Institutet, Helleday Laboratory, Science for Life Laboratory (SciLifeLab), Uppsala University, Stockholm University, and the Human Protein Atlas, we have generated comprehensive data on the individual structural, biochemical and biological properties of 18 human NUDIX proteins, as well as how they relate to and interact with each other.

I am especially happy to see how my machine learning and computational biology methods can help discover new biology! We used my recent methods for data fusion and gene network inference to generate predictions, which we then validated in the wet laboratory. Using these novel algorithms, we integrated all data and created a comprehensive NUDIX enzyme profile map. This map reveals novel insights into substrate selectivity and biological functions of NUDIX hydrolases and provides a platform for expanding the use of NUDIX as biomarkers and potential novel cancer drug targets.

Karolinska Institutet News, Science for Life Laboratory (SciLifeLab) News, and Phys.org News wrote about this project.

 

PSB 2018: Large-Scale Analysis of Disease Pathways in the Human Interactome

Our paper on large-scale analysis of disease pathways in the human interactome will appear at the Pacific Symposium on Biocomputing.

Discovering disease pathways, which can be defined as sets of proteins associated with a given disease, is an important problem that has the potential to provide clinically actionable insights for disease diagnosis, prognosis, and treatment. Computational methods aid the discovery by relying on protein-protein interaction (PPI) networks. They start with a few known disease-associated proteins and aim to find the rest of the pathway by exploring the PPI network around the known disease proteins.

However, the success of such methods has been limited, and failure cases have not been well understood. In the paper we study the PPI network structure of disease pathways. We find that pathways do not correspond to single well-connected components in the PPI network. These results counter one of the most frequently used assumptions in network medicine, which posits that disease pathways are likely to correspond to highly interconnected groups of proteins. Instead, we show that proteins associated with a single disease tend to form many separate connected components/regions in the network.
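The notion of a pathway splitting into separate connected components can be made concrete with a small sketch. The code below counts the connected components of the subgraph induced by a set of disease proteins. The network, the protein names, and the function name are made up for illustration and do not reproduce the analysis in the paper.

```python
from collections import deque

def disease_components(ppi_edges, disease_proteins):
    """Connected components of the subgraph induced by disease proteins."""
    disease = set(disease_proteins)
    neighbors = {p: set() for p in disease}
    # Keep only interactions whose two endpoints are both disease proteins.
    for u, v in ppi_edges:
        if u in disease and v in disease:
            neighbors[u].add(v)
            neighbors[v].add(u)
    seen, components = set(), []
    for start in sorted(disease):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), {start}
        while queue:  # breadth-first search within the induced subgraph
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components

# Toy PPI network with made-up protein names.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("D", "E")]
pathway = ["A", "B", "E", "F", "D"]
parts = disease_components(edges, pathway)
print(len(parts), [sorted(c) for c in parts])
```

In this toy network the five pathway proteins are all reachable from one another through protein C, yet C is not a disease protein, so the pathway itself falls into two separate regions.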

Furthermore, we show that state-of-the-art disease pathway discovery methods perform especially poorly on diseases with disconnected pathways. These results suggest that integration of disconnected regions of disease proteins into a broader disease pathway will be crucial for a holistic understanding of disease mechanisms.

In addition to new insights into the PPI network connectivity of disease proteins, our analysis leads to important implications for future disease protein discovery that can be summarized as:

  • We move away from modeling disease pathways as highly interlinked regions in the PPI network to modeling them as loosely interlinked and multi-regional objects with two or more regions distributed throughout the PPI network.
  • Higher-order connectivity structure provides a promising direction for disease pathway discovery.

Project website: http://snap.stanford.edu/pathways.

 

ISMB/ECCB 2017: Feature Learning in Multi-layer Tissue Networks

I am giving a talk on feature learning in multi-layer tissue networks and tissue-specific protein function prediction at ISMB/ECCB.

Check out the slides, the poster and the recorded talk.

 

Understanding Protein Functions in Different Biological Contexts

Our paper on predicting multicellular function through multi-layer tissue networks is published in Bioinformatics and is included in the proceedings of ISMB/ECCB 2017, a premier conference in bioinformatics and computational biology.

Understanding functions of proteins in specific human tissues is essential for insights into disease diagnostics and therapeutics, yet surprisingly little is known about protein functions in different biological contexts, and prediction of tissue-specific function remains a critical challenge in biomedicine.

Our approach OhmNet represents a network-based platform that shifts protein function prediction from flat networks to multiscale models able to predict a range of phenotypes spanning cellular systems.

OhmNet predicts tissue-specific protein functions by representing tissue organization with a rich multiscale tissue hierarchy and by modeling proteins through neural embedding-based representation of a multi-layer network. For the first time, we can systematically pinpoint tissue-specific functions of proteins across more than 100 human tissues. OhmNet accurately predicts protein functions, and also generates actionable hypotheses about protein actions specific to a given biological context.

Project website: http://snap.stanford.edu/ohmnet.

   

Invited Talk on Boosting Biomedical Discovery Through Network Data Analytics

I'm giving an invited talk on speeding up scientific discovery in biomedicine through computational network analytics at the International Conference for Big Data and AI in Medicine.

 

Jozef Stefan Golden Emblem Prize

I am honored to receive the Jozef Stefan Golden Emblem for the winning PhD dissertation in the fields of natural sciences, medicine and biotechnology. The prize is awarded by the Jozef Stefan Institute.

I look forward to making further progress on machine learning, data mining, and statistical methods research to better understand complex biomedical data systems!

For my Slovenian friends, I wrote a short non-technical column for Jozef Stefan Institute News (in Slovene) on the topic of this work.

 

Submit to AIME 2017 Workshop on Advanced Healthcare Analytics

You are cordially invited to submit a paper to the Workshop on Advanced Predictive Models in Healthcare that will take place during the AIME 2017 conference. This workshop will focus on topics related to advanced predictive models, capable of providing actionable and timely insights about health outcomes.

 

Submit to ECML PKDD 2017

You are cordially invited to submit a paper to the upcoming 2017 ECML PKDD conference.

ECML PKDD is the European Conference on Machine Learning and Knowledge Discovery. It is the largest European conference in these areas, and it developed from the European Conference on Machine Learning (ECML) and the European Symposium on Principles of Knowledge Discovery and Data Mining (PKDD).

You are especially invited to consider submitting a paper to the ECML PKDD Demo Track which I am co-chairing this year.

 

ACM XRDS: The Infinite Mixtures of Food Products

The Fall issue of ACM XRDS is here! In this issue of XRDS, we take a closer look at the marriage of physics and computer science through quantum computing. Quantum computing is a model of computation that breaks with the tradition of the digital computers that surround us. The issue covers recent advances in the field of quantum computing, such as computer simulation, complexity theory, simulated annealing and machine learning, as well as an in-depth profile of David Deutsch, who pioneered the field of quantum computation.

My department contributed a column on the infinite mixture models applied to the problem of clustering food products. Infinite mixture models are useful because they do not impose any a priori bound on the number of clusters in the data. This is in contrast with finite mixture models, which assume a finite and fixed number of clusters that have to be specified before the analysis is started. The column describes infinite mixture models through a generative story and then uses Gibbs sampling to cluster the food facts. It can be seen that the number of clusters detected by the model varies as we feed in more food products. As expected, the model discovers more clusters as more food products arrive. Additionally, results show that detected food clusters have distinct nutritional profiles revealing interesting nutrition patterns.
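One common way to tell the generative story of an infinite mixture model is the Chinese restaurant process, and the sketch below samples cluster labels from it. This is an assumption about how such a story can be told, not a reproduction of the model in the column. It covers only the prior over partitions and leaves out both the food data and the Gibbs sampler used for inference.

```python
import random

def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process."""
    rng = random.Random(seed)
    counts = []   # counts[k] = number of items already in cluster k
    labels = []
    for i in range(n_items):
        # Item i joins cluster k with probability counts[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

for n in (10, 100, 1000):
    _, counts = crp_assignments(n, alpha=1.0)
    print(n, "items ->", len(counts), "clusters")
```

Because a new cluster can always be opened with nonzero probability, the number of clusters is not fixed in advance and tends to grow slowly as more items arrive, which matches the behavior described above.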

 

ISMB 2016: Connecting Gene-Disease Contexts

We presented our recent approach for disease module detection at ISMB 2016. Slides are available. The method is capable of making inference over heterogeneous data collections in new interesting ways! One of them, an approach we call jumping across data contexts, connects entities, such as genes and diseases, through semantically distinct chains, which are estimated by a collective latent variable model.

 

Bioinformatics: Jumping Across Contexts Using Compressive Fusion

Our paper on Jumping across biomedical contexts using compressive data fusion has just appeared in Bioinformatics. We will present the paper at ISMB 2016 in July 2016.

The rapid growth of diverse biological data allows us to consider interactions between a variety of objects, such as genes, chemicals, molecular signatures, diseases, pathways and environmental exposures. Often, any pair of objects—such as a gene and a disease—can be related in different ways, for example, directly via gene–disease associations or indirectly via functional annotations, chemicals and pathways. In this paper, we show that different ways of relating these objects carry different semantic meanings that are largely ignored by established computational methods.

We present an approach that operates on large-scale heterogeneous data collections and explicitly distinguishes between diverse data semantics. The approach detects size-k modules of objects that, taken together, appear most significant to another set of objects. The method builds on collective matrix factorization to derive different semantics, and it formulates the growing of the modules as a submodular optimization program.
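Growing a size-k module by submodular optimization can be illustrated with the standard greedy rule. The sketch below uses a simple coverage objective as a stand-in. The objective in the paper is derived from collective matrix factorization and is not reproduced here, and the gene and disease identifiers are hypothetical.

```python
def greedy_module(candidates, k):
    """Greedily pick k objects maximizing a coverage objective.

    `candidates` maps an object to the set of target objects it is
    associated with. Coverage is monotone submodular, so the greedy
    rule gives a (1 - 1/e) approximation of the best size-k set.
    """
    module, covered = [], set()
    remaining = dict(candidates)
    for _ in range(min(k, len(remaining))):
        # Pick the object with the largest marginal gain; ties are
        # broken alphabetically because max keeps the first maximum.
        best = max(sorted(remaining), key=lambda g: len(remaining[g] - covered))
        module.append(best)
        covered |= remaining.pop(best)
    return module, covered

# Hypothetical gene-to-disease associations.
genes = {
    "g1": {"d1", "d2", "d3"},
    "g2": {"d3", "d4"},
    "g3": {"d1", "d2"},
    "g4": {"d5"},
}
print(greedy_module(genes, 2))
```

Each step adds the object with the largest marginal gain given what the module already covers, which is why g3 is skipped here even though it covers two diseases on its own.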

In a systematic study on more than three hundred complex diseases, we show the effectiveness of the approach in associating genes with diseases and detecting disease modules.

 

ACM XRDS: Cultures of Computing

The Summer issue of ACM XRDS is here! The issue is centered around computing, culture, postcoloniality and questions of power. In it, many fascinating authors ask whether an Anglo-European culture of computing could be made more aware of its politics and what alternative cultures of computing could be realized. Our amazing issue editors, Ahmed Ansari (CMU) and Raghavendra Kandala (CMU), have tried to give the readers a slice of the incredible heterogeneity and plurality of critical scholarship and practice around the world.

The issue provides a brief introduction to decolonial computing and raises various issues around design and innovation in China, participation of Africans in the global HCI community, the life at the forefront of Indonesia's tech emancipation, and plans to develop hundreds of smart cities in India, revealing the complex politics of technological development and class.

Jennifer Jacobs (MIT) and I served as co-editors for the issue.

 

ACM XRDS: The Brownian Wanderlust of Things

The Spring issue of ACM XRDS is here! This issue is centered around digital fabrication, which in many ways highlights the expanded role of computers in today's society. Digital fabrication is not merely about 3D printing knickknacks; rather, it enables individuals to create their own systems and devices using new technologies.

My department contributed a column on the Brownian wanderlust of things. Consider a gambler who starts with an initial fortune and plays the following simple coin tossing game. At each turn, the dealer throws an unbiased coin. If the coin comes up heads, the gambler wins a unit; if it comes up tails, the gambler loses a unit. The gambler continues to play until he is either bankrupt or his current holdings reach some fixed desired amount.
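For a fair coin, the classical result for this game is that a gambler starting with i units reaches the target N before going bankrupt with probability i/N. The sketch below checks that formula against a Monte Carlo simulation. The function names, the starting fortune, and the target are chosen only for illustration.

```python
import random

def win_probability(start, target):
    """Exact chance a fair-coin gambler reaches `target` before 0."""
    return start / target

def simulate(start, target, games=20000, seed=0):
    """Estimate the same probability by playing many independent games."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        fortune = start
        while 0 < fortune < target:
            fortune += 1 if rng.random() < 0.5 else -1
        wins += fortune == target
    return wins / games

print(win_probability(3, 10), simulate(3, 10))
```

The simulated estimate should land close to the exact value, with the gap shrinking as the number of games grows.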

Stochastic models of this kind can have much wider implications than just estimating the fortune of a gambler flipping a coin. For example, the way in which information flows within social media outlets can affect mobilization and strategic interactions between participants of mass social movements, such as protests. While traditionally social movements have spread through on-the-ground unions, the use of communication platforms—such as Twitter and Facebook—has offered alternative ways for organizing such events. As we see in the column, to truly capture propagation in such environments, we need to take into consideration the stochastic nature of information propagation.

 

Bioinformatics: Orthogonal Factorization of RNA-Binding Proteins

Our paper on integrative analysis of multiple RNA-binding proteins has just appeared in Bioinformatics. RNA binding proteins (RBPs) are important for many cellular processes, including post-transcriptional control of gene expression, splicing, transport, polyadenylation and RNA stability. To better understand the RBP mechanisms we aimed to integrate the rapidly growing RBP experimental data with the latest genome annotation, gene function, RNA sequence and structure.

We have developed an integrative orthogonality-regularized nonnegative matrix factorization that can integrate multiple data sets and discover non-overlapping and class-specific RNA binding patterns of varying strengths. The orthogonality constraint is important here because it enables us to substantially reduce the effective size of inferred factor models.

The new models have proved powerful in predicting RBP interaction sites on RNA. We also showed that joint analysis of multiple data sets can boost retrieval accuracy of RNA binding sites, which we studied using the largest RBP data compendium to date.

 

ACM XRDS: Activities of Daily Living in the Era of Internet of Things

The Winter issue of ACM XRDS is here! This issue discusses the Internet of Things (IoT), a collection of emerging technologies that promise to seamlessly expand our sensing capabilities across the globe with imagination as our only limit. In the issue you can read about the prospects for the IoT as seen by leaders in the field, the challenges of building network awareness, and the trends in IoT platforms. You will also find columns about ontology-supported stream reasoning for querying flying robots, the importance of encrypted data in IoT systems, and ways of managing droughts using tech.

My department contributed a column on predicting activities of daily living (ADL) from sensor activation profiles. Together with Lara Zupan we used an open-source data mining tool for visual programming called Orange to analyze patterns of how humans interact with household devices. These interaction patterns provided powerful clues that helped us recognize various activities that take place in home environments.

 

PSB 2016: Collective Pairwise Classification for Multi-Way Analysis

Our paper on Collective Pairwise Classification for Multi-Way Analysis has been published in the Proceedings of the 21st Pacific Symposium on Biocomputing. We will present the work at the PSB conference in January 2016.

In the paper, we develop a collective pairwise classification approach for multi-way data analysis. The approach leverages the superiority of latent factor models for analyzing large heterogeneous relational data sets and provides probabilistic estimates of relationships by optimizing a pairwise ranking loss. Although the method bears correspondence with the maximization of a non-differentiable area under the receiver operating characteristic curve, we were able to design a learning algorithm that scales well on large multi-relational data.
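The link between the area under the ROC curve and a pairwise ranking loss can be seen in a few lines. AUC is the fraction of (positive, negative) pairs that are ordered correctly, which is a step function of the scores and therefore hard to optimize directly. The logistic surrogate below is one common smooth replacement and is an assumption here, since the exact loss used in the paper is not stated in this post.

```python
import math

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos_scores) * len(neg_scores))

def pairwise_logistic_loss(pos_scores, neg_scores):
    """Smooth surrogate: mean of log(1 + exp(-(p - n))) over all pairs."""
    losses = [
        math.log1p(math.exp(-(p - n)))
        for p in pos_scores
        for n in neg_scores
    ]
    return sum(losses) / len(losses)

# Made-up scores for three known relationships and two non-relationships.
pos = [2.0, 0.5, 1.5]
neg = [1.0, -0.5]
print(auc(pos, neg), pairwise_logistic_loss(pos, neg))
```

Lowering the surrogate pushes positive scores above negative ones, which in turn tends to raise AUC while keeping the objective differentiable.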

We used the method to infer relationships from multiplex drug data and to predict connections between clinical manifestations of diseases and their underlying molecular signatures. An appealing property of the method is its ability to make category-jumping inferences, such as predictions about diseases based solely on genomic and clinical data generated far outside the molecular context.

 

ACM XRDS: The Marvel Comic Book Universe

The Fall issue of ACM XRDS is here! In this issue we write about virtual reality. Among others, you can read about the virtual reality revolution and ways to bring virtual reality home. The issue also discusses how to use your own muscles to achieve realistic physical experience, how to manage cybersickness in virtual reality, and how to avoid danger with mine disaster simulations.

My department contributed a column on mining the Marvel comic book universe. Together with Lara Zupan we scraped Wikipedia to obtain information on the Marvel comics characters and then analyzed the structure of the Marvel multiverse network, where two characters were considered linked if they shared a skill set. Here, the analysis of complex networks allowed us to better understand how properties of fictitious networks emerge from non-trivial interactions between characters.
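The linking rule can be sketched as follows, reading "shared a skill set" as "have at least one skill in common", which is an interpretation rather than something the post states. The characters and skills are placeholders, not scraped data.

```python
from itertools import combinations

def shared_skill_network(skills):
    """Link two characters when their skill sets overlap."""
    edges = {}
    for a, b in combinations(sorted(skills), 2):
        common = skills[a] & skills[b]
        if common:
            edges[(a, b)] = common  # keep the shared skills as an edge label
    return edges

# Made-up skill sets, used only to illustrate the linking rule.
skills = {
    "Character A": {"flight", "strength"},
    "Character B": {"strength", "healing"},
    "Character C": {"telepathy"},
}
print(shared_skill_network(skills))
```

The resulting edge list can be handed to any network analysis tool to study degree distributions, communities, and similar structural properties.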

 

BMC Bioinformatics: Extracting Gene Regulatory Networks from Text

Our paper on Sieve-based relation extraction of gene regulatory networks from biological literature has been published in BMC Bioinformatics.

In the paper, we describe a network extraction algorithm, which is an improvement on our winning submission to BioNLP 2013. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. To enable extraction of distant relations we transform the data into skip-mention sequences. We then infer multiple models, each of which is able to extract a particular relationship type (e.g., inhibition, activation, binding). Further analysis following the challenge showed that all relation extraction sieves contribute to the predictive performance of the proposed approach. Also, features constructed by considering mention words and their prefixes and suffixes are the most important features for higher accuracy of extraction. The analysis also showed that our choice of transforming data into skip-mention sequences is appropriate for detecting relations between distant mentions.
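A linear-chain model only relates neighboring items in a sequence, so one way to expose distant relations is to build sequences in which distant mentions become neighbors. The sketch below shows one plausible reading of skip-mention sequences, where a skip of s yields s + 1 sequences that each take every (s + 1)-th mention. The function name and the mention labels are hypothetical, and the construction in the paper may differ in detail.

```python
def skip_mention_sequences(mentions, skip):
    """Split mentions into skip + 1 sequences of every (skip + 1)-th mention."""
    step = skip + 1
    sequences = [mentions[offset::step] for offset in range(step)]
    return [seq for seq in sequences if seq]  # drop empty sequences

# Placeholder mentions in document order.
mentions = ["m1", "m2", "m3", "m4", "m5"]
for skip in (0, 1, 2):
    print(skip, skip_mention_sequences(mentions, skip))
```

With a skip of 1, mentions m1 and m3 sit next to each other, so a model that only looks at adjacent items can still label a relation between them.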

 

PLoS CompBio: Gene Prioritization by Compressive Data Fusion

Our paper on Gene prioritization by compressive data fusion and chaining has been published in PLoS Computational Biology.

In the paper, we present Collage, a new data fusion approach to gene prioritization. Together with collaborators from Baylor College of Medicine, we tested Collage by prioritizing bacterial response genes in Dictyostelium as a novel model system for prokaryote-eukaryote interactions.

We started from four bacterial response genes and 14 different data sets ranging from gene expression to pathway and literature information. Collage proposed eight candidate genes that were tested in the wet laboratory. Mutations in all eight candidates reduced the ability of the amoebae to grow on Gram-negative bacteria. Furthermore, five out of the eight candidate genes were required for growth on Gram-negative bacteria but had no discernible effect on growth on Gram-positive bacteria. This is a remarkably accurate result since only about a hundred of the 12,000 Dictyostelium genes are estimated to be responsible for bacterial response.
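To see why this hit rate is remarkable, consider how likely it would be by chance. The back-of-the-envelope sketch below treats "about a hundred" as exactly 100 responsive genes among 12,000 and ignores the four seed genes, both of which are simplifying assumptions.

```python
from math import comb

def chance_all_hits(total_genes, true_genes, picks):
    """Probability that `picks` genes drawn at random are all true hits."""
    return comb(true_genes, picks) / comb(total_genes, picks)

per_gene = 100 / 12000                      # chance for one random gene
all_eight = chance_all_hits(12000, 100, 8)  # chance that all eight are hits
print(f"{per_gene:.2%} per gene, {all_eight:.2e} for all eight")
```

Under these assumptions, a single random gene has under a one percent chance of being a bacterial response gene, and eight random picks would essentially never all be hits.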

 