Marinka Zitnik

Fusing bits and DNA



Assistant Professor at Harvard University

Starting in December 2019, I will be a tenure-track Assistant Professor at Harvard University, and my laboratory at Harvard Medical School will focus on Machine Learning for Science and Medicine.

I am looking for outstanding students and postdoctoral fellows who would like to join me in transforming science and medicine into data-driven and computationally enabled disciplines. If you are excited about problems in machine learning and/or applications in genomics, medicine, and health and would like to work with me, feel free to contact me directly with a brief description of your research interests and your CV.


Named a Rising Star in Biomedicine

I am honored to be named a Rising Star in Biomedicine by the Broad Institute of MIT and Harvard! I am thrilled to present my research at the Next Generation in Biomedicine Symposium at the Broad.


Named a Rising Star in Electrical Engineering and Computer Science (EECS)

I am both honored and excited to be named a Rising Star in Electrical Engineering and Computer Science by MIT!


To Embed or Not: Network Embedding as a Paradigm in Computational Biology

Current technology is producing high-throughput biomedical data at an ever-growing rate. A common approach to interpreting such data is network-based analysis. Because biological networks are notoriously complex and hard to decipher, a growing body of work applies graph embedding techniques to simplify, visualize, and facilitate the analysis of the resulting networks.

In this review, we survey traditional and new approaches for graph embedding and compare their performance on fundamental problems in network biology with that of approaches that use the networks directly. We consider a broad variety of applications, including protein network alignment, community detection, and protein function prediction. We find that in all of these domains both types of approaches are of value, and their relative performance depends on the evaluation measures being used and the goal of the project. In particular, network embedding methods outperform direct methods according to some of those measures and are, thus, an essential tool in bioinformatics research.

This is joint work with colleagues from Stanford University, University of Toronto, Vector Institute, and Tel Aviv University.


Submit to ICLR 2019 Workshop on Representation Learning on Graphs and Manifolds

Many scientific fields study data with an underlying graph or manifold structure, such as protein networks, sensor networks, and biomedical knowledge graphs. New optimization methods and neural network architectures that can accommodate these relational and non-Euclidean structures are increasingly needed.

We are organizing a workshop on Representation Learning on Graphs and Manifolds at ICLR 2019. We encourage submissions on topics related to graph and manifold representation learning.


Stanford News Story on How Species Evolve Ways to Back Up Life's Machinery

Stanford Engineering News put a spotlight on our study of versatile and robust molecular machinery and the evolution of protein interactomes.


Proceedings of the National Academy of Sciences: Evolution of Molecular Networks

Our paper on evolution of resilience in protein interactomes is published in Proceedings of the National Academy of Sciences (PNAS).

Using protein-protein interaction data that have only recently become available, we composed and analyzed interactome networks from 1,840 species across the tree of life, expanding the number of species from about 5 in previous studies to 1,840. This unique dataset allowed us to conduct the largest study of protein interactomes to date and to quantify the resilience of interactomes, a critical property because the breakdown of proteins may lead to cell death or disease.

Our study reveals that evolution leads to more resilient interactomes, providing evidence for a longstanding hypothesis that interactomes evolve to favor robustness against protein failures. We show that a highly resilient interactome has an astonishingly beneficial impact on an organism's ability to survive in complex, variable, and competitive habitats. This finding draws attention to a previously unknown critical role of evolution in mediating the effects of the interactome on the ability of a species to thrive in specific habitats.
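The resilience measure used in the paper is not reproduced here. As an assumption for illustration only, the sketch below uses a common robustness proxy: remove a random fraction of proteins (nodes) and record what fraction of the original network remains in the largest connected component. The function names and toy graphs are hypothetical.

```python
import random
from collections import deque

def largest_component_fraction(adj, removed):
    """Fraction of all original nodes that lie in the largest connected
    component after deleting the nodes in `removed`."""
    seen = set()
    best = 0
    for start in adj:
        if start in removed or start in seen:
            continue
        # Breadth-first search restricted to surviving nodes
        size = 0
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in removed and w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best / len(adj)

def robustness_curve(adj, fractions, trials=20, seed=0):
    """For each removal fraction, average the largest-component fraction
    over `trials` random node removals (a hypothetical proxy measure)."""
    rng = random.Random(seed)
    nodes = list(adj)
    curve = []
    for f in fractions:
        k = int(round(f * len(nodes)))
        total = 0.0
        for _ in range(trials):
            removed = set(rng.sample(nodes, k))
            total += largest_component_fraction(adj, removed)
        curve.append(total / trials)
    return curve

# Toy networks: a 5-node ring and a 5-node path
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
ring_curve = robustness_curve(ring, [0.0, 0.2])
path_curve = robustness_curve(path, [0.0, 0.2])
```

Removing any single node from the ring leaves a 4-node path (fraction 0.8), while removing an interior node of the path splits it, so on average the path keeps less of its largest component and is the less resilient of the two.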


ICLR 2019 Workshop: Representation Learning on Graphs and Manifolds

I am very excited to be co-organizing an ICLR 2019 workshop on Representation Learning on Graphs and Manifolds. We will have an amazing lineup of invited speakers on a variety of methods and problems in this area! Stay tuned for the upcoming call for papers!


Guest Lecture on Graph Convolutional Networks

I had the opportunity to give a lecture on Graph Convolutional Networks in the CS224W class (Analysis of Networks: Mining and Learning with Graphs) at Stanford.

Here are slides and video of the lecture.


Evolution of Protein Interactomes across the Tree of Life

The interactome network of protein-protein interactions captures the structure of molecular machinery and gives rise to the bewildering complexity of life. We composed and analyzed interactome networks from 1,840 species across the tree of life, expanding the number of species from about 5 in previous studies to 1,840. This unique dataset allowed us to conduct the largest study of protein interactomes to date and to quantify the resilience of interactomes, a critical property because the breakdown of proteins may lead to cell death or disease.

By studying interactomes from 1,840 species across the tree of life, we find that evolution leads to more resilient interactomes, providing evidence for a longstanding hypothesis that interactomes evolve to favor robustness against protein failures. We find that a highly resilient interactome has an astonishingly beneficial impact on an organism's ability to survive in complex, variable, and competitive habitats. Our findings reveal how interactomes change through evolution and how these changes affect their response to environmental unpredictability.


Biomedical Entity Recognition with Deep Multi-Task Learning

We propose a deep multi-task learning approach for biomedical named entity recognition, a fundamental task in mining biomedical text. The new approach saves human effort and frees biomedical experts from the need to painstakingly engineer entity features by hand. Furthermore, it achieves excellent performance using only a limited amount of training data. The approach can help scientists better exploit the knowledge buried in the vast biomedical literature.

This is joint work with colleagues from Stanford University, University of Southern California, and University of Illinois Urbana-Champaign.


Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities

My review of machine learning for biomedical data integration is now available online in Information Fusion.

This paper is intended for computer scientists and biomedical researchers who are curious about recent developments and applications of machine learning in biology and medicine, and about its potential for advancing biomedicine given the vast amounts of heterogeneous data being generated today.


BioSNAP Datasets: Stanford Biomedical Network Dataset Collection

We are announcing a repository of biomedical network datasets, BioSNAP Datasets: Stanford Biomedical Network Dataset Collection!

BioSNAP aims to bring biological and medical datasets closer to computer scientists who develop exciting new algorithms. It is often very difficult for computer scientists, who typically have no background in bioinformatics or biostatistics, to obtain and construct high-quality biomedical datasets. As a result, biomedical datasets are rarely used in ML algorithm development and benchmarking, even though biomedicine is one of the most exciting domains for ML, with a unique set of challenges, hard and important problems, and huge potential impact. BioSNAP aims to close this gap by providing a number of ready-to-use network datasets.

BioSNAP contains many large biomedical networks that are ready to use for method development, algorithm evaluation, benchmarking, and network science analyses. In this first release, BioSNAP has several dozen network datasets that describe a dozen different entity types (e.g., genes, proteins, cells, drugs, diseases, side effects, tissues). These datasets can be used for standard prediction tasks (node clustering, link prediction, node classification) as well as relatively new tasks (graph-level classification, multi-relational link prediction, higher-order association prediction). Many datasets contain weighted networks and can be used to define multi-layer/heterogeneous graphs with attributes.
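As a minimal sketch of how a ready-to-use edge list might feed a link prediction benchmark, the code below reads an undirected edge list and holds out a random fraction of edges for testing. The tab-separated, '#'-commented format and the function names are assumptions for illustration, not BioSNAP's documented format.

```python
import io
import random

def read_edge_list(handle):
    """Read an undirected edge list with one 'source<TAB>target' pair per line
    (an assumed format), skipping blank lines, '#' comments, and self-loops."""
    edges = set()
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        u, v = line.split("\t")[:2]
        if u != v:
            edges.add(tuple(sorted((u, v))))  # canonical order drops reversed duplicates
    return sorted(edges)

def split_edges(edges, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of edges as positive test examples."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical file contents; "B\tA" duplicates "A\tB"
sample = io.StringIO("# gene1\tgene2\nA\tB\nB\tA\nB\tC\nC\tD\nD\tE\n")
edges = read_edge_list(sample)
train_edges, test_edges = split_edges(edges, test_fraction=0.25)
```

A complete benchmark would also sample unconnected node pairs as negative examples, which is omitted here.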

I look forward to seeing more biomedical network data considered in machine learning and data science research.


Nature Communications: General Method to Denoise Biological Networks

Technical noise in experiments is unavoidable, but it introduces inaccuracies into the biological networks we infer from the data.

In this Nature Communications paper, we introduce a diffusion-based method for denoising undirected, weighted networks and show that it improves the performance of downstream analyses, including prediction of gene functions, interpretation of noisy Hi-C contact maps, and fine-grained identification of species.
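The paper's exact algorithm is not reproduced here. As an illustration of diffusion-based smoothing only, the sketch below builds a symmetric transition matrix T from a weighted adjacency matrix and repeatedly mixes a two-sided propagation T S T with the original transitions T. The construction of T, the mixing weight `alpha`, the iteration count, and all function names are assumptions of this sketch.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def row_normalize(W):
    """Divide each row by its sum (rows summing to zero stay zero)."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s if s else 0.0 for w in row])
    return out

def symmetric_transition(W):
    """T[i][j] = sum_k P[i][k] * P[j][k] / sum_v P[v][k], with P the
    row-normalized W. T is symmetric, and each row sums to 1 whenever
    the corresponding row of W is not all zero."""
    P = row_normalize(W)
    n = len(P)
    col = [sum(P[v][k] for v in range(n)) for k in range(n)]
    return [[sum(P[i][k] * P[j][k] / col[k] for k in range(n) if col[k])
             for j in range(n)] for i in range(n)]

def diffuse(W, alpha=0.8, iterations=20):
    """Repeat S <- alpha * T S T + (1 - alpha) * T, starting from S = T."""
    T = symmetric_transition(W)
    S = [row[:] for row in T]
    for _ in range(iterations):
        TST = matmul(matmul(T, S), T)
        S = [[alpha * a + (1 - alpha) * t for a, t in zip(ra, rt)]
             for ra, rt in zip(TST, T)]
    return S

# Toy weighted network: a strong triangle plus one weak, possibly noisy edge
noisy = [
    [0.0, 1.0, 0.9, 0.1],
    [1.0, 0.0, 0.8, 0.0],
    [0.9, 0.8, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
]
denoised = diffuse(noisy)
```

Each entry of the result reflects multi-step connectivity rather than a single measured weight. A fuller method might also sparsify T to each node's strongest neighbours and rescale the output; both steps are omitted here.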


Tutorial on Deep Learning for Network Biology at ISMB

We just presented a tutorial on Deep Learning for Network Biology at ISMB 2018 in Chicago, IL, USA. If you are interested in these topics and would like to learn more about graph neural networks and/or their biomedical applications but could not attend the tutorial because it was sold out, check out our tutorial website. All materials, including slides, network tools, examples, and code bases are available for download from the tutorial website.

In this tutorial, we cover the key conceptual foundations of representation learning, from approaches relying on network propagation to very recent advancements in deep representation learning for networks. In addition to a broad high-level overview, we spend a considerable amount of time describing the algorithmic and implementation aspects of recent advancements in deep representation learning and discussing many biomedical applications.


New Survey Paper: Machine Learning for Integrating Data in Biology and Medicine

My new survey paper on machine learning for integrating data in biology and medicine is now online.

In this review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. We also discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.


Nature Communications: Prioritizing Network Communities

Community detection allows one to decompose a network into its building blocks. While communities can be identified with a variety of methods, their relative importance cannot be easily derived.

In this Nature Communications paper, we introduce an algorithm to identify the modules that are most promising for further analysis. Our method allows for more efficient evaluation of hypotheses brought forward by the analysis of complex networks and thus speeds up the scientific discovery process in experimental network science.
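The paper's prioritization algorithm is not reproduced here. To make the idea of ranking communities concrete, the sketch below orders candidate communities by conductance, a standard structural score (lower means fewer edges leave the community relative to its volume). Both the choice of score and the function names are assumptions.

```python
def conductance(adj, community):
    """cut(S) / min(vol(S), vol(V - S)) for an unweighted, undirected graph
    given as {node: set_of_neighbors}."""
    S = set(community)
    cut = sum(1 for u in S for w in adj[u] if w not in S)
    vol_s = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_s
    denom = min(vol_s, vol_rest)
    return cut / denom if denom else 1.0

def rank_communities(adj, communities):
    """Hypothetical ranking: most self-contained communities first."""
    return sorted(communities, key=lambda c: conductance(adj, c))

# Two triangles joined by the single edge 2-3
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
tight = [0, 1, 2]        # one triangle: a single edge leaves it
straddling = [1, 2, 3]   # crosses the bridge: four edges leave it
ranked = rank_communities(two_triangles, [straddling, tight])
```

Here the triangle has conductance 1/7 and the straddling set 4/6, so the triangle is ranked first.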


Bioinformatics: What side effects to expect if taking multiple drugs?

Many patients take multiple drugs at the same time to treat complex diseases, such as heart failure, or co-occurring diseases, such as diabetes and epilepsy. The use of combinations of drugs is a common practice. In fact, 25 percent of people ages 65 to 69 take at least five prescription drugs to treat chronic conditions, a figure that jumps to nearly 46 percent for those between 70 and 79.

However, a major consequence of drug combinations for a patient is a much higher risk of side effects. These side effects emerge because of drug-drug interactions, in which the activity of one drug may change, favorably or unfavorably, if taken with another drug. These side effects are extremely difficult to identify manually because there are combinatorially many ways in which a given combination of drugs can manifest clinically, and each combination is valid in only a certain subset of patients. It is also practically impossible to test all possible pairs of drugs and observe side effects in relatively small clinical trials.

In our latest research published in Bioinformatics, we develop an approach for computational screening of drug combinations. The approach predicts what side effects a patient might experience when taking multiple drugs simultaneously.

Technically, this work defines a novel approach that blends deep learning for graphs with network science to achieve the benefits of each. See the paper and project website for details!


Submit to Frontiers in Genetics: Single-Cell Data Analytics

I am thrilled about the opportunity to co-edit a research topic on single-cell data analytics, resources, challenges and perspectives for Frontiers in Genetics!

With this research topic, we aim to provide a broad coverage of single-cell data analytic studies.

We encourage contributions in the form of original research articles, short communications, reviews, and perspectives that address the major needs and challenges in single-cell data analytics, including (but not limited to):

  • statistical models, algorithms, and software packages to analyze single-cell data
  • visualization tools for interpreting single-cell data
  • methods to relate single-cell data to disease classification and prognosis
  • methods and tools to discover the spatial/temporal organization of tissues at a single-cell level
  • models of cell-cell communication
  • scalable mathematical and computer-science approaches for the analysis of mega-scale single-cell data
  • methods for combining mixed-platform data, noise filtering, and robust normalization

You are cordially invited to submit your research to the Frontiers in Genetics' single-cell data analytics research topic.


Tutorial on Representation Learning for Network Biology

I am excited to announce that our tutorial on Representation Learning for Network Biology has been accepted at ISMB 2018. I will present the tutorial at the ISMB 2018 conference in Chicago, IL. Stay tuned for more information and tutorial materials.

Networks are ubiquitous in biology, where they encode connectivity patterns at all scales of organization, from molecules to the biome. This tutorial investigates key advancements in representation learning for networks over the last few years, with an emphasis on fundamentally new opportunities in network biology enabled by these advancements.

Tutorial website:


Graph Convolutional Networks for Computational Pharmacology

Our paper on graph convolutional networks for modeling polypharmacy side effects has been accepted to the ISMB conference. Stay tuned for the final version, to be published in the journal Bioinformatics.

We describe a general graph convolutional neural network approach for multirelational link prediction in heterogeneous graphs. In computational pharmacology, this approach creates, for the first time, an opportunity to use large molecular, pharmacological, and patient population data to flag and prioritize polypharmacy side effects for follow-up analysis via formal studies.
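To make multirelational link prediction concrete, the sketch below scores a (drug, side effect, drug) triple with a factorized bilinear decoder, sigma(z_i^T D_r R D_r z_j), where D_r is a side-effect-specific diagonal matrix and R is shared across side effects. This decoder form is an assumption here rather than a transcription of the paper's model; the graph convolutional encoder that would produce the embeddings is omitted, and every number and name is made up.

```python
import math

def polypharmacy_score(z_i, z_j, d_r, R):
    """Return sigma(z_i^T D_r R D_r z_j) with D_r = diag(d_r).

    z_i, z_j : embeddings of two drugs (length k)
    d_r      : diagonal of the side-effect-specific matrix D_r (length k)
    R        : k x k matrix shared by all side effects
    """
    left = [z * d for z, d in zip(z_i, d_r)]    # D_r z_i
    right = [z * d for z, d in zip(z_j, d_r)]   # D_r z_j
    r_right = [sum(r * x for r, x in zip(row, right)) for row in R]  # R D_r z_j
    logit = sum(a * b for a, b in zip(left, r_right))
    return 1.0 / (1.0 + math.exp(-logit))       # logistic sigmoid

# Hypothetical values standing in for learned parameters
z_drug_a = [0.5, -0.2, 0.1]
z_drug_b = [0.3, 0.4, -0.1]
R_shared = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.0], [0.0, 0.0, 1.0]]
d_side_effect = [2.0, 0.5, 1.0]
p = polypharmacy_score(z_drug_a, z_drug_b, d_side_effect, R_shared)
```

Sharing R while giving each side effect only a diagonal D_r keeps the number of relation-specific parameters small; when R is symmetric, as in this toy example, the score does not depend on the order of the two drugs.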

Project website:


JMM 2018: Invited Talk on Prioritization of Network Communities

I am giving a talk on prioritization of network communities, a framework for speeding up the scientific discovery process in experimental network science.

It is very exciting to be able to present this challenging and important problem at the Joint Mathematics Meetings conference, in the session on Theory, Practice, and Applications of Graph Clustering.


PSB 2018: Disease Pathways in the Human Interactome

I am giving a talk on large-scale analysis of disease pathways in the human interactome at PSB.

Check out my slides, poster, and the paper if you are interested in or want to learn more about disease pathway prediction, learning from biological data, and network biology.


Scalable Matrix Tri-Factorization

In our new paper on accelerating matrix tri-factorization, we show how to learn factorized representations that scale well on multi-processor and multi-GPU architectures.

The new approach speeds up computations by more than two orders of magnitude without any loss in accuracy and is especially suitable for large-scale biomedical data analytics.
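For readers new to the model: matrix tri-factorization approximates a non-negative data matrix X as U S V^T. The sketch below fits this model with the classic multiplicative-update rules for the squared Frobenius error, in plain single-threaded Python. It illustrates the model only and is not the accelerated multi-processor/multi-GPU method from the paper; the function names are hypothetical.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def mult_update(M, numer, denom, eps=1e-12):
    """Element-wise M * numer / denom; eps guards against division by zero."""
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, numer, denom)]

def frob_error(X, U, S, V):
    """Squared Frobenius norm of X - U S V^T."""
    R = matmul(matmul(U, S), transpose(V))
    return sum((x - r) ** 2 for rx, rr in zip(X, R) for x, r in zip(rx, rr))

def nmtf(X, k1, k2, iterations=200, seed=0):
    """Approximate non-negative X (n x m) as U S V^T with U (n x k1),
    S (k1 x k2), V (m x k2). Returns the factors and the error history."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    U = [[rng.random() for _ in range(k1)] for _ in range(n)]
    S = [[rng.random() for _ in range(k2)] for _ in range(k1)]
    V = [[rng.random() for _ in range(k2)] for _ in range(m)]
    Xt = transpose(X)
    errors = [frob_error(X, U, S, V)]
    for _ in range(iterations):
        # U <- U * (X V S^T) / (U S V^T V S^T)
        VSt = matmul(V, transpose(S))                         # m x k1
        U = mult_update(U, matmul(X, VSt), matmul(matmul(U, transpose(VSt)), VSt))
        # V <- V * (X^T U S) / (V S^T U^T U S)
        US = matmul(U, S)                                     # n x k2
        V = mult_update(V, matmul(Xt, US), matmul(matmul(V, transpose(US)), US))
        # S <- S * (U^T X V) / (U^T U S V^T V)
        Ut = transpose(U)
        numer = matmul(matmul(Ut, X), V)
        denom = matmul(matmul(matmul(Ut, U), S), matmul(transpose(V), V))
        S = mult_update(S, numer, denom)
        errors.append(frob_error(X, U, S, V))
    return U, S, V, errors

X = [[5, 5, 0, 0, 1], [4, 5, 0, 0, 0], [0, 0, 3, 4, 0],
     [0, 1, 4, 3, 0], [0, 0, 0, 0, 2], [1, 0, 0, 0, 2]]
U, S, V, errors = nmtf(X, k1=3, k2=3)
```

Each step is the Lee and Seung multiplicative rule applied to one factor while the other two are held fixed, so the error should not increase from one iteration to the next (up to the tiny `eps`), and all factors stay non-negative.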


ECML PKDD Proceedings Online

The third volume of the ECML PKDD 2017 proceedings is online, describing state-of-the-art machine learning and data mining systems presented at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.

I had a great experience co-chairing the demo track.


Guest Lecture on Biological Network Analysis

I am giving a guest lecture on biological network analysis in the CS224W Network Analysis course at Stanford.

The lecture introduces biological networks and their analysis to CS and engineering students. It describes statistical enrichment tests and several important prediction problems in biology, such as disease pathway detection and gene function prediction. It also explains some of the most successful methods for solving these problems.
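One standard statistical enrichment test asks whether a gene set overlaps an annotation (for example, a Gene Ontology term) more than chance would predict, using the one-sided hypergeometric tail, which is equivalent to a one-sided Fisher's exact test. The sketch below computes that p-value with exact integer arithmetic; the function name and argument names are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, selected, overlap):
    """P(X >= overlap) for X ~ Hypergeometric: `selected` genes drawn without
    replacement from `population` genes, of which `annotated` carry the term."""
    total = comb(population, selected)
    upper = min(annotated, selected)
    tail = sum(comb(annotated, k) * comb(population - annotated, selected - k)
               for k in range(overlap, upper + 1))
    return tail / total  # exact integers until this final division

# 50 disease genes, 5 of which carry a term annotated to 200 of 20,000 genes
p = enrichment_p_value(population=20000, annotated=200, selected=50, overlap=5)
```

Only about 0.5 overlapping genes would be expected by chance here, so an overlap of 5 yields a small p-value. When many terms are tested, the p-values would normally be corrected for multiple testing.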

Slides and class notes.


Nature Communications: Mapping Biological Functions of NUDIX Enzymes

Our new study published in Nature Communications explores the NUDIX hydrolases in human cells and provides attractive opportunities for expanding the use of this enzyme family as biomarkers and potential novel drug targets. The NUDIX enzymes are involved in several cellular processes, yet their biological role has remained largely unclear.

In a collaborative study with Karolinska Institutet, Helleday Laboratory, Science for Life Laboratory (SciLifeLab), Uppsala University, Stockholm University, and the Human Protein Atlas, we have generated comprehensive data on the individual structural, biochemical and biological properties of 18 human NUDIX proteins, as well as how they relate to and interact with each other.

I am especially happy to see how my machine learning and computational biology methods can help discover new biology! We used my recent methods for data fusion and gene network inference to generate predictions, which we then validated in the wet laboratory. Using these novel algorithms, we integrated all data and created a comprehensive NUDIX enzyme profile map. This map reveals novel insights into the substrate selectivity and biological functions of NUDIX hydrolases and provides a platform for expanding the use of NUDIX enzymes as biomarkers and potential novel cancer drug targets.

Karolinska Institutet News, Science for Life Laboratory (SciLifeLab) News, and by News wrote about this project.


PSB 2018: Large-Scale Analysis of Disease Pathways in the Human Interactome

Our paper on large-scale analysis of disease pathways in the human interactome will appear at the Pacific Symposium on Biocomputing (PSB).

Discovering disease pathways, which can be defined as sets of proteins associated with a given disease, is an important problem that has the potential to provide clinically actionable insights for disease diagnosis, prognosis, and treatment. Computational methods aid the discovery by relying on protein-protein interaction (PPI) networks. They start with a few known disease-associated proteins and aim to find the rest of the pathway by exploring the PPI network around the known disease proteins.

However, the success of such methods has been limited, and failure cases have not been well understood. In the paper we study the PPI network structure of disease pathways. We find that pathways do not correspond to single well-connected components in the PPI network. These results counter one of the most frequently used assumptions in network medicine, which posits that disease pathways are likely to correspond to highly interconnected groups of proteins. Instead, we show that proteins associated with a single disease tend to form many separate connected components/regions in the network.

Furthermore, we show that state-of-the-art disease pathway discovery methods perform especially poorly on diseases with disconnected pathways. These results suggest that integration of disconnected regions of disease proteins into a broader disease pathway will be crucial for a holistic understanding of disease mechanisms.
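The basic connectivity question behind these findings can be illustrated on a toy network: keep only the PPI edges among a disease's proteins and count the connected components they form. This is only the basic component count, not the paper's full analysis; the function name and toy network are hypothetical.

```python
from collections import deque

def disease_components(ppi, disease_proteins):
    """Connected components of the PPI subgraph induced by disease proteins,
    largest first. `ppi` maps each protein to the set of its interactors."""
    targets = set(disease_proteins) & set(ppi)
    seen, components = set(), []
    for start in sorted(targets):
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for w in ppi[u]:
                # Only follow interactions between two disease proteins
                if w in targets and w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(sorted(comp))
    return sorted(components, key=len, reverse=True)

# Toy PPI network; P3 and P5 are not associated with the disease
ppi = {
    "P1": {"P2", "P5"}, "P2": {"P1", "P3"}, "P3": {"P2", "P4"},
    "P4": {"P3"}, "P5": {"P1", "P6"}, "P6": {"P5"},
}
comps = disease_components(ppi, {"P1", "P2", "P4", "P6"})
```

Although all four disease proteins lie within two hops of each other, they split into three components, the kind of disconnected pathway described above.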

In addition to new insights into the PPI network connectivity of disease proteins, our analysis leads to important implications for future disease protein discovery that can be summarized as:

  • We move away from modeling disease pathways as highly interlinked regions in the PPI network to modeling them as loosely interlinked and multi-regional objects with two or more regions distributed throughout the PPI network.
  • Higher-order connectivity structure provides a promising direction for disease pathway discovery.

Project website:


ISMB/ECCB 2017: Feature Learning in Multi-layer Tissue Networks

I am giving a talk on feature learning in multi-layer tissue networks and tissue-specific protein function prediction at ISMB/ECCB.

Check out the slides, the poster, and the recorded talk.
