Marinka Zitnik

Fusing bits and DNA


Recent Talks

- 08/04/2018: Tech Summit SYNC 2018 (Computer History Museum, Mountain View, CA)

- 08/22/2018: AI in Medicine: Inclusion & Equity (AiMIE) Symposium 2018 (Stanford University, CA)

- 09/04/2018: National Guest Scholar at Stanford CERC (Stanford University, CA)

- 09/19/2018: EMBL-EBI Workshop on Machine Learning in Drug Discovery and Precision Medicine (EMBL-EBI, Hinxton, UK)

- 10/04/2018: The Forum on Drug Discovery, Development, and Translation, The National Academies of Sciences (Washington, DC)

- 10/28-30/2018: Rising Stars in EECS, MIT Department of Electrical Engineering and Computer Science (Cambridge, MA)

- 11/05-06/2018: Next Generation Symposium, Rising Stars in Biomedicine, Broad Institute of MIT and Harvard (Cambridge, MA)


Biomedical Entity Recognition with Deep Multi-Task Learning

We propose a deep multi-task learning approach for biomedical named entity recognition, which is a fundamental task in the mining of biomedical text data. The new approach saves human effort and frees biomedical experts from painstakingly hand-crafting entity features. Furthermore, it achieves excellent performance with only a limited amount of training data.

The approach can help scientists to better exploit knowledge buried in vast biomedical literature. I have enjoyed working on this project with researchers from Stanford, USC, and UIUC.


Named a Rising Star in Biomedicine

I am honored to be named a Rising Star in Biomedicine by the Broad Institute of MIT and Harvard, September 2018!


Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities

My review of machine learning for biomedical data integration is now available online in Information Fusion.

This paper is intended for computer scientists and biomedical researchers who are curious about recent developments in machine learning, its applications to biology and medicine, and its potential to advance biomedicine given the vast amounts of heterogeneous data being generated today.


BioSNAP Datasets: Stanford Biomedical Network Dataset Collection

We are announcing a repository of biomedical network datasets, BioSNAP Datasets: Stanford Biomedical Network Dataset Collection!

BioSNAP aims to bring biological and medical datasets closer to computer scientists who develop exciting new algorithms. Computer scientists, who typically have no background in bioinformatics or biostatistics, often find it very difficult to obtain and construct high-quality biomedical datasets. As a result, biomedical datasets are rarely used in ML algorithm development and benchmarking, even though biomedicine is one of the most exciting domains for ML, with a unique set of challenges, hard and important problems, and huge potential impact. BioSNAP aims to close this gap by providing a number of ready-to-use network datasets.

BioSNAP contains many large biomedical networks that are ready to use for method development, algorithm evaluation, benchmarking, and network science analyses. In this first release, BioSNAP has several dozen network datasets that describe a dozen different entity types (e.g., genes, proteins, cells, drugs, diseases, side effects, tissues). These datasets can be used for standard prediction tasks (node clustering, link prediction, node classification) as well as relatively new tasks (graph-level classification, multi-relational link prediction, higher-order association prediction). Many datasets contain weighted networks and can be used to define multi-layer/heterogeneous graphs with attributes.
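To give a flavor of how such a network can feed a link-prediction benchmark, here is a minimal Python sketch. It assumes a simple two-column, tab-separated edge list with `#` comment lines (formats differ between datasets, so check each dataset's own description) and uses the classic common-neighbors heuristic as a baseline. `load_edge_list` and `common_neighbors_scores` are hypothetical helpers, not part of any BioSNAP tooling, and the toy network is made up.

```python
import io
from collections import defaultdict


def load_edge_list(handle):
    """Build an undirected adjacency map from 'u<TAB>v' lines; '#' lines are comments."""
    adj = defaultdict(set)
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        u, v = line.split("\t")[:2]
        if u != v:  # drop self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj


def common_neighbors_scores(adj):
    """Score each non-adjacent node pair by the number of neighbors it shares."""
    scores = {}
    nodes = sorted(adj)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v in adj[u]:
                continue  # already linked, nothing to predict
            shared = len(adj[u] & adj[v])
            if shared:
                scores[(u, v)] = shared
    return scores


# Toy stand-in for a downloaded file; node names are hypothetical.
toy_file = io.StringIO("# node1\tnode2\nA\tB\nA\tC\nB\tD\nC\tD\nD\tE\n")
adj = load_edge_list(toy_file)
scores = common_neighbors_scores(adj)
for pair, s in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, s)
```

In a real benchmark one would first hide a fraction of the edges and then check how highly the held-out pairs are ranked.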

I look forward to seeing more biomedical network data considered in machine learning and data science research.


Nature Communications: General Method to Denoise Biological Networks

Technical noise in experiments is unavoidable, but it introduces inaccuracies into the biological networks we infer from the data.

In this Nature Communications paper, we introduce a diffusion-based method for denoising undirected, weighted networks and show that it improves the performance of downstream analyses, including prediction of gene functions, interpretation of noisy Hi-C contact maps, and fine-grained identification of species.
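The paper's algorithm is not reproduced here. As a hedged illustration of what diffusion-based denoising can look like, the Python sketch below (all function names hypothetical) turns edge weights into a symmetric, row-stochastic kernel T and then repeatedly mixes diffused weights with T itself, W ← αTWT + (1 − α)T. The intuition is that weights supported by many paths are reinforced while isolated weak edges are diluted. The values α = 0.8 and 20 iterations are arbitrary assumptions, and the sketch assumes no node is isolated.

```python
def row_normalize(W):
    """Turn nonnegative weights into row-stochastic transition probabilities."""
    out = []
    for row in W:
        s = sum(row)
        out.append([x / s if s > 0 else 0.0 for x in row])
    return out


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def symmetric_kernel(P):
    """T[i][j] = sum_k P[i][k] * P[j][k] / sum_v P[v][k].

    T is symmetric, and row-stochastic whenever every row of P sums to 1."""
    n = len(P)
    col = [sum(P[v][k] for v in range(n)) for k in range(n)]
    return [[sum(P[i][k] * P[j][k] / col[k] for k in range(n) if col[k] > 0)
             for j in range(n)] for i in range(n)]


def diffuse_network(W, alpha=0.8, iters=20):
    """Iterate W_{t+1} = alpha * T W_t T + (1 - alpha) * T, starting from W_0 = T."""
    T = symmetric_kernel(row_normalize(W))
    Wt = [row[:] for row in T]
    for _ in range(iters):
        TWT = matmul(matmul(T, Wt), T)
        Wt = [[alpha * a + (1 - alpha) * t for a, t in zip(ra, rt)]
              for ra, rt in zip(TWT, T)]
    return Wt


# Two tightly linked pairs plus one weak, possibly spurious edge (node 0 to node 2).
noisy = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
denoised = diffuse_network(noisy)
for row in denoised:
    print([round(x, 3) for x in row])
```

Because T is symmetric and row-stochastic, every iterate keeps both properties, so the output is still a valid undirected weighted network.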


Tutorial on Deep Learning for Network Biology at ISMB

We just presented a tutorial on Deep Learning for Network Biology at ISMB 2018 in Chicago, IL, USA. If you are interested in these topics and would like to learn more about graph neural networks and/or their biomedical applications but could not attend the tutorial because it was sold out, check out our tutorial website. All materials, including slides, network tools, examples, and code bases, are available for download from the tutorial website.

In this tutorial, we cover the key conceptual foundations of representation learning, from approaches relying on network propagation to very recent advancements in deep representation learning for networks. In addition to a broad high-level overview, we spend a considerable amount of time describing the algorithmic and implementation aspects of recent advancements in deep representation learning and discussing many biomedical applications.
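Network propagation, the starting point of the tutorial, can be illustrated with random walk with restart. The Python sketch below is a generic textbook version, not code from the tutorial materials; the restart probability of 0.3 is an arbitrary assumption, and the toy path graph is made up. Scores concentrate near the seed nodes and decay with network distance, which is why propagation scores are often used to rank candidate genes near known ones.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate probability mass from seed nodes; returns a score per node.

    At each step a walker either jumps back to a seed (probability `restart`)
    or moves to a uniformly chosen neighbor. Nodes without neighbors would leak
    mass, so this sketch assumes every node has at least one neighbor."""
    nodes = sorted(adj)
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            share = (1 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# A path A - B - C - D, seeded at A.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(path, seeds={"A"})
print({u: round(s, 3) for u, s in scores.items()})
```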


New Survey Paper: Machine Learning for Integrating Data in Biology and Medicine

My new survey paper on machine learning for integrating data in biology and medicine is now online.

In this review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. We also discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.


Nature Communications: Prioritizing Network Communities

Community detection allows one to decompose a network into its building blocks. While communities can be identified with a variety of methods, their relative importance cannot be easily derived.

In this Nature Communications paper, we introduce an algorithm to identify the modules that are most promising for further analysis. Our method allows for more efficient evaluation of hypotheses brought forward by the analysis of complex networks and thus speeds up the scientific discovery process in experimental network sciences.
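The paper's algorithm combines community-level signals into a single ranking. The Python sketch below is a hypothetical, much-simplified stand-in, not the published method: it scores each community by two standard structural features, edge density (higher is better) and conductance (lower is better), and orders communities by their average rank. The graph, community sets, and function names are all made up for illustration.

```python
def build_graph(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def community_features(adj, community):
    """Return (edge density, conductance) for a node set."""
    S = set(community)
    internal = sum(1 for u in S for v in adj[u] if v in S) // 2
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_s = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_s
    k = len(S)
    pairs = k * (k - 1) / 2
    density = internal / pairs if pairs else 0.0
    denom = min(vol_s, vol_rest)
    conductance = cut / denom if denom else 1.0
    return density, conductance


def prioritize(adj, communities):
    """Order community indices by average rank over the two features (best first)."""
    feats = [community_features(adj, c) for c in communities]
    n = len(communities)
    by_density = sorted(range(n), key=lambda i: -feats[i][0])     # denser is better
    by_conductance = sorted(range(n), key=lambda i: feats[i][1])  # fewer boundary edges is better
    avg_rank = [0.0] * n
    for order in (by_density, by_conductance):
        for rank, i in enumerate(order):
            avg_rank[i] += rank / 2
    return sorted(range(n), key=lambda i: avg_rank[i])


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "g"), ("e", "h"), ("g", "i"), ("h", "i")]
graph = build_graph(edges)
communities = [{"e", "f", "g"}, {"a", "b", "c", "d"}]
print(prioritize(graph, communities))
```

Rank averaging is used here only because it is easy to read; it sidesteps putting features with different scales on a common footing.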


Bioinformatics: What side effects to expect if taking multiple drugs?

Many patients take multiple drugs at the same time to treat complex diseases, such as heart failure, or co-occurring diseases, such as diabetes and epilepsy. The use of combinations of drugs is a common practice. In fact, 25 percent of people ages 65 to 69 take at least five prescription drugs to treat chronic conditions, a figure that jumps to nearly 46 percent for those between 70 and 79.

However, a major consequence of drug combinations for a patient is a much higher risk of side effects. These side effects emerge because of drug-drug interactions, in which the activity of one drug may change, favorably or unfavorably, if taken with another drug. These side effects are extremely difficult to identify manually because there are combinatorially many ways in which a given combination of drugs can manifest clinically, and each combination is valid only in a certain subset of patients. It is also practically impossible to test all possible pairs of drugs and observe side effects in relatively small clinical trials.

In our latest research published in Bioinformatics, we develop an approach for computational screening of drug combinations. The approach predicts what side effects a patient might experience when taking multiple drugs simultaneously.

Technically, this work defines a novel approach that blends deep learning for graphs with network science to achieve benefits from each. See the paper and project website for details!
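The deep-learning side of such a model ends in a decoder that scores a drug pair for one particular side effect. Assuming a bilinear, tensor-factorization style decoder of the form sigmoid(z_i^T D_r R D_r z_j), where z_i and z_j are drug embeddings, D_r is a diagonal matrix specific to side effect r, and R is shared across side effects, a Python sketch is below. All numbers are made-up placeholders rather than learned values, and the paper remains the reference for the exact model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def side_effect_probability(z_i, z_j, d_r, R):
    """sigmoid(z_i^T D_r R D_r z_j) for drug embeddings z_i, z_j.

    d_r holds the diagonal of D_r (specific to side effect r); R is a k x k
    matrix shared across all side effects."""
    left = [a * d for a, d in zip(z_i, d_r)]   # D_r z_i (D_r is diagonal, so symmetric)
    right = [b * d for b, d in zip(z_j, d_r)]  # D_r z_j
    R_right = [sum(r_ab * x for r_ab, x in zip(row, right)) for row in R]
    return sigmoid(sum(l * x for l, x in zip(left, R_right)))


# Hypothetical 2-dimensional embeddings and parameters, not learned values.
z_drug_a, z_drug_b = [0.9, 0.1], [0.2, 0.8]
d_side_effect = [1.5, 0.5]
R_shared = [[0.3, 1.0], [0.2, 0.4]]
print(side_effect_probability(z_drug_a, z_drug_b, d_side_effect, R_shared))
```

Sharing R across side effects while keeping a small diagonal D_r per side effect lets information flow between related side effects without one full matrix per relation.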


Named a Rising Star in EECS

I am both honored and excited to be named a Rising Star in Electrical Engineering and Computer Science by MIT, June 2018!


Submit to Frontiers in Genetics: Single-Cell Data Analytics

I am thrilled about the opportunity to co-edit a research topic on single-cell data analytics, resources, challenges, and perspectives for Frontiers in Genetics!

With this research topic, we aim to provide broad coverage of single-cell data analytics studies.

We encourage contributions in the form of original research articles, short communications, reviews, and perspectives that address the major needs and challenges in single-cell data analytics, including (but not limited to):

  • statistical models, algorithms, and software packages to analyze single-cell data;
  • visualization tools for interpreting single-cell data;
  • methods to relate single-cell data to disease classification and prognosis;
  • methods and tools to discover the spatial/temporal organization of tissues at a single-cell level;
  • models of cell-cell communication;
  • scalable mathematical and computer-science approaches for the analysis of mega-scale single-cell data;
  • methods for combining mixed-platform data, noise filtering, and robust normalization.

You are cordially invited to submit your research to the Frontiers in Genetics' single-cell data analytics research topic.


Tutorial on Representation Learning for Network Biology

I am excited to announce that our tutorial on Representation Learning for Network Biology has been accepted at ISMB 2018. I will present the tutorial at the ISMB 2018 conference in Chicago, IL. Stay tuned for more information and tutorial materials.

Networks are ubiquitous in biology, where they encode connectivity patterns at all scales of organization, from molecules to the biome. This tutorial investigates key advancements in representation learning for networks over the last few years, with an emphasis on fundamentally new opportunities in network biology enabled by these advancements.

Tutorial website:


Graph Convolutional Networks for Computational Pharmacology

Our paper on graph convolutional networks for modeling polypharmacy side effects has been accepted to the ISMB conference. Stay tuned for the final version, which will be published in the journal Bioinformatics.

We describe a general graph convolutional neural network approach for multi-relational link prediction in heterogeneous graphs. In computational pharmacology, this approach creates, for the first time, an opportunity to use large molecular, pharmacological, and patient population data to flag and prioritize polypharmacy side effects for follow-up analysis via formal studies.
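The encoder side of such an approach can be pictured as relation-aware message passing: each node aggregates transformed features from its neighbors separately for every relation type (drug-drug, drug-protein, protein-protein, and so on) and adds its own transformed features. The Python sketch below is a generic, hypothetical layer of this kind with mean aggregation per relation; the normalization and parameterization in the published model may differ, and the node names and identity weights are placeholders.

```python
def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def relational_gcn_layer(h, edges_by_relation, W_rel, W_self):
    """One message-passing step on a multi-relational graph.

    h: node -> feature vector; edges_by_relation: relation -> list of (src, dst);
    W_rel: relation -> weight matrix; W_self: weight matrix for the node itself.
    Messages of each relation are averaged over that relation's in-neighbors."""
    out = {u: matvec(W_self, h[u]) for u in h}
    for r, edges in edges_by_relation.items():
        in_deg = {}
        for _, dst in edges:
            in_deg[dst] = in_deg.get(dst, 0) + 1
        for src, dst in edges:
            msg = matvec(W_rel[r], h[src])
            out[dst] = [o + m / in_deg[dst] for o, m in zip(out[dst], msg)]
    return {u: [max(0.0, x) for x in v] for u, v in out.items()}  # ReLU


identity = [[1.0, 0.0], [0.0, 1.0]]
features = {"drug1": [1.0, 0.0], "drug2": [0.0, 1.0], "protein1": [1.0, 1.0]}
edges = {
    "targets": [("drug1", "protein1"), ("drug2", "protein1")],
    "interacts": [("drug1", "drug2"), ("drug2", "drug1")],
}
new_h = relational_gcn_layer(features, edges,
                             {"targets": identity, "interacts": identity}, identity)
print(new_h)
```

Stacking a few such layers gives each drug an embedding that reflects its multi-relational neighborhood, which a decoder can then score.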

Project website:


JMM 2018: Invited Talk on Prioritization of Network Communities

I am giving a talk on prioritization of network communities, a framework for speeding up the scientific discovery process in experimental network sciences.

It is very exciting to be able to present this challenging and important problem at the Joint Mathematics Meetings conference, in the session on Theory, Practice, and Applications of Graph Clustering.


PSB 2018: Disease Pathways in the Human Interactome

I am giving a talk on large-scale analysis of disease pathways in the human interactome at PSB.

Check out my slides, poster, and paper if you are interested in learning more about disease pathway prediction, learning from biological data, and network biology.


Scalable Matrix Tri-Factorization

In our new paper on accelerating matrix tri-factorization we show how to learn factorized representations that scale well on multi-processor and multi-GPU architectures.

The new approach speeds up computations by more than two orders of magnitude without any loss in accuracy and is especially suitable for large-scale biomedical data analytics.
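For reference, the standard sequential way to fit a nonnegative tri-factorization X ≈ U S V^T is with multiplicative updates that do not increase the squared reconstruction error. The Python sketch below (function names hypothetical, toy matrix made up) is that plain baseline, not the accelerated multi-processor/multi-GPU algorithm from the paper. It shows that each update is a chain of matrix products, which is the kind of work parallel hardware handles well.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(X, U, S, V):
    """Squared Frobenius norm of X - U S V^T."""
    R = matmul(matmul(U, S), transpose(V))
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))


def mult_update(A, num, den, eps=1e-12):
    """Elementwise A * num / den; eps guards against division by zero."""
    return [[a * n / (d + eps) for a, n, d in zip(ar, nr, dr)]
            for ar, nr, dr in zip(A, num, den)]


def nmtf(X, k1, k2, iters=200, seed=0):
    """Nonnegative tri-factorization X (n x m) ~ U (n x k1) S (k1 x k2) V^T (k2 x m)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    U = [[rng.random() for _ in range(k1)] for _ in range(n)]
    S = [[rng.random() for _ in range(k2)] for _ in range(k1)]
    V = [[rng.random() for _ in range(k2)] for _ in range(m)]
    for _ in range(iters):
        # U <- U * (X V S^T) / (U S V^T V S^T)
        SVt = matmul(S, transpose(V))
        U = mult_update(U, matmul(X, transpose(SVt)), matmul(matmul(U, SVt), transpose(SVt)))
        # V <- V * (X^T U S) / (V S^T U^T U S)
        US = matmul(U, S)
        V = mult_update(V, matmul(transpose(X), US), matmul(matmul(V, transpose(US)), US))
        # S <- S * (U^T X V) / (U^T U S V^T V)
        Ut = transpose(U)
        S = mult_update(S, matmul(matmul(Ut, X), V),
                        matmul(matmul(matmul(Ut, U), S), matmul(transpose(V), V)))
    return U, S, V


X = [[5.0, 3.0, 0.0], [4.0, 0.0, 1.0], [1.0, 1.0, 5.0], [0.0, 1.0, 4.0]]
U, S, V = nmtf(X, k1=2, k2=2)
print(round(frob_error(X, U, S, V), 4))
```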


ECML PKDD Proceedings Online

The third volume of the ECML PKDD 2017 proceedings is online, describing state-of-the-art machine learning and data mining systems presented at the conference.

I had a great experience co-chairing the demo track.


Guest Lecture on Biological Network Analysis

I am giving a guest lecture on biological network analysis in the CS224W Network Analysis course at Stanford.

The lecture introduces biological networks and their analysis to computer science and engineering students. It describes statistical enrichment tests and several important prediction problems in biology, such as disease pathway detection and gene function prediction, and explains some of the most successful methods for solving these problems.
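A common statistical enrichment test is the one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test on a 2x2 table): given a list of genes, it asks how surprising it is to see at least k of them carrying some annotation. A minimal Python sketch follows; the function name and the numbers in the usage line are hypothetical.

```python
from math import comb


def enrichment_pvalue(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes carrying the annotation,
    n: genes in the list being tested, k: annotated genes observed in that list."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 20,000 background genes, 200 in a pathway, and a module
# of 100 genes of which 10 fall in the pathway (about 1 expected by chance).
print(enrichment_pvalue(20000, 200, 100, 10))
```

When many annotations are tested at once, the resulting p-values still need a multiple-testing correction.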

Slides and class notes.


Nature Communications: Mapping Biological Functions of NUDIX Enzymes

Our new study published in Nature Communications explores the NUDIX hydrolases in human cells and provides attractive opportunities for expanding the use of this enzyme family as biomarkers and potential novel drug targets. The NUDIX enzymes are involved in several cellular processes, yet their biological role has remained largely unclear.

In a collaborative study with Karolinska Institutet, Helleday Laboratory, Science for Life Laboratory (SciLifeLab), Uppsala University, Stockholm University, and the Human Protein Atlas, we have generated comprehensive data on the individual structural, biochemical, and biological properties of 18 human NUDIX proteins, as well as how they relate to and interact with each other.

I am especially happy to see how my machine learning and computational biology methods can help discover new biology! We used my recent methods for data fusion and gene network inference to generate predictions, which we then validated in the wet lab. Using these novel algorithms, we integrated all data and created a comprehensive NUDIX enzyme profile map. This map reveals novel insights into the substrate selectivity and biological functions of NUDIX hydrolases and provides a platform for expanding the use of NUDIX enzymes as biomarkers and potential novel cancer drug targets.

Karolinska Institutet News, Science for Life Laboratory (SciLifeLab) News, and other outlets wrote about this project.


PSB 2018: Large-Scale Analysis of Disease Pathways in the Human Interactome

Our paper on large-scale analysis of disease pathways in the human interactome will appear at the Pacific Symposium on Biocomputing (PSB).

Discovering disease pathways, which can be defined as sets of proteins associated with a given disease, is an important problem that has the potential to provide clinically actionable insights for disease diagnosis, prognosis, and treatment. Computational methods aid the discovery by relying on protein-protein interaction (PPI) networks. They start with a few known disease-associated proteins and aim to find the rest of the pathway by exploring the PPI network around the known disease proteins.

However, the success of such methods has been limited, and failure cases have not been well understood. In the paper, we study the PPI network structure of disease pathways. We find that pathways do not correspond to single well-connected components in the PPI network. These results counter one of the most frequently used assumptions in network medicine, which posits that disease pathways are likely to correspond to highly interconnected groups of proteins. Instead, we show that proteins associated with a single disease tend to form many separate connected components/regions in the network.
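The kind of connectivity analysis behind this finding can be illustrated by taking the subgraph of the PPI network induced by a disease's proteins and listing its connected components. The Python sketch below is a generic breadth-first search on a made-up toy network, not the paper's analysis pipeline; the fraction of disease proteins in the largest component is one simple way to summarize how fragmented a pathway is.

```python
from collections import deque


def build_graph(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def disease_components(ppi, disease_proteins):
    """Connected components of the subgraph induced by the disease proteins."""
    disease = set(disease_proteins)
    seen, components = set(), []
    for start in sorted(disease):
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in ppi.get(u, ()):
                if v in disease and v not in seen:  # only walk through disease proteins
                    seen.add(v)
                    queue.append(v)
        components.append(sorted(comp))
    return components


ppi = build_graph([("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")])
disease = {"P1", "P2", "P4", "P5"}
comps = disease_components(ppi, disease)
largest = max(len(c) for c in comps)
print(comps, largest / len(disease))
```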

Furthermore, we show that state-of-the-art disease pathway discovery methods perform especially poorly on diseases with disconnected pathways. These results suggest that integration of disconnected regions of disease proteins into a broader disease pathway will be crucial for a holistic understanding of disease mechanisms.

In addition to new insights into the PPI network connectivity of disease proteins, our analysis leads to important implications for future disease protein discovery that can be summarized as:

  • We move away from modeling disease pathways as highly interlinked regions in the PPI network to modeling them as loosely interlinked and multi-regional objects with two or more regions distributed throughout the PPI network.
  • Higher-order connectivity structure provides a promising direction for disease pathway discovery.
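As a hypothetical illustration of what higher-order connectivity can add, the Python sketch below compares a first-order score (how many known disease proteins a candidate touches) with a triangle-based score (how many triangles the candidate closes with two known disease proteins). It is not the measure used in the paper, and the toy network is made up: both candidates touch two disease proteins, but only X sits in a triangle with them.

```python
from itertools import combinations


def build_graph(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def disease_neighbor_count(adj, candidate, disease):
    """First-order score: number of known disease proteins adjacent to the candidate."""
    return sum(1 for v in adj.get(candidate, ()) if v in disease)


def triangle_score(adj, candidate, disease):
    """Higher-order score: triangles (candidate, a, b) with a and b both disease proteins."""
    nbrs = sorted(v for v in adj.get(candidate, ()) if v in disease)
    return sum(1 for a, b in combinations(nbrs, 2) if b in adj.get(a, ()))


graph = build_graph([("X", "D1"), ("X", "D2"), ("D1", "D2"), ("Y", "D3"), ("Y", "D4")])
disease = {"D1", "D2", "D3", "D4"}
for c in ("X", "Y"):
    print(c, disease_neighbor_count(graph, c, disease), triangle_score(graph, c, disease))
```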

Project website:


ISMB/ECCB 2017: Feature Learning in Multi-layer Tissue Networks

I am giving a talk on feature learning in multi-layer tissue networks and tissue-specific protein function prediction at ISMB/ECCB.

Check out the slides, the poster and the recorded talk.


Understanding Protein Functions in Different Biological Contexts

Our paper on predicting multicellular function through multi-layer tissue networks is published in Bioinformatics and is included in the proceedings of ISMB/ECCB 2017, a premier conference in bioinformatics and computational biology.

Understanding functions of proteins in specific human tissues is essential for insights into disease diagnostics and therapeutics, yet surprisingly little is known about protein functions in different biological contexts, and prediction of tissue-specific function remains a critical challenge in biomedicine.

Our approach, OhmNet, is a network-based platform that shifts protein function prediction from flat networks to multiscale models able to predict a range of phenotypes spanning cellular systems.

OhmNet predicts tissue-specific protein functions by representing tissue organization with a rich multiscale tissue hierarchy and by modeling proteins through neural embedding-based representation of a multi-layer network. For the first time, we can systematically pinpoint tissue-specific functions of proteins across more than 100 human tissues. OhmNet accurately predicts protein functions, and also generates actionable hypotheses about protein actions specific to a given biological context.
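To make the multiscale idea concrete: assuming each protein gets one embedding per tissue layer and per internal node of the tissue hierarchy, a regularizer that pulls every object's embedding toward its parent's embedding lets related tissues share information. The Python sketch below shows only such a penalty, λ/2 · Σ_h ||f_h − f_parent(h)||², and one gradient step on it. The per-layer neural embedding objective is omitted, and the tissue names and function names are hypothetical.

```python
def hierarchy_penalty(emb, parent, lam=1.0):
    """lam/2 * sum over objects h with a parent of ||emb[h] - emb[parent[h]]||^2."""
    total = 0.0
    for h, p in parent.items():
        total += sum((a - b) ** 2 for a, b in zip(emb[h], emb[p]))
    return lam / 2 * total


def penalty_step(emb, parent, lam=1.0, lr=0.1):
    """One gradient-descent step on the penalty alone (children and parents both move)."""
    grad = {h: [0.0] * len(v) for h, v in emb.items()}
    for h, p in parent.items():
        for d, (a, b) in enumerate(zip(emb[h], emb[p])):
            grad[h][d] += lam * (a - b)  # derivative with respect to emb[h]
            grad[p][d] -= lam * (a - b)  # derivative with respect to emb[parent]
    return {h: [x - lr * g for x, g in zip(emb[h], grad[h])] for h in emb}


# One protein's embeddings in two hypothetical tissue layers and their parent node.
parent = {"liver": "digestive", "pancreas": "digestive"}
emb = {"liver": [1.0, 0.0], "pancreas": [0.0, 1.0], "digestive": [0.0, 0.0]}
print(hierarchy_penalty(emb, parent))                        # before the step
print(hierarchy_penalty(penalty_step(emb, parent), parent))  # after the step
```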

Project website:


Invited Talk on Boosting Biomedical Discovery Through Network Data Analytics

I'm giving an invited talk on speeding-up scientific discovery in biomedicine through computational network analytics at the International Conference for Big Data and AI in Medicine.


Jozef Stefan Golden Emblem Prize

I am honored to receive the Jozef Stefan Golden Emblem for an outstanding PhD dissertation in the fields of natural sciences, medicine, and biotechnology. The prize is awarded by the Jozef Stefan Institute.

I look forward to making further progress on machine learning, data mining, and statistical methods research to better understand complex biomedical data systems!

For my Slovenian friends, I wrote a short non-technical column for Jozef Stefan Institute News (in Slovene) on the topic of this work.


Submit to AIME 2017 Workshop on Advanced Healthcare Analytics

You are cordially invited to submit a paper to the Workshop on Advanced Predictive Models in Healthcare, which will take place during the AIME 2017 conference. This workshop will focus on topics related to advanced predictive models capable of providing actionable and timely insights into health outcomes.


Submit to ECML PKDD 2017

You are cordially invited to submit a paper to the upcoming 2017 ECML PKDD conference.

ECML PKDD is the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. It is the largest European conference in these areas and grew out of the European Conference on Machine Learning (ECML) and the European Symposium on Principles of Knowledge Discovery and Data Mining (PKDD).

You are especially invited to consider submitting a paper to the ECML PKDD Demo Track, which I am co-chairing this year.


ACM XRDS: The Infinite Mixtures of Food Products

The Fall issue of ACM XRDS is here! In this issue of XRDS, we take a closer look at the marriage of physics and computer science through quantum computing. Quantum computing is a model of computation that breaks with the tradition of the digital computers that surround us. The issue covers recent advances in the field of quantum computing, such as computer simulation, complexity theory, simulated annealing, and machine learning, as well as an in-depth profile of David Deutsch, who pioneered the field of quantum computation.

My department contributed a column on infinite mixture models applied to the problem of clustering food products. Infinite mixture models are useful because they do not impose any a priori bound on the number of clusters in the data. This is in contrast to finite mixture models, which assume a fixed number of clusters that must be specified before the analysis starts. The column describes infinite mixture models through a generative story and then uses Gibbs sampling to cluster the food facts. The number of clusters detected by the model varies as we feed in more food products; as expected, the model discovers more clusters as more products arrive. Additionally, the detected food clusters have distinct nutritional profiles, revealing interesting nutrition patterns.
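The column's own code and food data are not reproduced here. As a minimal sketch under simplifying assumptions (one-dimensional data, Gaussian clusters with a known shared variance, and a Gaussian prior on each cluster mean, whereas real nutrition profiles are multivariate), collapsed Gibbs sampling for an infinite (Dirichlet-process) mixture looks like this in Python. Each point is reassigned either to an existing cluster, weighted by the cluster's size and how well it predicts the point, or to a brand-new cluster, weighted by the concentration parameter α; that second option is what removes the need to fix the number of clusters in advance. Function names, hyperparameters, and data are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def dp_mixture_gibbs(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=100.0, sweeps=50, seed=0):
    """Collapsed Gibbs sampler for a Dirichlet-process mixture of 1-D Gaussians.

    Cluster means have a N(mu0, tau2) prior and are integrated out; every cluster
    shares the known variance sigma2. Returns one cluster label per data point."""
    rng = random.Random(seed)
    z = [0] * len(data)                    # start with one big cluster
    counts, sums = {0: len(data)}, {0: sum(data)}
    next_label = 1
    for _ in range(sweeps):
        for i, x in enumerate(data):
            k = z[i]                       # remove point i from its current cluster
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            labels, weights = [], []
            for c, n_c in counts.items():  # existing cluster: n_c * posterior predictive
                prec = 1.0 / tau2 + n_c / sigma2
                post_mean = (mu0 / tau2 + sums[c] / sigma2) / prec
                labels.append(c)
                weights.append(n_c * normal_pdf(x, post_mean, 1.0 / prec + sigma2))
            labels.append(next_label)      # new cluster: alpha * prior predictive
            weights.append(alpha * normal_pdf(x, mu0, tau2 + sigma2))
            choice = rng.choices(labels, weights=weights)[0]
            if choice == next_label:
                counts[choice], sums[choice] = 0, 0.0
                next_label += 1
            z[i] = choice
            counts[choice] += 1
            sums[choice] += x
    return z


values = [0.1, -0.2, 0.0, 0.3, 10.1, 9.8, 10.2, 9.9]
labels = dp_mixture_gibbs(values)
print(labels, "clusters:", len(set(labels)))
```

With more data, the sampler is free to open more clusters, and α controls how readily it does so.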


ISMB 2016: Connecting Gene-Disease Contexts

We presented our recent approach for disease module detection at ISMB 2016. Slides are available. The method can make inferences over heterogeneous data collections in interesting new ways! One of them, an approach we call jumping across data contexts, connects entities, such as genes and diseases, through semantically distinct chains, which are estimated by a collective latent variable model.
