Marinka Zitnik

Fusing bits and DNA


Jozef Stefan Golden Emblem Prize

Last month, I was honored to receive the Jozef Stefan Golden Emblem for my PhD dissertation in the fields of natural sciences, medicine and biotechnology. The prize is awarded by the Jozef Stefan Institute.

I look forward to making further progress on machine learning, data mining, and statistical methods research to better understand complex biomedical data systems!


Submit to AIME 2017 Workshop on Advanced Healthcare Analytics

You are cordially invited to submit a paper to the Workshop on Advanced Predictive Models in Healthcare, which will take place during the AIME 2017 conference. The workshop will focus on advanced predictive models capable of providing actionable and timely insights into health outcomes.


Submit to ECML PKDD 2017

You are cordially invited to submit a paper to the upcoming 2017 ECML PKDD conference.

ECML PKDD is the European Conference on Machine Learning and Knowledge Discovery. It is the largest European conference in these areas and developed from the European Conference on Machine Learning (ECML) and the European Symposium on Principles of Knowledge Discovery and Data Mining (PKDD).

You are especially invited to consider submitting a paper to the ECML PKDD Demo Track which I am co-chairing this year.


ACM XRDS: The Infinite Mixtures of Food Products

The Fall issue of ACM XRDS is here! In this issue of XRDS, we take a closer look at the marriage of physics and computer science through quantum computing. Quantum computing is a model of computation that breaks with the tradition of the digital computers that surround us. The issue covers recent advances in quantum computing, such as computer simulation, complexity theory, simulated annealing and machine learning, as well as an in-depth profile of David Deutsch, who pioneered the field of quantum computation.

My department contributed a column on infinite mixture models applied to the problem of clustering food products. Infinite mixture models are useful because they do not impose any a priori bound on the number of clusters in the data. This is in contrast with finite mixture models, which assume a fixed number of clusters that has to be specified before the analysis starts. The column describes infinite mixture models through a generative story and then uses Gibbs sampling to cluster the food products. The number of clusters detected by the model changes as we feed in more food products: as expected, the model discovers more clusters as more products arrive. Additionally, the results show that the detected food clusters have distinct nutritional profiles, revealing interesting nutrition patterns.
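The "no a priori bound on the number of clusters" property comes from the prior over partitions. A common choice for infinite mixtures is the Chinese restaurant process (CRP), the partition prior of a Dirichlet process mixture. The sketch below simulates only that prior, assuming a concentration parameter `alpha` (a hypothetical setting); it does not reproduce the column's Gibbs sampler, which additionally conditions cluster assignments on the nutritional data. It shows the number of clusters growing as items arrive, with expected count sum over i of alpha / (alpha + i), roughly alpha * log(n).

```python
import random


def crp_cluster_counts(n_items, alpha, seed=0):
    """Seat n_items items by the Chinese restaurant process and return
    the number of occupied clusters after each arrival."""
    rng = random.Random(seed)
    cluster_sizes = []  # cluster_sizes[k] = number of items in cluster k
    counts = []
    for i in range(n_items):
        # Item i joins existing cluster k with probability size_k / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)  # new cluster
        else:
            cluster_sizes[k] += 1
        counts.append(len(cluster_sizes))
    return counts


counts = crp_cluster_counts(1000, alpha=1.0)
print(counts[9], counts[99], counts[999])  # cluster counts after 10, 100, 1000 items
```

Nothing in the prior caps the count; new clusters simply become rarer as existing clusters grow, which is why a finite-mixture K never has to be chosen up front.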


ISMB 2016: Connecting Gene-Disease Contexts

We presented our recent approach for disease module detection at ISMB 2016. Slides are available. The method can make inferences over heterogeneous data collections in new and interesting ways! One of them, an approach we call jumping across data contexts, connects entities, such as genes and diseases, through semantically distinct chains, which are estimated by a collective latent variable model.


Bioinformatics: Jumping Across Contexts Using Compressive Fusion

Our paper "Jumping across biomedical contexts using compressive data fusion" has just appeared in Bioinformatics. We will present the paper at ISMB 2016 in July 2016.

The rapid growth of diverse biological data allows us to consider interactions between a variety of objects, such as genes, chemicals, molecular signatures, diseases, pathways and environmental exposures. Often, any pair of objects—such as a gene and a disease—can be related in different ways, for example, directly via gene–disease associations or indirectly via functional annotations, chemicals and pathways. In this paper, we show that different ways of relating these objects carry different semantic meanings that are largely ignored by established computational methods.
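To make "different ways of relating" two object types concrete, here is a toy sketch. It assumes the tri-factorized form used in collective matrix factorization, where a relation between types i and j is approximated as G_i S_ij G_j^T, and scores a chain of relations by multiplying the intermediate S matrices. All matrices and their values are hypothetical illustrations, not the paper's data or its exact chaining rule.

```python
from functools import reduce


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def chain_scores(G_src, S_chain, G_dst):
    """Score src-dst pairs along a chain as G_src S_1 ... S_m G_dst^T."""
    core = reduce(matmul, S_chain)
    return matmul(matmul(G_src, core), transpose(G_dst))


# Hypothetical latent factors: 3 genes and 2 diseases in a 2-dimensional space.
G_gene = [[1, 0], [0, 1], [1, 1]]
G_dis = [[1, 0], [0, 1]]
# Hypothetical backbone matrices for three relation types.
S_gene_dis = [[0.9, 0.0], [0.0, 0.1]]
S_gene_path = [[0.0, 1.0], [1.0, 0.0]]
S_path_dis = [[0.2, 0.0], [0.0, 0.8]]

direct = chain_scores(G_gene, [S_gene_dis], G_dis)
via_pathways = chain_scores(G_gene, [S_gene_path, S_path_dis], G_dis)
print(direct)        # gene 0 scores highest with disease 0
print(via_pathways)  # through pathways, gene 0 scores highest with disease 1
```

In this toy setting the same gene is tied to different diseases depending on which chain is followed, which is the kind of semantic difference the paragraph says established methods tend to collapse.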

We present an approach that operates on large-scale heterogeneous data collections and explicitly distinguishes between diverse data semantics. The approach detects size-k modules of objects that, taken together, appear most significant to another set of objects. The method builds on collective matrix factorization to derive different semantics, and it formulates the growing of the modules as a submodular optimization program.
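Growing a size-k module as a submodular program is typically solved greedily: add the object with the largest marginal gain until k objects are chosen, which for monotone submodular objectives guarantees at least a (1 - 1/e) fraction of the optimum. The sketch below uses a facility-location objective as a stand-in (an assumption; the paper defines its own objective over the factorized data), and the relevance scores are hypothetical.

```python
def facility_location(selected, scores):
    """f(S) = sum over targets t of max_{s in S} scores[s][t]; f(empty) = 0.
    Monotone and submodular when all scores are nonnegative."""
    if not selected:
        return 0.0
    n_targets = len(next(iter(scores.values())))
    return sum(max(scores[s][t] for s in selected) for t in range(n_targets))


def greedy_module(scores, k):
    """Greedily grow a module of up to k candidates by marginal gain."""
    module = []
    for _ in range(k):
        current = facility_location(module, scores)
        best, best_gain = None, float("-inf")
        for cand in scores:
            if cand in module:
                continue
            gain = facility_location(module + [cand], scores) - current
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:  # fewer than k candidates available
            break
        module.append(best)
    return module


# Hypothetical relevance of four genes to three target diseases.
scores = {
    "g1": [0.9, 0.1, 0.0],
    "g2": [0.8, 0.1, 0.0],  # nearly redundant with g1
    "g3": [0.0, 0.1, 0.7],
    "g4": [0.1, 0.6, 0.1],
}
print(greedy_module(scores, 2))  # ['g1', 'g3']
```

Because marginal gains shrink as the module covers more targets, the redundant g2 is skipped in favor of g3, which contributes relevance that g1 does not already provide.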

In a systematic study on more than three hundred complex diseases, we show the effectiveness of the approach in associating genes with diseases and detecting disease modules.


ACM XRDS: Cultures of Computing

The Summer issue of ACM XRDS is here! The issue is centered around computing, culture, postcoloniality and questions of power. In it, many fascinating authors ask whether an Anglo-European culture of computing could be made more aware of its politics and what alternative cultures of computing could be realized. Our amazing issue editors, Ahmed Ansari (CMU) and Raghavendra Kandala (CMU), have tried to give the readers a slice of the incredible heterogeneity and plurality of critical scholarship and practice around the world.

The issue provides a brief introduction to decolonial computing and raises various issues around design and innovation in China, the participation of Africans in the global HCI community, life at the forefront of Indonesia's tech emancipation, and plans to develop hundreds of smart cities in India, revealing the complex politics of technological development and class.

Jennifer Jacobs (MIT) and I served as co-editors for the issue.


ACM XRDS: The Brownian Wanderlust of Things

The Spring issue of ACM XRDS is here! This issue is centered around digital fabrication, which in many ways highlights the expanded role of computers in today's society. Digital fabrication is not merely about 3D printing knickknacks; rather, it enables individuals to create their own systems and devices using new technologies.

My department contributed a column on the Brownian wanderlust of things. Consider a gambler who starts with an initial fortune and plays the following simple coin-tossing game. At each turn, the dealer tosses an unbiased coin. If the coin comes up heads, the gambler wins a unit; if it comes up tails, the gambler loses a unit. The gambler continues to play until he is either bankrupt or his holdings reach some fixed target amount.
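This game is the classic gambler's ruin. For a fair coin, a gambler starting with i units and stopping at 0 or N reaches N with probability i/N, and the expected number of tosses is i(N - i). The Monte Carlo sketch below checks both facts; the starting fortune, target, number of games and seed are arbitrary illustrative choices.

```python
import random


def play(start, target, rng):
    """Play one fair game from `start` (0 < start < target).
    Returns (reached_target, number_of_tosses)."""
    fortune, tosses = start, 0
    while 0 < fortune < target:
        fortune += 1 if rng.random() < 0.5 else -1  # heads: +1, tails: -1
        tosses += 1
    return fortune == target, tosses


def estimate(start, target, n_games=20000, seed=1):
    """Estimate P(reach target) and the expected game length."""
    rng = random.Random(seed)
    wins = total_tosses = 0
    for _ in range(n_games):
        won, tosses = play(start, target, rng)
        wins += won
        total_tosses += tosses
    return wins / n_games, total_tosses / n_games


p_win, mean_len = estimate(start=3, target=10)
print(p_win, 3 / 10)      # simulated vs exact win probability
print(mean_len, 3 * 7)    # simulated vs exact expected duration
```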

Stochastic models of this kind can have much wider implications than just estimating the fortune of a gambler flipping a coin. For example, the way in which information flows within social media outlets can affect mobilization and strategic interactions between participants of mass social movements, such as protests. While traditionally social movements have spread through on-the-ground unions, the use of communication platforms—such as Twitter and Facebook—has offered alternative ways for organizing such events. As we see in the column, to truly capture propagation in such environments, we need to take into consideration the stochastic nature of information propagation.


Bioinformatics: Orthogonal Factorization of RNA-Binding Proteins

Our paper on integrative analysis of multiple RNA-binding proteins has just appeared in Bioinformatics. RNA-binding proteins (RBPs) are important for many cellular processes, including post-transcriptional control of gene expression, splicing, transport, polyadenylation and RNA stability. To better understand RBP mechanisms, we aimed to integrate the rapidly growing body of RBP experimental data with the latest genome annotation, gene function, RNA sequence and RNA structure data.

We have developed an integrative orthogonality-regularized nonnegative matrix factorization that can integrate multiple data sets and discover non-overlapping and class-specific RNA binding patterns of varying strengths. The orthogonality constraint is important here because it enables us to substantially reduce the effective size of inferred factor models.
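To show what an orthogonality penalty does inside nonnegative matrix factorization, here is a minimal single-matrix sketch using multiplicative updates. It is not the paper's integrative multi-dataset model: the penalty form (alpha/2 times the sum of inner products between distinct rows of H, which pushes each column of X toward loading on one factor) and all parameters are assumptions for illustration.

```python
import random


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def objective(X, W, H, alpha):
    """0.5*||X - WH||^2 + 0.5*alpha * sum_{a != b} <h_a, h_b> (rows of H)."""
    WH = matmul(W, H)
    fit = 0.5 * sum((x - y) ** 2 for xr, yr in zip(X, WH) for x, y in zip(xr, yr))
    G = matmul(H, transpose(H))  # Gram matrix of the rows of H
    k = len(H)
    overlap = sum(G[a][b] for a in range(k) for b in range(k) if a != b)
    return fit + 0.5 * alpha * overlap


def ortho_nmf(X, k, alpha=1.0, n_iter=200, seed=0, eps=1e-12):
    """Factor nonnegative X (n x m) as W (n x k) times H (k x m)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H update: the penalty gradient alpha*(E - I)H is nonnegative,
        # so it goes into the denominator of the multiplicative rule.
        Wt = transpose(W)
        num = matmul(Wt, X)
        WtWH = matmul(matmul(Wt, W), H)
        colsum = [sum(H[a][j] for a in range(k)) for j in range(m)]
        for a in range(k):
            for j in range(m):
                den = WtWH[a][j] + alpha * (colsum[j] - H[a][j]) + eps
                H[a][j] *= num[a][j] / den
        # W update: standard Lee-Seung rule for the squared loss.
        Ht = transpose(H)
        num = matmul(X, Ht)
        WHHt = matmul(W, matmul(H, Ht))
        for i in range(n):
            for a in range(k):
                W[i][a] *= num[i][a] / (WHHt[i][a] + eps)
    return W, H


X = [[5, 5, 0], [4, 5, 0], [0, 0, 3], [0, 1, 4]]
W, H = ortho_nmf(X, k=2, alpha=1.0, n_iter=100)
print(objective(X, W, H, alpha=1.0))
```

Larger values of `alpha` trade reconstruction accuracy for factors with less overlap, which is what keeps the inferred patterns non-overlapping and the effective model small.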

The new models have proved powerful in predicting RBP interaction sites on RNA. We also showed that joint analysis of multiple data sets can boost retrieval accuracy of RNA binding sites, which we studied using the largest RBP data compendium to date.


ACM XRDS: Activities of Daily Living in the Era of Internet of Things

The Winter issue of ACM XRDS is here! This issue discusses the Internet of Things (IoT), a collection of emerging technologies that promise to seamlessly expand our sensing capabilities across the globe, with imagination as our only limit. In the issue you can read about the prospects for IoT as seen by leaders in the field, the challenges of building network awareness, and trends in IoT platforms. You will also find columns about ontology-supported stream reasoning for querying flying robots, the importance of encrypted data in IoT systems, and ways of managing droughts using tech.

My department contributed a column on predicting activities of daily living (ADL) from sensor activation profiles. Together with Lara Zupan we used an open-source data mining tool for visual programming called Orange to analyze patterns of how humans interact with household devices. These interaction patterns provided powerful clues that helped us recognize various activities that take place in home environments.


PSB 2016: Collective Pairwise Classification for Multi-Way Analysis

Our paper on Collective Pairwise Classification for Multi-Way Analysis has been published in the Proceedings of the 21st Pacific Symposium on Biocomputing. We will present the work at the PSB conference in January 2016.

In the paper, we develop a collective pairwise classification approach for multi-way data analysis. The approach leverages the strengths of latent factor models for analyzing large heterogeneous relational data sets and provides probabilistic estimates of relationships by optimizing a pairwise ranking loss. Although this objective corresponds to maximizing the non-differentiable area under the receiver operating characteristic curve (AUC), we were able to design a learning algorithm that scales well to large multi-relational data.
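The link between a pairwise ranking loss and AUC can be seen directly: AUC is the fraction of (positive, negative) pairs ranked correctly, a step function of the scores, and a smooth surrogate such as the pairwise logistic loss replaces that step with a differentiable penalty. The sketch below shows both; it is a generic illustration (similar in spirit to Bayesian personalized ranking), not the paper's learning algorithm, and the scores are hypothetical.

```python
import math


def auc(pos_scores, neg_scores):
    """Fraction of positive-negative pairs ordered correctly; ties count 1/2."""
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            correct += 1.0 if p > n else 0.5 if p == n else 0.0
    return correct / (len(pos_scores) * len(neg_scores))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_logistic_loss(pos_scores, neg_scores):
    """Mean of log(1 + exp(-(s_pos - s_neg))) over all pairs: a smooth,
    differentiable surrogate for 1 - AUC."""
    total = sum(softplus(-(p - n)) for p in pos_scores for n in neg_scores)
    return total / (len(pos_scores) * len(neg_scores))


pos, neg = [0.9, 0.8], [0.1, 0.85]
print(auc(pos, neg))                     # 0.75: one of four pairs is misordered
print(pairwise_logistic_loss(pos, neg))  # shrinks as positives outscore negatives
```

In practice, scalable learners typically sample pairs and take stochastic gradient steps on the surrogate instead of enumerating all pairs as this sketch does.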

We used the method to infer relationships from multiplex drug data and to predict connections between clinical manifestations of diseases and their underlying molecular signatures. An appealing property of the method is its ability to make category-jumping inferences, such as predictions about diseases based solely on genomic and clinical data generated far outside the molecular context.
