Marinka Zitnik

Fusing bits and DNA


ACM XRDS: Activities of Daily Living in the Era of Internet of Things

The Winter issue of ACM XRDS is here! This issue discusses the Internet of Things (IoT), a collection of emerging technologies that promise to seamlessly expand our sensing capabilities across the globe, with imagination as our only limit. In the issue, you can read about the prospects for the IoT as seen by leaders in the field, the challenges of building network awareness, and the trends in IoT platforms. You will also find columns about ontology-supported stream reasoning for querying flying robots, the importance of encrypted data in IoT systems, and ways of managing droughts using technology.

My department contributed a column on predicting activities of daily living (ADL) from sensor activation profiles. Together with Lara Zupan, we used Orange, an open-source data mining tool with visual programming, to analyze patterns of how humans interact with household devices. These interaction patterns provided powerful clues that helped us recognize various activities that take place in home environments.

 

PSB 2016: Collective Pairwise Classification for Multi-Way Analysis

Our paper on Collective Pairwise Classification for Multi-Way Analysis has been published in the Proceedings of the 21st Pacific Symposium on Biocomputing. We will present the work at the PSB conference in January 2016.

In the paper, we develop a collective pairwise classification approach for multi-way data analysis. The approach leverages the superiority of latent factor models for analyzing large heterogeneous relational data sets and provides probabilistic estimates of relationships by optimizing a pairwise ranking loss. Although the method bears correspondence with the maximization of a non-differentiable area under the receiver operating characteristic curve, we were able to design a learning algorithm that scales well on large multi-relational data.

We used the method to infer relationships from multiplex drug data and to predict connections between clinical manifestations of diseases and their underlying molecular signatures. An appealing property of the method is its ability to make category-jumping inferences, such as predictions about diseases based solely on genomic and clinical data generated far outside the molecular context.
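The link between a pairwise ranking loss and the area under the ROC curve can be illustrated with a small sketch. The AUC counts the fraction of (positive, negative) pairs that are ranked correctly, which is a step function of the scores and therefore non-differentiable; a smooth surrogate replaces each step with a differentiable penalty. The logistic surrogate and the function names below are assumptions chosen for illustration and are not necessarily the loss used in the paper.

```python
import math

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def pairwise_logistic_loss(pos_scores, neg_scores):
    """Smooth surrogate (an illustrative choice): mean of log(1 + exp(-(p - n)))."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += math.log1p(math.exp(-(p - n)))
    return total / (len(pos_scores) * len(neg_scores))

# Toy scores (hypothetical): one positive is ranked below one negative.
positives = [0.9, 0.8]
negatives = [0.1, 0.85]
print(auc(positives, negatives))
print(pairwise_logistic_loss(positives, negatives))
```

Lowering the surrogate pushes positive scores above negative ones, which tends to raise the AUC while keeping the objective amenable to gradient-based learning.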

 

ACM XRDS: The Marvel Comic Book Universe

The Fall issue of ACM XRDS is here! In this issue we write about virtual reality. Among other things, you can read about the virtual reality revolution and ways to bring virtual reality home. The issue also discusses how to use your own muscles to achieve a realistic physical experience, how to manage cybersickness in virtual reality, and how to avoid danger with mine disaster simulations.

My department contributed a column on mining the Marvel comic book universe. Together with Lara Zupan we scraped Wikipedia to obtain information on the Marvel comics characters and then analyzed the structure of the Marvel multiverse network, where two characters were considered linked if they shared a skill set. Here, the analysis of complex networks allowed us to better understand how properties of fictitious networks emerge from non-trivial interactions between characters.
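The network construction described above can be sketched in a few lines. This is a minimal illustration, assuming that "sharing a skill set" means having at least one skill in common; the character labels and skills are hypothetical placeholders, not data from the column.

```python
from itertools import combinations

def build_network(skills):
    """Link two characters whenever their skill sets overlap (assumed criterion)."""
    edges = set()
    for a, b in combinations(sorted(skills), 2):
        if skills[a] & skills[b]:
            edges.add((a, b))
    return edges

def degrees(edges):
    """Number of links per character, a basic network statistic."""
    counts = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts

# Hypothetical toy data: placeholder characters and skills.
skills = {
    "character_a": {"flight", "strength"},
    "character_b": {"strength", "healing"},
    "character_c": {"telepathy"},
}
edges = build_network(skills)
print(edges)
print(degrees(edges))
```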

 

BMC Bioinformatics: Extracting Gene Regulatory Networks from Text

Our paper on Sieve-based relation extraction of gene regulatory networks from biological literature has been published in BMC Bioinformatics.

In the paper, we describe a network extraction algorithm, which is an improvement on our winning submission to BioNLP 2013. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. To enable extraction of distant relations, we transform the data into skip-mention sequences. We then infer multiple models, each of which is able to extract a particular relationship type (e.g., inhibition, activation, binding). Further analysis following the challenge showed that all relation extraction sieves contribute to the predictive performance of the proposed approach. It also showed that features constructed from mention words and their prefixes and suffixes contribute the most to extraction accuracy, and that transforming the data into skip-mention sequences is appropriate for detecting relations between distant mentions.
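The idea behind skip-mention sequences can be sketched as follows. A linear-chain model only relates neighboring items, so to expose relations between distant mentions, the mention sequence is split into subsequences that skip a fixed number of mentions. This is a minimal sketch under our reading of the idea; the function name and the toy mentions are hypothetical, and the exact construction in the paper may differ.

```python
def skip_mention_sequences(mentions, k):
    """Split a mention sequence into k + 1 subsequences, each taking every
    (k + 1)-th mention, so that mentions k positions apart become neighbors."""
    step = k + 1
    sequences = [mentions[offset::step] for offset in range(step)]
    # Drop empty subsequences, which occur when the input is shorter than k + 1.
    return [seq for seq in sequences if seq]

# Hypothetical mentions in document order.
mentions = ["m1", "m2", "m3", "m4", "m5"]
print(skip_mention_sequences(mentions, 0))  # the original sequence
print(skip_mention_sequences(mentions, 1))  # every second mention
```

With `k = 1`, the mentions `m1` and `m3` become adjacent, so a linear-chain model can score a relation between them directly.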

 

PLoS CompBio: Gene Prioritization by Compressive Data Fusion

Our paper on Gene prioritization by compressive data fusion and chaining has been published in PLoS Computational Biology.

In the paper, we present Collage, a new data fusion approach to gene prioritization. Together with collaborators from Baylor College of Medicine, we tested Collage by prioritizing bacterial response genes in Dictyostelium as a novel model system for prokaryote-eukaryote interactions.

We started from four bacterial response genes and 14 different data sets ranging from gene expression to pathway and literature information. Collage proposed eight candidate genes that were tested in the wet laboratory. Mutations in all eight candidates reduced the ability of the amoebae to grow on Gram-negative bacteria. Furthermore, five out of the eight candidate genes were required for growth on Gram-negative bacteria but had no discernible effect on growth on Gram-positive bacteria. This is a remarkably accurate result since only about a hundred of the 12,000 Dictyostelium genes are estimated to be responsible for bacterial response.

 

Data Fusion Tutorial at the IEEE Engineering in Medicine and Biology

Together with Blaz Zupan, we are organizing a tutorial on data fusion at the International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).

In the tutorial, we will explore latent factor models, a popular class of approaches that have in recent years seen many successful applications in integrative data analysis. We will describe the intuition behind matrix factorization and explain why latent factor models are suitable when collectively analyzing many heterogeneous data sets. To practice data fusion, we will construct visual data fusion workflows using Orange and its Data Fusion add-on.
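The intuition behind matrix factorization can be shown on a single small matrix: we look for two thin factor matrices whose product approximates the data. The sketch below uses plain stochastic gradient descent on squared error; the function names, hyperparameters, and toy matrix are assumptions for illustration. It does not use Orange or its Data Fusion add-on, and it factorizes one matrix only, whereas data fusion factorizes many matrices jointly.

```python
import random

def factorize(R, rank=2, steps=2000, lr=0.01, seed=0):
    """Approximate R (list of rows) by U * V^T using stochastic gradient descent."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    U = [[rng.random() for _ in range(rank)] for _ in range(n)]
    V = [[rng.random() for _ in range(rank)] for _ in range(m)]
    for _ in range(steps):
        for i in range(n):
            for j in range(m):
                pred = sum(U[i][k] * V[j][k] for k in range(rank))
                err = R[i][j] - pred
                for k in range(rank):
                    u, v = U[i][k], V[j][k]
                    U[i][k] += lr * err * v  # gradient step on the row factor
                    V[j][k] += lr * err * u  # gradient step on the column factor
    return U, V

def squared_error(R, U, V):
    """Sum of squared differences between R and its reconstruction U * V^T."""
    total = 0.0
    for i, row in enumerate(R):
        for j, value in enumerate(row):
            pred = sum(u * v for u, v in zip(U[i], V[j]))
            total += (value - pred) ** 2
    return total

# Hypothetical toy matrix with two distinct row patterns.
R = [[5.0, 4.0, 1.0],
     [4.0, 5.0, 1.0],
     [1.0, 1.0, 5.0],
     [1.0, 2.0, 4.0]]
U, V = factorize(R)
print(squared_error(R, U, V))
```

Each row of `U` and `V` is a latent profile; sharing such profiles across several matrices is what lets latent factor models combine heterogeneous data sets.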

This tutorial would not be possible without the great support of the Bioinformatics Laboratory at the University of Ljubljana.

 

ACM XRDS: Understanding Cancer with Matrix Factorization

The Summer issue of ACM XRDS is here! In this issue we write about computational biology. Our features and interviews present different perspectives about some of the most recent advances of computational biology. You can read about personalized medicine and the use of genetic data to improve drug treatment, pharmacogenetics, machine learning techniques for mapping genetic differences to phenotypes in large-scale genome-wide association studies, computational approaches towards prediction of patient outcomes based on electronic health records, statistical techniques for drug discovery, etc. This issue also includes discussions on cutting-edge techniques, such as the analysis of single cell measurement data.

My department contributed a column on mining cancer data with matrix factorization, an established class of algorithms that has proved useful in many bioinformatics studies. The diversity and abundance of data provided by cancer projects like the International Cancer Genome Consortium challenge computer scientists of all kinds to develop innovative software, hardware, and analytic solutions for data analysis. We expect that with computationally and statistically stronger approaches, such as factorization models, we will one day be able to reveal biological features that drive cancer development, define cancer types relevant for prognosis, and, ultimately, enable the development of new cancer therapies.

 

ISMB 2015: Gene Network Inference via Data Fusion

Our paper at ISMB 2015 addresses the challenging task of inferring gene networks from potentially many data sets. Importantly, these data sets might be nonidentically distributed and can follow any combination of exponential family distributions. To tackle this challenge, we develop an efficient Markov network model that achieves fusion by reusing latent model parameters.

Empirical studies on cancer genome data sets show an advantage of joint inference over separate network inference and the merits of incorporating information about the underlying data distribution into inference.

The slides of the talk are available.

 

ISMB 2015: Integrate Everything but the Kitchen Sink

Our poster at ISMB 2015 is concerned with data set selection and sensitivity estimation in collective factor models.

Molecular biology data is rich in volume as well as heterogeneity. We can view individual data sets as relations between objects of different types; for example, function annotations describe relationships between genes and functions. We represent a large data compendium with a multiscale and multiplex relation graph. Recently, latent factor models were developed to fuse such representations and collectively infer accurate prediction models (Zitnik & Zupan, IEEE TPAMI 2015). Here, we are interested in how changes in one relation (data set) affect the latent model of another relation in the context of a given collective latent factor model. For example, in a user-movie recommendation system, how would a change of casting affect a user's movie preferences? In bioinformatics, how would a change in gene expression data influence the prediction of gene-disease associations?

We address this challenge by developing Forensic, an approach that estimates the dependence between any two relations within a single run of the inference algorithm. Forensic derives from the theory of Fréchet derivatives and matrix conditioning and can be used with any collective matrix factorization.

See our poster for more details.

 

Compressive Data Fusion and Persistent Homology


My talk at the Summer School on Computational Topology in Ljubljana, Slovenia was about coupling compressive data fusion methods with algebraic topology, in particular persistent homology. There, I discussed how the latent data space obtained by fusion of heterogeneous biological data sets can be explored with topological methods.

In a case study from molecular biology, which included nearly two dozen data sets, we studied the persistence (lifetime) of various topological features, such as connected components, loops, voids, and tunnels. We showed that significant topological features, i.e., features with a long lifetime, also carry biologically relevant information. For example, gene modules with significant topology were enriched for cellular functions and biological processes, and, similarly, persistent drug modules captured the structural similarity between drugs.
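The notion of a feature's lifetime can be illustrated for the simplest case, connected components. As a distance threshold grows, nearby points merge into one component; the threshold at which a component merges into another is its death, and since every component is born at threshold zero, the death value is also its lifetime. The sketch below handles only points on a line and only connected components (not loops or voids); the function name and data are hypothetical, and it is not the pipeline used in the talk.

```python
from itertools import combinations

def component_lifetimes(points):
    """Thresholds at which connected components merge, in increasing order.
    One component never dies and is therefore not listed."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, processed from the smallest to the largest.
    edges = sorted(
        (abs(points[i] - points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one of them dies
            deaths.append(dist)
    return deaths

# Hypothetical toy data: two nearby points and one distant point.
print(component_lifetimes([0.0, 1.0, 5.0]))
```

A long lifetime, such as the component that survives until the distant point is reached, marks a cluster that is well separated from the rest of the data.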

The slides of the talk are available.

Last Updated on Thursday, 18 February 2016 07:47
 

Invited Talk on Learning Latent Factor Models by Data Fusion


Our invited talk at the Workshop on Matrix Computations for Biomedical Informatics, held at the 15th Conference on Artificial Intelligence in Medicine (AIME) in Pavia, Italy, discussed the use of collective latent factor models for various predictive modeling tasks in biomedicine, such as gene prioritization, gene function prediction, network inference, and discovery of disease-disease associations.

In the talk given together with Blaz Zupan, we highlighted our recent developments of data fusion approaches via latent factor models.

The slides of the talk are available at Prezi.

Last Updated on Friday, 21 August 2015 16:05
 

