Marinka Zitnik

Fusing bits and DNA

CAMDA 2014: Survival Regression by Data Fusion

At CAMDA 2014, I presented an extension of our recent matrix factorization-based data fusion approach that couples data fusion with survival regression. CAMDA 2014 ran as a satellite meeting of ISMB 2014 in Boston, MA, USA. Our presentation received the CAMDA best presentation award.

Any knowledge discovery could in principle benefit from the fusion of directly or even indirectly related data sources. In this work, we explore whether a recently proposed simultaneous matrix factorization data fusion approach can be adapted for survival regression. We propose a new method that jointly infers latent factors by data fusion and estimates the regression coefficients of a survival model. We applied the method to the CAMDA 2014 large-scale Cancer Genomes Challenge and modeled survival time as a function of gene, protein and miRNA expression data, and data on methylated and mutated regions. We find that both the joint inference of factors and regression coefficients on one side and the data fusion procedure on the other are crucial for performance. Our approach is substantially more accurate than the baseline, Aalen's additive model. Latent factors inferred by our approach could be mined further; we found that the most informative factors are related to known cancer processes.

Last Updated on Thursday, 09 July 2015 15:08

Gene network inference by probabilistic scoring of relationships from a factorized model of interactions

Bioinformatics just published a special issue devoted to ISMB 2014 proceedings papers that will be presented next month at the world's premier conference on computational biology -- ISMB 2014 in Boston, MA, USA.

Our paper, Gene network inference by probabilistic scoring of relationships from a factorized model of interactions, which you will find in this issue of Bioinformatics, describes Red, a conceptually new probabilistic approach to gene network inference from quantitative interaction data. Red is founded on epistasis analysis, an essential tool of classical genetics for inferring the order of function of genes in a common pathway. Typically, epistasis analysis considers single and double mutant phenotypes and, for a pair of genes, observes whether a change in the first gene masks the effects of the mutation in the second gene. Despite the recent emergence of biotechnology techniques that can provide gene interaction data on a large, possibly genomic scale, very few methods are available for quantitative epistasis analysis and epistasis-based network reconstruction.
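The classical masking test described above can be sketched in a few lines of Python. This is only a toy illustration and not the Red algorithm: the function name, the single numeric phenotype per mutant and the tolerance `tol` are all assumptions made for the example.

```python
def classify_epistasis(pheno_a, pheno_b, pheno_ab, tol=0.1):
    """Compare a double mutant phenotype with the two single mutant phenotypes.

    All phenotypes are plain numbers (hypothetical measurements); `tol` is an
    assumed tolerance for calling two phenotypes "the same".
    """
    like_a = abs(pheno_ab - pheno_a) <= tol
    like_b = abs(pheno_ab - pheno_b) <= tol
    if like_a and like_b:
        # Both single mutants look alike, so masking cannot be told apart.
        return "indistinguishable"
    if like_a:
        # The double mutant looks like mutant a: a masks the effect of b.
        return "a masks b"
    if like_b:
        return "b masks a"
    return "no masking"


print(classify_epistasis(0.2, 0.8, 0.25))
print(classify_epistasis(0.2, 0.8, 0.5))
```

A probabilistic method would replace the hard threshold with a score for each possible relationship, but the comparison of single and double mutant phenotypes stays the same.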

The key features of Red are the joint treatment of the mutant phenotype data with a factorized model and the probabilistic scoring of pairwise gene relationships that are inferred from the latent gene representation. The resulting gene network is assembled from the scored pairwise relationships. In an experimental study, we show that the proposed approach can accurately reconstruct several known pathways and that it surpasses the accuracy of current approaches.

Last Updated on Wednesday, 13 August 2014 05:21

ACM XRDS: Exploring Data with Topological Tools

The Summer issue of ACM XRDS is here! This issue focuses on diversity in computer science. You will find columns about how to make tech more inclusive, women in computing, self-teaching and how hip-hop lyrics can be used in combination with artificial intelligence to engage more students in computer science. Also, you should not miss the Features section! There, you will learn, among other things, about a research project in Germany that integrates gender and diversity in STEM fields, and read about how neuroscience has revealed that we sometimes judge others by their gender or ethnicity without even realizing it. What can be done to address these issues? Check out ACM XRDS's advice.

For the computationally inspired among you, I have contributed a column that describes one of many possible uses of computational topology for exploratory data analysis. Tools from topology increasingly serve to inspire the development of novel computational methods for data analysis. With these methods we can study qualitative geometric information of the data to understand how they are organized on a large scale and focus on intrinsic shape properties rather than on characteristics that depend on a particular choice of a coordinate system. The column applies a topological tool called Mapper to extract and visualize simple descriptions of data sets.
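To give a flavor of how Mapper works, here is a minimal sketch in Python. It is not the code from the column: the function names, the single-linkage clustering with a fixed distance threshold `eps`, and the toy circle data are assumptions chosen to keep the example short. The steps are the usual ones: apply a real-valued filter to the data, cover its range with overlapping intervals, cluster the points that fall into each interval, and connect clusters that share points.

```python
import math
from itertools import combinations


def cluster(points, idx, eps):
    """Single-linkage clustering: connected components of the graph
    that links points at distance eps or less."""
    remaining = set(idx)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining
                    if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper(points, filt, n_intervals, overlap, eps):
    """Return (nodes, edges) of a Mapper graph for a real-valued filter."""
    values = [filt(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Widen each interval so that neighboring intervals overlap.
        a = lo + (k - overlap) * length
        b = lo + (k + 1 + overlap) * length
        idx = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(cluster(points, idx, eps))
    # Connect two clusters whenever they share at least one data point.
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


# 16 points sampled evenly from the unit circle, filtered by x-coordinate.
circle = [(math.cos(2 * math.pi * k / 16), math.sin(2 * math.pi * k / 16))
          for k in range(16)]
nodes, edges = mapper(circle, filt=lambda p: p[0],
                      n_intervals=3, overlap=0.25, eps=0.5)
print(len(nodes), len(edges))
```

On this toy input the graph has four nodes joined in a single loop, which is the kind of coordinate-free summary of shape that Mapper is meant to provide. The result depends on the choice of filter, cover and clustering, so these parameters need tuning on real data.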

Last Updated on Friday, 21 August 2015 15:01

Young Researcher in the Heidelberg Laureate Forum 2014

I have been selected to participate as a young researcher in the Heidelberg Laureate Forum 2014 (HLF). The Forum will take place in September and will bring together winners of the Abel Prize and Fields Medal (mathematics) as well as the Turing Award and Nevanlinna Prize (computer science) with young researchers from around the world selected by an international committee of experts, primarily from the award-granting organizations. I was fortunate to be given the opportunity to be one of 200 young researchers (there are 100 spaces for each of the two disciplines, mathematics and computer science) who will be part of this Forum.

The HLF is an event inspired by the Lindau Nobel Laureate Meetings, which provide a forum where people dedicated to science, both role models and young researchers in physics, chemistry and life sciences, can interact. This event spawned the idea to create something similar for the scientific disciplines of mathematics and computer science. The list of participating Laureates is impressive and includes, among others, Manuel Blum, Stephen Cook, Antony Hoare, John Hopcroft, Leslie Lamport, John Torrence Tate and Wendelin Werner. I am looking forward to meeting these distinguished experts from both disciplines and learning many new things.

Last Updated on Friday, 21 August 2015 16:06

ACM XRDS: Efficient Sensor Placement for Environmental Monitoring

The Spring 2014 issue of XRDS: Crossroads, the ACM magazine for students, is about cyber-physical systems.

My XRDS department contributed a column on efficient sensor placement for environmental monitoring. The column is about the important problem of observation selection, which has received considerable research attention in recent years. Consider, for example, air quality monitoring in a large research lab, the monitoring of algae biomass in a lake or the placement of a network of sensors in a water distribution system for early detection of contaminants. In all these settings we have to decide where to place the sensors in order to effectively collect information about the environment. Since acquiring observations is typically expensive and we have a limited budget, we want to select a small number of the most informative locations for monitoring. Thus, we usually trade off the informativeness of sensor measurements against the cost of data acquisition. The column gives an example of a large sensor deployment in a research lab and applies tools of submodular optimization to tackle the task effectively, with theoretical performance guarantees of near-optimal observation selection.
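The greedy strategy behind such guarantees is easy to sketch. The Python example below is an illustration only, not the code from the column: it uses set coverage as a simple stand-in for the informativeness of a placement, and the location names and covered zones are made up. Coverage is a monotone submodular function, and for such functions the greedy choice under a budget on the number of sensors is known to reach at least a (1 - 1/e) fraction of the optimal value.

```python
def greedy_placement(coverage, budget):
    """Pick up to `budget` locations, each time adding the location
    that covers the largest number of zones not yet covered."""
    chosen = []
    covered = set()
    for _ in range(budget):
        best, best_gain = None, 0
        for loc in sorted(coverage):  # sorted for deterministic tie-breaking
            if loc in chosen:
                continue
            gain = len(coverage[loc] - covered)  # marginal gain of adding loc
            if gain > best_gain:
                best, best_gain = loc, gain
        if best is None:  # no remaining location adds anything new
            break
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered


# Hypothetical candidate locations and the zones each sensor would observe.
coverage = {
    "hallway": {1, 2, 3, 4},
    "kitchen": {3, 4, 5},
    "server_room": {6, 7},
    "office": {1, 2},
}
chosen, covered = greedy_placement(coverage, budget=2)
print(chosen, sorted(covered))
```

Here the second pick is the server room rather than the kitchen, because most of what the kitchen sensor would observe is already covered by the hallway sensor. This diminishing-returns behavior is exactly what submodularity captures.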

Last Updated on Friday, 21 August 2015 15:01

@RECOMB 2014, Pittsburgh, PA (Part II)

We are presenting a poster about our recent data fusion methodology (ArXiv preprint) at the RECOMB conference. Thanks to Prof. Blaz Zupan for the storyline and Prof. Richard H. Kessin for valuable comments. served as an inspiration of poster design (HiRes). See also the other post (Part I) about our RECOMB paper.

Best Poster Award at RECOMB 2014!

Last Updated on Sunday, 14 June 2015 10:52

@RECOMB 2014, Pittsburgh, PA (Part I)

Our paper, Imputation of Quantitative Genetic Interactions in Epistatic MAPs by Interaction Propagation Matrix Completion, was accepted to RECOMB 2014.

The Epistatic Miniarray Profile (E-MAP) is a popular large-scale gene interaction discovery platform. E-MAPs benefit from quantitative output, which makes it possible to detect subtle interactions. However, due to the limits of biotechnology, E-MAP studies fail to measure genetic interactions for up to 40% of gene pairs in an assay. Missing measurements can be recovered by computational techniques for data imputation, thus completing the interaction profiles and enabling downstream analysis algorithms that could otherwise be sensitive to largely incomplete data sets. In the paper, we introduce a new interaction data imputation method called interaction propagation matrix completion (IP-MC). The core part of IP-MC is a low-rank (latent) probabilistic matrix completion approach that considers additional knowledge given in the form of a gene network. IP-MC assumes that interactions are transitive, such that latent gene interaction profiles depend on the profiles of their direct neighbors in a given gene network. As the IP-MC inference algorithm progresses, the latent interaction profiles propagate through the branches of the network. In a study with three different E-MAP data assays and the considered protein-protein interaction and Gene Ontology similarity networks, IP-MC significantly surpassed existing alternative techniques. Inclusion of information from gene networks also allows IP-MC to predict interactions for genes that were not included in the original E-MAP assays, a task that could not be considered by current imputation approaches.
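The intuition that a gene's interaction profile resembles the profiles of its network neighbors can be illustrated with a much simpler stand-in than IP-MC. The Python sketch below is not the method from the paper (there is no low-rank or probabilistic model here): it only fills a missing score with the mean of the scores that the gene's direct network neighbors have with the same partner. The function name, gene names and scores are hypothetical.

```python
def impute_from_neighbors(scores, network):
    """Fill missing (None) interaction scores with the mean score of the
    gene's direct network neighbors for the same partner gene."""
    imputed = {}
    for (gene, partner), value in scores.items():
        if value is not None:
            imputed[(gene, partner)] = value
            continue
        # Measured scores between the gene's neighbors and the same partner.
        known = [scores[(n, partner)] for n in network.get(gene, ())
                 if scores.get((n, partner)) is not None]
        # Leave the entry missing if no neighbor has a measurement.
        imputed[(gene, partner)] = sum(known) / len(known) if known else None
    return imputed


# Hypothetical scores with gene "x"; the g1-x interaction was not measured.
scores = {("g1", "x"): None, ("g2", "x"): -2.0,
          ("g3", "x"): -1.0, ("g4", "x"): 3.0}
network = {"g1": ["g2", "g3"]}  # g1 is linked to g2 and g3 in the gene network

filled = impute_from_neighbors(scores, network)
print(filled[("g1", "x")])
```

Because the network is the only source of information for the missing entry, the same idea extends to a gene with no measurements at all, as long as it has neighbors in the network.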

The presentation is available on Prezi.

Last Updated on Wednesday, 02 April 2014 21:48

@Pacific Symposium on Biocomputing 2014, Hawaii

I am participating in PSB 2014, the Pacific Symposium on Biocomputing, an international conference on current research in the theory and application of computational methods in problems of biological significance, which is held on the Big Island of Hawaii.

Our paper, Matrix factorization-based data fusion for gene function prediction in baker's yeast and slime mold, was accepted to PSB. In the paper, we examined the applicability of our recently proposed matrix factorization-based data fusion approach to the problem of gene function prediction. We studied three fusion scenarios to demonstrate the high accuracy of our approach when learning from disparate, incomplete and noisy data. The studies were successfully carried out for two different organisms, where, for example, the protein-protein interaction network for yeast is nearly complete but noisy, whereas the sets of available interactions for slime mold are rather sparse and only about one-tenth of its genes have experimentally derived annotations.

Last Updated on Monday, 07 December 2015 21:17

@Baylor College of Medicine, Department of Molecular and Human Genetics

Between December 2013 and August 2014 I am visiting the Department of Molecular and Human Genetics at Baylor College of Medicine, Houston, TX, USA. During my stay we will do research on computational methods for data fusion and their applications in systems biology. We will investigate our recently developed data fusion algorithms and apply them to tasks such as gene function prediction, gene ranking (prioritization), missing value imputation, association mining and inference of gene networks from mutant data. I anticipate that large-scale applications of our methods may provide valuable feedback on whether such functionality is useful for the biological community and provide new insights into the correspondence between biological and algorithmic concepts.

Last Updated on Sunday, 14 June 2015 10:52

ACM XRDS: On Constructing the Tree of Life

The Winter 2013 issue of XRDS: Crossroads, the ACM magazine for students, features the latest in wearable computing, such as wearable brain-computer interfaces, human motion capture, tracking how we read, augmented reality and airwriting. In this issue there is also a fascinating insider's look at what a Google technical interview is all about. Check it out!

I contributed a column on constructing, interpreting and visualizing phylogenetic trees, diagrams of relatedness between organisms, species, or genes that show a history of descent from common ancestry. As more and more life sciences data are freely available in public databases, some of the analyses that would have been performed in well-equipped research laboratories just a few years ago are nowadays accessible to any interested individual with a commodity computer. Such a shift was only possible due to unprecedented technological and theoretical advancements across a broad spectrum of science and technology. Check it out!

Last Updated on Friday, 21 August 2015 15:00

Press Coverage of Our Recent Study About Connections Between Human Diseases

BioTechniques, The International Journal of Life Science Methods, highlighted our recent paper on Discovering disease-disease associations by fusing systems-level molecular data, which was published in Nature's Scientific Reports. In the paper we applied our novel computational approach for data fusion to a plethora of molecular data in order to discover disease-disease associations.

The complete article featuring our study, along with a commentary by the paper's senior author, Prof. Blaz Zupan, PhD, is available at the BioTechniques site.

Last Updated on Sunday, 30 March 2014 16:37
