Marinka Zitnik

Fusing bits and DNA

@Baylor College of Medicine, Department of Molecular and Human Genetics

Between December 2013 and August 2014 I am visiting the Department of Molecular and Human Genetics at Baylor College of Medicine, Houston, TX, USA. During my stay we will do research on computational methods for data fusion and their applications in systems biology. We will investigate our recently developed data fusion algorithms and apply them to tasks such as gene function prediction, gene ranking (prioritization), missing value imputation, association mining and inference of gene networks from mutant data. I anticipate that large-scale applications of our methods may provide valuable feedback on whether such functionality is useful to the biological community and offer new insights into the correspondence between biological and algorithmic concepts.

Last Updated on Sunday, 14 June 2015 10:52

ACM XRDS: On Constructing the Tree of Life

The Winter 2013 issue of XRDS: Crossroads, the ACM magazine for students, features the latest in wearable computing, such as wearable brain-computer interfaces, human motion capture, tracking how we read, augmented reality and airwriting. The issue also offers a fascinating insider's look at what a Google technical interview is all about. Check it out!

I contributed a column on constructing, interpreting and visualizing phylogenetic trees: diagrams of relatedness between organisms, species, or genes that show a history of descent from common ancestry. As more and more life sciences data become freely available in public databases, some of the analyses that would have been performed in well-equipped research laboratories just a few years ago are now accessible to any interested individual with a commodity computer. Such a shift was possible only because of unprecedented technological and theoretical advances across a broad spectrum of science and technology. Check it out!

Last Updated on Friday, 21 August 2015 15:00

Press Coverage of Our Recent Study About Connections Between Human Diseases

BioTechniques, The International Journal of Life Science Methods, highlighted our recent paper, Discovering disease-disease associations by fusing systems-level molecular data, which was published in Nature's Scientific Reports. In the paper we applied our novel computational approach for data fusion to a wide range of molecular data in order to discover disease-disease associations.

The complete article featuring our study and a commentary by the paper's senior author, Prof. Blaz Zupan, PhD, is available at the BioTechniques site.

Last Updated on Sunday, 30 March 2014 16:37

Discovering Disease-Disease Associations by Fusing Molecular Data

Nature's Scientific Reports has published our latest paper on data fusion, Discovering disease-disease associations by fusing systems-level molecular data, in which we combine various sources of biological information to discover human disease-disease associations.

The advent of genome-scale genetic and genomic studies allows new insight into disease classification. Recently, a shift was made from linking diseases simply based on their shared genes towards systems-level integration of molecular data. We aim to find relationships between diseases based on evidence from fusing all available molecular interaction and ontology data. We propose a multi-level hierarchy of disease classes that significantly overlaps with existing disease classification. In it, we find 14 disease-disease associations currently not present in Disease Ontology and provide evidence for their relationships through comorbidity data and literature curation. Interestingly, even though the number of known human genetic interactions is currently very small, we find they are the most important predictor of a link between diseases. Finally, we show that omission of any one of the included data sources reduces prediction quality, further highlighting the importance of the paradigm shift towards systems-level data fusion. Check it out!

Last Updated on Wednesday, 15 June 2016 22:07

ACM XRDS: Zero-Knowledge Proofs

The Fall 2013 issue of XRDS: Crossroads, the ACM magazine for students, is about the complexities of privacy and anonymity.

The issue is motivated by current research problems and recent societal concerns about digital privacy. When real and digital worlds collide, things can get messy. Complicated problems surrounding privacy and anonymity arise as our interconnected world evolves technically, culturally, and politically. But what do we mean by privacy? By anonymity? Inside this issue there are contributions from lawyers, researchers, computer scientists, policy makers, and industry heavyweights, all of whom try to answer the tough questions surrounding privacy, anonymity, and security. From cryptocurrencies to differential privacy, the issue looks at how technology is used to protect our digital selves, and how that same technology can expose our vulnerabilities, causing lasting, real-world effects. Check it out!

The department that I am responsible for contributed a column on zero-knowledge proofs. A zero-knowledge proof allows one person to convince another person of some statement without revealing any information about the proof other than the fact that the statement is indeed true. Zero-knowledge proofs are of practical and theoretical interest in cryptography and mathematics. They achieve the seemingly contradictory goal of proving a statement without revealing why it is true. In the column we describe interactive proof systems and some implications that zero-knowledge proofs have for complexity theory. We conclude with an application of zero-knowledge proofs in cryptography, the Fiat-Shamir identification protocol, which is the basis of current zero-knowledge entity authentication schemes. Check it out!
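To make the idea concrete, here is a minimal sketch of the basic Fiat-Shamir identification round in Python. It is an illustration only, not code from the column: the function names are hypothetical, and the modulus is a toy product of two small primes chosen for readability, far too small to be secure. The prover knows a secret s with public value v = s^2 mod n, commits to x = r^2 mod n, receives a random challenge bit e, and answers with y = r * s^e mod n; the verifier accepts if y^2 = x * v^e mod n.

```python
import secrets
from math import gcd


def keygen(n):
    """Pick a secret s coprime to n and publish v = s^2 mod n."""
    while True:
        s = secrets.randbelow(n - 2) + 2  # s in 2..n-1
        if gcd(s, n) == 1:
            return s, pow(s, 2, n)


def commit(n):
    """Prover: pick a random r coprime to n and send the commitment x = r^2 mod n."""
    while True:
        r = secrets.randbelow(n - 1) + 1  # r in 1..n-1
        if gcd(r, n) == 1:
            return r, pow(r, 2, n)


def respond(r, s, e, n):
    """Prover: answer challenge bit e with y = r * s^e mod n."""
    return (r * pow(s, e, n)) % n


def verify(x, y, e, v, n):
    """Verifier: accept if y^2 = x * v^e (mod n)."""
    return x != 0 and pow(y, 2, n) == (x * pow(v, e, n)) % n


def run_protocol(s, v, n, rounds=20):
    """Repeat the round; a prover without s passes each round with probability 1/2."""
    for _ in range(rounds):
        r, x = commit(n)
        e = secrets.randbelow(2)  # verifier's random challenge bit
        y = respond(r, s, e, n)
        if not verify(x, y, e, v, n):
            return False
    return True


N = 61 * 53  # toy modulus (assumption for illustration; real moduli are thousands of bits)
secret, public = keygen(N)
accepted = run_protocol(secret, public, N)
```

An honest prover is always accepted. Each transcript (x, e, y) could have been produced by anyone who knew the challenge bit in advance, which is the intuition for why the verifier learns nothing about s beyond the fact that the prover knows it.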

Last Updated on Friday, 21 August 2015 15:00

MLSS 2013, Max Planck Institute for Intelligent Systems, Tübingen

This year I am participating in the Machine Learning Summer School (MLSS), held in Tübingen, Germany. The Summer School offers an opportunity to learn about fundamental and advanced aspects of machine learning, data analysis and inference from leaders of the field. Topics are diverse and include graphical models, multilayer networks, cognitive and kernel learning, network modeling and information propagation, distributed M, structured-output prediction, reinforcement learning, sparse models, learning theory, causality and much more. I am looking forward to it. Posters are also a long-standing tradition at the MLSS. Below is an image of a poster presentation that covers some of my recent work.


Last Updated on Thursday, 09 July 2015 15:09

Extracting Gene Regulation Networks Using Linear-Chain Conditional Random Fields and Rules @ACL 2013, BioNLP Workshop

This week Slavko Zitnik, the first author, will present our paper at the ACL BioNLP Workshop. The paper extends linear-chain conditional random fields (CRF) with skip-mentions to extract gene regulatory networks from biomedical literature. It also describes a sieve-based system architecture: a complete data processing pipeline that includes data preparation, linear-chain CRF and rule-based relation detection, and data cleaning.

Published literature in molecular genetics may collectively provide much information on gene regulation networks. Dedicated computational approaches are required to sift through large volumes of text and infer gene interactions. We propose a novel sieve-based relation extraction system that uses linear-chain conditional random fields and rules. We also introduce a new skip-mention data representation to enable distant relation extraction using first-order models. To account for a variety of relation types, multiple models are inferred. The system was applied to the BioNLP 2013 Gene Regulation Network Shared Task. Our approach was ranked first of five, with a slot error rate of 0.73.
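The skip-mention idea can be illustrated with a short sketch. It rests on one reading of the abstract and is not the paper's implementation: it assumes that a k-skip-mention sequence keeps every (k+1)-th mention of a document, so that mentions k+1 positions apart become neighbours and a first-order (linear-chain) model can label a relation between them. The function name and the example mention list are hypothetical.

```python
def skip_mention_sequences(mentions, skip):
    """Split an ordered mention list into skip+1 interleaved subsequences.

    With skip=0 the original sequence is returned unchanged. With skip=k,
    each subsequence holds mentions that were k+1 positions apart.
    """
    if skip < 0:
        raise ValueError("skip must be non-negative")
    step = skip + 1
    sequences = [mentions[offset::step] for offset in range(step)]
    # Drop empty subsequences, which occur when the document has few mentions.
    return [seq for seq in sequences if seq]


# Illustrative mentions in order of appearance (made-up example document).
mentions = ["sigK", "gerE", "cotB", "sigE", "spoIIID"]

zero_skip = skip_mention_sequences(mentions, 0)
one_skip = skip_mention_sequences(mentions, 1)
```

Here `one_skip` pairs up "sigK" with "cotB" and "gerE" with "sigE" as adjacent items, so a relation between mentions separated by one intervening mention can be scored by the same linear-chain model that handles adjacent mentions.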

Presentation slides.

Last Updated on Sunday, 25 August 2013 21:40

ISMB/ECCB 2013 - 21st International Conference on Intelligent Systems in Molecular Biology & 12th European Conference on Computational Biology

I participated in the CAMDA Satellite Meeting on critical assessment of massive data analysis on 19th and 20th July at ISMB in Berlin, where I presented our matrix factorization-based data fusion approach to predicting drug-induced liver injury from toxicogenomics data sets and circumstantial evidence from related data sources. The outcome was positive and our work was recognized as excellent research.

The main conference days of the 21st Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and the 12th European Conference on Computational Biology (ECCB) were in Berlin, 21st to 23rd July. Overall, the meeting was enjoyable and the talks offered novel insights from both computational and biological perspectives. As a side note, in 2014 ISMB and ECCB will be organized separately: the ISMB conference will be in July in Boston and the ECCB meeting will be in September in Strasbourg.

Here I list some of the talks I attended at ISMB/ECCB. With nine parallel sessions, it was at times difficult to pick the most interesting talk. Note that only the presenting authors are listed.

First day:

  • Simple topological properties predict functional misannotations in a metabolic network (J. Pinney).
  • Of men and not mice. Comparative genome analysis of human diseases and mouse models (W. Xiao).
  • Integration of heterogeneous -seq and -omics data sets: ongoing research and development projects at CLC bio (M. Lappe). Technology track.
  • System based metatranscriptomic analysis (X. Xiong).
  • Integrative analysis of large scale data (M. Spivakov, S. Menon). Workshop track.
  • Multi-task learning for host-pathogen interactions (M. Kshirsagar).
  • Integrative modelling coupled with mass spectrometry-based approaches reveals the structure and dynamics of protein assemblies (A. Politis).
  • Synthetic lethality between gene defects affecting a single non-essential molecular pathway with reversible steps (I. Kupperstein).

Second day:

  • KeyPathwayMiner - extracting disease specific pathways by combining omics data and biological networks (J. Baumbach). Technology track.
  • Compressive genomics (M. Baym).
  • Predicting drug-target interactions using restricted Boltzmann machines (J. Zeng).
  • Efficient network-guided multi locus association mapping with graph cuts (C. Azencott).
  • Differential genetic interactions of S. cerevisiae stress response pathways (P. Beltrao). Special session on dynamic interaction networks.
  • Coordination of post-translational modifications in human protein interaction networks (J. Woodsmith). Special session on dynamic interaction networks.
  • Prediction and analysis of protein interaction networks (A. Valencia). Special session on dynamic interaction networks.
  • Characterizing the context of human protein-protein interactions for an improved understanding of drug mechanism of action (M. Kotlyar). Special session on dynamic interaction networks.
  • GPU acceleration of bioinformatics pipeline (M. Berger and a team from NVIDIA).

Third day:

  • Using the world's public big data to find novel uses for drugs (P. Bourne).
  • A top-down systems biology approach to novel therapeutic strategies (P. Aloy).
  • A large-scale evaluation of computational protein function prediction (P. Radivojac).
  • Deciphering the gene expression code via a combined synthetic computational biology approach (T. Tuller).
  • Interplay of microRNAs, transcription factors and genes: linking dynamic expression changes to function (P. Nazarov).
  • Visual analytics, the human back in the loop (J. Aerts).
  • Turning networks into ontologies of gene function (J. Dutkowski).
  • A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text (S. Ananiadou).

I enjoyed the keynote talks:

  • How chromatin organization and epigenetics talk with alternative splicing (G. Ast).
  • Insights from sequencing thousands of human genomes (G. Abecasis).
  • Sequencing based functional genomics (analysis) (L. Pachter).
  • Searching for signals in sequences (G. Stormo).
  • Results may vary. What is reproducible? Why do open science and who gets the credit? (C. A. Goble).
  • Protein interactions in health and disease (D. Eisenberg).

It has been quite lively on Twitter as well. The official hashtag was #ISMBECCB, and at some point it was even a trending hashtag on Twitter. Check the archive: the tweets captured important insights and take-away messages from the talks, as well as some entertaining ideas such as the unofficial ISMB Bingo card by @jonathancairns.

Last Updated on Thursday, 25 July 2013 19:50

CAMDA 2013: Matrix Factorization-Based Data Fusion for Drug-Induced Liver Injury Prediction

This work won first prize for excellent research at the ISMB/ECCB CAMDA 2013 Conference.

I am giving a talk at the CAMDA 2013 Conference, which runs as a satellite meeting of the ISMB/ECCB 2013 Conference. CAMDA focuses on challenges in the analysis of the massive data sets that are increasingly produced in several fields of the life sciences. The conference offers researchers from the computer sciences, statistics, molecular biology, and other fields a unique opportunity to benefit from a critical comparative evaluation of the latest approaches in the analysis of life sciences' “Big Data”.

Currently, the Big Data explosion is the grand challenge in life sciences. Analysing large data sets is emerging as one of the key scientific techniques of the post-genomic era. Still, the data analysis bottleneck prevents new biotechnologies from providing new medical and biological insights on a larger scale. The need to analyse massive data sets is further accelerated by novel high-throughput sequencing technologies and the increasing size of biomedical studies. CAMDA provides new approaches and solutions to the big data problem and presents new techniques in bioinformatics, data analysis, and statistics for handling and processing large data sets. This year, CAMDA's scientific committee set up two challenges: the prediction of drug compatibility from an extremely large toxicogenomic data set, and the decoding of genomes from the Korean Personal Genome Project.

The keynote talks were given by Atul Butte from Stanford University School of Medicine and Nikolaus Rajewsky from the Max-Delbrück-Center for Molecular Medicine in Berlin. Atul Butte talked about translational bioinformatics and emphasized the importance of converting molecular, clinical and epidemiological data into diagnostics and therapeutics to ease the bench-to-bedside translation. Nikolaus Rajewsky presented his group's work on circular RNAs and findings on RNA-protein interactions.

I was involved in the prediction of drug compatibility from an extremely large toxicogenomic data set, which addresses two of the most important questions in toxicology. We investigated whether animal studies can be replaced with in vitro assays and whether liver injuries in humans can be predicted using toxicogenomics data from animals.

In this work, we demonstrate that data fusion allows us to simultaneously consider all the available data when predicting the outcome of drug-induced liver injury. Its models can surpass the accuracy of standard machine learning approaches. Our results also indicate that future prediction models should exploit circumstantial evidence from related data sources in addition to standard toxicogenomics data sets. We anticipate that such data analysis efforts hold promise for replacing animal studies with in vitro assays and for predicting the outcome of liver injuries in humans using toxicogenomics data from animals.


Last Updated on Thursday, 09 July 2015 15:08

Numerical Analysis of Matrix Functions

I have recently spent some time studying matrix functions, from both theoretical and computational perspectives. There is a nice book by Nick J. Higham on functions of matrices, which I highly recommend to the interested reader. It provides a thorough overview of current theoretical results on matrix functions and several efficient numerical methods for computing them. Another well-written text is Rajendra Bhatia's book on matrix analysis (Graduate Texts in Mathematics), which includes topics such as the theory of majorization, variational principles for eigenvalues, operator monotone and convex functions, matrix inequalities and perturbation of matrix functions. Bhatia's book is more functional analytic in spirit, whereas Higham's book focuses more on numerical linear algebra.
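As a small taste of the computational side, here is a minimal sketch of one classical idea for a matrix function, the matrix exponential computed by scaling and squaring. It is an illustration under simplifying assumptions, not an algorithm taken from the report or a production method: it uses a truncated Taylor series where careful implementations use Padé approximants, it works on plain Python lists, and the helper names are made up for this example. The idea is to scale A by 2^-j so its norm is small, approximate exp(A / 2^j) by the series, and then square the result j times, since exp(A) = (exp(A / 2^j))^(2^j).

```python
import math


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    """Return the n-by-n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def expm(a, terms=20):
    """Approximate exp(a) for a square matrix by scaling and squaring."""
    n = len(a)
    # Infinity norm: the largest absolute row sum.
    norm = max(sum(abs(x) for x in row) for row in a)
    # Choose j so that the scaled matrix has norm at most 1/2.
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scale = 2.0 ** squarings
    scaled = [[x / scale for x in row] for row in a]

    # Truncated Taylor series: I + B + B^2/2! + ... + B^terms/terms!
    result = identity(n)
    term = identity(n)
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]

    # Undo the scaling by repeated squaring.
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


shear = expm([[0.0, 1.0], [0.0, 0.0]])      # nilpotent matrix: exp(A) = I + A
diagonal = expm([[1.0, 0.0], [0.0, 2.0]])   # diagonal matrix: exp acts entrywise on the diagonal
```

For a diagonal matrix the result should agree with the scalar exponential of each diagonal entry, which makes a convenient sanity check. Scaling first keeps the series short and well behaved, at the cost of rounding errors that can grow during the squaring phase.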

Below you will find a report that I produced, which contains a few interesting proofs (some of them elementary) and implementations of algorithms. Interested readers should consult the literature above to be able to follow the text.

Last Updated on Sunday, 25 August 2013 21:36

Topological Concepts in Machine Learning @ACAT Summer School

I gave a talk at the ACAT Summer School on computational topology and topological data analysis, held at the University of Ljubljana.

Abstract: Fast growth in the amount of data emerging from studies across various scientific disciplines and engineering requires alternative approaches to understanding large and complex data sets in order to turn data into useful knowledge. Topological methods are making an increasing contribution to revealing patterns and shapes in high-dimensional data sets. Ideas such as studying shapes in a coordinate-free way, compressed representations and invariance to data deformations are important when one is dealing with large data sets. In this talk we consider which key concepts make topological methods appropriate for data analysis and survey some machine learning techniques proposed in the literature that exploit them. We illustrate their utility with examples from computational biology, text classification and data visualization.

Slides (in English).

Last Updated on Thursday, 25 June 2015 14:41
