Marinka Zitnik

Fusing bits and DNA


ISMB/ECCB 2013 - 21st International Conference on Intelligent Systems for Molecular Biology & 12th European Conference on Computational Biology

I participated in the CAMDA Satellite Meeting on the critical assessment of massive data analysis, held on 19th and 20th July at ISMB in Berlin, where I presented our matrix factorization-based data fusion approach to predicting drug-induced liver injury from toxicogenomics data sets and circumstantial evidence from related data sources. The outcome was positive and our work was recognized as excellent research.

The main conference days of the 21st Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and the 12th European Conference on Computational Biology (ECCB) were in Berlin, 21st to 23rd July. Overall, the meeting was enjoyable and the talks offered novel insights from both computational and biological perspectives. As a side note, in 2014 ISMB and ECCB will be organized separately: the ISMB conference will be in July in Boston and the ECCB meeting will be in September in Strasbourg.

Here, I list some of the talks I attended at ISMB/ECCB. With nine parallel sessions, it was at times difficult to pick the most interesting talk. Note that only the presenting authors are listed here.

First day:

  • Simple topological properties predict functional misannotations in a metabolic network (J. Pinney).
  • Of men and not mice. Comparative genome analysis of human diseases and mouse models (W. Xiao).
  • Integration of heterogeneous -seq and -omics data sets: ongoing research and development projects at CLC bio (M. Lappe). Technology track.
  • System based metatranscriptomic analysis (X. Xiong).
  • Integrative analysis of large scale data (M. Spivakov, S. Menon). Workshop track.
  • Multi-task learning for host-pathogen interactions (M. Kshirsagar).
  • Integrative modelling coupled with mass spectrometry-based approaches reveals the structure and dynamics of protein assemblies (A. Politis).
  • Synthetic lethality between gene defects affecting a single non-essential molecular pathway with reversible steps (I. Kupperstein).
Second day:
  • KeyPathwayMiner - extracting disease specific pathways by combining omics data and biological networks (J. Baumbach). Technology track.
  • Compressive genomics (M. Baym).
  • Predicting drug-target interactions using restricted Boltzmann machines (J. Zeng).
  • Efficient network-guided multi-locus association mapping with graph cuts (C. Azencott).
  • Differential genetic interactions of S. cerevisiae stress response pathways (P. Beltrao). Special session on dynamic interaction networks.
  • Coordination of post-translational modifications in human protein interaction networks (J. Woodsmith). Special session on dynamic interaction networks.
  • Prediction and analysis of protein interaction networks (A. Valencia). Special session on dynamic interaction networks.
  • Characterizing the context of human protein-protein interactions for an improved understanding of drug mechanism of action (M. Kotlyar). Special session on dynamic interaction networks.
  • GPU acceleration of bioinformatics pipeline (M. Berger and a team from NVIDIA).
Third day:
  • Using the world's public big data to find novel uses for drugs (P. Bourne).
  • A top-down systems biology approach to novel therapeutic strategies (P. Aloy).
  • A large-scale evaluation of computational protein function prediction (P. Radivojac).
  • Deciphering the gene expression code via a combined synthetic computational biology approach (T. Tuller).
  • Interplay of microRNAs, transcription factors and genes: linking dynamic expression changes to function (P. Nazarov).
  • Visual analytics, the human back in the loop (J. Aerts).
  • Turning networks into ontologies of gene function (J. Dutkowski).
  • A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text (S. Ananiadou).
I enjoyed the keynote talks:
  • How chromatin organization and epigenetics talk with alternative splicing (G. Ast).
  • Insights from sequencing thousands of human genomes (G. Abecasis).
  • Sequencing based functional genomics (analysis) (L. Pachter).
  • Searching for signals in sequences (G. Stormo).
  • Results may vary. What is reproducible? Why do open science and who gets the credit? (C. A. Goble).
  • Protein interactions in health and disease (D. Eisenberg).
It was quite lively on Twitter as well. The official hashtag was #ISMBECCB, and at one point it was even trending on Twitter. Check the archive: tweets captured important insights and take-away messages from the talks, as well as some entertaining ideas such as the unofficial ISMB Bingo card by @jonathancairns.

Call for Papers: Special Issue on Deep Learning and Graph Embeddings for Network Biology

I am co-editing a special issue on Deep Learning and Graph Embeddings for Network Biology for the journal IEEE/ACM Transactions on Computational Biology and Bioinformatics.

Spread the word and submit your finest work on new embedding techniques and/or their applications to challenging problems in biology, medicine, and health.


CAMDA 2013: Matrix Factorization-Based Data Fusion for Drug-Induced Liver Injury Prediction

This work was recognized as the first prize winner for excellent research at the ISMB/ECCB CAMDA 2013 Conference.

I am giving a talk at the CAMDA 2013 Conference, which runs as a satellite meeting of the ISMB/ECCB 2013 Conference. CAMDA focuses on challenges in the analysis of the massive data sets that are increasingly produced in several fields of the life sciences. The conference offers researchers from the computer sciences, statistics, molecular biology, and other fields a unique opportunity to benefit from a critical comparative evaluation of the latest approaches in the analysis of life sciences' “Big Data”.

Currently, the Big Data explosion is the grand challenge in the life sciences. Analysing large data sets is emerging as one of the key scientific techniques of the post-genomic era. Still, the data analysis bottleneck prevents new biotechnologies from providing new medical and biological insights on a larger scale. The need to analyse massive data sets is further accelerated by novel high-throughput sequencing technologies and the increasing size of biomedical studies. CAMDA provides new approaches and solutions to the big data problem and presents new techniques in bioinformatics, data analysis, and statistics for handling and processing large data sets. This year, CAMDA's scientific committee set up two challenges: the prediction of drug compatibility from an extremely large toxicogenomic data set, and the decoding of genomes from the Korean Personal Genome Project.

The keynote talks were given by Atul Butte from Stanford University School of Medicine and Nikolaus Rajewsky from the Max-Delbrück-Center for Molecular Medicine in Berlin. Atul Butte talked about translational bioinformatics and emphasized the importance of converting molecular, clinical and epidemiological data into diagnostics and therapeutics to ease the bench-to-bedside translation. Nikolaus Rajewsky presented his group's work on circular RNAs and findings on RNA-protein interactions.

I was involved in the prediction of drug compatibility from an extremely large toxicogenomic data set, which aims to answer two of the most important questions in toxicology. We investigated whether animal studies can be replaced with in vitro assays and whether liver injuries in humans can be predicted using toxicogenomics data from animals.

In this work, we demonstrate that data fusion allows us to consider all the available data simultaneously when predicting the outcome of drug-induced liver injury. The resulting models can surpass the accuracy of standard machine learning approaches. Our results also indicate that future prediction models should exploit circumstantial evidence from related data sources in addition to standard toxicogenomics data sets. We anticipate that efforts in data analysis hold the promise of replacing animal studies with in vitro assays and of predicting the outcome of liver injuries in humans using toxicogenomics data from animals.
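
To give a feel for what "considering the available data simultaneously" can mean in a matrix factorization setting, here is a minimal sketch. It is not the method from our CAMDA submission; it is a deliberately simplified illustration that assumes two data matrices describing the same drugs (for example, drugs × genes and drugs × assay readouts) and factorizes them jointly with one shared drug factor, using regularized alternating least squares. The function name, the parameters and the choice of solver are all assumptions made for this example.

```python
import numpy as np

def fuse_by_shared_factorization(R1, R2, rank=2, n_iter=50, reg=1e-2, seed=0):
    """Illustrative joint factorization: R1 ~ G @ H1.T and R2 ~ G @ H2.T.

    R1 and R2 must have the same number of rows (the shared objects,
    e.g. drugs). G holds one latent profile per shared object and is
    estimated from both matrices at once. Hypothetical helper, not the
    algorithm used in the CAMDA work.
    """
    if R1.shape[0] != R2.shape[0]:
        raise ValueError("R1 and R2 must describe the same row objects")
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((R1.shape[0], rank))
    ridge = reg * np.eye(rank)
    R = np.hstack([R1, R2])  # both sources side by side
    for _ in range(n_iter):
        # Source-specific column factors, given the shared factor G
        # (ridge regression solved in closed form).
        A = G.T @ G + ridge
        H1 = np.linalg.solve(A, G.T @ R1).T
        H2 = np.linalg.solve(A, G.T @ R2).T
        # Shared factor G, given the column factors of both sources.
        H = np.vstack([H1, H2])
        G = np.linalg.solve(H.T @ H + ridge, H.T @ R.T).T
    return G, H1, H2
```

The rows of `G` can then serve as fused drug profiles, for instance as input features to a classifier, because each profile has to explain both data sources at the same time.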



Part 1: Matrix Computations Notes

Labels: Factorization, Maths

Constrained LS Problems

Subset Selection Using SVD

Total LS

Comparing Subspaces Using SVD

Some Modified Eigenvalue Problems

Updating the QR Factorization



Assistant Professor at Harvard University

Starting in December 2019, I will be a tenure-track Assistant Professor at Harvard University, and my laboratory at Harvard and the Blavatnik Institute will focus on Machine Learning for Science and Medicine.

I am looking for outstanding students and postdoctoral fellows who would like to join me in transforming science and medicine into data-driven and computationally enabled disciplines. The research focus is on new data science and machine learning methods for learning and reasoning over rich interaction data and on the translation of these methods into solutions for biomedical problems. This scientific approach not only opens up new avenues for understanding nature, analyzing health, and developing new medicines to help people, but can also change, at a fundamental level, the way predictive modeling is performed today.

If you are excited about problems in machine learning and/or applications in genomics, medicine, and health and would like to work with me at Harvard, please contact me with a brief description of your research interests and your CV.

