Marinka Zitnik

Fusing bits and DNA

ACM XRDS: Zero-Knowledge Proofs

The Fall 2013 issue of XRDS: Crossroads, the ACM magazine for students, is about the complexities of privacy and anonymity.

The issue is motivated by current research problems and recent societal concerns about digital privacy. When real and digital worlds collide, things can get messy. Complicated problems surrounding privacy and anonymity arise as our interconnected world evolves technically, culturally, and politically. But what do we mean by privacy? By anonymity? Inside this issue there are contributions from lawyers, researchers, computer scientists, policy makers, and industry heavyweights, all of whom try to answer the tough questions surrounding privacy, anonymity, and security. From cryptocurrencies to differential privacy, the issue looks at how technology is used to protect our digital selves, and how that same technology can expose our vulnerabilities, causing lasting, real-world effects. Check it out!

The department that I am responsible for contributed a column on zero-knowledge proofs. A zero-knowledge proof allows one party to convince another that some statement is true without revealing any information beyond the fact that the statement is indeed true. Zero-knowledge proofs are of practical and theoretical interest in cryptography and mathematics. They achieve the seemingly contradictory goal of proving a statement without revealing anything about its proof. In the column we describe interactive proof systems and some implications of zero-knowledge proofs for complexity theory. We conclude with an application of zero-knowledge proofs in cryptography, the Fiat-Shamir identification protocol, which is the basis of current zero-knowledge entity authentication schemes. Check it out!
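
To make the idea concrete, here is a minimal sketch of one textbook formulation of the Fiat-Shamir identification protocol. It is not taken from the column itself: the function names, the tiny primes in the usage note, and the number of rounds are illustrative assumptions, and a real deployment would use a large modulus and a vetted cryptographic library.

```python
import secrets
from math import gcd


def keygen(p, q):
    """Return (n, s, v): modulus n = p*q, secret s, public v = s^2 mod n."""
    n = p * q
    while True:
        s = secrets.randbelow(n - 2) + 2
        if gcd(s, n) == 1:
            return n, s, pow(s, 2, n)


def commit(n):
    """Prover picks a random r and sends the commitment x = r^2 mod n."""
    r = secrets.randbelow(n - 1) + 1
    return r, pow(r, 2, n)


def respond(r, s, e, n):
    """Prover answers the challenge bit e with y = r * s^e mod n."""
    return (r * pow(s, e, n)) % n


def verify(x, y, e, v, n):
    """Verifier accepts if y^2 == x * v^e (mod n)."""
    return pow(y, 2, n) == (x * pow(v, e, n)) % n


def identify(n, s, v, rounds=20):
    """Run several rounds; a prover without s passes each round with probability about 1/2."""
    for _ in range(rounds):
        r, x = commit(n)
        e = secrets.randbelow(2)  # verifier's random challenge bit
        y = respond(r, s, e, n)
        if not verify(x, y, e, v, n):
            return False
    return True
```

For example, `n, s, v = keygen(1009, 1013)` followed by `identify(n, s, v)` runs the honest protocol with toy-sized primes. The verifier only ever sees `x`, `e`, and `y`, and for each challenge bit taken separately these values look like a random square and one of its roots, which is the intuition behind the zero-knowledge property.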

Last Updated on Friday, 21 August 2015 15:00
 

MLSS 2013, Max Planck Institute for Intelligent Systems, Tübingen

This year I am participating in the Machine Learning Summer School (MLSS), which is held in Tübingen, Germany. The Summer School offers an opportunity to learn about fundamental and advanced aspects of machine learning, data analysis and inference from leaders in the field. Topics are diverse and include graphical models, multilayer networks, cognitive and kernel learning, network modeling and information propagation, distributed M, structured-output prediction, reinforcement learning, sparse models, learning theory, causality and much more. I am looking forward to it. Also, posters are a long-standing tradition at the MLSS. Below is an image of a poster presentation that covers some of my recent work.

 

Last Updated on Thursday, 09 July 2015 15:09
 

Extracting Gene Regulation Networks Using Linear-Chain Conditional Random Fields and Rules @ACL 2013, BioNLP Workshop

This week Slavko Zitnik will present our paper (he is the first author) at the ACL BioNLP Workshop. The paper extends linear-chain conditional random fields (CRF) with skip-mentions to extract gene regulatory networks from biomedical literature. It also describes a sieve-based system architecture, a complete data processing pipeline that includes data preparation, linear-chain CRF and rule-based relation detection, and data cleaning.

Published literature in molecular genetics may collectively provide much information on gene regulation networks. Dedicated computational approaches are required to sift through large volumes of text and infer gene interactions. We propose a novel sieve-based relation extraction system that uses linear-chain conditional random fields and rules. We also introduce a new skip-mention data representation to enable distant relation extraction using first-order models. To account for a variety of relation types, multiple models are inferred. The system was applied to the BioNLP 2013 Gene Regulation Network Shared Task. Our approach was ranked first of five, with a slot error rate of 0.73.
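
For readers unfamiliar with the metric, the slot error rate is commonly defined as the number of substitutions, deletions and insertions divided by the number of slots (here, arcs) in the reference; lower is better. The sketch below assumes this standard definition. The function name and the counts are hypothetical and are not the numbers behind our result.

```python
def slot_error_rate(substitutions, deletions, insertions, reference_slots):
    """SER = (S + D + I) / N, where N is the number of slots in the reference."""
    if reference_slots <= 0:
        raise ValueError("reference_slots must be positive")
    return (substitutions + deletions + insertions) / reference_slots


# Hypothetical counts, for illustration only.
example = slot_error_rate(substitutions=2, deletions=3, insertions=1, reference_slots=8)
```

Because insertions are counted in the numerator, the slot error rate can exceed 1 when a system predicts many spurious arcs.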

Presentation slides.

Last Updated on Sunday, 25 August 2013 21:40
 

ISMB/ECCB 2013 - 21st International Conference on Intelligent Systems in Molecular Biology & 12th European Conference on Computational Biology

I participated in the CAMDA Satellite Meeting on critical assessment of massive data analysis on 19th and 20th July at ISMB in Berlin, where I presented our matrix factorization-based data fusion approach to predicting drug-induced liver injury from toxicogenomics data sets and circumstantial evidence from related data sources. The outcome was positive and our work was recognized as excellent research.

The main conference days of the 21st Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and the 12th European Conference on Computational Biology (ECCB) were in Berlin, 21st to 23rd July. Overall, the meeting was enjoyable and the talks offered novel insights from both computational and biological perspectives. As a side note, in 2014 ISMB and ECCB will be organized separately: the ISMB conference will be in July in Boston and the ECCB meeting will be in September in Strasbourg.

Here, I list some of the talks I attended at ISMB/ECCB. With nine parallel sessions, it was at times difficult to pick the most interesting talk. Note that only the presenting authors are listed here.

First day:

  • Simple topological properties predict functional misannotations in a metabolic network (J. Pinney).
  • Of men and not mice. Comparative genome analysis of human diseases and mouse models (W. Xiao).
  • Integration of heterogeneous -seq and -omics data sets: ongoing research and development projects at CLC bio (M. Lappe). Technology track.
  • System based metatranscriptomic analysis (X. Xiong).
  • Integrative analysis of large scale data (M. Spivakov, S. Menon). Workshop track.
  • Multi-task learning for host-pathogen interactions (M. Kshirsagar).
  • Integrative modelling coupled with mass spectrometry-based approaches reveals the structure and dynamics of protein assemblies (A. Politis).
  • Synthetic lethality between gene defects affecting a single non-essential molecular pathway with reversible steps (I. Kupperstein).

Second day:

  • KeyPathwayMiner - extracting disease specific pathways by combining omics data and biological networks (J. Baumbach). Technology track.
  • Compressive genomics (M. Baym).
  • Predicting drug-target interactions using restricted Boltzmann machines (J. Zeng).
  • Efficient network-guided multi-locus association mapping with graph cuts (C. Azencott).
  • Differential genetic interactions of S. cerevisiae stress response pathways (P. Beltrao). Special session on dynamic interaction networks.
  • Coordination of post-translational modifications in human protein interaction networks (J. Woodsmith). Special session on dynamic interaction networks.
  • Prediction and analysis of protein interaction networks (A. Valencia). Special session on dynamic interaction networks.
  • Characterizing the context of human protein-protein interactions for an improved understanding of drug mechanism of action (M. Kotlyar). Special session on dynamic interaction networks.
  • GPU acceleration of bioinformatics pipeline (M. Berger and a team from NVIDIA).

Third day:

  • Using the world's public big data to find novel uses for drugs (P. Bourne).
  • A top-down systems biology approach to novel therapeutic strategies (P. Aloy).
  • A large-scale evaluation of computational protein function prediction (P. Radivojac).
  • Deciphering the gene expression code via a combined synthetic computational biology approach (T. Tuller).
  • Interplay of microRNAs, transcription factors and genes: linking dynamic expression changes to function (P. Nazarov).
  • Visual analytics, the human back in the loop (J. Aerts).
  • Turning networks into ontologies of gene function (J. Dutkowski).
  • A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text (S. Ananiadou).

I enjoyed the keynote talks:

  • How chromatin organization and epigenetics talk with alternative splicing (G. Ast).
  • Insights from sequencing thousands of human genomes (G. Abecasis).
  • Sequencing based functional genomics (analysis) (L. Pachter).
  • Searching for signals in sequences (G. Stormo).
  • Results may vary. What is reproducible? Why do open science and who gets the credit? (C. A. Goble).
  • Protein interactions in health and disease (D. Eisenberg).

It has been quite lively on Twitter as well. The official hashtag was #ISMBECCB, and at some point it was even a trending hashtag on Twitter. Check the archive: tweets captured important insights and take-away messages from the talks, as well as some entertaining ideas such as the unofficial ISMB Bingo card by @jonathancairns.

Last Updated on Thursday, 25 July 2013 19:50
 

CAMDA 2013: Matrix Factorization-Based Data Fusion for Drug-Induced Liver Injury Prediction

This work was recognized as the first prize winner for excellent research at the ISMB/ECCB CAMDA 2013 Conference.

I am giving a talk at the CAMDA 2013 Conference, which runs as a satellite meeting of the ISMB/ECCB 2013 Conference. CAMDA focuses on challenges in the analysis of the massive data sets that are increasingly produced in several fields of the life sciences. The conference offers researchers from the computer sciences, statistics, molecular biology, and other fields a unique opportunity to benefit from a critical comparative evaluation of the latest approaches in the analysis of life sciences' “Big Data”.

Currently, the Big Data explosion is the grand challenge in the life sciences. Analysing large data sets is emerging as one of the key scientific techniques in the post-genomic era. Still, the data analysis bottleneck prevents new biotechnologies from providing new medical and biological insights on a larger scale. This trend towards the need for analysing massive data sets is further accelerated by novel high-throughput sequencing technologies and the increasing size of biomedical studies. CAMDA provides new approaches and solutions to the big data problem and presents new techniques in bioinformatics, data analysis, and statistics for handling and processing large data sets. This year, CAMDA's scientific committee set up two challenges: the prediction of drug compatibility from an extremely large toxicogenomic data set, and the decoding of genomes from the Korean Personal Genome Project.

The keynote talks were given by Atul Butte from Stanford University School of Medicine and Nikolaus Rajewsky from the Max-Delbrück-Center for Molecular Medicine in Berlin. Atul Butte talked about translational bioinformatics and emphasized the importance of converting molecular, clinical and epidemiological data into diagnostics and therapeutics to ease the bench-to-bedside translation. Nikolaus Rajewsky presented his group's work on circular RNAs and findings on RNA-protein interactions.

I was involved in the prediction of drug compatibility from an extremely large toxicogenomic data set, which aims to answer two of the most important questions in toxicology. We investigated whether animal studies can be replaced with in vitro assays and whether liver injuries in humans can be predicted using toxicogenomics data from animals.

In this work, we demonstrate that data fusion allows us to simultaneously consider all the available data for outcome prediction of drug-induced liver injury. The resulting models can surpass the accuracy of standard machine learning approaches. Our results also indicate that future prediction models should exploit circumstantial evidence from related data sources in addition to standard toxicogenomics data sets. We anticipate that efforts in data analysis hold promise for replacing animal studies with in vitro assays and for predicting the outcome of liver injuries in humans using toxicogenomics data from animals.

 

Last Updated on Thursday, 09 July 2015 15:08
 

Numerical Analysis of Matrix Functions

I have recently spent some time studying matrix functions, from both theoretical and computational perspectives. There is a nice book by Nick J. Higham on functions of matrices, which I highly recommend to the interested reader. It provides a thorough overview of current theoretical results on matrix functions and several efficient numerical methods for computing them. Another well-written text is by Rajendra Bhatia on matrix analysis (Graduate Texts in Mathematics), which includes topics such as the theory of majorization, variational principles for eigenvalues, operator monotone and convex functions, matrix inequalities and perturbation of matrix functions. Bhatia's book is more functional analytic in spirit, whereas Higham's book focuses more on numerical linear algebra.

Below you will find a report that I produced, which contains a few interesting (some elementary) proofs and implementations of algorithms. The interested reader should consult the literature above to be able to follow the text.
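
As a small taste of what computing a matrix function means, the following sketch evaluates the matrix exponential by a truncated Taylor series, exp(A) = I + A + A^2/2! + ... It is not taken from the report, the function names are my own choice for illustration, and the plain Taylor series is used only because it is the simplest method to read; it is not a numerically robust choice for matrices with large norm.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp_taylor(a, terms=30):
    """Approximate exp(A) by summing the first `terms` Taylor terms A^k / k!."""
    n = len(a)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]  # holds A^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result
```

For the nilpotent matrix `[[0, 1], [0, 0]]` the series terminates after two terms, so the result is exactly the identity plus the matrix itself.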

Last Updated on Sunday, 25 August 2013 21:36
 

Topological Concepts in Machine Learning @ACAT Summer School

I gave a talk at the ACAT Summer School on computational topology and topological data analysis, held at the University of Ljubljana.

Abstract: Fast growth in the amount of data emerging from studies across various scientific disciplines and engineering requires alternative approaches to understanding large and complex data sets in order to turn data into useful knowledge. Topological methods are making an increasing contribution to revealing patterns and shapes of high-dimensional data sets. Ideas such as studying shapes in a coordinate-free way, compressed representations and invariance to data deformations are important when one is dealing with large data sets. In this talk we consider which key concepts make topological methods appropriate for data analysis and survey some machine learning techniques proposed in the literature that exploit them. We illustrate their utility with examples from computational biology, text classification and data visualization.

Slides (in English).

Last Updated on Thursday, 25 June 2015 14:41
 

BioDay: Trends in Bioinformatics @Hekovnik

In May I participated in the first BioDay meeting organized by Hekovnik in Ljubljana, Slovenia. The aim of the BioDay events is to exchange ideas and knowledge and to foster collaboration and networking among life scientists, computer scientists, bioinformaticians, mathematicians and physicists.

The first event focused on recent trends in bioinformatics, specifically on experimental methods in systems biology (by Spela Baebler, PhD) and biomedical data fusion. I presented the latter topic and discussed how heterogeneous data sources in biology can be collectively mined by data fusion. The video of the event is available at video.hekovnik.com/bioday_trendi_v_bioinformatiki (in Slovene). Enjoy!

Last Updated on Sunday, 25 August 2013 21:33
 

Winning BioNLP Challenge 2013: Extracting Gene Regulation Network

I recently participated in the BioNLP Shared Task 2013 Challenge together with Slavko Zitnik, and we won first place in the task of extracting gene regulation networks.

The goal of the challenge was to assess how well information extraction systems can extract a gene regulation network of a specific cellular function in Bacillus subtilis. This function is sporulation, which is related to the adaptation of bacteria to scarce resource conditions. The automatic reconstruction of gene regulation networks is of great importance in biology because it furthers the understanding of cellular regulation systems.

We were provided with a manually curated annotation of the training corpus, including entities, events and relations with gene interactions. The regulation network that can be reconstructed from the interactions mentioned in the sentences of the training data was also provided (picture on the right). The task required us to estimate a gene regulation network from test data by specifying a directed graph where vertices represent genes and arcs represent interactions between genes extracted from the text. The arcs were labeled with an interaction type (e.g., inhibition, activation, binding, transcription).
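
The expected output, a directed graph with labeled arcs, can be pictured with a small sketch. The gene names, the triples and the function names below are hypothetical placeholders, not data from the challenge, and the representation is just one simple way to hold such a network in memory.

```python
from collections import defaultdict

# Hypothetical extracted interactions: (source gene, target gene, interaction type).
interactions = [
    ("geneA", "geneB", "activation"),
    ("geneA", "geneC", "inhibition"),
    ("geneB", "geneC", "binding"),
]


def build_network(triples):
    """Store the directed graph as source -> {target: interaction type}."""
    network = defaultdict(dict)
    for source, target, kind in triples:
        network[source][target] = kind
    return network


def arcs(network):
    """List all labeled arcs as sorted (source, target, type) triples."""
    return sorted((source, target, kind)
                  for source, targets in network.items()
                  for target, kind in targets.items())


network = build_network(interactions)
```

Here `network["geneA"]["geneC"]` gives the label of the arc from geneA to geneC, and `arcs(network)` recovers the full arc list, which is the form in which predicted and reference networks can be compared.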

We hope to describe our approach using conditional random fields and rules in a paper but the details are not public yet (stay tuned).

 

P.S. I have been accepted to the Machine Learning Summer School (MLSS) 2013 (acceptance rate 26%), which will take place at the Max Planck Institute for Intelligent Systems, Tübingen, Germany, late in August this year. There is a list of highly acclaimed speakers and I am looking forward to it!

Last Updated on Sunday, 25 August 2013 21:40
 

ACM HQ, NY, XRDS Editorial Meeting

Some of the ACM XRDS Editors are meeting these days to discuss the magazine's future direction in print and online. We will do our best to further promote XRDS, enhance its departments, improve its web presence, build a community of readers, and provide high-quality content from various CS disciplines.

See the current ACM XRDS issue on Information and Communication Technologies and Development (ICTD). The Spring issue is coming very soon! It will be on Scientific Computing. And, the Summer issue will be on Computer Science and Creativity.

If you are interested in submitting an article or have some crazy good ideas, please share them with us (contact the editorial staff). Or, share with your colleagues any interesting columns/featured articles you read in ACM XRDS.

XRDS is the ACM's flagship magazine for students, established in 1994. It is published quarterly and invites submissions of high quality articles of interest to computer science students (from Editorial Calendar).

See http://xrds.acm.org.

Last Updated on Thursday, 09 July 2015 15:09
 

Preseren Award of U of Ljubljana, 2012

In the first week of December, U of Ljubljana celebrates the traditional "Week of University" (Why?), during which numerous invited lectures, presentations and award ceremonies are organized.

This year, I was awarded the faculty award for best students and the Preseren Award of U of Ljubljana for the thesis "A Matrix Factorization Approach for Inference of Prediction Models from Heterogeneous Data Sources" (slo: univerzitetna Prešernova nagrada za delo "Pristop matrične faktorizacije za gradnjo napovednih modelov iz heterogenih podatkovnih virov"). I would like to thank my supervisor and mentor Prof. dr. Blaž Zupan for the encouragement and advice he has provided throughout my time as his student. I am lucky to have a supervisor who cares so much about my work and responds to my questions promptly. I could not have won the award without his support and mentoring.

Last Updated on Sunday, 30 March 2014 16:37
 

