Marinka Zitnik

Fusing bits and DNA

Google Global Planning Committee for Women in Computer Science

I have been given the opportunity to join the Google Global Planning Committee for Women in Computer Science, an effort to identify ways to have the greatest impact and reach more women in tech. As a member of this committee, I will partner with Google to build the community and direct outreach activities for women in computer science. To kick things off, we will hold our global meeting at the Grace Hopper Conference in Phoenix, AZ, USA. I am excited to be part of this great program, which encourages women to excel in computer science and information technology.

Stay tuned: there will be many opportunities to engage with fellow technologists!

Last Updated on Tuesday, 16 September 2014 16:25
 

@Stanford University, Department of Computer Science

I am visiting the Department of Computer Science at Stanford University, CA, USA, in the summer and fall of 2014. During my stay we will study the interplay between network analysis, data integration and biology. There are many exciting challenges to explore in these areas, and I am very enthusiastic about the work.

Last Updated on Thursday, 21 August 2014 05:53
 

ISMB 2014: Epistasis-Based Gene Network Inference

I presented our recent approach for epistasis-based gene network inference at ISMB 2014. We propose a factorized model of interactions that is used to score different types of gene-gene relationships, such as epistasis, parallelism and partial interdependence, and to assemble gene networks that are consistent with the estimated pairwise relationships. A detailed derivation of the method and its empirical comparison with existing approaches are described in our paper published in Bioinformatics.

Last Updated on Thursday, 09 July 2015 15:08
 

CAMDA 2014: Survival Regression by Data Fusion

At CAMDA 2014, I presented an extension of our recent matrix factorization-based data fusion approach that couples data fusion with survival regression. CAMDA 2014 ran as a satellite meeting of ISMB 2014 in Boston, MA, USA. Our presentation received the CAMDA best presentation award.

Any knowledge discovery could in principle benefit from the fusion of directly or even indirectly related data sources. In this work, we explore whether a recently proposed data fusion approach based on simultaneous matrix factorization can be adapted for survival regression. We propose a new method that jointly infers latent factors by data fusion and estimates the regression coefficients of a survival model. We applied the method to the CAMDA 2014 large-scale Cancer Genomes Challenge and modeled survival time as a function of gene, protein and miRNA expression data, and data on methylated and mutated regions. We find that both the joint inference of factors and regression coefficients, on the one hand, and the data fusion procedure, on the other, are crucial for performance. Our approach is substantially more accurate than the baseline Aalen's additive model. The latent factors inferred by our approach can be mined further; we found that the most informative factors are related to known cancer processes.

Last Updated on Thursday, 09 July 2015 15:08
 

Gene network inference by probabilistic scoring of relationships from a factorized model of interactions

Bioinformatics just published a special issue devoted to ISMB 2014 proceedings papers that will be presented next month at the world's premier conference on computational biology -- ISMB 2014 in Boston, MA, USA.

Our paper, Gene network inference by probabilistic scoring of relationships from a factorized model of interactions, which you will find in this issue of Bioinformatics, describes Red, a conceptually new probabilistic approach to gene network inference from quantitative interaction data. Red is founded on epistasis analysis, an essential tool of classical genetics for inferring the order of function of genes in a common pathway. Typically, epistasis analysis considers single and double mutant phenotypes and, for a pair of genes, observes whether a mutation in the first gene masks the effects of a mutation in the second gene. Despite the recent emergence of biotechnology techniques that can provide gene interaction data on a large, possibly genomic, scale, very few methods are available for quantitative epistasis analysis and epistasis-based network reconstruction.

The key features of Red are the joint treatment of mutant phenotype data with a factorized model and the probabilistic scoring of pairwise gene relationships inferred from the latent gene representation. The resulting gene network is assembled from the scored pairwise relationships. In an experimental study, we show that the proposed approach can accurately reconstruct several known pathways and that it surpasses the accuracy of current approaches.
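
To make the classical single/double mutant comparison concrete, here is a minimal sketch of the masking test described above. It is a toy rule and not the Red model (Red scores relationships probabilistically from a factorized model); the function name `classify_pair`, the tolerance `tol` and the phenotype values are assumptions made only for illustration.

```python
def classify_pair(p_a, p_b, p_ab, tol=0.1):
    """Classify a gene pair from single (p_a, p_b) and double (p_ab) mutant phenotypes.

    Toy rule (an assumption, not Red): if the double mutant looks like one of
    the single mutants (within tol), that gene masks the other one.
    """
    like_a = abs(p_ab - p_a) <= tol
    like_b = abs(p_ab - p_b) <= tol
    if like_a and like_b:
        return "indistinguishable"   # single mutants look alike; no order can be read off
    if like_a:
        return "a epistatic to b"    # double mutant resembles the a mutant
    if like_b:
        return "b epistatic to a"    # double mutant resembles the b mutant
    return "no masking"


# Hypothetical phenotype measurements on an arbitrary scale.
print(classify_pair(0.2, 0.9, 0.25))
print(classify_pair(0.2, 0.9, 0.5))
```

A hard threshold like this ignores measurement noise, which is one reason a probabilistic scoring of relationships is attractive for real interaction data.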

Last Updated on Wednesday, 13 August 2014 05:21
 

ACM XRDS: Exploring Data with Topological Tools

The Summer issue of ACM XRDS is here! This issue focuses on diversity in computer science. You will find columns about how to make tech more inclusive, women in computing, self-teaching and how hip-hop lyrics can be used in combination with artificial intelligence to engage more students in computer science. Also, you should not miss the Features section! There you will learn, among other things, about a research project in Germany that integrates gender and diversity in STEM fields, and read about how neuroscience has revealed that we sometimes judge others by their gender or ethnicity without even realizing it. What can be done to address these issues? Check out ACM XRDS's advice.

For the computationally inspired among you, I have contributed a column that describes one of many possible uses of computational topology for exploratory data analysis. Tools from topology increasingly serve to inspire the development of novel computational methods for data analysis. With these methods we can study qualitative geometric information about the data to understand how they are organized on a large scale, and focus on intrinsic shape properties rather than on characteristics that depend on a particular choice of coordinate system. The column applies a topological tool called Mapper to extract and visualize simple descriptions of data sets.

Last Updated on Friday, 21 August 2015 15:01
 

Young Researcher in the Heidelberg Laureate Forum 2014

I have been selected to participate as a young researcher in the Heidelberg Laureate Forum 2014 (HLF). The Forum will take place in September and will bring together winners of the Abel Prize and Fields Medal (mathematics) as well as the Turing Award and Nevanlinna Prize (computer science) with young researchers from around the world, selected by an international committee of experts primarily from the award-granting organizations. I was fortunate to be given the opportunity to be one of 200 young researchers (there are 100 places for each of the two disciplines, mathematics and computer science) who will take part in this Forum.

The HLF is inspired by the Lindau Nobel Laureate Meetings, which provide a forum where people dedicated to science, both role models and young researchers in physics, chemistry and life sciences, can interact. That event spawned the idea of creating something similar for mathematics and computer science. The list of participating Laureates is impressive and includes, among others, Manuel Blum, Stephen Cook, Antony Hoare, John Hopcroft, Leslie Lamport, John Torrence Tate and Wendelin Werner. I am looking forward to meeting these distinguished experts from both disciplines and learning many new things.

Last Updated on Friday, 21 August 2015 16:06
 

ACM XRDS: Efficient Sensor Placement for Environmental Monitoring

The Spring 2014 issue of XRDS: Crossroads, the ACM magazine for students, is about cyber-physical systems.

My XRDS department contributed a column on efficient sensor placement for environmental monitoring. The column is about the important problem of observation selection, which has received considerable research attention in recent years. Consider, for example, air quality monitoring in a large research lab, the monitoring of algae biomass in a lake, or the placement of a network of sensors in a water distribution system for early detection of contaminants. In all these settings we have to decide where to place the sensors in order to collect information about the environment effectively. Since acquiring observations is typically expensive and the budget is limited, we want to select a small number of the most informative locations for monitoring. Thus, we usually trade off the informativeness of sensor measurements against the cost of data acquisition. The column gives an example of a large sensor deployment in a research lab and applies tools of submodular optimization to tackle the task effectively, with theoretical performance guarantees of near-optimal observation selection.
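
The greedy strategy behind such guarantees can be sketched as follows. This is an illustrative example under stated assumptions, not the code from the column: it uses simple set coverage as a stand-in for the informativeness objective, and the function name `greedy_placement`, the location names and the zones are hypothetical. Set coverage is monotone submodular, so under a budget on the number of sensors the greedy choice is guaranteed to reach at least a (1 - 1/e) fraction of the best achievable coverage.

```python
def greedy_placement(coverage, budget):
    """Pick up to `budget` locations greedily by marginal coverage gain.

    coverage: dict mapping a candidate location to the set of zones it observes
    (an assumed, simplified stand-in for an informativeness measure).
    """
    chosen, covered = [], set()
    for _ in range(budget):
        best, best_gain = None, 0
        for loc in sorted(coverage):              # sorted for deterministic tie-breaking
            if loc in chosen:
                continue
            gain = len(coverage[loc] - covered)   # marginal gain of adding loc
            if gain > best_gain:
                best, best_gain = loc, gain
        if best is None:                          # no location adds new coverage
            break
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered


# Hypothetical lab layout: candidate sensor spots and the zones each can observe.
lab = {
    "door": {"z1", "z2"},
    "window": {"z2", "z3", "z4"},
    "server": {"z4", "z5"},
    "kitchen": {"z1", "z5", "z6"},
}
chosen, covered = greedy_placement(lab, budget=2)
print(chosen, sorted(covered))
```

Each round re-evaluates the marginal gain of every remaining location, which is what lets the diminishing-returns property of the objective translate into a performance guarantee.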

Last Updated on Friday, 21 August 2015 15:01
 

@RECOMB 2014, Pittsburgh, PA (Part II)

We are presenting a poster about our recent data fusion methodology (ArXiv preprint) at the RECOMB conference. Thanks to Prof. Blaz Zupan for the storyline and Prof. Richard H. Kessin for valuable comments. xkcd.com served as the inspiration for the poster design (HiRes). See also the other post (Part I) about our RECOMB paper.

Best Poster Award at RECOMB 2014!

Last Updated on Sunday, 14 June 2015 10:52
 

@RECOMB 2014, Pittsburgh, PA (Part I)

Our paper, Imputation of Quantitative Genetic Interactions in Epistatic MAPs by Interaction Propagation Matrix Completion, has been accepted to RECOMB 2014.

Epistatic Miniarray Profile (E-MAP) is a popular large-scale gene interaction discovery platform. E-MAPs benefit from quantitative output, which makes it possible to detect subtle interactions. However, due to the limits of biotechnology, E-MAP studies fail to measure genetic interactions for up to 40% of gene pairs in an assay. Missing measurements can be recovered by computational techniques for data imputation, thus completing the interaction profiles and enabling downstream analysis algorithms that could otherwise be sensitive to largely incomplete data sets. In the paper, we introduce a new interaction data imputation method called interaction propagation matrix completion (IP-MC). The core part of IP-MC is a low-rank (latent) probabilistic matrix completion approach that considers additional knowledge presented through a gene network. IP-MC assumes that interactions are transitive, such that latent gene interaction profiles depend on the profiles of their direct neighbors in a given gene network. As the IP-MC inference algorithm progresses, the latent interaction profiles propagate through the branches of the network. In a study with three different E-MAP data assays and the considered protein-protein interaction and Gene Ontology similarity networks, IP-MC significantly surpassed existing alternative techniques. Inclusion of information from gene networks also allows IP-MC to predict interactions for genes that were not included in original E-MAP assays, a task that could not be considered by current imputation approaches.

The presentation is available on Prezi.

Last Updated on Wednesday, 02 April 2014 21:48
 

@Pacific Symposium on Biocomputing 2014, Hawaii

I am participating in PSB 2014, the Pacific Symposium on Biocomputing, an international conference on current research in the theory and application of computational methods to problems of biological significance, which is held on the Big Island of Hawaii.

Our paper, Matrix factorization-based data fusion for gene function prediction in baker's yeast and slime mold, has been accepted to PSB. In the paper, we examine the applicability of our recently proposed matrix factorization-based data fusion approach to the problem of gene function prediction. We studied three fusion scenarios to demonstrate the high accuracy of our approach when learning from disparate, incomplete and noisy data. The studies were carried out for two different organisms: the protein-protein interaction network for yeast is nearly complete but noisy, whereas the sets of available interactions for slime mold are rather sparse and only about one-tenth of its genes have experimentally derived annotations.

Last Updated on Monday, 07 December 2015 21:17
 

@Baylor College of Medicine, Department of Molecular and Human Genetics

Between December 2013 and August 2014 I am visiting the Department of Molecular and Human Genetics at Baylor College of Medicine, Houston, TX, USA. During my stay we will do research on computational methods for data fusion and their applications in systems biology. We will investigate our recently developed data fusion algorithms and apply them to tasks such as gene function prediction, gene ranking (prioritization), missing value imputation, association mining and inference of gene networks from mutant data. I anticipate that large-scale applications of our methods may provide valuable feedback on whether such functionality is useful for the biological community, as well as new insights into the correspondence between biological and algorithmic concepts.

Last Updated on Sunday, 14 June 2015 10:52
 

ACM XRDS: On Constructing the Tree of Life

The Winter 2013 issue of XRDS: Crossroads, the ACM magazine for students, features the latest in wearable computing, such as wearable brain-computer interfaces, human motion capture, tracking how we read, augmented reality and airwriting. The issue also offers a fascinating insider's look at what a Google technical interview is all about. Check it out!

I contributed a column on constructing, interpreting and visualizing phylogenetic trees, diagrams of relatedness between organisms, species, or genes that show a history of descent from common ancestry. As more and more life sciences data become freely available in public databases, some of the analyses that would have been performed only in well-equipped research laboratories just a few years ago are now accessible to any interested individual with a commodity computer. Such a shift was only possible because of unprecedented technological and theoretical advances across a broad spectrum of science and technology. Check it out!

Last Updated on Friday, 21 August 2015 15:00
 

Press Coverage of Our Recent Study About Connections Between Human Diseases

BioTechniques, The International Journal of Life Science Methods, highlighted our recent paper, Discovering disease-disease associations by fusing systems-level molecular data, which was published in Nature's Scientific Reports. In the paper we applied our novel computational approach for data fusion to a plethora of molecular data in order to discover disease-disease associations.

The complete article featuring our study and a commentary by the paper's senior author, Prof. Blaz Zupan, PhD, are available at the BioTechniques site.

Last Updated on Sunday, 30 March 2014 16:37
 

Discovering Disease-Disease Associations by Fusing Molecular Data

Nature's Scientific Reports has published our latest paper on data fusion, Discovering disease-disease associations by fusing systems-level molecular data, in which we combine various sources of biological information to discover human disease-disease associations.

The advent of genome-scale genetic and genomic studies allows new insight into disease classification. Recently, a shift was made from linking diseases simply based on their shared genes towards systems-level integration of molecular data. We aim to find relationships between diseases based on evidence from fusing all available molecular interaction and ontology data. We propose a multi-level hierarchy of disease classes that significantly overlaps with existing disease classification. In it, we find 14 disease-disease associations currently not present in Disease Ontology and provide evidence for their relationships through comorbidity data and literature curation. Interestingly, even though the number of known human genetic interactions is currently very small, we find they are the most important predictor of a link between diseases. Finally, we show that omitting any one of the included data sources reduces prediction quality, further highlighting the importance of the paradigm shift towards systems-level data fusion. Check it out!

Last Updated on Wednesday, 15 June 2016 22:07
 

ACM XRDS: Zero-Knowledge Proofs

The Fall 2013 issue of XRDS: Crossroads, the ACM magazine for students, is about the complexities of privacy and anonymity.

The issue is motivated by the current research problems and recent societal concerns about digital privacy. When real and digital worlds collide things can get messy. Complicated problems surrounding privacy and anonymity arise as our interconnected world evolves technically, culturally, and politically. But what do we mean by privacy? By anonymity? Inside this issue there are contributions from lawyers, researchers, computer scientists, policy makers, and industry heavyweights all of whom try to answer the tough questions surrounding privacy, anonymity, and security. From cryptocurrencies to differential privacy, the issue looks at how technology is used to protect our digital selves, and how that same technology can expose our vulnerabilities causing lasting, real-world effects. Check it out!

The department that I am responsible for contributed a column on zero-knowledge proofs. A zero-knowledge proof allows one person to convince another person of some statement without revealing any information about the proof other than the fact that the statement is indeed true. Zero-knowledge proofs are of practical and theoretical interest in cryptography and mathematics. They achieve the seemingly contradictory goal of proving a statement without revealing why it is true. In the column we describe interactive proof systems and some implications that zero-knowledge proofs have for complexity theory. We conclude with an application of zero-knowledge proofs in cryptography, the Fiat-Shamir identification protocol, which is the basis of current zero-knowledge entity authentication schemes. Check it out!
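
As a rough illustration of how the basic Fiat-Shamir identification protocol works, here is a minimal sketch. It is not taken from the column: the function names are hypothetical, and the modulus 61 * 53 is a toy value chosen only so the numbers stay readable (a real deployment needs a large modulus whose factors are kept secret). The prover knows a secret s with public value v = s^2 mod n; in each round the prover commits to x = r^2 mod n, the verifier sends a challenge bit e, and the prover answers y = r * s^e mod n, which the verifier checks against y^2 = x * v^e (mod n).

```python
import secrets
from math import gcd


def keygen(n):
    """Pick a secret s coprime to n and publish v = s^2 mod n."""
    while True:
        s = secrets.randbelow(n - 2) + 2          # s in [2, n-1]
        if gcd(s, n) == 1:
            return s, pow(s, 2, n)


def commit(n):
    """Prover: choose a random r and send the commitment x = r^2 mod n."""
    r = secrets.randbelow(n - 1) + 1              # r in [1, n-1]
    return r, pow(r, 2, n)


def respond(r, s, e, n):
    """Prover: answer challenge bit e with y = r * s^e mod n."""
    return (r * pow(s, e, n)) % n


def verify(x, y, e, v, n):
    """Verifier: accept iff y^2 == x * v^e (mod n)."""
    return pow(y, 2, n) == (x * pow(v, e, n)) % n


def identify(s, v, n, rounds=20):
    """Run several independent rounds; an honest prover passes all of them."""
    for _ in range(rounds):
        r, x = commit(n)
        e = secrets.randbelow(2)                  # verifier's random challenge bit
        if not verify(x, respond(r, s, e, n), e, v, n):
            return False
    return True


n = 61 * 53                                       # toy modulus, NOT secure
s, v = keygen(n)
ok = identify(s, v, n)
print(ok)
```

Someone who does not know s can prepare a commitment that answers one of the two challenges but not both, so each round catches a cheater with probability 1/2, and repeating the round drives the chance of a successful impersonation down exponentially.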

Last Updated on Friday, 21 August 2015 15:00
 

MLSS 2013, Max Planck Institute for Intelligent Systems, Tübingen

This year I am participating in the Machine Learning Summer School (MLSS), which is held in Tübingen, Germany. The Summer School offers an opportunity to learn about fundamental and advanced aspects of machine learning, data analysis and inference from leaders of the field. Topics are diverse and include graphical models, multilayer networks, cognitive and kernel learning, network modeling and information propagation, distributed ML, structured-output prediction, reinforcement learning, sparse models, learning theory, causality and much more. I am looking forward to it. Also, posters are a long-standing tradition at the MLSS. Below is an image of a poster presentation that covers some of my recent work.

 

Last Updated on Thursday, 09 July 2015 15:09
 

Extracting Gene Regulation Networks Using Linear-Chain Conditional Random Fields and Rules @ACL 2013, BioNLP Workshop

This week Slavko Zitnik will present our paper (he is the first author) at the ACL BioNLP Workshop. The paper extends linear-chain conditional random fields (CRF) with skip-mentions to extract gene regulatory networks from biomedical literature, and describes a sieve-based system architecture, a complete data processing pipeline that includes data preparation, linear-chain CRF and rule-based relation detection, and data cleaning.

Published literature in molecular genetics may collectively provide much information on gene regulation networks. Dedicated computational approaches are required to sift through large volumes of text and infer gene interactions. We propose a novel sieve-based relation extraction system that uses linear-chain conditional random fields and rules. Also, we introduce a new skip-mention data representation to enable distant relation extraction using first-order models. To account for a variety of relation types, multiple models are inferred. The system was applied to the BioNLP 2013 Gene Regulation Network Shared Task. Our approach was ranked first of five, with a slot error rate of 0.73.
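
For readers unfamiliar with the metric, the slot error rate is commonly defined as the number of substitutions, deletions and insertions divided by the number of reference slots, so lower is better and values above 1 are possible. The sketch below uses that general definition; the function name and the counts are hypothetical, and the shared task's exact rules for matching predicted and reference interactions are not reproduced here.

```python
def slot_error_rate(substitutions, deletions, insertions, reference_slots):
    """Slot error rate: (S + D + I) / N, where N is the number of reference slots."""
    if reference_slots <= 0:
        raise ValueError("reference_slots must be positive")
    return (substitutions + deletions + insertions) / reference_slots


# Hypothetical counts: 2 wrongly typed interactions, 5 missed, 1 spurious,
# measured against 16 reference interactions.
print(slot_error_rate(2, 5, 1, 16))
```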

Presentation slides.

Last Updated on Sunday, 25 August 2013 21:40
 

ISMB/ECCB 2013 - 21st International Conference on Intelligent Systems for Molecular Biology & 12th European Conference on Computational Biology

I participated in the CAMDA Satellite Meeting on critical assessment of massive data analysis on 19th and 20th July at ISMB in Berlin, where I presented our matrix factorization-based data fusion approach to predicting drug-induced liver injury from toxicogenomics data sets and circumstantial evidence from related data sources. The outcome was positive and our work was recognized as excellent research.

The main conference days of the 21st Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and the 12th European Conference on Computational Biology (ECCB) were in Berlin, 21st to 23rd July. Overall, the meeting was enjoyable and the talks offered novel insights from both computational and biological perspectives. As a side note, in 2014 ISMB and ECCB will be organized separately; the ISMB conference will be in July in Boston and the ECCB meeting will be in September in Strasbourg.

Here, I list some of the talks I attended at ISMB/ECCB. At times it was difficult to pick the most interesting talk because of the nine parallel sessions. Note that only the presenting authors are listed here.

First day:

  • Simple topological properties predict functional misannotations in a metabolic network (J. Pinney).
  • Of men and not mice. Comparative genome analysis of human diseases and mouse models (W. Xiao).
  • Integration of heterogeneous -seq and -omics data sets: ongoing research and development projects at CLC bio (M. Lappe). Technology track.
  • System based metatranscriptomic analysis (X. Xiong).
  • Integrative analysis of large scale data (M. Spivakov, S. Menon). Workshop track.
  • Multi-task learning for host-pathogen interactions (M. Kshirsagar).
  • Integrative modelling coupled with mass spectrometry-based approaches reveals the structure and dynamics of protein assemblies (A. Politis).
  • Synthetic lethality between gene defects affecting a single non-essential molecular pathway with reversible steps (I. Kupperstein).

Second day:

  • KeyPathwayMiner - extracting disease specific pathways by combining omics data and biological networks (J. Baumbach). Technology track.
  • Compressive genomics (M. Baym).
  • Predicting drug-target interactions using restricted Boltzmann machines (J. Zeng).
  • Efficient network-guided multi-locus association mapping with graph cuts (C. Azencott).
  • Differential genetic interactions of S. cerevisiae stress response pathways (P. Beltrao). Special session on dynamic interaction networks.
  • Coordination of post-translational modifications in human protein interaction networks (J. Woodsmith). Special session on dynamic interaction networks.
  • Prediction and analysis of protein interaction networks (A. Valencia). Special session on dynamic interaction networks.
  • Characterizing the context of human protein-protein interactions for an improved understanding of drug mechanism of action (M. Kotlyar). Special session on dynamic interaction networks.
  • GPU acceleration of bioinformatics pipeline (M. Berger and a team from NVIDIA).

Third day:

  • Using the world's public big data to find novel uses for drugs (P. Bourne).
  • A top-down systems biology approach to novel therapeutic strategies (P. Aloy).
  • A large-scale evaluation of computational protein function prediction (P. Radivojac).
  • Deciphering the gene expression code via a combined synthetic computational biology approach (T. Tuller).
  • Interplay of microRNAs, transcription factors and genes: linking dynamic expression changes to function (P. Nazarov).
  • Visual analytics, the human back in the loop (J. Aerts).
  • Turning networks into ontologies of gene function (J. Dutkowski).
  • A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text (S. Ananiadou).

I enjoyed the keynote talks:

  • How chromatin organization and epigenetics talk with alternative splicing (G. Ast).
  • Insights from sequencing thousands of human genomes (G. Abecasis).
  • Sequencing based functional genomics (analysis) (L. Pachter).
  • Searching for signals in sequences (G. Stormo).
  • Results may vary. What is reproducible? Why do open science and who gets the credit? (C. A. Goble).
  • Protein interactions in health and disease (D. Eisenberg).

Twitter was quite lively as well. The official hashtag was #ISMBECCB, and at one point it was even trending. Check the archive: the tweets captured important insights and take-away messages from the talks, as well as some entertaining ideas such as the unofficial ISMB Bingo card by @jonathancairns.

Last Updated on Thursday, 25 July 2013 19:50
 

CAMDA 2013: Matrix Factorization-Based Data Fusion for Drug-Induced Liver Injury Prediction

This work received the first prize for excellent research at the ISMB/ECCB CAMDA 2013 conference.

I am giving a talk at the CAMDA 2013 conference, which runs as a satellite meeting of the ISMB/ECCB 2013 conference. CAMDA focuses on challenges in the analysis of the massive data sets that are increasingly produced in several fields of the life sciences. The conference offers researchers from computer science, statistics, molecular biology and other fields a unique opportunity to benefit from a critical comparative evaluation of the latest approaches in the analysis of life sciences' “Big Data”.

Currently, the Big Data explosion is the grand challenge in the life sciences. Analysing large data sets is emerging as one of the key scientific techniques of the post-genomic era. Still, the data analysis bottleneck prevents new biotechnologies from providing new medical and biological insights on a larger scale. The trend towards analysing massive data sets is further accelerated by novel high-throughput sequencing technologies and the increasing size of biomedical studies. CAMDA provides new approaches and solutions to the big data problem and presents new techniques in bioinformatics, data analysis and statistics for handling and processing large data sets. This year, CAMDA's scientific committee set up two challenges: the prediction of drug compatibility from an extremely large toxicogenomic data set, and the decoding of genomes from the Korean Personal Genome Project.

The keynote talks were given by Atul Butte from Stanford University School of Medicine and Nikolaus Rajewsky from the Max-Delbrück-Center for Molecular Medicine in Berlin. Atul Butte talked about translational bioinformatics and emphasized the importance of converting molecular, clinical and epidemiological data into diagnostics and therapeutics to ease bench-to-bedside translation. Nikolaus Rajewsky presented his group's work on circular RNAs and findings on RNA-protein interactions.

I was involved in the prediction of drug compatibility from an extremely large toxicogenomic data set, which addresses two of the most important questions in toxicology. We investigated whether animal studies can be replaced with in vitro assays and whether liver injuries in humans can be predicted using toxicogenomics data from animals.

In this work, we demonstrate that data fusion allows us to simultaneously consider all the available data for outcome prediction of drug-induced liver injury. Its models can surpass the accuracy of standard machine learning approaches. Our results also indicate that future prediction models should exploit circumstantial evidence from related data sources in addition to standard toxicogenomics data sets. We anticipate that efforts in data analysis hold promise for replacing animal studies with in vitro assays and for predicting the outcome of liver injuries in humans using toxicogenomics data from animals.

 

Last Updated on Thursday, 09 July 2015 15:08
 

Numerical Analysis of Matrix Functions

I have recently spent some time studying matrix functions, from both theoretical and computational perspectives. There is a nice book by Nick J. Higham on functions of matrices, which I highly recommend to the interested reader. It provides a thorough overview of current theoretical results on matrix functions and several efficient numerical methods for computing them. Another well-written text is Rajendra Bhatia's book on matrix analysis (Graduate Texts in Mathematics), which includes topics such as the theory of majorization, variational principles for eigenvalues, operator monotone and convex functions, matrix inequalities and perturbation of matrix functions. Bhatia's book is more functional analytic in spirit, whereas Higham's book focuses more on numerical linear algebra.

Below you will find a report that I produced, which contains a few interesting (some elementary) proofs and implementations of algorithms. Interested readers should consult the literature above to be able to follow the text.
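
To give a flavour of what computing a matrix function looks like in practice, here is a small sketch of the matrix exponential. It is not taken from the report: it is a simplified variant of the scaling and squaring idea in which a truncated Taylor series stands in for the Padé approximants used in production implementations, and the helper names and the default of 18 terms are choices made for this example only.

```python
import math


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def expm(a, terms=18):
    """Approximate exp(A) for a small square matrix A (list of rows)."""
    n = len(a)
    # Scaling: pick s so that the 1-norm of A / 2^s is at most 1/2.
    norm = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))
    s = 0
    while norm > 0.5:
        norm /= 2.0
        s += 1
    scaled = [[x / 2.0 ** s for x in row] for row in a]
    # Truncated Taylor series of exp(B): I + B + B^2/2! + ... + B^terms/terms!
    result = identity(n)
    term = identity(n)
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    # Squaring: exp(A) = (exp(A / 2^s))^(2^s).
    for _ in range(s):
        result = matmul(result, result)
    return result


# exp of the rotation generator [[0, -t], [t, 0]] is a rotation by angle t.
t = 1.0
rotation = expm([[0.0, -t], [t, 0.0]])
print(rotation[0][0], math.cos(t))
```

Scaling first keeps the series short and well behaved, because the Taylor series converges quickly for a matrix of small norm; the repeated squaring then undoes the scaling.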

Last Updated on Sunday, 25 August 2013 21:36
 

Topological Concepts in Machine Learning @ACAT Summer School

I gave a talk at the ACAT Summer School on computational topology and topological data analysis, held at the University of Ljubljana.

Abstract: Fast growth in the amount of data emerging from studies across various scientific disciplines and engineering requires alternative approaches to understanding large and complex data sets in order to turn data into useful knowledge. Topological methods are making an increasing contribution to revealing patterns and shapes of high-dimensional data sets. Ideas such as studying shapes in a coordinate-free way, compressed representations and invariance to data deformations are important when one is dealing with large data sets. In this talk we consider which key concepts make topological methods appropriate for data analysis and survey some machine learning techniques proposed in the literature that exploit them. We illustrate their utility with examples from computational biology, text classification and data visualization.

Slides (in English).

Last Updated on Thursday, 25 June 2015 14:41
 

BioDay: Trends in Bioinformatics @Hekovnik

In May I participated in the first BioDay meeting organized by Hekovnik in Ljubljana, Slovenia. The aim of the BioDay events is to exchange ideas and knowledge and to foster collaboration and networking between life scientists, computer scientists, bioinformaticians, mathematicians and physicists.

The first event focused on recent trends in bioinformatics, specifically on experimental methods in systems biology (by Spela Baebler, PhD) and biomedical data fusion. I presented the latter topic and discussed how heterogeneous data sources in biology can be collectively mined by data fusion. The video of the event is available at video.hekovnik.com/bioday_trendi_v_bioinformatiki (in Slovene). Enjoy!

Last Updated on Sunday, 25 August 2013 21:33
 

Winning BioNLP Challenge 2013: Extracting Gene Regulation Network

I have recently participated in BioNLP Shared Task 2013 Challenge together with Slavko Zitnik and won the first place in the task extracting gene regulation networks.

The goal of the challenge was to assess how well information extraction systems can extract a gene regulation network of a specific cellular function in Bacillus subtilis. This function was sporulation, which is related to the adaptation of bacteria to scarce resource conditions. The automatic reconstruction of gene regulation networks is of great importance in biology because it furthers the understanding of cellular regulation systems.

We were provided with a manually curated annotation of the training corpus, including entities, events and relations with gene interactions. The regulation network that can be reconstructed from the interactions mentioned in the sentences of the training data was also provided (picture on the right). The task was to estimate a gene regulation network from the test data by specifying a directed graph in which vertices represent genes and arcs represent interactions between genes extracted from the text. The arcs were labeled with an interaction type (e.g., inhibition, activation, binding, transcription).
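To make the expected output format more concrete, here is a minimal sketch of such a labeled directed graph in Python. It only illustrates the data structure described above and is not our submitted system. The gene names are placeholders, the helper names `build_network` and `arcs` are hypothetical, and the label set contains only the four interaction types mentioned here (the full label set of the challenge may have been larger).

```python
from collections import defaultdict

# Interaction types named in the task description above; the complete set
# used in the challenge is an assumption and may have been larger.
INTERACTION_TYPES = {"inhibition", "activation", "binding", "transcription"}


def build_network(interactions):
    """Build a directed graph: source gene -> {target gene: set of arc labels}."""
    network = defaultdict(lambda: defaultdict(set))
    for source, label, target in interactions:
        if label not in INTERACTION_TYPES:
            raise ValueError(f"unknown interaction type: {label}")
        # Using a set means repeated mentions of the same arc collapse into one.
        network[source][target].add(label)
    return network


def arcs(network):
    """Return all labeled arcs as sorted (source, label, target) triples."""
    return sorted(
        (source, label, target)
        for source, targets in network.items()
        for target, labels in targets.items()
        for label in labels
    )


# Hypothetical interactions extracted from text; gene names are placeholders.
extracted = [
    ("geneA", "activation", "geneB"),
    ("geneB", "inhibition", "geneC"),
    ("geneA", "binding", "geneC"),
    ("geneA", "activation", "geneB"),  # same arc mentioned in another sentence
]

network = build_network(extracted)
for source, label, target in arcs(network):
    print(f"{source} -[{label}]-> {target}")
```

Storing labels in a set per (source, target) pair allows two genes to be connected by several arcs of different types, while duplicate mentions of the same interaction are counted once.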

We hope to describe our approach, which uses conditional random fields and rules, in a paper, but the details are not public yet (stay tuned).

 

P.S. I have been accepted to the Machine Learning Summer School (MLSS) 2013 (acceptance rate 26%), which will take place at the Max Planck Institute for Intelligent Systems, Tübingen, Germany, in late August this year. There is a list of highly acclaimed speakers and I am looking forward to it!

Last Updated on Sunday, 25 August 2013 21:40
 

ACM HQ, NY, XRDS Editorial Meeting

Some of the ACM XRDS editors are meeting these days to discuss the magazine's future direction in print and online. We will do our best to further promote XRDS, enhance its departments, improve its web presence, build a community of readers, and provide high-quality content from various CS disciplines.

See the current ACM XRDS issue on Information and Communication Technologies and Development (ICTD). The Spring issue, on Scientific Computing, is coming very soon, and the Summer issue will be on Computer Science and Creativity.

If you are interested in submitting an article or have some crazy good ideas, please share them with us (contact the editorial staff). Or, share with your colleagues any interesting columns/featured articles you read in ACM XRDS.

XRDS is the ACM's flagship magazine for students, established in 1994. It is published quarterly and invites submissions of high quality articles of interest to computer science students (from Editorial Calendar).

See http://xrds.acm.org.

Last Updated on Thursday, 09 July 2015 15:09
 

Preseren Award of U of Ljubljana, 2012

In the first week of December, the U of Ljubljana celebrates its traditional "Week of University" (Why?), during which numerous invited lectures, presentations and award ceremonies are organized.

This year, I was awarded the faculty award for best students and the Preseren Award of U of Ljubljana for the thesis "A Matrix Factorization Approach for Inference of Prediction Models from Heterogeneous Data Sources" (slo: univerzitetna Prešernova nagrada za delo "Pristop matrične faktorizacije za gradnjo napovednih modelov iz heterogenih podatkovnih virov"). I would like to thank my supervisor and mentor Prof. dr. Blaž Zupan for the encouragement and advice he has provided throughout my time as his student. I am lucky to have a supervisor who cares so much about my work and responds to my questions promptly. I could not have won the award without his support and mentoring.

Last Updated on Sunday, 30 March 2014 16:37
 

@Imperial College London, Department of Computing (Part II)

The past few days at the Department of Computing, Imperial College London, have been pleasant (though intense), and our efforts in data fusion have produced some very good results.

Below are images taken at Imperial and at the nearby Chrome Web Lab, located in the Science Museum. More about the Google Chrome Web Lab experiment.

Last Updated on Sunday, 25 August 2013 21:31
 

@Imperial College London, Department of Computing (Part I)

I have just arrived in London, United Kingdom, where I will stay until the end of November this year. I will be working at Imperial College London, Department of Computing, in the Computational Network Biology Research Group led by Prof. Dr. Natasa Przulj. For this great opportunity I would like to thank my supervisor Prof. Dr. Blaz Zupan, Head of the Bioinformatics Laboratory at UofLj.

My work here will be about network integration for disease classification, specifically inferring prediction models from heterogeneous data sources through matrix factorization. More about it in the coming days. For now, you can check the Interactive map of the Diseasome, linked from the NYTimes article Redefining Disease, Genes and All. (Below is an image showing a part of the diseasome. The interested reader is referred to Barabasi's paper The Human Disease Network.)

I am very much looking forward to it :)

Last Updated on Sunday, 25 August 2013 21:40
 

@University of Toronto, The 13th International Conference on Systems Biology (Part II)

The 13th International Conference on Systems Biology was held in Toronto, 19-23 August 2012. Here is a list of talks from the platform sessions that I found especially interesting:

  • Modeling the regulatory diversity of human cancers (S. Nelander)
  • Tissue specific modeling of functional genomics data: from networks to understanding human disease (O. Troyanskaya)
  • An evaluation of methods for the modeling of transcription factor sequence specificity (M. T. Weirauch)
  • SEEK and find: data management for systems biology projects (O. Krebs)
  • Excerbt: next-generation knowledge extraction and hypothesis generation from massive amounts of biomedical literature (B. Wachinger)
  • Combining multiple biological domains using patient network fusion (B. Wang)
  • Combining many interaction networks to predict gene function and analyze gene lists (Q. Morris)
  • Assembling global maps of cellular function through integrative analysis of physical and genetic networks (R. K. Srivas)
  • iCAVE: immersive 3d visualization of biomolecular interaction network (Z. Gumus)
  • Systems-level insights from the global yeast genetic interaction network (C. Myers)
  • Monopoly systems edition: advance to GO collect $200 (T. Ideker) (*actually about NeXO, a network-extracted ontology, and functional enrichment)
  • Genome-scale metabolic models: a bridge between bioinformatics and systems biology (J. Nielsen)

The organizers came up with a nice social program, parts of which are shown in the images below. At the opening ceremony Tanya Tagaq, an Inuit woman, performed a unique style of traditional throat singing; Amanda and Rasmus from Sweden performed at the first poster session; and Serena Ryder entertained us at the conference reception dinner. Shonen Knife, a Japanese punk band that opened for Nirvana, played at the second poster session at Hart House.

I attended workshops on Designing experiments using state of the art Bayesian global parameter search methodology (M. Goldstein), Introduction to the statistical inference of regulatory networks (F. Emmert-Streib), and Imaging flow cytometry: a new view on systems biology (R. DeMarco). In addition to the parallel sessions I also enjoyed the special lectures and plenary sessions. A few of them were: Reading and writing omes (G. Church), Towards unification of genetic and hierarchy models of tumor heterogeneity (J. Dick), Interactome networks and human disease (M. Vidal), The genetics of individuals (B. Lehner), Synthetic genetic interaction analysis by high-throughput imaging to map cellular networks, Unraveling principles of gene regulation using thousands of designed promoter sequences (E. Segal), and Systems biology applications of imaging flow cytometry (T. Galitski).

Last Updated on Sunday, 25 August 2013 21:41
 

@University of Toronto, Terrence Donnelly Centre for Cellular and Biomolecular Research (Part III)

So what have I been up to in recent weeks here in Toronto? Highlights include my first ride on the famous American yellow school bus, to a reception at the ICSB12 conference, some sightseeing in the city of Toronto, and a trip to Niagara Falls.

In addition, I have finished the data analysis of real-time microscopy screens of the yeast S. cerevisiae; you can get an idea of it here. I am now starting on time series analysis and will probably have time to work on integrating phenomics data with genetic interaction and protein interaction data.

Recently a quantum optics research group here at UofT demonstrated a violation of Heisenberg's measurement-disturbance relationship (a statement closely tied to the uncertainty principle), and I was really excited about their work. "The quantum world is still full of uncertainty, but at least our attempts to look at it don't have to add as much uncertainty as we used to think!" There is also an easy read to motivate you to learn more.

I have also come upon a nice real-world (I do not like this term) implementation of argument-based machine learning, offered through the classification module in the CellProfiler Analyst package, participated in a discussion about Gaussian processes (intro, notes) at the ccbr-stats meeting, and much more. The Lab organized a farewell lunch for summer students only two weeks after my arrival in Toronto, because classes here and in the US had already begun (after Labour Day). I considered it a welcome event :)

Below are images of Toronto's CN Tower, Niagara Falls as seen from the Skylon Tower, and squirrels on the UofT campus, respectively. (Yes, one cannot miss the numerous squirrels playing in the campus parks. A careful look should reveal four of them.)

Last Updated on Sunday, 25 August 2013 21:42
 

@University of Toronto, Terrence Donnelly Centre for Cellular and Biomolecular Research (Part I)

In the past few days I have settled in Toronto, Canada, where I will stay until October this year. As a graduate student I will be working at the University of Toronto, Terrence Donnelly Centre for Cellular and Biomolecular Research, in Charlie Boone's Lab.

My work will mostly be data analysis of S. cerevisiae screens, employing various statistical and machine learning methods to identify yeast mutant strains with a non-WT phenotype. I may also work on time-series analysis of actin patches in yeast cells to differentiate them. First impressions are great: I have already met some great people and am looking forward to meeting more at the International Conference on Systems Biology (ICSB12), which will be held in Toronto next week and which I have the fortunate opportunity to attend.

Last Updated on Sunday, 30 March 2014 17:21
 

