Marinka Zitnik

Fusing bits and DNA

  • Increase font size
  • Default font size
  • Decrease font size

@Imperial College London, Department of Computing (Part II)

E-mail Print PDF

Recent days at Department of Computing, Imperial College London, were pleasant (though intense) and our efforts in data fusion produced some very good results.

Below are images taken at Imperial and nearby Chrome Web Lab, located in Science Museum. More about Google Chrome Web Lab experiment.

Last Updated on Sunday, 25 August 2013 21:31
 

@Imperial College London, Department of Computing (Part I)

E-mail Print PDF

I have just arrived to London, United Kingdom, where I will stay until the end of November this year. I will be working at Imperial College London, Department of Computing, Computational Network Biology Research Group led by Prof. Dr. Natasa Przulj. For this great opportunity I need to thank to my supervisor Prof. Dr. Blaz Zupan, Head of Bioinformatics Laboratory at UofLj.

My work here will be about network integration for disease classification, specifically inferring prediction models from heterogenous data sources through matrix factorization. More about it in the next days. For now you can check the Interactive map of the Diseasome (Below is an image showing a part of diseasome. Interested reader is referred to Barabasi's paper The Human Disease Network.) linked from the NYTimes article Redefining Disease, Genes and All.

I am very much looking forward to it :)

Last Updated on Sunday, 25 August 2013 21:40
 

@University of Toronto, The 13th International Conference on Systems Biology (Part II)

E-mail Print PDF

The 13th international conference on systems biology was held in Toronto, 19th--23rd August 2012. Here is a list of talks from platform sessions which I found especially interesting:

  • Modeling the regulatory diversity of human cancers (S. Nelander)
  • Tissue specific modeling of functional genomics data: from networks to understanding human disease (O. Troyanskaya)
  • An evaluation of methods for the modeling of transcription factor sequence specificity (M. T. Weirauch)
  • SEEK and find: data management for systems biology projects (O. Krebs)
  • Excerbt: next-generation knowledge extraction and hypothesis generation from massive amounts of biomedical literature (B. Wachinger)
  • Combining multiple biological domains using patient network fusion (B. Wang)
  • Combining many interaction networks to predict gene function and analyze gene lists (Q. Morris)
  • Assembling global maps of cellular function through integrative analysis of physical and genetic networks (R. K. Srivas)
  • iCAVE: immersive 3d visualization of biomolecular interaction network (Z. Gumus)
  • Systems-level insights from the global yeast genetic interaction network (C. Myers)
  • Monopoly systems edition: advance to GO collect $200 (T. Idekar) (*actually about NeXO, a network extracted ontology and functional enrichment)
  • Genome-scale metabolic models: a bridge between bioinformatics and systems biology (J. Nielsen)

The organizers came up with a nice social program, parts of it is depicted on images below. At opening ceremony Tanja Tagaq, an Inuit woman, performed a unique style of traditional throat singing, Amanda and Rasmus from Sweden made performance at first poster session, Serena Ryder entertained us at conference reception dinner. Shonen Knife, a Japanese punk band that opened Nirvana, played at second poster session at Hart House.

I attended workshops on Designing experiments using state of the art Bayesian global parameter search methodology (M. Goldstein), Introduction to the statistical inference or regulatory networks (F. Emmert-Streib), Imaging flow cytometry: a new view on systems biology (R. DeMarco). In addition to parallel sessions I also enjoyed special lectures and plenary sessions. A few of them are: Reading and writing omes (G. Church), Towards unification of genetic and hierarchy models of tumor heterogeneity (J. Dick), Interactome networks and human disease (M. Vidal), The genetics of individuals (B. Lehner), Synthetic genetic interaction analysis by high-throughput imaging to map cellular networks, Unraveling principles of gene regulation using thousands of designed promotor sequences (E. Segal), Systems biology applications of imaging flow cytometry (T. Galitski).

Last Updated on Sunday, 25 August 2013 21:41
 

@University of Toronto, Terrence Donnelly Centre for Cellular and Biomolecular Research (Part III)

E-mail Print PDF

So what have I been up to in recent weeks here at Toronto? Highlights include my first ride with famous American yellow school bus to a reception at ICSB12 conference, some sightseeing in Toronto city and a trip to Niagara Falls.

Besides, I have finished with data analysis of real-time yeast S. cerevisiae microscopy screens, an idea about it can be captured here. I am now starting with time series analysis and will probably have time to work on integration of phenomics data with genetic interaction and protein interaction data.

Recently a quantum optics research group here at UofT demonstrated a violation of Heisenberg's uncertainty principle and I was really excited about their work. "The quantum world is still full of uncertainty, but at least our attempts to look at it don't have to add as much uncertainty as we used to think!" ... and an easy reading to motivate you to learn more.

I have also come upon a nice real-world (I do not like this term) implementation of an argument based machine learning offered through classification module in CellProfiler Analyst package, participated in a discussion about Gaussian processes (intro, notes) at ccbr-stats meeting and much more. The Lab organized a farewell lunch for summer students only two weeks after my arrival to Toronto, as here and in US classes have already begun (after the Labour Day), I considered it as a welcome event :)

Below are images of Toronto CN Tower, Niagara Falls as seen from Skylon Tower and squirrels at UofT campus (Yes, one cannot miss numerous squirrels playing in parks at campus. A careful look should reveal four of them.), respectively.

Last Updated on Sunday, 25 August 2013 21:42
 

@University of Toronto, Terrence Donnelly Centre for Cellular and Biomolecular Research (Part I)

E-mail Print PDF

In the past few days I have settled in Toronto, Canada, where I will stay until October this year. As a graduate student I will be working at the University of Toronto, Terrence Donnelly Centre for Cellular and Biomolecular Research in the Charlie Boone's Lab.

My work will be mostly data analysis of S. cerevisiae screens by employing various statistical and machine learning methods to gain new knowledge about identification of yeast mutant strains with non-WT phenotype. Possibly I will also work on time-series analysis of actin patches in yeast cells to differentiate them. First impressions are great, I have already met some great people and am looking forward to meet some at the International Conference on Systems Biology (ICSB12), which is held in Toronto in the next week and have a fortunate opportunity to attend.

Last Updated on Sunday, 30 March 2014 17:21
 

A Randomized Algorithm for Linear Equations over Prime Fields in Clojure

E-mail Print PDF

Professor of Computer Science Dick Lipton from Georgia Tech has a few days ago posted an interesting post on his blog about a new way of solving systems of linear equations over prime fields. Solving linear systems of equations is a very well explored area and a new approach came as a pleasant surprise to me. The author of the proposed method is Prasad Raghavendra from Berekeley. He wrote a draft paper with the algorithm and the analysis in which he proves the soundness and completeness of the randomized algorithm he had proposed. The proof is not very complicated, you need to be familiar with some probabilistic bounds, specifically a generalization of the Bernstein inequality, the Hoeffding inequality, and foundations of finite fields.

Solving linear systems of equations is indeed an ancient problem and a new approach is therefore warmly welcomed. As early as 200 BC the Chinese has devised a clever method for solving 2-by-2 systems and there is evidence that they had developed a method essentially equivalent to Gaussian elimination (2000 years before Gauss came up with it!). Then Gabriel Cramer has in 1750 published a famous and simple O(n^3) rule for solving system of linear equations by computing matrix determinants and cofactors. Later in 1810, Carl Friedrich Gauss became interested in discovering the orbit of Pallas, the second-largest asteroid of the solar system. His work led him to a system of six linear equations in six unknowns. Gauss invented the method of Gaussian elimination which is still used today. The method consists of performing row-operations on the coefficient matrix to obtain an equivalent system of equations whose coefficient matrix is upper-triangular. This means that the last equation will involve only one unknown and can be easily solved. Substituting that solution into the second-to-last equation, one can then solve for another unknown. Gauss' work was continued by Wilhelm Jordan who convert a given system of linear equations to a triangular system which can be reduced to a diagonal one and then solved directly (Gauss-Jordan method) -- LU decomposition. We now have faster methods for general systems based on discoveries of Volker Strassen and almost linear time algorithms which require systems to have special structure.

Most readers probably know methods mentioned above very well, so let us do not spend any time discussing them. So, what is the trick in the new method? Prasad starts with the set of random vectors, V0. We expect that about half of them will satisfy the first linear equation. He then throws away all vectors that do not satisfy the first equation and call the remaining set S1. The idea is successively applied to further equations until we reach the set Sm. Vectors in Sm will then satisfy all linear equations and represent the solutions to the system. The problem with this winnowing-down process is that with high probability the set Si winnows down to zero before all equations in the system are satisfied. An exception is if we start with a set V0 of exponential size, but this is obviously too much.

Prasad came up with a brilliant answer that indeed we do not need a large initial V0 (he proved it can be of linear order in the number of variables) and still have a high probability of getting a solution to the complete system of linear equations if the system is feasible. The trick is to use an amplifier and a property of finite fields. Remember, the method is for systems of equations over finite field and the exploited property limits its usage. Over finite field with characteristics p, the trick is to form Vi (being of the same size as V0) as sums of random p+1 vectors from Si. Namely, summing p+1 solutions to the first p equations will give in finite field a (new) solution to the first p equations and possibly solve also (p+1)-th solution. See the draft paper for the details.

The new algorithm is easy to implement. Recently I am becoming quite a fan of Clojure, so I decided to implement the algorithm in this popular functional programming language that runs on JVM. I have always been very fond of functional programming languages (examples include Haskell, OCaml, F#, Lisp, ML, Erlang, Scheme etc.) and I am glad functional paradigm is gaining popularity also in "real-world" problems (I do not like this term, but ...). Functional programming is not more difficult than object oriented programming, it is just different. It is a style of programming that emphasizes first-class functions that are pure and is inspired by ideas from lambda calculus. I acknowledge that Matlab would be the most appropriate for this problem and imperative language implementation more understandable but it is more fun, isn't it :) ?

(ns clj_concurrency.ls-solver)
 
(declare n m q N)
 
(defn random-ints [X] (conj X (repeatedly n #(rand-int q))))
 
(defn mod-reduce [X] (map #(map (fn [x] (mod x q)) %1) X))
 
(defn satisfy? [Ax Sx bx]
(= (mod (apply + (map * Ax Sx)) q) bx))
 
(defn select [S Ai bi]
(def T #{})
(doseq [Sx S :when (satisfy? Ai Sx bi)]
(def T (conj T Sx)))
(into #{} T))
 
(defn recombine [T]
(def R #{})
(dotimes [_ N]
 (let [Q (take (+ q 1) (shuffle T))]
(def R (conj R (reduce #(map + %1 %2) Q)))))
(into #{} R))
 
(defn solve [A b]
(def S (apply concat (map random-ints (repeat N []))))
(dotimes [i m]
(let [T (select S (nth A i) (nth b i))]
 (when (empty? T)
 (throw (IllegalArgumentException. "System infeasible, T.")))
 (def S (recombine T))))
 (if (empty? S)
 (throw (IllegalArgumentException. "System infeasible, S.")))
(do
 (println "--------")
 (println "Solution" (distinct (mod-reduce S)))))
 
; toy example 1
(def n 3) ;number of variables
(def m 3) ;number of equations
(def q 3) ;finite field characteristics
(def N (* 30 n)) ;sampling size
(solve [[2 1 1] [1 1 1] [1 2 1]] [1 0 0]) ;--> (1 0 2)
 
; toy example 2
(def q 5)
(def m 3)
(def n 3)
(def N (* 30 n))
(solve [[1 1 1] [2 3 2] [1 3 4]] [1 4 4]) ;--> (1 2 3)

Last Updated on Wednesday, 30 January 2013 23:37
 

Undergraduate Thesis Defense

E-mail Print PDF

Yesterday I defended my undergraduate Thesis entitled A Matrix Factorization Approach for Inference of Prediction Models from Heterogeneous Data Sources (slo: Pristop matrične faktorizacije za gradnjo napovednih modelov iz heterogenih podatkovnih virov) at University of Ljubljana, Faculty of Computer and Information Science and Faculty of Mathematics and Physics.

With that Thesis I completed a four year interdisciplinary university program consisting of eight semesters of lectures and one year of Diploma thesis work in less than four years. Diploma leads to the degree of University dipl.ing. of Computer Science and Mathematics. I graduated summa cum laude with the average grade being 10.0 out of 10.0.

My brother Slavko took some pictures and shot the defense. The movie and pictures are available at zitnik.si site.

 

Abstract

Today we are witnessing rapid growth of data both in quantity and variety in all areas of human endeavour. Integrative treatment of these sources of information is a major challenge. We propose a new computation framework for inference of prediction models based on symmetric penalized matrix tri-factorization and intermediate strategy for data integration. Major advantages of the approach are an elegant mathematical formulation of the problem, an integration of any kind of data that can be expressed in matrix form, and high predictive accuracy.

 

We tested the effectiveness of the proposed framework on predicting gene annotations of social amoebae D. dictyostelium. The developed model integrates gene expressions, protein-protein interactions and known gene annotations. The model achieves higher accuracy than standard techniques of early and late integration, which combine inputs and predictions, respectively, and have in the past been favourably reported for their accuracy.

 

With the proposed approach we have also predicted that there is a set of genes of D. dictyostelium that may have a role in bacterial resistance and which were previously not associated with this function. Until now, only a handful of genes were known to participate in related bacterial recognition pathways. Expanding the list of such genes is crucial in the studies of mechanisms for bacterial resistance and can contribute to the research in development of alternative antibacterial therapy. Our predictions were experimentally confirmed in wet-lab experiments at the collaborating institution (Baylor College of Medicine, Houston, USA).

 

Note: Complete Thesis will be available for download after some aspects of the proposed approach will be presented in the scientific journal.

I am looking forward to starting my PhD studies in the autumn. Until then ... many new interesting projects ...

Last Updated on Sunday, 30 March 2014 16:36
 

CRY: Commitment Schemes

E-mail Print PDF

I have been following the Cryptography and Theory of Coding 2 course this semester (summer 12).

During the course I have been working on the project about Commitment schemes. Please find below attached the final report produced and a short presentation. Additionally, some homework solutions are attached as well. The language is Slovene, but it is mostly pure math with proofs, so it should not be too difficult to capture the main idea.

Please note these are reports and have not been subject to the usual scrutiny of published papers.

Last Updated on Wednesday, 30 January 2013 23:42
 

XRDS: Crossroads, The ACM Magazine for Students

E-mail Print PDF

XRDS, Crossroads is the flagship academic magazine for student members of the ACM. It is published quarterly, both in print and online for ACM student members. The publication was previously named Crossroads and has been running since 1994, edited and maintained voluntarily by students. The magazine is distributed to tens of thousands of students worldwide.

Issues focus on Computer Science and include various articles from highlights in new and important research to roundtable discussions, interviews and introductory overviews. In the magazine are published exciting pieces written by top leaders in the field as well as students who submit unsolicited work.

I highly recommend reading the issues! Here is the current one.

 

I am writing this since I have been selected as the Department Editor for this magazine as January 2012. I am looking forward to contribute to this great platform.

Last Updated on Wednesday, 30 January 2013 23:37
 

ML: Reliable Calibrated Probability Estimation in Classification

E-mail Print PDF

I followed a Machine Learning course this semester at the university (along with the one, offered by Stanford Uni). I have been working on a seminar studying Reliable probability estimation in classification using calibration. The final report is in the form of the scientific article, which is attached below.

Here is the abstract:

 

Estimating reliable class membership probabilities is of vital importance for many applications in data mining in which classification results are combined with other sources of information to produce decisions. Other sources include domain knowledge, outputs of other classifiers or example-dependent misclassification costs. We revisit the problem of classification calibration motivated by the issues of the isotonic regression and binning calibration. These methods can behave badly on small or noisy calibration sets, producing inappropriate intervals or boundary generalization. We propose an improvement of the calibration with isotonic regression and binning method by using bootstrapping technique, named boot-isotonic regression and boot-binning, respectively. Confidence intervals obtained by repeatedly calibrating the set sampled with replacement from the original training set are used for merging unreliable or too narrow calibration intervals. This method has been experimentally evaluated with respect to two calibration measures, several classification methods and several problem domains. The results show that the new method outperforms the basic isotonic regression and binning methods in most configurations.

 

Short presentation about work and final report:

 

Last Updated on Sunday, 25 August 2013 21:31
 

RTVSLO: Ugriznimo znanost (Bite the Science)

E-mail Print PDF

In November 2011, a show called Ugriznimo znanost (Bite the Science) has been shot at the Faculty of Computer and Information Science by the national public broadcasting organization (RTVSLO).

Bite the Science is a weekly series on the educational channel TV Slovenia. It is the series which aims to explain the science in a relaxed and humorous manner. The topics range from current events in science to accomplishments of Slovene scientists and scientific research in other parts of the world.

The episode scheduled for the last week of December has discussed programming. In it I am presenting my achievements at the Google Summer of Code program (Nimfa library) and discuss Orange, open-source data mining and visualization tool developed by Biolab, Bioinfomatics Laboratory. Further, researchers talk in the episode about smart house, Faculty Robot League, summer school of programming and mathematical data analysis used for predicting the results of the Eurosong competition.

Take a look (link to the episode is below)!

Shooting the Bite the Science

Relevant links:

Last Updated on Wednesday, 30 January 2013 23:38
 


Page 6 of 8