Marinka Zitnik

Fusing bits and DNA


ML: Reliable Calibrated Probability Estimation in Classification

This semester I took a Machine Learning course at the university (along with the one offered by Stanford University). For my seminar I studied reliable probability estimation in classification using calibration. The final report is written as a scientific article and is attached below.

Here is the abstract:

 

Estimating reliable class membership probabilities is of vital importance for many applications in data mining in which classification results are combined with other sources of information to produce decisions. Other sources include domain knowledge, outputs of other classifiers, or example-dependent misclassification costs. We revisit the problem of classification calibration, motivated by shortcomings of isotonic regression and binning calibration. These methods can behave badly on small or noisy calibration sets, producing inappropriate intervals or boundary generalization. We propose improvements to calibration with isotonic regression and with binning that use a bootstrapping technique, named boot-isotonic regression and boot-binning, respectively. Confidence intervals, obtained by repeatedly calibrating on sets sampled with replacement from the original training set, are used to merge unreliable or too narrow calibration intervals. The method has been experimentally evaluated with respect to two calibration measures, several classification methods, and several problem domains. The results show that the new method outperforms the basic isotonic regression and binning methods in most configurations.
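
To give a feel for the idea, here is a minimal sketch of binning calibration with bootstrap confidence intervals. It is not the implementation from the report: the function names (`fit_binning`, `bootstrap_bin_intervals`), the equal-width bins, the percentile intervals and the synthetic data are all my assumptions for illustration, and the rule for merging unreliable intervals is left out because the abstract does not spell it out.

```python
import random


def fit_binning(scores, labels, n_bins=5):
    """Equal-width binning: map each score bin to its fraction of positives.

    Assumes scores lie in [0, 1] and labels are 0 or 1.
    """
    pos = [0] * n_bins
    cnt = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes to the last bin
        cnt[b] += 1
        pos[b] += y
    # None marks an empty bin, for which no estimate is available
    return [pos[b] / cnt[b] if cnt[b] else None for b in range(n_bins)]


def bootstrap_bin_intervals(scores, labels, n_bins=5, n_boot=200, alpha=0.1, seed=0):
    """Percentile bootstrap interval for the calibrated probability of each bin."""
    rng = random.Random(seed)
    n = len(scores)
    samples = [[] for _ in range(n_bins)]
    for _ in range(n_boot):
        # Resample the calibration set with replacement and recalibrate
        idx = [rng.randrange(n) for _ in range(n)]
        est = fit_binning([scores[i] for i in idx], [labels[i] for i in idx], n_bins)
        for b, p in enumerate(est):
            if p is not None:
                samples[b].append(p)
    intervals = []
    for vals in samples:
        if not vals:
            intervals.append(None)  # bin was empty in every resample
            continue
        vals.sort()
        lo = vals[int((alpha / 2) * (len(vals) - 1))]
        hi = vals[int((1 - alpha / 2) * (len(vals) - 1))]
        intervals.append((lo, hi))
    return intervals


# Synthetic example (hypothetical data): the true positive rate is score**2,
# so the raw scores overestimate the probability of the positive class.
rng = random.Random(42)
scores = [rng.random() for _ in range(300)]
labels = [1 if rng.random() < s * s else 0 for s in scores]

estimates = fit_binning(scores, labels)
intervals = bootstrap_bin_intervals(scores, labels)
for b, (p, ci) in enumerate(zip(estimates, intervals)):
    print(b, p, ci)
```

A wide interval for a bin suggests its estimate rests on too few or too noisy examples, which is the kind of signal that could be used to decide which neighbouring bins to merge.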

 

A short presentation of the work and the final report:

 

 

Biomedical Entity Recognition with Deep Multi-Task Learning

We propose a deep multi-task learning approach for biomedical named entity recognition, which is a fundamental task in the mining of biomedical text data. The new approach saves human effort and frees biomedical experts from the need to painstakingly generate entity features by hand. Furthermore, it achieves excellent performance using only a limited amount of training data. The approach can help scientists better exploit knowledge buried in the vast biomedical literature.

This is joint work with colleagues from Stanford University, University of Southern California, and University of Illinois Urbana-Champaign.

 

Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities

My review of machine learning for biomedical data integration is now available online in Information Fusion.

This paper is intended for computer scientists and biomedical researchers who are curious about recent developments and applications of machine learning to biology and medicine and its potential for advancing biomedicine given the vast amounts of heterogeneous data being generated today.

 

RTVSLO: Ugriznimo znanost (Bite the Science)

In November 2011, a show called Ugriznimo znanost (Bite the Science) was shot at the Faculty of Computer and Information Science by the national public broadcasting organization (RTVSLO).

Bite the Science is a weekly series on the educational channel of TV Slovenia. The series aims to explain science in a relaxed and humorous manner. The topics range from current events in science to accomplishments of Slovene scientists and scientific research in other parts of the world.

The episode scheduled for the last week of December discusses programming. In it I present my achievements in the Google Summer of Code program (the Nimfa library) and discuss Orange, an open-source data mining and visualization tool developed by Biolab, the Bioinformatics Laboratory. Researchers in the episode also talk about a smart house, the Faculty Robot League, a summer school of programming, and mathematical data analysis used for predicting the results of the Eurosong competition.

Take a look (link to the episode is below)!

Shooting Bite the Science

Relevant links:

 

Renaming: MF - Matrix Factorization Techniques for Data Mining

In this short post I would like to update you on some recent changes in the MF - Matrix Factorization Techniques for Data Mining Library.

Recently, the library MF - Matrix Factorization Techniques for Data Mining has been moved and renamed. It is now called Nimfa - A Python Library for Nonnegative Matrix Factorization, or nimfa for short. The latest version, with documentation, a link to the source code, and working examples, can be found at nimfa.biolab.si. If you still have old links, please update them; you will be automatically redirected to the new site if you visit the old page.

Hope you like the new name.

Enjoy.

 

