Marinka Zitnik

Fusing bits and DNA

  • Increase font size
  • Default font size
  • Decrease font size
Home

GSoC & Orange: Matrix Factorization Techniques for Data Mining

This year I have applied for the Google Summer of Code, namely the Orange project.

Will see if I will be accepted. :)

Update 25.04.2011: Google has announced the results. My proposal has been accepted and am looking forward to start working. :)

Some links to articles in Slovenian news:

Project title: Matrix Factorization Techniques for Data Mining

Description: Matrix factorization is a fundamental building block for many of current data mining approaches and factorization techniques are widely used in applications of data mining. Our objective is to provide the Orange community with a uni fed and efficient interface to matrix factorization algorithms and methods. For that purpose we will develop a scripting library which will include a number of published factorization algorithms and initialization methods and will facilitate the combination of these to produce new strategies. Extensive documentation with working examples that will demonstrate real applications, commonly used benchmark data and visualization methods will be provided to help with the interpretation and comprehension of the results.

Main factorization techniques and their variations planned to be included in the library are: Bayesian decomposition (BD) together with linearly constrained and variational BD using Gibbs sampling, probabilistic matrix factorization (PMF), Bayesian factor regression modeling (BFRM), family of nonnegative matrix factorizations (NMF) including sparse NMF, non-smooth NMF, local factorization with Fisher NMF, least-squares NMF. Di fferent multiplicative and update algorithms for NMF will be analyzed which minimize LS error or generalized KL divergence. Further nonnegative matrix approximations (NNMA) with extensions will be implemented. For completeness algorithms such as NCA, ICA and PCA could be added to the library.

Here is proposal document.