More great news concerning my (small) involvement with Google. A few weeks ago I wrote about being accepted to the Google Summer of Code 2011 with a project on matrix factorization techniques in data mining for the Orange platform.

Now Google has announced the Google Anita Borg Scholarship Recipients and Finalists, a scholarship for which I applied this year, and I am among the 147 undergraduate and graduate students worldwide who were chosen. Just for clarification - this is completely unrelated to GSoC (the only common denominator being Google itself). The scholarship is awarded based on the strength of candidates' academic performance, leadership experience and demonstrated passion for computer science.

Scholars from Europe are invited to a Scholars' Retreat at the European Google centre in Zurich in June, and I am very much looking forward to this event and to meeting some fascinating people. The retreat will include workshops, speakers, panelists, breakout sessions and social activities scheduled over a couple of days.

(Official Google Blog with the published results of the Scholars' selection process) link

(Official Google Students Blog with the published announcement of the Scholars) link

The Google team has shared a new Chrome experiment called the WebGL Globe, a visualization platform for geographic data that runs in WebGL-enabled browsers such as Chrome and Firefox. (Check http://www.doesmybrowsersupportwebgl.com/ to see whether your browser supports the WebGL standard.)

To speed up the visualization of the 3D geometry, they used a vertex shader and took advantage of GLSL with two fragment shaders. The 3D data spikes are drawn with Three.js, a JS library for building lightweight 3D graphics.

I have embedded a simple globe showing Google search traffic. Try it, or try more of the examples that ship with this cool open source project. Or create your own globe using the JSON data format.
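If you want to prepare your own data, here is a minimal Python sketch. It assumes the layout I remember from the project's README: a JSON list of series, each a pair of a name and a flat array of latitude, longitude, magnitude triples. The function name to_globe_series, the sample coordinates and the scaling of magnitudes to the range 0..1 are my own illustrative choices, not part of the project.

```python
import json


def to_globe_series(name, points):
    """Flatten (lat, lon, magnitude) tuples into [name, [lat, lon, mag, ...]].

    Assumption: the globe expects one flat array per series, and magnitudes
    work best when scaled to 0..1, so we divide by the largest magnitude.
    """
    points = list(points)
    peak = max((mag for _, _, mag in points), default=0) or 1.0
    flat = []
    for lat, lon, mag in points:
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError("coordinates out of range: %r, %r" % (lat, lon))
        flat.extend([lat, lon, round(mag / peak, 4)])
    return [name, flat]


# Made-up sample values, for illustration only.
sample = [(46.05, 14.51, 12), (38.72, -9.14, 30), (47.37, 8.54, 18)]
payload = json.dumps([to_globe_series("demo", sample)])
```

The resulting string in payload can be saved to a file and loaded by the globe page in place of one of the shipped example data sets.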

Project title: Matrix Factorization Techniques for Data Mining

Description: Matrix factorization is a fundamental building block of many current data mining approaches, and factorization techniques are widely used in data mining applications. Our objective is to provide the Orange community with a unified and efficient interface to matrix factorization algorithms and methods. For that purpose we will develop a scripting library that includes a number of published factorization algorithms and initialization methods and facilitates combining them to produce new strategies. Extensive documentation will be provided, with working examples that demonstrate real applications, commonly used benchmark data and visualization methods, to help with the interpretation and comprehension of the results.

The main factorization techniques and their variations planned for inclusion in the library are: Bayesian decomposition (BD), together with linearly constrained and variational BD using Gibbs sampling; probabilistic matrix factorization (PMF); Bayesian factor regression modeling (BFRM); and the family of nonnegative matrix factorizations (NMF), including sparse NMF, non-smooth NMF, local factorization with Fisher NMF and least-squares NMF. Different multiplicative update algorithms for NMF, which minimize the least-squares (LS) error or the generalized Kullback-Leibler (KL) divergence, will be analyzed. Further nonnegative matrix approximations (NNMA) with extensions will be implemented. For completeness, algorithms such as NCA, ICA and PCA could be added to the library.
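To give a feeling for what one of these algorithms looks like, here is a minimal pure-Python sketch of NMF with the classic multiplicative update rules that minimize the LS error. It is only an illustration and not the planned library: the function names (nmf, frobenius_error), the random initialization and the small constant eps that guards against division by zero are my own assumptions.

```python
import random


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH, i.e. the LS objective."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize a nonnegative n x m matrix V into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Random positive initialization (one of many possible initialization methods).
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


if __name__ == "__main__":
    V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0], [3.0, 5.0, 7.0]]
    W, H = nmf(V, rank=2)
    print("LS error:", frobenius_error(V, W, H))
```

Because the updates only multiply by nonnegative ratios, W and H stay nonnegative as long as they start that way, which is the main appeal of this family of rules.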

Just today I returned from Lisbon, Portugal, where Euroskills 2010 was held. In the Office ICT category (Informatics) I won two silver medals, one as the Project Manager of the team and the second together with the Slovenian team (members: me, Slavko Zitnik, Miha Longino, Peter Virant).

The contest was organized very well. We were accommodated in the part of Lisbon where the Expo was held, which is also where the competition took place. There were 500 competitors in fifty trades, plus a few hundred experts, observers and guests.

When I started writing this article, the countdown said: ES 2010 in 17 days 0 hours 9 minutes, and 4 seconds.

Since September we (the team of four) have actively devoted ourselves to the Office ICT Test Project, a draft of the project which we will have to implement in Lisbon. The project has progressed well, and we have gained some expertise in the vast majority of the features (if not all; that depends on the final version of the TP, which will not be known before the beginning of the competition). Basically, the task is to set up the entire ICT system of a fictitious international corporation, from functional and network design (IPv4, IPv6, NATv6, DHCPv6, security measures, OSPF & EIGRP routing, route redistribution, dynamic VLANs, network authentication with RADIUS, Wi-Fi etc.) to common business services (VPN, mail, DA, AD, DNS, DHCP, VoIP Asterisk, AAA, MDT, WSUS, SNMP Nagios, virtualization etc.), as well as the company's portfolio, documentation maintenance, cost and time management etc.

It has been a while since SP 2010 was released, and since I have developed quite a lot on SP 2007 and am now exploring the 2010 version, I feel a moral duty :) to write something about this latest version as well. So I decided to write about embedded business intelligence, which is not mentioned very often but which I think will become an integral part of SP.

First of all, embedded BI is the result of incorporating Performance Point Server as Performance Point Services. Before SP 2010, Performance Point Server was an independent, separate product, but that is no longer the case. The new Services enrich SP with KPIs, scorecards, matrices and much more, which can easily be rendered as dashboards or charting web parts, consumed through Visio Services, or used by the numerous improvements to Excel Services.

Together with Performance Point Services you get the Dashboard Designer, a GUI tool with which you can hook up the data you want to drive your scorecard or KPI off (e.g. connect it to SQL Server Analysis Services, easily create KPIs in the designer and render them as web parts in SharePoint). As data mining and OLAP are becoming more and more important BI technologies, it is worth stressing their benefits. First, you are in control of what is happening with the data you have: data sources can be configured by admins and dashboards by departmental business units. Furthermore, it is very easy to slice and dice the data to get the answers you are looking for. One of the numerous new features is the decomposition tree: we can drill into key nodes and get more details in a very visual, graphical way, which enriches the models from which we pull the data, so users can get quality answers quickly.

A few improvements are included in Excel Services, allowing users to publish and share parts of workbooks or whole workbooks, while the owner still has total control over the services users consume.

There is another novelty worth mentioning, namely Visio Services. It is simply a matter of creating a diagram (e.g. a network diagram or a graph of the resources used on a project) that is data bound and therefore reflects the real, current data (e.g. changing pictures or states in accordance with the progress of the project). It is more like creating a simple, user-friendly workflow that updates itself than drawing a diagram. Do not confuse this with another powerful tool, WWF - Windows Workflow Foundation, which is used to create complex workflows in .NET and VS.

I've been following a course on Statistical Aspects of Data Mining lately. The course itself is not what I will write about, but it inspired this article. The software environment used in the course is the R programming language, which is used for statistical computing and graphics (it is available for Windows, Linux and Mac as part of the GNU project). If you download it from R's website, you get it with a command line interpreter. Of course there are some IDEs as well, such as Rcmdr or Tinn-R. The capabilities of R are extended by numerous user-submitted packages. For the animation of the Mandelbrot Set at least the following libraries are needed: spam, fields, bitops, caTools - all are freely available at R's website. R is influenced by S and Scheme, but I won't go into details, as there is plenty of information about it on the web.

I tried to draw the classic Mandelbrot Set (the basic code for it is available here), which is just a matter of iterating the formula $z_{n+1} = z_n^2 + c$, where $c$ is a complex parameter, starting at $z_0 = 0$. The Mandelbrot Set is defined as the set of all points $c$ such that the sequence obtained by this iteration does not escape to infinity. Some of the set's properties are: local connectivity, self-similarity, correspondence with the structure of Julia Sets etc. It is a very simple formula that gives fascinating results. In the R language animation you can observe the main cardioid, the period bulbs and the hyperbolic components.
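The iteration itself is easy to try without any plotting libraries. Below is a minimal sketch in Python (not the R code used for the animation) of the usual escape-time test. It assumes the standard bound that a point has escaped once |z| exceeds 2, and the names escape_time and ascii_mandelbrot, the iteration limit and the plotted region are my own choices.

```python
def escape_time(c, max_iter=50):
    """Return the first n with |z_n| > 2, or None if no escape within max_iter.

    Iterates z_{n+1} = z_n^2 + c starting from z_0 = 0. A result of None
    means the point is treated as belonging to the set, which is only an
    approximation for a finite max_iter.
    """
    z = 0j
    for n in range(1, max_iter + 1):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return None


def ascii_mandelbrot(width=60, height=24, max_iter=50):
    """Render the region [-2, 1] x [-1.2, 1.2] as rows of '#' and '.'."""
    rows = []
    for j in range(height):
        im = 1.2 - 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            re = -2.0 + 3.0 * i / (width - 1)
            inside = escape_time(complex(re, im), max_iter) is None
            row += "#" if inside else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("\n".join(ascii_mandelbrot()))
```

For example, c = 1 gives the sequence 1, 2, 5, ... and escapes at the third step, while c = -1 alternates between -1 and 0 forever and so belongs to the set.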