SIDE ― A Subjective and Interactive Visual Data Exploration Tool

This JavaScript-based tool was developed alongside the paper "A Tool for Subjective and Interactive Visual Data Exploration".

We are actively developing new features, so the actual user interface may differ slightly from this introduction.

In this paper, we introduce a novel generic method for interactive visual exploration of high-dimensional data. In contrast to most visualization tools, it is not based on the traditional dogma of manually zooming and rotating data. Instead, the tool initially presents the user with an 'interesting' projection of the data and then employs data randomization with constraints, so that users can flexibly and intuitively express their interests or beliefs through visual interactions that correspond to exactly defined constraints. A projection-finding algorithm then takes these user-expressed constraints into account to compute a new 'interesting' projection. This process can be iterated until the user runs out of time or finds that the constraints explain everything she needs to learn from the data.

The tool operates in the following way:

We present the tool by means of two case studies, one controlled study on synthetic data and another on real census data.

Synthetic Dataset Case Study

In this case study, the user operates on a sub-sample (250 data points) of a synthetic dataset. The full synthetic dataset consists of 1000 10-dimensional data vectors: dimensions 1-4 can be clustered into five clusters, dimensions 5-6 can be clustered into four clusters involving different subsets of data points, and dimensions 7-10 are Gaussian noise. All dimensions have equal variance. The sub-sampled dataset is z-scored. It takes about 20 seconds to load the case study.
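
To make the dataset description concrete, the following is a minimal JavaScript sketch of how data with this structure could be generated, sub-sampled, and z-scored. It is not the code used by the tool or the paper: the function names (mulberry32, gaussian, makeSynthetic, subsample, zscoreColumns), the seed, the cluster centers, the cluster spread, and the use of the population standard deviation are all assumptions made for illustration.

```javascript
// Small seeded PRNG (mulberry32) so the sketch is reproducible. The seed is arbitrary.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal sample via the Box-Muller transform.
function gaussian(rng) {
  const u1 = 1 - rng(); // lies in (0, 1], which avoids log(0)
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Hypothetical generator: dimensions 1-4 form five clusters, dimensions 5-6 form
// four clusters with independently drawn memberships, dimensions 7-10 are noise.
// Cluster centers and spread are assumed values, not taken from the paper.
function makeSynthetic(n, rng) {
  const randomCenter = (dim) => Array.from({ length: dim }, () => 4 * gaussian(rng));
  const centersA = Array.from({ length: 5 }, () => randomCenter(4));
  const centersB = Array.from({ length: 4 }, () => randomCenter(2));
  const rows = [];
  for (let i = 0; i < n; i++) {
    const a = centersA[Math.floor(rng() * 5)];
    const b = centersB[Math.floor(rng() * 4)];
    const row = [];
    for (let j = 0; j < 4; j++) row.push(a[j] + gaussian(rng));
    for (let j = 0; j < 2; j++) row.push(b[j] + gaussian(rng));
    for (let j = 0; j < 4; j++) row.push(gaussian(rng));
    rows.push(row);
  }
  return rows;
}

// Draw k rows without replacement using a partial Fisher-Yates shuffle.
function subsample(rows, k, rng) {
  const idx = rows.map((_, i) => i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rng() * (idx.length - i));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  return idx.slice(0, k).map((i) => rows[i]);
}

// Column-wise z-scoring: subtract the column mean, divide by the column
// standard deviation (population form, an assumption).
function zscoreColumns(rows) {
  const n = rows.length;
  const d = rows[0].length;
  const means = new Array(d).fill(0);
  const stds = new Array(d).fill(0);
  for (const row of rows) for (let j = 0; j < d; j++) means[j] += row[j] / n;
  for (const row of rows) for (let j = 0; j < d; j++) stds[j] += (row[j] - means[j]) ** 2 / n;
  for (let j = 0; j < d; j++) stds[j] = Math.sqrt(stds[j]);
  return rows.map((row) => row.map((v, j) => (stds[j] > 0 ? (v - means[j]) / stds[j] : 0)));
}

const rng = mulberry32(42);
const full = makeSynthetic(1000, rng);
const sample = zscoreColumns(subsample(full, 250, rng));
console.log(sample.length, sample[0].length); // 250 10
```

After z-scoring, every column of the sub-sample has zero mean and unit variance, so no single dimension dominates a projection purely because of its scale.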

UCI Adult Dataset Case Study

In this case study, the user explores a real census dataset compiled from the UCI Adult Dataset. It consists of 218 sub-sampled data points and nine attributes: "Age" (integer, 17-90), "Education" (integer, 1-16), "HoursPerWeek" (integer, 1-99), "EG_White" (binary, {"No" = 0, "Yes" = 1}), "EG_AsianPacIlander" (binary), "EG_Black" (binary), "EG_Other" (binary), "Gender" (binary, {"Female" = 0, "Male" = 1}), and "Income" (binary, {"<50k" = 0, ">50k" = 1}), where "EG_" stands for "Ethnic Group". The sub-sampled dataset is z-scored. It takes about 15 seconds to load the case study.
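
As an illustration of the attribute encoding described above, the sketch below maps one record to a nine-element numeric vector in the listed attribute order. The attribute names come from the description. The input field names (age, education, hoursPerWeek, ethnicGroup, gender, income), the category strings for the ethnic group, and the example record itself are hypothetical and do not come from the tool or from the dataset.

```javascript
// Attribute order as listed in the dataset description.
const ATTRIBUTES = [
  "Age", "Education", "HoursPerWeek",
  "EG_White", "EG_AsianPacIlander", "EG_Black", "EG_Other",
  "Gender", "Income",
];

// Encode one record as a numeric vector. The input field names and the
// ethnic-group category strings are assumptions made for this sketch.
function encodeRecord(rec) {
  const isGroup = (name) => (rec.ethnicGroup === name ? 1 : 0);
  return [
    rec.age,
    rec.education,
    rec.hoursPerWeek,
    isGroup("White"),
    isGroup("AsianPacIslander"),
    isGroup("Black"),
    isGroup("Other"),
    rec.gender === "Male" ? 1 : 0, // "Female" = 0, "Male" = 1
    rec.income === ">50k" ? 1 : 0, // "<50k" = 0, ">50k" = 1
  ];
}

// Made-up example record, for illustration only.
const example = {
  age: 39, education: 13, hoursPerWeek: 40,
  ethnicGroup: "White", gender: "Male", income: "<50k",
};
console.log(encodeRecord(example)); // [ 39, 13, 40, 1, 0, 0, 0, 1, 0 ]
```

Vectors encoded this way would then be z-scored column by column, which puts the integer and binary attributes on a comparable scale.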