The Radviz projection maps multidimensional data onto a 2D plane. In this note, we describe how Radviz projections are computed and demonstrate how they can uncover important features in multivariate regression analysis.

Radviz Projection

Consider an $M$-dimensional data point represented as $\mathbf{x} = [x_1, x_2, \ldots, x_M]$, projected onto a 2D plane as a vector $\mathbf{v}$. The Radviz projection uses a mass-spring model: the data point acts as a mass connected by $M$ springs to $M$ anchor points on a circle. These anchor points, $A_1$, $A_2$, $\ldots$, $A_M$, evenly distributed on the circle, correspond to the dimensions of the data. The strength $x_i$ of the $i$-th spring connecting the mass to the anchor point $A_i$ determines where the mass settles, namely the location $\mathbf{v}$. The calculation balances the forces across all springs, resulting in the 2D projection (see an illustration in Figure 1 for a four-dimensional data point).

Figure 1. An illustration of the Radviz projection for a 4-dimensional data point $\mathbf{x} = [x_1, x_2, x_3, x_4]$. $A_i$ denotes the anchor points evenly placed on the circle. Vector $\mathbf{v}$ denotes the position of the projected data. The value $x_i$ ($i = 1, 2, 3, 4$) signifies the influence strength of anchor point $A_i$.

Mathematical Formulation

Let $\mathbf{A}_i$ denote the location of the anchor point $A_i$ on the circle. At equilibrium, the net force on the mass at the location $\mathbf{v}$ is zero:

\[\sum_{i=1}^{M} (\mathbf{A}_i -\mathbf{v}) x_i = 0. \notag\]

Rearranging, $\sum_{i=1}^{M} \mathbf{A}_i x_i = \mathbf{v} \sum_{i=1}^{M} x_i$, which leads to the projection’s location formula:

\[\mathbf{v} = \frac{\sum_{i=1}^M \mathbf{A}_i x_i}{\sum_{i=1}^M x_i}.\label{eqn:radviz}\]

The resulting 2D projection location is a weighted average of the anchor points’ positions. Before the calculation, each variable $x_i$ is scaled to the range $[0, 1]$ so that the weights are non-negative and the mapped points stay within the circle.
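To make Equation ($\ref{eqn:radviz}$) concrete, below is a minimal NumPy sketch of the projection; the function name radviz_project and the placement of anchors on the unit circle are our own illustrative choices, not part of any library.

```python
import numpy as np

def radviz_project(X):
    """Radviz-project the rows of X (shape N x M) onto the 2D plane."""
    X = np.asarray(X, dtype=float)
    m = X.shape[1]
    # Min-max scale each column to [0, 1] (assumes no constant columns).
    Xs = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    # Anchor points A_i evenly spaced on the unit circle.
    theta = 2.0 * np.pi * np.arange(m) / m
    anchors = np.column_stack([np.cos(theta), np.sin(theta)])  # (M, 2)
    # v = (sum_i A_i x_i) / (sum_i x_i); guard against all-zero rows.
    weights = Xs.sum(axis=1, keepdims=True)
    weights[weights == 0.0] = 1.0
    return (Xs @ anchors) / weights
```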

An Example

Data Overview

To demonstrate the application of Radviz, we consider a dataset with six columns $x_1$, $x_2$, $x_3$, $x_4$, $x_5$, and $y$. Figure 2 displays the scatter matrix plot of these variables.

Figure 2. Scatter matrix plots of variables.
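A plot like Figure 2 can be generated with pandas; the file name data.csv below is a placeholder for wherever the dataset lives.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder path; the dataset holds columns x1..x5 and y.
df = pd.read_csv("data.csv")
pd.plotting.scatter_matrix(df, figsize=(9, 9), diagonal="hist", alpha=0.3)
plt.show()
```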

Radviz and Parallel Coordinates Plots

Before constructing a regression model of $y$ on $x_i$, we apply the Radviz projection to $[x_1, x_2, x_3, x_4, x_5]$ using Equation ($\ref{eqn:radviz}$). The resulting Radviz plot in Figure 3 reveals that the gradient of the $y$ value aligns with the line passing through anchor point $x_3$ and the circle’s center, indicating $x_3$ as a strong influencer. Anchor points $x_1$ and $x_5$, at the opposite end of that line, also show significance. Anchor points $x_2$ and $x_4$ appear less important because they are positioned orthogonal to the gradient of $y$.

Figure 3. Visualization of the data using Radviz plot, where color signifies the y values. The red dashed line passes through the anchor point associated with x3 and the center of the circle.
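A plot like Figure 3 can be drawn by combining the radviz_project sketch above with a matplotlib scatter plot colored by $y$ (df as loaded earlier). pandas also ships a pandas.plotting.radviz helper, but it expects a categorical class column rather than a continuous response.

```python
import matplotlib.pyplot as plt

# Project the five predictors and color each point by its y value.
V = radviz_project(df[["x1", "x2", "x3", "x4", "x5"]].to_numpy())
plt.scatter(V[:, 0], V[:, 1], c=df["y"], cmap="viridis", s=10)
plt.colorbar(label="y")
plt.gca().set_aspect("equal")
plt.show()
```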

As a reference, the data is shown in a parallel coordinates plot in Figure 4, which reveals trends but offers no information on the relative importance of the variables $x_i$.

Figure 4. Parallel coordinates plot of the data.
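A plot like Figure 4 can be produced with pandas’ parallel_coordinates. It expects a categorical class column, so the sketch below bins $y$ into quartiles purely for coloring; the binning is our own workaround, not part of the original analysis.

```python
import pandas as pd
import matplotlib.pyplot as plt

plot_df = df.copy()
# parallel_coordinates needs a categorical class column, so bin y.
plot_df["y_bin"] = pd.qcut(plot_df["y"], 4, labels=["Q1", "Q2", "Q3", "Q4"])
pd.plotting.parallel_coordinates(
    plot_df.drop(columns="y"), "y_bin", colormap="viridis", alpha=0.4
)
plt.show()
```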

Feature Importance in Regression Model

To assess feature importance, linear regression models of $y$ on $x_i$ are constructed for different predictor combinations. The results align with the Radviz findings.

With one predictor, the table below shows that $x_3$ has the best performance, i.e., the lowest MSE (mean squared error), AIC, and BIC, and the highest $R^2$. $x_1$ is a close second.

One Predictor

| Predictor | MSE  | AIC  | BIC  | $R^2$  |
|-----------|------|------|------|--------|
| $x_3$     | 0.69 | 3607 | 3617 | 0.31   |
| $x_1$     | 0.73 | 3689 | 3700 | 0.27   |
| $x_2$     | 0.91 | 4001 | 4011 | 0.095  |
| $x_4$     | 0.91 | 4002 | 4012 | 0.094  |
| $x_5$     | 0.99 | 4140 | 4151 | 0.0039 |
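Tables like these can be reproduced with statsmodels, whose fitted OLS results expose the aic, bic, and rsquared attributes directly. The helper below is a sketch (df as loaded earlier); the subset size k also covers the two- and three-predictor cases further down.

```python
from itertools import combinations

import numpy as np
import statsmodels.api as sm

def rank_subsets(df, k):
    """Fit OLS of y on every size-k subset of x1..x5 and rank by MSE."""
    rows = []
    for combo in combinations(["x1", "x2", "x3", "x4", "x5"], k):
        X = sm.add_constant(df[list(combo)])
        res = sm.OLS(df["y"], X).fit()
        mse = np.mean(res.resid**2)  # mean squared residual
        rows.append((combo, mse, res.aic, res.bic, res.rsquared))
    return sorted(rows, key=lambda r: r[1])  # best (lowest MSE) first

for combo, mse, aic, bic, r2 in rank_subsets(df, k=1):
    print(combo, f"{mse:.2f}", f"{aic:.0f}", f"{bic:.0f}", f"{r2:.2f}")
```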

With two predictors, $(x_1, x_5)$ and $(x_1, x_3)$ are the top two combinations.

Two Predictors

| Predictors   | MSE  | AIC  | BIC  | $R^2$ |
|--------------|------|------|------|-------|
| $x_1$, $x_5$ | 0.63 | 3462 | 3478 | 0.37  |
| $x_1$, $x_3$ | 0.65 | 3509 | 3525 | 0.35  |
| $x_3$, $x_4$ | 0.65 | 3523 | 3539 | 0.35  |
| $x_3$, $x_5$ | 0.68 | 3573 | 3589 | 0.33  |
| $x_2$, $x_3$ | 0.68 | 3583 | 3599 | 0.32  |
| $x_1$, $x_2$ | 0.69 | 3599 | 3615 | 0.31  |
| $x_1$, $x_4$ | 0.70 | 3634 | 3650 | 0.30  |
| $x_2$, $x_4$ | 0.72 | 3670 | 3686 | 0.28  |
| $x_4$, $x_5$ | 0.90 | 3998 | 4014 | 0.097 |
| $x_2$, $x_5$ | 0.91 | 4003 | 4019 | 0.094 |

Three Predictors

With three predictors, $(x_1, x_3, x_5)$ is the best combination.

The regression model results affirm the insights gained from the Radviz projection.

Conclusion

The Radviz projection, combined with regression modeling, gives us a clear way to make sense of complex data: it surfaces important features visually, and the regression models confirm those findings, making data interpretation more straightforward.
