Unveiling Multidimensional Insights: Radviz Projection and Feature Importance in Regression
Radviz projection maps multidimensional data onto a 2D plane. In this note, we work through the computation of Radviz projections and demonstrate their use in uncovering important features in multivariate regression analysis.
Radviz Projection
Consider an $M$-dimensional data point represented as $\mathbf{x} = [x_1, x_2, \ldots, x_M]$, projected onto a 2D plane as a vector $\mathbf{v}$. The Radviz projection uses a mass-spring model: a mass connected by $M$ springs to anchor points on a circle. These anchor points, $A_1$, $A_2$, $\ldots$, $A_M$, evenly distributed on the circle, correspond to the dimensions of the data. The strength $x_i$ of the $i$-th spring, connecting the mass to the anchor point $A_i$, determines where the mass settles, namely the location $\mathbf{v}$. Balancing the forces across all springs yields the 2D projection (see the illustration in Figure 1 for a four-dimensional data point).
Mathematical Formulation
Let $\mathbf{A}_i$ denote the location of the anchor point $A_i$ on the circle. At equilibrium, the total force on the mass at the location $\mathbf{v}$ is zero:
\[\sum_{i=1}^{M} (\mathbf{A}_i -\mathbf{v}) x_i = 0. \notag\]This leads to the projection’s location formula:
\[\mathbf{v} = \frac{\sum_{i=1}^M \mathbf{A}_i x_i}{\sum_{i=1}^M x_i}.\label{eqn:radviz}\]The projected location is thus a weighted average of the anchor points’ positions. Before the calculation, each variable $x_i$ is scaled to the range $[0, 1]$; the weights are then nonnegative, so every projected point is a convex combination of the anchors and stays inside the circle.
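As a concrete reference, here is a minimal NumPy sketch of Equation ($\ref{eqn:radviz}$): anchors evenly spaced on the unit circle, columns min-max scaled to $[0, 1]$, and each row mapped to the weighted average of the anchor positions.

```python
import numpy as np

def radviz_project(X):
    """Map rows of X (shape n_samples x M) to 2D Radviz coordinates."""
    X = np.asarray(X, dtype=float)
    M = X.shape[1]
    # Anchor points A_1, ..., A_M evenly distributed on the unit circle.
    theta = 2 * np.pi * np.arange(M) / M
    anchors = np.column_stack([np.cos(theta), np.sin(theta)])  # shape (M, 2)
    # Min-max scale each column to [0, 1] so the weights are nonnegative.
    Xs = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    # v = (sum_i A_i x_i) / (sum_i x_i), evaluated row by row.
    # (A row whose scaled values are all zero would need special handling.)
    return (Xs @ anchors) / Xs.sum(axis=1, keepdims=True)
```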
An Example
Data Overview
To demonstrate the application of Radviz, we consider a dataset with six columns $x_1$, $x_2$, $x_3$, $x_4$, $x_5$, and $y$. Figure 2 displays the scatter matrix plot of these variables.
Radviz and Parallel Coordinates Plots
Before constructing a regression model of $y$ on the $x_i$, we apply the Radviz projection to $[x_1, x_2, x_3, x_4, x_5]$ using Equation ($\ref{eqn:radviz}$). The resulting Radviz plot in Figure 3 reveals that the gradient of the $y$ value aligns with the line through the anchor point $x_3$ and the circle’s center, indicating $x_3$ as a strong influencer. Anchor points $x_1$ and $x_5$, at the opposite end of this line, also appear influential. Anchor points $x_2$ and $x_4$ appear less important because they are positioned roughly orthogonal to the gradient of $y$.
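A plot in the spirit of Figure 3 can be produced with matplotlib, reusing `radviz_project` from the sketch above; the DataFrame `df` and the column names `x1`–`x5`, `y` are assumptions for illustration, not the original code.

```python
import numpy as np
import matplotlib.pyplot as plt

features = ["x1", "x2", "x3", "x4", "x5"]    # assumed column names
V = radviz_project(df[features].to_numpy())  # df is an assumed DataFrame

fig, ax = plt.subplots(figsize=(6, 6))
ax.add_patch(plt.Circle((0, 0), 1.0, fill=False))  # the anchor circle
sc = ax.scatter(V[:, 0], V[:, 1], c=df["y"], cmap="viridis", s=10)
# Label the anchor points on the circle.
theta = 2 * np.pi * np.arange(len(features)) / len(features)
for name, tx, ty in zip(features, np.cos(theta), np.sin(theta)):
    ax.annotate(name, (1.08 * tx, 1.08 * ty), ha="center", va="center")
fig.colorbar(sc, label="y")
ax.set_aspect("equal")
plt.show()
```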
For reference, Figure 4 shows the same data in a parallel coordinates plot, which reveals trends but offers little information on the relative importance of the variables $x_i$.
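The parallel coordinates view can be drawn with pandas. Since `parallel_coordinates` colors lines by a class column, one option (an assumption here, not necessarily how Figure 4 was made) is to bin $y$ into quartiles first.

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

plot_df = df[features].copy()  # df and features as assumed above
plot_df["y (quartile)"] = pd.qcut(df["y"], q=4,
                                  labels=["Q1", "Q2", "Q3", "Q4"])
parallel_coordinates(plot_df, class_column="y (quartile)",
                     colormap="viridis", alpha=0.4)
plt.show()
```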
Feature Importance in Regression Model
To assess feature importance, linear regression models of $y$ are fitted on different combinations of the predictors $x_i$, as sketched below. The results align with the Radviz findings.
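A minimal sketch of this sweep, assuming statsmodels as the fitting library: fit OLS on every predictor subset and collect the MSE, AIC, BIC, and $R^2$ reported in the tables below.

```python
from itertools import combinations
import pandas as pd
import statsmodels.api as sm

rows = []
for k in (1, 2, 3):                           # subset sizes considered here
    for subset in combinations(features, k):  # df, features as assumed above
        X = sm.add_constant(df[list(subset)])
        fit = sm.OLS(df["y"], X).fit()
        rows.append({"predictors": ", ".join(subset),
                     "MSE": fit.mse_resid,    # residual mean squared error
                     "AIC": fit.aic,
                     "BIC": fit.bic,
                     "R2": fit.rsquared})
results = pd.DataFrame(rows).sort_values("AIC")
print(results.to_string(index=False))
```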
With one predictor, the table below shows that $x_3$ performs best, i.e., it has the lowest MSE (mean squared error), AIC, and BIC, and the highest $R^2$; $x_1$ is a close second.
One Predictor
Predictor | MSE | AIC | BIC | $R^2$ |
---|---|---|---|---|
$x_3$ | 0.69 | 3607 | 3617 | 0.31 |
$x_1$ | 0.73 | 3689 | 3700 | 0.27 |
$x_2$ | 0.91 | 4001 | 4011 | 0.095 |
$x_4$ | 0.91 | 4002 | 4012 | 0.094 |
$x_5$ | 0.99 | 4140 | 4151 | 0.0039 |
With two predictors, $(x_1, x_5)$ and $(x_1, x_3)$ are the top two combinations.
Two Predictors
Predictors | MSE | AIC | BIC | $R^2$ |
---|---|---|---|---|
$x_1$, $x_5$ | 0.63 | 3462 | 3478 | 0.37 |
$x_1$, $x_3$ | 0.65 | 3509 | 3525 | 0.35 |
$x_3$, $x_4$ | 0.65 | 3523 | 3539 | 0.35 |
$x_3$, $x_5$ | 0.68 | 3573 | 3589 | 0.33 |
$x_2$, $x_3$ | 0.68 | 3583 | 3599 | 0.32 |
$x_1$, $x_2$ | 0.69 | 3599 | 3615 | 0.31 |
$x_1$, $x_4$ | 0.70 | 3634 | 3650 | 0.30 |
$x_2$, $x_4$ | 0.72 | 3670 | 3686 | 0.28 |
$x_4$, $x_5$ | 0.90 | 3998 | 4014 | 0.097 |
$x_2$, $x_5$ | 0.91 | 4003 | 4019 | 0.094 |
Three Predictors
With three predictors, the combination of $x_1$, $x_3$, and $x_5$ performs best.
The regression results thus affirm the insights gained from the Radviz projection.
Conclusion
Combined with regression, the Radviz projection gives a clear way to make sense of complex data: it highlights candidate features visually, and regression models then validate those findings quantitatively.