Unveiling Multidimensional Insights: Radviz Projection and Feature Importance in Regression
Radviz projection maps multidimensional data onto a 2D plane. In this note, we describe how Radviz projections are computed and demonstrate how they help uncover important features in multivariate regression analysis.
Radviz Projection
Consider an M-dimensional data point x=[x1,x2,…,xM] that is projected onto a 2D plane as a vector v. The Radviz projection uses a mass-spring model: a mass is attached by springs to M anchor points A1, A2, …, AM, which are evenly distributed on a circle and correspond to the dimensions of the data. The value xi acts as the stiffness of the i-th spring, which connects the mass to the anchor point Ai. The mass settles at the location v where the forces from all springs balance, and this location is the 2D projection (Figure 1 illustrates this for a four-dimensional data point).
Mathematical Formulation
Let Ai also denote the position vector of the i-th anchor point on the circle. At equilibrium, the total force on the mass at the location v is zero:
$$\sum_{i=1}^{M} (A_i - v)\, x_i = 0.$$

This leads to the projection’s location formula:

$$v = \frac{\sum_{i=1}^{M} A_i x_i}{\sum_{i=1}^{M} x_i}. \tag{1}$$

The resulting 2D projection location is a weighted average of the anchor points’ positions. Before calculation, variables xi are scaled to the range [0,1] to keep the mapped points in a confined region.
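The weighted-average formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the figures in this note: the function names `radviz_anchors` and `radviz_project` are hypothetical, the anchors are assumed to sit on the unit circle starting at angle 0, and min-max scaling per column is assumed as the way to reach the range [0,1].

```python
import numpy as np

def radviz_anchors(m):
    """Return an (m, 2) array of anchor points evenly spaced on the unit circle."""
    angles = 2.0 * np.pi * np.arange(m) / m
    return np.column_stack([np.cos(angles), np.sin(angles)])

def radviz_project(X):
    """Project the rows of X (n_samples, m) to 2D as weighted averages of the anchors."""
    X = np.asarray(X, dtype=float)
    # Min-max scale each column to [0, 1]; a constant column is mapped to 0.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    W = (X - lo) / span
    total = W.sum(axis=1, keepdims=True)
    # A row whose scaled values are all zero has no defined position in the
    # formula; placing it at the centre is a choice made in this sketch.
    safe_total = np.where(total == 0, 1.0, total)
    return (W @ radviz_anchors(X.shape[1])) / safe_total

# Three four-dimensional points: the first two load on a single variable,
# the third loads equally on all four.
X = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])
V = radviz_project(X)
```

In this small example the first point lands on the first anchor, the second on the third anchor, and the third at the centre of the circle, because equal weights pull equally toward every anchor.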
An Example
Data Overview
To demonstrate the application of Radviz, we consider a dataset with six columns x1, x2, x3, x4, x5, and y. Figure 2 displays the scatter matrix plot of these variables.
Radviz and Parallel Coordinates Plots
Before constructing a regression model of y on xi, we apply the Radviz projection to [x1,x2,x3,x4,x5] using Equation (1). In the resulting Radviz plot (Figure 3), the gradient of y aligns with the line passing through anchor point x3 and the circle’s center, which suggests that x3 has a strong influence on y. Anchor points x1 and x5 lie toward the opposite end of that line, so they also appear significant. Anchor points x2 and x4 appear less important because they are positioned orthogonal to the gradient of y.
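The visual argument about the gradient of y can also be checked numerically. One possible way, which is an assumption of this sketch and not a step taken in the analysis above, is to fit a plane to y over the projected 2D points and compare its gradient direction with the direction of each anchor. The function name `gradient_alignment` is hypothetical, and the data below is synthetic.

```python
import numpy as np

def gradient_alignment(V, y, anchors):
    """Cosine between the fitted gradient of y in the plane and each anchor direction."""
    V = np.asarray(V, dtype=float)
    y = np.asarray(y, dtype=float)
    # Fit the plane y ~ c + g[0]*vx + g[1]*vy by least squares.
    design = np.column_stack([np.ones(len(V)), V])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    g = coef[1:]
    g_unit = g / np.linalg.norm(g)
    a_unit = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return a_unit @ g_unit

# Synthetic check (not the data of this note): five anchors on the unit
# circle, and a y that increases exactly in the direction of the third anchor.
angles = 2.0 * np.pi * np.arange(5) / 5
anchors = np.column_stack([np.cos(angles), np.sin(angles)])
rng = np.random.default_rng(0)
V = rng.uniform(-0.5, 0.5, size=(200, 2))
y = V @ anchors[2]
cosines = gradient_alignment(V, y, anchors)
```

A cosine near +1 or -1 means the anchor lies along the gradient line (y rises or falls toward it), and a value near 0 means the anchor sits orthogonal to it. For this synthetic case the cosines are approximately -0.81, 0.31, 1.00, 0.31, -0.81.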
For reference, Figure 4 shows the same data in a parallel coordinates plot, which reveals trends but gives no information on the relative importance of the variables xi.
Feature Importance in Regression Model
To assess feature importance, linear regression models of y on xi are constructed for different predictor combinations. The results align with the Radviz findings.
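The four metrics reported in the tables of this section can be computed from an ordinary least squares fit, as in the sketch below. The note does not state which conventions were used, so the choices here are assumptions: the MSE is the residual sum of squares divided by n, the model always includes an intercept, and AIC and BIC are based on the Gaussian log-likelihood with k counting the intercept. The function name `ols_metrics` is hypothetical.

```python
import numpy as np

def ols_metrics(X, y):
    """Fit y ~ intercept + X by least squares and return MSE, AIC, BIC and R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    k = design.shape[1]  # number of fitted coefficients, intercept included
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    rss = float(resid @ resid)
    mse = rss / n  # assumed convention: divide by n, not by n - k
    r2 = 1.0 - rss / float(((y - y.mean()) ** 2).sum())
    # Gaussian log-likelihood at the fitted coefficients, with variance rss / n.
    loglik = -0.5 * n * (np.log(2.0 * np.pi) + np.log(mse) + 1.0)
    return {
        "mse": mse,
        "aic": 2 * k - 2 * loglik,
        "bic": k * np.log(n) - 2 * loglik,
        "r2": r2,
    }

# Tiny hand-checkable example with one predictor.
metrics = ols_metrics([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 4.0])
```

With these conventions, BIC exceeds AIC by k(ln n - 2), so the gap between the two columns depends only on the sample size and the number of coefficients.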
With one predictor, the table below shows that x3 has the best performance, i.e., the lowest mse (mean squared error), AIC, and BIC, and the highest R2. x1 is a close second.
One Predictor
Predictor | mse | AIC | BIC | R2 |
---|---|---|---|---|
x3 | 0.69 | 3607 | 3617 | 0.31 |
x1 | 0.73 | 3689 | 3700 | 0.27 |
x2 | 0.91 | 4001 | 4011 | 0.095 |
x4 | 0.91 | 4002 | 4012 | 0.094 |
x5 | 0.99 | 4140 | 4151 | 0.0039 |
With two predictors, (x1,x5) and (x1,x3) are the top two combinations.
Two Predictors
Predictors | mse | AIC | BIC | R2 |
---|---|---|---|---|
x1, x5 | 0.63 | 3462 | 3478 | 0.37 |
x1, x3 | 0.65 | 3509 | 3525 | 0.35 |
x3, x4 | 0.65 | 3523 | 3539 | 0.35 |
x3, x5 | 0.68 | 3573 | 3589 | 0.33 |
x2, x3 | 0.68 | 3583 | 3599 | 0.32 |
x1, x2 | 0.69 | 3599 | 3615 | 0.31 |
x1, x4 | 0.70 | 3634 | 3650 | 0.30 |
x2, x4 | 0.72 | 3670 | 3686 | 0.28 |
x4, x5 | 0.90 | 3998 | 4014 | 0.097 |
x2, x5 | 0.91 | 4003 | 4019 | 0.094 |
Three Predictors
With three predictors, the best combination is (x1, x3, x5), the same three variables that the Radviz plot singles out.
The regression model results affirm the insights gained from the Radviz projection.
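Rankings like the ones above can be produced by enumerating every predictor subset of a given size and sorting by fit quality. The sketch below is one way to do this under stated assumptions: `rank_subsets` is a hypothetical helper, each model includes an intercept, and the data is synthetic and unrelated to the dataset in this note. For a fixed subset size and sample size, MSE, AIC, BIC, and R2 all order the models identically, so sorting by MSE is enough.

```python
from itertools import combinations
import numpy as np

def rank_subsets(X, y, names, size):
    """Return (predictor names, MSE) pairs for every subset of `size`, best first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    ranked = []
    for cols in combinations(range(X.shape[1]), size):
        design = np.column_stack([np.ones(n), X[:, list(cols)]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        ranked.append((tuple(names[c] for c in cols), float(resid @ resid) / n))
    return sorted(ranked, key=lambda item: item[1])

# Synthetic data for illustration only: y depends on the first and third columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=300)
ranked = rank_subsets(X, y, ["x1", "x2", "x3", "x4"], size=2)
```

Here the pair ("x1", "x3") comes out first because it contains both variables that generate y. Exhaustive enumeration is cheap for five predictors but grows combinatorially with the number of columns.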
Conclusion
Radviz projection, combined with regression, offers a clear way to interpret multidimensional data. The projection highlights candidate important features visually, and regression models then confirm those findings, which makes data interpretation more straightforward.