Causal Inference By Regression
We use a simple example to illustrate that causal inference by regression is unreliable in realistic settings where measurement noise is present.
Introduction
Causal inference from observed data offers a cost-effective alternative to randomized controlled trials. One method, described by Peters et al. (2017) and Molak (2023), infers the causal direction by testing the independence between the regression residuals and the predictor variable.
Example Illustration
Consider the data generated by Equation ($\ref{eqn:example}$):
\[\begin{array}{rcl} x & \sim & N(0,1) \\ y & = & x^3 + N(0,\sigma_y^2) \end{array} \label{eqn:example}\]where $N(\mu,\sigma^2)$ denotes a Gaussian random variable with mean $\mu$ and variance $\sigma^2$.
To infer causality, a Generalized Additive Model (GAM) is used to regress $y$ on $x$, and the independence between the regression residuals and $x$ is tested. The analysis is then repeated in the opposite direction: $x$ is regressed on $y$. The Hilbert-Schmidt Independence Criterion (HSIC) assesses this independence (see this blog post), with a small p-value indicating dependence.
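For concreteness, a minimal sketch of this procedure is shown below, assuming the mgcv and dHSIC packages are available; the variable names are illustrative.
# Minimal sketch: generate data as in the example equation, regress y on x
# with a GAM, and test independence between x and the residuals with HSIC.
set.seed(1)
n <- 100
x <- rnorm(n)
y <- x^3 + rnorm(n)            # sigma_y = 1
fit_yx <- mgcv::gam(y ~ s(x))  # regression y ~ x
dHSIC::dhsic.test(x, fit_yx$residuals)$p.value  # expected large under x -> y
fit_xy <- mgcv::gam(x ~ s(y))  # regression x ~ y
dHSIC::dhsic.test(y, fit_xy$residuals)$p.value  # expected small under x -> y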
Figure 1 displays the result for $100$ data samples, generated by Equation ($\ref{eqn:example}$) with $\sigma_y=1$, confirming that $x$ causes $y$.
Effect of Noise
Real-world scenarios involve measurement noise in both $x$ and $y$. The impact of this noise on causal inference is evaluated using Equation ($\ref{eqn:noise}$), varying $\sigma_x$ while keeping $\sigma_y=1$.
\[\begin{array}{rcl} x_0 & \sim & N(0,1) \\ y_0 & = & f(x_0) \\ x & = & x_0 + N(0,\sigma_x^2) \\ y & = & y_0 + N(0, \sigma_y^2) \end{array} \label{eqn:noise}\]The following R code performs the calculations. For each specified value of $\sigma_x$, it generates $100$ data points for $x$ and $y$, then computes the p-values of the HSIC tests assessing independence between the variables and the regression residuals. This analysis is repeated $100$ times for each $\sigma_x$.
regress <- function(func, n, sigma_x, sigma_y){
  # Generate data y = func(x) with Gaussian noise added to both x and y,
  # then regress y on x and x on y, and test the independence (HSIC)
  # between the predictor and the regression residuals in each direction.
  #
  # Inputs:
  #   func: a univariate function
  #   n: number of data samples to be generated
  #   sigma_x: standard deviation of the Gaussian noise added to x
  #   sigma_y: standard deviation of the Gaussian noise added to y
  x0 <- rnorm(n)
  y0 <- func(x0)
  y <- y0 + rnorm(n)*sigma_y
  x <- x0 + rnorm(n)*sigma_x
  # regression y ~ x
  y_on_x <- mgcv::gam(y ~ s(x))
  pv.y_on_x <- dHSIC::dhsic.test(x, y_on_x$residuals)
  # regression x ~ y
  x_on_y <- mgcv::gam(x ~ s(y))
  pv.x_on_y <- dHSIC::dhsic.test(y, x_on_y$residuals)
  return(list(data = tibble::tibble(x = x, y = y),
              pv_yx = pv.y_on_x,
              pv_xy = pv.x_on_y))
}
n_boots <- 100
n_samples <- 100
sigma_y <- 1
p_y_x <- c()
p_x_y <- c()
index <- c()
sigma_x <- c()
for(s in c(0, 0.1, 0.25, 0.5, 0.75, 1)){
  for(i in 1:n_boots){
    r <- regress(function(.x) .x^3, n_samples, s, sigma_y)
    index <- append(index, i)
    p_y_x <- append(p_y_x, r$pv_yx$p.value)
    p_x_y <- append(p_x_y, r$pv_xy$p.value)
    sigma_x <- append(sigma_x, s)
  }
}
result <- tibble::tibble(
  sigma_x = sigma_x,
  index = index,
  p_yx = p_y_x,  # p-value for y ~ x
  p_xy = p_x_y)  # p-value for x ~ y
Figure 2 illustrates the significant influence of noise on causal inference. It depicts the distribution of $\Delta = pv_{y\sim x} - pv_{x\sim y}$, where $pv_{y\sim x}$ is the p-value from the Hilbert-Schmidt Independence Criterion (HSIC) test between $x$ and the residuals of the regression of $y$ on $x$, and $pv_{x\sim y}$ is the p-value from the HSIC test between $y$ and the residuals of the regression of $x$ on $y$.
For small $\sigma_x$, $\Delta>0$, suggesting a causal direction of $x \rightarrow y$. Conversely, when $\sigma_x$ is large, $\Delta < 0$, indicating a causal direction of $y\rightarrow x$. Notably, at $\sigma_x = 0.25$, the distribution of $\Delta$ is centered around $0$, so no causal direction can be inferred clearly.
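One rough way to inspect the distribution of $\Delta$ from the result tibble above is sketched below; the dplyr summary is an assumption about one possible inspection, not the code used to produce Figure 2.
# Compute Delta = p_yx - p_xy for each run and summarize per sigma_x
delta_summary <- result |>
  dplyr::mutate(delta = p_yx - p_xy) |>              # Delta for each run
  dplyr::group_by(sigma_x) |>
  dplyr::summarise(median_delta = median(delta),     # center of the distribution
                   frac_positive = mean(delta > 0))  # fraction of runs suggesting x -> y
print(delta_summary)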
Figures 3 and 4 show examples with $\sigma_x=0.25$ and $\sigma_x=1$, respectively.
Discussion and Summary
When both $x$ and $y$ contain noise, their relationship resembles a Directed Acyclic Graph (DAG) with a confounder, as depicted in Figure 5. Both $x$ and $y$ are influenced by the confounder $x_0$, with noise components $N_x$ and $N_y$ added to each, respectively.
Intuitively, if the inverse function $f^{-1}$ exists, there is no distinguishable causal direction between $x$ and $y$. However, when $f^{-1}$ does not exist, a causal direction from $x$ to $y$ emerges. For example:
\[\begin{array}{rcl} x_0 & \sim & N(0,1) \\ y_0 & = & x_0^2 \\ x & = & x_0 + N(0,\sigma_x^2) \\ y & = & y_0 + N(0, \sigma_y^2) \end{array}\]In Figure 6, the function that maps $x$ to $y$ is not invertible: $y$ as a function of $x$ yields a single value for each $x$, while $x$ as a function of $y$ takes two values for the same $y$. This suggests a causal direction of $x\rightarrow y$ based on function invertibility.
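This can be checked quickly by reusing the regress() helper defined above (a sketch; if the invertibility argument holds, $pv_{y\sim x}$ should stay larger than $pv_{x\sim y}$ even with substantial noise on $x$).
# Non-invertible case: f(x) = x^2 with sigma_x = sigma_y = 1
r <- regress(function(.x) .x^2, 100, 1, 1)
c(p_yx = r$pv_yx$p.value, p_xy = r$pv_xy$p.value)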
Despite the theoretical appeal of causal inference by regression, its robustness is compromised when measurement noise is present, as demonstrated in this study. The causal inference results are significantly influenced by noise, underscoring the need to incorporate knowledge of the data-generating mechanism for more robust causal inference.
References
Peters, J., Janzing, D., Schölkopf, B. (2017). Elements of Causal Inference: Foundations and Learning Algorithms. The MIT Press.
Molak, A. (2023). Causal Inference and Discovery in Python. Packt Publishing.