Derivation and Comparison of Log-rank, Score, and Likelihood Ratio Tests in Survival Analysis
When comparing the reliability of two groups based on survival data, the statistical significance of any observed difference can be assessed using several methods:
- The Log-rank test: A non-parametric test based on the difference between observed and expected failures.
- The Cox Proportional Hazards Model: tests based on the partial likelihood, namely the likelihood ratio test and the score test built on the score function (the first derivative of the partial log-likelihood).
This note derives the test statistics for these three methods and compares theoretical results with statistical software output using numerical data.
Theoretical Derivations
Survival Data and Contingency Table
We compare Group 0 and Group 1 at various times $t_j$ using the following contingency table:
| | Failures | Survivors | Number at Risk |
|---|---|---|---|
| Group 0 | $d_{0j}$ | $n_{0j} - d_{0j}$ | $n_{0j}$ |
| Group 1 | $d_{1j}$ | $n_{1j}-d_{1j}$ | $n_{1j}$ |
| Total | $d_j = d_{0j} + d_{1j}$ | $n_j - d_j$ | $n_j = n_{0j} + n_{1j}$ |
Null Hypothesis
The null hypothesis ($H_0$) is that the two groups have the same survival experience, i.e., identical hazard functions.
Deriving the Log-rank Test
Under the null hypothesis ($H_0$), the observed failures $O_{0j}=d_{0j}$ and $O_{1j}=d_{1j}$ follow a hypergeometric distribution.
Expectation and Variance
The expected numbers of failures, $E_{0j}$ and $E_{1j}$, are:
\[E_{0j} = n_{0j}\frac{d_j}{n_j}, \quad E_{1j} = n_{1j}\frac{d_j}{n_j}. \notag\]The variance of $d_{0j}$ and $d_{1j}$ is:
\[V_{j} = n_{1j}\frac{d_j}{n_j}\frac{n_j - d_j}{n_j}\frac{n_j-n_{1j}}{n_j -1} = \frac{d_j}{n_j}\frac{n_j - d_j}{n_j}\frac{n_{0j}n_{1j}}{n_j -1} \notag\]Test Statistic
Let $X_{ij} = O_{ij} - E_{ij}$ (where $O_{ij} = d_{ij}$) represent the difference between observed and expected failures. The sum over all times is $X_i = \sum_j X_{ij}$. Because failures at different time steps are independent, the variance of the total sum is $\text{Var}(X_i) = \sum_j V_j$.
The resulting test statistic $\lambda_{\text{Log-rank}}$ follows a chi-square distribution with 1 degree of freedom ($\chi_1^2$):
\[\lambda_\text{Log-rank} = \frac{X_i^2}{\text{Var}(X_i)}= \frac{\left(\sum_j (O_{ij} - E_{ij})\right)^2}{\sum_j V_j} \sim \chi_1^2. \notag\]
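As a quick illustration, the statistic can be computed directly from per-event-time counts. The sketch below is a minimal R helper (not from the original analysis); `d1j`, `dj`, `n1j`, and `nj` are illustrative vectors with one entry per distinct event time.

```r
# Minimal sketch: log-rank statistic from per-event-time counts
logrank_stat <- function(d1j, dj, n1j, nj) {
  E1j <- n1j * dj / nj                                          # expected failures in Group 1
  Vj  <- dj * (nj - dj) * n1j * (nj - n1j) / (nj^2 * (nj - 1))  # hypergeometric variance
  sum(d1j - E1j)^2 / sum(Vj)                                    # chi-square statistic, 1 df
}
```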
Connection to Cox Proportional Hazards Model
In the Cox proportional hazards model, the null hypothesis is equivalent to $\beta = 0$, where $\beta$ is the coefficient of the group covariate.
Partial Likelihood
Partial likelihood is defined as
\[L(\beta) = \prod_{j=1}^{K}\frac{\exp\left(\beta \sum\limits_{i \in D_j}^{} x_i\right)}{\left[ \sum\limits_{l \in R(t_j)}\exp(\beta x_l)\right]^{d_j}}. \notag\]where
- $x_i$: the group the failure belongs to, 0 or 1.
- $d_j$: total number of failures at time $t_j$.
- $D_j$: the set of individuals who fail at time $t_j$.
- $R(t_j)$: risk set at time $t_j$ (all individuals alive and not censored just before $t_j$).
The partial log-likelihood is
\[\ell(\beta) = \sum_{j=1}^K\left[ \beta\sum\limits_{i\in D_j}x_i -d_j \log\left(\sum\limits_{l \in R(t_j)}\exp(\beta x_l)\right)\right]. \notag\]
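For a single binary covariate this is simple enough to code directly. The sketch below is an illustrative R implementation (not part of the original analysis) that handles ties with the Breslow convention, i.e., all failures at $t_j$ share the same risk set.

```r
# Sketch: Breslow partial log-likelihood for a single binary covariate x
partial_loglik <- function(beta, time, status, x) {
  event_times <- sort(unique(time[status == 1]))
  ll <- 0
  for (tj in event_times) {
    Dj <- which(time == tj & status == 1)   # individuals failing at t_j
    Rj <- which(time >= tj)                 # risk set just before t_j
    ll <- ll + beta * sum(x[Dj]) -
      length(Dj) * log(sum(exp(beta * x[Rj])))
  }
  ll
}
```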
Score and Information Functions
The score function is defined as the first derivative of $\ell(\beta)$:
\[U(\beta) =\frac{\partial \ell(\beta)}{\partial \beta}.\notag\]Information $I(\beta)$ is the negative second derivative of $\ell(\beta)$:
\[I(\beta) = -\frac{\partial^2 \ell(\beta)}{\partial \beta^2} = -\frac{\partial U(\beta)}{\partial \beta}. \notag\]Deriving the Likelihood Ratio Test
The Likelihood Ratio Test compares the goodness-of-fit between two models: the null model where $\beta=0$ and the alternative model where $\beta \neq 0$.
The likelihood ratio test statistic is defined as
\[\lambda_{\text{Likelihood-ratio}} = -2 \log\left[\frac{\sup_{\beta=0} L(\beta)}{\sup_{\beta\neq 0} L(\beta)} \right] \notag\]By Wilks' theorem, $\lambda_\text{Likelihood-ratio}$ follows a chi-square distribution with one degree of freedom ($\chi^2_1$).
And by definition,
\[\log \left[\sup_{\beta=0} L(\beta)\right] = \log L(0)= \ell(0) \notag\]and
\[\log \left[\sup_{\beta\neq 0} L(\beta)\right] = \ell(\hat{\beta}), \notag\]where $\hat{\beta}$ is the maximum likelihood estimate of $\beta$.
Using the properties of logarithms and the second-order Taylor expansion to approximate $\ell(\beta)$ around the null hypothesis value ($\beta=0$), the statistic becomes
\[\begin{align} \notag \lambda_\text{Likelihood-ratio} & = 2\left[\ell(\hat{\beta})-\ell(0)\right] \\ \notag & \approx 2 \left[ \ell(0) + \frac{\partial \ell(\beta)}{\partial \beta}\bigg\rvert_{\beta=0} \hat{\beta} +\frac{1}{2}\frac{\partial^2 \ell(\beta)}{\partial \beta^2}\bigg\rvert_{\beta=0} \hat{\beta}^2 -\ell(0)\right] \\ \label{llr_expand} & = 2\left[U(0) \hat{\beta} -\frac{1}{2} I(0) \hat{\beta}^2\right] \end{align}\]At $\hat{\beta}$, the first derivative of the partial log-likelihood is zero: $U(\hat{\beta})=\frac{\partial \ell(\beta)}{\partial \beta}\bigg\rvert_{\beta=\hat{\beta}}=0$. Expanding $U(\hat{\beta})$ around $\beta=0$:
\[U(\hat{\beta}) \approx U(0) + \frac{\partial U(\beta)}{\partial \beta}\bigg\rvert_{\beta=0}\hat{\beta} = U(0) - I(0) \hat{\beta} = 0 \notag\]We have
\[\hat{\beta} = \frac{U(0)}{I(0)}.\label{beta_hat}\]Plugging ($\ref{beta_hat}$) into ($\ref{llr_expand}$), we obtain the statistic $\lambda_\text{Likelihood-ratio}$:
\[\lambda_\text{Likelihood-ratio} = \frac{U^2(0)}{I(0)} \sim \chi_1^2.\notag\]
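Numerically, the exact statistic $2[\ell(\hat{\beta}) - \ell(0)]$ (rather than its quadratic approximation) can be obtained by maximizing the partial log-likelihood. A sketch reusing the hypothetical `partial_loglik()` helper above and R's one-dimensional optimizer:

```r
# Sketch: likelihood ratio statistic 2 * [l(beta_hat) - l(0)]
lr_stat <- function(time, status, x) {
  fit <- optimize(partial_loglik, interval = c(-10, 10), maximum = TRUE,
                  time = time, status = status, x = x)
  2 * (fit$objective - partial_loglik(0, time, status, x))
}
```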
The Score Function and Its Variance
Another way to test the null hypothesis that $\beta = 0$ is to evaluate the score function and its variance.
This method relies on two properties of the score function, $U(\beta)= \frac{\partial \ell(\beta)}{\partial \beta}$:
- The expected value of the score function is zero at the true parameter value.
- The variance of the score function is equal to the Expected Fisher Information.
In the following, we show that if the true value is $\beta = \beta_0 = 0$, the mean of the score function at $\beta = \beta_0$ is zero, the observed $U(0)$ approximately follows a normal distribution, and the variance of $U(0)$, and hence the p-value, can be calculated.
To show that the mean of the score function is zero when $\beta$ is at its true value ($\beta_0$), we calculate the expectation over the data distribution:
\[\begin{align} \notag E_{\beta_0}(U(\beta_0)) &= \int U(\beta_0) L(x\mid \beta_0)dx \\ \notag & = \int \frac{\partial \ell }{\partial \beta} L(x\mid \beta_0) dx \\ \notag & = \int \frac{\partial L}{\partial \beta}\frac{L}{L} d x \\ \notag & = \int \frac{\partial L}{\partial \beta} dx \\ \notag & = \frac{\partial}{\partial \beta}\int L dx \\ \notag & = \frac{\partial 1}{\partial \beta} \\\notag & = 0, \notag \end{align} \notag\]where $E_{\beta_0}$ is the average over the data given the parameter $\beta=\beta_0$, and $x$ represents the data.
The variance of $U(\beta)$ is defined as
\[\text{Var}(U(\beta_0)) = E_{\beta_0}\left[\left(U(\beta_0)-E_{\beta_0}\left[U(\beta_0)\right]\right)^2\right] \notag\]Since the mean is zero, this simplifies to:
\[\text{Var}(U(\beta_0)) =E_{\beta_0}\left[U(\beta_0)^2\right]. \notag\]To relate this to the second derivative of the log-likelihood, we differentiate the identity $E_\beta[U(\beta)] = 0$ with respect to $\beta$:
\[\frac{\partial }{\partial \beta} \int \frac{\partial \ln L}{\partial \beta} L dx = 0. \notag\]Applying the product rule to the terms inside the integral:
\[\int \frac{\partial^2 \ln L}{\partial \beta^2} L dx +\int\frac{\partial \ln L}{\partial \beta}\frac{\partial L}{\partial \beta} dx = 0 \notag\]We can rewrite the second term using the identity $\frac{\partial L}{\partial \beta} =L \frac{\partial \ln L}{\partial \beta}$:
\[E_\beta\left[\frac{\partial U(\beta)}{\partial \beta}\right] + E_\beta\left[U(\beta)^2\right] = 0. \notag\]Therefore, the variance of the score is:
\[\text{Var}(U(\beta_0)) = E_{\beta_0}\left[U(\beta_0)^2\right] = -E_{\beta_0}\left[\frac{\partial U(\beta)}{\partial \beta}\right]\bigg\rvert_{\beta_0}. \notag\]Under the null hypothesis, the observed score $U(0)$ approximately follows a normal distribution:
\[U(0) \sim \mathcal{N}(0, \text{Var}(U(0))). \notag\]To test the null hypothesis, we construct a chi-square statistic with one degree of freedom by squaring the standardized score:
\[\lambda_\text{Score} = \frac{U^2(0)}{\text{Var}(U(0))} \sim \chi_1^2. \notag\]Calculations of $U(0)$, Variance of $U(0)$, and $I(0)$ From Survival Data
For the Cox proportional hazards model, the score is
\[U(\beta) =\frac{\partial \ell}{\partial \beta} = \sum_{j=1}^K\left[\sum\limits_{i\in D_j}x_i - d_j\frac{\sum\limits_{l\in R(t_j)}x_l \exp(\beta x_l)}{\sum\limits_{l \in R(t_j)}\exp(\beta x_l)}\right] \notag\]To evaluate the score under the null hypothesis ($H_0: \beta=0$), we substitute $\beta=0$ into the equation. For a binary covariate where $x\in \{0,1\}$:
- $\sum_{l\in R(t_j)}\exp(0)= n_j$ (the total number at risk)
- $\sum_{l\in R(t_j)}x_l \exp(0) = n_{1j}$ (the number at risk in Group 1)
- $\sum_{i\in D_j}x_i = d_{1j}$ (the observed failures in Group 1)
The score at $\beta=0$ simplifies to:
\[U(0) =\sum_{j=1}^K \left[d_{1j}- d_j\frac{n_{1j}}{n_j} \right] = \sum_{j=1}^K (O_{1j} - E_{1j}). \notag\]The Fisher Information is the negative second derivative of the log-likelihood. To find $I(0)$, we calculate:
\[\begin{align} \notag \frac{\partial U(\beta)}{\partial \beta}\bigg\rvert_{\beta=0} & = - \sum_{j=1}^K d_j \frac{\left(\sum\limits_{l\in R(t_j)}x_l^2\exp(\beta x_l)\right)\left(\sum\limits_{l \in R(t_j)}\exp(\beta x_l)\right) - \left(\sum\limits_{l\in R(t_j)}x_l \exp(\beta x_l)\right)^2}{\left(\sum\limits_{l \in R(t_j)}\exp(\beta x_l)\right)^2}\Bigg\rvert_{\beta=0} \\ \notag & = -\sum_{j = 1}^K d_j\left[\frac{n_{1j}}{n_j} -\left(\frac{n_{1j}}{n_j}\right)^2\right] \\ \notag & = -\sum_{j=1}^K d_j \left(\frac{n_{1j}}{n_j}\right)\left(\frac{n_j - n_{1j}}{n_j}\right) \\ \notag & = - \sum_{j=1}^K d_j \left(\frac{n_{0j}}{n_j}\right)\left(\frac{n_{1j}}{n_j}\right) \notag \end{align}.\notag\]At $\beta=0$
- $\sum\limits_{l\in R(t_j)}x_l\exp(\beta x_l)=n_{1j}$
- $\sum\limits_{l\in R(t_j)}x_l^2\exp(\beta x_l)=n_{1j}$
- $\sum\limits_{l \in R(t_j)}\exp(\beta x_l)=n_j$
- When multiple failures occur at time $t_j$, all the ties share the same risk set. This is called the Breslow method.
The expectation $E_{\beta_0}\left[\frac{\partial U(\beta)}{\partial \beta}\bigg\rvert_{\beta=0}\right]$ is approximated by its sample counterpart $\frac{\partial U(\beta)}{\partial \beta}\bigg\rvert_{\beta=0}$: the observed value computed from the data stands in for the mean over the data distribution.
We use the observed information $I(0)$ as a sample estimate of the variance:
\[I(0) = - \frac{\partial U(\beta)}{\partial \beta}\bigg\rvert_{\beta=0} \approx \text{Var}(U(0)) = \sum_j W_j,\notag\]where \(W_j = d_j \left(\frac{n_{0j}}{n_j}\right)\left(\frac{n_{1j}}{n_j}\right). \notag\)
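For reference, a small R sketch (again illustrative, not from the original analysis) computing $U(0)$, $I(0)$, and the resulting score statistic from the same per-event-time counts:

```r
# Sketch: score U(0), observed information I(0), and score statistic at beta = 0
score_components <- function(d1j, dj, n1j, nj) {
  U0 <- sum(d1j - dj * n1j / nj)            # sum of observed minus expected in Group 1
  I0 <- sum(dj * (nj - n1j) * n1j / nj^2)   # sum of W_j
  c(U0 = U0, I0 = I0, score_stat = U0^2 / I0)
}
```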
The table below summarizes the statistics of the three methods to test for the difference between the two groups. Note that while the numerator $U^2(0)$ is identical for all three, the denominators differ slightly between the log-rank test and the Cox-based models.
| Test Method | Statistic ($\chi_1^2$) | Denominator Logic |
|---|---|---|
| Log-rank Test | $\lambda_{\text{Log-rank}} = \frac{U^2(0)}{\sum_j V_j}$ | Hypergeometric Variance: $V_j = \frac{d_j(n_j-d_j)n_{0j}n_{1j}}{n_j^2(n_j-1)}$ |
| Likelihood Ratio Test | $\lambda_{\text{Likelihood-ratio}} = \frac{U^2(0)}{\sum_j W_j}$ | Observed Information: $W_j = \frac{d_j n_{0j} n_{1j}}{n_j^2}$ |
| Score Test | $\lambda_{\text{Score}} = \frac{U^2(0)}{\sum_j W_j}$ | Equivalent to the Likelihood Ratio test at $\beta=0$ |
The Cox model-based tests yield identical results when using the Breslow approximation for ties. The log-rank test uses the hypergeometric variance ($V_j$), which differs from $W_j$ only by the factor $(n_j - d_j)/(n_j - 1)$ and therefore matters only when there are tied failure times.
By calculating these statistics and comparing them against the chi-square distribution, we can determine the p-value and decide whether to reject the null hypothesis of no difference between the groups.
Results
This section summarizes the results of the survival analysis comparing two groups. We provide the theoretical formulas, a step-by-step numerical calculation using a sample dataset, and verification using statistical software.
Both the Log-rank and Likelihood Ratio tests follow a chi-square distribution with one degree of freedom ($\chi_1^2$) under the null hypothesis.
| Test | Statistic | P-value |
|---|---|---|
| Log-rank test | \(\lambda_\text{Log-rank} = \frac{U^2(0)}{\sum_j V_j}\sim \chi_1^2\) | $P(\chi_1^2 \geq \lambda_\text{Log-rank})$ |
| Likelihood ratio test | \(\lambda_\text{Likelihood-ratio} = \frac{U^2(0)}{I(0)}= \frac{U^2(0)}{\sum_j W_j} \sim \chi_1^2\) | $P(\chi_1^2 \geq \lambda_\text{Likelihood-ratio})$ |
The components of the statistics are calculated based on the failures and risk sets at each distinct event time $j$.
| Parameter | Formula |
|---|---|
| $U(0)$ | \(\sum_{j=1}^K \left[d_{ij}- d_j\frac{n_{ij}}{n_j} \right]\), $i=0, 1$ |
| $\sum_j V_j$ | \(\sum_j \frac{d_j}{n_j}\frac{n_j - d_j}{n_j}\frac{n_{0j}n_{1j}}{n_j -1}\) |
| $I(0)=\sum_j W_j$ | \(\sum_{j=1}^K d_j \left(\frac{n_{0j}}{n_j}\right)\left(\frac{n_{1j}}{n_j}\right)\) |
Numerical Calculation
To compare the methods, we generated survival data for two groups ($x=0$ and $x=1$) with four subjects each.
| Subject | Group ($x$) | Time ($t$) | Status |
|---|---|---|---|
| 1 | 0 | 2 | 1 (Event) |
| 2 | 0 | 4 | 1 (Event) |
| 3 | 0 | 4 | 1 (Event) |
| 4 | 0 | 7 | 0 (Censored) |
| 5 | 1 | 3 | 1 (Event) |
| 6 | 1 | 4 | 1 (Event) |
| 7 | 1 | 6 | 1 (Event) |
| 8 | 1 | 8 | 0 (Censored) |
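If you prefer to reproduce the analysis without a CSV file, the same dataset can be built directly in R (a convenience sketch; the software section below instead reads data.csv):

```r
# The example survival data as an R data frame
data <- data.frame(
  Group  = c(0, 0, 0, 0, 1, 1, 1, 1),
  Time   = c(2, 4, 4, 7, 3, 4, 6, 8),
  Status = c(1, 1, 1, 0, 1, 1, 1, 0)  # 1 = event, 0 = censored
)
```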
There are four distinct event times in the dataset: $t \in \{2, 3, 4, 6\}$. The counts at these event times are as follows.
| Time ($t_j$) | Group 0 Risk ($n_{0j}$) | Group 1 Risk ($n_{1j}$) | Total Risk ($n_j$) | Group 0 Events ($O_{0j}$) | Group 1 Events ($O_{1j}$) | Total Events ($d_j$) |
|---|---|---|---|---|---|---|
| 2 | 4 | 4 | 8 | 1 | 0 | 1 |
| 3 | 3 | 4 | 7 | 0 | 1 | 1 |
| 4 | 3 | 3 | 6 | 2 | 1 | 3 |
| 6 | 1 | 2 | 3 | 0 | 1 | 1 |
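The risk-set and event counts above can also be tabulated programmatically. A small sketch, assuming the `data` data frame defined earlier:

```r
# Risk sets and events at each distinct event time
event_times <- sort(unique(data$Time[data$Status == 1]))
counts <- t(sapply(event_times, function(tj) c(
  t   = tj,
  n0j = sum(data$Time >= tj & data$Group == 0),
  n1j = sum(data$Time >= tj & data$Group == 1),
  d0j = sum(data$Time == tj & data$Status == 1 & data$Group == 0),
  d1j = sum(data$Time == tj & data$Status == 1 & data$Group == 1))))
counts
```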
Theoretical Calculations
Using the counts from the event-time breakdown, we calculate the observed and expected failures along with the variance components.
| Time ($t_j$) | Observed ($O_{0j}$) | Observed ($O_{1j}$) | Expectation ($E_{0j}$) | Expectation ($E_{1j}$) | $V_j$ Log-rank | $W_j$ Likelihood Ratio |
|---|---|---|---|---|---|---|
| 2 | 1 | 0 | 1/2 | 1/2 | 1/4 | 1/4 |
| 3 | 0 | 1 | 3/7 | 4/7 | 12/49 | 12/49 |
| 4 | 2 | 1 | 3/2 | 3/2 | 9/20 | 3/4 |
| 6 | 0 | 1 | 1/3 | 2/3 | 2/9 | 2/9 |
| Total | 3 | 3 | $\frac{58}{21}=2.7619$ | $\frac{68}{21}=3.2381$ | $\frac{5147}{4410}=1.167$ | $\frac{647}{441}=1.467$ |
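A quick check of these totals in base R (the per-time vectors are taken from the table above):

```r
d1j <- c(0, 1, 1, 1); dj <- c(1, 1, 3, 1)   # events in Group 1 / total, at t = 2, 3, 4, 6
n1j <- c(4, 4, 3, 2); nj <- c(8, 7, 6, 3)   # at risk in Group 1 / total
E1j <- dj * n1j / nj
Vj  <- dj * (nj - dj) * n1j * (nj - n1j) / (nj^2 * (nj - 1))
Wj  <- dj * (nj - n1j) * n1j / nj^2
c(U0 = sum(d1j - E1j), sumV = sum(Vj), sumW = sum(Wj))
# U0 ~ -0.238, sumV ~ 1.167, sumW ~ 1.467
```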
Log-rank Test Result
\[\lambda_{\text{Log-rank}} = \frac{\left(\sum_j \left(O_{1j}-E_{1j}\right)\right)^2}{\sum_j V_j} = \frac{\left(\sum_j \left(O_{0j}-E_{0j}\right)\right)^2}{\sum_j V_j} = \frac{0.2381^2}{1.167} = 0.0486. \notag\]The p-value = 0.8255.
Likelihood Ratio Test Result
\[\lambda_{\text{Likelihood-ratio}} = \frac{\left(\sum_j \left(O_{1j}-E_{1j}\right)\right)^2}{\sum_j W_j} = \frac{\left(\sum_j \left(O_{0j}-E_{0j}\right)\right)^2}{\sum_j W_j} = \frac{0.2381^2}{1.467} = 0.0386 \notag\]The p-value = 0.8442.
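Both p-values are simply the upper tail of the $\chi_1^2$ distribution and can be reproduced with pchisq:

```r
pchisq(0.0486, df = 1, lower.tail = FALSE)   # ~0.8255  (log-rank)
pchisq(0.0386, df = 1, lower.tail = FALSE)   # ~0.8442  (likelihood ratio)
```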
Statistical Software Verification
We verified these calculations using the survival package in R. (The read_csv call below comes from the readr package, so we load it as well.)

```r
library(survival)
library(readr)

data <- read_csv("data.csv", show_col_types = FALSE)
data
```
Log-rank Test with survdiff Function
```r
with(data, survdiff(Surv(Time, Status) ~ Group))
```
| | N | Observed | Expected | (O-E)^2/E | (O-E)^2/V |
|---|---|---|---|---|---|
| Group=0 | 4 | 3 | 2.76 | 0.0205 | 0.0486 |
| Group=1 | 4 | 3 | 3.24 | 0.0175 | 0.0486 |
Chisq= 0.04857 on 1 degrees of freedom, p= 0.8256.
Likelihood Ratio Test with coxph Function
```r
cox.model <- with(data, coxph(Surv(Time, Status) ~ Group, ties = "breslow"))
summary(cox.model)
```
| | coef | exp(coef) | se(coef) | z | p |
|---|---|---|---|---|---|
| Group | -0.162 | 0.851 | 0.823 | -0.196366 | 0.844324 |
Likelihood ratio test=0.0385 on 1 df, p=0.844, n= 8, number of events= 6.
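For completeness, the score test discussed above can be read off the same fit; summary(cox.model)$sctest returns the score statistic, its degrees of freedom, and the p-value (with Breslow ties it corresponds to the $U^2(0)/\sum_j W_j$ form derived earlier):

```r
summary(cox.model)$sctest   # score test: statistic, df, p-value
```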
The theoretical derivations and software calculations align almost perfectly.
- The Log-rank test produced a $\chi^2$ of 0.0486 ($p=0.8256$).
- The Likelihood Ratio test produced a $\chi^2$ of 0.0385 ($p=0.844$).
Conclusion
This analysis demonstrates the close correspondence between the non-parametric log-rank test and the likelihood ratio test of the Cox proportional hazards model.
The difference between the two tests arises solely from the denominator. The Log-rank test uses a variance estimate derived from the hypergeometric distribution, while the likelihood ratio test uses the Fisher Information estimated via the Breslow method.
While both tests yield nearly identical results in practice, they differ in how straightforward they are to derive.
The likelihood ratio and score tests are conceptually easier: they rely only on standard calculus. In contrast, the log-rank test is built on counting processes, and proving its asymptotic properties requires the martingale central limit theorem, a significantly more complex framework. Finally, the likelihood framework is more flexible, allowing easy extension to multivariate models.
References
Likelihood ratio test. In Wikipedia. https://en.wikipedia.org/wiki/Likelihood-ratio_test
Fisher Information. In Wikipedia. https://en.wikipedia.org/wiki/Fisher_information
Wilks' theorem. In Wikipedia. https://en.wikipedia.org/wiki/Wilks%27_theorem