17 生存模型

Survival Models

本节译者：于娇

初次校审：李君竹

二次校审：李君竹（Claude 辅助）

Survival models apply to animals and plants as well as inanimate objects such as machine parts or electrical components. Survival models arise when there is an event of interest for a group of subjects, machine component, or other item that is

certain to occur after some amount of time,
but only measured for a fixed period of time, during which the event may not have occurred for all subjects.

生存模型适用于动物和植物，也适用于机器零件或电子元件等无生命物体。生存模型出现的情况是当一组研究对象、机器组件或其他物品存在感兴趣的事件，该事件：

确定会在一定时间后发生，
但只在固定时间段内进行测量，在此期间并非所有对象都可能发生该事件。

For example, one might wish to estimate the the distribution of time to failure for solid state drives in a data center, but only measure drives for a two year period, after which some number will have failed and some will still be in service.

例如，我们可能希望估计数据中心中固态硬盘的故障时间分布，但只测量两年时间内的硬盘，在此之后，一些硬盘会发生故障，一些仍在运行。

Survival models are often used comparatively, such as comparing time to death of patients diagnosed with stage one liver cancer under a new treatment and a standard treatment (pure controls are not allowed when there is an effective existing treatment for a serious condition). During a two year trial, some patients will die and others will survive.

生存模型通常用于比较分析，例如比较诊断为一期肝癌患者在新疗法和标准疗法下的死亡时间（当存在有效的现有治疗方法时，不允许使用纯对照组）。在两年的试验期间，一些患者会死亡，另一些会存活。

Survival models may involve covariates, such as the factory at which a component is manufactured, the day on which it is manufactured, and the amount of usage it gets. A clinical trial might be adjusted for the sex and age of a cancer patient or the hospital at which treatment is received.

生存模型可能涉及协变量，如零件制造的工厂、制造日期以及使用量等。临床试验可能需要根据癌症患者的性别和年龄或接受治疗的医院进行调整。

Survival models come in two main flavors, parametric and semi-parametric. In a parametric model, the survival time of a subject is modeled explicitly using a parametric probability distribution. There is a great deal of flexibility in how the parametric probability distribution is constructed. The sections below consider exponential and Weibull distributed survival times.

生存模型主要有两种类型：参数模型和半参数模型。在参数模型中，对象的生存时间使用参数概率分布进行显式建模。参数概率分布的构建方式具有很大的灵活性。下面的章节将考虑指数分布和威布尔分布的生存时间。

Rather than explicitly modeling a parametric survival probability, semi-parametric survival models instead model the relative effect on survival of covariates. The final sections of this chapter consider the proportional hazards survival model.

半参数生存模型不是显式建模参数生存概率，而是对协变量对生存的相对影响进行建模。本章的最后几节将讨论比例风险生存模型。

17.1 Exponential survival model

指数生存模型

The exponential distribution is commonly used in survival models where there is a constant risk of failure that does not go up the longer a subject survives. This is because the exponential distribution is memoryless in sense that if \(T \sim \textrm{exponential}(\lambda)\) for some rate \(\lambda > 0,\) then

指数分布常用于具有恒定失效风险的生存模型中，失效风险不会随着对象生存时间的延长而增加。这是因为指数分布具有无记忆性，即如果对于某个速率参数 \(\lambda > 0\)，有 \(T \sim \textrm{exponential}(\lambda)\)，那么

\[\begin{equation*} \Pr[T > t] = \Pr[T > t + t' \mid T > t']. \end{equation*}\] If component survival times are distributed exponentially, it means the distribution of time to failure is the same no matter how long the item has already survived. This can be a reasonable assumption for electronic components, but is not a reasonable model for animal survival.

如果组件的生存时间呈指数分布，这意味着无论物品已经存活了多长时间，失效时间的分布都是相同的。这对于电子元件来说可能是一个合理的假设，但对于动物生存来说并不是一个合理的模型。

The exponential survival model has a single parameter for the rate, which assumes all subjects have the same distribution of failure time (this assumption is relaxed in the next section by introducing per-subject covariates). With the rate parameterization, the expected survival time for a component with survival time represented as the random variable \(T\) is

指数生存模型具有单个速率参数，假设所有对象具有相同的失效时间分布（这个假设在下一节通过引入每个对象的协变量得以放宽）。在速率参数化下，以随机变量 \(T\) 表示生存时间的组件的期望生存时间为

\[\begin{equation*} \mathbb{E}[T \mid \lambda] = \frac{1}{\lambda}. \end{equation*}\] The exponential distribution is sometimes parameterized in terms of a scale (i.e., inverse rate) \(\beta = 1 / \lambda\).

指数分布有时用尺度参数（即速率的倒数）\(\beta = 1 / \lambda\) 来参数化。

The data for a survival model consists of two components. First, there is a vector \(t \in (0, \infty)^N\) of \(N\) observed failure times. Second, there is a censoring time \(t^{\textrm{cens}}\) such that failure times greater than \(t^{\textrm{cens}}\) are not observed. The censoring time assumption imposes a constraint which requires \(t_n < t^{\textrm{cens}}\) for all \(n \in 1{:}N.\) For the censored subjects, the only thing required in the model is their total count, \(N^\textrm{cens}\) (their covariates are also required for models with covariates).

生存模型的数据包含两个组成部分。首先，有一个包含 \(N\) 个观测失效时间的向量 \(t \in (0, \infty)^N\)。其次，有一个删失时间 \(t^{\textrm{cens}}\)，使得大于 \(t^{\textrm{cens}}\) 的失效时间无法观测到。删失时间假设施加了一个约束条件，要求对所有 \(n \in 1{:}N\) 都有 \(t_n < t^{\textrm{cens}}\)。对于删失的对象，模型中唯一需要的是它们的总数 \(N^\textrm{cens}\)（对于有协变量的模型，还需要它们的协变量）。

The model for the observed failure times is exponential, so that for \(n \in 1{:}N,\)

观测失效时间的模型是指数的，因此对于 \(n \in 1{:}N\)，

\[\begin{equation*} t_n \sim \textrm{exponential}(\lambda). \end{equation*}\]

The model for the censored failure times is also exponential. All that is known of a censored item is that its failure time is greater than the censoring time, so each censored item contributes a factor to the likelihood of

删失失效时间的模型也是指数的。对于删失项目，我们只知道其失效时间大于删失时间，因此每个删失项目对似然函数贡献一个因子：

\[\begin{equation*} \Pr[T > t^{\textrm{cens}}] = 1 - F_T(t^{\textrm{cens}}), \end{equation*}\] where \(F_T\) is the cumulative distribution function (cdf) of survival time \(T\) (\(F_X(x) = \Pr[X \leq x]\) is standard notation for the cdf of a random variable \(X\)). The function \(1 - F_T(t)\) is the complementary cumulative distribution function (ccdf), and it is used directly to define the likelihood

其中 \(F_T\) 是生存时间 \(T\) 的累积分布函数（cdf）（\(F_X(x) = \Pr[X \leq x]\) 是随机变量 \(X\) 的 cdf 的标准记法）。函数 \(1 - F_T(t)\) 是补累积分布函数（ccdf），直接用于定义似然函数

\[\begin{eqnarray*} p(t, t^{\textrm{cens}}, N^{\textrm{cens}} \mid \lambda) & = & \prod_{n=1}^N \textrm{exponential}(t_n \mid \lambda) \cdot \prod_{n=1}^{N^{\textrm{cens}}} \textrm{exponentialCCDF}(t^{\textrm{cens}} \mid \lambda) \\ & = & \prod_{n=1}^N \textrm{exponential}(t_n \mid \lambda) \cdot \textrm{exponentialCCDF}(t^{\textrm{cens}} \mid \lambda)^{N^{\textrm{cens}}}. \end{eqnarray*}\]

On the log scale, that’s

在对数尺度上，这是

\[\begin{eqnarray*} \log p(t, t^{\textrm{cens}}, N^{\textrm{cens}} \mid \lambda) & = & \sum_{n=1}^N \log \textrm{exponential}(t_n \mid \lambda) \\ & & { } + N^{\textrm{cens}} \cdot \log \textrm{exponentialCCDF}(t^{\textrm{cens}} \mid \lambda). \end{eqnarray*}\]

The model can be completed with a standard lognormal prior on \(\lambda,\)

该模型可以用 \(\lambda\) 的标准对数正态先验来完成，

\[\begin{equation*} \lambda \sim \textrm{lognormal}(0, 1), \end{equation*}\] which is reasonable if failure times are in the range of 0.1 to 10 time units, because that’s roughly the 95% central interval for a variable distributed \(\textrm{lognormal}(0, 1)\). In general, the range of the prior (and likelihood!) should be adjusted with prior knowledge of expected failure times.

如果失效时间在 0.1 到 10 个时间单位的范围内，这是合理的，因为这大约是分布为 \(\textrm{lognormal}(0, 1)\) 的变量的 95% 中心区间。一般来说，先验（和似然！）的范围应该根据对期望失效时间的先验知识进行调整。

Stan program

Stan 程序

The data for a simple survival analysis without covariates can be coded as follows.

不包含协变量的简单生存分析数据可以如下编码。

data {
  int<lower=0> N;
  vector[N] t;
  int<lower=0> N_cens;
  real<lower=0> t_cens;
}

In this program, N is the number of uncensored observations and t contains the times of the uncensored observations. There are a further N_cens items that are right censored at time t_cens. Right censoring means that if the time to failure is greater than t_cens, it is only observed that the part survived until time t_cens. In the case where there are no covariates, the model only needs the number of censored items because they all share the same censoring time.

在这个程序中，N 是未删失观测的数量，t 包含未删失观测的时间。还有 N_cens 个项目在时间 t_cens 处被右删失。右删失意味着如果失效时间大于 t_cens，只能观测到零件存活到时间 t_cens。在没有协变量的情况下，模型只需要删失项目的数量，因为它们都共享相同的删失时间。

There is a single rate parameter, the inverse of which is the expected time to failure.

有一个单一的速率参数，其倒数是期望失效时间。

parameters {
  real<lower=0> lambda;
}

The exponential survival model and the prior are coded directly using vectorized distribution and ccdf statements. This both simplifies the code and makes it more computationally efficient by sharing computation across instances.

指数生存模型和先验使用向量化分布和 ccdf 语句直接编码。这既简化了代码，又通过在实例间共享计算使其更具计算效率。

model {
  t ~ exponential(lambda);
  target += N_cens * exponential_lccdf(t_cens | lambda);

  lambda ~ lognormal(0, 1);
}

The likelihood for rate lambda is just the density of exponential distribution for observed failure time. The Stan code is vectorized, modeling each entry of the vector t as a having an exponential distribution with rate lambda. This data model could have been written as

观测失效时间的速率 lambda 的似然函数就是指数分布的密度。Stan 代码是向量化的，将向量 t 的每个条目建模为具有速率 lambda 的指数分布。这个数据模型也可以写成

for (n in 1:N) {
  t[n] ~ exponential(lambda);
}

The log likelihood contribution given censored items is the number of censored items times the log complementary cumulative distribution function (lccdf) at the censoring time of the exponential distribution with rate lambda. The log likelihood terms arising from the censored events could have been added to the target log density one at a time,

给定删失项目的对数似然贡献是删失项目数量乘以速率为 lambda 的指数分布在删失时间处的对数补累积分布函数（lccdf）。由删失事件产生的对数似然项也可以逐个添加到目标对数密度中，

for (n in 1:N)
  target += exponential_lccdf(t_cens | lambda);

to define the same log density, but it is much more efficient computationally to multiply by a constant than do a handful of sequential additions.

来定义相同的对数密度，但在计算上乘以一个常数比进行一系列连续加法要高效得多。

17.2 Weibull survival model

威布尔生存模型

The Weibull distribution is a popular alternative to the exponential distribution in cases where there is a decreasing probability of survival as a subject gets older. The Weibull distribution models this by generalizing the exponential distribution to include a power-law trend.

威布尔分布是指数分布的一个流行替代方案，适用于随着对象年龄增长生存概率下降的情况。威布尔分布通过将指数分布推广到包含幂律趋势来建模这种情况。

The Weibull distribution is parameterized by a shape \(\alpha > 0\) and scale \(\sigma > 0.\) For an outcome \(t \geq 0\), the Weibull distribution’s probability density function is

威布尔分布由形状参数 \(\alpha > 0\) 和尺度参数 \(\sigma > 0\) 进行参数化。对于结果 \(t \geq 0\)，威布尔分布的概率密度函数是

\[\begin{equation*} \textrm{Weibull}(t \mid \alpha, \sigma) = \frac{\alpha}{\sigma} \cdot \left( \frac{t}{\sigma} \right)^{\alpha - 1} \cdot \exp\left(-\left(\frac{t}{\sigma}\right)^{\alpha}\right). \end{equation*}\]

In contrast, recall that the exponential distribution can be expressed using a rate (inverse scale) parameter \(\beta > 0\) with probability density function

相比之下，回顾指数分布可以使用速率（尺度的倒数）参数 \(\beta > 0\) 表示，其概率密度函数为

\[\begin{equation*} \textrm{exponential}(t \mid \beta) = \beta \cdot \exp(-\beta \cdot t). \end{equation*}\] When \(\alpha = 1,\) the Weibull distribution reduces to an exponential distribution, \[\begin{equation*} \textrm{Weibull}(t \mid 1, \sigma) = \textrm{exponential}\!\left(t \,\bigg|\, \frac{1}{\sigma}\right). \end{equation*}\] In other words, the Weibull is a continuous expansion of the exponential distribution.

当 \(\alpha = 1\) 时，威布尔分布简化为指数分布，

换句话说，威布尔分布是指数分布的连续扩展。

If \(T \sim \textrm{Weibull}(\alpha, \sigma),\) then the expected survival time is

如果 \(T \sim \textrm{Weibull}(\alpha, \sigma)\)，那么期望生存时间是

\[\begin{equation*} \mathbb{E}[T] = \sigma \cdot \Gamma\!\left(1 + \frac{1}{\alpha}\right), \end{equation*}\]

where the \(\Gamma\) function is the continuous completion of the factorial function (i.e., \(\Gamma(1 + n) = n!\ \) for \(n \in \mathbb{N}\)). As \(\alpha \rightarrow 0\) for a fixed \(\sigma\) or as \(\sigma \rightarrow \infty\) for a fixed \(\alpha\), the expected survival time goes to infinity.

其中 \(\Gamma\) 函数是阶乘函数的连续扩展（即，对于 \(n \in \mathbb{N}\)，有 \(\Gamma(1 + n) = n!\)）。当固定 \(\sigma\) 时 \(\alpha \rightarrow 0\)，或当固定 \(\alpha\) 时 \(\sigma \rightarrow \infty\)，期望生存时间趋于无穷大。

There are three regimes of the Weibull distribution.

威布尔分布有三种情况。

\(\alpha < 1.\) A subject is more likely to fail early. When \(\alpha < 1,\) the Weibull density approaches infinity as \(t \rightarrow 0.\)
\(\alpha < 1.\) 对象更可能早期失效。当 \(\alpha < 1\) 时，威布尔密度在 \(t \rightarrow 0\) 时趋于无穷大。
\(\alpha = 1.\) The Weibull distribution reduces to the exponential distribution, with a constant rate of failure over time. When \(\alpha = 1,\) the Weibull distribution approaches \(\sigma\) as \(t \rightarrow 0.\)
\(\alpha = 1.\) 威布尔分布简化为指数分布，失效率随时间保持恒定。当 \(\alpha = 1\) 时，威布尔分布在 \(t \rightarrow 0\) 时趋于 \(\sigma\)。
\(\alpha > 1.\) Subjects are less likely to fail early. When \(\alpha > 1,\) the Weibull density approaches zero as \(t \rightarrow 0.\)
\(\alpha > 1.\) 对象不太可能早期失效。当 \(\alpha > 1\) 时，威布尔密度在 \(t \rightarrow 0\) 时趋于零。

With \(\alpha \leq 1,\) the mode is zero (\(t = 0\)), whereas with \(\alpha > 1,\) the mode is nonzero (\(t > 0\)).

当 \(\alpha \leq 1\) 时，众数为零（\(t = 0\)），而当 \(\alpha > 1\) 时，众数非零（\(t > 0\)）。

Stan program

Stan 程序

With Stan, one can just swap the exponential distribution for the Weibull distribution with the appropriate parameters and the model remains essentially the same. Recall the exponential model’s parameters and model block.

使用 Stan，可以用适当的参数将指数分布替换为威布尔分布，模型本质上保持不变。回顾指数模型的参数和模型块。

parameters {
  real<lower=0> beta;
}
model {
  t ~ exponential(beta);
  target += N_cens * exponential_lccdf(t_cens | beta);

  beta ~ lognormal(0, 1);
}

The Stan program for the Weibull model just swaps in the Weibull distribution and complementary cumulative distribution function with shape (alpha) and scale (sigma) parameters.

威布尔模型的 Stan 程序只是用具有形状（alpha）和尺度（sigma）参数的威布尔分布和补累积分布函数进行替换。

parameters {
  real<lower=0> alpha;
  real<lower=0> sigma;
}
model {
  t ~ weibull(alpha, sigma);
  target += N_cens * weibull_lccdf(t_cens | alpha, sigma);

  alpha ~ lognormal(0, 1);
  sigma ~ lognormal(0, 1);
}

As usual, if more is known about expected survival times, alpha and sigma should be given more informative priors.

通常，如果对期望生存时间有更多了解，应该给 alpha 和 sigma 更具信息性的先验。

17.3 Survival with covariates

带协变量的生存模型

Suppose that for each of \(n \in 1{:}N\) items observed, both censored and uncensored, there is a covariate (row) vector \(x_n \in \mathbb{R}^K.\) For example, a clinical trial may include the age (or a one-hot encoding of an age group) and the sex of a participant; an electronic component might include a one-hot encoding of the factory at which it was manufactured and a covariate for the load under which it has been run.

假设对于观测的 \(n \in 1{:}N\) 个项目中的每一个（包括删失和未删失的），都有一个协变量（行）向量 \(x_n \in \mathbb{R}^K\)。例如，临床试验可能包括参与者的年龄（或年龄组的独热编码）和性别；电子元件可能包括制造工厂的独热编码和运行负载的协变量。

Survival with covariates replaces what is essentially a simple regression with only an intercept \(\lambda\) with a generalized linear model with a log link, where the rate for item \(n\) is

带协变量的生存模型将本质上只有截距 \(\lambda\) 的简单回归替换为具有对数链接的广义线性模型，其中项目 \(n\) 的速率为

\[\begin{equation*} \lambda_n = \exp(x_n \cdot \beta), \end{equation*}\]

where \(\beta \in \mathbb{R}^K\) is a \(K\)-vector of regression coefficients. Thus

其中 \(\beta \in \mathbb{R}^K\) 是回归系数的 \(K\) 维向量。因此

\[\begin{equation*} t_n \sim \textrm{exponential}(\lambda_n). \end{equation*}\]

The censored items have probability

删失项目的概率为

\[\begin{equation*} \Pr[n\textrm{-th censored}] = \textrm{exponentialCCDF}(t^{\textrm{cens}} \mid x^{\textrm{cens}}_n \cdot \beta). \end{equation*}\]

The covariates form an \(N \times K\) data matrix, \(x \in \mathbb{R}^{N \times K}\). An intercept can be introduced by adding a column of 1 values to \(x\).

协变量形成一个 \(N \times K\) 的数据矩阵 \(x \in \mathbb{R}^{N \times K}\)。可以通过向 \(x\) 添加一列值为 1 的列来引入截距。

A Stan program for the exponential survival model with covariates is as follows. It relies on the fact that the order of failure times (t and t_cens) corresponds to the ordering of items in the covariate matrices (x and x_cens).

带协变量的指数生存模型的 Stan 程序如下。它依赖于失效时间的顺序（t 和 t_cens）与协变量矩阵（x 和 x_cens）中项目的顺序相对应这一事实。

data {
  int<lower=0> N;
  vector[N] t;
  int<lower=0> N_cens;
  real<lower=0> t_cens;
  int<lower=0> K;
  matrix[N, K] x;
  matrix[N_cens, K] x_cens;
}
parameters {
  vector[K] gamma;
}
model {
  gamma ~ normal(0, 2);

  t ~ exponential(exp(x * gamma));
  target += exponential_lccdf(t_cens | exp(x_cens * gamma));
}

Both the distribution statement for uncensored times and the log density increment statement for censored times are vectorized, one in terms of the exponential distribution and one in terms of the log complementary cumulative distribution function.

未删失时间的分布语句和删失时间的对数密度增量语句都是向量化的，一个用指数分布表示，一个用对数补累积分布函数表示。

17.4 Hazard and survival functions

风险和生存函数

Suppose \(T\) is a random variable representing a survival time, with a smooth cumulative distribution function

假设 \(T\) 是代表生存时间的随机变量，具有光滑的累积分布函数

\[\begin{equation*} F_T(t) = \Pr[T \leq t], \end{equation*}\]

so that its probability density function is

因此其概率密度函数为

\[\begin{equation*} p_T(t) = \frac{\textrm{d}}{\textrm{d}t} F_T(t). \end{equation*}\]

The survival function \(S(t)\) is the probability of surviving until at least time \(t\), which is just the complementary cumulative distribution function (ccdf) of the survival random variable \(T\),

生存函数 \(S(t)\) 是至少存活到时间 \(t\) 的概率，它就是生存随机变量 \(T\) 的补累积分布函数（ccdf），

\[\begin{equation*} S(t) = 1 - F_T(t). \end{equation*}\] The survival function appeared in the Stan model in the previous section as the likelihood for items that did not fail during the period of the experiment (i.e., the censored failure times for the items that survived through the trial period).

生存函数在前一节的 Stan 模型中作为在实验期间未失效项目的似然出现（即在试验期间存活的项目的删失失效时间）。

The hazard function \(h(t)\) is the instantaneous risk of not surviving past time \(t\) assuming survival until time \(t\), which is given by

风险函数 \(h(t)\) 是假设存活到时间 \(t\) 后在时间 \(t\) 之后不存活的瞬时风险，定义为

\[\begin{equation*} h(t) = \frac{p_T(t)}{S(t)} = \frac{p_T(t)}{1 - F_T(t)}. \end{equation*}\] The cumulative hazard function \(H(t)\) is defined to be the accumulated hazard over time,

累积风险函数 \(H(t)\) 定义为随时间累积的风险，

\[\begin{equation*} H(t) = \int_0^t h(u) \, \textrm{d}u. \end{equation*}\]

The hazard function and survival function are related through the differential equation

风险函数和生存函数通过微分方程相关联

\[\begin{eqnarray*} h(t) & = & -\frac{\textrm{d}}{\textrm{d}t} \log S(t). \\[4pt] & = & -\frac{1}{S(t)} \frac{\textrm{d}}{\textrm{d}t} S(t) \\[4pt] & = & \frac{1}{S(t)} \frac{\textrm{d}}{\textrm{d}t} -(1 - F_Y(t)) \\[4pt] & = & \frac{1}{S(t)} \frac{\textrm{d}}{\textrm{d}t} (F_Y(t) - 1) \\[4pt] & = & \frac{1}{S(t)} \frac{\textrm{d}}{\textrm{d}t} F_Y(t) \\[4pt] & = & \frac{p_T(t)}{S(t)}. \end{eqnarray*}\]

If \(T \sim \textrm{exponential}(\beta)\) has an exponential distribution, then its hazard function is constant,

如果 \(T \sim \textrm{exponential}(\beta)\) 具有指数分布，那么其风险函数是常数，

\[\begin{eqnarray*} h(t \mid \beta) & = & \frac{p_T(t \mid \beta)}{S(t \mid \beta)} \\[4pt] & = & \frac{\textrm{exponential}(t \mid \beta)}{1 - \textrm{exponentialCCDF}(t \mid \beta)} \\[4pt] & = & \frac{\beta \cdot \exp(-\beta \cdot t)} {1 - (1 - \exp(-\beta \cdot t))} \\[4pt] & = & \frac{\beta \cdot \exp(-\beta \cdot t)} {\exp(-\beta \cdot t)} \\[4pt] & = & \beta. \end{eqnarray*}\] The exponential distribution is the only distribution of survival times with a constant hazard function.

指数分布是唯一具有恒定风险函数的生存时间分布。

If \(T \sim \textrm{Weibull}(\alpha, \sigma),\) then its hazard function is

如果 \(T \sim \textrm{Weibull}(\alpha, \sigma)\)，那么其风险函数是

\[\begin{eqnarray*} h(t \mid \alpha, \sigma) & = & \frac{p_T(t \mid \alpha, \sigma)}{S(t \mid \alpha, \sigma)} \\[4pt] & = & \frac{\textrm{Weibull}(t \mid \alpha, \sigma}{1 - \textrm{WeibullCCDF}(t \mid \alpha, \sigma)} \\[4pt] & = & \frac{\frac{\alpha}{\sigma} \cdot \left( \frac{t}{\sigma} \right)^{\alpha - 1} \cdot \exp\left(-\left(\frac{t}{\sigma} \right)^\alpha\right)} {1 - \left(1 - \exp\left(-\left(\frac{t}{\sigma}\right)^\alpha \right)\right)} \\[4pt] & = & \frac{\alpha}{\sigma} \cdot \left( \frac{t}{\sigma} \right)^{\alpha - 1}. \end{eqnarray*}\]

If \(\alpha = 1\) the hazard is constant over time (which also follows from the fact that the Weibull distribution reduces to the exponential distribution when \(\alpha = 1\)). When \(\alpha > 1,\) the hazard grows as time passes, whereas when \(\alpha < 1,\) it decreases as time passes.

如果 \(\alpha = 1\)，风险随时间保持恒定（这也源于当 \(\alpha = 1\) 时威布尔分布简化为指数分布的事实）。当 \(\alpha > 1\) 时，风险随时间增长，而当 \(\alpha < 1\) 时，风险随时间减少。

17.5 Proportional hazards model

比例风险模型

The exponential model is parametric in that is specifies an explicit parametric form for the distribution of survival times. Cox (1972) introduced a semi-parametric survival model specified directly in terms of a hazard function \(h(t)\) rather than in terms of a distribution over survival times. Cox’s model is semi-parametric in that it does not model the full hazard function, instead modeling only the proportional differences in hazards among subjects.

指数模型是参数模型，因为它为生存时间的分布指定了显式的参数形式。Cox (1972) 引入了一个半参数生存模型，直接用风险函数 \(h(t)\) 而不是生存时间分布来指定。Cox 模型是半参数的，因为它不对完整的风险函数建模，而只对受试者之间风险的比例差异建模。

Let \(x_n \in \mathbb{R}^K\) be a (row) vector of covariates for subject \(n\) so that the full covariate data matrix is \(x \in \mathbb{R}^{N \times K}\). In Cox’s model, the hazard function for subject \(n\) is defined conditionally in terms of their covariates \(x_n\) and the parameter vector \(\gamma \in \mathbb{R}^K\) as

设 \(x_n \in \mathbb{R}^K\) 为受试者 \(n\) 的协变量（行）向量，使得完整的协变量数据矩阵为 \(x \in \mathbb{R}^{N \times K}\)。在 Cox 模型中，受试者 \(n\) 的风险函数根据其协变量 \(x_n\) 和参数向量 \(\gamma \in \mathbb{R}^K\) 条件性地定义为

\[\begin{equation*} h(t \mid x_n, \beta) = h_0(t) \cdot \exp(x_n \cdot \gamma), \end{equation*}\]

where \(h_0(t)\) is a shared baseline hazard function and \(x_n \cdot \gamma = \sum_{k=1}^K x_{n, k} \cdot \beta_k\) is a row vector-vector product.

其中 \(h_0(t)\) 是共享的基线风险函数，\(x_n \cdot \gamma = \sum_{k=1}^K x_{n, k} \cdot \beta_k\) 是行向量-向量乘积。

In the semi-parametric, proportional hazards model, the baseline hazard function \(h_0(t)\) is not modeled. This is why it is called “semi-parametric.” Only the factor \(\exp(x_n \cdot \gamma),\) which determines how individual \(n\) varies by a proportion from the baseline hazard, is modeled. This is why it’s called “proportional hazards.”

在半参数比例风险模型中，基线风险函数 \(h_0(t)\) 不被建模。这就是它被称为”半参数”的原因。只有因子 \(\exp(x_n \cdot \gamma)\) 被建模，它决定了个体 \(n\) 与基线风险的比例变化。这就是它被称为”比例风险”的原因。

Cox’s proportional hazards model is not fully generative. There is no way to generate the times of failure because the baseline hazard function \(h_0(t)\) is unmodeled; if the baseline hazard were known, failure times could be generated. Cox’s proportional hazards model is generative for the ordering of failures conditional on a number of censored items. Proportional hazard models may also include parametric or non-parametric model for the baseline hazard function¹.

Cox 比例风险模型不是完全生成性的。由于基线风险函数 \(h_0(t)\) 未被建模，无法生成失效时间；如果基线风险已知，可以生成失效时间。Cox 比例风险模型对于给定若干删失项目条件下的失效顺序是生成性的。比例风险模型也可能包括基线风险函数的参数或非参数模型²。

Partial likelihood function

部分似然函数

Cox’s proportional specification of the hazard function is insufficient to generate random variates because the baseline hazard function \(h_0(t)\) is unknown. On the other hand, the proportional specification is sufficient to generate a partial likelihood that accounts for the order of the survival times.

Cox 风险函数的比例规范不足以生成随机变量，因为基线风险函数 \(h_0(t)\) 未知。另一方面，比例规范足以生成考虑生存时间顺序的部分似然。

The hazard function \(h(t \mid x_n, \beta) = h_0(t) \cdot \exp(x_n \cdot \beta)\) for subject \(n\) represents the instantaneous probability that subject \(n\) fails at time \(t\) given that it has survived until time \(t.\) The probability that subject \(n\) is the first to fail among \(N\) subjects is thus proportional to subject \(n\)’s hazard function,

受试者 \(n\) 的风险函数 \(h(t \mid x_n, \beta) = h_0(t) \cdot \exp(x_n \cdot \beta)\) 表示给定受试者 \(n\) 存活到时间 \(t\) 的条件下在时间 \(t\) 失效的瞬时概率。因此，受试者 \(n\) 在 \(N\) 个受试者中第一个失效的概率与受试者 \(n\) 的风险函数成比例，

\[\begin{equation*} \Pr[n \textrm{ first to fail at time } t] \propto h(t \mid x_n, \beta). \end{equation*}\] Normalizing yields

标准化得到

\[\begin{eqnarray*} \Pr[n \textrm{ first to fail at time } t] & = & \frac{h(t \mid x_n, \beta)} {\sum_{n' = 1}^N h(t \mid x_{n'}, \beta)} \\[4pt] & = & \frac{h_0(t) \cdot \exp(x_n \cdot \beta)} {\sum_{n' = 1}^N h_0(t) \cdot \exp(x_{n'} \cdot \beta)} \\[4pt] & = & \frac{\exp(x_n \cdot \beta)} {\sum_{n' = 1}^N \exp(x_{n'} \cdot \beta)}. \end{eqnarray*}\]

Suppose there are \(N\) subjects with strictly ordered survival times \(t_1 < t_2 < \cdots < t_N\) and covariate (row) vectors \(x_1, \ldots, x_N\). Let \(t^{\textrm{cens}}\) be the (right) censoring time and let \(N^{\textrm{obs}}\) be the largest value of \(n\) such that \(t_n \leq t^{\textrm{cens}}\). This means \(N^{\textrm{obs}}\) is the number of subjects whose failure time was observed. The ordering is for convenient indexing and does not cause any loss of generality—survival times can simply be sorted into the necessary order.

假设有 \(N\) 个受试者，其严格有序的生存时间为 \(t_1 < t_2 < \cdots < t_N\)，协变量（行）向量为 \(x_1, \ldots, x_N\)。设 \(t^{\textrm{cens}}\) 为（右）删失时间，设 \(N^{\textrm{obs}}\) 为使得 \(t_n \leq t^{\textrm{cens}}\) 的 \(n\) 的最大值。这意味着 \(N^{\textrm{obs}}\) 是观测到失效时间的受试者数量。排序是为了方便索引，不会导致任何一般性损失——生存时间可以简单地按所需顺序排序。

With failure times sorted in decreasing order, the partial likelihood for each observed subject \(n \in 1{:}N^{\textrm{obs}}\) can be expressed as

失效时间按递减顺序排序后，每个观测受试者 \(n \in 1{:}N^{\textrm{obs}}\) 的部分似然可以表示为

\[\begin{equation*} \Pr[n \textrm{ first to fail among } n, n + 1, \ldots N] = \frac{\exp(x_n \cdot \beta)} {\sum_{n' = n}^N \exp(x_{n'} \cdot \beta)}. \end{equation*}\] The group of items for comparison and hence the summation is over all items, including those with observed and censored failure times.

用于比较的项目组以及求和包括所有项目，包括观测和删失失效时间的项目。

The partial likelihood, defined in this form by Breslow (1975), is just the product of the partial likelihoods for the observed subjects (i.e., excluding subjects whose failure time is censored).

由 Breslow (1975) 以这种形式定义的部分似然就是观测受试者的部分似然的乘积（即排除失效时间被删失的受试者）。

\[\begin{equation*} \Pr[\textrm{observed failures ordered } 1, \ldots, N^{\textrm{obs}} | x, \beta] = \prod_{n = 1}^{N^{\textrm{obs}}} \frac{\exp(x_n \cdot \beta)} {\sum_{n' = n}^N \exp(x_{n'} \cdot \beta)}. \end{equation*}\] On the log scale,

在对数尺度上，

\[\begin{eqnarray*} \log \Pr[\textrm{obs.\ fail ordered } 1, \ldots, N^{\textrm{obs}} | x, \beta] & = & \sum_{n = 1}^{N^{\textrm{obs}}} \log \left( \frac{\exp(x_n \cdot \beta)} {\sum_{n' = n}^N \exp(x_{n'} \cdot \beta)} \right) \\[4pt] & = & x_n \cdot \beta - \log \sum_{n' = n}^N \exp(x_{n'} \cdot \beta) \\ & = & x_n \cdot \beta - \textrm{logSumExp}_{n' = n}^N \ x_{n'} \cdot \beta, \end{eqnarray*}\] where \[\begin{equation*} \textrm{logSumExp}_{n = a}^b \ x_n = \log \sum_{n = a}^b \exp(x_n) \end{equation*}\] is implemented so as to preserve numerical precision.

其中实现以保持数值精度。

This likelihood follows the same approach to ranking as that developed by Plackett (1975) for estimating the probability of the order of the first few finishers in a horse race.

这个似然遵循与 Plackett (1975) 为估计赛马中前几名完赛者顺序概率而开发的相同排序方法。

A simple normal prior on the components of \(\beta\) completes the model,

\(\beta\) 各分量的简单正态先验完成了模型，

\[\begin{equation*} \beta \sim \textrm{normal}(0, 2). \end{equation*}\] This should be scaled based on knowledge of the predictors.

这应该根据对预测变量的了解进行缩放。

Stan program

Stan 程序

To simplify the Stan program, the survival times for uncensored events are sorted into decreasing order (unlike in the mathematical presentation, where they were sorted into ascending order). The covariates for censored and uncensored observations are separated into two matrices.

为了简化 Stan 程序，未删失事件的生存时间按递减顺序排序（与数学表示中的升序排序不同）。删失和未删失观测的协变量分离到两个矩阵中。

data {
  int<lower=0> K;          // num covariates

  int<lower=0> N;          // num uncensored obs
  vector[N] t;             // event time (non-strict decreasing)
  matrix[N, K] x;          // covariates for uncensored obs

  int N_c;                 // num censored obs
  real<lower=t[N]> t_c;    // censoring time
  matrix[N_c, K] x_c;      // covariates for censored obs
}

The parameters are just the coefficients.

参数只是系数。

parameters {
  vector[K] beta;          // slopes (no intercept)
}

The prior is a simple independent centered normal distribution on each element of the parameter vector, which is vectorized in the Stan code.

先验是参数向量每个元素上的简单独立中心正态分布，在 Stan 代码中被向量化。

model {
  beta ~ normal(0, 2);
  ...

The log likelihood is implemented so as to minimize duplicated effort. The first order of business is to calculate the linear predictors, which is done separately for the subjects whose event time is observed and those for which the event time is censored.

对数似然的实现旨在最小化重复工作。首要任务是计算线性预测变量，对观测到事件时间的受试者和事件时间被删失的受试者分别进行计算。

  vector[N] log_theta = x * beta;
  vector[N_c] log_theta_c = x_c * beta;

These vectors are computed using efficient matrix-vector multiplies. The log of exponential values of the censored covariates times the coefficients is reused in the denominator of each factor, which on the log scale, starts with the log sum of exponentials of the censored items’ linear predictors.

这些向量使用高效的矩阵-向量乘法计算。删失协变量乘以系数的指数值的对数在每个因子的分母中重复使用，在对数尺度上，从删失项目线性预测变量指数的对数和开始。

  real log_denom = log_sum_exp(log_theta_c);

Then, for each observed survival time, going backwards from the latest to the earliest, the denominator can be incremented (which turns into a log sum of exponentials on the log scale), and then the target is updated with its likelihood contribution.

然后，对于每个观测的生存时间，从最近的到最早的反向进行，分母可以递增（在对数尺度上变成指数的对数和），然后用其似然贡献更新目标。

  for (n in 1:N) {
    log_denom = log_sum_exp(log_denom, log_theta[n]);
    target += log_theta[n] - log_denom;   // log likelihood
  }

The running log sum of exponentials is why the list is iterated in reverse order of survival times. It allows the log denominator to be accumulated one term at a time. The condition that the survival times are sorted into decreasing order is not checked. It could be checked very easily in the transformed data block by adding the following code.

指数的运行对数和是列表按生存时间反向顺序迭代的原因。它允许对数分母逐项累积。不检查生存时间按递减顺序排序的条件。可以通过在转换数据块中添加以下代码轻松检查。

transformed data {
  for (n in 2:N) {
    if (!(t[n] < t[n - 1])) {
      reject("times must be strictly decreasing, but found"
             "!(t[", n, "] < t[, ", (n - 1), "])");
    }   
  }
}

Stan model for tied survival times

并列生存时间的 Stan 模型

Technically, for continuous survival times, the probability of two survival times being identical will be zero. Nevertheless, real data sets often round survival times, for instance to the nearest day or week in a multi-year clinical trial. The technically “correct” thing to do in the face of unknown survival times in a range would be to treat their order as unknown and infer it. But considering all \(N!\) permutations for a set of \(N\) subjects with tied survival times is not tractable. As an alternative, Efron (1977) introduced an approximate partial likelihood with better properties than a random permutation while not being quite as good as considering all permutations. Efron’s model averages the contributions as if they truly did occur simultaneously.

从技术上讲，对于连续生存时间，两个生存时间相同的概率为零。然而，真实数据集经常将生存时间四舍五入，例如在多年临床试验中四舍五入到最近的天或周。面对范围内未知生存时间，技术上”正确”的做法是将其顺序视为未知并进行推断。但是对于一组具有并列生存时间的 \(N\) 个受试者考虑所有 \(N!\) 种排列是不可行的。作为替代方案，Efron (1977) 引入了一个近似部分似然，其性质比随机排列更好，同时不如考虑所有排列那么好。Efron 的模型将贡献平均化，就好像它们真的同时发生一样。

In the interest of completeness, here is the Stan code for an implementation of Efron’s estimator. It uses two user-defined functions. The first calculates how many different survival times occur in the data.

为了完整性，这里是 Efron 估计器实现的 Stan 代码。它使用两个用户定义的函数。第一个计算数据中出现多少不同的生存时间。

functions {
  int num_unique_starts(vector t) {
    if (size(t) == 0) return 0;
    int us = 1;
    for (n in 2:size(t)) {
      if (t[n] != t[n - 1]) us += 1;
    }
    return us;
  }

This is then used to compute the value J to send into the function that computes the position in the array of failure times where each new failure time starts, plus an end point that goes one past the target. This is a standard way in Stan to code ragged arrays.

然后使用它来计算值 J，发送到函数中，该函数计算失效时间数组中每个新失效时间开始的位置，加上超过目标一个位置的终点。这是 Stan 中编码不规则数组的标准方法。

  array[] int unique_starts(vector t, int J) {
    array[J + 1] int starts;
    if (J == 0) return starts;
    starts[1] = 1;
    int pos = 2;
    for (n in 2:size(t)) {
      if (t[n] != t[n - 1]) {
    starts[pos] = n;
    pos += 1;
      }
    }
    starts[J + 1] = size(t) + 1;
    return starts;
  }
}

The data format is exactly the same as for the model in the previous section, but in this case, the transformed data block is used to cache some precomputations required for the model, namely the ragged array grouping elements that share the same survival time.

数据格式与前一节中的模型完全相同，但在这种情况下，转换数据块用于缓存模型所需的一些预计算，即对共享相同生存时间的元素进行分组的不规则数组。

transformed data {
  int<lower=0> J = num_unique_starts(t);
  array[J + 1] int<lower=0> starts = unique_starts(t, J);
}

For each unique survival time j in 1:J, the subjects indexed from starts[j] to starts[j + 1] - 1 (inclusive) share the same survival time. The number of elements with survival time j is thus (starts[j + 1] - 1) - starts[j] + 1, or just starts[j + 1] - starts[j].

对于 1:J 中的每个唯一生存时间 j，从 starts[j] 到 starts[j + 1] - 1（包含）索引的受试者共享相同的生存时间。因此，具有生存时间 j 的元素数量为 (starts[j + 1] - 1) - starts[j] + 1，或简写为 starts[j + 1] - starts[j]。

The parameters and prior are also the same—just a vector beta of coefficients with a centered normal prior. Although it starts with the same caching of results for later, and uses the same accumulator for the denominator, the overall partial likelihood is much more involved, and depends on the user-defined functions defining the transformed data variables J and starts.

参数和先验也相同——只是一个带中心正态先验的系数向量 beta。虽然它开始时同样缓存结果供后用，并使用相同的分母累加器，但整体部分似然更加复杂，依赖于定义转换数据变量 J 和 starts 的用户定义函数。

  vector[N] log_theta = x * beta;
  vector[N_c] log_theta_c = x_c * beta;
  real log_denom_lhs = log_sum_exp(log_theta_c);
  for (j in 1:J) {
    int start = starts[j];
    int end = starts[j + 1] - 1;
    int len = end - start + 1;
    real log_len = log(len);
    real numerator = sum(log_theta[start:end]);
    log_denom_lhs = log_sum_exp(log_denom_lhs,
                                log_sum_exp(log_theta[start:end]));
    vector[len] diff;
    for (ell in 1:len) {
      diff[ell] = log_diff_exp(log_denom_lhs,
                               log(ell - 1) - log_len
                               + log_sum_exp(log_theta[start:end]));
    }
    target += numerator - sum(diff);
  }

The special function log_diff_exp is defined as

特殊函数 log_diff_exp 定义为

\[\begin{equation*} \textrm{logDiffExp}(u, v) = \log(\exp(u) - \exp(v)). \end{equation*}\]

Because of how J and starts are constructed, the length len will always be strictly positive so that the log is well defined.

由于 J 和 starts 的构造方式，长度 len 总是严格为正，因此对数是定义良好的。

Breslow, N. E. 1975. “Analysis of Survival Data Under the Proportional Hazards Model.” International Statisticas Review 43 (1): 45–58.

Cox, David R. 1972. “Regression Models and Life-Tables.” Journal of the Royal Statistical Society: Series B (Methodological) 34 (2): 187–202.

Efron, Bradley. 1977. “The Efficiency of Cox’s Likelihood Function for Censored Data.” Journal of the American Statistical Association 72 (359): 557–65.

Plackett, Robin L. 1975. “The Analysis of Permutations.” Journal of the Royal Statistical Society Series C: Applied Statistics 24 (2): 193–202.

Cox mentioned in his seminal paper that modeling the baseline hazard function would improve statistical efficiency, but he did not do it for computational reasons.↩︎
Cox 在其开创性论文中提到，对基线风险函数建模会提高统计效率，但由于计算原因他没有这样做。↩︎