22 自定义概率函数

Custom Probability Functions

本节译者：谭朕斯

本节校审：张梓源（ChatGPT辅助）

Custom distributions may also be implemented directly within Stan’s programming language. The only thing that is needed is to increment the total log probability. The rest of the chapter provides examples.

在 Stan 的编程语言中也可以直接实现自定义分布。实现时仅需增加总体的对数概率即可。本章的后续内容给出了一些示例。

22.1 Examples

示例

Triangle distribution

三角分布

A simple example is the triangle distribution, whose density is shaped like an isosceles triangle with corners at specified bounds and height determined by the constraint that a density integrate to 1. If \(\alpha \in \mathbb{R}\) and \(\beta \in \mathbb{R}\) are the bounds, with \(\alpha < \beta\), then \(y \in (\alpha,\beta)\) has a density defined as follows.

一个简单的例子是三角分布，其密度函数的形状像一个等腰三角形，三角形的底边由指定的边界确定，高度由密度函数积分为1的约束条件决定。\(\alpha \in \mathbb{R}\) 和 \(\beta \in \mathbb{R}\) 是边界，其中 \(\alpha < \beta\)，那么 \(y \in (\alpha,\beta)\) 的密度函数定义如下。

\[ \textsf{triangle}(y \mid \alpha,\beta) = \frac{2}{\beta - \alpha} \ \left( 1 - \left| y - \frac{\alpha + \beta}{\beta - \alpha} \right| \right) \]

If \(\alpha = -1\), \(\beta = 1\), and \(y \in (-1,1)\), this reduces to

如果 \(\alpha = -1\)，\(\beta = 1\)，且 \(y \in (-1,1)\)，那么这可以简化为以下形式：

\[ \textsf{triangle}(y \mid -1,1) = 1 - |y|. \] Consider the following Stan implementation of \(\textsf{triangle}(-1,1)\) for sampling.

考虑以下用于从 \(\textsf{triangle}(-1,1)\) 采样的 Stan 实现。

parameters {
  real<lower=-1, upper=1> y;
}
model {
  target += log1m(abs(y));
}

The single scalar parameter y is declared as lying in the interval (-1,1). The total log probability is incremented with the joint log probability of all parameters, i.e., \(\log \mathsf{Triangle}(y \mid -1,1)\). This value is coded in Stan as log1m(abs(y)). The function log1m is defined so that log1m(x) has the same value as \(\log(1-x)\), but the computation is faster, more accurate, and more stable.

单一的标量参数 y 被声明为位于区间 (-1,1) 内。总的对数概率会根据所有参数的联合对数概率进行累加，即 \(\log \mathsf{Triangle}(y \mid -1,1)\)。在 Stan 中，这个值被编码为 log1m(abs(y))。函数 log1m 被定义为使得 log1m(x) 具有与 \(\log(1-x)\) 相同值的函数，但计算速度更快、更准确、更稳定。

The constrained type real<lower=-1, upper=1> declared for y is critical for correct sampling behavior. If the constraint on y is removed from the program, say by declaring y as having the unconstrained scalar type real, the program would compile, but it would produce arithmetic exceptions at run time when the sampler explored values of y outside of \((-1,1)\).

为 y 声明的受限类型 real<lower=-1, upper=1> 对于正确的采样行为至关重要。如果从程序中移除 y 的约束条件，例如将 y 声明为未受约束的标量类型 real，程序将能够编译通过，但在运行时，当采样器探索超出 \((-1,1)\) 范围的 y 值时，会产生算术异常。

Now suppose the log probability function were extended to all of \(\mathbb{R}\) as follows by defining the probability to be log(0.0), i.e., \(-\infty\), for values outside of \((-1,1)\).

现在假设将对数概率函数扩展到整个实数集 \(\mathbb{R}\)，方法是对于 \((-1,1)\) 范围之外的值将概率定义为 log(0.0)，即 \(-\infty\)。

target += log(fmax(0.0,1 - abs(y)));

With the constraint on y in place, this is just a less efficient, slower, and less arithmetically stable version of the original program. But if the constraint on y is removed, the model will compile and run without arithmetic errors, but will not sample properly.¹

如果保留了 y 的约束条件，那么这只是原始程序的一个效率更低、速度更慢、数学计算稳定性较差的版本。但是如果移除了 y 的约束条件，模型将能够编译和运行，但是采样结果将不正确。²

Exponential distribution

指数分布

If Stan didn’t happen to include the exponential distribution, it could be coded directly using the following assignment statement, where lambda is the inverse scale and y the sampled variate.

如果 Stan 没有包含指数分布，可以使用以下赋值语句直接编写指数分布，其中 lambda 是逆比例尺度参数，y 是采样变量。

target += log(lambda) - y * lambda;

This encoding will work for any lambda and y; they can be parameters, data, or one of each, or even local variables.

这种编码对于任何 lambda 和 y 都适用；它们可以是参数、数据，也可以是二者的组合，甚至可以是局部变量。

The assignment statement in the previous paragraph generates C++ code that is similar to that generated by the following distribution statement.

在上一段中的赋值语句生成的 C++ 代码与以下采样语句生成的代码非常相似。

y ~ exponential(lambda);

There are two notable differences. First, the distribution statement will check the inputs to make sure both lambda is positive and y is non-negative (which includes checking that neither is the special not-a-number value).

这里有两个显著的区别。首先，采样语句会检查输入，确保 lambda 为正数且 y 为非负数（这包括检查两者是否为特殊的非数字值）。

The second difference is that if lambda is not a parameter, transformed parameter, or local model variable, the distribution statement is clever enough to drop the log(lambda) term. This results in the same posterior because Stan only needs the log probability up to an additive constant. If lambda and y are both constants, the distribution statement will drop both terms (but still check for out-of-domain errors on the inputs).

第二个区别是，如果 lambda 不是参数、转换参数或局部模型变量，采样语句会智能地省略 log(lambda) 项。这导致了相同的后验分布，因为 Stan 只需要考虑到加法常数的对数概率。如果 lambda 和 y 都是常数，采样语句将省略这两个项（但仍会检查输入是否超出定义域）。

Bivariate normal cumulative distribution function

双变量正态累积分布函数

For another example of user-defined functions, consider the following definition of the bivariate normal cumulative distribution function (CDF) with location zero, unit variance, and correlation rho. That is, it computes

作为用户定义函数的另一个例子，考虑以下定义的双变量正态累积分布函数（CDF），其中均值为零，方差为单位，并且具有相关系数 rho。也就是说，它计算的是以下概率：

\[ \texttt{binormal}\mathtt{\_}\texttt{cdf}(z_1, z_2, \rho) = \Pr[Z_1 \leq z_1 \text{ and } Z_2 \leq z_2] \] where the random 2-vector \(Z\) has the distribution

其中随机二维向量 \(Z\) 服从以下分布：

\[ Z \sim \textsf{multivariate normal}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \ \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \right). \]

The following Stan program implements this function,

以下是实现该函数的 Stan 程序：

real binormal_cdf(real z1, real z2, real rho) {
  if (z1 != 0 || z2 != 0) {
    real denom = abs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho))
                                 : not_a_number();
    real a1 = (z2 / z1 - rho) / denom;
    real a2 = (z1 / z2 - rho) / denom;
    real product = z1 * z2;
    real delta = product < 0 || (product == 0 && (z1 + z2) < 0);
    return 0.5 * (Phi(z1) + Phi(z2) - delta)
                 - owens_t(z1, a1) - owens_t(z2, a2);
  }
  return 0.25 + asin(rho) / (2 * pi());
}

The problem is the (extremely!) light tails of the triangle distribution. The standard HMC and NUTS samplers can’t get into the corners of the triangle properly. Because the Stan code declares y to be of type real<lower=-1, upper=1>, the inverse logit transform is applied to the unconstrained variable and its log absolute derivative added to the log probability. The resulting distribution on the logit-transformed y is well behaved.↩︎
问题在于三角形分布的尾部极其轻。标准的 HMC 和 NUTS 采样器无法正确地进入三角形的角落。因为 Stan 代码声明了 y 的类型为 real<lower=-1, upper=1>，因此将逆 logit 变换应用于无约束变量，并将其对数绝对导数添加到对数概率中。logit 变换后的 y 得到的分布是良好的。↩︎