14  计算一维积分

Computing One Dimensional Integrals

本节译者:李君竹

初次校审:李君竹

二次校审:李君竹(Claude 辅助)

Definite and indefinite one dimensional integrals can be performed in Stan using the integrate_1d function

在 Stan 中可以使用 integrate_1d 函数 来计算一维定积分和不定积分。

As an example, the normalizing constant of a left-truncated normal distribution is

例如,左截断正态分布的标准化常量为

\[ \int_a^\infty \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{1}{2}\frac{(x - \mu)^2}{\sigma^2}} \]

To compute this integral in Stan, the integrand must first be defined as a Stan function (see the Stan Reference Manual chapter on User-Defined Functions for more information on coding user-defined functions).

要在 Stan 中计算此积分,首先必须将被积函数定义为 Stan 函数(有关编写用户定义函数的更多信息,请参阅《Stan 参考手册》中的用户定义函数一章)。

real normal_density(real x,             // Function argument
                    real xc,            // Complement of function argument
                                        //  on the domain (defined later)
                    array[] real theta, // parameters
                    array[] real x_r,   // data (real)
                    array[] int x_i) {  // data (integer)
  real mu = theta[1];
  real sigma = theta[2];

  return 1 / (sqrt(2 * pi()) * sigma) * exp(-0.5 * ((x - mu) / sigma)^2);
}

This function is expected to return the value of the integrand evaluated at point x. The argument xc is used in definite integrals to avoid loss of precision near the limits of integration and is set to NaN when either limit is infinite (see the section on precision/loss in the chapter on Higher-Order Functions of the Stan Functions Reference for details on how to use this). The argument theta is used to pass in arguments of the integral that are a function of the parameters in our model. The arguments x_r and x_i are used to pass in real and integer arguments of the integral that are not a function of our parameters.

此函数应返回在点 x 处计算得到的被积函数值。参数 xc 在定积分中用于避免积分限附近的精度损失,当任一积分限为无穷时设为 NaN(详情请参阅《Stan 函数参考》高阶函数章节中关于精度/损失的部分)。参数 theta 用于传递积分中依赖于模型参数的参数。参数 x_rx_i 用于传递积分中不依赖于模型参数的实数和整数参数。

The function defining the integrand must have exactly the argument types and return type of normal_density above, though argument naming is not important. Even if x_r and x_i are unused in the integrand, they must be included in the function signature. Even if the integral does not involve some of these, they must still be supplied some value. The most efficient will be a zero-length array or vector, which can be created with rep_array(0, 0) and rep_vector(0, 0), respectively. Other options include an uninitialized variable declared with size 0, which is equivalent to the above, or any easy value, such as size 1 array created with {0}.

定义被积函数的函数必须与上述 normal_density 具有完全相同的参数类型和返回类型,虽然参数名称并不重要。即使 x_rx_i 在被积函数中未使用,也必须包含在函数签名中。即使积分不涉及这些参数中的某些,仍需要为它们提供某个值。最高效的做法是使用零长度数组或向量,可分别用 rep_array(0, 0) 和 rep_vector(0, 0) 创建。其他选择包括声明大小为 0 的未初始化变量(等价于上述做法),或任何简单值,如用 {0} 创建的大小为 1 的数组。

14.1 Calling the integrator

调用积分器

Suppose that our model requires evaluating the lpdf of a left-truncated normal, but the truncation limit is to be estimated as a parameter. Because the truncation point is a parameter, we must include the normalization term of the truncated pdf when computing our model’s log density. Note this is just an example of how to use the 1D integrator. The more efficient way to perform the correct normalization in Stan is described in the chapter on Truncated or Censored Data of this guide.

假设我们的模型需要计算左截断正态分布的 lpdf,但截断限需要作为参数进行估计。由于截断点是参数,在计算模型对数密度时必须包含截断 pdf 的标准化项。请注意这只是使用一维积分器的示例。在 Stan 中执行正确标准化的更高效方法在本指南的”截断或删失数据”章节中描述。

Such a model might look like (include the function defined at the beginning of this chapter to make this code compile):

这样的模型可能如下所示(包含本章开头定义的函数以使代码可编译):

data {
  int N;
  array[N] real y;
}

transformed data {
  array[0] real x_r;
  array[0] int x_i;
}

parameters {
  real mu;
  real<lower=0.0> sigma;
  real left_limit;
}

model {
  mu ~ normal(0, 1);
  sigma ~ normal(0, 1);
  left_limit ~ normal(0, 1);
  target += normal_lpdf(y | mu, sigma);
  target += log(integrate_1d(normal_density,
                             left_limit,
                             positive_infinity(),
                             { mu, sigma }, x_r, x_i));
}

14.1.1 Limits of integration

积分的极限

The limits of integration can be finite or infinite. The infinite limits are made available via the Stan calls negative_infinity() and positive_infinity().

积分限可以是有限的或无限的。无限积分限可通过 Stan 函数 negative_infinity()positive_infinity() 来表示。

If both limits are either negative_infinity() or positive_infinity(), the integral and its gradients are set to zero.

如果两个积分限都是 negative_infinity()positive_infinity(),则积分及其梯度设为零。

14.1.2 Data vs. parameters

数据与参数

The arguments for the real data x_r and the integer data x_i must be expressions that only involve data or transformed data variables. theta, on the other hand, can be a function of data, transformed data, parameters, or transformed parameters.

实数数据 x_r 和整数数据 x_i 的参数必须是仅涉及数据或转换数据变量的表达式。而 theta 则可以是数据、转换数据、参数或转换参数的函数。

The endpoints of integration can be data or parameters (and internally the derivatives of the integral with respect to the endpoints are handled with the Leibniz integral rule).

积分的端点可以是数据或参数(内部使用莱布尼茨积分法则处理积分关于端点的导数)。

14.2 Integrator convergence

积分器的收敛

The integral is performed with the iterative 1D double exponential quadrature methods implemented in the Boost library (Agrawal et al. 2017). If the \(n\)th estimate of the integral is denoted \(I_n\) and the \(n\)th estimate of the norm of the integral is denoted \(|I|_n\), the iteration is terminated when

积分采用 Boost 库 (Agrawal et al. 2017) 中实现的迭代一维双指数求积方法。如果积分的第 \(n\) 次估计记为 \(I_n\),积分范数的第 \(n\) 次估计记为 \(|I|_n\),则当满足以下条件时迭代终止

\[ \frac{{|I_{n + 1} - I_n|}}{{|I|_{n + 1}}} < \text{relative tolerance}. \]

The relative_tolerance parameter can be optionally specified as the last argument to integrate_1d. By default, integrate_1d follows the Boost library recommendation of setting relative_tolerance to the square root of the machine epsilon of double precision floating point numbers (about 1e-8). If the Boost integrator is not able to reach the relative tolerance an exception is raised with a message somehing like “Exception: integrate: error estimate of integral 4.25366e-13 exceeds the given relative tolerance times norm of integral”. If integrate_1d causes an exception in transformed parameters block or model block, the result has the same effect as assigning a \(-\infty\) log probability, which causes rejection of the current proposal in MCMC samplers and adjustment of search parameters in optimization. If integrate_1d causes an exception in generated quantities block, the returned output from integrate_1d is NaN. In these cases, a bigger relative_tolerance value can be specified.

relative_tolerance 参数可作为 integrate_1d 的最后一个参数可选指定。默认情况下,integrate_1d 遵循 Boost 库的建议,将 relative_tolerance 设为双精度浮点数机器精度的平方根(约 1e-8)。如果 Boost 积分器无法达到相对容差,将抛出异常,消息类似”Exception: integrate: error estimate of integral 4.25366e-13 exceeds the given relative tolerance times norm of integral”。如果 integrate_1d 在转换参数块或模型块中引发异常,结果等同于指定 \(-\infty\) 对数概率,这会导致 MCMC 采样器拒绝当前提议并在优化中调整搜索参数。如果 integrate_1d 在生成量块中引发异常,integrate_1d 返回 NaN。在这些情况下,可以指定更大的 relative_tolerance 值。

14.2.1 Zero-crossing integrals

过零积分

Integrals on the (possibly infinite) interval \((a, b)\) that cross zero are split into two integrals, one from \((a, 0)\) and one from \((0, b)\). This is because the quadrature methods employed internally can have difficulty near zero.

跨越零点的(可能无限)区间 \((a, b)\) 上的积分被分为两个积分,一个从 \((a, 0)\),另一个从 \((0, b)\)。这是因为内部采用的求积方法在零点附近可能遇到困难。

In this case, each integral is separately integrated to the given relative_tolerance.

在这种情况下,每个积分都分别按给定的 relative_tolerance 进行积分。

14.2.2 Avoiding precision loss near limits of integration in definite integrals

避免定积分中积分极限附近的精度损失

If care is not taken, the quadrature can suffer from numerical loss of precision near the endpoints of definite integrals.

如果不加注意,求积法可能在定积分端点附近出现数值精度损失。

For instance, in integrating the pdf of a beta distribution when the values of \(\alpha\) and \(\beta\) are small, most of the probability mass is lumped near zero and one.

例如,当 \(\alpha\)\(\beta\) 值较小时,在对 beta 分布的 pdf 进行积分时,大部分概率质量集中在 0 和 1 附近。

The pdf of a beta distribution is proportional to

beta 分布的 pdf 正比于

\[ p(x) \propto x^{\alpha - 1}(1 - x)^{\beta - 1} \]

Normalizing this distribution requires computing the integral of \(p(x)\) from zero to one. In Stan code, the integrand might look like:

对此分布进行标准化需要计算 \(p(x)\) 从 0 到 1 的积分。在 Stan 代码中,被积函数可能如下:

real beta(real x, real xc, array[] real theta, array[] real x_r, array[] int x_i) {
  real alpha = theta[1];
  real beta = theta[2];

  return x^(alpha - 1.0) * (1.0 - x)^(beta - 1.0);
}

The issue is that there will be numerical breakdown in the precision of 1.0 - x as x gets close to one. This is because of the limited precision of double precision floating numbers. This integral will fail to converge for values of alpha and beta much less than one.

问题在于当 x 接近 1 时,1.0 - x 的精度会出现数值崩溃。这是由于双精度浮点数的精度有限。当 alphabeta 的值远小于 1 时,此积分将无法收敛。

This is where xc is useful. It is defined, for definite integrals, as a high precision version of the distance from x to the nearest endpoint — a - x or b - x for a lower endpoint a and an upper endpoint b. To make use of this for the beta integral, the integrand can be re-coded:

这里 xc 就很有用。对于定积分,它被定义为从 x 到最近端点距离的高精度版本——对于下端点 a 和上端点 b,为 a - xb - x。要在 beta 积分中利用这一点,可以重新编写被积函数: endpoint — a - x or b - x for a lower endpoint a and an upper endpoint b. To make use of this for the beta integral, the integrand can be re-coded:

这就是 xc 的用处。对于定积分来说,它被定义为 x 到最近端点距离的高精度版本——对于下端点 a 和上端点 b,为 a - xb - x。为了将其用于贝塔积分,积分可以重写为:

real beta(real x, real xc, array[] real theta, array[] real x_r, array[] int x_i) {
  real alpha = theta[1];
  real beta = theta[2];
  real v;

  if(x > 0.5) {
    v = x^(alpha - 1.0) * xc^(beta - 1.0);
  } else {
    v = x^(alpha - 1.0) * (1.0 - x)^(beta - 1.0);
  }

  return v;
}

In this case, as we approach the upper limit of integration \(a = 1\), xc will take on the value of \(a - x = 1 - x\). This version of the integrand will converge for much smaller values of alpha and beta than otherwise possible.

在这种情况下,当我们接近积分上限 \(a = 1\) 时,xc 将取值 \(a - x = 1 - x\)。这个版本的被积函数能够在 alphabeta 取值远小于原先可能收敛值的情况下收敛。

Consider another example: let’s say we have a log-normal distribution that is both shifted away from zero by some amount \(\delta\), and truncated at some value \(b\). If we were interested in calculating the expectation of a variable \(X\) distributed in this way, we would need to calculate

考虑另一个例子:假设我们有一个对数正态分布,它既偏移零点某个量 \(\delta\),又在某个值 \(b\) 处截断。如果我们想要计算以这种方式分布的变量 \(X\) 的期望,需要计算

\[ \int_a^b xf(x)\,dx = \int_{\delta}^b xf(x)\,dx \]

in the numerator, where \(f(x)\) is the probability density function for the shifted log-normal distribution. This probability density function can be coded in Stan as:

分子中,其中 \(f(x)\) 是偏移对数正态分布的概率密度函数。此概率密度函数可以在 Stan 中编码为:

real shift_lognormal_pdf(real x,
                         real mu,
                         real sigma,
                         real delta) {
  real p;

  p = (1.0 / ((x - delta) * sigma * sqrt(2 * pi()))) *
    exp(-1 * (log(x - delta) - mu)^2 / (2 * sigma^2));

  return p;
}

Therefore, the function that we want to integrate is:

因此,我们想要积分的函数是:

real integrand(real x,
               real xc,
               array[] real theta,
               array[] real x_r,
               array[] int x_i) {
  real numerator;
  real p;

  real mu = theta[1];
  real sigma = theta[2];
  real delta = theta[3];
  real b = theta[4];

  p = shift_lognormal_pdf(x, mu, sigma, delta);

  numerator = x * p;

  return numerator;
}

What happens here is that, given that the log-normal distribution is shifted by \(\delta\), when we then try to integrate the numerator, our x starts at values just above delta. This, in turn, causes the x - delta term to be near zero, leading to a breakdown.

此情况为,对数正态分布移动了 \(\delta\),当我们试图对该式进行积分时,x 的起始值刚好高于 delta。这相应地导致 x - delta 项接近于零,从而导致崩溃。

We can use xc, and define the integrand as:

我们可以使用 xc,将 integrand 定义为:

real integrand(real x,
               real xc,
               array[] real theta,
               array[] real x_r,
               array[] int x_i) {
  real numerator;
  real p;

  real mu = theta[1];
  real sigma = theta[2];
  real delta = theta[3];
  real b = theta[4];

  if (x < delta + 1) {
    p = shift_lognormal_pdf(xc, mu, sigma, delta);
  } else {
    p = shift_lognormal_pdf(x, mu, sigma, delta);
  }

  numerator = x * p;

  return numerator;
}

Why does this work? When our values of x are less than delta + 1 (so, when they’re near delta, given that our lower bound of integration is equal to \(\delta\)), we pass xc as an argument to our shift_lognormal_pdf function. This way, instead of dealing with x - delta in shift_lognormal_pdf, we are working with xc - delta which is equal to delta - x - delta, as delta is the lower endpoint in that case. The delta terms cancel out, and we are left with a high-precision version of x. We don’t encounter the same problem at the upper limit \(b\) so we don’t adjust the code for that case.

为什么这样有效?当我们的 x 值小于 delta + 1 时(也就是说,当它们接近 delta 时,考虑到我们的积分下限等于 \(\delta\)),我们将 xc 作为参数传递给 shift_lognormal_pdf 函数。这样,我们在 shift_lognormal_pdf 中处理的不是 x - delta,而是 xc - delta,在这种情况下等于 delta - x - delta,因为 delta 是下端点。delta 项相消,我们得到了 x 的高精度版本。在上限 \(b\) 处我们不会遇到同样的问题,所以不需要为那种情况调整代码。

Note, xc is only used for definite integrals. If either the left endpoint is at negative infinity or the right endpoint is at positive infinity, xc will be NaN.

注意,xc 仅用于定积分。如果左端点为负无穷或右端点为正无穷,xc 将为 NaN。

For zero-crossing definite integrals (see section Zero Crossing) the integrals are broken into two pieces (\((a, 0)\) and \((0, b)\) for endpoints \(a < 0\) and \(b > 0\)) and xc is a high precision version of the distance to the limits of each of the two integrals separately. This means xc will be a high precision version of a - x, x, or b - x, depending on the value of x and the endpoints.

对于过零定积分(参见 Zero Crossing 一节),积分被分成两部分(\((a, 0)\)\((0, b)\),端点 \(a < 0\)\(b > 0\)),xc 是两个积分中每个积分极限的距离的高精度版本。这意味着 xc 将是 a - xxb - x 的高精度版本,具体取决于 x 的值和端点。

Agrawal, Nikhar, Anton Bikineev, Paul Bristow, Marco Guazzone, Christopher Kormanyos, Hubert Holin, Bruno Lalande, et al. 2017. “Double-Exponential Quadrature.” https://www.boost.org/doc/libs/1_66_0/libs/math/doc/html/math_toolkit/double_exponential.html.