12  求解代数方程

Solving Algebraic Equations

本节译者:段园家

初次校审:李君竹

二次校审:邱怡轩(DeepSeek 辅助)

Stan provides a built-in mechanism for specifying systems of algebraic equations. These systems can be solved either with the Newton method, as implemented in the Kinsol package (Hindmarsh et al. 2005), or with the Powell hybrid method (Powell 1970). The function signatures for Stan’s algebraic solvers are fully described in the algebraic solver section of the reference manual.

Stan 提供了一种内置机制来指定代数方程组。这些方程组系统可以使用牛顿方法求解,如 Kinsol 包 (Hindmarsh et al. 2005) 中实现的那样,或使用 Powell 混合方法 (Powell 1970) 求解。Stan 代数求解器的函数签名在参考手册的代数求解器部分有完整描述。

Solving any system of algebraic equations can be translated into a root-finding problem, that is, given a function \(f\), we wish to find \(y\) such that \(f(y) = 0\).

求解任何代数方程组都可以转化为寻根问题,即给定一个函数 \(f\),我们希望找到 \(y\),使得 \(f(y)=0\)

12.1 Example: system of nonlinear algebraic equations

示例:非线性代数方程组

For systems of linear algebraic equations, we recommend solving the system using matrix division. The algebraic solver becomes handy when we want to solve nonlinear equations.

对于线性代数方程组,我们建议使用矩阵除法来求解该系统。当我们想要求解非线性方程组时,代数求解器就变得非常有用。

As an illustrative example, we consider the following nonlinear system of two equations with two unknowns:

作为一个说明性示例,我们考虑以下具有两个方程、两个未知数的非线性系统:

\[\begin{align*} z_1 &= y_1 - \theta_1 \\ z_2 &= y_1 y_2 + \theta_2 \end{align*}\]

Our goal is to simultaneously solve all equations for \(y_1\) and \(y_2\), such that the vector \(z\) goes to 0.

我们的目标是同时解出 \(y_1\)\(y_2\),使得向量 \(z\) 趋于0。

12.2 Coding an algebraic system

编码代数系统

A system of algebraic equations is coded directly in Stan as a function with a strictly specified signature. For example, the nonlinear system given above can be coded using the following function in Stan (see the user-defined functions section for more information on coding user-defined functions).

代数方程组直接在 Stan 中编码为一个具有严格指定签名的函数。例如,上面给出的非线性系统可以使用下面的函数在 Stan 中进行编码(有关编码用户自定义函数的更多信息,请参阅用户自定义函数章节)。

vector system(vector y,              // unknowns
              vector theta,          // parameters
              data array[] real x_r, // data (real)
              array[] int x_i) {     // data (integer)
  vector[2] z;
  z[1] = y[1] - theta[1];
  z[2] = y[1] * y[2] - theta[2];
  return z;
}

The function takes the unknowns we wish to solve for in y (a vector), the system parameters in theta (a vector), the real data in x_r (a real array) and the integer data in x_i (an integer array). The system function returns the value of the function (a vector), for which we want to compute the roots. Our example does not use real or integer data. Nevertheless, these unused arguments must be included in the system function with exactly the signature above.

该函数包括我们希望求解的未知数 y(一个向量)、系统参数 theta(一个向量)、实数数据 x_r(一个实数数组)和整数数据 x_i(一个整数数组)。系统函数返回函数的值(一个向量),我们旨在计算其根。我们的示例没有使用实数或整数数据。尽管如此,这些未使用的参数必须包含在系统函数中,并完全符合上面的签名。

The body of the system function here could also be coded using a row vector constructor and transposition,

这里的系统函数的主体也可以使用行向量和转置进行编码,

return [ y[1] - theta[1],
         y[1] * y[2] - theta[2] ]';

As systems get more complicated, naming the intermediate expressions goes a long way toward readability.

随着方程组系统变得越来越复杂,为中间表达式命名将大大有助于提高可读性。

Strict signature

严格签名

The function defining the system must have exactly these argument types and return type. This may require passing in zero-length arrays for data or a zero-length vector for parameters if the system does not involve data or parameters.

定义方程组系统的函数必须具有完全相同的参数类型和返回类型。如果系统不涉及数据或参数,这可能需要传入零长度的数组作为数据或零长度的向量作为参数。

12.3 Calling the algebraic solver

调用代数求解器

Let’s suppose \(\theta = (3, 6)\). To call the algebraic solver, we need to provide an initial guess. This varies on a case-by-case basis, but in general a good guess will speed up the solver and, in pathological cases, even determine whether the solver converges or not. If the solver does not converge, the Metropolis proposal gets rejected and a warning message, stating no acceptable solution was found, is issued.

假设 \(\theta=(3,6)\)。要调用代数求解器,我们需要提供一个初始猜测值。这因具体情况而异,但一般来说,一个好的初值会加快求解器的速度,并且在病态情况下,甚至可以决定求解器是否收敛。如果求解器不收敛,则 Metropolis 提议会被拒绝,并显示警告信息,说明没有找到可接受的解。

The solver has three tuning parameters to determine convergence: the relative tolerance, the function tolerance, and the maximum number of steps. Their behavior is explained in the section about algebraic solvers with control parameters.

求解器有三个用来确定收敛准则的调节参数:相对容忍度、函数容忍度和最大步数。它们的行为在代数求解器的控制参数一节中进行了解释。

The following code returns the solution to our nonlinear algebraic system:

以下代码会返回我们的非线性代数系统的解:

transformed data {
  vector[2] y_guess = [1, 1]';
  array[0] real x_r;
  array[0] int x_i;
}

transformed parameters {
  vector[2] theta = [3, 6]';
  vector[2] y;

  y = solve_newton(system, y_guess, theta, x_r, x_i);
}

which returns \(y = (3, -2)\).

它将返回 \(y =(3,-2)\)

Data versus parameters

数据与参数

The arguments for the real data x_r and the integer data x_i must be expressions that only involve data or transformed data variables. theta, on the other hand, must only involve parameters. Note there are no restrictions on the initial guess, y_guess, which may be a data or a parameter vector.

实数数据 x_r 和整数数据 x_i 参数必须是仅涉及数据或转换后数据变量的表达式。另一方面,theta 必须只涉及参数。请注意,初始值 y_guess 并没有限制,它可以是数据向量或参数向量。

Length of the algebraic function and of the vector of unknowns

代数函数和未知向量的长度

The Jacobian of the solution with respect to the parameters is computed using the implicit function theorem, which imposes certain restrictions. In particular, the Jacobian of the algebraic function \(f\) with respect to the unknowns \(x\) must be invertible. This requires the Jacobian to be square, meaning \(f(y)\) and \(y\) have the same length or, in other words the number of equations in the system is the same as the number of unknowns.

关于参数的解的雅可比矩阵是使用隐函数定理计算的,该定理施加了某些限制。特别是,代数函数 \(f\) 关于未知数 \(x\) 的雅可比矩阵必须是可逆的。这要求雅可比矩阵是方阵,即 \(f(y)\)\(y\) 具有相同的长度,或者换句话说,系统中方程的数量与未知数的数量相同

Pathological solutions

病态解

Certain systems may be degenerate, meaning they have multiple solutions. The algebraic solver will not report these cases, as the algorithm stops once it has found an acceptable solution. The initial guess will often determine which solution gets found first. The degeneracy may be broken by putting additional constraints on the solution. For instance, it might make “physical sense” for a solution to be positive or negative.

某些系统可能是退化的,这意味着它们有多组解。代数求解器不会报告这些情况,因为算法一旦发现一个可接受的解就会停止。初值通常会决定首先找到哪组解。退化情况可以通过对解施加额外的约束来去除。例如,为了有“物理上的意义”,解可能需要限制为正数或者负数。

On the other hand, a system may not have a solution (for a given point in the parameter space). In that case, the solver will not converge to a solution. When the solver fails to do so, the current Metropolis proposal gets rejected.

另一方面,系统可能没有解(对于参数空间中的给定点)。在这种情况下,求解器将不会收敛到一个解。当求解器未能收敛时,当前的 Metropolis 提议将被拒绝。

12.4 Control parameters for the algebraic solver

代数求解器的控制参数

The call to the algebraic solver shown previously uses the default control settings. The _tol variant of the solver function allows three additional parameters, all of which must be supplied.

前文对代数求解器的调用使用的是默认的控制设置。求解器函数的 _tol 变体允许使用三个附加参数,且如果提供了其中任何一个参数,则必须提供所有这些参数。

y = solve_newton_tol(system, y_guess, theta, x_r, x_i,
                     scaling_step, f_tol, max_steps);

For the Newton solver the three control arguments are scaling step, function tolerance, and maximum number of steps. For the Powell’s hybrid method the three control arguments are relative tolerance, function tolerance, and maximum number of steps. If a Newton step is smaller than the scaling step tolerance, the code breaks, assuming the solver is no longer making significant progress. If set to 0, this constraint is ignored. For Powell’s hybrid method the relative tolerance is the estimated relative error of the solver and serves to test if a satisfactory solution has been found. After convergence of the either solver, the proposed solution is plugged into the algebraic system and its norm is compared to the function tolerance. If the norm is below the function tolerance, the solution is deemed acceptable. If the solver solver reaches the maximum number of steps, it stops and returns an error message. If one of the criteria is not met, the Metropolis proposal gets rejected with a warning message explaining which criterion was not satisfied.

对于牛顿求解器,这三个控制参数是缩放步长容忍度、函数容忍度和最大步数。对于 Powell 混合方法,这三个控制参数是相对容忍度、函数容忍度和最大步数。如果牛顿步长小于缩放步长容忍度,代码将中断,即假设求解器将不再取得显著进展。如果设置为0,则忽略此约束。对于 Powell 混合方法,相对容忍度是求解器估计的相对误差,用于测试是否找到了满意的解。在任一求解器收敛后,将提出的解代入代数方程组,并将其范数与函数容忍度进行比较。如果范数低于函数容忍度,则认为该解是可接受的。如果求解器达到最大步数,它将停止并返回错误消息。如果任一标准未满足,Metropolis 提议将被拒绝,并附带一条解释哪个标准未满足的警告消息。

The default values for the control arguments are respectively scaling_step = 1e-3 (\(10^{-3}\)), rel_tol = 1e-10 (\(10^{-10}\)), f_tol = 1e-6 (\(10^{-6}\)), and max_steps = 200 (\(200\)).

控制参数的默认值分别为 scaling_step = 1e-3\(10^{-3}\)),rel_tol = 1e-10\(10^{-10}\)),f_tol = 1e-6\(10^{-6}\))和 max_steps = 200\(200\))。

Tolerance

容忍度

The relative and function tolerances control the accuracy of the solution generated by the solver. Relative tolerances are relative to the solution value. The function tolerance is the norm of the algebraic function, once we plug in the proposed solution. This norm should go to 0 (equivalently, all elements of the vector function are 0). It helps to think about this geometrically. Ideally the output of the algebraic function is at the origin; the norm measures deviations from this ideal. As the length of the return vector increases, a certain function tolerance becomes an increasingly difficult criterion to meet, given each individual element of the vector contribute to the norm.

相对容忍度和函数容忍度控制求解器生成的解的精度。相对容忍度是相对于解的取值而言的。函数容忍度是代数函数在代入提出的解后的范数。这个范数应该趋近于 0(等价地,向量函数的所有元素都为0)。从几何角度思考这个问题会有所帮助。理想情况下,代数函数的输出在原点;范数衡量了偏离这个理想点的程度。随着返回向量长度的增加,达到某个函数容忍度会变得越来越困难,因为向量的每个元素都会对范数有贡献。

Smaller relative tolerances produce more accurate solutions but require more computational time.

较小的相对容忍度将产生更精确的解,但需要更多的计算时间。

Sensitivity analysis

敏感性分析

The tolerances should be set low enough that setting them lower does not change the statistical properties of posterior samples generated by the Stan program. The sensitivity can be analysed using importance sampling without need to re-run MCMC with different tolerances as shown by Timonen et al. (2023).

容忍度应设置得足够低,以至于进一步降低它们也不会改变 Stan 程序生成的后验样本的统计特性。可以使用重要性采样来分析敏感性,而无需使用不同的容忍度重新运行 MCMC,如 Timonen et al. (2023) 所示。

Maximum number of steps

最大步数

The maximum number of steps can be used to stop a runaway simulation. This can arise in MCMC when a bad jump is taken, particularly during warmup. If the limit is hit, the current Metropolis proposal gets rejected. Users will see a warning message stating the maximum number of steps has been exceeded.

最大步数可用于停止一个失控的模拟。这在 MCMC 中可能发生,特别是在预热阶段采取了不好的跳跃时。如果达到限制,当前的 Metropolis 提议将被拒绝。用户将看到一条警告消息,指出已超过最大步数。

Hindmarsh, Alan C, Peter N Brown, Keith E Grant, Steven L Lee, Radu Serban, Dan E Shumaker, and Carol S Woodward. 2005. SUNDIALS: Suite of Nonlinear and Differential/Algebraic Equation Solvers.” ACM Transactions on Mathematical Software (TOMS) 31 (3): 363–96.
Powell, Michael J. D. 1970. “A Hybrid Method for Nonlinear Equations.” In Numerical Methods for Nonlinear Algebraic Equations, edited by P. Rabinowitz. Gordon; Breach.
Timonen, Juho, Nikolas Siccha, Ben Bales, Harri Lähdesmäki, and Aki Vehtari. 2023. “An Importance Sampling Approach for Reliable and Efficient Inference in Bayesian Ordinary Differential Equation Models.” Stat 12 (1): e614.