21  用户自定义函数

User-Defined Functions

本节译者:杜丽英、李君竹

初次校审:李君竹

二次校审:李君竹(Claude 辅助)

This chapter explains functions from a user perspective with examples; see the language reference for a full specification. User-defined functions allow computations to be encapsulated into a single named unit and invoked elsewhere by name. Similarly, functions allow complex procedures to be broken down into more understandable components. Writing modular code using descriptively named functions is easier to understand than a monolithic program, even if the latter is heavily commented.1

本章从用户角度通过示例解释函数;完整规范请参阅语言参考手册。用户自定义函数允许将计算封装到单个命名单元中,并在其他地方通过名称调用。同样,函数允许将复杂过程分解为更易理解的组件。使用描述性命名函数编写的模块化代码比单体程序更易理解,即使后者有大量注释。2

21.1 Basic functions

基本函数

Here’s an example of a skeletal Stan program with a user-defined relative difference function employed in the generated quantities block to compute a relative differences between two parameters.

以下是一个 Stan 程序框架示例,其中包含一个用户自定义的相对差异函数,在生成量块中用于计算两个参数之间的相对差异。

functions {
  real relative_diff(real x, real y) {
    real abs_diff;
    real avg_scale;
    abs_diff = abs(x - y);
    avg_scale = (abs(x) + abs(y)) / 2;
    return abs_diff / avg_scale;
  }
}
// ...
generated quantities {
  real rdiff;
  rdiff = relative_diff(alpha, beta);
}

The function is named relative_diff, and is declared to have two real-valued arguments and return a real-valued result. It is used the same way a built-in function would be used in the generated quantities block.

该函数名为 relative_diff,声明为具有两个实值参数并返回一个实值结果。它的使用方式与生成量块中的内置函数相同。

User-defined functions block

用户自定义函数块

All functions are defined in their own block, which is labeled functions and must appear before all other program blocks. The user-defined functions block is optional.

所有函数都在自己的块中定义,该块标记为 functions,必须出现在所有其他程序块之前。用户自定义函数块是可选的。

Function bodies

函数体

The body (the part between the curly braces) contains ordinary Stan code, including local variables. The new function is used in the generated quantities block just as any of Stan’s built-in functions would be used.

函数体(花括号之间的部分)包含普通的 Stan 代码,包括局部变量。新函数在生成量块中的使用方式与 Stan 的任何内置函数相同。

Return statements

返回语句

Return statements, such as the one on the last line of the definition of relative_diff above, are only allowed in the bodies of function definitions. Return statements may appear anywhere in a function, but functions with non-void return types must end in a return statement.

返回语句,如上面 relative_diff 定义的最后一行,只允许在函数定义的函数体中使用。返回语句可以出现在函数的任何地方,但具有非 void 返回类型的函数必须以返回语句结尾。

Reject and error statements

拒绝和错误语句

The Stan reject statement provides a mechanism to report errors or problematic values encountered during program execution. It accepts any number of quoted string literals or Stan expressions as arguments. This statement is typically embedded in a conditional statement in order to detect bad or illegal outcomes of some processing step.

Stan 的 reject 语句提供了一种在程序执行期间报告错误或问题值的机制。它接受任意数量的带引号字符串字面量或 Stan 表达式作为参数。该语句通常嵌入在条件语句中,以检测某些处理步骤的错误或非法结果。

If an error is indicative of a problem from which it is not expected to be able to recover, Stan provides a fatal_error statement.

如果错误表明存在无法恢复的问题,Stan 提供了 fatal_error 语句。

Catching errors

捕获错误

Rejection is used to flag errors that arise in inputs or in program state. It is far better to fail early with a localized informative error message than to run into problems much further downstream (as in rejecting a state or failing to compute a derivative).

拒绝用于标记输入或程序状态中出现的错误。尽早失败并提供局部化的信息性错误消息,远比在下游遇到问题(如拒绝状态或无法计算导数)要好得多。

The most common errors that are coded is to test that all of the arguments to a function are legal. The following function takes a square root of its input, so requires non-negative inputs; it is coded to guard against illegal inputs.

最常见的编码错误是测试函数的所有参数是否合法。以下函数对其输入取平方根,因此需要非负输入;它被编码以防止非法输入。

real dbl_sqrt(real x) {
  if (!(x >= 0)) {
    reject("dblsqrt(x): x must be positive; found x = ", x);
  }
  return 2 * sqrt(x);
}

The negation of the positive test is important, because it also catches the case where x is a not-a-number value. If the condition had been coded as (x < 0) it would not catch the not-a-number case, though it could be written as (x < 0 || is_nan(x)). The positive infinite case is allowed through, but could also be checked with the is_inf(x) function. The square root function does not itself reject, but some downstream consumer of dbl_sqrt(-2) would be likely to raise an error, at which point the origin of the illegal input requires detective work. Or even worse, as Matt Simpson pointed out in the GitHub comments, the function could go into an infinite loop if it starts with an infinite value and tries to reduce it by arithmetic, likely consuming all available memory and crashing an interface. Much better to catch errors early and report on their origin.

正数测试的否定很重要,因为它还捕获了 x 是非数字值的情况。如果条件被编码为 (x < 0),它将不会捕获非数字情况,尽管可以写成 (x < 0 || is_nan(x))。正无穷大的情况被允许通过,但也可以使用 is_inf(x) 函数检查。平方根函数本身不拒绝,但 dbl_sqrt(-2) 的某些下游使用者可能会引发错误,此时需要追查非法输入的来源。或者更糟的是,正如 Matt Simpson 在 GitHub 评论中指出的,如果函数从无穷大值开始并尝试通过算术减少它,可能会进入无限循环,可能会消耗所有可用内存并使接口崩溃。尽早捕获错误并报告其来源要好得多。

The effect of rejection depends on the program block in which the rejection is executed. In transformed data, rejections cause the program to fail to load. In transformed parameters or in the model block, rejections cause the current state to be rejected in the Metropolis sense.3

拒绝的效果取决于执行拒绝的程序块。在转换数据中,拒绝会导致程序无法加载。在转换参数或模型块中,拒绝会导致当前状态在 Metropolis 意义上被拒绝。4

In generated quantities there is no way to recover and generate the remaining parameters, so rejections cause subsequent values to be reported as NaNs. Extra care should be taken in calling functions which may reject in the generated quantities block.

在生成量中没有办法恢复并生成剩余的参数,因此拒绝会导致后续值报告为 NaN。在生成量块中调用可能拒绝的函数时应格外小心。

Type declarations for functions

函数的类型声明

Function argument and return types for vector and matrix types are not declared with their sizes, unlike type declarations for variables. Function argument type declarations may not be declared with constraints, either lower or upper bounds or structured constraints like forming a simplex or correlation matrix, (as is also the case for local variables); see the table of types in the reference manual for full details.

向量和矩阵类型的函数参数和返回类型不声明其大小,这与变量的类型声明不同。函数参数类型声明不能带有约束,无论是下界或上界,还是形成单纯形或相关矩阵等结构化约束(局部变量也是如此);完整详情请参阅参考手册中的类型表。

For example, here’s a function to compute the entropy of a categorical distribution with simplex parameter theta.

例如,这是一个计算具有单纯形参数 theta 的分类分布熵的函数。

real entropy(vector theta) {
  return sum(theta .* log(theta));
}

Although theta must be a simplex, only the type vector is used.5

尽管 theta 必须是单纯形,但只使用 vector 类型。6

Upper or lower bounds on values or constrained types are not allowed as return types or argument types in function declarations.

值的上界或下界或约束类型不允许作为函数声明中的返回类型或参数类型。

Array types for function declarations

函数声明的数组类型

Array arguments have their own syntax, which follows that used in this manual for function signatures. For example, a function that operates on a two-dimensional array to produce a one-dimensional array might be declared as follows.

数组参数有自己的语法,遵循本手册中用于函数签名的语法。例如,对二维数组进行操作以产生一维数组的函数可能声明如下。

array[] real baz(array[,] real x);

The notation [ ] is used for one-dimensional arrays (as in the return above), [ , ] for two-dimensional arrays, [ , , ] for three-dimensional arrays, and so on.

符号 [ ] 用于一维数组(如上面的返回),[ , ] 用于二维数组,[ , , ] 用于三维数组,依此类推。

Functions support arrays of any type, including matrix and vector types. As with other types, no constraints are allowed.

函数支持任何类型的数组,包括矩阵和向量类型。与其他类型一样,不允许约束。

Data-only function arguments

纯数据函数参数

A function argument which is a real-valued type or a container of a real-valued type, i.e., not an integer type or integer array type, can be qualified using the prefix qualifier data. The following is an example of a data-only function argument.

作为实值类型或实值类型容器的函数参数(即不是整数类型或整数数组类型)可以使用前缀限定符 data 进行限定。以下是纯数据函数参数的示例。

real foo(real y, data real mu) {
  return -0.5 * (y - mu)^2;
}

This qualifier restricts this argument to being invoked with expressions which consist only of data variables, transformed data variables, literals, and function calls. A data-only function argument cannot involve real variables declared in the parameters, transformed parameters, or model block. Attempts to invoke a function using an expression which contains parameter, transformed parameters, or model block variables as a data-only argument will result in an error message from the parser.

此限定符将此参数的调用限制为仅包含数据变量、转换数据变量、字面量和函数调用的表达式。纯数据函数参数不能涉及在参数、转换参数或模型块中声明的实变量。尝试使用包含参数、转换参数或模型块变量的表达式作为纯数据参数调用函数将导致解析器产生错误消息。

Use of the data qualifier must be consistent between the forward declaration and the definition of a functions.

data 限定符的使用必须在函数的前向声明和定义之间保持一致。

This qualifier should be used when writing functions that call the built-in ordinary differential equation (ODE) solvers, algebraic solvers, or map functions. These higher-order functions have strictly specified signatures where some arguments of are data only expressions. (See the ODE solver chapter for more usage details and the functions reference manual for full definitions.) When writing a function which calls the ODE or algebraic solver, arguments to that function which are passed into the call to the solver, either directly or indirectly, should have the data prefix qualifier. This allows for compile-time type checking and increases overall program understandability.

在编写调用内置常微分方程(ODE)求解器、代数求解器或映射函数的函数时应使用此限定符。这些高阶函数具有严格指定的签名,其中某些参数仅为数据表达式。(有关更多使用详细信息,请参阅 ODE 求解器章节,完整定义请参阅函数参考手册。)在编写调用 ODE 或代数求解器的函数时,传递给求解器调用的该函数的参数(直接或间接)应具有 data 前缀限定符。这允许编译时类型检查并提高整体程序的可理解性。

21.2 Functions as statements

函数作为语句

In some cases, it makes sense to have functions that do not return a value. For example, a routine to print the lower-triangular portion of a matrix can be defined as follows.

在某些情况下,不返回值的函数是有意义的。例如,可以定义一个打印矩阵下三角部分的例程,如下所示。

functions {
  void pretty_print_tri_lower(matrix x) {
    if (rows(x) == 0) {
      print("empty matrix");
      return;
    }
    print("rows=", rows(x), " cols=", cols(x));
    for (m in 1:rows(x)) {
      for (n in 1:m) {
        print("[", m, ",", n, "]=", x[m, n]);
      }
    }
  }
}

The special symbol void is used as the return type. This is not a type itself in that there are no values of type void; it merely indicates the lack of a value. As such, return statements for void functions are not allowed to have arguments, as in the return statement in the body of the previous example.

特殊符号 void 用作返回类型。这本身不是一种类型,因为没有 void 类型的值;它仅表示缺少值。因此,void 函数的返回语句不允许有参数,如前面示例主体中的返回语句。

Void functions applied to appropriately typed arguments may be used on their own as statements. For example, the pretty-print function defined above may be applied to a covariance matrix being defined in the transformed parameters block.

应用于适当类型参数的 void 函数可以单独用作语句。例如,上面定义的美化打印函数可以应用于在转换参数块中定义的协方差矩阵。

transformed parameters {
  cov_matrix[K] Sigma;
  // ... code to set Sigma ...
  pretty_print_tri_lower(Sigma);
  // ...
}

21.3 Functions accessing the log probability accumulator

访问对数概率累加器的函数

Functions whose names end in _lp are allowed to use sampling statements and target += statements; other functions are not. Because of this access, their use is restricted to the transformed parameters and model blocks.

名称以 _lp 结尾的函数允许使用采样语句和 target += 语句;其他函数则不允许。由于这种访问权限,它们的使用仅限于转换参数和模型块。

Here is an example of a function to assign standard normal priors to a vector of coefficients, along with a center and scale, and return the translated and scaled coefficients; see the reparameterization section for more information on efficient non-centered parameterizations

以下是一个函数示例,为系数向量分配标准正态先验,以及中心和尺度,并返回平移和缩放后的系数;有关高效非中心参数化的更多信息,请参阅重参数化部分

functions {
  vector center_lp(vector beta_raw, real mu, real sigma) {
    beta_raw ~ std_normal();
    mu ~ cauchy(0, 5);
    sigma ~ cauchy(0, 5);
    return sigma * beta_raw + mu;
  }
}
model {
  beta ~ center_lp(beta_raw, mu, sigma);
}

The non-centered parameterization can also be coded directly in the model block.

非中心参数化也可以直接在模型块中编码。

model {
  beta_raw ~ std_normal();
  mu ~ cauchy(0, 5);
  sigma ~ cauchy(0, 5);
  beta = sigma * beta_raw + mu;
}

Context-sensitive log probability access

上下文敏感的对数概率访问

The parsing scheme used by the Stan compiler allows functions to be defined whose names end in _lp. These are allowed to behave differently depending on whether the log probability accumulator is accessible or not. The log probability accumulator is accessible whenever these functions are used in blocks where target += statements are allowed, which are the transformed parameters and model blocks. In other words, they behave like the C++ code

Stan 编译器使用的解析方案允许定义名称以 _lp 结尾的函数。这些函数允许根据对数概率累加器是否可访问而表现不同。当这些函数在允许 target += 语句的块中使用时,对数概率累加器是可访问的,这些块是转换参数和模型块。换句话说,它们的行为类似于 C++ 代码

if (log_density_accessible) {
  // code accessing target
} else {
  // code not accessing target
}

If the function is used in a block where target += statements are not allowed, then it is a compile-time error to try to access the value of target. If the code accessing target is executed, then behavior will be undefined.

如果函数在不允许 target += 语句的块中使用,那么尝试访问 target 的值将是编译时错误。如果执行访问 target 的代码,则行为将是未定义的。

Functions ending in _lp may only be used in the transformed parameters and model blocks.

_lp 结尾的函数只能在转换参数和模型块中使用。

Parameters are constant

参数是常量

One implication of the log probability function’s context-dependent scalability is that its parameters are a constant function of the parameters, even if they are not declared as data or transformed data. The parameters in the _lp function behave as if they were uniform between the lowest and highest possible values. To indicate that they do not play a role in the model, their log probabilities are not added to the log density and their changes do not produce non-zero gradients.

对数概率函数的上下文依赖可扩展性的一个含义是,其参数是参数的常量函数,即使它们未声明为数据或转换数据。_lp 函数中的参数表现得好像它们在最低和最高可能值之间均匀分布。为了表明它们在模型中不起作用,它们的对数概率不会添加到对数密度中,它们的变化不会产生非零梯度。

Covariance matrix example

协方差矩阵示例

The following covariance matrix parameterization provides an example of declaring a parameter in an _lp function. It uses a Cholesky factor parameterization of a covariance matrix to generate correlations (represented on the log scale as raw variables) uniformly at random in \((-1, 1)\) and standard deviations uniformly at random in \((0, \infty)\).

以下协方差矩阵参数化提供了在 _lp 函数中声明参数的示例。它使用协方差矩阵的 Cholesky 因子参数化来生成在 \((-1, 1)\) 中均匀随机的相关性(在对数尺度上表示为原始变量)和在 \((0, \infty)\) 中均匀随机的标准差。

functions {
  matrix wishart_cholesky_lp(matrix L_raw, real nu) {
    int K = rows(L_raw);
    vector[K] sigma_raw = diagonal(L_raw);
    matrix[K, K] L = L_raw - diag_matrix(sigma_raw);
    sigma_raw ~ std_normal();
    target += lkj_corr_cholesky_lpdf(L | 1);
    target += (K - 1) * log(nu / 2.0);
    for (k in 1:K) {
      target += gamma_lpdf(nu / 2.0 | nu / 2.0, 0.5 / square(sigma_raw[k]));
    }
    return diag_pre_multiply(sigma_raw .* sqrt(nu), L);
  }
}
model {
  L_Sigma ~ wishart_cholesky_lp(L_raw_Sigma, nu);
}
generated quantities {
  matrix[K, K] Sigma = L_Sigma * L_Sigma';
}

The function defines a Wishart Cholesky factor in terms of a scale nu and a lower-triangular matrix L_raw. The diagonal elements are pulled out as scales, and the strictly lower triangular elements are transformed through the LKJ correlation Cholesky factor distribution. The scales are given independent half-normal priors, but conditioned on nu through the Wishart. The function returns the Cholesky factor of a Wishart with scale nu and covariance matrix Sigma.

该函数根据尺度 nu 和下三角矩阵 L_raw 定义 Wishart Cholesky 因子。对角元素作为尺度提取出来,严格下三角元素通过 LKJ 相关 Cholesky 因子分布进行转换。尺度被赋予独立的半正态先验,但通过 Wishart 以 nu 为条件。该函数返回具有尺度 nu 和协方差矩阵 Sigma 的 Wishart 的 Cholesky 因子。

The parameter L_Sigma is declared as a covariance matrix and assigned in the model block. The raw variables on which it depends are declared locally in the user-defined function and not added to the log density.

参数 L_Sigma 声明为协方差矩阵并在模型块中赋值。它所依赖的原始变量在用户定义的函数中局部声明,不添加到对数密度中。

The generated quantities block is used to convert the Cholesky factor to a covariance matrix for output.

生成量块用于将 Cholesky 因子转换为协方差矩阵以供输出。

Context of use restrictions

使用上下文限制

Functions defined with the _lp suffix may only be used in the transformed parameters and model blocks.

_lp 后缀定义的函数只能在转换参数和模型块中使用。

Random number generators

随机数生成器

Functions defined with the suffix _rng may only be used in the transformed data and generated quantities blocks, and within the bodies of user-defined functions with names ending in _rng.

_rng 后缀定义的函数只能在转换数据和生成量块中使用,以及在名称以 _rng 结尾的用户定义函数的主体中使用。

User-defined probability functions

用户定义的概率函数

Functions defined with the suffix _lpdf or _lpmf may only be used in the probability statements within the model block, and within the bodies of other user-defined probability functions. The cumulative distribution functions (CDFs) defined with _lcdf and _lccdf may be used in all the same places as the user-defined probability functions.

_lpdf_lpmf 后缀定义的函数只能在模型块中的概率语句中使用,以及在其他用户定义的概率函数的主体中使用。以 _lcdf_lccdf 定义的累积分布函数(CDF)可以在与用户定义的概率函数相同的所有地方使用。

Parameterization of discrete distributions

离散分布的参数化

Stan’s built-in probability function for discrete outcomes such as the Bernoulli logit use a parameterization in terms of log odds; the categorical logit and ordered logit are similarly logit scaled, but with vector parameters. The user-defined probability mass function allows a direct parameterization in terms of a categorical distribution over outcomes.

Stan 的内置离散结果概率函数(如伯努利 logit)使用对数几率的参数化;分类 logit 和有序 logit 类似地进行 logit 缩放,但使用向量参数。用户定义的概率质量函数允许直接根据结果的分类分布进行参数化。

functions {
  real categorical_rng(vector beta) {
    int k = size(beta);
    vector[k] theta = softmax(beta);
    return categorical_rng(theta);
  }
}

The local variable theta ensures that the probabilities sum to 1. The softmax function converts the unconstrained vector beta to a simplex.

局部变量 theta 确保概率总和为 1。softmax 函数将无约束向量 beta 转换为单纯形。

Although the built-in categorical_logit could have been used here, this example shows how categorical is coded in terms of more basic functions.

尽管这里可以使用内置的 categorical_logit,但此示例展示了如何根据更基本的函数对分类进行编码。

Context restrictions

上下文限制

Functions defined with the suffix _lpdf and _lpmf may only be called in the model block and within definitions of other user-defined probability functions. As with built-in probability functions, the user-defined probability functions may only be used in the model block or in other user-defined probability function definitions.

_lpdf_lpmf 后缀定义的函数只能在模型块中调用,以及在其他用户定义的概率函数的定义中调用。与内置概率函数一样,用户定义的概率函数只能在模型块或其他用户定义的概率函数定义中使用。

The suffixes _lcdf and _lccdf indicate log cumulative distribution functions and log complementary cumulative distribution functions. Functions defined with these suffixes may only be used within the definition of user-defined probability functions.

后缀 _lcdf_lccdf 表示对数累积分布函数和对数补累积分布函数。以这些后缀定义的函数只能在用户定义的概率函数的定义中使用。

Overloading functions

函数重载

Stan allows user-defined functions to be overloaded. Multiple functions may share the same name as long as they have different sequences of argument types.

Stan 允许用户定义的函数被重载。多个函数可以共享相同的名称,只要它们具有不同的参数类型序列。

For example, a normal random number generator could be defined for real and integer arguments.

例如,可以为实数和整数参数定义正态随机数生成器。

real normal_rng(real mu, real sigma) {
  return normal_rng(mu, sigma);
}
int normal_rng(real mu, real sigma) {
  return floor(mu + sigma * normal_rng(0, 1));
}

The sequences of argument types is considered different if the types differ in dimensionality, such as real versus array[] real or vector versus row_vector. Sequences also differ if they have a different number of arguments.

如果类型在维度上不同,例如 realarray[] realvectorrow_vector,则参数类型序列被认为是不同的。如果参数数量不同,序列也不同。

Function signature declarations

函数签名声明

Stan also provides a means to declare a function without defining it. This can be used to create function signatures for functions defined in other files, or to provide a forward declaration for a function defined later in the same file.

Stan 还提供了一种声明函数而不定义它的方法。这可以用于为在其他文件中定义的函数创建函数签名,或为在同一文件中稍后定义的函数提供前向声明。

A function declaration consists of the return type, the function name, and the argument types in parentheses, followed by a semicolon. For example, the following declares a function named foo that takes two real arguments and returns a real value.

函数声明由返回类型、函数名称和括号中的参数类型组成,后跟分号。例如,以下声明了一个名为 foo 的函数,它接受两个实数参数并返回一个实数值。

real foo(real x, real y);

A function definition consists of the return type, the function name, the argument types in parentheses, and the body in curly braces. For example, the following defines a function named foo that takes two real arguments and returns their sum.

函数定义由返回类型、函数名称、括号中的参数类型和花括号中的主体组成。例如,以下定义了一个名为 foo 的函数,它接受两个实数参数并返回它们的和。

real foo(real x, real y) {
  return x + y;
}

If a function is used before it is defined, it must be declared before use. For example, the following program is legal.

如果函数在定义之前使用,必须在使用之前声明。例如,以下程序是合法的。

real foo(real x, real y);
real bar(real x, real y) {
  return foo(x, y);
}
real foo(real x, real y) {
  return x + y;
}

The following program is not legal, because foo is used before it is declared.

以下程序不合法,因为 foo 在声明之前被使用。

real bar(real x, real y) {
  return foo(x, y);
}
real foo(real x, real y) {
  return x + y;
}

The declaration of foo must appear before its use in bar.

foo 的声明必须出现在 bar 中使用它之前。

Context restrictions on overloading

重载的上下文限制

Overloaded functions are restricted to the following contexts:

重载函数限制在以下上下文中:

  • The transformed data block

  • The transformed parameters block

  • The model block

  • The generated quantities block

  • The bodies of other user-defined functions

  • 转换数据块

  • 转换参数块

  • 模型块

  • 生成量块

  • 其他用户定义函数的主体

Overloaded functions may not be used in the data block, the parameters block, or the body of a for loop in the transformed data block.

重载函数不能在数据块、参数块或转换数据块中 for 循环的主体中使用。

Covariance matrix example

协方差矩阵示例

Overloaded functions can be used to implement functions that behave differently depending on their arguments. For example, the following code defines a function that computes the covariance matrix of a vector, a row vector, or a matrix.

重载函数可用于实现根据参数不同而行为不同的函数。例如,以下代码定义了一个计算向量、行向量或矩阵的协方差矩阵的函数。

matrix cov_matrix(vector x) {
  return cov_matrix(to_matrix(x));
}
matrix cov_matrix(row_vector x) {
  return cov_matrix(to_matrix(x));
}
matrix cov_matrix(matrix x) {
  return (x' * x) / (rows(x) - 1);
}

The function cov_matrix is overloaded to accept a vector, a row vector, or a matrix. The vector and row vector versions convert their argument to a matrix and then call the matrix version.

函数 cov_matrix 被重载以接受向量、行向量或矩阵。向量和行向量版本将其参数转换为矩阵,然后调用矩阵版本。

Overloading with ordering

带顺序的重载

Overloaded functions can be used to implement functions that behave differently depending on the order of their arguments. For example, the following code defines a function that sorts two real numbers.

重载函数可用于实现根据参数顺序不同而行为不同的函数。例如,以下代码定义了一个对两个实数进行排序的函数。

array[2] real sort(real x, real y) {
  if (x < y) {
    return {x, y};
  } else {
    return {y, x};
  }
}
array[2] real sort(real x, int y) {
  return sort(x, to_real(y));
}
array[2] real sort(int x, real y) {
  return sort(to_real(x), y);
}
array[2] real sort(int x, int y) {
  return sort(to_real(x), to_real(y));
}

The function sort is overloaded to accept two real numbers, a real and an integer, an integer and a real, or two integers. The versions with mixed types convert the integer to a real and then call the version with two reals.

函数 sort 被重载以接受两个实数、一个实数和一个整数、一个整数和一个实数,或两个整数。混合类型的版本将整数转换为实数,然后调用两个实数的版本。

Overloading to improve performance

重载以提高性能

Overloaded functions can be used to improve performance by avoiding unnecessary computations. For example, the following code defines a function that computes the log of the sum of exponentials of two vectors.

重载函数可用于通过避免不必要的计算来提高性能。例如,以下代码定义了一个计算两个向量的指数和的对数的函数。

real log_sum_exp(vector x, vector y) {
  return log_sum_exp(append_row(x, y));
}
real log_sum_exp(vector x, real y) {
  return log_sum_exp(append_row(x, to_vector({y})));
}
real log_sum_exp(real x, vector y) {
  return log_sum_exp(append_row(to_vector({x}), y));
}
real log_sum_exp(real x, real y) {
  return log_sum_exp({x, y});
}

The function log_sum_exp is overloaded to accept two vectors, a vector and a real, a real and a vector, or two reals. The versions with mixed types avoid the overhead of creating a vector with a single element.

函数 log_sum_exp 被重载以接受两个向量、一个向量和一个实数、一个实数和一个向量,或两个实数。混合类型的版本避免了创建具有单个元素的向量的开销。

Overloading and generics

重载和泛型

Overloading should be used sparingly. It can make code harder to read and understand, and it can lead to unexpected behavior. For example, consider the following code.

应谨慎使用重载。它可能使代码更难阅读和理解,并可能导致意外行为。例如,考虑以下代码。

real triple(real x) {
  return 3 * x;
}
complex triple(complex x) {
  return 3 * x;
}
array[] real triple(array[] real x) {
  array[size(x)] real y;
  for (i in 1:size(x)) {
    y[i] = triple(x[i]);
  }
  return y;
}

This code defines a function named triple that triples its argument. The function is overloaded to accept a real, a complex, or an array of reals. The array version calls the real version recursively.

此代码定义了一个名为 triple 的函数,将其参数乘以三。该函数被重载以接受实数、复数或实数数组。数组版本递归调用实数版本。

Overloading like this can be used to implement generic functions that work with many different types. However, it can also lead to unexpected behavior. For example, the following code calls the real version of triple three times.

像这样的重载可用于实现适用于许多不同类型的泛型函数。然而,它也可能导致意外行为。例如,以下代码调用 triple 的实数版本三次。

triple({3.0, 4.0})[0] == triple({3.0, 4.0}[0])

This is probably not what the user intended. The user probably intended to call the array version of triple once.

这可能不是用户的意图。用户可能打算调用 triple 的数组版本一次。

Overloading and confusion

重载和困惑

Overloading can also lead to confusion. For example, consider the following code.

重载也可能导致困惑。例如,考虑以下代码。

real triple(real x) {
  return 3 * x;
}
complex triple(complex x) {
  return 3 * x;
}
array[] real triple(array[] real x) {
  array[size(x)] real y;
  for (i in 1:size(x)) {
    y[i] = triple(x[i]);
  }
  return y;
}

This code defines a function named triple that triples its argument. The function is overloaded to accept a real, a complex, or an array of reals. The array version calls the real version recursively.

此代码定义了一个名为 triple 的函数,将其参数乘以三。该函数被重载以接受实数、复数或实数数组。数组版本递归调用实数版本。

The following properties hold for this function:

此函数具有以下属性:

// The function does what it says
triple(3.0) == 9.0
// It is defined reasonably for different types
triple(to_complex(3.0)) == to_complex(triple(3.0))
// A container version of this function works by element
triple({3.0, 4.0})[0] == triple({3.0, 4.0}[0])

Note that none of these properties are enforced by Stan, they are mentioned merely to warn against uses of overloading which cause confusion.

请注意,Stan 不强制执行这些属性,提及它们仅是为了警告避免使用会引起困惑的重载。

Function resolution

函数解析

Stan resolves overloaded functions by the number and type of arguments passed to the function. This can be subtle when multiple signatures with the same number of arguments are present.

Stan 通过传递给函数的参数的数量和类型来解析重载函数。当存在具有相同参数数量的多个签名时,这可能很微妙。

Consider the following function signatures

考虑以下函数签名

real foo(int a, real b);
real foo(real a, real b);

Given these, the function call foo(1.5, 2.5) is unambiguous - it must resolve to the second signature. But, the function call foo(1, 1.5) could be valid for either under Stan’s promotion rules, which allow integers to be promoted to real numbers.

鉴于此,函数调用 foo(1.5, 2.5) 是明确的——它必须解析为第二个签名。但是,函数调用 foo(1, 1.5) 在 Stan 的提升规则下可能对_任一_都有效,该规则允许整数提升为实数。

To resolve this, Stan selects the signature which requires the fewest number of promotions for a given function call. In the above case, this means the call foo(1, 1.5) would select the first signature, because it requires 0 promotions (the second signature would require 1 promotion).

为了解决这个问题,Stan 选择给定函数调用所需提升次数最少的签名。在上述情况下,这意味着调用 foo(1, 1.5) 将选择第一个签名,因为它需要 0 次提升(第二个签名需要 1 次提升)。

Furthermore, there must be only one such signature, e.g., the minimum number of promotions for a given function call. In the above case, this means the call foo(1, 1.5) would select the first signature, because it requires 0 promotions (the second signature would require 1 promotion).

此外,必须只有一个这样的签名,即给定函数调用的最小提升次数必须是唯一的。这个要求禁止了某些类型的重载。例如,考虑函数签名

real bar(int x, real y);
real bar(real x, int y);

These signatures do not have a unique minimum number of promotions for the call bar(1, 2). Both signatures require one int to real promotion, and so it cannot be determined which is correct. Stan will produce a compilation error in this case.

这些签名对于调用 bar(1, 2) 没有唯一的最小提升次数。两个签名都需要一个 intreal 的提升,因此无法确定哪个是正确的。在这种情况下,Stan 将产生编译错误。

Promotion from integers to complex numbers is considered to be two separate promotions, first from int to real, then from real to complex. This means that integer arguments will “prefer” a signature with real types over complex types.

从整数到复数的提升被认为是两个独立的提升,首先从 intreal,然后从 realcomplex。这意味着整数参数将”偏好”具有实数类型而不是复数类型的签名。

For example, consider the function signatures

例如,考虑函数签名

real pop(real x);
real pop(complex x);

Stan will select the first signature when pop is called with an integer argument such as pop(0).

当使用整数参数(如 pop(0))调用 pop 时,Stan 将选择第一个签名。

21.4 Documenting functions

函数文档

Functions will ideally be documented at their interface level. The Stan style guide for function documentation follows the same format as used by the Doxygen (C++) and Javadoc (Java) automatic documentation systems. Such specifications indicate the variables and their types and the return value, prefaced with some descriptive text.

理想情况下,函数应在其接口级别进行文档化。Stan 函数文档的样式指南遵循 Doxygen(C++)和 Javadoc(Java)自动文档系统使用的相同格式。这些规范指示变量及其类型和返回值,并以一些描述性文本作为前言。

For example, here’s some documentation for the prediction matrix generator.

例如,这是预测矩阵生成器的一些文档。

/**
 * Return a data matrix of specified size with rows
 * corresponding to items and the first column filled
 * with the value 1 to represent the intercept and the
 * remaining columns randomly filled with unit-normal draws.
 *
 * @param N Number of rows corresponding to data items
 * @param K Number of predictors, counting the intercept, per
 *          item.
 * @return Simulated predictor matrix.
 */
matrix predictors_rng(int N, int K) {
  // ...

The comment begins with /**, ends with */, and has an asterisk (*) on each line. It uses @param followed by the argument’s identifier to document a function argument. The tag @return is used to indicate the return value. Stan does not (yet) have an automatic documentation generator like Javadoc or Doxygen, so this just looks like a big comment starting with /* and ending with */ to the Stan parser.

注释以 /** 开始,以 */ 结束,每行都有一个星号(*)。它使用 @param 后跟参数的标识符来记录函数参数。标签 @return 用于指示返回值。Stan(尚)没有像 Javadoc 或 Doxygen 那样的自动文档生成器,所以对于 Stan 解析器来说,这只是一个以 /* 开始并以 */ 结束的大注释。

For functions that raise exceptions, exceptions can be documented using @throws.7

对于引发异常的函数,可以使用 @throws 记录异常。8

For example,

例如,

 /** ...
 * @param theta
 * @throws If any of the entries of theta is negative.
 */
real entropy(vector theta) {
  // ...
}

Usually an exception type would be provided, but these are not exposed as part of the Stan language, so there is no need to document them.

通常会提供异常类型,但这些不作为 Stan 语言的一部分公开,因此无需记录它们。

21.5 Summary of function types

函数类型总结

Functions may have a void or non-void return type and they may or may not have one of the special suffixes, _lpdf, _lpmf, _lp, or _rng.

函数可能具有 void 或非 void 返回类型,它们可能有也可能没有特殊后缀之一,_lpdf_lpmf_lp_rng

Void vs. non-void return

Void 与非 void 返回

Only functions declared to return void may be used as statements. These are also the only functions that use return statements with no arguments.

只有声明返回 void 的函数才能用作语句。这些也是唯一使用没有参数的 return 语句的函数。

Only functions declared to return non-void values may be used as expressions. These functions require return statements with arguments of a type that matches the declared return type.

只有声明返回非 void 值的函数才能用作表达式。这些函数需要具有与声明的返回类型匹配的类型的参数的 return 语句。

Suffixed or non-suffixed

带后缀或不带后缀

Only functions ending in _lpmf or _lpdf and with return type real may be used as probability functions in distribution statements.

只有以 _lpmf_lpdf 结尾且返回类型为 real 的函数才能用作分布语句中的概率函数。

Only functions ending in _lp may access the log probability accumulator through distribution statements or target += statements. Such functions may only be used in the transformed parameters or model blocks.

只有以 _lp 结尾的函数才能通过分布语句或 target += 语句访问对数概率累加器。此类函数只能在转换参数或模型块中使用。

Only functions ending in _rng may access the built-in pseudo-random number generators. Such functions may only be used in the generated quantities block or transformed data block, or in the bodies of other user-defined functions ending in _rng.

只有以 _rng 结尾的函数才能访问内置的伪随机数生成器。此类函数只能在生成量块或转换数据块中使用,或在以 _rng 结尾的其他用户定义函数的主体中使用。

21.6 Recursive functions

递归函数

Stan supports recursive function definitions, which can be useful for some applications. For instance, consider the matrix power operation, \(A^n\), which is defined for a square matrix \(A\) and positive integer \(n\) by

Stan 支持递归函数定义,这对某些应用可能很有用。例如,考虑矩阵幂运算 \(A^n\),它对于方阵 \(A\) 和正整数 \(n\) 定义为

\[ A^n = \begin{cases} \textrm{I} & \quad\text{如果 } n = 0, \text{ 且} \\ A \, A^{n-1} & \quad\text{如果 } n > 0. \end{cases} \]

where \(\textrm{I}\) is the identity matrix. This definition can be directly translated to a recursive function definition.

其中 \(\textrm{I}\) 是单位矩阵。这个定义可以直接转换为递归函数定义。

matrix matrix_pow(matrix a, int n) {
  if (n == 0) {
    return diag_matrix(rep_vector(1, rows(a)));
  } else {
    return a *  matrix_pow(a, n - 1);
  }
}

It would be more efficient to not allow the recursion to go all the way to the base case, adding the following conditional clause.

不允许递归一直到基本情况会更有效,添加以下条件子句。

else if (n == 1) {
  return a;
}

21.7 Truncated random number generation

截断随机数生成

Generation with inverse CDFs

使用逆 CDF 生成

To generate random numbers, it is often sufficient to invert their cumulative distribution functions. This is built into many of the random number generators. For example, to generate a standard logistic variate, first generate a uniform variate \(u \sim \textsf{uniform}(0, 1)\), then run through the inverse cumulative distribution function, \(y = \textrm{logit}(u)\). If this were not already built in as logistic_rng(0, 1), it could be coded in Stan directly as

要生成随机数,通常只需反转其累积分布函数即可。这已内置在许多随机数生成器中。例如,要生成标准 logistic 变量,首先生成均匀变量 \(u \sim \textsf{uniform}(0, 1)\),然后通过逆累积分布函数运行,\(y = \textrm{logit}(u)\)。如果这还没有内置为 logistic_rng(0, 1),可以在 Stan 中直接编码为

real standard_logistic_rng() {
  real u = uniform_rng(0, 1);
  real y = logit(u);
  return y;
}

Following the same pattern, a standard normal RNG could be coded as

按照相同的模式,标准正态 RNG 可以编码为

real standard_normal_rng() {
  real u = uniform_rng(0, 1);
  real y = inv_Phi(u);
  return y;
}

that is, \(y = \Phi^{-1}(u)\), where \(\Phi^{-1}\) is the inverse cumulative distribution function for the standard normal distribution, implemented in the Stan function inv_Phi.

\(y = \Phi^{-1}(u)\),其中 \(\Phi^{-1}\) 是标准正态分布的逆累积分布函数,在 Stan 函数 inv_Phi 中实现。

In order to generate non-standard variates of the location-scale variety, the variate is scaled by the scale parameter and shifted by the location parameter. For example, to generate \(\textsf{normal}(\mu, \sigma)\) variates, it is enough to generate a uniform variate \(u \sim \textsf{uniform}(0, 1)\), then convert it to a standard normal variate, \(z = \Phi(u)\), where \(\Phi\) is the inverse cumulative distribution function for the standard normal, and then, finally, scale and translate it, \(y = \mu + \sigma \times z\). In code,

为了生成位置-尺度类型的非标准变量,变量通过尺度参数缩放并通过位置参数移动。例如,要生成 \(\textsf{normal}(\mu, \sigma)\) 变量,只需生成均匀变量 \(u \sim \textsf{uniform}(0, 1)\),然后将其转换为标准正态变量,\(z = \Phi(u)\),其中 \(\Phi\) 是标准正态的逆累积分布函数,然后最后缩放和平移它,\(y = \mu + \sigma \times z\)。在代码中,

real my_normal_rng(real mu, real sigma) {
  real u = uniform_rng(0, 1);
  real z = inv_Phi(u);
  real y = mu + sigma * z;
  return y;
}

A robust version of this function would test that the arguments are finite and that sigma is non-negative, e.g.,

此函数的健壮版本将测试参数是有限的,且 sigma 是非负的,例如,

  if (is_nan(mu) || is_inf(mu)) {
    reject("my_normal_rng: mu must be finite; ",
           "found mu = ", mu);
  }
  if (is_nan(sigma) || is_inf(sigma) || sigma < 0) {
    reject("my_normal_rng: sigma must be finite and non-negative; ",
           "found sigma = ", sigma);
  }

Truncated variate generation

截断变量生成

Often truncated uniform variates are needed, as in survival analysis when a time of death is censored beyond the end of the observations. To generate a truncated random variate, the cumulative distribution is used to find the truncation point in the inverse CDF, a uniform variate is generated in range, and then the inverse CDF translates it back.

经常需要截断的均匀变量,如在生存分析中,当死亡时间在观察结束后被审查时。要生成截断的随机变量,使用累积分布在逆 CDF 中找到截断点,在范围内生成均匀变量,然后逆 CDF 将其转换回来。

Truncating below

下截断

For example, the following code generates a \(\textsf{Weibull}(\alpha, \sigma)\) variate truncated below at a time \(t\),9

例如,以下代码生成在时间 \(t\) 处下截断的 \(\textsf{Weibull}(\alpha, \sigma)\) 变量,10

real weibull_lb_rng(real alpha, real sigma, real t) {
  real p = weibull_cdf(lt | alpha, sigma);   // cdf for lb
  real u = uniform_rng(p, 1);               // unif in bounds
  real y = sigma * (-log1m(u))^inv(alpha);  // inverse cdf
  return y;
}

Truncating above and below

上下截断

If there is a lower bound and upper bound, then the CDF trick is used twice to find a lower and upper bound. For example, to generate a \(\textsf{normal}(\mu, \sigma)\) truncated to a region \((a, b)\), the following code suffices,

如果有下界和上界,则 CDF 技巧被使用两次以找到下界和上界。例如,要生成截断到区域 \((a, b)\)\(\textsf{normal}(\mu, \sigma)\),以下代码就足够了,

real normal_lub_rng(real mu, real sigma, real lb, real ub) {
  real p_lb = normal_cdf(lb | mu, sigma);
  real p_ub = normal_cdf(ub | mu, sigma);
  real u = uniform_rng(p_lb, p_ub);
  real y = mu + sigma * inv_Phi(u);
  return y;
}

To make this more robust, all variables should be tested for finiteness, sigma should be tested for positiveness, and lb and ub should be tested to ensure the upper bound is greater than the lower bound. While it may be tempting to compress lines, the variable names serve as a kind of chunking of operations and naming for readability; compare the multiple statement version above with the single statement

为了使其更健壮,应测试所有变量的有限性,应测试 sigma 的正性,应测试 lbub 以确保上界大于下界。虽然压缩行可能很诱人,但变量名称用作操作分块和命名以提高可读性;将上面的多语句版本与单语句进行比较

  return mu + sigma * inv_Phi(uniform_rng(normal_cdf(lb | mu, sigma),
                                          normal_cdf(ub | mu, sigma)));

for readability. The names like p indicate probabilities, and p_lb and p_ub indicate the probabilities of the bounds. The variable u is clearly named as a uniform variate, and y is used to denote the variate being generated itself.

以提高可读性。像 p 这样的名称表示概率,p_lbp_ub 表示边界的概率。变量 u 明确命名为均匀变量,y 用于表示正在生成的变量本身。


  1. The main problem with comments is that they can be misleading, either due to misunderstandings on the programmer’s part or because the program’s behavior is modified after the comment is written. The program always behaves the way the code is written, which is why refactoring complex code into understandable units is preferable to simply adding comments.↩︎

  2. 注释的主要问题是它们可能具有误导性,要么是由于程序员的误解,要么是因为在编写注释后程序的行为被修改。程序始终按照代码编写的方式运行,这就是为什么将复杂代码重构为可理解的单元比简单地添加注释更可取。↩︎

  3. Just because this makes it possible to code a rejection sampler does not make it a good idea. Rejections break differentiability and the smooth exploration of the posterior. In Hamiltonian Monte Carlo, it can cause the sampler to be reduced to a diffusive random walk.↩︎

  4. 仅仅因为这使得编码拒绝采样器成为可能并不意味着这是个好主意。拒绝会破坏可微性和后验的平滑探索。在哈密顿蒙特卡罗中,它可能导致采样器退化为扩散随机游走。↩︎

  5. A range of built-in validation routines is coming to Stan soon! Alternatively, the reject statement can be used to check constraints on the simplex.↩︎

  6. Stan 即将推出一系列内置验证例程!或者,可以使用 reject 语句来检查单纯形上的约束。↩︎

  7. As of Stan 2.9.0, the only way a user-defined producer will raise an exception is if a function it calls (including distribution statements) raises an exception via the reject statement.↩︎

  8. 截至 Stan 2.9.0,用户定义的生产者引发异常的唯一方法是它调用的函数(包括分布语句)通过拒绝语句引发异常。↩︎

  9. The original code and impetus for including this in the manual came from the Stan forums post; by user lcomm, who also explained truncation above and below.↩︎

  10. 包含在本手册中的原始代码和动力来自 Stan 论坛帖子;由用户 lcomm 提供,他还解释了上截断和下截断。↩︎