25  重参数化和变量变换

Reparameterization and Change of Variables

本节译者:杨智

初次校审:李君竹

二次校审:李君竹(Claude 辅助)

Stan supports a direct encoding of reparameterizations. Stan also supports changes of variables by directly incrementing the log probability accumulator with the log Jacobian of the transform.

Stan 支持重参数化的直接编码。Stan 还通过使用变换的对数雅可比行列式直接增加对数概率累加器来支持变量变换。

25.1 Theoretical and practical background

理论与实践背景

A Bayesian posterior is technically a probability measure, which is a parameterization-invariant, abstract mathematical object.1

贝叶斯后验在技术上是一个概率测度,它是一个参数化不变的抽象数学对象。2

Stan’s modeling language, on the other hand, defines a probability density, which is a non-unique, parameterization-dependent function in \(\mathbb{R}^N \rightarrow \mathbb{R}^{+}\). In practice, this means a given model can be represented in different ways in Stan, and different representations can differ in computational performance.

另一方面,Stan 的建模语言定义了一个概率密度,它是一个非唯一的、依赖于参数化的函数 \(\mathbb{R}^N \rightarrow \mathbb{R}^{+}\)。在实践中,这意味着给定的模型在 Stan 中可以有不同的表示方式,不同的表示方式具有不同的计算性能。

As pointed out by Gelman (2004) in a paper discussing the relation between parameterizations and Bayesian modeling, a change of parameterization often carries with it suggestions of how the model might change, because we tend to use certain natural classes of prior distributions. Thus, it’s not just that we have a fixed distribution that we want to sample from, with reparameterizations being computational aids. In addition, once we reparameterize and add prior information, the model itself typically changes, often in useful ways.

正如 Gelman (2004) 在一篇讨论参数化与贝叶斯建模关系的论文中指出的,参数化的改变通常会带来模型可能如何变化的建议,因为我们倾向于使用某些自然类别的先验分布。因此,这不仅仅是我们有一个想要采样的固定分布,而重参数化只是计算辅助工具。此外,一旦我们重参数化并添加先验信息,模型本身通常会发生变化,而且往往是以有用的方式变化。

25.2 Reparameterizations

重参数化

Reparameterizations may be implemented directly using the transformed parameters block or just in the model block.

重参数化可以直接在转换参数块中实现,也可以仅在模型块中实现。

Beta and Dirichlet priors

Beta 和 Dirichlet 先验

The beta and Dirichlet distributions may both be reparameterized from a vector of counts to use a mean and total count.

Beta 分布和 Dirichlet 分布都可以从计数向量重参数化为使用均值和总计数的形式。

Beta distribution

Beta 分布

For example, the Beta distribution is parameterized by two positive count parameters \(\alpha, \beta > 0\). The following example illustrates a hierarchical Stan model in which the elements of a parameter vector theta are drawn i.i.d. from a Beta distribution whose parameters are themselves drawn from a hyperprior distribution.

例如,Beta 分布由两个正计数参数 \(\alpha, \beta > 0\) 参数化。以下示例展示了一个分层 Stan 模型,其中参数向量 theta 独立同分布地从 Beta 分布中抽取,而 Beta 分布的参数本身从超先验分布中抽取。

parameters {
  real<lower=0> alpha;
  real<lower=0> beta;
  // ...
}
model {
  alpha ~ ...
  beta ~ ...
  for (n in 1:N) {
    theta[n] ~ beta(alpha, beta);
  }
  // ...
}

It is often more natural to specify hyperpriors in terms of transformed parameters. In the case of the Beta, the obvious choice for reparameterization is in terms of a mean parameter

使用转换后的参数来指定超先验通常更自然。对于 Beta 分布,明显的重参数化选择是使用均值参数

\[ \phi = \alpha / (\alpha + \beta) \]

and total count parameter

和总计数参数

\[ \lambda = \alpha + \beta. \]

Following Gelman et al. (2013, Chapter 5), the mean gets a uniform prior and the count parameter a Pareto prior with \(p(\lambda) \propto \lambda^{-2.5}\).

根据 Gelman et al. (2013, 第5章),均值采用均匀先验,计数参数采用 Pareto 先验,其中 \(p(\lambda) \propto \lambda^{-2.5}\)。

parameters {
  real<lower=0, upper=1> phi;
  real<lower=0.1> lambda;
  // ...
}
transformed parameters {
  real<lower=0> alpha = lambda * phi;
  real<lower=0> beta = lambda * (1 - phi);
  // ...
}
model {
  phi ~ beta(1, 1); // uniform on phi, could drop
  lambda ~ pareto(0.1, 1.5);
  for (n in 1:N) {
    theta[n] ~ beta(alpha, beta);
  }
  // ...
}
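As a check on the prior used above, the following worked equation assumes Stan's pareto(y_min, alpha) takes the standard Pareto form, written here with the illustrative symbols \(\lambda_{\min}\) and \(a\). Under that assumption, the call pareto(0.1, 1.5) gives the stated power law.

\[ \textsf{Pareto}(\lambda \mid \lambda_{\min}, a) = \frac{a \, \lambda_{\min}^{a}}{\lambda^{a + 1}} \quad \text{for } \lambda \geq \lambda_{\min}, \qquad \text{so} \qquad \textsf{Pareto}(\lambda \mid 0.1, 1.5) \propto \lambda^{-2.5}. \]

The lower bound 0.1 in the declaration of lambda matches the first argument of the Pareto prior, so the declared support and the support of the prior agree.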

The new parameters, phi and lambda, are declared in the parameters block and the parameters for the Beta distribution, alpha and beta, are declared and defined in the transformed parameters block. If their values are not of interest, they could instead be defined as local variables in the model as follows.

新参数 phi 和 lambda 在参数块中声明,Beta 分布的参数 alpha 和 beta 在转换参数块中声明和定义。如果不关心它们的值,可以将它们定义为模型中的局部变量,如下所示。

model {
  real alpha = lambda * phi;
  real beta = lambda * (1 - phi);
  // ...
  for (n in 1:N) {
    theta[n] ~ beta(alpha, beta);
  }
  // ...
}

With vectorization, this could be expressed more compactly and efficiently as follows.

使用向量化,可以更简洁高效地表达如下。

model {
  theta ~ beta(lambda * phi, lambda * (1 - phi));
  // ...
}

If the variables alpha and beta are of interest, they can be defined in the transformed parameters block and then used in the model.

如果对变量 alpha 和 beta 感兴趣,可以在转换参数块中定义它们,然后在模型中使用。
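The fragments in this subsection can be assembled into one complete program. The sketch below is only an illustration: the data names N, K, and y and the binomial likelihood are assumptions added to make the program self-contained, and are not part of the original example.

```stan
data {
  int<lower=1> N;                      // number of groups (assumed)
  array[N] int<lower=0> K;             // trials per group (assumed)
  array[N] int<lower=0> y;             // successes per group (assumed)
}
parameters {
  real<lower=0, upper=1> phi;          // prior mean of theta
  real<lower=0.1> lambda;              // prior total count
  vector<lower=0, upper=1>[N] theta;   // per-group success probabilities
}
transformed parameters {
  real<lower=0> alpha = lambda * phi;
  real<lower=0> beta = lambda * (1 - phi);
}
model {
  // phi gets an implicit uniform prior from its declared bounds
  lambda ~ pareto(0.1, 1.5);           // p(lambda) proportional to lambda^(-2.5)
  theta ~ beta(alpha, beta);           // vectorized over all N groups
  y ~ binomial(K, theta);              // hypothetical likelihood
}
```

Because alpha and beta are transformed parameters here, their draws are saved in the output alongside phi and lambda.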

Jacobians not necessary

无需雅可比行列式

Because the transformed parameters are being used, rather than given a distribution, there is no need to apply a Jacobian adjustment for the transform. For example, in the beta distribution example, alpha and beta have the correct posterior distribution.

因为使用的是转换参数,而不是给定分布,所以不需要为变换应用雅可比调整。例如,在 Beta 分布示例中,alpha 和 beta 具有正确的后验分布。

Dirichlet priors

Dirichlet 先验

The same thing can be done with a Dirichlet, replacing the mean for the Beta, which is a probability value, with a simplex. Assume there are \(K > 0\) dimensions being considered (\(K=1\) is trivial and \(K=2\) reduces to the beta distribution case). The traditional prior is

Dirichlet 分布也可以进行同样的操作,用单纯形替换 Beta 分布的均值(概率值)。假设考虑 \(K > 0\) 个维度(\(K=1\) 是平凡的,\(K=2\) 简化为 Beta 分布情况)。传统的先验是

parameters {
  vector<lower=0>[K] alpha;
  array[N] simplex[K] theta;
  // ...
}
model {
  alpha ~ // ...
  for (n in 1:N) {
    theta[n] ~ dirichlet(alpha);
  }
}

This provides essentially \(K\) degrees of freedom, one for each dimension of alpha, and it is not obvious how to specify a reasonable prior for alpha.

这基本上提供了 \(K\) 个自由度,对应 alpha 的每个维度,而如何为 alpha 指定合理的先验并不明显。

An alternative coding is to use the mean, which is a simplex, and a total count.

另一种编码方式是使用均值(单纯形)和总计数。

parameters {
  simplex[K] phi;
  real<lower=0> kappa;
  array[N] simplex[K] theta;
  // ...
}
transformed parameters {
  vector[K] alpha = kappa * phi;
  // ...
}
model {
  phi ~ // ...
  kappa ~ // ...
  for (n in 1:N) {
    theta[n] ~ dirichlet(alpha);
  }
  // ...
}

Now it is much easier to formulate priors, because phi is the expected value of theta and kappa (minus K) is the strength of the prior mean measured in number of prior observations.

现在制定先验变得容易得多,因为 phi 是 theta 的期望值,而 kappa(减去 K)是以先验观测数量衡量的先验均值强度。
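A complete version of the mean and total count coding might look like the sketch below. The count data y, the multinomial likelihood, and both hyperpriors are assumptions chosen only to make the program self-contained; the source leaves the priors on phi and kappa unspecified.

```stan
data {
  int<lower=2> K;                      // number of categories
  int<lower=1> N;                      // number of groups
  array[N, K] int<lower=0> y;          // hypothetical counts per group
}
parameters {
  simplex[K] phi;                      // prior mean of each theta[n]
  real<lower=0> kappa;                 // prior total count
  array[N] simplex[K] theta;
}
transformed parameters {
  vector[K] alpha = kappa * phi;       // Dirichlet count vector
}
model {
  phi ~ dirichlet(rep_vector(1, K));   // assumed: uniform over the simplex
  kappa ~ exponential(0.1);            // assumed: illustrative prior only
  for (n in 1:N) {
    theta[n] ~ dirichlet(alpha);
    y[n] ~ multinomial(theta[n]);      // hypothetical likelihood
  }
}
```

As in the Beta case, no Jacobian adjustment is needed, because alpha is only used as an argument and is not itself given a distribution.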

Transforming unconstrained priors: probit and logit

转换无约束先验:probit 和 logit

If the variable \(u\) has a \(\textsf{uniform}(0, 1)\) distribution, then \(\operatorname{logit}(u)\) is distributed as \(\textsf{logistic}(0, 1)\). This is because inverse logit is the cumulative distribution function (CDF) for the logistic distribution, so that the logit function itself is the inverse CDF and thus maps a uniform draw in \((0, 1)\) to a logistically-distributed quantity.

如果变量 \(u\) 服从 \(\textsf{uniform}(0, 1)\) 分布,那么 \(\operatorname{logit}(u)\) 服从 \(\textsf{logistic}(0, 1)\) 分布。这是因为逆 logit 是 logistic 分布的累积分布函数(CDF),因此 logit 函数本身是逆 CDF,从而将 \((0, 1)\) 中的均匀抽取映射到 logistic 分布的量。

Things work the same way for the probit case: if \(u\) has a \(\textsf{uniform}(0, 1)\) distribution, then \(\Phi^{-1}(u)\) has a \(\textsf{normal}(0, 1)\) distribution. The other way around, if \(v\) has a \(\textsf{normal}(0, 1)\) distribution, then \(\Phi(v)\) has a \(\textsf{uniform}(0, 1)\) distribution.

对于 probit 情况也是如此:如果 \(u\) 服从 \(\textsf{uniform}(0, 1)\) 分布,那么 \(\Phi^{-1}(u)\) 服从 \(\textsf{normal}(0, 1)\) 分布。反过来,如果 \(v\) 服从 \(\textsf{normal}(0, 1)\) 分布,那么 \(\Phi(v)\) 服从 \(\textsf{uniform}(0, 1)\) 分布。
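The logit claim can be verified in one line. For \(u \sim \textsf{uniform}(0, 1)\) and any real \(x\), monotonicity of the logit function gives

\[ \Pr[\operatorname{logit}(u) \leq x] = \Pr[u \leq \operatorname{logit}^{-1}(x)] = \operatorname{logit}^{-1}(x) = \frac{1}{1 + \exp(-x)}, \]

which is the CDF of the \(\textsf{logistic}(0, 1)\) distribution. Replacing \(\operatorname{logit}^{-1}\) with \(\Phi\) gives the same argument for the probit case.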

In order to use the probit and logistic as priors on variables constrained to \((0, 1)\), create an unconstrained variable and transform it appropriately. For comparison, the following Stan program fragment declares a \((0, 1)\)-constrained parameter theta and gives it a beta prior, then uses it as a parameter in a distribution (here using foo as a placeholder).

为了将 probit 和 logistic 用作约束在 \((0, 1)\) 上的变量的先验,需要创建一个无约束变量并适当地转换它。作为比较,以下 Stan 程序片段声明了一个 \((0, 1)\) 约束的参数 theta 并给它一个 Beta 先验,然后将其用作分布中的参数(这里使用 foo 作为占位符)。

parameters {
  real<lower=0, upper=1> theta;
  // ...
}
model {
  theta ~ beta(a, b);
  // ...
  y ~ foo(theta);
  // ...
}

If the variables a and b are both one, then this imposes a uniform distribution on theta. If a and b are both less than one, then the density on theta has a U shape, whereas if they are both greater than one, the density of theta has an inverted-U or more bell-like shape.

如果变量 a 和 b 都为 1,这会对 theta 施加均匀分布。如果 a 和 b 都小于 1,那么 theta 的密度呈 U 形;而如果它们都大于 1,theta 的密度呈倒 U 形或更像钟形。

Roughly the same result can be achieved with unbounded parameters that are probit or inverse-logit-transformed. For example,

使用经过 probit 或逆 logit 变换的无界参数可以达到大致相同的结果。例如,

parameters {
  real theta_raw;
  // ...
}
transformed parameters {
  real<lower=0, upper=1> theta = inv_logit(theta_raw);
  // ...
}
model {
  theta_raw ~ logistic(mu, sigma);
  // ...
  y ~ foo(theta);
  // ...
}

In this model, an unconstrained parameter theta_raw gets a logistic prior, and then the transformed parameter theta is defined to be the inverse logit of theta_raw. In this parameterization, inv_logit(mu) is the median of the implied prior on theta, because inv_logit is monotonic. The prior distribution on theta will be flat if sigma is one and mu is zero, and will be U-shaped if sigma is larger than one and bell shaped if sigma is less than one.

在这个模型中,无约束参数 theta_raw 获得 logistic 先验,然后转换参数 theta 被定义为 theta_raw 的逆 logit。在这种参数化中,inv_logit(mu) 是 theta 隐含先验的中位数,因为 inv_logit 是单调的。如果 sigma 为 1 且 mu 为 0,theta 的先验分布将是平坦的;如果 sigma 大于 1,将呈 U 形;如果 sigma 小于 1,将呈钟形。
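The probit version follows the same pattern, with a normal prior on the unconstrained scale and Phi as the transform. The sketch below is an illustration under stated assumptions: the binary data y and the Bernoulli likelihood stand in for the placeholder foo, and mu and sigma are passed as data.

```stan
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;    // hypothetical binary outcomes
  real mu;                             // prior location on the probit scale
  real<lower=0> sigma;                 // prior scale on the probit scale
}
parameters {
  real theta_raw;                      // unconstrained
}
transformed parameters {
  real<lower=0, upper=1> theta = Phi(theta_raw);
}
model {
  theta_raw ~ normal(mu, sigma);
  y ~ bernoulli(theta);                // stands in for foo(theta)
}
```

With mu equal to zero and sigma equal to one, the implied prior on theta is uniform on \((0, 1)\), matching the probit fact stated earlier in this section.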

When moving from a variable in \((0, 1)\) to a simplex, the same trick may be performed using the softmax function, which is a multinomial generalization of the inverse logit function. First, consider a simplex parameter with a Dirichlet prior.

从 \((0, 1)\) 中的变量转移到单纯形时,可以使用 softmax 函数执行相同的技巧,它是逆 logit 函数的多项推广。首先,考虑具有 Dirichlet 先验的单纯形参数。

parameters {
  simplex[K] theta;
  // ...
}
model {
  theta ~ dirichlet(a);
  // ...
  y ~ foo(theta);
}

Now a is a vector with K rows, but it has the same shape properties as the pair a and b for a beta; the beta distribution is just the distribution of the first component of a Dirichlet with parameter vector \([a b]^{\top}\). To formulate an unconstrained prior, the exact same strategy works as for the beta.

现在 a 是一个有 K 行的向量,但它具有与 Beta 分布的 a 和 b 这一对参数相同的形状属性;Beta 分布只是具有参数向量 \([a b]^{\top}\) 的 Dirichlet 分布第一个分量的分布。要制定无约束先验,与 Beta 分布完全相同的策略也适用。

parameters {
  vector[K] theta_raw;
  // ...
}
transformed parameters {
  simplex[K] theta = softmax(theta_raw);
  // ...
}
model {
  theta_raw ~ multi_normal_cholesky(mu, L_Sigma);
}

The multivariate normal is used for convenience and efficiency with its Cholesky-factor parameterization. Now the mean is controlled by softmax(mu), but we have additional control of covariance through L_Sigma at the expense of having on the order of \(K^2\) parameters in the prior rather than order \(K\). If no covariance is desired, the number of parameters can be reduced back to \(K\) using a vectorized normal distribution as follows.

为了方便和效率,使用具有 Cholesky 因子参数化的多元正态分布。现在均值由 softmax(mu) 控制,但我们通过 L_Sigma 对协方差有额外的控制,代价是先验中有 \(K^2\) 数量级的参数而不是 \(K\) 数量级。如果不需要协方差,可以使用向量化的正态分布将参数数量减少到 \(K\),如下所示。

theta_raw ~ normal(mu, sigma);

where either or both of mu and sigma can be vectors.

其中 mu 和 sigma 中的任一个或两者都可以是向量。

25.3 Changes of variables

变量变换

Changes of variables are applied when the transformation of a parameter is characterized by a distribution. The standard textbook example is the lognormal distribution, which is the distribution of a variable \(y > 0\) whose logarithm \(\log y\) has a normal distribution. The distribution is being assigned to \(\log y\).

当参数的变换由分布来表征时,应用变量变换。标准教科书的例子是对数正态分布,它是变量 \(y > 0\) 的分布,其对数 \(\log y\) 服从正态分布。分布被赋予 \(\log y\)。

The change of variables requires an adjustment to the probability to account for the distortion caused by the transform. For this to work, univariate changes of variables must be monotonic and differentiable everywhere in their support. Multivariate changes of variables must be injective and differentiable everywhere in their support, and they must map \(\mathbb{R}^N \rightarrow \mathbb{R}^N\).

变量变换需要对概率进行调整,以考虑变换引起的扭曲。为使其有效,单变量的变量变换必须在其支撑集上处处单调且可微。多元变量变换必须在其支撑集上处处单射且可微,并且必须映射 \(\mathbb{R}^N \rightarrow \mathbb{R}^N\)。

The probability must be scaled by a Jacobian adjustment equal to the absolute determinant of the Jacobian of the transform. In the univariate case, the Jacobian adjustment is simply the absolute derivative of the transform.

概率必须通过雅可比调整进行缩放,该调整等于变换的雅可比行列式的绝对值。在单变量情况下,雅可比调整就是变换的绝对导数。
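Stated as a formula, with notation introduced here only for illustration: suppose \(x = f(y)\) for a transform \(f\) meeting the conditions above, and \(x\) has density \(p_X\). Then the density of \(y\) is

\[ p_Y(y) = p_X(f(y)) \, \left| \det J_f(y) \right|, \qquad \left[ J_f(y) \right]_{m,n} = \frac{\partial f_m(y)}{\partial y_n}, \]

which in the univariate case reduces to

\[ p_Y(y) = p_X(f(y)) \, \left| \frac{d}{dy} f(y) \right|. \]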

In the case of log normals, if \(y\)’s logarithm is normal with mean \(\mu\) and standard deviation \(\sigma\), then the distribution of \(y\) is given by

在对数正态的情况下,如果 \(y\) 的对数服从均值为 \(\mu\)、标准差为 \(\sigma\) 的正态分布,那么 \(y\) 的分布为

\[ p(y) = \textsf{normal}(\log y \mid \mu, \sigma) \, \left| \frac{d}{dy} \log y \right| = \textsf{normal}(\log y \mid \mu, \sigma) \, \frac{1}{y}. \]

Stan works on the log scale to prevent underflow, where

Stan 在对数尺度上工作以防止下溢,其中

\[ \log p(y) = \log \textsf{normal}(\log y \mid \mu, \sigma) - \log y. \]

In Stan, the change of variables can be applied in the sampling statement. To adjust for the curvature, the log probability accumulator is incremented with the log absolute derivative of the transform. The lognormal distribution can thus be implemented directly in Stan as follows.3

在 Stan 中,变量变换可以在采样语句中应用。为了调整曲率,对数概率累加器用变换的对数绝对导数增加。因此,对数正态分布可以在 Stan 中直接实现如下。4

parameters {
  real<lower=0> y;
  // ...
}
model {
  log(y) ~ normal(mu, sigma);
  target += -log(y);
  // ...
}

It is important, as always, to declare appropriate constraints on parameters; here y is constrained to be positive.

一如既往地重要的是,要对参数声明适当的约束;这里 y 被约束为正数。

It would be slightly more efficient to define a local variable for the logarithm, as follows.

为对数定义一个局部变量会稍微更高效,如下所示。

model {
  real log_y;
  log_y = log(y);
  log_y ~ normal(mu, sigma);
  target += -log_y;
  // ...
}

If y were declared as data instead of as a parameter, then the adjustment could be ignored because the data would be constant and Stan only requires the log probability up to a constant.

如果 y 被声明为数据而不是参数,那么可以忽略调整,因为数据将是常数,Stan 只需要对数概率到常数项。
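For comparison, the same model can be written with the built-in lognormal distribution, which is the approach the footnote to this section recommends. In this sketch, mu and sigma are passed as data, which is an assumption made only to keep the program self-contained.

```stan
data {
  real mu;                     // location of log(y)
  real<lower=0> sigma;         // scale of log(y)
}
parameters {
  real<lower=0> y;
}
model {
  // The built-in density already contains the -log(y) term,
  // so no separate adjustment to target is written here.
  y ~ lognormal(mu, sigma);
}
```

This version defines the same density for y as the manual change of variables, with the adjustment handled inside the built-in function.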

Change of variables vs. transformations

变量变换与转换

This section illustrates the difference between a change of variables and a simple variable transformation. A transformation samples a parameter, then transforms it, whereas a change of variables transforms a parameter, then samples it. Only the latter requires a Jacobian adjustment.

本节说明变量变换和简单变量转换之间的区别。转换是采样一个参数,然后转换它;而变量变换是转换一个参数,然后采样它。只有后者需要雅可比调整。

It does not matter whether the probability function is expressed using a distribution statement, such as

概率函数是使用分布语句表达的,如

log(y) ~ normal(mu, sigma);

or as an increment to the log probability function, as in

还是作为对数概率函数的增量,如

target += normal_lpdf(log(y) | mu, sigma);

都没有关系。

Gamma and inverse gamma distribution

伽马和逆伽马分布

Like the log normal, the inverse gamma distribution is a distribution of variables whose inverse has a gamma distribution. This section contrasts two approaches, first with a transform, then with a change of variables.

像对数正态分布一样,逆伽马分布是其倒数服从伽马分布的变量的分布。本节对比两种方法,首先是转换,然后是变量变换。

The transform-based approach to defining y_inv to have an inverse gamma distribution can be coded as follows.

基于转换的方法定义 y_inv 具有逆伽马分布可以编码如下。

parameters {
  real<lower=0> y;
}
transformed parameters {
  real<lower=0> y_inv;
  y_inv = 1 / y;
}
model {
  y ~ gamma(2, 4);
}

The change-of-variables approach to defining y_inv to have an inverse gamma distribution can be coded as follows.

基于变量变换的方法定义 y_inv 具有逆伽马分布可以编码如下。

parameters {
  real<lower=0> y_inv;
}
transformed parameters {
  real<lower=0> y;
  y = 1 / y_inv;  // change variables
  jacobian += -2 * log(y_inv); // Jacobian adjustment
}
model {
  y ~ gamma(2, 4);
}

The Jacobian adjustment is the log of the absolute derivative of the transform, which in this case is

雅可比调整是变换的绝对导数的对数,在这种情况下是

\[ \log \left| \frac{d}{du} \left( \frac{1}{u} \right) \right| = \log \left| - u^{-2} \right| = \log u^{-2} = -2 \log u. \]
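Carrying the adjustment through shows that the change of variables recovers the inverse gamma density. This assumes Stan's shape and rate parameterization of the gamma distribution, and writes \(u\) for y_inv:

\[ p(u) = \textsf{gamma}\!\left( u^{-1} \mid 2, 4 \right) \, u^{-2} = \frac{4^{2}}{\Gamma(2)} \, u^{-1} \exp(-4 / u) \, u^{-2} = 16 \, u^{-3} \exp(-4 / u) = \textsf{InvGamma}(u \mid 2, 4). \]

On the log scale, the factor \(u^{-2}\) is exactly the \(-2 \log u\) term added by the Jacobian statement.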

Multivariate changes of variables

多元变量变换

In the case of a multivariate transform, the log of the absolute determinant of the Jacobian of the transform must be added to the log probability accumulator. In Stan, this can be coded as follows in the general case where the Jacobian is a full matrix.

在多元变换的情况下,必须将变换的雅可比行列式的绝对值的对数添加到对数概率累加器中。在 Stan 中,在雅可比是满矩阵的一般情况下,可以编码如下。

parameters {
  vector[K] u;      // multivariate parameter
  // ...
}
transformed parameters {
  vector[K] v;     // transformed parameter
  matrix[K, K] J;   // Jacobian matrix of transform
  // ... compute v as a function of u ...
  // ... compute J[m, n] = d.v[m] / d.u[n] ...
  jacobian += log(abs(determinant(J)));
  // ...
}
model {
  v ~ // ...
  // ...
}

If the determinant of the Jacobian is known analytically, it will be more efficient to apply it directly than to call the determinant function, which is neither efficient nor particularly stable numerically.

如果雅可比行列式可以解析地知道,直接应用它会比调用行列式函数更高效,后者既不高效,在数值上也不是特别稳定。

In many cases, the Jacobian matrix will be triangular, so that only the diagonal elements will be required for the determinant calculation. Triangular Jacobians arise when each element v[k] of the transformed parameter vector only depends on elements u[1], …, u[k] of the parameter vector. For triangular matrices, the determinant is the product of the diagonal elements, so the transformed parameters block of the above model can be simplified and made more efficient by recoding as follows.

在许多情况下,雅可比矩阵是三角形的,因此只需要对角元素来计算行列式。当转换参数向量的每个元素 v[k] 只依赖于参数向量的元素 u[1], …, u[k] 时,会产生三角雅可比。对于三角矩阵,行列式是对角元素的乘积,因此上述模型的转换参数块可以通过重新编码简化并提高效率,如下所示。

transformed parameters {
  // ...
  vector[K] J_diag;  // diagonals of Jacobian matrix
  // ...
  // ... compute J_diag[k] = d.v[k] / d.u[k] ...
  jacobian += sum(log(abs(J_diag)));
  // ...
}
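A concrete triangular case may help. In the sketch below, the transform (a cumulative sum of exponentials) and the lognormal density on v are hypothetical choices made for illustration; neither comes from the original text.

```stan
data {
  int<lower=1> K;
}
parameters {
  vector[K] u;
}
transformed parameters {
  // v[k] = exp(u[1]) + ... + exp(u[k]) depends only on u[1], ..., u[k],
  // so the Jacobian matrix is lower triangular.
  vector[K] v = cumulative_sum(exp(u));
  // d.v[k] / d.u[k] = exp(u[k]), so log |det J| = sum(u).
  jacobian += sum(u);
}
model {
  v ~ lognormal(0, 1);   // hypothetical density on the transformed vector
}
```

Here the log determinant is available in closed form, so neither a Jacobian matrix nor a vector of diagonal elements has to be stored.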

25.4 Vectors with varying bounds

具有变化边界的向量

Stan allows scalar and non-scalar upper and lower bounds to be declared in the constraints for a container data type. The transforms are calculated and their log Jacobians added to the log density accumulator; the Jacobian calculations are described in detail in the reference manual chapter on constrained parameter transforms.

Stan 允许在容器数据类型的约束中声明标量和非标量的上下界。计算变换并将其对数雅可比添加到对数密度累加器中;雅可比计算在约束参数变换的参考手册章节中有详细描述。

Varying lower bounds

变化的下界

For example, suppose there is a vector parameter \(\alpha\) with a vector \(L\) of lower bounds. The simplest way to deal with this if \(L\) is a constant is to shift a lower-bounded parameter.

例如,假设有一个向量参数 \(\alpha\) 带有下界向量 \(L\)。如果 \(L\) 是常数,处理这个问题的最简单方法是移动一个有下界的参数。

data {
  int N;
  vector[N] L;  // lower bounds
  // ...
}
parameters {
  vector<lower=L>[N] alpha;
  // ...
}

The above is equivalent to manually calculating the vector bounds by the following.

上述等价于通过以下方式手动计算向量边界。

data {
  int N;
  vector[N] L;  // lower bounds
  // ...
}
parameters {
  vector<lower=0>[N] alpha_raw;
  // ...
}
transformed parameters {
  vector[N] alpha = L + alpha_raw;
  // ...
}

The Jacobian for adding a constant is one, so its log drops out of the log density.

添加常数的雅可比是 1,因此其对数从对数密度中消失。

Even if the lower bound is a parameter rather than data, there is no Jacobian required, because the transform from \((L, \alpha_{\textrm{raw}})\) to \((L + \alpha_{\textrm{raw}}, \alpha_{\textrm{raw}})\) produces a Jacobian derivative matrix with a unit determinant.

即使下界是参数而不是数据,也不需要雅可比,因为从 \((L, \alpha_{\textrm{raw}})\) 到 \((L + \alpha_{\textrm{raw}}, \alpha_{\textrm{raw}})\) 的变换产生的雅可比导数矩阵具有单位行列式。
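For a single element, with scalar \(L\) and \(\alpha_{\textrm{raw}}\), the Jacobian of this map can be written out directly:

\[ J = \begin{bmatrix} \dfrac{\partial (L + \alpha_{\textrm{raw}})}{\partial L} & \dfrac{\partial (L + \alpha_{\textrm{raw}})}{\partial \alpha_{\textrm{raw}}} \\[2ex] \dfrac{\partial \alpha_{\textrm{raw}}}{\partial L} & \dfrac{\partial \alpha_{\textrm{raw}}}{\partial \alpha_{\textrm{raw}}} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad \log \left| \det J \right| = \log 1 = 0. \]

The vector case repeats this block for each element, so the log determinant remains zero.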

It’s also possible to implement the transform using an array or vector of parameters as bounds (with the requirement that the type of the variable must match the bound type) as follows.

也可以使用参数数组或向量作为边界来实现变换(要求变量类型必须匹配边界类型),如下所示。

data {
  int N;
  vector[N] L;  // lower bounds
  // ...
}
parameters {
  vector<lower=0>[N] alpha_raw;
  vector<lower=L + alpha_raw>[N] alpha;
  // ...
}

This is equivalent to directly transforming an unconstrained parameter and accounting for the Jacobian.

这等价于直接变换无约束参数并考虑雅可比。

data {
  int N;
  vector[N] L;  // lower bounds
  // ...
}
parameters {
  vector[N] alpha_raw;
  // ...
}
transformed parameters {
  vector[N] alpha = L + exp(alpha_raw);
  jacobian += sum(alpha_raw); // log Jacobian
  // ...
}
model {
  // ...
}

The adjustment is the log Jacobian determinant of the transform mapping \(\alpha_{\textrm{raw}}\) to \(\alpha = L + \exp(\alpha_{\textrm{raw}})\). The details are simple in this case because the Jacobian is diagonal; see the reference manual chapter on constrained parameter transforms for full details. Here \(L\) can even be a vector containing parameters that don’t depend on \(\alpha_{\textrm{raw}}\); if the bounds do depend on \(\alpha_{\textrm{raw}}\) then a revised Jacobian needs to be calculated taking into account the dependencies.

该调整是将 \(\alpha_{\textrm{raw}}\) 映射到 \(\alpha = L + \exp(\alpha_{\textrm{raw}})\) 的变换的对数雅可比行列式。在这种情况下细节很简单,因为雅可比是对角的;详见约束参数变换的参考手册章节。这里 \(L\) 甚至可以是包含不依赖于 \(\alpha_{\textrm{raw}}\) 的参数的向量;如果边界确实依赖于 \(\alpha_{\textrm{raw}}\),则需要计算考虑依赖关系的修正雅可比。

Varying upper and lower bounds

变化的上下界

Suppose there are lower and upper bounds that vary by parameter. These can be applied to shift and rescale a parameter constrained to \((0, 1)\). This is easily accomplished as follows.

假设有随参数变化的上下界。这些可以应用于移动和重新缩放约束到 \((0, 1)\) 的参数。这可以轻松完成,如下所示。

data {
  int N;
  vector[N] L;  // lower bounds
  vector[N] U;  // upper bounds
  // ...
}
parameters {
  vector<lower=L, upper=U>[N] alpha;
  // ...
}

The same may be accomplished by manually constructing the transform as follows.

通过手动构建变换也可以实现相同效果,如下所示。

data {
  int N;
  vector[N] L;  // lower bounds
  vector[N] U;  // upper bounds
  // ...
}
parameters {
  vector<lower=0, upper=1>[N] alpha_raw;
  // ...
}
transformed parameters {
  vector[N] alpha = L + (U - L) .* alpha_raw;
}

The expression U - L is multiplied by alpha_raw elementwise to produce a vector of variables in \((0, U-L)\), then adding \(L\) results in a variable ranging over \((L, U)\).

表达式 U - L 与 alpha_raw 逐元素相乘,产生 \((0, U-L)\) 中的变量向量,然后加上 \(L\) 得到取值范围为 \((L, U)\) 的变量。

In this case, it is important that \(L\) and \(U\) are constants, otherwise a Jacobian would be required when multiplying by \(U - L\).

在这种情况下,重要的是 \(L\) 和 \(U\) 是常数,否则在乘以 \(U - L\) 时需要雅可比。
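When the bounds are themselves parameters, the manual construction needs the extra term. The sketch below is an illustration under stated assumptions: the priors on L, U, and alpha are hypothetical and chosen only so that the program is complete.

```stan
data {
  int<lower=1> N;
}
parameters {
  vector[N] L;                              // lower bounds, now parameters
  vector<lower=L>[N] U;                     // upper bounds, kept above L
  vector<lower=0, upper=1>[N] alpha_raw;
}
transformed parameters {
  vector[N] alpha = L + (U - L) .* alpha_raw;
  // d.alpha[n] / d.alpha_raw[n] = U[n] - L[n], so the log Jacobian
  // determinant of the rescaling is sum(log(U - L)).
  jacobian += sum(log(U - L));
}
model {
  L ~ normal(-1, 0.5);                      // hypothetical prior
  U ~ normal(1, 0.5);                       // hypothetical prior
  alpha ~ normal(0, 1);                     // density placed on transformed alpha
}
```

Declaring alpha directly as vector<lower=L, upper=U>[N] alpha in the parameters block should give the same density, with Stan applying the adjustment for the bounds automatically.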

Gelman, Andrew. 2004. “Parameterization and Bayesian Modeling.” Journal of the American Statistical Association 99: 537–45.

  1. This is in contrast to (penalized) maximum likelihood estimates, which are not parameterization invariant.

  2. 这与(惩罚)最大似然估计形成对比,后者不具有参数化不变性。

  3. This example is for illustrative purposes only; the recommended way to implement the lognormal distribution in Stan is with the built-in lognormal probability function; see the functions reference manual for details.

  4. 此示例仅用于说明目的;在 Stan 中实现对数正态分布的推荐方法是使用内置的 lognormal 概率函数;详见函数参考手册。