23  比例常数

Proportionality Constants

本节译者:沈梓梁

初次校审:李君竹

二次校审:邱怡轩(DeepSeek 辅助)

When evaluating a likelihood or prior as part of the log density computation in MCMC, variational inference, or optimization, it is usually only necessary to compute the functions up to a proportionality constant (or, equivalently, to compute log densities up to an additive constant). In MCMC this is because the distribution being sampled does not need to be normalized, so it is the normalization constant that is ignored. Similarly, the distribution does not need to be normalized to perform variational inference or optimization. The advantage of working with unnormalized distributions is that they can be considerably cheaper to compute.

在 MCMC、变分推断或优化过程中计算似然函数或先验时(它们通常是对数密度计算的一部分),通常只需要计算函数到某个比例常数(或类似地,计算对数密度到某个加法常数)。在 MCMC 中,这是因为被采样的分布不需要归一化(因此被忽略的是归一化常数)。类似地,在进行变分推断或优化时,分布也不需要归一化。使用非归一化分布的优势在于它们可以显著降低计算成本。

Stan offers three different syntaxes for building a model. Which one to choose depends on whether the proportionality constants are needed. If performance is not a problem, it is always safe to use the normalized densities.

在 Stan 中有三种不同的语法来建立模型,而选择哪种语法取决于是否需要比例常数。如果性能不是问题,使用归一化的密度总是安全的。

The distribution statement (~) and log density increment statement (target +=) with _lupdf() use unnormalized densities for \(x\) (dropping proportionality constants):

分布语句(~)和带有 _lupdf() 的对数密度增量语句(target +=)使用 \(x\) 的非归一化密度(即丢弃比例常数):

x ~ normal(0, 1);
target += normal_lupdf(x | 0, 1); // the 'u' is for unnormalized

The log density increment statement (target +=) with _lpdf() uses the full normalized density for \(x\) (dropping no constants):

带有 _lpdf() 的对数密度增量语句(target +=)使用 \(x\) 的完整归一化密度(即不丢弃任何常数):

target += normal_lpdf(x | 0, 1);

For discrete distributions, the target += syntax uses _lupmf and _lpmf instead:

对于离散分布,target += 语法使用的是 _lupmf 和 _lpmf:

y ~ bernoulli(0.5);
target += bernoulli_lupmf(y | 0.5);
target += bernoulli_lpmf(y | 0.5);

23.1 Dropping Proportionality Constants

丢弃比例常数

If a density \(p(\theta)\) can be factored into \(K g(\theta)\), where \(K\) collects all the factors that are not a function of \(\theta\) and \(g(\theta)\) collects all the factors that are a function of \(\theta\), then \(g(\theta)\) is said to be proportional to \(p(\theta)\) up to a constant.

如果一个密度 \(p(\theta)\) 可以分解成 \(K g(\theta)\),其中 \(K\) 是所有不依赖于 \(\theta\) 的因子,而 \(g(\theta)\) 是所有依赖于 \(\theta\) 的项,那么称 \(g(\theta)\) 与 \(p(\theta)\) 成比例,它们相差一个常数。

The advantage is that \(K\) is sometimes expensive to compute, and because it does not depend on the quantities being sampled (or optimized, or approximated with variational inference), there is no need to compute it: it will not affect the results.

这样做的好处是,有时 \(K\) 的计算成本很高,并且如果它不是待采样(或优化,或用变分推断近似)的分布的函数,则无需计算它,因为它不会影响结果。
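To see why \(K\) has no effect, it may help to write out the two quantities that inference algorithms typically evaluate. This is an illustrative sketch, not a description of any particular Stan algorithm's internals: density ratios, as used in Metropolis-style accept steps for a proposal \(\theta'\), and gradients of the log density, as used by gradient-based samplers and optimizers. With \(p(\theta) = K g(\theta)\):

\[ \frac{p(\theta')}{p(\theta)} = \frac{K g(\theta')}{K g(\theta)} = \frac{g(\theta')}{g(\theta)}, \qquad \nabla_\theta \log p(\theta) = \nabla_\theta \left( \log K + \log g(\theta) \right) = \nabla_\theta \log g(\theta). \]

In both expressions \(K\) cancels or differentiates to zero, so working with \(g(\theta)\) alone gives the same ratios and gradients.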

Stan takes advantage of the proportionality constant fact with the ~ syntax. Take for instance the normal data model:

Stan 利用比例常数这一特性,通过 ~ 语法予以实现。以正态数据模型为例:

data {
  real mu;
  real<lower=0.0> sigma;
}
parameters {
  real x;
}
model {
  x ~ normal(mu, sigma);
}

Syntactically, this is just shorthand for the equivalent model that replaces the ~ syntax with a target += statement and a normal_lupdf function call:

在语法上,这只是一个等价模型的简写形式,该等价模型使用 target += 语句和 normal_lupdf 函数调用替换了 ~ 语法:

data {
  real mu;
  real<lower=0.0> sigma;
}
parameters {
  real x;
}
model {
  target += normal_lupdf(x | mu, sigma);
}

The function normal_lupdf is only guaranteed to return the log density of the normal distribution up to an additive constant, which is the same as returning the density to be sampled up to a proportionality constant. The constant itself is not defined. The full log density of the statement here is:

函数 normal_lupdf 只保证返回待采样的正态分布的对数密度加上某一个比例常数。比例常数本身是未定义的。这里语句的完整对数密度是:

\[ \textsf{normal\_lpdf}(x | \mu, \sigma) = -\log \left( \sigma \sqrt{2 \pi} \right) -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2. \]

Now because the density here is only a function of \(x\), the additive terms in the log density that are not a function of \(x\) can be dropped. In this case it is enough to know only the quadratic term:

因为这里的密度仅是 \(x\) 的函数,因此对数密度中不依赖于 \(x\) 的加法项可以被丢弃。在这种情况下,只需要知道二次项即可:

\[ \textsf{normal\_lupdf}(x | \mu, \sigma) = -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2. \]
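The quadratic term is enough above only because mu and sigma are data. As a hedged sketch of the other case, assume both are parameters and the observations are a vector y of length N (these names and the model are illustrative, not from the original). Then \(-\log \sigma\) depends on a parameter and can no longer be dropped. Only the \(-\frac{1}{2} \log(2 \pi)\) term per observation is constant.

```stan
data {
  int<lower=0> N;        // hypothetical number of observations
  vector[N] y;           // hypothetical observed data
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // mu and sigma are parameters, so every term involving them is kept;
  // only the constant -0.5 * log(2 * pi()) per observation may be dropped.
  y ~ normal(mu, sigma);

  // A hand-written increment with the same parameter-dependent terms
  // (shown as a comment so the density is not counted twice):
  // target += -N * log(sigma) - 0.5 * dot_self((y - mu) / sigma);
}
```

What counts as a droppable constant therefore depends on which arguments are data and which are parameters, not on the distribution alone.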

23.2 Keeping Proportionality Constants

保留比例常数

If the proportionality constants are needed for a normal log density, the function normal_lpdf can be used. If it is ever unclear whether the normalization is necessary, it is always safe to include it. Only use the ~ or target += normal_lupdf syntaxes if it is absolutely clear that the proportionality constants are not necessary.

在需要正态对数密度的比例常数的情况下,可以使用函数 normal_lpdf。为清晰起见,如果存在不清楚是否需要归一化的情形,包含它总是安全的。只有绝对清楚比例常数不必要时,才应使用 ~ 或 target += normal_lupdf 语法。
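One situation where the constants plausibly matter is a mixture of components from different families, because each family drops a different constant and the relative weight of the components would be distorted. The following is a minimal sketch under that assumption. The data y, the mixing weight theta, the choice of a normal and a Cauchy component, and the priors are all hypothetical and not taken from the original text.

```stan
data {
  int<lower=0> N;                  // hypothetical number of observations
  vector[N] y;                     // hypothetical observed data
}
parameters {
  real<lower=0, upper=1> theta;    // mixing weight of the normal component
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 10);              // illustrative priors, assumptions only
  sigma ~ normal(0, 5);
  for (n in 1:N) {
    // Normalized densities keep the two components on a common scale,
    // so theta retains its meaning as a mixing proportion.
    target += log_mix(theta,
                      normal_lpdf(y[n] | mu, sigma),
                      cauchy_lpdf(y[n] | mu, sigma));
  }
}
```

The priors can still use the ~ syntax, since their constants are added outside the mixture and do not change the relative weight of the two components.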

23.3 User-defined Distributions

用户自定义分布

When a custom _lpdf or _lpmf function is defined, the compiler will automatically make available a _lupdf or _lupmf version of the function. It is only possible to define custom distributions in the normalized form in Stan. Any attempt to define an unnormalized distribution directly will result in an error.

当定义一个自定义的 _lpdf 或 _lpmf 函数时,编译器将自动提供该函数的 _lupdf 或 _lupmf 版本。在 Stan 中,只能以归一化形式定义自定义分布。任何直接定义非归一化分布的尝试都会导致错误。

The difference in the normalized and unnormalized versions of custom probability functions is how probability functions are treated inside these functions. Any internal unnormalized probability function call will be replaced with its normalized equivalent if the normalized version of the parent custom distribution is called.

自定义概率函数的归一化和非归一化版本之间的区别在于这些函数内部如何处理概率函数调用。如果调用的外层自定义分布是归一化版本,则内层的任何非归一化概率函数调用都将被替换为其归一化的等效形式。

The following code demonstrates the different behaviors:

下面的代码演示了不同的行为:

functions {
  real custom1_lpdf(real x) {
    return normal_lupdf(x | 0.0, 1.0);
  }
  real custom2_lpdf(real x) {
    return normal_lpdf(x | 0.0, 1.0);
  }
}
parameters {
  real mu;
}
model {
  mu ~ custom1(); // Normalization constants dropped
  target += custom1_lupdf(mu); // Normalization constants dropped
  target += custom1_lpdf(mu);  // Normalization constants kept

  mu ~ custom2();  // Normalization constants kept
  target += custom2_lupdf(mu);  // Normalization constants kept
  target += custom2_lpdf(mu);  // Normalization constants kept
}

23.4 Limitations on Using _lupdf and _lupmf Functions

使用 _lupdf 和 _lupmf 函数的限制

To avoid ambiguities in how the normalization constants work, functions ending in _lupdf and _lupmf can only be used in the model block or in user-defined probability functions (functions ending in _lpdf or _lpmf).

为了避免在归一化常数工作机制上产生歧义,以 _lupdf 和 _lupmf 结尾的函数只能在模型块或用户定义的概率函数(以 _lpdf 或 _lpmf 结尾的函数)中使用。
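As an illustration of this restriction, the sketch below computes pointwise log likelihoods in the generated quantities block, where only the normalized form is allowed. The data y and the variable name log_lik are assumptions made for the example, not part of the original text.

```stan
data {
  int<lower=0> N;                  // hypothetical number of observations
  vector[N] y;                     // hypothetical observed data
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // Unnormalized densities are allowed in the model block.
  y ~ normal(mu, sigma);
}
generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    // normal_lupdf is not permitted outside the model block and
    // user-defined probability functions, so the normalized form is used.
    log_lik[n] = normal_lpdf(y[n] | mu, sigma);
  }
}
```

Using the normalized form here also means the stored values are actual log densities rather than values offset by an undefined constant.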