RunningValueNorm

class torchrl.modules.RunningValueNorm(*, shape: int | tuple[int, ...] = 1, epsilon: float = 1e-05, device: device | None = None)

Exact running mean / variance (Welford’s online algorithm).

Unlike PopArtValueNorm, this normaliser does not decay older samples: it accumulates the true sample mean and variance over every target it has ever seen. This is useful when value targets are roughly stationary (no curriculum, no reward-shaping schedule), where the adaptivity of an exponential moving average (EMA) is unnecessary and the exact running statistics give a slightly tighter estimate.
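The accumulation described above can be sketched in plain Python. This is an illustrative sketch only, not the TorchRL implementation: the class name WelfordSketch, its attributes, the scalar (non-tensor) inputs, and the choice of population variance are all assumptions made for the example. It folds whole batches into the running statistics using the batched form of Welford's update and then compares the result with statistics computed over all samples at once.

```python
import statistics


class WelfordSketch:
    """Hypothetical scalar accumulator; NOT the torchrl implementation."""

    def __init__(self):
        self.count = 0   # number of samples seen so far
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, batch):
        # Merge the batch statistics into the running statistics
        # (batched form of Welford's update; no sample is ever decayed).
        n_b = len(batch)
        if n_b == 0:
            return
        mean_b = sum(batch) / n_b
        m2_b = sum((x - mean_b) ** 2 for x in batch)
        n = self.count + n_b
        delta = mean_b - self.mean
        self.mean += delta * n_b / n
        self.m2 += m2_b + delta * delta * self.count * n_b / n
        self.count = n

    @property
    def var(self):
        # Population variance; whether the library divides by n or n - 1
        # is not stated in this page, so this is an assumption.
        return self.m2 / self.count if self.count else 0.0


batches = [[1.0, 2.0, 3.0], [10.0, 12.0], [-4.0, 0.5, 7.0, 2.5]]
acc = WelfordSketch()
for batch in batches:
    acc.update(batch)

# The running statistics agree with a single pass over all samples.
flat = [x for batch in batches for x in batch]
assert abs(acc.mean - statistics.fmean(flat)) < 1e-9
assert abs(acc.var - statistics.pvariance(flat)) < 1e-9
```

Because every sample carries equal weight, the result does not depend on how the data were split into batches, which is the property that distinguishes this scheme from an EMA.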

Keyword Arguments:
  • shape – per-element shape of the value tensor. Defaults to 1.

  • epsilon – numerical stabiliser added to the running variance. Defaults to 1e-5.

  • device – device for the running-stats buffers. Defaults to None.

Example

>>> import torch
>>> from torchrl.modules import RunningValueNorm
>>> vn = RunningValueNorm(shape=1)
>>> for _ in range(10):
...     vn.update(torch.randn(64, 1) * 3.0 + 1.0)
>>> normed = vn.normalize(torch.randn(8, 1))

denormalize(normalised_value: Tensor) → Tensor

Inverse of normalize() — recover real-scale values.

normalize(value_target: Tensor) → Tensor

Standardise value_target using the current running stats.

update(value_target: Tensor) → None

Fold a batch of value targets into the running stats.
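
The relationship between normalize() and denormalize() can be illustrated with a small standalone sketch. It assumes that standardisation has the form (x - mean) / sqrt(var + epsilon), which is consistent with epsilon being "added to the running variance" but is not confirmed by this page. The free functions below and the fixed mean and var values are hypothetical stand-ins for the class's methods and its internal buffers.

```python
import math

EPSILON = 1e-5  # same value as the documented default for `epsilon`


def normalize(x, mean, var, eps=EPSILON):
    # Assumed form: subtract the running mean, divide by sqrt(var + eps).
    return (x - mean) / math.sqrt(var + eps)


def denormalize(z, mean, var, eps=EPSILON):
    # Inverse of the assumed form above: rescale, then shift back.
    return z * math.sqrt(var + eps) + mean


# Hypothetical running statistics, e.g. after several update() calls.
mean, var = 1.0, 9.0
targets = [-2.0, 1.0, 4.0, 10.0]

normed = [normalize(t, mean, var) for t in targets]
recovered = [denormalize(z, mean, var) for z in normed]

# The round trip returns the original values up to floating-point error.
assert all(abs(a - b) < 1e-9 for a, b in zip(targets, recovered))

# With zero variance, eps keeps the division finite.
assert normalize(5.0, 5.0, 0.0) == 0.0
```

Under this assumption, epsilon matters mainly before the statistics have been populated or when all targets are identical, since the variance is then zero.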