Value Networks and Critics
Value networks estimate the value of states, V(s), or of state-action pairs, Q(s, a).

General class for value functions in RL.
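For reference, these are the standard textbook definitions of the two quantities for a policy $\pi$ and discount factor $\gamma$. The notation is the usual one and is not taken from this page. Both are expected discounted returns; the last identity links them.

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_0 = s\right],
\qquad
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_0 = s,\ a_0 = a\right],
\qquad
V^{\pi}(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\left[Q^{\pi}(s, a)\right].
```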
|
Abstract base class for value normalisers.
|
PopArt-style EMA value normaliser.
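The statistics-tracking half of a PopArt-style EMA normaliser can be sketched in plain Python. This is a minimal sketch under stated assumptions. The class name `EMAValueNormaliserSketch`, the decay `beta=0.99`, the variance floor `eps`, and the bias correction of the zero-initialised averages are illustrative choices, not this library's API. Full PopArt also rescales the weights and bias of the final layer whenever the statistics change, so that unnormalised predictions are preserved. That step is omitted here.

```python
import math


class EMAValueNormaliserSketch:
    """Hypothetical EMA normaliser (illustrative only, not the library class).

    Tracks exponential moving averages of the first and second moments of
    value targets, which is the statistics part of a PopArt-style normaliser.
    """

    def __init__(self, beta: float = 0.99, eps: float = 1e-5) -> None:
        self.beta = beta      # assumed decay rate of the moving averages
        self.eps = eps        # floor that keeps the variance strictly positive
        self.mean = 0.0       # EMA of the targets
        self.sq_mean = 0.0    # EMA of the squared targets
        self.debias = 0.0     # EMA of the constant 1, corrects zero initialisation

    def update(self, targets: list[float]) -> None:
        batch_mean = sum(targets) / len(targets)
        batch_sq_mean = sum(t * t for t in targets) / len(targets)
        self.mean = self.beta * self.mean + (1 - self.beta) * batch_mean
        self.sq_mean = self.beta * self.sq_mean + (1 - self.beta) * batch_sq_mean
        self.debias = self.beta * self.debias + (1 - self.beta)

    def stats(self) -> tuple[float, float]:
        """Return the bias-corrected (mean, std)."""
        scale = max(self.debias, self.eps)
        mean = self.mean / scale
        var = max(self.sq_mean / scale - mean * mean, self.eps)
        return mean, math.sqrt(var)

    def normalise(self, value: float) -> float:
        mean, std = self.stats()
        return (value - mean) / std

    def denormalise(self, value: float) -> float:
        mean, std = self.stats()
        return value * std + mean
```

A critic would be trained against `normalise(target)`, and its raw output passed through `denormalise` when an unnormalised value estimate is needed.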
|
Exact running mean / variance (Welford's online algorithm).
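Welford's online algorithm updates the mean and the sum of squared deviations one sample at a time, without storing past data and without the cancellation problems of the naive "mean of squares minus square of mean" formula. The following is a minimal sketch. The class name `WelfordSketch` is hypothetical, and it reports the population variance, which is an assumption about which estimator is wanted.

```python
import math


class WelfordSketch:
    """Hypothetical running mean/variance tracker using Welford's algorithm."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.count
        delta2 = x - self.mean         # deviation from the updated mean
        self.m2 += delta * delta2

    @property
    def variance(self) -> float:
        # Population variance; divide by (count - 1) for the unbiased estimate.
        return self.m2 / self.count if self.count > 0 else 0.0

    @property
    def std(self) -> float:
        return math.sqrt(self.variance)
```

Feeding `2, 4, 4, 4, 5, 5, 7, 9` one at a time gives mean 5 and variance 4. Every sample seen so far is weighted equally, so the result matches a batch computation up to floating-point error, which is the sense in which the summary above calls it exact.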
|
Dueling CNN Q-network.
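A dueling Q-network splits its head into a scalar state value V(s) and per-action advantages A(s, a), then recombines them into Q-values. The sketch below shows only that recombination step, using the mean-subtracted aggregation from the dueling-network literature. The convolutional trunk is omitted, the function name `dueling_q_values` is hypothetical, and we have not confirmed that the class above aggregates in exactly this way.

```python
def dueling_q_values(value: float, advantages: list[float]) -> list[float]:
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').

    Subtracting the mean advantage makes the V/A split identifiable: adding
    the same constant to every advantage no longer changes the Q-values.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]
```

For example, `dueling_q_values(1.0, [0.5, -0.5, 3.0])` returns `[0.5, -0.5, 3.0]`. Shifting every advantage by 10 leaves the result unchanged.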
|
Distributional Deep Q-Network softmax layer.
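In a distributional (C51-style) Q-network, each action gets a categorical distribution over a fixed support of return values ("atoms"). The softmax layer turns per-atom logits into probabilities, and a scalar Q-value is recovered as the expectation over the support. This is a plain-Python sketch. The `[action][atom]` layout and the function names are assumptions, and the library class may place atoms on a different axis.

```python
import math


def atom_softmax(logits: list[list[float]]) -> list[list[float]]:
    """Numerically stable softmax over the atoms of each action.

    `logits[a][i]` is the (assumed) logit of atom i for action a.
    """
    probs = []
    for action_logits in logits:
        m = max(action_logits)  # subtract the max so exp() cannot overflow
        exps = [math.exp(x - m) for x in action_logits]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def expected_q(probs: list[list[float]], support: list[float]) -> list[float]:
    """Q(s, a) = sum_i z_i * p_i(s, a) for every action a."""
    return [sum(z * p for z, p in zip(support, row)) for row in probs]
```

With support `[-1.0, 0.0, 1.0]`, uniform logits give a Q-value of 0. Logits `[0.0, 0.0, 10.0]` put almost all mass on the top atom and give a Q-value just below 1.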
|
A convolutional neural network.
|
Specification for one agent group used by
|
Centralised critic that conditions on observations from multiple agent groups.
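The core idea of a centralised critic is that the value estimate sees the observations of all agents in all groups, even if each actor only sees its own. Collecting those observations into one joint input can be sketched as follows. This is an illustration under assumptions. The dict-of-groups layout, the sorted group order, the function names, and the single linear value head are all hypothetical and are not this library's API.

```python
def centralised_input(group_obs: dict[str, list[list[float]]]) -> list[float]:
    """Flatten every agent's observation from every group into one vector.

    Groups are visited in sorted-name order so the layout of the joint
    vector does not depend on dict insertion order.
    """
    joint: list[float] = []
    for group in sorted(group_obs):
        for agent_obs in group_obs[group]:
            joint.extend(agent_obs)
    return joint


def linear_critic(joint: list[float], weights: list[float], bias: float = 0.0) -> float:
    """Stand-in value head: a single linear layer from the joint input to V."""
    if len(weights) != len(joint):
        raise ValueError("weights must match the joint input size")
    return sum(w * x for w, x in zip(weights, joint)) + bias
```

Fixing the group order is the design choice that matters here. A learned critic would otherwise see features in a different position whenever the groups were supplied in a different order.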
|
A multi-layer perceptron.
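An MLP is a stack of dense layers with a nonlinearity between them. When it is used as a value network, the last layer is typically left linear so the output can take any real value. The following is a dependency-free sketch of the forward pass. The ReLU activation and the `(W, b)` layer representation are assumptions, and `mlp_forward` is a hypothetical name.

```python
def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def mlp_forward(
    x: list[float], layers: list[tuple[list[list[float]], list[float]]]
) -> list[float]:
    """Run x through dense layers given as (W, b), where W[j] holds the input
    weights of output unit j. ReLU follows every layer except the last."""
    h = x
    for idx, (weights, bias) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, bias)]
        if idx < len(layers) - 1:  # keep the final (value) layer linear
            h = [relu(v) for v in h]
    return h
```

For input `[1.0, 2.0]`, take a hidden layer with weights `[[1, 1], [1, -1]]` and an output layer with weights `[[2, 5]]` and bias `[1]`. The hidden pre-activations are `[3, -1]`, ReLU turns them into `[3, 0]`, and the output is `[7.0]`.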
|
DDPG Convolutional Actor class.
|
DDPG Convolutional Q-value class.
|
DDPG Actor class.
|
DDPG Q-value MLP class.
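In DDPG the actor is a deterministic policy μ(s) and the Q-value network is trained toward a bootstrapped target that queries the target actor directly, with no sampling and no max over actions. The sketch below shows how the two networks interact. It uses plain callables in place of the classes listed here. The tanh-then-rescale action mapping and all function names are assumptions for illustration.

```python
import math
from typing import Callable

Policy = Callable[[list[float]], list[float]]
QFunction = Callable[[list[float], list[float]], float]


def scale_action(raw: list[float], low: float, high: float) -> list[float]:
    """Squash unbounded actor outputs with tanh, then map [-1, 1] onto [low, high]."""
    return [low + (math.tanh(r) + 1.0) * 0.5 * (high - low) for r in raw]


def ddpg_target(
    reward: float,
    done: bool,
    next_obs: list[float],
    target_actor: Policy,
    target_critic: QFunction,
    gamma: float = 0.99,
) -> float:
    """Critic target y = r + gamma * (1 - done) * Q'(s', mu'(s'))."""
    next_action = target_actor(next_obs)  # deterministic action from the target actor
    bootstrap = 0.0 if done else target_critic(next_obs, next_action)
    return reward + gamma * bootstrap
```

With a toy actor `obs -> 0.5 * obs` and critic `(obs, act) -> sum(obs) + sum(act)`, the target for `reward=1.0`, `next_obs=[2.0]`, `gamma=0.5` is `1 + 0.5 * (2 + 1) = 2.5`.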
|
An embedder for an LSTM module.
|
An embedder for a GRU module.
|
Canonicalize only the union of RNN keys used by
|
Context manager for setting the recurrent mode of RNNs.
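A context manager for a mode flag sets the flag on entry and restores the previous value on exit, even if the block raises. As a result, nested uses compose. In the usual setup, an RNN module would process whole stored sequences when the mode is on, for example during training, and step one transition at a time with a carried hidden state when it is off, for example during data collection. This is a minimal sketch. The flag, `recurrent_mode()` and `set_recurrent_mode_sketch()` are hypothetical names, and `contextvars` is our choice for thread- and async-safety, not necessarily what the library uses.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator

# Hypothetical flag that RNN modules would read at call time.
_recurrent_mode = ContextVar("recurrent_mode", default=False)


def recurrent_mode() -> bool:
    """Return whether recurrent (whole-sequence) mode is currently active."""
    return _recurrent_mode.get()


@contextmanager
def set_recurrent_mode_sketch(mode: bool = True) -> Iterator[None]:
    """Activate `mode` inside the block and restore the previous value on exit."""
    token = _recurrent_mode.set(mode)
    try:
        yield
    finally:
        _recurrent_mode.reset(token)  # restores the value seen before entry
```

Usage: `with set_recurrent_mode_sketch(True): ...` wraps a training step. A nested `with set_recurrent_mode_sketch(False):` temporarily switches back, and the outer setting returns when the inner block exits.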
|
Online Decision Transformer Actor class.
|
Decision Transformer Actor class.
|
Online Decision Transformer.
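Decision Transformers condition action prediction on the return-to-go, the sum of rewards still to be collected from each timestep, and read trajectories as interleaved (return-to-go, state, action) tokens. The sketch below computes returns-to-go in one backward pass and builds that token order. The function names and the tuple-based token representation are hypothetical. The original formulation uses undiscounted returns (`gamma=1.0`), and the optional discount is an addition for illustration.

```python
def returns_to_go(rewards: list[float], gamma: float = 1.0) -> list[float]:
    """R_t = sum_{k >= t} gamma**(k - t) * r_k, computed right to left."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def interleave_tokens(
    rtg: list[float], states: list, actions: list
) -> list[tuple[str, object]]:
    """Build the (return-to-go, state, action) sequence fed to the transformer."""
    tokens: list[tuple[str, object]] = []
    for r, s, a in zip(rtg, states, actions):
        tokens.extend([("rtg", r), ("state", s), ("action", a)])
    return tokens
```

For rewards `[1.0, 0.0, 2.0]` the returns-to-go are `[3.0, 2.0, 2.0]`. At evaluation time the first return-to-go is set to a desired target return, and each subsequent one is obtained by subtracting the reward just received.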