Using the Evaluator¶
Author: Vincent Moens
How to run synchronous and asynchronous evaluations during training
How to pass updated weights to the evaluator
How to use the on_result callback for logging
How to run evaluation in a separate process
In RL training loops, evaluation is often done inline: you stop training, run a few rollouts, log the metrics, then resume. This blocks the training loop while rollouts are collected. For environments with expensive step functions (robotics simulators, LLM generation, etc.), this can waste significant GPU time.
The Evaluator decouples evaluation from
training by running rollouts in the background and letting you poll for
metrics or react to results via a callback.
In this tutorial we will cover:
Synchronous evaluation — blocking calls
Asynchronous evaluation — fire-and-poll
Weight updates — passing trained weights
Process-based evaluation — out-of-process
Logging with callbacks — on_result
from functools import partial
import torch
from tensordict import from_module
from tensordict.nn import TensorDictModule
from torch import nn
from torchrl.collectors import Evaluator, RandomPolicy
from torchrl.envs import GymEnv
Synchronous evaluation¶
The simplest way to use the Evaluator is to call
evaluate(), which blocks until
the rollout completes and returns a metrics dict.
We start by creating an environment factory and a random policy. The Evaluator accepts either a live environment or a callable that creates one — the callable form is preferred because it lets the evaluator recreate the environment if needed.
env_maker = partial(GymEnv, "Pendulum-v1")
policy = RandomPolicy(env_maker().action_spec)
evaluator = Evaluator(env_maker, policy, num_trajectories=1)
Now we can run a blocking evaluation. The returned dict contains metrics prefixed with eval/: reward (and its standard deviation), episode length, number of episodes, frames-per-second, and a step counter.
result = evaluator.evaluate()
print("First eval:", result)
First eval: {'eval/reward': -1641.96826171875, 'eval/reward_std': 0.0, 'eval/num_episodes': 1, 'eval/episode_length': 200.0, 'eval/fps': 708.2942105549897, 'eval/step': 0}
Each subsequent call increments the internal step counter:
result = evaluator.evaluate()
print("Second eval:", result)
Second eval: {'eval/reward': -1682.063720703125, 'eval/reward_std': 0.0, 'eval/num_episodes': 1, 'eval/episode_length': 200.0, 'eval/fps': 725.2587154674727, 'eval/step': 1}
Asynchronous evaluation¶
For non-blocking evaluation, use trigger_eval()
to start a rollout in the background, then poll()
or wait() to retrieve the result.
evaluator.trigger_eval()
# poll() is non-blocking: returns None if the result isn't ready yet
result = evaluator.poll()
print("poll() returned:", result)
poll() returned: None
To wait for the result, pass a timeout to poll() or use wait():
result = evaluator.poll(timeout=30)
print("poll(timeout=30) returned:", result)
poll(timeout=30) returned: {'eval/reward': -1457.2855224609375, 'eval/reward_std': 0.0, 'eval/num_episodes': 1, 'eval/episode_length': 200.0, 'eval/fps': 721.3010574578926, 'eval/step': 2}
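For completeness, wait() blocks until the pending result is available and returns it. A minimal sketch (not part of the recorded run above), assuming the same evaluator:

# wait() blocks until the pending rollout finishes, then returns its metrics
evaluator.trigger_eval()
result = evaluator.wait(timeout=30)
print("wait(timeout=30) returned:", result)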
By default, calling trigger_eval() while a previous evaluation is
still pending raises an error. This prevents silently piling up stale
requests:
evaluator.trigger_eval()
try:
    evaluator.trigger_eval()
except RuntimeError as e:
    print(f"Errored with: {e}")

# Clean up
evaluator.wait(timeout=30)
evaluator.shutdown()
Errored with: Evaluation already pending. Wait for completion or set busy_policy='queue'.
If you prefer to enqueue evaluations, pass busy_policy="queue"
when creating the evaluator.
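For example, a sketch of the queueing variant. Draining queued results via repeated wait() calls is an assumption here; consult the Evaluator API reference for the exact semantics:

# With busy_policy="queue", overlapping trigger_eval() calls are enqueued
# instead of raising. Retrieval via repeated wait() calls is an assumption.
evaluator_q = Evaluator(env_maker, policy, num_trajectories=1, busy_policy="queue")
evaluator_q.trigger_eval()
evaluator_q.trigger_eval()  # enqueued rather than raising RuntimeError
print(evaluator_q.wait(timeout=30))  # first result
print(evaluator_q.wait(timeout=30))  # second result
evaluator_q.shutdown()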
Weight updates¶
In a real training loop, you want to evaluate the latest trained
weights, not the initial ones. The evaluate()
and trigger_eval() methods accept a
weights argument — either an nn.Module or a TensorDictBase.
Let’s create a simple MLP policy and an evaluator for it:
env = env_maker()
net = nn.Sequential(
    nn.Linear(env.observation_spec["observation"].shape[-1], 64),
    nn.Tanh(),
    nn.Linear(64, env.action_spec.shape[-1]),
)
real_policy = TensorDictModule(net, in_keys=["observation"], out_keys=["action"])
evaluator_w = Evaluator(env_maker, real_policy, num_trajectories=1)
Evaluate with the initial (random) weights:
print("Before weight update:", evaluator_w.evaluate())
Before weight update: {'eval/reward': -1413.3218994140625, 'eval/reward_std': 0.0, 'eval/num_episodes': 1, 'eval/episode_length': 200.0, 'eval/fps': 661.9042895729661, 'eval/step': 0}
Simulate a “training step” by perturbing the weights:
with torch.no_grad():
    for p in net.parameters():
        p.add_(torch.randn_like(p) * 0.1)
Now evaluate with the updated weights. You can pass the module directly — the evaluator extracts and transfers the weights automatically:
print("After weight update:", evaluator_w.evaluate(weights=real_policy))
After weight update: {'eval/reward': -1208.4190673828125, 'eval/reward_std': 0.0, 'eval/num_episodes': 1, 'eval/episode_length': 200.0, 'eval/fps': 675.5085238299725, 'eval/step': 1}
You can also pass a TensorDictBase of weights, which is useful
when you already have detached weight snapshots:
real_weights = from_module(real_policy)
print("With TensorDict weights:", evaluator_w.evaluate(weights=real_weights))
evaluator_w.shutdown()
With TensorDict weights: {'eval/reward': -1890.05029296875, 'eval/reward_std': 0.0, 'eval/num_episodes': 1, 'eval/episode_length': 200.0, 'eval/fps': 653.3084171956334, 'eval/step': 2}
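Putting the pieces together, a typical training loop fires a background evaluation with the current weights every few steps and polls opportunistically, so the optimizer never blocks on rollouts. A minimal sketch, where the optimizer step and eval_interval are illustrative placeholders:

def training_loop(evaluator, policy, total_steps, eval_interval=100):
    for i in range(total_steps):
        # ... one optimizer step on `policy` would go here ...
        if i % eval_interval == 0:
            # trigger_eval() accepts `weights` just like evaluate()
            evaluator.trigger_eval(weights=policy)
        metrics = evaluator.poll()  # non-blocking; None if not ready yet
        if metrics is not None:
            print(metrics)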
Process-based evaluation¶
For full isolation (e.g. to place evaluation on a dedicated GPU or to
avoid GIL contention), use backend="process". This runs the
environment and policy inside a child process via
MultiSyncCollector.
The process backend requires callable factories for both the environment and the policy:
env_maker = partial(GymEnv, "Pendulum-v1")
action_spec = env_maker().action_spec
policy_factory = partial(RandomPolicy, action_spec)
evaluator_proc = Evaluator(
    env_maker,
    policy_factory=policy_factory,
    num_trajectories=1,
    backend="process",
)
result = evaluator_proc.evaluate()
print("Process backend:", result)
evaluator_proc.shutdown()
Process backend: {'eval/reward': -1453.0191650390625, 'eval/reward_std': 0.0, 'eval/num_episodes': 1, 'eval/episode_length': 200.0, 'eval/fps': 505.6974491606256, 'eval/step': 0}
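Because the policy is rebuilt inside the child process, the factory must be picklable and construct the module from scratch. Below is a sketch of such a factory for the MLP policy from earlier; whether weights= passed to evaluate() syncs across the process boundary in the same way is an assumption, so check the API reference:

# A picklable factory that rebuilds the MLP policy in the child process.
def make_mlp_policy():
    env = env_maker()
    net = nn.Sequential(
        nn.Linear(env.observation_spec["observation"].shape[-1], 64),
        nn.Tanh(),
        nn.Linear(64, env.action_spec.shape[-1]),
    )
    return TensorDictModule(net, in_keys=["observation"], out_keys=["action"])

evaluator_mlp_proc = Evaluator(
    env_maker,
    policy_factory=make_mlp_policy,
    num_trajectories=1,
    backend="process",
)
result = evaluator_mlp_proc.evaluate()
evaluator_mlp_proc.shutdown()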
Logging with callbacks¶
Rather than manually logging after each poll() or wait(), you
can pass an on_result callback to the evaluator. It receives a flat
TensorDictBase with the same prefixed metric names.
Here we use TorchRL’s CSVLogger to
automatically log every evaluation result to a CSV file:
import tempfile
from torchrl.record.loggers.csv import CSVLogger
log_dir = tempfile.mkdtemp()
logger = CSVLogger(exp_name="eval_demo", log_dir=log_dir)
evaluator_log = Evaluator(
    env_maker,
    real_policy,
    num_trajectories=1,
    on_result=lambda result: logger.log_metrics(
        {k: v.item() for k, v in result.items() if k != "eval/step"},
        step=result["eval/step"].item(),
    ),
)
Run a few evals. Each one automatically logs to CSV via the callback:
for _ in range(3):
    evaluator_log.evaluate(weights=real_policy)

evaluator_log.shutdown()
Let’s verify what was logged:
from pathlib import Path
csv_path = next(Path(log_dir).rglob("*.csv"))
print(f"Logged to: {csv_path}")
print(csv_path.read_text())
Logged to: /tmp/tmprby2c4z0/eval_demo/scalars/eval/reward.csv
0,-1873.7410888671875
1,-1070.4537353515625
2,-1615.29541015625
The on_result callback works with both synchronous and asynchronous
evaluation. For async usage, the callback runs on the evaluator’s
background thread — if your callback writes to a shared logger, handle
any required locking inside the callback.
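For instance, a minimal lock-guarded callback (a sketch; the lock and its granularity are illustrative):

import threading

# The callback runs on the evaluator's background thread, so a lock
# serializes access to the shared logger.
log_lock = threading.Lock()

def locked_on_result(result):
    with log_lock:
        logger.log_metrics(
            {k: v.item() for k, v in result.items() if k != "eval/step"},
            step=result["eval/step"].item(),
        )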
Summary¶
The Evaluator provides a single,
composable entry-point for evaluation:
Synchronous: evaluate() for blocking rollouts.
Asynchronous: trigger_eval() + poll()/wait() for background rollouts.
Weight sync: pass weights (module or tensordict) to evaluate the latest trained parameters.
Process isolation: backend="process" for dedicated-device eval.
Callbacks: on_result for automatic logging or checkpointing.
Useful next resources¶
Evaluator API reference — full parameter docs.
Collector trajectory tutorial — deep dive into how collectors assemble data.
Total running time of the script: (0 minutes 6.320 seconds)