Using the Evaluator

Author: Vincent Moens

What you will learn
  • How to run synchronous and asynchronous evaluations during training

  • How to pass updated weights to the evaluator

  • How to use the on_result callback for logging

  • How to run evaluation in a separate process

In RL training loops, evaluation is often done inline: you stop training, run a few rollouts, log the metrics, then resume. This blocks the training loop while rollouts are collected. For environments with expensive step functions (robotics simulators, LLM generation, etc.), this can waste significant GPU time.

The Evaluator decouples evaluation from training by running rollouts in the background and letting you poll for metrics or react to results via a callback.
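To make the pattern concrete, here is a minimal sketch of fire-and-poll evaluation inside a training loop. The commented-out train_one_step call is a hypothetical placeholder for your optimizer update; the Evaluator construction mirrors the one we build later in this tutorial, and busy_policy="queue" (covered below) lets a new request queue up if the previous rollout has not finished:

from functools import partial

from torchrl.collectors import Evaluator, RandomPolicy
from torchrl.envs import GymEnv

env_maker = partial(GymEnv, "Pendulum-v1")
policy = RandomPolicy(env_maker().action_spec)
evaluator = Evaluator(env_maker, policy, num_trajectories=1, busy_policy="queue")

for step in range(1000):
    # train_one_step(policy)  # hypothetical: your optimizer update goes here
    if step % 250 == 0:
        evaluator.trigger_eval()  # non-blocking: rollout runs in the background
    metrics = evaluator.poll()  # None until a result is ready
    if metrics is not None:
        print(metrics)

evaluator.wait(timeout=30)  # drain a pending result before shutting down
evaluator.shutdown()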

In this tutorial we will cover:

  1. Synchronous evaluation — blocking calls

  2. Asynchronous evaluation — fire-and-poll

  3. Weight updates — passing trained weights

  4. Process-based evaluation — out-of-process

  5. Logging with callbacks (on_result)

from functools import partial

import torch
from tensordict import from_module
from tensordict.nn import TensorDictModule
from torch import nn
from torchrl.collectors import Evaluator, RandomPolicy
from torchrl.envs import GymEnv

Synchronous evaluation

The simplest way to use the Evaluator is to call evaluate(), which blocks until the rollout completes and returns a metrics dict.

We start by creating an environment factory and a random policy. The Evaluator accepts either a live environment or a callable that creates one — the callable form is preferred because it lets the evaluator recreate the environment if needed.

env_maker = partial(GymEnv, "Pendulum-v1")
policy = RandomPolicy(env_maker().action_spec)
evaluator = Evaluator(env_maker, policy, num_trajectories=1)

Now we can run a blocking evaluation. The returned dict contains metrics prefixed with eval/: mean reward, reward std, number of episodes, episode length, frames-per-second, and the running step counter.

result = evaluator.evaluate()
print("First eval:", result)
First eval: {'eval/reward': -1641.96826171875, 'eval/reward_std': 0.0, 'eval/num_episodes': 1, 'eval/episode_length': 200.0, 'eval/fps': 708.2942105549897, 'eval/step': 0}

Each subsequent call increments the internal step counter:

result = evaluator.evaluate()
print("Second eval:", result)
Second eval: {'eval/reward': -1682.063720703125, 'eval/reward_std': 0.0, 'eval/num_episodes': 1, 'eval/episode_length': 200.0, 'eval/fps': 725.2587154674727, 'eval/step': 1}

Asynchronous evaluation

For non-blocking evaluation, use trigger_eval() to start a rollout in the background, then poll() or wait() to retrieve the result.

evaluator.trigger_eval()

# poll() is non-blocking: returns None if the result isn't ready yet
result = evaluator.poll()
print("poll() returned:", result)
poll() returned: None

To wait for the result, pass a timeout to poll() or use wait():

result = evaluator.poll(timeout=30)
print("poll(timeout=30) returned:", result)
poll(timeout=30) returned: {'eval/reward': -1457.2855224609375, 'eval/reward_std': 0.0, 'eval/num_episodes': 1, 'eval/episode_length': 200.0, 'eval/fps': 721.3010574578926, 'eval/step': 2}

By default, calling trigger_eval() while a previous evaluation is still pending raises an error. This prevents silently piling up stale requests:

evaluator.trigger_eval()
try:
    evaluator.trigger_eval()
except RuntimeError as e:
    print(f"Errored with: {e}")

# Clean up
evaluator.wait(timeout=30)
evaluator.shutdown()
Errored with: Evaluation already pending. Wait for completion or set busy_policy='queue'.

If you prefer to enqueue evaluations, pass busy_policy="queue" when creating the evaluator.
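A queueing evaluator might look like the following sketch; here we assume each wait() call retrieves one queued result, in submission order:

evaluator_q = Evaluator(env_maker, policy, num_trajectories=1, busy_policy="queue")
evaluator_q.trigger_eval()
evaluator_q.trigger_eval()  # queued instead of raising
print(evaluator_q.wait(timeout=30))  # first result
print(evaluator_q.wait(timeout=30))  # second result
evaluator_q.shutdown()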

Weight updates

In a real training loop, you want to evaluate the latest trained weights, not the initial ones. The evaluate() and trigger_eval() methods accept a weights argument — either an nn.Module or a TensorDictBase.

Let’s create a simple MLP policy and an evaluator for it:

env = env_maker()
net = nn.Sequential(
    nn.Linear(env.observation_spec["observation"].shape[-1], 64),
    nn.Tanh(),
    nn.Linear(64, env.action_spec.shape[-1]),
)
real_policy = TensorDictModule(net, in_keys=["observation"], out_keys=["action"])

evaluator_w = Evaluator(env_maker, real_policy, num_trajectories=1)

Evaluate with the initial (random) weights:

print("Before weight update:", evaluator_w.evaluate())
Before weight update: {'eval/reward': -1413.3218994140625, 'eval/reward_std': 0.0, 'eval/num_episodes': 1, 'eval/episode_length': 200.0, 'eval/fps': 661.9042895729661, 'eval/step': 0}

Simulate a “training step” by perturbing the weights:

with torch.no_grad():
    for p in net.parameters():
        p.add_(torch.randn_like(p) * 0.1)

Now evaluate with the updated weights. You can pass the module directly — the evaluator extracts and transfers the weights automatically:

print("After weight update:", evaluator_w.evaluate(weights=real_policy))
After weight update: {'eval/reward': -1208.4190673828125, 'eval/reward_std': 0.0, 'eval/num_episodes': 1, 'eval/episode_length': 200.0, 'eval/fps': 675.5085238299725, 'eval/step': 1}

You can also pass a TensorDictBase of weights, which is useful when you already have detached weight snapshots:

real_weights = from_module(real_policy)
print("With TensorDict weights:", evaluator_w.evaluate(weights=real_weights))
evaluator_w.shutdown()
With TensorDict weights: {'eval/reward': -1890.05029296875, 'eval/reward_std': 0.0, 'eval/num_episodes': 1, 'eval/episode_length': 200.0, 'eval/fps': 653.3084171956334, 'eval/step': 2}
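If you want a snapshot that is frozen against further training updates, detaching and cloning the extracted weights should work — a sketch assuming TensorDict's standard detach()/clone() semantics:

frozen = from_module(real_policy).detach().clone()  # decoupled copy of the weights
# ... further training updates leave `frozen` untouched ...
# evaluator_w.evaluate(weights=frozen)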

Process-based evaluation

For full isolation (e.g. to place evaluation on a dedicated GPU or to avoid GIL contention), use backend="process". This runs the environment and policy inside a child process via MultiSyncCollector.

The process backend requires callable factories for both the environment and the policy:

env_maker = partial(GymEnv, "Pendulum-v1")
action_spec = env_maker().action_spec
policy_factory = partial(RandomPolicy, action_spec)

evaluator_proc = Evaluator(
    env_maker,
    policy_factory=policy_factory,
    num_trajectories=1,
    backend="process",
)

result = evaluator_proc.evaluate()
print("Process backend:", result)
evaluator_proc.shutdown()
Process backend: {'eval/reward': -1453.0191650390625, 'eval/reward_std': 0.0, 'eval/num_episodes': 1, 'eval/episode_length': 200.0, 'eval/fps': 505.6974491606256, 'eval/step': 0}

Logging with callbacks

Rather than manually logging after each poll() or wait(), you can pass an on_result callback to the evaluator. It receives a flat TensorDictBase with the same prefixed metric names.

Here we use TorchRL’s CSVLogger to automatically log every evaluation result to a CSV file:

import tempfile

from torchrl.record.loggers.csv import CSVLogger

log_dir = tempfile.mkdtemp()
logger = CSVLogger(exp_name="eval_demo", log_dir=log_dir)

evaluator_log = Evaluator(
    env_maker,
    real_policy,
    num_trajectories=1,
    on_result=lambda result: logger.log_metrics(
        {k: v.item() for k, v in result.items() if k != "eval/step"},
        step=result["eval/step"].item(),
    ),
)

Run a few evals. Each one automatically logs to CSV via the callback:

for _ in range(3):
    evaluator_log.evaluate(weights=real_policy)

evaluator_log.shutdown()

Let’s verify what was logged:

from pathlib import Path

csv_path = next(Path(log_dir).rglob("*.csv"))
print(f"Logged to: {csv_path}")
print(csv_path.read_text())
Logged to: /tmp/tmprby2c4z0/eval_demo/scalars/eval/reward.csv
0,-1873.7410888671875
1,-1070.4537353515625
2,-1615.29541015625

The on_result callback works with both synchronous and asynchronous evaluation. For async usage, the callback runs on the evaluator’s background thread — if your callback writes to a shared logger, handle any required locking inside the callback.
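For instance, a thread-safe variant of the logging callback above could serialize logger access with a lock; this is a minimal sketch reusing the logger, env_maker, and real_policy defined earlier:

import threading

log_lock = threading.Lock()

def on_result_locked(result):
    # May run on the evaluator's background thread: guard shared logger state.
    with log_lock:
        logger.log_metrics(
            {k: v.item() for k, v in result.items() if k != "eval/step"},
            step=result["eval/step"].item(),
        )

evaluator_safe = Evaluator(
    env_maker, real_policy, num_trajectories=1, on_result=on_result_locked
)
evaluator_safe.trigger_eval(weights=real_policy)
evaluator_safe.wait(timeout=30)
evaluator_safe.shutdown()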

Summary

The Evaluator provides a single, composable entry point for evaluation:

  • Synchronous: evaluate() for blocking rollouts.

  • Asynchronous: trigger_eval() + poll() / wait() for background rollouts.

  • Weight sync: pass weights (module or tensordict) to evaluate the latest trained parameters.

  • Process isolation: backend="process" for dedicated-device eval.

  • Callbacks: on_result for automatic logging or checkpointing.
