.. currentmodule:: torchrl.collectors

.. _collectors_eval:

Evaluation
==========

The :class:`Evaluator` class provides a unified interface for running evaluation
rollouts during RL training, either **synchronously** (blocking) or
**asynchronously** (in a background thread or Ray actor).

Why use an Evaluator?
---------------------

In most RL training loops, evaluation is done inline and **blocks** the training
loop while rollouts are collected. For environments with expensive step functions
(robotics simulators, LLM generation, etc.), this can waste significant training
time. The :class:`Evaluator` decouples evaluation from training by running
rollouts in the background and letting you poll for metrics or react to results
via a callback.

Quick example
-------------

.. code-block:: python

    from torchrl.collectors import Evaluator
    from torchrl.envs import GymEnv
    from tensordict.nn import TensorDictModule
    import torch.nn as nn

    def make_eval_env():
        return GymEnv("HalfCheetah-v4")

    eval_policy = TensorDictModule(
        nn.Linear(17, 6),
        in_keys=["observation"],
        out_keys=["action"],
    )

    evaluator = Evaluator(
        make_eval_env,
        eval_policy,
        max_steps=1000,
        on_result=lambda result: my_logger.log_metrics(
            {k: v.item() for k, v in result.items() if k != "eval/step"},
            step=result["eval/step"].item(),
        ),
    )

    # --- Inside training loop ---
    for data in collector:
        train(data)

        if should_eval:
            # Non-blocking: kick off eval and move on
            evaluator.trigger_eval(weights=train_policy, step=collected_frames)

        # Optionally check for results
        result = evaluator.poll()
        if result is not None:
            print(result)  # {"eval/reward": ..., "eval/episode_length": ..., "eval/fps": ...}

    evaluator.shutdown()

Synchronous usage
-----------------

If you prefer blocking evaluation (e.g. for final evaluation or simple scripts),
use :meth:`~Evaluator.evaluate`:

.. code-block:: python

    metrics = evaluator.evaluate(weights=train_policy, step=step)
    # metrics == {"eval/reward": -123.4, "eval/episode_length": 1000, "eval/fps": ...}

Asynchronous usage
------------------

For non-blocking evaluation during training:

.. code-block:: python

    # Start eval in the background
    evaluator.trigger_eval(weights=train_policy, step=step)

    # ... continue training ...

    # Check if results are ready (non-blocking)
    result = evaluator.poll()  # returns None if still running

    # Or block until done
    result = evaluator.wait(timeout=60)

By default, :meth:`~Evaluator.trigger_eval` raises if a previous evaluation is
still pending. Set ``busy_policy="queue"`` to enqueue later requests instead.

Device placement and compilation
--------------------------------

For best performance, place the eval policy on a **dedicated device** and
optionally ``torch.compile`` both the env and policy independently of the
training pipeline:

.. code-block:: python

    import torch

    eval_device = torch.device("cuda:1")  # training on cuda:0
    eval_policy = make_policy().to(eval_device)
    eval_env = make_env(device=eval_device)

    # Optional: compile for extra speed
    eval_policy = torch.compile(eval_policy)

    evaluator = Evaluator(
        eval_env,
        eval_policy,
        max_steps=1000,
        device=eval_device,
    )

The ``device`` parameter controls where policy weights are moved before each
rollout. When passing weights from the training policy (which may live on a
different device), the Evaluator automatically moves them to the eval device.

Overlap policy (backpressure)
-----------------------------

Calling :meth:`~Evaluator.trigger_eval` while a previous evaluation is still
pending raises immediately by default (``busy_policy="error"``). This keeps
training loops from silently piling up stale evaluation requests. If you
prefer to enqueue evaluations, pass ``busy_policy="queue"``. Queued requests
are processed in order as earlier evaluations finish.

Result callbacks
----------------

Pass ``on_result`` to react to completed evaluations without manual ``poll()``
bookkeeping:

.. code-block:: python

    def on_eval(result):
        metrics = {k: v.item() for k, v in result.items() if k != "eval/step"}
        if metrics["eval/reward"] > best_reward:
            save_checkpoint(step=result["eval/step"].item())

    evaluator = Evaluator(env, policy, max_steps=1000, on_result=on_eval)

For asynchronous evaluations, ``on_result`` runs on the evaluator's background
coordination thread. If your callback talks to a logger that is also used by
the training loop, handle any required locking inside the callback.
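For example, a minimal sketch of a lock-guarded callback. The ``threading.Lock``
and ``my_logger`` are illustrative, not part of the :class:`Evaluator` API; the
same lock must be acquired wherever the training loop itself logs:

.. code-block:: python

    import threading

    logger_lock = threading.Lock()  # shared with the training loop's logging calls

    def on_eval(result):
        metrics = {k: v.item() for k, v in result.items() if k != "eval/step"}
        # This runs on the evaluator's coordination thread, so serialise
        # access to the shared logger.
        with logger_lock:
            my_logger.log_metrics(metrics, step=result["eval/step"].item())

    evaluator = Evaluator(make_eval_env, eval_policy, max_steps=1000, on_result=on_eval)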
Backends
--------

The :class:`Evaluator` supports two backends selected via the ``backend``
parameter:

**Thread backend** (default, ``backend="thread"``):

- Runs ``env.rollout()`` in a daemon thread within the same process.
- No extra dependencies required.
- Best for most single-node training setups.

**Ray backend** (``backend="ray"``):

- Wraps :class:`~torchrl.collectors.distributed.RayEvalWorker` under the same API.
- Runs evaluation in a separate Ray actor process with its own GPU.
- Required when the eval environment needs process-level initialisation
  (e.g. Isaac Lab's ``AppLauncher``).

.. code-block:: python

    evaluator = Evaluator(
        make_eval_env,
        policy_factory=make_eval_policy,
        max_steps=1000,
        backend="ray",
        init_fn=my_process_init,
        num_gpus=1,
    )

Custom metrics and callbacks
----------------------------

Pass a ``metrics_fn`` to extract custom metrics from rollout data:

.. code-block:: python

    def my_metrics(rollout_td):
        return {
            "success_rate": rollout_td["next", "success"].any(-1).float().mean().item(),
        }

    evaluator = Evaluator(env, policy, max_steps=1000, metrics_fn=my_metrics)

Or use ``on_result`` to consume the prefixed evaluation metrics as a flat
tensordict:

.. code-block:: python

    def on_eval(result):
        if result["eval/reward"].item() > best_reward:
            save_checkpoint(result["eval/step"].item())

    evaluator = Evaluator(env, policy, max_steps=1000, on_result=on_eval)

API Reference
-------------

.. autosummary::
    :toctree: generated/
    :template: rl_template.rst

    Evaluator