make_vllm_worker

torchrl.modules.llm.make_vllm_worker(*, model_name: str, devices: list[torch.device | int] | None = None, num_devices: int | None = None, make_ray_worker: bool = True, enforce_eager: bool = False, **kwargs)[source]

Creates a vLLM inference engine with tensor parallelism support.

Parameters:
  • model_name (str) – The model name to pass to vLLM.LLM.

  • devices (list[torch.device | int], optional) – List of devices to use. Mutually exclusive with num_devices.

  • num_devices (int, optional) – Number of devices to use. Mutually exclusive with devices.

  • make_ray_worker (bool, optional) – Whether to wrap the engine in a Ray actor rather than instantiating it locally. Defaults to True.

  • enforce_eager (bool, optional) – Whether to enforce eager execution. Defaults to False.

  • **kwargs – Additional arguments passed to vLLM.LLM.__init__.

Returns:

Either a local vLLM LLM instance or a Ray actor handle.

Return type:

LLM | ray.actor.ActorClass

Example

>>> # Create a 2-GPU tensor parallel worker with Ray
>>> worker = make_vllm_worker("Qwen/Qwen2.5-3B", num_devices=2)
>>> # Create a local LLM instance on GPU 1
>>> llm = make_vllm_worker("Qwen/Qwen2.5-3B", devices=[1], make_ray_worker=False)
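Because devices and num_devices are mutually exclusive, a caller must pass at most one of them. The sketch below illustrates how such arguments could be reconciled into a concrete list of device indices; this is a hypothetical helper for illustration, not the actual torchrl implementation, and the default-to-device-0 behavior is an assumption.

```python
def resolve_devices(devices=None, num_devices=None):
    """Reconcile mutually exclusive `devices` / `num_devices` arguments
    into a list of integer device indices (hypothetical sketch)."""
    if devices is not None and num_devices is not None:
        raise ValueError("`devices` and `num_devices` are mutually exclusive.")
    if devices is not None:
        # Accept plain ints or torch.device objects; for the latter,
        # keep only the CUDA index (e.g. torch.device("cuda:1") -> 1).
        return [d if isinstance(d, int) else d.index for d in devices]
    if num_devices is not None:
        # Expand a count into consecutive indices: 2 -> [0, 1].
        return list(range(num_devices))
    # Assumed default: a single device at index 0.
    return [0]
```

With num_devices=2 this yields [0, 1], which would map onto vLLM's tensor-parallel group; passing both arguments raises immediately instead of silently preferring one.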
