make_vllm_worker¶
- class torchrl.modules.llm.make_vllm_worker(*, model_name: str, devices: list[torch.device | int] | None = None, num_devices: int | None = None, make_ray_worker: bool = True, enforce_eager: bool = False, **kwargs)[source]¶
Creates a vLLM inference engine, either as a local instance or as a Ray actor, with tensor parallelism support.
- Parameters:
model_name (str) – The model name to pass to vLLM.LLM.
devices (list[torch.device | int], optional) – List of devices to use. Mutually exclusive with num_devices.
num_devices (int, optional) – Number of devices to use. Mutually exclusive with devices.
make_ray_worker (bool, optional) – Whether to create a Ray actor. Defaults to True.
enforce_eager (bool, optional) – Whether to enforce eager execution. Defaults to False.
**kwargs – Additional arguments passed to vLLM.LLM.__init__.
- Returns:
Either a local vLLM LLM instance or a Ray actor handle.
- Return type:
LLM | ray.actor.ActorClass
Example
>>> # Create a 2-GPU tensor parallel worker with Ray
>>> worker = make_vllm_worker("Qwen/Qwen2.5-3B", num_devices=2)
>>> # Create a local LLM instance on GPU 1
>>> llm = make_vllm_worker("Qwen/Qwen2.5-3B", devices=[1], make_ray_worker=False)