make_vllm_worker¶
- class torchrl.modules.llm.make_vllm_worker(*, model_name: str, devices: list[torch.device | int] | None = None, num_devices: int | None = None, make_ray_worker: bool = True, enforce_eager: bool = False, **kwargs)[source]¶
Creates a vLLM inference engine, either as a local instance or as a Ray actor, with tensor parallelism support.
- Parameters:
model_name (str) – The model name to pass to vLLM.LLM.
devices (list[torch.device | int], optional) – List of devices to use. Mutually exclusive with num_devices.
num_devices (int, optional) – Number of devices to use. Mutually exclusive with devices.
make_ray_worker (bool, optional) – Whether to create a Ray actor. Defaults to True.
enforce_eager (bool, optional) – Whether to enforce eager execution. Defaults to False.
**kwargs – Additional arguments passed to vLLM.LLM.__init__.
- Returns:
Either a local vLLM LLM instance or a Ray actor handle.
- Return type:
LLM | ray.actor.ActorClass
Example
>>> # Create a 2-GPU tensor parallel worker with Ray
>>> worker = make_vllm_worker("Qwen/Qwen2.5-3B", num_devices=2)
>>> # Create a local LLM instance on GPU 1
>>> llm = make_vllm_worker("Qwen/Qwen2.5-3B", devices=[1], make_ray_worker=False)