llama4_vision_projection_head

torchtune.models.llama4.llama4_vision_projection_head(*, decoder_embed_dim: int, clip_embed_dim: int, projection_embed_dim: int) → Llama4VisionProjectionHead

Build the Llama 4 Vision Projection Head that maps the output of the CLIP encoder to embeddings that can be fed into the decoder.

Parameters:

decoder_embed_dim (int) – embedding dimension for the decoder.
clip_embed_dim (int) – embedding dimension for the CLIP encoder.
projection_embed_dim (int) – embedding dimension for the inner linear layers in the projection head.

Returns:

An instance of the Llama 4 vision projection head.

Return type:

Llama4VisionProjectionHead
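
Example:

A minimal sketch of calling the builder. The signature begins with `*`, so all three arguments must be passed by keyword. The dimension values below are hypothetical and chosen only for illustration; they are not taken from any published Llama 4 configuration. The import is guarded so the snippet still runs where torchtune is not installed.

```python
# Hypothetical dimensions for illustration only; real model configs may differ.
head_kwargs = {
    "decoder_embed_dim": 512,     # embedding dimension for the decoder
    "clip_embed_dim": 256,        # embedding dimension for the CLIP encoder
    "projection_embed_dim": 384,  # inner linear layers of the projection head
}

try:
    from torchtune.models.llama4 import llama4_vision_projection_head
except ImportError:
    # torchtune (or a version that ships the llama4 models) is not available.
    llama4_vision_projection_head = None

head = None
if llama4_vision_projection_head is not None:
    # Arguments are keyword-only, so unpack the dict rather than passing positionally.
    head = llama4_vision_projection_head(**head_kwargs)
    print(type(head).__name__)
```

Passing the arguments positionally, for example `llama4_vision_projection_head(512, 256, 384)`, raises a `TypeError` because of the keyword-only signature.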
