llama4_vision_projection_head

torchtune.models.llama4.llama4_vision_projection_head(*, decoder_embed_dim: int, clip_embed_dim: int, projection_embed_dim: int) → Llama4VisionProjectionHead

Build the Llama 4 vision projection head, which maps the output of the CLIP encoder to embeddings that can be fed into the decoder. All arguments are keyword-only.

Parameters:
  • decoder_embed_dim (int) – embedding dimension for the decoder.

  • clip_embed_dim (int) – embedding dimension for the CLIP encoder.

  • projection_embed_dim (int) – embedding dimension for the inner linear layers in the projection head.

Returns:

An instance of the Llama 4 vision projection head.

Return type:

Llama4VisionProjectionHead
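
The sketch below is a standard-library illustration of the calling convention and of how the three dimensions relate. It does not import torchtune. The helper trace_projection_shapes, its num_tokens argument, the stage names, and the sizes 64/32/48 are all hypothetical. It assumes only what the description above states: features enter at clip_embed_dim, pass through inner layers of width projection_embed_dim, and leave at decoder_embed_dim. Any token reshaping the real module may perform is ignored.

```python
from typing import List, Tuple


def trace_projection_shapes(
    *,  # mirrors the builder's signature: every argument is keyword-only
    decoder_embed_dim: int,
    clip_embed_dim: int,
    projection_embed_dim: int,
    num_tokens: int = 4,  # hypothetical: number of image tokens to trace
) -> List[Tuple[str, Tuple[int, int]]]:
    """Hypothetical helper: (tokens, features) shape at each assumed stage."""
    dims = {
        "decoder_embed_dim": decoder_embed_dim,
        "clip_embed_dim": clip_embed_dim,
        "projection_embed_dim": projection_embed_dim,
        "num_tokens": num_tokens,
    }
    for name, value in dims.items():
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")

    return [
        ("clip_output", (num_tokens, clip_embed_dim)),
        ("inner_projection", (num_tokens, projection_embed_dim)),
        ("decoder_input", (num_tokens, decoder_embed_dim)),
    ]


# Illustrative sizes only; real configurations are much larger.
shapes = trace_projection_shapes(
    decoder_embed_dim=64,
    clip_embed_dim=32,
    projection_embed_dim=48,
)
for stage, shape in shapes:
    print(stage, shape)

# Because of the bare `*`, passing the dimensions positionally fails.
try:
    trace_projection_shapes(64, 32, 48)
    positional_call_rejected = False
except TypeError:
    positional_call_rejected = True
```

The keyword-only form guards against swapping the three integer dimensions by accident, since all of them have the same type and a positional mix-up would not otherwise raise an error.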
