qwen2

torchtune.models.qwen2.qwen2(vocab_size: int, num_layers: int, num_heads: int, num_kv_heads: int, embed_dim: int, intermediate_dim: int, max_seq_len: int, head_dim: Optional[int] = None, attn_dropout: float = 0.0, norm_eps: float = 1e-05, rope_base: float = 1000000.0, tie_word_embeddings: bool = False, q_proj_bias: bool = True, k_proj_bias: bool = True, v_proj_bias: bool = True, q_norm: bool = False, k_norm: bool = False) → TransformerDecoder

Build the decoder associated with the Qwen2 model. This includes:

- Token embeddings
- num_layers number of TransformerSelfAttentionLayer blocks
- RMS Norm layer applied to the output of the transformer
- Final projection into token space

Parameters:

vocab_size (int) – number of tokens in vocabulary.
num_layers (int) – number of layers in the transformer decoder.
num_heads (int) – number of query heads. For MHA this is also the number of heads for key and value.
num_kv_heads (int) – number of key and value heads. The caller must ensure num_heads % num_kv_heads == 0. For standard MHA set num_kv_heads == num_heads, for GQA set num_kv_heads < num_heads, and for MQA set num_kv_heads == 1.
embed_dim (int) – embedding dimension for self-attention
max_seq_len (int) – maximum sequence length the model will be run with, as used by KVCache()
attn_dropout (float) – dropout value passed to scaled_dot_product_attention. Default: 0.0
intermediate_dim (int) – intermediate dimension for the MLP. The signature declares this argument as a required int with no default, so it must be passed explicitly.
head_dim (Optional[int]) – Dimension of each attention head. If not specified, it defaults to embed_dim // num_heads. In GQA, head_dim is not necessarily equal to embed_dim // num_heads, so this parameter allows the caller to explicitly specify a custom value.
norm_eps (float) – epsilon in RMS norms.
rope_base (float) – the base period of the RoPE embeddings.
tie_word_embeddings (bool) – whether the model’s input and output word embeddings should be tied.
q_proj_bias (bool) – whether to use bias in the query projection.
k_proj_bias (bool) – whether to use bias in the key projection.
v_proj_bias (bool) – whether to use bias in the value projection.
q_norm (bool) – whether to use normalization in the query projection.
k_norm (bool) – whether to use normalization in the key projection.
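
The relationship between num_heads, num_kv_heads, embed_dim, and head_dim described above can be sketched in plain Python. The helper below is hypothetical and not part of torchtune; its name and return format are assumptions made for illustration. It only mirrors the documented rules: the divisibility requirement, the MHA/GQA/MQA distinction, and the embed_dim // num_heads fallback for head_dim.

```python
from typing import Optional


def describe_attention_config(
    num_heads: int,
    num_kv_heads: int,
    embed_dim: int,
    head_dim: Optional[int] = None,
) -> dict:
    """Hypothetical helper (not a torchtune API) that checks head settings."""
    # The docs ask the caller to guarantee this divisibility.
    if num_heads % num_kv_heads != 0:
        raise ValueError(
            f"num_heads ({num_heads}) must be divisible by "
            f"num_kv_heads ({num_kv_heads})"
        )

    # Documented fallback when head_dim is not given.
    if head_dim is None:
        head_dim = embed_dim // num_heads

    # Classify the attention variant from the two head counts.
    if num_kv_heads == num_heads:
        variant = "MHA"
    elif num_kv_heads == 1:
        variant = "MQA"
    else:
        variant = "GQA"

    return {
        "variant": variant,
        "head_dim": head_dim,
        # How many query heads share each key/value head.
        "queries_per_kv_head": num_heads // num_kv_heads,
    }


# Illustrative values only, chosen so the arithmetic is easy to follow.
info = describe_attention_config(num_heads=28, num_kv_heads=4, embed_dim=3584)
print(info)
```

Running a check like this before calling the builder surfaces an invalid head configuration early, since the builder's documentation leaves that validation to the caller.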

Returns:

Instantiation of Qwen2 model.

Return type:

TransformerDecoder
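
A minimal usage sketch follows, assuming torch and torchtune are installed and that the builder is importable as torchtune.models.qwen2.qwen2, matching the signature above. The sizes in tiny_config are made up so the model builds quickly on CPU; they are not the hyperparameters of any released Qwen2 checkpoint. The import is guarded so the snippet still runs, without building a model, where torchtune is unavailable.

```python
# Deliberately tiny, made-up sizes (an assumption for illustration only).
tiny_config = dict(
    vocab_size=1000,
    num_layers=2,
    num_heads=4,
    num_kv_heads=2,       # GQA: 2 query heads share each key/value head
    embed_dim=64,
    intermediate_dim=128,
    max_seq_len=32,
)

logits = None
try:
    import torch
    from torchtune.models.qwen2 import qwen2
except ImportError:
    # torch or torchtune is missing; skip the model build.
    print("torchtune not available, skipping model construction")
else:
    # All remaining arguments keep their documented defaults.
    model = qwen2(**tiny_config)
    model.eval()

    # A batch of one sequence with 8 random token ids.
    tokens = torch.randint(0, tiny_config["vocab_size"], (1, 8))
    with torch.no_grad():
        logits = model(tokens)
    print(tuple(logits.shape))
```

Because the decoder ends with a projection into token space, the last dimension of the output should correspond to vocab_size; the exact output type may vary between torchtune versions.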
