TBE CPU Autovectorization
FP8/16/32 Autovec Implementation Methods
- template<typename InType, typename IndexType, typename OffsetType, typename OutType> static bool ALWAYS_INLINE EmbeddingSpMDM_autovec (const int64_t block_size, const int64_t output_size, const int64_t index_size, const int64_t data_size, const InType *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, const bool is_weight_positional, const bool use_offsets, const int64_t output_stride, const int64_t input_stride, const bool no_bag, const bool is_bf16_out, const bool is_bf16_in)
- Autovectorized version of method EmbeddingSpMDM_ref for FP32 weight type.
- Template Parameters:
- InType – input data type (uint8_t is used)
- IndexType – index data type (int64_t is used)
- OffsetType – offset data type (int32_t is used)
- OutType – output data type (float is used)
 
- Parameters:
- block_size – Number of elements in a block (int64_t)
- output_size – Number of elements in output (int64_t)
- index_size – Number of elements in index (int64_t)
- data_size – Number of elements in data (int64_t)
- input – Address of input (InType*)
- indices – Address of index (IndexType*)
- offsets_or_lengths – Address of offset (OffsetType*)
- weights – Weights of sum; optional, can be null for non-weighted sum (float*)
- normalize_by_lengths – Whether or not to normalize by lengths (bool)
- out – Address of output (OutType*)
- is_weight_positional – If true, weight is positional; set to false for FP32 autovec implementation (bool)
- use_offsets – If true, will use offsets instead of lengths; set to true for FP32 autovec implementation (bool)
- output_stride – If -1, output_stride is same as block_size; set to -1 for FP32 autovec implementation (int64_t)
- input_stride – If -1, input_stride is same as block_size; set to -1 for FP32 autovec implementation (int64_t)
- scale_bias_last – If true, scale and bias appear at end of each row; set to true for FP32 autovec implementation (bool)
- no_bag – If true, no embedding bag; set to false for FP32 autovec implementation (bool)
- is_bf16_out – If true, output is BFLOAT16 type; set to false for FP32 autovec implementation (bool)
- is_bf16_in – If true, input is BFLOAT16 type; set to false for FP32 autovec implementation (bool)
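
The parameters above describe a pooled embedding lookup: each output row is the (optionally weighted) sum of the table rows selected by one bag of indices. The following is a minimal scalar sketch of that computation, not the FBGEMM implementation. The name pooled_sum_sketch is hypothetical, and the sketch assumes float input, use_offsets set to true with output_size + 1 offset entries, non-positional weights, and both strides equal to block_size.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scalar sketch (not the FBGEMM code): pooled, optionally
// weighted sum of embedding rows. Assumes offsets has output_size + 1
// entries and that input/output strides equal block_size.
// Returns false when an offset range or an index is out of bounds.
inline bool pooled_sum_sketch(
    int64_t block_size, int64_t output_size, int64_t index_size,
    int64_t data_size, const float* input, const int64_t* indices,
    const int32_t* offsets, const float* weights,
    bool normalize_by_lengths, float* out) {
  assert(block_size > 0);
  for (int64_t i = 0; i < output_size; ++i) {
    const int64_t begin = offsets[i];
    const int64_t end = offsets[i + 1];
    if (begin < 0 || end < begin || end > index_size) {
      return false;
    }
    float* row_out = out + i * block_size;
    for (int64_t j = 0; j < block_size; ++j) {
      row_out[j] = 0.0f;
    }
    for (int64_t k = begin; k < end; ++k) {
      const int64_t idx = indices[k];
      if (idx < 0 || idx >= data_size) {
        return false;
      }
      // A null weights pointer means a plain (non-weighted) sum.
      const float w = weights ? weights[k] : 1.0f;
      const float* row_in = input + idx * block_size;
      // Simple contiguous loop: the kind a compiler can autovectorize.
      for (int64_t j = 0; j < block_size; ++j) {
        row_out[j] += w * row_in[j];
      }
    }
    if (normalize_by_lengths && end > begin) {
      const float scale = 1.0f / static_cast<float>(end - begin);
      for (int64_t j = 0; j < block_size; ++j) {
        row_out[j] *= scale;
      }
    }
  }
  return true;
}
```

With a 3-row table of block size 2, indices {0, 2, 1} and offsets {0, 2, 3}, the sketch produces two output rows: the sum of rows 0 and 2, followed by row 1 on its own.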
 
 
- template<typename IndexType, typename OffsetType, typename OutType> static bool ALWAYS_INLINE EmbeddingSpMDMFP8_autovec (const int64_t block_size, const int64_t output_size, const int64_t index_size, const int64_t data_size, const uint8_t *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, const bool is_weight_positional, const bool use_offsets, const int64_t output_stride, const int64_t input_stride, const int exponent_bits, const int exponent_bias, const bool is_bf16_out)
- Autovectorized version of method EmbeddingSpMDM_ref for FP8 weight type.
- Template Parameters:
- InType – input data type (uint8_t is used)
- IndexType – index data type (int64_t is used)
- OffsetType – offset data type (int32_t is used)
- OutType – output data type (float is used)
 
- Parameters:
- block_size – Number of elements in a block (int64_t)
- output_size – Number of elements in output (int64_t)
- index_size – Number of elements in index (int64_t)
- data_size – Number of elements in data (int64_t)
- input – Address of input (InType*)
- indices – Address of index (IndexType*)
- offsets_or_lengths – Address of offset (OffsetType*)
- weights – Weights of sum; optional, can be null for non-weighted sum (float*)
- normalize_by_lengths – Whether or not to normalize by lengths (bool)
- out – Address of output (OutType*)
- is_weight_positional – If true, weight is positional; set to false for FP8 autovec implementation (bool)
- use_offsets – If true, will use offsets instead of lengths; set to true for FP8 autovec implementation (bool)
- output_stride – If -1, output_stride is same as block_size; set to -1 for FP8 autovec implementation (int64_t)
- exponent_bits – Number of bits used for the exponent (int)
- exponent_bias – Bias applied to the exponent (int)
- is_bf16_out – If true, output is BFLOAT16 type; set to false for FP8 autovec implementation (bool)
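
To illustrate how exponent_bits and exponent_bias can determine the value of one FP8 byte, here is a minimal sketch of a decoder. It is not the FBGEMM conversion routine. The name decode_fp8_sketch is hypothetical, and the layout is an assumption: one sign bit, then exponent_bits exponent bits, then the remaining mantissa bits, with IEEE-style subnormals and no special handling of infinity or NaN.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical sketch (not the FBGEMM code): decode one FP8 byte to float.
// Assumed layout: [sign:1][exponent:exponent_bits][mantissa:7-exponent_bits].
// Infinity and NaN encodings are not treated specially here.
inline float decode_fp8_sketch(uint8_t byte, int exponent_bits,
                               int exponent_bias) {
  assert(exponent_bits > 0 && exponent_bits < 7);
  const int mantissa_bits = 7 - exponent_bits;
  const int sign = byte >> 7;
  const int exponent = (byte >> mantissa_bits) & ((1 << exponent_bits) - 1);
  const int mantissa = byte & ((1 << mantissa_bits) - 1);
  const float fraction = static_cast<float>(mantissa) /
      static_cast<float>(1 << mantissa_bits);
  // Exponent field 0 encodes subnormals (no implicit leading 1).
  const float magnitude = (exponent == 0)
      ? std::ldexp(fraction, 1 - exponent_bias)
      : std::ldexp(1.0f + fraction, exponent - exponent_bias);
  return sign ? -magnitude : magnitude;
}
```

Under these assumptions, with exponent_bits = 4 and exponent_bias = 7, the byte 0x38 decodes to 1.0 and 0x3C decodes to 1.5. A pooled lookup over FP8 rows would apply such a decode to each element before accumulating in float.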