Quantization Operators

Quantization is a model optimization technique that stores values at reduced numeric precision in order to shrink a large model's memory and storage footprint, at the cost of a small loss in accuracy.
CUDA Operators
- at::Tensor _float_to_bfloat16_gpu(const at::Tensor &input)

  Converts a tensor of `float` values into a tensor of Brain Floating Point (`bfloat16`) values.

  - Parameters:
    - input – A tensor of `float` values
  - Returns:
    A new tensor with values from the input tensor converted to `bfloat16`.
- at::Tensor _bfloat16_to_float_gpu(const at::Tensor &input)

  Converts a tensor of Brain Floating Point (`bfloat16`) values into a tensor of `float` values.

  - Parameters:
    - input – A tensor of `bfloat16` values
  - Returns:
    A new tensor with values from the input tensor converted to `float`.
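A `bfloat16` value is simply the upper 16 bits of an IEEE-754 `float32` (same exponent range, a 7-bit mantissa), which is why the conversion pair above is cheap in both directions. A minimal NumPy sketch of the round trip, using plain truncation (the actual kernels may round rather than truncate):

```python
import numpy as np

def float_to_bfloat16_bits(x: np.ndarray) -> np.ndarray:
    """Keep the upper 16 bits of each float32 (simple truncation)."""
    bits = x.astype(np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)

def bfloat16_bits_to_float(b: np.ndarray) -> np.ndarray:
    """Re-expand to float32 by zero-filling the lower 16 mantissa bits."""
    return (b.astype(np.uint32) << 16).view(np.float32)

# These values fit in a 7-bit mantissa, so they survive the round trip exactly.
x = np.array([1.0, -2.5, 3.140625], dtype=np.float32)
y = bfloat16_bits_to_float(float_to_bfloat16_bits(x))
```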
- Tensor _float_to_FP8rowwise_gpu(const Tensor &input, const bool forward)

  Converts a tensor of `float` values into a tensor of `fp8` values.

  - Parameters:
    - input – A tensor of `float` values. The dtype can be either `SparseType::FP32`, `SparseType::FP16`, or `SparseType::BF16`
    - forward –
  - Throws:
    `c10::Error` – if `input.dtype` is not one of `SparseType::FP32`, `SparseType::FP16`, or `SparseType::BF16`
  - Returns:
    A new tensor with values from the input tensor converted to `fp8`.
- at::Tensor _FP8rowwise_to_float_gpu(const at::Tensor &input, bool forward, const int64_t output_dtype)

  Converts a tensor of `fp8` values into a tensor of `float` values.

  - Parameters:
    - input – A tensor of `fp8` values
    - forward –
    - output_dtype – The target floating point type, specified as the integer representation of the `SparseType` enum
  - Throws:
    `c10::Error` – if `output_dtype` is not one of `SparseType::FP32`, `SparseType::FP16`, or `SparseType::BF16`
  - Returns:
    A new tensor with values from the input tensor converted to `float` (with a dtype of either `SparseType::FP32`, `SparseType::FP16`, or `SparseType::BF16`).
- Tensor _float_to_fused8bitrowwise_gpu(const Tensor &input)

  Converts a tensor of `float` values into a tensor of fused 8-bit rowwise values.

  - Parameters:
    - input – A tensor of `float` values
  - Returns:
    A new tensor with values from the input tensor converted to fused 8-bit rowwise.
- Tensor _half_to_fused8bitrowwise_gpu(const Tensor &input)

  Converts a tensor of `at::Half` values into a tensor of fused 8-bit rowwise values.

  - Parameters:
    - input – A tensor of `at::Half` values
  - Returns:
    A new tensor with values from the input tensor converted to fused 8-bit rowwise.
- Tensor _single_or_half_precision_to_fused8bitrowwise_gpu(const Tensor &input)

  Converts a tensor of `float` or `at::Half` values into a tensor of fused 8-bit rowwise values.

  - Parameters:
    - input – A tensor of `float` or `at::Half` values
  - Returns:
    A new tensor with values from the input tensor converted to fused 8-bit rowwise.
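In a fused 8-bit rowwise format, each row is quantized independently: the values become `uint8` codes, and a per-row floating-point scale and bias are stored alongside the codes. A minimal NumPy sketch of the per-row affine scheme, returning `(codes, scale, bias)` separately for clarity rather than in the kernels' actual fused byte layout (which is an assumption this sketch does not model):

```python
import numpy as np

def quantize_rowwise_8bit(x: np.ndarray):
    """Per-row affine quantization to uint8 with a float32 scale and bias."""
    x = x.astype(np.float32)
    row_min = x.min(axis=1, keepdims=True)
    row_max = x.max(axis=1, keepdims=True)
    scale = (row_max - row_min) / 255.0
    scale = np.where(scale == 0, 1.0, scale).astype(np.float32)  # guard constant rows
    codes = np.clip(np.round((x - row_min) / scale), 0, 255).astype(np.uint8)
    return codes, scale, row_min

def dequantize_rowwise_8bit(codes, scale, bias):
    return codes.astype(np.float32) * scale + bias

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)
codes, scale, bias = quantize_rowwise_8bit(x)
x_hat = dequantize_rowwise_8bit(codes, scale, bias)
```

The round-trip error is bounded by half a quantization step per element, i.e. `scale / 2` for each row.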
- at::Tensor _fused8bitrowwise_to_float_gpu(const at::Tensor &input)

  Converts a tensor of fused 8-bit rowwise values into a tensor of `float` values.

  - Parameters:
    - input – A tensor of fused 8-bit rowwise values
  - Returns:
    A new tensor with values from the input tensor converted to `float`.
- at::Tensor _fused8bitrowwise_to_half_gpu(const at::Tensor &input)

  Converts a tensor of fused 8-bit rowwise values into a tensor of `at::Half` values.

  - Parameters:
    - input – A tensor of fused 8-bit rowwise values
  - Returns:
    A new tensor with values from the input tensor converted to `at::Half`.
- at::Tensor _fused8bitrowwise_to_single_or_half_precision_gpu(const at::Tensor &input, const int64_t output_dtype, const bool scale_bias_last, const bool quant_padding_float_type)

  Converts a tensor of fused 8-bit rowwise values into a tensor of `float`, `at::Half`, or `at::BFloat16` values.

  - Parameters:
    - input – A tensor of fused 8-bit rowwise values
    - output_dtype – The target floating point type, specified as the integer representation of the `SparseType` enum
    - scale_bias_last –
    - quant_padding_float_type –
  - Throws:
    `c10::Error` – if `output_dtype` is not one of `SparseType::FP32`, `SparseType::FP16`, or `SparseType::BF16`
  - Returns:
    A new tensor with values from the input tensor converted to `float`, `at::Half`, or `at::BFloat16`.
- at::Tensor _fused8bitrowwise_to_float_mixed_dim_gpu(const at::Tensor &input, const at::Tensor &D_offsets, const int64_t output_dtype)

  Converts a tensor of fused 8-bit rowwise values into a tensor of `at::kFloat` or `at::kHalf` values.

  - Parameters:
    - input – A tensor of fused 8-bit rowwise values
    - D_offsets –
    - output_dtype – The target floating point type, specified as the integer representation of the `SparseType` enum
  - Throws:
    `c10::Error` – if `output_dtype` is not one of `SparseType::FP32` or `SparseType::FP16`
  - Returns:
    A new tensor with values from the input tensor converted to `at::kFloat` or `at::kHalf`.
- Tensor _float_to_fusednbitrowwise_gpu(const Tensor &input, const int64_t bit_rate)

  Converts a tensor of `float` values into a tensor of fused N-bit rowwise values.

  - Parameters:
    - input – A tensor of `float` values
    - bit_rate – The number of bits per quantized element
  - Returns:
    A new tensor with values from the input tensor converted to fused N-bit rowwise.
- at::Tensor _half_to_fusednbitrowwise_gpu(const at::Tensor &input, const int64_t bit_rate)

  Converts a tensor of `at::Half` values into a tensor of fused N-bit rowwise values.

  - Parameters:
    - input – A tensor of `at::Half` values
    - bit_rate – The number of bits per quantized element
  - Returns:
    A new tensor with values from the input tensor converted to fused N-bit rowwise.
- Tensor _single_or_half_precision_to_fusednbitrowwise_gpu(const Tensor &input, const int64_t bit_rate)

  Converts a tensor of `float` or `at::Half` values into a tensor of fused N-bit rowwise values.

  - Parameters:
    - input – A tensor of `float` or `at::Half` values
    - bit_rate – The number of bits per quantized element
  - Returns:
    A new tensor with values from the input tensor converted to fused N-bit rowwise.
- at::Tensor _fusednbitrowwise_to_float_gpu(const at::Tensor &input, const int64_t bit_rate)

  Converts a tensor of fused N-bit rowwise values into a tensor of `float` values.

  - Parameters:
    - input – A tensor of fused N-bit rowwise values
    - bit_rate – The number of bits per quantized element
  - Returns:
    A new tensor with values from the input tensor converted to `float`.
- at::Tensor _fusednbitrowwise_to_half_gpu(const at::Tensor &input, const int64_t bit_rate)

  Converts a tensor of fused N-bit rowwise values into a tensor of `at::Half` values.

  - Parameters:
    - input – A tensor of fused N-bit rowwise values
    - bit_rate – The number of bits per quantized element
  - Returns:
    A new tensor with values from the input tensor converted to `at::Half`.
- at::Tensor _fusednbitrowwise_to_single_or_half_precision_gpu(const at::Tensor &input, const int64_t bit_rate, const int64_t output_dtype)

  Converts a tensor of fused N-bit rowwise values into a tensor of `float`, `at::Half`, or `at::BFloat16` values.

  - Parameters:
    - input – A tensor of fused N-bit rowwise values
    - bit_rate – The number of bits per quantized element
    - output_dtype – The target floating point type, specified as the integer representation of the `SparseType` enum
  - Throws:
    `c10::Error` – if `output_dtype` is not one of `SparseType::FP32`, `SparseType::FP16`, or `SparseType::BF16`
  - Returns:
    A new tensor with values from the input tensor converted to `float`, `at::Half`, or `at::BFloat16`, depending on `output_dtype`.
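A fused N-bit rowwise format packs several quantized values per byte, e.g. two values per byte at `bit_rate = 4`, together with per-row quantization parameters stored in reduced precision. A minimal NumPy sketch of 4-bit rowwise quantization, assuming per-row `float16` scale/bias and a low-nibble-first packing (both assumptions; the kernels' fused byte layout differs):

```python
import numpy as np

def quantize_rowwise_4bit(x: np.ndarray):
    """Per-row affine quantization to 4-bit codes, two codes packed per byte."""
    x = x.astype(np.float32)
    row_min = x.min(axis=1, keepdims=True)
    row_max = x.max(axis=1, keepdims=True)
    scale = np.maximum((row_max - row_min) / 15.0, 1e-8)
    codes = np.clip(np.round((x - row_min) / scale), 0, 15).astype(np.uint8)
    packed = codes[:, 0::2] | (codes[:, 1::2] << 4)   # low nibble holds the even index
    return packed, scale.astype(np.float16), row_min.astype(np.float16)

def dequantize_rowwise_4bit(packed, scale, bias):
    lo = packed & 0x0F
    hi = packed >> 4
    codes = np.stack([lo, hi], axis=-1).reshape(packed.shape[0], -1)
    return codes.astype(np.float32) * scale.astype(np.float32) + bias.astype(np.float32)

x = np.random.default_rng(1).standard_normal((2, 8)).astype(np.float32)
packed, scale, bias = quantize_rowwise_4bit(x)
x_hat = dequantize_rowwise_4bit(packed, scale, bias)
```

Halving the byte count per element is paid for with a coarser grid (16 levels per row instead of 256), plus a small extra error from the half-precision scale and bias.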
- at::Tensor _float_to_hfp8_gpu(const at::Tensor &input, const int64_t ebits, const int64_t exponent_bias, const double max_pos)

  Converts a tensor of `float` values into a tensor of Hybrid 8-bit Floating Point (`hfp8`) values.

  - Parameters:
    - input – A tensor of `float` values
    - ebits –
    - exponent_bias –
    - max_pos –
  - Throws:
    `c10::Error` – if `ebits <= 0` or `exponent_bias <= 0`
  - Returns:
    A new tensor with values from the input tensor converted to `hfp8`.
- at::Tensor _hfp8_to_float_gpu(const at::Tensor &input, const int64_t ebits, const int64_t exponent_bias)

  Converts a tensor of Hybrid 8-bit Floating Point (`hfp8`) values into a tensor of `float` values.

  - Parameters:
    - input – A tensor of `hfp8` values
    - ebits –
    - exponent_bias –
  - Throws:
    `c10::Error` – if `ebits <= 0` or `exponent_bias <= 0`
  - Returns:
    A new tensor with values from the input tensor converted to `float`.
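An `hfp8` value is an 8-bit minifloat: a sign bit, `ebits` exponent bits with a configurable `exponent_bias`, and `7 - ebits` mantissa bits. A minimal Python sketch of encode/decode for positive normal values only, with an assumed example configuration of `ebits = 4`, `bias = 7` (illustrative; the real kernels also handle signs, subnormals, and saturation to `max_pos`):

```python
import math

EBITS = 4              # exponent bits (assumed example configuration)
BIAS = 7               # exponent bias (assumed example configuration)
MBITS = 7 - EBITS      # remaining mantissa bits

def hfp8_encode(v: float) -> int:
    """Encode a positive normal float as (biased_exponent << MBITS) | mantissa."""
    mant, exp = math.frexp(v)           # v = mant * 2**exp, mant in [0.5, 1)
    e = exp - 1 + BIAS                  # biased exponent for v = (1 + f) * 2**(exp - 1)
    f = 2.0 * mant - 1.0                # fractional part in [0, 1)
    m = round(f * (1 << MBITS))         # quantize mantissa to MBITS bits
    if m == (1 << MBITS):               # mantissa rounded up: carry into exponent
        m, e = 0, e + 1
    return (e << MBITS) | m

def hfp8_decode(code: int) -> float:
    e = code >> MBITS
    m = code & ((1 << MBITS) - 1)
    return (1.0 + m / (1 << MBITS)) * 2.0 ** (e - BIAS)

# Values exactly representable with a 3-bit mantissa survive the round trip.
vals = [1.0, 1.5, 0.75, 2.25]
```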
- at::Tensor _float_to_msfp_gpu(const at::Tensor &input, const int64_t bounding_box_size, const int64_t ebits, const int64_t mbits, const int64_t bias, const double min_pos, const double max_pos)

  Converts a tensor of `float` values into a tensor of Microsoft Floating Point (`msfp`) values.

  - Parameters:
    - input – A tensor of `float` values
    - bounding_box_size –
    - ebits –
    - mbits –
    - bias –
    - min_pos –
    - max_pos –
  - Returns:
    A new tensor with values from the input tensor converted to `msfp`.
- at::Tensor _msfp_to_float_gpu(const at::Tensor &input, const int64_t ebits, const int64_t mbits, const int64_t bias)

  Converts a tensor of Microsoft Floating Point (`msfp`) values into a tensor of `float` values.

  - Parameters:
    - input – A tensor of `msfp` values
    - ebits –
    - mbits –
    - bias –
  - Returns:
    A new tensor with values from the input tensor converted to `float`.
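MSFP is a block floating-point format: each bounding box of `bounding_box_size` consecutive values shares a single exponent, and each value keeps only its sign and an `mbits`-bit mantissa. A minimal NumPy sketch of shared-exponent quantization under those assumptions (illustrative only; handling of `ebits`, `bias`, `min_pos`, and `max_pos` is omitted):

```python
import numpy as np

def msfp_roundtrip(x: np.ndarray, box: int = 4, mbits: int = 5) -> np.ndarray:
    """Quantize with one shared exponent per block of `box` values."""
    x = x.astype(np.float32).reshape(-1, box)
    # Shared exponent: large enough to cover the biggest magnitude in the block.
    max_abs = np.abs(x).max(axis=1, keepdims=True)
    exp = np.floor(np.log2(np.maximum(max_abs, 1e-30))) + 1.0
    step = 2.0 ** (exp - mbits)                  # mantissa quantization step
    mant = np.clip(np.round(x / step), -(1 << mbits), (1 << mbits) - 1)
    return (mant * step).reshape(-1)

x = np.random.default_rng(2).standard_normal(16).astype(np.float32)
x_hat = msfp_roundtrip(x)
```

Sharing the exponent across a block amortizes its storage cost, but values much smaller than the block maximum lose relative precision.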
- Tensor _float_to_paddedFP8rowwise_gpu(const Tensor &input, const bool forward, const int64_t row_dim)

  Converts a tensor of `float` values into a tensor of padded `fp8` rowwise values.

  - Parameters:
    - input – A tensor of `float` values. The dtype can be either `SparseType::FP32`, `SparseType::FP16`, or `SparseType::BF16`
    - forward –
    - row_dim –
  - Returns:
    A new tensor with values from the input tensor converted to padded `fp8` rowwise.
- at::Tensor _paddedFP8rowwise_to_float_gpu(const at::Tensor &input, const bool forward, const int64_t row_dim, const int64_t output_last_dim, const int64_t output_dtype)

  Converts a tensor of padded `fp8` rowwise values into a tensor of `float` values.

  - Parameters:
    - input – A tensor of padded `fp8` rowwise values
    - forward –
    - row_dim –
    - output_last_dim –
    - output_dtype – The target floating point type, specified as the integer representation of the `SparseType` enum
  - Throws:
    `c10::Error` – if `output_dtype` is not one of `SparseType::FP32`, `SparseType::FP16`, or `SparseType::BF16`
  - Returns:
    A new tensor with values from the input tensor converted to `float`.
CPU Operators
- Tensor &_fused8bitrowwise_to_float_cpu_out(Tensor &output, const Tensor &input)
- Tensor &_float_to_fused8bitrowwise_cpu_out(Tensor &output, const Tensor &input)
- Tensor float_to_fused8bitrowwise_cpu(const Tensor &input)
- Tensor half_to_fused8bitrowwise_cpu(const Tensor &input)
- Tensor float_or_half_to_fused8bitrowwise_cpu(const Tensor &input)
- Tensor fused8bitrowwise_to_float_cpu(const Tensor &input)
- Tensor fused8bitrowwise_to_half_cpu(const Tensor &input)
- Tensor fused8bitrowwise_to_float_or_half_cpu(const Tensor &input, const int64_t output_dtype, const bool scale_bias_last, const bool quant_padding_float_type)
- Tensor float_to_FP8rowwise_cpu(const Tensor &input, bool forward)
- Tensor FP8rowwise_to_float_cpu(const Tensor &input, bool forward, const int64_t output_dtype)
- Tensor fusednbitrowwise_to_float_cpu(const Tensor &input, const int64_t bit_rate)
- Tensor fusednbitrowwise_to_half_cpu(const Tensor &input, const int64_t bit_rate)
- Tensor fusednbitrowwise_to_float_or_half_cpu(const Tensor &input, const int64_t bit_rate, const int64_t output_dtype)
- void FloatToFP8Quantized_ref(const float *const input, const size_t nrows, const size_t ncols, uint8_t *const output, const int ebits, const int exponent_bias, const double max_pos)
- void FP8QuantizedToFloat_ref(const uint8_t *const input, const size_t nrows, const size_t ncols, float *const output, const int ebits, const int exponent_bias)