Category: Quantization

Gemmlowp

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. Quantization scheme, Equation (1): \[ r = S(q - Z) \] where \(S\) is the scale and \(Z\) is the zero-point.
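To make Equation (1) concrete, here is a minimal sketch in plain Python. It assumes an unsigned 8-bit range (0 to 255) and picks the scale and zero-point from the observed minimum and maximum of the data. The helper names `choose_params`, `quantize`, and `dequantize` are hypothetical and are not taken from gemmlowp or the paper's code.

```python
def choose_params(rmin, rmax, qmin=0, qmax=255):
    """Pick scale S and zero-point Z for the real range [rmin, rmax]."""
    # Widen the range to include 0.0 so that real zero maps to an exact integer.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero input; any positive scale works
    zero_point = round(qmin - rmin / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(r, scale, zero_point, qmin=0, qmax=255):
    """Invert r = S(q - Z): q = round(r / S) + Z, clamped to [qmin, qmax]."""
    q = round(r / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Equation (1): r = S(q - Z)."""
    return scale * (q - zero_point)


values = [-1.0, 0.0, 0.5, 2.0]
scale, zero_point = choose_params(min(values), max(values))
codes = [quantize(v, scale, zero_point) for v in values]
restored = [dequantize(q, scale, zero_point) for q in codes]
```

Because the zero-point is itself an integer in the quantized range, the real value 0.0 round-trips exactly, and every other in-range value is recovered to within half a scale step.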

A Summary of Quantization Algorithms

https://nervanasystems.github.io/distiller/algo_quantization/index.html Quantization algorithms. Range-based linear quantization. Breaking down the terminology: Linear: a float value is quantized by multiplying it by a numeric constant (the scale factor).

TensorRT Quantization Guide

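Symmetric linear quantization: \[ \text{tensor values} = \text{FP32 scale factor} \times \text{int8 array} \] One FP32 scale factor is used for the entire int8 tensor. Q: How is the scale factor set? Non-saturating approach: map |max| to 127, as shown in the figure below. In general, mapping this way leads to …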
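The non-saturating mapping can be sketched in a few lines of plain Python. This is only an illustration of the formula above, not TensorRT's API: the function names are hypothetical, and a single per-tensor scale is assumed, chosen so that the largest absolute value lands on 127.

```python
def symmetric_scale(values, qmax=127):
    """One FP32 scale for the whole tensor: |max| maps to qmax."""
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize_symmetric(values, scale, qmax=127):
    """Round to the nearest int8 code; clamp to the symmetric range [-127, 127]."""
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize_symmetric(codes, scale):
    """tensor values = FP32 scale factor * int8 array."""
    return [scale * q for q in codes]


tensor = [-2.54, -0.3, 0.0, 1.27, 2.0]
scale = symmetric_scale(tensor)
int8_codes = quantize_symmetric(tensor, scale)
approx = dequantize_symmetric(int8_codes, scale)
```

Since the scale is derived from the single largest magnitude, one outlier stretches the step size for every other element in the tensor.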