2017 was the year of the last ImageNet competition. Over the years, deep neural networks brought the top-5 error rate on this famous data set down to 2.3%, well below the roughly 5% usually cited for a human annotator. It's time to push industrial applications, where new challenges await. On mobile and embedded devices, accuracy is not the only concern: models also need a small memory footprint, high processing speed and low power consumption.
That's where quantization steps in. The idea is to represent each weight in a network with only a few bits instead of a 32-bit floating point number. This drastically reduces the model size, which is crucial for fitting the model on a small chip. If we also quantize the activations of each layer, efficient hardware implementations become possible: cumbersome floating point multiplications can be replaced by cheaper operations such as integer or bitwise arithmetic, which makes inference faster and more energy-efficient. In a nutshell, this is what we work on at LeapMind!
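To make the idea concrete, here is a minimal sketch of one common scheme, symmetric uniform quantization, written in plain Python. It is an illustration under stated assumptions, not necessarily the method used at LeapMind or in the full article; the function names `quantize` and `dequantize` and the sample weights are made up for this example.

```python
from typing import List, Tuple


def quantize(weights: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Map float weights to signed integer codes of the given bit width.

    Assumption: symmetric uniform quantization with a single scale
    factor for the whole list (hypothetical helper, for illustration).
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1              # largest code, e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


weights = [0.4, -1.0, 0.2, 0.0]             # made-up example weights
codes, scale = quantize(weights, bits=8)
approx = dequantize(codes, scale)

print(codes)    # [51, -127, 25, 0]
print(approx)   # close to the original weights, within scale / 2 each
```

Each weight is now stored as an 8-bit integer instead of a 32-bit float, so the weights take a quarter of the memory, plus one shared scale factor. Lowering `bits` shrinks the model further at the cost of a larger rounding error.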
Marianne’s full article on Quantized Networks can be found on Medium. Please have a look: https://firstname.lastname@example.org/step-by-step-to-a-quantized-network-5d7da6c52af1