LeapMind BLOG

"Quantized Convolutional Neural Networks Implementation"

My name is Antonio and I recently joined LeapMind.
Two months have passed, and every single day I have left the office having
learned lots of interesting concepts related to deep learning. I feel lucky
and proud to be part of such an amazing team.

I am currently involved in the implementation of a quantized version of
the AlexNet convolutional network architecture with low-bit-width
weights, activations and gradients. It has been a great experience
to develop every component from scratch, without the help of any
third-party libraries or frameworks.

One of the most interesting parts was writing the main container for
holding both the weights and the values. Although these containers can
hold N-dimensional arrays, I imposed a restriction: the slowest-changing
dimensions are the ones used for sliding the kernel during a convolution.
With this arrangement we can convolve tensors of arbitrary dimensions and
still get as a result volumes composed of K 2-dimensional feature maps,
where K is the number of kernels to be convolved with the input. Hence,
the only limitation is that the input and the kernels share their first
N-2 dimensions. The last two dimensions of the kernel, the stride and the
padding define the size of each feature map, while the number of kernels
defines the number of channels of the output volume.
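
To make the shape rule above concrete, here is a minimal sketch in Python. It is not the code of our implementation: the function name `conv_output_shape` and its parameters are hypothetical, it assumes the same stride and padding in both sliding dimensions, and it uses the standard output-size formula for a strided, zero-padded convolution.

```python
def conv_output_shape(input_shape, kernel_shape, num_kernels, stride=1, padding=0):
    """Return (K, out_h, out_w) for K kernels slid over the last two dimensions.

    Hypothetical helper: the input and every kernel must have the same rank
    and share their first N-2 dimensions; only the last two are slid over.
    """
    if len(input_shape) != len(kernel_shape) or len(input_shape) < 2:
        raise ValueError("input and kernel must have the same rank (>= 2)")
    if tuple(input_shape[:-2]) != tuple(kernel_shape[:-2]):
        raise ValueError("input and kernel must share their first N-2 dimensions")

    in_h, in_w = input_shape[-2:]
    k_h, k_w = kernel_shape[-2:]

    # Standard formula: floor((in + 2 * pad - k) / stride) + 1
    out_h = (in_h + 2 * padding - k_h) // stride + 1
    out_w = (in_w + 2 * padding - k_w) // stride + 1
    if out_h < 1 or out_w < 1:
        raise ValueError("kernel does not fit inside the padded input")

    # One 2-dimensional feature map per kernel.
    return (num_kernels, out_h, out_w)


# Commonly cited first layer of AlexNet: 96 kernels of 3x11x11, stride 4.
first_layer = conv_output_shape((3, 227, 227), (3, 11, 11), 96, stride=4)
```

With those numbers each of the 96 feature maps is 55x55, whatever the number of leading dimensions shared by the input and the kernels.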

The rearrangement of the bits needed for convolving is also interesting,
because it is possible to obtain a representation that is independent of
the number of dimensions and at the same time suitable for computing the
dot products efficiently with bit-wise operations. If K kernels have to be
convolved with an input, we can reuse the bit-channel representation of the
input for every kernel. For now, I have started to use multiple threads so
that every feature map can be generated in parallel on CPU-only systems.
Because every map is generated essentially with bit-wise operators, we could
also implement GPU kernels easily, in the same way that every feature map is
sent to a different CPU thread.
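
As an illustration of the idea, the sketch below computes dot products with AND and popcount over bit planes, one common way to do it for unsigned low-bit-width values. It is an assumption about the general technique, not a copy of our container code: `to_bit_planes`, `bitwise_dot` and `dots_for_all_kernels` are hypothetical names, and the input patch and kernels are assumed to be already flattened into 1-dimensional sequences. Python threads are used here only to show the structure of the work split.

```python
from concurrent.futures import ThreadPoolExecutor


def to_bit_planes(values, bits):
    """Pack a flat sequence of unsigned `bits`-bit integers into `bits` ints.

    Bit i of plane b is set when bit b of values[i] is set.
    """
    planes = [0] * bits
    for i, v in enumerate(values):
        if not 0 <= v < (1 << bits):
            raise ValueError("value does not fit in the given bit width")
        for b in range(bits):
            if (v >> b) & 1:
                planes[b] |= 1 << i
    return planes


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def bitwise_dot(a_planes, w_planes):
    """Dot product from bit planes: sum over (m, k) of 2^(m+k) * popcount(a_m & w_k)."""
    total = 0
    for m, a in enumerate(a_planes):
        for k, w in enumerate(w_planes):
            total += popcount(a & w) << (m + k)
    return total


def dots_for_all_kernels(patch, kernels, a_bits, w_bits):
    """One dot product per kernel, each computed in its own worker thread."""
    # The bit planes of the input are built once and reused for every kernel.
    a_planes = to_bit_planes(patch, a_bits)
    w_planes_per_kernel = [to_bit_planes(k, w_bits) for k in kernels]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda w: bitwise_dot(a_planes, w), w_planes_per_kernel))


patch = [3, 1, 2, 0]                     # 2-bit activations
kernels = [[1, 0, 1, 1], [1, 1, 1, 1]]   # 1-bit weights
results = dots_for_all_kernels(patch, kernels, a_bits=2, w_bits=1)
```

The results match the ordinary integer dot products of the patch with each kernel. Because the packed representation is a flat bit string, the same routine works no matter how many dimensions the original tensors had.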

The implementation supports operators such as low-bit-width convolution,
reshaping, activation and quantization. The operands are always tensors,
which must follow the above-mentioned restriction in the case of the
convolution operator. I am trying to make it as compatible as possible with
other frameworks such as TensorFlow, so we can verify that our implementation
works as expected. Another advantage is being able to load pre-trained
models and run them using our implementation.
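
For readers unfamiliar with the quantization operator, here is a minimal sketch of one common scheme, uniform quantization of values in [0, 1] to a given bit width. This is an assumption chosen for illustration and not necessarily the exact scheme used in our implementation; `quantize_uniform` and `dequantize_uniform` are hypothetical names.

```python
import math


def quantize_uniform(values, bits):
    """Map floats to integer levels 0 .. 2^bits - 1 after clipping to [0, 1]."""
    levels = (1 << bits) - 1
    quantized = []
    for v in values:
        clipped = min(max(v, 0.0), 1.0)
        # Round half up, so the result does not depend on banker's rounding.
        quantized.append(int(math.floor(clipped * levels + 0.5)))
    return quantized


def dequantize_uniform(levels_list, bits):
    """Map integer levels back to floats in [0, 1]."""
    levels = (1 << bits) - 1
    return [q / levels for q in levels_list]


codes = quantize_uniform([0.0, 0.5, 1.0, 1.7, -0.2], bits=2)
restored = dequantize_uniform(codes, bits=2)
```

The integer codes produced this way are the kind of unsigned low-bit-width values that a bit-wise convolution can consume, and comparing `restored` against the output of another framework is one simple way to check an operator like this.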

I will talk about other details in the future. I am also currently interested
in other topics that I personally find fascinating, like the properties of the
loss surfaces of multilayer neural networks and how these surfaces change as the
number of parameters increases. Lately I have also been very curious about
Neural Turing Machines, which became very popular thanks to recent work at
Google that evolved the original idea into the Differentiable Neural Computer.
You can read about this in:
