Our technologies for energy-efficient deep learning at the edge will be key to bringing intelligent machines into every aspect of our world.
Quantization to low bit-widths
Quantization lets us use bit-manipulation operations to accelerate neural networks.
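As a minimal sketch (not our production code) of why bit manipulation helps: once weights and activations are binarized to {-1, +1} and packed into integer bitmasks, a dot product reduces to an XNOR followed by a popcount. The function names below are illustrative.

```python
# Illustrative sketch: a binarized dot product computed with bitwise operations.
# Values in {-1, +1} are packed one bit per value (1 for +1, 0 for -1); the dot
# product then becomes XNOR + popcount instead of multiply-accumulate.

def pack_bits(values):
    """Pack a sequence of +/-1 values into an integer bitmask."""
    mask = 0
    for i, v in enumerate(values):
        if v == 1:
            mask |= 1 << i
    return mask

def binary_dot(a_mask, b_mask, n):
    """Dot product of two {-1, +1} vectors of length n from their bitmasks.

    matches = popcount(XNOR(a, b)); dot = matches - (n - matches).
    """
    xnor = ~(a_mask ^ b_mask) & ((1 << n) - 1)  # XNOR, truncated to n bits
    matches = bin(xnor).count("1")              # popcount
    return 2 * matches - n

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
# Reference result: 1*1 + (-1)*1 + 1*(-1) + 1*1 = 0
print(binary_dot(pack_bits(a), pack_bits(b), 4))  # -> 0
```

On an FPGA the same idea maps to wide XNOR gates and popcount trees, which is what makes binarized networks a natural fit for that hardware.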
We develop our own original neural network architectures that run optimally on our target hardware.
We enable deep learning on low-power embedded FPGAs.
Software & Hardware Technology
Deep learning compression and acceleration
We develop highly efficient deep learning architectures that work with limited memory and at high speed, customizable to meet specific hardware requirements.
Our embedded deep learning solutions make use of unique hardware IP that can be implemented on small FPGA chips.
Automated flow for compressing convolutional neural networks for efficient edge computation with FPGA
Solutions based on deep convolutional neural networks (CNNs) are the current state of the art for computer vision tasks. Due to the large size of these models, they are typically run on clusters of CPUs or GPUs. However, power requirements and cost budgets can be a major hindrance to the adoption of CNNs for IoT applications. Recent research highlights that CNNs contain significant redundancy in their structure and can be quantized to lower bit-width parameters and activations while maintaining acceptable accuracy. Low bit-width and especially single bit-width (binary) CNNs are particularly suitable for mobile applications based on FPGA implementation, due to the bitwise logic operations involved in binarized CNNs. Moreover, the transition to lower bit-widths opens new avenues for performance optimization and model improvement. In this paper, we present an automated flow from trained TensorFlow models to an FPGA system-on-chip implementation of binarized CNNs. This flow involves quantization of model parameters and activations, generation of the network and model in embedded C, followed by automatic generation of the FPGA accelerator for binary convolutions. The automated flow is demonstrated through the implementation of binarized "YOLOV2" on the low-cost, low-power Cyclone-V FPGA device. Experiments on object detection using binarized YOLOV2 demonstrate significant benefits in model size and inference speed on FPGA compared to CPU and mobile CPU platforms. Furthermore, the entire automated flow from trained models to FPGA synthesis can be completed within one hour.
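The quantization step the abstract mentions can be illustrated with a small NumPy sketch (this is not the paper's actual flow; the function name and the XNOR-Net-style scheme are assumptions for illustration): each trained filter is replaced by its element-wise signs plus a per-filter scaling factor, the mean absolute value, which minimizes the L2 reconstruction error of a 1-bit approximation.

```python
# Illustrative sketch (not the paper's implementation): 1-bit quantization of a
# trained conv weight tensor. Each output filter is approximated as
# alpha * sign(W), with alpha = mean(|W|) over that filter.

import numpy as np

def binarize_weights(w):
    """Binarize a conv weight tensor of shape (out_ch, in_ch, kh, kw).

    Returns (signs, alpha): signs is in {-1, +1} with the same shape as w,
    alpha is one positive scale per output channel; w ~= alpha * signs.
    """
    signs = np.where(w >= 0, 1.0, -1.0)
    alpha = np.abs(w).reshape(w.shape[0], -1).mean(axis=1)  # per-filter scale
    return signs, alpha

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 8, 3, 3)).astype(np.float32)
signs, alpha = binarize_weights(w)
w_hat = alpha[:, None, None, None] * signs
print(np.abs(w - w_hat).mean())  # mean error of the 1-bit approximation
```

At inference time only the packed sign bits and the scalar alphas need to be stored, which is where the large reduction in model size comes from.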
Antonio T. Vilchez,
NIPS MLPCD Workshop, 2017
Comparison of Deep Learning Models for Semantic Segmentation on Domain Specific Data in Food Processing
In recent years, deep convolutional neural networks (CNNs) have set the state of the art for semantic segmentation. However, the reported results are commonly based on large public datasets covering a variety of outdoor/indoor images or medical domains, while the performance of these methods on limited domain-specific datasets remains an open question for both the research community and practitioners. We present experimental results obtained using deep semantic segmentation for a domain-specific task. The aim of the task is the accurate localization of certain bones in the leg part of raw pork meat, to automate an essential step of the food processing pipeline.