
Make things come alive with deep learning

Our technologies for energy-efficient deep learning at the edge will be the key to bringing intelligent machines into every aspect of our world.

Quantization to low bit-widths

Quantization enables us to use bit manipulation operations to accelerate neural networks.
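As an illustration of the bit manipulation this refers to: when weights and activations are binarized to {-1, +1}, a dot product reduces to an XNOR followed by a popcount. The function below is a minimal sketch of that idea (the names and bit packing are illustrative, not this company's actual implementation).

```python
# Sketch: dot product of two {-1, +1} vectors packed into integers,
# computed with bitwise logic instead of multiply-accumulate.
# Encoding assumption: bit value 1 represents +1, bit value 0 represents -1.

def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors packed as bits."""
    # XNOR marks the positions where the two signs agree.
    agree = ~(a_bits ^ w_bits) & ((1 << n) - 1)
    matches = bin(agree).count("1")  # popcount
    # Each agreeing position contributes +1, each disagreeing one -1.
    return 2 * matches - n

# a and w agree in 2 of 4 positions, so the dot product is 0.
print(binary_dot(0b1011, 0b1101, 4))  # -> 0
```

On hardware, XNOR and popcount map to a handful of logic gates, which is why binarized layers fit comfortably on small FPGAs.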

Efficient architecture

We develop our own original neural network architectures that run optimally on our target hardware.


We enable deep learning on low-power embedded FPGAs.

Key Technologies

Software & Hardware Technology


Deep learning compression and acceleration

We develop highly efficient deep learning architectures that run at high speed within limited memory and can be customized to meet specific hardware requirements.


Hardware Design

Our embedded deep learning solutions make use of unique hardware IPs that can be implemented on small FPGA chips.


Automated flow for compressing convolution neural networks for efficient edge-computation with FPGA


Deep convolutional neural networks (CNN) based solutions are the current state-of-the-art for computer vision tasks. Due to the large size of these models, they are typically run on clusters of CPUs or GPUs. However, power requirements and cost budgets can be a major hindrance in adoption of CNN for IoT applications. Recent research highlights that CNN contain significant redundancy in their structure and can be quantized to lower bit-width parameters and activations, while maintaining acceptable accuracy. Low bit-width and especially single bit-width (binary) CNN are particularly suitable for mobile applications based on FPGA implementation, due to the bitwise logic operations involved in binarized CNN. Moreover, the transition to lower bit-widths opens new avenues for performance optimizations and model improvement. In this paper, we present an automatic flow from trained TensorFlow models to FPGA system on chip implementation of binarized CNN. This flow involves quantization of model parameters and activations, generation of network and model in embedded-C, followed by automatic generation of the FPGA accelerator for binary convolutions. The automated flow is demonstrated through implementation of binarized "YOLOV2" on the low cost, low power Cyclone-V FPGA device. Experiments on object detection using binarized YOLOV2 demonstrate significant performance benefit in terms of model size and inference speed on FPGA as compared to CPU and mobile CPU platforms. Furthermore, the entire automated flow from trained models to FPGA synthesis can be completed within one hour.
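The quantization step the abstract describes maps trained real-valued parameters to single-bit values. A common approach (used, for example, in XNOR-Net-style binarization; whether this exact scheme matches the paper's tool chain is an assumption) is sign binarization with a scaling factor that minimizes the quantization error:

```python
import numpy as np

# Illustrative sketch of weight binarization, NOT the paper's actual flow:
# real-valued weights are mapped to {-1, +1} by sign, with a per-tensor
# scaling factor (the mean absolute value) retained to reduce the
# approximation error w ~= alpha * w_bin.

def binarize_weights(w: np.ndarray):
    alpha = float(np.abs(w).mean())        # scaling factor
    w_bin = np.where(w >= 0, 1, -1)        # sign binarization
    return w_bin.astype(np.int8), alpha

w = np.array([0.7, -0.3, 0.1, -0.9])
w_bin, alpha = binarize_weights(w)
print(w_bin.tolist(), alpha)  # -> [1, -1, 1, -1] 0.5
```

After this step each weight occupies a single bit plus one shared scale per tensor, which is what makes the reported model-size reduction and the FPGA bitwise-logic acceleration possible.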


  • Farhan Shafiq,
  • Takato Yamada,
  • Antonio T. Vilchez,
  • Sakyasingha Dasgupta


NIPS MLPCD Workshop, 2017


Comparison of Deep Learning Models for Semantic Segmentation on Domain Specific Data in Food Processing


In recent years deep convolutional neural networks (CNN) have set the state-of-the-art for semantic segmentation. However, the reported results are commonly based on large public datasets covering a variation of outdoor/indoor images or medical domains while the performance of these methods on limited domain specific datasets remains an open question to both the research community and practitioners. We present experimental results obtained using deep semantic segmentation for a domain specific task. The aim of the task is an accurate localization of certain bones of the leg part of raw pork meat to automate an essential aspect of the food processing pipeline.


  • Nicolas Loerbroks,
  • Piyawat (Patrick) Suwanvithaya,
  • Isabel Schwende,
  • Marko Simic,
  • Elie Magambo


CVPR Deep-Vision Workshop, 2018