Technology

We make deep learning faster and smaller by combining our original deep neural networks with dedicated hardware technology

Our technologies for energy-efficient deep learning on edge devices will be key to bringing intelligent machines into every aspect of our world.

Low-bit quantization Deep Neural Network

We make neural networks simpler and smaller. Drawing on a deep understanding of deep learning, we built a neural network that does not require floating-point circuits.
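As a rough illustration of what low-bit quantization means, the sketch below maps floating-point weights to small signed integers plus one shared scale factor. It uses a generic uniform symmetric scheme chosen for illustration; the function names, the sample weights, and the scheme itself are assumptions, not a description of our actual method.

```python
def quantize(values, bits):
    """Map floats to signed integer codes of the given bit width, plus one shared scale.

    Generic uniform symmetric quantization, used here only as an illustration.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1                      # 1 for 2-bit, 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # float step size per integer code
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, made up for this example.
weights = [0.82, -0.41, 0.05, -0.97, 0.33]
codes, scale = quantize(weights, bits=2)
approx = dequantize(codes, scale)
print(codes)   # every code fits in 2 bits
print(approx)  # coarse approximation of the original weights
```

Once the weights are stored as integer codes, the arithmetic inside a layer can stay in integers, with the scale applied once at the end. That is the general reason low-bit networks can avoid floating-point hardware.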

Dedicated circuit design for minimally quantized Deep Neural Networks

By replacing floating-point multiplication with subtraction, we created a more power-efficient circuit. It is ideal for IoT and automotive embedded devices that require real-time processing.
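To give a feel for how multiplications can disappear at very low bit widths, the sketch below shows a formulation widely used for 1-bit (binarized) networks: values of +1 and -1 are packed into bits, an XNOR marks the positions where two vectors agree, and the dot product is the number of agreements minus the number of disagreements. Whether the circuit described above works exactly this way is an assumption; the function names and sample vectors are hypothetical.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binarized values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n, with no multiplications."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 where the signs match
    matches = bin(agree).count("1")     # popcount
    mismatches = n - matches
    return matches - mismatches         # the only arithmetic left is a subtraction


# Hypothetical binarized activations and weights.
activations = [1, -1, 1, 1, -1, -1, 1, -1]
weights = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(activations, weights))
fast = binary_dot(pack_signs(activations), pack_signs(weights), len(activations))
print(reference, fast)  # both paths give the same value
```

In hardware, the XNOR and the bit count map onto simple logic gates, which is one reason binarized networks suit FPGA and dedicated-circuit implementations.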

Publication

Automated flow for compressing convolution neural networks for efficient edge-computation with FPGA

Abstract

Deep convolutional neural networks (CNN) based solutions are the current state-of-the-art for computer vision tasks. Due to the large size of these models, they are typically run on clusters of CPUs or GPUs. However, power requirements and cost budgets can be a major hindrance in adoption of CNN for IoT applications. Recent research highlights that CNN contain significant redundancy in their structure and can be quantized to lower bit-width parameters and activations, while maintaining acceptable accuracy. Low bit-width and especially single bit-width (binary) CNN are particularly suitable for mobile applications based on FPGA implementation, due to the bitwise logic operations involved in binarized CNN. Moreover, the transition to lower bit-widths opens new avenues for performance optimizations and model improvement. In this paper, we present an automatic flow from trained TensorFlow models to FPGA system on chip implementation of binarized CNN. This flow involves quantization of model parameters and activations, generation of network and model in embedded-C, followed by automatic generation of the FPGA accelerator for binary convolutions. The automated flow is demonstrated through implementation of binarized "YOLOV2" on the low cost, low power Cyclone-V FPGA device. Experiments on object detection using binarized YOLOV2 demonstrate significant performance benefit in terms of model size and inference speed on FPGA as compared to CPU and mobile CPU platforms. Furthermore, the entire automated flow from trained models to FPGA synthesis can be completed within one hour.

Authors

  • Farhan Shafiq
  • Takato Yamada
  • Antonio T. Vilchez
  • Sakyasingha Dasgupta

Venue

NIPS MLPCD Workshop, 2017

Publication

Comparison of Deep Learning Models for Semantic Segmentation on Domain Specific Data in Food Processing

Abstract

In recent years deep convolutional neural networks (CNN) have set the state-of-the-art for semantic segmentation. However, the reported results are commonly based on large public datasets covering a variation of outdoor/indoor images or medical domains while the performance of these methods on limited domain specific datasets remains an open question to both the research community and practitioners. We present experimental results obtained using deep semantic segmentation for a domain specific task. The aim of the task is an accurate localization of certain bones of the leg part of raw pork meat to automate an essential aspect of the food processing pipeline.

Authors

  • Nicolas Loerbroks
  • Piyawat (Patrick) Suwanvithaya
  • Isabel Schwende
  • Marko Simic
  • Elie Magambo

Venue

CVPR Deep-Vision Workshop, 2018