
Technology everywhere

Bringing Deep Learning to every corner of the world. LeapMind's compact, energy-efficient technology gives every kind of machine the intelligence to assist people.

Low-Bit Quantization

Quantizing models to low bit-widths enables acceleration through fast bitwise operations.
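To illustrate how low-bit quantization turns arithmetic into bitwise operations, here is a minimal, hypothetical sketch (not LeapMind's implementation): weights and activations quantized to {-1, +1} are packed into integer bitmasks, and their dot product reduces to an XNOR followed by a population count.

```python
# Hypothetical sketch of a binarized dot product via XNOR + popcount.
# Values are quantized to {-1, +1} and packed into an integer bitmask
# (bit = 1 encodes +1, bit = 0 encodes -1). Names are illustrative.

def pack_bits(values):
    """Pack a list of +/-1 values into a single integer bitmask."""
    mask = 0
    for i, v in enumerate(values):
        if v > 0:
            mask |= 1 << i
    return mask

def binary_dot(mask_a, mask_b, n):
    """Dot product of two packed +/-1 vectors of length n.

    XNOR marks positions where the two bits agree; each agreement
    contributes +1 and each disagreement -1, so the dot product is
    (matches) - (mismatches) = 2 * matches - n.
    """
    xnor = ~(mask_a ^ mask_b) & ((1 << n) - 1)  # keep only n valid bits
    matches = bin(xnor).count("1")
    return 2 * matches - n

a = [+1, -1, +1, +1]
b = [+1, +1, -1, +1]
print(binary_dot(pack_bits(a), pack_bits(b), len(a)))  # → 0
```

On hardware, the XNOR and popcount each cover an entire register (or a wide FPGA datapath) in a single operation, which is the source of the speedup over per-element floating-point multiply-accumulates.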

Efficient Architectures

We develop original neural network architectures designed with the target hardware in mind.

FPGA

Enabling Deep Learning on low-power embedded FPGAs.

Key Technologies

Software & Hardware Technology

Embedded Deep Learning requires advanced technology in both software and hardware.


Software Acceleration & Compression Technology

Compressing models while maintaining accuracy
Reducing model size makes it possible to greatly speed up execution. Our proprietary technology compresses model size while preserving accuracy.
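To give a rough sense of why compression matters for embedded targets, here is a back-of-the-envelope sketch (the parameter count is illustrative, not a LeapMind figure) of the storage saved when 32-bit float weights are quantized to a low bit-width.

```python
# Illustrative arithmetic: model storage at different weight bit-widths.
# The 50M-parameter figure is a made-up example of a mid-sized CNN.

def model_size_mb(num_params, bits_per_param):
    """Model size in megabytes for a given per-weight bit-width."""
    return num_params * bits_per_param / 8 / 1e6

params = 50_000_000
print(model_size_mb(params, 32))  # → 200.0 (float32 baseline, MB)
print(model_size_mb(params, 8))   # → 50.0  (8-bit quantization)
print(model_size_mb(params, 1))   # → 6.25  (binarized weights)
```

A 32x reduction in weight storage is what lets a model that would otherwise need external DRAM fit into the limited on-chip memory of an embedded FPGA.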


Hardware Design

Dedicated Deep Learning circuits on FPGAs run faster than a CPU
Circuits tailored to Deep Learning can be built directly on an FPGA. Running inference on the edge yields fast, real-time responses, enabling a wide range of applications such as IoT and embedded automotive devices.

Publication

Automated flow for compressing convolution neural networks for efficient edge-computation with FPGA

Abstract

Deep convolutional neural networks (CNN) based solutions are the current state-of-the-art for computer vision tasks. Due to the large size of these models, they are typically run on clusters of CPUs or GPUs. However, power requirements and cost budgets can be a major hindrance in adoption of CNN for IoT applications. Recent research highlights that CNN contain significant redundancy in their structure and can be quantized to lower bit-width parameters and activations, while maintaining acceptable accuracy. Low bit-width and especially single bit-width (binary) CNN are particularly suitable for mobile applications based on FPGA implementation, due to the bitwise logic operations involved in binarized CNN. Moreover, the transition to lower bit-widths opens new avenues for performance optimizations and model improvement. In this paper, we present an automatic flow from trained TensorFlow models to FPGA system on chip implementation of binarized CNN. This flow involves quantization of model parameters and activations, generation of network and model in embedded-C, followed by automatic generation of the FPGA accelerator for binary convolutions. The automated flow is demonstrated through implementation of binarized "YOLOV2" on the low cost, low power Cyclone-V FPGA device. Experiments on object detection using binarized YOLOV2 demonstrate significant performance benefit in terms of model size and inference speed on FPGA as compared to CPU and mobile CPU platforms. Furthermore, the entire automated flow from trained models to FPGA synthesis can be completed within one hour.

Authors

  • Farhan Shafiq,
  • Takato Yamada,
  • Antonio T. Vilchez,
  • Sakyasingha Dasgupta

Venue

NIPS MLPCD Workshop, 2017

Publication

Comparison of Deep Learning Models for Semantic Segmentation on Domain Specific Data in Food Processing

Abstract

In recent years deep convolutional neural networks (CNN) have set the state-of-the-art for semantic segmentation. However, the reported results are commonly based on large public datasets covering a variation of outdoor/indoor images or medical domains while the performance of these methods on limited domain specific datasets remains an open question to both the research community and practitioners. We present experimental results obtained using deep semantic segmentation for a domain specific task. The aim of the task is an accurate localization of certain bones of the leg part of raw pork meat to automate an essential aspect of the food processing pipeline.

Authors

  • Nicolas Loerbroks,
  • Piyawat (Patrick) Suwanvithaya,
  • Isabel Schwende,
  • Marko Simic,
  • Elie Magambo

Venue

CVPR Deep-Vision Workshop, 2018