
Two presentations at JSAI 2020 (Online conference)

This is Akina Tani, a planning staff member in the Codev Division at LeapMind.

LeapMind will give two presentations at the 34th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI2020) this year. JSAI2020 will be held online to prevent the spread of the novel coronavirus.
In this post, I will give an overview of the accepted papers.
(Last year, we presented "Reducing the computations in ConvRNNs" and "Final sample batch normalization". Click here for details)

Contents

  1. Paper (1). Designing Lightweight Feature Descriptor Networks with Depthwise Separable Convolution
  2. Paper (2). Gated extra memory recurrent unit for learning video representations
  3. JSAI 2020 online conference participation and lecture attendance
  4. Looking for motivated students for our internship program

Paper (1). Designing Lightweight Feature Descriptor Networks with Depthwise Separable Convolution

Official information: Time table
Presentation Date: Wed. Jun 10, 2020 10:00 AM - 10:20 AM
Webcast: Zoom
Program Number: 2K1-ES-2-04

Y. R. Wang and A. Kanemura, “Designing Lightweight Feature Descriptor Networks with Depthwise Separable Convolution”, Annual Conference of the Japanese Society for Artificial Intelligence (JSAI), Japan, 2020.

Extracting feature points and their descriptors from images is one of the fundamental techniques in computer vision, with many applications such as geometric fitting and camera calibration, and several deep learning models have been proposed for feature extraction and description tasks. However, existing feature descriptor networks have been developed with the intention of improving accuracy, and consideration for practical networks that can run on embedded devices has been somewhat deferred. The objective of this study was therefore to devise lightweight feature descriptor networks.
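
As background, the technique named in the title, depthwise separable convolution, factors a standard convolution into a per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution. The PyTorch sketch below is our own minimal illustration of this factorization, not the RF-Net code; the channel and kernel sizes are arbitrary:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise)
    convolution followed by a 1x1 (pointwise) convolution."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        # groups=in_ch applies one kernel_size x kernel_size filter
        # to each input channel independently
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch)
        # the 1x1 convolution then mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Parameter comparison for one example layer (64 -> 128 channels, 3x3)
standard = nn.Conv2d(64, 128, 3, padding=1)
separable = DepthwiseSeparableConv(64, 128)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))  # 73856 vs. 8960: roughly 8x fewer
```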

In our paper, we reported that it is possible to reduce the theoretical computational load of the detector model of a state-of-the-art local feature descriptor network (RF-Net) by 80%, with at worst an 11% degradation in performance for our final lightweight detector model on image matching tasks. Such a lightweight model will fit on more edge devices.

Below are some figures presented in the paper supporting our claims. Benchmarking was done following the RF-Net authors’ choice of the HPatches and EF datasets [arXiv].
For details, please refer to our paper, which will be published on the JSAI2020 website from May 20 onwards. [JSAI2020 Proceedings]

Model evaluation of our final lightweight RF-Net detector in terms of matching scores. "Reported" refers to the results reported in the RF-Net paper, and "Pretrained" refers to the performance of the pretrained RF-Net model available from https://github.com/Xylon-Sean/rfnet.

Percentage difference in matching score between our final lightweight RF-Net detector model and RF-Net’s "Reported" model, both measured against RF-Net’s "Pretrained" model.

Image matching result using the original RF-Net authors’ model

Image matching result using our final lightweight detector model


Paper (2). Gated extra memory recurrent unit for learning video representations

Official information: Time table
Presentation Date: Wed. Jun 10, 2020 4:30 PM - 4:50 PM
Webcast: Zoom
Program Number: 2K5-ES-2-03

D. Vazhenina and A. Kanemura, “Gated extra memory recurrent unit for learning video representations”, Annual Conference of the Japanese Society for Artificial Intelligence (JSAI), Kumamoto, Japan, 2020.

To learn useful representations of a video, it is important to model its dynamical structure. A model that can predict future observations is likely to have acquired a representation useful for a variety of visual perception tasks, such as object tracking and action recognition [1, 2]. Video frames (image sequences) contain both spatial and temporal information, which is commonly modelled by a variant of convolutional recurrent neural networks (ConvRNNs). The temporal information flow within recurrent nodes is controlled with a gating mechanism; for example, the most common baseline recurrent unit, ConvLSTM, uses three gates to control this flow [2]. Even though ConvLSTM combines the temporal prediction ability of RNNs with the efficient image processing of convolutional networks, this unit and its fewer-gated variants often keep only short-term context (activating the forget gate too often) for next-frame prediction [3]. As a result, the shape of an object is not maintained well after the object has been occluded, causing large errors in next-frame prediction [3, 4]. To cope with this drawback of the plain ConvLSTM, increasing the number of gates and making the memory units more elaborate are effective approaches. Such modifications allow the unit to keep object shapes over a longer time period, making video representations more reliable. However, the computational load of such models is an obstacle to deploying them on embedded devices and applying them to real-world applications.
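
For reference, a minimal PyTorch sketch of a ConvLSTM cell is given below. It follows the standard three-gate formulation; the peephole connections of the original ConvLSTM paper [2] are omitted for brevity, so this is an illustration rather than the exact published equations:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: three gates (input, forget, output)
    control the temporal information flow through the cell memory."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        # One convolution produces all four gate pre-activations at once
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state  # hidden state and cell memory, each (B, hid_ch, H, W)
        i, f, o, g = torch.chunk(self.conv(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)  # forget old memory, write new content
        h = o * torch.tanh(c)          # expose the gated memory as output
        return h, c
```

An overly active forget gate f drives old cell contents toward zero, which is the short-term-context behaviour described above.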

In this JSAI 2020 paper, we propose a ConvRNN unit with additional memory and a reduced number of gates: the gated extra memory recurrent unit (GEM-RU), which has two memory units to keep temporal changes over frame sequences and two gates to control the temporal information flow. As can be seen from Figure 1, these changes improved the ability to keep object information over the baseline ConvLSTM, with fewer trainable parameters per layer than the state-of-the-art unit. We further reduced the number of parameters and multiplications by replacing convolution operators with the Hadamard product, as proposed in our previous work [5], which showed that the ConvIndConvLSTM (CIC-LSTM) unit obtained by this replacement provided better performance. We therefore applied the same approach to our proposed unit and substituted the Hadamard product for the second convolution operation in the two gate equations, resulting in the CIC-GEM-RU unit. The resulting model outperformed the ConvLSTM baseline by 5% in MSE while reducing the number of parameters by 14% and the number of multiplications by 25%. Such a reduction in computational load will be useful for real-world applications.
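
To illustrate the Hadamard replacement, the sketch below contrasts a gate computed with two convolutions against one where the recurrent convolution is replaced by an element-wise (Hadamard) product with a learnable weight. This is our own simplified illustration, not the exact GEM-RU or CIC-GEM-RU equations; in particular, the per-channel shape of the recurrent weight w_h is an assumption:

```python
import torch
import torch.nn as nn

class ConvGate(nn.Module):
    """Conventional gate: sigmoid(conv(x) + conv(h))."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.conv_x = nn.Conv2d(in_ch, hid_ch, k, padding=k // 2)
        self.conv_h = nn.Conv2d(hid_ch, hid_ch, k, padding=k // 2)

    def forward(self, x, h):
        return torch.sigmoid(self.conv_x(x) + self.conv_h(h))

class HadamardGate(nn.Module):
    """Gate with the recurrent convolution replaced by a Hadamard
    product, in the spirit of CIC-LSTM [5]: the k*k*hid_ch*hid_ch
    recurrent kernel shrinks to a single weight per channel."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.conv_x = nn.Conv2d(in_ch, hid_ch, k, padding=k // 2)
        # Per-channel recurrent weight, broadcast over spatial positions
        self.w_h = nn.Parameter(torch.ones(1, hid_ch, 1, 1))

    def forward(self, x, h):
        return torch.sigmoid(self.conv_x(x) + self.w_h * h)
```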
For details, please refer to our paper, which will be published on the JSAI2020 website from May 20 onwards. [JSAI2020 Proceedings]

Figure 1: Comparison of the proposed GEM-RU and reduced CIC-GEM-RU with the state-of-the-art and baseline units.

[1] Denton, Emily L. "Unsupervised learning of disentangled representations from video." Advances in Neural Information Processing Systems (NIPS), 2017.
[2] Shi, Xingjian, et al. "Convolutional LSTM network: A machine learning approach for precipitation nowcasting." Advances in Neural Information Processing Systems (NIPS), 2015.
[3] Wang, Yunbo, et al. "PredRNN: Recurrent neural networks for predictive learning using spatiotemporal LSTMs." Advances in Neural Information Processing Systems (NIPS), 2017.
[4] Wang, Yunbo, et al. "PredRNN++: Towards a resolution of the deep-in-time dilemma in spatiotemporal predictive learning." International Conference on Machine Learning (ICML), 2018.
[5] Vazhenina, Daria, and Atsunori Kanemura. "Reducing the number of multiplications in convolutional recurrent neural networks (ConvRNNs)." Advances in Intelligent Systems and Computing (AISC), vol. 1128, Springer, 2019.


JSAI 2020 online conference participation and lecture attendance

The 34th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI2020) will be held online, but you need to register in advance for participation and lecture attendance. Researchers related to artificial intelligence from across Japan will gather to present the latest technological trends, research results, and ideas, so please consider taking this opportunity to participate in the JSAI 2020 online conference.

To participate, please apply via the JSAI 2020 official website.


Looking for motivated students for our internship program

Stand shoulder to shoulder with skilled engineers from around the world and get hands-on experience at LeapMind.

LeapMind recruits students year-round who are interested in technical internships and deep learning.
Join us in software development, processor design, and other projects that provide opportunities to deliver cutting-edge technology to society.

Click here to apply for an internship.
Click here to apply as a full-time employee.
