LP-3DCNN: Unveiling local phase in 3D convolutional neural networks

Raman, Shanmuganathan

doi:10.1109/CVPR.2019.00504

LP-3DCNN: Unveiling local phase in 3D convolutional neural networks

Source

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

ISSN

10636919

Date Issued

2019-06-01

Author(s)

Kumawat, Sudhakar

Raman, Shanmuganathan

DOI

10.1109/CVPR.2019.00504

Volume

2019-June

Abstract

Traditional 3D Convolutional Neural Networks (CNNs) are computationally expensive, memory intensive, prone to overfit, and most importantly, there is a need to improve their feature learning capabilities. To address these issues, we propose Rectified Local Phase Volume (ReLPV) block, an efficient alternative to the standard 3D convolutional layer. The ReLPV block extracts the phase in a 3D local neighborhood (e.g., 3x3x3) of each position of the input map to obtain the feature maps. The phase is extracted by computing 3D Short Term Fourier Transform (STFT) at multiple fixed low frequency points in the 3D local neighborhood of each position. These feature maps at different frequency points are then linearly combined after passing them through an activation function. The ReLPV block provides significant parameter savings of at least, 3 3 to 13 3 times compared to the standard 3D convolutional layer with the filter sizes 3x3x3 to 13x13x13, respectively. We show that the feature learning capabilities of the ReLPV block are significantly better than the standard 3D convolutional layer. Furthermore, it produces consistently better results across different 3D data representations. We achieve state-of-the-art accuracy on the volumetric ModelNet10 and ModelNet40 datasets while utilizing only 11% parameters of the current state-of-the-art. We also improve the state-of-the-art on the UCF-101 split-1 action recognition dataset by 5.68% (when trained from scratch) while using only 15% of the parameters of the state-of-the-art.

Unpaywall

URI

https://d8.irins.org/handle/IITG2025/23254

Subjects

Deep Learning | Scene Analysis and Understanding | Video Analytics | Vision Applications and Systems