eduzhai > Applied Sciences > Engineering >

Pyramidal Convolution Rethinking Convolutional Neural Networks for Visual Recognition

  • Save

... pages left unread,continue reading

Document pages: 16 pages

Abstract: This work introduces pyramidal convolution (PyConv), which is capable ofprocessing the input at multiple filter scales. PyConv contains a pyramid ofkernels, where each level involves different types of filters with varying sizeand depth, which are able to capture different levels of details in the scene.On top of these improved recognition capabilities, PyConv is also efficientand, with our formulation, it does not increase the computational cost andparameters compared to standard convolution. Moreover, it is very flexible andextensible, providing a large space of potential network architectures fordifferent applications. PyConv has the potential to impact nearly everycomputer vision task and, in this work, we present different architecturesbased on PyConv for four main tasks on visual recognition: imageclassification, video action classification recognition, object detection andsemantic image segmentation parsing. Our approach shows significantimprovements over all these core tasks in comparison with the baselines. Forinstance, on image recognition, our 50-layers network outperforms in terms ofrecognition performance on ImageNet dataset its counterpart baseline ResNetwith 152 layers, while having 2.39 times less parameters, 2.52 times lowercomputational complexity and more than 3 times less layers. On imagesegmentation, our novel framework sets a new state-of-the-art on thechallenging ADE20K benchmark for scene parsing. Code is available at:this https URL

Please select stars to rate!

         

0 comments Sign in to leave a comment.

    Data loading, please wait...
×