Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs

Document pages: 14 pages

Abstract: The spread of deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNNs). Works have mainly focused on: i) efficient DNN architectures, ii) network optimisation techniques such as pruning and quantisation, iii) optimised algorithms to speed up the execution of the most computationally intensive layers, and iv) dedicated hardware to accelerate the data flow and computation. However, there is a lack of research on cross-level optimisation, as the space of approaches becomes too large to test exhaustively and obtain a globally optimised solution. This leads to suboptimal deployment in terms of latency, accuracy, and memory. In this work, we first detail and analyse the methods to improve the deployment of DNNs across the different levels of software optimisation. Building on this knowledge, we present an automated exploration framework to ease the deployment of DNNs. The framework relies on a Reinforcement Learning search that, combined with a deep learning inference framework, automatically explores the design space and learns an optimised solution that improves performance and reduces memory on embedded CPU platforms. We present a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms, achieving up to 4x improvement in performance and over 2x reduction in memory with negligible loss in accuracy with respect to the BLAS floating-point implementation.
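
As a rough illustration of what a Reinforcement Learning search over per-layer deployment choices could look like, the sketch below runs tabular Q-learning on a toy three-layer network. It is not the authors' framework. The implementation names (`blas_fp32`, `winograd_fp32`, `gemm_int8`), the latency table, and the data-type conversion penalty are all hypothetical values invented for this example, and accuracy and memory are not modelled at all.

```python
import random

# Hypothetical per-layer latency in ms for each candidate implementation.
# These numbers are invented for illustration; they are not measurements.
LATENCY_MS = [
    {"blas_fp32": 4.0, "winograd_fp32": 2.5, "gemm_int8": 1.5},
    {"blas_fp32": 3.0, "winograd_fp32": 3.5, "gemm_int8": 1.0},
    {"blas_fp32": 2.0, "winograd_fp32": 1.2, "gemm_int8": 1.4},
]
# Hypothetical cost in ms of converting data between fp32 and int8 layers.
CONVERSION_MS = 0.8


def dtype(impl):
    """Return the data-type suffix of an implementation name."""
    return impl.rsplit("_", 1)[1]


def step_cost(layer, prev_impl, impl):
    """Latency of running `layer` with `impl`, plus any conversion penalty."""
    cost = LATENCY_MS[layer][impl]
    if prev_impl is not None and dtype(prev_impl) != dtype(impl):
        cost += CONVERSION_MS
    return cost


def q_learning_search(episodes=2000, epsilon=0.2, alpha=0.5, seed=0):
    """Epsilon-greedy tabular Q-learning; the reward is negative latency."""
    rng = random.Random(seed)
    n_layers = len(LATENCY_MS)
    impls = list(LATENCY_MS[0])

    # State = (layer index, implementation chosen for the previous layer).
    q = {}
    for layer in range(n_layers):
        for prev in ([None] if layer == 0 else impls):
            q[(layer, prev)] = {impl: 0.0 for impl in impls}

    for _ in range(episodes):
        prev = None
        for layer in range(n_layers):
            state = (layer, prev)
            if rng.random() < epsilon:
                impl = rng.choice(impls)  # explore
            else:
                impl = max(q[state], key=q[state].get)  # exploit
            reward = -step_cost(layer, prev, impl)
            last = layer + 1 == n_layers
            future = 0.0 if last else max(q[(layer + 1, impl)].values())
            q[state][impl] += alpha * (reward + future - q[state][impl])
            prev = impl

    # Read out the learned plan by following the greedy policy.
    plan, total, prev = [], 0.0, None
    for layer in range(n_layers):
        impl = max(q[(layer, prev)], key=q[(layer, prev)].get)
        total += step_cost(layer, prev, impl)
        plan.append(impl)
        prev = impl
    return plan, total


if __name__ == "__main__":
    plan, total = q_learning_search()
    baseline = sum(layer["blas_fp32"] for layer in LATENCY_MS)
    print("learned plan:", plan)
    print("latency: %.1f ms (all-BLAS baseline: %.1f ms)" % (total, baseline))
```

Making the previous layer's choice part of the state lets the agent account for costs that only appear when neighbouring layers disagree, which is the kind of cross-level interaction that a purely per-layer choice would miss. In a real setting the table lookups would presumably be replaced by measurements taken on the target device.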
