Automatic deep heterogeneous quantization of Deep Neural Networks for ultra low-area low-latency inference on the edge at particle colliders

Document: 13 pages

Abstract: While the quest for more accurate solutions is pushing deep learning research towards larger and more complex algorithms, edge devices demand efficient inference, i.e. a reduction in model size, latency, and energy consumption. A technique to limit model size is quantization, i.e. using fewer bits to represent weights and biases. Such an approach usually results in a decline in performance. Here, we introduce a novel method for designing optimally heterogeneously quantized versions of deep neural network models for minimum-energy, high-accuracy, nanosecond inference and fully automated deployment on chip. With a per-layer, per-parameter-type automatic quantization procedure, sampling from a wide range of quantizers, model energy consumption and size are minimized while high accuracy is maintained. This is crucial for the event selection procedure in proton-proton collisions at the CERN Large Hadron Collider, where resources are strictly limited and a latency of $\mathcal{O}(1)~\mu$s is required. Nanosecond inference and a resource consumption reduced by a factor of $50$ when implemented on FPGA hardware are achieved.
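The core idea of heterogeneous quantization, as summarized in the abstract, is that each layer (and each parameter type within a layer) may be assigned its own bit width rather than a single global precision. The following is a minimal illustrative sketch of that idea using a simple fixed-point quantizer; the function name `quantize`, the layer names, and the `bit_config` assignment are all hypothetical and do not reproduce the paper's actual quantizer library or search procedure.

```python
import numpy as np

def quantize(x, bits, int_bits=0):
    """Fixed-point quantizer: 1 sign bit, int_bits integer bits,
    and the remaining bits fractional (illustrative only)."""
    frac_bits = bits - int_bits - 1
    scale = 2.0 ** frac_bits
    lo = -(2.0 ** int_bits)               # most negative representable value
    hi = 2.0 ** int_bits - 1.0 / scale    # most positive representable value
    return np.clip(np.round(x * scale) / scale, lo, hi)

# Heterogeneous assignment: each layer gets its own bit width, e.g. as
# chosen by an automated per-layer search (layer names are hypothetical).
rng = np.random.default_rng(0)
weights = {"dense_1": rng.normal(size=(4, 4)),
           "dense_2": rng.normal(size=(4, 2))}
bit_config = {"dense_1": 6, "dense_2": 3}  # hypothetical search result

quantized = {name: quantize(w, bit_config[name])
             for name, w in weights.items()}
```

A lower bit width shrinks the set of representable values (3 bits allow at most 8 distinct levels here), which is what reduces on-chip resource usage at the cost of precision; the automated search described in the abstract balances that trade-off per layer against accuracy.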
