Keynote Talk

Value-Based Deep Learning Hardware Acceleration

Andreas Moshovos

University of Toronto

Abstract

I will be reviewing our efforts in identifying value properties of Deep Learning models that hardware accelerators can use to improve execution-time performance and energy efficiency. Our goal is to do so without sacrificing accuracy and without requiring any changes to the model. I will be presenting our accelerator family, which includes designs that exploit these properties. Our accelerators exploit ineffectual activations and weights, their variable precision requirements, or even their value content at the bit level. We have demonstrated performance benefits of 50% up to 27x over a highly optimized execution engine for neural networks. Further, our accelerators also enable trading off accuracy on the fly for further performance and energy-efficiency improvements. I will emphasize our latest designs, Diffy, Tactical, and Laconic. Tactical targets sparse models, whereas Laconic achieves the highest performance when configured for embedded-class devices. Diffy opens up new opportunities for deep learning models as it favors models whose values are spatially correlated; Computational Imaging models naturally exhibit such values. All our designs work with out-of-the-box networks and require no modifications or retraining. They deliver immediate benefits but also provide an incentive for further innovation in model design, such as targeting a reduction in precision.
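To make the notion of "ineffectual" work concrete, the sketch below (illustrative only; it is not taken from any of the accelerators above, and all names and shapes are invented) counts the fraction of multiply-accumulate operations in a single fully-connected layer where either operand is zero. Because ReLU zeroes out roughly half of the activations, a large share of the layer's multiplications contribute nothing to the output and could in principle be skipped by hardware:

```python
import numpy as np

# Illustrative sketch: estimate the fraction of "ineffectual" multiplications
# (those with a zero activation or zero weight) in one fully-connected layer.
# Shapes and values are hypothetical, chosen only to demonstrate the idea.

rng = np.random.default_rng(0)

# ReLU-style activations: roughly half the entries become zero.
activations = np.maximum(rng.standard_normal(256), 0.0)
weights = rng.standard_normal((64, 256))  # 64 output neurons, 256 inputs each

def ineffectual_fraction(acts, wts):
    """Fraction of multiply-accumulate ops where either operand is zero."""
    # Each output neuron multiplies every activation by its own weight,
    # so broadcasting acts over the rows of wts enumerates all MACs.
    zero_ops = (acts == 0.0) | (wts == 0.0)
    return zero_ops.mean()

frac = ineffectual_fraction(activations, weights)
print(f"{frac:.2f} of the layer's MACs are ineffectual and could be skipped")
```

With dense random weights the ineffectual fraction here comes almost entirely from zero activations; a pruned (sparse) model, the kind Tactical targets, would raise it further via zero weights.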


Biography: Since 2000, Andreas Moshovos has been teaching how to design and optimize computing hardware engines at the University of Toronto. He has also taught at Northwestern University, USA, the University of Athens, Greece, the Hellenic Open University, Greece, and as an invited professor at the École Polytechnique Fédérale de Lausanne, Switzerland. He is a Fellow of the ACM.