Analysis of Quantization Across DNN Accelerator Architecture Paradigms
Source: Proceedings – Design, Automation and Test in Europe (DATE)
ISSN: 1530-1591
Date Issued: 2023-01-01
Author(s):
Abstract
Abstract
Quantization techniques promise to significantly reduce the latency, energy, and area associated with multiplier hardware. This work shows, to the best of our knowledge for the first time, the system-level impact of quantization on state-of-the-art (SOTA) DNN accelerators from different digital accelerator paradigms. Based on the placement of data and compute sites, we identify SOTA designs from the Conventional Hardware Accelerator (CHA), Near-Data Processing (NDP), and Processing-in-Memory (PIM) paradigms and show the impact of quantization when running inference on CNN and Fully Connected Layer (FCL) workloads. We show that the 32-bit implementation of the SOTA PIM design consumes less energy than the 8-bit implementation of the SOTA CHA design for FCL workloads, while the trend reverses for CNN workloads. Further, PIM latency remains stable as the word size scales, whereas CHA and NDP suffer a 20% to 2× slowdown when the word size doubles.
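The quantization the abstract refers to reduces the word size of weights and activations so that cheaper low-bit multipliers can be used. As an illustration only (this is not the paper's method, and all names here are hypothetical), a minimal sketch of symmetric 8-bit linear quantization in NumPy:

```python
import numpy as np

def quantize_symmetric_int8(w):
    """Map float weights onto signed 8-bit integers with a per-tensor scale."""
    qmax = 127  # top of the signed 8-bit range
    scale = np.max(np.abs(w)) / qmax  # largest-magnitude weight maps to 127
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Round-tripping a random weight tensor: the reconstruction error per
# element is bounded by half the quantization step (scale / 2).
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_symmetric_int8(w)
err = np.max(np.abs(w - dequantize(q, s)))
```

Dropping from 32-bit to 8-bit words in this way shrinks multiplier width, which is the source of the latency, energy, and area savings the abstract quantifies per accelerator paradigm.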
