Title: Hardware-Software Co-Design of a Collaborative DNN Accelerator for 3D Stacked Memories with Multi-Channel Data
Authors: Glint, Tom; Awasthi, Manu; Mekie, Joycee
Type: Conference Paper
Published in: Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC), 2024, pp. 454-459
ISBN: 979-8-3503-9354-5
DOI: 10.1109/ASP-DAC58780.2024.10473935
Scopus ID: 2-s2.0-85189366217
Repository handle: https://d8.irins.org/handle/IITG2025/29163
Keywords: Hardware-software co-design; Deep Neural Network Accelerator; HBM3

Abstract: Hardware accelerators are preferred over general-purpose processors for processing Deep Neural Networks (DNNs), as the latter suffer from power and memory walls. However, hardware accelerators designed as a logic chip separate from memory still suffer from the memory wall. Processing-in-memory accelerators, which attempt to overcome the memory wall by implementing compute elements as part of the memory structures, are highly constrained by the memory manufacturing process. Near-data-processing (NDP) based hardware accelerator design is an alternative paradigm that can combine the high bandwidth and low access energy of processing-in-memory with the design flexibility of a separate logic chip. However, NDP has area, data-flow, and thermal constraints that hinder high-throughput designs. In this work, we propose an HBM3-based NDP accelerator that tackles the constraints of NDP with a hardware-software co-design approach. The proposed design takes only 50% of the area, delivers a 3× speed-up, and is about 6× more energy efficient than the state-of-the-art NDP hardware accelerator for inference workloads such as AlexNet, MobileNet, ResNet, and VGG, without loss of accuracy.
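
To put the reported figures in perspective, the headline gains can be combined into standard composite metrics. A minimal worked example, assuming the 3× speed-up and 6× energy-efficiency gains apply to the same workload and compose multiplicatively (an assumption made here for illustration; the record itself reports the factors separately):

\[
\frac{\mathrm{EDP}_{\text{SotA}}}{\mathrm{EDP}_{\text{proposed}}}
  = \underbrace{3}_{\text{speed-up}} \times \underbrace{6}_{\text{energy gain}}
  = 18,
\qquad
\frac{\mathrm{EDAP}_{\text{SotA}}}{\mathrm{EDAP}_{\text{proposed}}}
  = \frac{18}{0.5} = 36.
\]

Here EDP denotes the energy-delay product and EDAP the energy-delay-area product; the factor 0.5 reflects the 50% area reported in the abstract.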