Title: AxLaM: Energy-efficient accelerator design for language models for edge computing
Authors: Tom Glint; Bhumika Mittal; Santripta Sharma; Abdul Qadir Ronak; Abhinav Goud; Neerja Kasture; Zaqi Momin; Aravind Krishna; Joycee Mekie
Type: Journal Article
Published: 16 January 2025
Date: 2025-08-31
Article number: 20230395
DOI: 10.1098/rsta.2023.0395
Scopus ID: 2-s2.0-85216008333
Other ID: 39815979
Web of Science ID: WOS:001408814800001
Handle: https://d8.irins.org/handle/IITG2025/28283
Keywords: hardware accelerator | language model BERT | transformer accelerator

Abstract:
Modern language models such as bidirectional encoder representations from transformers (BERT) have revolutionized natural language processing (NLP) tasks but are computationally intensive, limiting their deployment on edge devices. This paper presents an energy-efficient accelerator design tailored for encoder-based language models, enabling their integration into mobile and edge computing environments. The proposed data-flow-aware hardware accelerator, inspired by Simba, uses approximate fixed-point posit-based multipliers and high-bandwidth memory (HBM) to achieve significant improvements in computational efficiency, power consumption, area and latency over the hardware-realized scalable accelerator Simba. Compared with Simba, AxLaM achieves a ninefold energy reduction, a 58% area reduction and 1.2 times improved latency, making it suitable for deployment in edge devices. The energy efficiency of AxLaM is 1.8 TOPS/W, 65% higher than that of FACT, which requires pre-processing of the language model before implementing it on hardware. This article is part of the theme issue 'Emerging technologies for future secure computing platforms'.