VeriBench: Benchmarking Large Language Models for Verilog Code Generation and Design Synthesis

Mekie, Joycee

doi:10.1109/ISCAS56072.2025.11044004

Source

Proceedings IEEE International Symposium on Circuits and Systems

ISSN

0271-4310

Date Issued

2025-01-01

Author(s)

Agarwal, Mihir

Momin, Zaqi

Prasad, Kailash

Mekie, Joycee

DOI

10.1109/ISCAS56072.2025.11044004

Abstract

In the rapidly advancing field of hardware design, Electronic Design Automation (EDA) tools can be significantly improved using Machine Learning. This study evaluates the efficacy of various Large Language Models (LLMs) for automating three EDA tasks: Verilog design, testbench generation, and Formal Verification (FV) assertion synthesis. It compares 3 closed-source LLMs and 14 open-source LLM variants. In our setup of 33 Verilog designs, ChatGPT-4 generates 22 synthesizable Verilog designs in one shot without feedback, while the Llama 3 (8B) model generates 20. Both models generate all testbenches correctly, 9 of which are given in our setup. For Formal Verification properties, ChatGPT-4 generates all properties correctly, whereas Llama 3 synthesizes 7 out of 9 properties correctly. Of the samples synthesized in Vivado, ChatGPT-4 code results in more power-efficient designs than Llama 3 code, whereas in Genus there is no clear winner. These results underscore the efficacy of open-source models, which perform competitively despite having significantly fewer parameters (8 billion) than closed-source models such as ChatGPT-4. This study demonstrates the potential of parameter-efficient, open-source models for hardware design and verification tasks.

URI

https://d8.irins.org/handle/IITG2025/28346

Subjects