Title: Do Large Language Models Recognize Python Identifier Swaps in Their Generated Code?
Authors: Chavan, Sagar Bhikan; Mondal, Shouvick
Type: Conference Paper (Conference Proceeding)
Date of Publication: 10 July 2024
Date Available: 2025-08-31
ISBN: 9798400706585
DOI: 10.1145/3663529.3663869 (https://doi.org/10.1145/3663529.3663869)
Scopus ID: 2-s2.0-85199051042
URI: https://d8.irins.org/handle/IITG2025/28829
Pages: 663-664
Peer Reviewed: true
Keywords: Gemini Pro | LLMs | Python Identifier Swap

Abstract: Large Language Models (LLMs) have transformed natural language processing and generation in recent years. However, as the scale and complexity of these models grow, their ability to write correct and secure code has come under scrutiny. In our research, we critically examine LLMs including ChatGPT-3.5, legacy Bard, and Gemini Pro, and their proficiency in generating accurate and secure code, focusing in particular on the occurrence of identifier swaps in the code they produce. Our methodology encompasses the creation of a diverse dataset comprising a range of coding tasks designed to challenge the code generation capabilities of these models. Further, we employ Pylint for an extensive code quality assessment and conduct a manual multi-turn prompted "Python identifier-swap" test session to evaluate the models' ability to maintain context and coherence over sequential coding prompts. Our preliminary findings indicate a concern for developers: LLMs capable of generating higher-quality code can perform worse when asked to recognize identifier swaps.
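To illustrate the "Python identifier swap" concept named in the abstract, here is a minimal hypothetical sketch (the function and variable names are illustrative and not taken from the paper): exchanging two identifiers in a function body yields code that still parses and runs, yet silently computes the wrong result, which is exactly the kind of subtle defect a model must recognize.

```python
# Hypothetical example of a Python identifier swap (names are illustrative,
# not from the paper). Both functions are syntactically valid; only the
# second has its identifiers swapped in the body.

def net_balance(deposits, withdrawals):
    # Correct: net balance is deposits minus withdrawals.
    return deposits - withdrawals

def net_balance_swapped(deposits, withdrawals):
    # Identifier swap: 'deposits' and 'withdrawals' exchanged in the body.
    # Still runs without error, but the sign of the result is inverted.
    return withdrawals - deposits

print(net_balance(100, 30))          # correct result: 70
print(net_balance_swapped(100, 30))  # swapped result: -70
```

Because the swapped version raises no exception and passes any purely syntactic check, detecting it requires reasoning about the intended roles of the identifiers, which is why such swaps probe an LLM's semantic understanding rather than a linter's rule set.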