Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, yet their effectiveness varies depending on multiple factors, including correctness, efficiency, and maintainability. This study presents a comprehensive evaluation framework for assessing LLM-generated code based on a diverse set of metrics. These include correctness, execution time, cyclomatic complexity, maintainability, style compliance, redundancy, and code similarity, among others. Additionally, generated unit tests are analyzed for coverage and effectiveness. To enable flexible assessment, a benchmarking system is developed, allowing users to assign different weights to each metric based on the desired attributes of the generated code. Furthermore, the generated code is compared against reference implementations to determine whether LLMs can improve upon existing solutions. A functional analysis is performed using unit tests generated by the models. The results of this study provide valuable insights into the strengths and limitations of LLMs in software development, guiding future improvements and practical applications.
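The weighted benchmarking idea described above can be sketched as a simple aggregation: each metric is normalized to [0, 1] and combined using user-supplied weights. This is an illustrative sketch only; the metric names, weights, and the `weighted_score` function are hypothetical and not taken from the study itself.

```python
def weighted_score(metrics: dict, weights: dict) -> float:
    """Combine normalized metric values (0..1) into one benchmark score,
    scaled by user-assigned weights. Metrics absent from `metrics`
    contribute zero."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one metric must carry a non-zero weight")
    return sum(w * metrics.get(name, 0.0) for name, w in weights.items()) / total_weight

# Example: a user who prioritizes correctness over style compliance.
scores = {"correctness": 0.9, "execution_time": 0.7, "style_compliance": 0.5}
w = {"correctness": 0.6, "execution_time": 0.3, "style_compliance": 0.1}
print(round(weighted_score(scores, w), 3))  # 0.8
```

Dividing by the total weight keeps the score in [0, 1] regardless of how the weights are scaled, so users can express relative preferences without normalizing them first.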


