A New Era in Software Security: Towards Self-Healing Software via Large Language Models and Formal Verification

November 25, 2023 at 11:42 am

Abstract

In this paper we present a novel solution that combines the capabilities of Large Language Models (LLMs) with Formal Verification strategies to verify and automatically repair software vulnerabilities. Initially, we employ Bounded Model Checking (BMC) to locate the software vulnerability and derive a counterexample. The counterexample provides evidence that the system behaves incorrectly or contains a vulnerability. The counterexample that has been detected, along with the source code, are provided to the LLM engine. Our approach involves establishing a specialized prompt language for conducting code debugging and generation to understand the vulnerability's root cause and repair the code. Finally, we use BMC to verify the corrected version of the code generated by the LLM. As a proof of concept, we create ESBMC-AI based on the Efficient SMT-based Context-Bounded Model Checker (ESBMC) and a pre-trained Transformer model, specifically gpt-3.5-turbo, to detect and fix errors in C programs. Our experimentation involved generating a dataset comprising 1000 C code samples, each consisting of 20 to 50 lines of code. Notably, our proposed method achieved an impressive success rate of up to 80% in repairing vulnerable code encompassing buffer overflow and pointer dereference failures. We assert that this automated approach can effectively incorporate into the software development lifecycle's continuous integration and deployment (CI/CD) process.

Summary

This paper interfaces an LLM with BMC (bounded model checking) to confirm and repair bugs in C code. BMC, which is built on an SMT solver, compensates for the arithmetic reasoning that LLMs handle poorly. However, I tested the cases that the paper says the LLM fails on with GPT-4 (Aug. 3 version), and it did find the vulnerability, which is inconsistent with the paper's claim.
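
To make the pipeline concrete, the loop described in the abstract (BMC produces a counterexample, the LLM proposes a patch, BMC re-checks the patch) can be sketched roughly as below. This is only an illustration, not the authors' ESBMC-AI implementation: `VerifyResult`, `repair_loop`, `toy_verify`, `toy_repair`, and the `max_attempts` retry bound are all hypothetical names and assumptions of mine. The verifier and repairer are passed in as plain callables, so a real ESBMC invocation and a real LLM call would replace the toy stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class VerifyResult:
    """Outcome of one verification run (hypothetical structure)."""
    ok: bool
    counterexample: Optional[str] = None


# Assumed interfaces: a verifier maps source code to a result, and a
# repairer maps (source, counterexample) to a candidate patched source.
Verifier = Callable[[str], VerifyResult]
Repairer = Callable[[str, str], str]


def repair_loop(source: str, verify: Verifier, repair: Repairer,
                max_attempts: int = 3) -> Tuple[str, bool]:
    """Verify, repair on failure, and re-verify, up to max_attempts repairs.

    Returns the last candidate source and whether it passed verification.
    The retry bound is an assumption, not something stated in the abstract.
    """
    for _ in range(max_attempts):
        result = verify(source)
        if result.ok:
            return source, True
        # Hand the counterexample plus the code to the repair step.
        source = repair(source, result.counterexample or "")
    # Check the final candidate once more before reporting.
    return source, verify(source).ok


# Toy stand-ins so the sketch runs without ESBMC or an LLM.
def toy_verify(source: str) -> VerifyResult:
    """Pretend model checker: flags the off-by-one loop bound."""
    if "i <= 10" in source:
        return VerifyResult(False, "buf[10] written: array bounds violated")
    return VerifyResult(True)


def toy_repair(source: str, counterexample: str) -> str:
    """Pretend LLM: rewrites the loop bound named by the counterexample."""
    return source.replace("i <= 10", "i < 10")


VULNERABLE = "char buf[10]; for (int i = 0; i <= 10; i++) buf[i] = 0;"
fixed, verified = repair_loop(VULNERABLE, toy_verify, toy_repair)
```

The point of the structure is that the LLM's output is never trusted directly: a patch only counts as a repair if the verifier accepts it, which is the part of the approach that a standalone GPT-4 test does not cover.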

The evaluation uses 1000 test cases generated by an LLM, each only 20 to 50 lines long. It remains unclear whether the methodology applies to very large functions or to real-world code bases.