Finding Bugs Using Your Own Code: Detecting Functionally-similar yet Inconsistent Code

 February 1, 2024 at 4:01 pm

Applying cluster on code to find inconsistency.

Key insight: Our approach is inspired by the observation that many bugs in software manifest as inconsistencies deviating from their non-buggy counterparts, namely the code snippets that implement the similar logic in the same codebase. Such bugs, regardless of their types, can be detected by identifying functionally-similar yet inconsistent code snippets in the same codebase

Pros

  • Two-level clustering
  • Embedding on program dependency graph
  • Found bugs (at function level) in large projects
  • Embedding on code structures
  • Generality

Cons

  • Need repo-specific training
  • Literals are removed at the IR level

To abstract Constructs, we preserve only the variable types for each program statement and remove all variable names and versions.

  • Needs repository-specific configuration of thresholds
  • Still very high FP

Others

  • Granularity at the function level

If an inconsistent cluster contains more than a fixed number (e.g., 2) of deviating nodes (i.e., nodes in Table 2), the inconsistency is deprioritized because it is unlikely to be a true inconsistency (i.e., a single inconsistency rarely involves many deviations).