WaDec: Decompiling WebAssembly Using Large Language Model

Background

  • Previous WASM decompilers produce output with poor readability.
  • Existing LLM decompilers are not specialized for WASM binaries.
    • The stack-based architecture makes the assembly very hard to comprehend.
    • Cannot handle fine-grained snippets.
    • Cannot handle complicated structures, including nested loops.

Method

1. Dataset construction

  • Collected about 52,000 C programs.
  • Each sample consists of a tuple (WAT, C, Spatial_info, Temporal_info, Offset2String).
  • Main idea: String Substitution -> Variable Renaming -> Snippet Slicing

WAT (WebAssembly Text Format): Snippet of the human-readable format of the WASM binary
C: Snippet of the original source code
Spatial_info: Function signature (parameters & return values)
Temporal_info: Local variables defined before the snippet
Offset2String: Mapping from offsets to string constants
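
To make the sample layout concrete, here is a minimal Python sketch of one dataset entry. The class name `Sample`, the field types, and the example values are assumptions made for illustration; the post only names the five fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    """One dataset entry; field names mirror the five fields listed above."""
    wat: str                 # WAT snippet
    c: str                   # matching C snippet
    spatial_info: str        # function signature
    temporal_info: List[str] = field(default_factory=list)       # locals defined earlier
    offset2string: Dict[int, str] = field(default_factory=dict)  # offset -> string constant


# Hypothetical entry: the WAT and C text are made up, not taken from the dataset.
sample = Sample(
    wat="(call $puts (i32.const 1024))",
    c="puts(1024);",
    spatial_info="int main(void)",
    temporal_info=["int local_0"],
    offset2string={1024: "hello"},
)
```

The types are a guess: offsets are stored as integers here, but the real dataset may serialize them differently.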

String Substitution

  • String substitution enables the model to recover the string value correctly.
  • Every string is replaced with its offset in the data segment (the mapping is recorded in ‘Offset2String’).
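
The substitution and its inverse can be sketched as below. This is an illustration, not the authors' implementation: the function names are hypothetical, and writing the offset as a bare decimal number is an assumption about the placeholder format.

```python
import re

# Matches a C string literal, including escaped characters.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')


def substitute_strings(c_code, string2offset):
    """Replace each known string literal with its data-segment offset."""
    def repl(match):
        text = match.group(0)[1:-1]          # strip the surrounding quotes
        offset = string2offset.get(text)
        return match.group(0) if offset is None else str(offset)
    return STRING_LITERAL.sub(repl, c_code)


def restore_strings(c_code, offset2string):
    """Map offsets back to string literals (assumed post-processing step)."""
    def repl(match):
        offset = int(match.group(0))
        if offset in offset2string:
            return '"' + offset2string[offset] + '"'
        return match.group(0)
    # Caveat: any integer constant equal to a known offset is also rewritten.
    return re.sub(r"\b\d+\b", repl, c_code)


substituted = substitute_strings('printf("hello");', {"hello": 1024})
restored = restore_strings(substituted, {1024: "hello"})
```

Restoring by plain integer match is ambiguous when the code contains an ordinary constant with the same value, so a real pipeline would need a more distinctive placeholder.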

Variable Renaming

  • Variable renaming improves the consistency of recovered variable names.
  • Variables in both the source code and the WAT snippets are renamed to ‘local_N’.
  • Each variable name is mapped to a specific offset within a function using DWARF information.
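
A rough sketch of the renaming step on the C side follows. It assumes the name-to-index mapping has already been extracted (in the paper's pipeline this comes from DWARF information); the function `rename_variables` and the mapping shown are hypothetical.

```python
import re


def rename_variables(c_code, name_to_index):
    """Rename each known identifier to local_N using a precomputed mapping."""
    if not name_to_index:
        return c_code
    # Longest names first so the alternation never stops at a shorter prefix.
    names = sorted(name_to_index, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: f"local_{name_to_index[m.group(1)]}", c_code)


renamed = rename_variables(
    "int sum = a + b; sum += a;",
    {"a": 0, "b": 1, "sum": 2},   # assumed mapping, e.g. derived from DWARF
)
```

This regex-based version also rewrites matches inside string literals and comments; a parser-based rename would avoid that.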

Slicing

  • Slicing improves nested loop handling capability.
  • Code is sliced so that each snippet includes at most one loop statement.
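
The "at most one loop per snippet" rule can be illustrated with a greedy line-based splitter. This is a deliberately simplified stand-in for the paper's slicer, whose exact algorithm is not described here. It assumes one statement per line, only recognizes `for` and `while` headers (no `do`-`while`), and does not try to keep braces balanced within a slice.

```python
import re

# Assumption: a loop starts on any line containing a `for` or `while` keyword.
LOOP_HEADER = re.compile(r"\b(for|while)\b")


def slice_lines(lines):
    """Greedily group lines so that each slice holds at most one loop header."""
    slices, current, has_loop = [], [], False
    for line in lines:
        starts_loop = bool(LOOP_HEADER.search(line))
        if starts_loop and has_loop:
            # A second loop would enter the slice: close it and start a new one.
            slices.append(current)
            current, has_loop = [], False
        current.append(line)
        has_loop = has_loop or starts_loop
    if current:
        slices.append(current)
    return slices


nested = [
    "int s = 0;",
    "for (i = 0; i < n; i++) {",
    "for (j = 0; j < m; j++) {",
    "s += i * j;",
    "}",
    "}",
    "return s;",
]
parts = slice_lines(nested)
```

On the nested example, the outer loop header ends up in the first slice and the inner loop in the second, so each snippet shown to the model contains a single loop.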

2. Modeling

  • Fine-tuned from CodeLlama-7b-hf.
  • Next token prediction (CAUSAL_LM task type)
  • Synthesized a prompt (pn) to train the model.
    • In the evaluation prompt, cn is excluded.

i: Instruction (e.g. "You are a decompiler…")
Vbn: Previously defined variables
dn: Function signatures
wn: WAT code
cn: C code
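
The prompt assembly can be sketched as follows. The section markers (`### WAT:` and so on) and the function `build_prompt` are assumptions for illustration; the post lists only the components, not the actual template.

```python
def build_prompt(instruction, prior_vars, signature, wat, c_code=None):
    """Concatenate the prompt parts; omit c_code to get the evaluation prompt."""
    parts = [
        instruction,
        "### Previously defined variables:",   # assumed section marker
        "\n".join(prior_vars) if prior_vars else "(none)",
        "### Function signature:",
        signature,
        "### WAT:",
        wat,
        "### C:",
    ]
    prompt = "\n".join(parts) + "\n"
    if c_code is not None:
        prompt += c_code + "\n"                # training only: append the target C code
    return prompt


common = dict(
    instruction="You are a decompiler.",
    prior_vars=["int local_0", "int local_1"],
    signature="int add(int, int)",
    wat="(i32.add (local.get 0) (local.get 1))",
)
train_prompt = build_prompt(c_code="return local_0 + local_1;", **common)
eval_prompt = build_prompt(**common)
```

Because the evaluation prompt is a strict prefix of the training prompt, the model is asked at inference time to continue exactly where the C code would begin, which fits the next-token prediction setup.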

Evaluation

Similarity

  • Evaluated with the test set included in the dataset. About 4,000 C programs were used.
  • Overall, the decompiled code shows high similarity to the original source code.
  • Metrics
    • C@Func = (# functions in decompiled code)/(# total functions)

AED-S (Normalized AST edit distance): AST similarity (higher is better)
CCN (Cyclomatic complexity): Program complexity (higher is better)
COS (Cosine similarity): Token similarity (higher is better)
CodeBLEU: Structural & dataflow similarity at the AST level (higher is better)
C@Func (Function completeness): Ratio of functions recovered (higher is better)
Bloat Rate: Expansion in code lines (lower is better)
C@Syntax (Syntax completeness): Ratio of syntactically flawless statements (higher is better)
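
Three of the simpler metrics can be sketched in a few lines of Python. Only the C@Func formula is given above. The bloat-rate formula (relative growth in non-blank lines) and the tokenizer used for cosine similarity are assumptions and may differ from the paper's definitions.

```python
import math
import re
from collections import Counter


def function_completeness(n_decompiled, n_total):
    """C@Func = (# functions in decompiled code) / (# total functions)."""
    return n_decompiled / n_total if n_total else 0.0


def bloat_rate(original, decompiled):
    """Assumed definition: relative growth in the number of non-blank lines."""
    def count(src):
        return sum(1 for line in src.splitlines() if line.strip())
    base = count(original)
    return (count(decompiled) - base) / base if base else 0.0


def cosine_similarity(a, b):
    """Cosine similarity between token-count vectors of two code strings."""
    def tokenize(src):
        # Assumed tokenizer: identifiers/numbers, or single punctuation marks.
        return Counter(re.findall(r"\w+|[^\w\s]", src))
    ta, tb = tokenize(a), tokenize(b)
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(
        sum(v * v for v in tb.values())
    )
    return dot / norm if norm else 0.0
```

For example, a decompiled file that recovers 3 of 4 functions has a C@Func of 0.75, and a decompiled file with 3 lines against an original with 2 lines has a bloat rate of 0.5 under the assumed definition.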

Re-executability

  • The recompilation and re-execution rates are acceptable.
  • Output consistency with the original programs is low.

Discussion

Variable renaming in the aspect of readability

  • Variable renaming might be effective for consistency.
  • However, it can hurt the readability of the decompiled code.
  • This might be improved by post-processing the variable names with an LLM.

CodeBLEU score

  • The CodeBLEU score was 0.6353, which is relatively low.
  • The authors argue that CodeBLEU cannot accurately measure structural similarity.
    • This might be due to the rich semantics of the C language.

Consistency

  • The experiment shows that the outputs of the re-compiled binaries are not consistent.
  • This might be hard to resolve at the moment, as it is a fundamental limitation of language models.

Evaluated only on non-optimized binaries

  • Higher optimization levels may lower the model's accuracy.
