Threatlabz Research

Smash PostScript Interpreters Using A Syntax-Aware Fuzzer

A Journey of Fuzzing PostScript Interpreters Using A Syntax-Aware Fuzzer

Smash PostScript Interpreters Using a Syntax-Aware Fuzzer

In 2022, Zscaler’s ThreatLabz performed vulnerability hunting for some of the most popular PostScript interpreters using a custom-built syntax-aware fuzzer. The PostScript interpreters that were evaluated include Adobe Acrobat Distiller and Apple’s PSNormalizer. At the time of publication, ThreatLabz has discovered three vulnerabilities (CVE-2022-35665, CVE-2022-35666, CVE-2022-35668) in Adobe Acrobat Distiller and one vulnerability (CVE-2022-32843) in Apple’s PSNormalizer. This blog presents how the syntax-aware fuzzer was developed and analyzes the results.

 

PostScript Language

PostScript is a stack-based programming language, where values are pushed onto a stack and popped off by subsequent operations. It is a high-level language with a rich set of built-in operators for executing a wide range of tasks, including manipulating text, drawing shapes, and transforming graphics.

A PostScript interpreter executes a PostScript language according to the rules that determine the order in which operations are carried out and how the pieces of a PostScript program fit together to produce results. The interpreter manipulates entities that are called PostScript objects. Some objects are data, such as numbers, boolean values, strings, and arrays. Other objects are elements of programs to be executed, such as names, operators, and procedures. However, there is no distinction between data and programs. Any PostScript object can be treated as data or executed as part of a program. A character stream can be scanned according to the syntax rules of the PostScript language, producing a sequence of new objects. The interpreter operates by executing a sequence of objects. The PostScript interpreter can manage five stacks representing the execution state of a PostScript program: the operand stack, dictionary stack, execution stack, graphics state stack, and clipping path stack. The former three stacks are relevant for this blog and described below:

  • The operand stack is used to hold arbitrary PostScript objects that are the operands and results of PostScript operators being executed. The interpreter pushes objects on the operand stack when it encounters them as literal data in a program being executed. 
  • The dictionary stack is used to hold the dictionary objects that define the current context for PostScript operations.
  • The execution stack is used to hold the executable objects which are mainly procedures and files. At any point in the execution of a PostScript program, this stack represents the program’s call stack.

There are more than 300 operators supported in the PostScript language. Each operator description is presented in the following format illustrated in Figure 1.

Figure 1. A detailed description of the operator in the PostScript program

 

Syntax-Aware Fuzzer

We developed a syntax-aware fuzzer to find vulnerabilities in two popular PostScript interpreters: Acrobat Distiller and PSNormalizer. In order to implement a syntax-aware fuzzer, we first needed to write well-functioning grammar rules to parse all kinds of character streams for the PostScript language. We wrote a grammar file for the PostScript language in ANTLR (ANother Tool for Language Recognition). Second, we needed to implement a PostScript generator based on the grammar rules. We constructed the parse tree by walking through parser rules in the grammar file randomly, using the Python treelib package. After completing the construction of the parse tree, we traversed all terminal nodes in the parse tree and generated a new PostScript character stream to derive a new test case.

 

Grammar Development With ANTLR

A grammar file in ANTLR is made up of the parser rules and the lexer rules. The lexers are also known as tokenizers, which are the first step in creating a parser. A lexer takes the individual characters and transforms them into tokens. The parser then uses the tokens to create a logical structure and generate a parse tree. Figure 2 shows a code snippet of the grammar file for the PostScript language in ANTLRv4.