College of Science & Engineering
This group's research applies automated program analysis techniques to computer software, generally operating on compiled binaries, to improve the software's quality and security, such as creating test cases or finding bugs that represent security vulnerabilities. Two key techniques used in this work are symbolic execution using logical decision procedures, and test generation via fuzzing. Symbolic execution automates reasoning about the possible execution paths through software while retaining precision (e.g., not using information-losing approximations). Decision procedures are back-end tools that check whether logical combinations of mathematical statements are true or false: they allow symbolic execution to determine whether particular execution paths are feasible, and create input test cases for them if so. Because these techniques perform precise reasoning, they require substantial computing power to produce full results, as the software analyzed gets bigger; but generally they are organized into larger tasks that are highly parallelizable.
Fuzzing is another automated technique for exploring the execution space of a program and generating corresponding test cases. Fuzzing differs from symbolic execution in that symbolic execution considers a program execution path and then generates an input that triggers it, while fuzzing generates a program input and observes the path that the program executes while processing it. Because building and executing modified program inputs can be done quickly, fuzzing processes inputs at a much faster rate, but with less guarantee that an individual input will cause interesting behavior. Many concerns about how to effectively guide the search process apply to both symbolic execution and fuzzing, but differ in details.
These researchers are using MSI resources for a project that concerns searching for heap-related software vulnerabilities, using both symbolic execution and fuzzing. The "heap" is the name for the memory region programs use for allocating objects whose lifetime is not known at the time a program is compiled. Because decisions about heap allocation must be made at runtime, they require a library with data-structures to efficiently keep track of what memory has been allocated to what object. However, if a buggy program comes under attack, heap data structures can also be a point of vulnerability, because the heap management library may be tricked into changing memory in a way useful to an attacker if the data that it uses is maliciously corrupted. Vulnerabilities that can be exploited together with a heap management library are a cutting-edge area for automatic vulnerability finding using symbolic exploration, which this group is exploring using their FuzzBALL binary symbolic execution tool, and using enhanced versions of the open-source fuzzer AFL++. A major focus of the group's current research is the problem of searching for inputs to a program that cause the program to make a particular sequence of heap operations, for instance the heap operations that trigger a heap management library vulnerability. The target sequence of heap operations allows this to be a guided search process, but the decisions made by the symbolic execution or fuzzing tool need to be adjusted to efficiently meet the target.
MSI's parallel computing resources are particularly appealing for experiments using fuzzing, because while FuzzBALL is a single-threaded tool, AFL++ scales well across multiple cores: it runs separate copies of the engine that periodically exchange computed results. The researchers have observed close to linear scaling of AFL++'s progress towards our heap operation goals at the scale for instance of parallel execution up to 16 processes on a workstation for 1 hour versus a single-core job running for 16 hours. Thus being able to elastically use more parallel computation resources than our individual workstations improves research productivity even during exploratory work (much less large scale evaluation).