https://galois.com/blog/2022/08/mate-interactive-program-analysis-with-code-property-graphs/

MATE: Interactive Program Analysis with Code Property Graphs

* Wednesday, August 24, 2022
* Open-source
* Langston Barrett

Galois is open-sourcing MATE, a suite of tools for interactive program analysis with a focus on hunting for bugs in C and C++ code. MATE unifies application-specific and low-level vulnerability analysis using code property graphs (CPGs), enabling the discovery of highly application-specific vulnerabilities that depend on both implementation details and the high-level semantics of target C/C++ programs.

MATE primarily finds vulnerabilities by static program analysis over the target's CPG, which combines representations of a program's syntax, control-flow, and dependencies into a unified graph structure that can be queried to identify potential flaws. The MATE CPG consists of the target's:

* abstract syntax tree (AST)
* call graph (CG)
* control-flow graph (CFG)
* inter-procedural control-flow graph (ICFG)
* inter-procedural dataflow graph (DFG)
* control-dependence graph (CDG)
* points-to graph (PTG)
* source-code to machine-code mapping
* memory layout and DWARF type graph

MATE Tools

MATE comes with a number of applications built on top of the foundation of the CPG.
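Since every tool is built on the CPG, it can help to picture the CPG as a single set of nodes shared by several edge layers (call edges, dataflow edges, points-to edges, and so on). The following is a toy sketch in plain Python, not MATE's actual schema or API: the `ToyCPG` class, the layer names, and the node names are all assumptions made for illustration.

```python
from collections import defaultdict


class ToyCPG:
    """Toy layered graph: one node set, edges tagged with a layer name.

    Hypothetical illustration only; this is not MATE's schema or API.
    """

    def __init__(self):
        # layer name -> source node -> set of destination nodes
        self._edges = defaultdict(lambda: defaultdict(set))

    def add_edge(self, layer, src, dst):
        self._edges[layer][src].add(dst)

    def successors(self, node, layers):
        """Return nodes reachable in one step along any of the given layers."""
        result = set()
        for layer in layers:
            result |= self._edges[layer].get(node, set())
        return result


# Hypothetical program facts, one edge per layer-specific relationship.
cpg = ToyCPG()
cpg.add_edge("call", "main", "read_input")
cpg.add_edge("call", "main", "process")
cpg.add_edge("dataflow", "read_input:ret", "process:arg0")
cpg.add_edge("points_to", "process:arg0", "heap_obj_1")

# A query can stay within one layer or mix several layers.
call_successors = cpg.successors("main", ["call"])
mixed_successors = cpg.successors("process:arg0", ["dataflow", "points_to"])
print(sorted(call_successors))
print(sorted(mixed_successors))
```

Keeping all layers over the same nodes is what lets a single query mix, say, dataflow and points-to relationships without translating between separate analyses.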

Flowfinder

[Figure: Flowfinder displays a small fragment of a CPG in a browser window.]

Flowfinder is an interactive, graphical, browser-based user interface for exploring a program's code property graph. Similar to other program analysis tools, such as IDA Pro, Binary Ninja, and angr management, Flowfinder is designed to help answer questions such as "How does this data get from here to there and how is it changed along the way?" or "If I can control this buffer, what effect can I have on the execution of the program?"

By leveraging the CPG, Flowfinder enables inter-procedural analysis of program dataflows at a relatively high level of abstraction. Rather than navigating by jumping between views of the program's concrete syntax (whether disassembly or source code), Flowfinder is designed to support expanding and contracting semantic representations of code and data as needed and creating and manipulating visualizations of high-level flows between different components.

MATE Notebooks

[Figure: A web-based Jupyter notebook containing CPG queries written with the MATE Python API.]

MATE has a Python API for querying the CPG and exposes browser-based, interactive Jupyter notebooks with this query interface pre-loaded. These notebooks can be used to write complex, recursive, whole-program queries that answer detailed questions like "What sequences of function calls can lead from point A to point B in this program?" or "Can user input flow into a memory location with a specific struct type, and from there to some particular function without passing through one of these three sanitization routines?"

MATE POIs

[Figure: Flowfinder displays a CPG fragment relevant to understanding a possible use of a stack variable before initialization.]

MATE ships with a number of automated analyses that detect potential vulnerabilities, called Points of Interest (POIs).
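An analysis of this kind is, at bottom, a graph query. As a toy sketch in plain Python (deliberately not the MATE API), the question "can user input reach this function without passing through a sanitization routine?" can be phrased as a reachability search over dataflow edges that refuses to step through sanitizer nodes. The `flows_avoiding` helper, the dictionary encoding of the graph, and the node names are all hypothetical.

```python
from collections import deque


def flows_avoiding(dataflow, source, sink, sanitizers):
    """Return one dataflow path from source to sink that avoids every
    sanitizer node, or None if no such path exists (breadth-first search)."""
    if source in sanitizers or sink in sanitizers:
        return None
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            # Walk the parent links back to the source to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for succ in dataflow.get(node, ()):
            if succ not in parents and succ not in sanitizers:
                parents[succ] = node
                queue.append(succ)
    return None


# Hypothetical dataflow edges: node -> nodes its value flows into.
dataflow = {
    "recv": ["buf"],
    "buf": ["sanitize_path", "log_msg"],
    "sanitize_path": ["open"],
    "log_msg": ["open"],
}

# One flow bypasses the sanitizer by going through log_msg.
unsanitized = flows_avoiding(dataflow, "recv", "open", {"sanitize_path"})
# If both intermediate nodes are excluded, no flow remains.
no_flow = flows_avoiding(dataflow, "recv", "open", {"sanitize_path", "log_msg"})
print(unsanitized)
print(no_flow)
```

A real whole-program query would run over the CPG's dataflow and points-to layers rather than a hand-written dictionary, but the shape of the question is the same: find a path, subject to nodes it must not cross.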
These detectors are written in the same Python API available in the MATE notebooks, so it's easy to write additional application-, domain-, or API-specific detectors. Potential vulnerabilities found by these queries can be viewed in Flowfinder for collaboration and triage. The following table lists a few examples of MATE detectors (see the MATE documentation for a complete list):

Analysis                  CPG Layers  Description
PathTraversal             DFG         Finds calls to filesystem operations where the path may be influenced by user input
PointerDisclosure         DFG         Finds pointer-typed values that may be output to the user
OverflowableAllocations   DFG         Finds calls to malloc where the size calculation may be influenced by user input
TruncatedInteger          DFG         Finds calls to malloc where the size may be influenced by user input and the input is used elsewhere as a signed integer
IteratorInvalidation      CFG         Finds uses of C++ iterators subsequent to iterator-invalidating collection modifications
UninitializedStackMemory  CFG, PTG    Finds potential intra- and inter-procedural uses of uninitialized stack memory

Mantiserve

[Figure: Under-Constrained Manticore finds a potential out-of-bounds memory access.]

Mantiserve integrates the CPG with the Manticore symbolic execution tool. Symbolic execution complements MATE's facilities for reasoning about high-level data flows by providing a means to explore detailed, low-level issues like memory corruption. There are two primary modes of using Manticore with MATE.

Exploration: Mantiserve ships with "detectors" which use data from the CPG to detect memory corruption during symbolic execution.

* The Variable Bounds Access Detector searches for out-of-bounds memory accesses on the stack.
* The Uninitialized Stack Variable Detector searches for variables allocated on the stack and used prior to initialization.
* The Use After Free (UAF) Detector searches for and validates UAF vulnerabilities from calls to malloc and free.
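
The Use After Free check in the list above can be pictured as bookkeeping over a sequence of memory events. The sketch below is a toy in plain Python over a hand-written, concrete trace; it does not use Manticore or Mantiserve, and the event format and the `find_use_after_free` name are assumptions made for illustration.

```python
def find_use_after_free(trace):
    """Scan (op, address) events and report accesses to freed allocations.

    op is one of "malloc", "free", or "access". Returns a list of
    (event_index, address) pairs, one per access to freed memory.
    """
    freed = set()
    findings = []
    for index, (op, addr) in enumerate(trace):
        if op == "malloc":
            # The allocator may hand out a previously freed address again.
            freed.discard(addr)
        elif op == "free":
            freed.add(addr)
        elif op == "access" and addr in freed:
            findings.append((index, addr))
    return findings


# Hypothetical event trace for a single heap address.
trace = [
    ("malloc", 0x1000),
    ("access", 0x1000),
    ("free", 0x1000),
    ("access", 0x1000),  # use after free
    ("malloc", 0x1000),  # address reused by a new allocation
    ("access", 0x1000),  # fine: the memory is live again
]
findings = find_use_after_free(trace)
print(findings)
```

A symbolic executor explores many such traces at once and reasons about symbolic addresses, which is where most of the real difficulty lies; the per-trace check itself is this simple.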

Under-constrained symbolic execution: Unlike traditional symbolic execution, which begins at the program entry point and executes until the program exits, under-constrained symbolic execution starts at an arbitrary function in the program. This specificity means that under-constrained symbolic execution can analyze parts of programs that would be too large or complex for traditional symbolic execution. As explained in our previous blog post Under-Constrained Symbolic Execution with Crucible, under-constrained symbolic execution may lead to false positives due to unknown preconditions. MATE's under-constrained feature comes with a web UI that allows users to provide constraints to avoid this issue.

Use-Cases and Comparison to Other Tools

We built MATE with two primary use-cases in mind:

1. Use by security researchers to find bugs in C and C++ programs
2. Integration of the CPG and corresponding Python API into other applications

The following comparison covers tools with similar goals:

Tool: GitHub/Semmle CodeQL
* Commercial code analysis for C/C++, Java, etc.
* Custom logic-based query language
* Library of detectors for vulnerabilities and flaws

Advantages over MATE:
* Robust commercial implementation
* Faster whole-program analysis
* Large library of provided and community-authored vulnerability and code quality checks
* IDE integration

Disadvantages vs MATE:
* Does not explicitly model memory relationships
* Less comprehensive dataflow analyses
* Limited support for interactive exploration and visualization
* Difficult to integrate queries into custom workflows or applications

Tool: Joern
* Open source CPG analysis
* Custom query language DSL
* Coverage of wide range of languages

Advantages over MATE:
* Fuzzy parsing approach gives best-effort results on most programs without significant effort

Disadvantages vs MATE:
* Limited support for inter-procedural dataflow analysis
* Limited support for interactive exploration and visualization
* Difficult to integrate queries into custom workflows or app

Tool: Semgrep/weggli
* AST-aware grep for code

Advantages over MATE:
* Fast, low-latency interactions
* Syntax-focused analysis works on all programs without pre-processing

Disadvantages vs MATE:
* Limited support for inter-procedural dataflow analysis
* Limited support for interactive exploration and visualization

Tool: Other SAST tools, e.g., Coverity, CodeSonar, Veracode, Infer, Clang Static Analyzer

Advantages over MATE:
* Robust implementations
* Large libraries of provided and community-authored vulnerability and code quality checks
* IDE integration

Disadvantages vs MATE:
* Limited support for interactive exploration and visualization
* Difficult to integrate queries into custom workflows or app

Tool: Interactive binary-level bug-hunting tools, e.g., IDA Pro, Binary Ninja, and angr management

Advantages over MATE:
* Robust commercial implementation
* Can work without source code
* Community support and features (e.g., community-authored plugins)

Disadvantages vs MATE:
* Limited support for inter-procedural dataflow analysis
* Limited support for high-level (source-level) semantics

Limitations

MATE has several important limitations:

* MATE analyzes only statically-linked code, so it can't find bugs or follow control- and data-flows in dynamically-linked libraries without users writing detailed "signatures" for external code.
* MATE analyzes LLVM bitcode.
In practice, obtaining LLVM bitcode requires access to the source code and a project that can be compiled using clang/clang++, and it may require some mucking around with the build system. Additionally, it's much easier to use and understand MATE given familiarity with the LLVM language, but such familiarity is fairly uncommon.
* MATE's static analysis is fairly heavy-weight. The pointer analysis in particular requires a significant amount of time and RAM, on the order of hours and up to dozens of GB for large programs. Furthermore, these requirements don't relate predictably to program size or other features.
* MATE is still research-grade software. We have worked hard to make it robust, but not all of MATE's tools and features will work well on all programs.

Conclusion and Acknowledgements

We're happy to finally share MATE with the research community under the BSD 3-clause license. We look forward to discussing possible collaborations, additional use cases, and future research directions. Please reach out to mate@galois.com to start a conversation! More information about MATE is available in the project documentation.

MATE was developed collaboratively by Galois, Trail of Bits, and the lab of Dr. Stephen Chong at Harvard on the DARPA CHESS program.

This material is based upon work supported by the United States Air Force and Defense Advanced Research Projects Agency (DARPA) under Contract No. FA8750-19-C-0004.