https://galois.com/blog/2022/08/mate-interactive-program-analysis-with-code-property-graphs/

MATE: Interactive Program Analysis with Code Property Graphs

  * Wednesday, August 24, 2022
  * Open-source

  * Langston Barrett

Galois is open-sourcing MATE, a suite of tools for interactive
program analysis with a focus on hunting for bugs in C and C++ code.
MATE unifies application-specific and low-level vulnerability
analysis using code property graphs (CPGs), enabling the discovery of
highly application-specific vulnerabilities that depend on both
implementation details and the high-level semantics of target C/C++
programs.

MATE primarily finds vulnerabilities by static program analysis over
the target's CPG, which combines representations of a program's
syntax, control-flow, and dependencies into a unified graph structure
that can be queried to identify potential flaws. The MATE CPG
consists of the target's:

  * abstract syntax tree (AST)
  * call graph (CG)
  * control-flow graph (CFG)
  * inter-procedural control-flow graph (ICFG)
  * inter-procedural dataflow graph (DFG)
  * control-dependence graph (CDG)
  * points-to graph (PTG)
  * source-code to machine-code mapping
  * memory layout and DWARF type graph
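A CPG can be pictured as one shared set of nodes with edges tagged by the layer they come from. A single query can then follow call edges, dataflow edges, or both. The sketch below is a hypothetical toy representation built on that idea. The `ToyCPG` class, the layer tags, and the node names are assumptions for illustration only; they are not MATE's schema or its Python API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyCPG:
    """Hypothetical toy CPG: one shared node set, edges tagged by layer."""

    # node id -> properties such as kind or source location
    nodes: dict = field(default_factory=dict)
    # (source node, layer name) -> list of destination node ids
    edges: defaultdict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, dst, layer):
        self.edges[(src, layer)].append(dst)

    def successors(self, node_id, layers):
        """Follow only edges from the requested layers, e.g. ["CG"] or ["DFG", "CFG"]."""
        out = []
        for layer in layers:
            # .get avoids inserting empty entries into the defaultdict
            out.extend(self.edges.get((node_id, layer), []))
        return out


cpg = ToyCPG()
cpg.add_node("main", kind="function")
cpg.add_node("parse", kind="function")
cpg.add_node("call read", kind="call", line=12)
cpg.add_node("buf", kind="local", type="char[64]")
cpg.add_edge("main", "parse", layer="CG")       # call-graph edge
cpg.add_edge("main", "call read", layer="AST")  # main's body contains the call
cpg.add_edge("call read", "buf", layer="DFG")   # read() writes into buf

print(cpg.successors("main", ["CG"]))         # ['parse']
print(cpg.successors("main", ["CG", "AST"]))  # ['parse', 'call read']
```

Keeping every layer in one graph is what lets a query move between layers. For example, it can step from a call-graph edge to the dataflow edges of the call site it reaches, without joining separate data structures.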

MATE Tools

MATE comes with a number of applications built on top of the
foundation of the CPG.

Flowfinder

[Figure: Flowfinder displays a small fragment of a CPG in a browser window.]

Flowfinder is an interactive, graphical, browser-based user interface
for exploring a program's code property graph. Similar to other
program analysis tools, such as IDA Pro, Binary Ninja, and angr
management, Flowfinder is designed to help answer questions such as
"How does this data get from here to there and how is it changed
along the way?" or "If I can control this buffer, what effect can I
have on the execution of the program?" By leveraging the CPG,
Flowfinder enables inter-procedural analysis of program dataflows at a
relatively high level of abstraction. Rather than having users jump
between views of the program's concrete syntax (whether disassembly or
source code), Flowfinder lets them expand and collapse semantic
representations of code and data as needed, and create and manipulate
visualizations of high-level flows between components.
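The post does not describe how Flowfinder decides what to show. One simple way to picture expanding and contracting a view is as the part of a graph within a given number of hops of a node of interest. Raising `hops` expands the fragment, and lowering it contracts the fragment. The adjacency format, the node names, and the `fragment` function are hypothetical; this is not Flowfinder's implementation.

```python
from collections import deque


def fragment(adj, focus, hops):
    """Return the set of nodes within `hops` edges of `focus` (breadth-first)."""
    seen = {focus}
    queue = deque([(focus, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == hops:  # do not expand past the requested radius
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return seen


# Hypothetical dataflow chain: recv() -> buf -> memcpy -> dst -> log_message()
adj = {"recv": ["buf"], "buf": ["memcpy"], "memcpy": ["dst"], "dst": ["log_message"]}
print(sorted(fragment(adj, "recv", 1)))  # ['buf', 'recv']
print(sorted(fragment(adj, "recv", 3)))  # ['buf', 'dst', 'memcpy', 'recv']
```

Breadth-first search is used because it reaches each node first by a shortest path. That makes the hop limit an exact radius.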

MATE Notebooks

[Figure: A web-based Jupyter notebook containing CPG queries written with the MATE Python API.]

MATE has a Python API for querying the CPG and exposes browser-based,
interactive Jupyter notebooks with this query interface pre-loaded.
These notebooks can be used to write complex, recursive,
whole-program queries that answer detailed questions like "What
sequences of function calls can lead from point A to point B in this
program?" or "Can user input flow into a memory location with a
specific struct type, and from there to some particular function
without passing through one of these three sanitization routines?"
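Both example questions above reduce to graph searches. The sketch below answers simplified versions of them over plain Python dicts. The first function enumerates acyclic call paths from one function to another. The second checks whether data can reach a sink while avoiding a set of sanitizer nodes; the struct-type condition from the second question is left out. The graphs and function names are hypothetical and do not use MATE's Python API.

```python
def call_paths(call_graph, src, dst):
    """Yield each acyclic sequence of calls leading from src to dst."""
    stack = [[src]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == dst:
            yield path
            continue
        for callee in call_graph.get(node, ()):
            if callee not in path:  # skip recursive cycles so paths stay finite
                stack.append(path + [callee])


def flows_avoiding(dataflow, source, sink, sanitizers):
    """True if data can reach sink from source without passing through a sanitizer."""
    if source in sanitizers:
        return False
    seen, todo = {source}, [source]
    while todo:
        node = todo.pop()
        if node == sink:
            return True
        for nxt in dataflow.get(node, ()):
            if nxt not in seen and nxt not in sanitizers:
                seen.add(nxt)
                todo.append(nxt)
    return False


# Hypothetical call graph and dataflow graph
cg = {"main": ["parse", "log"], "parse": ["decode", "log"], "decode": ["log"]}
df = {"recv": ["buf"], "buf": ["escape_path", "tmp"],
      "escape_path": ["fopen"], "tmp": ["fopen"]}

for p in sorted(call_paths(cg, "main", "log")):
    print(" -> ".join(p))
print(flows_avoiding(df, "recv", "fopen", {"escape_path"}))         # True (via tmp)
print(flows_avoiding(df, "recv", "fopen", {"escape_path", "tmp"}))  # False
```

Treating sanitizers as nodes the search may not enter turns "without passing through" into ordinary reachability on a pruned graph.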

MATE POIs

[Figure: Flowfinder displays a CPG fragment relevant to understanding a possible use of a stack variable before initialization.]

MATE ships with a number of automated analyses that detect potential
vulnerabilities, called Points of Interest (POIs). These detectors
are written in the same Python API available in the MATE notebooks;
it's easy to write additional application-, domain-, or API-specific
detectors. Potential vulnerabilities found by these queries can be
viewed in Flowfinder for collaboration and triage. The following
table lists a few examples of MATE detectors (see the MATE
documentation for a complete list):

Analysis                  CPG layers  Description
------------------------  ----------  -----------
PathTraversal             DFG         Finds calls to filesystem operations where the path may be influenced by user input
PointerDisclosure         DFG         Finds pointer-typed values that may be output to the user
OverflowableAllocations   DFG         Finds calls to malloc where the size calculation may be influenced by user input
TruncatedInteger          DFG         Finds calls to malloc where the size may be influenced by user input and the input is used elsewhere as a signed integer
IteratorInvalidation      CFG         Finds uses of C++ iterators after iterator-invalidating collection modifications
UninitializedStackMemory  CFG, PTG    Finds potential intra- and inter-procedural uses of uninitialized stack memory
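Several of the DFG-based detectors in the table share a shape. They mark everything user input can reach, then report sensitive operations whose relevant operand is in that set. The sketch below applies that shape to a toy version of PathTraversal. The `FS_FUNCS` list, the call-tuple format, and all names are hypothetical. MATE's real detectors are written against its own Python API, which is not shown here.

```python
FS_FUNCS = {"open", "fopen", "stat", "unlink"}  # hypothetical list of filesystem calls


def reachable(dataflow, sources):
    """All nodes that data from `sources` can reach in the dataflow graph."""
    seen, todo = set(sources), list(sources)
    while todo:
        node = todo.pop()
        for nxt in dataflow.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def path_traversal_candidates(calls, dataflow, user_inputs):
    """calls: (call_id, callee, path_arg) tuples; report fs calls with tainted paths."""
    tainted = reachable(dataflow, user_inputs)
    return [call_id for call_id, callee, path_arg in calls
            if callee in FS_FUNCS and path_arg in tainted]


dataflow = {"argv[1]": ["name"], "name": ["full_path"], "config": ["log_path"]}
calls = [("c1", "fopen", "full_path"),   # path derived from argv[1]
         ("c2", "fopen", "log_path"),    # path derived from config only
         ("c3", "strlen", "full_path")]  # not a filesystem call
print(path_traversal_candidates(calls, dataflow, {"argv[1]"}))  # ['c1']
```

Like the detectors it imitates, this reports *potential* issues. Reachability in a dataflow graph over-approximates what can actually happen at run time, which is why results go to triage.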

Mantiserve

[Figure: Under-Constrained Manticore finds a potential out-of-bounds memory access.]

Mantiserve integrates the CPG with the Manticore symbolic execution
tool. Symbolic execution complements MATE's facilities for reasoning
about high-level data flows by providing a means to explore detailed,
low-level issues such as memory corruption. There are two primary
modes of using Manticore with MATE.

Exploration: Mantiserve ships with "detectors" that use data from the
CPG to detect memory corruption during symbolic execution.

  * The Variable Bounds Access Detector searches for out-of-bounds
    memory accesses on the stack.
  * The Uninitialized Stack Variable Detector searches for variables
    allocated on the stack and used prior to initialization.
  * The Use After Free (UAF) Detector searches for and validates UAF
    vulnerabilities arising from calls to malloc and free.
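To give a simplified picture of the bookkeeping a use-after-free check needs, the sketch below scans a concrete trace of allocation events and flags uses of freed addresses. It rests on several assumptions. The event format is hypothetical, and only exact addresses are tracked (no offsets into a block). Mantiserve's detector works during symbolic execution rather than over a recorded trace.

```python
def find_use_after_free(events):
    """events: (op, address) pairs with op in {"malloc", "free", "use"}.
    Returns the indices of "use" events that touch a freed address."""
    freed = set()
    hits = []
    for i, (op, addr) in enumerate(events):
        if op == "malloc":
            freed.discard(addr)  # the allocator may hand a freed address out again
        elif op == "free":
            freed.add(addr)
        elif op == "use" and addr in freed:
            hits.append(i)
    return hits


trace = [("malloc", 0x10), ("use", 0x10), ("free", 0x10),
         ("use", 0x10),                    # use after free
         ("malloc", 0x10), ("use", 0x10)]  # address reused: valid again
print(find_use_after_free(trace))  # [3]
```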

Under-constrained symbolic execution: Unlike traditional symbolic
execution, which begins at the program entry point and runs until the
program exits, under-constrained symbolic execution starts at an
arbitrary function in the program. Because it does not have to execute
everything that leads up to that function, under-constrained symbolic
execution can analyze parts of programs that would be too large or
complex for traditional symbolic execution. As explained in our
previous blog post "Under-Constrained Symbolic Execution with
Crucible," under-constrained symbolic execution may produce false
positives because the function's preconditions (what its callers
guarantee about its inputs) are unknown. MATE's under-constrained
feature comes with a web UI that lets users supply constraints to rule
out such false positives.
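Symbolic execution needs a constraint solver. To show the idea with the standard library alone, the sketch below swaps in brute-force enumeration over a small bounded input domain. It shows how analyzing a function on its own, with no knowledge of its callers, reports out-of-bounds reads, and how a user-supplied precondition rules them out. The C function in the comment, the bound of 8, and the helper names are hypothetical; this is not how Mantiserve or Manticore work internally.

```python
from itertools import product


def get_in_bounds(length, index):
    """Toy model of `int get(int *buf, size_t len, size_t i) { return buf[i]; }`:
    True if buf[i] stays inside a buffer of `length` elements."""
    return index < length


def oob_reports(precondition, bound=8):
    """Enumerate (length, index) inputs allowed by `precondition` that read out of bounds."""
    return [(n, i) for n, i in product(range(bound), repeat=2)
            if precondition(n, i) and not get_in_bounds(n, i)]


# Starting at get() with nothing known about its callers: every input is allowed.
unconstrained = oob_reports(lambda n, i: True)
print(len(unconstrained), unconstrained[:3])  # 36 [(0, 0), (0, 1), (0, 2)]

# A user-supplied constraint such as "callers always pass i < len" removes them.
print(oob_reports(lambda n, i: i < n))  # []
```

If every real caller checks `i < len`, all 36 reports are false positives. Supplying that constraint is exactly the kind of information the web UI lets users provide.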

Use Cases and Comparison to Other Tools

We built MATE with two primary use cases in mind:

 1. Use by security researchers to find bugs in C and C++ programs
 2. Integration of the CPG and corresponding Python API into other
    applications

The following list compares MATE to tools with similar goals:

GitHub/Semmle CodeQL
  * What it is: commercial code analysis for C/C++, Java, etc.; a
    custom logic-based query language; a library of detectors for
    vulnerabilities and flaws
  * Advantages over MATE:
      + Robust commercial implementation
      + Faster whole-program analysis
      + Large library of provided and community-authored vulnerability
        and code quality checks
      + IDE integration
  * Disadvantages vs. MATE:
      + Does not explicitly model memory relationships
      + Less comprehensive dataflow analyses
      + Limited support for interactive exploration and visualization
      + Difficult to integrate queries into custom workflows or
        applications

Joern
  * What it is: open-source CPG analysis; a custom query language
    (DSL); coverage of a wide range of languages
  * Advantages over MATE:
      + Fuzzy parsing approach gives best-effort results on most
        programs without significant effort
  * Disadvantages vs. MATE:
      + Limited support for inter-procedural dataflow analysis
      + Limited support for interactive exploration and visualization
      + Difficult to integrate queries into custom workflows or
        applications

Semgrep/weggli
  * What it is: AST-aware grep for code
  * Advantages over MATE:
      + Fast, low-latency interactions
      + Syntax-focused analysis works on all programs without
        pre-processing
  * Disadvantages vs. MATE:
      + Limited support for inter-procedural dataflow analysis
      + Limited support for interactive exploration and visualization

Other SAST tools, e.g., Coverity, CodeSonar, Veracode, Infer, Clang
Static Analyzer
  * Advantages over MATE:
      + Robust implementations
      + Large libraries of provided and community-authored
        vulnerability and code quality checks
      + IDE integration
  * Disadvantages vs. MATE:
      + Limited support for interactive exploration and visualization
      + Difficult to integrate queries into custom workflows or
        applications

Interactive binary-level bug-hunting tools, e.g., IDA Pro, Binary
Ninja, and angr management
  * Advantages over MATE:
      + Robust commercial implementation
      + Can work without source code
      + Community support and features (e.g., community-authored
        plugins)
  * Disadvantages vs. MATE:
      + Limited support for inter-procedural dataflow analysis
      + Limited support for high-level (source-level) semantics

Limitations

MATE has several important limitations:

  * MATE analyzes only statically linked code, so it can't find bugs
    in, or follow control- and data-flows through, dynamically linked
    libraries unless users write detailed "signatures" for the external
    code.
  * MATE analyzes LLVM bitcode. In practice, obtaining LLVM bitcode
    requires access to the source code and a project that can be
    compiled with clang/clang++, and it may require some fiddling with
    the build system. Additionally, MATE is much easier to use and
    understand for those familiar with LLVM's intermediate
    representation (IR), but such familiarity is fairly uncommon.
  * MATE's static analysis is fairly heavyweight. The pointer analysis
    in particular requires a significant amount of time and RAM, on the
    order of hours and up to dozens of GB for large programs.
    Furthermore, these requirements don't scale predictably with
    program size or other features.
  * MATE is still research-grade software. We have worked hard to
    make it robust, but not all of MATE's tools and features will
    work well on all programs.
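The post does not say which pointer analysis MATE uses. To convey what a pointer analysis computes, and why it can be expensive, the sketch below is a generic textbook inclusion-based (Andersen-style) analysis. It handles four simplified statement kinds and iterates to a fixed point. Every pass revisits every statement, and points-to sets can grow to include many locations; this is why naive inclusion-based analysis is cubic in the worst case. The statement encoding and the function name are hypothetical.

```python
from collections import defaultdict


def andersen(stmts):
    """stmts: (kind, lhs, rhs) with kind in
    "addr" (lhs = &rhs), "copy" (lhs = rhs), "load" (lhs = *rhs), "store" (*lhs = rhs).
    Returns a dict mapping each variable to the locations it may point to."""
    pts = defaultdict(set)

    def flow(dst, locs):
        """Add locs to pts[dst]; report whether the set grew."""
        before = len(pts[dst])
        pts[dst] |= locs
        return len(pts[dst]) != before

    changed = True
    while changed:  # repeat until no points-to set grows (a fixed point)
        changed = False
        for kind, lhs, rhs in stmts:
            if kind == "addr":
                changed |= flow(lhs, {rhs})
            elif kind == "copy":
                changed |= flow(lhs, set(pts[rhs]))
            elif kind == "load":
                for loc in list(pts[rhs]):
                    changed |= flow(lhs, set(pts[loc]))
            elif kind == "store":
                for loc in list(pts[lhs]):
                    changed |= flow(loc, set(pts[rhs]))
    return {v: s for v, s in pts.items() if s}


# p = &a; q = &b; r = &p; *r = q; s = p
result = andersen([("addr", "p", "a"), ("addr", "q", "b"), ("addr", "r", "p"),
                   ("store", "r", "q"), ("copy", "s", "p")])
print(sorted(result["p"]), sorted(result["s"]))  # ['a', 'b'] ['a', 'b']
```

The store through `r` is what makes `p` (and therefore `s`) possibly point to `b`. Discovering effects like this one across a whole program is where the time and memory go.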

Conclusion and Acknowledgements

We're happy to finally share MATE with the research community under
the BSD 3-clause license. We look forward to discussing possible
collaborations, additional use cases, and future research directions.
Please reach out to mate@galois.com to start a conversation! More
information about MATE is available in the project documentation.

MATE was developed collaboratively by Galois, Trail of Bits, and the
lab of Dr. Stephen Chong at Harvard on the DARPA CHESS program. This
material is based upon work supported by the United States Air Force
and the Defense Advanced Research Projects Agency (DARPA) under
Contract No. FA8750-19-C-0004.
