Getting Started

CompilerGym is a toolkit for applying reinforcement learning to compiler optimization tasks. This document provides a short walkthrough of the key concepts, using the codesize reduction task of a production-grade compiler as an example. It will take about 20 minutes to work through. Let's get started!

Topics covered:

* Key Concepts
* Installation
  + Building from Source
* Using CompilerGym
  + Selecting an environment
  + Installing benchmarks
  + The compiler environment
  + Interacting with the environment
  + Using the command line tools

Key Concepts

CompilerGym exposes compiler optimization problems as environments for reinforcement learning. It uses the OpenAI Gym interface to expose the "agent-environment loop" of reinforcement learning:

[Figure: the agent-environment loop, in which the agent receives an observation and reward from the environment and responds with an action]

The ingredients for reinforcement learning that CompilerGym provides are:

* Environment: a compiler optimization task. For example, optimizing a C++ graph-traversal program for codesize using LLVM. The environment encapsulates an instance of a compiler and a particular program that is being compiled. As an agent interacts with the environment, the state of the program, and the compiler, can change.
* Action Space: the actions that may be taken at the current environment state. For example, this could be a set of optimization transformations that the compiler can apply to the program.
* Observation: a view of the current environment state. For example, this could be the Intermediate Representation (IR) of the program that is being compiled. The types of observations that are available depend on the compiler.
* Reward: a metric indicating the quality of the previous action. For example, for a codesize optimization task this could be the change in the number of instructions caused by the previous action.

A single instance of this "agent-environment loop" represents the compilation of a particular program. The goal is to develop an agent that maximises the cumulative reward from these environments so as to produce the best programs.
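In code, this loop takes the familiar Gym shape. The short sketch below is illustrative rather than part of the original walkthrough: it uses the llvm-autophase-ic-v0 environment introduced later in this document, substitutes a random policy for the agent, and assumes a benchmark dataset has already been installed (see Installing benchmarks below):

    import gym
    import compiler_gym  # registers the CompilerGym environments

    env = gym.make("llvm-autophase-ic-v0")  # an LLVM codesize reduction task
    observation = env.reset()               # start a new compilation episode
    for _ in range(100):
        action = env.action_space.sample()  # a real agent would choose an action
        observation, reward, done, info = env.step(action)  # apply one optimization
        if done:
            break
    env.close()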
Installation

Install the latest CompilerGym release using:

$ pip install compiler_gym

The binaries work on macOS and Linux (Ubuntu 18.04, Fedora 28, Debian 10, or newer equivalents).

Building from Source

If you prefer, you may build from source. This requires a modern C++ toolchain. On macOS you can use the system compiler. On Linux, install the required toolchain using:

$ sudo apt install clang libtinfo5 patchelf
$ export CC=clang
$ export CXX=clang++

We recommend using conda to manage the remaining build dependencies. First create a conda environment with the required dependencies:

$ conda create -n compiler_gym python=3.8 bazel=3.1.0 cmake pandoc
$ conda activate compiler_gym

Then clone the CompilerGym source code using:

$ git clone https://github.com/facebookresearch/CompilerGym.git
$ cd CompilerGym

Install the python development dependencies using:

$ make init

Then run the test suite to confirm that everything is working:

$ make test

To build and install the python package, run:

$ make install

When you are finished, you can deactivate and delete the conda environment using:

$ conda deactivate
$ conda env remove -n compiler_gym

Using CompilerGym

Begin by firing up a python interpreter:

$ python

To start with, we import the gym module and the CompilerGym environments:

>>> import gym
>>> import compiler_gym

Importing compiler_gym automatically registers the compiler environments. We can see what environments are available using:

>>> compiler_gym.COMPILER_GYM_ENVS
['llvm-v0', 'llvm-ic-v0', 'llvm-autophase-ic-v0', 'llvm-ir-ic-v0']

Selecting an environment

CompilerGym environments are named using one of the following formats:

1. <compiler>-<observation>-<reward>-v<version>
2. <compiler>-<reward>-v<version>
3. <compiler>-v<version>

Where <compiler> identifies the compiler optimization task, <observation> is the default type of observations that are provided, and <reward> is the reward signal.

Note: A key concept is that CompilerGym environments enable lazy evaluation of observations and reward signals. This makes the environment much more computationally efficient for scenarios in which you do not need to compute a reward or observation for every step. If an environment omits an <observation> or <reward> tag, this means that no observation or reward is provided by default. See compiler_gym.views for further details.
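As an illustrative sketch of this lazy pattern (not part of the original walkthrough), observations and rewards can be requested on demand through the env.observation and env.reward views documented in compiler_gym.views. The space names Autophase and IrInstructionCountOz below are the ones used elsewhere in this tutorial; consult compiler_gym.views for the exact view API:

    import gym
    import compiler_gym

    env = gym.make("llvm-v0")  # no default observation or reward signal
    env.require_dataset("npb-v0")
    env.reset()
    env.step(0)  # cheap: no observation or reward is computed for the step
    # Compute an observation or reward only at the point it is needed:
    autophase = env.observation["Autophase"]
    reward = env.reward["IrInstructionCountOz"]
    env.close()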
For this tutorial, we will use the following environment:

* Compiler: LLVM.
* Observation Type: Autophase.
* Reward Signal: IR Instruction count relative to -Oz.

Create an instance of this environment using:

>>> env = gym.make("llvm-autophase-ic-v0")

Installing benchmarks

A compiler requires a program as input. For the purposes of CompilerGym we call these input programs benchmarks, and collections of benchmarks are assembled into datasets. You may provide your own programs to use as benchmarks, or download one of our pre-assembled datasets. The benchmarks that are available to an environment can be queried using env.benchmarks:

>>> env.benchmarks
[]

As you can see, there are no benchmarks installed by default. We have provided a collection of pre-assembled LLVM benchmark datasets that can be installed using env.require_dataset(). For this tutorial we will use the NAS Parallel Benchmarks dataset:

>>> env.require_dataset("npb-v0")

Now, env.benchmarks lists the 123 benchmarks that comprise the dataset we just installed:

>>> env.benchmarks
['benchmark://npb-v0/46', 'benchmark://npb-v0/17', ...]

The compiler environment

If you have experience using OpenAI Gym, the CompilerGym environments will be familiar. If not, you can call help() on any function, object, or method to query the documentation:

>>> help(env)

The action space is described by env.action_space. The LLVM action space is discrete:

>>> env.action_space.dtype
dtype('int64')
>>> env.action_space.n
138

The observation space is described by env.observation_space. The Autophase observation space is a 56-dimensional vector of integers:

>>> env.observation_space.shape
(56,)
>>> env.observation_space.dtype
dtype('int64')

The upper and lower bounds of the reward signal are described by env.reward_range:

>>> env.reward_range
(0.0, inf)

As with other Gym environments, reset() must be called before a CompilerGym environment may be used:

>>> env.reset()
array([   0,    0,  399,  381,   10,  399,  147,    8,  137,  147,    0,
          0,    0,  556,    0,  546,    0,   15,  693,  574, 1214, 1180,
        384,  399,  214,    0,  120,  116,    0,   88,  468,    8,  546,
         16, 1073,  147,    0, 1551,    0,    0,    0,   10,  766,    0,
          0,  505,   46,    0,    0,    0,  556, 5075, 3261,   13,    0,
       2441])

The numpy array that is returned here is the initial Autophase observation. Calling env.reset() starts an instance of the compiler and selects a random benchmark to use. You can see which benchmark is currently being used by an environment using env.benchmark:

>>> env.benchmark
'benchmark://npb-v0/90'

If we want to force the environment to use a specific benchmark, we can pass the name of the benchmark as an argument to env.reset():

>>> env.reset(benchmark="benchmark://npb-v0/50")
array([   0,    0,   26,   25,    1,   26,   10,    1,    8,   10,    0,
          0,    0,   37,    0,   36,    0,    2,   46,  175, 1664, 1212,
        263,   26,  193,    0,   59,    6,    0,    3,   32,    0,   36,
         10, 1058,   10,    0,  840,    0,    0,    0,    1,  416,    0,
          0,  148,   60,    0,    0,    0,   37, 3008, 2062,    9,    0,
       1262])

Interacting with the environment

Once an environment has been initialized, you interact with it in the same way that you would with any other OpenAI Gym environment. env.render() prints the Intermediate Representation (IR) of the program in the current state:

>>> env.render()
; ModuleID = 'benchmark://npb-v0/83'
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
...

env.step() runs an action:

>>> observation, reward, done, info = env.step(0)

This returns four values: a new observation, a reward, a boolean value indicating whether the episode has ended, and a dictionary of additional information:

>>> observation
array([   0,    0,   26,   25,    1,   26,   10,    1,    8,   10,    0,
          0,    0,   37,    0,   36,    0,    2,   46,  175, 1664, 1212,
        263,   26,  193,    0,   59,    6,    0,    3,   32,    0,   36,
         10, 1058,   10,    0,  840,    0,    0,    0,    1,  416,    0,
          0,  148,   60,    0,    0,    0,   37, 3008, 2062,    9,    0,
       1262])
>>> reward
0.3151595744680851
>>> done
False
>>> info
{'action_had_no_effect': True, 'new_action_space': False}

For this environment, reward represents the reduction in code size achieved by the previous action, scaled to the total codesize reduction achieved with LLVM's -Oz optimizations enabled. A cumulative reward greater than one means that the sequence of optimizations performed yields better results than LLVM's default optimizations. Let's run 100 random actions and see how close we can get:

>>> env.reset(benchmark="benchmark://npb-v0/50")
>>> episode_reward = 0
>>> for i in range(1, 101):
...     observation, reward, done, info = env.step(env.action_space.sample())
...     if done:
...         break
...     episode_reward += reward
...     print(f"Step {i}, quality={episode_reward:.3%}")
...
Step 1, quality=44.299%
Step 2, quality=44.299%
Step 3, quality=44.299%
Step 4, quality=44.299%
Step 5, quality=44.299%
Step 6, quality=54.671%
Step 7, quality=54.671%
Step 8, quality=54.608%
Step 9, quality=54.608%
Step 10, quality=54.608%
Step 11, quality=54.608%
Step 12, quality=54.766%
Step 13, quality=54.766%
Step 14, quality=53.650%
Step 15, quality=53.650%
...
Step 97, quality=88.104%
Step 98, quality=88.104%
Step 99, quality=88.104%
Step 100, quality=88.104%

Not bad, but clearly there is room for improvement! Because at each step we are taking random actions, your results will differ with every run. Try running it again. Was the result better or worse? Of course, there may be better ways of selecting actions than choosing randomly, but for the purpose of this tutorial we will leave that as an exercise for the reader :)
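As a hint for that exercise, one simple improvement over uniform random sampling is a greedy strategy: keep an action only if it increases cumulative reward, and otherwise restore the previous state by replaying the accepted actions from a fresh reset. A sketch using only the APIs shown above (the function name and its accept-if-positive heuristic are our own, not part of CompilerGym):

    import gym
    import compiler_gym

    env = gym.make("llvm-autophase-ic-v0")
    env.require_dataset("npb-v0")

    def greedy_episode(env, benchmark="benchmark://npb-v0/50", num_steps=100):
        # Keep an action only if it improves cumulative reward; otherwise
        # restore the previous state by replaying the accepted actions.
        env.reset(benchmark=benchmark)
        accepted, episode_reward = [], 0
        for _ in range(num_steps):
            action = env.action_space.sample()
            _, reward, done, _ = env.step(action)
            if done:
                break
            if reward > 0:
                accepted.append(action)
                episode_reward += reward
            else:
                env.reset(benchmark=benchmark)
                for a in accepted:
                    env.step(a)
        return episode_reward, accepted

    episode_reward, actions = greedy_episode(env)
    print(f"quality={episode_reward:.3%} using {len(actions)} actions")
    env.close()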
Before we finish, let's use env.commandline() to produce an LLVM opt command line invocation that is equivalent to the sequence of actions we just ran:

>>> env.commandline()
'opt -consthoist -sancov -inferattrs ... -place-safepoints input.bc -o output.bc'

We can also save the program for future reference:

>>> env.write_bitcode("~/program.bc")

Once we are finished, we must close the environment to end the compiler instance:

>>> env.close()

And finally we are done with our python session:

>>> exit()
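When driving CompilerGym from a script rather than an interactive session, it is easy to forget that final close() call. One defensive pattern is to wrap the interaction in try/finally so the compiler instance is always shut down (a suggested idiom, not a CompilerGym requirement):

    import gym
    import compiler_gym

    env = gym.make("llvm-autophase-ic-v0")
    try:
        ...  # interact with the environment as shown above
    finally:
        env.close()  # always shut down the compiler instance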
Using the command line tools

CompilerGym includes a set of useful command line tools. Each of the steps above could be replicated from the command line. For example, compiler_gym.bin.service can be used to list the available environments:

$ python -m compiler_gym.bin.service --ls_env
llvm-v0
...

And to describe the capabilities of each environment:

$ python -m compiler_gym.bin.service --env=llvm-v0

# CompilerGym Service `/path/to/compiler_gym/envs/llvm/service/service`

## Programs

+------------------------+
| Benchmark              |
+========================+
| benchmark://npb-v0/1   |
+------------------------+
...

## Action Spaces

### `PassesAll` (Commandline)

+---------------------------------------+-----------------------------------+-------------------------------+
| Action                                | Flag                              | Description                   |
+=======================================+===================================+===============================+
| AddDiscriminatorsPass                 | `-add-discriminators`             | Add DWARF path discriminators |
+---------------------------------------+-----------------------------------+-------------------------------+
...

The compiler_gym.bin.manual_env module provides a thin text user interface around the environment for interactive sessions:

$ python -m compiler_gym.bin.manual_env --env=llvm-autophase-ic-v0 --benchmark=npb-v0/50
Initialized environment in 264.9ms
Reset benchmark://npb-v0/50 environment in 27.6ms
Observation: [  0   0  11  10   1  11   5   1   3   5   0   0   0  17   0  16   0   2
  21  20  50  44  18  11   6   0   3  10   0   1  16   0  16   2  40   5
   0  59   0   0   0   1  30   0   0  18   0   0   0   0  17 193 129   4
   0  99]

Finally, the compiler_gym.bin.random_search module provides a simple but powerful strategy for randomly searching the optimization space:

$ python -m compiler_gym.bin.random_search --env=llvm-autophase-ic-v0 --benchmark=npb-v0/50 --runtime=10
Started 16 worker threads for benchmark://npb-v0/50 (3,008 instructions) using reward IrInstructionCountOz.
Writing logs to /home/user/logs/compiler_gym/random/npb-v0/50/2020-12-03T17:24:17.304887
=== Running for 10 seconds ===
Runtime: 10 seconds. Num steps: 32,287 (3,206 / sec). Num episodes: 285 (28 / sec). Num restarts: 0.
Best reward: 107.85% (69 passes, found after 9 seconds)
Ending worker threads ... done
Replaying actions from best solution found:
Step [000 / 069]: reward=31.52%
Step [001 / 069]: reward=31.52%, change=0.00%, action=SlpvectorizerPass
Step [002 / 069]: reward=37.09%, change=5.57%, action=Sroapass
...
Step [067 / 069]: reward=107.60%, change=0.00%, action=InductiveRangeCheckEliminationPass
Step [068 / 069]: reward=107.60%, change=0.00%, action=LoopDeletionPass
Step [069 / 069]: reward=107.85%, change=0.24%, action=Gvnpass

To beat the compiler by 7.85% after 10 seconds of random trials is not bad going!
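As a closing recap, the random search idea can also be approximated in plain Python using nothing more than the environment API from this tutorial. The sketch below is single-threaded and therefore far slower than the multi-worker compiler_gym.bin.random_search tool; the episode and step counts are arbitrary choices:

    import gym
    import compiler_gym

    env = gym.make("llvm-autophase-ic-v0")
    try:
        env.require_dataset("npb-v0")
        best_reward, best_commandline = float("-inf"), None
        for episode in range(10):
            env.reset(benchmark="benchmark://npb-v0/50")
            episode_reward = 0
            for _ in range(100):
                _, reward, done, _ = env.step(env.action_space.sample())
                if done:
                    break
                episode_reward += reward
            if episode_reward > best_reward:
                # Record the best episode and its equivalent opt invocation.
                best_reward = episode_reward
                best_commandline = env.commandline()
        print(f"Best reward: {best_reward:.2%}")
        print(f"Equivalent command line: {best_commandline}")
    finally:
        env.close()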