https://www.usenix.org/publications/loginonline/codon-python-compiler#Rik%20Farrow

Codon: Python Compiler

Codon compiles Pythonic code into executables that support
parallelism
April 25, 2023
Deployed System
Authors: 
Rik Farrow
Article shepherded by: 
Laura Nolan

I first learned about Codon by reading an article in MIT's campus
news magazine [1] about a compiler for Python. The article's author,
Rachel Gordon, stressed that Codon can compile Python scripts into
native code that is as fast as, and sometimes faster than,
hand-crafted code in C/C++. But quotes from Ariya Shajii, one of
Codon's two developers, mentioned that there were also limitations:
some of the dynamic features of Python would not work, and not all of
Python 3 was available.

I had read another article, Investigating Managed Language Runtime
Performance [2], that explained why Python and JS (V8) are so much
slower than C, and even than Java, another managed language. Managed
languages have the advantage of being more secure, as they don't have
the problems with pointers and strings that C/C++ programmers face.
Java programs not only get compiled into bytecode, but the JDK also
has a just-in-time (JIT) compiler that produces faster machine code
than JS engines do. JS is faster than Python because it converts its
code into machine code using a JIT, but the code it produces is
slower than that produced by Java.

Python and JS share two performance limitations, as
pointed out in the Performance article [2] and associated research
[3]. The first is that every access to data or objects must be
checked for its type before any operation can continue. In Python,
that checking has an overhead of over 40%.
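
To make the type-checking cost concrete, here is a minimal sketch in
plain Python (the function name add is only illustrative and does not
come from the cited articles). The same + performs three different
operations, and the interpreter can only find out which one is needed
by inspecting the operand types each time the line runs:

```python
def add(a, b):
    # CPython cannot know ahead of time what "+" means here. On every
    # call it looks at the run-time types of a and b and dispatches to
    # integer addition, string concatenation, list concatenation, etc.
    return a + b


results = [
    add(1, 2),        # integer addition
    add("1", "2"),    # string concatenation
    add([1], [2]),    # list concatenation
]
print(results)
```

A compiler that fixes the types of a and b before the program runs
can pick the right operation once, instead of rechecking on every
call.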
                                       
The second issue that both Python and JS share has to do with
processing data in parallel. JS can have multiple threads, but only
one thread gets executed at a time. Python has the global interpreter
lock (GIL), again limiting a running Python process to a single
thread at a time. These limitations do make programming easier, as
they eliminate the possibility of concurrently modifying the same
data structure and likely corrupting data, but on modern multicore
CPUs, being confined to a single processor thread is lame.

Codon gets around all three of these issues: it's compiled, variables
are typed at compile time, and it supports parallel execution. Codon
is much faster than other managed languages, and in some cases faster
than C/C++.

The Details

Codon sounds too good to be true: a version of Python that gets
compiled into machine code and supports multiple threads of
execution. But the "version of Python" part is actually an important
point: the builders of Codon have built a compiler that accepts a
large portion of Python, including all of the most commonly used
parts--but not all.

Codon got its start as Seq [3], a domain-specific language (DSL)
written specifically for working with genomics data. A single
sequenced genome consists of a five gigabyte index and tens of
gigabytes for hash tables, meaning that programs or scripts written
to analyze genomes will always be working with enormous data sets.
The authors of Seq had several goals:

  * Allow researchers, who are not programmers by training, to use
    Pythonic syntax
  * Support parallel execution
  * Include specific features helpful for working with genome
    sequences

I am going to quote the Seq paper here, as I think the authors did a
fine job of explaining how they proceeded:

To achieve this, we designed a compiler with a static type system. It
performs Python-style duck typing and runtime type checking at
compile time, completely eliminating the substantial runtime overhead
imposed by the reference Python implementation, CPython, and most
other Python implementations alike. Unlike these, we reimplemented
all of Python's language features and built-in facilities from the
ground up, completely independent of the CPython runtime. The Seq
compiler uses an LLVM backend, and in general uses LLVM as a
framework for performing general-purpose optimizations. Seq programs
additionally use a lightweight (<200 LOC) runtime library for I/O and
memory allocation; for the latter, CPython's reference counting is
replaced with the Boehm garbage collector, a widely-used conservative
GC that is a drop-in replacement for malloc. 

I want to expand on this a little, although you can learn a lot more
by reading the Seq paper, or the later paper [4] that explains Codon
in more detail.

Dynamic typing is very handy, and appropriate, for scripting
languages like Python. Programmers can write prototypes of programs
very quickly because Python abstracts away a lot of the detail, at
the cost of execution performance. And, for many purposes, you don't
need Python to be ten or a hundred times faster when compiled because
the script you are running is not processing enormous data files.

Duck typing means that the Codon compiler uses hints found in the
source or attempts to deduce them to determine the correct type, and
assigns that as a static type. If you wanted to process data where
the type is unknown before execution, this may not work for you,
although Codon does support a union type that is a possible
workaround. In most cases of processing large data sets, the types
are known in advance so this is not an issue. 
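
As a rough illustration of the idea, the following sketch is ordinary
Python with no annotations (the names total_length, reads, and pairs
are hypothetical). Under CPython the type of items is discovered at
run time. My understanding is that a compiler such as Codon would
instead deduce a concrete static type for each call site, but treat
that as an assumption and check the Codon documentation for the exact
rules:

```python
def total_length(items):
    """Sum len() over any iterable whose elements support len()."""
    total = 0
    for item in items:
        total += len(item)
    return total


reads = ["ACGT", "GATTACA"]   # every element is a str
pairs = [[1, 2], [3, 4, 5]]   # every element is a list of int

print(total_length(reads))    # called with a list of str
print(total_length(pairs))    # called with a list of lists

# A list such as [1, "one"] mixes element types. CPython accepts it,
# but a statically typed compiler is likely to need a single element
# type (or an explicit union type); this is an assumption, not a
# statement about Codon's exact behavior.
```

Both calls work because each argument has one consistent element
type, which is the common case when processing large data sets.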

Codon uses LLVM because of its flexibility and support for many
platforms and types of hardware, allowing the Codon authors to use it
in their backend for code generation and general-purpose
optimizations. Compilers begin by parsing the input file, using a set
of rules to convert the code into an abstract syntax tree (AST).
Later phases of the Codon compiler perform type checking and convert
the AST into intermediate representations (IR) that get optimized and
are finally converted into machine code through LLVM.

[Figure: pipeline]
The Codon pipeline: starting with Pythonic code, Codon parses it into
an abstract syntax tree, performs static type checking, converts the
AST into Codon Intermediate Representation, performs optimizations or
includes DSL extensions, before using LLVM for the conversion to
native code.
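
To get a feel for the first stage, you can look at an abstract syntax
tree using the ast module in CPython's standard library. This is
CPython's parser, not Codon's, and the helper name describe is made up
for this sketch; it only shows what "parsing into an AST" means for
one line of Pythonic code:

```python
import ast


def describe(source):
    """Parse one assignment and name the AST nodes involved."""
    tree = ast.parse(source)
    statement = tree.body[0]            # the first top-level statement
    expression = statement.value        # right-hand side of the assignment
    return (
        type(statement).__name__,
        type(expression).__name__,
        type(expression.op).__name__,
    )


print(describe("c = a + b"))
# Print the whole tree, for readers who want to see every node.
print(ast.dump(ast.parse("c = a + b")))
```

A type checker walks a tree like this one and attaches a type to each
node before any intermediate representation is generated.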

Codon uses OpenMP, an API for shared-memory multiprocessing
(https://openmp.org). Programmers using Codon indicate the loops that
are candidates for multiple threads using the @par decorator. @par
accepts several optional parameters, similar to the pragmas used in
C++ with OpenMP, such as scheduling, chunk size and number of
threads. You can find out more about Codon's multithreading in the
documentation [5] and in [4].

    // C++
    #pragma omp parallel for schedule(dynamic, 10) num_threads(8)
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    # Codon
    @par(schedule='dynamic', chunk_size=10, num_threads=8)
    for i in range(N):
        c[i] = a[i] + b[i]

Codon's @par decorator can be used alone, having the compiler choose
parameters for parallelization, or you can set them much as you would
using OpenMP in C++ (section 4.2 in [4]).
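
For readers who have not used OpenMP, the sketch below imitates in
standard Python what chunk_size and num_threads mean. It is not how
Codon implements @par, and the names parallel_add and add_chunk are
hypothetical. The loop is cut into chunks of ten iterations, and a
pool of eight worker threads picks up chunks as workers become free,
which is roughly the idea behind dynamic scheduling. Under CPython the
GIL still prevents these threads from running Python code at the same
time, so this version shows the structure only, not the speedup:

```python
from concurrent.futures import ThreadPoolExecutor


def add_chunk(a, b, c, start, stop):
    # Each task owns a disjoint index range, so no two threads
    # ever write to the same element of c.
    for i in range(start, stop):
        c[i] = a[i] + b[i]


def parallel_add(a, b, chunk_size=10, num_threads=8):
    n = len(a)
    c = [0] * n
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [
            pool.submit(add_chunk, a, b, c, start, min(start + chunk_size, n))
            for start in range(0, n, chunk_size)
        ]
        for future in futures:
            future.result()   # wait, and re-raise any worker exception
    return c


a = list(range(25))
b = list(range(25))
print(parallel_add(a, b))
```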

Trials

I decided I would install Codon and try to use it. Installation is
fairly easy for Linux and Mac users, including support for Apple
silicon. At the time I wrote this article, Codon had not been ported
to Windows, but Ibrahim Numanagic, one of the developers, said that
the code doesn't have many system dependencies and should be easy to
port.

I downloaded the Linux/Debian version and ran the Python setup.py
script after installing two dependencies: Cython and astunparse. I
wondered about the inclusion of Cython, a tool for wrapping C++
libraries for use in Python scripts, and was told it would probably
be removed in future versions.

Codon is not the same as Python, in that the developers have not yet
implemented all the features you would find in Python 3.10, and this,
along with duck typing, will likely cause problems if you just try to
compile existing scripts. I quickly ran into problems as I uncovered
unsupported bits of Python, and, judging by the Issues section of
their GitHub pages, so have other people.

Codon also supports a JIT feature: instead of attempting to compile
complete scripts, you can add a @codon.jit decorator to just the
functions that you think would benefit from being compiled or
executed in parallel, making them much faster to execute.
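
A minimal sketch of what that might look like follows. It assumes the
separate Python package that provides the codon module is installed;
if it is not, the sketch falls back to a do-nothing decorator so the
script still runs under plain CPython. The function count_primes is a
made-up example of a tight numeric loop, the kind of code that tends
to gain the most from compilation:

```python
try:
    import codon              # assumption: Codon's Python JIT package is installed
    jit = codon.jit
except ImportError:
    def jit(func):
        # Fallback so the example runs without Codon: leave func unchanged.
        return func


@jit
def count_primes(limit: int) -> int:
    """Count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


print(count_primes(100))
```

The rest of the script stays ordinary Python, so any unsupported
language features outside the decorated function are not a problem.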

The developers of Codon have formed a company, Exaloop, to support
the further development of Codon. They have also chosen a relatively
restrictive license, the Business Source License, under which the
software must be licensed for commercial use, but versions older than
three years convert to an Apache license. Non-commercial users are
welcome to experiment with Codon.

Conclusions

The developers of Codon have taken a unique approach to supporting
Python, in that they have built a compiler and added optimizations
that are not possible with other tools for Python. I had thought that
CPython was also a compiler, and learned that CPython is the
reference implementation of Python, with the 'C' meaning that it is
written in C. NumPy is a math library for Python, and PyPy adds a
form of JIT to Python.

Finding out whether your projects will benefit from Codon will mean
taking the time to read the documentation. Codon is not exactly like
Python. For example, it includes support for Nvidia GPUs, and I ran
into a limitation when using a dictionary. I suspect
that some potential users will appreciate that Codon takes Python as
input and produces executables, making the distribution of code
simpler while avoiding disclosure of the source. Codon, with its LLVM
backend, also seems like a great solution for people wanting to use
Python for embedded projects.

My uses of Python are much simpler: I can process millions of lines
of nginx logs in seconds, so a reduction in execution time means
little to me. I do think there will be others who can take full
advantage of Codon.

Acknowledgements

I want to thank two of the developers of Codon, Ariya Shajii and
Ibrahim Numanagic, for answering my many questions and reviewing a
draft of this article for technical accuracy. The analysis and
opinions are my own.

Appendix
References: 

[1] Rachel Gordon, Python-based compiler achieves orders-of-magnitude
speedups; MIT News, March 14, 2023:
https://news.mit.edu/2023/codon-python-based-compiler-achieve-orders-mag...

[2] David Lion, Adrian Chiu, Michael Stumm, Ding Yuan, Investigating
Managed Language Runtime Performance (June 2022):
https://www.usenix.org/publications/loginonline/investigating-managed-la...

[3] Ariya Shajii, Ibrahim Numanagic, Riyadh Baghdadi, Bonnie Berger,
and Saman Amarasinghe. 2019. Seq: A High-Performance Language for
Bioinformatics. Proc. ACM Program. Lang. 3, OOPSLA, Article 125
(October 2019), 29 pages. https://doi.org/10.1145/3360551, available
at nih.gov

[4] Ariya Shajii, Gabriel Ramirez, Haris Smajlovic, Jessica Ray,
Bonnie Berger, Saman Amarasinghe, and Ibrahim Numanagic. 2023. Codon:
A Compiler for High-Performance Pythonic Applications and DSLs. In
Proceedings of the 32nd ACM SIGPLAN International Conference on
Compiler Construction (CC '23), February 25-26, 2023, Montreal, QC,
Canada. ACM, New York, NY, USA, 12 pages.
https://doi.org/10.1145/3578360.3580275

[5] Documentation page about using OpenMP in Codon:
https://docs.exaloop.io/codon/advanced/parallel

Article Categories: 
SRE
Programming
Last updated April 25, 2023
Authors: 
Rik Farrow has been a consultant for 43 years. He has written two
books, as well as worked as the technical editor for a UNIX magazine
and for two editions of a popular operating system book. He also
taught UNIX system administration and Internet security during the
90s internationally, and worked as a volunteer for USENIX program and
steering committees. Rik has been the editor of ;login: since 2005.
rik@rikfarrow.com
