Wednesday, August 15, 2012

Numba and LLVMPy

It's been a busy year so far.    All the time spent on starting a new company, starting new open source projects, and keeping up with the open source projects that I have interest in, has meant that I haven't written nearly as many blog-posts as I planned on.   But, this is probably a good thing at least if you follow the wisdom attributed to Solomon --- which has been paraphrased in this quote attributed to Abraham Lincoln.

One of the things that has been on my mind for the past year is promoting array-oriented computing as a fundamental concept more developers need exposure to.    This is one reason that I am so excited that I've been able to find great people to work on Numba (which intends to be an array-oriented compiler for Python code).      I have given a few talks trying to convey what is meant by array-oriented computing, but the essence is captured by the difference between the life.py example in the Python code-base and a NumPy version of the same code.

I have seen many, many real world examples of very complicated code that could be simplified and sped up (especially on modern hardware) by just thinking about the problem differently using array-oriented concepts. 

One of the goals for Numba is to make it possible to write more vectorized code easily in Python without relying just on the pre-compiled loops that NumPy provides.    In order to write Numba, though, we first needed to resurrect the llvm-py project which provides easy access to the LLVM C++ libraries from Python.   This project is interesting in its own right and in addition to forming a base tool chain for Numba, allows you to do very interesting things (like instrument C-code compiled with Clang to bitcode), build a compiler, or import bitcode directly into Python (a la bitey).

While the documentation for llvm-py left me frustrated early on, I have to admit that llvm-py re-kindled some of the joy I experienced when being first exposed to Python.    Over the past several weeks we have worked to create the llvmpy project from llvm-py.   We now have a domain http://www.llvmpy.org, a GitHub repository, a website served from GitHub, and sphinx-based documents that can be edited via a pull request.    The documentation still needs a lot of improvement (even to get it to the state that the old llvm-py project was in), and contributions are welcome.  

I'm grateful to Fernando Perez, author of IPython, for explaining the 4-repository approach to managing an open source web-site and documentation via github.   We are using the same pattern that IPython uses for both numba and llvmpy.   It took a bit of work to get set-up but it's a nice approach that should make it easier for the community to maintain the documentation and web-site of both of these projects.    The idea is simple.   Use a project page (repo llvmpy.github.com) to be the web-site but generate this repo from another repo (llvmpy-webpage) which contains the actual sources.   I borrowed the scripts from the IPython project to build the page from the sources, check-out the llvmpy.github.com repo, copy the built pages to the repo, and then push the updates back to github which actually updates the site.    The same process (slightly modified) is used for the documentation except the sources for the docs live in the llvmpy repo under the docs directory and the built pages are pushed to the gh-pages branch of the llvmpy-doc repo.    If you are editing sources you only modify llvmpy/docs and llvmpy-webpage files.   The other repos are generated and pushed via scripts.

We are using the same general scheme to host the numba pages (although there I couldn't get the numba.org domain name and so I am using http://numba.pydata.org).   With llvmpy on a relatively solid footing, attention could be shifted to getting a Numba release out.  Today, we finally released Numba 0.1.   It took longer than expected after the SciPy conference mainly because we were hoping that some of the changes (still currently in a devel branch) to use an AST-based code-generator could be merged into the main-line before the release.  

Jon Riehl did the lion's share of the work to transform Numba from my early prototype to a functioning system in 0.1 with funding from Continuum Analytics, Inc.   Thanks to him, I can proudly say that Numba is ready to be tried and used.    It is still early software --- but it is ready for wider testing.   One of the problems you will have with Numba right now is error reporting.  If you make a mistake in the Python code that you are decorating, the error you get will not be informative -- so test the Python code before decorating it with Numba.    But, if you get things right, Numba can speed up your Python code by 200 times or more.    It's is really pretty fun to be able to write image-processing routines in Python.   PyPy can do this too, of course, but with Numba you have full integration with the CPython stack and you don't have to wait for someone to port the library you also want to use to PyPy.

Numba's road-map is being defined right now by the people involved in the project.  On the horizon is support for NumPy index expressions (slices, etc.), merging of the devel branch which uses the AST and Mark Florisson's minivect compiler, improving support for error checking, emitting calls to the Python C-API for code that cannot be type-specialized, and improving complex-number support.  Your suggestions are welcome.