NOTE: This is another guest post by Felix Winkelmann, the founder and one of the current maintainers of CHICKEN Scheme.
Introduction
Hi! This post is about a new project of mine, called "CRUNCH", a compiler for a statically typed subset of the programming language Scheme, specifically, the R7RS (small) standard.
The compiler runs on top of the CHICKEN Scheme system and produces portable C99 that can then be compiled and executed on any platform that has a decent C compiler.
So, why another Scheme implementation, considering that there already exists such a large number of interpreters and compilers for this language? What motivated me was the emergence of the PreScheme restoration project, a modernisation of "PreScheme", a statically typed compiler for Scheme that is used in the Scheme48 implementation. The original PreScheme was embedded into S48 and was used to generate the virtual machine that is targeted by the latter system. Andrew Whatson couragously started a project to port PreScheme to modern R7RS Scheme systems (PreScheme is written in Scheme, of course) with the intention of extending it and keep the quite sophisticated and interesting compiler alive.
The announcement of the project and some of the reactions that it spawned made me realize that there seems to be a genuine demand for a statically typed high-performance compiler for Scheme (even if just for a subset) that would close a gap in the spectrum of Scheme systems currently available.
There are compilers and interpreters for all sorts of platforms, ranging from tiny, minimal interpreters to state-of-the-art compilers, targeting about every imaginable computer system. But most Schemes need relatively complex runtime systems, have numerous dependencies, or have slow performance, which is simply due to the powerful semantics of the language: dynamic typing, automatic memory management (garbage collection), first class continuations, etc. which all have a cost in terms of overhead.
What is needed is a small, portable compiler that generates more or less "natural" C code with minimal dependencies and runtime system that supports at least the basic constructs of the language and that puts an emphasis on producing efficient code, even if some of the more powerful features of Scheme are not available. Such a system would be perfect for writing games, virtual machines, or performance-sensitive libraries for other programs where you still want to use a high-level language to master the task of implementing complex algorithms, while keeping as close to C/C++ as possible. Another use is as a tool to write bare-metal code for embedded systems, device drivers and kernels for operating systems.
There are some high-performance compilers like Bigloo or Stalin. But the former still needs a non-trivial runtime-system and the latter is brittle and not actively maintained. Also, one doesn't necessarily need support for the full Scheme language and if one is willing to drop the requirement of dynamic typing, a lot of performance can be gained while still having a relatively simple compiler implementation. Even without continuations, dynamic typing, the full numeric tower and general tail call optimization, the powerful metaprogramming facilities of Scheme and the clear and simple syntax make it a useful notation for many uses that require a high level of abstraction. Using type inference mostly avoids having to annotate a source program with type information and thus allows creating code which still is to a large part standard Scheme code that can (with a little care) be tested on a normal Scheme system before compiling it to more efficient native code.
History
There was a previous extension for CHICKEN, also called "crunch", that compiled to C++, used a somewhat improvised type-inferencing algorithm and was severely restricted. It was used to allow embedding statically typed code into normal CHICKEN Scheme programs. The new CRUNCH picks up this specific way of use, but is a complete reimplementation that targets C99, has a more sophisticated type system, offers some powerful optimizations and has the option to create standalone programs or separately compilable C modules.
Installation
CRUNCH is only available for the new major release of CHICKEN (version 6). You will need to build and install a development snapshot containing the sources of this release, which is still unofficial and under development:
$ wget https://code.call-cc.org/dev-snapshots/2024/12/09/chicken-6.0.0pre1.tar.gz $ tar xfz chicken-6.0.0pre1.tar.gz $ cd chicken-6.0.0pre1 $ ./configure --prefix <install location> $ make $ make install $ <install location>/bin/chicken-install -test crunch
CHICKEN has minimal dependencies (a C compiler, sh(1) and GNU make(1)), so don't be put off to give it a try.
Basic Operation and Usage
CRUNCH can be used as a batch compiler, translating Scheme to standalone C programs or can be used at compile time for embedded fragments of Scheme code, automatically creating the necessary glue to use the compiled code from CHICKEN Scheme. The compiler itself is also exposed as a library function, making various scenarios possible where you want to programmatically convert Scheme into native code.
There are four modes of using CRUNCH:
1. Embedding:
;; embed compiled code into Scheme (called using the foreign function interface): (import crunch) (crunch (define (stuff arg) ...) ) (stuff 123)
2. Standalone:
$ cat hello.scm (define (main) (display "Hello world\n")) $ chicken-crunch hello.scm -o hello.c $ cc hello.c $(chicken-crunch -cflags -libs) $ ./a.out
3. Wrap compiled code in Scheme stubs to use it from CHICKEN:
$ cat fast-stuff.scm (module fast-stuff (do-something) (import (scheme base)) (define (do-something) ...)) $ cat use-fast-stuff.scm (import fast-stuff) (fast-wait) $ chicken-crunch -emit-wrappers wrap.scm -J fast-stuff.scm -o fast-stuff.c $ csc -s wrap.scm fast-stuff.c -o wrap.so $ csc use-fast-stuff.scm -o a.out
4. Using CRUNCH as a library:
#;1> (import (crunch compiler)) #;2> (crunch '(begin (define (main) (display "Hello world\n")) '(output-file "out.c") )
Module system and integration into CHICKEN
CRUNCH uses the module system and syntactic metaprogramming facilities of CHICKEN. Syntax defined in CHICKEN modules can be used in CRUNCH code and vice versa. CRUNCHed code can produce "import libraries", like in CHICKEN to provide separate compilation of modules.
Modules compiled by CRUNCH may only export procedures and a standalone program is expected to export a procedure called main. This simplifies interfacing to C and makes callbacks from C into Scheme straightforward.
As in PreScheme, toplevel code is evaluated at compile time. Most assigned values can be accessed in compiled code.
;; build a table of sine values at compile time (define sines (list->f64vector (list-tabulate 360 (lambda (n) (sin (/ (* n π) 180))) ) ) )
Restrictions
A number of significant restrictions apply to Scheme code compiled with CRUNCH:
- No support for multiple values
- No support for first class continuations
- Tail calls can only be optimized into loops for local procedure calls or calls that can be inlined
- Closures (procedures capturing free variables) are not supported
- Procedures can have no "rest" argument
- Imported global variables can not be modified
- Currently only 2-argument arithmetic and comparison operators are supported
- It must be possible to eliminate all free variables via inlining and lambda-lifting
This list looks quite severe but it should be noted that a large amount of idiomatic Scheme code can still be compiled that way. Also, CRUNCH does not attempt to be a perfect replacement for a traditional Scheme system, it merely tries to provide an efficient programming system for domains where performance and interoperability with native code are of high importance.
Datums are restricted to the following types:
- basic types: integer, float, complex, boolean, char, pointer
- procedure types
- strings
- vectors of any of the basic types, and vectors for specific numeric types
- structs and unions
Note the absence of pairs, lists and symbols. Structures and unions are representations of the equivalent C object and can be passed by value or by pointer.
The Runtime System
The runtime system required to run compiled code is minimal and contained in a single C header file. CRUNCH supports UNICODE and the code for UNICODE-aware case conversions and some other non-trivial operations is provided in a separate C file. UNICODE support is optional and can be disabled.
No garbage collector is needed. Non-atomic data like strings and vectors are managed using reference counting without any precautions taken to avoid circular data, which is something that is unlikely to happen by accident with the data types currently supported.
Optimizations
CRUNCH provides a small number of powerful optimizations to ensure decent performance and to allow more or less idiomatic Scheme code to be compiled. The type system is not fully polymorphic, but allows overloading of many standard procedures to handle generic operations that accept a number of different argument types. Additionally, a "monomorphization" optimization is provided that clones user procedures that are called with different argument types. Standard procedures that accept procedures are often expanded inline which further increases the opportunities for inlining of procedure calls - this reduces the chance of having "free" variables, which the compiler must be able to eliminate as it doesn't support closures. Aggressively moving lexically bound variables to toplevel (making them globals) can further reduce the amount of free variables.
Procedures that are called only once are inlined at the call site ("integrated"). Fully general inlining is not supported, we leave that to the C compiler. Integrated procedures that call themselves recursively in tail position are turned into loops.
A crucial transformation to eliminate free variables is "lambda lifting", which passes free variables as extra arguments to procedures that do not escape and whose argument list can be modified by the compiler without interfering with user code:
(let ((x 123)) ; ... more code ... (define (foo y) (+ x y)) ; ... more code ... (foo 99) ) ~> (let ((x 123)) ; ... more code ... (define (foo y x) (+ x y)) ; ... more code ... (foo 99 x) )
Monomorphization duplicates procedures called with arguments of (potentially) different types:
(define (inc x) (+ x 1)) (foo (inc 123) (inc 99.0)) ~> ;; a "variant" represents several instantiations of the original procedure (define inc (%variant (lambda (x'int) (+ x'int 1)) ; "+" will be specialized to integer (lambda (x'float) (+ x'float 1))))) ; ... and here to float (foo (inc'int 123) (inc'float 99.0))
Certain higher-order primitives are expanded inline:
(vector-for-each v (lambda (x) ...) ) ~> ; (roughly) (let loop ((i 0)) (unless (>= i (vector-length v)) (let ((x (vector-ref v i))) ... (loop (+ i 1))) ) )
A final pass removes unused variables and procedure arguments and code that has no side effects and has unused results.
Together these transformations can get you far enough to write relatively complex Scheme programs while ensuring the generated C code is tight, and with a little effort, easy to understand (in case you need to verify the translation) and (hopefully) does what it is intended to do.
Performance
Code compiled with CRUNCH should be equivalent to a straightforward translation of the Scheme code to C. Scalar values are not tagged nor boxed and are represented with the most fitting underlying C type. There is no extra overhead introduced by the translation, with the following exceptions:
- Vector- and string accesses perform bound checks (these can be disabled)
- Using vectors and strings will add some reference counting overhead
If you study the generated code you will encounter many useless variable assignments and some unused values in statement position, these will be removed by the C compiler, also unexported procedures are declared static and so can also very often be inlined by the C compiler leading to little or no overhead.
The Debugger
For analyzing type errors, a static debugger is included, that presents a graphical user interface. When the -debug option is given, a Tcl/Tk script is invoked in a subprocess that shows the internal node tree and can be used to examine the transformed code and the types of sub-expressions, together with the corresponding source code line (if available). Should the compilation abort with an error, the shown node-tree is the state of the program at the point where the error occurred.
Differences to PreScheme
CRUNCH is inspired by and very similar to PreScheme, but has a number of noteworthy differences. CRUNCH tries to be as conformant to R7RS (small) as possible and handles UNICODE characters and strings. It also is tightly integrated into CHICKEN, allowing nearly seamless embedding of high-performance code sections. Macros and top-level code can take full advantage of the full CHICKEN Scheme language and its ecosystem of extension libraries.
PreScheme supports multiple values, while CRUNCH currently does not.
PreScheme uses explicit allocation and deallocation for compound data objects, while CRUNCH utilizes reference counting, removing the need to manually clean up resources.
I'm not too familiar with the PreScheme compiler itself, but I assume it provides more sophisticated optimizations, as it does convert to Static Single Assignment form (SSA), so it is to be expected that the effort to optimise the code is quite high. On the other hand, modern C compilers already provide a multitude of powerful optimizations, so it is not clear how many advantages lower-level optimizations will bring.
Future Plans
There is a lot of room for improvements. Support of multiple vales would be nice, and not too hard to implement, but will need to follow a convention that should not be too awkward to use on the C side. Also, the support for optional arguments is currently quite weak; the ability to specify default values is something that needs to be added.
Primitives for many POSIX libc system calls and library functions should be straightforward to use in CRUNCH code, at least the operations provided by the (chicken file posix) module.
What would be particularly nice would be if the compiler detects state machines - mutually recursive procedures that call each other in tail position.
Other targets are possible, like GPUs. I don't know anything about that, so if you are interested and think you can contribute, please don't hesitate to contact me.
Disclaimer
CRUNCH is currently alpha-software. It certainly contains numerous bugs and shortcomings that will hopefully be found and corrected as the compiler is used. If you are interested, I invite you to give it a try. Contact me directly or join the #chicken IRC channel on Libera.chat, if you have questions, want to report bugs, if you would like to suggest improvements or if you just want to know more about it.
All feedback is very welcome!
Links
The CRUNCH manual can be found at the CHICKEN wiki, the source code repository is here.