The 3 most recent posts (archive)
Let's CRUNCH! Posted on 2024-12-16
NOTE: This is another guest post by Felix Winkelmann, the founder and one of the current maintainers of CHICKEN Scheme.
Introduction
Hi! This post is about a new project of mine, called "CRUNCH", a compiler for a statically typed subset of the programming language Scheme, specifically, the R7RS (small) standard.
The compiler runs on top of the CHICKEN Scheme system and produces portable C99 that can then be compiled and executed on any platform that has a decent C compiler.
So, why another Scheme implementation, considering that such a large number of interpreters and compilers for this language already exists? What motivated me was the emergence of the PreScheme restoration project, a modernisation of "PreScheme", a statically typed compiler for Scheme that is used in the Scheme48 implementation. The original PreScheme was embedded into Scheme48 and was used to generate the virtual machine targeted by that system. Andrew Whatson courageously started a project to port PreScheme to modern R7RS Scheme systems (PreScheme is written in Scheme, of course), with the intention of extending it and keeping this quite sophisticated and interesting compiler alive.
The announcement of the project and some of the reactions that it spawned made me realize that there seems to be a genuine demand for a statically typed high-performance compiler for Scheme (even if just for a subset) that would close a gap in the spectrum of Scheme systems currently available.
There are compilers and interpreters for all sorts of platforms, ranging from tiny, minimal interpreters to state-of-the-art compilers, targeting nearly every imaginable computer system. But most Schemes need relatively complex runtime systems, have numerous dependencies, or suffer from slow performance, which is simply due to the powerful semantics of the language: dynamic typing, automatic memory management (garbage collection), first-class continuations, etc. all carry a cost in terms of overhead.
What is needed is a small, portable compiler that generates more or less "natural" C code, with minimal dependencies and a minimal runtime system, that supports at least the basic constructs of the language and that puts an emphasis on producing efficient code, even if some of the more powerful features of Scheme are not available. Such a system would be perfect for writing games, virtual machines, or performance-sensitive libraries for other programs, where you still want to use a high-level language to master the task of implementing complex algorithms while keeping as close to C/C++ as possible. Another use is as a tool for writing bare-metal code for embedded systems, device drivers and kernels for operating systems.
There are some high-performance compilers like Bigloo or Stalin, but the former still needs a non-trivial runtime system and the latter is brittle and not actively maintained. Also, one doesn't necessarily need support for the full Scheme language, and if one is willing to drop the requirement of dynamic typing, a lot of performance can be gained while still having a relatively simple compiler implementation. Even without continuations, dynamic typing, the full numeric tower and general tail call optimization, the powerful metaprogramming facilities of Scheme and its clear and simple syntax make it a useful notation for many tasks that require a high level of abstraction. Type inference mostly avoids having to annotate the source program with type information, and thus allows writing code that is still to a large part standard Scheme and can (with a little care) be tested on a normal Scheme system before compiling it to more efficient native code.
History
There was a previous extension for CHICKEN, also called "crunch", that compiled to C++, used a somewhat improvised type-inferencing algorithm and was severely restricted. It was used to allow embedding statically typed code into normal CHICKEN Scheme programs. The new CRUNCH picks up this specific way of use, but is a complete reimplementation that targets C99, has a more sophisticated type system, offers some powerful optimizations and has the option to create standalone programs or separately compilable C modules.
Installation
CRUNCH is only available for the new major release of CHICKEN (version 6). You will need to build and install a development snapshot containing the sources of this release, which is still unofficial and under development:
$ wget https://code.call-cc.org/dev-snapshots/2024/12/09/chicken-6.0.0pre1.tar.gz
$ tar xfz chicken-6.0.0pre1.tar.gz
$ cd chicken-6.0.0pre1
$ ./configure --prefix <install location>
$ make
$ make install
$ <install location>/bin/chicken-install -test crunch
CHICKEN has minimal dependencies (a C compiler, sh(1) and GNU make(1)), so don't be put off from giving it a try.
Basic Operation and Usage
CRUNCH can be used as a batch compiler, translating Scheme to standalone C programs or can be used at compile time for embedded fragments of Scheme code, automatically creating the necessary glue to use the compiled code from CHICKEN Scheme. The compiler itself is also exposed as a library function, making various scenarios possible where you want to programmatically convert Scheme into native code.
There are four modes of using CRUNCH:
1. Embedding:
;; embed compiled code into Scheme (called using the foreign function interface):
(import crunch)

(crunch
  (define (stuff arg) ...) )

(stuff 123)
2. Standalone:
$ cat hello.scm
(define (main) (display "Hello world\n"))
$ chicken-crunch hello.scm -o hello.c
$ cc hello.c $(chicken-crunch -cflags -libs)
$ ./a.out
3. Wrap compiled code in Scheme stubs to use it from CHICKEN:
$ cat fast-stuff.scm
(module fast-stuff (do-something)
  (import (scheme base))
  (define (do-something) ...))
$ cat use-fast-stuff.scm
(import fast-stuff)
(do-something)
$ chicken-crunch -emit-wrappers wrap.scm -J fast-stuff.scm -o fast-stuff.c
$ csc -s wrap.scm fast-stuff.c -o wrap.so
$ csc use-fast-stuff.scm -o a.out
4. Using CRUNCH as a library:
#;1> (import (crunch compiler))
#;2> (crunch '(begin (define (main) (display "Hello world\n")))
       '(output-file "out.c"))
Module system and integration into CHICKEN
CRUNCH uses the module system and syntactic metaprogramming facilities of CHICKEN. Syntax defined in CHICKEN modules can be used in CRUNCH code and vice versa. CRUNCHed code can produce "import libraries", as in CHICKEN, to provide separate compilation of modules.
Modules compiled by CRUNCH may only export procedures and a standalone program is expected to export a procedure called main. This simplifies interfacing to C and makes callbacks from C into Scheme straightforward.
As in PreScheme, toplevel code is evaluated at compile time. Most assigned values can be accessed in compiled code.
;; build a table of sine values at compile time
(define sines
  (list->f64vector
    (list-tabulate 360
      (lambda (n) (sin (/ (* n π) 180))) ) ) )
Restrictions
A number of significant restrictions apply to Scheme code compiled with CRUNCH:
- No support for multiple values
- No support for first class continuations
- Tail calls can only be optimized into loops for local procedure calls or calls that can be inlined
- Closures (procedures capturing free variables) are not supported
- Procedures cannot have a "rest" argument
- Imported global variables cannot be modified
- Currently only 2-argument arithmetic and comparison operators are supported
- It must be possible to eliminate all free variables via inlining and lambda-lifting
This list looks quite severe, but it should be noted that a large amount of idiomatic Scheme code can still be compiled this way. Also, CRUNCH does not attempt to be a perfect replacement for a traditional Scheme system; it merely tries to provide an efficient programming system for domains where performance and interoperability with native code are of high importance.
Datums are restricted to the following types:
- basic types: integer, float, complex, boolean, char, pointer
- procedure types
- strings
- vectors of any of the basic types, and vectors for specific numeric types
- structs and unions
Note the absence of pairs, lists and symbols. Structures and unions are representations of the equivalent C object and can be passed by value or by pointer.
The Runtime System
The runtime system required to run compiled code is minimal and contained in a single C header file. CRUNCH supports UNICODE and the code for UNICODE-aware case conversions and some other non-trivial operations is provided in a separate C file. UNICODE support is optional and can be disabled.
No garbage collector is needed. Non-atomic data like strings and vectors are managed using reference counting without any precautions taken to avoid circular data, which is something that is unlikely to happen by accident with the data types currently supported.
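To make the memory-management scheme concrete, here is a minimal sketch in C of the kind of reference counting described. All names and the object layout are illustrative assumptions for this post, not CRUNCH's actual runtime:

```c
#include <stdlib.h>

/* Hypothetical header for a non-atomic object (string, vector): a
   reference count stored in front of the payload bytes. */
typedef struct {
    int refcount;
    size_t size;
    unsigned char data[];    /* payload follows the header (C99) */
} obj_t;

static obj_t *obj_alloc(size_t size) {
    obj_t *o = malloc(sizeof(obj_t) + size);
    o->refcount = 1;         /* the creator holds the first reference */
    o->size = size;
    return o;
}

/* Handing the object to another owner bumps the count... */
static obj_t *obj_retain(obj_t *o) { ++o->refcount; return o; }

/* ...and dropping a reference frees the object once the count reaches
   zero.  No cycle detection: strings and vectors of scalars cannot
   form reference cycles by accident. */
static void obj_release(obj_t *o) {
    if (--o->refcount == 0) free(o);
}
```

Since every object has exactly one counted owner per live reference, deallocation is deterministic and no collector pauses occur.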
Optimizations
CRUNCH provides a small number of powerful optimizations to ensure decent performance and to allow more or less idiomatic Scheme code to be compiled. The type system is not fully polymorphic, but allows overloading of many standard procedures to handle generic operations that accept a number of different argument types. Additionally, a "monomorphization" optimization is provided that clones user procedures that are called with different argument types. Standard procedures that accept procedures are often expanded inline which further increases the opportunities for inlining of procedure calls - this reduces the chance of having "free" variables, which the compiler must be able to eliminate as it doesn't support closures. Aggressively moving lexically bound variables to toplevel (making them globals) can further reduce the amount of free variables.
Procedures that are called only once are inlined at the call site ("integrated"). Fully general inlining is not supported; we leave that to the C compiler. Integrated procedures that call themselves recursively in tail position are turned into loops.
A crucial transformation to eliminate free variables is "lambda lifting", which passes free variables as extra arguments to procedures that do not escape and whose argument list can be modified by the compiler without interfering with user code:
(let ((x 123))
  ; ... more code ...
  (define (foo y) (+ x y))
  ; ... more code ...
  (foo 99) )

  ~>

(let ((x 123))
  ; ... more code ...
  (define (foo y x) (+ x y))
  ; ... more code ...
  (foo 99 x) )
Monomorphization duplicates procedures called with arguments of (potentially) different types:
(define (inc x) (+ x 1))
(foo (inc 123) (inc 99.0))

  ~>

;; a "variant" represents several instantiations of the original procedure
(define inc
  (%variant
    (lambda (x'int) (+ x'int 1))        ; "+" will be specialized to integer
    (lambda (x'float) (+ x'float 1))))  ; ... and here to float
(foo (inc'int 123) (inc'float 99.0))
Certain higher-order primitives are expanded inline:
(vector-for-each
  (lambda (x) ...)
  v )

  ~>

; (roughly)
(let loop ((i 0))
  (unless (>= i (vector-length v))
    (let ((x (vector-ref v i)))
      ...
      (loop (+ i 1))) ) )
A final pass removes unused variables and procedure arguments and code that has no side effects and has unused results.
Together these transformations can get you far enough to write relatively complex Scheme programs while ensuring the generated C code is tight, easy to understand with a little effort (in case you need to verify the translation) and (hopefully) doing what it is intended to do.
Performance
Code compiled with CRUNCH should be equivalent to a straightforward translation of the Scheme code to C. Scalar values are not tagged nor boxed and are represented with the most fitting underlying C type. There is no extra overhead introduced by the translation, with the following exceptions:
- Vector- and string accesses perform bound checks (these can be disabled)
- Using vectors and strings will add some reference counting overhead
If you study the generated code you will encounter many useless variable assignments and some unused values in statement position; these will be removed by the C compiler. Also, unexported procedures are declared static and so can very often be inlined by the C compiler, leading to little or no overhead.
The Debugger
For analyzing type errors, a static debugger with a graphical user interface is included. When the -debug option is given, a Tcl/Tk script is invoked in a subprocess that shows the internal node tree and can be used to examine the transformed code and the types of sub-expressions, together with the corresponding source code line (if available). Should the compilation abort with an error, the node tree shown reflects the state of the program at the point where the error occurred.
Differences to PreScheme
CRUNCH is inspired by and very similar to PreScheme, but has a number of noteworthy differences. CRUNCH tries to be as conformant to R7RS (small) as possible and handles UNICODE characters and strings. It is also tightly integrated into CHICKEN, allowing nearly seamless embedding of high-performance code sections. Macros and top-level code can take full advantage of the CHICKEN Scheme language and its ecosystem of extension libraries.
PreScheme supports multiple values, while CRUNCH currently does not.
PreScheme uses explicit allocation and deallocation for compound data objects, while CRUNCH utilizes reference counting, removing the need to manually clean up resources.
I'm not too familiar with the PreScheme compiler itself, but I assume it provides more sophisticated optimizations, as it converts to Static Single Assignment (SSA) form, so it is to be expected that the effort put into optimising the code is quite high. On the other hand, modern C compilers already provide a multitude of powerful optimizations, so it is not clear how many advantages such lower-level optimizations bring.
Future Plans
There is a lot of room for improvement. Support for multiple values would be nice, and not too hard to implement, but will need to follow a convention that is not too awkward to use on the C side. Also, support for optional arguments is currently quite weak; the ability to specify default values is something that needs to be added.
Primitives for many POSIX libc system calls and library functions should be straightforward to use in CRUNCH code, at least the operations provided by the (chicken file posix) module.
It would be particularly nice if the compiler detected state machines: mutually recursive procedures that call each other in tail position.
Other targets are possible, like GPUs. I don't know anything about that, so if you are interested and think you can contribute, please don't hesitate to contact me.
Disclaimer
CRUNCH is currently alpha-software. It certainly contains numerous bugs and shortcomings that will hopefully be found and corrected as the compiler is used. If you are interested, I invite you to give it a try. Contact me directly or join the #chicken IRC channel on Libera.chat, if you have questions, want to report bugs, if you would like to suggest improvements or if you just want to know more about it.
All feedback is very welcome!
Links
The CRUNCH manual can be found at the CHICKEN wiki, the source code repository is here.
What to expect from CHICKEN 6 Posted on 2024-11-18
NOTE: This is a guest post by Felix Winkelmann, the founder and one of the current maintainers of CHICKEN Scheme.
Introduction
This article is about the changes in the next major version of CHICKEN Scheme.
The current version is 5.4.0. The next major version, 6.0.0, is mostly implemented. Currently we are adding final touches before preparing the next release. If you are already familiar with CHICKEN, this article will help you get a more detailed view of what has been done. If you are not into CHICKEN or even Scheme, this article may still give you an insight into what's involved in the development and maintenance of a programming language project. You may also find it interesting to know how we address issues like portability and backwards-compatibility. There are also some juicy details on implementation techniques.
No previous knowledge of CHICKEN Scheme is required, but it will help if you are familiar with common terms used in programming language design and Lisp-y languages.
Versions
CHICKEN uses the usual major.minor.patch versioning scheme, where we bump major versions only for significant changes that break compatibility with older versions. CHICKEN has a relatively large community for a non-mainstream language and has lots of contributed extension libraries called "eggs". Breaking backwards compatibility in non-major versions adds implementation effort for users that just want to keep their CHICKEN up to date, and any eggs they're using should also keep working.
The process of avoiding breakage can sometimes become challenging. We may, for example, provide two different implementations for one and the same thing, to allow adoption of the new feature while keeping old code working. Typically this happens when the new feature is safer, easier to use, faster or more standards compliant than the old. We also remove features in stages: first we mark a feature as deprecated, then delete it later. This allows users to upgrade their code gradually.
On major version changes, we create a new mirror of the egg "repository", where most of the extensions are stored. Externally contributed eggs can choose when and how to publish an extension for specific versions. This is described in a previous post on this blog.
Major version changes also usually bump the "binary version". This is a suffix to the shared runtime library name to avoid intermixing tools and programs written for one version with the libraries for another.
We started CHICKEN major version 6 to introduce new features that were dearly required but were impossible to add without changing external interfaces, in particular what modules exist and what they export. Specifically, we needed full support for UNICODE strings and compliance with the R7RS (small) language standard.
Both features were already available in a rudimentary manner as external libraries. They were not fully integrated, a fact that showed in various places and could lead to subtle bugs as core and extension code assumed differences in string representation and the behaviour of standard procedures. The same approach was used for integrating the "full numeric tower" (the full set of numeric types, including rationals, big integers and complex numbers), which was formerly a library and was moved into the core system with the 5.0.0 release.
We'll now describe the most noteworthy changes made during this major version transition.
UNICODE
A major shortcoming of CHICKEN up to now was that it didn't support full UNICODE strings. All string data was assumed to consist of 8-bit characters in the ASCII or Latin-1 range. Internationalisation of source code and applications is unavoidable and libraries and operating systems have moved towards UTF-8 as a standard character encoding. So we saw no choice but to find a reasonably efficient way of using the full range of possible characters in all parts of the system.
There is an open-ended space of design choices regarding how to efficiently implement UNICODE strings. The Damocles Sword of unanticipated performance degradation constantly looms over the head of the language implementer, so finding the ideal solution in terms of memory use, speed and simplicity is not easy. Some implementations go to great lengths by providing complex storage schemes or multiple representations depending on a string's content. In the end, we think a simple representation serves everybody better by being easier to understand and maintain.
Also, it is not entirely certain that (say) a dynamic multi-representation approach pays off sufficiently. Too many factors come into play, like string usage patterns in applications, memory management and implementation overhead. Data exchange at the boundaries of the operating system and foreign code also has to be taken into account: you want to avoid unnecessary conversions and copying, especially because CHICKEN is geared towards easy interfacing to external libraries and OS functionality.
We decided to use a UTF-8 representation internally. Many operating systems and applications already support UTF-8, which removes the need for costly conversions. UTF-8 is also backwards compatible with ASCII and keeps storage requirements at a minimum.
Since UTF-8 is a multi-byte encoding, character lookup in a string is of linear complexity. Characters have to be scanned when addressing a specific index in a string. To avoid repeatedly scanning the same section of a string when iterating over it, we use a simple cache slot in the string that maps byte-offsets to code point indices and holds the offset/index pair of the last access.
String representation
A quick recap regarding how strings are currently stored in CHICKEN is in order. If you are interested, there's also an older post on this blog with a more detailed explanation of the data representation used by CHICKEN.
Characters are stored as immediates with the special bit pattern 1110 (type code 111) in the low bits of the word.
This gives us enough bits (even on 32-bit platforms) to hold all valid Unicode code points (the representable range extends to 0x1FFFFF, comfortably covering the maximum code point 0x10FFFF). CHICKEN 5 already supported characters in that range, but strings were still restricted to 8-bit elements.
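As an illustration of this kind of immediate tagging, here is a hedged C sketch. The shift amount and helper names are assumptions for this post based on the 1110 pattern described, not CHICKEN's exact macros:

```c
#include <stdint.h>

/* An immediate character: the code point shifted left, with the low
   bits holding the 1110 tag pattern.  An 8-bit shift leaves 24 bits
   on a 32-bit word, enough for the maximum code point 0x10FFFF,
   which needs 21 bits. */
#define CHAR_TAG    0x0eu     /* binary 1110 */
#define CHAR_SHIFT  8

static inline uint32_t make_char(uint32_t cp) { return (cp << CHAR_SHIFT) | CHAR_TAG; }
static inline uint32_t char_code(uint32_t c)  { return c >> CHAR_SHIFT; }
static inline int is_char(uint32_t c)         { return (c & 0x0fu) == CHAR_TAG; }
```

Because the tag lives entirely in the low bits, characters need no heap allocation and comparisons are plain word comparisons.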
Up to CHICKEN 5, a string was represented by a single byteblock holding the character data. In CHICKEN 6 we add an indirection: the string itself is a normal block containing a pointer to a byteblock that holds the UTF-8 byte sequence and a trailing zero byte. The "Count" slot is a fixnum holding the number of code points in the string (its length). "Offset" and "Index" act as a cache for the byte offset and code point index of the last access. They are reset to zero if the string changes its length due to a character's width changing at an offset before the cached point.
An indirection is necessary regardless of the other details of the representation: as UTF-8 is a multi-byte encoding, destructively modifying characters in a string may grow or shrink the total byte-sequence. The string must still keep its identity and be indistinguishable from the original string (in terms of the primitive eq? procedure).
One obvious optimisation can be done here: when the string's length ("Count") equals the length of the character data (encoded in the header of the byteblock) minus one, every character occupies exactly one byte, and we can simply index by byte offset.
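The lookup logic, including the one-entry cache and the single-byte fast path, can be sketched in C. The struct layout and names are assumptions mirroring the description above, not CHICKEN's actual representation (in particular, the byteblock's trailing zero byte is left out of the byte count here):

```c
#include <stddef.h>

/* Hypothetical string object: code point count plus a one-entry
   offset/index cache over a UTF-8 byte buffer. */
typedef struct {
    size_t count;            /* number of code points ("Count") */
    size_t cached_offset;    /* byte offset of the last access */
    size_t cached_index;     /* code point index of the last access */
    size_t bytes;            /* length of the UTF-8 data, excluding the trailing zero */
    const unsigned char *data;
} str_t;

/* Byte length of the UTF-8 sequence whose first byte is b
   (assumes the buffer holds well-formed UTF-8). */
static size_t utf8_len(unsigned char b) {
    if (b < 0x80) return 1;
    if (b < 0xe0) return 2;
    if (b < 0xf0) return 3;
    return 4;
}

/* Byte offset of code point `index`, using the cache when possible. */
static size_t str_offset(str_t *s, size_t index) {
    size_t off = 0, i = 0;
    if (s->count == s->bytes)        /* all single-byte: index directly */
        return index;
    if (index >= s->cached_index) {  /* resume scanning from the cache */
        off = s->cached_offset;
        i = s->cached_index;
    }
    while (i < index) {
        off += utf8_len(s->data[off]);
        ++i;
    }
    s->cached_offset = off;          /* remember for the next access */
    s->cached_index = i;
    return off;
}
```

Iterating over a string front to back thus costs amortised constant time per character, since each access resumes where the previous one stopped.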
Alternative representations are possible. I believe the following is used in Chibi Scheme: keep a list of "chunks", i.e. multiple byte-offset/code-point-index pairs per string. You can traverse this list to find the first chunk of string data containing the index you want to access.
In CHICKEN this would probably be represented as a vector of such offset/index pairs attached to the string.
This way we can find the offset for the index right at or before the location addressed. This reduces the amount of scanning to the part inside a chunk pointed to by the offset/index pair.
Such an approach of maintaining the offset/index list is more complex and keeping it up to date is more effort than the simple offset/index cache. Should the performance impact of the simpler representation turn out to be too large, the alternative approach may still be an option to try.
This leads me to the subject of benchmarks. We do have a benchmark suite holding a number of micro-benchmarks and CHICKEN 6 has been compared to previous versions using this suite. Nevertheless, the results of micro-benchmarking have to be taken with a grain of salt.
The behaviour of real-world applications may differ considerably depending on the memory consumption, memory address patterns and the type and amount of data processed. Reasoning about performance is especially difficult due to the complex caching mechanisms of contemporary hardware and operating systems. Therefore we will try to use the simplest approach that has a good chance of providing sufficient performance while keeping the implementation effort and complexity at a minimum.
Dealing with the outside world
There's another point we have to address when considering strings that are received from external sources like OS APIs, from files and from the foreign function interface. What to do when encountering invalid UTF-8 byte sequences, for example when reading file or directory names? Signal an error? Convert to some other representation or mark the location in the byte sequence using some special pattern?
We decided to basically ignore the problem: strings are accepted whether valid or not. Only when decoding a string do we distinguish between valid and invalid sequences. When indexing and scanning a string's byte sequence, we return an invalid byte as a "surrogate pair end" code point with a value of the form 0xDCxx. This approach allows us to store the byte in the lower 8 bits of the code point; when inserting a character with such a value into a string, we simply copy the byte. As I understand it, this is the approach used by the "surrogateescape" error handler in PEP 383, except that we do this masking every time we decode an unexpected byte (and the inverse when encoding).
Say we received a string from an external source whose byte sequence is not valid UTF-8. Extracting it character by character with the described method yields an escape code point in the range 0xDC80-0xDCFF for each offending byte. Inserting such a surrogate pair end code point injects the value of its lower byte back into the byte sequence: storing the character #\xdcfe into a string, for example, writes the single byte 0xFE.
The advantage is that we can simply accept and pass strings containing invalid sequences as if nothing happened. We don't need to do any conversions and checks. We can also access items in the character sequence and store them back without having to worry about the validity of the sequence with respect to the UTF-8 encoding. The process is "lossless".
The disadvantage is that we have to perform an additional check when encoding or decoding UTF-8 sequences. Again, we decided to reduce copying and transcoding overhead and rather have complete transparency, regardless of the source of the string. Consequently, no validation is performed for incoming strings. The R7RS standard procedure utf8->string, which converts bytevectors into strings, does perform validation, though, to at least ensure that strings created from binary data are always correctly encoded.
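The escape-and-restore round trip can be sketched in C. This is a deliberately simplified decoder (it handles only ASCII and 2-byte sequences, and skips overlong-form checks); the function names are assumptions for this post, not CHICKEN's runtime:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one unit in "surrogateescape" style: a byte that cannot start
   a valid UTF-8 sequence comes back as 0xDC00 | byte, so it survives a
   round trip through the string unchanged.  Simplified: only ASCII and
   2-byte sequences are recognised here. */
static uint32_t decode_byte(const unsigned char *p, size_t *len) {
    unsigned char b = p[0];
    if (b < 0x80) { *len = 1; return b; }              /* plain ASCII */
    if ((b & 0xe0) == 0xc0 && (p[1] & 0xc0) == 0x80) { /* 2-byte sequence */
        *len = 2;
        return ((uint32_t)(b & 0x1f) << 6) | (p[1] & 0x3f);
    }
    *len = 1;
    return 0xDC00u | b;           /* invalid: escape the raw byte */
}

/* Encode: a code point in the 0xDC80-0xDCFF escape range emits its low
   byte verbatim -- the lossless inverse of the decoder above.  Again
   only the ASCII and 2-byte cases are handled, for brevity. */
static size_t encode_cp(uint32_t cp, unsigned char *out) {
    if (cp >= 0xDC80u && cp <= 0xDCFFu) { out[0] = (unsigned char)(cp & 0xffu); return 1; }
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    out[0] = (unsigned char)(0xc0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3f));
    return 2;
}
```

Decoding then re-encoding any byte sequence, valid or not, reproduces the original bytes, which is exactly the "lossless" property described above.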
A final issue is UNICODE handling on Windows platforms. There, the OS API entry-points that are UNICODE aware use UTF-16 for strings like filenames or the values of environment variables. On Windows there is no choice but to encode and decode from UTF-8 to UTF-16 and back when interfacing with the OS.
Port encodings
When accessing files, we still want to be able to read and write data in an 8-bit encoding or in binary. We may also want to support additional encodings, even though these are currently not provided by default. So we need to associate an "encoding" property with "ports". Ports are the Scheme objects that provide access to files and other streams of data, like traffic received from a network. The encoding is selected when opening a port, using additional arguments to the standard procedures that create ports for textual input and output, like open-input-port/open-output-port.
Ports internally hold a table of "port methods", roughly similar to how object-oriented programming systems attach behaviour to data objects of a particular class. The port methods in former versions of CHICKEN included the low-level counterpart of the standard operations peek-char, read-char and read-string for input ports and write-char/write-string (among others) for output ports. The major change here is to replace the string I/O methods with methods that act upon bytevectors.
An internal mechanism delegates the operations for encoding and decoding or scanning for the next character in a stream to an internal hook procedure. Additional encodings can be registered by using a built-in procedure to extend the hook. Currently supported are binary, UTF-8 and Latin-1 encodings. Support for a larger set of encodings can be provided through extensions and thus developed separately from the core system. Port encodings can be accessed using port-encoding. They can also be changed using the SRFI-17 setter for port-encoding, because encodings sometimes need to be changed while the port is still open.
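A port-method table of this kind is essentially a vtable of function pointers over byte operations. The following C sketch shows the idea with a trivial in-memory input port; all names and the struct layout are assumptions for illustration, not CHICKEN's internals:

```c
#include <stddef.h>

/* Each port carries a table of low-level byte operations.  Character
   encoding/decoding happens above this layer, via a registered hook. */
typedef struct port port_t;

typedef struct {
    size_t (*read_bytes)(port_t *p, unsigned char *buf, size_t n);
    size_t (*write_bytes)(port_t *p, const unsigned char *buf, size_t n);
} port_methods_t;

struct port {
    const port_methods_t *methods;
    int encoding;     /* e.g. 0 = binary, 1 = UTF-8, 2 = Latin-1 */
    void *state;      /* file handle, socket, in-memory buffer, ... */
};

/* A trivial in-memory input port implementation. */
typedef struct { const unsigned char *data; size_t len, pos; } membuf_t;

static size_t mem_read(port_t *p, unsigned char *buf, size_t n) {
    membuf_t *m = p->state;
    if (n > m->len - m->pos) n = m->len - m->pos;  /* clamp at end of data */
    for (size_t i = 0; i < n; ++i) buf[i] = m->data[m->pos++];
    return n;
}

static const port_methods_t mem_methods = { mem_read, 0 };
```

Replacing the string I/O methods with bytevector ones, as described above, means every backend only has to move bytes; the encoding layer on top stays shared.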
R7RS does not define whether binary I/O standard procedures are allowed to operate on textual ports and vice versa. In CHICKEN we do not restrict the set of operations depending on the port type, so you can read and write bytevectors to and from a textual port and the other way round. We think this is more practical and intuitive - it makes more sense to read and write binary data as bytevectors and have dedicated operations for this.
R7RS support
The second big change for CHICKEN 6 is proper R7RS (small) compliance. As with the UTF-8 support, this was previously provided by an extension library, but using it required some minor extra setup steps. Now all syntactic definitions of the (scheme base) module are available by default (most notably define-library), without requiring any further action.
Deciding what is available by default in a compilation unit or interpretation environment is a bit of a problem: to make it easy to get going when experimenting or scripting, we defaulted to having all standard R5RS procedures and macros available in the base environment of the interpreter (csi), together with the imports from the (chicken base) module. Compiled code was slightly more restricted but defaulted also to R5RS.
In CHICKEN 5 the scheme module referred to the R5RS standard procedures, and to avoid breaking too much existing code this is still the case: scheme is now an alias for the R7RS (scheme r5rs) library module, which exports all bindings of the former language standard. But to avoid duplicating the set of exported identifiers over several core modules, certain functionality has been moved from (chicken base) to (scheme base) and is thus not available in the default toplevel environment.
To make a long story short, the switch makes it necessary to access certain standard bindings like open-input-string by importing additional modules like (scheme base). This is not necessarily a bad thing, as it incentivises the user to write code in a more standard compliant way. But it all feels a bit clunky and may have been a suboptimal choice. Note that just adding an (import (scheme base)) is usually enough to make existing code run. We will see how that works out.
All required (scheme ...) modules are implemented and can be imported using their full functionality. Two notable changes that influence backwards compatibility are generative record types and hexadecimal escape sequences in string literals.
Formerly, record types defined with define-record-type were non-generative: re-defining a record type with the same name, even if the type definition was completely different, would not create a new type, and instances of the former definition would also be of the newly defined type, provided the type name was the same. Now every definition of a record type creates a completely new type, regardless of its name. This is of course more correct and much safer, as it doesn't invalidate previously defined instances.
A second (and much more annoying) change is that R7RS requires hex escape sequences in string literals to be terminated by a semicolon. Unfortunately this change is incompatible with the convention used in most programming languages, including many existing Lisp and Scheme implementations.
What in CHICKEN 5 would have looked like this:
"the color \x1b[31mRED\x1b[0m"
in CHICKEN 6 (and R7RS) must now be (note the semicolon):
"the color \x1b;[31mRED\x1b;[0m"
The motivation here was probably to avoid having dedicated escape sequences for values that require more than 2 hex digits (e.g. \uNNNN). The change is particularly bad from a compatibility point of view. All string literals that contain the old style of escape sequences must be changed. To keep code working in both CHICKEN 5 and 6 you can use the 4-digit \uNNNN escape sequence which is still valid in all versions.
Foreign Function Interface changes
CHICKEN has a very easy to use foreign function interface, which mostly derives from the fact that the compiler generates C. The alternative approach is a binary FFI that uses native code to interface with C, such as libffi; these must reproduce a lot of ABI details to safely interface with C libraries, things like struct alignment and padding, how arguments of various lengths are passed on the stack, etc.
The most notable FFI-related change for CHICKEN 6 is that it allows passing and returning C structs and unions to and from foreign code by value. The contents are copied in and out of bytevectors and wrapped in a block. The components of the struct or union cannot be directly manipulated in Scheme but can be passed on to other foreign functions. Additionally, for completeness, you can now also directly pass C99 complex numbers. Note that complex numbers in the FFI are always passed as inexact (i.e., floating-point), as opposed to Scheme complex numbers that may have exact (integer or even rational) real and imaginary components.
Platform support and build system
Here two major changes have been introduced. First, we now have a proper configuration ("configure") script. Formerly, all build parameters were passed as extra arguments to make(1), like this:
make PREFIX=$HOME/.local ...
This required that for every make(1) invocation, the same set of parameters had to be given to avoid inconsistencies. A configuration script written in portable POSIX sh(1) notation is now provided to perform the configuration step once before building. It also follows the de facto convention used in many GNU programs, where the usual incantation is:
./configure; make; make install
Note that we don't use the dreaded "autotools" (autoconf, automake and libtool), which have arcane syntax, are very brittle, and create more problems than they solve. They were originally designed to port code to now-dead or obscure UNIX derivatives, yet fail to provide a straightforward and easy-to-use configuration tool for modern systems (Linux/BSD, Mac, Windows, mostly). Our configure script is hand-written instead of auto-generated, and only checks for the platform differences relevant to the platforms that CHICKEN actually supports.
The old style of passing variables to make(1) should still work, but is deprecated.
The second change in the build system is that we cleaned up the confusion about toolchains on Windows systems. There are a bunch of half-maintained flavours of GNU-based C development environments for Windows ("MinGW", "MinGW-w64", "MSYS2", etc.) and it is a constant source of pain to support and test all these variants.
There is now one official "blessed" toolchain that we support. Specifically, we recommend Chris Wellons' excellent w64devkit. It contains compilers, build tools and enough of a POSIX environment for building CHICKEN on Windows. We also require a POSIX sh(1) (contained in w64devkit) and have dropped all support for building in a non-POSIX shell, i.e. cmd.exe. This simplifies the build and package management tools considerably and leaves us with fewer environments to test.
Building for the Windows Subsystem for Linux (WSL) and Cygwin is of course still supported, but those use the UNIX build setup and need little or no platform-specific support.
Minor changes
Quite a number of minor changes have taken place that either increase the robustness or standards compliance of the system.
syntax-error is now a macro, as required by R7RS. Previously there was a syntax-error procedure in (chicken syntax) with the same argument signature. So where the error was previously raised at runtime, it is now done at expansion time. This is something to take note of when porting code to CHICKEN 6. The invocation is still the same, so the difference can at least be identified easily and corrected, and (scheme base) needs to be imported, of course.
The csc compiler driver now passes arguments to subprocesses via execve(2) rather than system(3), which reduces shell-quoting headaches.
The chicken-install package ("egg") manager locks the build cache directory to avoid conflicts when running multiple installations in parallel. Also, a custom-config feature has been added: places in the package description (.egg) file that specify compile and link options can now be provided by external tools like pkg-config, allowing more portable configuration of the native libraries that packages use. The configuration script is expected to be Scheme code. Other eggs can extend the set of possible customisation facilities by providing library code to access them.
The feathers debugger has been removed from the core system and re-packaged as an egg, allowing separate development or replacement. It always was a bit of a proof-of-concept thing that lacks the robustness and flexibility of a real debugger. Some users found it helpful, so we keep it for the time being.
Future directions
Every major release is a chance to fix long-standing problems with the codebase and address bad design decisions. CHICKEN is now nearly 25 years old and we have had many major overhauls of the system. Sometimes these caused a lot of pain, but we always try to improve things and hopefully make the system more enjoyable and practical for our users. There are places in the code that are messy, too complex, or in need of cleanup or a rewrite, always sitting there waiting to be addressed. On the other hand, CHICKEN has been relatively stable compared to many other language implementations and has a priceless community of users that helps us improve it. Our users never stop reminding us of what could be better, where the shortcomings are, and where things are hard to use or inefficient.
So the final goal is and will always be to make it more robust and easier to use. Performance improvements are always good but are secondary. Strong standards compliance is a requirement, unless it interferes with practicality. We also try to avoid dependencies in the core system at all costs. This eases porting and avoids friction that is very often introduced by inadequate tools or overdesigned development environments.
Some moody ramblings about Scheme standards
The switch to seamless support for R7RS (small) was due for quite a while. R7RS is by now the generally accepted Scheme standard that implementations try to follow. How the further development of R7RS (large) will turn out remains to be seen, but I'm not very confident that it will result in anything more than a shapeless agglomeration of features. The current direction seems to be oriented towards creating something that includes everything but the kitchen sink - too ambitious and too much driven by the urge to produce something big and comprehensive.
What always distinguished Scheme from other languages and Lisp dialects was its elegance and minimalism. This resulted in a large number of experimental implementations, the investigation of various directions of programming language semantics, and highly interesting advanced implementation techniques that are still in use today. The expressiveness and the small number of core concepts also made it perfect for teaching computing. It still is great for teaching, even if the tendency to address perceived requirements of the "market" replaces the academic use of Scheme with languages that belong more to the "mainstream". This strikes me as strange, as if learning multiple languages and studying different programming paradigms couldn't help in obtaining a broader view of software development; or as if one were somehow wasting time by exploring the world of programming in a non-mainstream language.
Repeatedly, the small size and limited scope of Scheme have driven a number of enthusiasts, dedicated to favouring broader use in industry, to demand "bigness". They dream of a comprehensive standard that will support everything, please everyone, and make it easy to write portable, non-trivial applications in Scheme and have them run on a large number of implementations. With this comes the tacit expectation that implementers will just follow such a large standard and implement it faithfully.
But what made Scheme successful (and beautiful) was the small size, the small number of powerful concepts, the minimalism, the joy in experimentation. Trying to create the Comprehensive Mother of all Schemes is in my opinion a waste of time. In fact, it makes Scheme less relevant. Large languages inevitably die - the warts and inconsistencies that crop up when you try to design too much up front will just get more blatant as the language ages and will annoy users, constrain implementers and make people search for alternatives.
You can already write serious programs and use a large number of libraries in Scheme: just install CHICKEN, Guile or (even) Racket and get going. The code will, to a certain extent, not be portable across implementations - that's unavoidable - but to use dynamic languages effectively and successfully, some experience with and a certain understanding of the design of the underlying compiler or interpreter is necessary anyway.
Acknowledgements
I would like to thank my employer, bevuta IT GmbH, which sponsored part of the effort of preparing a new major release of CHICKEN.
Adding weak references to CHICKEN Posted on 2023-06-27
Recently, a user was asking on IRC why CHICKEN doesn't give proper line numbers for error call traces in the interpreter. This is indeed rather odd, because the compiler gives numbered traces just fine.
After a brief discussion, we figured that this was probably because line numbers are stored in a "database" (hash table) which maps the source expression to a line number. Because you can keep calling (eval) with fresh input, the number of expressions evaluated is potentially infinite.
This would lead to unbounded growth of the line number database, eventually eating up all available memory.
We'd like to fix this. A great solution for this problem would be weak references. A weak reference is a reference to a value that does not cause the garbage collector to hold on to that value. If other things still refer to the value, it is preserved and the weak reference is maintained. If only weak references refer to a value, it may be collected.
The line number database could then consist of such weak references to source expressions. If an expression is no longer held onto, it can be collected and the line number eventually removed. This would turn the database from a regular hash table into a weak hash table.
This requires hooking into the garbage collector so that it knows about these references and can drop them. Since our collector is a copying collector, addresses of objects change over time, and a weak reference needs to be updated if something is still hanging onto it, without itself causing the weakly held object to stick around.
A quick recap of garbage collection
To explain how we changed our garbage collector, I will start with a quick high-level recap of how the garbage collection works. I'll explain just enough to cover what is needed to understand weak references. For a more in-depth look at CHICKEN's garbage collection, see my post on the GC.
First, we have to realise that CHICKEN stores all its values in a machine word. It distinguishes between "immediate" values and non-immediate or "block" values. Immediate values can be either fixnums (bit 0 is set) or other "small" values (bit 1 is set). Non-immediates are recognised because the lower two bits are zero, which is very convenient, as a pointer to a word-aligned address happens to have the lower two bits cleared as well!
In other words, non-immediate values are simply C pointers, whereas the immediate values are encoded long integers. So, non-immediates are allocated in memory and represented by a pointer to them. At the pointer's address we find a header which encodes what exactly is stored there, and (optionally) the data, where we store the contents of compound objects.
typedef struct {
  C_word header;
  C_word data[]; /* variable-length array: header determines length */
} C_SCHEME_BLOCK;
In typical Scheme objects, the data consists of slots. For example, the car and the cdr of a pair are slots, the elements of a vector are slots, etc. Each slot contains a Scheme value. In a graphical representation, a block object looks like this:
The data representation will be important to understand how we implement weak pairs, and to follow the garbage collection process.
If you want more details, see my post about data representation in CHICKEN.
A tale of two spaces
CHICKEN divides the heap up in two halves. While the program is running, only one half is in use. When this half fills up, we start the garbage collection process. We trace all the data that's still considered "live" (aka GC "roots") and copy it over to the other half.
Let's look at an example:
(cons 5 '())                      ; unreferenced, may be collected
(define x (cons (vector 42) '())) ; sticks around
The half that was full is called the fromspace and the half that starts out empty is called the tospace. Initially, the tospace is empty, and the data is only in the fromspace:
Then, we move the objects, one at a time, starting at the roots. So in the example, the pair that we named "x" gets moved first to tospace, while its slots still point into fromspace:
The changes are in yellow. After tracing the roots, we trace all the contents of the objects in fromspace and copy over the pointed-to objects:
Finally, we're done. Any remaining objects in fromspace are garbage and can be ignored, effectively "clearing" fromspace, and we flip the two spaces so that fromspace becomes tospace and vice versa:
Onward! I mean, forward!
This was a very simple case. A more complex case is when there are several objects with mutual references. We must somehow keep track of which objects got moved where, so that we don't copy the same object more than once.
Another example:
(define shr (vector 42))
(define x (cons shr 1))
(define y (cons shr 2))
This would look like the following when visualized:
As I've explained, roots are copied first, so let's say shr gets copied first:
As you can see, this introduces something new. There's a forwarding pointer stored at the address where we originally found the header of the vector known as shr. This is done so that we don't have to traverse all the objects pointing to shr (which could even be cyclic!). Instead, whenever we find an object that holds an address where there now is a forwarding pointer, we dereference the pointer and change the object to point to the target address.
So, if x is copied next, we get the following picture:
We actually *always* leave behind a forwarding pointer when an object is copied, because the GC does not "know" whether anything references the object. I skipped that in the initial examples to keep it simple. And finally, y can be copied:
Now you can see, the forwarding pointers are still there in fromspace, but nothing references them anymore. The GC is now done. Of course, fromspace and tospace will be flipped (but I'm not showing that here).
How weak pairs work in the GC
Let's make a few changes to our example:
(define zero (weak-cons (vector 0) 0)) ;; new - car may be collected
(weak-cons (vector 9) 9)               ;; new - may be collected
(define shr (vector 42))
(define x (cons shr 1))
(define y (weak-cons shr 2))           ;; changed to be weak
(set! shr #f)                          ;; new - dropping root for shr
(define z (weak-cons (vector #f) 3))   ;; new - car may be collected
Now, y is changed to be a weak pair holding shr in its car, while x is still a normal pair with the same object in its car. The shr variable is also cleared, so that only x holds onto shr's old value strongly. We've added a few more weak pairs while we're at it.
This weak-cons procedure does not exist in CHICKEN 5.3.0; this is what we'll be adding. It creates a pair whose car field is a weak reference. The cdr field is a regular reference. The reason for having the cdr be a regular reference is that this allows us to build up lists of items which may be collected. The "spine" of the list will remain intact, but the list entries will be cleared (see below).
Let's take a look at our new initial situation:
The GC will start with the live objects, as usual. Let's start at the top, copying zero:
Since zero is a weak pair, we won't traverse the car slot, keeping the vector it points to in fromspace. This is intentional, as the GC treats weak pairs differently; only the cdr slot gets scanned, while the car gets skipped and left as a "dangling" pointer into fromspace.
We'll fix that later, as we don't want to keep this situation after the garbage collection process is done.
Normally we'd pick up shr next, but that is not a root anymore. Let's continue with x instead:
Then, as we scan over the contents of x, we encounter what used to be shr and move it, updating the pointer in x:
Next, we move y:
You'll notice that because it's a weak pair, its car slot does not get updated to the new location of what used to be shr, even though that vector has already been copied. As mentioned before, we'll fix this up later.
Finally, z gets moved:
I'm sure you've noticed that now we're left with a big spaghetti bowl of pointers, some of which are still pointing into fromspace. So let's investigate how we can fix this mess.
Fixing up the dangling weak pairs, naively
It just so happens that CHICKEN internally already has support for weak pairs. Those are pairs whose car field holds objects that may be collected. These are used solely in the symbol table, so that symbols aren't maintained indefinitely. This is important because it stops symbol table stuffing attacks. Those are a form of Denial of Service (DoS) by forcing the system to eat up all memory.
In this current implementation, the only weak pairs in existence are those in the symbol table's hash bucket chains. So, the solution there is very simple: traverse all the symbol table buckets and update weak pointers.
So we traverse the slots like during normal GC's mark operation, but with two exceptions:
- When we mark the slot, we don't cause the contents to be copied over into tospace, but only chase any forwarding pointers.
- When we encounter a pointer into fromspace, we replace the slot with a special "broken weak pointer" marker.
That would yield this final picture:
As you can see, all the weak pairs have their car slots updated, even the ones that we could consider as garbage, like the two at the top. Those get the special value #!bwp, or "broken weak pointer" to indicate that the value originally stored there is no longer live.
Smarter fixing up of weak pairs
Traversing all the weak pairs is fine when we know every single one in the system. When only hash table buckets in the symbol table may be weak, that's the case. But now that we want to expose weak pairs to the user, what do we do?
The simplest solution would be to add a table or list of all the weak pairs as they're constructed, or as the GC encounters them. This has a few problems:
- It is wasteful of memory, as the extra pointers in this table would be redundant.
- If done during garbage collection, we'll have to dynamically allocate during a GC, when memory is already tight (and it's slow).
- We'd end up traversing potentially loads and loads of dead weak pairs that would be themselves collected, as in the example above.
Alternatively, you could traverse the entire heap again after GC and mark the weak pairs. In fact, that's what Chez Scheme does. However, Chez has separate heaps for different types of objects (known as a BIBOP scheme). This has the advantage of traversing only the live weak pairs, instead of every single live object.
In CHICKEN, we'd be traversing every single object, as we store everything on a single heap. If the heap is large, this would be wasteful. But at least we wouldn't have to traverse dead weak pairs!
When looking into this problem, I decided to study the Schemes which have weak pointers. Those were Chez Scheme, Chibi Scheme and MIT Scheme. Of those, only MIT Scheme has a Cheney-style copying garbage collector with a single heap like CHICKEN does. The sources to MIT Scheme are well-documented, so I quickly found how they do it, and boy is it clever!
MIT Scheme's solution:
- Traverses only the still-live weak pairs,
- introduces only a single word of memory overhead,
- and is brilliantly simple once you get how it works.
Garbage collecting weak pairs, MIT style
Let's take a closer look at the end state of that last GC run. Remember that a forwarding pointer overwrites the original header of the object that used to live there. But the memory of the slots is still sitting there, allocated but unused!
The great idea in MIT Scheme is that we can "recycle" that unused memory and store a linked list of still-live weak pairs in that space. The extra word of overhead I mentioned before acts as the head of that list. Let's look at how that looks during the GC, by replaying that last run, but with MIT Scheme's algorithm for tracking weak pairs:
The initial state is the same, except we have the additional weak pair chain head. The GC again starts at the top, copying zero, same as before, but with one addition. When the forwarding pointer gets created, we notice that it's a weak pair. This causes the weak pair chain head to get modified so it points to the new forwarding pointer. The original value of the chain head (the empty list marker) gets stored where the old weak pair's car slot used to be.
Again, we continue with x, which is a regular pair, so nothing interesting happens to the space behind the forwarding pointer:
Then, as we scan over the contents of x, we again encounter what used to be shr and move it, updating the pointer in x. The forwarding pointer for shr is also not put in the chain because shr is not a weak pair:
Next, we move y. Because it is a weak pair, we again link up its forwarding pointer into the chain that we're building up. This means the pointer in the weak chain head (which pointed to zero's original address) gets put in the old car field of y, and the head itself now points to y's forwarding pointer:
Finally, z gets moved. Since it is also a weak pair, when making the forwarding pointer we also link it up into the chain:
After the copy phase, we only need to traverse the weak pairs by following the trail of pointers starting at the head of the live weak pair chain. For each of the forwarding pointers in the chain:
- Follow the forwarding pointer to its corresponding live weak pair,
- take that pair's car,
- check if the car refers to a forwarding pointer. If it does:
- update the car to the forwarding pointer's destination.
I'm not going to do this one by one as it's pretty obvious. I will show you the result of this walk:
You can see that we've only updated three out of four weak pairs - exactly the number of weak pairs that got copied during this GC and are still considered live. Because only weak pairs that exist in tospace have to be updated, we are doing the absolute minimum work necessary.
The nice thing about this is, that even if you produce lots of garbage weak pairs, they don't have any additional impact on GC performance. Only those weak pairs that survive garbage collection will be scanned and updated.
When implementing this improvement I noticed that our performance on benchmarks improved a little bit, even though they don't use weak pairs directly. This was all due to only scanning copied weak pairs and not having to scan the entire symbol table on every GC.
With the support for user-facing weak pairs in place, supporting line numbers in the interpreter was relatively easy. This means that debugging your code will be easier starting with the upcoming 5.4 release of CHICKEN!
Further reading
- At the Village CHICKENs event, I presented my findings about how to add weak references. Slides are here.
- Back in 2014, Ruby added symbol GC to stop symbol table stuffing attacks. It also improved performance because the GC had to do less work copying stale symbols. The flurry of interest in symbol table stuffing attacks in Ruby was also my inspiration for fixing and improving the symbol GC in CHICKEN.
- The manual to MIT Scheme has a section on weak pairs, and the manual to Chez Scheme has its own, slightly different API.
- MultiScheme: A Parallel Processing System Based on MIT Scheme by James S. Miller. This is the paper that introduces the weak-cons operation and explains in more detail how the MIT Scheme GC was modified, in the same way we modified CHICKEN's GC.
- Don't stop the BIBOP: Flexible and Efficient Storage Management for Dynamically Typed Languages by R. Kent Dybvig, David Eby, and Carl Bruggeman. This paper explains how Chez Scheme's allocation scheme works. We've toyed with the idea of implementing a BIBOP allocator and GC in CHICKEN as well, but it'd be a large undertaking without knowing for sure that it would improve performance.
- Data representations in PDP-10 MacLISP by Guy L. Steele describes the original BIBOP allocation scheme.