Compile-time vs. compile-time

From D Wiki
Revision as of 22:14, 21 March 2017 by Quickfur (talk | contribs) (expand)
Jump to: navigation, search
By H. S. Teoh, March 2017

Introduction

One of D's oft-touted features is its awesome compile-time capabilities, which open up wonderful meta-programming opportunities, code-generation techniques, compile-time introspection, DSLs that are transformed into code at compile-time and therefore incur zero runtime overhead, and plenty more. Acronyms like CTFE have become common parlance amongst D circles.

However, said "compile-time" capabilities are also often the source of much confusion and misunderstanding, especially on the part of newcomers to D, often taking the form of questions posted to the discussion forum by frustrated users such as: "Why doesn't the compiler let me do this?!", "Why doesn't this do what I think it should do?", "Why can't the compiler figure this simple thing out?! The compiler is so stupid!", and so on.

This article hopes to clear up most of these misunderstandings by explaining just what exactly D's "compile-time" capabilities are, give a brief overview of how it works, and thereby hopefully give newcomers to D a better handle on what exactly is possible, and what to do when you run into a snag.

There's compile-time, and then there's compile-time

Part of the confusion is no thanks to the overloaded term "compile-time". It sounds straightforward enough -- "compile-time" is simply the time when the compiler does whatever it does when it performs its black magic of transforming human-written D code into machine-readable executables. Therefore, if feature X is a "compile-time" feature, and feature Y is another "compile-time" feature, then X and Y ought to be usable in any combination, right? Since, after all, it all happens at "compile-time", so surely the compiler, with its access to black magic, should be able to just sort it all out, no problem.

The reality, of course, is a bit more involved than this. There are, roughly speaking, actually at least two distinct categories of D features that are commonly labelled "compile-time":

  • Template expansion, or abstract syntax tree (AST) manipulation; and
  • Compile-time function evaluation (CTFE).

While these two take place at "compile-time", they represent distinct phases in the process of compilation, and understanding this distinction is the key to understanding how D's "compile-time" features work.

(The D compiler, of course, has more distinct phases of compilation than these two, but for our purposes, we don't have to worry about the other phases.)

Template expansion / AST manipulation

One of the first things the compiler does when it compiles your code, is to transform the text of the code into what is commonly known as the Abstract Syntax Tree (AST).

For example, this program:

import std.stdio;
void main(string[] args)
{
    writeln("Hello, world!");
}

is parsed into something resembling this:

AST.svg

(Note: this is not the actual AST created by the compiler; it is only a simplified example. The actual AST created by the compiler would be more detailed and have more information stored in each node.)

The AST represents the structure of the program as seen by the compiler, and contains everything the compiler needs to eventually transform the program into executable machine code.

One key point to note here is that in this AST, there are no such things as variables, memory, or input and output. At this stage of compilation, the compiler has only gone as far as building a model of the program structure. In this structure, we have identifiers like args and writeln, but the compiler has not yet attached semantic meanings to them yet. That will be done in a later stage of compilation.

One of D's powerful features is templates, which are similar to C++ templates. Templates can be thought of as code stencils, or stencils of a subtree of the AST, that can be used to generate AST subtrees. For example, consider the following template struct:

struct Box(T)
{
    T data;
}

In D, this is shorthand for:

template Box(T)
{
    struct Box
    {
        T data;
    }
}

Its corresponding AST tree looks something like this:

Template1.svg

When you instantiate the template with a declaration like:

Box!int intBox;

for example, what the compiler does is to make a copy of the AST subtree under the TemplateBody node and substitute int for every occurrence of T in it. So it is as if the compiler inserted this generated AST subtree into the program's AST at this point:

Template-example1.svg

Which corresponds to this code fragment:

struct Box!int
{
    int data;
}

(Note that you cannot actually write this in your source code; the name Box!int cannot be defined in user code, and can only be generated by the template expansion process.)