Introduction to D templates

From D Wiki

Up until a few months ago, I had no experience with templates whatsoever.

At first, I started using function templates, because they were the easiest to grasp. Or so I thought. Basically, I just used function templates as if they were a special kind of function that accepts arguments of any type. Sort of like having a regular Python function in your D code. I didn't quite grasp the point of templates back then.

What really opened my eyes was the realization that D makes a clear distinction between manipulating types at compile time and manipulating them at run time.

Once I realized that, I understood why template arguments have to be known at compile time: templates are a compile-time feature. Templates aren't really a "type" at run time; they're only visible at compile time. They're just "placeholders" with their own scope, and they hold declarations in that scope. To access those declarations (members would be the proper word, although I'll keep calling them fields here), you generally use the dot operator. There's also the eponymous trick, where the template instantiation itself refers to one of its fields. This is useful when you're only interested in getting back one result, as it saves you from having to name the field explicitly when you instantiate the template.
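
To make the "scope holding declarations" idea concrete, here is a minimal sketch. The names Sizes and bitsOf are made up for this illustration and are not part of Phobos. The first template is accessed with the dot operator, the second one uses the eponymous trick.

```d
// 'Sizes' is a hypothetical template: a named scope that holds three declarations.
template Sizes(T)
{
    enum bytes = T.sizeof;   // a compile-time constant
    enum bits = bytes * 8;   // fields can refer to other fields in the same scope
    alias T[4] Quad;         // a field can also be a type
}

// Hypothetical eponymous version: the single field shares the template's name.
template bitsOf(T)
{
    enum bitsOf = T.sizeof * 8;
}

void main()
{
    // Both checks are done entirely by the compiler.
    static assert(Sizes!(int).bits == 32);  // explicit field access
    static assert(bitsOf!(int) == 32);      // no field name needed

    Sizes!(int).Quad q;                     // the 'Quad' field is the type int[4]
    assert(q.length == 4);
}
```

Trying to instantiate Sizes with something that only exists at run time (for example a value read from user input) is rejected by the compiler, which is the compile-time restriction described above.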

Then I realized that templates can be used almost anywhere. A template can have fields which are templated structs and classes, which in turn can have templated functions, which in turn can contain templated declarations, and the return types of those functions can themselves come from templates.

So, let's talk about unaryFun now. unaryFun is a template that constructs a function from a string literal passed as a template argument. It's named *unary* because the function it constructs takes exactly one argument. The string is written as if it were just the return expression of that function. unaryFun treats a predefined name as the function's parameter: by default, 'a' in the string refers to the argument of the newly constructed function.

Here's a simple instantiation of unaryFun:

unaryFun!("(a & 1) == 0")

unaryFun takes the string argument and returns a templated function. When that templated function is invoked with an int argument, it might be represented like so:

    bool unaryFun(int a)
    {
        return (a & 1) == 0;
    }

Let's walk through unaryFun one line at a time (the line numbers refer to std/functional.d in the DMD v2.051 release). First, our example code:

    alias unaryFun!("(a & 1) == 0") isEven;
    assert(isEven(2) && !isEven(1));

Now open the std.functional file and follow along.

Line 40: unaryFun instantiates unaryFunImpl. You'll notice it passes the very same arguments to unaryFunImpl as the ones unaryFun was instantiated with. We've passed only a string, so the two other arguments take their default values, 'false' and 'a'. Once the unaryFunImpl template is instantiated, we can select any of its fields with the dot operator. In this case we're interested in the result field.

So, what does the alias do? It's a cool thing called the eponymous trick. With the eponymous trick you can use the name of the template itself as the result of the instantiation. In other words, if you alias the name of a template to some declaration, then when instantiating that template you won't have to use the dot operator to get at that single field.

A quick example without the eponymous trick:

    import std.stdio;

    template foo(int value)
    {
        int result = value;
    }
    void main()
    {
        writeln( foo!(4).result );
    }

And here's one with the eponymous trick*:

    import std.stdio;

    template foo(int value)
    {
        alias fooImpl!(value).result foo;  // we've aliased the foo template name to the result field
    }

    template fooImpl(int value)
    {
        int result = value;
    }

    void main()
    {
        writeln( foo!(4) );  // no need to explicitly use .result here
    }
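
  • Note: In the DMD release discussed here, you can't use the eponymous trick when the template contains other declarations besides the eponymous one. This is why many template functions in Phobos are split into template and templateImpl counterparts. Later compiler releases relaxed this restriction.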

This is very handy when you're only interested in a single field of a template and don't want to refer to it explicitly. Now let's see how unaryFunImpl works.

Line 45: This is a *compile-time* check. It determines whether the fun argument is a string. There's an else clause way down on line 98 which, in case the fun argument isn't a string, simply aliases the fun argument to the name 'result'. So, why do we need to check if the 'fun' argument is a string? Well, because 'fun' in the template signature of unaryFunImpl is declared as an 'alias' parameter. An alias parameter can bind to a string, a value, a type, a function, or any other valid D symbol. alias is used in unaryFunImpl because we can instantiate it in several different ways. In our initial code example we've used:

    void main() 
    {
        alias unaryFun!("(a & 1) == 0") isEven;
    }

In this case the 'static if' check on line 45 will pass since the argument is a string. But we might have called unaryFun like so:

    void foo(int x)
    {
    }
    void main()
    {
        alias unaryFun!(foo) isEven;
    }

In this case we're passing a function as the template argument, at compile time. The 'static if' check will evaluate to false since 'fun' is now a function symbol, not a string. This brings us to the 'else' clause on line 98. Since the whole purpose of unaryFun is to construct a function out of a string literal, there is no work to do if 'fun' isn't a string. And if 'fun' isn't a string, it's likely a function, so we alias it to 'result' and we're done. 'fun' could also be a delegate (a function pointer paired with a context pointer), or a struct with a defined opCall method (these kinds of structs can be called just as if they were functions).
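
The string-or-not branching can be tried out in isolation. The sketch below uses a made-up template called describe (not part of Phobos) that performs the same kind of 'static if' test on an alias parameter, and then passes an ordinary function to the real unaryFun.

```d
import std.functional;

// An ordinary function that will be passed as an alias argument.
bool isOdd(int x)
{
    return (x & 1) != 0;
}

// Hypothetical helper: reports which branch of the 'static if' is taken.
template describe(alias fun)
{
    static if (is(typeof(fun) : string))
        enum describe = "string";
    else
        enum describe = "not a string";
}

void main()
{
    // The same alias parameter accepts a string literal and a function symbol.
    static assert(describe!("(a & 1) == 0") == "string");
    static assert(describe!(isOdd) == "not a string");

    // With a function argument, unaryFun just hands the function back to us.
    alias unaryFun!(isOdd) odd;
    assert(odd(3) && !odd(4));
}
```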

Let's continue. We've passed the string literal ("(a & 1) == 0"), so our static if check on line 45 evaluates to true. Now comes the fun part, a template defined within a template!

Line 47: 'Body' is a template that is only used within unaryFunImpl, which explains why it's a nested template. It takes one compile-time argument, in this case the argument will be a 'type' argument (such as an int, a double or another type).

Let's just assume for a moment that we've instantiated the Body template with an int argument ( e.g. Body!(int) ).

On lines 51 and 52 there are two enum declarations:

    enum testAsExpression = "{ ElementType "~parmName ~"; return ("~fun~");}()";
    enum testAsStmts = "{"~ElementType.stringof ~" "~parmName~"; "~fun~"}()";

Note how we're able to use the arguments from the enclosing template. Here I'm referring to the parameters 'parmName' and 'fun', which were originally passed to unaryFunImpl.

Recall our initial code example:

    unaryFun!("(a & 1) == 0")

We can easily expand these two enums to something more meaningful once we know what 'fun' evaluates to:

    enum testAsExpression = "{ ElementType a; return ((a & 1) == 0);}()";
    enum testAsStmts = "{int a; (a & 1) == 0}()";

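These might be a bit hard to read. Each enum holds a function literal in string form, and the compiler will be asked whether that string compiles. Note that testAsExpression still contains the name ElementType, because that string is built without ElementType.stringof; the name is only resolved once the string is mixed in inside Body.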

Notice that we didn't specify a function name in the function literal, nor a parameter list for that matter. This kind of syntax builds a 'lambda', an anonymous function. Also notice the pair of parentheses '()' at the end of the string. Keep that in mind for now.

Moving to line 57: Another static if check. __traits is a compiler feature, similar to a pragma, that lets you query the compiler for information about your code. Its first argument is one of the predefined keywords that specifies your request (e.g. "give me all the members of a struct declaration", or "hey compiler, does the following expression compile?", see http://www.digitalmars.com/d/2.0/traits.html for more on these keywords). The remaining arguments depend on the keyword. Here we're using the 'compiles' keyword, which asks the compiler whether it can successfully compile the expression that follows it.
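
Here is a small sketch of a few __traits queries on their own, before any string mixins get involved. The struct Point is made up for the illustration.

```d
import std.stdio;

// Hypothetical struct used only as something to ask questions about.
struct Point
{
    int x;
    int y;
}

// "Does Point have a member with this name?"
static assert( __traits(hasMember, Point, "x"));
static assert(!__traits(hasMember, Point, "z"));

// "Does the following expression compile?"
static assert( __traits(compiles, 1 + 2));
static assert(!__traits(compiles, 1 + undefinedName));  // no such symbol
static assert(!__traits(compiles, "text" * 2));         // no such operation

void main()
{
    // "Give me all the members of a struct declaration."
    foreach (name; __traits(allMembers, Point))
        writeln(name);
}
```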

This is where our enums come into play. What we now must do is convert the enums from their string representation into actual D code that is compiled in. That is the job of 'mixin'. mixin takes a string that is known at compile time and compiles it as if it were regular D code written at that spot.

For example:

    import std.stdio;

    void main()
    {
        mixin("int x;");
        writeln( x );  // x is defined

        mixin(" struct Foo { } ");
        Foo foo;  // foo is a Foo type
    }

With the power of __traits(compiles, expression) and mixin("string"), we're able to convert strings to D code, and check whether that code is actually valid and compiles. Here are a few examples of using the two together:

    static if (__traits(compiles, mixin("int x;")))
    {
    }
    
    static if (__traits(compiles, mixin("class Foo { void bar() { } }")))
    {
    }    

    static if (__traits(compiles, mixin(" { int a; a++; return a; } ")))
    {
    }

The string in the last example is an anonymous function (lambda) literal. Back on line 57, the static if checks whether the enum 'testAsExpression' is compilable as D code. If it's not compilable (e.g. if we passed an invalid string representation of a function as our 'fun' argument), then we reach the else clause on line 67. In that case we'll hit a static assert that evaluates to false, which means the compilation process stops and you get a compile-time error. Here's an example of that:

    import std.functional;
    void main()
    {
        string brokenFunctionRepresentation = "(a & 1) == b";
        alias unaryFun!(brokenFunctionRepresentation) isEven;
        isEven(4);
    }
    > D:\DMD\dmd2\windows\bin\..\..\src\phobos\std\functional.d(74): Error: static assert  "Bad unary function: " ~ brokenFunctionRepresentation ~ " for type " ~ "int"c
    > D:\DMD\dmd2\windows\bin\..\..\src\phobos\std\functional.d(87):        instantiated from here: Body!(int)
    > testing45.d(10):        instantiated from here: result!(int)

In our initial code example we've passed a valid function literal, so the static if check will evaluate to true.
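
The check can be reproduced outside of Phobos. In the sketch below the two test strings are written out by hand, with int substituted for ElementType; the names goodTest and badTest are made up for the illustration.

```d
// What testAsExpression amounts to for our isEven example, with ElementType = int.
enum goodTest = "{ int a; return ((a & 1) == 0); }()";

// The same string built from the broken function body: 'b' is not defined anywhere.
enum badTest = "{ int a; return ((a & 1) == b); }()";

// The compiler accepts the first string as D code and rejects the second one.
static assert( __traits(compiles, mixin(goodTest)));
static assert(!__traits(compiles, mixin(badTest)));

void main()
{
}
```

Nothing is reported for the string that fails: __traits(compiles, ...) swallows the error and simply yields false, which is why unaryFunImpl adds its own static assert with a readable message.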

Line 59: The enum on this line expands to:

    enum string code = "return ((a & 1) == 0);";

The next line can be difficult to grasp, so let me walk you through it. Again, we're using mixin to compile code that is in string form. As a reminder, the value of 'testAsExpression' is:

    enum testAsExpression = "{ ElementType a; return ((a & 1) == 0);}()";

Ok, so how do we compile this in? Well, remember that ElementType is a type argument passed to the Body template (the one we're exploring right now). Also, remember that for the moment we're pretending that the argument passed was an int. Because the string is mixed in inside Body, the name ElementType resolves to int, so the mixed-in expression is effectively:

    { int a; return ((a & 1) == 0);}();

Remember the two enums 'testAsExpression' and 'testAsStmts' and how they have a pair of parentheses at the end? These call the function that was just defined. Since the enums define 'lambda' function literals, which have no name, the only way to call them is to call them immediately by appending a pair of parentheses to their definition. Alternatively we could use an alias to give our lambdas a symbol name, but this isn't necessary in this case.

Here are some lambda examples, one is hardcoded to be immediately called, while the second one is not:

    import std.stdio;
    void main()
    {
        enum lambdaWithCall = " { return 10 * 20; }()";
        writeln( mixin(lambdaWithCall) );  // writes 200

        enum lambdaWithoutCall = " { return 10 * 20; }";
        writeln( mixin(lambdaWithoutCall)() );  // notice the parentheses outside the mixin
    }

Back to the code at line 60. We're mixing in the enum that has a lambda literal embedded with a call. 'typeof' yields the type of the expression it is given. It is not a function, and the expression is only type-checked, never run, so the lambda isn't actually called at this point. In our example the lambda returns the result of the comparison (a & 1) == 0, so typeof yields 'bool'. We're still on line 60. On the left side we're using an alias: we're aliasing that bool type to the 'ReturnType' symbol name.
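
Both points can be checked with a short sketch: the type that typeof gives for the isEven lambda, and the fact that typeof never runs the expression it inspects. The names counter and bump are made up for the illustration.

```d
// What testAsExpression amounts to for the isEven example, with ElementType = int.
enum testAsExpression = "{ int a; return ((a & 1) == 0); }()";

// The same kind of alias as on line 60 of std.functional.
alias typeof(mixin(testAsExpression)) ReturnType;
static assert(is(ReturnType == bool));

// Hypothetical helpers showing that typeof does not execute anything.
int counter;

int bump()
{
    return ++counter;
}

void main()
{
    alias typeof(bump()) T;   // only asks "what type would this call have?"
    static assert(is(T == int));
    assert(counter == 0);     // bump() was never actually called
}
```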

That's the full explanation of the Body template. Its purpose is to take the string passed as unaryFunImpl's first argument ('fun') and try to construct a function from it and type-check it. In turn it defines two important declarations that our unaryFunImpl template will need:

   1. The actual code that will be used as the function definition ('Body.code').
   2. The return type of this function ('Body.ReturnType').

Now let's head on back to our unaryFunImpl template. So far we've pretended that Body was instantiated with an int argument, but let's look at how unaryFunImpl uses the Body template.

Line 78: This static if checks whether the constructed function should take its argument by reference. We request this by setting the bool argument byRef to true when instantiating unaryFun. (byRef belongs to the v2.051 signature discussed here; later Phobos releases dropped this parameter.) Here's an example demonstrating the usage of byRef:

    import std.functional;
    import std.stdio;
    void main()
    {
        alias unaryFun!("a = 2") byValue;
        alias unaryFun!("a = 2", true) byReference;  // set byRef to true
        
        int x = 5;

        byValue(x);
        assert(x == 5);
        
        byReference(x);
        assert(x == 2);
    }

After the static if check on line 78 follows a complicated function definition. The full definition of the function is:

    Body!(ElementType).ReturnType result(ElementType)(ref ElementType a)

Let's chop this down into smaller pieces:

'result' is a templated function. It takes a compile-time argument ElementType, and a runtime argument 'a' which is an ElementType reference.

The return type is specified as:

    Body!(ElementType).ReturnType

'ReturnType' is one of the two fields that the 'Body' template defines for us after we instantiate it. This shows another feature of templates: a field of a template instantiation can be used as a function's return type. The compiler instantiates the Body template with the ElementType argument, and then takes the field 'ReturnType' and uses that as the return type of our 'result' templated function. Remember that 'Body.ReturnType' will be the type of the value returned by the function literal passed to unaryFun.

But, where exactly is ElementType defined? We have to take a step back and look at the 'result' function definition once more:

    Body!(ElementType).ReturnType result(ElementType)(ref ElementType a)

ElementType is a compile-time argument to result. So now you're probably wondering "Wait a minute! We never passed any arguments to result!". That's right. We're not actually calling result yet. We're just defining a templated function. Remember that a template is nothing more than a set of declarations. The 'result' templated function is a declaration inside the unaryFunImpl template.

Now, onto line 82: Here we instantiate Body with the ElementType argument, we take its 'code' field (which is the compilable representation of our function literal) and we mix it in. We've now built the entire definition of the 'result' templated function by using other templates. The work of unaryFunImpl is now complete.
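
To see all the pieces working together, here is a stripped-down re-creation of the same structure. It is a sketch, not the Phobos source: the names miniUnaryFun and miniUnaryFunImpl are made up, the byRef branch and the non-string branch are left out, and only the by-value form of 'result' is kept.

```d
// Hypothetical miniature of unaryFunImpl: string input only, argument by value.
template miniUnaryFunImpl(string fun, string parmName = "a")
{
    template Body(ElementType)
    {
        // A lambda in string form, called immediately so it can be type-checked.
        enum testAsExpression =
            "{ ElementType " ~ parmName ~ "; return (" ~ fun ~ "); }()";

        static if (__traits(compiles, mixin(testAsExpression)))
        {
            enum string code = "return (" ~ fun ~ ");";
            alias typeof(mixin(testAsExpression)) ReturnType;
        }
        else
        {
            static assert(false, "Bad unary function: " ~ fun
                ~ " for type " ~ ElementType.stringof);
        }
    }

    // The templated function that callers end up with.
    Body!(ElementType).ReturnType result(ElementType)(ElementType __a)
    {
        mixin("alias __a " ~ parmName ~ ";");  // give __a the user-visible name
        mixin(Body!(ElementType).code);        // the actual function body
    }
}

// Eponymous wrapper, mirroring what unaryFun does on line 40.
template miniUnaryFun(string fun, string parmName = "a")
{
    alias miniUnaryFunImpl!(fun, parmName).result miniUnaryFun;
}

void main()
{
    alias miniUnaryFun!("(a & 1) == 0") isEven;
    assert(isEven(2) && !isEven(1));
    static assert(is(typeof(isEven(2)) == bool));
}
```

Body is only instantiated once 'result' is called with a concrete argument type, which is why a malformed string only produces an error at the first call and not at the alias declaration.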

Now let's jump back to line 40, the unaryFun template:

    alias unaryFunImpl!(funbody, byRef, parmName).result unaryFun;

Here, unaryFun instantiates unaryFunImpl with a string argument ('funbody', although unaryFunImpl renames this as 'fun') that is going to be constructed as a lambda in the Body template and checked for validity. unaryFun also passes two other arguments which are normally set to their default values. After the template is instantiated, we're taking its 'result' field. 'result' is a full-fledged templated function which we have just been talking about. We alias this new function to 'unaryFun', and this is our eponymous template trick where we alias some symbol to the template name.

Now we've made the complete trip back up the stack, so let's see our initial code again:

    alias unaryFun!("(a & 1) == 0") isEven;
    assert(isEven(2) && !isEven(1));

We instantiated unaryFun with a string. unaryFun instantiates unaryFunImpl and passes along all three of its arguments. unaryFunImpl in turn instantiates Body, which checks the string and produces the code for the function body and the return type of the function literal. unaryFunImpl then proceeds to construct the 'result' templated function.

Once unaryFunImpl constructs the 'result' function, unaryFun takes this function and aliases it to itself. In our initial code example, we've aliased the unaryFun instantiation to isEven:

    alias unaryFun!("(a & 1) == 0") isEven;

isEven is now equal to the 'result' templated function. You can prove it, too! Here's another instantiation of unaryFun, and two separate instantiations of the resulting templated function ('result' from unaryFunImpl):

    import std.functional;
    import std.stdio;
    void main()
    {
        alias unaryFun!(" 2 / a") twoDividedByArg;
        
        assert(twoDividedByArg!(int)(1) == 2);
        assert(twoDividedByArg!(double)(0.5) == 4);
        
        assert(twoDividedByArg(1) == 2);
        assert(twoDividedByArg(0.5) == 4);    
    }

The last two calls work because D has a shorthand for calling templated functions. If the compiler can deduce the template arguments from the types of the runtime arguments (here 1 is an int and 0.5 is a double, and the type of an argument is always known at compile time), you don't have to use the bang-parenthesis !() syntax to call a templated function.
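
The shorthand is not tied to unaryFun or to literals. A minimal sketch with a made-up templated function, twice, shows that only the type of the argument matters:

```d
// Hypothetical templated function: T is a compile-time argument, x a runtime one.
T twice(T)(T x)
{
    return x + x;
}

void main()
{
    // Explicit instantiation with the bang-parenthesis syntax.
    assert(twice!(int)(21) == 42);

    // Shorthand: T is deduced from the type of the runtime argument.
    assert(twice(21) == 42);    // T is int
    assert(twice(1.5) == 3.0);  // T is double

    // The value does not have to be known at compile time, only its type.
    int n = 4;
    assert(twice(n) == 8);
    static assert(is(typeof(twice(n)) == int));
    static assert(is(typeof(twice(1.5)) == double));
}
```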

We still have two more things to explain. The first is how the 'result' function is constructed when called with a function that doesn't take ref parameters.

On to line 85 in functional.d:

If we didn't set byRef to true, then the contents of this else clause are compiled instead. This is almost the same definition of 'result' as the one used when byRef is set to true, except that this time 'result' won't take a reference as its runtime argument.

On the next line there's this strange mixin:

    mixin("alias __a "~parmName~";");

The reason it's strange is that I haven't explained the third parameter to unaryFun yet, the 'parmName' parameter. 'parmName' lets you choose a different character or string of characters as the name that your newly built function uses for its parameter.

Here's an example of using a different name for the parameter:

    import std.functional;
    import std.stdio;
    void main()
    {
        alias unaryFun!(" 2 / a", false) twoDividedByArg;
        alias unaryFun!(" 2 / bcd", false, "bcd") twoDividedByArgB;
        
        assert(twoDividedByArg!(int)(1) == 2);
        assert(twoDividedByArgB!(double)(0.5) == 4);
    }

These two mixins will make a lot more sense now:

    mixin("alias __a "~parmName~";");
    mixin(Body!(ElementType).code);

Here's why: 'Body.code' is the code of our function literal. If our function literal uses the symbol name 'bcd', while the parameter of 'result' is explicitly named '__a', compilation will fail unless 'bcd' is made to refer to '__a'.

The 'result' function prototype on line 87 is:

    Body!(ElementType).ReturnType result(ElementType)(ElementType __a)

So, how do we make our code use any string of characters as the parameter name, even though the parameter is explicitly named '__a'? The solution is to use an alias:

    mixin("alias __a "~parmName~";"); //expands to:
    mixin("alias __a bcd;");

and:

    mixin(Body!(ElementType).code); //expands to:
    return (2 / bcd);

The 'result' function definition is now complete:

    Body!(ElementType).ReturnType result(ElementType)(ElementType __a)
    {
        alias __a bcd;
        return (2 / bcd);
    }
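
An alias to a parameter is just a second name for the same variable, not a copy. The sketch below uses made-up functions (renamed and divideTwoBy) to show that, and then builds the alias from a string the same way 'result' does.

```d
// Hypothetical function: 'bcd' and '__a' are two names for one variable.
int renamed(int __a)
{
    alias __a bcd;
    bcd += 1;      // writes through the alias...
    return __a;    // ...and the change is visible under the original name
}

// Hypothetical function: the parameter name is chosen by a template argument.
int divideTwoBy(string parmName)(int __a)
{
    mixin("alias __a " ~ parmName ~ ";");
    return mixin("2 / " ~ parmName);
}

void main()
{
    assert(renamed(1) == 2);
    assert(divideTwoBy!("bcd")(1) == 2);
    assert(divideTwoBy!("x")(2) == 1);
}
```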

And that's the whole story on how unaryFun works!

Hopefully, this wasn't too difficult to follow. unaryFun might actually be a rather complicated example to explain. But the good news is, once you've understood how a template like unaryFun works, you'll figure out the rest in no time. The reason Phobos has difficult-to-read templates is that writing templates in D is very easy. Hence, your templates end up being feature-loaded and running to hundreds of lines.

But the benefit of D templates over C++ templates is that you can read the template syntax with little to no misunderstanding. The complications arise when the templates have a lot of nested templates, or when they instantiate a lot of other templates. However, template instantiations can be easy to trace. Disassembling the heavy C++ template syntax might not be so easy.