DIP50

From D Wiki
Revision as of 11:17, 18 January 2014 by Doob (talk | contribs)
Title: AST Macros
DIP: 50
Version: 1
Status: Draft
Created: 2013-11-10
Last Modified: 2013-11-10
Author: Jacob Carlborg

Abstract

The basic concept of AST (Abstract Syntax Tree) macros, or just syntax macros, is fairly simple. A macro is just like any other function or method except that it only runs at compile time. When a macro is called, instead of evaluating its arguments and then calling the function, an AST is created for each argument passed to the function. The macro then returns a new AST, which is injected and type checked at the call site. This means that the call to the macro is replaced with the AST returned by the macro.

Example

macro myAssert (Context context, Ast!(bool) val, Ast!(string) str = null)
{
    auto message = str ? "Assertion failure: " ~ str.eval : val.toString();
    auto msgExpr = literal(constant(message));

    return <[
        if (!$val)
            throw new AssertError($msgExpr);
    ]>;
}

void main ()
{
    myAssert(1 + 2 == 4);
}

Compiling and running the above program would result in the following assert error:

core.exception.AssertError@main(13): Assertion failure: 1 + 2 == 4

The interesting part here is that the assert message contains the actual expression that failed.
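
For comparison, current D can approximate this with a string mixin, at the cost of passing the expression as a string instead of as ordinary code. The following is only a sketch for illustration and not part of the proposal; assertSource is a hypothetical helper name.

```d
import core.exception : AssertError;

// Hypothetical helper (not part of the DIP): builds the source text of a
// check that throws AssertError with the failing expression in its message.
// The expression must not contain double quotes, since it is pasted
// verbatim into a string literal.
string assertSource(string expr)
{
    return "if (!(" ~ expr ~ ")) throw new AssertError(\"Assertion failure: "
        ~ expr ~ "\");";
}

unittest
{
    string message;

    try
    {
        // The generated code is compiled and type checked at this point.
        mixin(assertSource("1 + 2 == 4"));
    }
    catch (AssertError e)
    {
        message = e.msg;
    }

    assert(message == "Assertion failure: 1 + 2 == 4");
}
```

Unlike the macro, the string version gives up syntax checking of the argument at the call site and needs explicit mixin syntax, which is the gap the proposal aims to close.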

Rationale

AST macros can be used to extend the language with new semantics without changing the actual language. Instead of adding new features to the language, AST macros can be a general solution for implementing language changes in library code. Many existing language features could have been implemented with AST macros, like scope, foreach and similar language constructs.

Formal Definition

Declaring a Macro

A macro is always declared with the macro keyword followed by its name and a parameter list. The first parameter of a macro is always of the type Context, therefore the parameter list cannot be empty. The rest of the parameters are always of the type Ast. A macro always needs to return either void or a value of the type Ast.

macro foo (Context context, Ast!(string) str)
{
    return str;
}

Calling a Macro

A macro is called just like any other function. The first parameter, which is of the type Context, is passed implicitly by the compiler. The rest of the arguments are passed as in a regular function call. The caller does not pass arguments of the type Ast; regular expressions are passed instead, and the compiler creates an AST for each argument and passes these as the Ast arguments to the macro.

The Context Parameter

The first parameter of a macro declaration is always of the type Context. This parameter is normally passed implicitly by the compiler, but it's also possible to pass it manually. This is useful when a macro has helper functions that need to retain the context given to the original macro.

The context parameter contains information about the surrounding context where the macro was called. This can be information like the surrounding method and class from which the macro was called.

Bonus

This context parameter also contains information about the complete compilation environment, like:

  • The arguments used when the compiler was invoked
  • Functions for emitting messages of various verbosity levels, like error, warning and info
  • Functions for querying various types of settings/options, like which versions are defined, whether "debug" or "release" is defined, and so on
  • In general, providing as much as possible of what the compiler knows about the compile run

  • The context should have an associative array with references to all scoped variables at initiation point.

This has the benefit of enabling a macro to check the variables that are "passed" to it, as well as to modify them. The value of this should be available to the macro if it is called from within either a class or a struct.

Semantics

Since macros can only be called at compile time, the compiler will strip out all macros before the code generation phase.

Quasi-Quoting

Quasi-quoting is basically a form of syntax sugar for creating syntax trees. It can be thought of as AST literals. In all examples in this text the following syntax is used for quasi-quoting:

<[ writeln("asd"); ]>

Splicing

Splicing is a syntax used for dynamically inserting a piece of an AST in quasi-quotes. In all examples in this text a dollar sign, $, is used for splicing.

<[ writeln($expr); ]>

The syntax for quasi-quoting and splicing is just an abstraction of what the syntax would actually look like. Regardless of what syntax is used there's always the option to manually create syntax trees using an API.
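
To make the relationship between quasi-quotes, splicing and a manual API more concrete, the following sketch builds the tree for writeln($expr) by hand. All names here (Node, StringLiteral, Call, toSource) are hypothetical and chosen only for illustration; the proposal does not define the AST API.

```d
// Hypothetical node types for illustration; the DIP does not specify them.
abstract class Node
{
    // Renders the node back to source text.
    abstract string toSource() const;
}

final class StringLiteral : Node
{
    private string value;

    this(string value)
    {
        this.value = value;
    }

    override string toSource() const
    {
        return `"` ~ value ~ `"`;
    }
}

final class Call : Node
{
    private string callee;
    private Node[] args;

    this(string callee, Node[] args)
    {
        this.callee = callee;
        this.args = args;
    }

    override string toSource() const
    {
        string result = callee ~ "(";

        foreach (i, arg; args)
        {
            if (i > 0)
                result ~= ", ";

            result ~= arg.toSource();
        }

        return result ~ ")";
    }
}

unittest
{
    // "Splicing" is simply passing an existing subtree as a child node.
    Node expr = new StringLiteral("asd");
    Node tree = new Call("writeln", [expr]);

    assert(tree.toSource() == `writeln("asd")`);
}
```

In this model a quasi-quote is shorthand for the constructor calls, and a splice is shorthand for inserting an already built node as a child.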

The AST Macro

The ast macro is an option for implementing quasi-quoting as a library macro. The ast macro takes an arbitrary expression and transforms it into an AST. It also supports splicing. The ast macro has a couple of overloads and their declarations look as follows:

macro ast (T) (Context context, Ast!(T) expr)
{
    // ...
}

macro ast (Context context, Ast!(void delegate ()) block)
{
    // ...
}

The first overload takes an arbitrary expression and converts it into an AST. The second overload takes a delegate, which makes it possible to convert a whole block of code to an AST.

Bonus

Calling a Macro

Macros are extended to be callable from anywhere it's possible to use a mixin.

Statement Macros

A statement macro is a macro that takes a Statement as its last parameter. The difference compared to regular macros is the calling syntax. Statement macros are called with the same syntax used for statements, like the example below:

macro foo (Context context, Statement block)
{
    return block;
}

macro bar (Context context, Ast!(int) arg, Statement block)
{
    return block;
}

void main ()
{
    foo
    {
        writeln("foo");
        writeln("foo again");
    }

    foo
        writeln("foo2");

    bar(3) {
        writeln("bar");
        writeln("bar again");
    }

    bar(3)
        writeln("bar2");
}


Just like many of the built-in statements the braces are optional when there's only a single expression in the statement. Since the statement is always the last parameter in the macro declaration and it's always passed outside the regular argument list it's legal to have parameters with default arguments or a variadic parameter list before the statement parameter.

macro foo (Context context, Ast!(string)[] arg ..., Statement block)
{
    return block;
}

macro bar (Context context, Ast!(string) fmt = null, Statement block)
{
    return block;
}

Declaration Macros

A declaration macro is a macro that acts like a user defined attribute. It can be applied to any declaration. When a declaration macro is used, the macro is called and the AST of the declaration is passed as the last parameter to the macro. The declaration is replaced with whatever syntax tree the macro returns.

A declaration macro always takes a Declaration as its last parameter. The same rules about default arguments and variadic parameters that apply to statement macros apply to declaration macros as well.

macro attr (Context context, Declaration decl)
{
    auto attrName = decl.name;
    auto type = decl.type;

    return <[
        private $decl.type _$decl.name;

        $decl.type $decl.name ()
        {
            return _$decl.name;
        }

        $decl.type $decl.name ($decl.type value)
        {
            return _$decl.name = value;
        }
    ]>;
}

class Foo
{
    @attr int bar;
}
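
Assuming the macro behaves as described, applying @attr to int bar would replace the declaration with roughly the following hand-written equivalent, which is valid D today. This expansion is a reading of the macro body above, not output from an actual implementation.

```d
class Foo
{
    // Backing field: "private $decl.type _$decl.name;"
    private int _bar;

    // Getter built from the declaration's type and name.
    int bar ()
    {
        return _bar;
    }

    // Setter; returns the assigned value, as in the macro body.
    int bar (int value)
    {
        return _bar = value;
    }
}

unittest
{
    auto foo = new Foo;

    foo.bar = 3;          // calls bar(int)
    assert(foo.bar == 3); // calls bar()
}
```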

Use cases

Examples of usage of AST macros that would be useful for extending the language.

Linq

Linq is a .NET library that incorporates searching and manipulation of data. A C# example is:

using System;
using System.Linq;

class Program
{
    static void Main()
    {
	int[] array = { 1, 2, 3, 6, 7, 8 };

	var elements = from element in array
		       where element > 5
		       select element;

	foreach (var element in elements)
	{

	}
    }
}

This could be implemented by an end user as:

import linq;
import std.stdio;

void main() {
    int[] array = [1, 2, 3, 6, 7, 8];
    int[] data;
    query {
        from element in array
        where element > 5
        add element to data
    }
}

That code would be converted to:

import linq;

void main() {
    int[] array = [1, 2, 3, 6, 7, 8];
    int[] data;
    foreach (element; array) {
        if (element > 5) data ~= element;
    }
}

C#'s ability to specify the variable that receives the result is not required, at least for this example. However, it should be possible to specify it, e.g.

query {
    int data
    from element in array
    where element > 5
    select element
}

This would be closer to C#'s syntax.

That code would be converted to:

import linq;

void main() {
    int[] array = [1, 2, 3, 6, 7, 8];
    int[] data;
    foreach (element; array) {
        if (element > 5) data ~= element;
    }
}

As an improvement, it is suggested that a macro be able to get the variables currently declared within scope. This would make it possible to check whether variables, e.g. the array, are defined, and to give a good compiler error if they are not. It would also make it possible to determine the type of an array value from the given array instead of having to specify it.
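
For comparison, the same filtering can be written in current D without macros by using the range algorithms in the standard library. This sketch only shows the library-based alternative, not the proposed query syntax; greaterThan is a hypothetical helper name.

```d
import std.algorithm.iteration : filter;
import std.array : array;

// Hypothetical helper: collects every element larger than limit,
// mirroring "from element in array where element > 5 select element".
int[] greaterThan(int[] input, int limit)
{
    return input.filter!(element => element > limit).array;
}

unittest
{
    int[] data = greaterThan([1, 2, 3, 6, 7, 8], 5);
    assert(data == [6, 7, 8]);
}
```

The macro version would differ mainly in surface syntax, since the generated foreach loop performs the same work.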

Reflection

class Person {

  macro where (Context context, Statement statement) {
    // ...
  }

}

auto foo = "John";
auto result = Person.where(e => e.name == foo);

// is replaced by
auto foo = "John";
auto result = Person.query("select * from person where person.name = " ~
sqlQuote(foo) ~ ";");

Calculation

Consider a simple macro that adds two numbers together and returns the result. For this to work, the values involved must be available to the macro. This is possible using scoped variables passed by reference on the context.

func(1, 2); // example args

void func(int i, int i2) {
    foo {
       output
       i, i2
    }
}

macro foo (Context context, Ast!(string) str)
{
    string outputVariable = // get return through str
    string name1 = // get i through str
    string name2 = // get i2 through str
    return "auto " ~ outputVariable ~ " = " ~ text(context.scopeVariables!int(name1) + context.scopeVariables!int(name2)) ~ ";";
}

When unrolled it will become:

void func(int i, int i2) {
    auto output = 3;
}

This essentially emulates pure functions. However, as stated in the Linq example, it would also enable checking of variables and types as required.

C++ Namespaces (issue 7961)

Bugzilla issue 7961 talks about adding support for C++ namespaces. This should be possible to solve with library code, especially since we already have pragma(mangle):

What we have today is the declaration of a C++ function without a namespace:

extern (C++) void x ();

Namespaces in C++ are all about mangling of symbols. Since we already have pragma(mangle), one could think that it would be possible to solve this with library code. Unfortunately this causes some problems:

string namespace (string namespace) { /* mangle the namespace ... */ }
pragma(mangle, namespace("foo::bar")) extern (C++) void x ();

In the above example the namespace is properly mangled but we're missing the mangling of "x". That's not something we want to do manually. Next try:

string namespace (string namespace, alias func) () { /* mangle the namespace ... */ }
pragma(mangle, namespace!("foo::bar", x)) extern (C++) void x ();

The above doesn't work either because of forward references of "x". Next try:

string namespace (string namespace, T, string name) () { /* mangle the namespace ... */ }
pragma(mangle, namespace!("foo::bar", void function (), "x")) extern (C++) void x ();

The above would most likely work. But now we're duplicating the signature and the name of "x". This is error prone and will lead to hard-to-find bugs or irritating linker errors. Not something we want to do.

Instead we can solve it with AST macros:

string mangle_cpp (string namespace, T, string name) () { /* mangle the declaration ... */ }

macro namespace (Context context, Ast!(string) namespace, Declaration declaration)
{
    auto name = declaration.name;
    auto type = declaration.type;
    auto mangledName = mangle_cpp(namespace.eval(), type.eval(), name.eval());
    auto mangledNameAst = literal(constant(mangledName));

    return <[
        pragma(mangle, $mangledNameAst) $declaration;
    ]>;
}

Usage:

@namespace("foo::bar") extern (C++) void x ();

This can also be used to look more like a real namespace in C++:

@namespace("foo::bar") extern (C++)
{
    void x ();
    void y ();
}

Attribute inference

Currently attributes are inferred automatically for template functions. This shows an example of automatically inferring attributes for a non-template function based on the attributes of another symbol [1].

macro inferAttributes (Context context, Ast!(Symbol) symbol, Declaration decl)
{
    foreach (attr ; symbol.attributes)
        decl.attributes ~= attr;

    return decl;
}

Usage:

class Foo (T)
{
    @inferAttributes(T.foo) void thisIsSoPolymorphic () { }
}
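
For reference, the existing inference for template functions mentioned above can be seen in the following small sketch. The names twice and callFromPure are made up for this illustration. The template carries no attributes, yet it can be called from a function marked pure, nothrow, @safe and @nogc because the compiler infers those attributes for the instantiation.

```d
// No attributes are written here; they are inferred per instantiation.
T twice (T) (T value)
{
    return value + value;
}

// This only compiles because twice!int is inferred to be
// pure, nothrow, @safe and @nogc.
int callFromPure (int value) pure nothrow @safe @nogc
{
    return twice(value);
}

unittest
{
    assert(callFromPure(21) == 42);
}
```

The inferAttributes macro would bring a similar effect to non-template functions by copying the attributes from a chosen symbol.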