Defining custom print format specifiers

Motivating example

Suppose you have some kind of data structure, let's call it S, and at some point in your program, you want to output it. The most obvious way, of course, is to implement a toString() method:

struct S
{
    ... // my sooper sekret data here!
    string toString() const pure @safe
    {
        // Typical implementation to minimize overhead
        // of constructing string
        auto app = appender!string();
        ... // transform data into string
        return app.data;
    }
}

void main()
{
    auto s = S();
    ... // do wonderful stuff with s
    writeln(s);
}

This is the "traditional" implementation, of course. A slight optimization is possible: toString() has an alternative signature that avoids string allocations altogether:

struct S
{
    // This method now takes a delegate to send data to.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        // So you can write your data piecemeal to its
        // destination, without having to construct a
        // string and then return it.
        sink("prelude");
        sink(... /* beautiful prologue */);
        sink("concerto");
        sink(... /* beautiful body */);
        sink("finale");
        sink(... /* beautiful trailer */);

        // Look, ma! No string allocations needed!
    }
}
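
For reference, here is a minimal, self-contained sketch of the sink-based signature. The Point type and its fields are hypothetical, chosen only for illustration. It assumes that formattedWrite from std.format accepts the sink delegate as its output range.

```d
import std.format : format, formattedWrite;
import std.stdio : writeln;

// Hypothetical type, used only to illustrate the sink-based toString.
struct Point
{
    int x, y;

    void toString(scope void delegate(const(char)[]) sink) const
    {
        // Each piece goes straight to the destination; no intermediate
        // string is built by this method.
        sink("Point(");
        formattedWrite(sink, "%d, %d", x, y);
        sink(")");
    }
}

void main()
{
    auto p = Point(3, 4);
    writeln(p); // picks up the sink overload
    assert(format("%s", p) == "Point(3, 4)");
}
```

The same method serves writeln, writefln and format, since all of them go through std.format.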

So far so good. This is (or should be) all familiar ground.

But suppose now you want to write your data to, say, a backup file in one format, but output your data to the user in another format. How would you do this?

You could make toString() output one format, say the on-disk format, then add another method, say toUserReadableString() for outputting the other format. But this is ugly and non-extensible. What if you have a whole bunch of other formats that need to be output? You'd be drowning in toNetworkString(), toDatabaseString(), toHtmlEscapedString(), etc., etc., which bloats your data's API and isn't very maintainable to boot.

How format specifiers are implemented in D

Here's where a little-known feature of std.format comes in. Note that when you write:

S s;
writeln(s);

This actually gets translated to the equivalent of:

S s;
writefln("%s", s);

Where the %s specifier, of course, means "convert to the standard string representation". What is less known, though, is that this ultimately translates to something like this:

Writer w = ...; /* writer object that outputs to stdout */
FormatSpec!Char fmt = ...; /* object representing the meaning of "%s" */
s.toString((const(char)[] chunk) { w.put(chunk); }, fmt);

In human language, this means that "%s" gets translated into a FormatSpec object containing "s" in its .spec field (and if you write, say, "%10s", the 10 gets stored in the .width field, etc.), and then this FormatSpec object gets passed to the toString method of the object being formatted, if it is defined with the correct signature.
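
The translation described above can be approximated by hand. The following is only a sketch of the idea, not the library's actual code path: formatOne is a hypothetical helper that parses the first specifier of a format string into a FormatSpec and passes it to formatValue. Any text after that first specifier is ignored.

```d
import std.array : appender;
import std.format : FormatSpec, formatValue;

// Hypothetical helper: formats one value roughly the way writefln would
// handle the first specifier in fmtString.
string formatOne(T)(string fmtString, T value)
{
    auto w = appender!string();
    auto spec = FormatSpec!char(fmtString);
    // Copies literal text into w and parses the next specifier into spec.
    spec.writeUpToNextSpec(w);
    // Formats the value according to the parsed specifier.
    formatValue(w, value, spec);
    return w.data;
}

void main()
{
    auto w = appender!string();
    auto spec = FormatSpec!char("%10s");
    spec.writeUpToNextSpec(w);
    assert(spec.spec == 's');  // the specifier letter
    assert(spec.width == 10);  // the field width

    assert(formatOne("%5s", 42) == "   42");
    assert(formatOne("id=%x", 255) == "id=ff");
}
```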

To see this in action, let's do this:

import std.format;
import std.stdio;

struct S
{
    void toString(scope void delegate(const(char)[]) sink,
                  FormatSpec!char fmt) const
    {
        // This is for probing how std.format works
        // under the hood.
        writeln(fmt.spec);
    }
}

void main()
{
    S s;

    // Wait -- what? What on earth are %i, %j, %k, and %l?!
    writefln("%i", s);      // Hmm, prints "i"!
    writefln("%j", s);      // Hmm, prints "j"!
    writefln("%k", s);      // Hmm, prints "k"!
    writefln("%l", s);      // Hmm, prints "l"!
}

Do you see what's going on? The format specifiers are not hard-coded into the library! You can invent your own specifiers, and they get passed into the toString method.

Defining your own format specifiers

This allows us to do this:

struct S
{
    void toString(scope void delegate(const(char)[]) sink,
                  FormatSpec!char fmt) const
    {
        switch(fmt.spec)
        {
            // Look, ma! I invented my own format specs!
            case 'i':
                // output first format to sink
                break;
            case 'j':
                // output second format to sink
                break;
            case 'k':
                // output third format to sink
                break;
            case 'l':
                // output fourth format to sink
                break;
            case 's':
                // output boring default string format
                break;
            default:
                throw new Exception("Unknown format specifier: %" ~
                                    fmt.spec);
        }
    }
}

Of course, FormatSpec contains much more than just the letter that defines the specifier. It also contains the field width, precision, flags, etc. So you can implement your own handling for any of the parameters that can be given in a writefln format string.
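
To see which fields arrive in toString, a small probe can echo them back. This is a sketch under assumptions: Probe is a hypothetical type, and only spec, width, precision and the flDash flag (set by a leading '-') are shown.

```d
import std.format : FormatSpec, format, formattedWrite;

// Hypothetical probe type: reports the FormatSpec fields it receives.
struct Probe
{
    void toString(scope void delegate(const(char)[]) sink,
                  FormatSpec!char fmt) const
    {
        formattedWrite(sink, "spec=%s width=%s precision=%s left=%s",
                       fmt.spec, fmt.width, fmt.precision, fmt.flDash);
    }
}

void main()
{
    Probe p;
    assert(format("%8.3k", p) == "spec=k width=8 precision=3 left=false");
    assert(format("%-8.3k", p) == "spec=k width=8 precision=3 left=true");
}
```

Note that the library does not pad the output itself here. What width and precision mean is entirely up to the toString method.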

Here's a somewhat silly, but complete example to show the flexibility conferred:

import std.format;
import std.stdio;

struct BoxPrinter
{
    void toString(scope void delegate(const(char)[]) sink,
                  FormatSpec!char fmt) const
    {
        if (fmt.spec == 'b')
        {
            // Draws a starry rectangle
            foreach (j; 0..fmt.precision)
            {
                foreach (i; 0..fmt.width)
                {
                    sink("*");
                }
                sink("\n");
            }
        }
        else
        {
            // Boring old traditional string representation
            sink("BoxPrinter");
        }
    }
}

void main()
{
    BoxPrinter box;
    writefln("%s", box);
    writefln("%6.5b", box);
    writefln("%3.2b", box);
    writefln("%7.4b", box);
}

Here's the output:

BoxPrinter
******
******
******
******
******

***
***

*******
*******
*******
*******

As you can see, the width and precision parts of the custom %b specifier have been reinterpreted as the dimensions of the box to be printed. And when you specify %s, a traditional innocent-looking string is printed instead. In effect, we have implemented our own custom format specifier.

Re-using existing formatting tools

You don't have to handle all the formatting specifiers manually. If you want to control which items in your structure are formatted but you do not care about handling the formatting yourself, you can use the existing tools from std.format. For example:

import std.conv;
import std.range;
import std.stdio;
import std.string;
import std.format;

struct S
{
    int[] arr1 = [1, 2, 3];
    int[] arr2 = [10, 20, 30];

    void toString(scope void delegate(const(char)[]) sink,
                  FormatSpec!char fmt)
    {
        put(sink, "[");

        foreach (a, b; zip(arr1, arr2))
        {
            formatValue(sink, a, fmt);
            put(sink, " : ");
            formatValue(sink, b, fmt);
            put(sink, ", ");
        }

        put(sink, "]");
    }
}

void main()
{
    S s;
    assert(to!string(s) == "[1 : 10, 2 : 20, 3 : 30, ]");
    assert(format("%#x", s) == "[0x1 : 0xa, 0x2 : 0x14, 0x3 : 0x1e, ]");
}

Summary

In summary, creating custom print format specifiers is very straightforward:

  • Implement a toString method with the following signature:
void toString(scope void delegate(const(char)[]) sink, FormatSpec!char fmt) const;
  • In this method, use fmt.spec to decide which format to output. Handle 's' for the default %s specifier, which is implied when no format string is given (as with writeln and friends), and compare fmt.spec against whatever custom specifier letter you wish, say 'z'.
  • Now you can print your object using the %z specifier to trigger the custom formatting.
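
The steps above can be put together as follows. This is a minimal sketch: the Color type and the meaning given to %z are hypothetical, picked only for illustration.

```d
import std.format : FormatException, FormatSpec, format, formattedWrite;

// Hypothetical type for illustration only.
struct Color
{
    ubyte r, g, b;

    void toString(scope void delegate(const(char)[]) sink,
                  FormatSpec!char fmt) const
    {
        switch (fmt.spec)
        {
            case 'z': // custom specifier: HTML-style hex triplet
                formattedWrite(sink, "#%02x%02x%02x", r, g, b);
                break;
            case 's': // default representation
                formattedWrite(sink, "Color(%d, %d, %d)", r, g, b);
                break;
            default:
                throw new FormatException(
                    "Unknown format specifier: %" ~ fmt.spec);
        }
    }
}

void main()
{
    auto c = Color(255, 128, 0);
    assert(format("%s", c) == "Color(255, 128, 0)");
    assert(format("%z", c) == "#ff8000");
}
```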

Additional notes

In the above examples, we used FormatSpec!char, the most common case: formatting into a UTF-8 string. If you wish to support formatting into a 16-bit wstring or 32-bit dstring as well, you will have to implement toString functions that take FormatSpec!wchar or FormatSpec!dchar as a parameter (or use a template that subsumes these cases).
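
One possible shape for such a template is sketched below. It assumes that std.format keeps passing a const(char)[] sink even when the format string is wide, so only the FormatSpec parameter is templated. The Tag type and the %z specifier are hypothetical.

```d
import std.format : FormatSpec, format;

// Hypothetical type for illustration only.
struct Tag
{
    string name;

    // Char is deduced from the FormatSpec that std.format passes in, so
    // this one method covers char, wchar and dchar format strings.
    void toString(Char)(scope void delegate(const(char)[]) sink,
                        FormatSpec!Char fmt) const
    {
        if (fmt.spec == 'z')
        {
            sink("<");
            sink(name);
            sink(">");
        }
        else
        {
            sink(name);
        }
    }
}

void main()
{
    auto t = Tag("b");
    assert(format("%z", t) == "<b>");   // UTF-8
    assert(format("%z"w, t) == "<b>"w); // UTF-16
    assert(format("%s"d, t) == "b"d);   // UTF-32
}
```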