Contributing to dlang.org

From D Wiki
Revision as of 23:27, 18 December 2015 by AndreiAlexandrescu (talk | contribs) (Macro Definition Batteries: .ddoc files)

This document is aimed at people who want to contribute to the dlang.org website. It includes a brief tutorial on the technologies used and step-by-step instructions for typical tasks.

Installing

We assume you already have a dmd rig up and running. If not, follow the steps at Starting as a Contributor to get it running. The directory structure we'll assume in the following is:

~/
  d/
    dmd/
    druntime/
    phobos/

With this setup, let's proceed to downloading the source of dlang.org as follows:

cd ~/d
git clone https://github.com/D-Programming-Language/dlang.org

After this, the dlang.org directory will end up parallel to dmd, druntime, and phobos. Let's build the site (for now without the standard library documentation) by using the following command:

cd ~/d/dlang.org
make -j32 -f posix.mak html

The html target instructs make to build only the site. By default (if you specify no target), make also builds the core runtime and standard library documentation, both for the latest release and for the current code residing on your machine. That may get pretty involved, so let's leave it for later. For now, let's inspect the result of the build, all of which goes in the dlang.org/web directory. On Debian-based Linux systems, for example, the sensible-browser command opens a given file or address in your default browser, so we can use it as follows:

sensible-browser ~/d/dlang.org/web/index.html

At this point, if all went well, a nicely formatted HTML file pops up featuring a local replica of the dlang.org homepage. Congratulations!

Editing Content

Browsing through ~/d/dlang.org/ reveals that most files have the .dd extension. Those files are in Ddoc format; in order to work on dlang.org a basic understanding of the Ddoc format is needed.

At its core, Ddoc is a pure macro expansion system. In this context "pure" means the macro language has no relationship to any other file format; all the expansion engine does is take in macro definitions and then munch through text and expand macros as they come along. A few macros are predefined to values that make HTML generation easy, so in that sense a slight affinity with HTML does exist; however, those macros can be trivially redefined for any other purpose. Also, Ddoc recognizes sections marked in a particular way as D source code. There are a few subtleties inherent to macro processors (e.g. how recursion works or in what order nested macros are expanded), but aside from those Ddoc is deceptively simple and very flexible. Exploiting the favorable relationship between Ddoc's simplicity and power is key to using it effectively.

Ddoc source files have the following structure:

Ddoc
Text with embedded macros such as $(MACRO1) and $(MACRO2) goes here.
Macros:
  MACRO1=definition1
  MACRO2=definition2

That is, a Ddoc file consists of the actual word "Ddoc" followed by a newline, then followed by the actual text of the document, followed by a line containing "Macros:", followed by macro definitions of the form NAME=value. The "Macros:" section is optional (as is the indentation of the macro definitions; it's present here for aesthetic reasons only). You might have guessed already that the syntax $(MACRONAME) expands the macro called MACRONAME into whatever text was ascribed to it in the "Macros:" section. Let's actually test that by saving the following text into a file called e.g. test.dd:

Ddoc
Text with embedded macros such as $(MACRO1) and $(MACRO2) goes here.
Macros:
  MACRO1=definition1
  MACRO2=definition2
  DDOC_COMMENT=
  DDOC=$(BODY)

The last two macro definitions seem to come out of nowhere and deserve some explanation, which this document will provide soon. For now let's "build" this file like this:

~/d/dmd/src/dmd test.dd
cat test.html

(You may of course just type dmd if it's in your $PATH.) The produced file, by default carrying the .html extension, contains:

Text with embedded macros such as definition1 and definition2 goes here.

So the macro names got expanded to the text in their respective definitions. Sweet!
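
The file layout described above can be modeled in a few lines of Python. This is only a rough sketch of the structure (the function name parse_ddoc is made up for illustration, and multi-line macro definitions are not handled); it is not how dmd actually parses the file.

```python
def parse_ddoc(source):
    """Split a Ddoc source into (body, macros).

    Simplified model: the first line must be "Ddoc", the "Macros:" section
    is optional, and each macro definition fits on a single line.
    """
    lines = source.split("\n")
    if not lines or lines[0].strip() != "Ddoc":
        raise ValueError('a Ddoc file must start with the word "Ddoc"')
    rest = lines[1:]
    body_lines, macro_lines = rest, []
    for i, line in enumerate(rest):
        if line.strip() == "Macros:":
            body_lines, macro_lines = rest[:i], rest[i + 1:]
            break
    macros = {}
    for line in macro_lines:
        name, eq, value = line.partition("=")
        if eq:
            # Indentation and whitespace around "=" are not significant.
            macros[name.strip()] = value.strip()
    return "\n".join(body_lines), macros


body, macros = parse_ddoc(
    "Ddoc\nSee $(MACRO1).\nMacros:\n  MACRO1=definition1\n"
)
print(body)
print(macros)
```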

The Ddoc Expansion Process

Time to clear the air about the mysterious DDOC_COMMENT= and DDOC=$(BODY) definitions. When dmd processes a .dd file, it doesn't immediately expand the text; the process goes as follows:

  • accumulate the text (sans the opening Ddoc\n) in memory and put it in a variable called BODY;
  • when processing reaches the \nMacros:\n section, read and memorize the macro definitions underneath;
  • expand and output $(DDOC_COMMENT Generated by Ddoc from filename.dd);
  • expand and output the macro DDOC.
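
The four steps above can be sketched in Python as follows. This is a simplified model under stated assumptions: the names expand and process are invented for the sketch, only parameterless $(NAME) references are expanded, macro definitions are single-line, and dmd's built-in defaults for DDOC and DDOC_COMMENT are not modeled (a missing definition is treated as empty).

```python
import re


def expand(text, macros):
    """Expand parameterless $(NAME) references until nothing changes."""
    pattern = re.compile(r"\$\((\w+)\)")
    for _ in range(1000):  # crude guard against runaway expansion
        new = pattern.sub(lambda m: macros.get(m.group(1), ""), text)
        if new == text:
            return new
        text = new
    raise RuntimeError("expansion did not terminate")


def process(source, filename):
    head, _, rest = source.partition("\n")
    if head.strip() != "Ddoc":
        raise ValueError('a Ddoc file must start with the word "Ddoc"')
    # Step 1: accumulate the text and store it as BODY.
    body, _, macro_text = rest.partition("\nMacros:\n")
    macros = {"BODY": body}
    # Step 2: read and memorize the macro definitions.
    for line in macro_text.split("\n"):
        name, eq, value = line.partition("=")
        if eq:
            macros[name.strip()] = value.strip()
    # Step 3: expand DDOC_COMMENT with the "Generated by" text as its argument.
    comment = macros.get("DDOC_COMMENT", "").replace(
        "$0", "Generated by Ddoc from " + filename
    )
    # Step 4: expand DDOC, which normally refers to $(BODY).
    return expand(comment, macros) + expand(macros.get("DDOC", ""), macros)
```

With DDOC_COMMENT= and DDOC=$(BODY), as in test.dd, the comment step contributes nothing and the output is just the expanded body.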

The default values of DDOC_COMMENT and DDOC are geared toward building simple HTML files. Indeed, if we remove the last two macro definitions (DDOC_COMMENT and DDOC) from the test.dd file and rebuild, the resulting test.html contains:

<html><head>
        <META http-equiv="content-type" content="text/html; charset=utf-8">
        <title>test</title>
        </head><body>
        <h1>test</h1>
        <!-- Generated by Ddoc from test.dd -->

Text with embedded macros such as definition1 and definition2 goes here.

        <hr><small>Page generated by <a href="http://dlang.org/ddoc.html">Ddoc</a>. </small>
        </body></html>

which is a serviceable albeit bland HTML document.

Macros with Parameters

Macros are key to the power of Ddoc for at least two reasons. First, they shorten and simplify the document by allowing you to replace clumsy typesetting directives such as <span class="important">Attention!</span> with $(ATTENTION). Second, they elevate the level of the document you're writing by leveraging the traditional "extra level of indirection". Depending on how you define ATTENTION, you get to render $(ATTENTION) in various output formats. For example, for HTML you'd use:

ATTENTION=<span class="important">Attention!</span>

If you don't care to define CSS classes etc., you may want to just go sloppy:

ATTENTION=<font color="red">Attention!</font>

To format the document as plain text, just write:

ATTENTION=Attention!

And if you want to output LaTeX, write something like:

ATTENTION={\color{red}Attention!}

In short, you get to expand Ddoc documents into many other formats by using macros and combining the documents with appropriately-defined macro batteries. (This document will discuss soon how to effect such combinations.)
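
To make the "same document, different macro batteries" idea concrete, here is a small Python sketch. The BATTERIES table and the render function are invented for this illustration; they only mimic the effect of swapping the ATTENTION definitions shown above and do not call dmd.

```python
# Each "battery" maps macro names to definitions for one output format.
BATTERIES = {
    "html": {"ATTENTION": '<span class="important">Attention!</span>'},
    "text": {"ATTENTION": "Attention!"},
    "latex": {"ATTENTION": r"{\color{red}Attention!}"},
}


def render(document, target):
    """Expand parameterless macros using the battery chosen by `target`."""
    out = document
    for name, value in BATTERIES[target].items():
        out = out.replace("$(" + name + ")", value)
    return out


doc = "$(ATTENTION) The floor is wet."
for target in ("html", "text", "latex"):
    print(render(doc, target))
```

The document text never changes; only the battery selected at render time does.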

For now, let's note that ATTENTION is not quite a sterling example of flexibility; it allows us to render "Attention!" in a special way, but often the need is to render various other words and phrases as attention-attracting text. So what's needed is a macro taking the text to render as a parameter. Here's how to do so in Ddoc.

In a macro definition, certain constructs access arguments as follows:

  • $1, $2, $3, ..., $9 expand to the first, second, third, ..., ninth argument;
  • $+ expands to all arguments except for the first;
  • $0 simply expands to all macro arguments.

To pass arguments to a macro, insert a space after the macro name, then pass the arguments separated by commas, then close the paren: for example, $(MYMACRO how are you doing, Jeff?) passes the arguments "how are you doing" and "Jeff?" to a macro called MYMACRO.
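
As a rough illustration of how $0, $1 through $9, and $+ relate to the argument text, consider the following Python sketch. The function substitute is hypothetical; it takes the text that follows the macro name (with the first space already removed), splits on every comma, and ignores nested macros and parentheses.

```python
import re


def drop_one_space(s):
    """Remove a single leading whitespace character, if there is one."""
    return s[1:] if s[:1].isspace() else s


def substitute(definition, arg_text):
    """Replace $0, $1..$9 and $+ in `definition` (simplified model)."""
    parts = arg_text.split(",")
    args = [parts[0]] + [drop_one_space(p) for p in parts[1:]]
    plus = drop_one_space(arg_text.partition(",")[2])  # everything after the first comma

    def repl(match):
        key = match.group(1)
        if key == "0":
            return arg_text
        if key == "+":
            return plus
        index = int(key)
        return args[index - 1] if index <= len(args) else ""

    return re.sub(r"\$([0-9+])", repl, definition)


print(substitute("first=[$1] second=[$2]", "how are you doing, Jeff?"))
```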

There are a couple of subtleties related to passing and expanding parameters. The first whitespace after the macro name is "munched", i.e. it just disappears from the expansion. But if there's any whitespace character following the first one, it will be considered part of the first argument. An example will make this clear. Consider the macro definition TEST=|$1|. Then, the invocation

$(TEST abc)

produces

|abc|

whereas the invocation (note the extra space)

$(TEST  abc)

produces (note the extra space before the letters):

| abc|

The whitespace after a comma (if present) in multiple argument lists is handled with a similar logic. The first whitespace character immediately after the comma is considered "aesthetic" and not present in the expanded text. However, if the source inserts more than one whitespace character after a comma, the extra ones are considered part of the argument. By way of example, consider the macro definition TEST=|$1|$2|. Then the invocation

$(TEST abc,xyz)

produces

|abc|xyz|

Then the invocation (note the space after the comma)

$(TEST abc, xyz)

produces no change in output:

|abc|xyz|

However, the invocation (there are THREE spaces after the comma):

$(TEST abc,   xyz)

produces the output:

|abc|  xyz|

which inserts TWO spaces before "xyz". One has been munched, the other two copied.
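
The munching rules can be reproduced with a short Python model. The names split_invocation and bars are made up for this sketch, which handles only a flat invocation (no nested macros or parentheses) and is not dmd's actual algorithm.

```python
import re


def split_invocation(invocation):
    """Split '$(NAME args)' into (name, [args]) using the whitespace rules above."""
    match = re.fullmatch(r"\$\((\w+)\s?(.*)\)", invocation, re.DOTALL)
    if match is None:
        raise ValueError("not a macro invocation: " + invocation)
    # The optional \s? has already munched one whitespace after the name.
    name, arg_text = match.group(1), match.group(2)
    args = []
    for i, part in enumerate(arg_text.split(",")):
        if i > 0 and part[:1].isspace():
            part = part[1:]  # munch exactly one whitespace after each comma
        args.append(part)
    return name, args


def bars(invocation):
    """Render the arguments the way TEST=|$1| or TEST=|$1|$2| would."""
    _, args = split_invocation(invocation)
    return "|" + "|".join(args) + "|"


for example in ("$(TEST abc)", "$(TEST  abc)", "$(TEST abc, xyz)", "$(TEST abc,   xyz)"):
    print(repr(example), "->", repr(bars(example)))
```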

Ddoc "understands" parentheses and likes them paired. If a comma occurs within parenthesized text, it won't be considered a macro argument separator. For example, in the invocation $(MYMACRO abc, (xyz, tuv)), MYMACRO receives two arguments, not three. Nested parentheses work as expected, too; only top-level commas within the macro are considered argument separators.
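
A minimal sketch of top-level comma splitting, assuming balanced parentheses, might look like this in Python. The function split_top_level is illustrative only; it tracks nesting depth and treats a comma as a separator only at depth zero.

```python
def split_top_level(arg_text):
    """Split on commas that are not enclosed in parentheses."""
    args, current, depth = [], [], 0
    for ch in arg_text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            args.append("".join(current))
            current = []
        else:
            current.append(ch)
    args.append("".join(current))
    # One whitespace character after each separating comma is dropped.
    return [args[0]] + [a[1:] if a[:1].isspace() else a for a in args[1:]]


print(split_top_level("abc, (xyz, tuv)"))
```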

Whitespace in macro definition

In the "Macros:" section, any number of whitespace characters preceding or following the = sign are ignored. That makes it problematic to define macros that start with a space. To remedy that, define this macro:

SPACE = $(SPACE) $(SPACE)

The recursive expansion (which will be explained in detail below) expands to nothing while the macro itself is being expanded; so the final expansion will be a space flanked by two empty strings, which is exactly what we needed. Now it's easy to use space to e.g. indent some text:

INDENT = $(SPACE)$(SPACE)$(SPACE)$(SPACE)$0

The INDENT macro inserts four spaces before its arguments. Note that the space between "=" and the first "$(SPACE)" is not part of the output.

Similarly, to define a macro that inserts exactly one newline:

NEWLINE = $(NEWLINE)
$(NEWLINE)

Special characters: "(", ")", "$", ","

As shown above, the characters "(", ")", "$", and "," have special meaning to Ddoc. Sometimes it's necessary to "escape" them, i.e. make them part of the produced output without them being interpreted by Ddoc. To achieve that, the following macros are useful:

ARGS = $0
COMMA = ,
COMMENT = 
DOLLAR = $
LPAREN = (
RPAREN = )
TAIL = $+

The ARGS macro is useful when we want to pass a large text (possibly including commas) as a single argument to another macro. Consider:

SECTION=<h2>$1</h2>$+

Then if the first argument to SECTION includes one or more commas, use it like this:

$(SECTION $(ARGS Trinkets, Treasures, and Other Tchotchkes),
Here goes the text of the section...
)

Alternatively (and this brings us to the second useful macro), you could use COMMA for the same effect:

$(SECTION Trinkets$(COMMA) Treasures$(COMMA) and Other Tchotchkes,
Here goes the text of the section...
)

The macro COMMENT expands to absolutely nothing whatsoever, which makes it a great macro for inserting comments in Ddoc files (including commenting out portions of the document). Pay attention, however: comments must still pair parentheses properly.

The DOLLAR macro expands to the dollar sign in a way that prevents it from later initiating a macro expansion. The same goes for LPAREN and RPAREN: they expand to parentheses, but only in a blind textual way; they don't carry the usual meaning of parentheses. So you get to use these macros if you want e.g. to define a macro that includes the dollar sign and/or unbalanced parentheses.

Last but (somewhat ironically) not least, the TAIL macro expands to all of its arguments except the first one. Why isn't $+ enough on its own? Because you may apply TAIL to $+ itself to access all arguments except the first two. Consider:

SAYS=$1 $2 says: ==$(TAIL $+)==

Then, $(SAYS Mad, Hatter, A land full of wonder, mystery, and danger!) expands to "Mad Hatter says: ==A land full of wonder, mystery, and danger!==" as we'd need it to.

Unfortunately, $(TAIL $(TAIL $+)) does not expand to all arguments except the first three, as we'd wish. Instead, it expands to nothing at all. This is to keep the overall expansion mechanics simple: once the first macro expansion occurs, the commas found in it are "painted" to be inert so they can't be reused as commas to separate arguments to the outer macro.

All is not lost, however, with a little handwritten drudgery. These macros do work properly:

TAIL=$+
TAIL2=$(TAIL $+)
TAIL3=$(TAIL2 $+)
TAIL4=$(TAIL3 $+)

You may continue this for as long as needed. A natural limit is 9 because numbered arguments go up to $9.
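
One way to picture what the TAIL family computes is to treat the arguments as a Python list. This is only an analogy (the function names mirror the macros but are otherwise invented, and the "painted comma" behavior is not modeled): each step drops one more leading argument.

```python
def tail(args):
    # TAIL=$+ : everything except the first argument
    return args[1:]


def tail2(args):
    # TAIL2=$(TAIL $+) : apply TAIL to everything except the first argument
    return tail(args[1:])


def tail3(args):
    # TAIL3=$(TAIL2 $+)
    return tail2(args[1:])


def says(args):
    # SAYS=$1 $2 says: ==$(TAIL $+)==
    return args[0] + " " + args[1] + " says: ==" + ", ".join(tail(args[1:])) + "=="


print(says(["Mad", "Hatter", "A land full of wonder", "mystery", "and danger!"]))
```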

Recursive Macros

Ddoc strives to avoid being a Turing-complete language, thus keeping the complexity/power balance in a safe area. That means recursion is theoretically off the table. However, there are two crude mechanisms that make recursive macros usable in Ddoc:

  • If a macro is expanded within its own expansion more than 1000 levels deep, Ddoc compilation fails with an error message.
  • If a macro is expanded without any argument within its own definition, it "disappears", i.e. it expands to the null string.

The first rule is a ham-fisted means to enforce that any Ddoc expansion will finish and is rather uninteresting. The only thing to note is the limit on expansion depth, which is generous enough to not make it a nuisance. The second rule is the interesting one because it allows us to write recursive macros as long as the recursive invocation is always smaller than the input.

Consider, for example, generating an HTML list which needs to look like this:

<ul>
<li>Apples</li>
<li>Oranges</li>
<li>Bananas</li>
<li>Pineapples</li>
</ul>

One direct but awkward approach is:

UL = <ul>$0</ul>
LI = <li>$0</li>

which is then used like this:

$(UL
$(LI Apples)
$(LI Oranges)
$(LI Bananas)
$(LI Pineapples)
)

This works, but there's a more compact way of creating the list by using recursive macros:

UL = <ul>$(LIs $0)</ul>
LIs = <li>$1</li>$(LIs $+)

Now all you need to type to create the list is $(UL Apples, Oranges, Bananas, Pineapples). Sweet! (Literally.) Got an argument with commas inside? No problem, ARGS is ready to help: $(UL Apples, Oranges, $(ARGS Mandarins, Tangerines, Clementines, etc.), Bananas, Pineapples) expands as desired.
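
The recursion in LIs can be mirrored by an ordinary recursive function. The Python below is an analogy under the second rule above (an invocation with no arguments expands to nothing), with arguments modeled as a list; the names ul and lis simply echo the macros.

```python
def lis(args):
    # LIs = <li>$1</li>$(LIs $+)
    if not args:
        # Recursive invocation without arguments: expands to the empty string.
        return ""
    return "<li>" + args[0] + "</li>" + lis(args[1:])


def ul(args):
    # UL = <ul>$(LIs $0)</ul>
    return "<ul>" + lis(args) + "</ul>"


print(ul(["Apples", "Oranges", "Bananas", "Pineapples"]))
```

Each recursive call receives one argument fewer, so the expansion is guaranteed to bottom out.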

Other Special Characters and Patterns

There are two more patterns you should know about. Snippets of D code should go in between lines consisting of three or more hyphens, as follows:

---
void main() {
    import std.stdio;
    writeln("Hello, world");
}
---

Then the Ddoc processor understands and syntax-colorizes the code appropriately. The coloring is done by means of predefined macros, all of which may be redefined by the user. Only D is currently supported; no other languages are "understood" by the Ddoc engine.

Another pattern worth noting is words or short phrases enclosed in backticks (those small back apostrophes ` that look like specks of dust on the screen to those of us missing our glasses). Such `text` enclosed in backticks, when both the opening and the closing backtick occur on the same line, is rewritten as $(DDOC_BACKQUOTED text). By default the macro expands to an HTML span tag, but you may change it as needed.
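
The same-line requirement for backticks can be illustrated with a regular expression applied line by line. The function rewrite_backticks below is a hypothetical Python approximation of the rewriting step, not the code dmd uses.

```python
import re


def rewrite_backticks(text):
    """Rewrite `x` as $(DDOC_BACKQUOTED x) when both backticks share a line."""
    pattern = re.compile(r"`([^`\n]+)`")
    return "\n".join(
        pattern.sub(r"$(DDOC_BACKQUOTED \1)", line) for line in text.split("\n")
    )


print(rewrite_backticks("Call `writeln` to print."))
print(rewrite_backticks("An opening ` here\nand a closing ` there stay as they are."))
```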

Macro Definition Batteries: .ddoc files

So far, this document has discussed .dd files as defining both their content and macros (the latter by means of the "Macros:" section). However, the whole point of a macro system is that you get to swap easily what the generated text looks like by swapping one set of macros for another. So we need to explore the opportunity of defining the content in a file, and the macros controlling the expansion of the content in a distinct file. Enter .ddoc files containing macros alone.

Ddoc has no file inclusion feature, but it has a simple mechanism for combining files on the command line. Simply specify the files on the command line in the order you'd like them processed, and dmd loads each in turn and processes it. (The order does matter; for example, a macro may be defined by multiple files; only the last definition sticks.) The .ddoc files may only contain macros, i.e. the entire content of a .ddoc file is processed the same way as the "Macros:" section of a .dd file.
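
The "last definition sticks" rule behaves like successive dictionary updates. The Python sketch below assumes single-line definitions and uses an invented load_macros function plus a made-up plain.ddoc file; it only models the ordering effect, not dmd's file handling.

```python
def load_macros(files):
    """Merge macro definitions from (filename, text) pairs in command-line order."""
    macros = {}
    for _filename, text in files:
        for line in text.split("\n"):
            name, eq, value = line.partition("=")
            if eq:
                # A later file silently overrides an earlier definition.
                macros[name.strip()] = value.strip()
    return macros


merged = load_macros([
    ("html.ddoc", "EMPH=<i>$0</i>\nSTRONG=<b>$0</b>"),
    ("plain.ddoc", "EMPH=*$0*"),  # hypothetical second battery
])
print(merged)
```

Reversing the order of the two entries would leave the HTML definition of EMPH in effect instead.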

Suppose, for example, that we have a file html.ddoc with the following content:

DDOC=<!DOCTYPE html>
<html>
<head><title>$(TITLE)</title></head>
<body>
$(BODY)
</body>
</html>
EMPH=<i>$0</i>

Now, say we also have a file called test.dd with the content:

Ddoc
Yo, this is one $(EMPH fine) HTML doc!

To process test.dd in conjunction with html.ddoc, run this:

~/d/dmd/src/dmd html.ddoc test.dd

The generated content of test.html is:

<!DOCTYPE html>
<html>
<head><title>test</title></head>
<body>
Yo, this is one <i>fine</i> HTML doc!
</body>
</html>

To illustrate the flexibility gained, let's also define latex.ddoc as follows:

DDOC=\documentclass{article}
\title{$(TITLE)}
\begin{document}
\maketitle
$(BODY)
\end{document}
EMPH=\emph{$0}

Predefined Macros

TODO