Difference between revisions of "DMD Source Guide"

(Itanium, not Italium. And the link was dead.)
m (oops Note the difference between two mixin declarations and mixin statements)
 
''Note: This article is very old, and it is outdated in many parts. Please bring it up to date where you can.''
+
== Overview ==
  
== DMD Source Guide ==
+
=== Major components ===
  
''If it's wrong, please correct it. If it's not here, please add it.''
+
All D compilers are divided into two parts: the front-end and the back-end.
 +
 
 +
The front-end (DMD-FE) implements all things D-specific: lexing and parsing D syntax, instantiating templates, producing error messages, etc. The same front-end code is used by [[DMD]], [[GDC]] and [[LDC]].
 +
 
 +
The back-end is what emits machine code. It contains code generation, optimization, object file writing, etc. The back-end is specific to each D compiler: DMD uses a D-specific Boost-licensed (as of April 2017) back-end, LDC uses [[LLVM]], and GDC uses [[GCC]] for their respective back-end processing.
 +
 
 +
There is also a glue layer, which is the interface between the front-end and back-end. This component is custom for each D compiler.
 +
 
 +
=== Compilation cycle ===
 +
 
 +
D source code goes through the following stages when compiled:
 +
 
 +
* First, the file is loaded into memory as-is, and converted to UTF-8 when necessary.
 +
* The lexer transforms the file into an array of tokens. There is no structure yet at this point - just a flat list of tokens. (lexer.c)
 +
* The parser then builds a simple AST out of the token stream. (parse.c)
 +
* The AST is then semantically processed in three passes (called semantic, semantic2 and semantic3), driven by a loop in mars.c. Each pass transforms the AST to be closer to the final representation: types are resolved, templates are instantiated, etc.
 +
:1. The "semantic" phase will analyze the full signature of all declarations. For example:
 +
::* members of aggregate type
 +
::* function parameter types and return type
 +
::* variable types
 +
::* evaluation of pragma(msg)
 +
:
 +
:2. The "semantic2" phase will analyze some additional parts of the declarations. For example:
 +
::* initializer of variable declarations
 +
::* evaluation of static assert condition
 +
:
 +
:3. The "semantic3" phase will analyze the body of function declarations.
 +
::If a function is declared in a module which is not directly compiled (i.e. not listed on the command line), the semantic3 pass won't analyze its body.
 +
:
 +
:4. During each phase, some declarations will partially invoke the subsequent phases in order to resolve forward references. For example:
 +
 
 +
immutable string x = "hello";
 +
static if (x == "hello") { ... }
 +
// The static if condition will invoke semantic2 of the variable 'x'
 +
 
 +
auto foo() { ... }
 +
typeof(&foo) fp;
 +
// "semantic" phase of the variable 'fp' will run "semantic3" of 'foo'
 +
// to demand the full signature of the function (== infer the return type)
 +
 
 +
string foo() { ... }
 +
mixin(foo());
 +
// For CTFE, the mixin declaration will invoke the semantic3 of 'foo'
 +
* Finally, the AST is handed over to the glue layer, which feeds it into the back-end, which in turn produces machine code and object files.
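
The three-pass processing described in the list above can be pictured as three separate loops over all modules, where each pass finishes for every module before the next pass starts. The following C++ sketch is only an illustration of that ordering: the <tt>Module</tt> type here is a hypothetical stand-in that merely logs which pass ran, and <tt>runSemanticPasses</tt> is an invented name, not the actual driver code in mars.c.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed module; it only logs which pass ran.
struct Module {
    std::string name;
    std::vector<std::string>* log;

    void semantic()  { log->push_back(name + ":semantic");  }
    void semantic2() { log->push_back(name + ":semantic2"); }
    void semantic3() { log->push_back(name + ":semantic3"); }
};

// Run each pass over every module before moving on to the next pass.
void runSemanticPasses(std::vector<Module>& modules) {
    for (Module& m : modules) m.semantic();
    for (Module& m : modules) m.semantic2();
    for (Module& m : modules) m.semantic3();
}
```

Note that this sketch ignores the forward-reference cases from item 4, where one declaration's pass triggers a later pass of another declaration on demand.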
 +
 
 +
=== Runtime interoperability ===
 +
 
 +
Non-trivial operations (e.g. memory allocation, array operations) are implemented in the D runtime. The compiler integrates with the runtime using a number of so-called hook functions (which by convention have the <tt>_d_</tt> name prefix).
 +
 
 +
A list can be found here: [[Runtime_Hooks]]
 +
 
 +
=== Details ===
 +
 
 +
''Note: This section may be considerably outdated. Please bring it up to date where you can.''
 +
 
 +
There are a number of types that are stored in various nodes that are never actually used in the front end.  They are merely stored and passed around as pointers.
 +
 
 +
* Symbol - Appears to have something to do with the names used by the linker.  Appears to be used by Dsymbol and its subclasses.
 +
* dt_t - "Data to be added to the data segment of the output object file" ''source: todt.c''
 +
* elem - A node in the internal representation.
 +
 
 +
The code generator is split among the various AST nodes.  Certain methods of almost every AST node are part of the code generator.
 +
 
 +
(it's an interesting solution to the problem.  It would have never occurred to a Java programmer)
 +
 
 +
Most notably:
 +
* All Statement subclasses must define a toIR method
 +
* All Expression subclasses must define a toElem method
 +
* Initializers and certain Expression subclasses must define toDt
 +
* Declarations must define toObjFile
 +
* Dsymbol subclasses must define toSymbol
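
To make the "code generator split among the AST nodes" idea concrete, here is a minimal C++ sketch in which each expression node knows how to lower itself via a virtual <tt>toElem</tt> method. The <tt>elem</tt> struct is a heavily simplified, hypothetical stand-in for the back end's node type, and <tt>IntegerExp</tt> and <tt>AddExp</tt> are toy classes written for this illustration rather than copies of the real front-end classes.

```cpp
#include <memory>
#include <utility>

// Simplified, hypothetical stand-in for the back end's 'elem' tree node.
struct elem {
    enum Op { Const, Add };
    Op op;
    long value;                     // meaningful only when op == Const
    std::unique_ptr<elem> e1, e2;   // operands when op == Add
    elem(Op o, long v) : op(o), value(v) {}
};

// Every expression node carries its own piece of the code generator.
struct Expression {
    virtual ~Expression() = default;
    virtual std::unique_ptr<elem> toElem() const = 0;
};

struct IntegerExp : Expression {
    long value;
    explicit IntegerExp(long v) : value(v) {}
    std::unique_ptr<elem> toElem() const override {
        return std::make_unique<elem>(elem::Const, value);
    }
};

struct AddExp : Expression {
    std::unique_ptr<Expression> lhs, rhs;
    AddExp(std::unique_ptr<Expression> l, std::unique_ptr<Expression> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    // Lower both operands first, then join them under an Add node.
    std::unique_ptr<elem> toElem() const override {
        auto e = std::make_unique<elem>(elem::Add, 0);
        e->e1 = lhs->toElem();
        e->e2 = rhs->toElem();
        return e;
    }
};
```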
 +
 
 +
==== Other things ====
 +
Floating point libraries seem to be atrociously incompatible between compilers.  Replacing strtold with strtod may be necessary, for instance.  (this does "break" the compiler, however: it will lose precision on literals of type 'real')
 +
-- AndyFriesen
 +
 
 +
==== Intermediate Representation ====
 +
 
 +
'''From [http://www.digitalmars.com/webnews/newsgroups.php?art_group=D.gnu&article_id=762 NG:D.gnu/762]'''
 +
 
 +
I've been looking at trying to hook the DMD frontend up to LLVM (www.llvm.org), but I've been having some trouble.  The LLVM IR (Intermediate Representation) is very well documented, but I'm having a rough time figuring out how DMD holds its IR.  Since at least three people (David, Ben, and Walter) seem to understand it, I thought I'd ask for guidance.
 +
 
 +
What's the best way to traverse the DMD IR once I've run the three semantic phases?  As far as I can tell  it's all held in the SymbolTable as a bunch of Symbols.  Is there a good way to traverse that and reconstruct it into another IR?
 +
 
 +
----
 +
 
 +
'''From [http://www.digitalmars.com/webnews/newsgroups.php?art_group=D.gnu&article_id=764 NG:D.gnu/764]'''
 +
 
 +
There isn't a generic visitor interface.  Instead, there are several methods which are responsible for emitting code/data and then calling that method for child objects.  Start by implementing Module::genobjfile and loop over the 'members' array, calling each Dsymbol object's toObjFile method.  From there, you will need to implement these methods:
 +
 
 +
Dsymbol (and descendents) ::toObjFile -- Emits code and data for objects that generally have a symbol name and storage in memory. Containers like ClassDeclaration also have a 'members' array with child Dsymbols.  Most of these are descendents of the Declaration class.
 +
 
 +
Statement (and descendents) ::toIR -- Emits instructions.  Usually, you just call toObjFile, toIR, toElem, etc. on the statement's fields and string  the results together in the IR.
 +
 
 +
Expression (and descendents) ::toElem -- Returns a back end representation of numeric constants, variable references, and operations that expression trees are composed of.  This was very simple for GCC because the back end already had the code to convert expression trees to ordered instructions.  If LLVM doesn't do this, I think you could generate the instructions here since LLVM has SSA.
 +
 
 +
Type (and descendents) ::toCtype -- Returns the back end representation of the type.  Note that a lot of classes don't override this -- you just need to do a switch on the 'ty' field in Type::toCtype.
 +
 
 +
Dsymbol (and descendents) ::toSymbol -- returns the back end reference to the object.  For example, FuncDeclaration::toSymbol could return a llvm::Function. These are already implemented in tocsym.c, but you will probably rewrite them to create LLVM objects.
 +
 
 +
----
 +
 
 +
(Thread:  http://digitalmars.com/d/archives/D/gnu/762.html)
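
The traversal suggested in the reply above (loop over a module's 'members' array and call toObjFile on each Dsymbol, with containers recursing into their own members) might look roughly like the following C++ sketch. All types here are toy stand-ins written for illustration, and "emitting" is reduced to appending the symbol name to a list; none of this is the real glue-layer code.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Output = std::vector<std::string>;  // hypothetical stand-in for an object file

struct Dsymbol {
    std::string name;
    explicit Dsymbol(std::string n) : name(std::move(n)) {}
    virtual ~Dsymbol() = default;
    // Leaf symbols just emit themselves.
    virtual void toObjFile(Output& out) const { out.push_back(name); }
};

struct ClassDeclaration : Dsymbol {
    std::vector<std::unique_ptr<Dsymbol>> members;
    using Dsymbol::Dsymbol;
    // Containers emit themselves, then each child symbol.
    void toObjFile(Output& out) const override {
        out.push_back(name);
        for (const auto& m : members) m->toObjFile(out);
    }
};

struct Module {
    std::vector<std::unique_ptr<Dsymbol>> members;
    // Entry point: walk the top-level members array.
    void genobjfile(Output& out) const {
        for (const auto& m : members) m->toObjFile(out);
    }
};
```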
 +
 
 +
==== Inliner ====
 +
 
 +
DMD's inliner is part of the frontend, existing entirely in the file '''inline.c'''.
 +
 
 +
This inliner is conceptually quite simple: It traverses the AST looking for function calls. Each function found is analysed for cost by adding up the number of expression nodes in its body. Anything non-inlinable counts as "maximum cost". If the total cost is below the maximum, the function call is inlined.
 +
 
 +
In DMD's AST, certain statements cannot currently be represented as expressions (such as non-unrolled loops and throwing). Because of this, the inliner makes a distinction between two main types of inlining:
 +
 
 +
* Converting a function call to an inline expression: This must be used whenever the function's return value is actually used. Ex: "x = foo();" or "2 + foo()".
 +
* Converting a function call to an inline statement: Used when a function's return value is ignored, or when calling a void function.
 +
 
 +
Those two scenarios are inlined by mostly separate codepaths. Cost analysis is mostly the same codepath, but "inlinable as a statement" and "inlinable as an expression" are separate decisions (again, due to certain statements not being representable as expressions).
 +
 
 +
The inliner is divided into four main parts:
 +
 
 +
* Main entry point: '''inlineScan''' (which utilizes class '''InlineScanVisitor''' and function '''expandInline''')
 +
* Cost analysis (to determine inlinability): '''canInline''' and class '''InlineCostVisitor'''
 +
* Inlining a function call as a statement: '''inlineAsStatement''' and its embedded class '''InlineAsStatement'''
 +
* Inlining a function call as an expression: '''doInline''' and its embedded class '''InlineStatement'''
 +
 
 +
===== Inliner: Main Entry Point =====
 +
 
 +
The whole inliner is driven by the '''inlineScan''' function and '''InlineScanVisitor''' class, but the bulk of the real work is performed by '''expandInline''' (described in this section) and the other three main parts of the inliner (described in the following sections).
 +
 
 +
The global function '''inlineScan''' is the inliner's main entry point. This uses class '''InlineScanVisitor''' to traverse the AST looking for function calls to inline, and inlining them as they're found. Whenever '''InlineScanVisitor''' finds an inlinable function call (determined by the cost analyzer), it calls '''expandInline''' to start the inlining process.
 +
 
 +
'''InlineScanVisitor''' also decides whether to inline as a statement or an expression based on the type of AST node found:
 +
 
 +
* '''ExpStatement''': Implies the function either has no return value, or the return value is unused. Therefore, inline as a statement (if permitted by cost analysis).
 +
* '''CallExp''': Implies the function returns a value which is used. Therefore, inline as an expression (if permitted by cost analysis).
 +
 
 +
Called by '''InlineScanVisitor''', '''expandInline''' drives the actual inlining for both "as statement" and "as expression" cases. It converts the function call scaffolding, parameters and return value (if any) into the appropriate inline statement or expression. To inline the function's body, '''expandInline''' hands over to either '''inlineAsStatement''' (if inlining the call as a statement) or '''doInline''' (if inlining the call as an expression).
 +
 
 +
===== Inliner: Cost Analysis =====
 +
 
 +
The function '''canInline''', unsurprisingly, determines if a function can be inlined. To decide this, it uses class '''InlineCostVisitor''', which traverses the AST calculating a sum of all costs involved.
 +
 
 +
'''InlineCostVisitor''' is a '''Visitor''' class which works just like any other AST visitor class in DMD, or any other usage of the [http://en.wikipedia.org/wiki/Visitor_pattern visitor pattern]: It contains a '''visit''' function for each AST node type supported by the inliner. Each '''visit''' function traverses its children nodes (if any) by calling the child node's '''accept''' function, passing the visitor class itself as an argument. Then the node's '''accept''' automatically calls its corresponding '''visit''' function.
 +
 
 +
Any type of node not supported by '''InlineCostVisitor''' automatically calls a default function '''InlineCostVisitor::visit(Statement *s)''', which flags the function being analyzed as non-inlinable.
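
The accept/visit round trip and the catch-all default can be sketched in C++ as follows. This is a toy model under stated assumptions: the node set is reduced to two statement types, the value of '''COST_MAX''' is assumed, and the cost of 1 charged for a return statement is hypothetical. Only the dispatch mechanism is meant to mirror the description above.

```cpp
struct Statement;
struct ReturnStatement;
struct AsmStatement;

const int COST_MAX = 250;  // assumed value, for illustration only

struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(Statement* s) = 0;
    // By default, a specific node type falls back to the Statement handler.
    virtual void visit(ReturnStatement* s);
    virtual void visit(AsmStatement* s);
};

struct Statement {
    virtual ~Statement() = default;
    virtual void accept(Visitor& v) { v.visit(this); }
};
struct ReturnStatement : Statement {
    void accept(Visitor& v) override { v.visit(this); }
};
struct AsmStatement : Statement {
    void accept(Visitor& v) override { v.visit(this); }
};

inline void Visitor::visit(ReturnStatement* s) { visit(static_cast<Statement*>(s)); }
inline void Visitor::visit(AsmStatement* s)    { visit(static_cast<Statement*>(s)); }

struct InlineCostVisitor : Visitor {
    int cost = 0;
    using Visitor::visit;
    // Catch-all: anything without its own overload is treated as non-inlinable.
    void visit(Statement*) override { cost += COST_MAX; }
    // Supported node: charge a small (hypothetical) cost.
    void visit(ReturnStatement*) override { cost += 1; }
};
```

Because <tt>AsmStatement</tt> has no overload in the toy <tt>InlineCostVisitor</tt>, visiting one lands in the catch-all and pushes the cost to the maximum.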
 +
 
 +
The actual '''cost''' variable is slightly complicated since it's really two packed values:
 +
 
 +
The low 12-bits of '''cost''' are the actual accumulated cost. A value of 1 is added for every inlinable expression node in the function's body (ex: "a+b*c" has a cost of 2: One for the multiplication and one for the addition). Anything that can't be inlined, or that the cost analyzer knows nothing about, adds a cost of '''COST_MAX'''. If this total cost, in the low 12-bits, is at least '''COST_MAX''' (determined by the helper function '''tooCostly'''), the function is considered non-inlinable.
 +
 
 +
The upper bits of '''cost''' (bits 13 and up) are separate from the actual cost and keep track of whether the function can be inlined as an expression. Whenever a statement is found which can be inlined ''only'' as a statement (and cannot be converted to an expression), this is flagged by adding '''STATEMENT_COST''' to '''cost'''.
 +
 
 +
Note: It looks as if at one point in time there had been a limit (or perhaps plans to eventually limit) the number of statements allowed in inlined functions, just as there's currently a limit to the number of expression nodes. But this does not currently appear to be enforced, so '''STATEMENT_COST''' is essentially used as a "this can only be inlined as a statement" flag.
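
The packing of '''cost''' can be illustrated with a few lines of C++. The numeric values below are assumptions chosen to match the description (a maximum that fits in the low 12 bits, and a statement flag in the first bit above them), and <tt>statementOnly</tt> is a hypothetical helper added for this example; check '''inline.c''' for the real definitions.

```cpp
const int COST_MAX = 250;           // assumed value; must fit in the low 12 bits
const int STATEMENT_COST = 0x1000;  // assumed value; first bit above the low 12 bits

// True if the accumulated expression cost (low 12 bits) has reached the maximum.
bool tooCostly(int cost) {
    return (cost & (STATEMENT_COST - 1)) >= COST_MAX;
}

// Hypothetical helper: true if a statement-only construct was seen,
// meaning the function cannot be inlined as an expression.
bool statementOnly(int cost) {
    return cost >= STATEMENT_COST;
}
```

With these definitions, a body such as "a+b*c" contributes 2 to the low bits, while a loop would additionally add '''STATEMENT_COST''' without affecting the result of <tt>tooCostly</tt>.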
 +
 
 +
Sometimes expressions are evaluated for cost by simply visiting the expression node, via the node's '''accept''' function. Other times, the helper function '''InlineCostVisitor::expressionInlineCost''' is used instead. The benefit of '''expressionInlineCost''' is it automatically halts analysis of an expression as soon as it reaches '''COST_MAX'''.
 +
 
 +
The '''canInline''' function caches its results in two members of '''FuncDeclaration''': In '''ILS inlineStatusStmt''' (for inlinability as a statement) and '''ILS inlineStatusExp''' (for inlinability as an expression). '''ILS''' is an enum defined in '''declaration.h''' supporting three states: '''ILSuninitialized''' (not yet cached), '''ILSno''' (not inlinable) and '''ILSyes''' (inlinable).
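
A minimal C++ sketch of this caching scheme is shown below. The '''ILS''' states and the two cache members follow the description above; everything else is an assumption made for illustration: the simplified <tt>canInline</tt> signature, the precomputed <tt>bodyCost</tt> field standing in for the real cost visitor, the <tt>analyses</tt> counter, and the value of '''COST_MAX'''.

```cpp
enum ILS { ILSuninitialized, ILSno, ILSyes };

const int COST_MAX = 250;  // assumed value, for illustration only

struct FuncDeclaration {
    int bodyCost = 0;                        // hypothetical: stands in for the cost visitor's result
    ILS inlineStatusStmt = ILSuninitialized; // cache: inlinable as a statement?
    ILS inlineStatusExp  = ILSuninitialized; // cache: inlinable as an expression?
    int analyses = 0;                        // hypothetical counter of real analyses performed
};

// Simplified signature: the real function takes more parameters.
bool canInline(FuncDeclaration& fd, bool statementsToo) {
    ILS& cached = statementsToo ? fd.inlineStatusStmt : fd.inlineStatusExp;
    if (cached != ILSuninitialized)
        return cached == ILSyes;   // answer from the cache

    ++fd.analyses;                 // the expensive analysis happens only once per kind
    bool ok = fd.bodyCost < COST_MAX;
    cached = ok ? ILSyes : ILSno;
    return ok;
}
```

The two cache slots are independent, so asking about expression inlining does not fill in the answer for statement inlining.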
 +
 
 +
===== Inliner: Inlining as a Statement =====
 +
 
 +
Any function that DMD is capable of inlining can be inlined as a statement. As explained above, this is performed whenever a function call ignores the return value, or has no return value. In this case, the function's body is inlined via '''inlineAsStatement'''. Internally, '''inlineAsStatement''' works using its embedded visitor class '''InlineAsStatement'''.
 +
 
 +
To paraphrase a comment in '''inline.c''', this inlining is done by converting to a statement, copying the trees of the function to be inlined, and renaming the variables. Most of this is fairly straightforward: Much like the cost analyzer's '''InlineCostVisitor''' class, the '''InlineAsStatement''' class has a '''visit''' function for each supported type of statement and expression. Each of these visitors copies the node, makes any adjustments if necessary, and then visits all subnodes by calling their '''accept''' functions.
 +
 
 +
There's also a default catch-all function which asserts, indicating the cost analyzer failed to disallow something which has no corresponding visitor in '''InlineAsStatement'''.
 +
 
 +
===== Inliner: Inlining as an Expression =====
 +
 
 +
''Some'', but not all, inlinable functions can be inlined as an expression. This must be done whenever a function call uses the return value (Ex: "x = foo();" or "2 + foo()"). In this case, inlining the function's body as an expression works very much like inlining it as a statement (see the section above), but with a separate code path and a few differences:
 +
 
 +
* The function body is inlined by '''doInline''' instead of '''inlineAsStatement'''.
 +
* There are two overloads of '''doInline''': One to inline expressions ''as'' expressions, and one to convert statements ''to'' inline expressions.
 +
* As discussed in the sections above, not all statements can be converted to expressions. Because of this, these statements' corresponding '''visit''' functions are omitted from '''doInline''', since the cost analyzer should have already prevented the inliner from attempting to inline any offending functions.
 +
 
 +
===== Inliner: How to Add More Support =====
 +
 
 +
If a particular statement is unsupported by the inliner (thereby preventing any function using it from being inlined), support can be added like this:
 +
 
 +
* Add an overload of '''InlineCostVisitor::visit''' for the type of AST node you wish to support. Following the example of the other visit functions:
 +
:* Increase '''cost''' however is appropriate.
 +
:* Add '''STATEMENT_COST''' to '''cost''' if the statement cannot be converted to an expression (ex: '''ForStatement''' and '''ThrowStatement'''). This allows you to omit a corresponding overload of '''doInline's''' '''InlineStatement::visit'''.
 +
:* Add '''COST_MAX''' to '''cost''' for any situations that are not inlinable.
 +
:* Call '''accept''' on all subnodes. If the subnode is an expression, it may be better to use '''expressionInlineCost''' instead since this will automatically halt analysis as soon as the expression's cost reaches the maximum.
 +
* In '''inlineAsStatement''', add an overload of '''InlineAsStatement::visit''' for the appropriate AST node type. Following the example of the other visit overloads: Copy the node, make any adjustments if necessary, and traverse to all subnodes.
 +
* If the statement can be converted to an expression (ex: '''IfStatement'''), then inside the '''Statement''' overload of '''doInline''', add an overload of '''InlineStatement::visit''' for the appropriate AST node type. Following the other examples, convert the node to an expression, make any adjustments if necessary, and traverse to all subnodes.
 +
 
 +
==== The Back End ====
 +
 
 +
DMD's internal representation uses expression trees with 'elem' nodes (defined in el.h). The "Rosetta Stone" for understanding the backend is enum OPER in oper.h. This lists all the types of nodes which can be in an expression tree.
 +
 
 +
If you compile dmd with debug on, and compile with:
 +
 
 +
  -O --c
 +
 
 +
you'll get reports of the various optimizations done.
 +
 
 +
Other useful undocumented flags:
 +
 
 +
--b  show block optimisation
 +
--f  full output
 +
--r  show register allocation
 +
--x  suppress predefined C++ stuff
 +
--y  show output to Intermediate Language (IL) buffer
 +
 
 +
Others which are present in the back-end but not exposed as DMD flags are:
 +
debuge show exception handling info
 +
debugs show common subexpression eliminator
 +
 +
 
 +
The most important entry point from the front-end to the backend is writefunc() in out.c, which optimises a function, and then generates code for it.
 +
 
 +
* writefunc() sets up the parameters, then calls codgen() to generate the code inside the function.
* It generates code for each block, then puts variables in registers.
* It generates the function start code and does peephole optimisation (cod3.pinholeopt()).
* It does jump optimisation.
* It emits the generated code in codout().
* It writes the switch tables.
* It writes the exception tables (nteh_gentables() or except_gentables()).
 +
 
 +
In cgcod.c, blcodgen() generates code for a basic block. It deals with the way the block ends (return, switch, if, etc).
 +
 
 +
cod1.gencodelem() does the codegen inside the block. It just calls codelem().
 +
 
 +
cgcod.codelem()  generates code for an elem. This distributes code generation depending on elem type.
 +
 
 +
Most x86 integer code generation happens in cod1.c, cod2.c, cod3.c, cod4.c and cod5.c.
 +
Floating-point code generation happens in cg87. Compared to the integer code generation, the x87 code generator is extremely simple. Most importantly, it cannot cope with common subexpressions. This is the primary reason why it is less efficient than compilers from many other vendors.
 +
 
 +
===== Optimiser =====
 +
The main optimiser is in go.c, optfunc().
 +
This calls:
 +
* blockopt.c blockopt(iter) -- branch optimisation on basic blocks, iter = 0 or 1.
 +
* gother.c constprop() -- constant propagation
 +
* gother.c copyprop() -- copy propagation
 +
* gother.c rmdeadass() -- remove dead assignments
 +
* gother.c verybusyexp() -- very busy expressions
 +
* gother.c deadvar() -- eliminate dead variables
 +
 
 +
* gloop.c loopopt() -- remove loop invariants and induction vars. Do loop rotation
 +
 
 +
* gdag.c  boolopt() -- optimize booleans.
 +
* gdag.c builddags() -- common subexpressions
 +
 
 +
* el.c el_convert() -- Put float and string literals into the data segment
 +
* el.c el_combine() -- merges two expressions (uses a comma-expression to join them).
 +
* glocal.c localize() -- improve expression locality
 +
 
 +
 
 +
* cod3.c pinholeopt() -- Performs peephole optimisation. Doesn't do much, could do a lot more.
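
As an illustration of the el_combine() entry in the list above, the following C++ sketch joins two expression trees with a comma node, and simply returns the other operand when one side is empty. The <tt>elem</tt> struct is a simplified, hypothetical stand-in for the back end's node type (the real one is defined in el.h and uses the operators from enum OPER), and the use of <tt>std::unique_ptr</tt> is a choice made for this example only.

```cpp
#include <memory>
#include <utility>

// Simplified, hypothetical stand-in for the back end's 'elem' node.
struct elem {
    enum Op { Const, Comma };
    Op op = Const;
    int value = 0;                  // meaningful only when op == Const
    std::unique_ptr<elem> e1, e2;   // operands when op == Comma
};

using ElemPtr = std::unique_ptr<elem>;

// Merge two expressions: evaluate e1, then e2, joined by a comma node.
// If either side is missing, the other one is returned unchanged.
ElemPtr el_combine(ElemPtr e1, ElemPtr e2) {
    if (!e1) return e2;
    if (!e2) return e1;
    auto comma = std::make_unique<elem>();
    comma->op = elem::Comma;
    comma->e1 = std::move(e1);
    comma->e2 = std::move(e2);
    return comma;
}
```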
 +
 
 +
===== Code generation =====
 +
The code generation for each function is done individually. Each function is placed into its own COMDAT segment in the obj file.
 +
The function is divided into blocks, which are linear sections of code ending with a jump or other control instruction (http://en.wikipedia.org/wiki/Basic_block).
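
The division into blocks can be sketched as follows. This is a generic C++ illustration of splitting a linear instruction list at control instructions, not DMD's actual block builder: <tt>Instr</tt>, its <tt>endsBlock</tt> flag and <tt>splitIntoBlocks</tt> are hypothetical names, and the sketch deliberately ignores jump targets, which would also start a new block in a full implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical instruction record; endsBlock marks jumps and other control instructions.
struct Instr {
    std::string text;
    bool endsBlock;
};

using Block = std::vector<Instr>;

// Cut the linear instruction stream after every control instruction.
std::vector<Block> splitIntoBlocks(const std::vector<Instr>& code) {
    std::vector<Block> blocks;
    Block current;
    for (const Instr& i : code) {
        current.push_back(i);
        if (i.endsBlock) {
            blocks.push_back(current);
            current.clear();
        }
    }
    // Keep any trailing instructions that were not closed by a control instruction.
    if (!current.empty()) blocks.push_back(current);
    return blocks;
}
```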
 +
 
 +
===== Scheduler (cgsched.c) =====
 +
 
 +
The scheduler applies to Pentium targets only.
 +
 
 +
== Source files ==
 +
 
 +
''Note: This section may be considerably outdated. If it's wrong, please correct it. If it's not here, please add it.''
 +
 
 +
=== Front end ===
 +
See the [https://github.com/dlang/dmd/blob/master/compiler/src/dmd/README.md official docs].
  
'''Front end'''
 
 
{| class="wikitable" |
 
{| class="wikitable" |
 
! File || Function
 
! File || Function
 
|-
 
|-
| access.c || Access check ('''private''', '''public''', '''package''' ...)
+
| access.d || Access check ('''private''', '''public''', '''package''' ...)
 
|-
 
|-
| aliasthis.c || Implements the '''[http://digitalmars.com/d/2.0/class.html#AliasThis alias this]''' D symbol.
+
| aliasthis.d || Implements the '''[http://digitalmars.com/d/2.0/class.html#AliasThis alias this]''' D symbol.
 
|-
 
|-
| argtypes.c || Convert types for argument passing (e.g. '''char''' are passed as '''ubyte''').
+
| argtypes.d || Convert types for argument passing (e.g. '''char''' are passed as '''ubyte''').
 
|-
 
|-
| arrayop.c || [http://digitalmars.com/d/2.0/arrays.html#array-operations Array operations] (e.g. ''a[] = b[] + c[]'').
+
| arrayop.d || [http://digitalmars.com/d/2.0/arrays.html#array-operations Array operations] (e.g. ''a[] = b[] + c[]'').
 
|-
 
|-
| attrib.c || [http://digitalmars.com/d/2.0/attribute.html Attributes] i.e. storage class ('''const''', '''@safe''' ...), linkage ('''extern(C)''' ...), protection ('''private''' ...), alignment ('''align(1)''' ...), anonymous aggregate, '''pragma''', '''static if''' and '''mixin'''.
+
| attrib.d || [http://digitalmars.com/d/2.0/attribute.html Attributes] i.e. storage class ('''const''', '''@safe''' ...), linkage ('''extern(C)''' ...), protection ('''private''' ...), alignment ('''align(1)''' ...), anonymous aggregate, '''pragma''', '''static if''' and '''mixin'''.
 
|-
 
|-
| bit.c || Generate bit-level read/write code. Requires backend support.
+
| bit.d || Generate bit-level read/write code. Requires backend support.
 
|-
 
|-
| builtin.c || Identify and evaluate built-in functions (e.g. '''std.math.sin''')
+
| builtin.d || Identify and evaluate built-in functions (e.g. '''std.math.sin''')
 
|-
 
|-
| cast.c || Implicit cast, implicit conversion, and explicit cast ('''cast(T)'''), combining type in binary expression, integer promotion, and value range propagation.
+
| dcast.d || Implicit cast, implicit conversion, and explicit cast ('''cast(T)'''), combining type in binary expression, integer promotion, and value range propagation.
 
|-
 
|-
| class.c || Class declaration
+
| dclass.d || Class declaration
 
|-
 
|-
| clone.c || Define the implicit '''opEquals''', '''opAssign''', post blit and destructor for struct if needed, and also define the copy constructor for struct.
+
| clone.d || Define the implicit '''opEquals''', '''opAssign''', post blit and destructor for struct if needed, and also define the copy constructor for struct.
 
|-
 
|-
| cond.c || Evaluate compile-time conditionals, i.e. '''debug''', '''version''', and '''static if'''.
+
| cond.d || Evaluate compile-time conditionals, i.e. '''debug''', '''version''', and '''static if'''.
 
|-
 
|-
| constfold.c || Constant folding
+
| constfold.d || Constant folding
 
|-
 
|-
| cppmangle.c || Mangle D types according to [http://mentorembedded.github.io/cxx-abi/abi.html#mangling Intel's Itanium C++ ABI].
+
| cppmangle.d || Mangle D types according to [http://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling Intel's Itanium C++ ABI].
 
|-
 
|-
| declaration.c || Miscellaneous declarations, including '''typedef''', '''alias''', variable declarations including the implicit '''this''' declaration, type tuples, ClassInfo, ModuleInfo and various TypeInfos.
+
| declaration.d || Miscellaneous declarations, including '''typedef''', '''alias''', variable declarations including the implicit '''this''' declaration, type tuples, ClassInfo, ModuleInfo and various TypeInfos.
 
|-
 
|-
| delegatize.c || Convert an expression ''expr'' to a delegate ''{ return expr; }'' (e.g. in '''lazy''' parameter).  
+
| delegatize.d || Convert an expression ''expr'' to a delegate ''{ return expr; }'' (e.g. in '''lazy''' parameter).  
 
|-
 
|-
| doc.c || [http://digitalmars.com/d/ddoc.html Ddoc] documentation generator ([http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.announce&article_id=1558 NG:digitalmars.D.announce/1558])
+
| doc.d || [http://digitalmars.com/d/ddoc.html Ddoc] documentation generator ([http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.announce&article_id=1558 NG:digitalmars.D.announce/1558])
 
|-
 
|-
| dsymbol.c || D symbols (i.e. variables, functions, modules, ... anything that has a name).
+
| dsymbol.d || D symbols (i.e. variables, functions, modules, ... anything that has a name).
 
|-
 
|-
| dump.c || Defines the ''Expression::dump'' method to print the content of the expression to console. Mainly for debugging.
+
| dump.d || Defines the ''Expression::dump'' method to print the content of the expression to console. Mainly for debugging.
 
|-
 
|-
| e2ir.c || Expression to Intermediate Representation; requires backend support
+
| e2ir.d || Expression to Intermediate Representation; requires backend support
 
|-
 
|-
| eh.c || Generate exception handling tables
+
| eh.d || Generate exception handling tables
 
|-
 
|-
| entity.c || Defines the named entities to support the ''"\&Entity;"'' escape sequence.
+
| entity.d || Defines the named entities to support the ''"\&Entity;"'' escape sequence.
 
|-
 
|-
| enum.c || Enum declaration
+
| denum.d || Enum declaration
 
|-
 
|-
 
| expression.h || Defines the bulk of the classes which represent the AST at the expression level.
 
| expression.h || Defines the bulk of the classes which represent the AST at the expression level.
 
|-
 
|-
| func.c || Function declaration, also includes function/delegate literals, function alias, (static/shared) constructor/destructor/post-blit, '''invariant''', '''unittest''' and [http://digitalmars.com/d/2.0/class.html#allocators allocator/deallocator].
+
| func.d || Function declaration, also includes function/delegate literals, function alias, (static/shared) constructor/destructor/post-blit, '''invariant''', '''unittest''' and [http://digitalmars.com/d/2.0/class.html#allocators allocator/deallocator].
 
|-
 
|-
| glue.c || Generate the object file for function declarations and critical sections; convert between backend types and frontend types
+
| glue.d || Generate the object file for function declarations and critical sections; convert between backend types and frontend types
 
|-
 
|-
| hdrgen.c || Generate headers (*.di files)
+
| hdrgen.d || Generate headers (*.di files)
 
|-
 
|-
| iasm.c || Inline assembler
+
| iasm.d || Inline assembler
 
|-
 
|-
| identifier.c || Identifier (just the name).
+
| identifier.d || Identifier (just the name).
 
|-
 
|-
| idgen.c || Make id.h and id.c for defining built-in Identifier instances. Compile and run this before compiling the rest of the source. ([http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=17157 NG:digitalmars.D/17157])
+
| idgen.d || Make id.h and id.c for defining built-in Identifier instances. Compile and run this before compiling the rest of the source. ([http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=17157 NG:digitalmars.D/17157])
 
|-
 
|-
| impcnvgen.c || Make impcnvtab.c for the implicit conversion table. Compile and run this before compiling the rest of the source.
+
| impcnvgen.d || Make impcnvtab.c for the implicit conversion table. Compile and run this before compiling the rest of the source.
 
|-
 
|-
| imphint.c || Import hint, e.g. prompting to import ''std.stdio'' when using ''writeln''.
+
| imphint.d || Import hint, e.g. prompting to import ''std.stdio'' when using ''writeln''.
 
|-
 
|-
| import.c || Import.
+
| dimport.d || Import.
 
|-
 
|-
| inifile.c || Read .ini file
+
| inifile.d || Read .ini file
 
|-
 
|-
| init.c || [http://digitalmars.com/d/2.0/declaration.html#Initializer Initializers] (e.g. the ''3'' in ''int x = 3'').
+
| init.d || [http://digitalmars.com/d/2.0/declaration.html#Initializer Initializers] (e.g. the ''3'' in ''int x = 3'').
 
|-
 
|-
| inline.c || Compute the cost and perform inlining.
+
| inline.d || Compute the cost and perform inlining.
 
|-
 
|-
| interpret.c || All the code which evaluates CTFE
+
| interpret.d || All the code which evaluates CTFE
 
|-
 
|-
| irstate.c || Intermediate Representation state; requires backend support
+
| irstate.d || Intermediate Representation state; requires backend support
 
|-
 
|-
| json.c || Generate JSON output
+
| json.d || Generate JSON output
 
|-
 
|-
| lexer.c || Lexically analyzes the source (such as separate keywords from identifiers)
+
| lexer.d || Lexically analyzes the source (such as separate keywords from identifiers)
 
|-
 
|-
| libelf.c || ELF object format functions
+
| libelf.d || ELF object format functions
 
|-
 
|-
| libmach.c || Mach-O object format functions
+
| libmach.d || Mach-O object format functions
 
|-
 
|-
| libomf.c || OMF object format functions
+
| libomf.d || OMF object format functions
 
|-
 
|-
| link.c || Call the linker
+
| link.d || Call the linker
 
|-
 
|-
| macro.c || Expand DDoc macros
+
| dmacro.d || Expand DDoc macros
 
|-
 
|-
| mangle.c || Mangle D types and declarations
+
| mangle.d || Mangle D types and declarations
 
|-
 
|-
| mars.c || Analyzes the command line arguments (also display command-line help)
+
| mars.d || Analyzes the command line arguments (also display command-line help)
 
|-
 
|-
| module.c || Read modules.
+
| dmodule.d || Read modules.
 
|-
 
|-
| msc.c || ?
+
| msc.d || ?
 
|-
 
|-
| mtype.c || All D types.
+
| mtype.d || All D types.
 
|-
 
|-
| opover.c || Apply operator overloading
+
| opover.d || Apply operator overloading
 
|-
 
|-
| optimize.c || Optimize the AST
+
| optimize.d || Optimize the AST
 
|-
 
|-
| parse.c || Parse tokens into AST
+
| parse.d || Parse tokens into AST
 
|-
 
|-
| ph.c || Custom allocator to replace malloc/free
+
| ph.d || Custom allocator to replace malloc/free
 
|-
 
|-
| root/aav.c || Associative array
+
| root/aav.d || Associative array
 
|-
 
|-
| root/array.c || Dynamic array
+
| root/array.d || Dynamic array
 
|-
 
|-
| root/async.c || Asynchronous input
+
| root/async.d || Asynchronous input
 
|-
 
|-
| root/dchar.c || Convert UTF-32 character to UTF-8 sequence
+
| root/dchar.d || Convert UTF-32 character to UTF-8 sequence
 
|-
 
|-
| root/gnuc.c || Implements functions missing from GCC, specifically ''stricmp'' and ''memicmp''.
+
| root/gnuc.d || Implements functions missing from GCC, specifically ''stricmp'' and ''memicmp''.
 
|-
 
|-
| root/lstring.c || Length-prefixed UTF-32 string.
+
| root/lstring.d || Length-prefixed UTF-32 string.
 
|-
 
|-
| root/man.c || Start the internet browser.
+
| root/man.d || Start the internet browser.
 
|-
 
|-
| root/port.c || Portable wrapper around compiler/system specific things. The idea is to minimize #ifdef's in the app code.
+
| root/port.d || Portable wrapper around compiler/system specific things. The idea is to minimize #ifdef's in the app code.
 
|-
 
|-
| root/response.c || Read the [http://digitalmars.com/d/2.0/dmd-windows.html#switches response file].
+
| root/response.d || Read the [http://digitalmars.com/d/2.0/dmd-windows.html#switches response file].
 
|-
 
|-
| root/rmem.c || Implementation of the storage allocator uses the standard C allocation package.
+
| root/rmem.d || Implementation of the storage allocator uses the standard C allocation package.
 
|-
 
|-
| root/root.c || Basic functions (deal mostly with strings, files, and bits)
+
| root/root.d || Basic functions (deal mostly with strings, files, and bits)
 
|-
 
|-
| root/speller.c || Spellchecker
+
| root/speller.d || Spellchecker
 
|-
 
|-
| root/stringtable.c || String table
+
| root/stringtable.d || String table
 
|-
 
|-
| s2ir.c || Statement to Intermediate Representation; requires backend support
+
| s2ir.d || Statement to Intermediate Representation; requires backend support
 
|-
 
|-
| scope.c || Scope
+
| dscope.d || Scope
 
|-
 
|-
| statement.c || Handles '''while''', '''do''', '''for''', '''foreach''', '''if''', '''pragma''', '''staticassert''', '''switch''', '''case''', '''default''' , '''break''', '''return''', '''continue''', '''synchronized''', '''try'''/'''catch'''/'''finally''', '''throw''', '''volatile''', '''goto''', and '''label'''
+
| statement.d || Handles '''while''', '''do''', '''for''', '''foreach''', '''if''', '''pragma''', '''staticassert''', '''switch''', '''case''', '''default''' , '''break''', '''return''', '''continue''', '''synchronized''', '''try'''/'''catch'''/'''finally''', '''throw''', '''volatile''', '''goto''', and '''label'''
 
|-
 
|-
| staticassert.c || '''static assert'''.
+
| staticassert.d || '''static assert'''.
 
|-
 
|-
| struct.c || Aggregate ('''struct''' and '''union''') declaration.
+
| dstruct.d || Aggregate ('''struct''' and '''union''') declaration.
 
|-
 
|-
| template.c || Everything related to '''template'''.
+
| dtemplate.d || Everything related to '''template'''.
 
|-
 
|-
 
| tk/ || ?
 
| tk/ || ?
 
|-
 
|-
| tocsym.c || To C symbol
+
| tocsym.d || To C symbol
 
|-
 
|-
| toctype.c || Convert D type to C type for debug symbol
+
| toctype.d || Convert D type to C type for debug symbol
 
|-
 
|-
| tocvdebug.c || [http://stackoverflow.com/questions/1418660/microsofts-codeview-format-specs CodeView4 debug format].
+
| tocvdebug.d || [http://stackoverflow.com/questions/1418660/microsofts-codeview-format-specs CodeView4 debug format].
 
|-
 
|-
| todt.c || ?; requires backend support
+
| todt.d || ?; requires backend support
 
|-
 
|-
| toelfdebug.c || Emit symbolic debug info in Dwarf2 format. Currently empty.
+
| toelfdebug.d || Emit symbolic debug info in Dwarf2 format. Currently empty.
 
|-
 
|-
| toir.c || To Intermediate Representation; requires backend support
+
| toir.d || To Intermediate Representation; requires backend support
 
|-
 
|-
| toobj.c || Generate the object file for Dsymbol and declarations except functions.
+
| toobj.d || Generate the object file for Dsymbol and declarations except functions.
 
|-
 
|-
| traits.c || '''__traits'''.
+
| traits.d || '''__traits'''.
 
|-
 
|-
| typinf.c || Get TypeInfo from a type.
+
| typinf.d || Get TypeInfo from a type.
 
|-
 
|-
| unialpha.c || Check if a character is a Unicode alphabet.
+
| unialpha.d || Check if a character is a Unicode alphabet.
 
|-
 
|-
| unittests.c || Run functions related to unit test.
+
| unittests.d || Run functions related to unit test.
 
|-
 
|-
| utf.c || UTF-8.
+
| utf.d || UTF-8.
 
|-
 
|-
| version.c || Handles '''version'''
+
| version.d || Handles '''version'''
 
|}
 
|}
  
'''Back end'''
+
=== Back end ===
 
{| class="wikitable" |
 
{| class="wikitable" |
 
! File || Function
 
! File || Function
Line 189: Line 451:
 
|}
 
|}
  
==A few observations==
+
=== A few observations ===
 
* idgen.c is not part of the compiler source at all.  It is the source to a code generator which creates id.h and id.c, which defines a whole lot of Identifier instances.  (presumably, these are used to represent various 'builtin' symbols that the language defines)
 
* idgen.c is not part of the compiler source at all.  It is the source to a code generator which creates id.h and id.c, which defines a whole lot of Identifier instances.  (presumably, these are used to represent various 'builtin' symbols that the language defines)
 
* impcvngen.c follows the same pattern as idgen.c.  It creates impcnvtab.c, which appears to describe casting rules between primitive types.
 
* impcvngen.c follows the same pattern as idgen.c.  It creates impcnvtab.c, which appears to describe casting rules between primitive types.
Line 196: Line 458:
 
* lots of files with .c suffixes contain C++ code. Very confusing.
 
* lots of files with .c suffixes contain C++ code. Very confusing.
  
==Abbreviations==
+
== Abbreviations ==
 +
 
 +
You may find these abbreviations throughout the DMD source code (in identifiers and comments).
 +
 
 +
==== Front-end ====
 
; STC
 
; STC
 
: STorage Class
 
: STorage Class
Line 204: Line 470:
 
: Intermediate Representation
 
: Intermediate Representation
  
=== Abbreviations (Back end) ===
+
==== Back-end ====
 +
; AE
 +
: Available Expressions
 +
; CP
 +
: Copy Propagation info.
 +
; CSE
 +
: Common Subexpression Elimination
 
; VBE
 
; VBE
 
: Very Busy Expression  (http://web.cs.wpi.edu/~kal/PLT/PLT9.6.html)
 
: Very Busy Expression  (http://web.cs.wpi.edu/~kal/PLT/PLT9.6.html)
; CP
+
 
: Copy Propagation info (?)
+
See also: [[Commonly-Used Acronyms]]
; AE
 
: Arithmetic Expression?
 
  
 
==Class hierarchy==
 
==Class hierarchy==
* RootObject (The root object of all AST classes. Similar to D's Object.)
+
 
** Dsymbol (A "D symbol". Serves as an abstract base for anything which is declared, such as classes/structs and variable declarations. Most (all?) objects which inherit from Dsymbol wind up getting written to the ouput object file.)
+
Have a look at [[Media:dmd_ast.png]].
 +
 
 +
* RootObject
 +
** Dsymbol
 
*** AliasThis
 
*** AliasThis
*** AttribDeclaration (Base class for things like access modifiers, pragma, debug (which is in turn the base class of version).)
+
*** AttribDeclaration
 
**** StorageClassDeclaration
 
**** StorageClassDeclaration
 
***** DeprecatedDeclaration
 
***** DeprecatedDeclaration
Line 228: Line 501:
 
**** CompileDeclaration
 
**** CompileDeclaration
 
**** UserAttributeDeclaration
 
**** UserAttributeDeclaration
*** Declaration (Base class for pretty much all declarations.)
+
*** Declaration
 
**** TupleDeclaration
 
**** TupleDeclaration
**** TypedefDeclaration
 
 
**** AliasDeclaration
 
**** AliasDeclaration
 
**** VarDeclaration
 
**** VarDeclaration
***** ClassInfoDeclaration
 
 
***** TypeInfoDeclaration
 
***** TypeInfoDeclaration
****** TypeInfoStructDeclaration, TypeInfoClassDeclaration, TypeInfoInterfaceDeclaration, TypeInfoTypedefDeclaration, TypeInfoPointerDeclaration, TypeInfoArrayDeclaration, TypeInfoStaticArrayDeclaration, TypeInfoAssociativeArrayDeclaration, TypeInfoEnumDeclaration, TypeInfoFunctionDeclaration, TypeInfoDelegateDeclaration, TypeInfoTupleDeclaration, TypeInfoConstDeclaration, TypeInfoInvariantDeclaration, TypeInfoSharedDeclaration, TypeInfoWildDeclaration, TypeInfoVectorDeclaration
+
****** TypeInfoStructDeclaration
 +
****** TypeInfoClassDeclaration
 +
****** TypeInfoInterfaceDeclaration
 +
****** TypeInfoPointerDeclaration
 +
****** TypeInfoArrayDeclaration
 +
****** TypeInfoStaticArrayDeclaration
 +
****** TypeInfoAssociativeArrayDeclaration
 +
****** TypeInfoEnumDeclaration
 +
****** TypeInfoFunctionDeclaration
 +
****** TypeInfoDelegateDeclaration
 +
****** TypeInfoTupleDeclaration
 +
****** TypeInfoConstDeclaration
 +
****** TypeInfoInvariantDeclaration
 +
****** TypeInfoSharedDeclaration
 +
****** TypeInfoWildDeclaration
 +
****** TypeInfoVectorDeclaration
 
***** ThisDeclaration
 
***** ThisDeclaration
 
**** SymbolDeclaration
 
**** SymbolDeclaration
Line 252: Line 538:
 
***** NewDeclaration
 
***** NewDeclaration
 
***** DeleteDeclaration
 
***** DeleteDeclaration
*** ScopeDsymbol (A symbol which creates a scope for its children.  Base class of with blocks, enum declarations, and templates.)
+
*** ScopeDsymbol
 
**** AggregateDeclaration
 
**** AggregateDeclaration
 
***** StructDeclaration
 
***** StructDeclaration
Line 273: Line 559:
 
*** DebugSymbol
 
*** DebugSymbol
 
*** VersionSymbol
 
*** VersionSymbol
 
+
** Expression
** Expression (Nodes for operations, assignments, and the like derive Expression. All expressions have an interpret method which does CTFE.)
 
 
*** ClassReferenceExp
 
*** ClassReferenceExp
 
*** VoidInitExp
 
*** VoidInitExp
Line 284: Line 569:
 
*** IdentifierExp
 
*** IdentifierExp
 
**** DollarExp
 
**** DollarExp
*** DsymbolExp (An expression that points to a Dsymbol.)
+
*** DsymbolExp
 
*** ThisExp
 
*** ThisExp
 
**** SuperExp
 
**** SuperExp
Line 298: Line 583:
 
*** NewExp
 
*** NewExp
 
*** NewAnonClassExp
 
*** NewAnonClassExp
*** SymbolExp (Points to a Declaration.)
+
*** SymbolExp
**** SymOffExp (Offset from symbol.)
+
**** SymOffExp
**** VarExp (A variable referenced in an expression.)
+
**** VarExp
 
*** OverExp
 
*** OverExp
 
*** FuncExp
 
*** FuncExp
Line 308: Line 593:
 
*** HaltExp
 
*** HaltExp
 
*** IsExp
 
*** IsExp
*** UnaExp (All unary expressions - expressions which wrap one subexpression.)
+
*** UnaExp
 
**** CompileExp
 
**** CompileExp
**** FileExp
 
 
**** AssertExp
 
**** AssertExp
 
**** DotIdExp
 
**** DotIdExp
Line 325: Line 609:
 
**** ComExp
 
**** ComExp
 
**** NotExp
 
**** NotExp
**** BoolExp
 
 
**** DeleteExp
 
**** DeleteExp
 
**** CastExp
 
**** CastExp
Line 333: Line 616:
 
**** ArrayExp
 
**** ArrayExp
 
**** PreExp
 
**** PreExp
*** BinExp (All binary expressions - expressions which have two subexpressions.)
+
*** BinExp
 
**** BinAssignExp
 
**** BinAssignExp
***** AddAssignExp, MinAssignExp, MulAssignExp, DivAssignExp, ModAssignExp, AndAssignExp, OrAssignExp, XorAssignExp, PowAssignExp, ShlAssignExp, ShrAssignExp, UshrAssignExp, CatAssignExp
+
***** AddAssignExp
 +
***** MinAssignExp
 +
***** MulAssignExp
 +
***** DivAssignExp
 +
***** ModAssignExp
 +
***** AndAssignExp
 +
***** OrAssignExp
 +
***** XorAssignExp
 +
***** PowAssignExp
 +
***** ShlAssignExp
 +
***** ShrAssignExp
 +
***** UshrAssignExp
 +
***** CatAssignExp
 
**** DotExp
 
**** DotExp
 
**** CommaExp
 
**** CommaExp
Line 342: Line 637:
 
**** AssignExp
 
**** AssignExp
 
***** ConstructExp
 
***** ConstructExp
**** AddExp, MinExp, CatExp, MulExp, DivExp, ModExp, PowExp, ShlExp, ShrExp, UshrExp, AndExp, OrExp, XorExp, OrOrExp, AndAndExp
+
**** AddExp
 +
**** MinExp
 +
**** CatExp
 +
**** MulExp
 +
**** DivExp
 +
**** ModExp
 +
**** PowExp
 +
**** ShlExp
 +
**** ShrExp
 +
**** UshrExp
 +
**** AndExp
 +
**** OrExp
 +
**** XorExp
 +
**** OrOrExp
 +
**** AndAndExp
 
**** CmpExp
 
**** CmpExp
 
**** InExp
 
**** InExp
Line 355: Line 664:
 
**** FuncInitExp
 
**** FuncInitExp
 
**** PrettyFuncInitExp
 
**** PrettyFuncInitExp
 
 
** Identifier
 
** Identifier
 
 
** Initializer
 
** Initializer
 
*** VoidInitializer
 
*** VoidInitializer
Line 364: Line 671:
 
*** ArrayInitializer
 
*** ArrayInitializer
 
*** ExpInitializer
 
*** ExpInitializer
 
 
** Type
 
** Type
 
*** TypeError
 
*** TypeError
Line 386: Line 692:
 
*** TypeStruct
 
*** TypeStruct
 
*** TypeEnum
 
*** TypeEnum
*** TypeTypedef
 
 
*** TypeClass
 
*** TypeClass
 
*** TypeTuple
 
*** TypeTuple
 
*** TypeNull
 
*** TypeNull
 
 
** Parameter
 
** Parameter
 
+
** Statement
** Statement (Base class for top-level function statements.  Among these is ExpStatement, which is a statement which wraps an expression. For example, a function call or an assignment is an Expression / ExpStatement.)
 
 
*** ErrorStatement
 
*** ErrorStatement
 
*** PeelStatement
 
*** PeelStatement
Line 433: Line 736:
 
*** AsmStatement
 
*** AsmStatement
 
*** ImportStatement
 
*** ImportStatement
 
 
** Catch
 
** Catch
 
** Tuple
 
** Tuple
 
** DsymbolTable
 
** DsymbolTable
 
+
** Condition
* TemplateParameter
+
*** DVCondition
** TemplateTypeParameter
+
**** DebugCondition
*** TemplateThisParameter
+
**** VersionCondition
** TemplateValueParameter
+
*** StaticIfCondition
** TemplateAliasParameter
 
** TemplateTupleParameter
 
 
 
 
* Visitor
 
* Visitor
 
** StoppableVisitor
 
** StoppableVisitor
 
* Condition
 
** DVCondition
 
*** DebugCondition
 
*** VersionCondition
 
** StaticIfCondition
 
 
 
* Lexer
 
* Lexer
 
** Parser
 
** Parser
 
 
* Library
 
* Library
  
==How to make the thing compile==
+
== DMD Hacking Tips & Tricks ==
There are a number of types that are stored in various nodes that are never actually used in the front end.  They are merely stored and passed around as pointers.
 
  
* Symbol - Appears to have something to do with the names used by the linker.  Appears to be used by Dsymbol and its subclasses.
+
=== Use printf-style debugging without too much visual noise ===
* dt_t - "Data to be added to the data segment of the output object file" ''source: todt.c''
 
* elem - A node in the internal representation.
 
  
The code generator is split among the various AST nodes.  Certain methods of almost every AST node are part of the code generator.
+
There are many commented-out '''printf''' statements in the DMD front-end. You can uncomment them
 +
during debugging, but often you may only want to enable them for a specific symbol. One simple
 +
workaround is to enable printing when the name of the symbol matches the symbol you're debugging,
 +
for example:
  
(it's an interesting solution to the problem. It would have never occurred to a Java programmer)
+
<syntaxhighlight lang="d">
 +
void semantic(Scope* sc)
 +
{
 +
    // only do printouts if this is our target symbol
 +
    if (!strcmp(toChars, "test_struct"))
 +
        printf("this=%p, %s '%s', sizeok = %d\n", this, parent.toChars, toChars, sizeok);
 +
}
 +
</syntaxhighlight>
  
Most notably:
+
=== Find which module instantiated a specific template instance ===
* all Statement subclasses must define a toIR method
 
* All Expression subclasses must define a toElem method
 
* Initializers and certain Expression subclasses must define toDt
 
* Declarations must define toObjFile
 
* Dsymbol subclasses must define toSymbol
 
  
===Other things===
+
Template instances have an '''instantiatingModule''' field which you can inspect. Here's an example from '''glue.d''':
Floating point libraries seem to be atrociously incompatible between compilers. Replacing strtold with strtod may be necessary, for instance. (this does "break" the compiler, however: it will lose precision on literals of type 'real')
 
-- AndyFriesen
 
  
===Intermediate Representation===
+
<syntaxhighlight lang="d">
 +
/* Skip generating code if this part of a TemplateInstance that is instantiated
 +
* only by non-root modules (i.e. modules not listed on the command line).
 +
*/
 +
TemplateInstance* ti = inTemplateInstance();
 +
if (!global.params.useUnitTests &&
 +
    ti && ti.instantiatingModule && !ti.instantiatingModule.root)
 +
{
 +
    //printf("instantiated by %s  %s\n", ti.instantiatingModule.toChars(), ti.toChars());
 +
    return;
 +
}
 +
</syntaxhighlight>
  
'''From [http://www.digitalmars.com/webnews/newsgroups.php?art_group=D.gnu&article_id=762 NG:D.gnu/762]'''
+
=== View lowerings, emitted templates instances and string/mixins  ===
  
I've been looking at trying to hook the DMD frontend up to LLVM (www.llvm.org), but I've been having some trouble.  The LLVM IR (Intermediate Representation) is very well documented, but I'm having a rough time figuring out how DMD holds its IR. Since at least three people (David, Ben, and Walter) seem to have understand, I thought I'd ask for guidance.
+
The <tt>-vcg-ast</tt> switch generates a <tt>.cg</tt> file next to the compiled source file.
 +
This file contains the source representation of the AST just before code generation
 +
(that is, at the last stage of compilation).
 +
This helps to debug templates as well as to spot issues with the inliner.
 +
NOTE: for this to work, the code has to reach the codegen stage, i.e. it has to compile without errors.
  
What's the best way to traverse the DMD IR once I've run the three semantic phases?  As far as I can tell  it's all held in the SymbolTable as a bunch of Symbols. Is there a good way to traverse that and reconstruct it into another IR?
+
<syntaxhighlight lang="d">
 +
void main() {
 +
  foreach (i; 0 .. 10) {
 +
      mixin("auto a = i;");
 +
  }
 +
}
 +
</syntaxhighlight>
  
----
+
<syntaxhighlight lang="bash">
 +
dmd -vcg-ast test.d
 +
</syntaxhighlight>
  
'''From [http://www.digitalmars.com/webnews/newsgroups.php?art_group=D.gnu&article_id=764 NG:D.gnu/764]'''
+
This generates a file <tt>test.d.cg</tt>:
  
There isn't a generic visitor interface.  Instead, there are several methods with are responsible for emiting code/data and then calling that method for child objects.  Start by implementing Module::genobjfile and loop over the 'members' array, calling each Dsymbol object's toObjFile method.  From there, you will need to implement these methods:
+
<syntaxhighlight lang="d">
 +
import object;
 +
void main()
 +
{
 +
{
 +
int __key2 = 0;
 +
int __limit3 = 10;
 +
for (; __key2 < __limit3; __key2 += 1)
 +
{
 +
int i = __key2;
 +
int a = i;
 +
}
 +
}
 +
return 0;
 +
}
 +
</syntaxhighlight>
  
Dsymbol (and descendents) ::toObjFile -- Emits code and data for objects that have generally have a symbol name and storage in memory. Containers like ClassDeclaration also have a 'members' array with child Dsymbols.  Most of these are descendents of the Declaration class.
+
=== Determine if a DMD 'Type' is an actual type, expression, or a symbol ===
  
Statement (and descendents) ::toIR -- Emits instructions.  Usually, you just call toObjFile, toIR, toElem, etc. on the statement's fields and string  the results together in the IR.
+
You can use the '''resolve''' virtual function to determine this:
  
Expression (and descendents) ::toElem -- Returns a back end representation of numeric constants, variable references, and operations that expression trees are composed of. This was very simple for GCC because the back end already had the code to convert expression trees to ordered instructions. If LLVM doesn't do this, I think you could generate the instructions here since LLVM has SSA.
+
<syntaxhighlight lang="d">
 +
RootObject* o = ...;
 +
Type* srcType = isType(o);
  
Type (and descendents) ::toCtype -- Returns the back end representation of the type. Note that a lot of classes don't override this -- you just need to do a switch on the 'ty' field in Type::toCtype.
+
if (srcType)
 +
{
 +
    Type* t;
 +
    Expression* e;
 +
    Dsymbol* s;
 +
    srcType.resolve(loc, sc, &e, &t, &s);
  
Dsymbol (and descendents) ::toSymbol -- returns the back end reference to the object. For example, FuncDeclaration::toSymbol could return a llvm::Function. These are already implemented in tocsym.c, but you will probably rewrite them to create LLVM objects.
+
    if (t) { } // it's a type
 +
    else if (e) { }  // it's an expression
 +
    else if (s) { }  // it's a symbol
 +
}
 +
</syntaxhighlight>
  
----
+
You can see examples of this technique being used in the '''traits.d''' file.
  
(Thread:  http://digitalmars.com/d/archives/D/gnu/762.html)
+
=== Get the string representation of a Dsymbol ===
  
===The Back End ===
+
A '''Dsymbol''' has two functions, '''toChars()''' and '''toPrettyChars()''',
 +
which are useful for debugging. The former prints out the name of the symbol,
 +
while the latter may print out the fully-scoped name of the symbol. For example:
  
DMD's internal representation uses expression trees with 'elem' nodes (defined in el.h). The "Rosetta Stone" for understanding the backend is enum OPER in oper.h. This lists all the types of nodes which can be in an expression tree.
+
<syntaxhighlight lang="d">
 +
StructDeclaration* sd = ...;  // assuming struct named "Bar" inside module named "Foo"
 +
printf("name: %s\n", sd.toChars());  // prints out "Bar"
 +
printf("fully qualified name: %s\n", sd.toPrettyChars());  // prints out "Foo.Bar"
 +
</syntaxhighlight>
  
If you compile dmd with debug on, and compile with:
+
=== Get the string representation of the kind of a Dsymbol ===
  
  -O --c
+
All classes derived from '''Dsymbol''' implement the '''kind''' virtual method,
 +
which enables you to use printf-style debugging, e.g.:
  
you'll get reports of the various optimizations done.
+
<syntaxhighlight lang="d">
 +
EnumDeclaration* ed = ...;
 +
Dsymbol* s = ed;
 +
printf("%s\n", s.kind());  // prints "enum". See 'EnumDeclaration.kind'.
 +
</syntaxhighlight>
  
Other useful undocumented flags:
+
=== Get the string representation of the dynamic type of an AST node ===
 +
There is a helper module called '''asttypename''' which allows you to print the dynamic type of an AST node.
 +
This is useful when implementing your own visitors.
  
--b  show block optimisation
+
<syntaxhighlight lang="d">
--f  full output
+
import dmd.asttypename;
--r  show register allocation
+
Expression ex = new AndAndExp(...);
  --x  suppress predefined C++ stuff
+
writeln(ex.astTypeName); // prints "AndAndExp".
--y  show output to Intermediate Language (IL) buffer
+
</syntaxhighlight>
  
Others which are present in the back-end but not exposed as DMD flags are:
+
=== Get the string representation of an operator or token ===
debuge show exception handling info
 
debugs show common subexpression eliminator
 
 
  
The most important entry point from the front-end to the backend is writefunc() in out.c, which optimises a function, and then generates code for it.
+
'''Expression''' objects hold an '''op''' field, which is a '''TOK''' type (a token).
 +
To print out the string representation of the token, pass it to the static function '''Token.toChars''':
  
* writefunc() sets up the parameters, then calls codgen() to generate the code inside the function.
+
<syntaxhighlight lang="d">
* it generates code for each block. Then puts vars in registers.
+
Expression* e = ...;
* generates function start code, does pinhole optimisation. (cod3.pinholeopt()).
+
printf("Expression op: %s ", Token.toChars(e.op));
* does jump optimisation
+
</syntaxhighlight>
* emit the generated code in codout().
 
* writes switch tables
 
* writes exception tables (nteh_gentables() or except_gentables()
 
 
 
In cgcod.c, blcodgen() generates code for a basic block. Deals with the way the block ends (return, switch,
 
if, etc).
 
  
cod1.gencodelem() does the codegen inside the block. It just calls codelem().
+
=== Print the value of a floating-point literal ===
  
cgcod.codelem() generates code for an elem. This distributes code generation depending on elem type.
+
To print the value of an expression which is a floating-point literal (a value known at compile-time),
 +
use the '''toReal()''' member function:
  
Most x86 integer code generation happens in cod1,cod2, cod3, cod4, and cod5.c
+
<syntaxhighlight lang="d">
Floating-point code generation happens in cg87. Compared to the integer code generation, the x87 code generator is extremely simple. Most importantly, it cannot cope with common subexpressions. This is the primary reason why it is less efficient than compilers from many other vendors.
+
if (exp.op == TOKfloat32 || exp.op == TOKfloat64 || exp.op == TOKfloat80)
 +
    printf("%Lf", exp.toReal());
 +
</syntaxhighlight>
  
==== Optimiser ====
+
=== Check whether an expression is a compile-time known literal ===
The main optimiser is in go.c, optfunc().
 
This calls:
 
* blockopt.c blockopt(iter) -- branch optimisation on basic blocks, iter = 0 or 1.
 
* gother.c constprop() -- constant propagation
 
* gother.c copyprop() -- copy propagation
 
* gother.c rmdeadass() -- remove dead assignments
 
* gother.c verybusyexp() -- very busy expressions
 
* gother.c deadvar() -- eliminate dead variables
 
  
* gloop.c loopopt() -- remove loop invariants and induction vars. Do loop rotation
+
Use the '''isConst()''' method to check if an '''Expression''' is a compile-time known literal.
 +
The name '''isConst()''' is a misnomer, but this name predates D2 and was more relevant to D1.
 +
Please note that '''isConst()''' is also a method of '''Type''', but is unrelated to the
 +
equally named function in the '''Expression''' class.
  
* gdag.c  boolopt() -- optimize booleans.
+
=== Note the difference between mixin declarations and mixin statements ===
* gdag.c builddags() -- common subexpressions
 
  
* el.c el_convert() -- Put float and string literals into the data segment
+
Take this example D code:
* el.c el_combine() -- merges two expressions (uses a comma-expression to join them).
 
* glocal.c localize() -- improve expression locality
 
  
 +
<syntaxhighlight lang="D">
 +
mixin("int x;");
  
* cod3.c pinholeopt() -- Performs peephole optimisation. Doesn't do much, could do a lot more.
+
void main()
 +
{
 +
    mixin("int y;");
 +
}
 +
</syntaxhighlight>
  
==== Code generation ====
+
The first mixin is a '''MixinDeclaration''', while the second is a '''MixinStatement'''.
The code generation for each function is done individually. Each function is placed into its own COMDAT segment in the obj file.
+
These are separate classes in the DMD front-end.
The function is divided into blocks, which are linear sections of code ending with a jump or other control instruction (http://en.wikipedia.org/wiki/Basic_block).
 
  
==== Scheduler (cgsched.c) ====
 
  
Pentium only
+
[[Category:DMD Compiler]]

Latest revision as of 17:54, 7 April 2024

Overview

Major components

All D compilers are divided into two parts: the front-end and the back-end.

The front-end (DMD-FE) implements all things D-specific: lexing and parsing D syntax, instantiating templates, producing error messages, etc. The same front-end code is used by DMD, GDC and LDC.

The back-end is what emits machine code. It contains code generation, optimization, object file writing, etc. The back-end is specific to each D compiler: DMD uses a D-specific Boost-licensed (as of April 2017) back-end, LDC uses LLVM, and GDC uses GCC for their respective back-end processing.

There is also a glue layer, which is the interface between the front-end and back-end. This component is custom for each D compiler.

Compilation cycle

D source code goes through the following stages when compiled:

  • First, the file is loaded into memory as-is, and converted to UTF-8 when necessary.
  • The lexer transforms the file into an array of tokens. There is no structure yet at this point - just a flat list of tokens. (lexer.c)
  • The parser then builds a simple AST out of the token stream. (parse.c)
  • The AST is then semantically processed. This is done in three stages (called semantic, semantic2 and semantic3). This is done in a loop in mars.c. Each pass transforms the AST to be closer to the final representation: types are resolved, templates are instantiated, etc.
1. The "semantic" phase will analyze the full signature of all declarations. For example:
  • members of aggregate type
  • function parameter types and return type
  • variable types
  • evaluation of pragma(msg)
2. The "semantic2" phase will analyze some additional parts of the declarations. For example:
  • initializer of variable declarations
  • evaluation of static assert condition
3. The "semantic3" phase will analyze the body of function declarations.
If a function is declared in a module which is not directly compiled (i.e. not listed on the command line), the semantic3 pass won't analyze its body.
4. During each phase, some declarations will partially invoke subsequent phases in order to resolve forward references. For example:
immutable string x = "hello";
static if (x == "hello") { ... }
// The static if condition will invoke semantic2 of the variable 'x'
auto foo() { ... }
typeof(&foo) fp;
// "semantic" phase of the variable 'fp' will run "semantic3" of 'foo'
// to demand the full signature of the function (== infer the return type)
string foo() { ... }
mixin(foo());
// For CTFE, the mixin declaration will invoke the semantic3 of 'foo'
  • Finally, the AST is handed over to the glue layer, which feeds it into the back-end, which in turn produces machine code and object files.
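
The "flat list of tokens" produced by the lexing stage can be illustrated with a toy lexer. This is only a sketch: the names TokKind, Tok and lex are invented for this example and are not DMD's, and the real lexer handles far more (keywords, string literals, comments, and so on).

```d
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;

// Hypothetical token kinds for this sketch; DMD's real token set is far larger.
enum TokKind { identifier, number, punct }

struct Tok
{
    TokKind kind;
    string text;
}

// Turn source text into a flat array of tokens, with no nesting or structure.
Tok[] lex(string src)
{
    Tok[] toks;
    size_t i = 0;
    while (i < src.length)
    {
        immutable c = src[i];
        if (isWhite(c)) { ++i; continue; }

        immutable start = i;
        TokKind kind;
        if (isAlpha(c) || c == '_')
        {
            kind = TokKind.identifier;
            while (i < src.length && (isAlphaNum(src[i]) || src[i] == '_')) ++i;
        }
        else if (isDigit(c))
        {
            kind = TokKind.number;
            while (i < src.length && isDigit(src[i])) ++i;
        }
        else
        {
            kind = TokKind.punct; // every other character is a one-character token
            ++i;
        }
        toks ~= Tok(kind, src[start .. i]);
    }
    return toks;
}

void main()
{
    import std.stdio : writeln;
    foreach (t; lex("int x = 42;"))
        writeln(t.kind, " '", t.text, "'");
}
```

Note that the result carries no hierarchy at all; building a tree out of this array is the parser's job.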

Runtime interoperability

Non-trivial operations (e.g. memory allocation, array operations) are implemented in the D runtime. The compiler integrates with the runtime using a number of so-called hook functions (which by convention have the _d_ name prefix).

A list can be found here: Runtime_Hooks
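
As a rough illustration, the following ordinary D program uses several operations of the kind that are implemented in the runtime rather than emitted inline. The function name build is made up for this sketch, and no hook names are given because they vary between compiler versions.

```d
// Builds a small array using operations that rely on the D runtime.
int[] build()
{
    int[] a = new int[](3); // heap allocation goes through the runtime
    a ~= 4;                 // appending may reallocate, also via the runtime
    return a;
}

void main()
{
    int[string] counts;
    counts["x"] = 1;        // associative arrays are implemented in the runtime
    assert(counts["x"] == 1);
    assert(build() == [0, 0, 0, 4]);
}
```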

Details

Note: This section may be considerably outdated. Please bring it up to date where you can.

There are a number of types that are stored in various nodes that are never actually used in the front end. They are merely stored and passed around as pointers.

  • Symbol - Appears to have something to do with the names used by the linker. Appears to be used by Dsymbol and its subclasses.
  • dt_t - "Data to be added to the data segment of the output object file" source: todt.c
  • elem - A node in the internal representation.

The code generator is split among the various AST nodes. Certain methods of almost every AST node are part of the code generator.

(it's an interesting solution to the problem. It would have never occurred to a Java programmer)

Most notably:

  • all Statement subclasses must define a toIR method
  • All Expression subclasses must define a toElem method
  • Initializers and certain Expression subclasses must define toDt
  • Declarations must define toObjFile
  • Dsymbol subclasses must define toSymbol
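
The idea of spreading code generation across the AST nodes can be sketched with a toy hierarchy. The class and method names below echo the ones listed above, but everything else is an assumption made for illustration: the bodies are invented, and a plain string stands in for the back-end's real data structures.

```d
import std.conv : to;

// Toy stand-in for an expression node: each subclass lowers itself.
abstract class Expression
{
    abstract string toElem();
}

class IntegerExp : Expression
{
    long value;
    this(long value) { this.value = value; }
    override string toElem() { return "const(" ~ value.to!string ~ ")"; }
}

class AddExp : Expression
{
    Expression e1, e2;
    this(Expression e1, Expression e2) { this.e1 = e1; this.e2 = e2; }
    // Lower the children first, then combine the results.
    override string toElem() { return "add(" ~ e1.toElem() ~ ", " ~ e2.toElem() ~ ")"; }
}

// Toy stand-in for a statement node.
abstract class Statement
{
    abstract string toIR();
}

class ExpStatement : Statement
{
    Expression exp;
    this(Expression exp) { this.exp = exp; }
    // A statement delegates to the toElem of the expression it wraps.
    override string toIR() { return "eval " ~ exp.toElem(); }
}

void main()
{
    import std.stdio : writeln;
    Statement s = new ExpStatement(new AddExp(new IntegerExp(1), new IntegerExp(2)));
    writeln(s.toIR());
}
```

There is no central code generator in this design: adding a new node type means adding one more override rather than extending a large switch.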

Other things

Floating point libraries seem to be atrociously incompatible between compilers. Replacing strtold with strtod may be necessary, for instance. (this does "break" the compiler, however: it will lose precision on literals of type 'real') -- AndyFriesen

Intermediate Representation

From NG:D.gnu/762

I've been looking at trying to hook the DMD frontend up to LLVM (www.llvm.org), but I've been having some trouble. The LLVM IR (Intermediate Representation) is very well documented, but I'm having a rough time figuring out how DMD holds its IR. Since at least three people (David, Ben, and Walter) seem to have understand, I thought I'd ask for guidance.

What's the best way to traverse the DMD IR once I've run the three semantic phases? As far as I can tell it's all held in the SymbolTable as a bunch of Symbols. Is there a good way to traverse that and reconstruct it into another IR?


From NG:D.gnu/764

There isn't a generic visitor interface. Instead, there are several methods which are responsible for emitting code/data and then calling that method for child objects. Start by implementing Module::genobjfile and loop over the 'members' array, calling each Dsymbol object's toObjFile method. From there, you will need to implement these methods:

Dsymbol (and descendents) ::toObjFile -- Emits code and data for objects that generally have a symbol name and storage in memory. Containers like ClassDeclaration also have a 'members' array with child Dsymbols. Most of these are descendents of the Declaration class.

Statement (and descendents) ::toIR -- Emits instructions. Usually, you just call toObjFile, toIR, toElem, etc. on the statement's fields and string the results together in the IR.

Expression (and descendents) ::toElem -- Returns a back end representation of numeric constants, variable references, and operations that expression trees are composed of. This was very simple for GCC because the back end already had the code to convert expression trees to ordered instructions. If LLVM doesn't do this, I think you could generate the instructions here since LLVM has SSA.

Type (and descendents) ::toCtype -- Returns the back end representation of the type. Note that a lot of classes don't override this -- you just need to do a switch on the 'ty' field in Type::toCtype.

Dsymbol (and descendents) ::toSymbol -- returns the back end reference to the object. For example, FuncDeclaration::toSymbol could return a llvm::Function. These are already implemented in tocsym.c, but you will probably rewrite them to create LLVM objects.


(Thread: http://digitalmars.com/d/archives/D/gnu/762.html)
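
The traversal described in that reply can be sketched in miniature. The following is only a toy model, not DMD code: the classes are heavily simplified stand-ins that borrow the real names, ObjLog and demoModule are hypothetical, and "emitting" just appends a line of text to a log.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the back end's output: a log of what was emitted.
using ObjLog = std::vector<std::string>;

// Hypothetical, heavily simplified model of the front end's Dsymbol.
struct Dsymbol {
    std::string name;
    explicit Dsymbol(std::string n) : name(std::move(n)) {}
    virtual ~Dsymbol() = default;
    // Each symbol emits itself, then recurses into any children.
    virtual void toObjFile(ObjLog& out) const = 0;
};

struct VarDeclaration : Dsymbol {
    using Dsymbol::Dsymbol;
    void toObjFile(ObjLog& out) const override { out.push_back("data " + name); }
};

struct FuncDeclaration : Dsymbol {
    using Dsymbol::Dsymbol;
    void toObjFile(ObjLog& out) const override { out.push_back("code " + name); }
};

// A container symbol: emits itself, then every member.
struct ClassDeclaration : Dsymbol {
    std::vector<std::unique_ptr<Dsymbol>> members;
    using Dsymbol::Dsymbol;
    void toObjFile(ObjLog& out) const override {
        out.push_back("classinfo " + name);
        for (const auto& m : members) m->toObjFile(out);
    }
};

struct Module {
    std::vector<std::unique_ptr<Dsymbol>> members;
    // Loop over 'members' and call each symbol's toObjFile.
    ObjLog genobjfile() const {
        ObjLog out;
        for (const auto& m : members) m->toObjFile(out);
        return out;
    }
};

// Builds a small module and returns what the traversal emitted.
ObjLog demoModule() {
    Module mod;
    mod.members.push_back(std::make_unique<VarDeclaration>("counter"));
    auto cls = std::make_unique<ClassDeclaration>("Widget");
    cls->members.push_back(std::make_unique<FuncDeclaration>("Widget.draw"));
    mod.members.push_back(std::move(cls));
    return mod.genobjfile();
}
```

Calling demoModule() walks the module top-down, so the class is emitted before its member function. A real glue layer would create back end objects at each step instead of strings.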

Inliner

DMD's inliner is part of the frontend, existing entirely in the file inline.c.

This inliner is conceptually quite simple: It traverses the AST looking for function calls. Each function found is analysed for cost by adding up the number of expression nodes in its body. Anything non-inlinable counts as "maximum cost". If the total cost is below the maximum, the function call is inlined.

In DMD's AST, certain statements cannot currently be represented as expressions (such as non-unrolled loops and throwing). Because of this, the inliner makes a distinction between two main types of inlining:

  • Converting a function call to an inline expression: This must be used whenever the function's return value is actually used. Ex: "x = foo();" or "2 + foo()".
  • Converting a function call to an inline statement: Used when a function's return value is ignored, or when calling a void function.

Those two scenarios are inlined by mostly separate codepaths. Cost analysis is mostly the same codepath, but "inlinable as a statement" and "inlinable as an expression" are separate decisions (again, due to certain statements not being representable as expressions).

The inliner is divided into four main parts:

  • Main entry point: inlineScan (which utilizes class InlineScanVisitor and function expandInline)
  • Cost analysis (to determine inlinability): canInline and class InlineCostVisitor
  • Inlining a function call as a statement: inlineAsStatement and its embedded class InlineAsStatement
  • Inlining a function call as an expression: doInline and its embedded class InlineStatement

Inliner: Main Entry Point

The whole inliner is driven by the inlineScan function and InlineScanVisitor class, but the bulk of the real work is performed by expandInline (described in this section) and the other three main parts of the inliner (described in the following sections).

The global function inlineScan is the inliner's main entry point. This uses class InlineScanVisitor to traverse the AST looking for function calls to inline, and inlining them as they're found. Whenever InlineScanVisitor finds an inlinable function call (determined by the cost analyzer), it calls expandInline to start the inlining process.

InlineScanVisitor also decides whether to inline as a statement or an expression based on the type of AST node found:

  • ExpStatement: Implies the function either has no return value, or the return value is unused. Therefore, inline as a statement (if permitted by cost analysis).
  • CallExp: Implies the function returns a value which is used. Therefore, inline as an expression (if permitted by cost analysis).

Called by InlineScanVisitor, expandInline drives the actual inlining for both "as statement" and "as expression" cases. It converts the function call scaffolding, parameters and return value (if any) into the appropriate inline statement or expression. To inline the function's body, expandInline hands over to either inlineAsStatement (if inlining the call as a statement) or doInline (if inlining the call as an expression).
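
The choice between the two inlining paths can be summarised as a small decision function. This is a sketch of the rule described above, not the code in inline.c: NodeKind, InlinePath, Inlinability and choosePath are all hypothetical names, and the cost analysis results are assumed to have been computed elsewhere.

```cpp
#include <cassert>

// Hypothetical tags for the two AST node kinds InlineScanVisitor reacts to.
enum NodeKind { ExpStatementNode, CallExpNode };

// Which inlining code path (if any) should handle the call.
enum InlinePath { NoInline, AsStatement, AsExpression };

// Result of cost analysis for the callee (assumed to be computed elsewhere).
struct Inlinability {
    bool asStatement;
    bool asExpression;
};

// A call found directly in an ExpStatement has an unused (or no) result,
// so it is inlined as a statement; any other CallExp needs an expression.
InlinePath choosePath(NodeKind where, Inlinability callee) {
    if (where == ExpStatementNode)
        return callee.asStatement ? AsStatement : NoInline;
    return callee.asExpression ? AsExpression : NoInline;
}
```

A callee containing a loop, for example, would have asStatement set but not asExpression, so "foo();" could be inlined while "x = foo();" could not.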

Inliner: Cost Analysis

The function canInline, unsurprisingly, determines if a function can be inlined. To decide this, it uses class InlineCostVisitor, which traverses the AST calculating a sum of all costs involved.

InlineCostVisitor is a Visitor class which works just like any other AST visitor class in DMD, or any other usage of the visitor pattern: It contains a visit function for each AST node type supported by the inliner. Each visit function traverses its child nodes (if any) by calling each child node's accept function, passing the visitor itself as an argument. The node's accept then calls the visit function corresponding to its own type.

Any type of node not supported by InlineCostVisitor automatically calls a default function InlineCostVisitor::visit(Statement *s), which flags the function being analyzed as non-inlinable.
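
The accept/visit double dispatch and the catch-all default can be illustrated with a stripped-down model. This is a sketch only: it has three node types, CostVisitor and analyse are hypothetical names, and a plain bool stands in for however the real visitor marks a function as non-inlinable.

```cpp
#include <cassert>
#include <vector>

struct Statement;
struct ExpStatement;
struct AsmStatement;

// Base visitor: every node-specific overload forwards to the Statement
// overload unless a subclass overrides it.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(Statement&) {}
    virtual void visit(ExpStatement& s);
    virtual void visit(AsmStatement& s);
};

struct Statement {
    virtual ~Statement() = default;
    virtual void accept(Visitor& v) { v.visit(*this); }
};
struct ExpStatement : Statement {
    void accept(Visitor& v) override { v.visit(*this); }
};
struct AsmStatement : Statement {
    void accept(Visitor& v) override { v.visit(*this); }
};

inline void Visitor::visit(ExpStatement& s) { visit(static_cast<Statement&>(s)); }
inline void Visitor::visit(AsmStatement& s) { visit(static_cast<Statement&>(s)); }

// Toy cost visitor: supports ExpStatement only.
struct CostVisitor : Visitor {
    int cost = 0;
    bool inlinable = true;
    using Visitor::visit;  // keep the inherited overloads visible
    // Default for any node type without its own overload: not inlinable.
    void visit(Statement&) override { inlinable = false; }
    void visit(ExpStatement&) override { cost += 1; }
};

// Runs the visitor over a function body given as a flat statement list.
CostVisitor analyse(const std::vector<Statement*>& body) {
    CostVisitor v;
    for (Statement* s : body) s->accept(v);
    return v;
}
```

Because CostVisitor has no overload for AsmStatement, visiting one falls through to the Statement default and the whole body is rejected.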

The actual cost variable is slightly complicated since it's really two packed values:

The low 12 bits of cost are the actual accumulated cost. A value of 1 is added for every inlinable expression node in the function's body (ex: "a+b*c" has a cost of 2: one for the multiplication and one for the addition). Anything that can't be inlined, or that the cost analyzer knows nothing about, adds a cost of COST_MAX. If this total cost, in the low 12 bits, is at least COST_MAX (determined by the helper function tooCostly), the function is considered non-inlinable.

The upper bits of cost (the 13th bit and up) are separate from the actual cost and keep track of whether the function can be inlined as an expression. Whenever a statement is found which can be inlined only as a statement (and cannot be converted to an expression), this is flagged by adding STATEMENT_COST to cost.

Note: It looks as if at one point in time there had been a limit (or perhaps plans to eventually limit) the number of statements allowed in inlined functions, just as there's currently a limit to the number of expression nodes. But this does not currently appear to be enforced, so STATEMENT_COST is essentially used as a "this can only be inlined as a statement" flag.

Sometimes expressions are evaluated for cost by simply visiting the expression node, via the node's accept function. Other times, the helper function InlineCostVisitor::expressionInlineCost is used instead. The benefit of expressionInlineCost is that it automatically halts analysis of an expression as soon as it reaches COST_MAX.
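
The packed cost value can be modelled in a few lines. The numeric values of COST_MAX and STATEMENT_COST below are assumptions chosen to match the description (a limit that fits in the low 12 bits, and a flag unit just above them); check inline.c for the real constants. statementOnly and bodyCost are hypothetical helpers.

```cpp
#include <cassert>

// Assumed values for illustration only.
const int COST_MAX = 250;
const int STATEMENT_COST = 0x1000;  // first bit above the low 12 bits

// True when the accumulated cost in the low 12 bits reaches COST_MAX.
bool tooCostly(int cost) {
    return (cost & (STATEMENT_COST - 1)) >= COST_MAX;
}

// True when at least one statement-only construct was recorded
// in the upper bits.
bool statementOnly(int cost) {
    return cost >= STATEMENT_COST;
}

// Hypothetical helper: packed cost of a body with `exprNodes` expression
// nodes and `stmtOnly` constructs that cannot become expressions.
int bodyCost(int exprNodes, int stmtOnly) {
    return exprNodes + stmtOnly * STATEMENT_COST;
}
```

With these definitions, a body with two expression nodes and one loop is cheap enough to inline, but only as a statement, because the loop sets the upper bits without affecting the low 12.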

The canInline function caches its results in two members of FuncDeclaration: In ILS inlineStatusStmt (for inlinability as a statement) and ILS inlineStatusExp (for inlinability as an expression). ILS is an enum defined in declaration.h supporting three states: ILSuninitialized (not yet cached), ILSno (not inlinable) and ILSyes (inlinable).
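
The caching scheme can be sketched as follows. Only the ILS states and the two field names come from the text above; the FuncDeclaration here is a toy with two made-up fields standing in for the function body, COST_MAX is an assumed value, and analysisRuns exists only to make the caching observable.

```cpp
#include <cassert>

enum ILS { ILSuninitialized, ILSno, ILSyes };

// Toy stand-in for the front end's FuncDeclaration.
struct FuncDeclaration {
    int exprNodes;          // hypothetical: expression nodes in the body
    bool hasStatementOnly;  // hypothetical: body contains e.g. a loop
    ILS inlineStatusStmt;
    ILS inlineStatusExp;
    FuncDeclaration(int nodes, bool stmtOnly)
        : exprNodes(nodes), hasStatementOnly(stmtOnly),
          inlineStatusStmt(ILSuninitialized),
          inlineStatusExp(ILSuninitialized) {}
};

const int COST_MAX = 250;  // assumed value
int analysisRuns = 0;      // counts how often the analysis really runs

bool canInline(FuncDeclaration& fd, bool asStatement) {
    // Pick the cache slot that matches the requested kind of inlining.
    ILS& status = asStatement ? fd.inlineStatusStmt : fd.inlineStatusExp;
    if (status != ILSuninitialized)
        return status == ILSyes;  // cached answer

    ++analysisRuns;
    bool ok = fd.exprNodes < COST_MAX
              && (asStatement || !fd.hasStatementOnly);
    status = ok ? ILSyes : ILSno;
    return ok;
}
```

The two slots are independent, so asking about statement inlining never fills in the answer for expression inlining.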

Inliner: Inlining as a Statement

Any function DMD is capable of inlining can be inlined as a statement. As explained above, this is performed whenever a function call ignores the return value, or has no return value. In this case, the function's body is inlined via inlineAsStatement. Internally, inlineAsStatement works using its embedded visitor class InlineAsStatement.

To paraphrase a comment in inline.c, this inlining is done by converting to a statement, copying the trees of the function to be inlined, and renaming the variables. Most of this is fairly straightforward: Much like the cost analyzer's InlineCostVisitor class, the InlineAsStatement class has a visit function for each supported type of statement and expression. Each of these visitors copies the node, makes any adjustments if necessary, and then visits all subnodes by calling their accept functions.

There's also a default catch-all function which asserts, indicating the cost analyzer failed to disallow something which has no corresponding visitor in InlineAsStatement.
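
The "copy the tree and rename the variables" idea can be shown on a toy expression tree. None of this is DMD code: Expr, makeVar, makeBin, copyRenaming and toString are hypothetical, and the rename map stands in for whatever bookkeeping the real inliner uses to map parameters and locals to fresh variables.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy expression node: a variable (op empty) or a binary operation.
struct Expr {
    std::string op;                  // "+", "*", or "" for a variable
    std::string var;                 // variable name when op is empty
    std::unique_ptr<Expr> lhs, rhs;  // operands when op is non-empty
};

std::unique_ptr<Expr> makeVar(const std::string& name) {
    auto e = std::make_unique<Expr>();
    e->var = name;
    return e;
}

std::unique_ptr<Expr> makeBin(const std::string& op,
                              std::unique_ptr<Expr> l,
                              std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Copies the tree node by node, replacing variable names found in `renames`.
// The original tree is left untouched.
std::unique_ptr<Expr> copyRenaming(
        const Expr& e, const std::map<std::string, std::string>& renames) {
    if (e.op.empty()) {
        auto it = renames.find(e.var);
        return makeVar(it != renames.end() ? it->second : e.var);
    }
    return makeBin(e.op,
                   copyRenaming(*e.lhs, renames),
                   copyRenaming(*e.rhs, renames));
}

// Renders the tree with explicit parentheses, for inspection.
std::string toString(const Expr& e) {
    if (e.op.empty()) return e.var;
    return "(" + toString(*e.lhs) + " " + e.op + " " + toString(*e.rhs) + ")";
}
```

Copying rather than mutating matters because the same function body may be inlined at many call sites, each needing its own fresh variables.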

Inliner: Inlining as an Expression

Some, but not all, inlinable functions can be inlined as an expression. This must be done whenever a function call uses the return value (Ex: "x = foo();" or "2 + foo()"). In this case, inlining the function's body as an expression works very much like inlining it as a statement (see the section above), but with a separate code path and a few differences:

  • The function body is inlined by doInline instead of inlineAsStatement.
  • There are two overloads of doInline: One to inline expressions as expressions, and one to convert statements to inline expressions.
  • As discussed in the sections above, not all statements can be converted to expressions. Because of this, these statements' corresponding visit functions are omitted from doInline, since the cost analyzer should have already prevented the inliner from attempting to inline any offending functions.

Inliner: How to Add More Support

If a particular statement is unsupported by the inliner (thereby preventing any function using it from being inlined), support can be added like this:

  • Add an overload of InlineCostVisitor::visit for the type of AST node you wish to support. Following the example of the other visit functions:
    • Increase cost however is appropriate.
    • Add STATEMENT_COST to cost if the statement cannot be converted to an expression (ex: ForStatement and ThrowStatement). This allows you to omit a corresponding overload of doInline's InlineStatement::visit.
    • Add COST_MAX to cost for any situations that are not inlinable.
    • Call accept on all subnodes. If the subnode is an expression, it may be better to use expressionInlineCost instead since this will automatically halt analysis as soon as the expression's cost reaches the maximum.
  • In inlineAsStatement, add an overload of InlineAsStatement::visit for the appropriate AST node type. Following the example of the other visit overloads: Copy the node, make any adjustments if necessary, and traverse to all subnodes.
  • If the statement can be converted to an expression (ex: IfStatement), then inside the Statement overload of doInline, add an overload of InlineStatement::visit for the appropriate AST node type. Following the other examples, convert the node to an expression, make any adjustments if necessary, and traverse to all subnodes.

The Back End

DMD's internal representation uses expression trees with 'elem' nodes (defined in el.h). The "Rosetta Stone" for understanding the backend is enum OPER in oper.h. This lists all the types of nodes which can be in an expression tree.

If you build DMD with debugging enabled, and then compile a program with:

 -O --c

you'll get reports of the various optimizations done.

Other useful undocumented flags:

--b  show block optimisation
--f  full output
--r  show register allocation
--x  suppress predefined C++ stuff
--y  show output to Intermediate Language (IL) buffer

Others which are present in the back-end but not exposed as DMD flags are:

debuge show exception handling info
debugs show common subexpression eliminator

The most important entry point from the front-end to the backend is writefunc() in out.c, which optimises a function, and then generates code for it.

  • writefunc() sets up the parameters, then calls codgen() to generate the code inside the function.
  • It generates code for each block, then puts variables in registers.
  • It generates the function start code and does peephole optimisation (cod3.pinholeopt()).
  • It does jump optimisation.
  • It emits the generated code in codout().
  • It writes switch tables.
  • It writes exception tables (nteh_gentables() or except_gentables()).

In cgcod.c, blcodgen() generates code for a basic block. It also deals with the way the block ends (return, switch, if, etc.).

cod1.gencodelem() does the codegen inside the block. It just calls codelem().

cgcod.codelem() generates code for an elem. This distributes code generation depending on elem type.
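
The idea of dispatching on the elem type can be sketched with a toy tree. Everything here is a simplification: the elem struct has made-up fields, the OPER enum lists only four operators, and the output is a list of stack-machine pseudo-instructions rather than x86 code. demoCodegen is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A tiny subset of operator kinds, for illustration only.
enum OPER { OPconst, OPvar, OPadd, OPmul };

// Toy expression tree node; the real elem carries much more.
struct elem {
    OPER op;
    int value;         // used by OPconst
    std::string name;  // used by OPvar
    const elem* E1;    // left operand, if any
    const elem* E2;    // right operand, if any
};

// Emits pseudo-instructions for `e`, choosing a strategy per operator.
void codelem(const elem& e, std::vector<std::string>& code) {
    switch (e.op) {
    case OPconst:
        code.push_back("push " + std::to_string(e.value));
        break;
    case OPvar:
        code.push_back("load " + e.name);
        break;
    case OPadd:
    case OPmul:
        // Operands first, then the operation itself.
        codelem(*e.E1, code);
        codelem(*e.E2, code);
        code.push_back(e.op == OPadd ? "add" : "mul");
        break;
    }
}

// Generates code for the expression x + 2 * 3.
std::vector<std::string> demoCodegen() {
    elem x{OPvar, 0, "x", nullptr, nullptr};
    elem two{OPconst, 2, "", nullptr, nullptr};
    elem three{OPconst, 3, "", nullptr, nullptr};
    elem mul{OPmul, 0, "", &two, &three};
    elem add{OPadd, 0, "", &x, &mul};
    std::vector<std::string> code;
    codelem(add, code);
    return code;
}
```

The real code generator must additionally track registers and addressing modes, which is why its per-operator routines are spread over several files.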

Most x86 integer code generation happens in cod1.c, cod2.c, cod3.c, cod4.c, and cod5.c. Floating-point code generation happens in cg87. Compared to the integer code generation, the x87 code generator is extremely simple. Most importantly, it cannot cope with common subexpressions. This is the primary reason why it is less efficient than compilers from many other vendors.

Optimiser

The main optimiser is in go.c, optfunc(). This calls:

  • blockopt.c blockopt(iter) -- branch optimisation on basic blocks, iter = 0 or 1.
  • gother.c constprop() -- constant propagation
  • gother.c copyprop() -- copy propagation
  • gother.c rmdeadass() -- remove dead assignments
  • gother.c verybusyexp() -- very busy expressions
  • gother.c deadvar() -- eliminate dead variables
  • gloop.c loopopt() -- remove loop invariants and induction vars. Do loop rotation
  • gdag.c boolopt() -- optimize booleans.
  • gdag.c builddags() -- common subexpressions
  • el.c el_convert() -- Put float and string literals into the data segment
  • el.c el_combine() -- merges two expressions (uses a comma-expression to join them).
  • glocal.c localize() -- improve expression locality


  • cod3.c pinholeopt() -- Performs peephole optimisation. Doesn't do much, could do a lot more.

Code generation

The code generation for each function is done individually. Each function is placed into its own COMDAT segment in the obj file. The function is divided into blocks, which are linear sections of code ending with a jump or other control instruction (http://en.wikipedia.org/wiki/Basic_block).
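
Splitting a function into blocks can be sketched as below. This is a simplified illustration, not the back end's algorithm: Instr, Block and splitIntoBlocks are hypothetical, the instruction text is arbitrary, and the sketch only ends a block at a control instruction (a full implementation would also start a new block at every jump target).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction: its text plus whether it transfers control.
struct Instr {
    std::string text;
    bool isControl;  // jump, branch, return, ...
};

using Block = std::vector<std::string>;

// Cuts a linear instruction list into blocks, each ending with a
// control instruction (the last block may lack one).
std::vector<Block> splitIntoBlocks(const std::vector<Instr>& code) {
    std::vector<Block> blocks;
    Block current;
    for (const Instr& in : code) {
        current.push_back(in.text);
        if (in.isControl) {
            blocks.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        blocks.push_back(current);
    return blocks;
}
```

Within each resulting block, instructions run strictly in order, which is what lets per-block code generation and register assignment ignore control flow.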

Scheduler (cgsched.c)

Pentium only

Source files

Note: This section may be considerably outdated. If it's wrong, please correct it. If it's not here, please add it.

Front end

See the official docs.

File Function
access.d Access check (private, public, package ...)
aliasthis.d Implements the alias this D symbol.
argtypes.d Convert types for argument passing (e.g. char are passed as ubyte).
arrayop.d Array operations (e.g. a[] = b[] + c[]).
attrib.d Attributes i.e. storage class (const, @safe ...), linkage (extern(C) ...), protection (private ...), alignment (align(1) ...), anonymous aggregate, pragma, static if and mixin.
bit.d Generate bit-level read/write code. Requires backend support.
builtin.d Identify and evaluate built-in functions (e.g. std.math.sin)
dcast.d Implicit cast, implicit conversion, and explicit cast (cast(T)), combining type in binary expression, integer promotion, and value range propagation.
dclass.d Class declaration
clone.d Define the implicit opEquals, opAssign, postblit and destructor for struct if needed, and also define the copy constructor for struct.
cond.d Evaluate compile-time conditionals, i.e. debug, version, and static if.
constfold.d Constant folding
cppmangle.d Mangle D types according to Intel's Itanium C++ ABI.
declaration.d Miscellaneous declarations, including typedef, alias, variable declarations including the implicit this declaration, type tuples, ClassInfo, ModuleInfo and various TypeInfos.
delegatize.d Convert an expression expr to a delegate { return expr; } (e.g. in lazy parameter).
doc.d Ddoc documentation generator (NG:digitalmars.D.announce/1558)
dsymbol.d D symbols (i.e. variables, functions, modules, ... anything that has a name).
dump.d Defines the Expression::dump method to print the content of the expression to console. Mainly for debugging.
e2ir.d Expression to Intermediate Representation; requires backend support
eh.d Generate exception handling tables
entity.d Defines the named entities to support the "\&Entity;" escape sequence.
denum.d Enum declaration
expression.h Defines the bulk of the classes which represent the AST at the expression level.
func.d Function declaration, also includes function/delegate literals, function alias, (static/shared) constructor/destructor/post-blit, invariant, unittest and allocator/deallocator.
glue.d Generate the object file for function declarations and critical sections; convert between backend types and frontend types
hdrgen.d Generate headers (*.di files)
iasm.d Inline assembler
identifier.d Identifier (just the name).
idgen.d Make id.h and id.c for defining built-in Identifier instances. Compile and run this before compiling the rest of the source. (NG:digitalmars.D/17157)
impcnvgen.d Make impcnvtab.c for the implicit conversion table. Compile and run this before compiling the rest of the source.
imphint.d Import hint, e.g. prompting to import std.stdio when using writeln.
dimport.d Import.
inifile.d Read .ini file
init.d Initializers (e.g. the 3 in int x = 3).
inline.d Compute the cost and perform inlining.
interpret.d All the code which evaluates CTFE
irstate.d Intermediate Representation state; requires backend support
json.d Generate JSON output
lexer.d Lexically analyzes the source (such as separate keywords from identifiers)
libelf.d ELF object format functions
libmach.d Mach-O object format functions
libomf.d OMF object format functions
link.d Call the linker
dmacro.d Expand DDoc macros
mangle.d Mangle D types and declarations
mars.d Analyzes the command line arguments (also display command-line help)
dmodule.d Read modules.
msc.d ?
mtype.d All D types.
opover.d Apply operator overloading
optimize.d Optimize the AST
parse.d Parse tokens into AST
ph.d Custom allocator to replace malloc/free
root/aav.d Associative array
root/array.d Dynamic array
root/async.d Asynchronous input
root/dchar.d Convert UTF-32 character to UTF-8 sequence
root/gnuc.d Implements functions missing from GCC, specifically stricmp and memicmp.
root/lstring.d Length-prefixed UTF-32 string.
root/man.d Start the internet browser.
root/port.d Portable wrapper around compiler/system specific things. The idea is to minimize #ifdef's in the app code.
root/response.d Read the response file.
root/rmem.d Implementation of the storage allocator uses the standard C allocation package.
root/root.d Basic functions (deal mostly with strings, files, and bits)
root/speller.d Spellchecker
root/stringtable.d String table
s2ir.d Statement to Intermediate Representation; requires backend support
dscope.d Scope
statement.d Handles while, do, for, foreach, if, pragma, staticassert, switch, case, default, break, return, continue, synchronized, try/catch/finally, throw, volatile, goto, and label
staticassert.d static assert.
dstruct.d Aggregate (struct and union) declaration.
dtemplate.d Everything related to template.
tk/ ?
tocsym.d To C symbol
toctype.d Convert D type to C type for debug symbol
tocvdebug.d CodeView4 debug format.
todt.d ?; requires backend support
toelfdebug.d Emit symbolic debug info in Dwarf2 format. Currently empty.
toir.d To Intermediate Representation; requires backend support
toobj.d Generate the object file for Dsymbol and declarations except functions.
traits.d __traits.
typinf.d Get TypeInfo from a type.
unialpha.d Check if a character is a Unicode alphabet.
unittests.d Run functions related to unit tests.
utf.d UTF-8.
version.d Handles version

Back end

File Function
html.c Extracts D source code from .html files

A few observations

  • idgen.c is not part of the compiler source at all. It is the source to a code generator which creates id.h and id.c, which defines a whole lot of Identifier instances. (presumably, these are used to represent various 'builtin' symbols that the language defines)
  • impcnvgen.c follows the same pattern as idgen.c. It creates impcnvtab.c, which appears to describe casting rules between primitive types.
  • Unsurprisingly, the code is highly D-like in methodology. For instance, root.h defines an Object class which serves as a base class for most, if not all, of the other classes used. Class instances are always passed by pointer and allocated on the heap.
  • root.h also defines String, Array, and File classes, as opposed to using STL. Curious. (a relic from the days when templates weren't as reliable as they are now?)
  • Lots of files with .c suffixes contain C++ code. Very confusing.
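
The generator pattern used by idgen.c can be sketched as a small program that turns a table of names into source text. This is an illustration of the pattern only: IdEntry, generateHeader, generateSource and the makeIdentifier call in the generated text are hypothetical, and the exact text the real idgen emits is not reproduced here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One table row: the C++ member name and the identifier's spelling.
struct IdEntry {
    std::string ident;  // name of the generated member
    std::string text;   // spelling of the identifier in D source
};

// Produces the text of a header declaring one static member per entry.
std::string generateHeader(const std::vector<IdEntry>& table) {
    std::string out = "struct Id\n{\n";
    for (const IdEntry& e : table)
        out += "    static Identifier *" + e.ident + ";\n";
    out += "};\n";
    return out;
}

// Produces the text of a source file defining those members.
// makeIdentifier is a placeholder for the real construction call.
std::string generateSource(const std::vector<IdEntry>& table) {
    std::string out;
    for (const IdEntry& e : table)
        out += "Identifier *Id::" + e.ident
             + " = makeIdentifier(\"" + e.text + "\");\n";
    return out;
}
```

Keeping the names in a single table means the declaration and the definition of each identifier cannot drift apart, which is the main reason to generate both files instead of writing them by hand.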

Abbreviations

You may find these abbreviations throughout the DMD source code (in identifiers and comments).

Front-end

STC
STorage Class
ILS
InLine State
IR
Intermediate Representation

Back-end

AE
Available Expressions
CP
Copy Propagation info.
CSE
Common Subexpression Elimination
VBE
Very Busy Expression (http://web.cs.wpi.edu/~kal/PLT/PLT9.6.html)

See also: Commonly-Used Acronyms

Class hierarchy

Have a look at Media:dmd_ast.png.

  • RootObject
    • Dsymbol
      • AliasThis
      • AttribDeclaration
        • StorageClassDeclaration
          • DeprecatedDeclaration
        • LinkDeclaration
        • ProtDeclaration
        • AlignDeclaration
        • AnonDeclaration
        • PragmaDeclaration
        • ConditionalDeclaration
          • StaticIfDeclaration
        • CompileDeclaration
        • UserAttributeDeclaration
      • Declaration
        • TupleDeclaration
        • AliasDeclaration
        • VarDeclaration
          • TypeInfoDeclaration
            • TypeInfoStructDeclaration
            • TypeInfoClassDeclaration
            • TypeInfoInterfaceDeclaration
            • TypeInfoPointerDeclaration
            • TypeInfoArrayDeclaration
            • TypeInfoStaticArrayDeclaration
            • TypeInfoAssociativeArrayDeclaration
            • TypeInfoEnumDeclaration
            • TypeInfoFunctionDeclaration
            • TypeInfoDelegateDeclaration
            • TypeInfoTupleDeclaration
            • TypeInfoConstDeclaration
            • TypeInfoInvariantDeclaration
            • TypeInfoSharedDeclaration
            • TypeInfoWildDeclaration
            • TypeInfoVectorDeclaration
          • ThisDeclaration
        • SymbolDeclaration
        • FuncDeclaration
          • FuncAliasDeclaration
          • FuncLiteralDeclaration
          • CtorDeclaration
          • PostBlitDeclaration
          • DtorDeclaration
          • StaticCtorDeclaration
            • SharedStaticCtorDeclaration
          • StaticDtorDeclaration
            • SharedStaticDtorDeclaration
          • InvariantDeclaration
          • UnitTestDeclaration
          • NewDeclaration
          • DeleteDeclaration
      • ScopeDsymbol
        • AggregateDeclaration
          • StructDeclaration
            • UnionDeclaration
          • ClassDeclaration
            • InterfaceDeclaration
        • WithScopeSymbol
        • ArrayScopeSymbol
        • EnumDeclaration
        • Package
          • Module
        • TemplateDeclaration
        • TemplateInstance
          • TemplateMixin
      • OverloadSet
      • EnumMember
      • Import
      • LabelDsymbol
      • StaticAssert
      • DebugSymbol
      • VersionSymbol
    • Expression
      • ClassReferenceExp
      • VoidInitExp
      • ThrownExceptionExp
      • IntegerExp
      • ErrorExp
      • RealExp
      • ComplexExp
      • IdentifierExp
        • DollarExp
      • DsymbolExp
      • ThisExp
        • SuperExp
      • NullExp
      • StringExp
      • TupleExp
      • ArrayLiteralExp
      • AssocArrayLiteralExp
      • StructLiteralExp
      • TypeExp
      • ScopeExp
      • TemplateExp
      • NewExp
      • NewAnonClassExp
      • SymbolExp
        • SymOffExp
        • VarExp
      • OverExp
      • FuncExp
      • DeclarationExp
      • TypeidExp
      • TraitsExp
      • HaltExp
      • IsExp
      • UnaExp
        • CompileExp
        • AssertExp
        • DotIdExp
        • DotTemplateExp
        • DotVarExp
        • DotTemplateInstanceExp
        • DelegateExp
        • DotTypeExp
        • CallExp
        • AddrExp
        • PtrExp
        • NegExp
        • UAddExp
        • ComExp
        • NotExp
        • DeleteExp
        • CastExp
        • VectorExp
        • SliceExp
        • ArrayLengthExp
        • ArrayExp
        • PreExp
      • BinExp
        • BinAssignExp
          • AddAssignExp
          • MinAssignExp
          • MulAssignExp
          • DivAssignExp
          • ModAssignExp
          • AndAssignExp
          • OrAssignExp
          • XorAssignExp
          • PowAssignExp
          • ShlAssignExp
          • ShrAssignExp
          • UshrAssignExp
          • CatAssignExp
        • DotExp
        • CommaExp
        • IndexExp
        • PostExp
        • AssignExp
          • ConstructExp
        • AddExp
        • MinExp
        • CatExp
        • MulExp
        • DivExp
        • ModExp
        • PowExp
        • ShlExp
        • ShrExp
        • UshrExp
        • AndExp
        • OrExp
        • XorExp
        • OrOrExp
        • AndAndExp
        • CmpExp
        • InExp
        • RemoveExp
        • EqualExp
        • IdentityExp
        • CondExp
      • DefaultInitExp
        • FileInitExp
        • LineInitExp
        • ModuleInitExp
        • FuncInitExp
        • PrettyFuncInitExp
    • Identifier
    • Initializer
      • VoidInitializer
      • ErrorInitializer
      • StructInitializer
      • ArrayInitializer
      • ExpInitializer
    • Type
      • TypeError
      • TypeNext
        • TypeArray
          • TypeSArray
          • TypeDArray
          • TypeAArray
        • TypePointer
        • TypeReference
        • TypeFunction
        • TypeDelegate
        • TypeSlice
      • TypeBasic
      • TypeVector
      • TypeQualified
        • TypeIdentifier
        • TypeInstance
        • TypeTypeof
        • TypeReturn
      • TypeStruct
      • TypeEnum
      • TypeClass
      • TypeTuple
      • TypeNull
    • Parameter
    • Statement
      • ErrorStatement
      • PeelStatement
      • ExpStatement
        • DtorExpStatement
      • CompileStatement
      • CompoundStatement
        • CompoundDeclarationStatement
      • UnrolledLoopStatement
      • ScopeStatement
      • WhileStatement
      • DoStatement
      • ForStatement
      • ForeachStatement
      • ForeachRangeStatement
      • IfStatement
      • ConditionalStatement
      • PragmaStatement
      • StaticAssertStatement
      • SwitchStatement
      • CaseStatement
      • CaseRangeStatement
      • DefaultStatement
      • GotoDefaultStatement
      • GotoCaseStatement
      • SwitchErrorStatement
      • ReturnStatement
      • BreakStatement
      • ContinueStatement
      • SynchronizedStatement
      • WithStatement
      • TryCatchStatement
      • TryFinallyStatement
      • OnScopeStatement
      • ThrowStatement
      • DebugStatement
      • GotoStatement
      • LabelStatement
      • AsmStatement
      • ImportStatement
    • Catch
    • Tuple
    • DsymbolTable
    • Condition
      • DVCondition
        • DebugCondition
        • VersionCondition
      • StaticIfCondition
  • Visitor
    • StoppableVisitor
  • Lexer
    • Parser
  • Library

DMD Hacking Tips & Tricks

Use printf-style debugging without too much visual noise

There are many commented-out printf statements in the DMD front-end. You can uncomment them during debugging, but often you may only want to enable them for a specific symbol. One simple workaround is to enable printing when the name of the symbol matches the symbol you're debugging, for example:

void semantic(Scope* sc)
{
    // only do printouts if this is our target symbol
    // (note: no semicolon after the condition, or the printf always runs)
    if (!strcmp(toChars(), "test_struct"))
        printf("this=%p, %s '%s', sizeok = %d\n", this, parent.toChars(), toChars(), sizeok);
}

Find which module instantiated a specific template instance

Template instances have an instantiatingModule field which you can inspect. Here's an example from glue.d:

/* Skip generating code if this part of a TemplateInstance that is instantiated
 * only by non-root modules (i.e. modules not listed on the command line).
 */
TemplateInstance* ti = inTemplateInstance();
if (!global.params.useUnitTests &&
    ti && ti.instantiatingModule && !ti.instantiatingModule.root)
{
    //printf("instantiated by %s   %s\n", ti.instantiatingModule.toChars(), ti.toChars());
    return;
}

View lowerings, emitted template instances and string mixins

The -vcg-ast switch generates a .cg file next to the compiled source file. This file contains the source representation of the AST just before code generation (that is, at the last stage of compilation). This helps when debugging templates and when spotting issues with the inliner. Note: for this to work, the code has to reach the codegen stage, i.e. it has to compile without errors. For example, given this test.d:

void main() {
   foreach (i; 0 .. 10) {
      mixin("auto a = i;");
   }
}

Compile it with:

dmd -vcg-ast test.d

This generates a file test.d.cg:

import object;
void main()
{
	{
		int __key2 = 0;
		int __limit3 = 10;
		for (; __key2 < __limit3; __key2 += 1)
		{
			int i = __key2;
			int a = i;
		}
	}
	return 0;
}

Determine if a DMD 'Type' is an actual type, expression, or a symbol

You can use the resolve virtual function to determine this:

RootObject* o = ...;
Type* srcType = isType(o);

if (srcType)
{
    Type* t;
    Expression* e;
    Dsymbol* s;
    srcType.resolve(loc, sc, &e, &t, &s);

    if (t) { }  // it's a type
    else if (e) { }  // it's an expression
    else if (s) { }  // it's a symbol
}

You can see examples of this technique being used in the traits.d file.

Get the string representation of a Dsymbol

A Dsymbol has two functions, toChars() and toPrettyChars(), which are useful for debugging. The former returns the name of the symbol, while the latter may return the fully-scoped name of the symbol. For example:

StructDeclaration* sd = ...;  // assuming struct named "Bar" inside module named "Foo"
printf("name: %s\n", sd.toChars());  // prints out "Bar"
printf("fully qualified name: %s\n", sd.toPrettyChars());  // prints out "Foo.Bar"

Get the string representation of the kind of a Dsymbol

All Dsymbol-derived classes implement the kind virtual method, which enables printf-style debugging, e.g.:

EnumDeclaration* ed = ...;
Dsymbol* s = ed;
printf("%s\n", s.kind());  // prints "enum". See 'EnumDeclaration.kind'.

Get the string representation of the dynamic type of an AST node

There is a helper module called asttypename which allows you to print the dynamic type of an AST node. This is useful when implementing your own visitors.

import dmd.asttypename;
Expression ex = new AndAndExp(...);
writeln(ex.astTypeName);  // prints "AndAndExp".

Get the string representation of an operator or token

Expression objects hold an op field, which is a TOK type (a token). To print out the string representation of the token, pass it to Token.toChars:

Expression* e = ...;
printf("Expression op: %s ", Token.toChars(e.op));

Print the value of a floating-point literal

To print the value of an expression which is a floating-point literal (a value known at compile-time), use the toReal() member function:

if (exp.op == TOKfloat32 || exp.op == TOKfloat64 || exp.op == TOKfloat80)
    printf("%Lf", exp.toReal());

Check whether an expression is a compile-time known literal

Use the isConst() method to check if an Expression is a compile-time known literal. The name isConst() is a misnomer, but this name predates D2 and was more relevant to D1. Please note that isConst() is also a method of Type, but is unrelated to the equally named function in the Expression class.

Note the difference between mixin declarations and mixin statements

Take this example D code:

mixin("int x;");

void main()
{
    mixin("int y;");
}

The first mixin is a MixinDeclaration, while the second is a MixinStatement. These are separate classes in the DMD front-end.