GDC/Hacking

From D Wiki
Revision as of 23:06, 4 February 2014 by Verax (talk | contribs) (update links)

The GDC Hackers Guide

This page is meant as a resource for all of us who want to help Walter develop the D language by developing a modified DMD frontend that can make use of GCC's middle and back ends. To do this, we must learn to understand and edit the GDC/GCC sources.

The paperwork is complete and efforts are currently underway to merge GDC into the official GCC codebase, which represents a great step forward for D. However, more than ever, GDC is in need of contributors to keep it up to date with both the D frontend and the trunk development of the GCC backend.

The primary development repository for GDC can be found at https://github.com/D-Programming-GDC/GDC. Development of GDC is generally discussed in the D.gnu newsgroup at http://forum.dlang.org/group/D.gnu, and on the Freenode IRC network in #d.gdc (irc://chat.freenode.net/%23d.gdc).

Why GDC

There are many advantages to adding a D frontend to GCC, and most of them stem from the fact that the GCC codebase has been the focus of extensive development over several decades, and that the GCC middle and back ends are designed such that multiple languages can easily take advantage of them. This means that a D frontend that can make use of the GCC middle and backend code will gain many advantages that would require years of development to match in the DMD codebase.

Firstly, GCC targets many more platforms than DMD. Currently tested platforms include:

  • x86 Linux - Working
  • x86_64 Linux - Working
  • x86 Windows - Mostly Working
  • x86_64 Windows - Mostly Working
  • ARM Linux - Mostly Working
  • OSX - ?

Secondly, GCC has a very well-developed optimization framework that can generally generate more performant code than DMD, particularly when taking advantage of more recent CPU features such as SIMD instructions (both directly and through automatic vectorization).

Thirdly, GDB is primarily developed to debug code generated by GCC, so debugging code generated by GCC will generally result in a better user experience.


GCC Versions

Supported GCC versions: https://bitbucket.org/goshawk/gdc/wiki/Home#!supported-gcc-versions

Binary Releases

GDC binaries are generally not distributed for Linux, as the details of building and installing it are quite distribution-specific. However, due to the difficulty of building GDC on Windows, Daniel Green has been building and posting binaries at https://bitbucket.org/goshawk/gdc/downloads.

Quicklinks

  • GDC ("gdc development reloaded")
  • GCC (the GNU Compiler Collection)
  • GDB (the GNU Debugger)

Possibly out of date:

GDC call for contributors

GDC is currently developed by a very small group (as the commit history on BitBucket shows). While the addition of a D frontend to GCC represents a great step forward for D, it also represents an informal promise by the D community to keep the D frontend up to date with the latest GCC development.

Thanks to Bitbucket and Mercurial, contributing to GDC is as easy as forking the repository and submitting a pull request (although this workflow will likely change when the merge is officially complete), and filing a bug report is a simple web form.

If you use GDC, we encourage you to try to contribute, whether by submitting pull requests or bug reports. In the past, GDC has nearly died due to poor communication and lack of development. Avoiding those issues is easier than ever before, but GDC will always need a community that's willing to give back.

Installing GDC (in the simplest way possible)

https://bitbucket.org/goshawk/gdc/wiki/Home#!installation


GCC Structure

Here we gather some texts that can help in understanding GCC/GDC. GCC is very complex, and unless we acquire good documentation many will surely give up very soon (if anyone knows of some good books, add them too).

I will give a short overview of the structure of GCC (for the newbies). GCC is a compiler for many languages and many targets, so it is divided into pieces.

  • frontend - Turn the source code into an internal representation (GENERIC).
  • middleend - Convert the GENERIC to GIMPLE and perform optimizations.
  • backend - Turn GIMPLE into target-specific ASM instructions.
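
As a rough illustration of this three-way split, here is a toy sketch in C++. It does not use any GCC API: Node, Stmt, lower and emit are all made-up names, the tree only stands in for GENERIC, the flat three-address list only stands in for GIMPLE, and the output is pseudo-assembly rather than anything a real target accepts.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// "Frontend" output: a toy expression tree (a stand-in for GENERIC).
struct Node {
    char op;                         // '+', '*', or 0 for a leaf
    int value;                       // only meaningful for leaves
    std::unique_ptr<Node> lhs, rhs;
};

std::unique_ptr<Node> leaf(int v) {
    auto n = std::make_unique<Node>();
    n->op = 0;
    n->value = v;
    return n;
}

std::unique_ptr<Node> binary(char op, std::unique_ptr<Node> l,
                             std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->op = op;
    n->value = 0;
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}

// "Middle end" form: flat three-address statements (a stand-in for GIMPLE).
struct Stmt {
    std::string dest, lhs, rhs;
    char op;
};

// Flatten the tree into statements; returns the name holding the result.
std::string lower(const Node& n, std::vector<Stmt>& out) {
    if (n.op == 0) return std::to_string(n.value);
    std::string l = lower(*n.lhs, out);
    std::string r = lower(*n.rhs, out);
    std::string t = "t" + std::to_string(out.size());
    out.push_back({t, l, r, n.op});
    return t;
}

// "Backend": turn each statement into one line of pseudo-assembly.
std::vector<std::string> emit(const std::vector<Stmt>& stmts) {
    std::vector<std::string> lines;
    for (const Stmt& s : stmts) {
        std::string mnemonic = (s.op == '+') ? "add" : "mul";
        lines.push_back(mnemonic + " " + s.dest + ", " + s.lhs + ", " + s.rhs);
    }
    return lines;
}
```

Building the tree for (1 + 2) * 3 with binary and leaf, then calling lower followed by emit, gives one add statement and one mul statement. A real middle end would of course run its optimizations on the flat form before anything is emitted.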

What we know as "GDC" is only an implementation of the frontend part of GCC. The middleend uses callbacks to interface with the frontend. GDC is located within its own subfolder in the "core" GCC source tree - (srcdir)/gcc/d/. It is within this subfolder that we must perform all changes to the language. GCC has other frontends such as c (C), cp (C++), java (Java), objc (Objective-C), Fortran, Ada. One can look at these for advice, but one probably shouldn't... (one exception: the "c++" package is currently also required to build GDC, since the bundled recls library uses it)
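
To make the callback idea a little more concrete, here is a minimal sketch in C++ of a language-independent driver that only talks to a frontend through a table of function pointers. GCC's real hook table (struct lang_hooks) is much larger and is not reproduced here; ToyLangHooks, toy_compile and every toy_d_* name below are hypothetical and exist only for this illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical hook table: the only thing the driver knows about a frontend.
struct ToyLangHooks {
    const char* name;
    bool (*init)();
    void (*parse_file)(const std::string& path);
    void (*finish)();
};

// Records what the toy frontend was asked to do, in order.
static std::vector<std::string> toy_log;

// A toy "D frontend" implementing the hooks.
static bool toy_d_init() { toy_log.push_back("init"); return true; }
static void toy_d_parse_file(const std::string& path) {
    toy_log.push_back("parse " + path);
}
static void toy_d_finish() { toy_log.push_back("finish"); }

const ToyLangHooks toy_d_hooks = {
    "Toy D", toy_d_init, toy_d_parse_file, toy_d_finish
};

// Language-independent driver: it never names the frontend directly.
bool toy_compile(const ToyLangHooks& hooks,
                 const std::vector<std::string>& files) {
    if (!hooks.init()) return false;
    for (const std::string& f : files) hooks.parse_file(f);
    hooks.finish();
    return true;
}
```

Swapping in another language would then mean supplying a different table to toy_compile, without touching the driver itself.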

Note that GDC is currently not an official language for GCC, but a "third party" addition. As such, it is similar to GPC (GNU Pascal Compiler), see http://www.gnu-pascal.de/

Work is underway to merge GDC into the official GCC codebase, at which point this will no longer be the case.

The frontend contains the lexer and parser - these together turn the source file into GENERIC. The GDC frontend relies heavily on the DMD sources to perform this work, and you will find the entire DMD sources in a subfolder.

Sadly, GCC is in a very poor state as far as code readability is concerned. Complex macros and source code generators litter the middle and backends. The source is well commented, but that really doesn't help... Well, I'll let you find that out for yourselves :)

The documentation (that I have read) is very hard to understand, so if anyone has any good resources or tips, write them here. Happy hacking!

GDC Structure

DMD Front End

File Function
aav.c Associative array
access.c Access check (private, public, package ...)
aliasthis.c Implements the [DigitalMars:d/2.0/class.html#AliasThis alias this] D symbol.
argtypes.c Convert types for argument passing (e.g. char are passed as ubyte).
array.c Dynamic array
arrayop.c [DigitalMars:d/2.0/arrays.html#array-operations Array operations] (e.g. a[] = b[] + c[]).
async.c Asynchronous input
attrib.c [DigitalMars:d/2.0/attribute.html Attributes] i.e. storage class (const, @safe ...), linkage (extern(C) ...), protection (private ...), alignment (align(1) ...), anonymous aggregate, pragma, static if and mixin.
builtin.c Identify and evaluate built-in functions (e.g. std.math.sin)
cast.c Implicit cast, implicit conversion, and explicit cast (cast(T)), combining type in binary expression, integer promotion, and value range propagation.
class.c Class declaration
clone.c Define the implicit opEquals, opAssign, post blit and destructor for struct if needed, and also define the copy constructor for struct.
cond.c Evaluate compile-time conditionals, i.e. debug, version, and static if.
constfold.c Constant folding
cppmangle.c Mangle D types according to Intel's Itanium C++ ABI.
dchar.c Convert UTF-32 character to UTF-8 sequence
declaration.c Miscellaneous declarations, including typedef, alias, variable declarations including the implicit this declaration, type tuples, ClassInfo, ModuleInfo and various TypeInfos.
delegatize.c Convert an expression expr to a delegate { return expr; } (e.g. in lazy parameter).
doc.c [DigitalMars:d/ddoc.html Ddoc] documentation generator (NG:digitalmars.D.announce/1558)
dsymbol.c D symbols (i.e. variables, functions, modules, ... anything that has a name).
dump.c Defines the Expression::dump method to print the content of the expression to console. Mainly for debugging.
entity.c Defines the named entities to support the "\&Entity;" escape sequence.
enum.c Enum declaration
expression.c Defines the bulk of the classes which represent the AST at the expression level.
func.c Function declaration, also includes function/delegate literals, function alias, (static/shared) constructor/destructor/post-blit, invariant, unittest and [DigitalMars:d/2.0/class.html#allocators allocator/deallocator].
gnuc.c Implements functions missing from GCC, specifically stricmp and memicmp.
hdrgen.c Generate headers (*.di files)
identifier.c Identifier (just the name).
idgen.c Make id.h and id.c for defining built-in Identifier instances. Compile and run this before compiling the rest of the source. (NG:digitalmars.D/17157)
impcnvgen.c Make impcnvtab.c for the implicit conversion table. Compile and run this before compiling the rest of the source.
imphint.c Import hint, e.g. prompting to import std.stdio when using writeln.
import.c Import.
init.c [DigitalMars:d/2.0/declaration.html#Initializer Initializers] (e.g. the 3 in int x = 3).
inline.c Compute the cost and perform inlining.
interpret.c All the code which evaluates CTFE
json.c Generate JSON output
lexer.c Lexically analyzes the source (such as separating keywords from identifiers)
lstring.c Length-prefixed UTF-32 string.
macro.c Expand DDoc macros
mangle.c Mangle D types and declarations
mars.c Analyzes the command-line arguments (also displays command-line help)
module.c Read modules.
mtype.c All D types.
opover.c Apply operator overloading
optimize.c Optimize the AST
parse.c Parse tokens into AST
rmem.c Implementation of the storage allocator, which uses the standard C allocation package.
root.c Basic functions (deal mostly with strings, files, and bits)
scope.c Scope
speller.c Spellchecker
statement.c Handles while, do, for, foreach, if, pragma, staticassert, switch, case, default, break, return, continue, synchronized, try/catch/finally, throw, volatile, goto, and label
staticassert.c static assert.
stringtable.c String table
struct.c Aggregate (struct and union) declaration.
template.c Everything related to template.
todt.c Generate data structures to initialize static variables added to the object file.
toobj.c Generate the object file for Dsymbol and declarations except functions.
traits.c __traits.
typinf.c Get TypeInfo from a type.
unialpha.c Check if a character is a Unicode alphabet.
unittests.c Run functions related to unit test.
utf.c UTF-8.
version.c Handles version
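
The delegatize.c entry above describes turning an expression expr into a delegate { return expr; }, for example for a lazy parameter. The effect of that rewrite can be sketched by analogy in C++ with a callable parameter. This is only an analogy written for this page, not code from the DMD frontend; pick_eager, pick_lazy, expensive and evaluations are hypothetical names.

```cpp
#include <functional>

// Counts how many times the "expensive" expression is actually evaluated.
static int evaluations = 0;

int expensive() {
    ++evaluations;
    return 42;
}

// Eager parameter: the caller evaluates the argument before the call.
int pick_eager(bool use, int value) {
    return use ? value : 0;
}

// "Lazy" parameter modelled as a callable: the expression is wrapped in a
// function object and only evaluated if the callee invokes it.
int pick_lazy(bool use, const std::function<int()>& value) {
    return use ? value() : 0;
}
```

Calling pick_eager(false, expensive()) still runs expensive() once, whereas pick_lazy(false, [] { return expensive(); }) never runs it. That difference is the point of wrapping the expression in a delegate.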

GDC bindings between DMD and GCC

File Function
asmstmt.cc Builds inline assembler and extended inline assembler statements.
d-apple-gcc.c Deprecated - stub functions for any dependencies that can't be linked in from Apple-GCC objects.
d-asm-i386.h Implements D Inline assembler for x86 and x86_64.
d-bi-attrs.h Supported GCC function and type attributes.
d-builtins2.cc Handles importing of special modules (i.e. gcc.builtins, core.vararg) in the runtime library, anything related to builtin intrinsics of GDC.
d-builtins.c Handles GCC backend init routines for building all common and builtin trees of GCC.
d-codegen.c Code generation utilities, emit instructions, static chain/closure creation and passing, expand frontend builtins.
d-convert.cc Convert between basic D types, and conversions to boolean value for conditions.
d-c-stubs.cc Deprecated - stub functions for any dependencies that can't be linked in from GCC objects.
d-decls.cc Based on tocsym.c - builds and returns back end reference to a declaration or object.
d-dmd-gcc.h Contains declarations used by the modified DMD front-end to interact with GCC-specific code.
d-gcc-complex_t.h Same as DMD's complex_t, but uses GCC's REAL_VALUE_TYPE-based real_t instead of long double.
d-gcc-includes.h Headers included from GCC.
d-gcc-real.cc Object-oriented layer for interacting with GCC's REAL_VALUE_TYPE-based real_t.
d-gcc-tree.h Declaration of tree and tree_node for files that cannot include d-gcc-includes.h
d-glue.cc Builds GCC trees for all functions, statements, and expressions. Also converts D types into GCC types.
d-gt.c For linking with the GCC garbage collector
d-incpath.c Adds import paths for frontend to scan.
d-irstate.cc Contains the core functionality of IRState class in d-codegen.cc
d-lang.cc Implementation of GCC back-end callbacks and data structures. Main entry point for the D compiler (cc1d) to compile sources.
d-objfile.cc Setup and emit global variables and functions to send to GCC backend for processing.
d-spec.c The GDC frontend driver for processing command-line options passed to the main application.
dt.cc Implements backend functions called from todt.c in the DMD frontend.
d-todt.c Implements methods removed from todt.c, as they require special treatment for GDC.
d-tree.def All GDC specific tree codes are defined here.
lang.opt All GDC specific command-line flags are defined here
symbol.cc Implements Symbol class for d-decls.cc.

Intermediate Representation

%% To be written here: briefly describe how GDC builds tree representations of D types, expressions, etc.

Extensions to DMD Frontend

%% To be written here: describe in more detail areas where GDC splits away from DMD frontend.