Timing Code

From D Wiki
Revision as of 09:33, 24 February 2014 by Verax (talk | contribs) (Finished migration)
Jump to: navigation, search

At some time in the development of an application, it becomes necessary to optimize it. And to see if the optimization is working, it is necessary to time it. Using a stopwatch is crude, inaccurate, and is a little silly since computers have accurate clocks. Let's use that built in clock. The first way is to time the execution of the entire program. This is easilly accomplished with a little shell program that reads the current time, executes the program supplied as an argument, reads the time again and subtracts to get the elapsed time:

import std.stdio;
import std.c.stdlib;
import std.date;
import std.process;

int main(char[][] args)
{	d_time starttime;
	int status;

	if (args.length < 2)
	{	writefln("Time execution of a command.\nUse:\n\ttimer command args...");
		std.c.stdlib.exit(1);
	}
	starttime = getUTCtime();
	status = spawnvp(0, args[1], args[1 .. length]);
	starttime = getUTCtime() - starttime;
	if (status == -1)
	{	writefln("'%s' failed to execute", args[1]);
		std.c.stdlib.exit(1);
	}
	writefln("Elapsed time = %d.%03d seconds",
		cast(int) (starttime / TicksPerSecond),
		cast(int) (starttime % TicksPerSecond));
	if (status != 0)
		writefln("--- errorlevel %d",status);
	return status;
}

This, of course, includes everything in the measurement - program load time, module construction times, cleanup time, other tasks that happen to be running at the same moment, etc. To get a usable result try to turn off any other running programs that will be consuming appreciable time, and run the timing program multiple times until it converges on a result. Timing a section of code is as simple as snipping out the three lines of code:

	d_time starttime = getUTCtime();
	{ ... code to be profiled ... }
	d_time elapsedtime = getUTCtime() - starttime;

but that can get tedious. A more advanced method takes advantage of D's auto destruction at the end of a scoped block, otherwise known as RAII (Resource Acquisition Is Initialization):

import std.stdio;
import std.date;

class Timer
{
    static d_time getCount()
    {
	return getUTCtime();
    }

    d_time starttime;

    this() { starttime = getCount(); }
    ~this()
    {
	writefln("elapsed time = %s", getCount() - starttime);
    }
}

void main()
{
    scope t = new Timer();

    for (int i = 0; i < 10000; i++)
    {
    }
}

Here, only one line needs to be inserted into a scope in order to time the code within that scope. When it goes out of scope, the elapsed time is computed within the destructor and the results printed. While this works, after using it a while you'll run smack into another problem. The getUTCtime() returns the time in milliseconds, which is far too coarse for timing smaller snippets of code. Even worse, it calls the operating system to get the time, so the execution of an unknown and perhaps large amount of code gets added to the total.

The solution is the RDTSC (Read Time-Stamp Counter) instruction introduced on Pentium and later CPUs. It returns a cycle count in EDX:EAX, and is perfect for profiling usages. All we need to do is fix getCount with a little inline assembler magic:

import std.stdio;

class Timer
{
    static long getCount()
    {
	asm
	{	naked	;
		rdtsc	;
		ret	;
	}
    }

    long starttime;

    this() { starttime = getCount(); }
    ~this()
    {
	writefln("elapsed time = %s", getCount() - starttime);
    }
}

void main()
{
    scope t = new Timer();

    for (int i = 0; i < 10000; i++)
    {
    }
}

The long types get returned from a function in EDX:EAX, so this works out well. Now we have a very fine grained timer with a very low, and consistent, overhead. More advanced timers along these lines can be found in std.perf.

These timers will all just profile selected sections of code. To profile the operation of a whole program, track how functions call each other, find out where the bottlenecks are, and time the various interactions you'll need a proper profiler. Fortunately, DMD has a profiler at the flip of the -gt command line switch.