Difference between revisions of "Component programming with ranges"
(→Conclusions: improve wording) |
|||
(38 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
− | + | :''By H. S. Teoh, August 2013'' | |
− | + | Component-style programming can be used to implement complex algorithms in a way that's straightforward to write, easy to read, and amenable to code reuse. | |
− | + | ==Sources of Complexity in Code== | |
− | + | One of the more influential courses I took in college was on [http://en.wikipedia.org/wiki/Jackson_Structured_Programming Jackson Structured Programming]. It identified two sources of programming complexity (i.e., where bugs are most likely to occur): structure conflicts and writing loops. | |
− | We shall | + | Structure conflicts are mismatches between the structure of the program and the structure of the data, or mismatches between two or more data structures that must be processed simultaneously. For example, to read a text file formatted into lines, we may write a loop that reads each line in turn; however, the data in the file may be structured as a preamble, a body, and an epilogue. Since this structure is not reflected in the code, it often prompts programmers to introduce flags, state variables, and the like, in order to resolve the conflict between program structure and data structure. However, such ad hoc methods of resolving structure conflicts often increase the complexity of the code and the coupling between various parts of the code, making the code more fragile and prone to bugs. |
+ | |||
+ | Loops are a complex construct in imperative programming. One indication of how complex they can be is the fact that proving that a loop terminates at all is, in the general case, equivalent to solving the halting problem, which is unsolvable. This is especially true when the loop condition is complex. The difficulty is further compounded when nested loops are involved. Unfortunately, the ad hoc methods of resolving structure conflicts often increase the complexity of loop conditions and introduce nested loops, thus making the code doubly prone to bugs. | ||
+ | |||
+ | Furthermore, such code is rarely reusable, because each part of the code is intricately tied to other parts, and it is difficult to extract reusable components out of it. | ||
+ | |||
+ | In this article, we will describe an approach to implementing complex algorithms that minimizes code complexity by using a consistent approach to resolve structure conflicts and simplify loops so that the resulting code is straightforward to write, easy to understand, and made of highly-reusable components. | ||
+ | |||
+ | ==Component Programming== | ||
+ | |||
+ | A key point in Jackson Structured Programming (JSP) is that code structure should be in 1-to-1 correspondence with the structure of the data it is operating on. Bugs and other code problems arise when the structure of the code does not correspond with the structure of the data. | ||
+ | |||
+ | This leads us to component programming, a concept which has been employed in various incarnations. One example is the Unix shell, where input is passed through various programs called ''filters'' that perform various transformations on the input, and each respective output is connected via pipes into the input of the subsequent filter, until the final output is produced and sent to a data sink (output on the terminal, saved into a file, piped into a program that consumes the data without producing further output, etc.). The utility of this concept is proven in the expressiveness of the Unix command-line, in which many tasks can be accomplished simply by passing the input through the right sequence of filters connected by pipes. Functional programming languages also employ this concept in the form of lazily-evaluated sequences and [http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-24.html#%_sec_3.5 streams], where sequential input data is successively transformed in various ways until it attains the final, desired form. | ||
+ | |||
+ | Component programming deals with structure conflicts by recognizing that there are such conflicts in the first place. That is, it acknowledges the existence of non-corresponding structures by encapsulating ''each'' such structure into a separate component, rather than arbitrarily picking one of the structures to model to program after and then use ad hoc methods to resolve the resulting conflicts with the other structures. This separation of non-corresponding structures allows us to match the structure of the code to the structure of the data it is dealing with. Such components thus serve as data sources, the output of which can be passed through other components that successively transform the data until it reaches the desired form, thus resolving the non-correspondences between the original structures. | ||
+ | |||
+ | In the remainder of this article, we will use a case study to demonstrate how this works in practice. | ||
+ | |||
+ | ==Case Study: Formatting a Calendar== | ||
+ | |||
+ | We shall use as example the classic task of laying out a yearly calendar on the console, such that given a particular year, the program will print out a number of lines that displays the 12 months in a nice grid layout, with numbers indicating each day within the month. Something like this: | ||
<pre> | <pre> | ||
Line 41: | Line 61: | ||
</pre> | </pre> | ||
− | While intuitively straightforward, this task has many points of complexity | + | While intuitively straightforward, this task has many points of complexity. |
− | + | Although generating all dates in a year is trivial thanks to D's <code>std.datetime</code> module, the order in which they must be processed is far from obvious. Since we're writing to the console, we're limited to outputting one line at a time; we can't draw one cell of the grid and then go back up a few lines, move a few columns over, and draw the next cell in the grid. We have to somehow print the first lines of all cells in the top row, followed by the second lines, then the third lines, etc., and repeat this process for each row in the grid. Of course, we could create an internal screen buffer that we can write to in arbitrary order, and then output this buffer line-by-line at the end, but this approach is not as elegant because it requires a much bigger memory footprint than is really necessary. | |
− | |||
− | |||
− | |||
− | |||
− | + | In any case, as a result of this mismatch between the structure of the calendar and the structure of the output, the order in which the days in a month must be processed is not the natural, chronological order. We have to assemble the dates for the first weeks in each of the first 3 months, say, if we are putting 3 months per row in the grid, print those out, then assemble the dates for the second weeks in each month, print those out, etc.. Furthermore, within the rows representing each week, some days may be missing, depending on where the boundaries of adjacent months fall; these missing days must be filled out in the following month's first week before the first full week in the month is printed. It is not that simple to figure out where a week starts and ends, and how many rows are needed per month. Then if some months have more full weeks than others, they may occupy less vertical space than other months on the same row in the grid; so we need to insert blank spaces into these shorter months in order for the grid cells to line up vertically in the output. | |
− | + | With this level of complexity, writing our calendar program using the traditional ad hoc way of resolving structure conflicts will certainly result in very complex, hard-to-understand, and bug-prone code. There would not be much hope of getting any reusable pieces out of it. | |
− | == | + | ===Structuring the Program=== |
− | + | Let's consider how we may approach this problem using component programming. The first task is to identify all of the different structures that we will need to deal with: | |
+ | * Generating dates in a year | ||
+ | * Grouping dates by month | ||
+ | * Grouping dates by week | ||
+ | * Formatting the days in a week | ||
+ | * Grouping formatted weeks into months | ||
+ | * Laying out some number of months horizontally in a grid to form rows | ||
+ | * Outputting each line of each grid row | ||
+ | * Outputting all the rows | ||
− | + | ====Ranges==== | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | Notice that the first task, generating dates in a year, basically amounts to a ''data source'': we are producing a sequence of data items (dates in a year), to be consumed by subsequent stages. The concept of a data source is nicely captured by the concept of [http://www.informit.com/articles/printerfriendly.aspx?p=1407357&rll=1 ranges], which are a generalization of C++ iterators. In particular, an ''input range'' is any type that implements three primitives: | |
− | < | + | * <code>empty</code>: tells us whether the data source has any data left to be retrieved. |
− | + | * <code>front</code>: returns the current data item. | |
− | + | * <code>popFront</code>: tells the data source to produce the next data item. | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | </ | ||
− | + | This is a useful abstraction, because it allows us to write algorithms that are independent of the concrete type of the data source: ''any'' concrete type that provides the above interface qualifies as an input range and can be used with any algorithm that expects an input range. | |
− | + | For our purposes, it is also useful to define a more specific kind of data source: a ''forward range''. A forward range implements all of the primitives of an input range, plus one more: | |
− | + | * <code>save</code>: returns another instance of the data source representing the current iteration state of the data source. The data items produced by the data source may be traversed twice by iterating over these two instances. | |
− | + | This is a very useful primitive to have, because it essentially allows us to iterate the same range as many times as we want. Each time, we save a copy of the range and iterate over that, which does not change the state of the original range. Some of our calendar algorithms will require this, since certain steps are impossible to implement with only one-pass ranges. | |
− | + | ====Components==== | |
− | |||
− | + | Coming back to our list of tasks, we note that the other tasks on our list essentially amount to writing filters on the data source. For example, grouping dates by month essentially entails writing a filter that takes a range of dates as input, and produces a range of ranges of dates as output, where each subrange in the output corresponds with dates within a single month. | |
− | + | Thus, each of the tasks can be separated out into their own components, which either produces a range of data, or performs some transformation on a given range to produce a new range. This separation of tasks allows us to deal with the structure conflicts in a straightforward, well-encapsulated way. This reduces the complexity of the code required and therefore decreases the likelihood of bugs. It's far easier, for example, to ensure that the code never puts more than 7 days into a week if there is only one place in the code where dates are grouped into weeks. It is also easier to write unittests for checking code correctness when the code is in small, manageable pieces. Writing unittests for a giant outer loop that contains several levels of nested inner loops is tricky business; we'd have no confidence that every possible code path was tested because there are too many of them! | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
Once we have all these components, we just need a little bit of glue code to piece them together to do what we want. | Once we have all these components, we just need a little bit of glue code to piece them together to do what we want. | ||
Line 153: | Line 117: | ||
* Returns: A range of dates in a given year. | * Returns: A range of dates in a given year. | ||
*/ | */ | ||
− | auto datesInYear(int year) { | + | auto datesInYear(int year) pure { |
return Date(year, 1, 1) | return Date(year, 1, 1) | ||
− | .recurrence!((a,n) => a[n-1] + | + | .recurrence!((a,n) => a[n-1] + 1.days) |
.until!(a => a.year > year); | .until!(a => a.year > year); | ||
} | } | ||
Line 165: | Line 129: | ||
This is relatively straightforward; it is a typical example of a range implementation. For ease of usage, we have encapsulated it within a function that creates the range and returns it. | This is relatively straightforward; it is a typical example of a range implementation. For ease of usage, we have encapsulated it within a function that creates the range and returns it. | ||
+ | |||
+ | One point of interest in the returned range is that it is a so-called [[Voldemort types|Voldemort type]]: it cannot be named or instantiated outside of the scope of <code>datesInYear</code> because it uses a closure on the argument <code>year</code> which is only available in the scope of this function. Voldemort types are useful for returning things that conform to a particular interface, in this case a forward range, but the exact concrete type of which doesn't need to be known by the user. Making it impossible for user code to depend on the concrete type ensures extensibility: it allows us to change the concrete type returned in the future without breaking any user code. This idiom is frequently used in D code that deals with ranges. | ||
===Grouping Dates by Month=== | ===Grouping Dates by Month=== | ||
Line 180: | Line 146: | ||
if (isInputRange!Range && | if (isInputRange!Range && | ||
is(typeof( | is(typeof( | ||
− | + | attrFun(ElementType!Range.init) == attrFun(ElementType!Range.init) | |
− | |||
)) | )) | ||
) | ) | ||
Line 189: | Line 154: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
− | + | The <code>if</code> block that comes between the function parameters and its body is a ''signature constraint''. Signature constraints are a D feature that allows a templated function to graciously decline instantiation if its template parameters fail to meet certain requirements. In this case, the signature constraint allows us to specify that <code>chunkBy</code> can only be used with parameters that implement the input range primitives <code>empty</code>, <code>front</code>, and <code>popFront</code>, and that the function <code>attrFun</code> can be applied to elements of the range, and that the result must be comparable with the <code>==</code> operator. This allows us to safely use input range primitives inside the function body, pass range elements to <code>attrFun</code>, and compare the result with <code>==</code>, without fearing that the user type may not support such operations. | |
+ | |||
+ | The intent of <code>chunkBy</code> is to be a function that, given a range <code>r</code> and some function that maps items in the range to some value type that can be compared with the <code>==</code> operator, returns a range of subranges of the original range such that all the items in each subrange is mapped by the function to the same value. This is expressed by the example usage below, which is a unittest in the actual code of the calendar program: | ||
<syntaxhighlight lang=D> | <syntaxhighlight lang=D> | ||
Line 203: | Line 170: | ||
]; | ]; | ||
− | auto byX = chunkBy! | + | auto byX = chunkBy!(a => a[0])(range); |
auto expected1 = [ | auto expected1 = [ | ||
[[1, 1], [1, 1], [1, 2]], | [[1, 1], [1, 1], [1, 2]], | ||
Line 215: | Line 182: | ||
} | } | ||
− | auto byY = chunkBy! | + | auto byY = chunkBy!(a => a[1])(range); |
auto expected2 = [ | auto expected2 = [ | ||
[[1, 1], [1, 1]], | [[1, 1], [1, 1]], | ||
Line 237: | Line 204: | ||
if (isInputRange!Range && | if (isInputRange!Range && | ||
is(typeof( | is(typeof( | ||
− | + | attrFun(ElementType!Range.init) == attrFun(ElementType!Range.init) | |
− | |||
)) | )) | ||
) | ) | ||
Line 307: | Line 273: | ||
if (isDateRange!InputRange) | if (isDateRange!InputRange) | ||
{ | { | ||
− | return chunkBy! | + | return dates.chunkBy!(a => a.month()); |
} | } | ||
</syntaxhighlight> | </syntaxhighlight> | ||
− | + | The signature constraint uses the template <code>isDateRange</code> that ensures that <code>InputRange</code> is actually a valid input range, and that its element type is <code>Date</code>, since that's what the code expects. The definition of <code>isDateRange</code> is as follows: | |
+ | |||
+ | <syntaxhighlight lang=D> | ||
+ | /** | ||
+ | * Convenience template for verifying that a given range is an input range of | ||
+ | * Dates. | ||
+ | */ | ||
+ | template isDateRange(R) { | ||
+ | enum isDateRange = isInputRange!R && is(ElementType!R : Date); | ||
+ | } | ||
+ | </syntaxhighlight> | ||
+ | |||
+ | The reason we put this in a separate template is because we will need it again later on to verify that a particular template argument is, in fact, what we expect it to be. | ||
+ | |||
+ | The unittest accompanying our new <code>byMonth</code> function shows how it can be put together with our first component, <code>datesInYear()</code>, to produce a range of dates in each month of a year: | ||
<syntaxhighlight lang=D> | <syntaxhighlight lang=D> | ||
Line 338: | Line 318: | ||
* same week. Note that weeks begin on Sunday and end on Saturday. | * same week. Note that weeks begin on Sunday and end on Saturday. | ||
*/ | */ | ||
− | auto byWeek(InputRange)(InputRange dates) | + | auto byWeek(InputRange)(InputRange dates) pure nothrow |
if (isDateRange!InputRange) | if (isDateRange!InputRange) | ||
{ | { | ||
Line 361: | Line 341: | ||
This time, our code is simpler because weeks always end on a Saturday, which allows us to use the <code>until()</code> algorithm included in D's Phobos standard library. | This time, our code is simpler because weeks always end on a Saturday, which allows us to use the <code>until()</code> algorithm included in D's Phobos standard library. | ||
− | This simplicity belies something very powerful that we have just achieved, though. | + | This simplicity belies something very powerful that we have just achieved, though. Notice that nowhere in the definition of <code>byWeek()</code> is there any reference to dates in a year, or dates in a month. It can be applied to both! If applied to the output of <code>datesInYear()</code>, for example, it would give us the dates of the year grouped by weeks in the year. If applied to one of the subranges returned by <code>byMonth()</code>, it would give us the dates of that month grouped by weeks, which is exactly what we need later on when we want to format the days in a month. So already, we can see that <code>byWeek()</code> is a reusable component that can be reused outside of the confines of our task at hand. |
− | + | Furthermore, if a month starts in the middle of the week, <code>byWeek()</code> will return less than a full week in its first subrange, because it breaks the range up by Sundays, not necessarily by every 7 items in the range. The same thing goes for ranges that end in the middle of the week. So we have, indirectly, solved one of the key problems in calendar layout: the handling of partial weeks in a month. Yet, this solution does not involve tricky if-statements or other conditionals at all, just a straightforward iteration over a range of dates! We are now starting to see the power of component-style programming. | |
===Formatting Days in a Week=== | ===Formatting Days in a Week=== | ||
Line 369: | Line 349: | ||
Our next step is to take a range of dates in (possibly partial) weeks, and format them into string snippets suitable for assembling into our output lines later. In order to maximize the ways we can reuse this functionality, we return a range of string snippets that other code can then process in whatever way needed. | Our next step is to take a range of dates in (possibly partial) weeks, and format them into string snippets suitable for assembling into our output lines later. In order to maximize the ways we can reuse this functionality, we return a range of string snippets that other code can then process in whatever way needed. | ||
− | For the purposes of | + | For the purposes of this case study, we will forego runtime configurability in the output of our calendar program, and simply fix some constant parameters by which we will format the output: |
<syntaxhighlight lang=D> | <syntaxhighlight lang=D> | ||
Line 379: | Line 359: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
− | These can be recast as runtime parameters, should we ever wish to increase the configurability of our | + | These can be recast as runtime parameters, should we ever wish to increase the configurability of our calendar program. |
To help with code readability, we also write a little helper function that returns a string of <code>n</code> spaces, which we will use to add padding where it's needed: | To help with code readability, we also write a little helper function that returns a string of <code>n</code> spaces, which we will use to add padding where it's needed: | ||
Line 387: | Line 367: | ||
* Returns: A string containing exactly n spaces. | * Returns: A string containing exactly n spaces. | ||
*/ | */ | ||
− | string spaces(size_t n) { | + | string spaces(size_t n) pure nothrow { |
− | return | + | return std.array.replicate(" ", n); |
} | } | ||
</syntaxhighlight> | </syntaxhighlight> | ||
Line 395: | Line 375: | ||
<syntaxhighlight lang=D> | <syntaxhighlight lang=D> | ||
− | auto formatWeek(Range)(Range weeks) | + | auto formatWeek(Range)(Range weeks) pure nothrow |
if (isInputRange!Range && isInputRange!(ElementType!Range) && | if (isInputRange!Range && isInputRange!(ElementType!Range) && | ||
is(ElementType!(ElementType!Range) == Date)) | is(ElementType!(ElementType!Range) == Date)) | ||
Line 459: | Line 439: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
− | As | + | As can be seen, we can now format single months in a nice layout, simply by chaining together existing components. We are not planning to support the display of only a single month in the final calendar program, but as this unittest shows, it would be trivial to implement. |
Another important thing to note about this unittest is that <code>formatWeek()</code> returns a ''range'' of strings to be output; it is not at all tied down to printing the entire month in one shot. We will make use of this fact when we do our final calendar layout, by consuming the first lines of each month in a row of months first, then the second lines of each month in the row, etc., and thus achieve a line-by-line output to the terminal that doesn't require a grid buffer. | Another important thing to note about this unittest is that <code>formatWeek()</code> returns a ''range'' of strings to be output; it is not at all tied down to printing the entire month in one shot. We will make use of this fact when we do our final calendar layout, by consuming the first lines of each month in a row of months first, then the second lines of each month in the row, etc., and thus achieve a line-by-line output to the terminal that doesn't require a grid buffer. | ||
Line 473: | Line 453: | ||
* Formats the name of a month centered on ColsPerWeek. | * Formats the name of a month centered on ColsPerWeek. | ||
*/ | */ | ||
− | string monthTitle(Month month) { | + | string monthTitle(Month month) pure nothrow { |
static immutable string[] monthNames = [ | static immutable string[] monthNames = [ | ||
"January", "February", "March", "April", "May", "June", | "January", "February", "March", "April", "May", "June", | ||
Line 487: | Line 467: | ||
auto after = ColsPerWeek - name.length - before; | auto after = ColsPerWeek - name.length - before; | ||
− | return | + | return spaces(before) ~ name ~ spaces(after); |
} | } | ||
</syntaxhighlight> | </syntaxhighlight> | ||
Line 502: | Line 482: | ||
auto formatMonth(Range)(Range monthDays) | auto formatMonth(Range)(Range monthDays) | ||
if (isInputRange!Range && is(ElementType!Range == Date)) | if (isInputRange!Range && is(ElementType!Range == Date)) | ||
− | { | + | in { |
assert(!monthDays.empty); | assert(!monthDays.empty); | ||
assert(monthDays.front.day == 1); | assert(monthDays.front.day == 1); | ||
− | + | } body { | |
return chain( | return chain( | ||
[ monthTitle(monthDays.front.month) ], | [ monthTitle(monthDays.front.month) ], | ||
Line 525: | Line 505: | ||
* A range of ranges of formatted lines for each month. | * A range of ranges of formatted lines for each month. | ||
*/ | */ | ||
− | auto formatMonths(Range)(Range months) | + | auto formatMonths(Range)(Range months) pure nothrow |
if (isInputRange!Range && is(ElementType!(ElementType!Range) == Date)) | if (isInputRange!Range && is(ElementType!(ElementType!Range) == Date)) | ||
{ | { | ||
− | return months.map! | + | return months.map!formatMonth; |
} | } | ||
</syntaxhighlight> | </syntaxhighlight> | ||
Line 538: | Line 518: | ||
Now we come to the most crucial piece of our calendar program: the code that formats a row of months in the grid of the final calendar output. Here is where we will take the ranges of formatted lines for each month, and paste them together horizontally to form the output lines to the terminal. Will this require some tricky loops, or clever if-statements, to achieve? | Now we come to the most crucial piece of our calendar program: the code that formats a row of months in the grid of the final calendar output. Here is where we will take the ranges of formatted lines for each month, and paste them together horizontally to form the output lines to the terminal. Will this require some tricky loops, or clever if-statements, to achieve? | ||
− | Actually, with the components we have built so far, this part of the code is almost disappointingly straightforward. Again, we note that the task of pasting together rectangular blocks of string snippets is not a calendar-specific problem; it is a general | + | Actually, with the components we have built so far, this part of the code is almost disappointingly straightforward. Again, we note that the task of pasting together rectangular blocks of string snippets is not a calendar-specific problem; it is a general algorithm that can be applied to any block of any string snippets. Hence, we cast our next component in generic terms: |
<syntaxhighlight lang=D> | <syntaxhighlight lang=D> | ||
Line 550: | Line 530: | ||
* Parameters: | * Parameters: | ||
* ror = A range of of ranges of fixed-width strings. | * ror = A range of of ranges of fixed-width strings. | ||
− | * sepWidth = Number of spaces | + | * sepWidth = Number of spaces to insert between each month. |
* Returns: | * Returns: | ||
* A range of ranges of formatted lines for each month. | * A range of ranges of formatted lines for each month. | ||
Line 584: | Line 564: | ||
// Map each subrange to its front element, or empty fillers if | // Map each subrange to its front element, or empty fillers if | ||
// it's already empty. | // it's already empty. | ||
− | .map! | + | .map!(a => a[0].empty ? spaces(a[1]) : a[0].front) |
// Join them together to form a line | // Join them together to form a line | ||
Line 663: | Line 643: | ||
// Format each row | // Format each row | ||
− | .map! | + | .map!(r => |
// By formatting each month | // By formatting each month | ||
r.formatMonths() | r.formatMonths() | ||
Line 701: | Line 681: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
− | And that's it. The output of the program is exactly the output displayed | + | And that's it. The output of the program is exactly the output displayed near the beginning of this article. |
==Conclusions== | ==Conclusions== | ||
Line 716: | Line 696: | ||
==Acknowledgements== | ==Acknowledgements== | ||
− | Many thanks to | + | Many thanks to D forum members who gave helpful feedback and encouraged me to turn my original forum posting into this article, especially to Timon Gehr and bearophile for suggesting improvements to the calendar code, and Andrei Alexandrescu for suggesting to provide a better context for the article in relation to related areas of research. |
Thanks also go to my former professor Gunnar Gotshalks who first introduced me to Jackson Structured Programming, which gave me a deep insight into how mismatches between program structure and data structure (or between two or more data structures) is the source of much of the complexity in code. | Thanks also go to my former professor Gunnar Gotshalks who first introduced me to Jackson Structured Programming, which gave me a deep insight into how mismatches between program structure and data structure (or between two or more data structures) is the source of much of the complexity in code. | ||
+ | |||
+ | ==Related readings== | ||
+ | |||
+ | * [http://mitpress.mit.edu/sicp/ Structure and Interpretation of Computer Programs], especially the section on [http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-24.html#%_sec_3.5 streams]. | ||
+ | * Wikipedia: | ||
+ | :* [http://en.wikipedia.org/wiki/Pipeline_(software) Pipes and filters design pattern] | ||
+ | :* [http://en.wikipedia.org/wiki/Component-based_software_engineering Component-based software engineering] | ||
+ | * Andrei Alexandrescu's article ''[http://www.informit.com/articles/printerfriendly.aspx?p=1407357&rll=1 On Iteration]'' where the concept of ranges was first developed. | ||
+ | * Walter Bright's article on [http://www.drdobbs.com/architecture-and-design/component-programming-in-d/240008321 component programming in D]. | ||
==Appendix== | ==Appendix== | ||
Line 724: | Line 713: | ||
The full source code for the calendar program developed in this article is [https://github.com/quickfur/dcal/blob/master/dcal.d available on github]. | The full source code for the calendar program developed in this article is [https://github.com/quickfur/dcal/blob/master/dcal.d available on github]. | ||
− | The exact version of the code used in this article is [https://github.com/quickfur/dcal/blob/ | + | The exact version of the code used in this article is [https://github.com/quickfur/dcal/blob/83764e6de9e6d25c0a136cbd91ff8b1a8fb1d5ec/dcal.d here]. |
+ | |||
+ | [[Category:Tutorials]] |
Latest revision as of 20:08, 19 May 2015
- By H. S. Teoh, August 2013
Component-style programming can be used to implement complex algorithms in a way that's straightforward to write, easy to read, and amenable to code reuse.
Contents
Sources of Complexity in Code
One of the more influential courses I took in college was on Jackson Structured Programming. It identified two sources of programming complexity (i.e., where bugs are most likely to occur): structure conflicts and writing loops.
Structure conflicts are mismatches between the structure of the program and the structure of the data, or mismatches between two or more data structures that must be processed simultaneously. For example, to read a text file formatted into lines, we may write a loop that reads each line in turn; however, the data in the file may be structured as a preamble, a body, and an epilogue. Since this structure is not reflected in the code, it often prompts programmers to introduce flags, state variables, and the like, in order to resolve the conflict between program structure and data structure. However, such ad hoc methods of resolving structure conflicts often increase the complexity of the code and the coupling between various parts of the code, making the code more fragile and prone to bugs.
Loops are a complex construct in imperative programming. One indication of how complex they can be is the fact that proving that a loop terminates at all is, in the general case, equivalent to solving the halting problem, which is unsolvable. This is especially true when the loop condition is complex. The difficulty is further compounded when nested loops are involved. Unfortunately, the ad hoc methods of resolving structure conflicts often increase the complexity of loop conditions and introduce nested loops, thus making the code doubly prone to bugs.
Furthermore, such code is rarely reusable, because each part of the code is intricately tied to other parts, and it is difficult to extract reusable components out of it.
In this article, we will describe an approach to implementing complex algorithms that minimizes code complexity by using a consistent approach to resolve structure conflicts and simplify loops so that the resulting code is straightforward to write, easy to understand, and made of highly-reusable components.
Component Programming
A key point in Jackson Structured Programming (JSP) is that code structure should be in 1-to-1 correspondence with the structure of the data it is operating on. Bugs and other code problems arise when the structure of the code does not correspond with the structure of the data.
This leads us to component programming, a concept which has been employed in various incarnations. One example is the Unix shell, where input is passed through various programs called filters that perform various transformations on the input, and each respective output is connected via pipes into the input of the subsequent filter, until the final output is produced and sent to a data sink (output on the terminal, saved into a file, piped into a program that consumes the data without producing further output, etc.). The utility of this concept is proven in the expressiveness of the Unix command-line, in which many tasks can be accomplished simply by passing the input through the right sequence of filters connected by pipes. Functional programming languages also employ this concept in the form of lazily-evaluated sequences and streams, where sequential input data is successively transformed in various ways until it attains the final, desired form.
Component programming deals with structure conflicts by recognizing that there are such conflicts in the first place. That is, it acknowledges the existence of non-corresponding structures by encapsulating each such structure into a separate component, rather than arbitrarily picking one of the structures to model to program after and then use ad hoc methods to resolve the resulting conflicts with the other structures. This separation of non-corresponding structures allows us to match the structure of the code to the structure of the data it is dealing with. Such components thus serve as data sources, the output of which can be passed through other components that successively transform the data until it reaches the desired form, thus resolving the non-correspondences between the original structures.
In the remainder of this article, we will use a case study to demonstrate how this works in practice.
Case Study: Formatting a Calendar
We shall use as example the classic task of laying out a yearly calendar on the console, such that given a particular year, the program will print out a number of lines that displays the 12 months in a nice grid layout, with numbers indicating each day within the month. Something like this:
January February March 1 2 3 4 5 1 2 1 2 6 7 8 9 10 11 12 3 4 5 6 7 8 9 3 4 5 6 7 8 9 13 14 15 16 17 18 19 10 11 12 13 14 15 16 10 11 12 13 14 15 16 20 21 22 23 24 25 26 17 18 19 20 21 22 23 17 18 19 20 21 22 23 27 28 29 30 31 24 25 26 27 28 24 25 26 27 28 29 30 31 April May June 1 2 3 4 5 6 1 2 3 4 1 7 8 9 10 11 12 13 5 6 7 8 9 10 11 2 3 4 5 6 7 8 14 15 16 17 18 19 20 12 13 14 15 16 17 18 9 10 11 12 13 14 15 21 22 23 24 25 26 27 19 20 21 22 23 24 25 16 17 18 19 20 21 22 28 29 30 26 27 28 29 30 31 23 24 25 26 27 28 29 30 July August September 1 2 3 4 5 6 1 2 3 1 2 3 4 5 6 7 7 8 9 10 11 12 13 4 5 6 7 8 9 10 8 9 10 11 12 13 14 14 15 16 17 18 19 20 11 12 13 14 15 16 17 15 16 17 18 19 20 21 21 22 23 24 25 26 27 18 19 20 21 22 23 24 22 23 24 25 26 27 28 28 29 30 31 25 26 27 28 29 30 31 29 30 October November December 1 2 3 4 5 1 2 1 2 3 4 5 6 7 6 7 8 9 10 11 12 3 4 5 6 7 8 9 8 9 10 11 12 13 14 13 14 15 16 17 18 19 10 11 12 13 14 15 16 15 16 17 18 19 20 21 20 21 22 23 24 25 26 17 18 19 20 21 22 23 22 23 24 25 26 27 28 27 28 29 30 31 24 25 26 27 28 29 30 29 30 31
While intuitively straightforward, this task has many points of complexity.
Although generating all dates in a year is trivial thanks to D's std.datetime
module, the order in which they must be processed is far from obvious. Since we're writing to the console, we're limited to outputting one line at a time; we can't draw one cell of the grid and then go back up a few lines, move a few columns over, and draw the next cell in the grid. We have to somehow print the first lines of all cells in the top row, followed by the second lines, then the third lines, etc., and repeat this process for each row in the grid. Of course, we could create an internal screen buffer that we can write to in arbitrary order, and then output this buffer line-by-line at the end, but this approach is not as elegant because it requires a much bigger memory footprint than is really necessary.
In any case, as a result of this mismatch between the structure of the calendar and the structure of the output, the order in which the days in a month must be processed is not the natural, chronological order. We have to assemble the dates for the first weeks in each of the first 3 months, say, if we are putting 3 months per row in the grid, print those out, then assemble the dates for the second weeks in each month, print those out, etc.. Furthermore, within the rows representing each week, some days may be missing, depending on where the boundaries of adjacent months fall; these missing days must be filled out in the following month's first week before the first full week in the month is printed. It is not that simple to figure out where a week starts and ends, and how many rows are needed per month. Then if some months have more full weeks than others, they may occupy less vertical space than other months on the same row in the grid; so we need to insert blank spaces into these shorter months in order for the grid cells to line up vertically in the output.
With this level of complexity, writing our calendar program using the traditional ad hoc way of resolving structure conflicts will certainly result in very complex, hard-to-understand, and bug-prone code. There would not be much hope of getting any reusable pieces out of it.
Structuring the Program
Let's consider how we may approach this problem using component programming. The first task is to identify all of the different structures that we will need to deal with:
- Generating dates in a year
- Grouping dates by month
- Grouping dates by week
- Formatting the days in a week
- Grouping formatted weeks into months
- Laying out some number of months horizontally in a grid to form rows
- Outputting each line of each grid row
- Outputting all the rows
Ranges
Notice that the first task, generating dates in a year, basically amounts to a data source: we are producing a sequence of data items (dates in a year), to be consumed by subsequent stages. The concept of a data source is nicely captured by the concept of ranges, which are a generalization of C++ iterators. In particular, an input range is any type that implements three primitives:
empty
: tells us whether the data source has any data left to be retrieved.front
: returns the current data item.popFront
: tells the data source to produce the next data item.
This is a useful abstraction, because it allows us to write algorithms that are independent of the concrete type of the data source: any concrete type that provides the above interface qualifies as an input range and can be used with any algorithm that expects an input range.
For our purposes, it is also useful to define a more specific kind of data source: a forward range. A forward range implements all of the primitives of an input range, plus one more:
save
: returns another instance of the data source representing the current iteration state of the data source. The data items produced by the data source may be traversed twice by iterating over these two instances.
This is a very useful primitive to have, because it essentially allows us to iterate the same range as many times as we want. Each time, we save a copy of the range and iterate over that, which does not change the state of the original range. Some of our calendar algorithms will require this, since certain steps are impossible to implement with only one-pass ranges.
Components
Coming back to our list of tasks, we note that the other tasks on our list essentially amount to writing filters on the data source. For example, grouping dates by month essentially entails writing a filter that takes a range of dates as input, and produces a range of ranges of dates as output, where each subrange in the output corresponds with dates within a single month.
Thus, each of the tasks can be separated out into their own components, which either produces a range of data, or performs some transformation on a given range to produce a new range. This separation of tasks allows us to deal with the structure conflicts in a straightforward, well-encapsulated way. This reduces the complexity of the code required and therefore decreases the likelihood of bugs. It's far easier, for example, to ensure that the code never puts more than 7 days into a week if there is only one place in the code where dates are grouped into weeks. It is also easier to write unittests for checking code correctness when the code is in small, manageable pieces. Writing unittests for a giant outer loop that contains several levels of nested inner loops is tricky business; we'd have no confidence that every possible code path was tested because there are too many of them!
Once we have all these components, we just need a little bit of glue code to piece them together to do what we want.
Let's walk through these components one by one, and show how we can write our calendar program in nice, reusable pieces.
Generating Dates in a Year
Our first task is to generate all the dates in a year. Thanks to D's std.datetime
module, this is rather easy: create a Date
object, then repeatedly add durations of 1 day to it. Varying month lengths, leap years, etc., are all taken care of for us. For our purposes, though, we can't just do this in a loop, because it has to interface with the other components, which do not have a matching structure to a loop over dates. So we capture this task of generating dates by encapsulating it within an input range.
We could implement this range manually, but in the spirit of code reuse, we make use of D's Phobos standard library that provides convenient primitives for constructing ranges. Here is our implementation:
/**
* Returns: A range of dates in a given year.
*/
auto datesInYear(int year) pure {
return Date(year, 1, 1)
.recurrence!((a,n) => a[n-1] + 1.days)
.until!(a => a.year > year);
}
The recurrence()
primitive lets us specify a range that's programmatically generated from an initial value. In this case, we start with the first day of the year, January 1st, represented as Date(year, 1, 1), then specify the relation that generates the next date in the sequence. This relation is specified as a lambda that takes a state vector a
, representing the sequence generated so far, and an index n
, and returns the date one day after the previous date, a[n-1]
. (Note that in spite of the array indexing notation, recurrence
is smart enough to only store as many previous dates as is necessary to generate the next one; in this case, only one previous date is stored. So there is no undue memory consumption caused by storing all dates in the year.)
The until()
primitive lets us limit the range of generated dates to only dates within the specified year.
This is relatively straightforward; it is a typical example of a range implementation. For ease of usage, we have encapsulated it within a function that creates the range and returns it.
One point of interest in the returned range is that it is a so-called Voldemort type: it cannot be named or instantiated outside of the scope of datesInYear
because it uses a closure on the argument year
which is only available in the scope of this function. Voldemort types are useful for returning things that conform to a particular interface, in this case a forward range, but the exact concrete type of which doesn't need to be known by the user. Making it impossible for user code to depend on the concrete type ensures extensibility: it allows us to change the concrete type returned in the future without breaking any user code. This idiom is frequently used in D code that deals with ranges.
Grouping Dates by Month
Our next task on the list is to group dates by month. For maximum utility, we'd like to take a range of Date
s, and break it up into subranges where all the Date
s in a given subrange belong to the same month.
chunkBy
We could simply proceed and write this code directly; but in the spirit of code reuse, we note that this particular task is not really specific to dates. At its core, what we're doing is that given a range of items with properties x, y, z, we want to break the range up into subranges where all items in each subrange share the same value for a particular chosen property, say x. Or, to phrase it in terms of the Dates that we're dealing with, given a range of Date
s, each of which consists of year, month, and day, we'd like to be able to select a particular property, such as month, and break the range up into subranges by that property (i.e., each subrange contains the dates for a single month). Since we don't know if, in the future, we might want to group Date
s by year instead, the selection of which property to use should be parametrized.
But there is no reason why we should limit ourselves to using only properties of the item as the grouping criterion; that would not cover the case of, say, grouping Dates by week, since the Date
object doesn't have a week field we may group by. So, boiled down to the essentials, what we're doing is to map each item to some value, be it selecting the month from a Date, or computing a function on a number, etc.. This then leads to the generic definition:
auto chunkBy(alias attrFun, Range)(Range r)
if (isInputRange!Range &&
is(typeof(
attrFun(ElementType!Range.init) == attrFun(ElementType!Range.init)
))
)
{
...
}
The if
block that comes between the function parameters and its body is a signature constraint. Signature constraints are a D feature that allows a templated function to graciously decline instantiation if its template parameters fail to meet certain requirements. In this case, the signature constraint allows us to specify that chunkBy
can only be used with parameters that implement the input range primitives empty
, front
, and popFront
, and that the function attrFun
can be applied to elements of the range, and that the result must be comparable with the ==
operator. This allows us to safely use input range primitives inside the function body, pass range elements to attrFun
, and compare the result with ==
, without fearing that the user type may not support such operations.
The intent of chunkBy
is to be a function that, given a range r
and some function that maps items in the range to some value type that can be compared with the ==
operator, returns a range of subranges of the original range such that all the items in each subrange is mapped by the function to the same value. This is expressed by the example usage below, which is a unittest in the actual code of the calendar program:
unittest {
auto range = [
[1, 1],
[1, 1],
[1, 2],
[2, 2],
[2, 3],
[2, 3],
[3, 3]
];
auto byX = chunkBy!(a => a[0])(range);
auto expected1 = [
[[1, 1], [1, 1], [1, 2]],
[[2, 2], [2, 3], [2, 3]],
[[3, 3]]
];
foreach (e; byX) {
assert(!expected1.empty);
assert(e.equal(expected1.front));
expected1.popFront();
}
auto byY = chunkBy!(a => a[1])(range);
auto expected2 = [
[[1, 1], [1, 1]],
[[1, 2], [2, 2]],
[[2, 3], [2, 3], [3, 3]]
];
foreach (e; byY) {
assert(!expected2.empty);
assert(e.equal(expected2.front));
expected2.popFront();
}
}
This unittest exemplifies the point that in component-style programming, we work with self-contained components with well-defined, straightforward interfaces, which makes them straightforward to implement, easy to test, and amenable to reuse.
Now we present the full definition of chunkBy()
:
auto chunkBy(alias attrFun, Range)(Range r)
if (isInputRange!Range &&
is(typeof(
attrFun(ElementType!Range.init) == attrFun(ElementType!Range.init)
))
)
{
alias attr = unaryFun!attrFun;
alias AttrType = typeof(attr(r.front));
static struct Chunk {
private Range r;
private AttrType curAttr;
@property bool empty() {
return r.empty || !(curAttr == attr(r.front));
}
@property ElementType!Range front() { return r.front; }
void popFront() {
assert(!r.empty);
r.popFront();
}
}
static struct ChunkBy {
private Range r;
private AttrType lastAttr;
this(Range _r) {
r = _r;
if (!empty)
lastAttr = attr(r.front);
}
@property bool empty() { return r.empty; }
@property auto front() {
assert(!r.empty);
return Chunk(r, lastAttr);
}
void popFront() {
assert(!r.empty);
while (!r.empty && attr(r.front) == lastAttr) {
r.popFront();
}
if (!r.empty)
lastAttr = attr(r.front);
}
static if (isForwardRange!Range) {
@property ChunkBy save() {
ChunkBy copy;
copy.r = r.save;
copy.lastAttr = lastAttr;
return copy;
}
}
}
return ChunkBy(r);
}
While this code is more complex than before, it is still relatively straightforward: the outer range iterates over the first elements of each subrange, and its .front
method returns said subrange as a separate iterable object. Each subrange iterates over the source range until the criterion that defines that subrange is no longer satisfied, at which point the iteration ends. There are only a bare minimum of state variables (keeping track of the value of the function defining the current subrange), and no intricate interdependencies that are difficult to understand.
byMonth
Armed with chunkBy
, grouping a range of Date
s by month is now trivial to implement:
/**
* Chunks a given input range of dates by month.
* Returns: A range of ranges, each subrange of which contains dates for the
* same month.
*/
auto byMonth(InputRange)(InputRange dates)
if (isDateRange!InputRange)
{
return dates.chunkBy!(a => a.month());
}
The signature constraint uses the template isDateRange
that ensures that InputRange
is actually a valid input range, and that its element type is Date
, since that's what the code expects. The definition of isDateRange
is as follows:
/**
* Convenience template for verifying that a given range is an input range of
* Dates.
*/
template isDateRange(R) {
enum isDateRange = isInputRange!R && is(ElementType!R : Date);
}
The reason we put this in a separate template is because we will need it again later on to verify that a particular template argument is, in fact, what we expect it to be.
The unittest accompanying our new byMonth
function shows how it can be put together with our first component, datesInYear()
, to produce a range of dates in each month of a year:
unittest {
auto months = datesInYear(2013).byMonth();
int month = 1;
do {
assert(!months.empty);
assert(months.front.front == Date(2013, month, 1));
months.popFront();
} while (++month <= 12);
assert(months.empty);
}
We didn't verify that the dates within each month as yet -- we will get to that eventually -- but at this point it suffices to show how, by composing two components, we're now able to produce ranges of dates for each month of a year. By segmenting the range of dates in a year rather than generating the dates of each month separately, we guarantee that the output of the calendar program will be consistent: no date will be omitted or repeated due to a bug in the code, and we will not inadvertently print more dates than are in the year.
Grouping Dates by Week
Another piece of the puzzle of calendar layout is to be able to group a range of dates by week. This is also a relatively simple task, since for the purposes of our calendar we can regard each week as beginning on a Sunday and ending on a Saturday. Again, we write a function that takes a range of dates, and breaks it up into subranges by week:
/**
* Chunks a given input range of dates by week.
* Returns: A range of ranges, each subrange of which contains dates for the
* same week. Note that weeks begin on Sunday and end on Saturday.
*/
auto byWeek(InputRange)(InputRange dates) pure nothrow
if (isDateRange!InputRange)
{
static struct ByWeek {
InputRange r;
@property bool empty() { return r.empty; }
@property auto front() {
return until!((Date a) => a.dayOfWeek == DayOfWeek.sat)
(r, OpenRight.no);
}
void popFront() {
assert(!r.empty());
r.popFront();
while (!r.empty && r.front.dayOfWeek != DayOfWeek.sun)
r.popFront();
}
}
return ByWeek(dates);
}
This time, our code is simpler because weeks always end on a Saturday, which allows us to use the until()
algorithm included in D's Phobos standard library.
This simplicity belies something very powerful that we have just achieved, though. Notice that nowhere in the definition of byWeek()
is there any reference to dates in a year, or dates in a month. It can be applied to both! If applied to the output of datesInYear()
, for example, it would give us the dates of the year grouped by weeks in the year. If applied to one of the subranges returned by byMonth()
, it would give us the dates of that month grouped by weeks, which is exactly what we need later on when we want to format the days in a month. So already, we can see that byWeek()
is a reusable component that can be reused outside of the confines of our task at hand.
Furthermore, if a month starts in the middle of the week, byWeek()
will return less than a full week in its first subrange, because it breaks the range up by Sundays, not necessarily by every 7 items in the range. The same thing goes for ranges that end in the middle of the week. So we have, indirectly, solved one of the key problems in calendar layout: the handling of partial weeks in a month. Yet, this solution does not involve tricky if-statements or other conditionals at all, just a straightforward iteration over a range of dates! We are now starting to see the power of component-style programming.
Formatting Days in a Week
Our next step is to take a range of dates in (possibly partial) weeks, and format them into string snippets suitable for assembling into our output lines later. In order to maximize the ways we can reuse this functionality, we return a range of string snippets that other code can then process in whatever way needed.
For the purposes of this case study, we will forego runtime configurability in the output of our calendar program, and simply fix some constant parameters by which we will format the output:
/// The number of columns per day in the formatted output.
enum ColsPerDay = 3;
/// The number of columns per week in the formatted output.
enum ColsPerWeek = 7 * ColsPerDay;
These can be recast as runtime parameters, should we ever wish to increase the configurability of our calendar program.
To help with code readability, we also write a little helper function that returns a string of n
spaces, which we will use to add padding where it's needed:
/**
* Returns: A string containing exactly n spaces.
*/
string spaces(size_t n) pure nothrow {
return std.array.replicate(" ", n);
}
The actual code for formatting each week is relatively straightforward:
auto formatWeek(Range)(Range weeks) pure nothrow
if (isInputRange!Range && isInputRange!(ElementType!Range) &&
is(ElementType!(ElementType!Range) == Date))
{
struct WeekStrings {
Range r;
@property bool empty() { return r.empty; }
string front()
out(s) { assert(s.length == ColsPerWeek); }
body
{
auto buf = appender!string();
// Insert enough filler to align the first day with its respective
// day-of-week.
assert(!r.front.empty);
auto startDay = r.front.front.dayOfWeek;
buf.put(spaces(ColsPerDay * startDay));
// Format each day into its own cell and append to target string.
string[] days = map!((Date d) => " %2d".format(d.day))(r.front)
.array;
assert(days.length <= 7 - startDay);
days.copy(buf);
// Insert more filler at the end to fill up the remainder of the
// week, if it's a short week (e.g. at the end of the month).
if (days.length < 7 - startDay)
buf.put(spaces(ColsPerDay * (7 - startDay - days.length)));
return buf.data;
}
void popFront() {
r.popFront();
}
}
return WeekStrings(weeks);
}
Most of the work is done in the .front
function, which handles partial weeks by computing how many missing days there are on each side, and inserting filler spaces in their place. The formatting of the days themselves is handled by using std.algorithm.map
. The buffering of the output is handled by std.array.appender
, a Phobos-provided component that gives an output range interface to a string buffer. All in all, this is pretty straightforward, and easy to test.
In the accompanying unittest, we couple formatWeeks
with the other components we have built so far, to show off what we can achieve now:
unittest {
auto jan2013 = datesInYear(2013)
.byMonth().front // pick January 2013 for testing purposes
.byWeek()
.formatWeek()
.join("\n");
assert(jan2013 ==
" 1 2 3 4 5\n"~
" 6 7 8 9 10 11 12\n"~
" 13 14 15 16 17 18 19\n"~
" 20 21 22 23 24 25 26\n"~
" 27 28 29 30 31 "
);
}
As can be seen, we can now format single months in a nice layout, simply by chaining together existing components. We are not planning to support the display of only a single month in the final calendar program, but as this unittest shows, it would be trivial to implement.
Another important thing to note about this unittest is that formatWeek()
returns a range of strings to be output; it is not at all tied down to printing the entire month in one shot. We will make use of this fact when we do our final calendar layout, by consuming the first lines of each month in a row of months first, then the second lines of each month in the row, etc., and thus achieve a line-by-line output to the terminal that doesn't require a grid buffer.
But let's not get ahead of ourselves. First, let's officially put the functionality already demonstrated by the above unittest into a form that can be reused.
Formatting Months
In addition to printing out the days in the month, we want to print the name of the month at the top, so that we know which month the block of formatted days belongs to. For cosmetic purposes, we center the name of the month horizontally over its days:
/**
* Formats the name of a month centered on ColsPerWeek.
*/
string monthTitle(Month month) pure nothrow {
static immutable string[] monthNames = [
"January", "February", "March", "April", "May", "June",
"July", "August", "September", "October", "November", "December"
];
static assert(monthNames.length == 12);
// Determine how many spaces before and after the month name we need to
// center it over the formatted weeks in the month
auto name = monthNames[month - 1];
assert(name.length < ColsPerWeek);
auto before = (ColsPerWeek - name.length) / 2;
auto after = ColsPerWeek - name.length - before;
return spaces(before) ~ name ~ spaces(after);
}
Next, we write the function that formats a month by returning a range containing the month name followed by the formatted weeks in the month:
/**
* Formats a month.
* Parameters:
* monthDays = A range of Dates representing consecutive days in a month.
* Returns: A range of strings representing each line of the formatted month.
*/
auto formatMonth(Range)(Range monthDays)
if (isInputRange!Range && is(ElementType!Range == Date))
in {
assert(!monthDays.empty);
assert(monthDays.front.day == 1);
} body {
return chain(
[ monthTitle(monthDays.front.month) ],
monthDays.byWeek().formatWeek());
}
Thanks to the reusable components we have built up to this point, this is quite simple.
To make formatMonth
easier to use in the final code, we wrap it in a function that applies it to each month in a range of months:
/**
* Formats a range of months.
* Parameters:
* months = A range of ranges, each inner range is a range of Dates in a
* month.
* Returns:
* A range of ranges of formatted lines for each month.
*/
auto formatMonths(Range)(Range months) pure nothrow
if (isInputRange!Range && is(ElementType!(ElementType!Range) == Date))
{
return months.map!formatMonth;
}
Again, we note the genericity of this code: formatMonths
does not assume anything about the number of months it is given to format. We may pass in all 12 months in the year, or any subset thereof. In particular, in the next step, we will be grouping months into grid rows, and passing each row to formatMonths
to obtain the range of formatted lines of the months in the row. The way formatMonths
is designed makes it reusable in a wide variety of situations.
Formatting a Row of Months
Now we come to the most crucial piece of our calendar program: the code that formats a row of months in the grid of the final calendar output. Here is where we will take the ranges of formatted lines for each month, and paste them together horizontally to form the output lines to the terminal. Will this require some tricky loops, or clever if-statements, to achieve?
Actually, with the components we have built so far, this part of the code is almost disappointingly straightforward. Again, we note that the task of pasting together rectangular blocks of string snippets is not a calendar-specific problem; it is a general algorithm that can be applied to any block of any string snippets. Hence, we cast our next component in generic terms:
/**
* Horizontally pastes a forward range of rectangular blocks of characters.
*
* Each rectangular block is represented by a range of fixed-width strings. If
* some blocks are longer than others, the shorter blocks are padded with
* spaces at the bottom.
*
* Parameters:
* ror = A range of of ranges of fixed-width strings.
* sepWidth = Number of spaces to insert between each month.
* Returns:
* A range of ranges of formatted lines for each month.
*/
auto pasteBlocks(Range)(Range ror, int sepWidth)
if (isForwardRange!Range && is(ElementType!(ElementType!Range) : string))
{
struct Lines {
Range ror;
string sep;
size_t[] colWidths;
bool _empty;
this(Range _ror, string _sep) {
ror = _ror;
sep = _sep;
_empty = ror.empty;
// Store the widths of each column so that we can insert fillers if
// one of the subranges run out of data prematurely.
foreach (r; ror.save) {
colWidths ~= r.empty ? 0 : r.front.length;
}
}
@property bool empty() { return _empty; }
@property auto front() {
return
// Iterate over ror and colWidths simultaneously
zip(ror.save, colWidths)
// Map each subrange to its front element, or empty fillers if
// it's already empty.
.map!(a => a[0].empty ? spaces(a[1]) : a[0].front)
// Join them together to form a line
.join(sep);
}
/// Pops an element off each subrange.
void popFront() {
assert(!empty);
_empty = true; // assume no more data after popping (we're lazy)
foreach (ref r; ror) {
if (!r.empty) {
r.popFront();
if (!r.empty)
_empty = false; // well, there's still data after all
}
}
}
}
static assert(isInputRange!Lines);
string separator = spaces(sepWidth);
return Lines(ror, separator);
}
This is again an input range, with rather straightforward implementation. The .front
method is where the pasting of each corresponding line of the input months into a single line is implemented.
There's really only one minor complication: how to deal with months that occupy a different amount of vertical space due to the dates falling in such a way that they cover a smaller number of (possibly partial) weeks. To handle this, we simply insert a row of blank spaces in place of a week, when a subrange runs out before other subranges. For maximum reusability, we make no assumptions about the width of the blocks; so we use an array to keep track of the widths of each column and use those widths for inserting the blank spaces where necessary.
Actually, the astute reader will have noticed that pasteBlocks
doesn't even use any of our previously built components. It's a completely generic component that can be used in a wide variety of other programs! Only its accompanying unittest actually puts it together with our other calendar-specific components in a demonstration of our achievement so far:
unittest {
// Make a beautiful, beautiful row of months. How's that for a unittest? :)
auto row = datesInYear(2013).byMonth().take(3)
.formatMonths()
.array()
.pasteBlocks(1)
.join("\n");
assert(row ==
" January February March \n"~
" 1 2 3 4 5 1 2 1 2\n"~
" 6 7 8 9 10 11 12 3 4 5 6 7 8 9 3 4 5 6 7 8 9\n"~
" 13 14 15 16 17 18 19 10 11 12 13 14 15 16 10 11 12 13 14 15 16\n"~
" 20 21 22 23 24 25 26 17 18 19 20 21 22 23 17 18 19 20 21 22 23\n"~
" 27 28 29 30 31 24 25 26 27 28 24 25 26 27 28 29 30\n"~
" 31 "
);
}
Formatting a Year
At this point, we're pretty much done. All that's left is to put all the pieces together to format all the rows in the grid of months for a year:
/**
* Formats a year.
* Parameters:
* year = Year to display calendar for.
* monthsPerRow = How many months to fit into a row in the output.
* Returns: A range of strings representing the formatted year.
*/
auto formatYear(int year, int monthsPerRow)
{
enum colSpacing = 1;
return
// Start by generating all dates for the given year
datesInYear(year)
// Group them by month
.byMonth()
// Group the months into horizontal rows
.chunks(monthsPerRow)
// Format each row
.map!(r =>
// By formatting each month
r.formatMonths()
// Storing each month's formatting in a row buffer
.array()
// Horizontally pasting each respective month's lines together
.pasteBlocks(colSpacing)
.join("\n"))
// Insert a blank line between each row
.join("\n\n");
}
It's worth pointing out that even though we store each month's formatting in an array, this is an array of ranges of each month's formatting. Each line of the formatting is lazily generated when .front
is invoked; there is no memory overhead incurred by storing all formatted lines of the months in a row before we start producing output lines. If we were to trace through the execution of the program, we will find that the formatted lines of each month are generated on-the-fly as each output line is being assembled. And indeed, formatMonth()
returns an input range, not a forward range or higher, and pasteBlocks
does not cache any of the generated string snippets. This lazy evaluation is very similar to the way functional languages work, and is an example of D's support for functional-style code.
For maximum flexibility, we have written formatYear
in such a way that it returns an input range of strings representing the lines of the final, formatted calendar. Thus, one could reuse it ways outside the scope of the present example; for example, using pasteBlocks
to paste multiple calendars together horizontally.
To hook this up to the outside world, we write a simple main()
function that lets the user specify which year the calendar should be formatted for:
int main(string[] args) {
// This is as simple as it gets: parse the year from the command-line:
if (args.length < 2) {
stderr.writeln("Please specify year");
return 1;
}
int year = to!int(args[1]);
// Print the calender
enum MonthsPerRow = 3;
writeln(formatYear(year, MonthsPerRow));
return 0;
}
And that's it. The output of the program is exactly the output displayed near the beginning of this article.
Conclusions
The calendar program that we wrote above proves that component-style programming using the range concept is a very powerful approach to programming:
- It allows us to untangle very complex code into straightforward, manageable pieces.
- It produces components that are highly-reusable outside of their original purpose: in fact,
chunkBy
is a candidate for inclusion in Phobos. - It is easy to write, easy to read, and easy to test, and hence much less prone to bugs. The
formatYear
function is a prime example of how the code is easy to read: it starts with the dates in a year and successively transforms it until it arrives at the final output. There are no convoluted loops, hard-to-understand state variables and complicated if-statements, yet it is able to express the level of complexity required to format a yearly calendar. - New functionality can be easily added by putting the components together in a different way. For example, we can extend
main()
to allow printing just a single month instead of the whole year.
It is also the first calendar formatting program, to my knowledge, that doesn't look frighteningly similar to an entry to the IOCCC. :-)
Acknowledgements
Many thanks to D forum members who gave helpful feedback and encouraged me to turn my original forum posting into this article, especially to Timon Gehr and bearophile for suggesting improvements to the calendar code, and Andrei Alexandrescu for suggesting to provide a better context for the article in relation to related areas of research.
Thanks also go to my former professor Gunnar Gotshalks who first introduced me to Jackson Structured Programming, which gave me a deep insight into how mismatches between program structure and data structure (or between two or more data structures) is the source of much of the complexity in code.
Related readings
- Structure and Interpretation of Computer Programs, especially the section on streams.
- Wikipedia:
- Andrei Alexandrescu's article On Iteration where the concept of ranges was first developed.
- Walter Bright's article on component programming in D.
Appendix
The full source code for the calendar program developed in this article is available on github.
The exact version of the code used in this article is here.