Difference between revisions of "Component programming with ranges"

From D Wiki
Jump to: navigation, search
(expand)
(expand)
Line 49: Line 49:
 
* If some months have more full weeks than others, they may occupy less vertical space than other months on the same row in the grid; so we need to insert blank spaces into these shorter months in order for the grid cells to line up vertically in the output.
 
* If some months have more full weeks than others, they may occupy less vertical space than other months on the same row in the grid; so we need to insert blank spaces into these shorter months in order for the grid cells to line up vertically in the output.
  
Considering all of the above points, it would appear at first glance that we are doomed to writing algorithms specific only to this task, because each piece of the puzzle depends on the others in complex ways. It would appear hopeless to actually get any reusable components out of our calendar program!
+
Considering all of the above points, it would appear at first glance that we are doomed to writing algorithms specific only to this task, because each piece of the puzzle depends on the others in complex ways. It would appear hopeless to actually get any reusable components out of our calendar program! And indeed, this would be the case if we approached the problem from the traditional imperative approach.
 +
 
 +
Furthermore, such a complex algorithm would be difficult to write, and would be more prone to bugs because of its complexity.
 +
 
 +
==Sources of Complexity==
 +
 
 +
One of the more influential courses I took in college was on [http://en.wikipedia.org/wiki/Jackson_Structured_Programming Jackson Structured Programming]. It identified two sources of programming complexity (i.e., where bugs are most likely to occur):
 +
 
 +
# Mismatches between the structure of the program and the structure of the data (e.g., you're reading an input file that has a preamble, body, and epilogue, but your code has a single loop over lines in the file), or between two or more data structures that you are processing at the same time (e.g., laying out a yearly calendar where the boundaries of weeks don't correspond with the boundaries of the months, and the grid structure doesn't correspond with the line-by-line sequence of the final output);
 +
# Writing loop invariants (or equivalently, loop conditions).
 +
 
 +
Most non-trivial loops in imperative code have both, which makes them doubly prone to bugs. In the example of reading a file with three sections, the mismatch between the code structure (a single loop) and the file structure (three sequential sections) often prompts people to add boolean flags, state variables, and the like, in order to resolve the conflict between the two structures. Such ad hoc structure resolutions are a breeding ground for bugs, and often lead to complicated loop conditions, which invite even more bugs.
 +
 
 +
==Untangling Complexity==
 +
 
 +
In contrast, if you structure your code according to the structure of the input (i.e., one loop for processing the preamble, one loop for processing the body, one loop for processing the epilogue), it becomes considerably less complex, easier to read (and write!), and far less bug
 +
prone. Your loop conditions become simpler, and thus easier to reason about and leave less room for bugs to hide.
 +
 
 +
But to be able to process the input in this way requires that we encapsulate our input so that it can be processed by 3 different loops. Once we go down that road, we start to arrive at the concept of input ranges... then we abstract away the three loops into three components, and thus we arrive at Walter's component-style programming, where each component has a well-defined input and output, and the transformation process from input to output has a straightforward correspondence.

Revision as of 17:36, 2 August 2013

Preface

This article was inspired by Walter's article on component programming in D and based on a related discussion thread in the D forum. In short, Walter's article addressed the question of why, despite years of carefully programming in paradigms that purportedly makes your code more reusable, so very little code from the past is actually reused. Component-style programming, in which code is assembled from reusable components may be data sources, data sinks, or algorithms (that transforms data sources and puts them into data sinks), is proposed as a possible solution to this problem.

In this article, we will consider how component-style programming can greatly untangle a complicated algorithm into manageable pieces that are straightforward to write, easy to debug, and reusable.

The Task

We shall consider the classic task of laying out a yearly calendar on the console, such that given a particular year, the program will print out a number of lines that displays the 12 months in a nice grid layout, with numbers indicating each day within the month. Something like this:

       January              February                March        
        1  2  3  4  5                  1  2                  1  2
  6  7  8  9 10 11 12   3  4  5  6  7  8  9   3  4  5  6  7  8  9
 13 14 15 16 17 18 19  10 11 12 13 14 15 16  10 11 12 13 14 15 16
 20 21 22 23 24 25 26  17 18 19 20 21 22 23  17 18 19 20 21 22 23
 27 28 29 30 31        24 25 26 27 28        24 25 26 27 28 29 30
                                             31                  

        April                  May                  June         
     1  2  3  4  5  6            1  2  3  4                     1
  7  8  9 10 11 12 13   5  6  7  8  9 10 11   2  3  4  5  6  7  8
 14 15 16 17 18 19 20  12 13 14 15 16 17 18   9 10 11 12 13 14 15
 21 22 23 24 25 26 27  19 20 21 22 23 24 25  16 17 18 19 20 21 22
 28 29 30              26 27 28 29 30 31     23 24 25 26 27 28 29
                                             30                  

        July                 August               September      
     1  2  3  4  5  6               1  2  3   1  2  3  4  5  6  7
  7  8  9 10 11 12 13   4  5  6  7  8  9 10   8  9 10 11 12 13 14
 14 15 16 17 18 19 20  11 12 13 14 15 16 17  15 16 17 18 19 20 21
 21 22 23 24 25 26 27  18 19 20 21 22 23 24  22 23 24 25 26 27 28
 28 29 30 31           25 26 27 28 29 30 31  29 30               

       October              November              December       
        1  2  3  4  5                  1  2   1  2  3  4  5  6  7
  6  7  8  9 10 11 12   3  4  5  6  7  8  9   8  9 10 11 12 13 14
 13 14 15 16 17 18 19  10 11 12 13 14 15 16  15 16 17 18 19 20 21
 20 21 22 23 24 25 26  17 18 19 20 21 22 23  22 23 24 25 26 27 28
 27 28 29 30 31        24 25 26 27 28 29 30  29 30 31            

While intuitively straightforward, this task has many points of complexity:

  • While generating all dates in a year is trivial, thanks to D's std.datetime module, the order in which they must be processed is far from obvious, due to the following complicating factors:
  • Since we're writing to the console, we're limited to outputting one line at a time; we can't draw one cell of the grid and then go back up a few lines, move a few columns over, and draw the next cell in the grid. We have to somehow print the first lines of all cells in the top row, followed by the second lines, then the third lines, etc., and repeat this process for each row in the grid.
  • As a result, the order in which the days in a month are processed is not the natural, chronological order. We have to assemble the dates for the first weeks in each of the first 3 months, say, (if we are putting 3 months per row in the grid), print those out, then assemble the dates for the second weeks in each month, print those out, etc..
  • Furthermore, within the rows representing each week, some days may be missing, depending on where the boundaries of adjacent months fall; these missing days must then be filled out in the following month's first week before the first full week in the month is printed. It is not that simple to figure out where a week starts and ends, and how many rows are needed per month.
  • If some months have more full weeks than others, they may occupy less vertical space than other months on the same row in the grid; so we need to insert blank spaces into these shorter months in order for the grid cells to line up vertically in the output.

Considering all of the above points, it would appear at first glance that we are doomed to writing algorithms specific only to this task, because each piece of the puzzle depends on the others in complex ways. It would appear hopeless to actually get any reusable components out of our calendar program! And indeed, this would be the case if we approached the problem from the traditional imperative approach.

Furthermore, such a complex algorithm would be difficult to write, and would be more prone to bugs because of its complexity.

Sources of Complexity

One of the more influential courses I took in college was on Jackson Structured Programming. It identified two sources of programming complexity (i.e., where bugs are most likely to occur):

  1. Mismatches between the structure of the program and the structure of the data (e.g., you're reading an input file that has a preamble, body, and epilogue, but your code has a single loop over lines in the file), or between two or more data structures that you are processing at the same time (e.g., laying out a yearly calendar where the boundaries of weeks don't correspond with the boundaries of the months, and the grid structure doesn't correspond with the line-by-line sequence of the final output);
  2. Writing loop invariants (or equivalently, loop conditions).

Most non-trivial loops in imperative code have both, which makes them doubly prone to bugs. In the example of reading a file with three sections, the mismatch between the code structure (a single loop) and the file structure (three sequential sections) often prompts people to add boolean flags, state variables, and the like, in order to resolve the conflict between the two structures. Such ad hoc structure resolutions are a breeding ground for bugs, and often lead to complicated loop conditions, which invite even more bugs.

Untangling Complexity

In contrast, if you structure your code according to the structure of the input (i.e., one loop for processing the preamble, one loop for processing the body, one loop for processing the epilogue), it becomes considerably less complex, easier to read (and write!), and far less bug prone. Your loop conditions become simpler, and thus easier to reason about and leave less room for bugs to hide.

But to be able to process the input in this way requires that we encapsulate our input so that it can be processed by 3 different loops. Once we go down that road, we start to arrive at the concept of input ranges... then we abstract away the three loops into three components, and thus we arrive at Walter's component-style programming, where each component has a well-defined input and output, and the transformation process from input to output has a straightforward correspondence.