Read table data from file

From D Wiki

Latest revision as of 17:45, 22 September 2017

Reading tabular data from file

To read a text file that stores one record per row, with the fields of each row separated by a fixed separator (e.g. a tab or a space), the following code might help:

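    module test.read;

    import std.stdio;
    import std.array;

    // Reads every line of inputFile and splits it into fields on fieldSeparator.
    auto readInData(File inputFile, string fieldSeparator)
    {
        string[][] buffer;    // one string[] of fields per input line

        // byLine reuses its internal buffer, so each line is copied with
        // .idup before split() slices it into fields.
        foreach (line; inputFile.byLine)
            buffer ~= split(line.idup, fieldSeparator);

        return buffer;
    }

    // Usage: <program> <file> [separator]; the separator defaults to one space.
    void main(string[] args)
    {
        if (args.length > 1)
            writeln(readInData(
                File(args[1]),
                args.length > 2 ? args[2] : " "
            ));
    }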
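The example above splits on one fixed separator. With a single space as the separator, a line containing two consecutive spaces therefore produces an empty field (split("a  b", " ") gives ["a", "", "b"]). When fields are separated by runs of spaces or tabs, the one-argument overload of split() from std.array may fit better, because it splits on whitespace and never produces empty fields. The following is a minimal sketch under that assumption. The function name readWhitespaceData is hypothetical, and the .idup is kept for the same reason as in the original code.

```d
// Sketch: whitespace-delimited variant of readInData.
// readWhitespaceData is a hypothetical name, not part of Phobos.
import std.array : split;
import std.stdio : File, writeln;

string[][] readWhitespaceData(File inputFile)
{
    string[][] buffer;

    // byLine reuses its char[] buffer, so each line is copied with .idup
    // before split() slices it into fields.
    foreach (line; inputFile.byLine)
        buffer ~= split(line.idup); // one-argument split: runs of whitespace

    return buffer;
}

void main(string[] args)
{
    if (args.length > 1)
        writeln(readWhitespaceData(File(args[1])));
}
```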


The not-so-obvious use of the .idup property is necessary here. Without it, the rows already stored in the output string[][] array would be overwritten. There are a couple of reasons for this:

  • the range returned by File.byLine() reuses its internal buffer for efficiency, so the contents of each line are only valid until the next iteration, and
  • the split() function returns slices of its input (line in this case) instead of copying each field into new memory.
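Both points can be reproduced without any file I/O. In this sketch, a single char[] array stands in for the buffer that byLine reuses. This is a simplification for illustration, not how Phobos implements byLine. The helper names collectWithoutCopy and collectWithCopy are hypothetical.

```d
// Sketch: why copying matters when split() slices a reused buffer.
// The shared char[] only simulates byLine's internal buffer.
import std.array : split;
import std.stdio : writeln;

// Writes every "line" into ONE shared buffer and keeps split()'s slices as-is.
// Assumes a non-empty list of equal-length lines.
char[][][] collectWithoutCopy(string[] lines)
{
    auto reused = new char[lines[0].length];
    char[][][] rows;
    foreach (text; lines)
    {
        reused[] = text[];           // overwrite the shared buffer in place
        rows ~= split(reused, " ");  // fields are slices of `reused`
    }
    return rows;
}

// Same loop, but each line is duplicated before it is split.
string[][] collectWithCopy(string[] lines)
{
    auto reused = new char[lines[0].length];
    string[][] rows;
    foreach (text; lines)
    {
        reused[] = text[];
        rows ~= split(reused.idup, " "); // fields slice a private immutable copy
    }
    return rows;
}

void main()
{
    auto input = ["a b", "c d"];
    writeln(collectWithoutCopy(input)); // both rows show the fields of "c d"
    writeln(collectWithCopy(input));    // rows keep "a b" and "c d"
}
```

In the first result, every stored field is a slice of the same array. Both rows therefore end up showing the last line written into it.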

Without duplicating the line, every row stored in the output array would point into the same reused buffer. The rows stored earlier would then be overwritten on every iteration.
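Phobos also provides File.byLineCopy, which allocates a new string for each line. With it, the copy can be left to the range instead of an explicit .idup. The following is a minimal sketch of that approach; the function name readInDataCopy is hypothetical. Like the .idup version, it makes one allocation per line. The difference is only where the copy happens.

```d
// Sketch: let File.byLineCopy allocate a fresh string per line,
// so no explicit .idup is needed before split().
import std.array : split;
import std.stdio : File, writeln;

string[][] readInDataCopy(File inputFile, string fieldSeparator)
{
    string[][] buffer;

    // byLineCopy yields newly allocated immutable lines (string by default),
    // so the slices returned by split() stay valid after the loop advances.
    foreach (line; inputFile.byLineCopy)
        buffer ~= split(line, fieldSeparator);

    return buffer;
}

void main(string[] args)
{
    if (args.length > 1)
        writeln(readInDataCopy(File(args[1]), args.length > 2 ? args[2] : " "));
}
```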

Credits

The information provided here was first discussed on this thread (http://forum.dlang.org/post/blciltrhymxcwpdcfrbp@forum.dlang.org).