Efficient code snippets (draft in progress): useful snippets for the D newcomer on writing efficient code

Revision as of 20:21, 1 February 2015

Thanks to Adam D Ruppe, Bbaz, Ola Fosheim Grostad

Foreach vs for
==============

foreach is just syntax sugar over a for loop. If there are any allocations, it is because your code had some; allocation isn't inherent to the loop. The language documentation even lists the translation of foreach into for for the range case explicitly:

http://dlang.org/statement.html#ForeachStatement
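As a concrete illustration of that lowering, here is a minimal sketch (the `Iota` type and its fields are mine, not from the wiki; the rewrite shown is the rough shape the spec describes, not the compiler's exact output):

```d
// A minimal input range implementing the standard empty/front/popFront API.
struct Iota
{
    int front; // current element, exposed directly as the range's front
    int end;
    bool empty() const { return front >= end; }
    void popFront() { ++front; }
}

void main()
{
    int sum1 = 0;
    foreach (x; Iota(0, 5)) // the sugared form
        sum1 += x;

    // Roughly what the compiler rewrites the foreach into:
    int sum2 = 0;
    for (auto r = Iota(0, 5); !r.empty(); r.popFront())
    {
        auto x = r.front;
        sum2 += x;
    }

    assert(sum1 == sum2 && sum1 == 10); // 0+1+2+3+4
}
```

Nothing in either form allocates; the range struct lives on the stack.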


The most likely allocation would be for a user-defined opApply's delegate, and you can prevent that by declaring it as opApply(scope your_delegate) - the scope keyword prevents any closure allocation.
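A hedged sketch of that pattern (the list type here is invented for illustration; the `scope` annotation on the delegate parameter is the point):

```d
// Hypothetical singly linked list iterated via opApply.
struct Node { int value; Node* next; }

struct List
{
    Node* head;

    // `scope` promises the delegate does not outlive this call,
    // so the caller's foreach body needs no heap-allocated closure.
    int opApply(scope int delegate(ref int) dg)
    {
        for (auto n = head; n !is null; n = n.next)
        {
            if (auto result = dg(n.value))
                return result; // non-zero means the body broke out early
        }
        return 0;
    }
}

void main()
{
    auto c = new Node(3, null);
    auto b = new Node(2, c);
    auto a = new Node(1, b);
    auto list = List(a);

    int sum;
    foreach (ref v; list) // compiles into a call to opApply
        sum += v;
    assert(sum == 6);
}
```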

There are always significant optimization effects in long-running loops:
- SIMD
- cache locality / prefetching

For the former (SIMD) you need to make sure that good code is generated, either by hand, by using vectorized libraries, or by auto-vectorization.
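As a sketch of the auto-vectorization route (function name and data are mine): a SAXPY-style loop over contiguous slices with a simple body is the kind of code that an optimizing D compiler such as LDC or GDC at -O2 can typically vectorize on its own.

```d
// Contiguous data, unit stride, no function calls in the body:
// a good auto-vectorization candidate.
void axpy(float[] y, float a, const(float)[] x)
{
    assert(y.length == x.length);
    foreach (i; 0 .. y.length)
        y[i] += a * x[i];
}

void main()
{
    auto x = [1.0f, 2.0f, 3.0f, 4.0f];
    auto y = [0.0f, 0.0f, 0.0f, 0.0f];
    axpy(y, 2.0f, x);
    assert(y == [2.0f, 4.0f, 6.0f, 8.0f]);
}
```

Whether the loop actually vectorizes depends on the compiler and flags; checking the generated assembly is the only way to be sure.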

For the latter (cache) you need to make sure that the prefetcher is able to predict the access pattern, or is told to prefetch explicitly, and also that the working set is small enough to stay in the faster cache levels.
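To illustrate the cache point, a hedged sketch (the functions are mine, not from the wiki): both compute the same sum over a 2-D array, but one walks memory sequentially, which the hardware prefetcher predicts well, while the other strides down columns and defeats it.

```d
// Row-major traversal: each inner step touches the next address in memory.
double sumRowMajor(const double[][] m)
{
    double s = 0;
    foreach (row; m)
        foreach (v; row)
            s += v;
    return s;
}

// Column-major traversal: each inner step jumps a whole row ahead,
// a large stride that the prefetcher handles poorly on big arrays.
double sumColumnMajor(const double[][] m)
{
    double s = 0;
    foreach (j; 0 .. m[0].length)
        foreach (i; 0 .. m.length)
            s += m[i][j];
    return s;
}

void main()
{
    auto m = new double[][](4, 4);
    foreach (i, row; m)
        row[] = i;
    // Same result; the difference is purely in memory-access locality.
    assert(sumRowMajor(m) == sumColumnMajor(m));
}
```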

If you want good performance you cannot ignore either of these, and you have to design the data structures and algorithms for them. Prefetching has to happen maybe 100 instructions before the actual load from memory, and AVX requires proper alignment and a data layout that fits the algorithm. On the next-generation Xeon (Skylake) the alignment may go up to 64 bytes, and you get 512-bit-wide registers (so you can do eight 64-bit floating-point operations in parallel per core). The difference between issuing 1-4 ops and issuing 8-16 per time unit is noticeable...

And of course, the closer your code is to the CPU's theoretical throughput, the more critical it becomes not to wait for memory loads.

This is also a moving target...


opApply vs Range
================

http://forum.dlang.org/post/mailman.942.1292183237.21107.digitalmars-d-learn@puremagic.com
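The linked thread discusses the trade-off between the two iteration styles. As a minimal side-by-side sketch (types and names are mine, not from that thread): opApply gives internal iteration that only foreach can drive, while a range gives external iteration that also composes with Phobos.

```d
// Internal iteration: the container drives the loop; usable only via foreach.
struct UpToApply
{
    int limit;
    int opApply(scope int delegate(ref int) dg)
    {
        foreach (i; 0 .. limit)
        {
            if (auto r = dg(i))
                return r;
        }
        return 0;
    }
}

// External iteration: the caller drives the loop; plugs into std.range/std.algorithm.
struct UpToRange
{
    int front;
    int limit;
    bool empty() const { return front >= limit; }
    void popFront() { ++front; }
}

void main()
{
    import std.algorithm : sum;

    int a;
    foreach (i; UpToApply(4)) // opApply: foreach only
        a += i;

    int b = UpToRange(0, 4).sum; // range: works with Phobos algorithms too
    assert(a == b && a == 6);
}
```

Ranges are the idiomatic D choice when the iteration can be expressed with empty/front/popFront; opApply remains handy when the traversal itself is naturally recursive (e.g. tree walks).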