Difference between revisions of "Efficientcodesnippets draft in progress: useful snippets for the D newcomer on writing efficient code"

From D Wiki
Thanks to Adam D Ruppe, Bbaz, Ola Fosheim Grostad
Removed at the request of Bbaz. I shall let someone else take this forward
 
 
Foreach vs for
==============
 
 
 
foreach is just syntax sugar over a for loop. If there are any allocations, it is because your code had some; they are not inherent to the loop. The documentation even lists the translation of foreach to for explicitly in the case of ranges:
 
 
 
http://dlang.org/statement.html#ForeachStatement
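
The range lowering mentioned above can be sketched as follows. This is a minimal illustration, not the exact compiler output; the function names sumForeach and sumFor are made up for the example, and iota from std.range merely stands in for any input range. Both functions are marked @nogc, so the compiler would reject them if either loop needed a GC allocation.

```d
import std.range : iota;

// Sum the range 0 .. 5 with foreach.
int sumForeach() @nogc
{
    int total = 0;
    foreach (x; iota(0, 5))
        total += x;
    return total;
}

// Roughly what the foreach above is rewritten into for a range:
// a plain for loop over empty / front / popFront.
int sumFor() @nogc
{
    int total = 0;
    for (auto r = iota(0, 5); !r.empty; r.popFront())
    {
        auto x = r.front;
        total += x;
    }
    return total;
}

void main()
{
    assert(sumForeach() == 10);
    assert(sumFor() == 10);
}
```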
 
 
 
 
 
The most likely allocation would be for the delegate passed to a user-defined opApply, and you can prevent that by declaring it as opApply(scope your_delegate): the scope keyword prevents any closure allocation.
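
A minimal sketch of such an opApply, assuming a hypothetical Countdown struct and sumCountdown helper invented for this example. The loop body in sumCountdown captures the local variable total, which is the situation where a closure would be needed if the delegate were allowed to escape.

```d
struct Countdown
{
    int start;

    // `scope` promises that the delegate is not stored beyond this call,
    // so the caller's loop body does not need a heap-allocated closure.
    int opApply(scope int delegate(ref int) dg)
    {
        for (int i = start; i > 0; --i)
        {
            int value = i; // pass a copy so the body cannot change the counter
            if (auto result = dg(value))
                return result; // non-zero: the body did break or return
        }
        return 0;
    }
}

// The foreach body captures `total` from the enclosing function.
int sumCountdown(int n)
{
    int total = 0;
    foreach (v; Countdown(n))
        total += v;
    return total;
}

void main()
{
    assert(sumCountdown(4) == 10); // 4 + 3 + 2 + 1
}
```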
 
 
 
There are always significant optimization effects in long-running loops:

- SIMD
- cache locality / prefetching
 
 
 
For the former (SIMD) you need to make sure that good code is generated, either by hand, by using vectorized libraries, or by auto-vectorization.
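
One way to give the compiler and runtime a chance to vectorize is to write element-wise work as a D array operation instead of an explicit loop. The sketch below uses hypothetical function names (addArrays, addLoop); whether SIMD instructions are actually emitted depends on the compiler, the target, and the optimization flags, so inspect the generated code before relying on it.

```d
// Element-wise sum written as a D array operation.
void addArrays(const(float)[] a, const(float)[] b, float[] result)
{
    assert(a.length == b.length && a.length == result.length);
    result[] = a[] + b[];
}

// The same computation as an explicit loop, which an optimizing
// compiler may or may not auto-vectorize.
void addLoop(const(float)[] a, const(float)[] b, float[] result)
{
    assert(a.length == b.length && a.length == result.length);
    foreach (i; 0 .. result.length)
        result[i] = a[i] + b[i];
}

void main()
{
    float[4] a = [1, 2, 3, 4];
    float[4] b = [10, 20, 30, 40];
    float[4] viaArrayOp, viaLoop;

    addArrays(a[], b[], viaArrayOp[]);
    addLoop(a[], b[], viaLoop[]);

    assert(viaArrayOp == [11f, 22f, 33f, 44f]);
    assert(viaArrayOp == viaLoop);
}
```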
 
 
 
For the latter (cache) you need to make sure that the prefetcher can predict the access pattern, or is told explicitly what to prefetch, and also that the working set is small enough to stay in the faster cache levels.
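
As an illustration of an access pattern the prefetcher can follow, the sketch below sums a matrix stored as one flat row-major array in two ways. The function names and the row-major layout are assumptions made for the example. Both functions return the same value; the first walks consecutive addresses, while the second jumps cols elements per step, which tends to be slower once the matrix no longer fits in cache.

```d
// Walks memory sequentially: the inner loop touches consecutive elements.
double sumRowMajor(const(double)[] data, size_t rows, size_t cols)
{
    assert(data.length == rows * cols);
    double total = 0;
    foreach (r; 0 .. rows)
        foreach (c; 0 .. cols)
            total += data[r * cols + c];
    return total;
}

// Same result, but the inner loop strides by `cols` elements per step.
double sumColumnMajor(const(double)[] data, size_t rows, size_t cols)
{
    assert(data.length == rows * cols);
    double total = 0;
    foreach (c; 0 .. cols)
        foreach (r; 0 .. rows)
            total += data[r * cols + c];
    return total;
}

void main()
{
    double[] m = [1, 2, 3,
                  4, 5, 6]; // 2 rows, 3 columns
    assert(sumRowMajor(m, 2, 3) == 21);
    assert(sumColumnMajor(m, 2, 3) == 21);
}
```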
 
 
 
If you want good performance you cannot ignore either of these, and you have to design the data structures and algorithms for it. Prefetching has to happen maybe 100 instructions before the actual load from memory, and AVX requires aligned data (32 bytes for aligned 256-bit loads and stores) and a layout that fits the algorithm. On next-gen Xeon Skylake I think the alignment might go up to 64 bytes and you have 512-bit-wide registers (so you can do 8 64-bit floating point operations in parallel per core). The difference between issuing 1-4 ops and issuing 8-16 per time unit is noticeable...
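
The register-width arithmetic and the alignment requirement can be written down as compile-time checks. This is only a sketch: the lanes template and the Vec4d struct are hypothetical names invented here, and the 32-byte figure is the alignment of aligned 256-bit AVX accesses.

```d
// Number of elements of type T that fit in a register of the given width.
enum size_t lanes(T, size_t registerBits) = registerBits / (T.sizeof * 8);

static assert(lanes!(double, 256) == 4);  // 256-bit AVX register
static assert(lanes!(double, 512) == 8);  // 512-bit register, as in the text
static assert(lanes!(float, 512) == 16);

// Ask the compiler to place every instance on a 32-byte boundary,
// so one instance maps onto one aligned 256-bit load.
align(32) struct Vec4d
{
    double[4] v;
}

static assert(Vec4d.alignof == 32);
static assert(Vec4d.sizeof == 32);

void main()
{
    // Elements of a static array of Vec4d are 32 bytes apart.
    Vec4d[2] pair;
    assert(cast(size_t) &pair[1] - cast(size_t) &pair[0] == 32);
}
```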
 
 
 
And of course, the closer your code is to the theoretical throughput of the CPU, the more critical it becomes not to wait for memory loads.
 
 
 
This is also a moving target...
 
 
 
 
 
opApply vs Range
================
 
http://forum.dlang.org/post/mailman.942.1292183237.21107.digitalmars-d-learn@puremagic.com
 

Latest revision as of 23:54, 1 February 2015
