// Find the logarithm of every number from 1 to // 10_000_000 in parallel. auto logs = new double[10_000_000]; // Parallel foreach works with or without an index // variable. It can iterate by ref if range.front // returns by ref. // Iterate over logs using work units of size 100. foreach (i, ref elem; taskPool.parallel(logs, 100)) { elem = log(i + 1.0); } // Same thing, but use the default work unit size. // // Timings on an Athlon 64 X2 dual core machine: // // Parallel foreach: 388 milliseconds // Regular foreach: 619 milliseconds foreach (i, ref elem; taskPool.parallel(logs)) { elem = log(i + 1.0); }
Notes:
The memory usage of this implementation is guaranteed to be constant in range.length.
Breaking from a parallel foreach loop via a break, labeled break, labeled continue, return or goto statement throws a ParallelForeachError.
In the case of non-random access ranges, parallel foreach buffers lazily to an array of size workUnitSize before executing the parallel portion of the loop. The exception is that, if a parallel foreach is executed over a range returned by asyncBuf or map, the copying is elided and the buffers are simply swapped. In this case workUnitSize is ignored and the work unit size is set to the buffer size of range.
A memory barrier is guaranteed to be executed on exit from the loop, so that results produced by all threads are visible in the calling thread.
Exception Handling:
When at least one exception is thrown from inside a parallel foreach loop, the submission of additional Task objects is terminated as soon as possible, in a non-deterministic manner. All executing or enqueued work units are allowed to complete. Then, all exceptions that were thrown by any work unit are chained using Throwable.next and rethrown. The order of the exception chaining is non-deterministic.
Implements a parallel foreach loop over a range. This works by implicitly creating and submitting one Task to the TaskPool for each worker thread. A work unit is a set of consecutive elements of range to be processed by a worker thread between communication with any other thread. The number of elements processed per work unit is controlled by the workUnitSize parameter. Smaller work units provide better load balancing, but larger work units avoid the overhead of communicating with other threads frequently to fetch the next work unit. Large work units also avoid false sharing in cases where the range is being modified. The less time a single iteration of the loop takes, the larger workUnitSize should be. For very expensive loop bodies, workUnitSize should be 1. An overload that chooses a default work unit size is also available.