TaskPool.WorkerLocalStorage

Struct for creating worker-local storage. Worker-local storage is thread-local storage that exists only for the worker threads in a given TaskPool, plus a single thread outside the pool. It is allocated on the garbage-collected heap in a way that avoids false sharing, and it doesn't necessarily have global scope within any thread. It can be accessed from any worker thread in the TaskPool that created it, and from one thread outside that TaskPool. All threads outside the pool that created a given instance of worker-local storage share a single slot.

Since the underlying data for this struct is heap-allocated, this struct has reference semantics when passed between functions.
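The reference semantics can be illustrated with a minimal sketch. The helper `bump` and the function `referenceSemanticsDemo` are hypothetical names introduced only for this illustration, and everything runs on the calling thread, which uses the single slot reserved for threads outside the pool.

```d
import std.parallelism : TaskPool, taskPool;

// Hypothetical helper: takes the struct by value and increments the
// calling thread's slot.
void bump(TaskPool.WorkerLocalStorage!int counters)
{
    counters.get += 1;
}

// Hypothetical demo: the copies passed to bump() refer to the same
// heap-allocated slots as the caller's variable.
int referenceSemanticsDemo()
{
    auto counters = taskPool.workerLocalStorage(0);
    bump(counters); // copies the struct, not the underlying data
    bump(counters);
    return counters.get; // both increments are visible here
}
```

Calling `referenceSemanticsDemo()` from a single thread should return 2, because passing the struct by value does not duplicate the slots.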

The main use cases for WorkerLocalStorage are:

1. Performing parallel reductions with an imperative, as opposed to functional, programming style. In this case, it's useful to treat WorkerLocalStorage as local to each thread for only the parallel portion of an algorithm.

2. Recycling temporary buffers across iterations of a parallel foreach loop.
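The second use case might look like the following sketch. The function name `sumsWithRecycledBuffers` and its workload are made up for illustration. The sketch relies on the initial value passed to `workerLocalStorage` being a lazy argument that is evaluated once per slot, so each thread ends up with its own array.

```d
import std.parallelism : parallel, taskPool;
import std.range : iota;

// Hypothetical workload: each iteration fills a scratch buffer and sums it.
long[] sumsWithRecycledBuffers(int iterations, size_t bufLen)
{
    auto results = new long[iterations];

    // One scratch buffer per slot; allocated up front, reused afterwards.
    auto scratch = taskPool.workerLocalStorage(new int[bufLen]);

    foreach (i; parallel(iota(iterations)))
    {
        // The current thread's buffer, shared by every iteration that
        // happens to run on this thread.
        int[] buf = scratch.get;
        foreach (j, ref elem; buf)
            elem = i + cast(int) j;

        long total = 0;
        foreach (elem; buf)
            total += elem;

        results[i] = total; // each iteration writes only its own index
    }
    return results;
}
```

With this approach the number of buffer allocations depends on the number of slots rather than on the number of loop iterations.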

class TaskPool
static struct WorkerLocalStorage(T) {}

Members

Functions

get
ref get()

Get the current thread's instance. Returns by ref. Note that calling get from any thread outside the TaskPool that created this instance will return the same reference, so an instance of worker-local storage should only be accessed from one thread outside the pool that created it. If this rule is violated, undefined behavior will result.

get
void get(T val)

Assign a value to the current thread's instance. This function has the same caveats as its overload.
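As a rough sketch of how the two overloads of get fit together, the following uses both from a single thread. The function name `getAndSetDemo` is hypothetical and exists only for this illustration.

```d
import std.parallelism : taskPool;

// Hypothetical demo of the getter and setter overloads, run entirely
// on the calling thread.
int getAndSetDemo()
{
    auto slot = taskPool.workerLocalStorage(10);

    // The getter returns by ref, so the value can be updated in place.
    slot.get += 5;
    assert(slot.get == 15);

    // The setter overload replaces the current thread's value.
    slot.get = 42;
    return slot.get;
}
```

Both forms touch only the calling thread's instance, so the caveat about threads outside the pool applies to each of them.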

toRange
WorkerLocalStorageRange!T toRange()

Returns a range view of the values for all threads, which can be used to further process the results of each thread after running the parallel part of your algorithm. Do not use this method in the parallel portion of your algorithm.
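The range view can also be used to inspect how many slots an instance holds. The sketch below assumes, in line with the description at the top of this page, that there is one slot per worker thread plus one slot shared by threads outside the pool. The function names `countSlots` and `slotCountMatchesPool` are hypothetical.

```d
import std.parallelism : taskPool;

// Hypothetical helper: counts the slots of a fresh instance by walking
// its range view outside any parallel section.
size_t countSlots()
{
    auto storage = taskPool.workerLocalStorage(0);
    size_t n = 0;
    foreach (value; storage.toRange)
        ++n;
    return n;
}

// Assumption: one slot per worker thread plus one shared outside slot.
bool slotCountMatchesPool()
{
    return countSlots() == taskPool.size + 1;
}
```

Note that toRange is called here only after all writes to the storage are finished, which matches the advice not to use it in the parallel portion of an algorithm.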

Examples

import std.parallelism : parallel, taskPool;
import std.range : iota;

// Calculate pi as in our synopsis example, but
// use an imperative instead of a functional style.
immutable n = 1_000_000_000;
immutable delta = 1.0L / n;

auto sums = taskPool.workerLocalStorage(0.0L);
foreach (i; parallel(iota(n)))
{
    immutable x = ( i - 0.5L ) * delta;
    immutable toAdd = delta / ( 1.0 + x * x );
    sums.get += toAdd;
}

// Add up the results from each worker thread.
real pi = 0;
foreach (threadResult; sums.toRange)
{
    pi += 4.0L * threadResult;
}
