largestPartialIntersection

Given a range of sorted forward ranges ror, copies to tgt the elements that are common to most ranges, along with their number of occurrences. All ranges in ror are assumed to be sorted by less. Only the most frequent tgt.length elements are returned.

void largestPartialIntersection(alias less = "a < b", RangeOfRanges, Range)(RangeOfRanges ror, Range tgt, SortOutput sorted = No.sortOutput)

Parameters

less

The predicate the ranges are sorted by.

ror RangeOfRanges

A range of forward ranges sorted by less.

tgt Range

The target range to copy common elements to.

sorted SortOutput

Whether the elements copied should be in sorted order.
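As a small illustration of the sorted parameter, the sketch below passes Yes.sortOutput so that tgt comes back ordered from most to least frequent. The helper name topByFrequency is hypothetical and exists only for this example; the input data is the same as in the Examples section.

```d
import std.algorithm.setops : largestPartialIntersection;
import std.typecons : tuple, Tuple, Yes;

// Hypothetical helper (not part of Phobos): returns the k most frequent
// elements of ror together with their counts, most frequent first.
Tuple!(double, uint)[] topByFrequency(double[][] ror, size_t k)
{
    auto tgt = new Tuple!(double, uint)[k];
    // ror.dup keeps the caller's outer array untouched.
    largestPartialIntersection(ror.dup, tgt, Yes.sortOutput);
    return tgt;
}

void main()
{
    double[][] a =
    [
        [ 1, 4, 7, 8 ],
        [ 1, 7 ],
        [ 1, 7, 8 ],
        [ 4 ],
        [ 7 ],
    ];
    auto top = topByFrequency(a, 2);
    // 7.0 occurs in 4 inputs and 1.0 in 3, so 7.0 comes first.
    assert(top == [tuple(7.0, 4u), tuple(1.0, 3u)]);
}
```

With the default No.sortOutput the same two elements are returned, but their order within tgt should not be relied on.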

The function largestPartialIntersection is useful for e.g. searching an inverted index for the documents most likely to contain some terms of interest. The complexity of the search is O(n * log(tgt.length)), where n is the sum of lengths of all input ranges. This approach is faster than keeping an associative array of the occurrences and then selecting its top items, and also requires less memory (largestPartialIntersection builds its result directly in tgt and requires no extra memory).
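The inverted-index use case can be sketched as follows. The posting lists, variable names, and the helper bestMatches are made up for illustration: each list holds the sorted ids of the documents containing one query term, and the documents that appear in the most lists are the best candidates.

```d
import std.algorithm.setops : largestPartialIntersection;
import std.typecons : tuple, Tuple, Yes;

// Hypothetical helper: returns the k document ids that appear in the most
// posting lists, most frequent first, paired with their match counts.
Tuple!(uint, uint)[] bestMatches(uint[][] postings, size_t k)
{
    auto tgt = new Tuple!(uint, uint)[k];
    // Pass a duplicate of the outer array so the caller's posting lists
    // are not reordered or advanced.
    largestPartialIntersection(postings.dup, tgt, Yes.sortOutput);
    return tgt;
}

void main()
{
    // Made-up posting lists, one per query term, each sorted by id.
    uint[] docsWithRange  = [1, 3, 5, 8];
    uint[] docsWithSorted = [3, 5, 9];
    uint[] docsWithHeap   = [2, 5, 7];

    auto best = bestMatches([docsWithRange, docsWithSorted, docsWithHeap], 2);
    // Document 5 matches all three terms, document 3 matches two.
    assert(best == [tuple(5u, 3u), tuple(3u, 2u)]);
}
```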

If at least one of the ranges is a multiset, then all occurrences of a duplicate element are taken into account. The result is equivalent to merging all ranges and picking the most frequent tgt.length elements.
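To make the equivalence concrete, the sketch below compares the result against a plain counting baseline that tallies every occurrence in an associative array and then sorts by count. The helper topByCounting is hypothetical and is written only for this comparison; it assumes k does not exceed the number of distinct elements.

```d
import std.algorithm.setops : largestPartialIntersection;
import std.algorithm.sorting : sort;
import std.typecons : tuple, Tuple, Yes;

// Hypothetical baseline: count all occurrences, then keep the k largest.
Tuple!(double, uint)[] topByCounting(double[][] ror, size_t k)
{
    uint[double] counts;
    foreach (r; ror)
        foreach (e; r)
            ++counts[e];

    Tuple!(double, uint)[] all;
    foreach (value, count; counts)
        all ~= tuple(value, count);
    // Highest count first.
    sort!((a, b) => a[1] > b[1])(all);
    return all[0 .. k];
}

void main()
{
    // The first range is a multiset: 1 appears four times in it.
    double[][] x =
    [
        [1, 1, 1, 1, 4, 7, 8],
        [1, 7],
        [1, 7, 8],
        [4, 7],
        [7]
    ];
    auto y = new Tuple!(double, uint)[2];
    largestPartialIntersection(x.dup, y, Yes.sortOutput);
    // 1.0 occurs 6 times in total, 7.0 occurs 5 times.
    assert(y == [tuple(1.0, 6u), tuple(7.0, 5u)]);
    assert(y == topByCounting(x, 2));
}
```

When several elements share the same count, the two approaches may legitimately pick different elements among the tied ones, so the comparison is only meaningful for inputs without ties at the cutoff.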

Warning: Because largestPartialIntersection does not allocate extra memory, it will leave ror modified. Namely, largestPartialIntersection assumes ownership of ror and may swap and advance its elements as it sees fit. If you want ror to preserve its contents after the call, pass a duplicate to largestPartialIntersection (and perhaps cache the duplicate between calls).
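The effect of the warning can be observed with a short check. The helper inputSurvives below is hypothetical: it runs the search either on the original array or on a duplicate of it, then compares the caller's array against a deep copy taken beforehand.

```d
import std.algorithm.iteration : map;
import std.algorithm.setops : largestPartialIntersection;
import std.array : array;
import std.typecons : Tuple;

// Hypothetical helper: runs the search and reports whether the caller's
// ranges still hold their original contents afterwards.
bool inputSurvives(bool passDuplicate)
{
    double[][] a = [[1, 4, 7, 8], [1, 7], [1, 7, 8], [4], [7]];
    // Deep copy kept only for the comparison below.
    double[][] snapshot = a.map!(r => r.dup).array;

    auto tgt = new Tuple!(double, uint)[1];
    if (passDuplicate)
        largestPartialIntersection(a.dup, tgt);
    else
        largestPartialIntersection(a, tgt);

    return a == snapshot;
}

void main()
{
    // With a duplicate, the caller's ranges are left as they were.
    assert(inputSurvives(true));
    // Without one, the inner ranges have been advanced.
    assert(!inputSurvives(false));
}
```

For arrays of slices such as double[][], duplicating the outer array with .dup appears to be enough, since the call advances and swaps the slices themselves rather than writing to the underlying elements; other range types may need a different way of copying.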

Examples

import std.algorithm.setops : largestPartialIntersection;
import std.typecons : tuple, Tuple;

// Figure out which number can be found in the most arrays of the set of
// arrays below.
double[][] a =
[
    [ 1, 4, 7, 8 ],
    [ 1, 7 ],
    [ 1, 7, 8],
    [ 4 ],
    [ 7 ],
];
auto b = new Tuple!(double, uint)[1];
// it will modify the input range, hence we need to create a duplicate
largestPartialIntersection(a.dup, b);
// First member is the item, second is the occurrence count
assert(b[0] == tuple(7.0, 4u));
// 7.0 occurs in 4 out of 5 inputs, more than any other number

// If more of the top-frequent numbers are needed, just create a larger
// tgt range
auto c = new Tuple!(double, uint)[2];
largestPartialIntersection(a, c);
assert(c[0] == tuple(1.0, 3u));
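// 1.0 occurs in 3 inputs; sorted defaults to No.sortOutput, so c is not
// ordered by frequency and c[0] need not be the most frequent element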

// multiset
double[][] x =
[
    [1, 1, 1, 1, 4, 7, 8],
    [1, 7],
    [1, 7, 8],
    [4, 7],
    [7]
];
auto y = new Tuple!(double, uint)[2];
largestPartialIntersection(x.dup, y);
// 7.0 occurs 5 times
assert(y[0] == tuple(7.0, 5u));
// 1.0 occurs 6 times
assert(y[1] == tuple(1.0, 6u));
