gapWeightedSimilarityIncremental

Similar to gapWeightedSimilarity, just works in an incremental manner by first revealing the matches of length 1, then gapped matches of length 2, and so on. The memory requirement is O(s.length * t.length). The time complexity is O(s.length * t.length) time for computing each step. Continuing on the previous example:

The implementation is based on the pseudocode in Fig. 4 of the paper "Efﬁcient Computation of Gapped Substring Kernels on Large Alphabets" by Rousu et al., with additional algorithmic and systems-level optimizations.

GapWeightedSimilarityIncremental!(R, F)

(

)

(

R r1

R r2

F penalty

)

Examples

string[] s = ["Hello", "brave", "new", "world"];
string[] t = ["Hello", "new", "world"];
auto simIter = gapWeightedSimilarityIncremental(s, t, 1.0);
assert(simIter.front == 3); // three 1-length matches
simIter.popFront();
assert(simIter.front == 3); // three 2-length matches
simIter.popFront();
assert(simIter.front == 1); // one 3-length match
simIter.popFront();
assert(simIter.empty);     // no more match

gapWeightedSimilarityIncremental

Examples

Meta

Source