InversionList

InversionList is a set of code points represented as an array of open-right [a, b) intervals (see CodepointInterval above). The name comes from the way the representation reads left to right. For instance, a set of all values in [10, 50) and [80, 90), plus the single value 60, looks like this:

10, 50, 60, 61, 80, 90

The way to read this is: start in the negative state, meaning that all numbers smaller than the next one are not present in the set (positive means the contrary). Then switch between positive and negative after each number passed, going from left to right.

This way the set is negative until 10, then positive until 50, then negative until 60, then positive until 61, and so on. As seen, this provides space-efficient storage for highly redundant data that comes in long runs, a description that Unicode character properties fit nicely. The technique itself can be seen as a variation on run-length encoding (RLE).
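The reading rule above can be sketched as a membership test over the raw boundary array. This is a minimal illustration only, not a description of how std.uni performs lookups; contains is a hypothetical helper introduced here.

```d
import std.range : assumeSorted;

// Hypothetical helper (not part of std.uni): membership test on a raw
// inversion list, given as a sorted array of interval boundaries.
bool contains(const(uint)[] boundaries, uint value)
{
    // Count the boundaries that are <= value, i.e. the numbers "passed"
    // when reading left to right up to and including value.
    auto sorted = boundaries.assumeSorted;
    immutable passed = boundaries.length - sorted.upperBound(value).length;
    // Reading starts negative and flips at every boundary, so an odd
    // count means the value lies inside a positive run.
    return passed % 2 == 1;
}

void main()
{
    uint[] list = [10, 50, 60, 61, 80, 90];
    assert(!contains(list, 9));   // before the first boundary: negative
    assert(contains(list, 10));   // [10, 50) is positive
    assert(!contains(list, 50));  // the right end is open
    assert(contains(list, 60));   // the single value 60
    assert(!contains(list, 61));
    assert(contains(list, 89));
    assert(!contains(list, 90));
}
```

The binary search makes each lookup logarithmic in the number of intervals rather than in the number of code points.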

Sets are value types (just like int is), thus they are never aliased.

@safe

struct InversionList (

SP = GcPolicy

) {

auto assumeSorted(R r) via import std.range : assumeSorted;

this(Set set);

this(Range intervals);

this(uint[] intervals);

auto byInterval [@property getter];

bool opIndex(uint val);

size_t length [@property getter];

This opBinary(U rhs);

This opOpAssign(U rhs);

bool opBinaryRight(U ch);

auto opUnary();

auto byCodepoint [@property getter];

void toString(Writer sink, FormatSpec!char fmt);

ref add(uint a, uint b);

auto inverted [@property getter];

string toSourceCode(string funcName);

bool empty [@property getter];

}

Constructors

this this(Set set): Construct from another code point set of any type.
this this(Range intervals): Construct a set from a forward range of code point intervals.
this this(uint[] intervals): Construct a set from plain values of code point intervals.

Members

Functions

add ref add(uint a, uint b)

Add an interval [a, b) to this set.
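A short sketch of building a set incrementally with add. It assumes, as the ref return type suggests, that calls can be chained; the variable name is illustrative.

```d
import std.uni : CodepointSet;

void main()
{
    CodepointSet digitsAndUpper;
    // add returns a reference to the set, so calls can be chained.
    digitsAndUpper.add('0', '5').add('A', 'Z' + 1);
    // ['0', '5') and ['5', '9' + 1) together cover all ASCII digits.
    digitsAndUpper.add('5', '9' + 1);

    assert(digitsAndUpper['0']);
    assert(digitsAndUpper['5']);
    assert(digitsAndUpper['9']);
    assert(digitsAndUpper['Z']);
    assert(!digitsAndUpper['a']);
    assert(digitsAndUpper.length == 10 + 26);
}
```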

opBinary This opBinary(U rhs)

Sets support natural syntax for set algebra, namely:


Operator	Math notation	Description
&	a ∩ b	intersection
|	a ∪ b	union
-	a ∖ b	subtraction
~	a ~ b	symmetric set difference, i.e. (a ∪ b) ∖ (a ∩ b)
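The four operators in the table can be illustrated on two small overlapping sets. The numeric intervals are arbitrary values chosen for this sketch.

```d
import std.uni : CodepointSet;

void main()
{
    auto a = CodepointSet(10, 50); // [10, 50)
    auto b = CodepointSet(40, 60); // [40, 60)

    assert((a & b) == CodepointSet(40, 50));         // intersection
    assert((a | b) == CodepointSet(10, 60));         // union
    assert((a - b) == CodepointSet(10, 40));         // subtraction
    assert((a ~ b) == CodepointSet(10, 40, 50, 60)); // symmetric difference

    // The operands are left unchanged; each operator yields a new set.
    assert(a == CodepointSet(10, 50));
}
```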

opBinaryRight bool opBinaryRight(U ch)

Tests the presence of code point ch in this set, the same as opIndex.

opIndex bool opIndex(uint val)

Tests the presence of code point val in this set.
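A brief sketch showing that indexing and the in operator answer the same membership question:

```d
import std.uni : CodepointSet;

void main()
{
    auto lower = CodepointSet('a', 'z' + 1);

    // opIndex
    assert(lower['q']);
    assert(!lower['Q']);

    // opBinaryRight, i.e. the `in` operator
    assert('q' in lower);
    assert(!('Q' in lower));
}
```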

opOpAssign This opOpAssign(U rhs)

The 'op=' versions of the above overloaded operators.
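As a sketch, the in-place forms modify the left-hand set instead of producing a new one:

```d
import std.uni : CodepointSet;

void main()
{
    auto set = CodepointSet('a', 'z' + 1);

    set |= CodepointSet('A', 'Z' + 1);   // union in place
    assert(set.length == 52);

    set -= CodepointSet('a', 'z' + 1);   // subtraction in place
    assert(set == CodepointSet('A', 'Z' + 1));

    set &= CodepointSet('A', 'F' + 1);   // intersection in place
    assert(set == CodepointSet('A', 'F' + 1));

    set ~= CodepointSet('A', 'C');       // symmetric difference drops the shared 'A' and 'B'
    assert(set == CodepointSet('C', 'F' + 1));
}
```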

opUnary auto opUnary()

Obtains a set that is the inversion of this set.

toSourceCode string toSourceCode(string funcName)

Generates a string containing the D source code of a unary function named funcName that takes a single dchar argument. If funcName is empty, the code is adjusted to be a lambda function.
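A minimal sketch of generating the source text and printing it. The function name isInMySet is a placeholder chosen for this example, and the exact shape of the generated code is not shown here because it is an implementation detail.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;
import std.uni : CodepointSet;

void main()
{
    // Construct the set directly from [a, b) intervals.
    auto set = CodepointSet(10, 12, 45, 65, 100, 200);

    // isInMySet is a placeholder name for the generated function.
    string code = set.toSourceCode("isInMySet");
    writeln(code);

    // The generated text declares a function with the requested name.
    assert(code.canFind("isInMySet"));
}
```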

toString void toString(Writer sink, FormatSpec!char fmt)

Obtain a textual representation of this InversionList in the form of open-right intervals.
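A sketch of the textual form, using the set from the introduction. It assumes the "[a..b)" layout with space-separated intervals that current Phobos releases produce; the format specifier is applied to each interval bound.

```d
import std.conv : to;
import std.format : format;
import std.uni : CodepointSet;

void main()
{
    auto set = CodepointSet(10, 50, 60, 61, 80, 90);

    // Default formatting prints the bounds in decimal.
    assert(set.to!string == "[10..50) [60..61) [80..90)");

    // A hexadecimal spec is applied to every bound.
    assert(format("%#x", set) == "[0xa..0x32) [0x3c..0x3d) [0x50..0x5a)");
}
```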

Properties

byCodepoint auto byCodepoint [@property getter]: A range that spans each code point in this set.
byInterval auto byInterval [@property getter]: A range that spans all of the code point intervals in this InversionList.
empty bool empty [@property getter]: True if this set doesn't contain any code points.
inverted auto inverted [@property getter]: Obtains a set that is the inversion of this set.
length size_t length [@property getter]: Number of code points in this set.
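The properties can be exercised together on one small set. This is a sketch; it relies on the a and b accessors of CodepointInterval mentioned at the top of this page.

```d
import std.algorithm.comparison : equal;
import std.uni : CodepointSet;

void main()
{
    auto set = CodepointSet('A', 'D' + 1, 'a', 'd' + 1);

    assert(!set.empty);
    assert(set.length == 8); // counts code points, not intervals
    assert(set.byCodepoint.equal("ABCDabcd"d));

    // byInterval yields one [a, b) pair per contiguous run.
    size_t intervals;
    foreach (iv; set.byInterval)
    {
        assert(iv.a < iv.b);
        intervals++;
    }
    assert(intervals == 2);

    // A set and its inversion are disjoint and together cover
    // the whole code point range 0 .. 0x110000.
    auto rest = set.inverted;
    assert((set & rest).empty);
    assert((set | rest).length == 0x110000);
}
```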

Examples

auto a = CodepointSet('a', 'z'+1);
auto b = CodepointSet('A', 'Z'+1);
auto c = a;
a = a | b;
assert(a == CodepointSet('A', 'Z'+1, 'a', 'z'+1));
assert(a != c);

See also unicode for simpler construction of sets from predefined ones.

Memory usage is 8 bytes per contiguous interval in a set. The value semantics are achieved by using the copy-on-write (COW) technique, and thus it's not safe to cast this type to shared.

Note:

It's not recommended to rely on the template parameters or the exact type of the current code point set in std.uni. The type and parameters may change when the standard allocators design is finalized. Use isCodepointSet with templates, or just stick with the default alias CodepointSet throughout the whole code base.
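One way to follow this advice is to constrain generic code with isCodepointSet instead of naming the concrete type. The sketch below uses a hypothetical helper, countAscii, introduced only for illustration.

```d
import std.uni : CodepointSet, isCodepointSet;

// Hypothetical helper: accepts any code point set type instead of
// spelling out InversionList and its template parameters.
size_t countAscii(Set)(Set set)
if (isCodepointSet!Set)
{
    // Intersect with the ASCII range [0, 0x80) and count what is left.
    return (set & CodepointSet(0, 0x80)).length;
}

void main()
{
    // Latin lowercase letters plus a non-ASCII range.
    auto letters = CodepointSet('a', 'z' + 1, 0x430, 0x450);
    assert(countAscii(letters) == 26);

    // Non-set arguments are rejected by the template constraint.
    static assert(!__traits(compiles, countAscii(42)));
}
```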