byUTF

Iterates an input range of characters as code units of character type C, transcoding the elements of the range as needed.

UTF sequences that cannot be converted to the specified encoding are either replaced by U+FFFD, per "5.22 Best Practice for U+FFFD Substitution" of the Unicode Standard 6.2, or result in a thrown UTFException. Hence byUTF is not symmetric: converting the output back does not necessarily reproduce the original input.

This algorithm is lazy and does not allocate memory. The @nogc, pure, nothrow, and @safe attributes are inferred from the r parameter.
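As a minimal sketch of the lazy behavior described above, the following transcodes a UTF-8 string to UTF-16 on demand and then, as a separate step, copies the result into an array. The variable names are illustrative only, and the use of std.array.array to materialize the range is an addition for the example, not part of byUTF.

```d
import std.algorithm.comparison : equal;
import std.array : array;
import std.utf : byUTF;

void main()
{
    string utf8 = "hell\u00F6"; // 6 UTF-8 code units

    // Constructing the range does not transcode the whole string up front;
    // UTF-16 code units are produced as the range is consumed.
    auto utf16 = utf8.byUTF!wchar;
    assert(utf16.equal("hell\u00F6"w));

    // Materializing the result is an explicit, separate step. This call
    // allocates; byUTF itself does not.
    auto copy = utf8.byUTF!wchar.array;
    assert(copy.length == 5);
    assert(copy == "hell\u00F6"w);
}
```

Keeping the allocation in a separate call means the transcoding step can still be used in code that must not allocate.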

template byUTF(C, UseReplacementDchar useReplacementDchar = Yes.useReplacementDchar)
ref byUTF(R)(R r)
if (isInputRange!R && isSomeChar!(ElementEncodingType!R))

Parameters

C

char, wchar, or dchar

useReplacementDchar

UseReplacementDchar.yes replaces invalid UTF with replacementDchar; UseReplacementDchar.no throws a UTFException on invalid UTF.

Return Value

A bidirectional range if R is a bidirectional range and not auto-decodable, as defined by std.traits.isAutodecodableString.

A forward range if R is a forward range and not auto-decodable.

Or, if R is a range and it is auto-decodable and is(ElementEncodingType!typeof(r) == C), then the range is passed to byCodeUnit.

Otherwise, an input range of characters.
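The cases above can be checked at compile time. The sketch below is illustrative only: OneShot is a hypothetical single-pass range written for this example, not part of Phobos, and the assertions reflect the rules as stated in this section.

```d
import std.algorithm.comparison : equal;
import std.range.primitives : isBidirectionalRange, isForwardRange, isInputRange;
import std.utf : byCodeUnit, byUTF;

// Hypothetical single-pass source of `char`s: it has no `save` and no `back`.
struct OneShot
{
    string data;
    @property bool empty() const { return data.length == 0; }
    @property char front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

void main()
{
    // Input-only source: the transcoded range is input-only as well.
    auto r = OneShot("abc").byUTF!dchar;
    static assert(isInputRange!(typeof(r)));
    static assert(!isForwardRange!(typeof(r)));
    assert(r.equal("abc"));

    // A string is auto-decodable, so it is routed through byCodeUnit first,
    // and the transcoded range keeps bidirectional access.
    auto w = "hell\u00F6".byUTF!wchar;
    static assert(isBidirectionalRange!(typeof(w)));
    assert(w.back == '\u00F6');

    // Same source and target encoding: the elements are the plain code units.
    assert("hell\u00F6".byUTF!char.equal("hell\u00F6".byCodeUnit));
}
```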

Throws

UTFException if an invalid UTF sequence is encountered and useReplacementDchar is set to UseReplacementDchar.no

GC: Does not use GC if useReplacementDchar is set to UseReplacementDchar.yes
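Because the attributes are inferred, a caller using the default replacement behavior can itself be annotated as non-allocating and non-throwing. The following is a sketch under that assumption; utf16Length is a hypothetical helper named for this example, not a Phobos function.

```d
import std.utf : byUTF;

// Hypothetical helper: counts the UTF-16 code units needed to encode `s`.
// With the default UseReplacementDchar.yes, invalid input is replaced
// rather than reported, so no exception is thrown and the GC is not used.
size_t utf16Length(const(char)[] s) @safe pure nothrow @nogc
{
    size_t n;
    foreach (wchar c; s.byUTF!wchar)
        ++n;
    return n;
}

void main()
{
    assert(utf16Length("hell\u00F6") == 5); // ö fits in one UTF-16 code unit
    assert(utf16Length("\U00010437") == 2); // 𐐷 needs a surrogate pair
}
```

If UseReplacementDchar.no were passed instead, the nothrow and @nogc annotations on such a helper would be expected to fail to compile, since that mode throws UTFException.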

Examples

import std.algorithm.comparison : equal;

// hellö as a range of `char`s, which are UTF-8
assert("hell\u00F6".byUTF!char().equal(['h', 'e', 'l', 'l', 0xC3, 0xB6]));

// `wchar`s are able to hold the ö in a single element (UTF-16 code unit)
assert("hell\u00F6".byUTF!wchar().equal(['h', 'e', 'l', 'l', 'ö']));

// 𐐷 is four code units in UTF-8, two in UTF-16, and one in UTF-32
assert("𐐷".byUTF!char().equal([0xF0, 0x90, 0x90, 0xB7]));
assert("𐐷".byUTF!wchar().equal([0xD801, 0xDC37]));
assert("𐐷".byUTF!dchar().equal([0x00010437]));

// Invalid UTF is either replaced or rejected, depending on useReplacementDchar
import std.algorithm.comparison : equal;
import std.exception : assertThrown;

assert("hello\xF0betty".byChar.byUTF!(dchar, UseReplacementDchar.yes).equal("hello\uFFFDetty"));
assertThrown!UTFException("hello\xF0betty".byChar.byUTF!(dchar, UseReplacementDchar.no).equal("hello betty"));

// The result is a bidirectional range when the source is one
import std.range.primitives;
wchar[] s = ['ă', 'î'];

auto rc = s.byUTF!char;
static assert(isBidirectionalRange!(typeof(rc)));
assert(rc.back == 0xae);
rc.popBack;
assert(rc.back == 0xc3);
rc.popBack;
assert(rc.back == 0x83);
rc.popBack;
assert(rc.back == 0xc4);

auto rw = s.byUTF!wchar;
static assert(isBidirectionalRange!(typeof(rw)));
assert(rw.back == 'î');
rw.popBack;
assert(rw.back == 'ă');

auto rd = s.byUTF!dchar;
static assert(isBidirectionalRange!(typeof(rd)));
assert(rd.back == 'î');
rd.popBack;
assert(rd.back == 'ă');
