unicodeJS

Namespace for all UnicodeJS classes, static methods and static properties.

Classes

TextString

Namespaces

characterclass
graphemebreak
wordbreak

Methods

(static) charRangeArrayRegexp(ranges) → {string}

Make a regexp string for an array of Unicode character ranges.

If either character in a range is above 0xFFFF, then the range will be encoded as multiple surrogate pair ranges. It is an error for a range to overlap with the surrogate range 0xD800-0xDFFF (as this would only match ill-formed strings).

Parameters:
Name    Type   Description
ranges  Array  Array of ranges, each of which is a character or an interval

Returns:

Regexp string for the disjunction of the ranges.

Type
string
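
As a rough illustration of the encoding described above, the sketch below splits a code point above 0xFFFF into its surrogate pair and checks whether a range touches the surrogate block. The helpers `toSurrogatePair` and `overlapsSurrogates` are hypothetical names made up for this example; they are not part of UnicodeJS, and the library's own implementation may differ.

```javascript
// Hypothetical helper: split a code point above 0xFFFF into [high, low] surrogates.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)];
}

// Hypothetical helper: true if the interval [min, max] touches 0xD800-0xDFFF.
function overlapsSurrogates(min, max) {
  return min <= 0xDFFF && max >= 0xD800;
}

// U+1F600..U+1F64F share one high surrogate, so a single pair range is enough.
const emoticons = new RegExp('\\ud83d[\\ude00-\\ude4f]');

console.log(toSurrogatePair(0x1F600).map((u) => u.toString(16))); // [ 'd83d', 'de00' ]
console.log(emoticons.test(String.fromCodePoint(0x1F600))); // true
console.log(overlapsSurrogates(0xD000, 0xE000)); // true
```

Without the `u` flag, a JavaScript regexp matches UTF-16 code units, which is why the range has to be written in terms of surrogate pairs.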

(private, static) codeUnitRange(min, max, [bracket]) → {string}

Return a regexp string for the code unit range min-max.

Parameters:
Name     Type     Attributes  Description
min      number               the minimum code unit in the range.
max      number               the maximum code unit in the range.
bracket  boolean  <optional>  If true, then wrap range in [ ... ]

Returns:

Regexp string which matches the range

Type
string
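
Since this method is private, the following is only a minimal sketch of what such a range builder might look like, not the library's code. The names `escapeUnit` and `codeUnitRangeSketch` are hypothetical, and the lowercase hex output is an assumption of this example.

```javascript
// Hypothetical stand-in for the escaping step: '\u' plus four hex digits.
function escapeUnit(codeUnit) {
  return '\\u' + codeUnit.toString(16).padStart(4, '0');
}

// Sketch: build "min-max" and optionally wrap it in [ ... ].
function codeUnitRangeSketch(min, max, bracket) {
  const body = escapeUnit(min) + '-' + escapeUnit(max);
  return bracket ? '[' + body + ']' : body;
}

const upper = codeUnitRangeSketch(0x41, 0x5A, true);
console.log(upper); // [\u0041-\u005a]
console.log(new RegExp(upper).test('Q')); // true
```

Leaving the brackets off lets a caller concatenate several ranges inside one character class.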

(private, static) getCodeUnitBoxes(ch1, ch2) → {Array.<Object>}

Get a list of boxes in hi-lo surrogate space, corresponding to the given character range.

A box {hi: [x, y], lo: [z, w]} represents a regex [x-y][z-w] to match a surrogate pair.

Suppose ch1 and ch2 have surrogate pairs (hi1, lo1) and (hi2, lo2). Then the range of chars from ch1 to ch2 can be represented as the disjunction of three code unit ranges:

[hi1 - hi1][lo1 - 0xDFFF]
 |
[hi1+1 - hi2-1][0xDC00 - 0xDFFF]
 |
[hi2 - hi2][0xDC00 - lo2]

Often the notation can be optimised (e.g. when hi1 == hi2).

Parameters:
Name  Type    Description
ch1   number  The min character of the range; must be over 0xFFFF
ch2   number  The max character of the range; must be at least ch1

Returns:

A list of boxes where each box is an object with two properties: 'hi' and 'lo'. 'hi' is an array of two numbers representing the range of the high surrogate. 'lo' is an array of two numbers representing the range of the low surrogate.

Type
Array.<Object>
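
The decomposition into boxes can be sketched as follows. This is an illustrative version under stated assumptions, not the library's implementation: `toSurrogatePair` and `boxesForRange` are hypothetical names, and only two simplifications are applied (a single box when hi1 == hi2, and no middle box when hi2 == hi1 + 1). It relies on low surrogates spanning 0xDC00-0xDFFF.

```javascript
// Hypothetical helper: split a code point above 0xFFFF into [high, low] surrogates.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)];
}

// Sketch: list of {hi, lo} boxes covering the code points ch1..ch2 (both > 0xFFFF).
function boxesForRange(ch1, ch2) {
  const [hi1, lo1] = toSurrogatePair(ch1);
  const [hi2, lo2] = toSurrogatePair(ch2);
  if (hi1 === hi2) {
    // Same high surrogate: one box covers the whole range.
    return [{ hi: [hi1, hi1], lo: [lo1, lo2] }];
  }
  const boxes = [{ hi: [hi1, hi1], lo: [lo1, 0xDFFF] }];
  if (hi1 + 1 <= hi2 - 1) {
    // High surrogates strictly between hi1 and hi2 take every low surrogate.
    boxes.push({ hi: [hi1 + 1, hi2 - 1], lo: [0xDC00, 0xDFFF] });
  }
  boxes.push({ hi: [hi2, hi2], lo: [0xDC00, lo2] });
  return boxes;
}

console.log(boxesForRange(0x1F600, 0x1F64F).length); // 1
console.log(boxesForRange(0x10000, 0x10FFFF).length); // 3
```

A fuller version could also merge the first or last box into the middle one when lo1 is 0xDC00 or lo2 is 0xDFFF.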

(static) isLeadingSurrogate(unit) → {boolean}

Check if a code unit is the leading half of a surrogate pair.

Parameters:
Name  Type    Description
unit  string  Code unit

Returns:
Type
boolean

(static) isTrailingSurrogate(unit) → {boolean}

Check if a code unit is the trailing half of a surrogate pair.

Parameters:
Name  Type    Description
unit  string  Code unit

Returns:
Type
boolean
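
To make the two surrogate checks concrete, here is a small standalone sketch. `isLeadingSketch` and `isTrailingSketch` are hypothetical re-implementations written for this example, assuming the code unit is passed as a one-character string as the parameter tables state; the real functions may be implemented differently.

```javascript
// Leading (high) surrogates occupy 0xD800-0xDBFF.
function isLeadingSketch(unit) {
  return /^[\uD800-\uDBFF]$/.test(unit);
}

// Trailing (low) surrogates occupy 0xDC00-0xDFFF.
function isTrailingSketch(unit) {
  return /^[\uDC00-\uDFFF]$/.test(unit);
}

// U+1F600 is stored as two UTF-16 code units.
const smiley = String.fromCodePoint(0x1F600);
console.log(isLeadingSketch(smiley[0]), isTrailingSketch(smiley[1])); // true true
console.log(isLeadingSketch('a')); // false
```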

(private, static) uEsc(codeUnit) → {string}

Write a UTF-16 code unit as a JavaScript string literal.

Parameters:
Name      Type    Description
codeUnit  number  integer between 0x0000 and 0xFFFF

Returns:

String literal ('\u' followed by 4 hex digits)

Type
string
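
One possible way to produce such an escape is sketched below. `uEscSketch` is a hypothetical name, and emitting lowercase hex digits is an assumption of this example; the private method itself may format the digits differently.

```javascript
// Sketch: '\u' followed by the code unit as four zero-padded hex digits.
function uEscSketch(codeUnit) {
  return '\\u' + codeUnit.toString(16).padStart(4, '0');
}

console.log(uEscSketch(0x41)); // \u0041
console.log(uEscSketch(0xD83D)); // \ud83d
console.log(new RegExp(uEscSketch(0x41)).test('A')); // true
```

Escaping every code unit this way keeps the generated regexp source plain ASCII, even when it describes surrogate ranges.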