unicodeJS

Namespace for all UnicodeJS classes, static methods and static properties.

Namespaces

characterclass
graphemebreak
wordbreak

Classes

TextString

Methods

charRangeArrayRegexp(ranges) → {string} static

Make a regexp string for an array of Unicode character ranges.

If either character in a range is above 0xFFFF, then the range will be encoded as multiple surrogate pair ranges. It is an error for a range to overlap with the surrogate range 0xD800-0xDFFF (as this would only match ill-formed strings).

Parameters:

Name Type Description
ranges Array

Array of ranges, each of which is a character or an interval

Returns:

Regexp string for the disjunction of the ranges.

Type
string
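To illustrate why a range above 0xFFFF has to be written as surrogate pair ranges, the sketch below splits two code points into UTF-16 code units and tests a hand-written regexp string of the kind described. The helper `toSurrogatePair` and the pattern are illustrative assumptions, not part of UnicodeJS and not its actual output.

```javascript
// Hypothetical helper (not UnicodeJS API): split a code point above 0xFFFF
// into its UTF-16 leading and trailing surrogate code units.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  return {
    hi: 0xD800 + (offset >> 10),
    lo: 0xDC00 + (offset & 0x3FF)
  };
}

// U+1F600..U+1F64F share one leading surrogate, so a single
// hi[lo1-lo2] pattern covers the whole range.
const first = toSurrogatePair(0x1F600); // hi 0xD83D, lo 0xDE00
const last = toSurrogatePair(0x1F64F);  // hi 0xD83D, lo 0xDE4F

// Hand-written regexp string (an assumption about the general shape only).
const pattern = '\\uD83D[\\uDE00-\\uDE4F]';
const re = new RegExp('^(?:' + pattern + ')$');

console.log(re.test('\u{1F600}')); // true
console.log(re.test('\u{1F650}')); // false
```

Without the `u` flag a JavaScript regexp matches code units, which is why the pattern is written per surrogate rather than per character.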

codeUnitRange(min, max, [bracket]) → {string} private static

Return a regexp string for the code unit range min-max.

Parameters:

Name Type Attributes Description
min number

the minimum code unit in the range.

max number

the maximum code unit in the range.

bracket boolean optional

If true, then wrap range in [ ... ]

Returns:

Regexp string which matches the range

Type
string
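A minimal sketch of what such a helper might look like, assuming each code unit is written as a `\uXXXX` escape. The names `escapeUnit` and `codeUnitRangeSketch` are hypothetical, and the hex casing and the handling of min equal to max are assumptions, since the description above does not specify them.

```javascript
// Hypothetical stand-in for the escaping helper: '\u' plus 4 hex digits.
function escapeUnit(codeUnit) {
  return '\\u' + codeUnit.toString(16).toUpperCase().padStart(4, '0');
}

// Sketch only: builds "min-max" and optionally wraps it in [ ... ].
// It does not special-case min === max.
function codeUnitRangeSketch(min, max, bracket) {
  const range = escapeUnit(min) + '-' + escapeUnit(max);
  return bracket ? '[' + range + ']' : range;
}

console.log(codeUnitRangeSketch(0xDC00, 0xDFFF, true)); // [\uDC00-\uDFFF]
console.log(codeUnitRangeSketch(0x0041, 0x005A));       // \u0041-\u005A
```

The unbracketed form is useful when several ranges are concatenated inside one shared character class.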

getCodeUnitBoxes(ch1, ch2) → {Array.<Object>} private static

Get a list of boxes in hi-lo surrogate space, corresponding to the given character range.

A box {hi: [x, y], lo: [z, w]} represents a regex [x-y][z-w] to match a surrogate pair.

Suppose ch1 and ch2 have surrogate pairs (hi1, lo1) and (hi2, lo2). Then the range of chars from ch1 to ch2 can be represented as the disjunction of three boxes, each a pair of code unit ranges:

[hi1 - hi1][lo1 - 0xDFFF]
 |
[hi1+1 - hi2-1][0xDC00 - 0xDFFF]
 |
[hi2 - hi2][0xDC00 - lo2]

Often the notation can be optimised (e.g. when hi1 == hi2, the single box [hi1 - hi1][lo1 - lo2] suffices).

Parameters:

Name Type Description
ch1 number

The min character of the range; must be over 0xFFFF

ch2 number

The max character of the range; must be at least ch1

Returns:

A list of boxes where each box is an object with two properties: 'hi' and 'lo'. 'hi' is an array of two numbers representing the range of the high surrogate. 'lo' is an array of two numbers representing the range of the low surrogate.

Type
Array.<Object>
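The decomposition described above can be sketched as follows. This is an unoptimised illustration, not the library's implementation: `splitSurrogates` and `codeUnitBoxesSketch` are hypothetical names, the low surrogate bounds are taken as 0xDC00 and 0xDFFF (the UTF-16 trailing surrogate range), and the only shortcut applied is the hi1 == hi2 case.

```javascript
const LO_MIN = 0xDC00; // lowest trailing surrogate
const LO_MAX = 0xDFFF; // highest trailing surrogate

// Hypothetical helper: code point above 0xFFFF -> surrogate pair.
function splitSurrogates(ch) {
  const offset = ch - 0x10000;
  return { hi: 0xD800 + (offset >> 10), lo: 0xDC00 + (offset & 0x3FF) };
}

// Sketch only; assumes 0xFFFF < ch1 <= ch2 <= 0x10FFFF.
function codeUnitBoxesSketch(ch1, ch2) {
  const a = splitSurrogates(ch1);
  const b = splitSurrogates(ch2);
  if (a.hi === b.hi) {
    // Same leading surrogate: one box is enough.
    return [{ hi: [a.hi, a.hi], lo: [a.lo, b.lo] }];
  }
  // First box: rest of the first leading surrogate's row.
  const boxes = [{ hi: [a.hi, a.hi], lo: [a.lo, LO_MAX] }];
  if (a.hi + 1 <= b.hi - 1) {
    // Middle box: every full row strictly between hi1 and hi2.
    boxes.push({ hi: [a.hi + 1, b.hi - 1], lo: [LO_MIN, LO_MAX] });
  }
  // Last box: start of the final leading surrogate's row.
  boxes.push({ hi: [b.hi, b.hi], lo: [LO_MIN, b.lo] });
  return boxes;
}

console.log(codeUnitBoxesSketch(0x1FFFE, 0x20001));
```

For U+1FFFE to U+20001 the middle box is empty and is skipped, leaving two boxes that together cover exactly four characters.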

isLeadingSurrogate(unit) → {boolean} static

Check if a code unit is the leading half of a surrogate pair.

Parameters:

Name Type Description
unit string

Code unit

Returns:

Type
boolean

isTrailingSurrogate(unit) → {boolean} static

Check if a code unit is the trailing half of a surrogate pair.

Parameters:

Name Type Description
unit string

Code unit

Returns:

Type
boolean
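The two surrogate checks can be sketched as below, assuming `unit` is a string holding exactly one UTF-16 code unit, as the parameter type suggests. The function names are hypothetical stand-ins. The ranges are those defined by UTF-16: 0xD800-0xDBFF for leading surrogates and 0xDC00-0xDFFF for trailing surrogates.

```javascript
// Sketch only: a leading (high) surrogate lies in 0xD800-0xDBFF.
function isLeadingSketch(unit) {
  return /^[\uD800-\uDBFF]$/.test(unit);
}

// Sketch only: a trailing (low) surrogate lies in 0xDC00-0xDFFF.
function isTrailingSketch(unit) {
  return /^[\uDC00-\uDFFF]$/.test(unit);
}

const emoji = '\u{1F600}'; // stored as two code units: \uD83D \uDE00
console.log(isLeadingSketch(emoji[0]));  // true
console.log(isTrailingSketch(emoji[1])); // true
console.log(isLeadingSketch('A'));       // false
```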

uEsc(codeUnit) → {string} private static

Write a UTF-16 code unit as a JavaScript string literal.

Parameters:

Name Type Description
codeUnit number

Integer between 0x0000 and 0xFFFF

Returns:

String literal ('\u' followed by 4 hex digits)

Type
string
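A minimal sketch of such an escape, assuming lowercase hex digits (the casing is not stated above). `uEscSketch` is a hypothetical name, not the library function.

```javascript
// Sketch only: returns the six-character text \uXXXX for one code unit.
function uEscSketch(codeUnit) {
  return '\\u' + codeUnit.toString(16).padStart(4, '0');
}

console.log(uEscSketch(0x41));   // \u0041
console.log(uEscSketch(0xD83D)); // \ud83d

// The escape text can be passed to the RegExp constructor to match the unit.
console.log(new RegExp('^' + uEscSketch(0x41) + '$').test('A')); // true
```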