tools.chars — Character Based Helper Functions

Character based helper functions (not wiki-dependent).

tools.chars.contains_invisible(text)

Return True if the text contains any invisible characters.

tools.chars.replace_invisible(text)

Replace each invisible character with ‘<codepoint>’.
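
The two helpers above can be illustrated with a minimal, self-contained sketch. It is not the library's code: the names contains_invisible_sketch and replace_invisible_sketch are hypothetical, the sketch assumes that "invisible" means Unicode general category Cf (format characters), and it assumes the code point is written in lowercase hexadecimal. The real functions may use a different character set and a different code point format.

```python
import unicodedata


def _is_invisible(char):
    # Assumption: treat Unicode "format" characters (category Cf) as invisible.
    return unicodedata.category(char) == 'Cf'


def contains_invisible_sketch(text):
    """Return True if any character of text is invisible (per the assumption)."""
    return any(_is_invisible(char) for char in text)


def replace_invisible_sketch(text):
    """Replace each invisible character with '<codepoint>'."""
    # Assumption: the code point is rendered as lowercase hexadecimal.
    return ''.join(
        f'<{ord(char):x}>' if _is_invisible(char) else char
        for char in text
    )
```

With these assumptions, a zero-width space (U+200B) hidden inside 'a\u200bb' is detected by the first helper and rendered as 'a<200b>b' by the second.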

tools.chars.string2html(string, encoding)

Convert unicode string to requested HTML encoding.

Attempt to encode the string into the desired format; if that works, return it unchanged. Otherwise, encode the non-ASCII characters as HTML &#; entities.

Example:

>>> string2html('Referências', 'utf-8')
'Referências'
>>> string2html('Referências', 'ascii')
'Refer&#234;ncias'
>>> string2html('脚注', 'euc_jp')
'脚注'
>>> string2html('脚注', 'iso-8859-1')
'&#33050;&#27880;'
Parameters:
  • string (str) – String to update

  • encoding (str) – Encoding to use

Return type:

str
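
The try-then-fall-back behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: string2html_sketch is a hypothetical name, and the fallback relies on Python's built-in 'xmlcharrefreplace' error handler, which may treat some characters (for example control characters) differently from the real function.

```python
def string2html_sketch(string, encoding):
    """Return string unchanged if it encodes cleanly, else use &#; entities."""
    try:
        # Only a check: the encoded bytes are discarded.
        string.encode(encoding)
    except UnicodeError:
        # Assumption: numeric character references for every non-ASCII char.
        return string.encode('ascii', 'xmlcharrefreplace').decode('ascii')
    return string
```

Called as string2html_sketch('Referências', 'ascii'), the sketch yields 'Refer&#234;ncias', matching the documented example; with 'utf-8' the input comes back unchanged.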

tools.chars.string_to_ascii_html(string)

Convert the non-ASCII characters of a string to HTML entities; ASCII characters are left unchanged.

Example:

>>> string_to_ascii_html('Python')
'Python'
>>> string_to_ascii_html("Pywikibot's API")
"Pywikibot's API"
>>> string_to_ascii_html('Eetße Joohunndot füür Kreůßtůß')
'Eet&#223;e Joohunndot f&#252;&#252;r Kre&#367;&#223;t&#367;&#223;'
Parameters:

string (str) – String to update

Return type:

str
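
A character-by-character version of this conversion might look like the sketch below. The name string_to_ascii_html_sketch is hypothetical, and the cut-off used here (code points below 128 pass through) is an assumption; the real function may also escape some ASCII control characters.

```python
def string_to_ascii_html_sketch(string):
    """Replace every non-ASCII character with a decimal &#N; entity."""
    parts = []
    for char in string:
        codepoint = ord(char)
        # Assumption: everything in the 7-bit ASCII range is kept verbatim.
        parts.append(char if codepoint < 128 else f'&#{codepoint};')
    return ''.join(parts)
```

For the documented input 'Eetße Joohunndot füür Kreůßtůß', the sketch produces the same entity string as the example above.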

tools.chars.url2string(title, encodings='utf-8')

Convert URL-encoded text to Unicode, trying one or more encodings.

Uses the first encoding that doesn’t cause an error. Raises the first exception if all encodings fail.

For a single encoding string, this function is equivalent to urllib.parse.unquote(title, encodings, errors='strict').

Changed in version 8.4: Ignore LookupError and try other encodings.

Example:

>>> url2string('abc%20def')
'abc def'
>>> url2string('/El%20Ni%C3%B1o/')
'/El Niño/'
>>> url2string('/El%20Ni%C3%B1o/', 'ascii')
Traceback (most recent call last):
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6:...
>>> url2string('/El%20Ni%C3%B1o/', ['ascii', 'utf-8'])
'/El Niño/'
Parameters:
  • title (str) – URL-encoded character data to convert

  • encodings (str | Iterable[str]) – Encodings to attempt to use during conversion.

Raises:
  • UnicodeError – Could not convert using any encoding.

  • LookupError – unknown encoding

Return type:

str
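
The fallback over several encodings can be sketched with the standard library alone. This is a minimal illustration, not the library's code: url2string_sketch is a hypothetical name, and raising ValueError for an empty list of encodings is an assumption that the documentation above does not cover.

```python
from urllib.parse import unquote


def url2string_sketch(title, encodings='utf-8'):
    """Decode URL-encoded text with the first encoding that succeeds."""
    if isinstance(encodings, str):
        encodings = [encodings]
    first_error = None
    for encoding in encodings:
        try:
            return unquote(title, encoding, errors='strict')
        except (UnicodeError, LookupError) as error:
            # Keep only the first failure; it is re-raised if nothing works.
            if first_error is None:
                first_error = error
    if first_error is None:
        # Assumption: an empty iterable of encodings is a caller error.
        raise ValueError('no encodings given')
    raise first_error
```

With ['ascii', 'utf-8'], the ASCII attempt fails on the byte 0xc3 and the UTF-8 attempt then returns '/El Niño/'. An unknown codec name is skipped in the same way as long as a later encoding succeeds.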