tools.chars
Character Based Helper Functions
Character based helper functions (not wiki-dependent).
- tools.chars.contains_invisible(text)
Return True if the text contains any of the invisible characters.
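The exact set of characters treated as invisible is not listed here. As a rough illustration only, the sketch below assumes that "invisible" means Unicode format characters (general category `Cf`, such as ZERO WIDTH SPACE); the name `contains_invisible_sketch` is hypothetical and the library's real character set may differ.

```python
import unicodedata


def contains_invisible_sketch(text: str) -> bool:
    """Return True if text holds any Unicode format (Cf) character.

    Assumption: 'invisible' is approximated by general category Cf.
    This is not necessarily the set used by tools.chars.
    """
    return any(unicodedata.category(char) == 'Cf' for char in text)


# U+200B (ZERO WIDTH SPACE) is a format character; plain ASCII is not.
print(contains_invisible_sketch('a\u200bb'))
print(contains_invisible_sketch('abc'))
```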
- tools.chars.string2html(string, encoding)
Convert unicode string to requested HTML encoding.
Attempt to encode the string into the desired format; if that works, return it unchanged. Otherwise, encode the non-ASCII characters into HTML &#; entities.
Example:
>>> string2html('Referências', 'utf-8')
'Referências'
>>> string2html('Referências', 'ascii')
'Refer&#234;ncias'
>>> string2html('脚注', 'euc_jp')
'脚注'
>>> string2html('脚注', 'iso-8859-1')
'&#33050;&#27880;'
- Parameters:
string (str) – String to update
encoding (str) – Encoding to use
- Return type:
str
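The try-then-fall-back behaviour described for string2html can be sketched with the standard library alone. The function name `string2html_sketch` is hypothetical, and the use of the `xmlcharrefreplace` error handler is an assumption made for brevity; it is not necessarily how the library implements the fallback.

```python
def string2html_sketch(string: str, encoding: str) -> str:
    """Return string unchanged if it fits encoding, else use entities."""
    try:
        # Only a probe: the encoded bytes are discarded.
        string.encode(encoding)
    except UnicodeError:
        # Assumption: replace every non-ASCII character with a
        # numeric character reference such as &#234;.
        return string.encode('ascii', 'xmlcharrefreplace').decode('ascii')
    return string


print(string2html_sketch('Referências', 'utf-8'))
print(string2html_sketch('Referências', 'ascii'))
```

Note that the fallback escapes all non-ASCII characters, not only those the target encoding cannot represent.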
- tools.chars.string_to_ascii_html(string)
Convert unicode chars of str to HTML entities if chars are not ASCII.
Example:
>>> string_to_ascii_html('Python')
'Python'
>>> string_to_ascii_html("Pywikibot's API")
"Pywikibot's API"
>>> string_to_ascii_html('Eetße Joohunndot füür Kreůßtůß')
'Eet&#223;e Joohunndot f&#252;&#252;r Kre&#367;&#223;t&#367;&#223;'
- Parameters:
string (str) – String to update
- Return type:
str
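A character-by-character view of this conversion may help: each code point above 127 becomes a decimal numeric reference, and the result can be turned back into the original text with the standard library's `html.unescape`. The helper name `to_ascii_html_sketch` is hypothetical and only illustrates the idea.

```python
import html


def to_ascii_html_sketch(string: str) -> str:
    """Replace each non-ASCII character with a decimal &#N; reference."""
    return ''.join(
        char if ord(char) < 128 else f'&#{ord(char)};'
        for char in string
    )


original = 'Eetße Joohunndot füür Kreůßtůß'
escaped = to_ascii_html_sketch(original)
print(escaped)

# The conversion is lossless: unescaping restores the original text.
print(html.unescape(escaped) == original)
```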
- tools.chars.url2string(title, encodings='utf-8')
Convert URL-encoded text to unicode, trying several encodings.
Uses the first encoding that doesn’t cause an error. Raises the first exception if all encodings fail.
For a single encoding string, this function is equivalent to
urllib.parse.unquote(title, encodings, errors='strict')
Changed in version 8.4: Ignore LookupError and try other encodings.
Example:
>>> url2string('abc%20def')
'abc def'
>>> url2string('/El%20Ni%C3%B1o/')
'/El Niño/'
>>> url2string('/El%20Ni%C3%B1o/', 'ascii')
Traceback (most recent call last):
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6:...
>>> url2string('/El%20Ni%C3%B1o/', ['ascii', 'utf-8'])
'/El Niño/'
- Parameters:
title (str) – URL-encoded character data to convert
encodings (str | Iterable[str]) – Encodings to attempt to use during conversion.
- Raises:
UnicodeError – Could not convert using any encoding.
LookupError – Unknown encoding
- Return type:
str