Scripts package#

The scripts folder contains predefined scripts that are easy to use.

Scripts are only available with Pywikibot if it is installed in directory mode and not as a site package. They can be run from the command line using the pwb wrapper script:

python pwb.py <global options> <name_of_script> <options>

Every script provides a -help option which shows all available options, their explanation and usage examples. Global options are shown by -help:global or by using:

python pwb.py -help

The advantages of the pwb.py wrapper script are:

  • check for framework and script dependencies and show a warning if a package is missing or outdated or if the Python release is not supported

  • check whether the user config file (user-config.py) is available and ask to create it by starting the generate_user_files.py script

  • enable global options even if a script does not support them

  • start private scripts located in userscripts sub-folder

  • find a script even if the given script name does not match a filename, e.g. due to a spelling mistake
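
For example, a typical call combines global options (before the script name) with script options (after it); the page title below is only illustrative:

python pwb.py -lang:de -family:wikipedia -simulate add_text -page:"Beispiel" -text:"hello world"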

scripts.base_dir = PosixPath('/src/scripts')#

Defines the entry point for the pywikibot-scripts package.

add_text script#

Append text to the top or bottom of a page

By default this adds the text to the bottom of the page, above the categories and interwiki links.

Use the following command line parameters to specify what to add:

-text

(str) Text to append. "\n" sequences are interpreted as newlines.

-textfile

(str) Path to a file with text to append

-summary

(str) Change summary to use

-up

Append text to the top of the page rather than the bottom

-create

Create the page if necessary. Note that talk pages are created even without this option.

-createonly

Only create the page but do not edit existing ones

-always

If used, the bot won’t ask if it should add the specified text

-major

If used, the edit will be saved without the “minor edit” flag

-talk, -talkpage

Put the text onto the talk page instead

-excepturl

(str) Skip pages with a url that matches this regular expression

-noreorder

Place the text beneath the categories and interwiki

Furthermore, the following can be used to specify which pages to process…

This script supports use of pagegenerators arguments.

Examples

Append ‘hello world’ to the bottom of the sandbox:

python pwb.py add_text -page:Wikipedia:Sandbox
-summary:"Bot: pywikibot practice" -text:"hello world"

Add a template to the top of the pages with ‘category:catname’:

python pwb.py add_text -cat:catname -summary:"Bot: Adding a template"
-text:"{{Something}}" -except:"\{\{([Tt]emplate:|)[Ss]omething" -up

Command used on it.wikipedia to put the template in the page without any category:

python pwb.py add_text -except:"\{\{([Tt]emplate:|)[Cc]ategorizzare"
-text:"{{Categorizzare}}" -excepturl:"class='catlinks'>" -uncat
-summary:"Bot: Aggiungo template Categorizzare"
class scripts.add_text.AddTextBot(**kwargs)[source]#

Bases: AutomaticTWSummaryBot, ExistingPageBot

A bot which adds a text to a page.

Parameters:

kwargs (Any) – bot options

Keyword Arguments:

generator – a generator processed by run() method

setup()[source]#

Read text to be added from file.

Return type:

None

skip_page(page)[source]#

Skip if -excepturl matches or the page does not exist.

summary_key: str | None = 'add_text-adding'#

Must be defined in subclasses.

property summary_parameters#

Return a dictionary of all parameters for i18n.

Line breaks are replaced by a dash.

treat_page()[source]#

Add text to the page.

Return type:

None

update_options: dict[str, Any] = {'always': False, 'create': False, 'createonly': False, 'minor': True, 'regex_skip_url': '', 'reorder': True, 'summary': '', 'talk_page': False, 'text': '', 'textfile': '', 'up': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.

scripts.add_text.main(*argv)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

argv (str) – Command line arguments

Return type:

None

scripts.add_text.parse(argv, generator_factory)[source]#

Parse our arguments and provide a dictionary with their values.

Parameters:
  • argv (Sequence[str]) – input arguments to be parsed

  • generator_factory (GeneratorFactory) – factory that will determine what pages to process

Returns:

dictionary with our parsed arguments

Raises:

ValueError – if we receive invalid arguments

Return type:

dict[str, bool | str]
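
The bot can also be driven from your own Python code instead of the command line. The following is a minimal sketch, assuming a category named Example exists; it builds a page generator with pagegenerators.GeneratorFactory and passes the options documented above as keyword arguments:

import pywikibot
from pywikibot import pagegenerators
from scripts.add_text import AddTextBot

site = pywikibot.Site()
gen_factory = pagegenerators.GeneratorFactory(site)
gen_factory.handle_arg('-cat:Example')  # any pagegenerators argument works here
generator = gen_factory.getCombinedGenerator(preload=True)

bot = AddTextBot(generator=generator,
                 text='{{Something}}',
                 summary='Bot: adding a template',
                 up=True)
bot.run()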

archivebot script#

archivebot.py - discussion page archiving bot

usage:

python pwb.py archivebot [OPTIONS] [TEMPLATE_PAGE]

Several TEMPLATE_PAGE templates can be given at once. The default is User:MiszaBot/config. The bot examines backlinks (Special:WhatLinksHere) to all TEMPLATE_PAGE templates, then goes through all pages (unless a specific page is specified using options) and archives old discussions. This is done by breaking a page into threads, then scanning each thread for timestamps. Threads older than a specified threshold are moved to another page (the archive), which can be named either based on the thread’s name, or the name can contain a counter which is incremented when the archive reaches a certain size.

The transcluded template may contain the following parameters:

{{TEMPLATE_PAGE
|archive =
|algo =
|counter =
|maxarchivesize =
|minthreadsleft =
|minthreadstoarchive =
|archiveheader =
|key =
}}

Meanings of parameters are:

archive

Name of the page to which archived threads will be put. Must be a subpage of the current page. Variables are supported.

algo

Specifies the maximum age of a thread. Must be in the form old(<delay>) where <delay> specifies the age in seconds (s), hours (h), days (d), weeks (w), or years (y) like 24h or 5d. Default is old(24h).

counter

The current value of a counter which can be used as a variable. Will be updated by the bot. Initial value is 1.

maxarchivesize

The maximum archive size before incrementing the counter. The value can be given by appending a letter like K or M, which indicates KByte or MByte. Default value is 200K.

minthreadsleft

Minimum number of threads that should be left on a page. Default value is 5.

minthreadstoarchive

The minimum number of threads to archive at once. Default value is 2.

archiveheader

Content that will be put on new archive pages as the header. This parameter supports the use of variables. Default value is {{talkarchive}}.

key

A secret key that (if valid) allows archives not to be subpages of the page being archived.

The variables below can be used in the value for “archive” in the template above; numbers are Latin digits. Alternatively you may use localized digits. This is only available for a few site languages. Refer to NON_LATIN_DIGITS to see whether there is a localized one.

latin                localized             Description
%(counter)d          %(localcounter)s      the current value of the counter
%(year)d             %(localyear)s         year of the thread being archived
%(isoyear)d          %(localisoyear)s      ISO year of the thread being archived
%(isoweek)d          %(localisoweek)s      ISO week number of the thread being archived
%(semester)d         %(localsemester)s     semester term of the year of the thread being archived
%(quarter)d          %(localquarter)s      quarter of the year of the thread being archived
%(month)d            %(localmonth)s        month (as a number 1-12) of the thread being archived
%(monthname)s                              localized name of the month above
%(monthnameshort)s                         first three letters of the name above
%(week)d             %(localweek)s         week number of the thread being archived

The ISO calendar starts with the Monday of the week which has at least four days in the new year. If January 1st falls between Monday and Thursday (inclusive), the first week of that year starts with the Monday of that week, which may lie in the previous year if January 1st is not a Monday. If it falls between Friday and Sunday (inclusive), the following week is the first week of the year, so up to three days at the start of January are still counted as belonging to the previous year.
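
For example, a configuration that names archives with the counter variable might look like this; the page names are placeholders only:

{{User:MiszaBot/config
|archive = Talk:Example/Archive %(counter)d
|algo = old(30d)
|counter = 1
|maxarchivesize = 150K
}}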

Options (may be omitted):

-help

show this help message and exit

-calc:PAGE

calculate key for PAGE and exit

-file:FILE

load list of pages from FILE

-force

override security options

-locale:LOCALE

switch to locale LOCALE

-namespace:NS

only archive pages from a given namespace

-page:PAGE

archive a single PAGE, default ns is a user talk page

-salt:SALT

specify salt

-keep

Preserve thread order in archive even if threads are archived later

-sort

Sort archive by timestamp; should not be used with keep

-async

Run the bot in parallel tasks.
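
For illustration, a plain run on the default template and a key calculation could look like this; the page title and salt are placeholders:

python pwb.py archivebot "User:MiszaBot/config"
python pwb.py archivebot -calc:"Talk:Example" -salt:"mysecret"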

Changed in version 7.6: Localized variables for “archive” template parameter are supported. User:MiszaBot/config is the default template. -keep option was added.

Changed in version 7.7: -sort and -async options were added.

Changed in version 8.2: KeyboardInterrupt was enabled with -async option.

exception scripts.archivebot.ArchiveBotSiteConfigError(arg)[source]#

Bases: Error

An error originating from archivebot’s on-site configuration.

Parameters:

arg (Exception | str)

Return type:

None

exception scripts.archivebot.ArchiveSecurityError(arg)[source]#

Bases: ArchiveBotSiteConfigError

Page title is not a valid archive of page being archived.

The page title is neither a subpage of the page being archived, nor does it match the key specified in the archive configuration template.

Parameters:

arg (Exception | str)

Return type:

None

class scripts.archivebot.DiscussionPage(source, archiver, params=None, keep=False)[source]#

Bases: Page

A class that represents a single page of discussion threads.

Feed threads to it and run an update() afterwards.

feed_thread(thread, max_archive_size)[source]#

Append a new thread to the archive.

Parameters:
Return type:

bool

is_full(max_archive_size)[source]#

Check whether the archive size is exceeded.

Parameters:

max_archive_size (tuple[int, str])

Return type:

bool

load_page()[source]#

Load the page to be archived and break it up into threads.

Changed in version 7.6: If the -keep option is given, run through all threads and set the current timestamp to the previous one if the current is lower.

Changed in version 7.7: Load unsigned threads using timestamp of the next thread.

Return type:

None

static max(ts1, ts2)[source]#

Calculate the maximum of two timestamps but allow None as value.

Added in version 7.6.

Parameters:
Return type:

Timestamp | None

size()[source]#

Return size of talk page threads.

Note that this method counts bytes, rather than codepoints (characters). This corresponds to MediaWiki’s definition of page size.

Changed in version 7.6: return 0 if archive page neither exists nor has threads (T313886).

Return type:

int

update(summary, sort_threads=False)[source]#

Recombine threads and save page.

Parameters:

sort_threads (bool)

Return type:

None

class scripts.archivebot.DiscussionThread(title, timestripper)[source]#

Bases: object

An object representing a discussion thread on a page.

It represents something that is of the form:

== Title of thread ==

Thread content here. ~~~~
:Reply, etc. ~~~~
Parameters:

feed_line(line)[source]#

Add a line to the content and find the newest timestamp.

Parameters:

line (str)

Return type:

None

size()[source]#

Return size of discussion thread.

Note that the result is NOT equal to that of len(self.to_text()). This method counts bytes, rather than codepoints (characters). This corresponds to MediaWiki’s definition of page size.

Return type:

int

to_text()[source]#

Return wikitext discussion thread.

Return type:

str

exception scripts.archivebot.MalformedConfigError(arg)[source]#

Bases: ArchiveBotSiteConfigError

There is an error in the configuration template.

Parameters:

arg (Exception | str)

Return type:

None

exception scripts.archivebot.MissingConfigError(arg)[source]#

Bases: ArchiveBotSiteConfigError

The config is missing in the header.

It’s in one of the threads or transcluded from another page.

Parameters:

arg (Exception | str)

Return type:

None

class scripts.archivebot.PageArchiver(page, template, salt, force=False, keep=False, sort=False)[source]#

Bases: object

A class that encapsulates all archiving methods.

Parameters:
  • page (pywikibot.Page) – a page object to be archived

  • template (pywikibot.Page) – a template with configuration settings

  • salt (str) – salt value

  • force (bool) – override security value

  • keep (bool)

  • sort (bool)

algo = 'none'#

analyze_page()[source]#

Analyze DiscussionPage.

Return type:

set[tuple[str, str]]

attr2text()[source]#

Return a template with archiver saveable attributes.

Return type:

str

get_archive_page(title, params=None)[source]#

Return the page for archiving.

If it doesn’t exist yet, create and cache it. Also check for security violations.

Parameters:

title (str)

Return type:

DiscussionPage

get_attr(attr, default='')[source]#

Get an archiver attribute.

Return type:

Any

get_params(timestamp, counter)[source]#

Make params for archiving template.

Parameters:

counter (int)

Return type:

dict

key_ok()[source]#

Return whether key is valid.

Return type:

bool

load_config()[source]#

Load and validate archiver template.

Return type:

None

preload_pages(counter, thread, pattern)[source]#

Preload pages if counter matters.

Parameters:

counter (int)

Return type:

None

run()[source]#

Process a single DiscussionPage object.

Return type:

None

saveables()[source]#

Return a list of saveable attributes.

Return type:

list[str]

set_attr(attr, value, out=True)[source]#

Set an archiver attribute.

Parameters:

out (bool)

Return type:

None

should_archive_thread(thread)[source]#

Check whether a thread has to be archived.

Returns:

the archiving reason as a tuple of localization args

Parameters:

thread (DiscussionThread)

Return type:

tuple[str, str] | None

scripts.archivebot.calc_md5_hexdigest(txt, salt)[source]#

Return md5 hexdigest computed from text and salt.

Return type:

str

scripts.archivebot.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

scripts.archivebot.process_page(page, *args)[source]#

Call PageArchiver for a single page.

Returns:

Return True to continue with the next page, False to break the loop.

Parameters:

args (Any)

Return type:

bool

Added in version 7.6.

Changed in version 7.7: pass an unspecified number of arguments to the bot using *args

scripts.archivebot.show_md5_key(calc, salt, site)[source]#

Show calculated MD5 hexdigest.

Return type:

bool

scripts.archivebot.str2localized_duration(site, string)[source]#

Localise a shorthand duration.

Translates a duration written in shorthand notation (e.g. “24h”, “7d”) into an expression in the local wiki language (“24 hours”, “7 days”).

Parameters:

string (str)

Return type:

str

scripts.archivebot.str2size(string)[source]#

Return a size for a shorthand size.

Accepts a string defining a size:

1337 - 1337 bytes
150K - 150 kilobytes
2M - 2 megabytes

Returns:

a tuple (size, unit), where size is an integer and unit is 'B' (bytes) or 'T' (threads).

Parameters:

string (str)

Return type:

tuple[int, str]
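
A short sketch of the expected values, assuming K and M are interpreted as multiples of 1024 and T counts threads:

from scripts.archivebot import str2size

str2size('1337')  # (1337, 'B')
str2size('150K')  # (153600, 'B')
str2size('2M')    # (2097152, 'B')
str2size('5T')    # (5, 'T') - threads instead of bytes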

scripts.archivebot.template_title_regex(tpl_page)[source]#

Return a regex that matches variations of the template title.

It supports the transcluding variant as well as localized namespaces and case-insensitivity depending on the namespace.

Parameters:

tpl_page (pywikibot.page.Page) – The template page

Return type:

Pattern

basic script#

An incomplete sample script

This is not a complete bot; rather, it is a template from which simple bots can be made. You can rename it to mybot.py, then edit it in whatever way you want.

Use the global -simulate option for test purposes. No changes to the live wiki will be made.

The following parameters are supported:

-always

The bot won’t ask for confirmation when putting a page

-text:

Use this text to be added; otherwise ‘Test’ is used

-replace:

Don’t add text but replace it

-top

Place additional text on top of the page

-summary:

Set the action summary message for the edit.

This sample script is a ConfigParserBot. All settings can be made either by giving an option on the command line or with a settings file which is scripts.ini by default. If you don’t want the default values you can add any option you want to change to that settings file below the [basic] section like:

[basic] ; inline comments start with semicolon ';'
# This is a comment line. Assignments may be done with '=' or ':'
text: A text with line break and
    continuing on next line to be put
replace: yes ; yes/no, on/off, true/false and 1/0 is also valid
summary = Bot: My first test edit with pywikibot

Every script has its own section with the script name as header.

In addition the following generators and filters are supported but cannot be set by settings file:

This script supports use of pagegenerators arguments.

class scripts.basic.BasicBot(site=True, **kwargs)[source]#

Bases: SingleSiteBot, ConfigParserBot, ExistingPageBot, AutomaticTWSummaryBot

An incomplete sample bot.

Variables:

summary_key – Edit summary message key. The message that should be used is placed in the /i18n subdirectory. The file containing these messages should have the same name as the caller script (i.e. basic.py in this case). Use summary_key to set a default edit summary message.

Parameters:
  • site (BaseSite | bool | None)

  • kwargs (Any)

Create a SingleSiteBot instance.

Parameters:
  • site (BaseSite | bool | None) – If True it’ll be set to the configured site using pywikibot.Site.

  • kwargs (Any)

summary_key: str | None = 'basic-changing'#

Must be defined in subclasses.

treat_page()[source]#

Load the given page, do some changes, and save it.

Return type:

None

update_options: dict[str, Any] = {'replace': False, 'summary': None, 'text': 'Test', 'top': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.
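
As the note above says, a derived bot should extend the options in its initializer rather than overriding update_options. A minimal sketch; the 'prefix' option is invented for illustration:

from scripts.basic import BasicBot

class MyBot(BasicBot):

    '''Derived bot with one additional option.'''

    def __init__(self, **kwargs):
        # extend the inherited options before the parent initializer validates kwargs
        self.available_options.update({'prefix': ''})
        super().__init__(**kwargs)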

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.

scripts.basic.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

blockpageschecker script#

A bot to remove stale protection templates from unprotected pages

Very often sysops protect pages for a set time but then forget to remove the warning template! This script is useful if you want to remove those stale warnings left on such pages.

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-protectedpages

Check all the protected pages; useful when you have no categories or when you have problems with them. (Add the namespace after ":" where you want to check - by default all protected pages are checked.)

-moveprotected

Same as -protectedpages, for moveprotected pages

This script is a ConfigParserBot. The following options can be set within a settings file which is scripts.ini by default:

-always

Doesn’t ask every time whether the bot should make the change. Do it always.

-show

When the bot can’t delete the template from the page (wrong regex or something like that) it will ask you whether it should show the page in your browser.

Attention

Pages included may give false positives!

-move

The bot will also check whether the page is protected against moving, not only against editing

Examples:

python pwb.py blockpageschecker -always

python pwb.py blockpageschecker -cat:Geography -always

python pwb.py blockpageschecker -show -protectedpages:4

class scripts.blockpageschecker.CheckerBot(site=True, **kwargs)[source]#

Bases: ConfigParserBot, ExistingPageBot, SingleSiteBot

Bot to remove stale protection templates from unprotected pages.

Changed in version 7.0: CheckerBot is a ConfigParserBot

Create a SingleSiteBot instance.

Parameters:
  • site (BaseSite | bool | None) – If True it’ll be set to the configured site using pywikibot.Site.

  • kwargs (Any)

static invoke_editor(page)[source]#

Ask for an editor and invoke it.

Return type:

None

remove_templates()[source]#

Understand if the page is blocked and if it has the right template.

setup()[source]#

Initialize the coroutine for parsing templates.

Return type:

None

skip_page(page)[source]#

Skip if the user does not have permission to edit.

teardown()[source]#

Close the coroutine.

Return type:

None

treat_page()[source]#

Load the given page, do some changes, and save it.

Return type:

None

update_options: dict[str, Any] = {'move': False, 'show': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.blockpageschecker.main(*args)[source]#

Process command line arguments and perform task.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

category script#

Script to manage categories

Syntax:

python pwb.py category action [-option]

where action can be one of these

add

mass-add a category to a list of pages.

remove

remove category tag from all pages in a category. If a pagegenerators option is given, the intersection with category pages is processed.

move

move all pages in a category to another category. If a pagegenerators option is given, the intersection with category pages is processed.

tidy

tidy up a category by moving its pages into subcategories.

tree

show a tree of subcategories of a given category.

listify

make a list of all of the articles that are in a category.

clean

Removes redundant grandchildren from the specified category by removing the direct link to the grandparent. In other words, a grandchild should not also be a child.

and option can be one of these

Options for add action:

-person

Sort persons by their last name.

-create

If a page doesn’t exist, do not skip it, create it instead.

-redirect

Follow redirects.

Options for listify action:

-append

This appends the list to the already existing target page (appending to the bottom by default).

-overwrite

This overwrites the current page with the list even if something is already there.

-showimages

This displays images rather than linking them in the list.

-talkpages

This outputs the links to talk pages of the pages to be listified in addition to the pages themselves.

-prefix:#

You may specify a list prefix like “#” for a numbered list or any other prefix. Default is a bullet list with prefix “*”.

Options for remove action:

-nodelsum

This specifies not to use the custom edit summary as the deletion reason. Instead, it uses the default deletion reason for the language, which is “Category was disbanded” in English.

Options for move action:

-hist

Creates a nice wikitable on the talk page of target category that contains detailed page history of the source category.

-nodelete

Don’t delete the old category after move.

-nowb

Don’t update the Wikibase repository.

-allowsplit

If that option is not set, it only moves the talk and main page together.

-mvtogether

Only move the pages/subcategories of a category, if the target page (and talk page, if -allowsplit is not set) doesn’t exist.

-keepsortkey

Use sortKey of the old category also for the new category. If not specified, sortKey is removed. An alternative method to keep sortKey is to use -inplace option.

Options for listify and tidy actions:

-namespaces, -namespace, -ns

Filter the articles in the specified namespaces. Separate multiple namespace numbers or names with commas. Examples: -ns:0,2,4, -ns:Help,MediaWiki

Options for clean action:

-always

The bot won’t ask for confirmation when putting a page.

Options for several actions:

-rebuild

Reset the database.

-from:

The category to move from (for the move option). Also, the category to remove from in the remove option. Also, the category to make a list of in the listify option.

-to:

The category to move to (for the move option). Also, the name of the list to make in the listify option.

-batch

Don’t prompt to delete emptied categories (do it automatically).

-summary:

Pick a custom edit summary for the bot.

-inplace

Use this flag to change categories in place rather than rearranging them.

-recurse[:<depth>]

Recurse through subcategories of the category to optional depth.

-pagesonly

While removing pages from a category, keep the subpage links and do not remove them.

-match

Only work on pages whose titles match the given regex (for move and remove actions).

-depth:

The max depth limit beyond which no subcategories will be listed.

Note

If the category names have spaces in them you may need to use a special syntax in your shell so that the names aren’t treated as separate parameters. For instance, in BASH, use single quotes, e.g. -from:'Polar bears'.

If action is “add”, “move” or “remove”, the following additional options are supported:

This script supports use of pagegenerators arguments.

For the actions tidy and tree, the bot will store the category structure locally in category.dump. This saves time and server load, but if it uses these data later, they may be outdated; use the -rebuild parameter in this case.

For example, to create a new category from a list of persons, type:

python pwb.py category add -person

and follow the on-screen instructions.

Or to do it all from the command-line, use the following syntax:

python pwb.py category move -from:US -to:"United States"

This will move all pages in the category US to the category United States.
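
Similarly, the listify action can turn a category into a list page; the page names below are examples only:

python pwb.py category listify -from:"Polar bears" -to:"List of polar bear pages" -talkpages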

A pagegenerators option can be given with move and remove action:

pwb category -site:wikipedia:en remove -from:Hydraulics -cat:Pneumatics

The sample above would remove ‘Hydraulics’ category from all pages which are also in ‘Pneumatics’ category.

Changed in version 8.0: pagegenerators are supported with “move” and “remove” action.

class scripts.category.CategoryAddBot(generator, newcat=None, sort_by_last_name=False, create=False, comment='', follow_redirects=False)[source]#

Bases: CategoryPreprocess

A robot to mass-add a category to a list of pages.

Parameters:
  • sort_by_last_name (bool)

  • create (bool)

  • comment (str)

  • follow_redirects (bool)

static sorted_by_last_name(catlink, pagelink)[source]#

Return a Category with key that sorts persons by their last name.

Parameters: catlink - The Category to be linked.

pagelink - the Page to be placed in the category.

Trailing words in brackets will be removed. Example: If catlink is the category ‘Author’ and pagelink is a Page for [[Alexandre Dumas (senior)]], this function will return this Category: [[Category:Author|Dumas, Alexandre]].

Return type:

Page

treat(page)[source]#

Process one page.

Return type:

None

class scripts.category.CategoryDatabase(rebuild=False, filename='category.dump.bz2')[source]#

Bases: object

Temporary database saving pages and subcategories for each category.

This prevents loading the category pages over and over again.

Parameters:
  • rebuild (bool)

  • filename (str)

dump(filename=None)[source]#

Save the dictionaries to disk if not empty.

Pickle the contents of the dictionaries superclass_db and cat_content_db if at least one is not empty. If both are empty, removes the file from the disk.

If the filename is None, it’ll use the filename determined in __init__.

Return type:

None

get_articles(cat)[source]#

Return the list of pages for a given category.

Saves this list in a temporary database so that it won’t be loaded from the server next time it’s required.

Return type:

set[Page]

get_subcats(supercat)[source]#

Return the list of subcategories for a given supercategory.

Saves this list in a temporary database so that it won’t be loaded from the server next time it’s required.

Return type:

set[Category]

get_supercats(subcat)[source]#

Return the supercategory (or a set of) for a given subcategory.

Return type:

set[Category]

property is_loaded: bool#

Return whether the contents have been loaded.

rebuild()[source]#

Rebuild the database.

Return type:

None
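
A rough sketch of how the database can be used from Python; the category name is a placeholder:

import pywikibot
from scripts.category import CategoryDatabase

site = pywikibot.Site()
cat_db = CategoryDatabase()  # reads category.dump.bz2 if it already exists
cat = pywikibot.Category(site, 'Category:Example')
subcats = cat_db.get_subcats(cat)    # set of Category objects, cached afterwards
articles = cat_db.get_articles(cat)  # set of Page objects, cached afterwards
cat_db.dump()                        # write the cache back to disk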

class scripts.category.CategoryListifyRobot(cat_title, list_title, edit_summary, append=False, overwrite=False, show_images=False, *, talk_pages=False, recurse=False, namespaces=None, **kwargs)[source]#

Bases: object

Create a list containing all of the members in a category.

Parameters:
  • cat_title (str | None)

  • list_title (str | None)

  • edit_summary (str)

  • append (bool)

  • overwrite (bool)

  • show_images (bool)

  • talk_pages (bool)

  • recurse (int | bool)

run()[source]#

Start bot.

Return type:

None

class scripts.category.CategoryMoveRobot(oldcat, newcat=None, batch=False, comment='', inplace=False, move_oldcat=True, delete_oldcat=True, title_regex=None, history=False, pagesonly=False, deletion_comment=0, move_comment=None, wikibase=True, allow_split=False, move_together=False, keep_sortkey=None, generator=None)[source]#

Bases: CategoryPreprocess

Change or remove the category from the pages.

If the new category is given, the category on the pages is changed from the old to the new one. Otherwise the category is removed from the pages, and the category page itself is deleted if it is empty.

By default the operation applies to pages and subcategories.

Added in version 8.0: The generator parameter.

Store all given parameters in the objects attributes.

Parameters:
  • oldcat – The move source.

  • newcat – The move target.

  • batch (bool) – If True the user does not have to confirm the deletion.

  • comment (str) – The edit summary for all pages where the category is changed, and also for moves and deletions if not overridden.

  • inplace (bool) – If True the categories are not reordered.

  • move_oldcat (bool) – If True the category page (and talkpage) is copied to the new category.

  • delete_oldcat (bool) – If True the oldcat page and talkpage are deleted (or nominated for deletion) if it is empty.

  • title_regex – Only pages (and subcats) with a title that matches the regex are moved.

  • history (bool) – If True the history of the oldcat is posted on the talkpage of newcat.

  • pagesonly (bool) – If True only move pages, not subcategories.

  • deletion_comment (int | str) – Either string or special value: DELETION_COMMENT_AUTOMATIC: use a generated message, DELETION_COMMENT_SAME_AS_EDIT_COMMENT: use the same message for delete that is used for the edit summary of the pages whose category was changed (see the comment param above). If the value is not recognized, it’s interpreted as DELETION_COMMENT_AUTOMATIC.

  • move_comment – If set, uses this as the edit summary on the actual move of the category page. Otherwise, defaults to the value of the comment parameter.

  • wikibase (bool) – If True, update the Wikibase item of the old category.

  • allow_split (bool) – If False only moves page and talk page together.

  • move_together (bool) – If True moves the pages/subcategories only if page and talk page could be moved or both source page and target page don’t exist.

  • generator – a generator from pagegenerators.GeneratorFactory. If given an intersection to the oldcat category members is used.

DELETION_COMMENT_AUTOMATIC = 0#

DELETION_COMMENT_SAME_AS_EDIT_COMMENT = 1#

static check_move(name, old_page, new_page)[source]#

Return if the old page can be safely moved to the new page.

Parameters:
  • name (str) – Title of the new page

  • old_page (pywikibot.page.BasePage) – Page to be moved

  • new_page (pywikibot.page.BasePage) – Page to be moved to

Returns:

True if it is possible to move the page, False otherwise

Return type:

bool

run()[source]#

The main bot function that does all the work.

For readability it is split into several helper functions: - _movecat() - _movetalk() - _hist() - _change() - _delete()

Changed in version 8.0: if a page generator is given to the bot, the intersection with pagegenerators.CategorizedPageGenerator() or pagegenerators.SubCategoriesPageGenerator() is used.

Return type:

None

class scripts.category.CategoryPreprocess(follow_redirects=False, edit_redirects=False, create=False, **kwargs)[source]#

Bases: BaseBot

A class to prepare a list of pages for robots.

Parameters:
  • follow_redirects (bool)

  • edit_redirects (bool)

  • create (bool)

determine_template_target(page)[source]#

Return template page to be categorized.

Categories for templates can be included in <includeonly> section of template doc page.

Also the doc page can be changed by doc template parameter.

TODO: decide if/how to enable/disable this feature.

Parameters:

page (Page) – Page to be processed.

Returns:

Page to be categorized.

Return type:

Page

determine_type_target(page)[source]#

Return page to be categorized by type.

Parameters:

page (Page) – Existing, missing or redirect page to be processed.

Returns:

Page to be categorized.

Return type:

Page | None

class scripts.category.CategoryTidyRobot(cat_title, cat_db, namespaces=None, comment=None)[source]#

Bases: Bot, CategoryPreprocess

Robot to move members of a category into sub- or super-categories.

Specify the category title on the command line. The robot will pick up the page, look for all sub- and super-categories, and show them as a numbered list of possibilities to move the page into. It will ask you to type the number of the appropriate replacement, and perform the change automatically. It will then loop over all pages in the category.

If you don’t want to move the member to a sub- or super-category, but to another category, you can use the ‘j’ (jump) command.

By typing ‘s’ you can leave the complete page unchanged.

By typing ‘m’ you can show more content of the current page, helping you to find out what the page is about and in which other categories it currently is.

Parameters:
  • cat_title (str | None) – a title of the category to process.

  • cat_db (CategoryDatabase object) – a CategoryDatabase object.

  • namespaces (iterable of pywikibot.Namespace) – namespaces to focus on.

  • comment (str | None) – a custom summary for edits.

move_to_category(member, original_cat, current_cat)[source]#

Ask whether to move it to one of the sub- or super-categories.

Given a page in the original_cat category, ask the user whether to move it to one of original_cat’s sub- or super-categories. Recursively run through subcategories’ subcategories.

Note

current_cat is only used for internal recursion. You should always use current_cat = original_cat.

Parameters:
  • member (Page) – a page to process.

  • original_cat (Category) – original category to replace.

  • current_cat (Category) – a category which is questioned.

Return type:

None

teardown()[source]#

Clean up after the run operation.

Return type:

None

treat(page)[source]#

Process page.

Return type:

None

class scripts.category.CategoryTreeRobot(cat_title, cat_db, filename=None, max_depth=10)[source]#

Bases: object

Robot to create tree overviews of the category structure.

Parameters:
  • cat_title – The category which will be the tree's root.

  • cat_db – A CategoryDatabase object.

  • max_depth (int) – The limit beyond which no subcategories will be listed. This also guarantees that loops in the category structure won’t be a problem.

  • filename – The textfile where the tree should be saved; None to print the tree to stdout.

run()[source]#

Handle the multi-line string generated by treeview.

After the string has been generated by treeview it is either printed to the console or saved to a file.

Return type:

None

treeview(cat, current_depth=0, parent=None)[source]#

Return a tree view of all subcategories of cat.

The multi-line string contains a tree view of all subcategories of cat, up to level max_depth. Recursively calls itself.

Parameters:
  • cat – the Category of the node we're currently opening.

  • current_depth (int) – the current level in the tree.

  • parent – the Category of the category we're coming from.

Return type:

str

class scripts.category.CleanBot(**kwargs)[source]#

Bases: Bot

Automatically cleans up specified category.

Removes redundant grandchildren from the specified category by removing the direct link to the grandparent.

In other words, a grandchild should not also be a child.

Stub categories are an exception.

Note

For details please read:

Added in version 7.0.

skip_page(cat)[source]#

Check whether the category should be processed.

Return type:

bool

treat(child)[source]#

Process the category.

Return type:

None

update_options: dict[str, Any] = {'recurse': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.category.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments.

Return type:

None

category_graph script#

Visualizes category hierarchy

Generates a graphical representation of the category hierarchy in dot, svg and html5 formats.

Usage:

pwb.py category_graph [-style STYLE] [-depth DEPTH] [-from FROM] [-to TO]

actions:

-from [FROM]

Category name to scan, default is main category, “?” to ask.

optional arguments:

-to TO

base file name to save, “?” to ask

-style STYLE

graphviz style definitions in dot format (see below)

-depth DEPTH

maximal hierarchy depth. 2 by default

-downsize K

font size divider for subcategories. 4 by default. Use 1 for the same font size

See also

https://graphviz.org/doc/info/attrs.html for graphviz style definitions.

Example

Visualizes main category:

pwb.py -v category_graph -from

Extended example with style settings:

pwb.py category_graph -from Life -downsize 1.5 -style 'graph[rankdir=BT ranksep=0.5] node[shape=circle style=filled fillcolor=green] edge[style=dashed penwidth=3]'

Added in version 8.0.

class scripts.category_graph.CategoryGraphBot(args)[source]#

Bases: SingleSiteBot

Bot to create graph of the category structure.

Parameters:

args (argparse.Namespace)

run()[source]#

Main function of CategoryGraphBot.

Return type:

None

scan_level(cat, level, hue=None)[source]#

Recursive function to fill dot graph.

Parameters:
  • cat – the Category of the node we’re currently opening.

  • level – the current level in the tree, decreasing from depth to zero (for recursion); the opposite of depth.

Return type:

str

static setup_args(ap)[source]#

Declares arguments.

scripts.category_graph.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

category_redirect script#

This bot will move pages out of redirected categories

The bot will look for categories that are marked with a category redirect template, take the first parameter of the template as the target of the redirect, and move all pages and subcategories of the category there. It also changes hard redirects into soft redirects, and fixes double redirects. A log is written under <userpage>/category_redirect_log. If a page cannot be moved, it is logged under <userpage>/category_edit_requests so it can be handled manually. Only category pages that haven’t been edited for a certain cooldown period (default 7 days) are taken into account.

The following parameters are supported:

-always

If used, the bot won’t ask if it should add the specified text

-delay:#

Set a number of days. If the category has been edited more recently than the given number of days, it is ignored. Default is 7.

-tiny

Only loops over Category:Non-empty_category_redirects and moves all images, pages and categories in redirect categories to the target category.

-category:<cat>

Category to be used with this script. If not given, either the wikibase entry Q4616723 or Q8099903 is used.

Usage:

python pwb.py category_redirect [options]

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.

class scripts.category_redirect.CategoryRedirectBot(**kwargs)[source]#

Bases: ConfigParserBot, SingleSiteBot, AutomaticTWSummaryBot

Page category update bot.

Changed in version 7.0: CategoryRedirectBot is a ConfigParserBot

Changed in version 9.0: A log entry is written to <userpage>/category_edit_requests if a page cannot be moved

check_hard_redirect()[source]#

Check for hard-redirected categories.

Check categories that are not already marked with an appropriate softredirect template and replace the content with a redirect template.

Return type:

None

check_soft_redirect()[source]#

Check for soft-redirected categories.

Return type:

None

get_cat()[source]#

Specify the category page.

get_log_text()[source]#

Rotate log text and return the most recent text.

load_record()[source]#

Load record from data file and create a backup file.

Return type:

None

move_contents(old_cat_title, new_cat_title, edit_summary)[source]#

The worker function that moves pages out of oldCat into newCat.

Parameters:
  • old_cat_title (str)

  • new_cat_title (str)

  • edit_summary (str)

Return type:

tuple[int, int]

ready_to_edit(cat)[source]#

Return True if cat not edited during cooldown period, else False.

run()[source]#

Run the bot.

Return type:

None

setup_hard_redirect()[source]#

Setup hard redirect task.

setup_soft_redirect()[source]#

Setup soft redirect task.

teardown()[source]#

Write self.record to file and save logs.

Return type:

None

touch(page)[source]#

Touch the given page.

Return type:

None

update_options: dict[str, Any] = {'category': '', 'delay': 7, 'tiny': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.category_redirect.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

change_pagelang script#

This script changes the content language of pages

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-setlang

What language the pages should be set to

-always

If a language is already set for a page, always change it to the one set in -setlang.

-never

If a language is already set for a page, never change it to the one set in -setlang (keep the current language).

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.

Added in version 5.1.
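
For example, to set the content language of a single page; the language code and page title are placeholders:

python pwb.py change_pagelang -setlang:de -page:"Example" -always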

class scripts.change_pagelang.ChangeLangBot(**kwargs)[source]#

Bases: ConfigParserBot, SingleSiteBot

Change page language bot.

Changed in version 7.0: ChangeLangBot is a ConfigParserBot

changelang(page)[source]#

Set page language.

Parameters:

page (pywikibot.page.BasePage) – The page to update and save

Return type:

None

treat(page)[source]#

Treat a page.

Parameters:

page (pywikibot.page.BasePage) – The page to treat

Return type:

None

update_options: dict[str, Any] = {'never': False, 'setlang': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.change_pagelang.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

checkimages script#

Script to check recently uploaded files

This script checks if a file description is present and if there are other problems in the image’s description.

This script will have to be configured for each site. Please submit localisations as additions to the Pywikibot framework.

Everything that needs customisation is indicated by comments.

This script understands the following command-line arguments:

-limit

(int) The number of images to check (default: 80)

-commons

The bot will check if an image on Commons has the same name and if true it reports the image.

-duplicates[:#]

Check if the image has duplicates (if an argument is given, set how many rollbacks to wait before reporting the image in the report instead of tagging the image); default: 1 rollback.

-duplicatesreport

Report the duplicates in a log AND put the template in the images.

-maxusernotify

Maximum notifications added to a user talk page in a single check, to avoid email spamming.

-sendemail

Send an email after tagging.

-break

To break the bot after the first check (default: recursive)

-sleep[:#]

Time in seconds between repeat runs (default: 30)

-wait[:#]

Wait x seconds before checking the images (default: 0)

-skip[:#]

The bot skips the first [:#] images (default: 0)

-start[:#]

Use allimages() as generator (it starts already from File:[:#])

-cat[:#]

Use a category as generator

-regex[:#]

Use regex, must be used with -url or -page

-page[:#]

Define the name of the wikipage where the images are

-url[:#]

Define the URL where the images are

-nologerror

If given, this option will disable the error that is raised when the log is full.
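
A typical invocation might look like this; the option values are examples only:

python pwb.py checkimages -limit:50 -duplicates:2 -sleep:60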

Instructions for the real-time settings

For every new block you have to add:

<------- ------->

In this way the bot can understand where the block starts in order to take the right parameter:

Name=     Set the name of the block
Find=     search this text in the image's description
Findonly= search for exactly this text in the image's description
Summary=  That's the summary that the bot will use when it will
          notify the problem.
Head=     That's the incipit that the bot will use for the message.
Text=     This is the template that the bot will use when it will
          report the image's problem.

Changed in version 8.4: Welcome messages are imported from scripts.welcome script.

scripts.checkimages.CATEGORIES_WITH_LICENSES = ('Q4481876', 'Q7451504')#

Category items with the licenses; subcategories may contain other licenses.

Changed in version 7.2: uses wikibase items instead of category titles.

class scripts.checkimages.CheckImagesBot(site, log_full_number=25000, sendemail_active=False, duplicates_report=False, log_full_error=True, max_user_notify=None)[source]#

Bases: object

A robot to check recently uploaded files.

Initializer, define some instance variables.

Parameters:
  • log_full_number (int)

  • sendemail_active (bool)

  • duplicates_report (bool)

  • log_full_error (bool)

check_image_duplicated(duplicates_rollback)[source]#

Function to check the duplicated files.

Return type:

bool

check_image_on_commons()[source]#

Checking if the file is on commons.

Return type:

bool

check_step()[source]#

Check a single file page.

Return type:

None

find_additional_problems()[source]#

Extract additional settings from configuration page.

Return type:

None

ignore_server_errors = False#

static important_image(list_given)[source]#

Get tuples of image and time, return the most used or oldest image.

Changed in version 7.2: itertools.zip_longest is used to stop using_pages as soon as possible.

Parameters:

list_given (list[tuple[float, FilePage]]) – a list of tuples which hold seconds and FilePage

Returns:

the most used or oldest image

Return type:

FilePage

is_tagged()[source]#

Understand if a file is already tagged or not.

Return type:

bool

static load(raw)[source]#

Load a list of objects from a string using regex.

Return type:

list[str]

load_hidden_templates()[source]#

Function to load the white templates.

Return type:

None

load_licenses()[source]#

Load the list of the licenses.

Changed in version 7.2: return a set instead of a list for quicker lookup.

Return type:

set[Page]

mini_template_check(template)[source]#

Check if template is in allowed licenses or in licenses to skip.

Return type:

bool

put_mex_in_talk()[source]#

Function to put the warning on the talk page of the uploader.

When the bot finds that the user talk page is empty it adds the welcome message first. The messages are imported from the welcome.py script.

Return type:

None

regex_generator(regexp, textrun)[source]#

Find page to yield using regex to parse text.

Return type:

Generator[FilePage]

report(newtext, image_to_report, notification=None, head=None, notification2=None, unver=True, comm_talk=None, comm_image=None)[source]#

Function to make the reports easier.

Parameters:

unver (bool)

Return type:

None

report_image(image_to_report, rep_page=None, com=None, rep_text=None, addings=True)[source]#

Report the files to the report page when needed.

Parameters:

addings (bool)

Return type:

bool

set_parameters(image)[source]#

Set parameters.

Return type:

None

skip_images(skip_number, limit)[source]#

Given a number of files, skip the first -number- files.

Return type:

bool

smart_detection()[source]#

Detect templates.

Instead of only checking whether there is a simple template in the image’s description, the bot also checks whether that template is a license or something else. In this sense this type of check is smart.

Return type:

tuple[str, bool]

tag_image(put=True)[source]#

Add template to the Image page and find out the uploader.

Parameters:

put (bool)

Return type:

bool

takesettings()[source]#

Function to take the settings from the wiki.

Return type:

None

template_in_list()[source]#

Check if template is in list.

The problem is that calls to the MediaWiki system can be pretty slow, while searching in a list of objects is really fast. So first of all let’s see if we can find something in the info that we already have, then make a deeper check.

Return type:

None

static upload_bot_change_function(report_page_text, upload_bot_array)[source]#

Detect the user that has uploaded the file through upload bot.

Return type:

str

static wait(generator, wait_time)[source]#

Skip the images uploaded before x seconds.

Let the users fix the image’s problems on their own in the first x seconds.

Return type:

Generator[FilePage]

exception scripts.checkimages.LogIsFull(arg)[source]#

Bases: Error

Log is full and the bot cannot add other data to prevent Errors.

Parameters:

arg (Exception | str)

Return type:

None

scripts.checkimages.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

bool

scripts.checkimages.print_with_time_zone(message)[source]#

Print the messages followed by the TimeZone encoded correctly.

Return type:

None

claimit script#

A script that adds claims to Wikidata items based on a list of pages

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Usage:

python pwb.py claimit [pagegenerators] P1 Q2 P123 Q456

You can use any typical pagegenerator (like categories) to provide a list of pages. Then list the property->target pairs to add.

For geographic coordinates:

python pwb.py claimit [pagegenerators] P625 [lat-dec],[long-dec],[prec]

[lat-dec] and [long-dec] represent the latitude and longitude respectively, and [prec] represents the precision. All values are in decimal degrees, not DMS. If [prec] is omitted, the default precision is 0.0001 degrees.

Example

python pwb.py claimit [pagegenerators] P625 -23.3991,-52.0910,0.0001

By default, claimit.py does not add a claim if one with the same property already exists on the page. To override this behavior, use the ‘exists’ option:

python pwb.py claimit [pagegenerators] P246 "string example" -exists:p

Suppose the claim you want to add has the same property as an existing claim and the “-exists:p” argument is used. Now, claimit.py will not add the claim if it has the same target, source, and/or the existing claim has qualifiers. To override this behavior, add ‘t’ (target), ‘s’ (sources), or ‘q’ (qualifiers) to the ‘exists’ argument.

For instance, to add the claim to each page even if one with the same property and target and some qualifiers already exists:

python pwb.py claimit [pagegenerators] P246 "string example" -exists:ptq

Note that the ordering of the letters in the ‘exists’ argument does not matter, but ‘p’ must be included.

class scripts.claimit.ClaimRobot(claims, exists_arg='', **kwargs)[source]#

Bases: WikidataBot

A bot to add Wikidata claims.

Parameters:
  • claims (list) – A list of wikidata claims

  • exists_arg (str) – String specifying how to handle duplicate claims

treat_page_and_item(page, item)[source]#

Treat each page.

Parameters:
  • page (pywikibot.page.BasePage) – The page to update and change

  • item (pywikibot.page.ItemPage) – The item to treat

Return type:

None

use_from_page = None#

scripts.claimit.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

clean_sandbox script#

This bot resets a (user) sandbox with predefined text

This script understands the following command-line arguments:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-hours

(float) Use this parameter to make the script repeat itself after the given number of hours. Hours can be defined as a decimal: 0.01 hours is 36 seconds; 0.1 is 6 minutes.

-delay

(int) Use this parameter for a wait time after the last edit was made. If no parameter is given it takes it from hours and limits it between 5 and 15 minutes. The minimum delay time is 5 minutes.

-text

(str) The text that is substituted into the sandbox; you can use this when you haven’t configured clean_sandbox for your wiki.

-textfile

(str) As an alternative to -text, you can use this to provide a file containing the text to be used.

-summary

(str) Summary of the edit made by the bot. Overrides the default from i18n.

This script is a ConfigParserBot. All local parameters can be given inside a scripts.ini file. Options passed to the script take priority over options read from the ini file.

For example:

[clean_sandbox]
# the parameter section for clean_sandbox script
summary = Bot: Cleaning sandbox
text = {{subst:Clean Sandbox}}
hours: 0.5
delay: 7
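
Equivalently, the same parameters can be given on the command line; the values are examples only:

python pwb.py clean_sandbox -hours:0.5 -delay:7 -summary:"Bot: Cleaning sandbox"
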
class scripts.clean_sandbox.SandboxBot(**kwargs)[source]#

Bases: Bot, ConfigParserBot

Sandbox reset bot.

available_options: dict[str, Any] = {'delay': -1, 'hours': -1.0, 'summary': '', 'text': ''}#

Handler configuration attribute. Only the keys of the dict can be passed as __init__ options. The values are the default values. Overwrite this in subclasses!

run()[source]#

Run bot.

Return type:

None

treat(page)[source]#

Treat a single page.

scripts.clean_sandbox.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

commons_information script#

This bot adds a language template to the file’s description field

The Information template is commonly used to provide formatting to the basic information for files (description, source, author, etc.). The description field should provide brief but complete information about the image. The description format should use Language templates like {{En}} or {{De}} to specify the language of the description. This script adds these language templates if they are missing. For example the description of

{{Information
 | Description = A simplified icon for [[Pywikibot]]
 | Date = 2003-06-14
 | Other fields =
}}

will be detected as English (en) with ~100 % accuracy and the bot replaces its content with

{{Information
 | Description = {{en|A simplified icon for [[Pywikibot]]}}
 | Date = 2003-06-14
 | Other fields =
}}

Note

The langdetect package is needed for full support of language detection. Install it with:

pip install langdetect

This script understands the following command-line arguments:

This script supports use of pagegenerators arguments.

Usage:

python pwb.py commons_information [pagegenerators]

You can use any typical pagegenerator (like categories) to provide a list of pages. If no pagegenerator is given, pages transcluding the Information template are used.

Hint

This script uses commons site as default. For other sites use the global -site option.

Example for going through all files:

python pwb.py commons_information -start:File:!
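
A hypothetical run on another wiki using the global -site option (assuming that wiki hosts local file description pages) might look like:

python pwb.py commons_information -site:wikipedia:de -start:File:!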

Added in version 6.0.

Changed in version 9.2: accelerate script with preloading pages; use commons as default site; use transcluded pages of Information template.

class scripts.commons_information.InformationBot(**kwargs)[source]#

Bases: SingleSiteBot, ExistingPageBot

Bot for the Information template.

Initializer.

comment = {'en': 'Bot: wrap the description parameter of Information in the appropriate language template'}#
desc_params = ('Description', 'description')#
static detect_langs(text)[source]#

Detect language from given text.

Parameters:

text (str)

get_description(template)[source]#

Get description parameter.

lang_tmp_cat = 'Language templates'#
process_desc_other(wikicode, nodes)[source]#

Process other description text.

The description text may consist of different Node types, except Template, which is handled by process_desc_template(). Combine all nodes and replace the last one with a newly created Template, removing the remaining nodes from the wikicode.

Added in version 9.2.

Parameters:
  • wikicode (Wikicode) – The Wikicode of the parsed page text.

  • nodes (list[Node]) – wikitext nodes to be processed

Returns:

whether the description nodes were changed

Return type:

bool

process_desc_template(template)[source]#

Process description template.

Parameters:

template (Template) – a mwparserfromhell Template found in the description parameter of Information template.

Returns:

whether the template node was changed.

Return type:

bool

static replace_value(param, value)[source]#

Replace param node with given value.

Parameters:
  • param (Node)

  • value (Template)

Return type:

None

treat_page()[source]#

Treat current page.

Return type:

None

scripts.commons_information.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

commonscat script#

With this tool you can add the template {{commonscat}} to categories

The tool works by following the interwiki links. If the template is present on another language page, the bot will use it.

Warning

You could probably use it on articles as well, but this hasn’t been tested.

The following parameters are supported:

-checkcurrent

Work on all category pages that use the primary commonscat template.

This script is a ConfigParserBot. The following options can be set within a settings file which is scripts.ini by default:

-always

Don’t prompt for each replacement. The warning message does not have to be confirmed.

Attention

Use this with care!

-summary:XYZ

Set the action summary message for the edit to XYZ, otherwise it uses messages from add_text.py as default.
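
For example, a minimal scripts.ini sketch for this script could be (the section name and values are illustrative only):

[commonscat]
# the parameter section for commonscat script
always = False
summary = Bot: Adding {{commonscat}}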

This bot uses pagegenerators to get a list of pages. The following options are supported:

This script supports use of pagegenerators arguments.

For example to go through all categories:

python pwb.py commonscat -start:Category:!

class scripts.commonscat.CommonscatBot(**kwargs)[source]#

Bases: ConfigParserBot, ExistingPageBot

Commons categorisation bot.

Changed in version 7.0: CommonscatBot is a ConfigParserBot

Parameters:

kwargs (Any) – bot options

Keyword Arguments:

generator – a generator processed by run() method

changeCommonscat(page=None, oldtemplate='', oldcat='', newtemplate='', newcat='', linktitle='')[source]#

Change the current commonscat template and target.

Parameters:
  • oldtemplate (str)

  • oldcat (str)

  • newtemplate (str)

  • newcat (str)

  • linktitle (str)

Return type:

None

Return the name of a valid commons category.

If the page is a redirect this function tries to follow it. If the page doesn’t exist the function returns an empty string.

Parameters:

name (str)

Find CommonsCat template on interwiki pages.

Returns:

name of a valid commons category

Return type:

str

find_commons_category(page)[source]#

Find CommonsCat template on Wikibase repository.

Use Wikibase property to get the category if possible. Otherwise check all langlinks to find it.

Returns:

name of a valid commons category

Return type:

str

Find CommonsCat template on page.

Return type:

tuple of (<templatename>, <target>, <linktext>, <note>)

static skipPage(page)[source]#

Determine if the page should be skipped.

Return type:

bool

skip_page(page)[source]#

Skip category redirects.

treat_page()[source]#

Add CommonsCat template to page.

Take a page and go through all its interwiki pages looking for a commonscat template. When all the interwiki links have been checked and a proper category has been found, add it to the page.

Return type:

None

update_options: dict[str, Any] = {'summary': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived, but use self.available_options.update(<dict>) in the initializer in that case.

Added in version 6.4.

use_disambigs: bool | None = False#

Attribute to determine whether to use disambiguation pages. Set it to True to use disambigs only, set it to False to skip disambigs. If None both are processed.

Added in version 7.2.

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.

scripts.commonscat.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

coordinate_import script#

Coordinate importing script

Usage:

python pwb.py coordinate_import -site:wikipedia:en -cat:Category:Coordinates_not_on_Wikidata

This will work on all pages in the category “coordinates not on Wikidata” and will import the coordinates on these pages to Wikidata.

The data from the “GeoData” extension (https://www.mediawiki.org/wiki/Extension:GeoData) is used, so that extension has to be set up properly. You can look at the [[Special:Nearby]] page on your local wiki to see if it’s populated.

You can use any typical pagegenerator to provide a list of pages:

python pwb.py coordinate_import -lang:it -family:wikipedia -namespace:0 -transcludes:Infobox_stazione_ferroviaria

You can also run over a set of items on the repo without coordinates and try to import them from any connected page. To do this, you have to explicitly provide the repo as the site using -site argument.

Example

python pwb.py coordinate_import -site:wikidata:wikidata -namespace:0 -querypage:Deadendpages

The following command line parameters are supported:

-always

If used, the bot won’t ask if it should add the specified text.

-create

Create items for pages without one.

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.
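
A minimal scripts.ini sketch (illustrative values only) could be:

[coordinate_import]
# the parameter section for coordinate_import script
always = False
create = True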

This script supports use of pagegenerators arguments.

class scripts.coordinate_import.CoordImportRobot(**kwargs)[source]#

Bases: ConfigParserBot, WikidataBot

A bot to import coordinates to Wikidata.

Changed in version 7.0: CoordImportRobot is a ConfigParserBot

has_coord_qualifier(claims)[source]#

Check if self.prop is used as property for a qualifier.

Parameters:

claims (dict) – the Wikibase claims to check in

Returns:

the first property for which self.prop is used as qualifier, or None if there is none

Return type:

str | None

item_has_coordinates(item)[source]#

Check if the item has coordinates.

Returns:

whether the item has coordinates

Return type:

bool

treat_page_and_item(page, item)[source]#

Treat page/item.

Return type:

None

try_import_coordinates_from_page(page, item)[source]#

Try import coordinate from the given page to the given item.

Returns:

whether any coordinates were found and the import was successful

Return type:

bool

use_from_page = None#
scripts.coordinate_import.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

cosmetic_changes script#

This module can do slight modifications to tidy a wiki page’s source code

The changes are not supposed to change the look of the rendered wiki page.

The following parameters are supported:

-always

Don’t prompt for each replacement. The warning (see below) does not have to be confirmed. ATTENTION: Use this with care!

-async

Put page on queue to be saved to wiki asynchronously.

-summary:XYZ

Set the summary message text for the edit to XYZ, bypassing the predefined message texts with original and replacements inserted.

-ignore:

Ignore errors and either skip the page or only the failing method. It can be set to:
  • all - does not ignore errors

  • match - ignores ISBN related errors (default)

  • method - ignores fixing method errors

  • page - ignores page related errors

The following generators and filters are supported:

This script supports use of pagegenerators arguments.

ATTENTION: You can run this script as a stand-alone for testing purposes. However, the changes that are made are only minor, and other users might get angry if you fill the version histories and watchlists with such irrelevant changes. Some wikis prohibit stand-alone running.

For further information see pywikibot/cosmetic_changes.py
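
For example, a hypothetical test run on a single page without prompting, ignoring ISBN related errors, could be:

python pwb.py cosmetic_changes -page:Wikipedia:Sandbox -always -ignore:match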

class scripts.cosmetic_changes.CosmeticChangesBot(**kwargs)[source]#

Bases: AutomaticTWSummaryBot, ExistingPageBot

Cosmetic changes bot.

Parameters:

kwargs (Any) – bot options

Keyword Arguments:

generator – a generator processed by run() method

summary_key: str | None = 'cosmetic_changes-standalone'#

Must be defined in subclasses.

treat_page()[source]#

Treat page with the cosmetic toolkit.

Changed in version 7.0: skip if InvalidPageError is raised

Return type:

None

update_options: dict[str, Any] = {'async': False, 'ignore': CANCEL.MATCH, 'summary': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived, but use self.available_options.update(<dict>) in the initializer in that case.

Added in version 6.4.

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.

scripts.cosmetic_changes.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

create_isbn_edition script#

Pywikibot client to load ISBN linked data into Wikidata

Pywikibot script to get ISBN data from a digital library, and create or amend the related Wikidata item for the edition (with P212, the ISBN number, as unique external ID).

Use digital libraries to get ISBN data in JSON format, and integrate the results into Wikidata.

Note

ISBN data should only be used for editions, and not for written works.

Then the resulting item number can be used e.g. to generate Wikipedia references using template Cite_Q.

Parameters:

All parameters are optional:

*P1:*        digital library (default wiki "-")

    bnf      Catalogue General (France)
    bol      Bol.com
    dnb      Deutsche National Library
    goob     Google Books
    kb       National Library of the Netherlands
    loc      Library of Congress US
    mcues    Ministerio de Cultura (Spain)
    openl    OpenLibrary.org
    porbase  urn.porbase.org Portugal
    sbn      Servizio Bibliotecario Nazionale (Italy)
    wiki     wikipedia.org
    worldcat WorldCat (wc)

*P2:*        ISO 639-1 language code. Default LANG; e.g. en, nl,
             fr, de, es, it, etc.

*P3 P4...:*  P/Q pairs to add additional claims (repeated) e.g.
             P921 Q107643461 (main subject: database management
             linked to P2163, Fast ID 888037)

*stdin:*     List of ISBN numbers (International standard book
             number, version 10 or 13). Free text (e.g.
             Wikipedia references list, or publication list) is
             accepted. Identification is done via an ISBN regex
             expression.
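
For instance, ISBN numbers could be fed in via standard input (a hypothetical pipeline; the library and language arguments are the P1 and P2 parameters described above):

echo "978-0-596-10089-6 9789042925564" | pwb create_isbn_edition.py goob en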
Functionality:
  • Both ISBN-10 and ISBN-13 numbers are accepted as input.

  • Only ISBN-13 numbers are stored. ISBN-10 numbers are only used for identification purposes; they are not stored.

  • The ISBN number is used as a primary key; no two items can have the same P212 ISBN number. The item update is not performed when there is no unique match. Only editions are updated or created.

  • Individual statements are added or merged incrementally; existing data is not overwritten.

  • Authors and publishers are searched to get their item number; unknown or ambiguous items are skipped.

  • Book title and subtitle are separated with either ‘.’, ‘:’, or ‘-’ in that order.

  • Detect author, illustrator, writer of preface, and writer of afterword instances.

  • Add profession “author” to individual authors.

  • This script can be run incrementally.

Examples:

Default library (Google Books), language (LANG), no additional statements:

pwb create_isbn_edition.py 9789042925564

Wikimedia, language English, main subject: database management:

pwb create_isbn_edition.py wiki en P921 Q107643461 978-0-596-10089-6

Data quality:
  • ISBN numbers (P212) are only assigned to editions.

  • A written work should not have an ISBN number (P212).

  • For targets of P629 (edition of) amend “is an Q47461344 (written work) instance” and “inverse P747 (work has edition)” statements

  • Use https://query.wikidata.org/querybuilder/ to identify P212 duplicates. Merge duplicate items before running the script again.

  • The following properties should only be used for written works, not for editions:

    • P5331: OCLC work ID (editions should only have P243)

    • P8383: Goodreads-identificatiecode for work (editions should only have P2969)

Return status:

The following status codes are returned to the shell:

3   Invalid or missing parameter
4   Library not installed
12  Item does not exist
20  Network error
Standard ISBN properties for editions:
P31:Q3331189:  instance of edition (mandatory statement)
P50:           author
P123:          publisher
P212:          canonical ISBN number (with dashes; searchable
               via Wikidata Query)
P407:          language of work (Qnumber linked to ISO 639-1
               language code)
P577:          date of publication (year)
P1476:         book title
P1680:         subtitle
Other ISBN properties:
P921:   main subject (inverse lookup from external Fast ID P2163)
P629:   work for edition
P747:   edition of work
Qualifiers:
P248:   Source
P813:   Retrieval date
P1545:  (author) sequence number
External identifiers:
P243:   OCLC ID
P1036:  Dewey Decimal Classification
P2163:  Fast ID (inverse lookup via Wikidata Query)
        -> P921: main subject

(not implemented)
P2969:  Goodreads-identificatiecode

(only for written works)
P5331:  OCLC work ID (editions should only have P243)

(not implemented)
P8383:  Goodreads-identificatiecode for work
        (editions should only have P2969)
P213:   ISNI ID
P496:   ORCID ID
P675:   Google Books-identificatiecode
Unavailable properties from digital library:
(not implemented by isbnlib)
P98:    Editor
P110:   Illustrator/photographer
P291:   place of publication
P1104:  number of pages
?:      edition format (hardcover, paperback)
Author:

Geert Van Pamel (User:Geertivp), MIT License, 2022-08-04,

Prerequisites:

In addition to Pywikibot the following ISBN lib package is mandatory; install it with:

pip install isbnlib

The following ISBN lib packages are optional; install them with:

pip install isbnlib-bnf
pip install isbnlib-bol
pip install isbnlib-dnb
pip install isbnlib-kb
pip install isbnlib-loc
pip install isbnlib-worldcat2
Restrictions:
  • It is better to use the ISO 639-1 language code parameter as a default. The language code is not always available from the digital library; therefore we need a default.

  • Publisher unknown:

    • Missing P31:Q2085381 statement, missing subclass in script

    • Missing alias

    • Create publisher

  • Unknown author: create author as a person

Known Problems:
  • Unknown ISBN, e.g. 9789400012820

  • If there is no ISBN data available for an edition, the library either returns no output (goob = Google Books) or an error message (wiki, openl). The script takes care of both cases. Try another library instance.

  • Only 6 specific ISBN attributes are listed by the webservice(s), missing are e.g.: place of publication, number of pages

  • Some digital libraries have more registrations than others.

  • Some digital libraries have data quality problems.

  • Not all ISBN attributes have data values (authors, publisher, date of publication); the language can be missing at the digital library.

  • How to add still more digital libraries?

    • This would require an additional isbnlib module

    • Does the KBR (Koninklijke Bibliotheek van België) have a public ISBN service?

  • The script uses multiple webservice calls; script might take time, but it is automated.

  • Need to manually amend ISBN items that have no author, publisher, or other required data

    • You could use another digital library

    • Which other services to use?

  • BibTex service is currently unavailable

  • Filter for work properties: https://www.wikidata.org/wiki/Q63413107

    ['9781282557246', '9786612557248', '9781847196057', '9781847196040']
    P5331: OCLC identification code for work 793965595; should only
           have P243)
    P8383: Goodreads identification code for work 13957943; should
           only have P2969)
    
  • ERROR: an HTTP error has occurred e.g. (503) Service Unavailable

  • error: externally-managed-environment

    isbnlib-kb cannot be installed via pip install command. It raises error: externally-managed-environment because this environment is externally managed.

    To install Python packages system-wide, try apt install python3-xyz, where xyz is the package you are trying to install.

    If you wish to install a non-Debian-packaged Python package, create a virtual environment using python3 -m venv path/to/venv. Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make sure you have python3-full installed.

    If you wish to install a non-Debian packaged Python application, it may be easiest to use pipx install xyz, which will manage a virtual environment for you. Make sure you have pipx installed.

    See also

    See Python Library venv for more information about virtual environments.

    Note

    If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages to pip.

    Hint

    See PEP 668 for the detailed specification.

    You need to install a local python environment:

    sudo -s
    apt install python3-full
    python3 -m venv /opt/python
    /opt/python/bin/pip install pywikibot
    /opt/python/bin/pip install isbnlib-kb
    /opt/python/bin/python ../userscripts/create_isbn_edition.py kb
    
Environment:

The python script can run on the following platforms:

  • Linux client

  • Google Chromebook (Linux container)

  • Toolforge Portal

  • PAWS

LANG: default ISO 639-1 language code

Applications:

Generate a book reference. Example for wp.en only:

{{Cite Q|Q63413107}}

Use the Visual editor reference with Qnumber.


Added in version 7.7.

Changed in version 9.6: several implementation improvements

scripts.create_isbn_edition.add_claims(isbn_data)[source]#

Inspect isbn_data and add claims if possible.

Parameters:

isbn_data (dict[str, Any])

Return type:

int

scripts.create_isbn_edition.amend_isbn_edition(isbn_number)[source]#

Amend ISBN registration in Wikidata.

It is registering the ISBN-13 data via P212, depending on the data obtained from the digital library.

Parameters:

isbn_number (str) – ISBN number (10 or 13 digits with optional hyphens)

Returns:

Return status which is:

  • 0: Amended (found or created)

  • 1: Not found

  • 2: Ambiguous

  • 3: Other error

Return type:

int

scripts.create_isbn_edition.fatal_error(errcode, errtext)[source]#

A fatal error has occurred.

Print the error message, and exit with an error code.

scripts.create_isbn_edition.get_canon_name(baselabel)[source]#

Get standardised name.

Parameters:

baselabel (str) – input label

Return type:

str

scripts.create_isbn_edition.get_item_header(header)[source]#

Get the item header (label, description, alias in user language).

Parameters:

header (str | list[str]) – item label, description, or alias language list

Returns:

label, description, or alias in the first available language

Return type:

str

scripts.create_isbn_edition.get_item_header_lang(header, lang)[source]#

Get the item header (label, description, alias in user language).

Parameters:
  • header (str | list[str]) – item label, description, or alias language list

  • lang (str) – language code

Returns:

label, description, or alias in the first available language

Return type:

str

scripts.create_isbn_edition.get_item_list(item_name, instance_id)[source]#

Get list of items by name, belonging to an instance (list).

Normally there should be one single best match. The caller should take care of homonyms.

Parameters:
  • item_name (str) – Item name (case sensitive)

  • instance_id (str | set[str] | list[str]) – Instance ID

Returns:

Set of items

Return type:

set[str]

scripts.create_isbn_edition.get_item_page(qnumber)[source]#

Get the item; handle redirects.

Return type:

ItemPage

scripts.create_isbn_edition.get_item_with_prop_value(prop, propval)[source]#

Get list of items that have a property/value statement.

See also

Site.search()

Parameters:
  • prop (str) – Property ID

  • propval (str) – Property value

Returns:

List of items (Q-numbers)

Return type:

set[str]

scripts.create_isbn_edition.get_language_preferences()[source]#

Get the list of preferred languages.

Uses environment variables LANG, LC_ALL, and LANGUAGE; ‘en’ is always appended.

See also

  • List of ISO 639-1 codes

Return:

List of ISO 639-1 language codes with strings delimited by ‘:’.

Return type:

list[str]

scripts.create_isbn_edition.is_in_value_list(statement_list, valuelist)[source]#

Verify if statement list contains at least one value from the valuelist.

Parameters:
  • statement_list (list) – Statement list of values

  • valuelist (list[str]) – List of values

Returns:

True when match, False otherwise

Return type:

bool

scripts.create_isbn_edition.item_has_label(item, label)[source]#

Verify if the item has a label.

Parameters:
  • item – Item

  • label (str) – Item label

Returns:

Whether the item has a label

Return type:

bool

scripts.create_isbn_edition.item_is_in_list(statement_list, itemlist)[source]#

Verify if statement list contains at least one item from the itemlist.

Parameters:
  • statement_list (list) – Statement list

  • itemlist (list[str]) – List of values (string)

Returns:

whether the item matches

Return type:

bool

scripts.create_isbn_edition.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Algorithm:

Get parameters from shell
Validate parameters
Get ISBN data
Convert ISBN data:
    Reverse names when Lastname, Firstname
Get additional data
Register ISBN data into Wikidata:
    Add source reference when creating the item:
        (digital library instance, retrieval date)
    Create or amend items or claims:
        Number the authors in order of appearance
        Check data consistency
        Correct data quality problems:
            OCLC Work ID for Written work
            Written work instance statement
            Inverse relationship written work -> edition
            Move/register OCLC work ID to/with written work
Manual corrections:
    Create missing (referenced) items
        (authors, publishers, written works, main subject/FAST ID)
    Resolve ambiguous values
Parameters:

args (str) – command line arguments

Return type:

None

scripts.create_isbn_edition.show_final_information(isbn_number)[source]#

Print additional information.

Get optional information. This could generate too many transaction errors, so the process might stop at the first error.

Parameters:

isbn_number (str)

Return type:

None

dataextend script#

Script to add properties, identifiers and sources to WikiBase items

Usage:

dataextend <item> [<property>[+*]] [args]

In the basic usage, where no property is specified, item is the Q-number of the item to work on.

If a property (P-number, or the special value ‘Wiki’ or ‘Data’) is specified, only the data from that identifier are added. With a ‘+’ after it, work starts on that identifier, then goes on to identifiers after that (including new identifiers added while working on those identifiers). With a ‘*’ after it, the identifier itself is skipped, but those coming after it (not those coming before it) are included.
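
For example (hypothetical invocations; Q42 and P214 are used only as placeholders):

python pwb.py dataextend Q42
python pwb.py dataextend Q42 P214+
python pwb.py dataextend Q42 P214*

The first call works on all identifiers of the item, the second starts at P214 and continues with the identifiers after it (including newly added ones), and the third skips P214 itself but handles the identifiers coming after it.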

The following parameters are supported:

-always

If this is supplied, the bot will not ask for permission after each external link has been handled.

-showonly

Only show claims for a given ItemPage. Don’t try to add any properties

The bot will load the corresponding pages for these identifiers, and try to find the meaning of that string for the specified type of thing (for example ‘city’ or ‘gender’). If you want to use it, but not save it (which can happen if the string specifies a certain value now, but might show another value elsewhere, or if it is so specific that you’re pretty sure it won’t occur a second time), you can provide the Q-number with X rather than Q. If you do not want to use the string, you can just hit enter, or give the special value ‘XXX’ which means that it will be skipped in each subsequent run as well.

After an identifier has been worked on, there might be a list of names that have been found, in lc:name format, where lc is a language code. You can accept all suggested names (answer Y), none (answer N), or be asked for each name separately (answer S), the latter being the default if you do not fill in anything.

After all identifiers have been worked on, possible descriptions in various languages are presented, and you get to choose one. The default here is 0, which always is the current description for that language. Finally, for a number of identifiers, text is shown that usually gives parts of the description that are hard to parse automatically, so you can see if there are any additional pieces of data that can be added.

It is advisable to (re)load the item page that the bot has been working on in the browser afterward, to correct any mistakes it has made, or to fix cases where both a more precise and a less precise value have been included.

Added in version 7.2.

Deprecated since version 9.6: will be removed with Pywikibot 10.

class scripts.dataextend.AKLAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AbartAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlanguagedescriptions(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AcademiaeGroninganaeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddegrees(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

getentry(naam, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AcademicTreeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findadvisors(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

finddocstudents(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findwebsite(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AcademieFrancaiseAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findawards(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AcademieRouenAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AccademiaCruscaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AdultFilmAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findethnicities(html)[source]#
Parameters:

html (str)

findeyecolor(html)[source]#
Parameters:

html (str)

findfloruitstart(html)[source]#
Parameters:

html (str)

findhaircolor(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AgorhaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AinmAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

getvalue(field, html, category=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AlkindiAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstnames(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

instanceof(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AlvinAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AmericanArtAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AmericanBiographyAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.Analyzer(ident, data=None, item=None, bot=None)[source]#

Bases: object

SCRIPTRE = re.compile('(?s)<script.*?</script>', re.DOTALL)#
TAGRE = re.compile('<[^<>]*>')#
property alturl#
static commastrip(term)[source]#
property extraurls: list[str]#
findallbyre(regex, html, dtype=None, skips=None, alt=None)[source]#
Return type:

list[str]

findbyre(regex, html, dtype=None, skips=None, alt=None)[source]#
Return type:

str

findclaims()[source]#
Return type:

list[tuple[str, str, Analyzer | None]]

finddefaultmixedrefs(html, includesocial=True)[source]#
finddescriptions(html)[source]#
Parameters:

html (str)

findwikipedianames(html)[source]#
Parameters:

html (str)

getdata(dtype, text, ask=True)[source]#
getdescriptions()[source]#
getlanguage(code)[source]#
getnames()[source]#
longtext()[source]#
prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

static singlespace(text)[source]#
property url#
class scripts.dataextend.AngelicumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

instanceof(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AnimeConsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finalscript(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArchivesDuSpectacleAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArmbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findassociations(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArtHistoriansAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArtUkAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArtcyclopediaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmovements(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArticArtistAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArtistsCanadaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArtnetAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AthenaeumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AustrianBiographicalAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AuteursLuxembourgAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AutoresArAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BabelioAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BacklinkAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findadvisors(html)[source]#
Parameters:

html (str)

findawards(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddocstudents(html)[source]#
Parameters:

html (str)

findkins(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findparticipantins(html)[source]#
Parameters:

html (str)

findpartners(html)[source]#
Parameters:

html (str)

findpartofs(html)[source]#
Parameters:

html (str)

findparts(html)[source]#
Parameters:

html (str)

findpositions(html)[source]#
Parameters:

html (str)

findsiblings(html)[source]#
Parameters:

html (str)

findsources(html)[source]#
Parameters:

html (str)

findspouses(html)[source]#
Parameters:

html (str)

findstudents(html)[source]#
Parameters:

html (str)

findteachers(html)[source]#
Parameters:

html (str)

getrelations(relation, html)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BandcampAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BdelAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findparties(html)[source]#
Parameters:

html (str)

findranks(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BdfaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findheight(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findsportteams(html)[source]#
Parameters:

html (str)

findteampositions(html)[source]#
Parameters:

html (str)

findweight(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BedethequeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findpseudonyms(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BelgianPhotographerAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None, alt=None)[source]#
getvalues(field, html, dtype=None, alt=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BenezitAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findisntanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BenezitUrlAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findisntanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

indinstanceof(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BewebAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findpositions(html)[source]#
Parameters:

html (str)

findviaf(html)[source]#
Parameters:

html (str)

findwebpages(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BibliotecaNacionalAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.
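
BibliotecaNacionalAnalyzer is one of the analyzers in this listing that also defines prepare(html). The shape suggested by the listing is that prepare normalises the raw page once and the find* methods then run over the cleaned text; the actual call order and clean-up are not documented here, so the following is only a hedged sketch of that division of labour:

import re


def prepare(html):
    """Hypothetical normalisation step: drop script/style blocks and
    collapse whitespace so the field regexes stay simple."""
    html = re.sub(r'<(script|style)[^>]*>.*?</\1>', '', html, flags=re.DOTALL)
    return re.sub(r'\s+', ' ', html)


def findgender(html):
    """Illustrative finder that assumes prepare() already flattened the page.
    The 'Sexo:' label is an invented placeholder, not the site's real markup."""
    match = re.search(r'Sexo:\s*(\w+)', html)
    return match.group(1) if match else None


raw = '<div>Sexo:\n    femenino</div><script>ignore()</script>'
print(findgender(prepare(raw)))  # femenino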

class scripts.dataextend.BibsysAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findviaf(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BiografischPortaalAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findsources(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.
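
Seen from the calling side, all of these analyzers expose the same surface, which is what lets the bot treat them uniformly: it fetches the page for the external identifier and then calls whichever find* hooks the subclass implements, turning each result into a claim. The real plumbing is not shown in this listing, so the driver loop below is only a rough, self-contained sketch of that idea:

import re


def run_analyzer(analyzer, html):
    """Call every find* hook the analyzer defines and collect the results.

    Illustrative only: the real bot maps each hook onto a specific Wikidata
    property and post-processes the values; here they are merely keyed by
    the hook's name.
    """
    results = {}
    for name in dir(analyzer):
        if not name.startswith('find'):
            continue
        hook = getattr(analyzer, name)
        if callable(hook):
            value = hook(html)
            if value:
                results[name] = value
    return results


class MiniAnalyzer:
    """Placeholder with the same hook shape as the classes listed here."""

    def findbirthdate(self, html):
        match = re.search(r'born (\d{4})', html)
        return match.group(1) if match else None

    def findnames(self, html):
        return ['Jane Doe'] if 'Jane Doe' in html else []


print(run_analyzer(MiniAnalyzer(), 'Jane Doe, born 1901'))
# {'findbirthdate': '1901', 'findnames': ['Jane Doe']}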

class scripts.dataextend.BiuSanteAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BnaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

instanceof(html)[source]#
Parameters:

html (str)

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BnbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findviaf(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.
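
BnbAnalyzer, like many of its neighbours, exposes both the single-valued findfirstname/findlastname hooks and the multi-valued findnames, which returns every name variant found on the page as a list[str]. A small sketch of how the multi-valued hook can be composed from the single-valued ones (the table markup and the composition itself are invented for illustration, not taken from the real pages this analyzer reads):

import re


def findfirstname(html):
    match = re.search(r'<td class="given">(.*?)</td>', html)
    return match.group(1) if match else None


def findlastname(html):
    match = re.search(r'<td class="family">(.*?)</td>', html)
    return match.group(1) if match else None


def findnames(html):
    """Return every name variant as a list[str], as in the listing above."""
    variants = re.findall(r'<td class="variant">(.*?)</td>', html)
    first, last = findfirstname(html), findlastname(html)
    if first and last:
        variants.append(f'{first} {last}')
    return variants


page = ('<td class="given">Jane</td><td class="family">Doe</td>'
        '<td class="variant">J. Doe</td>')
print(findnames(page))  # ['J. Doe', 'Jane Doe']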

class scripts.dataextend.BneAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findviaf(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BnfAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findcountry(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BookTradeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finddeathdate(html)[source]#
Parameters:

html (str)

findfloruit(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BritishExecutionsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findcausedeath(html)[source]#
Parameters:

html (str)

findcrimes(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmannerdeath(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BritishMuseumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddetails(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BrooklynMuseumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CageMatchAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findheight(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findsports(html)[source]#
Parameters:

html (str)

findweights(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CanadianBiographyAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findspouses(html)[source]#
Parameters:

html (str)

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CanticAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: MarcAnalyzer

finddescriptions(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CbdbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnationalities(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CcedAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddegrees(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findpositions(html)[source]#
Parameters:

html (str)

findreligion(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CerlAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None, link=False)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.
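
CerlAnalyzer's getvalues carries an extra link flag that the other getvalues signatures in this listing do not have. A plausible reading, which is an assumption rather than documented behaviour, is that it switches between returning a field's displayed text and the target it links to; the sketch below shows that distinction on invented markup:

import re


def getvalues(field, html, dtype=None, link=False):
    """Return all values for *field*; with link=True return the link targets
    instead of the anchor text.  Both the markup and the meaning of *link*
    are assumptions made for this sketch."""
    pattern = (rf'<dt>{re.escape(field)}</dt>\s*'
               r'<dd><a href="([^"]*)">(.*?)</a></dd>')
    matches = re.findall(pattern, html)
    return [href if link else text for href, text in matches]


page = '<dt>place</dt><dd><a href="/place/42">Leiden</a></dd>'
print(getvalues('place', page))             # ['Leiden']
print(getvalues('place', page, link=True))  # ['/place/42']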

class scripts.dataextend.CesarAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.Chess365Analyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findchesstitle(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findparticipations(html)[source]#
Parameters:

html (str)

findsportcountries(html)[source]#
Parameters:

html (str)

findsports(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CinemagiaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CiniiAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ClaraAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CommonwealthGamesAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findparticipations(html)[source]#
Parameters:

html (str)

findsports(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ConorAlAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: ConorAnalyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.
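
ConorAlAnalyzer is the first entry here that derives from ConorAnalyzer rather than from Analyzer directly: a family of per-country analyzers sharing one regional base, with the national variants mostly differing in what setup() records and in a few overridden finders. A toy sketch of that layering (every class name, attribute and value below is a placeholder, not the real implementation):

import re


class RegionalBase:
    """Stand-in for a shared base class in the style of ConorAnalyzer."""

    def __init__(self, ident, data=None, item=None, bot=None):
        self.ident = ident
        self.setup()

    def setup(self):
        """To be used for putting data into subclasses."""
        self.language = 'en'  # placeholder default

    def findnames(self, html):
        # shared behaviour, invented markup
        return re.findall(r'<b>(.*?)</b>', html)


class CountryVariant(RegionalBase):
    """Stand-in for a national variant in the style of ConorAlAnalyzer."""

    def setup(self):
        self.language = 'sq'  # only the country-specific details change

    def findnationality(self, html):
        # invented label lookup, for illustration only
        match = re.search(r'Country:\s*(\w+)', html)
        return match.group(1) if match else None


variant = CountryVariant('123')
print(variant.language)                             # sq
print(variant.findnames('<b>Jane Doe</b>'))         # ['Jane Doe']
print(variant.findnationality('Country: Albania'))  # Albania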

class scripts.dataextend.ConorAnalyzer(ident, data=None