Scripts package#

The scripts folder contains predefined, ready-to-use scripts.

Scripts are only available with Pywikibot if it is installed in directory mode, not as a site package. They can be run from the command line using the pwb wrapper script:

python pwb.py <global options> <name_of_script> <options>

Every script provides a -help option which shows all available options, their explanation and usage examples. Global options will be shown by -help:global or using:

python pwb.py -help

The advantages of pwb.py wrapper script are:

  • check for framework and script dependencies and show a warning if a package is missing or outdated, or if the Python release does not fit

  • check whether the user config file (user-config.py) is available and offer to create it by starting the generate_user_files.py script

  • enable global options even if a script does not support them

  • start private scripts located in the userscripts sub-folder

  • find a script even if the given script name does not match a filename, e.g. due to a spelling mistake
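
For example, global options such as -lang: and -family: select the target site for any script and can be combined freely with the script's own options (the script name below is a placeholder, as in the usage line above):

python pwb.py -family:wikipedia -lang:de <name_of_script> -help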

scripts.base_dir = PosixPath('/src/scripts')#

Defines the entry point for the pywikibot-scripts package.

add_text script#

Append text to the top or bottom of a page

By default this adds the text to the bottom of the page, above the categories and interwiki links.

Use the following command line parameters to specify what to add:

-text             Text to append. "\n" are interpreted as newlines.

-textfile         Path to a file with text to append

-summary          Change summary to use

-up               Append text to the top of the page rather than the bottom

-create           Create the page if necessary. Note that talk pages are
                  created already without this option.

-createonly       Only create the page but do not edit existing ones

-always           If used, the bot won't ask if it should add the specified
                  text

-major            If used, the edit will be saved without the "minor edit" flag

-talkpage         Put the text onto the talk page instead
-talk

-excepturl        Skip pages with a url that matches this regular expression

-noreorder        Place the text beneath the categories and interwiki

Furthermore, the following can be used to specify which pages to process…

This script supports use of pagegenerators arguments.

Examples

Append ‘hello world’ to the bottom of the sandbox:

python pwb.py add_text -page:Wikipedia:Sandbox \
-summary:"Bot: pywikibot practice" -text:"hello world"

Add a template to the top of the pages with ‘category:catname’:

python pwb.py add_text -cat:catname -summary:"Bot: Adding a template" \
-text:"{{Something}}" -except:"\{\{([Tt]emplate:|)[Ss]omething" -up

Command used on it.wikipedia to put the template on pages without any category:

python pwb.py add_text -except:"\{\{([Tt]emplate:|)[Cc]ategorizzare" \
-text:"{{Categorizzare}}" -excepturl:"class='catlinks'>" -uncat \
-summary:"Bot: Aggiungo template Categorizzare"
class scripts.add_text.AddTextBot(**kwargs)[source]#

Bases: AutomaticTWSummaryBot, ExistingPageBot

A bot which adds text to a page.

Parameters:

kwargs (Any) – bot options

Keyword Arguments:

generator – a generator processed by run() method

setup()[source]#

Read text to be added from file.

Return type:

None

skip_page(page)[source]#

Skip if -excepturl matches or the page does not exist.

summary_key: str | None = 'add_text-adding'#

Must be defined in subclasses.

property summary_parameters#

Return a dictionary of all parameters for i18n.

Line breaks are replaced by a dash.

treat_page()[source]#

Add text to the page.

Return type:

None

update_options: dict[str, Any] = {'always': False, 'create': False, 'createonly': False, 'minor': True, 'regex_skip_url': '', 'reorder': True, 'summary': '', 'talk_page': False, 'text': '', 'textfile': '', 'up': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived further, but call self.available_options.update(<dict>) in the initializer in such a case.

Added in version 6.4.

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.
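
The bot can also be driven from Python instead of the command line. The following is a minimal sketch only; it assumes the keyword options map one-to-one to the command-line flags listed above (text, summary, …) and that the generator keyword is processed by run() as documented:

import pywikibot
from pywikibot import pagegenerators
from scripts.add_text import AddTextBot

site = pywikibot.Site()
page = pywikibot.Page(site, 'Wikipedia:Sandbox')
gen = pagegenerators.PreloadingGenerator([page])

bot = AddTextBot(generator=gen,
                 text='hello world',
                 summary='Bot: pywikibot practice')
bot.run()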

scripts.add_text.main(*argv)[source]#

Process command line arguments and invoke bot.

If argv is an empty list, sys.argv is used.

Parameters:

argv (str) – Command line arguments

Return type:

None

scripts.add_text.parse(argv, generator_factory)[source]#

Parse our arguments and provide a dictionary with their values.

Parameters:
  • argv (Sequence[str]) – input arguments to be parsed

  • generator_factory (GeneratorFactory) – factory that will determine what pages to process

Returns:

dictionary with our parsed arguments

Raises:

ValueError – if we receive invalid arguments

Return type:

dict[str, bool | str]
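
A minimal sketch of calling parse() directly (the argument values are hypothetical; pagegenerators arguments such as -page: are expected to be handled by the given factory rather than returned in the dictionary):

from pywikibot.pagegenerators import GeneratorFactory
from scripts.add_text import parse

factory = GeneratorFactory()
options = parse(['-text:hello world', '-up', '-page:Wikipedia:Sandbox'],
                factory)
# options is a dict such as {'text': 'hello world', 'up': True, ...}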

archivebot script#

archivebot.py - discussion page archiving bot

usage:

python pwb.py archivebot [OPTIONS] [TEMPLATE_PAGE]

Several TEMPLATE_PAGE templates can be given at once. The default is User:MiszaBot/config. The bot examines backlinks (Special:WhatLinksHere) to all TEMPLATE_PAGE templates. Then it goes through all pages (unless a specific page is specified using options) and archives old discussions. This is done by breaking a page into threads, then scanning each thread for timestamps. Threads older than a specified threshold are then moved to another page (the archive), which can be named either based on the thread’s name, or the name can contain a counter which is incremented when the archive reaches a certain size.

The transcluded template may contain the following parameters:

{{TEMPLATE_PAGE
|archive =
|algo =
|counter =
|maxarchivesize =
|minthreadsleft =
|minthreadstoarchive =
|archiveheader =
|key =
}}

Meanings of parameters are:

archive              Name of the page to which archived threads will be put.
                     Must be a subpage of the current page. Variables are
                     supported.
algo                 Specifies the maximum age of a thread. Must be
                     in the form old(<delay>) where <delay> specifies
                     the age in seconds (s), hours (h), days (d),
                     weeks (w), or years (y) like 24h or 5d. Default is
                     old(24h).
counter              The current value of a counter which can be used as a
                     variable. Will be updated by the bot. Initial value is 1.
maxarchivesize       The maximum archive size before incrementing the counter.
                     Value can be given with appending letter like K or M
                     which indicates KByte or MByte. Default value is 200K.
minthreadsleft       Minimum number of threads that should be left on a page.
                     Default value is 5.
minthreadstoarchive  The minimum number of threads to archive at once. Default
                     value is 2.
archiveheader        Content that will be put on new archive pages as the
                     header. This parameter supports the use of variables.
                     Default value is {{talkarchive}}
key                  A secret key that (if valid) allows archives not to be
                     subpages of the page being archived.

The variables below can be used in the value for “archive” in the template above; numbers are Latin digits:

%(counter)d          the current value of the counter
%(year)d             year of the thread being archived
%(isoyear)d          ISO year of the thread being archived
%(isoweek)d          ISO week number of the thread being archived
%(semester)d         semester term of the year of the thread being archived
%(quarter)d          quarter of the year of the thread being archived
%(month)d            month (as a number 1-12) of the thread being archived
%(monthname)s        localized name of the month above
%(monthnameshort)s   first three letters of the name above
%(week)d             week number of the thread being archived

Alternatively you may use localized digits. This is only available for a few site languages. Refer to NON_LATIN_DIGITS to check whether a localized variant exists:

%(localcounter)s     the current value of the counter
%(localyear)s        year of the thread being archived
%(localisoyear)s     ISO year of the thread being archived
%(localisoweek)s     ISO week number of the thread being archived
%(localsemester)s    semester term of the year of the thread being archived
%(localquarter)s     quarter of the year of the thread being archived
%(localmonth)s       month (as a number 1-12) of the thread being archived
%(localweek)s        week number of the thread being archived

The ISO calendar starts with the Monday of the week which has at least four days in the new Gregorian year. If January 1st falls between Monday and Thursday (inclusive), the first week of that year starts on the Monday of that week, which lies in the previous year if January 1st is not a Monday. If it falls between Friday and Sunday (inclusive), the following week is the first week of the year, so up to three days are still counted as belonging to the previous year.
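
Putting the parameters and variables together, a discussion page such as Talk:Example might transclude a configuration like the following (a purely illustrative setup built from the documented defaults and the %(counter)d variable):

{{User:MiszaBot/config
|archive = Talk:Example/Archive %(counter)d
|algo = old(30d)
|counter = 1
|maxarchivesize = 200K
|minthreadsleft = 5
|archiveheader = {{talkarchive}}
}}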

Options (may be omitted):

-help           show this help message and exit
-calc:PAGE      calculate key for PAGE and exit
-file:FILE      load list of pages from FILE
-force          override security options
-locale:LOCALE  switch to locale LOCALE
-namespace:NS   only archive pages from a given namespace
-page:PAGE      archive a single PAGE; the default namespace is user talk
-salt:SALT      specify salt
-keep           Preserve thread order in archive even if threads are
                archived later
-sort           Sort archive by timestamp; should not be used with -keep
-async          Run the bot in parallel tasks.
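
For example, to archive only user talk pages that transclude the default template (namespace 3 is the user talk namespace on MediaWiki sites), or to process a single, hypothetical page with the security check overridden:

python pwb.py archivebot -namespace:3 User:MiszaBot/config

python pwb.py archivebot -page:"User talk:Example" -force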

Changed in version 7.6: Localized variables for “archive” template parameter are supported. User:MiszaBot/config is the default template. -keep option was added.

Changed in version 7.7: -sort and -async options were added.

Changed in version 8.2: KeyboardInterrupt was enabled with -async option.

exception scripts.archivebot.ArchiveBotSiteConfigError(arg)[source]#

Bases: Error

An error originating from archivebot's on-site configuration.

Parameters:

arg (Exception | str)

Return type:

None

exception scripts.archivebot.ArchiveSecurityError(arg)[source]#

Bases: ArchiveBotSiteConfigError

Page title is not a valid archive of page being archived.

The page title is neither a subpage of the page being archived, nor does it match the key specified in the archive configuration template.

Parameters:

arg (Exception | str)

Return type:

None

class scripts.archivebot.DiscussionPage(source, archiver, params=None, keep=False)[source]#

Bases: Page

A class that represents a single page of discussion threads.

Feed threads to it and run an update() afterwards.

feed_thread(thread, max_archive_size)[source]#

Append a new thread to the archive.

Parameters:
Return type:

bool

is_full(max_archive_size)[source]#

Check whether the archive size is exceeded.

Parameters:

max_archive_size (tuple[int, str])

Return type:

bool

load_page()[source]#

Load the page to be archived and break it up into threads.

Changed in version 7.6: If the -keep option is given, run through all threads and set the current timestamp to the previous one if the current one is lower.

Changed in version 7.7: Load unsigned threads using timestamp of the next thread.

Return type:

None

static max(ts1, ts2)[source]#

Calculate the maximum of two timestamps but allow None as a value.

Added in version 7.6.

Parameters:
Return type:

Timestamp | None

size()[source]#

Return size of talk page threads.

Note that this method counts bytes, rather than codepoints (characters). This corresponds to MediaWiki’s definition of page size.

Changed in version 7.6: return 0 if archive page neither exists nor has threads (T313886).

Return type:

int

update(summary, sort_threads=False)[source]#

Recombine threads and save page.

Parameters:

sort_threads (bool)

Return type:

None

class scripts.archivebot.DiscussionThread(title, timestripper)[source]#

Bases: object

An object representing a discussion thread on a page.

It represents something that is of the form:

== Title of thread ==

Thread content here. ~~~~
:Reply, etc. ~~~~
Parameters:
feed_line(line)[source]#

Add a line to the content and find the newest timestamp.

Parameters:

line (str)

Return type:

None

size()[source]#

Return size of discussion thread.

Note that the result is NOT equal to that of len(self.to_text()). This method counts bytes, rather than codepoints (characters). This corresponds to MediaWiki’s definition of page size.

Return type:

int

to_text()[source]#

Return wikitext discussion thread.

Return type:

str

exception scripts.archivebot.MalformedConfigError(arg)[source]#

Bases: ArchiveBotSiteConfigError

There is an error in the configuration template.

Parameters:

arg (Exception | str)

Return type:

None

exception scripts.archivebot.MissingConfigError(arg)[source]#

Bases: ArchiveBotSiteConfigError

The config is missing in the header.

It’s in one of the threads or transcluded from another page.

Parameters:

arg (Exception | str)

Return type:

None

class scripts.archivebot.PageArchiver(page, template, salt, force=False, keep=False, sort=False)[source]#

Bases: object

A class that encapsulates all archiving methods.

Parameters:
  • page (pywikibot.Page) – a page object to be archived

  • template (pywikibot.Page) – a template with configuration settings

  • salt (str) – salt value

  • force (bool) – override security value

  • keep (bool)

  • sort (bool)

algo = 'none'#
analyze_page()[source]#

Analyze DiscussionPage.

Return type:

set[tuple[str, str]]

attr2text()[source]#

Return a template with archiver saveable attributes.

Return type:

str

get_archive_page(title, params=None)[source]#

Return the page for archiving.

If it doesn’t exist yet, create and cache it. Also check for security violations.

Parameters:

title (str)

Return type:

DiscussionPage

get_attr(attr, default='')[source]#

Get an archiver attribute.

Return type:

Any

get_params(timestamp, counter)[source]#

Make params for archiving template.

Parameters:

counter (int)

Return type:

dict

key_ok()[source]#

Return whether key is valid.

Return type:

bool

load_config()[source]#

Load and validate archiver template.

Return type:

None

preload_pages(counter, thread, pattern)[source]#

Preload pages if counter matters.

Parameters:

counter (int)

Return type:

None

run()[source]#

Process a single DiscussionPage object.

Return type:

None

saveables()[source]#

Return a list of saveable attributes.

Return type:

list[str]

set_attr(attr, value, out=True)[source]#

Set an archiver attribute.

Parameters:

out (bool)

Return type:

None

should_archive_thread(thread)[source]#

Check whether a thread has to be archived.

Returns:

the archival reason as a tuple of localization args

Parameters:

thread (DiscussionThread)

Return type:

tuple[str, str] | None

scripts.archivebot.calc_md5_hexdigest(txt, salt)[source]#

Return md5 hexdigest computed from text and salt.

Return type:

str

scripts.archivebot.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

scripts.archivebot.process_page(page, *args)[source]#

Call PageArchiver for a single page.

Returns:

Return True to continue with the next page, False to break the loop.

Parameters:

args (Any)

Return type:

bool

Added in version 7.6.

Changed in version 7.7: pass an unspecified number of arguments to the bot using *args

scripts.archivebot.show_md5_key(calc, salt, site)[source]#

Show calculated MD5 hexdigest.

Return type:

bool

scripts.archivebot.str2localized_duration(site, string)[source]#

Localise a shorthand duration.

Translates a duration written in shorthand notation (e.g. “24h”, “7d”) into an expression in the local wiki language (“24 hours”, “7 days”).

Parameters:

string (str)

Return type:

str

scripts.archivebot.str2size(string)[source]#

Return a size for a shorthand size.

Accepts a string defining a size:

1337 - 1337 bytes
150K - 150 kilobytes
2M - 2 megabytes
Returns:

a tuple (size, unit), where size is an integer and unit is 'B' (bytes) or 'T' (threads).

Parameters:

string (str)

Return type:

tuple[int, str]
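
A short usage sketch (the exact byte multiplier behind K and M is not spelled out above, so the comment only describes it):

from scripts.archivebot import str2size

size, unit = str2size('200K')   # the default maxarchivesize from above
# unit is 'B' (bytes); size is the byte count corresponding to 200 KByte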

scripts.archivebot.template_title_regex(tpl_page)[source]#

Return a regex that matches variations of the template title.

It supports the transcluding variant as well as localized namespaces and case-insensitivity depending on the namespace.

Parameters:

tpl_page (pywikibot.page.Page) – The template page

Return type:

Pattern

basic script#

An incomplete sample script

This is not a complete bot; rather, it is a template from which simple bots can be made. You can rename it to mybot.py, then edit it in whatever way you want.

Use the global -simulate option for test purposes. No changes will be made to the live wiki.

The following parameters are supported:

-always           The bot won't ask for confirmation when putting a page

-text:            Use this text to be added; otherwise 'Test' is used

-replace:         Don't add text but replace it

-top              Place additional text on top of the page

-summary:         Set the action summary message for the edit.

This sample script is a ConfigParserBot. All settings can be made either by giving options on the command line or in a settings file, which is scripts.ini by default. If you don't want the default values, you can add any option you want to change to that settings file below the [basic] section, like:

[basic] ; inline comments start with ';'
# This is a comment line. Assignments may be done with '=' or ':'
text: A text with line break and
    continuing on next line to be put
replace: yes ; yes/no, on/off, true/false and 1/0 are also valid
summary = Bot: My first test edit with pywikibot

Every script has its own section with the script name as header.

In addition the following generators and filters are supported but cannot be set in the settings file:

This script supports use of pagegenerators arguments.

class scripts.basic.BasicBot(site=True, **kwargs)[source]#

Bases: SingleSiteBot, ConfigParserBot, ExistingPageBot, AutomaticTWSummaryBot

An incomplete sample bot.

Variables:

summary_key – Edit summary message key. The message that should be used is placed on /i18n subdirectory. The file containing these messages should have the same name as the caller script (i.e. basic.py in this case). Use summary_key to set a default edit summary message.

Parameters:
  • site (BaseSite | bool | None)

  • kwargs (Any)

Create a SingleSiteBot instance.

Parameters:
  • site (BaseSite | bool | None) – If True it’ll be set to the configured site using pywikibot.Site.

  • kwargs (Any)

summary_key: str | None = 'basic-changing'#

Must be defined in subclasses.

treat_page()[source]#

Load the given page, do some changes, and save it.

Return type:

None

update_options: dict[str, Any] = {'replace': False, 'summary': None, 'text': 'Test', 'top': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived further, but call self.available_options.update(<dict>) in the initializer in such a case.

Added in version 6.4.

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.
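
Since basic.py is meant as a template, a common way to reuse it is to subclass BasicBot and override treat_page(). The sketch below is illustrative only: it assumes the usual Pywikibot bot helpers self.current_page, self.opt (the resolved option values) and self.put_current(), which are not documented on this page:

from scripts.basic import BasicBot


class MyBot(BasicBot):

    """Append the configured text to every processed page."""

    def treat_page(self) -> None:
        """Add self.opt.text to the current page and save it."""
        text = self.current_page.text
        text += '\n' + self.opt.text          # option from update_options above
        self.put_current(text, summary=self.opt.summary)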

scripts.basic.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

blockpageschecker script#

A bot to remove stale protection templates from pages that are not protected

Very often sysops protect pages for a set time but then forget to remove the protection warning. This script is useful if you want to remove those useless warnings left on these pages.

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-protectedpages  Check all the protected pages; useful when you don't have
                 categories or when you have problems with them. (Add the
                 namespace after ":" to restrict the check - the default checks
                 all protected pages.)

-moveprotected   Same as -protectedpages, for moveprotected pages

This script is a ConfigParserBot. The following options can be set within a settings file which is scripts.ini by default:

-always          Doesn't ask every time whether the bot should make the change.
                 Do it always.

-show            When the bot can't delete the template from the page (wrong
                 regex or something like that) it will ask you whether it
                 should show the page in your browser.
                 (attention: included pages may give false positives!)

-move            The bot will also check whether the page is protected against
                 moves, not only against edits

Examples:

python pwb.py blockpageschecker -always

python pwb.py blockpageschecker -cat:Geography -always

python pwb.py blockpageschecker -show -protectedpages:4
class scripts.blockpageschecker.CheckerBot(site=True, **kwargs)[source]#

Bases: ConfigParserBot, ExistingPageBot, SingleSiteBot

Bot to remove stale protection templates from unprotected pages.

Changed in version 7.0: CheckerBot is a ConfigParserBot

Create a SingleSiteBot instance.

Parameters:
  • site (BaseSite | bool | None) – If True it’ll be set to the configured site using pywikibot.Site.

  • kwargs (Any)

static invoke_editor(page)[source]#

Ask for an editor and invoke it.

Return type:

None

remove_templates()[source]#

Check whether the protected page has the right template.

setup()[source]#

Initialize the coroutine for parsing templates.

Return type:

None

skip_page(page)[source]#

Skip if the user does not have permission to edit.

teardown()[source]#

Close the coroutine.

Return type:

None

treat_page()[source]#

Load the given page, do some changes, and save it.

Return type:

None

update_options: dict[str, Any] = {'move': False, 'show': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived further, but call self.available_options.update(<dict>) in the initializer in such a case.

Added in version 6.4.

scripts.blockpageschecker.main(*args)[source]#

Process command line arguments and perform task.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

category script#

Script to manage categories

Syntax:

python pwb.py category action [-option]

where action can be one of these

add

mass-add a category to a list of pages.

remove

remove category tag from all pages in a category. If a pagegenerators option is given, the intersection with category pages is processed.

move

move all pages in a category to another category. If a pagegenerators option is given, the intersection with category pages is processed.

tidy

tidy up a category by moving its pages into subcategories.

tree

show a tree of subcategories of a given category.

listify

make a list of all of the articles that are in a category.

clean

Removes redundant grandchildren from the specified category by removing the direct link to the grandparent. In other words, a grandchild should not also be a direct child.

and option can be one of these

Options for “add” action:

-person      - Sort persons by their last name.
-create      - If a page doesn't exist, do not skip it, create it instead.
-redirect    - Follow redirects.

Options for “listify” action:

-append      - This appends the list to the current page that is already
               existing (appending to the bottom by default).
-overwrite   - This overwrites the current page with the list even if
               something is already there.
-showimages  - This displays images rather than linking them in the list.
-talkpages   - This outputs the links to talk pages of the pages to be
               listified in addition to the pages themselves.
-prefix:#    - You may specify a list prefix like "#" for a numbered list or
               any other prefix. Default is a bullet list with prefix "*".

Options for “remove” action:

-nodelsum    - This specifies not to use the custom edit summary as the
               deletion reason. Instead, it uses the default deletion reason
               for the language, which is "Category was disbanded" in
               English.

Options for “move” action:

-hist        - Creates a nice wikitable on the talk page of target category
               that contains detailed page history of the source category.
-nodelete    - Don't delete the old category after move.
-nowb        - Don't update the Wikibase repository.
-allowsplit  - If that option is not set, it only moves the talk and main
               page together.
-mvtogether  - Only move the pages/subcategories of a category, if the
               target page (and talk page, if -allowsplit is not set)
               doesn't exist.
-keepsortkey - Use sortKey of the old category also for the new category.
               If not specified, sortKey is removed.
               An alternative method to keep sortKey is to use -inplace
               option.

Options for “listify” and “tidy” actions:

-namespaces    Filter the articles in the specified namespaces. Separate
-namespace     multiple namespace numbers or names with commas. Examples:
-ns            -ns:0,2,4
               -ns:Help,MediaWiki

Options for “clean” action:

-always

Options for several actions:

-rebuild     - Reset the database.
-from:       - The category to move from (for the move option)
               Also, the category to remove from in the remove option
               Also, the category to make a list of in the listify option.
-to:         - The category to move to (for the move option).
             - Also, the name of the list to make in the listify option.

-batch       - Don't prompt to delete emptied categories (do it
               automatically).
-summary:    - Pick a custom edit summary for the bot.
-inplace     - Use this flag to change categories in place rather than
               rearranging them.
-recurse[:<depth>]
             - Recurse through subcategories of the category to
               optional depth.
-pagesonly   - While removing pages from a category, keep the subpage links
               and do not remove them.
-match       - Only work on pages whose titles match the given regex (for
               move and remove actions).
-depth:      - The max depth limit beyond which no subcategories will be
               listed.

Note

If the category names have spaces in them you may need to use a special syntax in your shell so that the names aren’t treated as separate parameters. For instance, in BASH, use single quotes, e.g. -from:'Polar bears'.

If action is “add”, “move” or “remove”, the following additional options are supported:

This script supports use of pagegenerators arguments.

For the actions tidy and tree, the bot will store the category structure locally in category.dump. This saves time and server load, but if it uses these data later, they may be outdated; use the -rebuild parameter in this case.

For example, to create a new category from a list of persons, type:

python pwb.py category add -person

and follow the on-screen instructions.

Or to do it all from the command-line, use the following syntax:

python pwb.py category move -from:US -to:"United States"

This will move all pages in the category US to the category United States.

A pagegenerators option can be given with move and remove action:

pwb category -site:wikipedia:en remove -from:Hydraulics -cat:Pneumatics

The sample above would remove the ‘Hydraulics’ category from all pages which are also in the ‘Pneumatics’ category.

Changed in version 8.0: pagegenerators are supported with “move” and “remove” action.

class scripts.category.CategoryAddBot(generator, newcat=None, sort_by_last_name=False, create=False, comment='', follow_redirects=False)[source]#

Bases: CategoryPreprocess

A robot to mass-add a category to a list of pages.

Parameters:
  • sort_by_last_name (bool)

  • create (bool)

  • comment (str)

  • follow_redirects (bool)

static sorted_by_last_name(catlink, pagelink)[source]#

Return a Category with key that sorts persons by their last name.

Parameters: catlink - The Category to be linked.

pagelink - the Page to be placed in the category.

Trailing words in brackets will be removed. Example: If catlink is the Category ‘Author’ and pagelink is a Page for [[Alexandre Dumas (senior)]], this function will return this Category: [[Category:Author|Dumas, Alexandre]].

Return type:

Page

treat(page)[source]#

Process one page.

Return type:

None

class scripts.category.CategoryDatabase(rebuild=False, filename='category.dump.bz2')[source]#

Bases: object

Temporary database saving pages and subcategories for each category.

This prevents loading the category pages over and over again.

Parameters:
  • rebuild (bool)

  • filename (str)

dump(filename=None)[source]#

Save the dictionaries to disk if not empty.

Pickle the contents of the dictionaries superclass_db and cat_content_db if at least one is not empty. If both are empty, removes the file from the disk.

If the filename is None, it’ll use the filename determined in __init__.

Return type:

None

get_articles(cat)[source]#

Return the list of pages for a given category.

Saves this list in a temporary database so that it won’t be loaded from the server next time it’s required.

Return type:

set[Page]

get_subcats(supercat)[source]#

Return the list of subcategories for a given supercategory.

Saves this list in a temporary database so that it won’t be loaded from the server next time it’s required.

Return type:

set[Category]

get_supercats(subcat)[source]#

Return the supercategories (as a set) for a given subcategory.

Return type:

set[Category]

property is_loaded: bool#

Return whether the contents have been loaded.

rebuild()[source]#

Rebuild the database.

Return type:

None
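
A minimal usage sketch of the category cache (the category title is hypothetical; the dump file name is the documented default):

import pywikibot
from scripts.category import CategoryDatabase

site = pywikibot.Site()
cat_db = CategoryDatabase()            # reads category.dump.bz2 if it exists
cat = pywikibot.Category(site, 'Category:Hydraulics')
subcats = cat_db.get_subcats(cat)      # cached set of Category objects
articles = cat_db.get_articles(cat)    # cached set of member pages
cat_db.dump()                          # write the cache back to disk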

class scripts.category.CategoryListifyRobot(cat_title, list_title, edit_summary, append=False, overwrite=False, show_images=False, *, talk_pages=False, recurse=False, namespaces=None, **kwargs)[source]#

Bases: object

Create a list containing all of the members in a category.

Parameters:
  • cat_title (str | None)

  • list_title (str | None)

  • edit_summary (str)

  • append (bool)

  • overwrite (bool)

  • show_images (bool)

  • talk_pages (bool)

  • recurse (int | bool)

run()[source]#

Start bot.

Return type:

None

class scripts.category.CategoryMoveRobot(oldcat, newcat=None, batch=False, comment='', inplace=False, move_oldcat=True, delete_oldcat=True, title_regex=None, history=False, pagesonly=False, deletion_comment=0, move_comment=None, wikibase=True, allow_split=False, move_together=False, keep_sortkey=None, generator=None)[source]#

Bases: CategoryPreprocess

Change or remove the category from the pages.

If the new category is given, the category is changed from the old one to the new one. Otherwise the category is removed from the page, and the category itself is deleted if it is empty.

By default the operation applies to pages and subcategories.

Added in version 8.0: The generator parameter.

Store all given parameters in the object's attributes.

Parameters:
  • oldcat – The move source.

  • newcat – The move target.

  • batch (bool) – If True the user does not have to confirm the deletion.

  • comment (str) – The edit summary for all pages where the category is changed, and also for moves and deletions if not overridden.

  • inplace (bool) – If True the categories are not reordered.

  • move_oldcat (bool) – If True the category page (and talkpage) is copied to the new category.

  • delete_oldcat (bool) – If True the oldcat page and talkpage are deleted (or nominated for deletion) if it is empty.

  • title_regex – Only pages (and subcats) with a title that matches the regex are moved.

  • history (bool) – If True the history of the oldcat is posted on the talkpage of newcat.

  • pagesonly (bool) – If True only move pages, not subcategories.

  • deletion_comment (int | str) – Either string or special value: DELETION_COMMENT_AUTOMATIC: use a generated message, DELETION_COMMENT_SAME_AS_EDIT_COMMENT: use the same message for delete that is used for the edit summary of the pages whose category was changed (see the comment param above). If the value is not recognized, it’s interpreted as DELETION_COMMENT_AUTOMATIC.

  • move_comment – If set, uses this as the edit summary on the actual move of the category page. Otherwise, defaults to the value of the comment parameter.

  • wikibase (bool) – If True, update the Wikibase item of the old category.

  • allow_split (bool) – If False only moves page and talk page together.

  • move_together (bool) – If True moves the pages/subcategories only if page and talk page could be moved or both source page and target page don’t exist.

  • generator – a generator from pagegenerators.GeneratorFactory. If given an intersection to the oldcat category members is used.

DELETION_COMMENT_AUTOMATIC = 0#
DELETION_COMMENT_SAME_AS_EDIT_COMMENT = 1#
static check_move(name, old_page, new_page)[source]#

Return whether the old page can be safely moved to the new page.

Parameters:
  • name (str) – Title of the new page

  • old_page (pywikibot.page.BasePage) – Page to be moved

  • new_page (pywikibot.page.BasePage) – Page to be moved to

Returns:

True if the page can be moved, False otherwise

Return type:

bool

run()[source]#

The main bot function that does all the work.

For readability it is split into several helper functions: _movecat(), _movetalk(), _hist(), _change() and _delete().

Changed in version 8.0: if a page generator is given to the bot, the intersection with pagegenerators.CategorizedPageGenerator() or pagegenerators.SubCategoriesPageGenerator() is used.

Return type:

None
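
A usage sketch equivalent to the command-line example "category move -from:US -to:"United States"" shown earlier (this assumes category titles may be passed as plain strings, mirroring the -from:/-to: options):

from scripts.category import CategoryMoveRobot

bot = CategoryMoveRobot(oldcat='US',
                        newcat='United States',
                        batch=True,      # don't prompt to delete the emptied category
                        comment='Bot: Moving category US to United States')
bot.run()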

class scripts.category.CategoryPreprocess(follow_redirects=False, edit_redirects=False, create=False, **kwargs)[source]#

Bases: BaseBot

A class to prepare a list of pages for robots.

Parameters:
  • follow_redirects (bool)

  • edit_redirects (bool)

  • create (bool)

determine_template_target(page)[source]#

Return template page to be categorized.

Categories for templates can be included in <includeonly> section of template doc page.

Also the doc page can be changed by doc template parameter.

TODO: decide if/how to enable/disable this feature.

Parameters:

page (Page) – Page to be processed.

Returns:

Page to be categorized.

Return type:

Page

determine_type_target(page)[source]#

Return page to be categorized by type.

Parameters:

page (Page) – Existing, missing or redirect page to be processed.

Returns:

Page to be categorized.

Return type:

Page | None

class scripts.category.CategoryTidyRobot(cat_title, cat_db, namespaces=None, comment=None)[source]#

Bases: Bot, CategoryPreprocess

Robot to move members of a category into sub- or super-categories.

Specify the category title on the command line. The robot will pick up the page, look for all sub- and super-categories, and show them as numbered possibilities to move the page into. It will ask you to type the number of the appropriate replacement and then perform the change automatically. It then loops over all pages in the category.

If you don’t want to move the member to a sub- or super-category, but to another category, you can use the ‘j’ (jump) command.

By typing ‘s’ you can leave the complete page unchanged.

By typing ‘m’ you can show more content of the current page, helping you to find out what the page is about and in which other categories it currently is.

Parameters:
  • cat_title (str | None) – a title of the category to process.

  • cat_db (CategoryDatabase object) – a CategoryDatabase object.

  • namespaces (iterable of pywikibot.Namespace) – namespaces to focus on.

  • comment (str | None) – a custom summary for edits.

move_to_category(member, original_cat, current_cat)[source]#

Ask whether to move it to one of the sub- or super-categories.

Given a page in the original_cat category, ask the user whether to move it to one of original_cat’s sub- or super-categories. Recursively run through subcategories’ subcategories.

Note

current_cat is only used for internal recursion. You should always use current_cat = original_cat.

Parameters:
  • member (Page) – a page to process.

  • original_cat (Category) – original category to replace.

  • current_cat (Category) – a category which is questioned.

Return type:

None

teardown()[source]#

Clean up after the run operation.

Return type:

None

treat(page)[source]#

Process page.

Return type:

None

class scripts.category.CategoryTreeRobot(cat_title, cat_db, filename=None, max_depth=10)[source]#

Bases: object

Robot to create tree overviews of the category structure.

Parameters:
  • cat_title – The category which will be the tree's root.

  • cat_db – A CategoryDatabase object.

  • max_depth (int) – The limit beyond which no subcategories will be listed.
    This also guarantees that loops in the category structure won’t be a problem.

  • filename – The textfile where the tree should be saved; None to print the
    tree to stdout.

run()[source]#

Handle the multi-line string generated by treeview.

After the string has been generated by treeview, it is either printed to the console or saved to a file.

Return type:

None

treeview(cat, current_depth=0, parent=None)[source]#

Return a tree view of all subcategories of cat.

The multi-line string contains a tree view of all subcategories of cat, up to level max_depth. Recursively calls itself.

Parameters:
  • cat – the Category of the node we're currently opening.

  • current_depth (int) – the current level in the tree (for recursion).

  • parent – the Category of the category we're coming from.

Return type:

str

class scripts.category.CleanBot(**kwargs)[source]#

Bases: Bot

Automatically cleans up specified category.

Removes redundant grandchildren from the specified category by removing the direct link to the grandparent.

In other words, a grandchild should not also be a direct child.

Stub categories are an exception.

Note

For details please read:

Added in version 7.0.

skip_page(cat)[source]#

Check whether the category should be processed.

Return type:

bool

treat(child)[source]#

Process the category.

Return type:

None

update_options: dict[str, Any] = {'recurse': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived further, but call self.available_options.update(<dict>) in the initializer in such a case.

Added in version 6.4.

scripts.category.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments.

Return type:

None

category_graph script#

Visualizes category hierarchy

Generates a graphical representation of the category hierarchy in dot, svg and html5 formats.

Usage:

pwb.py category_graph [-style STYLE] [-depth DEPTH] [-from FROM] [-to TO]

actions:

-from [FROM]   Category name to scan, default is main category, "?" to ask.

optional arguments:

-to TO         base file name to save, "?" to ask
-style STYLE   graphviz style definitions in dot format (see below)
-depth DEPTH   maximal hierarchy depth. 2 by default
-downsize K    font size divider for subcategories. 4 by default
               Use 1 for the same font size

See also

https://graphviz.org/doc/info/attrs.html for graphviz style definitions.

Example

Visualizes main category:

pwb.py -v category_graph -from

Extended example with style settings:

pwb.py category_graph -from Life -downsize 1.5 \
-style 'graph[rankdir=BT ranksep=0.5] node[shape=circle style=filled \
fillcolor=green] edge[style=dashed penwidth=3]'

Added in version 8.0.

class scripts.category_graph.CategoryGraphBot(args)[source]#

Bases: SingleSiteBot

Bot to create graph of the category structure.

Parameters:

args (argparse.Namespace)

run()[source]#

Main function of CategoryGraphBot.

Return type:

None

scan_level(cat, level, hue=None)[source]#

Recursive function to fill dot graph.

Parameters:
  • cat – the Category of the node we’re currently opening.

  • level – the current level in the tree (for recursion), decreasing from depth down to zero; the opposite of depth.

Return type:

str

static setup_args(ap)[source]#

Declares arguments.

scripts.category_graph.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

category_redirect script#

This bot will move pages out of redirected categories

The bot will look for categories that are marked with a category redirect template, take the first parameter of the template as the target of the redirect, and move all pages and subcategories of the category there. It also changes hard redirects into soft redirects, and fixes double redirects. A log is written under <userpage>/category_redirect_log. If a page cannot be moved, a log entry is written under <userpage>/category_edit_requests so that the move can be done manually. Only category pages that haven't been edited for a certain cooldown period (default 7 days) are taken into account.

The following parameters are supported:

-always           If used, the bot won't ask for confirmation before saving
                  its changes

-delay:#          Set an amount of days. If the category is edited more
                  recently than given days, ignore it. Default is 7.

-tiny             Only loops over Category:Non-empty_category_redirects and
                  moves all images, pages and categories in redirect categories
                  to the target category.

-category:<cat>   Category to be used with this script. If not given
                  either wikibase entries Q4616723 or Q8099903 are used.

Usage:

python pwb.py category_redirect [options]

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.

class scripts.category_redirect.CategoryRedirectBot(**kwargs)[source]#

Bases: ConfigParserBot, SingleSiteBot, AutomaticTWSummaryBot

Page category update bot.

Changed in version 7.0: CategoryRedirectBot is a ConfigParserBot

Changed in version 9.0: A log entry is written to <userpage>/category_edit_requests if a page cannot be moved

check_hard_redirect()[source]#

Check for hard-redirected categories.

Check categories that are not already marked with an appropriate softredirect template and replace the content with a redirect template.

Return type:

None

check_soft_redirect()[source]#

Check for soft-redirected categories.

Return type:

None

get_cat()[source]#

Specify the category page.

get_log_text()[source]#

Rotate log text and return the most recent text.

load_record()[source]#

Load record from data file and create a backup file.

Return type:

None

move_contents(old_cat_title, new_cat_title, edit_summary)[source]#

The worker function that moves pages out of oldCat into newCat.

Parameters:
  • old_cat_title (str)

  • new_cat_title (str)

  • edit_summary (str)

Return type:

tuple[int, int]

ready_to_edit(cat)[source]#

Return True if cat not edited during cooldown period, else False.

run()[source]#

Run the bot.

Return type:

None

setup_hard_redirect()[source]#

Setup hard redirect task.

setup_soft_redirect()[source]#

Setup soft redirect task.

teardown()[source]#

Write self.record to file and save logs.

Return type:

None

touch(page)[source]#

Touch the given page.

Return type:

None

update_options: dict[str, Any] = {'category': '', 'delay': 7, 'tiny': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived further, but call self.available_options.update(<dict>) in the initializer in such a case.

Added in version 6.4.

scripts.category_redirect.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

change_pagelang script#

This script changes the content language of pages

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-setlang          What language the pages should be set to

-always           If a language is already set for a page, always change
                  it to the one set in -setlang.

-never            If a language is already set for a page, never change
                  it to the one set in -setlang (keep the current
                  language).

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.

Added in version 5.1.
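
Example (hypothetical page title), setting the content language of a single page and overwriting any language that is already set:

python pwb.py change_pagelang -setlang:de -page:Example -always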

class scripts.change_pagelang.ChangeLangBot(**kwargs)[source]#

Bases: ConfigParserBot, SingleSiteBot

Change page language bot.

Changed in version 7.0: ChangeLangBot is a ConfigParserBot

changelang(page)[source]#

Set page language.

Parameters:

page (pywikibot.page.BasePage) – The page to update and save

Return type:

None

treat(page)[source]#

Treat a page.

Parameters:

page (pywikibot.page.BasePage) – The page to treat

Return type:

None

update_options: dict[str, Any] = {'never': False, 'setlang': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived further, but call self.available_options.update(<dict>) in the initializer in such a case.

Added in version 6.4.

scripts.change_pagelang.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

checkimages script#

Script to check recently uploaded files

This script checks if a file description is present and if there are other problems in the image’s description.

This script will have to be configured for each language. Please submit translations as addition to the Pywikibot framework.

Everything that needs customisation is indicated by comments.

This script understands the following command-line arguments:

-limit              The number of images to check (default: 80)

-commons            The bot will check if an image with the same name exists on
                    Commons and, if so, it reports the image.

-duplicates[:#]     Check if the image has duplicates (if an argument is given,
                    set how many rollbacks to wait before reporting the image in
                    the report instead of tagging it); default: 1 rollback.

-duplicatesreport   Report the duplicates in a log *AND* put the template in
                    the images.

-maxusernotify      Maximum notifications added to a user talk page in a single
                    check, to avoid email spamming.

-sendemail          Send an email after tagging.

-break              To break the bot after the first check (default: recursive)

-sleep[:#]          Time in seconds between repeat runs (default: 30)

-wait[:#]           Wait x seconds before checking the images (default: 0)

-skip[:#]           The bot skips the first [:#] images (default: 0)

-start[:#]          Use allimages() as generator
                    (it starts already from File:[:#])

-cat[:#]            Use a category as generator

-regex[:#]          Use regex, must be used with -url or -page

-page[:#]           Define the name of the wikipage where the images are

-url[:#]            Define the url where the images are

-nologerror         If given, this option will disable the error that is raised
                    when the log is full.
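
Example of a single (non-repeating) run that checks 50 files and also looks for duplicates:

python pwb.py checkimages -limit:50 -duplicates -break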

Instructions for the real-time settings. For every new block you have to add:

<------- ------->

In this way the bot can understand where the block starts in order to take the right parameter.

  • Name= Set the name of the block

  • Find= search this text in the image’s description

  • Findonly= search for exactly this text in the image’s description

  • Summary= That's the summary that the bot will use when it notifies about
    the problem.

  • Head= That's the opening line (incipit) that the bot will use for the
    message.

  • Text= This is the template that the bot will use when it reports the
    image's problem.
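
A purely illustrative settings block using only the keys listed above (the values, including the notification template name, are hypothetical and must be adapted to your wiki):

<------- ------->
Name = Missing source
Find = no source
Summary = Bot: notifying about a file without source information
Head = Problem with your file
Text = {{subst:No source notice}}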

Changed in version 8.4: Welcome messages are imported from scripts.welcome script.

scripts.checkimages.CATEGORIES_WITH_LICENSES = ('Q4481876', 'Q7451504')#

Category items with the licenses; subcategories may contain other licenses.

Changed in version 7.2: uses wikibase items instead of category titles.

class scripts.checkimages.CheckImagesBot(site, log_full_number=25000, sendemail_active=False, duplicates_report=False, log_full_error=True, max_user_notify=None)[source]#

Bases: object

A robot to check recently uploaded files.

Initializer, define some instance variables.

Parameters:
  • log_full_number (int)

  • sendemail_active (bool)

  • duplicates_report (bool)

  • log_full_error (bool)

check_image_duplicated(duplicates_rollback)[source]#

Function to check the duplicated files.

Return type:

bool

check_image_on_commons()[source]#

Check whether the file is on Commons.

Return type:

bool

check_step()[source]#

Check a single file page.

Return type:

None

find_additional_problems()[source]#

Extract additional settings from configuration page.

Return type:

None

ignore_server_errors = False#
static important_image(list_given)[source]#

Get tuples of image and time, return the most used or oldest image.

Changed in version 7.2: itertools.zip_longest is used to stop using_pages as soon as possible.

Parameters:

list_given (list[tuple[float, FilePage]]) – a list of tuples which hold seconds and FilePage

Returns:

the most used or oldest image

Return type:

FilePage

is_tagged()[source]#

Understand if a file is already tagged or not.

Return type:

bool

static load(raw)[source]#

Load a list of objects from a string using regex.

Return type:

list[str]

load_hidden_templates()[source]#

Function to load the whitelisted templates.

Return type:

None

load_licenses()[source]#

Load the list of the licenses.

Changed in version 7.2: return a set instead of a list for quicker lookup.

Return type:

set[Page]

mini_template_check(template)[source]#

Check if template is in allowed licenses or in licenses to skip.

Return type:

bool

put_mex_in_talk()[source]#

Function to put the warning on the talk page of the uploader.

When the bot finds that the user talk page is empty it adds the welcome message first. The messages are imported from the welcome.py script.

Return type:

None

regex_generator(regexp, textrun)[source]#

Find page to yield using regex to parse text.

Return type:

Generator[FilePage]

report(newtext, image_to_report, notification=None, head=None, notification2=None, unver=True, comm_talk=None, comm_image=None)[source]#

Function to make the reports easier.

Parameters:

unver (bool)

Return type:

None

report_image(image_to_report, rep_page=None, com=None, rep_text=None, addings=True)[source]#

Report the files to the report page when needed.

Parameters:

addings (bool)

Return type:

bool

set_parameters(image)[source]#

Set parameters.

Return type:

None

skip_images(skip_number, limit)[source]#

Given a number of files, skip the first -number- files.

Return type:

bool

smart_detection()[source]#

Detect templates.

Instead of only checking whether there is a template in the image's description, the bot also checks whether that template is a license or something else. In this sense this type of check is smart.

Return type:

tuple[str, bool]

tag_image(put=True)[source]#

Add template to the Image page and find out the uploader.

Parameters:

put (bool)

Return type:

bool

takesettings()[source]#

Function to take the settings from the wiki.

Return type:

None

template_in_list()[source]#

Check if template is in list.

The problem is that calls to the MediaWiki system can be pretty slow, while searching in a list of objects is really fast. So first of all let's see if we can find something in the info that we already have, then make a deeper check.

Return type:

None

static upload_bot_change_function(report_page_text, upload_bot_array)[source]#

Detect the user that has uploaded the file through upload bot.

Return type:

str

static wait(generator, wait_time)[source]#

Skip the images uploaded within the last x seconds.

Let the users fix the image's problems on their own during the first x seconds.

Return type:

Generator[FilePage]

exception scripts.checkimages.LogIsFull(arg)[source]#

Bases: Error

The log is full and the bot cannot add other data, in order to prevent errors.

Parameters:

arg (Exception | str)

Return type:

None

scripts.checkimages.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

bool

scripts.checkimages.print_with_time_zone(message)[source]#

Print the message followed by the correctly encoded time zone.

Return type:

None

claimit script#

A script that adds claims to Wikidata items based on a list of pages

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Usage:

python pwb.py claimit [pagegenerators] P1 Q2 P123 Q456

You can use any typical pagegenerator (like categories) to provide a list of pages. Then list the property->target pairs to add.

For geographic coordinates:

python pwb.py claimit [pagegenerators] P625 [lat-dec],[long-dec],[prec]

[lat-dec] and [long-dec] represent the latitude and longitude respectively, and [prec] represents the precision. All values are in decimal degrees, not DMS. If [prec] is omitted, the default precision is 0.0001 degrees.

Example

python pwb.py claimit [pagegenerators] P625 -23.3991,-52.0910,0.0001

By default, claimit.py does not add a claim if one with the same property already exists on the page. To override this behavior, use the ‘exists’ option:

python pwb.py claimit [pagegenerators] P246 "string example" -exists:p

Suppose the claim you want to add has the same property as an existing claim and the “-exists:p” argument is used. Now, claimit.py will still not add the claim if it has the same target or source, or if the existing claim has qualifiers. To override this behavior, add ‘t’ (target), ‘s’ (sources), or ‘q’ (qualifiers) to the ‘exists’ argument.

For instance, to add the claim to each page even if one with the same property and target and some qualifiers already exists:

python pwb.py claimit [pagegenerators] P246 "string example" -exists:ptq

Note that the ordering of the letters in the ‘exists’ argument does not matter, but ‘p’ must be included.

class scripts.claimit.ClaimRobot(claims, exists_arg='', **kwargs)[source]#

Bases: WikidataBot

A bot to add Wikidata claims.

Parameters:
  • claims (list) – A list of wikidata claims

  • exists_arg (str) – String specifying how to handle duplicate claims

treat_page_and_item(page, item)[source]#

Treat each page.

Parameters:
  • page (pywikibot.page.BasePage) – The page to update and change

  • item (pywikibot.page.ItemPage) – The item to treat

Return type:

None

use_from_page = None#
scripts.claimit.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

clean_sandbox script#

This bot resets a (user) sandbox with predefined text

This script understands the following command-line arguments:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-hours:#       Use this parameter to make the script repeat itself after
               # hours. Hours can be defined as a decimal. 0.01 hours
               are 36 seconds; 0.1 are 6 minutes.

-delay:#       Use this parameter for a wait time after the last edit
               was made. If no parameter is given it takes it from
               hours and limits it between 5 and 15 minutes.
               The minimum delay time is 5 minutes.

-text          The text that substitutes in the sandbox; you can use this
               when you haven't configured clean_sandbox for your wiki.

-textfile      As an alternative to -text, you can use this to provide
               a file containing the text to be used.

-summary       Summary of the edit made by the bot. Overrides the default
               from i18n.

This script is a ConfigParserBot. All local parameters can be given inside a scripts.ini file. Options passed to the script are prioritized over options read from the ini file.

For example:

[clean_sandbox]
# the parameter section for clean_sandbox script
summary = Bot: Cleaning sandbox
text = {{subst:Clean Sandbox}}
hours: 0.5
delay: 7
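For example, to let the bot reset the sandbox every half hour without a settings file, the -hours parameter can be given on the command line instead:

python pwb.py clean_sandbox -hours:0.5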
class scripts.clean_sandbox.SandboxBot(**kwargs)[source]#

Bases: Bot, ConfigParserBot

Sandbox reset bot.

available_options: dict[str, Any] = {'delay': -1, 'hours': -1.0, 'summary': '', 'text': ''}#

Handler configuration attribute. Only the keys of the dict can be passed as __init__ options. The values are the default values. Overwrite this in subclasses!

run()[source]#

Run bot.

Return type:

None

scripts.clean_sandbox.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

commons_information script#

This bot adds a language template to the file’s description field

The Information template is commonly used to provide formatting to the basic information for files (description, source, author, etc.). The description field should provide brief but complete information about the image. The description format should use language templates like {{En}} or {{De}} to specify the language of the description. This script adds these language templates if they are missing. For example, the description of

{{Information
 | Description = A simplified icon for [[Pywikibot]]
 | Date = 2003-06-14
 | Other fields =
}}

will be analyzed as English (en) with ~100% accuracy, and the bot replaces its content with

{{Information
 | Description = {{en|A simplified icon for [[Pywikibot]]}}
 | Date = 2003-06-14
 | Other fields =
}}

Note

The langdetect package is needed for full support of language detection. Install it with:

pip install langdetect
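To verify the installation, the detection the script relies on can be tried directly; this is just a quick check with a sample string, independent of the script (results for very short texts may vary slightly):

from langdetect import detect, detect_langs

text = 'A simplified icon for Pywikibot'
print(detect(text))        # e.g. 'en'
print(detect_langs(text))  # e.g. [en:0.99...], languages with probabilities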

This script understands the following command-line arguments:

This script supports use of pagegenerators arguments.

Usage:

python pwb.py commons_information [pagegenerators]

You can use any typical pagegenerator (like categories) to provide a list of pages. If no pagegenerator is given, pages transcluding the Information template are used.

Hint

This script uses the Commons site by default. For other sites use the global -site option.

Example for going through all files:

python pwb.py commons_information -start:File:!

Added in version 6.0.

Changed in version 9.2: accelerate script with preloading pages; use commons as default site; use transcluded pages of Information template.

class scripts.commons_information.InformationBot(**kwargs)[source]#

Bases: SingleSiteBot, ExistingPageBot

Bot for the Information template.

Initializer.

comment = {'en': 'Bot: wrap the description parameter of Information in the appropriate language template'}#
desc_params = ('Description', 'description')#
static detect_langs(text)[source]#

Detect language from given text.

Parameters:

text (str)

get_description(template)[source]#

Get description parameter.

lang_tmp_cat = 'Language templates'#
process_desc_other(wikicode, nodes)[source]#

Process other description text.

The description text may consist of different Node types except Template, which is handled by process_desc_template(). Combine all nodes and replace the last one with a newly created Template, removing the remaining nodes from the wikicode.

Added in version 9.2.

Parameters:
  • wikicode (Wikicode) – The Wikicode of the parsed page text.

  • nodes (list[Node]) – wikitext nodes to be processed

Returns:

whether the description nodes were changed

Return type:

bool

process_desc_template(template)[source]#

Process description template.

Parameters:

template (Template) – a mwparserfromhell Template found in the description parameter of Information template.

Returns:

whether the template node was changed.

Return type:

bool

static replace_value(param, value)[source]#

Replace param node with given value.

Parameters:
  • param (Node)

  • value (Template)

Return type:

None

treat_page()[source]#

Treat current page.

Return type:

None

scripts.commons_information.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

commonscat script#

With this tool you can add the template {{commonscat}} to categories

The tool works by following the interwiki links. If the template is present on another language page, the bot will use it.

You could probably use it on articles as well, but this isn’t tested.

The following parameters are supported:

-checkcurrent     Work on all category pages that use the primary commonscat
                  template.

This script is a ConfigParserBot. The following options can be set within a settings file which is scripts.ini by default:

-always           Don't prompt you for each replacement. The warning message
                  does not have to be confirmed. ATTENTION: Use this with care!

-summary:XYZ      Set the action summary message for the edit to XYZ,
                  otherwise it uses messages from add_text.py as default.

This bot uses pagegenerators to get a list of pages. The following options are supported:

This script supports use of pagegenerators arguments.

For example to go through all categories:

python pwb.py commonscat -start:Category:!
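The core idea of the bot (follow the interwiki links and look for a commonscat template there) can be sketched with plain Pywikibot calls. The category name below is a placeholder, and the real script additionally checks the Wikibase repository, template aliases and redirects:

import pywikibot

site = pywikibot.Site('en', 'wikipedia')
page = pywikibot.Category(site, 'Category:Example')  # placeholder category

# Follow the language links and look for a commonscat template.
for link in page.langlinks():
    other = pywikibot.Page(link)
    for template, params in other.templatesWithParams():
        if template.title(with_ns=False).lower() == 'commonscat':
            print(other.site, params)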
class scripts.commonscat.CommonscatBot(**kwargs)[source]#

Bases: ConfigParserBot, ExistingPageBot

Commons categorisation bot.

Changed in version 7.0: CommonscatBot is a ConfigParserBot

Parameters:

kwargs (Any) – bot options

Keyword Arguments:

generator – a generator processed by run() method

changeCommonscat(page=None, oldtemplate='', oldcat='', newtemplate='', newcat='', linktitle='')[source]#

Change the current commonscat template and target.

Parameters:
  • oldtemplate (str)

  • oldcat (str)

  • newtemplate (str)

  • newcat (str)

  • linktitle (str)

Return type:

None

Return the name of a valid commons category.

If the page is a redirect this function tries to follow it. If the page doesn’t exist, the function will return an empty string.

Parameters:

name (str)

Find CommonsCat template on interwiki pages.

Returns:

name of a valid commons category

Return type:

str

find_commons_category(page)[source]#

Find CommonsCat template on Wikibase repository.

Use Wikibase property to get the category if possible. Otherwise check all langlinks to find it.

Returns:

name of a valid commons category

Return type:

str

Find CommonsCat template on page.

Return type:

tuple of (<templatename>, <target>, <linktext>, <note>)

static skipPage(page)[source]#

Determine if the page should be skipped.

Return type:

bool

skip_page(page)[source]#

Skip category redirects.

treat_page()[source]#

Add CommonsCat template to page.

Take a page. Go through all the interwiki pages looking for a commonscat template. When all interwiki links are checked and a proper category is found, add it to the page.

Return type:

None

update_options: dict[str, Any] = {'summary': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived from, but use self.available_options.update(<dict>) in the initializer in such a case.

Added in version 6.4.

use_disambigs: bool | None = False#

Attribute to determine whether to use disambiguation pages. Set it to True to use disambigs only, set it to False to skip disambigs. If None both are processed.

Added in version 7.2.

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.

scripts.commonscat.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

coordinate_import script#

Coordinate importing script

Usage:

   python pwb.py coordinate_import -site:wikipedia:en \
-cat:Category:Coordinates_not_on_Wikidata

This will work on all pages in the category “coordinates not on Wikidata” and will import the coordinates on these pages to Wikidata.

The data from the “GeoData” extension (https://www.mediawiki.org/wiki/Extension:GeoData) is used, so that extension has to be set up properly. You can look at the [[Special:Nearby]] page on your local wiki to see if it’s populated.

You can use any typical pagegenerator to provide a list of pages:

   python pwb.py coordinate_import -lang:it -family:wikipedia -namespace:0 \
-transcludes:Infobox_stazione_ferroviaria

You can also run over a set of items on the repo without coordinates and try to import them from any connected page. To do this, you have to explicitly provide the repo as the site using the -site argument.

Example

python pwb.py coordinate_import -site:wikidata:wikidata -namespace:0 \
-querypage:Deadendpages

The following command line parameters are supported:

-always           If used, the bot won't ask if it should add the specified
                  text

-create           Create items for pages without one.

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.

This script supports use of pagegenerators arguments.
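A minimal sketch of the underlying idea, assuming the page has a GeoData primary coordinate and a connected Wikidata item (the page title is a placeholder; the real bot also checks existing P625 claims and qualifiers first):

import pywikibot

site = pywikibot.Site('en', 'wikipedia')
page = pywikibot.Page(site, 'Berlin')          # placeholder page
repo = site.data_repository()

# Primary coordinate exposed by the GeoData extension, or None.
coord = page.coordinates(primary_only=True)
if coord is not None:
    item = page.data_item()                    # connected Wikidata item
    claim = pywikibot.Claim(repo, 'P625')
    claim.setTarget(coord)
    item.addClaim(claim, summary='Importing coordinates from enwiki')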

class scripts.coordinate_import.CoordImportRobot(**kwargs)[source]#

Bases: ConfigParserBot, WikidataBot

A bot to import coordinates to Wikidata.

Changed in version 7.0: CoordImportRobot is a ConfigParserBot

has_coord_qualifier(claims)[source]#

Check if self.prop is used as property for a qualifier.

Parameters:

claims (dict) – the Wikibase claims to check in

Returns:

the first property for which self.prop is used as qualifier, or None if there is none

Return type:

str | None

item_has_coordinates(item)[source]#

Check if the item has coordinates.

Returns:

whether the item has coordinates

Return type:

bool

treat_page_and_item(page, item)[source]#

Treat page/item.

Return type:

None

try_import_coordinates_from_page(page, item)[source]#

Try import coordinate from the given page to the given item.

Returns:

whether any coordinates were found and the import was successful

Return type:

bool

use_from_page = None#
scripts.coordinate_import.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

cosmetic_changes script#

This module can do slight modifications to tidy a wiki page’s source code

The changes are not supposed to change the look of the rendered wiki page.

The following parameters are supported:

-always           Don't prompt you for each replacement. Warning (see below)
                  has not to be confirmed. ATTENTION: Use this with care!

-async            Put page on queue to be saved to wiki asynchronously.

-summary:XYZ      Set the summary message text for the edit to XYZ, bypassing
                  the predefined message texts with original and replacements
                  inserted.

-ignore:          If an error occurs, either skip the page or only the failing
                  method. It can be set to:
                  all - does not ignore errors
                  match - ignores ISBN related errors (default)
                  method - ignores fixing method errors
                  page - ignores page related errors

The following generators and filters are supported:

This script supports use of pagegenerators arguments.

ATTENTION: You can run this script as a stand-alone for testing purposes. However, the changes that are made are only minor, and other users might get angry if you fill the version histories and watchlists with such irrelevant changes. Some wikis prohibit stand-alone running.

For further information see pywikibot/cosmetic_changes.py
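For a dry run without saving, the underlying toolkit can be used directly. This is a sketch assuming a recent Pywikibot release where CosmeticChangesToolkit is constructed from a page; the page title is a placeholder:

import pywikibot
from pywikibot.cosmetic_changes import CosmeticChangesToolkit

site = pywikibot.Site('en', 'wikipedia')
page = pywikibot.Page(site, 'Wikipedia:Sandbox')   # placeholder page

toolkit = CosmeticChangesToolkit(page)
new_text = toolkit.change(page.text)               # returns False on error
if new_text is not False and new_text != page.text:
    print('Cosmetic changes would modify this page.')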

class scripts.cosmetic_changes.CosmeticChangesBot(**kwargs)[source]#

Bases: AutomaticTWSummaryBot, ExistingPageBot

Cosmetic changes bot.

Parameters:

kwargs (Any) – bot options

Keyword Arguments:

generator – a generator processed by run() method

summary_key: str | None = 'cosmetic_changes-standalone'#

Must be defined in subclasses.

treat_page()[source]#

Treat page with the cosmetic toolkit.

Changed in version 7.0: skip if InvalidPageError is raised

Return type:

None

update_options: dict[str, Any] = {'async': False, 'ignore': CANCEL.MATCH, 'summary': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived from, but use self.available_options.update(<dict>) in the initializer in such a case.

Added in version 6.4.

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.

scripts.cosmetic_changes.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

create_isbn_edition script#

Pywikibot script to load ISBN related data into Wikidata

Pywikibot script to get ISBN data from a digital library, and create or amend the related Wikidata item for edition (with the P212=ISBN number as unique external ID).

Use digital libraries to get ISBN data in JSON format, and integrate the results into Wikidata.

Then the resulting item number can be used e.g. to generate Wikipedia references using template Cite_Q.

All parameters are optional:

P1: digital library (default goob “-“)

bnf       Catalogue General (France)
bol       Bol.com
dnb       Deutsche National Library
goob      Google Books
kb        National Library of the Netherlands
loc       Library of Congress US
mcues     Ministerio de Cultura (Spain)
openl     OpenLibrary.org
porbase   urn.porbase.org Portugal
sbn       Servizio Bibliotecario Nazionale
wiki      wikipedia.org
worldcat  WorldCat

P2: ISO 639-1 language code

Default LANG; e.g. en, nl, fr, de, es, it, etc.

P3 P4…: P/Q pairs to add additional claims (repeated)

e.g. P921 Q107643461 (main subject: database management linked to P2163 Fast ID)

stdin:

ISBN numbers (International standard book number)

Free text (e.g. a Wikipedia reference list, or a publication list) is accepted. Identification is done via an ISBN regex.

Functionality:
  • The ISBN number is used as a primary key (P212), where no duplicates are allowed. The item update is not performed when there is no unique match.

  • Statements are added or merged incrementally; existing data is not overwritten.

  • Authors and publishers are searched to get their item number (ambiguous items are skipped)

  • Book title and subtitle are separated with ‘.’, ‘:’, or ‘-’

  • This script can be run incrementally with the same parameters. Caveat: take into account the Wikidata Query database replication delay; wait at least 5 minutes to avoid creating duplicate objects.

Data quality:
  • Use https://query.wikidata.org/querybuilder/ to identify P212 duplicates. Merge duplicate items before running the script again.

  • The following properties should only be used for written works:
    P5331: OCLC work ID (editions should only have P243)
    P8383: Goodreads work ID (editions should only have P2969)

Examples

Default library (Google Books), language (LANG), no additional statements:

pwb create_isbn_edition.py 9789042925564

Wikipedia library (wiki), language English, main subject: database management:

pwb create_isbn_edition.py wiki en P921 Q107643461 978-0-596-10089-6

Standard ISBN properties:

P31:Q3331189:   instance of edition
P50:    author
P123:   publisher
P212:   canonical ISBN number (lookup via Wikidata Query)
P407:   language of work (Qnumber linked to ISO 639-1 language code)
P577:   date of publication (year)
P1476:  book title
P1680:  subtitle

Other ISBN properties:

P291:   place of publication
P921:   main subject (inverse lookup from external Fast ID P2163)
P629:   work for edition
P747:   edition of work
P1104:  number of pages

Qualifiers:

P1545:  (author) sequence number

External identifiers:

P213:   ISNI ID
P243:   OCLC ID
P496:   ORCID iD
P675:   Google Books ID
P1036:  Dewey Decimal Classification
P2163:  Fast ID (inverse lookup via Wikidata Query) -> P921: main subject
P2969:  Goodreads ID

(only for written works)
P5331:  OCLC work ID (editions should only have P243)
P8383:  Goodreads work ID (editions should only
        have P2969)
Author:

Geert Van Pamel, 2022-08-04, GNU General Public License v3.0, User:Geertivp

Documentation:
Prerequisites:

pywikibot

Install the following isbnlib packages (see https://pypi.org/search/?q=isbnlib):

pip install isbnlib (mandatory)

Optional:

pip install isbnlib-bol
pip install isbnlib-bnf
pip install isbnlib-dnb
pip install isbnlib-kb
pip install isbnlib-loc
pip install isbnlib-worldcat2
etc.
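As a quick check that isbnlib is installed and can reach its default service, the ISBN from the examples above can be validated and looked up directly (meta() needs network access; this sketch is independent of the script):

import isbnlib

isbn = '9789042925564'
print(isbnlib.is_isbn13(isbn))             # True for a valid ISBN-13
print(isbnlib.mask(isbn))                  # hyphenated form
print(isbnlib.meta(isbn, service='goob'))  # metadata from Google Books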

Restrictions:
  • It is better to use the ISO 639-1 language code parameter as a default

    The language code is not always available from the digital library.

  • SPARQL queries run on a replicated database

    Replication delay can be significant; wait 5 minutes before retrying, otherwise you risk creating duplicates.

Algorithm:

1. Get parameters
2. Validate parameters
3. Get ISBN data
4. Convert ISBN data
5. Get additional data
6. Register ISBN data into Wikidata (create or amend items or claims)

Environment:

The Python script can run on the following platforms:

    Linux client
    Google Chromebook (Linux container)
    Toolforge Portal
    PAWS

LANG: ISO 639-1 language code

Applications:

Generate a book reference
    Example: {{Cite Q|Q63413107}} (wp.en)
    See also:
        https://meta.wikimedia.org/wiki/WikiCite
        https://www.wikidata.org/wiki/Q21831105 (WikiCite)
        https://www.wikidata.org/wiki/Q22321052 (Cite_Q)
        https://www.mediawiki.org/wiki/Global_templates
        https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData
        https://phabricator.wikimedia.org/tag/wikicite/
        https://meta.wikimedia.org/wiki/WikiCite/Shared_Citations
Wikidata Query:
Related projects:
Other systems:

Added in version 7.7.

scripts.create_isbn_edition.add_claims(isbn_data)[source]#

Inspect isbn_data and add claims if possible.

Parameters:

isbn_data (dict[str, Any])

Return type:

None

scripts.create_isbn_edition.amend_isbn_edition(isbn_number)[source]#

Amend ISBN registration.

Amend Wikidata, by registering the ISBN-13 data via P212, depending on the data obtained from the digital library.

Parameters:

isbn_number (str) – ISBN number (10 or 13 digits with optional hyphens)

Return type:

None

scripts.create_isbn_edition.get_item_list(item_name, instance_id)[source]#

Get list of items by name, belonging to an instance (list).

Parameters:
  • item_name (str) – Item name (case sensitive)

  • instance_id – Instance ID (string, set, or list)

Returns:

Set of items (Q-numbers)

scripts.create_isbn_edition.is_in_list(statement_list, checklist)[source]#

Verify if statement list contains at least one item from the checklist.

Parameters:
  • statement_list – Statement list

  • checklist (list[str]) – List of values

Returns:

True when match

Return type:

bool

scripts.create_isbn_edition.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

scripts.create_isbn_edition.show_final_information(number, doi)[source]#

Print additional information.

dataextend script#

Script to add properties, identifiers and sources to WikiBase items

Usage:

dataextend <item> [<property>[+*]] [args]

In the basic usage, where no property is specified, item is the Q-number of the item to work on.

If a property (P-number, or the special value ‘Wiki’ or ‘Data’) is specified, only the data from that identifier are added. With a ‘+’ after it, work starts on that identifier, then goes on to identifiers after that (including new identifiers added while working on those identifiers). With a ‘*’ after it, the identifier itself is skipped, but those coming after it (not those coming before it) are included.

The following parameters are supported:

-always    If this is supplied, the bot will not ask for permission
           after each external link has been handled.

-showonly  Only show claims for a given ItemPage. Don't try to add any
           properties

The bot will load the corresponding pages for these identifiers, and try to find the meaning of each string it extracts for the specified type of thing (for example ‘city’ or ‘gender’). If you want to use a value, but not save it (which can happen if the string specifies a certain value now, but might show another value elsewhere, or if it is so specific that you’re pretty sure it won’t occur a second time), you can provide the Q-number with X rather than Q. If you do not want to use the string, you can just hit enter, or give the special value ‘XXX’, which means that it will be skipped in each subsequent run as well.

After an identifier has been worked on, there might be a list of names that have been found, in lc:name format, where lc is a language code. You can accept all suggested names (answer Y), none (answer N), or be asked about each name separately (answer S), the latter being the default if you do not fill in anything.

After all identifiers have been worked on, possible descriptions in various languages are presented, and you get to choose one. The default here is 0, which is always the current description for that language. Finally, for a number of identifiers text is shown that usually gives parts of the description that are hard to parse automatically, so you can see if there are any additional pieces of data that can be added.

It is advisable to (re)load the item page that the bot has been working on in the browser afterwards, to correct any mistakes it has made, or to handle cases where both a more precise and a less precise value have been included.

Added in version 7.2.
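A typical invocation, with a placeholder item and the VIAF property (P214) chosen purely as an illustration, would be:

python pwb.py dataextend Q4115189 P214+

This starts work on the VIAF identifier of the given item and then continues with the identifiers that come after it (including identifiers added along the way).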

class scripts.dataextend.AKLAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AbartAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlanguagedescriptions(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AcademiaeGroninganaeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddegrees(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

getentry(naam, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AcademicTreeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findadvisors(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

finddocstudents(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findwebsite(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AcademieFrancaiseAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findawards(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AcademieRouenAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AccademiaCruscaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AdultFilmAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findethnicities(html)[source]#
Parameters:

html (str)

findeyecolor(html)[source]#
Parameters:

html (str)

findfloruitstart(html)[source]#
Parameters:

html (str)

findhaircolor(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AgorhaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AinmAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

getvalue(field, html, category=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AlkindiAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstnames(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

instanceof(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AlvinAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AmericanArtAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AmericanBiographyAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.Analyzer(ident, data=None, item=None, bot=None)[source]#

Bases: object

SCRIPTRE = re.compile('(?s)<script.*?</script>', re.DOTALL)#
TAGRE = re.compile('<[^<>]*>')#
property alturl#
static commastrip(term)[source]#
property extraurls: list[str]#
findallbyre(regex, html, dtype=None, skips=None, alt=None)[source]#
Return type:

list[str]

findbyre(regex, html, dtype=None, skips=None, alt=None)[source]#
Return type:

str

findclaims()[source]#
Return type:

list[tuple[str, str, Analyzer | None]]

finddefaultmixedrefs(html, includesocial=True)[source]#
finddescriptions(html)[source]#
Parameters:

html (str)

findwikipedianames(html)[source]#
Parameters:

html (str)

getdata(dtype, text, ask=True)[source]#
getdescriptions()[source]#
getlanguage(code)[source]#
getnames()[source]#
longtext()[source]#
prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

static singlespace(text)[source]#
property url#
class scripts.dataextend.AngelicumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

instanceof(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AnimeConsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finalscript(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArchivesDuSpectacleAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArmbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findassociations(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArtHistoriansAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArtUkAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArtcyclopediaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmovements(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArticArtistAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArtistsCanadaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArtnetAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AthenaeumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AustrianBiographicalAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AuteursLuxembourgAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AutoresArAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BabelioAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BacklinkAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findadvisors(html)[source]#
Parameters:

html (str)

findawards(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddocstudents(html)[source]#
Parameters:

html (str)

findkins(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findparticipantins(html)[source]#
Parameters:

html (str)

findpartners(html)[source]#
Parameters:

html (str)

findpartofs(html)[source]#
Parameters:

html (str)

findparts(html)[source]#
Parameters:

html (str)

findpositions(html)[source]#
Parameters:

html (str)

findsiblings(html)[source]#
Parameters:

html (str)

findsources(html)[source]#
Parameters:

html (str)

findspouses(html)[source]#
Parameters:

html (str)

findstudents(html)[source]#
Parameters:

html (str)

findteachers(html)[source]#
Parameters:

html (str)

getrelations(relation, html)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BandcampAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BdelAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findparties(html)[source]#
Parameters:

html (str)

findranks(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BdfaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findheight(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findsportteams(html)[source]#
Parameters:

html (str)

findteampositions(html)[source]#
Parameters:

html (str)

findweight(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BedethequeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findpseudonyms(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BelgianPhotographerAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None, alt=None)[source]#
getvalues(field, html, dtype=None, alt=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BenezitAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findisntanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BenezitUrlAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findisntanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

indinstanceof(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BewebAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findpositions(html)[source]#
Parameters:

html (str)

findviaf(html)[source]#
Parameters:

html (str)

findwebpages(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BibliotecaNacionalAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BibsysAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findviaf(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BiografischPortaalAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findsources(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BiuSanteAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BnaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

instanceof(html)[source]#
Parameters:

html (str)

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BnbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findviaf(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BneAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findviaf(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BnfAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findcountry(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BookTradeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finddeathdate(html)[source]#
Parameters:

html (str)

findfloruit(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BritishExecutionsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findcausedeath(html)[source]#
Parameters:

html (str)

findcrimes(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmannerdeath(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BritishMuseumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddetails(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BrooklynMuseumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.
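
Every analyzer overrides setup(), whose docstring ("To be used for putting data into subclasses.") refers to per-analyzer configuration: the subclass stores the constants the shared machinery needs, such as which external database it handles and where its records live. The attribute names in the sketch below are illustrative assumptions, not the attributes dataextend.py actually defines.

from scripts.dataextend import Analyzer


class SetupExampleAnalyzer(Analyzer):
    """Hypothetical subclass; attribute names are illustrative only."""

    def setup(self):
        # Per-analyzer constants the rest of the class can rely on.
        # These names are assumptions for the example.
        self.database_label = 'Example Authority File'
        self.record_url = 'https://example.org/authority/%s'  # identifier goes here
        self.language = 'en'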

class scripts.dataextend.CageMatchAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findheight(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findsports(html)[source]#
Parameters:

html (str)

findweights(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#

getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CanadianBiographyAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findspouses(html)[source]#
Parameters:

html (str)

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.
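
A few analyzers, BnaAnalyzer and CanadianBiographyAnalyzer among them, additionally override prepare(html) to normalise the downloaded page before the find* methods run over it. The sketch below assumes prepare() returns the cleaned text; it is an illustration, not the actual preprocessing any of these sources need.

import re

from scripts.dataextend import Analyzer


class PrepareExampleAnalyzer(Analyzer):
    """Hypothetical subclass; assumes prepare() returns the cleaned HTML."""

    def prepare(self, html):
        # Normalise the page so the per-property regexes stay simple.
        html = html.replace('&nbsp;', ' ')
        html = re.sub(r'<!--.*?-->', '', html, flags=re.DOTALL)
        return re.sub(r'\s+', ' ', html)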

class scripts.dataextend.CanticAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: MarcAnalyzer

finddescriptions(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CbdbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnationalities(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CcedAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddegrees(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findpositions(html)[source]#
Parameters:

html (str)

findreligion(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CerlAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#

getvalues(field, html, dtype=None, link=False)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CesarAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.Chess365Analyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findchesstitle(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findparticipations(html)[source]#
Parameters:

html (str)

findsportcountries(html)[source]#
Parameters:

html (str)

findsports(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CinemagiaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CiniiAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ClaraAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)