Scripts package#

The scripts folder contains predefined scripts that are easy to use.

Scripts are only available with Pywikibot if it is installed in directory mode and not as a site package. They can be run from the command line using the pwb wrapper script:

python pwb.py <global options> <name_of_script> <options>

Every script provides a -help option which shows all available options, their explanations and usage examples. Global options are shown by -help:global or by using:

python pwb.py -help

The advantages of the pwb.py wrapper script are:

  • check for framework and script dependencies and show a warning if a package is missing or outdated or if the Python release does not fit

  • check whether the user config file (user-config.py) is available and offer to create it by starting the generate_user_files.py script

  • enable global options even if a script does not support them

  • start private scripts located in the userscripts sub-folder

  • find a script even if the given script name does not match a filename, e.g. due to a spelling mistake
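
The last point can be pictured with a small fuzzy lookup. This is only a sketch of one way to do it with the standard library's difflib; the candidate list and the guess_script helper are hypothetical, and the wrapper's real lookup may differ.

```python
from difflib import get_close_matches

# Hypothetical list of script names found in the scripts folder.
AVAILABLE = ['add_text', 'archivebot', 'basic', 'blockpageschecker', 'category']


def guess_script(name, candidates=AVAILABLE):
    """Return the closest known script name, or None if nothing is similar."""
    if name in candidates:
        return name
    # Ask difflib for the single best candidate above a similarity cutoff.
    matches = get_close_matches(name, candidates, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(guess_script('add_txt'))
```

A misspelled name such as add_txt resolves to add_text here, while a name with no similar candidate yields None.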

scripts.base_dir = PosixPath('/src/scripts')#

defines the entry point for pywikibot-scripts package

add_text script#

Append text to the top or bottom of a page

By default this adds the text to the bottom above the categories and interwiki.

Use the following command line parameters to specify what to add:

-text

(str) Text to append. “\n” sequences are interpreted as newlines.

-textfile

(str) Path to a file with text to append

-summary

(str) Change summary to use

-up

Append text to the top of the page rather than the bottom

-create

Create the page if necessary. Note that talk pages are created even without this option.

-createonly

Only create the page but do not edit existing ones

-always

If used, the bot won’t ask if it should add the specified text

-major

If used, the edit will be saved without the “minor edit” flag

-talk, -talkpage

Put the text onto the talk page instead

-excepturl

(str) Skip pages with a url that matches this regular expression

-noreorder

Place the text beneath the categories and interwiki

Furthermore, the following can be used to specify which pages to process…

This script supports use of pagegenerators arguments.

Examples

Append ‘hello world’ to the bottom of the sandbox:

python pwb.py add_text -page:Wikipedia:Sandbox
-summary:"Bot: pywikibot practice" -text:"hello world"

Add a template to the top of the pages with ‘category:catname’:

python pwb.py add_text -cat:catname -summary:"Bot: Adding a template"
-text:"{{Something}}" -except:"\{\{([Tt]emplate:|)[Ss]omething" -up
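
To see what the regular expression in this example matches, here is a minimal sketch using Python's re module. It assumes the pattern is searched against the page text; the already_tagged helper is hypothetical and not part of the script.

```python
import re

# Pattern copied from the -except argument of the example above.
SKIP = re.compile(r'\{\{([Tt]emplate:|)[Ss]omething')


def already_tagged(wikitext):
    """Return True if the text already contains the template call."""
    return SKIP.search(wikitext) is not None


for sample in ('{{Something}}\nText', '{{template:something|x}}', 'Plain text'):
    print(repr(sample), already_tagged(sample))
```

The optional group accepts the template with or without an explicit namespace prefix, and the character classes make the first letter case-insensitive.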

Command used on it.wikipedia to put the template in the page without any category:

python pwb.py add_text -except:"\{\{([Tt]emplate:|)[Cc]ategorizzare"
-text:"{{Categorizzare}}" -excepturl:"class='catlinks'>" -uncat
-summary:"Bot: Aggiungo template Categorizzare"

class scripts.add_text.AddTextBot(**kwargs)[source]#

Bases: AutomaticTWSummaryBot, ExistingPageBot

A bot which adds a text to a page.

Parameters:

kwargs (Any) – bot options

Keyword Arguments:

generator – a generator processed by run() method

setup()[source]#

Read text to be added from file.

Return type:

None

skip_page(page)[source]#

Skip if -excepturl matches or page does not exist.

summary_key: str | None = 'add_text-adding'#

Must be defined in subclasses.

property summary_parameters#

Return a dictionary of all parameters for i18n.

Line breaks are replaced by a dash.

treat_page()[source]#

Add text to the page.

Return type:

None

update_options: dict[str, Any] = {'always': False, 'create': False, 'createonly': False, 'minor': True, 'regex_skip_url': '', 'reorder': True, 'summary': '', 'talk_page': False, 'text': '', 'textfile': '', 'up': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived, but use self.available_options.update(<dict>) in the initializer in that case.

Added in version 6.4.

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.

scripts.add_text.main(*argv)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

argv (str) – Command line arguments

Return type:

None

scripts.add_text.parse(argv, generator_factory)[source]#

Parse our arguments and provide a dictionary with their values.

Parameters:
  • argv (Sequence[str]) – input arguments to be parsed

  • generator_factory (GeneratorFactory) – factory that will determine what pages to process

Returns:

dictionary with our parsed arguments

Raises:

ValueError – if we receive invalid arguments

Return type:

dict[str, bool | str]

archivebot script#

archivebot.py - discussion page archiving bot

usage:

python pwb.py archivebot [OPTIONS] [TEMPLATE_PAGE]

Several TEMPLATE_PAGE templates can be given at once. Default is User:MiszaBot/config. The bot examines backlinks (Special:WhatLinksHere) to all TEMPLATE_PAGE templates. It then goes through all pages (unless a specific page is specified using options) and archives old discussions. This is done by breaking a page into threads, then scanning each thread for timestamps. Threads older than a specified threshold are then moved to another page (the archive), which can be named either based on the thread’s name, or the name can contain a counter which will be incremented when the archive reaches a certain size.
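
The step of breaking a page into threads can be sketched as follows. This is a simplified illustration that assumes threads start at level-2 headings; the split_threads helper and its regular expression are hypothetical, and timestamp scanning is left out.

```python
import re

# A level-2 heading such as "== Title of thread ==" on a line of its own.
HEADING = re.compile(r'^==[ \t]*([^=\n].*?)[ \t]*==[ \t]*$', re.MULTILINE)


def split_threads(text):
    """Split wikitext into a page header and a list of (title, body) threads."""
    matches = list(HEADING.finditer(text))
    header = text[:matches[0].start()] if matches else text
    threads = []
    for index, match in enumerate(matches):
        # A thread runs until the next heading or the end of the page.
        if index + 1 < len(matches):
            end = matches[index + 1].start()
        else:
            end = len(text)
        threads.append((match.group(1), text[match.end():end].strip()))
    return header, threads


page = 'Intro\n== First ==\nHello\n:Reply\n== Second ==\nBye\n'
print(split_threads(page))
```

Deeper headings such as "=== Sub ===" do not match the pattern, so they stay inside the body of their parent thread.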

Transcluded template may contain the following parameters:

{{TEMPLATE_PAGE
|archive =
|algo =
|counter =
|maxarchivesize =
|minthreadsleft =
|minthreadstoarchive =
|archiveheader =
|key =
}}

Meanings of parameters are:

archive

Name of the page to which archived threads will be put. Must be a subpage of the current page. Variables are supported.

algo

Specifies the maximum age of a thread. Must be in the form old(<delay>) where <delay> specifies the age in seconds (s), hours (h), days (d), weeks (w), or years (y) like 24h or 5d. Default is old(24h).

counter

The current value of a counter which can be used as a variable. Will be updated by the bot. Initial value is 1.

maxarchivesize

The maximum archive size before incrementing the counter. The value can be given with an appended letter such as K or M, indicating kilobytes or megabytes. Default value is 200K.

minthreadsleft

Minimum number of threads that should be left on a page. Default value is 5.

minthreadstoarchive

The minimum number of threads to archive at once. Default value is 2.

archiveheader

Content that will be put on new archive pages as the header. This parameter supports the use of variables. Default value is {{talkarchive}}.

key

A secret key that (if valid) allows archives not to be subpages of the page being archived.
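
As an illustration of the algo format described above, the following sketch turns a value such as old(24h) into a time span. The parse_algo helper is hypothetical, and treating a year as 365 days is an assumption of this sketch, not something the documentation states.

```python
import re
from datetime import timedelta

# Seconds per unit; the length of a year (365 days) is an assumption.
UNITS = {'s': 1, 'h': 3600, 'd': 86400, 'w': 7 * 86400, 'y': 365 * 86400}

ALGO = re.compile(r'^old\((\d+)([shdwy])\)$')


def parse_algo(value):
    """Return the maximum thread age for a value such as 'old(24h)'."""
    match = ALGO.match(value.strip())
    if match is None:
        raise ValueError(f'unsupported algo value: {value!r}')
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(seconds=amount * UNITS[unit])


print(parse_algo('old(24h)'), parse_algo('old(5d)'))
```

A thread whose newest timestamp is older than the current time minus this span would then be a candidate for archiving.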

The variables below can be used in the value for “archive” in the template above; numbers are latin digits. Alternatively you may use localized digits. This is only available for a few site languages. Refer to NON_LATIN_DIGITS to check whether there is a localized one.

latin                localized            Description
%(counter)d          %(localcounter)s     the current value of the counter
%(year)d             %(localyear)s        year of the thread being archived
%(isoyear)d          %(localisoyear)s     ISO year of the thread being archived
%(isoweek)d          %(localisoweek)s     ISO week number of the thread being archived
%(semester)d         %(localsemester)s    semester term of the year of the thread being archived
%(quarter)d          %(localquarter)s     quarter of the year of the thread being archived
%(month)d            %(localmonth)s       month (as a number 1-12) of the thread being archived
%(monthname)s                             localized name of the month above
%(monthnameshort)s                        first three letters of the name above
%(week)d             %(localweek)s        week number of the thread being archived
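
The variables use Python's %-style mapping syntax, so an archive title can be pictured as a format string filled from a dictionary. This sketch is only illustrative: the archive_params helper, the page title and the formulas for semester and quarter are assumptions.

```python
from datetime import date


def archive_params(day, counter):
    """Build a mapping for some of the latin variables listed above."""
    iso_year, iso_week, _ = day.isocalendar()
    return {
        'counter': counter,
        'year': day.year,
        'isoyear': iso_year,
        'isoweek': iso_week,
        'semester': (day.month - 1) // 6 + 1,  # assumed: 1 = Jan-Jun
        'quarter': (day.month - 1) // 3 + 1,   # assumed: 1 = Jan-Mar
        'month': day.month,
    }


# Hypothetical value of the "archive" template parameter.
pattern = 'Talk:Example/Archive %(year)d/Q%(quarter)d'
print(pattern % archive_params(date(2023, 8, 15), counter=1))
```

Keys that the pattern does not reference are simply ignored by the % operator, so one mapping can serve any combination of variables.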

The ISO calendar year starts with the Monday of the week that has at least four days in the new Gregorian year. If January 1st falls between Monday and Thursday (inclusive), the first week of that year starts on the Monday of that week, which lies in the previous year unless January 1st is itself a Monday. If January 1st falls between Friday and Sunday (inclusive), the following week is the first week of the year, so up to three days at the start of January are still counted as part of the previous year.
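
The rule can be checked with the standard library, whose date.isocalendar() follows the same ISO 8601 definition. The iso_year_week helper is only a convenience wrapper for this sketch.

```python
from datetime import date


def iso_year_week(day):
    """Return the ISO year and ISO week number of a date."""
    iso_year, iso_week, _weekday = day.isocalendar()
    return iso_year, iso_week


# 2019-12-30 is a Monday, 2020-01-01 a Wednesday, 2021-01-01 a Friday.
for day in (date(2019, 12, 30), date(2020, 1, 1), date(2021, 1, 1)):
    print(day.isoformat(), iso_year_week(day))
```

The Monday before a Wednesday New Year already belongs to ISO week 1 of the new year, while a Friday New Year still belongs to the last ISO week (here week 53) of the old year.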

Options (may be omitted):

-help

show this help message and exit

-calc:PAGE

calculate key for PAGE and exit

-file:FILE

load list of pages from FILE

-force

override security options

-locale:LOCALE

switch to locale LOCALE

-namespace:NS

only archive pages from a given namespace

-page:PAGE

archive a single PAGE, default ns is a user talk page

-salt:SALT

specify salt

-keep

Preserve thread order in archive even if threads are archived later

-sort

Sort archive by timestamp; should not be used with keep

-async

Run the bot in parallel tasks.

Changed in version 7.6: Localized variables for “archive” template parameter are supported. User:MiszaBot/config is the default template. -keep option was added.

Changed in version 7.7: -sort and -async options were added.

Changed in version 8.2: KeyboardInterrupt was enabled with -async option.

exception scripts.archivebot.ArchiveBotSiteConfigError(arg)[source]#

Bases: Error

There is an error caused by archivebot’s on-site configuration.

Parameters:

arg (Exception | str)

Return type:

None

exception scripts.archivebot.ArchiveSecurityError(arg)[source]#

Bases: ArchiveBotSiteConfigError

Page title is not a valid archive of page being archived.

The page title is neither a subpage of the page being archived, nor does it match the key specified in the archive configuration template.

Parameters:

arg (Exception | str)

Return type:

None

class scripts.archivebot.DiscussionPage(source, archiver, params=None, keep=False)[source]#

Bases: Page

A class that represents a single page of discussion threads.

Feed threads to it and run an update() afterwards.

feed_thread(thread, max_archive_size)[source]#

Append a new thread to the archive.

Parameters:
  • thread (DiscussionThread)

  • max_archive_size (tuple[int, str])

Return type:

bool

is_full(max_archive_size)[source]#

Check whether archive size exceeded.

Parameters:

max_archive_size (tuple[int, str])

Return type:

bool

load_page()[source]#

Load the page to be archived and break it up into threads.

Changed in version 7.6: If -keep option is given run through all threads and set the current timestamp to the previous if the current is lower.

Changed in version 7.7: Load unsigned threads using timestamp of the next thread.

Return type:

None

static max(ts1, ts2)[source]#

Calculate the maximum of two timestamps but allow None as value.

Added in version 7.6.

Parameters:
  • ts1 (Timestamp | None)

  • ts2 (Timestamp | None)

Return type:

Timestamp | None
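
A None-tolerant maximum of this kind can be sketched in a few lines. The max_timestamp function below is a hypothetical stand-in that works on plain datetime objects rather than on the Timestamp class.

```python
from datetime import datetime


def max_timestamp(ts1, ts2):
    """Return the later of two timestamps, treating None as 'no value'."""
    if ts1 is None:
        return ts2
    if ts2 is None:
        return ts1
    return max(ts1, ts2)


older = datetime(2023, 1, 1)
newer = datetime(2023, 6, 1)
print(max_timestamp(older, newer), max_timestamp(None, older))
```

Returning the other argument when one side is None avoids the TypeError that the built-in max() would raise when comparing None with a datetime.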

size()[source]#

Return size of talk page threads.

Note that this method counts bytes, rather than codepoints (characters). This corresponds to MediaWiki’s definition of page size.

Changed in version 7.6: return 0 if archive page neither exists nor has threads (T313886).

Return type:

int

update(summary, sort_threads=False)[source]#

Recombine threads and save page.

Parameters:

sort_threads (bool)

Return type:

None

class scripts.archivebot.DiscussionThread(title, timestripper)[source]#

Bases: object

An object representing a discussion thread on a page.

It represents something that is of the form:

== Title of thread ==

Thread content here. ~~~~
:Reply, etc. ~~~~

Parameters:

feed_line(line)[source]#

Add a line to the content and find the newest timestamp.

Parameters:

line (str)

Return type:

None

size()[source]#

Return size of discussion thread.

Note that the result is NOT equal to that of len(self.to_text()). This method counts bytes, rather than codepoints (characters). This corresponds to MediaWiki’s definition of page size.

Return type:

int
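
The difference between bytes and codepoints only shows up with non-ASCII text. A minimal sketch, assuming UTF-8 encoding; the byte_size helper is hypothetical.

```python
def byte_size(text):
    """Return the size of a text in bytes when encoded as UTF-8."""
    return len(text.encode('utf-8'))


sample = 'Gr\u00fc\u00dfe'  # "Gruesse" written with u-umlaut and sharp s
print(len(sample), byte_size(sample))
```

The sample has 5 characters but occupies 7 bytes, because each of the two non-ASCII letters takes two bytes in UTF-8.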

to_text()[source]#

Return wikitext discussion thread.

Return type:

str

exception scripts.archivebot.MalformedConfigError(arg)[source]#

Bases: ArchiveBotSiteConfigError

There is an error in the configuration template.

Parameters:

arg (Exception | str)

Return type:

None

exception scripts.archivebot.MissingConfigError(arg)[source]#

Bases: ArchiveBotSiteConfigError

The config is missing in the header.

It’s in one of the threads or transcluded from another page.

Parameters:

arg (Exception | str)

Return type:

None

class scripts.archivebot.PageArchiver(page, template, salt, force=False, keep=False, sort=False)[source]#

Bases: object

A class that encapsulates all archiving methods.

Parameters:
  • page (pywikibot.Page) – a page object to be archived

  • template (pywikibot.Page) – a template with configuration settings

  • salt (str) – salt value

  • force (bool) – override security value

  • keep (bool)

  • sort (bool)

algo = 'none'#
analyze_page()[source]#

Analyze DiscussionPage.

Return type:

set[tuple[str, str]]

attr2text()[source]#

Return a template with archiver saveable attributes.

Return type:

str

get_archive_page(title, params=None)[source]#

Return the page for archiving.

If it doesn’t exist yet, create and cache it. Also check for security violations.

Parameters:

title (str)

Return type:

DiscussionPage

get_attr(attr, default='')[source]#

Get an archiver attribute.

Return type:

Any

get_params(timestamp, counter)[source]#

Make params for archiving template.

Parameters:

counter (int)

Return type:

dict

key_ok()[source]#

Return whether key is valid.

Return type:

bool

load_config()[source]#

Load and validate archiver template.

Return type:

None

preload_pages(counter, thread, pattern)[source]#

Preload pages if counter matters.

Parameters:

counter (int)

Return type:

None

run()[source]#

Process a single DiscussionPage object.

Return type:

None

saveables()[source]#

Return a list of saveable attributes.

Return type:

list[str]

set_attr(attr, value, out=True)[source]#

Set an archiver attribute.

Parameters:

out (bool)

Return type:

None

should_archive_thread(thread)[source]#

Check whether a thread has to be archived.

Returns:

the reason for archiving as a tuple of localization args

Parameters:

thread (DiscussionThread)

Return type:

tuple[str, str] | None

scripts.archivebot.calc_md5_hexdigest(txt, salt)[source]#

Return md5 hexdigest computed from text and salt.

Return type:

str
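
A salted MD5 digest can be computed with hashlib as sketched below. The order in which salt and text are fed to the hash and the newline separators are assumptions of this sketch; the salted_md5 name is hypothetical and the result is not guaranteed to equal what the script computes.

```python
import hashlib


def salted_md5(text, salt):
    """Return an MD5 hexdigest over a salt and a text (assumed layout)."""
    digest = hashlib.md5()
    digest.update(salt.encode('utf-8'))
    digest.update(b'\n')  # assumed separator
    digest.update(text.encode('utf-8'))
    digest.update(b'\n')  # assumed terminator
    return digest.hexdigest()


print(salted_md5('User talk:Example', 'my secret salt'))
```

Because the salt is part of the hashed input, the same page title yields a different key for every salt, which is what makes the key hard to guess.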

scripts.archivebot.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

scripts.archivebot.process_page(page, *args)[source]#

Call PageArchiver for a single page.

Returns:

Return True to continue with the next page, False to break the loop.

Parameters:

args (Any)

Return type:

bool

Added in version 7.6.

Changed in version 7.7: pass an unspecified number of arguments to the bot using *args

scripts.archivebot.show_md5_key(calc, salt, site)[source]#

Show calculated MD5 hexdigest.

Return type:

bool

scripts.archivebot.str2localized_duration(site, string)[source]#

Localise a shorthand duration.

Translates a duration written in the shorthand notation (ex. “24h”, “7d”) into an expression in the local wiki language (“24 hours”, “7 days”).

Parameters:

string (str)

Return type:

str

scripts.archivebot.str2size(string)[source]#

Return a size for a shorthand size.

Accepts a string defining a size:

1337 - 1337 bytes
150K - 150 kilobytes
2M - 2 megabytes
Returns:

a tuple (size, unit), where size is an integer and unit is 'B' (bytes) or 'T' (threads).

Parameters:

string (str)

Return type:

tuple[int, str]
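
A parser for this shorthand might look like the sketch below. The parse_size name is hypothetical; the multiplier of 1024 and the T suffix for a thread count are assumptions derived from the units named above.

```python
# Suffix -> (multiplier, unit); 1024-based multipliers are an assumption.
SUFFIXES = {'K': (1024, 'B'), 'M': (1024 * 1024, 'B'), 'T': (1, 'T')}


def parse_size(string):
    """Return (size, unit) for shorthand such as '1337', '150K' or '2M'."""
    string = string.strip()
    factor, unit = 1, 'B'
    if string and string[-1].upper() in SUFFIXES:
        factor, unit = SUFFIXES[string[-1].upper()]
        string = string[:-1]
    return int(string) * factor, unit


for value in ('1337', '150K', '2M', '10T'):
    print(value, parse_size(value))
```

Keeping the unit in the result lets a caller decide whether to compare the value against a byte count or against a number of threads.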

scripts.archivebot.template_title_regex(tpl_page)[source]#

Return a regex that matches variations of the template title.

It supports the transcluding variant as well as localized namespaces and case-insensitivity depending on the namespace.

Parameters:

tpl_page (pywikibot.page.Page) – The template page

Return type:

Pattern

basic script#

An incomplete sample script

This is not a complete bot; rather, it is a template from which simple bots can be made. You can rename it to mybot.py, then edit it in whatever way you want.

Use the global -simulate option for test purposes. No changes will be made to the live wiki.

The following parameters are supported:

-always

The bot won’t ask for confirmation when putting a page

-text:

Use this text to be added; otherwise ‘Test’ is used

-replace:

Don’t add text but replace it

-top

Place additional text on top of the page

-summary:

Set the action summary message for the edit.

This sample script is a ConfigParserBot. All settings can be made either by giving options on the command line or with a settings file, which is scripts.ini by default. If you don’t want the default values, you can add any option you want to change to that settings file below the [basic] section, like:

[basic] ; inline comments start with a semicolon
# This is a comment line. Assignments may be done with '=' or ':'
text: A text with line break and
    continuing on next line to be put
replace: yes ; yes/no, on/off, true/false and 1/0 are also valid
summary = Bot: My first test edit with pywikibot

Every script has its own section with the script name as header.
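
How such a settings file is read can be tried out with the standard library's configparser. This is a sketch under the assumption that ';' is configured as the inline comment prefix; the SETTINGS text and the options dictionary are illustrative only.

```python
import configparser

# Illustrative settings text modelled on the example above.
SETTINGS = """\
[basic]
# full-line comment
text: A text with line break and
    continuing on next line to be put
replace: yes ; inline comment
summary = Bot: My first test edit with pywikibot
"""

# Assumption: inline comments are introduced by ';'.
parser = configparser.ConfigParser(inline_comment_prefixes=(';',))
parser.read_string(SETTINGS)

options = {
    'text': parser.get('basic', 'text'),
    'replace': parser.getboolean('basic', 'replace'),
    'summary': parser.get('basic', 'summary'),
}
print(options)
```

The indented continuation line becomes part of the text value, separated by a newline, and getboolean() accepts yes/no, on/off, true/false and 1/0.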

In addition the following generators and filters are supported but cannot be set by settings file:

This script supports use of pagegenerators arguments.

class scripts.basic.BasicBot(site=True, **kwargs)[source]#

Bases: SingleSiteBot, ConfigParserBot, ExistingPageBot, AutomaticTWSummaryBot

An incomplete sample bot.

Variables:

summary_key – Edit summary message key. The message that should be used is placed in the /i18n subdirectory. The file containing these messages should have the same name as the caller script (i.e. basic.py in this case). Use summary_key to set a default edit summary message.

Parameters:
  • site (BaseSite | bool | None)

  • kwargs (Any)

Create a SingleSiteBot instance.

Parameters:
  • site (BaseSite | bool | None) – If True it’ll be set to the configured site using pywikibot.Site.

  • kwargs (Any)

summary_key: str | None = 'basic-changing'#

Must be defined in subclasses.

treat_page()[source]#

Load the given page, do some changes, and save it.

Return type:

None

update_options: dict[str, Any] = {'replace': False, 'summary': None, 'text': 'Test', 'top': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived, but use self.available_options.update(<dict>) in the initializer in that case.

Added in version 6.4.

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.

scripts.basic.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

blockpageschecker script#

A bot to remove stale protection templates from unprotected pages

Very often sysops block pages for a set time but then forget to remove the warning! This script is useful if you want to remove those useless warnings left on these pages.

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-protectedpages

Check all the blocked pages; useful when you have no categories or when you have problems with them. (Add the namespace you want to check after “:”; the default checks all protected pages.)

-moveprotected

Same as -protectedpages, for moveprotected pages

This script is a ConfigParserBot. The following options can be set within a settings file which is scripts.ini by default:

-always

Doesn’t ask every time whether the bot should make the change. Do it always.

-show

When the bot can’t delete the template from the page (wrong regex or something like that) it will ask you if it should show the page in your browser.

Attention

Pages included may give false positives!

-move

The bot will also check whether the page is protected against moves, not only against edits

Examples:

python pwb.py blockpageschecker -always

python pwb.py blockpageschecker -cat:Geography -always

python pwb.py blockpageschecker -show -protectedpages:4

class scripts.blockpageschecker.CheckerBot(site=True, **kwargs)[source]#

Bases: ConfigParserBot, ExistingPageBot, SingleSiteBot

Bot to remove stale protection templates from unprotected pages.

Changed in version 7.0: CheckerBot is a ConfigParserBot

Create a SingleSiteBot instance.

Parameters:
  • site (BaseSite | bool | None) – If True it’ll be set to the configured site using pywikibot.Site.

  • kwargs (Any)

static invoke_editor(page)[source]#

Ask for an editor and invoke it.

Return type:

None

remove_templates()[source]#

Check whether the page is blocked and has the right template.

setup()[source]#

Initialize the coroutine for parsing templates.

Return type:

None

skip_page(page)[source]#

Skip if the user does not have permission to edit.

teardown()[source]#

Close the coroutine.

Return type:

None

treat_page()[source]#

Load the given page, do some changes, and save it.

Return type:

None

update_options: dict[str, Any] = {'move': False, 'show': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived, but use self.available_options.update(<dict>) in the initializer in that case.

Added in version 6.4.

scripts.blockpageschecker.main(*args)[source]#

Process command line arguments and perform task.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

category script#

Script to manage categories

Syntax:

python pwb.py category action [-option]

where action can be one of these

add

mass-add a category to a list of pages.

remove

remove category tag from all pages in a category. If a pagegenerators option is given, the intersection with category pages is processed.

move

move all pages in a category to another category. If a pagegenerators option is given, the intersection with category pages is processed.

tidy

tidy up a category by moving its pages into subcategories.

tree

show a tree of subcategories of a given category.

listify

make a list of all of the articles that are in a category.

clean

Removes redundant grandchildren from the specified category by removing the direct link to the grandparent. In other words, a grandchild should not also be a child.

and option can be one of these

Options for add action:

-person

Sort persons by their last name.

-create

If a page doesn’t exist, do not skip it, create it instead.

-redirect

Follow redirects.

Options for listify action:

-append

This appends the list to the target page if it already exists (appending to the bottom by default).

-overwrite

This overwrites the current page with the list even if something is already there.

-showimages

This displays images rather than linking them in the list.

-talkpages

This outputs the links to talk pages of the pages to be listified in addition to the pages themselves.

-prefix:#

You may specify a list prefix like “#” for a numbered list or any other prefix. Default is a bullet list with prefix “*”.

Options for remove action:

-nodelsum

This specifies not to use the custom edit summary as the deletion reason. Instead, it uses the default deletion reason for the language, which is “Category was disbanded” in English.

Options for move action:

-hist

Creates a nice wikitable on the talk page of target category that contains detailed page history of the source category.

-nodelete

Don’t delete the old category after move.

-nowb

Don’t update the Wikibase repository.

-allowsplit

If that option is not set, it only moves the talk and main page together.

-mvtogether

Only move the pages/subcategories of a category, if the target page (and talk page, if -allowsplit is not set) doesn’t exist.

-keepsortkey

Use sortKey of the old category also for the new category. If not specified, sortKey is removed. An alternative method to keep sortKey is to use -inplace option.

Options for listify and tidy actions:

-namespaces, -namespace, -ns

Filter the articles in the specified namespaces. Separate multiple namespace numbers or names with commas. Examples: -ns:0,2,4, -ns:Help,MediaWiki

Options for clean action:

-always

The bot won’t ask for confirmation when putting a page.

Options for several actions:

-rebuild

Reset the database.

-from:

The category to move from (for the move option). Also, the category to remove from in the remove option. Also, the category to make a list of in the listify option.

-to:

The category to move to (for the move option). Also, the name of the list to make in the listify option.

-batch

Don’t prompt to delete emptied categories (do it automatically).

-summary:

Pick a custom edit summary for the bot.

-inplace

Use this flag to change categories in place rather than rearranging them.

-recurse[:<depth>]

Recurse through subcategories of the category to optional depth.

-pagesonly

While removing pages from a category, keep the subpage links and do not remove them.

-match

Only work on pages whose titles match the given regex (for move and remove actions).

-depth:

The max depth limit beyond which no subcategories will be listed.

Note

If the category names have spaces in them you may need to use a special syntax in your shell so that the names aren’t treated as separate parameters. For instance, in BASH, use single quotes, e.g. -from:'Polar bears'.

If action is “add”, “move” or “remove”, the following additional options are supported:

This script supports use of pagegenerators arguments.

For the actions tidy and tree, the bot will store the category structure locally in category.dump. This saves time and server load, but if it uses these data later, they may be outdated; use the -rebuild parameter in this case.

For example, to create a new category from a list of persons, type:

python pwb.py category add -person

and follow the on-screen instructions.

Or to do it all from the command-line, use the following syntax:

python pwb.py category move -from:US -to:"United States"

This will move all pages in the category US to the category United States.

A pagegenerators option can be given with move and remove action:

pwb category -site:wikipedia:en remove -from:Hydraulics -cat:Pneumatics

The sample above would remove ‘Hydraulics’ category from all pages which are also in ‘Pneumatics’ category.
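
The intersection can be pictured with plain page titles. In this sketch the intersect helper and the sample titles are hypothetical; the real bot works on page objects supplied by pagegenerators.

```python
def intersect(generator, category_members):
    """Yield only those items of generator that are also category members."""
    members = set(category_members)
    for title in generator:
        if title in members:
            yield title


# Hypothetical page titles standing in for page objects.
hydraulics = ['Pump', 'Valve', 'Hydraulic press']
pneumatics = ['Valve', 'Compressor', 'Pump']

print(list(intersect(pneumatics, hydraulics)))
```

Only the pages that occur in both collections are processed, in the order in which the generator yields them.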

Changed in version 8.0: pagegenerators are supported with “move” and “remove” action.

class scripts.category.CategoryAddBot(generator, newcat=None, sort_by_last_name=False, create=False, comment='', follow_redirects=False)[source]#

Bases: CategoryPreprocess

A robot to mass-add a category to a list of pages.

Parameters:
  • sort_by_last_name (bool)

  • create (bool)

  • comment (str)

  • follow_redirects (bool)

static sorted_by_last_name(catlink, pagelink)[source]#

Return a Category with key that sorts persons by their last name.

Parameters: catlink - the Category to be linked.

pagelink - the Page to be placed in the category.

Trailing words in brackets will be removed. Example: If catlink is the Category ‘Author’ and pagelink is a Page to [[Alexandre Dumas (senior)]], this function will return this Category: [[Category:Author|Dumas, Alexandre]].

Return type:

Page
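
The sort key construction described above can be sketched on plain strings. The last_name_sortkey helper is hypothetical and assumes the last word of the title is the surname; the real method operates on page and category objects.

```python
import re


def last_name_sortkey(title):
    """Return 'Last, First' for a page title, dropping trailing brackets."""
    # Remove a trailing bracketed qualifier such as " (senior)".
    name = re.sub(r'\s*\([^)]*\)\s*$', '', title).strip()
    first, _, last = name.rpartition(' ')
    return f'{last}, {first}' if first else last


key = last_name_sortkey('Alexandre Dumas (senior)')
print(f'[[Category:Author|{key}]]')
```

Titles consisting of a single word are returned unchanged, since there is nothing to reorder.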

treat(page)[source]#

Process one page.

Return type:

None

class scripts.category.CategoryDatabase(rebuild=False, filename='category.dump.bz2')[source]#

Bases: object

Temporary database saving pages and subcategories for each category.

This prevents loading the category pages over and over again.

Parameters:
  • rebuild (bool)

  • filename (str)

dump(filename=None)[source]#

Save the dictionaries to disk if not empty.

Pickle the contents of the dictionaries superclass_db and cat_content_db if at least one is not empty. If both are empty, removes the file from the disk.

If the filename is None, it’ll use the filename determined in __init__.

Return type:

None
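
The behaviour described for dump() can be sketched with pickle and bz2. The dump_db and load_db functions and the layout of the pickled data are assumptions; only the rule "write if not empty, otherwise remove the file" is taken from the description above.

```python
import bz2
import os
import pickle


def dump_db(filename, superclass_db, cat_content_db):
    """Pickle both dictionaries, or remove the file if both are empty."""
    if superclass_db or cat_content_db:
        data = {
            'superclass_db': superclass_db,
            'cat_content_db': cat_content_db,
        }
        with bz2.open(filename, 'wb') as stream:
            pickle.dump(data, stream, protocol=pickle.HIGHEST_PROTOCOL)
    elif os.path.exists(filename):
        os.remove(filename)


def load_db(filename):
    """Load the dictionaries written by dump_db()."""
    with bz2.open(filename, 'rb') as stream:
        return pickle.load(stream)
```

Removing the file when there is nothing to store avoids leaving a stale dump behind that a later run would load by mistake.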

get_articles(cat)[source]#

Return the list of pages for a given category.

Saves this list in a temporary database so that it won’t be loaded from the server next time it’s required.

Return type:

set[Page]

get_subcats(supercat)[source]#

Return the list of subcategories for a given supercategory.

Saves this list in a temporary database so that it won’t be loaded from the server next time it’s required.

Return type:

set[Category]

get_supercats(subcat)[source]#

Return the set of supercategories for a given subcategory.

Return type:

set[Category]

property is_loaded: bool#

Return whether the contents have been loaded.

rebuild()[source]#

Rebuild the database.

Return type:

None

class scripts.category.CategoryListifyRobot(cat_title, list_title, edit_summary, append=False, overwrite=False, show_images=False, *, talk_pages=False, recurse=False, namespaces=None, **kwargs)[source]#

Bases: object

Create a list containing all of the members in a category.

Parameters:
  • cat_title (str | None)

  • list_title (str | None)

  • edit_summary (str)

  • append (bool)

  • overwrite (bool)

  • show_images (bool)

  • talk_pages (bool)

  • recurse (int | bool)

run()[source]#

Start bot.

Return type:

None

class scripts.category.CategoryMoveRobot(oldcat, newcat=None, batch=False, comment='', inplace=False, move_oldcat=True, delete_oldcat=True, title_regex=None, history=False, pagesonly=False, deletion_comment=0, move_comment=None, wikibase=True, allow_split=False, move_together=False, keep_sortkey=None, generator=None)[source]#

Bases: CategoryPreprocess

Change or remove the category from the pages.

If the new category is given, the category is changed from the old one to the new one. Otherwise the category is removed from the pages, and the category itself is removed if it is empty.

By default the operation applies to pages and subcategories.

Added in version 8.0: The generator parameter.

Store all given parameters in the objects attributes.

Parameters:
  • oldcat – The move source.

  • newcat – The move target.

  • batch (bool) – If True the user does not have to confirm the deletion.

  • comment (str) – The edit summary for all pages where the category is changed, and also for moves and deletions if not overridden.

  • inplace (bool) – If True the categories are not reordered.

  • move_oldcat (bool) – If True the category page (and talkpage) is copied to the new category.

  • delete_oldcat (bool) – If True the oldcat page and talkpage are deleted (or nominated for deletion) if it is empty.

  • title_regex – Only pages (and subcats) with a title that matches the regex are moved.

  • history (bool) – If True the history of the oldcat is posted on the talkpage of newcat.

  • pagesonly (bool) – If True only move pages, not subcategories.

  • deletion_comment (int | str) – Either string or special value: DELETION_COMMENT_AUTOMATIC: use a generated message, DELETION_COMMENT_SAME_AS_EDIT_COMMENT: use the same message for delete that is used for the edit summary of the pages whose category was changed (see the comment param above). If the value is not recognized, it’s interpreted as DELETION_COMMENT_AUTOMATIC.

  • move_comment – If set, uses this as the edit summary on the actual move of the category page. Otherwise, defaults to the value of the comment parameter.

  • wikibase (bool) – If True, update the Wikibase item of the old category.

  • allow_split (bool) – If False only moves page and talk page together.

  • move_together (bool) – If True moves the pages/subcategories only if page and talk page could be moved or both source page and target page don’t exist.

  • generator – a generator from pagegenerators.GeneratorFactory. If given an intersection to the oldcat category members is used.
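The handling of deletion_comment described in the parameter list can be sketched as follows. This is a minimal illustration and not the bot's actual code: the function name resolve_deletion_comment and the generated_message argument are hypothetical; only the two constants and their values are taken from the class.

```python
DELETION_COMMENT_AUTOMATIC = 0
DELETION_COMMENT_SAME_AS_EDIT_COMMENT = 1


def resolve_deletion_comment(deletion_comment, edit_comment, generated_message):
    """Pick the summary used for deleting the old category (hypothetical helper)."""
    # A plain string is used as the deletion summary as given.
    if isinstance(deletion_comment, str):
        return deletion_comment
    # Special value: reuse the edit summary of the changed pages.
    if deletion_comment == DELETION_COMMENT_SAME_AS_EDIT_COMMENT:
        return edit_comment
    # DELETION_COMMENT_AUTOMATIC and every unrecognized value
    # fall back to the generated message.
    return generated_message
```

Any value other than a string or the two constants ends up in the last branch, which mirrors the documented fallback to DELETION_COMMENT_AUTOMATIC.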

DELETION_COMMENT_AUTOMATIC = 0#
DELETION_COMMENT_SAME_AS_EDIT_COMMENT = 1#
static check_move(name, old_page, new_page)[source]#

Return whether the old page can be safely moved to the new page.

Parameters:
  • name (str) – Title of the new page

  • old_page (pywikibot.page.BasePage) – Page to be moved

  • new_page (pywikibot.page.BasePage) – Page to be moved to

Returns:

True if the page can be moved, False if the move is not possible

Return type:

bool

run()[source]#

The main bot function that does all the work.

For readability it is split into several helper functions:

  • _movecat()

  • _movetalk()

  • _hist()

  • _change()

  • _delete()

Changed in version 8.0: if a page generator is given to the bot, the intersection with pagegenerators.CategorizedPageGenerator() or pagegenerators.SubCategoriesPageGenerator() is used.

Return type:

None
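The intersection mentioned in the version 8.0 note can be pictured with plain titles instead of Page objects. This is only a sketch under that assumption: members_in_generator is a hypothetical name, and the real bot combines page generators rather than lists of strings.

```python
def members_in_generator(category_members, generator):
    """Yield the category members that the extra generator also yields."""
    # Consume the extra generator once so membership tests are cheap.
    wanted = set(generator)
    for title in category_members:
        if title in wanted:
            yield title
```

The order of the category members is kept; entries that only appear in the extra generator are dropped.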

class scripts.category.CategoryPreprocess(follow_redirects=False, edit_redirects=False, create=False, **kwargs)[source]#

Bases: BaseBot

A class to prepare a list of pages for robots.

Parameters:
  • follow_redirects (bool)

  • edit_redirects (bool)

  • create (bool)

determine_template_target(page)[source]#

Return template page to be categorized.

Categories for templates can be included in <includeonly> section of template doc page.

Also the doc page can be changed by doc template parameter.

TODO: decide if/how to enable/disable this feature.

Parameters:

page (Page) – Page to be processed.

Returns:

Page to be categorized.

Return type:

Page

determine_type_target(page)[source]#

Return page to be categorized by type.

Parameters:

page (Page) – Existing, missing or redirect page to be processed.

Returns:

Page to be categorized.

Return type:

Page | None

class scripts.category.CategoryTidyRobot(cat_title, cat_db, namespaces=None, comment=None)[source]#

Bases: Bot, CategoryPreprocess

Robot to move members of a category into sub- or super-categories.

Specify the category title on the command line. The robot will pick up the page, look for all sub- and super-categories, and list them with an assigned number as possible targets for the page. It will ask you to type the number of the appropriate replacement and then performs the change. It then automatically loops over all pages in the category.

If you don’t want to move the member to a sub- or super-category, but to another category, you can use the ‘j’ (jump) command.

By typing ‘s’ you can leave the complete page unchanged.

By typing ‘m’ you can show more content of the current page, helping you to find out what the page is about and in which other categories it currently is.

Parameters:
  • cat_title (str | None) – a title of the category to process.

  • cat_db (CategoryDatabase object) – a CategoryDatabase object.

  • namespaces (iterable of pywikibot.Namespace) – namespaces to focus on.

  • comment (str | None) – a custom summary for edits.

move_to_category(member, original_cat, current_cat)[source]#

Ask whether to move it to one of the sub- or super-categories.

Given a page in the original_cat category, ask the user whether to move it to one of original_cat’s sub- or super-categories. Recursively run through subcategories’ subcategories.

Note

current_cat is only used for internal recursion. You should always use current_cat = original_cat.

Parameters:
  • member (Page) – a page to process.

  • original_cat (Category) – original category to replace.

  • current_cat (Category) – a category which is questioned.

Return type:

None

teardown()[source]#

Cleanups after run operation.

Return type:

None

treat(page)[source]#

Process page.

Return type:

None

class scripts.category.CategoryTreeRobot(cat_title, cat_db, filename=None, max_depth=10)[source]#

Bases: object

Robot to create tree overviews of the category structure.

Parameters:
  • cat_title – The category which will be the tree’s root.

  • cat_db – A CategoryDatabase object.

  • max_depth (int) – The limit beyond which no subcategories will be listed. This also guarantees that loops in the category structure won’t be a problem.

  • filename – The textfile where the tree should be saved; None to print the tree to stdout.

run()[source]#

Handle the multi-line string generated by treeview.

After the string has been generated by treeview, it is either printed to the console or saved to a file.

Return type:

None

treeview(cat, current_depth=0, parent=None)[source]#

Return a tree view of all subcategories of cat.

The multi-line string contains a tree view of all subcategories of cat, up to level max_depth. Recursively calls itself.

Parameters:
  • cat – the Category of the node we’re currently opening.

  • current_depth (int) – the current level in the tree.

  • parent – the Category of the category we’re coming from.

Return type:

str
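A minimal sketch of the recursion behind such a tree view, assuming the category structure is available as a plain dict that maps a category name to the names of its subcategories. The real method works on Category objects and a CategoryDatabase; the signature below is simplified for illustration only.

```python
def treeview(subcats, cat, max_depth=10, current_depth=0):
    """Return an indented multi-line tree of cat and its subcategories."""
    lines = ['  ' * current_depth + cat]
    # Stop descending at max_depth; this also ends loops in the structure.
    if current_depth < max_depth:
        for child in sorted(subcats.get(cat, ())):
            lines.append(treeview(subcats, child, max_depth, current_depth + 1))
    return '\n'.join(lines)
```

Because the recursion stops at max_depth, a structure in which two categories contain each other still produces a finite string.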

class scripts.category.CleanBot(**kwargs)[source]#

Bases: Bot

Automatically cleans up specified category.

Removes redundant grandchildren from specified category by removing direct link to grandparent.

In other words, a grandchild should not also be a child.

Stub categories are an exception.
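The redundancy rule can be illustrated on a plain dict that maps a category name to its direct subcategories. This is a sketch under that assumption, with a hypothetical function name; it does not model the exception for stub categories and does not edit any page.

```python
def redundant_children(subcats, cat):
    """Return direct children of cat that are also grandchildren of cat."""
    children = set(subcats.get(cat, ()))
    grandchildren = set()
    for child in children:
        grandchildren.update(subcats.get(child, ()))
    # A member of both sets is linked directly and via another child.
    return children & grandchildren
```

For the returned categories, the direct link to cat is the one a cleanup would remove, since they remain reachable through the intermediate child.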

Note

For details please read:

Added in version 7.0.

skip_page(cat)[source]#

Check whether the category should be processed.

Return type:

bool

treat(child)[source]#

Process the category.

Return type:

None

update_options: dict[str, Any] = {'recurse': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.category.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments.

Return type:

None

category_graph script#

Visualizes category hierarchy

Generates a graphical representation of the category hierarchy in dot, svg and html5 formats.

Usage:

pwb.py category_graph [-style STYLE] [-depth DEPTH] [-from FROM] [-to TO]

actions:

-from [FROM]

Category name to scan, default is main category, “?” to ask.

optional arguments:

-to TO

base file name to save, “?” to ask

-style STYLE

graphviz style definitions in dot format (see below)

-depth DEPTH

maximal hierarchy depth. 2 by default

-downsize K

font size divider for subcategories. 4 by default. Use 1 for the same font size.

See also

https://graphviz.org/doc/info/attrs.html for graphviz style definitions.

Example

Visualizes main category:

pwb.py -v category_graph -from

Extended example with style settings:

pwb.py category_graph -from Life -downsize 1.5 -style 'graph[rankdir=BT ranksep=0.5] node[shape=circle style=filled fillcolor=green] edge[style=dashed penwidth=3]'

Added in version 8.0.

class scripts.category_graph.CategoryGraphBot(args)[source]#

Bases: SingleSiteBot

Bot to create graph of the category structure.

Parameters:

args (argparse.Namespace)

run()[source]#

Main function of CategoryGraphBot.

Return type:

None

scan_level(cat, level, hue=None)[source]#

Recursive function to fill dot graph.

Parameters:
  • cat – the Category of the node we’re currently opening.

  • level – the current level in the tree (for recursion), decreasing from depth to zero; the opposite of depth.

Return type:

str

static setup_args(ap)[source]#

Declares arguments.

scripts.category_graph.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

category_redirect script#

This bot will move pages out of redirected categories

The bot will look for categories that are marked with a category redirect template, take the first parameter of the template as the target of the redirect, and move all pages and subcategories of the category there. It also changes hard redirects into soft redirects, and fixes double redirects. A log is written under <userpage>/category_redirect_log. If a page cannot be moved, an entry is written under <userpage>/category_edit_requests so that the move can be done manually. Only category pages that haven’t been edited for a certain cooldown period (default 7 days) are taken into account.

The following parameters are supported:

-always

If used, the bot won’t ask if it should add the specified text

-delay:#

Set the cooldown period in days. If the category was edited more recently than the given number of days, it is ignored. Default is 7.

-tiny

Only loops over Category:Non-empty_category_redirects and moves all images, pages and categories in redirect categories to the target category.

-category:<cat>

Category to be used with this script. If not given, either wikibase entry Q4616723 or Q8099903 is used.

Usage:

python pwb.py category_redirect [options]

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.

class scripts.category_redirect.CategoryRedirectBot(**kwargs)[source]#

Bases: ConfigParserBot, SingleSiteBot, AutomaticTWSummaryBot

Page category update bot.

Changed in version 7.0: CategoryRedirectBot is a ConfigParserBot

Changed in version 9.0: A log entry is written to <userpage>/category_edit_requests if a page cannot be moved

check_hard_redirect()[source]#

Check for hard-redirected categories.

Check categories that are not already marked with an appropriate softredirect template and replace the content with a redirect template.

Return type:

None

check_soft_redirect()[source]#

Check for soft-redirected categories.

Return type:

None

get_cat()[source]#

Specify the category page.

get_log_text()[source]#

Rotate log text and return the most recent text.

load_record()[source]#

Load record from data file and create a backup file.

Return type:

None

move_contents(old_cat_title, new_cat_title, edit_summary)[source]#

The worker function that moves pages out of oldCat into newCat.

Parameters:
  • old_cat_title (str)

  • new_cat_title (str)

  • edit_summary (str)

Return type:

tuple[int, int]

ready_to_edit(cat)[source]#

Return True if cat not edited during cooldown period, else False.
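The cooldown check can be sketched with the standard library alone. The function below is an illustration, not the method itself: it takes the time of the last edit and the current time as explicit datetime arguments (both assumptions), whereas the bot method receives a category page.

```python
from datetime import datetime, timedelta, timezone


def ready_to_edit(last_edit, now, delay_days=7):
    """Return True if last_edit lies further back than the cooldown period."""
    return last_edit < now - timedelta(days=delay_days)
```

With the default of 7 days, a category edited two days ago is skipped, while one edited nine days ago is processed.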

run()[source]#

Run the bot.

Return type:

None

setup_hard_redirect()[source]#

Setup hard redirect task.

setup_soft_redirect()[source]#

Setup soft redirect task.

teardown()[source]#

Write self.record to file and save logs.

Return type:

None

touch(page)[source]#

Touch the given page.

Return type:

None

update_options: dict[str, Any] = {'category': '', 'delay': 7, 'tiny': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.category_redirect.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

change_pagelang script#

This script changes the content language of pages

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-setlang

What language the pages should be set to

-always

If a language is already set for a page, always change it to the one set in -setlang.

-never

If a language is already set for a page, never change it to the one set in -setlang (keep the current language).

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.

Added in version 5.1.

class scripts.change_pagelang.ChangeLangBot(**kwargs)[source]#

Bases: ConfigParserBot, SingleSiteBot

Change page language bot.

Changed in version 7.0: ChangeLangBot is a ConfigParserBot

changelang(page)[source]#

Set page language.

Parameters:

page (pywikibot.page.BasePage) – The page to update and save

Return type:

None

treat(page)[source]#

Treat a page.

Parameters:

page (pywikibot.page.BasePage) – The page to treat

Return type:

None

update_options: dict[str, Any] = {'never': False, 'setlang': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.change_pagelang.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

checkimages script#

Script to check recently uploaded files

This script checks if a file description is present and if there are other problems in the image’s description.

This script will have to be configured for each site. Please submit localisations as additions to the Pywikibot framework.

Everything that needs customisation is indicated by comments.

This script understands the following command-line arguments:

-limit

(int) The number of images to check (default: 80)

-commons

The bot will check if an image on Commons has the same name and if true it reports the image.

-duplicates[:#]

Check whether the image has duplicates. If an argument is given, it sets how many rollbacks to wait before listing the image in the report instead of tagging it (default: 1 rollback).

-duplicatesreport

Report the duplicates in a log AND put the template in the images.

-maxusernotify

Maximum notifications added to a user talk page in a single check, to avoid email spamming.

-sendemail

Send an email after tagging.

-break

To break the bot after the first check (default: recursive)

-sleep[:#]

Time in seconds between repeat runs (default: 30)

-wait[:#]

Wait x seconds before checking the images (default: 0)

-skip[:#]

The bot skips the first [:#] images (default: 0)

-start[:#]

Use allimages() as generator (it starts already from File:[:#])

-cat[:#]

Use a category as generator

-regex[:#]

Use regex, must be used with -url or -page

-page[:#]

Define the name of the wiki page where the images are

-url[:#]

Define the url where the images are

-nologerror

If given, this option will disable the error that is raised when the log is full.

Instructions for the real-time settings

For every new block you have to add:

<------- ------->

In this way the bot can understand where the block starts in order to take the right parameter:

Name=     Set the name of the block
Find=     search this text in the image's description
Findonly= search for exactly this text in the image's description
Summary=  the summary that the bot will use when it
          notifies the problem.
Head=     the heading that the bot will use for the message.
Text=     the template that the bot will use when it
          reports the image's problem.

Changed in version 8.4: Welcome messages are imported from scripts.welcome script.

scripts.checkimages.CATEGORIES_WITH_LICENSES = ('Q4481876', 'Q7451504')#

Category items with the licenses; subcategories may contain other licenses.

Changed in version 7.2: uses wikibase items instead of category titles.

class scripts.checkimages.CheckImagesBot(site, log_full_number=25000, sendemail_active=False, duplicates_report=False, log_full_error=True, max_user_notify=None)[source]#

Bases: object

A robot to check recently uploaded files.

Initializer, define some instance variables.

Parameters:
  • log_full_number (int)

  • sendemail_active (bool)

  • duplicates_report (bool)

  • log_full_error (bool)

check_image_duplicated(duplicates_rollback)[source]#

Function to check the duplicated files.

Return type:

bool

check_image_on_commons()[source]#

Checking if the file is on commons.

Return type:

bool

check_step()[source]#

Check a single file page.

Return type:

None

find_additional_problems()[source]#

Extract additional settings from configuration page.

Return type:

None

ignore_server_errors = False#
static important_image(list_given)[source]#

Get tuples of image and time, return the most used or oldest image.

Changed in version 7.2: itertools.zip_longest is used to stop using_pages as soon as possible.

Parameters:

list_given (list[tuple[float, FilePage]]) – a list of tuples which hold seconds and FilePage

Returns:

the most used or oldest image

Return type:

FilePage

is_tagged()[source]#

Understand if a file is already tagged or not.

Return type:

bool

static load(raw)[source]#

Load a list of objects from a string using regex.

Return type:

list[str]

load_hidden_templates()[source]#

Function to load the white templates.

Return type:

None

load_licenses()[source]#

Load the list of the licenses.

Changed in version 7.2: return a set instead of a list for quicker lookup.

Return type:

set[Page]

mini_template_check(template)[source]#

Check if template is in allowed licenses or in licenses to skip.

Return type:

bool

put_mex_in_talk()[source]#

Function to put the warning in talk page of the uploader.

When the bot finds that the user talk page is empty, it adds the welcome message first. The messages are imported from the welcome.py script.

Return type:

None

regex_generator(regexp, textrun)[source]#

Find page to yield using regex to parse text.

Return type:

Generator[FilePage]

report(newtext, image_to_report, notification=None, head=None, notification2=None, unver=True, comm_talk=None, comm_image=None)[source]#

Function to make the reports easier.

Parameters:

unver (bool)

Return type:

None

report_image(image_to_report, rep_page=None, com=None, rep_text=None, addings=True)[source]#

Report the files to the report page when needed.

Parameters:

addings (bool)

Return type:

bool

set_parameters(image)[source]#

Set parameters.

Return type:

None

skip_images(skip_number, limit)[source]#

Given a number of files, skip the first -number- files.

Return type:

bool

smart_detection()[source]#

Detect templates.

Instead of only checking whether there is a template in the image’s description, the bot also checks whether that template is a license or something else. In this sense the check is smart.

Return type:

tuple[str, bool]

tag_image(put=True)[source]#

Add template to the Image page and find out the uploader.

Parameters:

put (bool)

Return type:

bool

takesettings()[source]#

Function to take the settings from the wiki.

Return type:

None

template_in_list()[source]#

Check if template is in list.

Calls to the MediaWiki system can be pretty slow, while searching in a list of objects is really fast. So first of all let’s see if we can find something in the info that we already have, then make a deeper check.

Return type:

None

static upload_bot_change_function(report_page_text, upload_bot_array)[source]#

Detect the user that has uploaded the file through upload bot.

Return type:

str

static wait(generator, wait_time)[source]#

Skip the images uploaded less than x seconds ago.

This lets users fix the image’s problems on their own during the first x seconds.

Return type:

Generator[FilePage]
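The filtering idea can be sketched without any wiki access. The example assumes that uploads arrive as (name, upload time in seconds) tuples and that the current time is passed in explicitly; both are simplifications, and skip_recent_uploads is a hypothetical name rather than the static method above.

```python
def skip_recent_uploads(uploads, wait_time, now):
    """Yield names of files uploaded more than wait_time seconds before now."""
    for name, uploaded_at in uploads:
        # Younger uploads are skipped so the uploader can still fix them.
        if now - uploaded_at > wait_time:
            yield name
```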

exception scripts.checkimages.LogIsFull(arg)[source]#

Bases: Error

Log is full and the bot cannot add other data to prevent Errors.

Parameters:

arg (Exception | str)

Return type:

None

scripts.checkimages.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

bool

scripts.checkimages.print_with_time_zone(message)[source]#

Print the messages followed by the TimeZone encoded correctly.

Return type:

None

claimit script#

A script that adds claims to Wikidata items based on a list of pages

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Usage:

python pwb.py claimit [pagegenerators] P1 Q2 P123 Q456

You can use any typical pagegenerator (like categories) to provide a list of pages. Then list the property-->target pairs to add.

For geographic coordinates:

python pwb.py claimit [pagegenerators] P625 [lat-dec],[long-dec],[prec]

[lat-dec] and [long-dec] represent the latitude and longitude respectively, and [prec] represents the precision. All values are in decimal degrees, not DMS. If [prec] is omitted, the default precision is 0.0001 degrees.

Example

python pwb.py claimit [pagegenerators] P625 -23.3991,-52.0910,0.0001
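How such a coordinate argument splits into its parts can be sketched as below. The helper parse_coordinate is hypothetical and not part of claimit.py; it only applies the rules stated above: decimal degrees separated by commas, with a default precision of 0.0001 degrees when [prec] is omitted.

```python
DEFAULT_PRECISION = 0.0001  # degrees, used when [prec] is omitted


def parse_coordinate(value):
    """Split 'lat,long[,prec]' into a (lat, lon, precision) tuple of floats."""
    parts = [float(part) for part in value.split(',')]
    if len(parts) == 2:
        parts.append(DEFAULT_PRECISION)
    if len(parts) != 3:
        raise ValueError(f'expected lat,long[,prec] but got {value!r}')
    lat, lon, precision = parts
    return lat, lon, precision
```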

By default, claimit.py does not add a claim if one with the same property already exists on the page. To override this behavior, use the ‘exists’ option:

python pwb.py claimit [pagegenerators] P246 "string example" -exists:p

Suppose the claim you want to add has the same property as an existing claim and the “-exists:p” argument is used. claimit.py will then still not add the claim if it has the same target or sources as the existing claim, or if the existing claim has qualifiers. To override this behavior, add ‘t’ (target), ‘s’ (sources), or ‘q’ (qualifiers) to the ‘exists’ argument.

For instance, to add the claim to each page even if one with the same property and target and some qualifiers already exists:

python pwb.py claimit [pagegenerators] P246 "string example" -exists:ptq

Note that the ordering of the letters in the ‘exists’ argument does not matter, but ‘p’ must be included.
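The rules for the ‘exists’ argument can be summarized in a small decision function. This is a sketch of the behavior as described here, not the code of claimit.py; the function name and its boolean arguments are assumptions, and it only covers the case in which a claim with the same property already exists.

```python
def may_add_claim(exists_arg, same_target, has_sources, has_qualifiers):
    """Decide whether to add a claim whose property already exists on the item."""
    # Without 'p' an existing property always blocks the new claim.
    if 'p' not in exists_arg:
        return False
    # Each remaining letter lifts exactly one further restriction.
    if same_target and 't' not in exists_arg:
        return False
    if has_sources and 's' not in exists_arg:
        return False
    if has_qualifiers and 'q' not in exists_arg:
        return False
    return True
```

Since only membership of the letters is tested, `ptq` and `qtp` behave the same, which matches the remark that the ordering does not matter.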

class scripts.claimit.ClaimRobot(claims, exists_arg='', **kwargs)[source]#

Bases: WikidataBot

A bot to add Wikidata claims.

Parameters:
  • claims (list) – A list of wikidata claims

  • exists_arg (str) – String specifying how to handle duplicate claims

treat_page_and_item(page, item)[source]#

Treat each page.

Parameters:
  • page (pywikibot.page.BasePage) – The page to update and change

  • item (pywikibot.page.ItemPage) – The item to treat

Return type:

None

use_from_page = None#
scripts.claimit.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

clean_sandbox script#

This bot resets a (user) sandbox with predefined text

This script understands the following command-line arguments:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-hours

(float) Use this parameter to make the script repeat itself after the given number of hours. Hours can be given as a decimal: 0.01 hours are 36 seconds; 0.1 hours are 6 minutes.

-delay

(int) Use this parameter to set a wait time after the last edit was made. If it is not given, the delay is derived from -hours and limited to between 5 and 15 minutes. The minimum delay time is 5 minutes.

-text

(str) The text that replaces the sandbox content. You can use this when you haven’t configured clean_sandbox for your wiki.

-textfile

(str) As an alternative to -text, you can use this to provide a file containing the text to be used.

-summary

(str) Summary of the edit made by the bot. Overrides the default from i18n.
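The interplay of -delay and -hours can be sketched as follows. Two assumptions are made here that the text does not spell out: the delay is measured in minutes, and a missing -delay is derived by converting -hours to minutes before clamping. The function name effective_delay is hypothetical; the defaults of -1 mirror the option defaults of the bot class.

```python
def effective_delay(delay=-1, hours=-1.0):
    """Return the wait time after the last edit, in minutes."""
    if delay < 0:
        # No -delay given: derive it from -hours, limited to 5..15 minutes.
        return min(15, max(5, int(hours * 60)))
    # An explicit -delay is still raised to the 5 minute minimum.
    return max(5, delay)
```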

This script is a ConfigParserBot. All local parameters can be given inside a scripts.ini file. Options passed to the script take priority over options read from the ini file.

For example:

[clean_sandbox]
# the parameter section for clean_sandbox script
summary = Bot: Cleaning sandbox
text = {{subst:Clean Sandbox}}
hours: 0.5
delay: 7
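The priority rule can be tried out with the standard configparser module, which accepts both `=` and `:` as delimiters. The sketch below is not how ConfigParserBot is implemented; load_options and DEFAULTS are names chosen for this example, with the defaults copied from the available_options of the bot class.

```python
import configparser

INI_TEXT = """\
[clean_sandbox]
# the parameter section for clean_sandbox script
summary = Bot: Cleaning sandbox
text = {{subst:Clean Sandbox}}
hours: 0.5
delay: 7
"""

DEFAULTS = {'delay': -1, 'hours': -1.0, 'summary': '', 'text': ''}


def load_options(ini_text, section, cli_options):
    """Merge defaults, ini values and command line options, in that order."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    options = dict(DEFAULTS)
    if parser.has_section(section):
        for key, default in DEFAULTS.items():
            if parser.has_option(section, key):
                # Convert the ini string to the type of the default value.
                options[key] = type(default)(parser.get(section, key))
    # Options passed on the command line win over the ini file.
    options.update(cli_options)
    return options
```

Calling `load_options(INI_TEXT, 'clean_sandbox', {'delay': 10})` keeps the summary, text and hours from the file but uses a delay of 10.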
class scripts.clean_sandbox.SandboxBot(**kwargs)[source]#

Bases: Bot, ConfigParserBot

Sandbox reset bot.

available_options: dict[str, Any] = {'delay': -1, 'hours': -1.0, 'summary': '', 'text': ''}#

Handler configuration attribute. Only the keys of the dict can be passed as __init__ options. The values are the default values. Overwrite this in subclasses!

run()[source]#

Run bot.

Return type:

None

treat(page)[source]#

Treat a single page.

scripts.clean_sandbox.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

commons_information script#

This bot adds a language template to the file’s description field

The Information template is commonly used to provide formatting to the basic information for files (description, source, author, etc.). The description field should provide brief but complete information about the image. The description format should use Language templates like {{En}} or {{De}} to specify the language of the description. This script adds these language templates if missing. For example the description of

{{Information
 | Description = A simplified icon for [[Pywikibot]]
 | Date = 2003-06-14
 | Other fields =
}}

will be detected as en language with ~100 % accuracy, and the bot replaces its content with

{{Information
 | Description = {{en|A simplified icon for [[Pywikibot]]}}
 | Date = 2003-06-14
 | Other fields =
}}
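The text transformation itself can be sketched with a regular expression. This is an illustration only: the language code is passed in by hand instead of being detected, and the helper wrap_description is hypothetical. The script itself works on parsed wikitext (its methods take mwparserfromhell Template objects), not on a regular expression.

```python
import re

# One "| Description = ..." line with a non-empty value.
DESC_RE = re.compile(
    r'^([ \t]*\|[ \t]*[Dd]escription[ \t]*=[ \t]*)(\S.*?)[ \t]*$',
    re.MULTILINE,
)


def wrap_description(wikitext, lang):
    """Wrap a bare description value in a language template such as {{en|...}}."""
    def repl(match):
        prefix, desc = match.groups()
        # Leave values alone that already start with a template.
        if desc.startswith('{{'):
            return match.group(0)
        return prefix + '{{' + lang + '|' + desc + '}}'
    return DESC_RE.sub(repl, wikitext)
```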

Note

The langdetect package is needed for full support of language detection. Install it with:

pip install langdetect

This script understands the following command-line arguments:

This script supports use of pagegenerators arguments.

Usage:

python pwb.py commons_information [pagegenerators]

You can use any typical pagegenerator (like categories) to provide a list of pages. If no pagegenerator is given, pages transcluding the Information template are used.

Hint

This script uses commons site as default. For other sites use the global -site option.

Example for going through all files:

python pwb.py commons_information -start:File:!

Added in version 6.0.

Changed in version 9.2: accelerate script with preloading pages; use commons as default site; use transcluded pages of Information template.

class scripts.commons_information.InformationBot(**kwargs)[source]#

Bases: SingleSiteBot, ExistingPageBot

Bot for the Information template.

Initializer.

comment = {'en': 'Bot: wrap the description parameter of Information in the appropriate language template'}#
desc_params = ('Description', 'description')#
static detect_langs(text)[source]#

Detect language from given text.

Parameters:

text (str)

get_description(template)[source]#

Get description parameter.

lang_tmp_cat = 'Language templates'#
process_desc_other(wikicode, nodes)[source]#

Process other description text.

The description text may consist of different Node types other than Template, which is handled by process_desc_template(). Combine all nodes and replace the last one with a newly created Template while removing the remaining ones from wikicode.

Added in version 9.2.

Parameters:
  • wikicode (Wikicode) – The Wikicode of the parsed page text.

  • nodes (list[Node]) – wikitext nodes to be processed

Returns:

whether the description nodes were changed

Return type:

bool

process_desc_template(template)[source]#

Process description template.

Parameters:

template (Template) – a mwparserfromhell Template found in the description parameter of Information template.

Returns:

whether the template node was changed.

Return type:

bool

static replace_value(param, value)[source]#

Replace param node with given value.

Parameters:
  • param (Node)

  • value (Template)

Return type:

None

treat_page()[source]#

Treat current page.

Return type:

None

scripts.commons_information.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

commonscat script#

With this tool you can add the template {{commonscat}} to categories

The tool works by following the interwiki links. If the template is present on another language page, the bot will use it.

Warning

You could probably use it at articles as well, but this isn’t tested.

The following parameters are supported:

-checkcurrent

Work on all category pages that use the primary commonscat template.

This script is a ConfigParserBot. The following options can be set within a settings file which is scripts.ini by default:

-always

Don’t prompt you for each replacement. The warning message does not have to be confirmed.

Attention

Use this with care!

-summary:XYZ

Set the action summary message for the edit to XYZ, otherwise it uses messages from add_text.py as default.

This bot uses pagegenerators to get a list of pages. The following options are supported:

This script supports use of pagegenerators arguments.

For example to go through all categories:

python pwb.py commonscat -start:Category:!

class scripts.commonscat.CommonscatBot(**kwargs)[source]#

Bases: ConfigParserBot, ExistingPageBot

Commons categorisation bot.

Changed in version 7.0: CommonscatBot is a ConfigParserBot

Parameters:

kwargs (Any) – bot options

Keyword Arguments:

generator – a generator processed by run() method

changeCommonscat(page=None, oldtemplate='', oldcat='', newtemplate='', newcat='', linktitle='')[source]#

Change the current commonscat template and target.

Parameters:
  • oldtemplate (str)

  • oldcat (str)

  • newtemplate (str)

  • newcat (str)

  • linktitle (str)

Return type:

None

Return the name of a valid commons category.

If the page is a redirect this function tries to follow it. If the page doesn’t exist the function will return an empty string.

Parameters:

name (str)

Find CommonsCat template on interwiki pages.

Returns:

name of a valid commons category

Return type:

str

find_commons_category(page)[source]#

Find CommonsCat template on Wikibase repository.

Use Wikibase property to get the category if possible. Otherwise check all langlinks to find it.

Returns:

name of a valid commons category

Return type:

str

Find CommonsCat template on page.

Return type:

tuple of (<templatename>, <target>, <linktext>, <note>)

static skipPage(page)[source]#

Determine if the page should be skipped.

Return type:

bool

skip_page(page)[source]#

Skip category redirects.

treat_page()[source]#

Add CommonsCat template to page.

Take a page. Go to all the interwiki pages looking for a commonscat template. When all the interwiki links are checked and a proper category is found, add it to the page.

Return type:

None

update_options: dict[str, Any] = {'summary': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

use_disambigs: bool | None = False#

Attribute to determine whether to use disambiguation pages. Set it to True to use disambigs only, set it to False to skip disambigs. If None both are processed.

Added in version 7.2.

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.
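The three states of use_redirects (and likewise use_disambigs) amount to a simple filter. The sketch below expresses it for a single boolean page property; accept is a hypothetical helper and not a framework function.

```python
def accept(is_redirect, use_redirects):
    """Apply the True / False / None rule to one page."""
    if use_redirects is None:
        # None: both redirects and non-redirects are processed.
        return True
    # True keeps redirects only, False keeps non-redirects only.
    return is_redirect == use_redirects
```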

scripts.commonscat.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

coordinate_import script#

Coordinate importing script

Usage:

python pwb.py coordinate_import -site:wikipedia:en -cat:Category:Coordinates_not_on_Wikidata

This will work on all pages in the category “coordinates not on Wikidata” and will import the coordinates on these pages to Wikidata.

The data from the “GeoData” extension (https://www.mediawiki.org/wiki/Extension:GeoData) is used, so that extension has to be set up properly. You can look at the [[Special:Nearby]] page on your local wiki to see if it’s populated.

You can use any typical pagegenerator to provide a list of pages:

python pwb.py coordinate_import -lang:it -family:wikipedia -namespace:0 -transcludes:Infobox_stazione_ferroviaria

You can also run over a set of items on the repo without coordinates and try to import them from any connected page. To do this, you have to explicitly provide the repo as the site using -site argument.

Example

python pwb.py coordinate_import -site:wikidata:wikidata -namespace:0 -querypage:Deadendpages

The following command line parameters are supported:

-always

If used, the bot won’t ask if it should add the specified text.

-create

Create items for pages without one.

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.
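
As a rough illustration of what such a settings file could contain, the sketch below parses a hypothetical scripts.ini fragment with the standard configparser module. It assumes that the section is named after the script and that option names match the command line options without the leading dash; it does not show how Pywikibot itself loads the file.

```python
import configparser

# Hypothetical scripts.ini content; the section name is assumed to
# match the script name.
SETTINGS = """
[coordinate_import]
always = true
create = false
"""

parser = configparser.ConfigParser()
parser.read_string(SETTINGS)

section = parser['coordinate_import']
options = {
    'always': section.getboolean('always', fallback=False),
    'create': section.getboolean('create', fallback=False),
}
print(options)  # {'always': True, 'create': False}
```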

This script supports use of pagegenerators arguments.

class scripts.coordinate_import.CoordImportRobot(**kwargs)[source]#

Bases: ConfigParserBot, WikidataBot

A bot to import coordinates to Wikidata.

Changed in version 7.0: CoordImportRobot is a ConfigParserBot

has_coord_qualifier(claims)[source]#

Check if self.prop is used as property for a qualifier.

Parameters:

claims (dict) – the Wikibase claims to check in

Returns:

the first property for which self.prop is used as qualifier, or None if there is none

Return type:

str | None
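
The check described here can be pictured as a nested loop over the claims. The following is a minimal sketch under stated assumptions: claims maps property IDs to lists of claim objects that expose a qualifiers mapping, SimpleNamespace objects stand in for real claims, and first_prop_with_qualifier is a hypothetical name rather than the method itself.

```python
from __future__ import annotations

from types import SimpleNamespace


def first_prop_with_qualifier(claims: dict, qualifier_prop: str) -> str | None:
    """Return the first property whose claims carry qualifier_prop."""
    for prop, claim_list in claims.items():
        for claim in claim_list:
            if qualifier_prop in claim.qualifiers:
                return prop
    return None


# Stand-in claims: the P159 claim carries a P625 (coordinate) qualifier.
claims = {
    'P31': [SimpleNamespace(qualifiers={})],
    'P159': [SimpleNamespace(qualifiers={'P625': ['coordinate']})],
}
print(first_prop_with_qualifier(claims, 'P625'))  # P159
```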

item_has_coordinates(item)[source]#

Check if the item has coordinates.

Returns:

whether the item has coordinates

Return type:

bool

treat_page_and_item(page, item)[source]#

Treat page/item.

Return type:

None

try_import_coordinates_from_page(page, item)[source]#

Try to import coordinates from the given page to the given item.

Returns:

whether any coordinates were found and the import was successful

Return type:

bool

use_from_page = None#
scripts.coordinate_import.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

cosmetic_changes script#

This module can do slight modifications to tidy a wiki page’s source code

The changes are not supposed to change the look of the rendered wiki page.

The following parameters are supported:

-always

Don’t prompt for each replacement. The warning (see below) does not have to be confirmed. ATTENTION: Use this with care!

-async

Put page on queue to be saved to wiki asynchronously.

-summary:XYZ

Set the summary message text for the edit to XYZ, bypassing the predefined message texts with original and replacements inserted.

-ignore:

Ignores if an error occurred and either skips the page or only that method. It can be set to:

  • all - does not ignore errors

  • match - ignores ISBN related errors (default)

  • method - ignores fixing method errors

  • page - ignores page related errors
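
One way to picture how the four -ignore values could be turned into a setting is a case-insensitive lookup in an enumeration. The IgnoreMode enum and parse_ignore function below are hypothetical stand-ins for illustration; they are not the names or values used by the script.

```python
from enum import Enum, auto


class IgnoreMode(Enum):
    """Stand-in for the four documented -ignore values."""

    ALL = auto()
    MATCH = auto()
    METHOD = auto()
    PAGE = auto()


def parse_ignore(value: str) -> IgnoreMode:
    """Map an -ignore value such as 'method' to a mode, ignoring case."""
    try:
        return IgnoreMode[value.upper()]
    except KeyError:
        raise ValueError(f'Unknown ignore mode {value!r}') from None


print(parse_ignore('match').name)  # MATCH
```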

The following generators and filters are supported:

This script supports use of pagegenerators arguments.

ATTENTION: You can run this script as a stand-alone for testing purposes. However, the changes that are made are only minor, and other users might get angry if you fill the version histories and watchlists with such irrelevant changes. Some wikis prohibit stand-alone running.

For further information see pywikibot/cosmetic_changes.py

class scripts.cosmetic_changes.CosmeticChangesBot(**kwargs)[source]#

Bases: AutomaticTWSummaryBot, ExistingPageBot

Cosmetic changes bot.

Parameters:

kwargs (Any) – bot options

Keyword Arguments:

generator – a generator processed by run() method

summary_key: str | None = 'cosmetic_changes-standalone'#

Must be defined in subclasses.

treat_page()[source]#

Treat page with the cosmetic toolkit.

Changed in version 7.0: skip if InvalidPageError is raised

Return type:

None

update_options: dict[str, Any] = {'async': False, 'ignore': CANCEL.MATCH, 'summary': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.

scripts.cosmetic_changes.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

create_isbn_edition script#

Pywikibot client to load ISBN linked data into Wikidata

Pywikibot script to get ISBN data from a digital library, and create or amend the related Wikidata item for edition (with the P212, ISBN number as unique external ID).

Use digital libraries to get ISBN data in JSON format, and integrate the results into Wikidata.

Note

ISBN data should only be used for editions, and not for written works.

Then the resulting item number can be used e.g. to generate Wikipedia references using template Cite_Q.

Parameters:

All parameters are optional:

*P1:*        digital library (default wiki "-")

    bnf      Catalogue General (France)
    bol      Bol.com
    dnb      German National Library
    goob     Google Books
    kb       National Library of the Netherlands
    loc      Library of Congress US
    mcues    Ministerio de Cultura (Spain)
    openl    OpenLibrary.org
    porbase  urn.porbase.org Portugal
    sbn      Servizio Bibliotecario Nazionale (Italy)
    wiki     wikipedia.org
    worldcat WorldCat (wc)

*P2:*        ISO 639-1 language code. Default LANG; e.g. en, nl,
             fr, de, es, it, etc.

*P3 P4...:*  P/Q pairs to add additional claims (repeated) e.g.
             P921 Q107643461 (main subject: database management
             linked to P2163, Fast ID 888037)

*stdin:*     List of ISBN numbers (International standard book
             number, version 10 or 13). Free text (e.g.
             Wikipedia references list, or publication list) is
             accepted. Identification is done via an ISBN regex
             expression.
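
Picking ISBN numbers out of free text can be sketched with a single regular expression. The pattern below is illustrative only and is not the expression used by the script; it accepts an optional 978/979 prefix and optional hyphens or spaces, and the helper name find_isbns is hypothetical.

```python
import re

# Illustrative pattern, not the script's own: an optional 978/979 prefix,
# nine digits and a final digit or X, with optional hyphens or spaces.
ISBN_RE = re.compile(r'\b(?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx]\b')


def find_isbns(text: str) -> list[str]:
    """Return ISBN candidates found in free text, separators removed."""
    return [re.sub(r'[- ]', '', match) for match in ISBN_RE.findall(text)]


sample = 'See ISBN 978-0-596-10089-6 (also printed as 0-596-10089-2).'
print(find_isbns(sample))  # ['9780596100896', '0596100892']
```

A pattern like this only finds candidates; it does not verify check digits.
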
Functionality:
  • Both ISBN-10 and ISBN-13 numbers are accepted as input.

  • Only ISBN-13 numbers are stored. ISBN-10 numbers are only used for identification purposes; they are not stored.

  • The ISBN number is used as a primary key; no two items can have the same P212 ISBN number. The item update is not performed when there is no unique match. Only editions are updated or created.

  • Individual statements are added or merged incrementally; existing data is not overwritten.

  • Authors and publishers are searched to get their item number; unknown or ambiguous items are skipped.

  • Book title and subtitle are separated with either ‘.’, ‘:’, or ‘-’ in that order.

  • Detect author, illustrator, preface writer, and afterword writer instances.

  • Add profession “author” to individual authors.

  • This script can be run incrementally.
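
The title and subtitle rule from the list above can be sketched as trying each separator in the stated order and splitting at its first occurrence. This is a simplified illustration, not the script's implementation; split_title is a hypothetical name.

```python
from __future__ import annotations


def split_title(full_title: str) -> tuple[str, str]:
    """Split a title at the first separator that occurs, tried in order."""
    for separator in ('.', ':', '-'):
        title, found, subtitle = full_title.partition(separator)
        if found:
            return title.strip(), subtitle.strip()
    return full_title.strip(), ''


print(split_title('Learning SQL: Master SQL Fundamentals'))
# ('Learning SQL', 'Master SQL Fundamentals')
```

Note that such a simple rule also splits hyphenated words when no earlier separator is present.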

Examples:

Default library (Google Books), language (LANG), no additional statements:

pwb create_isbn_edition.py 9789042925564

Wikimedia, language English, main subject: database management:

pwb create_isbn_edition.py wiki en P921 Q107643461 978-0-596-10089-6

Data quality:
  • ISBN numbers (P212) are only assigned to editions.

  • A written work should not have an ISBN number (P212).

  • For targets of P629 (edition of) amend “is an Q47461344 (written work) instance” and “inverse P747 (work has edition)” statements

  • Use https://query.wikidata.org/querybuilder/ to identify P212 duplicates. Merge duplicate items before running the script again.

  • The following properties should only be used for written works, not for editions:

    • P5331: OCLC work ID (editions should only have P243)

    • P8383: Goodreads identification code for work (editions should only have P2969)

Return status:

The following status codes are returned to the shell:

3   Invalid or missing parameter
4   Library not installed
12  Item does not exist
20  Network error
Standard ISBN properties for editions:
P31:Q3331189:  instance of edition (mandatory statement)
P50:           author
P123:          publisher
P212:          canonical ISBN number (with dashes; searchable
               via Wikidata Query)
P407:          language of work (Qnumber linked to ISO 639-1
               language code)
P577:          date of publication (year)
P1476:         book title
P1680:         subtitle
Other ISBN properties:
P921:   main subject (inverse lookup from external Fast ID P2163)
P629:   work for edition
P747:   edition of work
Qualifiers:
P248:   Source
P813:   Retrieval date
P1545:  (author) sequence number
External identifiers:
P243:   OCLC ID
P1036:  Dewey Decimal Classification
P2163:  Fast ID (inverse lookup via Wikidata Query)
        -> P921: main subject

(not implemented)
P2969:  Goodreads identification code

(only for written works)
P5331:  OCLC work ID (editions should only have P243)

(not implemented)
P8383:  Goodreads identification code for work
        (editions should only have P2969)
P213:   ISNI ID
P496:   ORCID ID
P675:   Google Books identification code
Unavailable properties from digital library:
(not implemented by isbnlib)
P98:    Editor
P110:   Illustrator/photographer
P291:   place of publication
P1104:  number of pages
?:      edition format (hardcover, paperback)
Author:

Geert Van Pamel (User:Geertivp), MIT License, 2022-08-04

Prerequisites:

In addition to Pywikibot the following ISBN lib package is mandatory; install it with:

pip install isbnlib

The following ISBN lib packages are optional; install them with:

pip install isbnlib-bnf
pip install isbnlib-bol
pip install isbnlib-dnb
pip install isbnlib-kb
pip install isbnlib-loc
pip install isbnlib-worldcat2
Restrictions:
  • Better use the ISO 639-1 language code parameter as a default. The language code is not always available from the digital library; therefore we need a default.

  • Publisher unknown:

    • Missing P31:Q2085381 statement, missing subclass in script

    • Missing alias

    • Create publisher

  • Unknown author: create author as a person

Known Problems:
  • Unknown ISBN, e.g. 9789400012820

  • If no ISBN data is available for an edition, the library either returns no output (goob = Google Books) or an error message (wiki, openl). The script handles both cases. Try another library instance.

  • Only 6 specific ISBN attributes are listed by the webservice(s), missing are e.g.: place of publication, number of pages

  • Some digital libraries have more registrations than others.

  • Some digital libraries have data quality problems.

  • Not all ISBN attributes have data values (authors, publisher, date of publication); the language can be missing at the digital library.

  • How to add still more digital libraries?

    • This would require an additional isbnlib module

    • Does the KBR (Koninklijke Bibliotheek van België) have a public ISBN service?

  • The script uses multiple webservice calls; script might take time, but it is automated.

  • Need to manually amend ISBN items that have no author, publisher, or other required data

    • You could use another digital library

    • Which other services to use?

  • BibTex service is currently unavailable

  • Filter for work properties: https://www.wikidata.org/wiki/Q63413107

    ['9781282557246', '9786612557248', '9781847196057', '9781847196040']
    P5331: OCLC identification code for work 793965595; should only
           have P243)
    P8383: Goodreads identification code for work 13957943; should
           only have P2969)
    
  • ERROR: an HTTP error has ocurred e.g. (503) Service Unavailable

  • error: externally-managed-environment

    isbnlib-kb cannot be installed via pip install command. It raises error: externally-managed-environment because this environment is externally managed.

    To install Python packages system-wide, try apt install python3-xyz, where xyz is the package you are trying to install.

    If you wish to install a non-Debian-packaged Python package, create a virtual environment using python3 -m venv path/to/venv. Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make sure you have python3-full installed.

    If you wish to install a non-Debian packaged Python application, it may be easiest to use pipx install xyz, which will manage a virtual environment for you. Make sure you have pipx installed.

    See also

    See Python Library venv for more information about virtual environments.

    Note

    If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages to pip.

    Hint

    See PEP 668 for the detailed specification.

    You need to install a local python environment:

    sudo -s
    apt install python3-full
    python3 -m venv /opt/python
    /opt/python/bin/pip install pywikibot
    /opt/python/bin/pip install isbnlib-kb
    /opt/python/bin/python ../userscripts/create_isbn_edition.py kb
    
Environment:

The python script can run on the following platforms:

  • Linux client

  • Google Chromebook (Linux container)

  • Toolforge Portal

  • PAWS

LANG: default ISO 639-1 language code

Applications:

Generate a book reference. Example for wp.en only:

{{Cite Q|Q63413107}}

Use the Visual editor reference with Qnumber.

Added in version 7.7.

Changed in version 9.6: several implementation improvements

scripts.create_isbn_edition.add_claims(isbn_data)[source]#

Inspect isbn_data and add claims if possible.

Parameters:

isbn_data (dict[str, Any])

Return type:

int

scripts.create_isbn_edition.amend_isbn_edition(isbn_number)[source]#

Amend ISBN registration in Wikidata.

It is registering the ISBN-13 data via P212, depending on the data obtained from the digital library.

Parameters:

isbn_number (str) – ISBN number (10 or 13 digits with optional hyphens)

Returns:

Return status which is:

  • 0: Amended (found or created)

  • 1: Not found

  • 2: Ambiguous

  • 3: Other error

Return type:

int
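
Since only ISBN-13 numbers are registered, an ISBN-10 has to be converted first. The script depends on the isbnlib package; the arithmetic below is only a sketch of the standard conversion (prefix 978, new check digit with alternating weights 1 and 3). The function name isbn10_to_isbn13 is hypothetical, and the ISBN-10 check digit is not validated here.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 (hyphens optional) to an unhyphenated ISBN-13."""
    digits = isbn10.replace('-', '').replace(' ', '')
    if len(digits) != 10 or not digits[:9].isdigit():
        raise ValueError(f'Not an ISBN-10: {isbn10!r}')
    core = '978' + digits[:9]  # drop the old check digit, add the prefix
    # ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


print(isbn10_to_isbn13('0-596-10089-2'))  # 9780596100896
```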

scripts.create_isbn_edition.fatal_error(errcode, errtext)[source]#

A fatal error has occurred.

Print the error message, and exit with an error code.
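
A function with this behaviour can be sketched in a few lines. Writing the message to stderr is an assumption made for this illustration; the status code 3 used in the example is the one listed earlier for an invalid or missing parameter.

```python
import sys


def fatal_error(errcode: int, errtext: str) -> None:
    """Print the message to stderr and exit with the given status code."""
    print(errtext, file=sys.stderr)
    sys.exit(errcode)


# sys.exit raises SystemExit, which carries the status code.
try:
    fatal_error(3, 'Invalid or missing parameter')
except SystemExit as exc:
    print('exit status:', exc.code)  # exit status: 3
```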

scripts.create_isbn_edition.get_canon_name(baselabel)[source]#

Get standardised name.

Parameters:

baselabel (str) – input label

Return type:

str

scripts.create_isbn_edition.get_item_header(header)[source]#

Get the item header (label, description, alias in user language).

Parameters:

header (str | list[str]) – item label, description, or alias language list

Returns:

label, description, or alias in the first available language

Return type:

str

scripts.create_isbn_edition.get_item_header_lang(header, lang)[source]#

Get the item header (label, description, alias in user language).

Parameters:
  • header (str | list[str]) – item label, description, or alias language list

  • lang (str) – language code

Returns:

label, description, or alias in the first available language

Return type:

str

scripts.create_isbn_edition.get_item_list(item_name, instance_id)[source]#

Get list of items by name, belonging to an instance (list).

Normally there should be a single best match. The caller should take care of homonyms.

Parameters:
  • item_name (str) – Item name (case sensitive)

  • instance_id (str | set[str] | list[str]) – Instance ID

Returns:

Set of items

Return type:

set[str]

scripts.create_isbn_edition.get_item_page(qnumber)[source]#

Get the item; handle redirects.

Return type:

ItemPage

scripts.create_isbn_edition.get_item_with_prop_value(prop, propval)[source]#

Get list of items that have a property/value statement.

See also

API:Search

Parameters:
  • prop (str) – Property ID

  • propval (str) – Property value

Returns:

List of items (Q-numbers)

Return type:

set[str]

scripts.create_isbn_edition.get_language_preferences()[source]#

Get the list of preferred languages.

Uses environment variables LANG, LC_ALL, and LANGUAGE, ‘en’ is always appended.

See also

  • Wikipedia: List of ISO 639-1 codes

Returns:

List of ISO 639-1 language codes with strings delimited by ‘:’.

Return type:

list[str]
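
How a language list might be derived from these environment variables is sketched below. The precedence (LANGUAGE, then LC_ALL, then LANG), the stripping of territory and encoding suffixes, and the name language_preferences are assumptions made for this example, not a description of the actual function.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def language_preferences(env: Mapping[str, str] = os.environ) -> list[str]:
    """Build an ordered list of language codes from locale variables."""
    # Assumed precedence: LANGUAGE, then LC_ALL, then LANG.
    raw = env.get('LANGUAGE') or env.get('LC_ALL') or env.get('LANG') or ''
    codes: list[str] = []
    for entry in raw.split(':'):
        # 'nl_BE.UTF-8' -> 'nl'; skip non-language values such as 'C'.
        code = entry.split('.')[0].split('_')[0].lower()
        if len(code) in (2, 3) and code.isalpha() and code not in codes:
            codes.append(code)
    if 'en' not in codes:
        codes.append('en')  # English is always appended as a fallback
    return codes


print(language_preferences({'LANGUAGE': 'nl_BE:fr', 'LANG': 'de_DE.UTF-8'}))
# ['nl', 'fr', 'en']
```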

scripts.create_isbn_edition.is_in_value_list(statement_list, valuelist)[source]#

Verify if statement list contains at least one value from the valuelist.

Parameters:
  • statement_list (list) – Statement list of values

  • valuelist (list[str]) – List of values

Returns:

True when match, False otherwise

Return type:

bool

scripts.create_isbn_edition.item_has_label(item, label)[source]#

Verify if the item has a label.

Parameters:
  • item – Item

  • label (str) – Item label

Returns:

Matching string

Return type:

str

scripts.create_isbn_edition.item_is_in_list(statement_list, itemlist)[source]#

Verify if statement list contains at least one item from the itemlist.

Parameters:
  • statement_list (list) – Statement list

  • itemlist (list[str]) – List of values (string)

Returns:

Matching or empty string

Return type:

str

scripts.create_isbn_edition.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Algorithm:

Get parameters from shell
Validate parameters
Get ISBN data
Convert ISBN data:
    Reverse names when Lastname, Firstname
Get additional data
Register ISBN data into Wikidata:
    Add source reference when creating the item:
        (digital library instance, retrieval date)
    Create or amend items or claims:
        Number the authors in order of appearance
        Check data consistency
        Correct data quality problems:
            OCLC Work ID for Written work
            Written work instance statement
            Inverse relationship written work -> edition
            Move/register OCLC work ID to/with written work
Manual corrections:
    Create missing (referenced) items
        (authors, publishers, written works, main subject/FAST ID)
    Resolve ambiguous values
Parameters:

args (str) – command line arguments

Return type:

None
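
One step of the algorithm listed for main(), “Reverse names when Lastname, Firstname”, can be sketched as follows. The helper name normalize_author and the sample name are made up for the example; the script's own handling may differ.

```python
def normalize_author(name: str) -> str:
    """Turn 'Lastname, Firstname' into 'Firstname Lastname'."""
    last, comma, first = name.partition(',')
    if comma and first.strip():
        return f'{first.strip()} {last.strip()}'
    return name.strip()


print(normalize_author('Doe, Jane'))  # Jane Doe
print(normalize_author('Jane Doe'))   # Jane Doe
```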

scripts.create_isbn_edition.show_final_information(isbn_number)[source]#

Print additional information.

Get optional information. This could generate too many transactions errors, so the process might stop at the first error.

Parameters:

isbn_number (str)

Return type:

None

dataextend script#

Script to add properties, identifiers and sources to WikiBase items

Usage:

dataextend <item> [<property>[+*]] [args]

In the basic usage, where no property is specified, item is the Q-number of the item to work on.

If a property (P-number, or the special value ‘Wiki’ or ‘Data’) is specified, only the data from that identifier are added. With a ‘+’ after it, work starts on that identifier, then goes on to identifiers after that (including new identifiers added while working on those identifiers). With a ‘*’ after it, the identifier itself is skipped, but those coming after it (not those coming before it) are included.
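
The effect of the ‘+’ and ‘*’ suffixes can be illustrated on a plain ordered list of identifiers. This is a simplified sketch: select_identifiers and the sample list are hypothetical, treating a missing property as “use everything” is an assumption, and identifiers added while the bot is working are not modelled.

```python
from __future__ import annotations


def select_identifiers(ordered: list[str], spec: str | None) -> list[str]:
    """Pick the identifiers to work on for a property argument."""
    if spec is None:
        return list(ordered)            # no property given: use everything
    prop = spec.rstrip('+*')
    if prop not in ordered:
        return []
    start = ordered.index(prop)
    if spec.endswith('+'):
        return ordered[start:]          # this identifier and all later ones
    if spec.endswith('*'):
        return ordered[start + 1:]      # only the later ones
    return [prop]                       # exactly this identifier


props = ['Wiki', 'P214', 'P227', 'P244']
print(select_identifiers(props, 'P227'))   # ['P227']
print(select_identifiers(props, 'P214+'))  # ['P214', 'P227', 'P244']
print(select_identifiers(props, 'P214*'))  # ['P227', 'P244']
```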

The following parameters are supported:

-always

If this is supplied, the bot will not ask for permission after each external link has been handled.

-showonly

Only show claims for a given ItemPage. Don’t try to add any properties

The bot will load the corresponding pages for these identifiers and try to extract data from them. When it finds a string it cannot interpret, it asks for the Q-number giving the meaning of that string for the specified type of thing (for example ‘city’ or ‘gender’). If you want to use it, but not save it (which can happen if the string specifies a certain value now, but might show another value elsewhere, or if it is so specific that you’re pretty sure it won’t occur a second time), you can provide the Q-number with X rather than Q. If you do not want to use the string, you can just hit enter, or give the special value ‘XXX’ which means that it will be skipped in each subsequent run as well.

After an identifier has been worked on, there might be a list of names that have been found, in lc:name format, where lc is a language code. You can accept all suggested names (answer Y), none (answer N), or be asked about each name separately (answer S); the latter is the default if you do not fill in anything.

After all identifiers have been worked on, possible descriptions in various languages are presented, and you get to choose one. The default here is 0, which is always the current description for that language. Finally, for a number of identifiers text is shown that usually gives parts of the description that are hard to parse automatically, so you can see if there are any additional pieces of data that can be added.

It is advisable to (re)load the item page that the bot has been working on in the browser afterward, to correct any mistakes it has made, or cases where a more precise and less precise value have both been included.

Added in version 7.2.

Deprecated since version 9.6: will be removed with Pywikibot 10.

class scripts.dataextend.AKLAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AbartAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlanguagedescriptions(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AcademiaeGroninganaeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddegrees(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

getentry(naam, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AcademicTreeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findadvisors(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

finddocstudents(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findwebsite(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AcademieFrancaiseAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findawards(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AcademieRouenAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AccademiaCruscaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AdultFilmAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findethnicities(html)[source]#
Parameters:

html (str)

findeyecolor(html)[source]#
Parameters:

html (str)

findfloruitstart(html)[source]#
Parameters:

html (str)

findhaircolor(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AgorhaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AinmAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

getvalue(field, html, category=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AlkindiAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstnames(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

instanceof(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AlvinAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AmericanArtAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AmericanBiographyAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.Analyzer(ident, data=None, item=None, bot=None)[source]#

Bases: object

SCRIPTRE = re.compile('(?s)<script.*?</script>', re.DOTALL)#
TAGRE = re.compile('<[^<>]*>')#
property alturl#
static commastrip(term)[source]#
property extraurls: list[str]#
findallbyre(regex, html, dtype=None, skips=None, alt=None)[source]#
Return type:

list[str]

findbyre(regex, html, dtype=None, skips=None, alt=None)[source]#
Return type:

str

findclaims()[source]#
Return type:

list[tuple[str, str, Analyzer | None]]

finddefaultmixedrefs(html, includesocial=True)[source]#
finddescriptions(html)[source]#
Parameters:

html (str)

findwikipedianames(html)[source]#
Parameters:

html (str)

getdata(dtype, text, ask=True)[source]#
getdescriptions()[source]#
getlanguage(code)[source]#
getnames()[source]#
longtext()[source]#
prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

static singlespace(text)[source]#
property url#
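
The two class-level patterns listed for Analyzer, SCRIPTRE and TAGRE, can be tried out on their own. The helper plain_text below is hypothetical and only shows what the patterns match; how Analyzer itself applies them is not documented here.

```python
import re

# The two patterns are copied from the class attributes listed above.
SCRIPTRE = re.compile('(?s)<script.*?</script>', re.DOTALL)
TAGRE = re.compile('<[^<>]*>')


def plain_text(html: str) -> str:
    """Hypothetical helper: drop scripts and tags, collapse whitespace."""
    without_scripts = SCRIPTRE.sub('', html)
    without_tags = TAGRE.sub(' ', without_scripts)
    return ' '.join(without_tags.split())


page = '<p>Born <b>1901</b></p><script>var x = "<i>";</script>'
print(plain_text(page))  # Born 1901
```
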
class scripts.dataextend.AngelicumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

instanceof(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AnimeConsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finalscript(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArchivesDuSpectacleAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArmbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findassociations(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArtHistoriansAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArtUkAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArtcyclopediaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmovements(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArticArtistAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArtistsCanadaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ArtnetAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AthenaeumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AustrianBiographicalAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AuteursLuxembourgAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.AutoresArAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BabelioAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BacklinkAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findadvisors(html)[source]#
Parameters:

html (str)

findawards(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddocstudents(html)[source]#
Parameters:

html (str)

findkins(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findparticipantins(html)[source]#
Parameters:

html (str)

findpartners(html)[source]#
Parameters:

html (str)

findpartofs(html)[source]#
Parameters:

html (str)

findparts(html)[source]#
Parameters:

html (str)

findpositions(html)[source]#
Parameters:

html (str)

findsiblings(html)[source]#
Parameters:

html (str)

findsources(html)[source]#
Parameters:

html (str)

findspouses(html)[source]#
Parameters:

html (str)

findstudents(html)[source]#
Parameters:

html (str)

findteachers(html)[source]#
Parameters:

html (str)

getrelations(relation, html)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BandcampAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BdelAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findparties(html)[source]#
Parameters:

html (str)

findranks(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BdfaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findheight(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findsportteams(html)[source]#
Parameters:

html (str)

findteampositions(html)[source]#
Parameters:

html (str)

findweight(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BedethequeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findpseudonyms(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BelgianPhotographerAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None, alt=None)[source]#
getvalues(field, html, dtype=None, alt=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BenezitAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findisntanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BenezitUrlAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findisntanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

indinstanceof(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BewebAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findpositions(html)[source]#
Parameters:

html (str)

findviaf(html)[source]#
Parameters:

html (str)

findwebpages(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BibliotecaNacionalAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BibsysAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findviaf(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BiografischPortaalAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findsources(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BiuSanteAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BnaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

instanceof(html)[source]#
Parameters:

html (str)

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BnbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findviaf(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BneAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findviaf(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BnfAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findcountry(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BookTradeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finddeathdate(html)[source]#
Parameters:

html (str)

findfloruit(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BritishExecutionsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findcausedeath(html)[source]#
Parameters:

html (str)

findcrimes(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmannerdeath(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BritishMuseumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddetails(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.BrooklynMuseumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CageMatchAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findheight(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findsports(html)[source]#
Parameters:

html (str)

findweights(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CanadianBiographyAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findspouses(html)[source]#
Parameters:

html (str)

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CanticAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: MarcAnalyzer

finddescriptions(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CbdbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnationalities(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CcedAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddegrees(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findpositions(html)[source]#
Parameters:

html (str)

findreligion(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CerlAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None, link=False)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CesarAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.Chess365Analyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findchesstitle(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findparticipations(html)[source]#
Parameters:

html (str)

findsportcountries(html)[source]#
Parameters:

html (str)

findsports(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CinemagiaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CiniiAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ClaraAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CommonwealthGamesAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findparticipations(html)[source]#
Parameters:

html (str)

findsports(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ConorAlAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: ConorAnalyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ConorAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.
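
ConorAlAnalyzer, ConorBgAnalyzer, ConorSiAnalyzer and ConorSrAnalyzer all list ConorAnalyzer as their base and document only some of the find* methods. This suggests that each subclass overrides the lookups that differ per catalogue and inherits the rest. The sketch below illustrates that pattern with stand-in classes. ParentSketch, ChildSketch and the HTML patterns are hypothetical and are not taken from scripts.dataextend.

```python
import re


class ParentSketch:
    """Hypothetical stand-in for a shared parent such as ConorAnalyzer."""

    def findnames(self, html: str) -> list[str]:
        # Shared lookup: every <b>...</b> run is treated as a name (assumption).
        return re.findall(r'<b>([^<]+)</b>', html)

    def findnationality(self, html: str):
        # The parent has no site-specific rule, so it reports nothing.
        return None


class ChildSketch(ParentSketch):
    """Hypothetical stand-in for a per-site subclass such as ConorAlAnalyzer."""

    def findnationality(self, html: str):
        # Site-specific override; findnames() is inherited unchanged.
        match = re.search(r'Country:\s*(\w+)', html)
        return match.group(1) if match else None


page = '<b>Ada Example</b> Country: Albania'
child = ChildSketch()
names = child.findnames(page)
nationality = child.findnationality(page)
```

Here only findnationality() is redefined in the child, which mirrors how the documented subclasses list a subset of methods under their own name.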

class scripts.dataextend.ConorBgAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: ConorAnalyzer

finddescription(html)[source]#
Parameters:

html (str)

findfirstnames(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastnames(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ConorSiAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: ConorAnalyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ConorSrAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: ConorAnalyzer

finddescription(html)[source]#
Parameters:

html (str)

findfirstnames(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastnames(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CsfdAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CthsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.CwaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findawards(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.DaaoAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.DacsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.DanskefilmAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findawards(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findburialplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.DataExtendBot(**kwargs)[source]#

Bases: SingleSiteBot

MONTHNUMBER = {'01': 1, '02': 2, '03': 3, '04': 4, '05': 5, '06': 6, '07': 7, '08': 8, '09': 9, '1': 1, '10': 10, '11': 11, '12': 12, '2': 2, '3': 3, '4': 4, '5': 5, '6': 6, '7': 7, '8': 8, '9': 9, 'abril': 4, 'ag': 8, 'ago': 8, 'agost': 8, 'agosto': 8, 'aibrean': 4, 'aibreán': 4, 'aou': 8, 'aout': 8, 'aoû': 8, 'août': 8, 'apr': 4, 'april': 4, 'aprile': 4, 'aug': 8, 'august': 8, 'augustus': 8, 'avr': 4, 'avril': 4, 'bealtaine': 5, 'czerwca': 6, 'czerwiec': 6, 'dec': 12, 'december': 12, 'deireadh fomhair': 10, 'deireadh fómhair': 10, 'desembre': 12, 'dez': 12, 'dezember': 12, 'dic': 12, 'dicembre': 12, 'diciembre': 12, 'déc': 12, 'décembre': 12, 'eanair': 1, 'eanáir': 1, 'enero': 1, 'f\\xe9vrier': 2, 'feabhra': 2, 'feb': 2, 'febb': 2, 'febbr': 2, 'febbraio': 2, 'febr': 2, 'febrer': 2, 'febrero': 2, 'februar': 2, 'februari': 2, 'february': 2, 'fev': 2, 'fevrier': 2, 'fév': 2, 'février': 2, 'gen': 1, 'gener': 1, 'genn': 1, 'gennaio': 1, 'giu': 6, 'giugno': 6, 'grudnia': 12, 'grudzień': 12, 'i': 1, 'ii': 2, 'iii': 3, 'iuil': 7, 'iv': 4, 'ix': 9, 'iúil': 7, 'jan': 1, 'januar': 1, 'januari': 1, 'january': 1, 'janvier': 1, 'juillet': 7, 'juin': 6, 'jul': 7, 'juli': 7, 'julio': 7, 'juliol': 7, 'july': 7, 'jun': 6, 'june': 6, 'juni': 6, 'junio': 6, 'juny': 6, 'jänner': 1, 'kwiecień': 4, 'kwietnia': 4, 'lipca': 7, 'lipiec': 7, 'listopad': 11, 'listopada': 11, 'lug': 7, 'lugl': 7, 'luglio': 7, 'lunasa': 8, 'lutego': 2, 'luty': 2, 'lúnasa': 8, 'm\\xe4rz': 3, 'maa': 3, 'maart': 3, 'mag': 5, 'magg': 5, 'maggio': 5, 'mai': 5, 'maig': 5, 'maj': 5, 'maja': 5, 'mar': 3, 'marca': 3, 'march': 3, 'mars': 3, 'marta': 3, 'marz': 3, 'marzec': 3, 'marzo': 3, 'març': 3, 'may': 5, 'mayo': 5, 'mean fomhair': 9, 'mei': 5, 'meitheamh': 6, 'meán fómhair': 9, 'mrt': 3, 'márta': 3, 'märz': 3, 'nollaig': 12, 'nov': 11, 'november': 11, 'novembre': 11, 'noviembre': 11, 'oct': 10, 'october': 10, 'octobre': 10, 'octubre': 10, 'okt': 10, 'oktober': 10, 'ott': 10, 'otto': 10, 'ottobre': 10, 
'październik': 10, 'października': 10, 'samhain': 11, 'sep': 9, 'sept': 9, 'september': 9, 'septembre': 9, 'septiembre': 9, 'set': 9, 'setembre': 9, 'sett': 9, 'settembre': 9, 'sierpień': 8, 'sierpnia': 8, 'styczeń': 1, 'stycznia': 1, 'v': 5, 'vi': 6, 'vii': 7, 'viii': 8, 'wrzesień': 9, 'września': 9, 'x': 10, 'xi': 11, 'xii': 12}#
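
MONTHNUMBER maps lower-case month names, abbreviations, Roman numerals and digit strings from several languages to a month number. A minimal sketch of how such a table can be queried follows. The excerpt copies five entries from the mapping above; month_number() and its normalisation steps (trimming, dropping a trailing dot, lower-casing) are assumptions for illustration and not the bot's own code.

```python
# Five entries copied from MONTHNUMBER; the real table is much larger.
MONTHNUMBER_EXCERPT = {'09': 9, 'août': 8, 'märz': 3, 'sept': 9, 'xii': 12}


def month_number(token):
    """Return the month number for a token, or None if it is unknown.

    Hypothetical helper: the keys are lower-case, so the token is
    normalised before the lookup.
    """
    key = token.strip().rstrip('.').lower()
    return MONTHNUMBER_EXCERPT.get(key)


results = [month_number(t) for t in ('Août', ' XII ', 'Sept.', 'Smarch')]
```

Returning None for unknown tokens lets a caller decide whether to skip the date or report it.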
PQRE = re.compile('[PQ]\\d+')#
QRE = re.compile('Q\\d+')#
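
The two patterns differ only in the leading letter: PQRE accepts identifiers that start with P or Q, while QRE accepts only those that start with Q. The sketch below recreates both patterns from the values shown above and contrasts match() with fullmatch(). The helper is_entity_id() is hypothetical and only shows one way the patterns could be applied.

```python
import re

# Same patterns as documented for DataExtendBot.PQRE and DataExtendBot.QRE.
PQRE = re.compile(r'[PQ]\d+')
QRE = re.compile(r'Q\d+')


def is_entity_id(text, items_only=False):
    """Hypothetical helper: True if text is exactly one P or Q identifier."""
    pattern = QRE if items_only else PQRE
    return pattern.fullmatch(text) is not None


# match() anchors only at the start, so trailing text is tolerated.
prefix = QRE.match('Q42abc').group()
# fullmatch() requires the whole string to be an identifier.
checks = (is_entity_id('P31'), is_entity_id('P31', items_only=True),
          is_entity_id('Q42abc'))
```

Whether a caller should use match() or fullmatch() depends on whether the value may carry trailing text.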
QUANTITYTYPE = {'centimeter': 'Q174728', 'centimetre': 'Q174728', 'cm': 'Q174728', 'feet': 'Q3710', 'foot': 'Q3710', 'ft': 'Q3710', 'kg': 'Q11570', 'kilogram': 'Q11570', 'kilometer': 'Q828224', 'kilometre': 'Q828224', 'km': 'Q828224', 'lb': 'Q100995', 'lbs': 'Q100995', 'm': 'Q11573', 'meter': 'Q11573', 'meters': 'Q11573', 'metre': 'Q11573', 'metres': 'Q11573', 'mi': 'Q253276', 'mile': 'Q253276', 'min': 'Q7727', 'minute': 'Q7727', 'minuten': 'Q7727', 'minutes': 'Q7727', 'pond': 'Q100995', 's': 'Q11574', 'second': 'Q11574', 'м': 'Q11573'}#
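
QUANTITYTYPE maps unit spellings to the identifier of the corresponding unit item. As a rough illustration, the sketch below splits a string such as "180 cm" into an amount and a unit identifier. It copies four entries from the mapping above; split_quantity() and its regular expression are assumptions and do not reproduce createquantityclaim().

```python
import re

# Four entries copied from QUANTITYTYPE; the real table has more spellings.
QUANTITYTYPE_EXCERPT = {'cm': 'Q174728', 'ft': 'Q3710',
                        'kg': 'Q11570', 'm': 'Q11573'}

# A decimal number, optional whitespace, then a unit made of letters only.
_QUANTITY = re.compile(r'([0-9]+(?:\.[0-9]+)?)\s*([^\W\d_]+)')


def split_quantity(text):
    """Hypothetical helper: return (amount, unit_id) or None."""
    match = _QUANTITY.fullmatch(text.strip())
    if match is None:
        return None
    unit_id = QUANTITYTYPE_EXCERPT.get(match.group(2).lower())
    if unit_id is None:
        return None
    return float(match.group(1)), unit_id


height = split_quantity('180 cm')
weight = split_quantity('75kg')
unknown = split_quantity('3 parsecs')
```

Unknown units yield None rather than a guessed identifier, so nothing is written for a unit the table does not cover.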
createdateclaim(text)[source]#
createquantityclaim(text)[source]#
static definedescription(language, existingdescription, suggestions)[source]#
definelabels(existinglabels, existingaliases, newnames)[source]#
getlocnumber(value, claims)[source]#
isclaim(value, claim)[source]#
isinclaims(value, claims)[source]#
label(title)[source]#
loaddata()[source]#

Read data from files.

page(title)[source]#

Dispatch title and return the appropriate Page object.

showclaims(claims)[source]#
static showtime(time)[source]#
teardown()[source]#

Save data to files.

Return type:

None

treat(item)[source]#

Process the ItemPage.

Return type:

None

update_options: dict[str, Any] = {'restrict': '', 'showonly': False}#

The Bot.

class scripts.dataextend.DblpAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findawards(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.DbnlAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findburialdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findsources(html)[source]#
Parameters:

html (str)

findwebpages(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.DelargeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.DeutscheBiographieAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findfloruit(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findreligions(html)[source]#
Parameters:

html (str)

findwebpages(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.DialnetAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findemployers(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.DiscogsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findinstruments(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findparts(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

findsiblings(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.EcarticoAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbaptismdate(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findreligions(html)[source]#
Parameters:

html (str)

findsources(html)[source]#
Parameters:

html (str)

findspouses(html)[source]#
Parameters:

html (str)

findstudents(html)[source]#
Parameters:

html (str)

findteachers(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.Edit16Analyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupation(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.EmloAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findsiblings(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.EnlightenmentAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinception(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.EntomologistAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.EoasAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddegrees(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

isperson(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.FantasticFictionAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findawards(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findnominations(html)[source]#
Parameters:

html (str)

findsiblings(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.FastAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.FideAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findchesstitle(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findsportcountries(html)[source]#
Parameters:

html (str)

findsports(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.FifaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findsports(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.FilmportalAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findcast(html)[source]#
Parameters:

html (str)

findcomposers(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

finddirectorsphotography(html)[source]#
Parameters:

html (str)

finddurations(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmoviedirectors(html)[source]#
Parameters:

html (str)

findmovieeditors(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findorigcountry(html)[source]#
Parameters:

html (str)

findprodcoms(html)[source]#
Parameters:

html (str)

findproducers(html)[source]#
Parameters:

html (str)

findpubdate(html)[source]#
Parameters:

html (str)

findscreenwriters(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.FindGraveAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findburialplace(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findsiblings(html)[source]#
Parameters:

html (str)

findspouses(html)[source]#
Parameters:

html (str)

getvalue(name, html, category=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.FoihAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.FotomuseumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
instanceof(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.GameFaqsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finddevelopers(html)[source]#
Parameters:

html (str)

findfranchises(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findplatforms(html)[source]#
Parameters:

html (str)

findpubdate(html)[source]#
Parameters:

html (str)

findpublishers(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.GenealogicsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbaptismdate(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findburialplace(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findsiblings(html)[source]#
Parameters:

html (str)

findsources(html)[source]#
Parameters:

html (str)

findspouses(html)[source]#
Parameters:

html (str)

getallvalues(field, html, dtype=None)[source]#
getfullvalue(field, html, dtype=None)[source]#
getsecondvalue(field, html, dtype=None)[source]#
getvalue(field, html, dtype=None)[source]#
prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.GeprisAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finddescription(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.GndAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

findcountries(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddegrees(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findfloruit(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findinstruments(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findpseudonyms(html)[source]#
Parameters:

html (str)

findrelorder(html)[source]#
Parameters:

html (str)

findsiblings(html)[source]#
Parameters:

html (str)

findsources(html)[source]#
Parameters:

html (str)

findspouses(html)[source]#
Parameters:

html (str)

findvoice(html)[source]#
Parameters:

html (str)

findwebpages(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.GnisAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findadminloc(html)[source]#
Parameters:

html (str)

findcoords(html)[source]#
Parameters:

html (str)

findcountry(html)[source]#
Parameters:

html (str)

findelevations(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.GoodreadsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findwebsite(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.GtaaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findlanguagedescriptions(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.GuggenheimAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findawards(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findwebsite(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.HalensisAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findreligion(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.HdsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.HkmdbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.IWDAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#

getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.
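
Several analyzers, such as `IWDAnalyzer` above, add `getvalue(field, html, dtype=None)` and `getvalues(field, html, dtype=None)` helpers, the latter documented as returning `list[str]`. This page does not describe how they work. The following is only a sketch of how such a pair could relate, assuming that `getvalues` collects every labelled occurrence of a field and `getvalue` returns the first one. The `<dt>`/`<dd>` markup, the regular expression and the ignored `dtype` argument are assumptions for illustration, not the behaviour of the real methods.

```python
import re
from typing import Optional


def getvalues(field: str, html: str, dtype=None) -> list[str]:
    """Return every value labelled with ``field`` (hypothetical layout).

    ``dtype`` is accepted to mirror the documented signature but is
    ignored in this sketch.
    """
    pattern = r'<dt>\s*{}\s*</dt>\s*<dd>(.*?)</dd>'.format(re.escape(field))
    return [value.strip()
            for value in re.findall(pattern, html, flags=re.DOTALL)]


def getvalue(field: str, html: str, dtype=None) -> Optional[str]:
    """Return the first value labelled with ``field``, or None."""
    values = getvalues(field, html, dtype)
    return values[0] if values else None


sample = ('<dl><dt>Nationality</dt><dd>Dutch</dd>'
          '<dt>Nationality</dt><dd> Belgian </dd></dl>')
print(getvalues('Nationality', sample))
print(getvalue('Gender', sample))
```

Building the single-value helper on top of the list-valued one keeps the parsing logic in one place, which is one reason a pair like this might be defined together.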

class scripts.dataextend.IaafAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findsports(html)[source]#
Parameters:

html (str)

instanceof(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.IasAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

findawards(html)[source]#
Parameters:

html (str)

finddegrees(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findschools(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

getsubvalues(field, secondfield, html, dtype=None, alt=None)[source]#

getvalue(field, html, dtype=None, alt=None)[source]#

getvalues(field, html, dtype=None, alt=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.IbdbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findawards(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnominations(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findpartners(html)[source]#
Parameters:

html (str)

findspouses(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.IgdbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finddescriptions(html)[source]#
Parameters:

html (str)

finddevelopers(html)[source]#
Parameters:

html (str)

findengines(html)[source]#
Parameters:

html (str)

findfranchises(html)[source]#
Parameters:

html (str)

findgamemodes(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findpubdates(html)[source]#
Parameters:

html (str)

findpublishers(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#

getvalues(field, html, dtype=None, alt=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ImdbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findcast(html)[source]#
Parameters:

html (str)

findcolors(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

finddurations(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmoviedirectors(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findorigcountry(html)[source]#
Parameters:

html (str)

findoriglanguages(html)[source]#
Parameters:

html (str)

findprodcoms(html)[source]#
Parameters:

html (str)

findpubdate(html)[source]#
Parameters:

html (str)

findscreenwriters(html)[source]#
Parameters:

html (str)

property isfilm#

property isperson#

setup()[source]#

To be used for putting data into subclasses.

property url#

class scripts.dataextend.ImslpAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.InternetBookAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findresidences(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.IntraTextAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.InvaluableAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.IpniAuthorsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findsources(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.IsfdbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.IsniAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

property url#

class scripts.dataextend.ItalianPeopleAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ItauAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

property isperson#

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.JukeboxAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findinstanceof(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.JwaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.KinopoiskAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findheight(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.KnawAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.KunstaspekteAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

description(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.KunstenpuntAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

getvalue(field, html, category=None)[source]#

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.KunstindeksAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.LbtAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.LcAuthAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findrelorder(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

property isperson#

setup()[source]#

To be used for putting data into subclasses.

property url#

class scripts.dataextend.LdifAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findcast(html)[source]#
Parameters:

html (str)

findcomposers(html)[source]#
Parameters:

html (str)

finddirectorsphotography(html)[source]#
Parameters:

html (str)

finddurations(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmoviedirectors(html)[source]#
Parameters:

html (str)

findmovieeditors(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findorigcountries(html)[source]#
Parameters:

html (str)

findprodcoms(html)[source]#
Parameters:

html (str)

findproducers(html)[source]#
Parameters:

html (str)

findpubdate(html)[source]#
Parameters:

html (str)

findscreenwriters(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.LeonoreAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

getvalue(field, html, dtype=None)[source]#

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.LeopoldinaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

findworklocations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.LetterboxdAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findcast(html)[source]#
Parameters:

html (str)

findcomposers(html)[source]#
Parameters:

html (str)

findcostumedesigners(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

finddirectorsphotography(html)[source]#
Parameters:

html (str)

finddurations(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmakeupartists(html)[source]#
Parameters:

html (str)

findmoviedirectors(html)[source]#
Parameters:

html (str)

findmovieeditors(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findorigcountries(html)[source]#
Parameters:

html (str)

findoriglanguages(html)[source]#
Parameters:

html (str)

findprodcoms(html)[source]#
Parameters:

html (str)

findproducers(html)[source]#
Parameters:

html (str)

findproductiondesigners(html)[source]#
Parameters:

html (str)

findscreenwriters(html)[source]#
Parameters:

html (str)

findsounddesigners(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.LibrariesAustraliaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.LibraryKoreaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#

getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.LnbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findviaf(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None, alt=None)[source]#

getvalues(field, html, dtype=None, alt=None)[source]#
Return type:

list[str]

instanceof(html)[source]#
Parameters:

html (str)

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.LuminousAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.MarcAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findwebpages(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None, alt=None)[source]#

getvalues(field, html, dtype=None, alt=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.MathGenAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findadvisors(html)[source]#
Parameters:

html (str)

finddegrees(html)[source]#
Parameters:

html (str)

finddocstudents(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findschools(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.MathOlympAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findparticipations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.MetallumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findfloruit(html)[source]#
Parameters:

html (str)

findformationlocation(html)[source]#
Parameters:

html (str)

findgenre(html)[source]#
Parameters:

html (str)

findinception(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlabels(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findorigcountry(html)[source]#
Parameters:

html (str)

findparts(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None, alt=None)[source]#

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.MunksRollAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.MunzingerAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.MusicBrainzAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

finddissolution(html)[source]#
Parameters:

html (str)

findfacebook(html)[source]#
Parameters:

html (str)

findformationlocation(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinception(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findorigcountry(html)[source]#
Parameters:

html (str)

findtwitter(html)[source]#
Parameters:

html (str)

findviaf(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.MutualAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.MuziekwebAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finddescription(html)[source]#
Parameters:

html (str)

findinstruments(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

property name#

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.NatGeoCanadaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findethnicity(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.NationalArchivesAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.NationalTrustAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddatesection(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.NelsonAtkinsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.NgaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.NgvAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.NilfAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findawards(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.NkcrAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findrelorder(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.NlpAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.NndbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findcausedeath(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findethnicity(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findmannerdeath(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findorientation(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

findsiblings(html)[source]#
Parameters:

html (str)

findspouses(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

getvalue(field, dtype=None, bold=True)[source]#

getvalues(field, dtype=None, bold=True)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.NobelPrizeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.NoosfereAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findawards(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.NpgPersonAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.NtaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findviaf(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.NumbersAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.OdisAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finddescription(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findpositions(html)[source]#
Parameters:

html (str)

findpseudonyms(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

findwebpages(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.OfdbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findcast(html)[source]#
Parameters:

html (str)

findcomposers(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

finddirectorsphotography(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findmoviedirectors(html)[source]#
Parameters:

html (str)

findmovieeditors(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findorigcountry(html)[source]#
Parameters:

html (str)

findpubdate(html)[source]#
Parameters:

html (str)

findscreenwriters(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.OmdbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.OnlineBooksAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.OnstageAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findfloruit(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.OpenLibraryAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.OperoneAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.OrcidAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finddescriptions(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.OrsayAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findisinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.OxfordAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.OxfordMedievalAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ParlementPolitiekAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findparties(html)[source]#
Parameters:

html (str)

findpolitical(html)[source]#
Parameters:

html (str)

findpositions(html)[source]#
Parameters:

html (str)

getsection(field, html, ntype=None)[source]#

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.PatrinumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findwebpages(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#

getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.PeakbaggerAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findadminloc(html)[source]#
Parameters:

html (str)

findcoords(html)[source]#
Parameters:

html (str)

findcountry(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findelevations(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisolations(html)[source]#
Parameters:

html (str)

findmountainrange(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findprominences(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.PeintresBelgesAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.PeopleAustraliaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.PerseeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.PhotographersAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findincollections(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.PlarrAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddegrees(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.PlwabnAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findsources(html)[source]#
Parameters:

html (str)

getvalue(field, letter, html, dtype=None)[source]#

getvalues(field, letter, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.PoetsWritersAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findethnicities(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findreligions(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

getvalue(field, html, stype=None)[source]#

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.PornhubAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findethnicities(html)[source]#
Parameters:

html (str)

findeyecolor(html)[source]#
Parameters:

html (str)

findfloruit(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findhaircolor(html)[source]#
Parameters:

html (str)

findheight(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnotableworks(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findresidence(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

findweights(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.PrdlAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findreligions(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.PssBuildingAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findaddress(html)[source]#
Parameters:

html (str)

findadminloc(html)[source]#
Parameters:

html (str)

findarchitects(html)[source]#
Parameters:

html (str)

findcoords(html)[source]#
Parameters:

html (str)

findcountry(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findheights(html)[source]#
Parameters:

html (str)

findinception(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.PtbnpAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.PublonsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finddescription(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnotableworks(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

findwebpages(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.PuscAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

getcode(code, html)[source]#

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.Quasiclaim(title)[source]#

Bases: object

getTarget()[source]#

Return the target value of this Quasiclaim.

property type#

class scripts.dataextend.RedTubeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findethnicities(html)[source]#
Parameters:

html (str)

findfloruit(html)[source]#
Parameters:

html (str)

findhaircolor(html)[source]#
Parameters:

html (str)

findheight(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

findweights(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.RepertoriumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findpositions(html)[source]#
Parameters:

html (str)

findsources(html)[source]#
Parameters:

html (str)

findtitles(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ResearchGateAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finddescriptions(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.RismAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfloruit(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#

getvalues(field, html, dtype=None, splitter=',')[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.
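
Several analyzers in this listing expose a getvalue/getvalues pair, where getvalues returns list[str] and, for RismAnalyzer, accepts a splitter argument. The listing does not document their behaviour, so the following is only a guess at the general shape. It uses standalone functions rather than methods, assumes a hypothetical "Label: value" HTML layout, and leaves out the dtype parameter entirely.

```python
import re


def getvalue(field, html):
    """Return the text following 'field:' up to the next tag, or None."""
    pattern = rf'{re.escape(field)}:\s*([^<]*)<'
    match = re.search(pattern, html)
    return match.group(1).strip() if match else None


def getvalues(field, html, splitter=','):
    """Split the field's value on splitter and drop empty pieces."""
    value = getvalue(field, html)
    if not value:
        return []
    return [part.strip() for part in value.split(splitter) if part.strip()]


# Invented sample markup, only for demonstration.
html = '<li>Occupations: composer, organist</li><li>Place: Vienna</li>'
print(getvalues('Occupations', html))
print(getvalue('Place', html))
print(getvalues('Missing', html))
```

Building the plural helper on top of the singular one keeps the field lookup in a single place, so only the splitting step differs between the two.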

class scripts.dataextend.RkdArtistsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findfloruit(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findinfluences(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findmovements(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

findsiblings(html)[source]#
Parameters:

html (str)

findstudents(html)[source]#
Parameters:

html (str)

findteachers(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.RodovidAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findfamily(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findreligions(html)[source]#
Parameters:

html (str)

findspouses(html)[source]#
Parameters:

html (str)

findtitles(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.RollDaBeatsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findgenres(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findparts(html)[source]#
Parameters:

html (str)

findresidence(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.RostochiensiumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddegrees(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findreligion(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.RunebergAuthorAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.SFAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.SandrartAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.SbnAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findrelorder(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ScopusAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findemployers(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findworkfields(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ScottishArchitectsAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.SelibrAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findviaf(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.SikartAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findchoriginplaces(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.SnacAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.SnsaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findinstruments(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findvoices(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None, alt=None)[source]#
getvalues(field, html, dtype=None, alt=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.SpanishBiographyAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.SportsReferenceAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findheights(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findparticipations(html)[source]#
Parameters:

html (str)

findsports(html)[source]#
Parameters:

html (str)

findsportteams(html)[source]#
Parameters:

html (str)

findweights(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.StructuraeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findaddress(html)[source]#
Parameters:

html (str)

findcoords(html)[source]#
Parameters:

html (str)

findcountry(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfloorsabove(html)[source]#
Parameters:

html (str)

findheights(html)[source]#
Parameters:

html (str)

findinception(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findlocation(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

finduse(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.StuttgartAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

findspouse(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
getvalues(field, html, dtype=None)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.SudocAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.SurmanAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findreligion(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.SvenskFilmAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.SynchronkarteiAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

description(html)[source]#
Parameters:

html (str)

findcast(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findpubdate(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.TgnAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findadminloc(html)[source]#
Parameters:

html (str)

findcoords(html)[source]#
Parameters:

html (str)

findcountry(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.TheatricaliaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.TmdbAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None, alt=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.TrackFieldAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findsports(html)[source]#
Parameters:

html (str)

instanceof(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.TrackFieldFemaleAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: TrackFieldAnalyzer

findgender(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.TrackFieldMaleAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: TrackFieldAnalyzer

findgender(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.
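
`TrackFieldFemaleAnalyzer` and `TrackFieldMaleAnalyzer` both derive from `TrackFieldAnalyzer` and document only `findgender` and `setup`, so the remaining `find*` methods are inherited. A standalone sketch of that arrangement follows. The class names, attributes and return values are placeholders chosen for illustration, and the real methods may store or return something different.

```python
class BaseTrackSketch:
    """Hypothetical shared base: holds the lookups common to both variants."""

    def __init__(self, ident):
        self.id = ident
        self.setup()

    def setup(self):
        # Placeholder attribute; set once for every variant.
        self.source = 'track-and-field (sketch)'

    def findsports(self, html):
        # Shared by all subclasses, so it is defined only once.
        return ['athletics']

    def findgender(self, html):
        # The base class does not know the gender; variants override this.
        return None


class FemaleTrackSketch(BaseTrackSketch):
    def setup(self):
        super().setup()          # keep the shared configuration
        self.variant = 'women'   # placeholder variant marker

    def findgender(self, html):
        return 'female'


class MaleTrackSketch(BaseTrackSketch):
    def setup(self):
        super().setup()
        self.variant = 'men'

    def findgender(self, html):
        return 'male'
```

Keeping the shared parsing in the base class means each variant carries only the one fact that differs between the two sources.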

class scripts.dataextend.TradingCardAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findschools(html)[source]#
Parameters:

html (str)

findsports(html)[source]#
Parameters:

html (str)

findsportteams(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.TransfermarktAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findheight(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findsportteams(html)[source]#
Parameters:

html (str)

findteampositions(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.UBarcelonaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

finddescriptions(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findviaf(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.UGentAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findschools(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.UlanAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

country(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlocation(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findsiblings(html)[source]#
Parameters:

html (str)

findstudents(html)[source]#
Parameters:

html (str)

findteachers(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.UlsterAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.UnivieAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.UrlAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

class scripts.dataextend.UvaAlbumAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findadvisors(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddegrees(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findmajors(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findpositions(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None, alt=None)[source]#
getvalues(field, html, dtype=None, alt=None)[source]#
Return type:

list[str]

prepare(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ViafAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findgender(html)[source]#
Parameters:

html (str)

findisni(html)[source]#
Parameters:

html (str)

findlanguagedescriptions(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnationalities(html)[source]#
Parameters:

html (str)

findnotableworks(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

getid(name, html)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.WeberAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: UrlAnalyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

findxsdeathdate(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.WebumeniaAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findwebpages(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.WelshBioAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findspouses(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.WhoSampledAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findlongtext(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findparts(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.WhonameditAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.WhosWhoFranceAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationality(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findtwitter(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.WikiAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

static excludetemplate(text)[source]#
static excludetemplatelight(text)[source]#
findadvisors(html)[source]#
Parameters:

html (str)

findartdirectors(html)[source]#
Parameters:

html (str)

findawards(html)[source]#
Parameters:

html (str)

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findbloodtype(html)[source]#
Parameters:

html (str)

findbranches(html)[source]#
Parameters:

html (str)

findburialdate(html)[source]#
Parameters:

html (str)

findburialplace(html)[source]#
Parameters:

html (str)

findcast(html)[source]#
Parameters:

html (str)

findcausedeath(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

findcoatarms(html)[source]#
Parameters:

html (str)

findconflicts(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddegrees(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

finddirectorsphotography(html)[source]#
Parameters:

html (str)

finddissolution(html)[source]#
Parameters:

html (str)

finddistcoms(html)[source]#
Parameters:

html (str)

finddocstudents(html)[source]#
Parameters:

html (str)

findemployers(html)[source]#
Parameters:

html (str)

findethnicities(html)[source]#
Parameters:

html (str)

findfamily(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findfeastday(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findfloruit(html)[source]#
Parameters:

html (str)

findfloruitend(html)[source]#
Parameters:

html (str)

findfloruitstart(html)[source]#
Parameters:

html (str)

findformationlocation(html)[source]#
Parameters:

html (str)

findgender(html)[source]#
Parameters:

html (str)

findgenres(html)[source]#
Parameters:

html (str)

findgens(html)[source]#
Parameters:

html (str)

findheight(html)[source]#
Parameters:

html (str)

findheights(html)[source]#
Parameters:

html (str)

findimage(html)[source]#
Parameters:

html (str)

findinception(html)[source]#
Parameters:

html (str)

findinfluences(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findinstruments(html)[source]#
Parameters:

html (str)

findinworks(html)[source]#
Parameters:

html (str)

findkins(html)[source]#
Parameters:

html (str)

findlabels(html)[source]#
Parameters:

html (str)

findlanguagenames(html)[source]#
Parameters:

html (str)

findlanguages(html)[source]#
Parameters:

html (str)

findlanguagesspoken(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmannerdeath(html)[source]#
Parameters:

html (str)

findmemberships(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findmovements(html)[source]#
Parameters:

html (str)

findmoviedirectors(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findnotableworks(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findorigcountries(html)[source]#
Parameters:

html (str)

findparties(html)[source]#
Parameters:

html (str)

findpartners(html)[source]#
Parameters:

html (str)

findparts(html)[source]#
Parameters:

html (str)

findpatronof(html)[source]#
Parameters:

html (str)

findpositions(html)[source]#
Parameters:

html (str)

findpremiere(html)[source]#
Parameters:

html (str)

findprodcoms(html)[source]#
Parameters:

html (str)

findpseudonyms(html)[source]#
Parameters:

html (str)

findranks(html)[source]#
Parameters:

html (str)

findreligions(html)[source]#
Parameters:

html (str)

findrelorder(html)[source]#
Parameters:

html (str)

findresidences(html)[source]#
Parameters:

html (str)

findschools(html)[source]#
Parameters:

html (str)

findscreenwriters(html)[source]#
Parameters:

html (str)

findsiblings(html)[source]#
Parameters:

html (str)

findsignature(html)[source]#
Parameters:

html (str)

findsports(html)[source]#
Parameters:

html (str)

findsportteams(html)[source]#
Parameters:

html (str)

findspouses(html)[source]#
Parameters:

html (str)

findstudents(html)[source]#
Parameters:

html (str)

findteachers(html)[source]#
Parameters:

html (str)

findteampositions(html)[source]#
Parameters:

html (str)

findtitles(html)[source]#
Parameters:

html (str)

findvoice(html)[source]#
Parameters:

html (str)

findwebpages(html)[source]#
Parameters:

html (str)

findwebsite(html)[source]#
Parameters:

html (str)

findweights(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

findworkplaces(html)[source]#
Parameters:

html (str)

getinfo(names, html, dtype=None, splitters=None, alt=None)[source]#
Return type:

str

getinfos(names, html, dtype=None, splitters='<>,;/،・{}|*', alt=None)[source]#
Return type:

list[str]

prepare(html)[source]#
Parameters:

html (str)

removewiki(text)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.WikitreeAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

findchildren(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

findfamily(html)[source]#
Parameters:

html (str)

findfather(html)[source]#
Parameters:

html (str)

findfirstname(html)[source]#
Parameters:

html (str)

findlastname(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findmother(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findsiblings(html)[source]#
Parameters:

html (str)

findspouses(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.WorldsWithoutEndAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescriptions(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findnationalities(html)[source]#
Parameters:

html (str)

findoccupations(html)[source]#
Parameters:

html (str)

findwebpages(html)[source]#
Parameters:

html (str)

getvalue(field, html, dtype=None)[source]#
setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.YoupornAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

findethnicities(html)[source]#
Parameters:

html (str)

findhaircolor(html)[source]#
Parameters:

html (str)

findheight(html)[source]#
Parameters:

html (str)

findinstanceof(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findoccupations(html)[source]#
Parameters:

html (str)

findweights(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ZbmathAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findawards(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findwebsite(html)[source]#
Parameters:

html (str)

findworkfields(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

class scripts.dataextend.ZobodatAnalyzer(ident, data=None, item=None, bot=None)[source]#

Bases: Analyzer

findbirthdate(html)[source]#
Parameters:

html (str)

findbirthplace(html)[source]#
Parameters:

html (str)

finddeathdate(html)[source]#
Parameters:

html (str)

finddeathplace(html)[source]#
Parameters:

html (str)

finddescription(html)[source]#
Parameters:

html (str)

findlongtext(html)[source]#
Parameters:

html (str)

findmixedrefs(html)[source]#
Parameters:

html (str)

findnames(html)[source]#
Return type:

list[str]

findwebpages(html)[source]#
Parameters:

html (str)

setup()[source]#

To be used for putting data into subclasses.

scripts.dataextend.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (tuple[str, ...]) – command line arguments

Return type:

None

data_ingestion script#

A generic bot to do data ingestion (batch uploading) of photos or other files

In addition it installs related metadata. The uploading is primarily from a url to a wiki-site.

Required configuration files#

  • a ‘Data ingestion’ template on a wiki site that specifies the name of a csv file, and csv configuration values.

  • a csv file that specifies each file to upload, the file’s copy-from URL location, and some metadata.

Required parameters#

The following parameters are required. The ‘csvdir’ and the ‘page:csvFile’ will be joined creating a path to a csv file that should contain specified information about files to upload.

-csvdir

A directory path to csv files

-page

A wiki path to templates. One of the templates at this location must be a ‘Data ingestion’ template with the following parameters.

Required parameters

csvFile

Optional parameters

sourceFormat

options: ‘csv’

sourceFileKey

options: ‘StockNumber’

csvDialect

options: ‘excel’, ‘’

csvDelimiter

options: any delimiter, ‘,’ is most common

csvEncoding

options: ‘utf8’, ‘Windows-1252’

formattingTemplate

titleFormat

Example ‘Data ingestion’ template#

{{Data ingestion
|sourceFormat=csv
|csvFile=csv_ingestion.csv
|sourceFileKey=%(StockNumber)
|csvDialect=
|csvDelimiter=,
|csvEncoding=utf8
|formattingTemplate=Template:Data ingestion test configuration
|titleFormat=%(name)s - %(set)s.%(_ext)s
}}
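As a rough illustration of how the template values relate to the -csvdir option, the sketch below collects the `|key=value` lines of the example template into a dictionary and joins a directory with the `csvFile` value. The helper `parse_template_params` is hypothetical and deliberately naive; it is not how `DataIngestionBot.parse_configuration_page` is implemented, and it assumes one parameter per line.

```python
import posixpath


def parse_template_params(wikitext: str) -> dict[str, str]:
    """Naively collect ``|key=value`` lines from a template call (hypothetical helper)."""
    params: dict[str, str] = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith('|') or '=' not in line:
            continue  # skip '{{Data ingestion', '}}' and blank lines
        key, _, value = line[1:].partition('=')
        params[key.strip()] = value.strip()
    return params


TEMPLATE = """\
{{Data ingestion
|sourceFormat=csv
|csvFile=csv_ingestion.csv
|sourceFileKey=%(StockNumber)
|csvDialect=
|csvDelimiter=,
|csvEncoding=utf8
|formattingTemplate=Template:Data ingestion test configuration
|titleFormat=%(name)s - %(set)s.%(_ext)s
}}
"""

config = parse_template_params(TEMPLATE)
# The -csvdir value and the template's csvFile value are joined into one path.
csv_path = posixpath.join('test/data', config['csvFile'])
print(csv_path)
```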

Csv file#

A full example can be found at tests/data/csv_ingestion.csv. The ‘url’ field is the location a file will be copied from.

csv field Headers:

description.en,source,author,license,set,name,url
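A file with these headers can be read with the standard library alone. The sketch below is only an illustration: the sample row is made up, and separating ‘url’ from the remaining columns is an assumption about how the data might be used, not a description of `CSVReader`.

```python
import csv
import io

# Hypothetical one-row CSV using the header fields listed above.
SAMPLE = (
    'description.en,source,author,license,set,name,url\n'
    'A sample photo,Example archive,Jane Doe,CC0,Test set,Sample,'
    'https://example.org/sample.jpg\n'
)

reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=',')
rows = list(reader)
for row in rows:
    # 'url' is the copy-from location; the other columns are metadata.
    url = row['url']
    metadata = {key: value for key, value in row.items() if key != 'url'}
    print(url, metadata['name'])
```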

Usage#

python pwb.py data_ingestion -csvdir:<local_dir/> -page:<cfg_page_on_wiki>

Example

pwb.py data_ingestion -csvdir:"test/data" -page:"User:<Your-Username>/data_ingestion_test_template"

Warning

Put it in one line, otherwise it won’t work correctly.

scripts.data_ingestion.CSVReader(fileobj, urlcolumn, site=None, *args, **kwargs)[source]#

Yield Photo objects for each row of a CSV file.

class scripts.data_ingestion.DataIngestionBot(titlefmt, pagefmt, **kwargs)[source]#

Bases: Bot

Data ingestion bot.

Parameters:
  • titlefmt (str) – Title format

  • pagefmt (str) – Page format

classmethod parse_configuration_page(configuration_page)[source]#

Parse a Page which contains the configuration.

Parameters:

configuration_page (pywikibot.Page) – page with configuration

Return type:

dict[str, str]

treat(page)[source]#

Process each page.

  1. Check for existing duplicates on the wiki specified in self.site.

  2. If duplicates are found, then skip uploading.

  3. Download the file from photo.URL and upload the file to self.site.

Return type:

None

class scripts.data_ingestion.Photo(url, metadata, site=None)[source]#

Bases: FilePage

Represents a Photo (or other file), with metadata, to be uploaded.

Parameters:
  • url (str) – URL of photo

  • metadata (dict[str, Any]) – metadata about the photo that can be referred to from the title & template

  • site (pywikibot.site.APISite | None) – target site

download_photo()[source]#

Download the photo and store it in an io.BytesIO object.

TODO: Add exception handling

Return type:

BinaryIO

find_duplicate_images()[source]#

Find duplicates of the photo.

Calculates the SHA1 hash and asks the MediaWiki API for a list of duplicates.

TODO: Add exception handling, fix site thing

Return type:

list[str]
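The hashing step can be sketched with `hashlib`. This shows only how a SHA1 digest of in-memory file content might be computed; the function name `sha1_of_stream` is hypothetical, and the subsequent API query for duplicates is not shown.

```python
import hashlib
import io
from typing import BinaryIO


def sha1_of_stream(stream: BinaryIO) -> str:
    """Return the hexadecimal SHA-1 digest of a binary stream."""
    stream.seek(0)  # make sure the whole content is hashed
    return hashlib.sha1(stream.read()).hexdigest()


photo = io.BytesIO(b'abc')  # stand-in for downloaded file content
digest = sha1_of_stream(photo)
print(digest)
```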

get_description(template, extraparams=None)[source]#

Generate a description for a file.

Parameters:

extraparams (dict[str, str] | None)

Return type:

str

get_title(fmt)[source]#

Populate format string with %(name)s entries using metadata.

Note

This does not clean the title, so the result may be unusable as a MediaWiki page title and may cause an API exception when used.

Parameters:

fmt (str) – format string

Returns:

formatted string

Return type:

str
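The `%(name)s` placeholders follow Python's mapping-based string formatting. A minimal sketch, using made-up metadata values and the title format from the example template:

```python
# Hypothetical metadata; real values come from the CSV row.
metadata = {'name': 'Sample', 'set': 'Test set', '_ext': 'jpg'}
title_format = '%(name)s - %(set)s.%(_ext)s'

# Each %(key)s entry is replaced by the matching metadata value.
title = title_format % metadata
print(title)
```

A placeholder without a matching key raises `KeyError` in plain Python formatting, so every field named in the format string needs a corresponding metadata entry.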

scripts.data_ingestion.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

delete script#

This script can be used to delete and undelete pages en masse

Of course, you will need an admin account on the relevant wiki.

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-always

Don’t prompt to delete pages, just do it.

-summary:XYZ

Set the summary message text for the edit to XYZ.

-undelete

Actually undelete pages instead of deleting. Obviously makes sense only with -page and -file.

-isorphan

Alert if there are pages that link to the page to be deleted (check ‘What links here’). By default it is active and only the summary per namespace is given. If given as -isorphan:n, n pages per namespace will be shown. If given as -isorphan:0, only the summary per namespace will be shown. If given as -isorphan:n, with n < 0, the option is disabled. This option is disregarded if -always is set.

-orphansonly:

Specified namespaces. Separate multiple namespace numbers or names with commas. Examples:

-orphansonly:0,2,4
-orphansonly:Help,MediaWiki

Note that Main ns can be indicated either with a 0 or a ‘,’:

-orphansonly:0,1
-orphansonly:,Talk
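To make the comma-separated syntax concrete, the sketch below splits such a value into namespace numbers and names, treating an empty entry as the main namespace. `parse_namespaces` is a hypothetical helper; how the script itself resolves names to namespaces is not shown here.

```python
from typing import Union


def parse_namespaces(value: str) -> list[Union[int, str]]:
    """Split an -orphansonly value into numbers and names (hypothetical helper)."""
    result: list[Union[int, str]] = []
    for part in value.split(','):
        part = part.strip()
        if part == '':
            result.append(0)  # an empty entry stands for the main namespace
        elif part.isdigit():
            result.append(int(part))
        else:
            result.append(part)  # a namespace name, left unresolved here
    return result


print(parse_namespaces('0,2,4'))
print(parse_namespaces(',Talk'))
```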

Usage:

python pwb.py delete [-category categoryName]

Examples

Delete everything in the category “To delete” without prompting:

python pwb.py delete -cat:"To delete" -always

class scripts.delete.DeletionRobot(summary, **kwargs)[source]#

Bases: CurrentPageBot

This robot allows deletion of pages en masse.

Parameters:

summary (str) – the reason for the (un)deletion

display_references()[source]#

Display pages that link to the current page, sorted per namespace.

The number of pages to display per namespace is given by self.opt.isorphan.

Return type:

None
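The effect of the isorphan value on the listing can be sketched as follows. This is an assumption-laden illustration, not the method's code: `format_references` is a hypothetical function, and the reference table is modelled as a plain dictionary of namespace names to page titles.

```python
def format_references(ref_table: dict[str, list[str]], isorphan: int) -> list[str]:
    """Build a per-namespace listing limited to ``isorphan`` titles each."""
    if isorphan < 0:
        return []  # a negative value disables the check
    lines: list[str] = []
    for namespace, titles in sorted(ref_table.items()):
        lines.append(f'{namespace}: {len(titles)} page(s)')
        # With isorphan == 0 the slice is empty, leaving only the summary.
        lines.extend(f'  {title}' for title in titles[:isorphan])
    return lines


refs = {'Talk': ['Talk:A', 'Talk:B'], 'Main': ['X']}
for line in format_references(refs, 1):
    print(line)
```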

skip_page(page)[source]#

Skip the page under some conditions.

Return type:

bool

treat_page()[source]#

Process one page from the generator.

Return type:

None

update_options: dict[str, Any] = {'isorphan': 0, 'orphansonly': [], 'undelete': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

class scripts.delete.PageWithRefs(source, title='', ns=0)[source]#

Bases: Page

A subclass of Page with convenience methods for reference checking.

Supports the same interface as Page, with some added methods.

Parameters:

title (str)

get_ref_table(*args, **kwargs)[source]#

Build a mapping table of pages which link to the current page.

Return type:

defaultdict[Namespace, Page]

namespaces_with_ref_to_page(namespaces=None)[source]#

Check if the current page has links from pages in the given namespaces.

If namespaces is None, all namespaces are checked. Returns a set with namespaces where a ref to page is present.

Parameters:

namespaces (iterable of Namespace objects) – Namespace to check

Return type:

set[Namespace]

property ref_table: defaultdict[Namespace, Page]#

Build link reference table lazily.

This property gives a default table without any parameter set for getReferences(), whereas self.get_ref_table() is able to accept parameters.

scripts.delete.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

delinker script#

Delink removed files from wiki

This script keeps track of image deletions and delinks removed files from current wiki in namespace 0. This script is suitable to delink files from an image repository as well as for local images.

The following parameters are supported:

-category:

Retrieve pages to delink from the “Pages with missing files” category. Usually the category is taken from the Wikibase item Q4989282, but it can be overridden by giving the category title with this option. The -since option is then ignored.

-exclude:

If the deletion log contains this pattern, the file is not delinked (default is ‘no-delink’).

-localonly

Retrieve deleted File pages from local log only

-since:

Start the deletion log with this timestamp given in MediaWiki timestamp format. If no -since option is given, the start timestamp is read from the settings file. If the option is empty, processing starts from the very beginning. If the script stops, the last timestamp is written to the settings file and the next script call starts there if no -since is given.
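A value for -since could be produced as sketched below. The sketch assumes the 14-digit `YYYYMMDDHHMMSS` form of the MediaWiki timestamp format in UTC; the helper name is hypothetical, and other timestamp spellings the script may accept are not covered.

```python
from datetime import datetime, timezone


def to_mediawiki_timestamp(moment: datetime) -> str:
    """Format an aware datetime as YYYYMMDDHHMMSS in UTC (assumed format)."""
    return moment.astimezone(timezone.utc).strftime('%Y%m%d%H%M%S')


since = to_mediawiki_timestamp(
    datetime(2024, 1, 31, 12, 0, 0, tzinfo=timezone.utc))
print(f'-since:{since}')
```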

Note

This script is a ConfigParserBot. All settings can be made either by giving options on the command line or with a settings file, which is scripts.ini by default. If you don’t want the default values, add any option you want to change to that settings file below the [delinker] section.
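What such a [delinker] section might look like, and how it reads with the standard `configparser` module, is sketched below. The option names are the ones this script documents; the value `keep-links` is made up, and the way ConfigParserBot itself loads the file is not shown.

```python
import configparser

# Hypothetical content of scripts.ini; only changed options need to be listed.
SETTINGS = """\
[delinker]
exclude = keep-links
localonly = True
"""

parser = configparser.ConfigParser()
parser.read_string(SETTINGS)
section = parser['delinker']

# Options missing from the file fall back to the documented defaults.
exclude = section.get('exclude', fallback='no-delink')
localonly = section.getboolean('localonly', fallback=False)
since = section.get('since', fallback='')
print(exclude, localonly, repr(since))
```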

Added in version 7.2: This script was completely rewritten from the compat branch.

Changed in version 9.4: -category option was added.

class scripts.delinker.CommonsDelinker(site=True, **kwargs)[source]#

Bases: SingleSiteBot, ConfigParserBot, AutomaticTWSummaryBot

Base Delinker Bot.

Create a SingleSiteBot instance.

Parameters:
  • site (BaseSite | bool | None) – If True it’ll be set to the configured site using pywikibot.Site.

  • kwargs (Any)

skip_page(page)[source]#

Skip pages which exist either locally or on the shared repository.

Return type:

bool

summary_key: str | None = 'delinker-delink'#

Must be defined in subclasses.

treat(file_page)[source]#

Set page to current page and delink that page.

treat_page()[source]#

Delink a single page.

class scripts.delinker.DelinkerFromCategory(site=True, **kwargs)[source]#

Bases: CommonsDelinker

Bot to delink deleted images from pages found in category.

Create a SingleSiteBot instance.

Parameters:
  • site (BaseSite | bool | None) – If True it’ll be set to the configured site using pywikibot.Site.

  • kwargs (Any)

property generator#

Retrieve pages with missing files and yield their image links.

init_page(item)[source]#

Upcast logevent to FilePage and combine edit summary.

Return type:

FilePage

pages_with_missing_files = 'Q4989282'#
skip_page(page)[source]#

Skip pages which aren’t deleted on any repository.

Return type:

FilePage

update_options: dict[str, Any] = {'category': True, 'exclude': 'no-delink', 'localonly': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

class scripts.delinker.DelinkerFromLog(site=True, **kwargs)[source]#

Bases: CommonsDelinker

Bot to delink deleted images from deletion log.

Create a SingleSiteBot instance.

Parameters:
  • site (BaseSite | bool | None) – If True it’ll be set to the configured site using pywikibot.Site.

  • kwargs (Any)

property generator#

Read deletion logs and yield the oldest entry first.

init_page(item)[source]#

Upcast logevent to FilePage and combine edit summary.

Return type:

FilePage

teardown()[source]#

Save the last used logevent timestamp.

update_options: dict[str, Any] = {'exclude': 'no-delink', 'localonly': False, 'since': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.delinker.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

djvutext script#

This bot uploads text from djvu files onto pages in the “Page” namespace

Note

It is intended to be used for Wikisource.

The following parameters are supported:

-index:

name of the index page (without the Index: prefix)

-djvu:

path to the djvu file; it shall be one of:

* path to a file name
* dir where a djvu file named as the index is located

Optional; by default it is the current dir '.'

-pages:<start>-<end>,...<start>-<end>,<start>-<end>

Page ranges to upload; optional, start=1, end=number of images in the djvu file. Page ranges can be specified as:

A-B -> pages A until B
A-  -> pages A until number of images
A   -> just page A
-B  -> pages 1 until B
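The four range forms can be turned into explicit (start, end) pairs as sketched below. `parse_page_ranges` is a hypothetical helper written for illustration; it is not taken from the script and does no validation beyond integer conversion.

```python
def parse_page_ranges(spec: str, n_images: int) -> list[tuple[int, int]]:
    """Convert 'A-B,A-,A,-B' style ranges into (start, end) tuples."""
    ranges: list[tuple[int, int]] = []
    for item in spec.split(','):
        start, sep, end = item.strip().partition('-')
        if not sep:
            first = last = int(start)  # "A": just page A
        else:
            first = int(start) if start else 1  # "-B": start at page 1
            last = int(end) if end else n_images  # "A-": up to the last image
        ranges.append((first, last))
    return ranges


print(parse_page_ranges('3-5,7-,9,-2', n_images=10))
```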

This script is a ConfigParserBot. The following options can be set within a settings file which is scripts.ini by default:

-summary:

(str) Custom edit summary. Use quotes if edit summary contains spaces.

-force

Overwrites existing text; optional, default False.

-always

Do not bother asking to confirm any of the changes.

class scripts.djvutext.DjVuTextBot(djvu, index, pages=None, **kwargs)[source]#

Bases: SingleSiteBot

A bot that uploads the text layer from djvu files to the Page: namespace.

Works only on sites with Proofread Page extension installed.

Changed in version 7.0: CheckerBot is a ConfigParserBot

Parameters:
  • djvu (DjVuFile object) – djvu from where to fetch the text layer

  • index (Page object) – index page in the Index: namespace

  • pages (tuple | None) – page interval to upload (start, end)

property generator#

Generate pages from specified page interval.

page_number_gen()[source]#

Generate page numbers from specified page intervals.

treat(page)[source]#

Process one page.

Return type:

None

update_options: dict[str, Any] = {'force': False, 'summary': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.djvutext.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

download_dump script#

This bot downloads a dump from dumps.wikimedia.org

This script supports the following command line parameters:

-filename:

The name of the file (e.g. abstract.xml)

-storepath:

The stored file’s path.

-dumpdate:

The date of the dump, formatted as YYYYMMDD (defaults to latest).
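A -dumpdate value can be checked before it is passed to the script. The following sketch assumes that only `latest` or an eight-digit calendar date is meaningful; `normalize_dumpdate` is a hypothetical helper, not part of download_dump.

```python
from datetime import datetime


def normalize_dumpdate(value: str = 'latest') -> str:
    """Return value unchanged if it is 'latest' or a valid YYYYMMDD date."""
    if value == 'latest':
        return value
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f'dumpdate must look like YYYYMMDD, got {value!r}')
    # strptime rejects impossible dates such as month 13 or 30 February.
    datetime.strptime(value, '%Y%m%d')
    return value


print(normalize_dumpdate('20180108'))
```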

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.

Added in version 3.0.20180108.

class scripts.download_dump.DownloadDumpBot(site=None, **kwargs)[source]#

Bases: Bot, ConfigParserBot

Download dump bot.

Changed in version 7.0: DownloadDumpBot is a ConfigParserBot

Create a Bot instance and initialize cached sites.

available_options: dict[str, Any] = {'dumpdate': 'latest', 'filename': '', 'storepath': './', 'wikiname': ''}#

Handler configuration attribute. Only the keys of the dict can be passed as __init__ options. The values are the default values. Overwrite this in subclasses!

static get_dump_name(db_name, typ, dumpdate)[source]#

Check if dump file exists locally in a Toolforge server.

run()[source]#

Run bot.

Return type:

None

scripts.download_dump.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

fixing_redirects script#

Correct all redirect links in featured pages or only one page of each wiki

Can be used with:

-always

The bot won’t ask for confirmation when putting a page

-featured

Run over featured pages (for some Wikimedia wikis only)

-overwrite

Usually only the link is changed ([[Foo]] -> [[Bar|Foo]]). This parameter sets the script to completely overwrite the link text ([[Foo]] -> [[Bar]]).

-ignoremoves

Do not try to solve deleted pages after page move.
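The difference between the default behaviour and -overwrite can be shown on plain wikitext. The sketch below handles only simple, unpiped links of the form [[Foo]]; `retarget_link` is a hypothetical function, and the script's own link handling is more general than this.

```python
import re


def retarget_link(text: str, source: str, target: str,
                  overwrite: bool = False) -> str:
    """Point simple [[source]] links at target, keeping the label by default."""
    pattern = re.compile(r'\[\[' + re.escape(source) + r'\]\]')
    replacement = f'[[{target}]]' if overwrite else f'[[{target}|{source}]]'
    # A function is used as replacement so backslashes in titles stay literal.
    return pattern.sub(lambda _match: replacement, text)


print(retarget_link('See [[Foo]].', 'Foo', 'Bar'))
print(retarget_link('See [[Foo]].', 'Foo', 'Bar', overwrite=True))
```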

This script supports use of pagegenerators arguments.

class scripts.fixing_redirects.FixingRedirectBot(site=True, **kwargs)[source]#

Bases: SingleSiteBot, ExistingPageBot, AutomaticTWSummaryBot

Run over pages and resolve redirect links.

Create a SingleSiteBot instance.

Parameters:
  • site (BaseSite | bool | None) – If True it’ll be set to the configured site using pywikibot.Site.

  • kwargs (Any)

get_target(page)[source]#

Get the target page for a given page.

ignore_server_errors = True#

Replace all source links by target.

summary_key: str | None = 'fixing_redirects-fixing'#

Must be defined in subclasses.

treat_page()[source]#

Change all redirects from the current page to actual links.

Return type:

None

update_options: dict[str, Any] = {'ignoremoves': False, 'overwrite': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.

scripts.fixing_redirects.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

harvest_template script#

Template harvesting script

Usage (see below for explanations and examples):

python pwb.py harvest_template -transcludes:"…" [default optional arguments] template_parameter PID [local optional arguments] [template_parameter PID [local optional arguments]]

python pwb.py harvest_template [generators] -template:"…" [default optional arguments] template_parameter PID [local optional arguments] [template_parameter PID [local optional arguments]]

This will work on all pages that transclude the template in the article namespace

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

You can also use additional parameters:

-confirm

If used, the bot will ask if it should make changes

-create

Create missing items before importing.

The following command line parameters can be used to change the bot’s behavior. If you specify them before all parameters, they are global and are applied to all param-property pairs. If you specify them after a param-property pair, they are local and are only applied to this pair. If you specify the same argument as both local and global, the local argument overrides the global one (see also examples):

-islink

Treat plain text values as links (“text” -> “[[text]]”).

-exists

If set to ‘p’, add a new value, even if the item already has the imported property but not the imported value. If set to ‘pt’, add a new value, even if the item already has the imported property with the imported value and some qualifiers.

-multi

If set, try to match multiple values from parameter.

-inverse

Import this property as the inverse claim.

Examples

The following command will try to import existing images from “image” parameter of “Infobox person” on English Wikipedia as Wikidata property “P18” (image):

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox person" image P18

The following command will behave the same as the previous example and also try to import [[links]] from “birth_place” parameter of the same template as Wikidata property “P19” (place of birth):

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox person" image P18 birth_place P19

The following command will import both “birth_place” and “death_place” params with the -islink modifier, i.e. the bot will try to import values even if it doesn’t find a [[link]]:

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox person" -islink birth_place P19 death_place P20

The following command will do the same but only “birth_place” can be imported without a link:

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox person" birth_place P19 -islink death_place P20

The following command will import an occupation from the “occupation” parameter of “Infobox person” on English Wikipedia as Wikidata property “P106” (occupation). The page won’t be skipped if the item already has that property but not the new value:

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox person" occupation P106 -exists:p

The following command will import band members from the “current_members” parameter of “Infobox musical artist” on English Wikipedia as Wikidata property “P527” (has part). This will only extract multiple band members if each is linked, and will not add duplicate claims for the same member:

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox musical artist" current_members P527 -exists:p -multi

The following command will import the category’s main topic from the first anonymous parameter of “Cat main” on English Wikipedia as Wikidata property “P301” (category’s main topic) and whenever a new value is imported, the inverse claim is imported to the topic item as Wikidata property “P910” (topic’s main category) unless a claim of that property is already there:

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:14 -template:"Cat main" 1 P301 -inverse:P910 -islink
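The rule that options before the first param-property pair are global, while options after a pair are local to it and override the global value, can be sketched as below. This is an illustration of the rule only: `effective_options` and `parse_option` are hypothetical helpers, and the input is assumed to be just the part of the command line after the generator and template arguments.

```python
from typing import Optional, Union

OptionValue = Union[str, bool]
Pair = tuple[str, str]


def parse_option(arg: str) -> tuple[str, OptionValue]:
    """Split '-exists:p' into ('exists', 'p'); a bare flag maps to True."""
    name, _, value = arg[1:].partition(':')
    return name, value or True


def effective_options(args: list[str]) -> dict[Pair, dict[str, OptionValue]]:
    """Resolve global and local options for each param-property pair."""
    global_opts: dict[str, OptionValue] = {}
    local_opts: dict[Pair, dict[str, OptionValue]] = {}
    pending: Optional[str] = None  # template parameter waiting for its PID
    current: Optional[Pair] = None  # most recently completed pair
    for arg in args:
        if arg.startswith('-'):
            name, value = parse_option(arg)
            target = global_opts if current is None else local_opts[current]
            target[name] = value
        elif pending is None:
            pending = arg
        else:
            current = (pending, arg)
            local_opts[current] = {}
            pending = None
    # Local values win over global ones for the same option name.
    return {pair: {**global_opts, **opts} for pair, opts in local_opts.items()}


print(effective_options(['birth_place', 'P19', '-islink', 'death_place', 'P20']))
```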

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.

Added in version 7.5: the -inverse option.

class scripts.harvest_template.HarvestRobot(template_title, fields, **kwargs)[source]#

Bases: ConfigParserBot, WikidataBot

A bot to add Wikidata claims.

Changed in version 7.0: HarvestRobot is a ConfigParserBot

Parameters:
  • template_title (str) – The template to work on

  • fields (dict) – A dictionary of fields that are of use to us

Keyword Arguments:
  • islink – Whether non-linked values should be treated as links

  • create – Whether to create a new item if it’s missing

  • exists – pattern for merging existing claims with harvested values

  • multi – Whether multiple values should be extracted from a single parameter

  • inverse – a property to populate on the target, pointing to the page item

getTemplateSynonyms(title)[source]#

Fetch redirects of the title, so we can check against them.

Parameters:

title (str)

Return type:

list[str]

static handle_commonsmedia(value, site, *args)[source]#

Handle ‘commonsMedia’ claim type.

Added in version 7.5.

Return type:

Generator[FilePage, None, None]

static handle_external_id(value, *args)#

Handle ‘string’ and ‘external-id’ claim type.

Added in version 7.5.

Parameters:

value (str)

Return type:

Generator[str, None, None]

static handle_string(value, *args)[source]#

Handle ‘string’ and ‘external-id’ claim type.

Added in version 7.5.

Parameters:

value (str)

Return type:

Generator[str, None, None]

handle_time(value, site, *args)[source]#

Handle ‘time’ claim type.

Added in version 7.5.

Parameters:
Return type:

Generator[WbTime, None, None]

handle_url(value, *args)[source]#

Handle ‘url’ claim type.

Added in version 7.5.

Return type:

Generator[str, None, None]

handle_wikibase_item(value, site, item, field)[source]#

Handle ‘wikibase-item’ claim type.

Added in version 7.5.

Parameters:
Return type:

Generator[ItemPage, None, None]

setup()[source]#

Cache some static data from wikis.

Find the ItemPage target for a given link text.

Changed in version 7.5: Only follow the redirect target if redirect page has no wikibase item.

Parameters:
Return type:

ItemPage | None

treat_field(item, site, field_item)[source]#

Process a single field of template fielddict.

Added in version 7.5.

Parameters:
Return type:

None

treat_page_and_item(page, item)[source]#

Process a single page/item.

Parameters:
Return type:

None

update_options: dict[str, Any] = {'always': True, 'create': False, 'exists': '', 'inverse': None, 'islink': False, 'multi': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

class scripts.harvest_template.PropertyOptionHandler(**kwargs)[source]#

Bases: OptionHandler

Class holding options for a param-property pair.

Only accept options defined in available_options.

Parameters:

kwargs (Any) – bot options

available_options: dict[str, Any] = {'exists': '', 'inverse': None, 'islink': False, 'multi': False}#

Handler configuration attribute. Only the keys of the dict can be passed as __init__ options. The values are the default values. Overwrite this in subclasses!

scripts.harvest_template.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

illustrate_wikidata script#

Bot to add images to Wikidata items

The image is extracted from the page_props. For this to be available the PageImages extension (https://www.mediawiki.org/wiki/Extension:PageImages) needs to be installed.

The following options are provided:

-always

Don’t prompt to make changes, just do them.

-property

The property to add. Should be of type commonsMedia.

Usage:

python pwb.py illustrate_wikidata <some generator>

This script supports use of pagegenerators arguments.

class scripts.illustrate_wikidata.IllustrateRobot(**kwargs)[source]#

Bases: WikidataBot

A bot to add Wikidata image claims.

treat_page_and_item(page, item)[source]#

Treat a page / item.

Return type:

None

update_options: dict[str, Any] = {'property': 'P18'}#

update_options can be used to update available_options. Do not use it if the bot class is intended to be subclassed; in that case, call self.available_options.update(<dict>) in the initializer instead.

Added in version 6.4.

scripts.illustrate_wikidata.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

image script#

This script can be used to change one image to another or remove an image

Syntax:

python pwb.py image image_name [new_image_name]

If only one command-line parameter is provided then that image will be removed; if two are provided, then the first image will be replaced by the second one on all pages.

Command line options:

-summary:

Provide a custom edit summary. If the summary includes spaces, surround it with single quotes, such as: -summary:'My edit summary'

-always

Don’t prompt to make changes, just do them.

-loose

Do loose replacements. This will replace all occurrences of the name of the image (and not just explicit image syntax). This should work to catch all instances of the image, including where it is used as a template parameter or in image galleries. However, it can also make more mistakes. This only works with image replacement, not image removal.
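The difference between the default behaviour and -loose can be pictured with a minimal sketch. The regular expression and the helper names replace_strict and replace_loose are illustrative assumptions; they are not the patterns or functions the script actually uses.

```python
import re


def replace_strict(text: str, old: str, new: str) -> str:
    """Replace the image only inside explicit [[File:...]] / [[Image:...]] links."""
    pattern = re.compile(
        r'(\[\[\s*(?:File|Image)\s*:\s*)' + re.escape(old),
        re.IGNORECASE,
    )
    # A function is used as replacement so that `new` is inserted literally.
    return pattern.sub(lambda m: m.group(1) + new, text)


def replace_loose(text: str, old: str, new: str) -> str:
    """Replace every occurrence of the image name, wherever it appears."""
    return text.replace(old, new)


wikitext = (
    '[[File:Flag.jpg|thumb|The flag]]\n'
    '{{Infobox country|flag=Flag.jpg}}\n'
)

strict = replace_strict(wikitext, 'Flag.jpg', 'Flag.svg')
loose = replace_loose(wikitext, 'Flag.jpg', 'Flag.svg')
```

In this sketch the strict variant leaves the template parameter untouched, while the loose variant also rewrites it, which is why the loose mode catches more uses but can also change text that merely mentions the file name.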

Examples

The image “FlagrantCopyvio.jpg” is about to be deleted, so let’s first remove it from everything that displays it:

python pwb.py image FlagrantCopyvio.jpg

The image “Flag.svg” has been uploaded, making the old “Flag.jpg” obsolete:

python pwb.py image Flag.jpg Flag.svg

class scripts.image.ImageRobot(generator, old_image, new_image='', **kwargs)[source]#

Bases: ReplaceRobot

This bot will replace or remove all occurrences of an old image.

Parameters:
  • generator (iterable) – the pages to work on

  • old_image (str) – the title of the old image (without namespace)

  • new_image (str) – the title of the new image (without namespace), or None if you want to remove the image

scripts.image.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

imagetransfer script#

Script to copy images to Wikimedia Commons, or to another wiki

Syntax:

python pwb.py imagetransfer {<pagename>|<generator>} [<options>]

The following parameters are supported:

-interwiki

Look for images in pages found through interwiki links.

-keepname

Keep the filename and do not verify description while replacing.

-tolang:x

(str) Copy the image to the wiki in code x.

-tofamily:y

(str) Copy the image to a wiki in the family y.

-tosite:s

(str) Copy the image to the given site like wikipedia:test.

-force_if_shared

Upload the file to the target, even if it exists on that wiki’s shared repo

-asynchronous

Upload to stash.

-chunk_size:n

(int) Upload in chunks of n bytes.

-file:z

(str) Upload many files from textfile z like:

[[Image:x]]
[[Image:y]]
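A text file in this form can be read with a few lines of code. The following is only a sketch of the file format, not how the script loads it; the pattern LINK and the helper titles_from_text are hypothetical names, and accepting the File: prefix as well as Image: is an assumption.

```python
import re
from typing import List

# Matches [[Image:Title]] or [[File:Title]], optionally followed by |caption.
LINK = re.compile(
    r'\[\[\s*(?:Image|File)\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]'
)


def titles_from_text(text: str) -> List[str]:
    """Return the file titles of all image links found in the text."""
    return [match.group(1) for match in LINK.finditer(text)]


sample = '[[Image:x]]\n[[Image:y]]\n'
titles = titles_from_text(sample)
```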

If pagename is an image description page, offers to copy the image to the target site. If it is a normal page, it will offer to copy any of the images used on that page, or if the -interwiki argument is used, any of the images used on a page reachable via interwiki links.

This script supports use of pagegenerators arguments.

class scripts.imagetransfer.ImageTransferBot(**kwargs)[source]#

Bases: SingleSiteBot, ExistingPageBot

Image transfer bot.

Keyword Arguments:
  • generator – the pages to work on

  • target_site – Site to send image to, default none

  • interwiki – Look for images in interwiki links, default false

  • keepname – Keep the filename and do not verify description while replacing, default false

  • force_if_shared – Upload the file even if it’s currently shared to the target site (e.g. when moving from Commons to another wiki)

  • asynchronous – Upload to stash.

  • chunk_size – Upload in chunks of this size bytes.

show_image_list(imagelist)[source]#

Print image list.

Return type:

None

transfer_allowed(image)[source]#

Check whether transfer is allowed.

Return type:

bool

transfer_image(sourceImagePage)[source]#

Download image and its description, and upload it to another site.

Returns:

the filename which was used to upload the image

Return type:

None

treat(page)[source]#

Treat a single page.

Return type:

None

update_options: dict[str, Any] = {'asynchronous': False, 'chunk_size': 0, 'force_if_shared': False, 'ignore_warning': False, 'interwiki': False, 'keepname': False, 'target': None}#

update_options can be used to update available_options. Do not use it if the bot class is intended to be subclassed; in that case, call self.available_options.update(<dict>) in the initializer instead.

Added in version 6.4.

scripts.imagetransfer.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

interwiki script#

Script to check language links for general pages

Uses existing translations of a page, plus hints from the command line, to download the equivalent pages from other languages. These pages are in turn checked for interwiki links, recursively, until no new links are encountered. A rationalization process then selects the right interwiki links; if the result is unambiguous, the interwiki links in the original page are updated automatically and the modified page is uploaded.

Hint

This script should not be used on wiki farms that have a data repository, such as Wikidata for the Wikimedia cluster. Consider using the interwikidata script (scripts.interwikidata) for a similar purpose in such environments.

These command-line arguments can be used to specify which pages to work on:

-days:

Like -years, but runs through all date pages. Stops at Dec 31. If the argument is given in the form -days:X, it will start at month no. X through Dec 31. If the argument is simply given as -days, it will run from Jan 1 through Dec 31. E.g. for -days:9 it will run from Sep 1 through Dec 31.

-years:

Run on all year pages in numerical order. Stop at year 2050. If the argument is given in the form -years:XYZ, it will run from [[XYZ]] through [[2050]]. If XYZ is a negative value, it is interpreted as a year BC. If the argument is simply given as -years, it will run from 1 through 2050.

This option implies -noredirect.

-new:

Work on the 100 newest pages. If given as -new:x, will work on the x newest pages. When multiple -namespace parameters are given, x pages are inspected, and only the ones in the selected name spaces are processed. Use -namespace:all for all namespaces. Without -namespace, only article pages are processed.

This option implies -noredirect.

-restore:

Restore a set of “dumped” pages the bot was working on when it terminated. The dump file will be subsequently removed.

-restore:all

Restore a set of “dumped” pages of all dumpfiles to a given family remaining in the “interwiki-dumps” directory. All these dump files will be subsequently removed. If restoring process interrupts again, it saves all unprocessed pages in one new dump file of the given site.

-continue:

Like restore, but after having gone through the dumped pages, continue alphabetically starting at the last of the dumped pages. The dump file will be subsequently removed.
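The range covered by -days:X, described above, can be sketched as follows. This is an illustration only: the helper day_pages is a hypothetical name, the script itself works with localized date page titles rather than (month, day) tuples, and using a leap year so that Feb 29 is included is an assumption.

```python
import calendar
from typing import Iterator, Tuple


def day_pages(start_month: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (month, day) pairs from the 1st of start_month through Dec 31."""
    for month in range(start_month, 13):
        # 2000 is a leap year, so February contributes 29 days (assumption).
        days_in_month = calendar.monthrange(2000, month)[1]
        for day in range(1, days_in_month + 1):
            yield month, day


from_september = list(day_pages(9))  # corresponds to -days:9
whole_year = list(day_pages())       # corresponds to plain -days
```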

This script supports use of pagegenerators arguments.

Additionally, these arguments can be used to restrict the bot to certain pages:

-namespace:n

(int) Number or name of namespace to process. The parameter can be used multiple times. It works in combination with all other parameters, except for the -start parameter. If you e.g. want to iterate over all categories starting at M, use -start:Category:M.

-number:

(int) Used as -number:#, specifies that the bot should process that amount of pages and then stop. This is only useful in combination with -start. The default is not to stop.

-until:

(str) Used as -until:title, specifies that the bot should process pages in wiki default sort order up to, and including, “title” and then stop. This is only useful in combination with -start. The default is not to stop.

Note

Do not specify a namespace, even if -start has one.

-bracket

Only work on pages that have (in the home language) parentheses in their title. All other pages are skipped.

-skipfile:

Used as -skipfile:filename, skip all links mentioned in the given file. This does not work with -number!

-skipauto

Use to skip all pages that can be translated automatically, like dates, centuries, months, etc.

-lack:

Used as -lack:xx with xx a language code: only work on pages without links to language xx. You can also add a number nn like -lack:xx:nn, so that the bot only works on pages with at least nn interwiki links (the default value for nn is 1).
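The two forms of the -lack option can be told apart with a small parser. This is a sketch under the assumption that the option is split on colons; parse_lack is a hypothetical helper, not the script's own option handling.

```python
from typing import Tuple


def parse_lack(option: str) -> Tuple[str, int]:
    """Split '-lack:xx' or '-lack:xx:nn' into (language code, minimum links)."""
    name, _, value = option.partition(':')
    if name != '-lack' or not value:
        raise ValueError(f'not a -lack option: {option!r}')
    lang, _, minimum = value.partition(':')
    # The documented default for nn is 1.
    return lang, int(minimum) if minimum else 1


default_limit = parse_lack('-lack:de')
explicit_limit = parse_lack('-lack:de:3')
```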

These arguments control miscellaneous bot behaviour:

-quiet

Use this option to get less output

-async

Put the page on a queue to be saved to the wiki asynchronously. This allows pages to be loaded while saves are being throttled, which gives better performance.

Note

For post-processing it always assumes that saving the pages was successful.

-summary:

(str) Set an additional action summary message for the edit. This can be used to explain the bot action further. It is only used in non-autonomous mode.

-hintsonly

The bot does not ask for a page to work on, even if none of the above page sources was specified. This will make the first existing page of -hint or -hintfile slip in as start page, determining properties like namespace, disambiguation state, and so on. When no existing page is found in the hints, the bot does nothing. Hitting return without input on the “Which page to check:” prompt has the same effect as using -hintsonly. Options like -back or -same are in effect only after a page has been found to work on.

These arguments are useful to provide hints to the bot:

-hint:

Used as -hint:de:Anweisung to give the bot a hint where to start looking for translations. If no text is given after the second ‘:’, the name of the page itself is used as the title for the hint, unless the -hintnobracket command line option (see there) is also selected.

There are some special hints, trying a number of languages at once:

  • all: All languages with at least ca. 100 articles

  • 10: The 10 largest languages (sites with most articles). Analogous for any other natural number

  • arab: All languages using the Arabic alphabet

  • cyril: All languages that use the Cyrillic alphabet

  • chinese: All Chinese dialects

  • latin: All languages using the Latin script

  • scand: All Scandinavian languages

Names of families that forward their interlanguage links to the wiki family being worked upon can also be used. They are:

  • commons: Interlanguage links of Wikimedia Commons

  • incubator: Links in pages on the Wikimedia Incubator

  • meta: Interlanguage links of named pages on Meta

  • species: Interlanguage links of the Wikispecies wiki

  • strategy: Links in pages on Wikimedia Strategy wiki

  • test: Take interwiki links from Test Wikipedia

  • wikimania: Interwiki links of Wikimania

Languages, groups and families having the same page title can be combined, as -hint:5,scand,sr,pt,commons:New_York

-hintfile:

Similar to -hint, except that hints are taken from the given file, enclosed in [[]] each, instead of the command line.

-askhints:

For each page one or more hints are asked. See -hint: above for the format, one can for example give “en:something” or “20:” as hint.

-same

Looks over all ‘serious’ languages for the same title. -same is equivalent to -hint:all.

-untranslated:

Works normally on pages with at least one interlanguage link; asks for hints for pages that have none.

-untranslatedonly:

Like -untranslated, but pages which already have a translation are skipped.

Hint

do NOT use this in combination with -start without a -number limit, because you will go through the whole alphabet before any queries are performed!

-showpage

When asking for hints, show the first bit of the text of the page always, rather than doing so only when being asked for (by typing ‘?’). Only useful in combination with a hint-asking option like -untranslated, -askhints or -untranslatedonly.

-noauto

Do not use the automatic translation feature for years and dates, only use found links and hints.

-hintnobracket

Used to make the bot strip everything in the last brackets, and surrounding spaces, from the page name before it is used in a -hint:xy: where the page name has been left out, in -hint:all:, -hint:10:, etc. without a name, or in an -askhints reply where only a language is given.
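The -hint: format described above, a comma-separated list of languages, groups or families followed by a colon and an optional title, can be decomposed as in the following sketch. parse_hint is a hypothetical helper; expanding groups such as scand or 10 into concrete language codes, and the bracket stripping done by -hintnobracket, are left out.

```python
from typing import List, Tuple


def parse_hint(hint: str, page_title: str) -> Tuple[List[str], str]:
    """Split a hint like 'de,fr:Title' into its codes and the title to look up.

    Only the first colon separates codes from the title, so titles that
    contain a colon themselves are kept intact.
    """
    codes, _, title = hint.partition(':')
    code_list = [code.strip() for code in codes.split(',') if code.strip()]
    # Without text after the colon, the name of the page itself is used.
    return code_list, title or page_title


combined = parse_hint('5,scand,sr,pt,commons:New_York', 'Ignored')
fallback = parse_hint('de:', 'Instruction')
```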

These arguments define how much user confirmation is required:

-autonomous, -auto

Run automatically, do not ask any questions. If a question to an operator is needed, write the name of the page to autonomous_problems.dat and continue on the next page.

-confirm

Ask for confirmation before any page is changed on the live wiki. Without this argument, additions and unambiguous modifications are made without confirmation.

-force

Do not ask permission to make “controversial” changes, like removing a language because none of the found alternatives actually exists.

-cleanup

Like -force but only removes interwiki links to non-existent or empty pages.

-select

Ask for each link whether it should be included before changing any page. This is useful if you want to remove invalid interwiki links and if you do multiple hints of which some might be correct and others incorrect. Combining -select and -confirm is possible, but seems like overkill.

These arguments specify in which way the bot should follow interwiki links:

-noredirect

Do not follow redirects nor category redirects.

-initialredirect

Work on its target if a redirect or category redirect is entered on the command line or by a generator.

Tip

It is recommended to use this option with the -movelog pagegenerator.

-neverlink:

Used as -neverlink:xx where xx is a language code, disregard any links found to language xx. You can also specify a list of languages to disregard, separated by commas.

-ignore:

Used as -ignore:xx:aaa where xx is a language code, and aaa is a page title to be ignored.

-ignorefile:

Similar to -ignore, except that the pages are taken from the given file instead of the command line.

-localright

Do not follow interwiki links from other pages than the starting page.

Warning

Should be used very sparingly, only when you are sure you have first gotten the interwiki links on the starting page exactly right.

-hintsareright

Do not follow interwiki links to sites for which hints on existing pages are given. Note that hints given interactively via the -askhints command line option are only effective once they have been entered; thus interwiki links on the starting page are followed regardless of hints given when prompted.

Caution

Should be used with care!

-back

Only work on pages that have no backlink from any other language; if a backlink is found, all work on the page will be halted.

The following arguments are only important for users who have accounts for multiple languages, and specify on which sites the bot should modify pages:

-localonly

Only work on the local wiki, not on other wikis in the family where the bot has a login.

-limittwo

Only update two pages: one in the local wiki (if logged in) and one in the top available one. For example, if the local page has links to de and fr, this option will make sure that only the local site and the (larger) de: site are updated. This option is useful to quickly set two-way links without updating all of the wiki family’s sites.

-whenneeded

Works like -limittwo, but other languages are changed in the following cases:

  • If there are no interwiki links at all on the page

  • If an interwiki link must be removed

  • If an interwiki link must be changed and there has been a conflict for this page

Optionally, -whenneeded can be given an additional number (for example -whenneeded:3), in which case other languages will be changed if there are that number or more links to change or add.

The following arguments influence how many pages the bot works on at once:

-array:

The number of pages the bot tries to be working on at once. If the number of pages loaded is lower than this number, a new set of pages is loaded from the starting wiki. The default is 100, but can be changed in the config variable interwiki_min_subjects.

-query:

The maximum number of pages that the bot will load at once. Default value is 50.

Some configuration options can be used to change the working of this bot:

interwiki_min_subjects

the minimum amount of subjects that should be processed at the same time.

interwiki_backlink

if set to True, all problems in foreign wikis will be reported

interwiki_shownew

should interwiki.py display every new link it discovers?

interwiki_graph

output a graph PNG file on conflicts? You need pydot for this: https://pypi.org/project/pydot/

interwiki_graph_format

the file format for interwiki graphs

without_interwiki

save file with local articles without interwikis

All these options can be changed through the user configuration file.

If this script is terminated before it is finished, it will write a dump file to the interwiki-dumps subdirectory. The program will read it if invoked with the -restore or -continue option, and finish all the subjects in that list. After finishing, the dump file will be deleted. To run the script on all pages of a language, run it with option -start:!, and if it takes so long that you have to break it off, use -continue next time.

exception scripts.interwiki.GiveUpOnPage(arg)[source]#

Bases: Error

User chose not to work on this page and its linked pages any more.

Parameters:

arg (Exception | str)

Return type:

None

class scripts.interwiki.InterwikiBot(conf=None)[source]#

Bases: object

A class keeping track of a list of subjects.

It controls which pages are queried from which languages when.

add(page, hints=None)[source]#

Add a single subject to the list.

Return type:

None

property dump_titles: Iterable[str]#

Return generator of titles for dump file.

firstSubject()[source]#

Return the first subject that is still being worked on.

Return type:

Subject | None

generateMore(number)[source]#

Generate more subjects.

This is called internally when the list of subjects becomes too small, but only if there is a PageGenerator.

Return type:

None

isDone()[source]#

Check whether there is still more work to do.

Return type:

bool

maxOpenSite()[source]#

Return the site that has the most open queries plus the number.

If there is nothing left, return None. Only sites that are todo for the first Subject are returned.

minus(site, count=1)[source]#

Helper routine that the Subject class expects in a counter.

Parameters:

count (int)

Return type:

None

oneQuery()[source]#

Perform one step in the solution process.

Returns True if pages could be preloaded, or False otherwise.

Return type:

bool

plus(site, count=1)[source]#

Helper routine that the Subject class expects in a counter.

Parameters:

count (int)

Return type:

None

queryStep()[source]#

Delete the ones that are done now.

Return type:

None

run()[source]#

Start the process until finished.

Return type:

None

selectQuerySite()[source]#

Select the site the next query should go out for.

setPageGenerator(pageGenerator, number=None, until=None)[source]#

Add a generator of subjects.

Once the list of subjects gets too small, this generator is called to produce more Pages.

Return type:

None

class scripts.interwiki.InterwikiBotConfig[source]#

Bases: object

Container class for interwikibot’s settings.

always = False#
askhints = False#
asynchronous = False#
auto = True#
autonomous = False#
cleanup = False#
confirm = False#
followinterwiki = True#
followredirect = True#
force = False#
hintnobracket = False#
hints = []#
hintsareright = False#
ignore = []#
initialredirect = False#
lacklanguage = None#
limittwo = False#
localonly = False#
maxquerysize = 50#
minsubjects = 100#
needlimit = 0#
nobackonly = False#
note(text)[source]#

Output a notification message.

The text will be printed only if conf.quiet isn’t set.

Parameters:

text (str) – text to be shown

Return type:

None

parenthesesonly = False#
quiet = False#
readOptions(option)[source]#

Read all commandline parameters for the global container.

Parameters:

option (str)

Return type:

bool

rememberno = False#
remove = []#
repository = False#
restore_all = False#
same = False#
select = False#
showtextlinkadd = 300#
skip = {}#
skipauto = False#
strictlimittwo = False#
summary = ''#
untranslated = False#
untranslatedonly = False#
class scripts.interwiki.InterwikiDumps(**kwargs)[source]#

Bases: OptionHandler

Handle interwiki dumps.

Keyword Arguments:

do_continue – If true, continue alphabetically starting at the last of the dumped pages.

FILE_PATTERN = '{site.family.name}-{site.code}.txt'#
available_options: dict[str, Any] = {'do_continue': False, 'restore_all': False}#

Handler configuration attribute. Only the keys of the dict can be passed as __init__ options. The values are the default values. Overwrite this in subclasses!
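FILE_PATTERN is an ordinary format string that reads attributes from a site object. The sketch below shows how such a pattern expands; the SimpleNamespace object is a stand-in for a real site and the values wikipedia and de are example assumptions.

```python
from types import SimpleNamespace

FILE_PATTERN = '{site.family.name}-{site.code}.txt'

# Stand-in for a site object; only the two attributes used by the pattern.
site = SimpleNamespace(family=SimpleNamespace(name='wikipedia'), code='de')

filename = FILE_PATTERN.format(site=site)
```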

delete_dumps()[source]#

Delete processed dumps.

Return type:

None

property files#

Return file generator depending on restore_all option.

Return type:

generator

get_files()[source]#

Get dump files from directory.

property next_namespace#

Return next page namespace for continue option.

property next_page#

Return next page title string for continue option.

read_dump()[source]#

Read the dump file.

Return type:

generator

remove(filename)[source]#

Remove filename from restored files.

Parameters:

filename (str) – A filename to be removed from restored set.

Return type:

None

write_dump(iterable, append=True)[source]#

Write dump file.

Parameters:
  • iterable (Iterable) – an iterable of page titles to be dumped.

  • append (bool) – if a dump already exists, append the page titles to it if True else overwrite it.

Return type:

None
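The effect of the append parameter of write_dump can be illustrated with plain file handling. This is a sketch, not the method's implementation: write_titles is a hypothetical helper, and storing one title per line is an assumption about the dump format.

```python
import os
import tempfile
from typing import Iterable, List


def write_titles(path: str, titles: Iterable[str], append: bool = True) -> None:
    """Write page titles to path, appending to or overwriting existing content."""
    mode = 'a' if append else 'w'
    with open(path, mode, encoding='utf-8') as dump_file:
        dump_file.writelines(title + '\n' for title in titles)


def read_titles(path: str) -> List[str]:
    """Return the titles stored in path, one per line."""
    with open(path, encoding='utf-8') as dump_file:
        return dump_file.read().splitlines()


with tempfile.TemporaryDirectory() as tmp_dir:
    dump_path = os.path.join(tmp_dir, 'wikipedia-de.txt')
    write_titles(dump_path, ['A', 'B'])
    write_titles(dump_path, ['C'])                # appended to the dump
    appended = read_titles(dump_path)
    write_titles(dump_path, ['D'], append=False)  # replaces the dump
    overwritten = read_titles(dump_path)
```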

exception scripts.interwiki.LinkMustBeRemoved(arg)[source]#

Bases: SaveError

An interwiki link has to be removed manually.

An interwiki link has to be removed, but this can’t be done because of user preferences or because the user chose not to change the page.

Parameters:

arg (Exception | str)

Return type:

None

exception scripts.interwiki.SaveError(arg)[source]#

Bases: Error

An attempt to save a page with changed interwiki has failed.

Parameters:

arg (Exception | str)

Return type:

None

class scripts.interwiki.Subject(origin=None, hints=None, conf=None)[source]#

Bases: Subject

Class to follow the progress of a single ‘subject’.

(i.e. a page with all its translations)

Subject is a transitive closure of the binary relation on Page: “has_a_langlink_pointing_to”.

A formal way to compute that closure would be:

With P a set of pages, NL (‘NextLevel’) a function on sets defined as:

NL(P) = { target | ∃ source ∈ P, target ∈ source.langlinks() }

pseudocode:

todo <- [origin]
done <- []
while todo != []:
    pending <- todo
    todo <- NL(pending) / done
    done <- NL(pending) U done
return done
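The pseudocode above translates almost line by line into Python. The sketch below is an illustration under simplifying assumptions: pages are plain strings, the langlinks mapping is a hypothetical stand-in for the links found on each page, and next_level and closure are illustrative names that do not exist in the module.

```python
from typing import Dict, Iterable, Set


def next_level(pages: Iterable[str],
               langlinks: Dict[str, Iterable[str]]) -> Set[str]:
    """NL(P): all targets linked from any page in P."""
    return {target for source in pages for target in langlinks.get(source, ())}


def closure(origin: str, langlinks: Dict[str, Iterable[str]]) -> Set[str]:
    """Follow language links from origin until no new pages turn up."""
    todo = {origin}
    done: Set[str] = set()
    while todo:
        pending = todo
        found = next_level(pending, langlinks)
        todo = found - done   # '/' in the pseudocode is set difference
        done = found | done   # 'U' in the pseudocode is set union
    return done


links = {
    'en:A': ['de:A', 'fr:A'],
    'de:A': ['en:A'],
    'fr:A': ['es:A'],
    'es:A': [],
}
reachable = closure('en:A', links)
```

As in the pseudocode, the origin only ends up in the result when some other page links back to it.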

There is, however, one limitation that is induced by implementation: to compute efficiently NL(P), one has to load the page contents of pages in P. (Not only the langlinks have to be parsed from each Page, but we also want to know if the Page is a redirect, a disambiguation, etc…)

Because of this, the pages in pending have to be preloaded. However, because the pages in pending are likely to be in several sites we cannot “just” preload them as a batch.

Instead of doing “pending <- todo” at each iteration, we have to elect a Site, and we put in pending all the pages from todo that belong to that Site:

Code becomes:

todo <- {origin.site: [origin]}
done <- []
while todo != {}:
    site <- electSite()
    pending <- todo[site]

    preloadpages(site, pending)

    todo[site] <- NL(pending) / done
    done <- NL(pending) U done
return done
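The per-site variant can be sketched in the same way. Again this is an illustration, not the class's implementation: pages are 'site:Title' strings, langlinks is a hypothetical stand-in for the link data, electing the site with the most waiting pages is just one possible policy, and newly found pages are filed under their own site rather than under the elected one.

```python
from typing import Dict, Iterable, Set


def site_of(page: str) -> str:
    """Return the site prefix of a 'site:Title' string (illustrative format)."""
    return page.split(':', 1)[0]


def closure_by_site(origin: str,
                    langlinks: Dict[str, Iterable[str]]) -> Set[str]:
    """Compute the same closure, but handle one site's pages per iteration."""
    todo: Dict[str, Set[str]] = {site_of(origin): {origin}}
    done: Set[str] = set()
    while todo:
        # Elect a site: here, the one with the most pages waiting.
        site = max(todo, key=lambda s: len(todo[s]))
        pending = todo.pop(site)
        # A real bot would preload all pages in `pending` from `site` here.
        found = {t for source in pending for t in langlinks.get(source, ())}
        for page in found - done:
            todo.setdefault(site_of(page), set()).add(page)
        done |= found
    return done


links = {
    'en:A': ['de:A', 'fr:A'],
    'de:A': ['en:A'],
    'fr:A': ['es:A'],
    'es:A': [],
}
reachable = closure_by_site('en:A', links)
```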

Subject objects only operate on pages that should have been preloaded before. In fact, at any time:

  • todo contains new Pages that have not been loaded yet

  • done contains Pages that have been loaded, and that have been treated.

  • If batch preloadings are successful, Page._get() is never called from this Object.

Takes as arguments the Page on the home wiki plus optionally a list of hints for translation

addIfNew(page, counter, linkingPage)[source]#

Add the page to the todo collection, if it hasn’t been seen yet.

If it is added, update the counter accordingly.

Also remembers where we found the page, regardless of whether it had already been found before or not.

Returns True if the page is new.

Return type:

bool

askForHints(counter)[source]#

Ask for hints to other sites.

Return type:

None

assemble()[source]#

Assemble language links.

batchLoaded(counter)[source]#

Notify that the promised batch of pages was loaded.

This is called by a worker to tell us that the promised batch of pages was loaded. In other words, all the pages in self.pending have already been preloaded.

The only argument is an instance of a counter class, that has methods minus() and plus() to keep counts of the total work todo.

Return type:

None

check_page(page, counter)[source]#

Check whether iw links should be added to the todo collection.

Return type:

None

disambigMismatch(page, counter)[source]#

Check whether the given page has a different disambiguation status.

Returns a tuple (skip, alternativePage).

skip is True if the pages have mismatching statuses and the bot is either in autonomous mode, or the user chose not to use the given page.

alternativePage is either None, or a page that the user has chosen to use instead of the given page.

finish()[source]#

Round up the subject, making any necessary changes.

This should be called exactly once after the todo collection has gone empty.

getFoundDisambig(site)[source]#

Return the first disambiguation found.

If we found a disambiguation on the given site while working on the subject, this method returns it. If several ones have been found, the first one will be returned. Otherwise, None will be returned.

getFoundInCorrectNamespace(site)[source]#

Return the first page in the extended namespace.

If we found a page that has the expected namespace on the given site while working on the subject, this method returns it. If several ones have been found, the first one will be returned. Otherwise, None will be returned.

getFoundNonDisambig(site)[source]#

Return the first non-disambiguation found.

If we found a non-disambiguation on the given site while working on the subject, this method returns it. If several ones have been found, the first one will be returned. Otherwise, None will be returned.

static get_alternative(site)[source]#

Ask for an alternative Page for a given site.

Parameters:

site (BaseSite) – a BaseSite

Return type:

Page | None

isDone()[source]#

Return True if all the work for this subject has completed.

isIgnored(page)[source]#

Return True if the page is to be ignored.

Return type:

bool

makeForcedStop(counter)[source]#

End work on the page before the normal end.

Return type:

None

namespaceMismatch(linkingPage, linkedPage, counter)[source]#

Check whether or not the given page has a different namespace.

Returns True if the namespaces are different and the user has selected not to follow the linked page.

Return type:

bool

openSites()[source]#

Iterator.

Yields (site, count) pairs:

  • site is a site where we still have work to do

  • count is the number of items in that Site that still need work

post_processing()[source]#

Some finishing processes to be done.

problem(txt, createneed=True)[source]#

Report a problem with the resolution of this subject.

Parameters:
  • txt (str)

  • createneed (bool)

Return type:

None

process_limit_two(new, updated)[source]#

Post process limittwo.

process_unlimited(new, updated)[source]#

Post process unlimited.

redir_checked(page, counter)[source]#

Check and handle redirect. Return True if check is done.

Return True if saving was successful.

Return type:

bool

Report missing back links.

This will be called from finish() if needed. updatedSites is a list that contains all sites that are changed, to avoid reporting of missing backlinks for already fixed pages.

Return type:

None

reportInterwikilessPage(page)[source]#

Report interwikiless page.

Return type:

None

skipPage(page, target, counter)[source]#

Return whether page has to be skipped.

translate(hints=None, keephintedsites=False)[source]#

Add the given translation hints to the todo collection.

Parameters:

keephintedsites (bool)

Return type:

None

whatsNextPageBatch(site)[source]#

Return the next page batch.

By calling this method, you ‘promise’ this instance that you will preload all the site Pages that are in the todo collection.

Returns:

This routine will return a list of pages that can be treated.

Return type:

list[Page]

whereReport(page, indent=4)[source]#

Report found interlanguage links with conflicts.

Parameters:

indent (int)

Return type:

None

scripts.interwiki.botMayEdit(page)[source]#

Test for allowed edits.

Return type:

bool

scripts.interwiki.compareLanguages(old, new, insite, summary)[source]#

Compare changes and setup i18n message.

scripts.interwiki.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

scripts.interwiki.page_empty_check(page)[source]#

Return True if page should be skipped as it is almost empty.

Pages in content namespaces are considered empty if they contain fewer than 50 characters, and other pages are considered empty if they are not category pages and contain fewer than 4 characters excluding interlanguage links and categories.

Return type:

bool
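The rule described for page_empty_check can be written down as a small decision function. This is a sketch that takes plain values instead of a page object: looks_empty is a hypothetical name, and the caller is assumed to supply stripped_text, the page text with interlanguage links and categories already removed.

```python
def looks_empty(text: str, stripped_text: str,
                in_content_namespace: bool, is_category: bool) -> bool:
    """Apply the thresholds described for page_empty_check.

    text is the full page text; stripped_text is the same text without
    interlanguage links and categories (prepared by the caller).
    """
    if in_content_namespace:
        return len(text) < 50
    if not is_category:
        return len(stripped_text) < 4
    # Category pages outside content namespaces are never treated as empty.
    return False


short_article = looks_empty('Short stub.', 'Short stub.', True, False)
links_only = looks_empty('[[de:Foo]]', '', False, False)
empty_category = looks_empty('', '', False, True)
```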

interwikidata script#

Script to handle interwiki links based on Wikibase

This script connects pages to Wikibase items using language links on the page. If multiple language links are present, and they are connected to different items, the bot skips. After connecting the page to an item, language links can be removed from the page.

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-always

If used, the bot won’t ask if it should add the specified text.

-clean

Clean pages.

-create

Create items.

-merge

Merge items.

-summary:

(str) Use your own edit summary for cleaning the page.

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.
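As a rough sketch of what such a settings file might contain, the following reads an INI fragment with the standard library configparser module. The section name and the option values are assumptions made for illustration; only the option names are taken from update_options below.

```python
import configparser

# Hypothetical settings content; the section name is an assumption.
SETTINGS = """
[interwikidata]
clean = true
create = false
summary = Bot: removing old interwiki links
"""

parser = configparser.ConfigParser()
parser.read_string(SETTINGS)
section = parser['interwikidata']

# Boolean options are converted explicitly, strings are taken as-is.
options = {
    'clean': section.getboolean('clean'),
    'create': section.getboolean('create'),
    'summary': section.get('summary'),
}
print(options)
```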

class scripts.interwikidata.IWBot(**kwargs)[source]#

Bases: ConfigParserBot, ExistingPageBot, SingleSiteBot

The bot for interwiki.

Changed in version 7.0: IWBot is a ConfigParserBot

Initialize the bot.

clean_page()[source]#

Clean interwiki links from the page.

Return type:

None

create_item()[source]#

Create item in repo for current_page.

Return type:

ItemPage

get_items()[source]#

Return all items of pages linked through the interwiki.

Return type:

set[ItemPage]

handle_complicated()[source]#

Handle pages when they have interwiki conflict.

When this method returns True, the conflict has been resolved and it is okay to clean old interwiki links. This method should change self.current_item and fix conflicts. Override it in subclasses.

Return type:

bool

treat_page()[source]#

Check page.

Return type:

None

try_to_add()[source]#

Add current page in repo.

Return type:

ItemPage | bool | None

try_to_merge(item)[source]#

Merge two items.

Return type:

ItemPage | bool | None

update_options: dict[str, Any] = {'clean': False, 'create': False, 'ignore_ns': False, 'merge': False, 'summary': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.interwikidata.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

listpages script#

Print a list of pages, as defined by page generator parameters

Optionally, it also prints page content to STDOUT or saves it to a file in the current directory.

These parameters are supported to specify which page titles to print:

-format

Defines the output format.

Can be a custom string according to Python string.format() notation or can be selected by a number from the following list (1 is the default format):

1 - ‘{num:4d} {page.title}’

–> 10 PageTitle

2 - ‘{num:4d} [[{page.title}]]’

–> 10 [[PageTitle]]

3 - ‘{page.title}’

–> PageTitle

4 - ‘[[{page.title}]]’

–> [[PageTitle]]

5 - ‘{num:4d} <<lightred>>{page.loc_title:<40}<<default>>’

–> 10 localised_Namespace:PageTitle (colorised in lightred)

6 - ‘{num:4d} {page.loc_title:<40} {page.can_title:<40}’

–> 10 localised_Namespace:PageTitle canonical_Namespace:PageTitle

7 - ‘{num:4d} {page.loc_title:<40} {page.trs_title:<40}’

–> 10 localised_Namespace:PageTitle outputlang_Namespace:PageTitle

Format 7 requires “-outputlang:lang” to be set.

num is the sequential number of the listed page.

An empty format is equal to -notitle and just shows the total amount of pages.
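The format strings are ordinary Python format strings, so their effect can be tried out without a wiki. In this sketch a SimpleNamespace stands in for the object passed as page; the attribute values are made up.

```python
from types import SimpleNamespace

# Stand-in for the formatter object passed as ``page``; values are made up.
page = SimpleNamespace(
    title='PageTitle',
    loc_title='Kategorie:PageTitle',
    can_title='Category:PageTitle',
)

fmt_1 = '{num:4d} {page.title}'
fmt_6 = '{num:4d} {page.loc_title:<40} {page.can_title:<40}'

# ``4d`` right-aligns the number in 4 columns, ``<40`` left-aligns and pads.
line_1 = fmt_1.format(num=10, page=page)
line_6 = fmt_6.format(num=10, page=page)
print(repr(line_1))
print(repr(line_6))
```

Note that the number is padded to four columns, so the printed line starts with two spaces for num=10.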

-outputlang

Language for translation of namespaces.

-notitle

Page title is not printed.

-get

Page content is printed.

-tofile

Save Page titles to a single file. File name can be set with -tofile:filename or -tofile:dir_name/filename.

-save

Save Page content to a file named as page.title(as_filename=True). Directory can be set with -save:dir_name. If no dir is specified, current directory will be used.

-encode

File encoding can be specified with ‘-encode:name’ (name must be a valid python encoding: utf-8, etc.). If not specified, it defaults to config.textfile_encoding.

-put:

(str) Save the list to the defined page of the wiki. By default it does not overwrite an existing page.

-overwrite

Overwrite the page if it exists. Can only be applied with -put.

-summary:

(str) The summary text when the page is written. If it’s one word just containing letters, dashes and underscores it uses that as a translation key.
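The rule for -summary can be read as a simple pattern check. The helper below is a hypothetical sketch that follows the wording above literally; the check actually used by the script may differ.

```python
import re


def looks_like_translation_key(summary):
    """Return True for a single word of letters, dashes and underscores."""
    return re.fullmatch(r'[A-Za-z_-]+', summary) is not None


print(looks_like_translation_key('listpages-save-list'))
print(looks_like_translation_key('Updating the list'))
```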

Custom format can be applied to the following items extrapolated from a page object:

site

Obtained from page._link._site.

title

Obtained from page._link._title.

loc_title

Obtained from page._link.canonical_title().

can_title

Obtained from page._link.ns_title(). Based either on the canonical namespace name or on the namespace name in the language specified by the -trans param; a default value ****** will be used if no ns is found.

onsite

Obtained from pywikibot.Site(outputlang, self.site.family).

trs_title

Obtained from page._link.ns_title(onsite=onsite). If the selected format requires trs_title, -outputlang must be set.

This script supports use of pagegenerators arguments.

class scripts.listpages.Formatter(page, outputlang=None, default='******')[source]#

Bases: object

Structure with Page attributes exposed for formatting from cmd line.

Parameters:
  • page (Page object.) – the page to be formatted.

  • outputlang (str or None, if no translation is wanted.) –

    language code in which namespace before title should be translated.

    Page ns will be searched in Site(outputlang, page.site.family) and, if found, its custom name will be used in page.title().

  • default (str) – default string to be used if no corresponding namespace is found when outputlang is not None.

fmt_need_lang = ['7']#
fmt_options = {'1': '{num:4d} {page.title}', '2': '{num:4d} [[{page.title}]]', '3': '{page.title}', '4': '[[{page.title}]]', '5': '{num:4d} <<lightred>>{page.loc_title:<40}<<default>>', '6': '{num:4d} {page.loc_title:<40} {page.can_title:<40}', '7': '{num:4d} {page.loc_title:<40} {page.trs_title:<40}'}#
output(num=None, fmt='1')[source]#

Output formatted string.

Parameters:

fmt (str)

Return type:

str

class scripts.listpages.ListPagesBot(site=True, **kwargs)[source]#

Bases: AutomaticTWSummaryBot, SingleSiteBot

Print a list of pages.

Create a SingleSiteBot instance.

Parameters:
  • site (BaseSite | bool | None) – If True it’ll be set to the configured site using pywikibot.Site.

  • kwargs (Any)

available_options: dict[str, Any] = {'always': True, 'encode': 'utf-8', 'format': '1', 'get': False, 'notitle': False, 'outputlang': None, 'overwrite': False, 'preloading': None, 'put': None, 'save': None, 'summary': '', 'tofile': None}#

Handler configuration attribute. Only the keys of the dict can be passed as __init__ options. The values are the default values. Overwrite this in subclasses!

setup()[source]#

Initialize output_list and num and adjust base directory.

Return type:

None

summary_key: str | None = 'listpages-save-list'#

Must be defined in subclasses.

teardown()[source]#

Print list, if selected put it to wiki page or save it to a file.

Return type:

None

treat(page)[source]#

Process one page and add it to the output_list.

Return type:

None

scripts.listpages.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

misspelling script#

This script fixes links that contain common spelling mistakes

This is only possible on wikis that have a template for these misspellings.

Command line options:

-always:XY

instead of asking the user what to do, always perform the same action. For example, XY can be “r0”, “u” or “2”. Be careful with this option, and check the changes made by the bot. Note that some choices for XY don’t make sense and will result in a loop, e.g. “l” or “m”.

-main

only check pages in the main namespace, not in the Talk, Project, User, etc. namespaces.

-start:XY

goes through all misspellings in the category on your wiki that is defined (to the bot) as the category containing misspelling pages, starting at XY. If the -start argument is not given, it starts at the beginning.

class scripts.misspelling.MisspellingRobot(*args, **kwargs)[source]#

Bases: DisambiguationRobot

Spelling bot.

findAlternatives(page)[source]#

Append link target to a list of alternative links.

Overrides the BaseDisambigBot method.

Returns:

True if alternate link was appended

Return type:

bool

property generator: Generator[Page]#

Generator to retrieve misspelling pages or misspelling redirects.

misspelling_categories = ('Q8644265', 'Q9195708')#
misspelling_templates = {'wikipedia:de': ('Falschschreibung', 'Obsolete Schreibung')}#
setSummaryMessage(page, *args, **kwargs)[source]#

Setup the summary message.

Overrides the BaseDisambigBot method.

Return type:

None

update_options: dict[str, Any] = {'start': None}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.misspelling.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

movepages script#

This script can move pages

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-from

The page to move from.

-to

The page to move to.

-noredirect

Leave no redirect behind.

-notalkpage

Do not move this page’s talk page (if it exists)

-nosubpages

Do not move subpages

-prefix

Move pages by adding a namespace prefix to the names of the pages. (Will remove the old namespace prefix if any) Argument can also be given as -prefix:namespace:.

-always

Don’t prompt to make changes, just do them.

-skipredirects

Skip redirect pages (Warning: increases server load)

-summary

(str) Prompt for a custom summary, bypassing the predefined message texts. Argument can also be given as -summary:XYZ.

-pairsfile

Read pairs of page titles from a file. The file must be in the format [[frompage]] [[topage]] [[frompage]] [[topage]] … Argument can also be given as -pairsfile:filename.
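The file format expected by -pairsfile can be illustrated with a small parser. This is a sketch only: read_pairs is a hypothetical helper and not the way the script reads the file.

```python
import re


def read_pairs(text):
    """Return (frompage, topage) tuples from [[...]] [[...]] text."""
    titles = re.findall(r'\[\[(.+?)\]\]', text)
    if len(titles) % 2:
        raise ValueError('odd number of titles; every page needs a target')
    # Even positions are sources, odd positions are targets.
    return list(zip(titles[0::2], titles[1::2]))


sample = '[[Old name]] [[New name]]\n[[Foo]] [[Bar]]'
pairs = read_pairs(sample)
print(pairs)
```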

class scripts.movepages.MovePagesBot(**kwargs)[source]#

Bases: CurrentPageBot

Page move bot.

Changed in version 7.2: movesubpages option was added

move_one(page, new_page_tite)[source]#

Move one page to new_page_tite.

Return type:

None

skip_page(page)[source]#

Treat only non-redirect pages if ‘skipredirects’ is set.

treat_page()[source]#

Treat a single page.

Return type:

None

update_options: dict[str, Any] = {'movesubpages': True, 'movetalkpage': True, 'noredirect': False, 'prefix': '', 'skipredirects': False, 'summary': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.movepages.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

newitem script#

This script creates new items on Wikidata based on certain criteria

  • When was the (Wikipedia) page created?

  • When was the last edit on the page?

  • Does the page contain interwikis?

This script understands various command-line arguments:

-lastedit

The minimum number of days that has passed since the page was last edited.

-pageage

The minimum number of days that has passed since the page was created.

-touch

Do a null edit on every page which has a Wikibase item. Be careful, this option can trigger edit rate limits or captchas if your account is not autoconfirmed.
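One plausible reading of -pageage and -lastedit is the following age check. The function name and the way timestamps are passed in are assumptions; the default values are those listed in update_options below.

```python
from datetime import datetime, timedelta


def old_enough(created, last_edited, now, pageage=21, lastedit=7):
    """Return True if both minimum ages (in days) have been reached."""
    return (now - created >= timedelta(days=pageage)
            and now - last_edited >= timedelta(days=lastedit))


now = datetime(2024, 1, 31)
settled = old_enough(datetime(2024, 1, 1), datetime(2024, 1, 20), now)
recently_edited = old_enough(datetime(2024, 1, 1), datetime(2024, 1, 28), now)
print(settled, recently_edited)
```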

class scripts.newitem.NewItemRobot(**kwargs)[source]#

Bases: WikidataBot

A bot to create new items.

Only accepts options defined in available_options.

get_skipping_templates(site)[source]#

Get templates which cause the page to be skipped.

If the script is used for multiple sites, hold the skipping templates as attribute.

Return type:

set[Page]

setup()[source]#

Setup ages.

Return type:

None

skip_page(page)[source]#

Skip pages which are unwanted to treat.

Return type:

bool

skip_templates(page)[source]#

Check whether the page is to be skipped due to skipping template.

Parameters:

page (Page) – treated page

Returns:

the template which causes the page to be skipped

Return type:

str

treat_missing_item = True#
treat_page_and_item(page, item)[source]#

Treat page/item.

Return type:

None

update_options: dict[str, Any] = {'always': True, 'lastedit': 7, 'pageage': 21, 'touch': 'newly'}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

use_redirect = False#
scripts.newitem.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

noreferences script#

This script adds a missing references section to pages

It goes over multiple pages, searches for pages where <references /> is missing although a <ref> tag is present, and in that case adds a new references section.
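The basic condition can be sketched with two regular expressions. This is a deliberately simplified illustration: it ignores references templates, comments and nowiki sections, and needs_references_section is a hypothetical name.

```python
import re

# ``<ref>`` or ``<ref name=...>``, but not ``<references``.
REF_OPEN = re.compile(r'<ref[\s>]', re.IGNORECASE)
# ``<references />``, ``<references/>`` or an opening ``<references ...>``.
REFERENCES_TAG = re.compile(r'<references\b[^>]*/?>', re.IGNORECASE)


def needs_references_section(text):
    """Return True if a ref tag is present but no references tag."""
    has_ref = REF_OPEN.search(text) is not None
    has_list = REFERENCES_TAG.search(text) is not None
    return has_ref and not has_list


print(needs_references_section('Claim.<ref>Source</ref>'))
```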

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-xml

Retrieve information from a local XML dump (pages-articles or pages-meta-current, see https://dumps.wikimedia.org). Argument can also be given as “-xml:filename”.

-always

Don’t prompt you for each replacement.

-quiet

Use this option to get less output

If neither a page title nor a page generator is given, it takes all pages from the default maintenance category.

It is strongly recommended not to run this script over the entire article namespace (using the -start parameter), as that would consume too much bandwidth. Instead, use the -xml parameter, or use another way to generate a list of affected articles.

class scripts.noreferences.NoReferencesBot(**kwargs)[source]#

Bases: AutomaticTWSummaryBot, SingleSiteBot, ExistingPageBot

References section bot.

addReferences(oldText)[source]#

Add a references tag to an existing section where it fits.

If there is no such section, creates a new section containing the references tag. Also repair malformed references tags. Set the edit summary accordingly.

Parameters:

oldText (str) – page text to be modified

Returns:

The modified pagetext

Return type:

str

createReferenceSection(oldText, index, ident='==')[source]#

Create a reference section and insert it into the given text.

Changed in version 9.1: raise exceptions.TranslationError if script is not localized for the current site.

Parameters:
  • oldText (str) – page text that is going to be amended

  • index (int) – the index of oldText where the reference section should be inserted at

  • ident (str) – symbols to be inserted before and after reference section title

Returns:

the amended page text with reference section added

Raises:

TranslationError – script is not localized for the current site

Return type:

str

lacksReferences(text)[source]#

Check whether or not the page is lacking a references tag.

Return type:

bool

skip_page(page)[source]#

Check whether the page could be processed.

treat_page()[source]#

Run the bot.

Changed in version 9.1: print error message and close bot.BaseBot.generator if exceptions.TranslationError was raised.

Return type:

None

use_disambigs: bool | None = False#

Attribute to determine whether to use disambiguation pages. Set it to True to use disambigs only, set it to False to skip disambigs. If None both are processed.

Added in version 7.2.

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.

scripts.noreferences.PLACE_AFTER_SECTIONS: dict[str, list[str]] = {'simple': ['Notes']}#

References sections can also be placed after a given section. This dictionary defines these sections, sorted by priority. For example, on Simple wiki, the script would place the “References” section after the “Notes” section, if that existed. PLACE_AFTER_SECTIONS is prioritized over the placement defined by “placeBeforeSections”.

Attention

not implemented yet.

scripts.noreferences.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

scripts.noreferences.maintenance_category: str = 'Q6483427'#

The maintenance category to retrieve pages for processing

scripts.noreferences.noTitleRequired: list[str] = ['be', 'szl']#

Sites where no title is required for references template as it is already included there

scripts.noreferences.placeBeforeSections: dict[str, list[str]] = {'ar': ['وصلات خارجية', 'انظر أيضا', 'ملاحظات'], 'arz': ['لينكات برانيه', 'لينكات', 'شوف كمان'], 'ca': ['Bibliografia', 'Bibliografia complementària', 'Vegeu també', 'Enllaços externs', 'Enllaços'], 'ckb': ['خوێندنەوەی زیاتر', 'بەستەرە دەرەکییەکان', 'ئەمانەش ببینە', 'تێبینییەکان'], 'cs': ['Externí odkazy', 'Poznámky'], 'da': ['Eksterne links'], 'de': ['Literatur', 'Weblinks', 'Siehe auch', 'Weblink'], 'dsb': ['Nožki'], 'en': ['Further reading', 'External links', 'See also', 'Notes'], 'eo': ['Eksteraj ligiloj', 'Ekstera ligilo', 'Eksteraj ligoj', 'Ekstera ligo', 'Rete'], 'es': ['Enlaces externos', 'Véase también', 'Notas'], 'fa': ['پیوند به بیرون', 'پانویس', 'جستارهای وابسته'], 'fi': ['Kirjallisuutta', 'Aiheesta muualla', 'Ulkoiset linkit', 'Linkkejä'], 'fr': ['Liens externes', 'Lien externe', 'Voir aussi', 'Notes'], 'he': ['ראו גם', 'לקריאה נוספת', 'קישורים חיצוניים', 'הערות שוליים'], 'hsb': ['Nóžki'], 'hu': ['Külső hivatkozások', 'Lásd még'], 'it': ['Bibliografia', 'Voci correlate', 'Altri progetti', 'Collegamenti esterni', 'Vedi anche'], 'ja': ['関連項目', '参考文献', '外部リンク'], 'ko': ['외부 링크', '외부링크', '바깥 고리', '바깥고리', '바깥 링크', '바깥링크외부 고리', '외부고리'], 'lt': ['Nuorodos'], 'nl': ['Literatuur', 'Zie ook', 'Externe verwijzingen', 'Externe verwijzing'], 'pdc': ['Beweisunge', 'Quelle unn Literatur', 'Gwelle', 'Gwuelle', 'Auswenniche Gleecher', 'Gewebbgleecher', 'Guckt mol aa', 'Seh aa'], 'pl': ['Źródła', 'Bibliografia', 'Zobacz też', 'Linki zewnętrzne'], 'pt': ['Ligações externas', 'Veja também', 'Ver também', 'Notas'], 'ru': ['Ссылки', 'Литература'], 'sd': ['وڌيڪ ڏسو', 'حوالا', 'خارجي ڳنڌڻا'], 'simple': ['Other websites', 'Sources'], 'sk': ['Pozri aj'], 'sr': ['Даље читање', 'Спољашње везе', 'Види још', 'Напомене', 'Литература'], 'szl': ['Przipisy', 'Připisy'], 'th': ['อ่านเพิ่มเติม', 'แหล่งข้อมูลอื่น', 'ดูเพิ่ม', 'หมายเหตุ'], 'ur': ['مزید دیکھیے', 'حوالہ جات', 'بیرونی روابط'], 'zh': ['外部链接', '外部連结', '外部連結', 
'外部连接']}#

References sections are usually placed before further reading / external link sections. This dictionary defines these sections, sorted by priority. For example, on an English wiki, the script would place the “References” section in front of the “Further reading” section, if that existed. Otherwise, it would try to put it in front of the “External links” section, or if that fails, the “See also” section, etc.
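The priority order can be sketched as a loop over the section titles that returns the position of the first heading found. The helper name, the heading pattern and the sample text are assumptions; the title list is the English entry from the dictionary above.

```python
import re

PLACE_BEFORE = ['Further reading', 'External links', 'See also', 'Notes']


def find_insert_index(text, sections=PLACE_BEFORE):
    """Return the index of the highest-priority heading, or None."""
    for title in sections:
        pattern = r'^=+ *' + re.escape(title) + r' *=+ *$'
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            return match.start()
    return None


text = 'Intro\n\n== See also ==\n* A\n\n== External links ==\n* B\n'
index = find_insert_index(text)
print(text[index:].splitlines()[0])
```

In this sample "External links" wins over "See also" because it comes earlier in the priority list, even though it appears later in the text.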

scripts.noreferences.referencesSections: dict[str, dict[str, list[str]]] = {'wikipedia': {'ar': ['مراجع', 'المراجع', 'مصادر', 'المصادر', 'مراجع ومصادر', 'مصادر ومراجع', 'المراجع والمصادر', 'المصادر والمراجع'], 'ary': ['لمصادر', 'مصادر'], 'arz': ['مراجع', 'المراجع', 'مصادر', 'المصادر'], 'ca': ['Referències'], 'ckb': ['سەرچاوەکان'], 'cs': ['Reference', 'Poznámky'], 'da': ['Noter'], 'de': ['Einzelnachweise', 'Anmerkungen', 'Belege', 'Endnoten', 'Fußnoten', 'Fuß-/Endnoten', 'Quellen', 'Quellenangaben'], 'dsb': ['Nožki'], 'en': ['References', 'Footnotes', 'Notes'], 'eo': ['Referencoj'], 'es': ['Referencias', 'Notas'], 'fa': ['منابع', 'منبع'], 'fi': ['Lähteet', 'Viitteet'], 'fr': ['Notes et références', 'Notes? et r[ée]f[ée]rences?', 'R[ée]f[ée]rences?', 'Notes?', 'Sources?'], 'he': ['הערות שוליים'], 'hsb': ['Nóžki'], 'hu': ['Források és jegyzetek', 'Források', 'Jegyzetek', 'Hivatkozások', 'Megjegyzések'], 'is': ['Heimildir', 'Tilvísanir'], 'it': ['Note', 'Riferimenti'], 'ja': ['脚注', '脚注欄', '脚注・出典', '出典', '注釈', '註'], 'ko': ['주석', '각주주석 참고 자료주석 참고자료', '주석 참고 출처'], 'lt': ['Šaltiniai', 'Literatūra'], 'nl': ['Voetnoten', 'Voetnoot', 'Referenties', 'Noten', 'Bronvermelding'], 'pdc': ['Aamarrickunge'], 'pl': ['Przypisy', 'Uwagi'], 'pt': ['Referências'], 'ru': ['Примечания', 'Сноски', 'Источники'], 'sd': ['حوالا'], 'simple': ['References'], 'sk': ['Referencie'], 'sr': ['Референце', 'Извори'], 'szl': ['Przipisy', 'Připisy'], 'th': ['อ้างอิง', 'เชิงอรรถ', 'หมายเหตุ'], 'ur': ['حوالہ جات', 'حوالہ'], 'zh': ['參考資料', '参考资料', '參考文獻', '参考文献', '資料來源', '资料来源']}, 'wiktionary': {'ar': ['مراجع', 'المراجع', 'مصادر', 'المصادر', 'مراجع ومصادر', 'مصادر ومراجع', 'المراجع والمصادر', 'المصادر والمراجع'], 'ary': ['لمصادر', 'مصادر'], 'arz': ['مراجع', 'المراجع', 'مصادر', 'المصادر'], 'ca': ['Referències'], 'ckb': ['سەرچاوەکان'], 'cs': ['poznámky', 'reference'], 'da': ['Noter'], 'de': ['Einzelnachweise', 'Anmerkungen', 'Belege', 'Endnoten', 'Fußnoten', 'Fuß-/Endnoten', 'Quellen', 'Quellenangaben'], 
'dsb': ['Nožki'], 'en': ['References', 'Footnotes', 'Notes'], 'eo': ['Referencoj'], 'es': ['Referencias', 'Notas'], 'fa': ['منابع', 'منبع'], 'fi': ['Lähteet', 'Viitteet'], 'fr': ['Notes et références', 'Notes? et r[ée]f[ée]rences?', 'R[ée]f[ée]rences?', 'Notes?', 'Sources?'], 'he': ['הערות שוליים'], 'hsb': ['Nóžki'], 'hu': ['Források és jegyzetek', 'Források', 'Jegyzetek', 'Hivatkozások', 'Megjegyzések'], 'is': ['Heimildir', 'Tilvísanir'], 'it': ['Note', 'Riferimenti'], 'ja': ['脚注', '脚注欄', '脚注・出典', '出典', '注釈', '註'], 'ko': ['주석', '각주주석 참고 자료주석 참고자료', '주석 참고 출처'], 'lt': ['Šaltiniai', 'Literatūra'], 'nl': ['Voetnoten', 'Voetnoot', 'Referenties', 'Noten', 'Bronvermelding'], 'pdc': ['Aamarrickunge'], 'pl': ['Przypisy', 'Uwagi'], 'pt': ['Referências'], 'ru': ['Примечания', 'Сноски', 'Источники'], 'sd': ['حوالا'], 'simple': ['References'], 'sk': ['Referencie'], 'sr': ['Референце', 'Извори'], 'szl': ['Przipisy', 'Připisy'], 'th': ['อ้างอิง', 'เชิงอรรถ', 'หมายเหตุ'], 'ur': ['حوالہ جات', 'حوالہ'], 'zh': ['參考資料', '参考资料', '參考文獻', '参考文献', '資料來源', '资料来源']}}#

Titles of sections into which a references tag would fit. The first title should be the preferred one: it is the one that will be used when a new section has to be created. Section titles can be regex patterns, except for the first.
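Because later entries may be regex patterns, a heading can be tested against the whole list with re.fullmatch. The sketch below uses the French entry from the dictionary above; is_references_title is a hypothetical helper and does not reproduce how the script matches headings.

```python
import re

FR_TITLES = [
    'Notes et références',
    'Notes? et r[ée]f[ée]rences?',
    'R[ée]f[ée]rences?',
    'Notes?',
    'Sources?',
]


def is_references_title(heading, titles=FR_TITLES):
    """Return True if the heading matches any title pattern completely."""
    return any(re.fullmatch(pattern, heading) for pattern in titles)


# The first entry is the literal title used for newly created sections.
preferred_title = FR_TITLES[0]
print(is_references_title('Référence'), is_references_title('Voir aussi'))
```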

scripts.noreferences.referencesSubstitute: dict[str, dict[str, list[str]]] = {'wikipedia': {'ar': '{{مراجع}}', 'ary': '{{مراجع}}', 'arz': '{{مصادر}}', 'be': '{{зноскі}}', 'ckb': '{{سەرچاوەکان}}', 'da': '{{reflist}}', 'dsb': '{{referency}}', 'fa': '{{پانویس}}', 'fi': '{{viitteet}}', 'fr': '{{références}}', 'he': '{{הערות שוליים}}', 'hsb': '{{referency}}', 'hu': '{{Források}}', 'pl': '{{Przypisy}}', 'ru': '{{примечания}}', 'sd': '{{حوالا}}', 'simple': '{{reflist}}', 'sr': '{{reflist}}', 'szl': '{{Przipisy}}', 'th': '{{รายการอ้างอิง}}', 'ur': '{{حوالہ جات}}', 'zh': '{{reflist}}'}}#

Text to be added instead of the <references /> tag. Define this only if required by your wiki.

scripts.noreferences.referencesTemplates: dict[str, dict[str, list[str]]] = {'wikipedia': {'ar': ['مراجع', 'المراجع', 'ثبت المراجع', 'ثبت المصادر', 'قائمة مصادر', 'Reflist'], 'ary': ['مراجع', 'المراجع', 'المصادر', 'Reflist', 'Refs'], 'arz': ['مصادر', 'مراجع', 'المراجع', 'ثبت المراجع', 'Reflist', 'Refs'], 'be': ['Зноскі', 'Примечания', 'Reflist', 'Спіс заўваг', 'Заўвагі'], 'be-tarask': ['Зноскі'], 'ca': ['Referències', 'Reflist', 'Listaref', 'Referència', 'Referencies', 'Referències2', 'Amaga', 'Amaga ref', 'Amaga Ref', 'Amaga Ref2', 'Apèndix'], 'ckb': ['Reflist', 'Refs', 'Reference', 'ژێدەرەکان', 'سەرچاوەکان', 'پەراوێز', 'پەراوێزەکان', 'پەڕاوێزەکان'], 'da': ['Reflist'], 'dsb': ['Referency'], 'en': ['Reflist', 'Refs', 'FootnotesSmall', 'Reference', 'Ref-list', 'Reference list', 'References-small', 'Reflink', 'Footnotes', 'FootnotesSmall'], 'eo': ['Referencoj'], 'es': ['Listaref', 'Reflist', 'muchasref'], 'fa': ['Reflist', 'Refs', 'FootnotesSmall', 'Reference', 'پانویس', 'پانویس\u200cها ', 'پانویس ۲', 'پانویس۲', 'فهرست منابع'], 'fi': ['Viitteet', 'Reflist'], 'fr': ['Références', 'Notes', 'References', 'Reflist'], 'he': ['הערות שוליים', 'הערה'], 'hsb': ['Referency'], 'hu': ['reflist', 'források', 'references', 'megjegyzések'], 'is': ['reflist'], 'it': ['References'], 'ja': ['Reflist', '脚注リスト'], 'ko': ['주석', 'Reflist'], 'lt': ['Reflist', 'Ref', 'Litref'], 'nl': ['Reflist', 'Refs', 'FootnotesSmall', 'Reference', 'Ref-list', 'Reference list', 'References-small', 'Reflink', 'Referenties', 'Bron', 'Bronnen/noten/referenties', 'Bron2', 'Bron3', 'ref', 'references', 'appendix', 'Noot', 'FootnotesSmall'], 'pl': ['Przypisy', 'Przypisy-lista', 'Uwagi'], 'pt': ['Notas', 'ref-section', 'Referências', 'Reflist'], 'ru': ['Reflist', 'Примечания', 'Список примечаний', 'Сноски'], 'sd': ['Reflist', 'Refs', 'Reference', 'حوالا'], 'simple': ['Reflist'], 'sr': ['Reflist', 'Референце', 'Извори', 'Рефлист'], 'szl': ['Przipisy', 'Připisy'], 'th': ['รายการอ้างอิง'], 'ur': ['Reflist', 'Refs', 
'Reference', 'حوالہ جات', 'حوالے'], 'zh': ['Reflist', 'RefFoot', 'NoteFoot']}}#

Templates which include a <references /> tag. If there is no such template on your wiki, you don’t have to enter anything here.

nowcommons script#

Script to delete files that are also present on Wikimedia Commons

Do not run this script on Wikimedia Commons itself. It works based on a given array of templates defined below.

Files are downloaded and compared. If the files match, the local file can be deleted on the source wiki. If multiple versions of the file exist, the script will not delete it. If the SHA1 hashes differ, the script will not delete it.
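A SHA1 comparison of two files can be sketched with the standard library hashlib module. This only illustrates the idea on byte strings; same_content is a hypothetical helper, and how the script obtains the hashes is not shown here.

```python
import hashlib


def same_content(local_bytes, commons_bytes):
    """Return True if both byte strings have the same SHA1 digest."""
    local_hash = hashlib.sha1(local_bytes).hexdigest()
    commons_hash = hashlib.sha1(commons_bytes).hexdigest()
    return local_hash == commons_hash


print(same_content(b'image data', b'image data'))
print(same_content(b'image data', b'other data'))
```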

Sysop rights on the local wiki are required if you want all features of this script to work properly.

This script understands various command-line arguments:

-always

run automatically, do not ask any questions. All files that qualify for deletion are deleted. Reduced screen output.

-replace

replace links if the files are equal and the file names differ

-replacealways

replace links if the files are equal and the file names differ without asking for confirmation

-replaceloose

Do loose replacements. This will replace all occurrences of the name of the file (and not just explicit file syntax). This should work to catch all instances of the file, including where it is used as a template parameter or in galleries. However, it can also make more mistakes.

-replaceonly

Use this if you do not have local sysop rights, but do wish to replace links from the NowCommons template.

Example

python pwb.py nowcommons -replaceonly -replaceloose -replacealways -replace

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.

class scripts.nowcommons.NowCommonsDeleteBot(**kwargs)[source]#

Bases: CurrentPageBot, ConfigParserBot

Bot to delete migrated files.

Changed in version 7.0: NowCommonsDeleteBot is a ConfigParserBot

find_file_on_commons(local_file_page)[source]#

Find filename on Commons.

property generator#

Generator method.

init_page(item)[source]#

Ensure that generator retrieves FilePage objects.

Parameters:

item (Page)

Return type:

FilePage

property nc_templates#

A set of now commons template Page instances.

nc_templates_list()[source]#

Return nowcommons templates.

skip_page(page)[source]#

Skip shared files.

Return type:

bool

teardown()[source]#

Show a message if no files were found.

treat_page()[source]#

Treat a single page.

Return type:

None

update_options: dict[str, Any] = {'replace': False, 'replacealways': False, 'replaceloose': False, 'replaceonly': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.nowcommons.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

pagefromfile script#

Bot to upload pages from a text file

This bot takes its input from the UTF-8 text file that contains a number of pages to be put on the wiki. The pages should all have the same beginning and ending text (which may not overlap). The beginning and ending text is not uploaded with the page content by default.

By default, the page name is taken from the first text block in the page content that is marked in bold (wrapped between ''' and '''). If you expect the page title not to be present in the text or to be marked by different markers, use the -titlestart, -titleend, and -notitle parameters.
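The input format can be illustrated with a small parser that uses the default markers {{-start-}}, {{-stop-}} and '''. This is a sketch under those assumptions; split_pages is a hypothetical helper and not the reader class documented below.

```python
import re

BEGIN, END = '{{-start-}}', '{{-stop-}}'
TITLE_START = TITLE_END = "'''"


def split_pages(text):
    """Yield (title, content) for every block between the markers."""
    page_re = re.compile(
        re.escape(BEGIN) + r'(.*?)' + re.escape(END), re.DOTALL)
    title_re = re.compile(
        re.escape(TITLE_START) + r'(.*?)' + re.escape(TITLE_END))
    for block in page_re.findall(text):
        content = block.strip()
        # The first bold text block is taken as the page title.
        title = title_re.search(content)
        if title is None:
            raise ValueError('no title found in block')
        yield title.group(1), content


sample = (
    "{{-start-}}\n'''Foo''' is a placeholder name.\n{{-stop-}}\n"
    "{{-start-}}\n'''Bar''' is another one.\n{{-stop-}}\n"
)
pages = list(split_pages(sample))
print(pages)
```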

Specific arguments:

-file:xxx

The filename we are getting our material from, the default value is “dict.txt”

-begin:xxx

The text that marks the beginning of a page, the default value is “{{-start-}}”

-end:xxx

The text that marks the end of the page, the default value is “{{-stop-}}”

-include

Include the beginning and end markers in the page

-textonly

Text is given without markers. Only one page text is given. -begin and -end options are ignored.

-titlestart:xxx

The text used in place of ''' for identifying the beginning of a page title

-titleend:xxx

The text used in place of ''' for identifying the end of the page title

-notitle

Do not include the page title, including titlestart and titleend, in the page. Can be used to specify a unique page title above the page content

-title:xxx

The page title is given directly. Ignores -titlestart, -titleend and -notitle options

-nocontent:xxx

If the existing page contains the specified statement, the page is skipped

-noredirect

Do not upload on redirect pages

-summary:xxx

The text used as an edit summary for the upload. If the page exists, standard messages for prepending, appending, or replacement are appended after it.

-autosummary

Use MediaWiki’s autosummary when creating a new page, overrides -summary

-minor

Set the minor edit flag on page edits

-showdiff

Show difference between current page and page to upload, also forces the bot to ask for confirmation on every edit.

If the page to be uploaded already exists, it is skipped by default. But you can override this behavior if you want to:

-appendtop

Add the text to the top of the existing page

-appendbottom

Add the text to the bottom of the existing page

-force

Overwrite the existing page

It is possible to define a separator after the ‘append’ modes which is added between the existing and the new text. For example, a parameter -appendtop:foo would add ‘foo’ between them. A new line can be added between them by specifying ‘\n’ as a value.
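The three modes and the separator can be summarised in a few lines. The function below is a hypothetical sketch of the text handling only, operating on plain strings rather than pages.

```python
def merge_text(existing, new, mode, separator=''):
    """Combine existing and new page text according to the mode."""
    if mode == 'appendtop':
        return new + separator + existing
    if mode == 'appendbottom':
        return existing + separator + new
    if mode == 'force':
        # Overwrite: the existing text is discarded.
        return new
    raise ValueError(f'unknown mode: {mode}')


print(repr(merge_text('old', 'new', 'appendtop', 'foo')))
print(repr(merge_text('old', 'new', 'appendbottom', '\n')))
```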

exception scripts.pagefromfile.NoTitleError(offset)[source]#

Bases: Exception

No title found.

Return type:

None

class scripts.pagefromfile.PageFromFileReader(filename, site=None, **kwargs)[source]#

Bases: OptionHandler, GeneratorWrapper

Generator class, responsible for reading the file.

Changed in version 7.6: subclassed from pywikibot.tools.collections.GeneratorWrapper

available_options: dict[str, Any] = {'begin': '{{-start-}}', 'end': '{{-stop-}}', 'include': False, 'notitle': False, 'textonly': False, 'title': None, 'titleend': "'''", 'titlestart': "'''"}#

Handler configuration attribute. Only the keys of the dict can be passed as __init__ options. The values are the default values. Overwrite this in subclasses!

find_page(text)[source]#

Find page to work on.

Return type:

tuple[int, str, str]

property generator: Generator[Page, None, None]#

Read file and yield a page with content from file.

content is stored as a page attribute defined by CTX_ATTR.

Changed in version 7.6: changed from iterator method to generator property

class scripts.pagefromfile.PageFromFileRobot(site=True, **kwargs)[source]#

Bases: SingleSiteBot, CurrentPageBot

Responsible for writing pages to the wiki.

Titles and contents are given by a PageFromFileReader.

Create a SingleSiteBot instance.

Parameters:
  • site (BaseSite | bool | None) – If True it’ll be set to the configured site using pywikibot.Site.

  • kwargs (Any)

treat_page()[source]#

Upload page content.

Return type:

None

update_options: dict[str, Any] = {'append': None, 'autosummary': False, 'force': False, 'minor': False, 'nocontent': '', 'redirect': True, 'showdiff': False, 'summary': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.pagefromfile.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

parser_function_count script#

Used to find expensive templates that are subject to be converted to Lua

It counts parser functions and then orders templates by number of these and uploads the first n titles or alternatively templates having count()>n.

Parameters:

-start

Will start from the given title (it does not have to exist). Parameter may be given as “-start” or “-start:title”. Defaults to ‘!’.

-first

Returns the first n results in decreasing order of number of hits (or without ordering if used with -nosort). Parameter may be given as “-first” or “-first:n”.

-atleast

Returns templates with at least n hits. Parameter may be given as “-atleast” or “-atleast:n”.

-nosort

Keeps the original order of templates. Default behaviour is to sort them by decreasing order of count(parserfunctions).

-save

Saves the results. The file is in the form you may upload it to a wikipage. May be given as “-save:<filename>”. If it exists, titles will be appended.

-upload

Specify a page in your wiki where results will be uploaded. Parameter may be given as “-upload” or “-upload:title”. Any previous content of that page will be overwritten.

Precedence of evaluation: results are first sorted in decreasing order of the number of parser functions, unless -nosort is switched on. Then the first n templates are taken if -first is specified, and finally -atleast is evaluated. If -nosort and -first are used together, the program will stop at the nth hit without scanning the rest of the template namespace. This may be used to run it in several sessions (continue with -start next time).

First is strict. That means if results #90-120 have the same number of parser functions and you specify -first:100, only the first 100 will be listed (even if atleast is used as well).

Should you specify neither first nor atleast, all templates using parser functions will be listed.
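The order of evaluation described above can be sketched on an already collected list of results. This is only an illustration under the stated rules, not the bot's code; the function name select_templates and the (title, count) tuples are assumptions. It also ignores that the real bot stops scanning early when -nosort and -first are combined.

```python
def select_templates(counts, first=None, atleast=None, nosort=False):
    """Apply sorting, -first and -atleast in the documented order.

    counts is a list of (title, number_of_parser_functions) pairs.
    """
    results = list(counts)
    if not nosort:
        # decreasing order of hits; Python's sort keeps ties in input order
        results.sort(key=lambda item: item[1], reverse=True)
    if first is not None:
        # -first is strict: cut after n entries even inside a tie
        results = results[:first]
    if atleast is not None:
        # -atleast is evaluated last, on what -first left over
        results = [item for item in results if item[1] >= atleast]
    return results


counts = [("A", 2), ("B", 5), ("C", 3), ("D", 5)]
print(select_templates(counts, first=3, atleast=4))
```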

class scripts.parser_function_count.ParserFunctionCountBot(**kwargs)[source]#

Bases: SingleSiteBot, ExistingPageBot

Bot class used for obtaining Parser function Count.

property generator#

Generator.

setup()[source]#

Setup magic words, regex and result counter.

Return type:

None

teardown()[source]#

Final processing.

Return type:

None

treat(page)[source]#

Process a single template.

Return type:

None

update_options: dict[str, Any] = {'atleast': None, 'first': None, 'nosort': False, 'save': None, 'start': '!', 'upload': None}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.

scripts.parser_function_count.main(*args)[source]#

Process command line arguments and invoke ParserFunctionCountBot.

Parameters:

args (str)

Return type:

None

patrol script#

The bot is meant to mark the edits based on info obtained by whitelist

This bot obtains a list of recent changes and newpages and marks the edits as patrolled based on a whitelist.

Whitelist Format#

The whitelist is formatted as a number of list entries. Any links outside of lists are ignored and can be used for documentation. In a list the first link must be to the username which should be white listed and any other link following is adding that page to the white list of that username. If the user edited a page on their white list it gets patrolled. It will also patrol pages which start with the mentioned link (e.g. [[foo]] will also patrol [[foobar]]).

To avoid redlinks it’s possible to use Special:PrefixIndex as a prefix so that it will list all pages which will be patrolled. The page after the slash will be used then.

On Wikisource, it’ll also check if the page is in the author namespace, in which case it’ll also patrol pages which are linked from that page.

An example can be found at https://en.wikisource.org/wiki/User:Wikisource-bot/patrol_whitelist
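The whitelist rules above (list entries only, first link is the user, later links are page prefixes, Special:PrefixIndex/ is stripped) can be sketched as follows. This is a simplified illustration, not the bot's parser: the names parse_whitelist, page_prefix and is_whitelisted are hypothetical, the 'User:' prefix is assumed to be in English, and the Wikisource author handling is left out.

```python
import re

# Matches [[target]] and [[target|label]]; captures the target only.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")
PREFIX_INDEX = "Special:PrefixIndex/"


def page_prefix(link):
    """Use the part after 'Special:PrefixIndex/' if that prefix is present."""
    if link.startswith(PREFIX_INDEX):
        return link[len(PREFIX_INDEX):]
    return link


def parse_whitelist(wikitext):
    """Map each user name to the page prefixes listed after it."""
    whitelist = {}
    for line in wikitext.splitlines():
        if not line.startswith("*"):
            continue  # links outside of list entries are ignored
        links = [link.strip() for link in LINK_RE.findall(line)]
        if not links:
            continue
        user = links[0].split(":", 1)[-1]  # drop the namespace prefix
        prefixes = [page_prefix(link) for link in links[1:]]
        whitelist.setdefault(user, []).extend(prefixes)
    return whitelist


def is_whitelisted(whitelist, user, title):
    """Return True if title starts with a prefix whitelisted for user."""
    return any(title.startswith(p) for p in whitelist.get(user, []))


sample = """Documentation with [[a link]] that is ignored.
* [[User:Alice]]: [[Foo]], [[Bar/Baz]]
* [[User:Bob]]: [[Special:PrefixIndex/Quux]]
"""
whitelist = parse_whitelist(sample)
print(whitelist)
```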

Commandline parameters:

-namespace

Filter the page generator to only yield pages in specified namespaces

-ask

If True, confirm each patrol action

-whitelist

page title for whitelist (optional)

-autopatroluserns

Takes user consent to automatically patrol

-versionchecktime

Check versionchecktime lapse in sec

-repeat

Repeat run after 60 seconds

-newpages

Run on unpatrolled new pages (default for Wikipedia Projects)

-recentchanges

Run on complete unpatrolled recentchanges (default for any project except Wikipedia Projects)

-usercontribs

Filter generators above to the given user

class scripts.patrol.LinkedPagesRule(page_title)[source]#

Bases: object

Matches of page site title and linked pages title.

Parameters:

page_title (str) – The page title for this rule

match(page_title)[source]#

Match page_title to linkedpages elements.

Return type:

bool

class scripts.patrol.PatrolBot(site=None, **kwargs)[source]#

Bases: BaseBot

Bot marks the edits as patrolled based on info obtained by whitelist.

Keyword Arguments:
  • ask – If True, confirm each patrol action

  • whitelist – page title for whitelist (optional)

  • autopatroluserns – Takes user consent to automatically patrol

  • versionchecktime – Check versionchecktime lapse in sec

static in_list(pagelist, title)[source]#

Check if title present in pagelist.

Parameters:
  • pagelist (Container)

  • title (str)

Return type:

bool

is_wikisource_author_page(title)[source]#

Check whether the given title is a Wikisource author page.

Return type:

bool

parse_page_tuples(wikitext, user=None)[source]#

Parse page details apart from ‘user:’ for use.

setup()[source]#

Load most recent watchlist_page for further processing.

treat(page)[source]#

It loads the given page, does some changes, and saves it.

update_options: dict[str, Any] = {'ask': False, 'autopatroluserns': False, 'versionchecktime': 300, 'whitelist': None}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

whitelist_subpage_name = {'en': 'patrol_whitelist'}#

scripts.patrol.api_feed_repeater(gen, *, delay=60, repeat=False, namespaces=None, user=None, recent_new_gen=True)[source]#

Generator which loads pages details to be processed.

Parameters:
  • delay (float)

  • repeat (bool)

  • recent_new_gen (bool)

scripts.patrol.main(*args)[source]#

Process command line arguments and invoke PatrolBot.

Parameters:

args (str)

Return type:

None

scripts.patrol.verbose_output(string)[source]#

Verbose output.

Return type:

None

protect script#

This script can be used to protect and unprotect pages en masse

Of course, you will need an admin account on the relevant wiki. These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-unprotect

Acts like “-default:all”

-default:

Sets the default protection level (default ‘sysop’). If no level is defined it doesn’t change unspecified levels.

-[type]:[level]

Set [type] protection level to [level]

Usual values for [level] are: sysop, autoconfirmed, all; further levels may be provided by some wikis.

For all protection types (edit, move, etc.) it chooses the default protection level. This is “sysop” or “all” if -unprotect was selected. If multiple parameters -unprotect or -default are used, only the last occurrence is applied.
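The interplay of -unprotect, -default and the per-type options can be pictured with the sketch below. It is an assumption-laden illustration, not the script's argument handling: the function name resolve_protections is hypothetical, a bare -default is assumed to fall back to 'sysop', and the validation done by check_protection_level is omitted.

```python
def resolve_protections(args, protection_types):
    """Return a {type: level} dict for the given command line arguments."""
    default_level = "sysop"
    explicit = {}
    for arg in args:
        option, sep, value = arg.lstrip("-").partition(":")
        if option == "unprotect":
            default_level = "all"
        elif option == "default":
            # assumption: '-default' without a level means 'sysop'
            default_level = value if sep and value else "sysop"
        elif option in protection_types and value:
            explicit[option] = value
    # every type gets the default of the last -unprotect/-default seen ...
    combined = {p_type: default_level for p_type in protection_types}
    # ... and explicitly given types override it
    combined.update(explicit)
    return combined


types = ["edit", "move"]
print(resolve_protections(["-unprotect", "-default:autoconfirmed",
                           "-move:sysop"], types))
```

Because the loop simply overwrites default_level, only the last -unprotect or -default on the command line has any effect.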

This script is a ConfigParserBot. The following options can be set within a settings file which is scripts.ini by default:

-always

Don’t prompt to protect pages, just do it.

-summary:

Supply a custom edit summary. Tries to generate summary from the page selector. If no summary is supplied or couldn’t determine one from the selector it’ll ask for one.

-expiry:

Supply a custom protection expiry, which defaults to indefinite. Any string understandable by MediaWiki, including relative and absolute, is acceptable. See: API:Protect#Parameters

Usage:

python pwb.py protect <OPTIONS>

Examples

Protect everything in the category ‘To protect’ prompting:

python pwb.py protect -cat:"To protect"

Unprotect all pages listed in text file ‘unprotect.txt’ without prompting:

python pwb.py protect -file:unprotect.txt -unprotect -always

class scripts.protect.ProtectionRobot(protections, **kwargs)[source]#

Bases: SingleSiteBot, ConfigParserBot, CurrentPageBot

This bot allows protection of pages en masse.

Changed in version 7.0: ProtectionRobot is a ConfigParserBot

Create a new ProtectionRobot.

Parameters:
  • protections (dict) – protections as a dict with “type”: “level”

  • kwargs – additional arguments directly fed to super().__init__()

treat_page()[source]#

Run the bot’s action on each page.

treat_page treats every page given by the generator and applies the protections using this method.

Return type:

None

update_options: dict[str, Any] = {'expiry': '', 'summary': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.protect.check_protection_level(operation, level, levels, default=None)[source]#

Check if the protection level is valid or ask if necessary.

Returns:

a valid protection level

Return type:

str

scripts.protect.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

redirect script#

Script to resolve double redirects, and to delete broken redirects

Requires access to MediaWiki’s maintenance pages or to a XML dump file. Delete function requires adminship.

Syntax:

python pwb.py redirect action [-arguments …]

where action can be one of these

double:

Shortcut: do. Fix redirects which point to other redirects.

broken:

Shortcut: br. Tries to fix redirects which point to nowhere by using the last moved target of the destination page. If this fails and the -delete option is set, it either deletes the page or marks it for deletion, depending on whether the account has admin rights. It will not mark the redirect for deletion if no speedy deletion template is available.

both:

Both of the above. Retrieves redirect pages from live wiki, not from a special page.

and arguments can be:

-xml

Retrieve information from a local XML dump (https://dumps.wikimedia.org). Argument can also be given as “-xml:filename.xml”. Cannot be used with -fullscan or -moves.

-fullscan

Retrieve redirect pages from live wiki, not from a special page. Cannot be used with -xml or ‘both’ action.

-moves

Use the page move log to find double-redirect candidates. Only works with action “double”, does not work with -xml.

NOTE: You may use only one of these options above. If neither of -xml -fullscan -moves is given, info will be loaded from a special page of the live wiki.

-offset:n

With -moves, the number of hours ago to start scanning moved pages. With -xml, the number of the redirect to restart with (see progress). Otherwise, ignored.

-start:title

The starting page title in each namespace. Page need not exist.

-until:title

The possible last page title in each namespace. Page need not exist.

-limit:n

The maximum count of redirects to work upon. If omitted, there is no limit.

-delete

Prompt the user whether broken redirects should be deleted (or marked for deletion if the account has no admin rights) instead of just skipping them.

-sdtemplate:x

Add the speedy deletion template string including brackets. This enables overriding the default template via i18n or to enable speedy deletion for projects other than Wikipedias.

-always

Don’t prompt you for each replacement.

Furthermore the following options are provided:

This script supports use of pagegenerators arguments.

class scripts.redirect.RedirectGenerator(action, **kwargs)[source]#

Bases: OptionHandler

Redirect generator.

available_options: dict[str, Any] = {'fullscan': False, 'limit': None, 'moves': False, 'namespaces': {0}, 'offset': -1, 'start': None, 'until': None, 'xml': None}#

Handler configuration attribute. Only the keys of the dict can be passed as __init__ options. The values are the default values. Overwrite this in subclasses!

get_moved_pages_redirects()[source]#

Generate redirects to recently-moved pages.

Return type:

Generator[Page]

get_redirect_pages_via_api()[source]#

Yield Pages that are redirects.

Return type:

Generator[Page]

get_redirects_from_dump(alsoGetPageTitles=False)[source]#

Extract redirects from dump.

Load a local XML dump file, look at all pages which have the redirect flag set, and find out where they’re pointing at. Return a dictionary where the redirect names are the keys and the redirect targets are the values.

Parameters:

alsoGetPageTitles (bool)

Return type:

tuple[dict[str, str], set[str]]
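Given a redirect dictionary and a set of page titles of the kind described above, double and broken redirects can be told apart as in this sketch. It is an illustration only: the function name classify_redirects is hypothetical, and page_titles is assumed to contain every title in the dump, redirects included.

```python
def classify_redirects(redirects, page_titles):
    """Split redirects into double and broken ones.

    redirects maps a redirect title to its target title; page_titles is
    the set of all titles seen in the dump.
    """
    double, broken = [], []
    for source, target in redirects.items():
        if target in redirects:
            # the target is itself a redirect: double redirect
            double.append(source)
        elif target not in page_titles:
            # the target does not exist at all: broken redirect
            broken.append(source)
    return double, broken


redirects = {"A": "B", "B": "C", "D": "Missing"}
page_titles = {"A", "B", "C", "D"}
print(classify_redirects(redirects, page_titles))
```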

get_redirects_via_api(maxlen=8)[source]#

Return a generator that yields tuples of data about redirect Pages.

Changed in version 7.0: only yield tuple if type of redirect is not 1 (normal redirect)

The description of returned tuple items is as follows:

[0]:

page title of a redirect page

[1]:

type of redirect:

None:

start of a redirect chain of unknown length, or loop

[0]:

broken redirect, target page title missing

[1]:

normal redirect, target page exists and is not a redirect

[2:maxlen]:

start of a redirect chain of that many redirects (currently, the API seems not to return sufficient data to make these return values possible, but that may change)

[maxlen+1]:

start of an even longer chain, or a loop (currently, the API seems not to return sufficient data to allow these return values, but that may change)

[2]:

target page title of the redirect, or chain (may not exist)

[3]:

target page of the redirect, or end of chain, or page title where chain or loop detection was halted, or None if unknown

Parameters:

maxlen (int)

Return type:

Generator[tuple[str, int | None, str, str | None]]

retrieve_broken_redirects()[source]#

Retrieve broken redirects.

Return type:

Generator[str | Page]

retrieve_double_redirects()[source]#

Retrieve double redirects.

Return type:

Generator[str | Page]

class scripts.redirect.RedirectRobot(action, **kwargs)[source]#

Bases: ExistingPageBot

Redirect bot.

delete_1_broken_redirect()[source]#

Treat one broken redirect.

Return type:

None

delete_redirect(page, summary_key)[source]#

Delete the redirect page.

Parameters:
  • page (pywikibot.page.BasePage) – The page to delete

  • summary_key (str) – The message key for the deletion summary

Return type:

None

fix_1_double_redirect()[source]#

Treat one double redirect.

Return type:

None

fix_double_or_delete_broken_redirect()[source]#

Treat one broken or double redirect.

Return type:

None

static get_redirect_target(page)[source]#

Get redirect target page and handle some exceptions.

get_sd_template(site=None)[source]#

Look for speedy deletion template and return it.

Parameters:

site (BaseSite | None) – site for which the template has to be given

Returns:

A valid speedy deletion template.

Return type:

str | None

init_page(item)[source]#

Ensure that we process page objects.

Return type:

Page

property sdtemplate#

Gives the speedy deletion template for the current_page.

treat(page)[source]#

Treat a page.

Parameters:

page (pywikibot.page.BasePage) – Page to be treated.

Return type:

None

update_options: dict[str, Any] = {'delete': False, 'limit': inf, 'sdtemplate': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

use_redirects: bool | None = True#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.

scripts.redirect.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

scripts.redirect.space_to_underscore(link)[source]#

Convert spaces to underscore.

Return type:

str

replace script#

This bot will make direct text replacements

It will retrieve information on which pages might need changes either from an XML dump or a text file, or only change a single page.

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-mysqlquery

Retrieve information from a local database mirror. If no query specified, bot searches for pages with given replacements.

-xml

Retrieve information from a local XML dump (pages-articles or pages-meta-current, see https://dumps.wikimedia.org). Argument can also be given as “-xml:filename”.

-regex

Make replacements using regular expressions. If this argument isn’t given, the bot will make simple text replacements.

-nocase

Use case insensitive regular expressions.

-dotall

Make the dot match any character at all, including a newline. Without this flag, ‘.’ will match anything except a newline.

-multiline

‘^’ and ‘$’ will now match begin and end of each line.
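The three options above correspond to standard flags of Python's re module. The sketch below shows their effect; the helper build_flags and its keyword names are made up for illustration and are not part of the script.

```python
import re


def build_flags(nocase=False, dotall=False, multiline=False):
    """Translate the three switches into a combined re flag value."""
    flags = 0
    if nocase:
        flags |= re.IGNORECASE  # -nocase
    if dotall:
        flags |= re.DOTALL      # -dotall: '.' also matches a newline
    if multiline:
        flags |= re.MULTILINE   # -multiline: '^' and '$' match per line
    return flags


# '.' does not cross a newline unless DOTALL is set
print(re.search("a.b", "a\nb", build_flags()))
print(re.search("a.b", "a\nb", build_flags(dotall=True)))
# '^' matches only at the start of the text unless MULTILINE is set
print(re.findall("^x", "x\nx", build_flags()))
print(re.findall("^x", "x\nx", build_flags(multiline=True)))
```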

-xmlstart

(Only works with -xml) Skip all articles in the XML dump before the one specified (may also be given as -xmlstart:Article).

-addcat:cat_name

Adds “cat_name” category to every altered page.

-excepttitle:XYZ

Skip pages with titles that contain XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.

-requiretitle:XYZ

Only do pages with titles that contain XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.

-excepttext:XYZ

Skip pages which contain the text XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.

-exceptinside:XYZ

Skip occurrences of the to-be-replaced text which lie within XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.

-exceptinsidetag:XYZ

Skip occurrences of the to-be-replaced text which lie within an XYZ tag.

-summary:XYZ

Set the summary message text for the edit to XYZ, bypassing the predefined message texts with original and replacements inserted. To add the replacements to your summary use the %(description)s placeholder, for example: -summary:"Bot operated replacement: %(description)s". Can’t be used with -automaticsummary.
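The %(description)s placeholder follows Python's %-style formatting with a mapping. The sketch below shows the substitution itself. The helper format_summary, the sample description strings and the ', ' separator are assumptions for illustration, not the script's exact output.

```python
def format_summary(template, descriptions):
    """Insert the joined replacement descriptions into the template."""
    return template % {"description": ", ".join(descriptions)}


template = "Bot operated replacement: %(description)s"
# hypothetical descriptions of two applied replacements
descriptions = ["-Errror +Error", "-Faail +Fail"]
print(format_summary(template, descriptions))
```

A template without the placeholder is returned unchanged, since %-formatting with a mapping ignores unused keys.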

-automaticsummary

Uses an automatic summary for all replacements which don’t have a summary defined. Can’t be used with -summary.

-sleep:123

If you use -fix, several regexes may be checked on every page. This can waste a lot of CPU, because the bot checks every regex without pausing and uses all available resources. This option makes the bot pause between one regex and the next in order not to waste too much CPU.

-fix:XYZ

Perform one of the predefined replacements tasks, which are given in the dictionary ‘fixes’ defined inside the files fixes.py and user-fixes.py.

The available fixes are listed in pywikibot.fixes.

-manualinput

Request manual replacements via the command line input even if replacements are already defined. If this option is set (or no replacements are defined via -fix or the arguments) it’ll ask for additional replacements at start.

-pairsfile

Lines from the given file name(s) will be read as replacement arguments. i.e. a file containing lines “a” and “b”, used as:

python pwb.py replace -page:X -pairsfile:file c d

will replace ‘a’ with ‘b’ and ‘c’ with ‘d’.
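The way lines from a pairs file and further command line arguments form replacement pairs can be sketched like this. It is a simplified illustration, assuming one argument per line and that file entries come before the command line ones; the function name collect_pairs is hypothetical.

```python
def collect_pairs(file_lines, commandline_args):
    """Return (old, new) pairs from pairs-file lines and extra arguments."""
    items = [line.rstrip("\n") for line in file_lines]
    items += list(commandline_args)
    if len(items) % 2:
        # an odd number of arguments leaves one pattern without a partner
        raise ValueError("incomplete replacement pair")
    return list(zip(items[0::2], items[1::2]))


# a file containing the lines "a" and "b", plus the arguments c d
pairs = collect_pairs(["a\n", "b\n"], ["c", "d"])
print(pairs)
```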

-always

Don’t prompt you for each replacement

-quiet

Don’t print a message if a page remains unchanged

-nopreload

Do not preload pages. Useful if disabled on a wiki.

-recursive

Recurse replacement as long as possible. Be careful, this might lead to an infinite loop.
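What recursive replacement means, and why it can loop forever, is illustrated by the sketch below. It is not the bot's implementation; the function name replace_recursively and the max_rounds guard are hypothetical additions for the example.

```python
def replace_recursively(text, old, new, max_rounds=100):
    """Repeat a plain text replacement until the text stops changing."""
    for _ in range(max_rounds):
        replaced = text.replace(old, new)
        if replaced == text:
            return text  # nothing left to replace
        text = replaced
    # e.g. replacing 'x' with 'yx' always leaves another 'x' behind
    raise RuntimeError("replacement did not settle; possible infinite loop")


# collapsing runs of spaces needs more than one pass
print(repr(replace_recursively("a    b", "  ", " ")))
```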

-allowoverlap

When occurrences of the pattern overlap, replace all of them. Be careful, this might lead to an infinite loop.

-fullsummary

Use one large summary for all command line replacements.

Replacement parameters

Replacement parameters are pairs of arguments given to the script. The first argument is the old text to be replaced, the second argument is the new text. If the -regex argument is given, the first argument will be regarded as a regular expression, and the second argument might contain expressions like \1 or \g<name>. The second parameter can also be specified as empty string, usually "". It is possible to introduce more than one pair of replacement parameters.
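The difference between plain and regular expression pairs can be sketched with Python's re module. This is a minimal illustration and not the bot's replacement engine: the function name apply_pairs is hypothetical, exceptions are not handled, and the braces in the pattern are escaped for clarity.

```python
import re


def apply_pairs(text, pairs, use_regex=False):
    """Apply (old, new) pairs one after another to the text."""
    for old, new in pairs:
        if use_regex:
            # old is a pattern; new may refer to groups like \1 or \g<name>
            text = re.sub(old, new, text)
        else:
            # plain text replacement; new may be the empty string
            text = text.replace(old, new)
    return text


print(apply_pairs("{{msg:Stub}}", [(r"\{\{msg:(.*?)\}\}", r"{{\1}}")],
                  use_regex=True))
print(apply_pairs("Errror and Faail", [("Errror", "Error"),
                                       ("Faail", "Fail")]))
```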

Empty string arguments with PowerShell

PowerShell removes empty strings while parsing the command line. To pass an empty string with PowerShell, either escape the quotation marks with a backtick (grave accent) in front of each, like `"`", or disable command line parsing for everything that follows with the --% symbol, as in python pwb replace --% -start:! foo "". This disables parsing for all replace options and arguments after the delimiter and so preserves empty strings.

Examples

If you want to change templates from the old syntax, e.g. {{msg:Stub}}, to the new syntax, e.g. {{Stub}}, download an XML dump file (pages-articles) from https://dumps.wikimedia.org, then use this command:

python pwb.py replace -xml -regex "{{msg:(.*?)}}" "{{\1}}"

If you have a dump called foobar.xml and want to fix typos in articles, e.g. Errror -> Error, use this:

python pwb.py replace -xml:foobar.xml "Errror" "Error" -namespace:0

If you want to do more than one replacement at a time, use this:

python pwb.py replace -xml:foobar.xml "Errror" "Error" "Faail" "Fail" -namespace:0

If you have a page called ‘John Doe’ and want to fix the format of ISBNs, use:

python pwb.py replace -page:John_Doe -fix:isbn

This command will change ‘referer’ to ‘referrer’, but not in pages which talk about HTTP, where the typo has become part of the standard:

python pwb.py replace referer referrer -file:typos.txt -excepttext:HTTP

See also

scripts.template to modify or remove templates.

scripts.replace.EXC_KEYS = {'-exceptinside': 'inside', '-exceptinsidetag': 'inside-tags', '-excepttext': 'text-contains', '-excepttitle': 'title', '-requiretitle:': 'require-title'}#

Dictionary to convert exceptions command line options to exceptions keys.

Added in version 7.0.

class scripts.replace.ReplaceRobot(generator, replacements, exceptions=None, **kwargs)[source]#

Bases: SingleSiteBot, ExistingPageBot

A bot that can do text replacements.

Parameters:
  • generator (generator) – generator that yields Page objects

  • replacements (list[tuple[Any, str]]) – a list of Replacement instances or sequences of length 2 with the original text (as a compiled regular expression) and replacement text (as a string).

  • exceptions (dict[str, Any] | None) –

    a dictionary which defines when not to change an occurrence. This dictionary can have these keys:

    title

    A list of regular expressions. All pages with titles that are matched by one of these regular expressions are skipped.

    text-contains

    A list of regular expressions. All pages with text that contains a part which is matched by one of these regular expressions are skipped.

    inside

    A list of regular expressions. All occurrences are skipped which lie within a text region which is matched by one of these regular expressions.

    inside-tags

    A list of strings. These strings must be keys from the dictionary in textlib._create_default_regexes() or must be accepted by textlib.get_regexes().
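A small exceptions dictionary using the title and text-contains keys, together with checks in the spirit of isTitleExcepted and isTextExcepted, might look like this. The helper functions and sample patterns are assumptions for illustration; the real methods live on ReplaceRobot and also deal with the other keys.

```python
import re

# hypothetical exceptions: skip draft pages and pages that mention HTTP
exceptions = {
    "title": [re.compile(r"^Draft:")],
    "text-contains": [re.compile(r"HTTP")],
}


def is_title_excepted(title, exceptions):
    """True if any 'title' regex matches the page title."""
    return any(rx.search(title) for rx in exceptions.get("title", []))


def is_text_excepted(text, exceptions):
    """True if any 'text-contains' regex matches somewhere in the text."""
    return any(rx.search(text) for rx in exceptions.get("text-contains", []))


print(is_title_excepted("Draft:Example", exceptions))
print(is_text_excepted("The HTTP referer header", exceptions))
```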

Keyword Arguments:
  • allowoverlap – when matches overlap, all of them are replaced.

  • recursive – Recurse replacement as long as possible.

  • addcat – category to be added to every page touched

  • sleep – slow down between processing multiple regexes

  • summary – Set the summary message text bypassing the default

  • always – the user won’t be prompted before changes are made

  • site – Site the bot is working on.

Warning

  • Be careful with recursive parameter, this might lead to an infinite loop.

  • site parameter should be passed to constructor. Otherwise the bot takes the current site and warns the operator about the missing site

apply_replacements(original_text, applied, page=None)[source]#

Apply all replacements to the given text.

Return type:

str, set

generate_summary(applied_replacements)[source]#

Generate a summary message for the replacements.

isTextExcepted(text, exceptions=None)[source]#

Return True iff one of the exceptions applies for the given text.

Return type:

bool

isTitleExcepted(title, exceptions=None)[source]#

Return True if one of the exceptions applies for the given title.

Return type:

bool

save(page, oldtext, newtext, applied, **kwargs)[source]#

Save the given page.

Return type:

None

skip_page(page)[source]#

Check whether treat should be skipped for the page.

treat(page)[source]#

Work on each page retrieved from generator.

Return type:

None

user_confirm(question)[source]#

Always return True due to our own input choice.

Return type:

bool

class scripts.replace.Replacement(old, new, use_regex=None, exceptions=None, case_insensitive=None, edit_summary=None, default_summary=True)[source]#

Bases: ReplacementBase

A single replacement with its own data.

Create a single replacement entry unrelated to a fix.

property case_insensitive#

Return whether the search text is case insensitive.

classmethod from_compiled(old_regex, new, **kwargs)[source]#

Create instance from already compiled regex.

get_inside_exceptions()[source]#

Get exceptions on text (inside exceptions).

property use_regex#

Return whether the search text is using regex.

class scripts.replace.ReplacementBase(old, new, edit_summary=None, default_summary=True)[source]#

Bases: object

The replacement instructions.

Create a basic replacement instance.

compile(use_regex, flags)[source]#

Compile the search text.

Return type:

None

property container#

Container object which contains this replacement.

A container object is an object that groups one or more replacements together and provides some properties that are common to all of them. For example, containers may define a common name for a group of replacements, or a common edit summary.

Container objects must have a “name” attribute.

property description: str#

Description of the changes that this replacement applies.

This description is used as the default summary of the replacement. If you do not specify an edit summary on the command line or in some other way, whenever you apply this replacement to a page and submit the changes to the MediaWiki server, the edit summary includes the descriptions of each replacement that you applied to the page.

property edit_summary: str#

Return the edit summary for this fix.

class scripts.replace.ReplacementList(use_regex, exceptions, case_insensitive, edit_summary, name)[source]#

Bases: list

A list of replacements which all share some properties.

The shared properties are:

  • use_regex

  • exceptions

  • case_insensitive

Each entry in this list should be a ReplacementListEntry. The exceptions are compiled only once.

Create a fix list which can contain multiple replacements.

class scripts.replace.ReplacementListEntry(old, new, fix_set, edit_summary=None, default_summary=True)[source]#

Bases: ReplacementBase

A replacement entry for ReplacementList.

Create a replacement entry inside a fix set.

property case_insensitive#

Return whether the fix set is case insensitive.

property container#

Container object which contains this replacement.

A container object is an object that groups one or more replacements together and provides some properties that are common to all of them. For example, containers may define a common name for a group of replacements, or a common edit summary.

Container objects must have a “name” attribute.

property edit_summary#

Return this entry’s edit summary or the fix’s summary.

property exceptions#

Return the exceptions of the fix set.

get_inside_exceptions()[source]#

Get exceptions on text (inside exceptions).

property use_regex#

Return whether the fix set is using regex.

class scripts.replace.XmlDumpReplacePageGenerator(xmlFilename, xmlStart, replacements, exceptions, site)[source]#

Bases: object

Iterator that will yield Pages that might contain text to replace.

These pages will be retrieved from a local XML dump file.

Parameters:
  • xmlFilename (str) – The dump’s path, either absolute or relative

  • xmlStart (str) – Skip all articles in the dump before this one

  • replacements (list[tuple[Any, str]]) – A list of 2-tuples of original text (as a compiled regular expression) and replacement text (as a string).

  • exceptions (dict) – A dictionary which defines when to ignore an occurrence. See the documentation of the ReplaceRobot initializer.

isTextExcepted(text)[source]#

Return True if one of the exceptions applies for the given text.

Return type:

bool

isTitleExcepted(title)[source]#

Return True if one of the exceptions applies for the given title.

Return type:

bool

scripts.replace.handle_exceptions(*args)[source]#

Handle exceptions args to ignore pages which contain certain texts.

Added in version 7.0.

Parameters:

args (str)

Return type:

tuple[list[str], dict[str, str]]

scripts.replace.handle_manual()[source]#

Handle manual input.

Added in version 7.0.

Return type:

list[str]

scripts.replace.handle_pairsfile(filename)[source]#

Handle -pairsfile argument.

Added in version 7.0.

Changed in version 9.2: replacement patterns are printed if they are incomplete.

Parameters:

filename (str)

Return type:

list[str] | None

scripts.replace.handle_sql(sql, replacements, exceptions)[source]#

Handle default sql query.

Added in version 7.0.

Parameters:
  • sql (str)

  • replacements (list[Pattern])

  • exceptions (list[Pattern])

Return type:

Generator

scripts.replace.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Changed in version 9.2: replacement patterns are printed if they are incomplete.

Parameters:

args (str) – command line arguments

Return type:

None

scripts.replace.precompile_exceptions(exceptions, use_regex, flags)[source]#

Compile the exceptions with the given flags.

Return type:

None

scripts.replace.prepareRegexForMySQL(pattern)[source]#

Convert regex to MySQL syntax.

Parameters:

pattern (str)

Return type:

str
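As an illustration of the idea, the sketch below maps a few Python shorthand character classes to POSIX bracket classes and escapes single quotes. The mapping and the function name are assumptions chosen for this example; the script's actual conversion may differ.

```python
def regex_to_mysql_sketch(pattern):
    """Rewrite a few Python regex shorthands for use in a MySQL query."""
    # Assumed mapping from Python shorthands to POSIX bracket classes.
    conversions = {
        r'\s': '[[:space:]]',
        r'\d': '[[:digit:]]',
        r'\w': '[[:alnum:]]',
    }
    for python_class, posix_class in conversions.items():
        pattern = pattern.replace(python_class, posix_class)
    # Escape single quotes so the pattern can sit inside an SQL string.
    return pattern.replace("'", "\\'")
```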

replicate_wiki script#

This bot replicates pages in a wiki to a second wiki within one family

Example:

python pwb.py replicate_wiki [-r] -ns 10 -family:wikipedia -o nl li fy

or:

python pwb.py replicate_wiki [-r] -ns 10 -family:wikipedia -lang:nl li fy

to copy all templates from nlwiki to liwiki and fywiki. It will show which pages have to be changed if -r is not present, and will only actually write pages if -r is present.

You can add replicate_replace to your user config file (user-config.py), which has the following format:

replicate_replace = {
    'wikipedia:li': {'Hoofdpagina': 'Veurblaad'}
}

to replace all occurrences of ‘Hoofdpagina’ with ‘Veurblaad’ when writing to liwiki. Note that this does not take the origin wiki into account.

The following parameters are supported:

-r, --replace

actually replace pages (without this option you will only get an overview page)

-o, --original

original wiki (you may use -lang:<code> option instead)

-ns, --namespace

specify namespace

-dns, --dest-namespace

destination namespace (if different)

destination_wiki

destination wiki(s)

class scripts.replicate_wiki.SyncSites(options)[source]#

Bases: object

Work is done in here.

check_namespace(namespace)[source]#

Check an entire namespace.

Return type:

None

check_namespaces()[source]#

Check all namespaces, to be ditched for clarity.

Return type:

None

check_page(pagename)[source]#

Check one page.

Return type:

None

check_sysops()[source]#

Check if sysops are the same on all wikis.

Return type:

None

generate_overviews()[source]#

Create page on wikis with overview of bot results.

Return type:

None

put_message(site)[source]#

Return synchronization message.

Return type:

str

scripts.replicate_wiki.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

scripts.replicate_wiki.multiple_replace(text, word_dict)[source]#

Replace all occurrences in text of key value pairs in word_dict.
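A minimal sketch of such a replacement, using the replicate_replace entry from the example configuration above. The function name is hypothetical and the real implementation may differ, for instance in how overlapping keys are handled.

```python
def multiple_replace_sketch(text, word_dict):
    """Replace every occurrence of each key in text with its value."""
    for old, new in word_dict.items():
        text = text.replace(old, new)
    return text


# Mapping taken from the replicate_replace example for 'wikipedia:li'.
li_replacements = {'Hoofdpagina': 'Veurblaad'}
result = multiple_replace_sketch('[[Hoofdpagina]] en Hoofdpagina',
                                 li_replacements)
```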

revertbot script#

This script can be used for reverting certain edits

The following command line parameters are supported:

-username

The user whose edits are to be reverted. Default is the bot’s username (site.username()).

-rollback

Rollback edits instead of reverting them.

Note

No diff is shown in this mode.

-limit:num

(int) Check the last num contributions for reverting. Default is 500.

Users who want to customize the behaviour should subclass the BaseRevertBot and override its callback method. Here is a sample:

import re

import pywikibot
from scripts.revertbot import BaseRevertBot


class myRevertBot(BaseRevertBot):

    '''Example revert bot.'''

    def callback(self, item) -> bool:
        '''Sample callback function for 'private' revert bot.

        :param item: an item from user contributions
        :type item: dict
        '''
        # Only consider contributions that are still the top revision.
        if 'top' in item:
            page = pywikibot.Page(self.site, item['title'])
            text = page.get(get_redirect=True)
            # Revert if the text contains a namespaced link with a dot,
            # e.g. a file link.
            pattern = re.compile(r'\[\[.+?:.+?\..+?\]\]')
            return bool(pattern.search(text))
        return False

class scripts.revertbot.BaseRevertBot(site=None, **kwargs)[source]#

Bases: OptionHandler

Base revert bot.

Subclass this bot and override callback to get it to do something useful.

available_options: dict[str, Any] = {'comment': '', 'limit': 500, 'rollback': False}#

Handler configuration attribute. Only the keys of the dict can be passed as __init__ options. The values are the default values. Overwrite this in subclasses!

static callback(item)[source]#

Callback function.

Parameters:

item (Container)

Return type:

bool
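The decision logic of a callback can be tried without a wiki connection. The sketch below reuses the regular expression from the sample callback and takes the page text as an argument; the function name and the example items are hypothetical.

```python
import re

# Same pattern as in the sample callback: a namespaced link whose
# target contains a dot, such as a file link.
LINK_WITH_DOT = re.compile(r'\[\[.+?:.+?\..+?\]\]')


def should_revert_sketch(item, text):
    """Return True for top revisions whose text matches the pattern."""
    if 'top' in item:
        return bool(LINK_WITH_DOT.search(text))
    return False
```

Returning True marks the contribution for reverting; anything else leaves it alone.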

get_contributions(total=500, ns=None)[source]#

Get contributions.

Parameters:

total (int)

local_timestamp(ts)[source]#

Convert Timestamp to a localized timestamp string.

Added in version 7.0.

Return type:

str

revert(item)[source]#

Revert a single item.

Return type:

str | bool

revert_contribs(callback=None)[source]#

Revert contributions.

Return type:

None

scripts.revertbot.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

scripts.revertbot.myRevertBot#

alias of BaseRevertBot

solve_disambiguation script#

Script to help a human solve disambiguations by presenting a set of options

Specify the disambiguation page on the command line.

The program will pick up the page, and look for all alternative links, and show them with a number adjacent to them. It will then automatically loop over all pages referring to the disambiguation page, and show 30 characters of context on each side of the reference to help you make the decision between the alternatives. It will ask you to type the number of the appropriate replacement, and perform the change.
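The context display described above can be pictured as a simple slice around the link. This is a sketch with hypothetical names, not the script's own code.

```python
def context_sketch(text, start, end, width=30):
    """Return the span text[start:end] with width characters on each side."""
    left = max(0, start - width)
    right = min(len(text), end + width)
    return text[left:right]


sample = 'a' * 40 + '[[Mercury]]' + 'b' * 40
start = sample.index('[[')
end = sample.index(']]') + 2
snippet = context_sketch(sample, start, end)
```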

It is possible to choose to replace only the link (just type the number) or replace both link and link-text (type ‘r’ followed by the number).

Multiple references in one page will be scanned in order, but typing ‘n’ (next) on any one of them will leave the complete page unchanged. To leave only some reference unchanged, use the ‘s’ (skip) option.

Command line options:

-pos:XXXX

adds XXXX as an alternative disambiguation

-just

only use the alternatives given on the command line, do not read the page for other possibilities

-dnskip

Skip links already marked with a disambiguation-needed template (e.g., {{dn}})

-primary

“primary topic” disambiguation (Begriffsklärung nach Modell 2). These are titles where one topic is much more important: the disambiguation page is saved somewhere else, and the important topic gets the nice name.

-primary:XY

like the above, but use XY as the only alternative, instead of searching for alternatives in [[Keyword (disambiguation)]]. Note: this is the same as -primary -just -pos:XY

-file:XYZ

reads a list of pages from a text file. XYZ is the name of the file from which the list is taken. If XYZ is not given, the user is asked for a filename. Page titles should be inside [[double brackets]]. The -pos parameter won’t work if -file is used.

-always:XY

instead of asking the user what to do, always perform the same action. For example, XY can be “r0”, “u” or “2”. Be careful with this option, and check the changes made by the bot. Note that some choices for XY don’t make sense and will result in a loop, e.g. “l” or “m”.

-main

only check pages in the main namespace, not in the Talk, Project, User, etc. namespaces.

-first

Uses only the first link of every line on the disambiguation page that begins with an asterisk. Useful if the page is full of irrelevant links that are not subject to disambiguation. You won’t get all of them as options, just the first on each line. For a moderate example see https://en.wikipedia.org/wiki/Szerdahely and for a really exotic one see https://hu.wikipedia.org/wiki/Brabant_(egyértelműsítő lap)

-start:XY

goes through all disambiguation pages in the category on your wiki that is defined (to the bot) as the category containing disambiguation pages, starting at XY. If only ‘-start’ or ‘-start:’ is given, it starts at the beginning.

-min:XX

(XX being a number) only work on disambiguation pages for which at least XX pages are to be worked on.

To complete a move of a page, one can use:

python pwb.py solve_disambiguation -just -pos:New_Name Old_Name

class scripts.solve_disambiguation.AddAlternativeOption(option, shortcut, output, **kwargs)[source]#

Bases: OutputProxyOption

Add a new alternative.

Create a new option for the given sequence.

Parameters:
  • option (str)

  • shortcut (str)

  • output (OutputOption)

  • kwargs (Any)

result(value)[source]#

Add the alternative and then list them.

Return type:

None

class scripts.solve_disambiguation.AliasOption(option, shortcuts, stop=True)[source]#

Bases: StandardOption

An option allowing multiple aliases which also select it.

Parameters:

stop (bool)

test(value)[source]#

Test aliases and combine it with the original test.

Return type:

bool

class scripts.solve_disambiguation.DisambiguationRobot(*args, **kwargs)[source]#

Bases: SingleSiteBot

Disambiguation Bot.

available_options: dict[str, Any] = {'always': None, 'dnskip': False, 'first': False, 'just': True, 'main': False, 'min': 0, 'pos': [], 'primary': False}#

Handler configuration attribute. Only the keys of the dict can be passed as __init__ options. The values are the default values. Overwrite this in subclasses!

checkContents(text)[source]#

Check if the text matches any of the ignore regexes.

Parameters:

text (str) – wikitext of a page

Returns:

None if none of the regular expressions given in the dictionary at the top of this class matches a substring of the text, otherwise the matched substring

Return type:

str | None
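A sketch of this check, using the two patterns listed for 'de' in ignore_contents. The function name is hypothetical, and how the bot selects the patterns for its site is not shown here.

```python
import re

# Patterns copied from the 'de' entry of ignore_contents.
ignore_regexes = [re.compile(p) for p in ('{{[Ii]nuse}}', '{{[Ll]öschen}}')]


def check_contents_sketch(text):
    """Return the first matched substring, or None if nothing matches."""
    for regex in ignore_regexes:
        match = regex.search(text)
        if match:
            return match.group()
    return None
```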

findAlternatives(page)[source]#

Extend self.opt.pos using correctcap of disambPage.linkedPages.

Parameters:

page (Page) – the disambiguation page

Returns:

True if everything goes fine, False otherwise

Return type:

bool

firstize(page, links)[source]#

Call firstlinks and remove extra links.

This will remove a lot of silly redundant links from overdecorated disambiguation pages and leave the first link of each asterisked line only. This must be done if -first is used in command line.

Return type:

list[Page]

firstlinks(page)[source]#

Return a list of first links of every line beginning with *.

When a disambpage is full of unnecessary links, this may be useful to sort out the relevant links. E.g. from line * [[Jim Smith (smith)|Jim Smith]] ([[1832]]-[[1932]]) [[English]] it returns only ‘Jim Smith (smith)’. Lines without an asterisk at the beginning will be disregarded. No check for page existence, it has already been done.

Return type:

Generator[str]
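The extraction described above can be sketched with one regular expression per line. The pattern and the function name are assumptions for illustration; the sketch works on a wikitext string rather than a Page object.

```python
import re

# An asterisk at line start, then the first [[...]] link up to a
# pipe or the closing brackets.
FIRST_LINK = re.compile(r'\*.*?\[\[(.*?)(?:\||\]\])')


def first_links_sketch(wikitext):
    """Yield the first link target of every line starting with '*'."""
    for line in wikitext.splitlines():
        found = FIRST_LINK.match(line)
        if found:
            yield found.group(1)


wikitext = (
    '* [[Jim Smith (smith)|Jim Smith]] ([[1832]]-[[1932]]) [[English]]\n'
    'See also [[Smith]]\n'
    '* [[John Smith]]'
)
links = list(first_links_sketch(wikitext))
```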

ignore_contents = {'de': ('{{[Ii]nuse}}', '{{[Ll]öschen}}'), 'fi': ('{{[Tt]yöstetään}}',), 'kk': ('{{[Ii]nuse}}', '{{[Pp]rocessing}}'), 'nl': ('{{wiu2}}', '{{nuweg}}'), 'ru': ('{{[Ii]nuse}}', '{{[Pp]rocessing}}')}#

makeAlternativesUnique()[source]#

Remove duplicate items from self.opt.pos.

Preserve the order of alternatives.

Return type:

None
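Order-preserving de-duplication of this kind is commonly written with a dictionary, since dictionaries keep insertion order. The sketch below returns a new list instead of updating self.opt.pos, and its name is hypothetical.

```python
def unique_in_order(alternatives):
    """Drop duplicates while keeping the first occurrence of each item."""
    return list(dict.fromkeys(alternatives))


alternatives = unique_in_order(
    ['Mercury (planet)', 'Mercury (element)', 'Mercury (planet)'])
```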

primary_redir_template = {'hu': 'Egyért-redir'}#

setSummaryMessage(page, new_targets=None, unlink_counter=0, dn=False)[source]#

Setup i18n summary message.

Parameters:
  • unlink_counter (int)

  • dn (bool)

Return type:

None

setup()[source]#

Compile regular expressions.

Return type:

None

teardown()[source]#

Write ignored pages to a file.

Return type:

None

treat(page)[source]#

Work on a single disambiguation page.

Return type:

None

treat_disamb_only(ref_page, disamb_page)[source]#

Resolve the links to disamb_page but don’t look for its redirects.

Parameters:
  • disamb_page (Page) – the disambiguation page or redirect we don’t want anything to link to

  • ref_page (Page) – a page linking to disamb_page

Returns:

“nextpage” if the user enters “n” to skip this page, “nochange” if the page needs no change, and “done” if the page is processed successfully

Return type:

str

treat_links(ref_page, disamb_page)[source]#

Resolve the links to disamb_page or its redirects.

Parameters:
  • disamb_page (Page) – the disambiguation page or redirect we don’t want anything to link to

  • ref_page (Page) – a page linking to disamb_page

Returns:

Whether to continue with the next page (True) or the next disambiguation page (False)

Return type:

bool

class scripts.solve_disambiguation.EditOption(option, shortcut, text, start, title)[source]#

Bases: StandardOption

Edit the text.

result(value)[source]#

Open a text editor and let the user change it.

Return type:

str

property stop: bool#

Return whether the user changed the text and didn’t press cancel.

class scripts.solve_disambiguation.PrimaryIgnoreManager(disamb_page, enabled=False)[source]#

Bases: object

Primary ignore manager.

If run with the -primary argument, reads from a file which pages should not be worked on; these are the ones where the user pressed n last time. If run without the -primary argument, doesn’t ignore any pages.

Parameters:

enabled (bool)

ignore(page_titles)[source]#

Write pages to ignorelist.

Parameters:

page_titles (iterable) – page titles to be ignored

Return type:

None

isIgnored(ref_page)[source]#

Return whether ref_page is to be ignored.

Return type:

bool

class scripts.solve_disambiguation.ReferringPageGeneratorWithIgnore(page, primary=False, minimum=0, main_only=False)[source]#

Bases: object

Referring Page generator, with an ignore manager.

Parameters:
  • primary (bool)

  • minimum (int)

  • main_only (bool)

class scripts.solve_disambiguation.ShowPageOption(option, shortcut, start, page)[source]#

Bases: StandardOption

Show the page’s contents in an editor.

result(value)[source]#

Open a text editor and show the text.

Return type:

None

scripts.solve_disambiguation.correctcap(link, text)[source]#

Return the link capitalized/uncapitalized according to the text.

Parameters:
  • link (Page) – link page

  • text (str) – the wikitext that is supposed to refer to the link

Returns:

uncapitalized title of the link if the text links to the link with an uncapitalized title, else capitalized

Return type:

str
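A sketch of the capitalization rule, taking the link title as a string instead of a Page object. The function name is hypothetical and the title is assumed to start with an uppercase letter.

```python
def correct_cap_sketch(title, text):
    """Return the title in the capitalization used by links in text."""
    lower = title[:1].lower() + title[1:]
    # A lowercase link either stands alone or is followed by a pipe.
    if f'[[{lower}]]' in text or f'[[{lower}|' in text:
        return lower
    return title
```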

scripts.solve_disambiguation.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

speedy_delete script#

Help sysops to quickly check and/or delete pages listed for speedy deletion

This bot trawls through candidates for speedy deletion in a fast and semi-automated fashion. It displays the contents of each page one at a time and provides a prompt for the user to skip or delete the page. Of course, this will require a sysop account.

Planned options include the ability to untag a page as not being eligible for speedy deletion, as well as the option to commute its sentence to Proposed Deletion (see [[en:WP:PROD]] for more details). Also, if the article text is long, it might be a good idea to truncate it to the first so many bytes to prevent terminal spamming.

Warning

This tool shows the contents of the top revision only. It is possible that a vandal has replaced a perfectly good article with nonsense, which has subsequently been tagged by someone who didn’t realize it was previously a good article. The onus is on you to avoid making these mistakes.

Note

This script currently only works for the Wikipedia project.

class scripts.speedy_delete.SpeedyBot(**kwargs)[source]#

Bases: SingleSiteBot, ExistingPageBot

Bot to delete pages which are tagged for speedy deletion.

This bot will load a list of pages from the category of candidates for speedy deletion on the language’s wiki and give the user an interactive prompt to decide whether each should be deleted or not.

Keyword Arguments:

site (pywikibot.site.APISite) – the site to work on

LINES = 22#

maximum lines to extract from wiki page

csd_cat_item = 'Q5964'#

csd_cat_title = {'incubator': {'incubator': 'Category:Maintenance:Delete'}, 'wikibooks': {'en': 'Category:Candidates for speedy deletion'}, 'wikiversity': {'beta': 'Category:Candidates for speedy deletion'}}#
delete_reasons = {'wikipedia': {'de': {'asdf': 'Tastaturtest', 'egal': 'Eindeutig irrelevant', 'ka': 'Kein Artikel', 'mist': 'Unsinn', 'move': 'Redirectlöschung, um Platz für Verschiebung zu schaffen', 'nde': 'Nicht in deutscher Sprache verfasst', 'pfui': 'Beleidigung', 'redir': 'Unnötiger Redirect', 'spam': 'Spam', 'web': 'Nur ein Weblink', 'wg': 'Wiedergänger (wurde bereits zuvor gelöscht)'}, 'it': {'copyviol': 'Violazione di copyright', 'promo': 'Pagina promozionale', 'redirect': 'Redirect rotto o inutile', 'spam': 'Spam', 'test': 'Si tratta di un test', 'vandalismo': 'Caso di vandalismo'}, 'ja': {'ad': '[[WP:CSD]] 全般4 宣伝', 'auth': '[[WP:CSD]] 記事3 投稿者依頼or初版立項者による白紙化', 'commons': '[[WP:CSD]] マルチメディア7 コモンズの画像ページ', 'cont': '[[WP:CSD]] 全般1 意味不明な内容のページ', 'cp': '[[WP:CSD]] 全般6 コピペ移動or分割', 'ipu': '[[WP:CSD]] 利用者ページ3 IPユーザの利用者ページ', 'nc': '[[WP:CSD]] リダイレクト2 [[WP:NC]]違反', 'nd': '[[WP:CSD]] 記事1 定義なし', 'nr': '[[WP:CSD]] リダイレクト1 無意味なリダイレクト', 'nuu': '[[WP:CSD]] 利用者ページ2 利用者登録されていない利用者ページ', 'ren': '[[WP:CSD]] リダイレクト3 改名提案を経た曖昧回避括弧付きの移動の残骸', 'rep': '[[WP:CSD]] 全般5 削除されたページの改善なき再作成', 'sh': '[[WP:CSD]] 記事1 短すぎ', 'test': '[[WP:CSD]] 全般2 テスト投稿', 'tmp': '[[WP:CSD]] テンプレート1 初版投稿者依頼', 'uau': '[[WP:CSD]] 利用者ページ1 本人希望', 'vand': '[[WP:CSD]] 全般3 荒らしand/orいたずら'}, 'zh': {'ad': '[[WP:CSD#G11]]: 明顯的以廣告宣傳為目而建立的頁面', 'adc': '[[WP:CSD#G11]]: 只有條目名稱中的人物或團體之聯絡資訊', 'anou': '[[WP:CSD#O3]]: 匿名用戶的用戶討論頁,其中的內容不再有用', 'auth': '[[WP:CSD#G10]]: 原作者請求', 'bio': '[[WP:CSD#G12]]: 未列明來源及語調負面的生者傳記', 'cn': '[[WP:CSD#R2]]: 跨空間重定向', 'commons': '[[WP:CSD#F7]]: 此圖片已存在於[[:commons:|維基共享資源]]', 'cont': '[[WP:CSD#A1]]: 非常短,而且沒有定義或內容。', 'empty': '[[WP:CSD#G1]]: 沒有實際內容或歷史記錄的文章。', 'isol': '[[WP:CSD#G15]]: 孤立頁面', 'isol-f': '[[WP:CSD#G15]]: 孤立頁面-沒有對應檔案的檔案頁面', 'isol-sub': '[[WP:CSD#G15]]: 孤立頁面-沒有對應母頁面的子頁面', 'lssd': '[[WP:CSD#F3]]: 沒有版權或來源資訊,無法確認圖片是否符合方針要求', 'mactra': '[[WP:CSD#G13]]: 明顯的機器翻譯', 'move': '[[WP:CSD#G8]]: 依[[Wikipedia:移動請求|移動請求]]暫時刪除以進行移動或合併頁面之工作', 'nc': '[[WP:CSD#A3]]: 跨計劃內容', 'nls': '[[WP:CSD#F3]]: 沒有版權模板,無法確認版權資訊', 
'nocont': '[[WP:CSD#A2]]: 內容只包括外部連接、參見、圖書參考、類別標籤、模板標籤、跨語言連接的條目', 'notrans': '[[WP:CSD#G14]]: 未翻譯的頁面', 'oprj': '[[WP:CSD#G7]]: 內容來自其他中文計劃', 'rep': '[[WP:CSD#G5]]: 經討論被刪除後又重新創建的內容', 'repa': '[[WP:CSD#G5]]: 重複的文章', 'repi': '[[WP:CSD#F1]]: 重複的檔案', 'slr': '[[WP:CSD#R5]]: 指向本身的重定向或循環的重定向', 'svg': '[[WP:CSD#F5]]: 被高解析度與SVG檔案取代的圖片', 'tempcp': '[[WP:CSD#G16]]: 臨時頁面依然侵權', 'test': '[[WP:CSD#G2]]: 測試頁', 'tmp': '[[WP:CSD]]: 臨時頁面', 'uc': '[[WP:CSD#O4]]: 空類別', 'ui': '[[WP:CSD#F6]]: 圖片未使用且不自由', 'urs': '[[WP:CSD#O1]]: 用戶請求刪除自己的用戶頁子頁面', 'vand': '[[WP:CSD#G3]]: 純粹破壞', 'wr': '[[WP:CSD#R3]]: 錯誤重定向'}}}#

A list of often-used reasons for deletion. Shortcuts are keys, and reasons are values. If the user enters a shortcut, the associated reason will be used.

deletion_messages = {'wikinews': {'en': {'_default': '[[WN:CSD]]'}, 'zh': {'_default': '[[WN:CSD]]'}}, 'wikipedia': {'ar': {'_default': 'حذف مرشح للحذف السريع حسب [[وب:شطب|معايير الحذف السريع]]'}, 'arz': {'_default': 'مسح صفحه مترشحه للمسح السريع حسب [[ويكيبيديا:مسح سريع|معايير المسح السريع]]'}, 'cs': {'_default': 'Bylo označeno k [[Wikipedie:Rychlé smazání|rychlému smazání]]'}, 'de': {'_default': 'Lösche Artikel nach [[Wikipedia:Schnelllöschantrag|Schnelllöschantrag]]'}, 'en': {'_default': 'Deleting candidate for speedy deletion per [[WP:CSD|CSD]]', 'db-attack': 'Deleting page per [[WP:CSD|CSD]] G10: Page that exists solely to attack its subject.', 'db-author': 'Deleting page per [[WP:CSD|CSD]] G7: Author requests deletion and is its only editor.', 'db-band': 'Deleting page per [[WP:CSD|CSD]] A7: Article about a non-notable band.', 'db-banned': 'Deleting page per [[WP:CSD|CSD]] G5: Page created by a banned user.', 'db-bio': 'Deleting page per [[WP:CSD|CSD]] A7: Article about a non-notable person.', 'db-catempty': 'Deleting page per [[WP:CSD|CSD]] C1: Empty category.', 'db-copyvio': 'Deleting page per [[WP:CSD|CSD]] G12: Page is a blatant copyright violation.', 'db-disparage': 'Deleting page per [[WP:CSD|CSD]] T1: Divisive or inflammatory template.', 'db-empty': 'Deleting page per [[WP:CSD|CSD]] A1: Empty article.', 'db-experiment': 'Deleting page per [[WP:CSD|CSD]] G2: Page was created as an experiment.', 'db-nocontext': 'Deleting page per [[WP:CSD|CSD]] A1: Short article that provides little or no context.', 'db-nonsense': 'Deleting page per [[WP:CSD|CSD]] G1: Page is patent nonsense or gibberish.', 'db-notenglish': "Deleting page per [[WP:CSD|CSD]] A2: Article isn't written in English.", 'db-r1': 'Deleting page per [[WP:CSD|CSD]] R1: Redirect to a deleted or non-existent page.', 'db-repost': 'Deleting page per [[WP:CSD|CSD]] G4: Recreation of previously deleted material.', 'db-spam': 'Deleting page per [[WP:CSD|CSD]] G11: Blatant advertising.', 'db-talk': 
'Deleting page per [[WP:CSD|CSD]] G8: Talk page of a deleted or non-existent page.', 'db-test': 'Deleting page per [[WP:CSD|CSD]] G2: Test page.', 'db-vandalism': 'Deleting page per [[WP:CSD|CSD]] G3: Blatant vandalism.'}, 'fa': {'_default': 'حذف مرشَّح للحذف السريع حسب [[ويكيبيديا:حذف سريع|معايير الحذف السريع]]'}, 'he': {'_default': 'מחיקת מועמד למחיקה מהירה לפי [[ויקיפדיה:מדיניות המחיקה|מדיניות המחיקה]]', 'גם בוויקישיתוף': 'הקובץ זמין כעת בוויקישיתוף.'}, 'it': {'_default': 'Rimuovo pagina che rientra nei casi di [[Wikipedia:IMMEDIATA|cancellazione immediata]].'}, 'ja': {'_default': '[[WP:CSD|即時削除の方針]]に基づい削除'}, 'pl': {'_default': 'Usuwanie artykułu zgodnie z zasadami [[Wikipedia:Ekspresowe kasowanko|ekspresowego kasowania]]'}, 'pt': {'_default': 'Apagando página por [[Wikipedia:Páginas para eliminar|eliminação rápida]]'}, 'zh': {'_default': '[[WP:CSD]]', 'advert': 'ad', 'db-blanked': 'auth', 'db-rediruser': '[[WP:CSD#O1|CSD O6]] 沒有在使用的討論頁', 'db-spam': '[[WP:CSD#G11|CSD G11]]: 廣告、宣傳頁面', 'db-vandalism': 'vand', 'no license': '[[WP:CSD#I3|CSD I3]]: 沒有版權模板,無法確認版權資訊', 'no source': '[[WP:CSD#I3|CSD I3]]: 沒有來源連結,無法確認來源與版權資訊', 'notchinese': '[[WP:CSD#G7|CSD G7]]: 非中文條目且長時間未翻譯', 'notmandarin': 'oprj', 'nowcommons': 'commons', 'roughtranslation': 'mactra', 'temppage': '[[WP:CSD]]: 臨時頁面', 'unknown': '[[WP:CSD#I3|CSD I3]]: 沒有版權模板,無法確認版權資訊', '翻譯': 'oprj', '翻译': 'oprj'}}}#

If the site has several templates for speedy deletion, it might be possible to find out the reason for deletion by the template used. _default will be used if no such semantic template was used.

exit()[source]#

Just call teardown after current run.

Return type:

None

get_reason_for_deletion(page)[source]#

Get a reason for speedy deletion from operator.

guess_reason_for_deletion(page)[source]#

Find a default reason for speedy deletion.
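The fallback to _default described for deletion_messages can be sketched as a dictionary lookup. The function name is hypothetical, the messages are a small excerpt of the English entries above, and the page's templates are passed in as plain names.

```python
# Excerpt of the 'en' entries of deletion_messages['wikipedia'].
messages_en = {
    '_default': 'Deleting candidate for speedy deletion per [[WP:CSD|CSD]]',
    'db-test': 'Deleting page per [[WP:CSD|CSD]] G2: Test page.',
}


def guess_reason_sketch(template_names, messages):
    """Return the reason for the first known template, else the default."""
    for name in template_names:
        if name in messages:
            return messages[name]
    return messages['_default']
```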

run()[source]#

Start the bot’s action.

Return type:

None

setup()[source]#

Refresh generator.

Return type:

None

talk_deletion_msg = {'wikinews': {'en': 'Orphaned talk page', 'zh': '[[WN:CSD#O1|CSD O1 O2 O6]] 沒有在使用的討論頁'}, 'wikipedia': {'ar': 'صفحة نقاش يتيمة', 'arz': 'صفحه نقاش يتيمه', 'cs': 'Osiřelá diskusní stránka', 'de': 'Verwaiste Diskussionsseite', 'en': 'Orphaned talk page', 'fa': 'بحث یتیم', 'fr': 'Page de discussion orpheline', 'he': 'דף שיחה של ערך שנמחק', 'it': 'Rimuovo pagina di discussione di una pagina già cancellata', 'pl': 'Osierocona strona dyskusji', 'pt': 'Página de discussão órfã', 'zh': '[[WP:CSD#O1|CSD O1 O2 O6]] 沒有在使用的討論頁'}}#

Default reason for deleting a talk page.

treat_page()[source]#

Process one page.

Return type:

None

scripts.speedy_delete.main(*args)[source]#

Script entry point.

Parameters:

args (str)

Return type:

None

template script#

Very simple script to replace a template with another one

It also converts the old MediaWiki boilerplate format to the new format.

Syntax:

python pwb.py template [-remove] [-xml[:filename]] oldTemplate [newTemplate]

Specify the template on the command line. The program will pick up the template page, and look for all pages using it. It will then automatically loop over them, and replace the template.

Command line options:

-remove

Remove every occurrence of the template from every article

-subst

Resolves the template by putting its text directly into the article. This is done by changing {{…}} or {{msg:…}} into {{subst:…}}. If you want to use safesubst, you can do -subst:safe. Substitution is not available inside <ref>…</ref>, <gallery>…</gallery>, <poem>…</poem> and <pagelist … /> tags.

-assubst

Replaces the first argument as old template with the second argument as new template but substitutes it like -subst does. Using both options -remove and -subst in the same command line has the same effect.

-xml

retrieve information from a local dump (https://dumps.wikimedia.org). If this argument isn’t given, info will be loaded from the maintenance page of the live wiki. The argument can also be given as “-xml:filename.xml”.

-onlyuser:

Only process pages edited by a given user

-skipuser:

Only process pages not edited by a given user

-timestamp:

(With -onlyuser or -skipuser). Only consider edits by the given user that are not older than the given timestamp. The timestamp must be written in MediaWiki timestamp format, which is “%Y%m%d%H%M%S”. If this parameter is omitted, all edits are checked, but this is restricted to the last 100 edits.

-summary:

(str) Lets you pick a custom edit summary. Use quotes if edit summary contains spaces.

-always

Don’t bother asking to confirm any of the changes, Just Do It.

-addcat:

Appends the given category to every page that is edited. This is useful when a category is being broken out from a template parameter or when templates are being upmerged but more information must be preserved.

other:

First argument is the old template name, second one is the new name. If you want to address a template which has spaces, put quotation marks around it, or use underscores.
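The timestamp format given for -timestamp: can be checked with the standard library. This sketch only illustrates the format string and the "not older than" comparison; the helper names are hypothetical.

```python
from datetime import datetime

MEDIAWIKI_FORMAT = '%Y%m%d%H%M%S'


def parse_mediawiki_timestamp(value):
    """Parse a timestamp such as '20240131120000'."""
    return datetime.strptime(value, MEDIAWIKI_FORMAT)


def is_not_older(edit_timestamp, limit_timestamp):
    """Return True if the edit is not older than the given limit."""
    return (parse_mediawiki_timestamp(edit_timestamp)
            >= parse_mediawiki_timestamp(limit_timestamp))
```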

Examples

If you have a template called [[Template:Cities in Washington]] and want to change it to [[Template:Cities in Washington state]], start:

python pwb.py template "Cities in Washington" "Cities in Washington state"

Move the page [[Template:Cities in Washington]] manually afterwards.

If you have a template called [[Template:test]] and want to substitute it only on pages in the User: and User talk: namespaces, do:

python pwb.py template test -subst -namespace:2 -namespace:3

Note

-namespace: is a global Pywikibot parameter

This next example substitutes the template lived with a supplied edit summary. It only performs substitutions in main article namespace and doesn’t prompt to start replacing. Note that -putthrottle: is a global Pywikibot parameter:

python pwb.py template -putthrottle:30 -namespace:0 lived -subst -always -summary:"BOT: Substituting {{lived}}, see [[WP:SUBST]]."

This next example removes the templates {{cfr}}, {{cfru}}, and {{cfr-speedy}} from five category pages as given:

python pwb.py template cfr cfru cfr-speedy -remove -always -page:"Category:Mountain monuments and memorials" -page:"Category:Indian family names" -page:"Category:Tennis tournaments in Belgium" -page:"Category:Tennis tournaments in Germany" -page:"Category:Episcopal cathedrals in the United States" -summary:"Removing Cfd templates from category pages that survived."

This next example substitutes templates test1, test2, and space test on all user talk pages (namespace #3):

python pwb.py template test1 test2 "space test" -subst -ns:3 -always

class scripts.template.TemplateRobot(generator, templates, **kwargs)[source]#

Bases: ReplaceRobot

This bot will replace, remove or subst all occurrences of a template.

Parameters:
  • generator (iterable) – the pages to work on

  • templates (dict) – a dictionary which maps old template names to their replacements. If remove or subst is True, it maps the names of the templates that should be removed/resolved to None.

update_options: dict[str, Any] = {'addcat': None, 'remove': False, 'subst': False, 'summary': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.template.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

templatecount script#

Display the list of pages transcluding a given list of templates

It can also be used to simply count the number of pages (rather than listing each individually).

Syntax:

python pwb.py templatecount options templates

Command line options:

-count

Counts the number of times each template (passed in as an argument) is transcluded.

-list

Gives the list of all of the pages transcluding the templates (rather than just counting them).

-namespace:

Filters the search to a given namespace. If this is specified multiple times it will search all given namespaces

Examples

Counts how many times {{ref}} and {{note}} are transcluded in articles:

python pwb.py templatecount -count -namespace:0 ref note

Lists all the category pages that transclude {{cfd}} and {{cfdu}}:

python pwb.py templatecount -list -namespace:14 cfd cfdu

class scripts.templatecount.TemplateCountRobot[source]#

Bases: object

Template count bot.

classmethod count_templates(templates, namespaces)[source]#

Display number of transclusions for a list of templates.

Displays the number of pages in the given ‘namespaces’ that transclude each template given by the ‘templates’ list.

Parameters:
  • templates (list) – list of template names

  • namespaces (list) – list of namespace numbers

Return type:

None

classmethod list_templates(templates, namespaces)[source]#

Display transcluded pages for a list of templates.

Displays each transcluded page in the given ‘namespaces’ for each template given by ‘templates’ list.

Parameters:
  • templates (list) – list of template names

  • namespaces (list) – list of namespace numbers

Return type:

None

classmethod template_dict(templates, namespaces)[source]#

Create a dict of templates and the pages transcluding them.

The names of the templates are the keys, and lists of pages transcluding templates in the given namespaces are the values.

Parameters:
  • templates (list) – list of template names

  • namespaces (list) – list of namespace numbers

Return type:

dict[str, list[Page]]

static template_dict_generator(templates, namespaces)[source]#

Yield transclusions of each template in ‘templates’.

For each template in ‘templates’, yield a tuple (template, transclusions), where ‘transclusions’ is a list of all pages in ‘namespaces’ where the template has been transcluded.

Parameters:
  • templates (list) – list of template names

  • namespaces (list) – list of namespace numbers

Return type:

Generator[tuple[str, list[Page]], None, None]
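The relationship between the generator and the dictionary can be sketched without a wiki. Here a hypothetical in-memory mapping named lookup stands in for the transclusion queries, and namespaces are left out for brevity.

```python
def template_dict_generator_sketch(templates, lookup):
    """Yield (template, transclusions) for each template name."""
    for template in templates:
        yield template, list(lookup.get(template, []))


def template_dict_sketch(templates, lookup):
    """Collect the generator output into a dict keyed by template name."""
    return dict(template_dict_generator_sketch(templates, lookup))


# Hypothetical data standing in for pages found on a wiki.
lookup = {'ref': ['Page A', 'Page B'], 'note': []}
counts = {name: len(pages)
          for name, pages in template_dict_sketch(['ref', 'note'],
                                                  lookup).items()}
```

Counting and listing are then two views of the same dictionary.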

scripts.templatecount.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

touch script#

This bot goes over multiple pages of a wiki, and edits them without changes

This is for example used to get category links in templates working.

Command-line arguments:

-purge

Purge the page instead of touching it

Touch mode (default):

-botflag

Force botflag in case of edits with changes.

Purge mode:

-converttitles

Convert titles to other variants if necessary

-forcelinkupdate

Update the links tables

-forcerecursivelinkupdate

Update the links table, and update the links tables for any page that uses this page as a template

-redirects

Automatically resolve redirects

This script supports use of pagegenerators arguments.

class scripts.touch.PurgeBot(*args, **kwargs)[source]#

Bases: MultipleSitesBot

Purge each page on the generator.

available_options: dict[str, Any] = {'converttitles': None, 'forcelinkupdate': None, 'forcerecursivelinkupdate': None, 'redirects': None}#

Handler configuration attribute. Only the keys of the dict can be passed as __init__ options. The values are the default values. Overwrite this in subclasses!

purgepages(flush=False)[source]#

Purge a batch of pages if the rate limit is exceeded.

Added in version 8.0.

Changed in version 9.0: site.APISite.ratelimit() method is used to determine bulk length and delay.
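Batch purging can be pictured as collecting pages and flushing them in groups. The sketch below is an assumption-laden illustration: the class name, the fixed batch_size and the purge callable are hypothetical, whereas the bot is documented to derive bulk length and delay from the site's rate limit.

```python
class BatchPurgerSketch:
    """Collect pages and hand them to purge() in batches."""

    def __init__(self, purge, batch_size=3):
        self.purge = purge            # callable receiving a list of pages
        self.batch_size = batch_size  # assumed fixed size for this sketch
        self.pending = []

    def treat(self, page):
        """Queue one page and purge if the batch is full."""
        self.pending.append(page)
        self.flush()

    def flush(self, force=False):
        """Purge pending pages when the batch is full or force is set."""
        if self.pending and (force or len(self.pending) >= self.batch_size):
            self.purge(list(self.pending))
            self.pending.clear()
```

A final flush(force=True) corresponds to purging the remaining pages at the end of a run.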

teardown()[source]#

Purge remaining pages if no KeyboardInterrupt was made.

Added in version 8.0.

treat(page)[source]#

Purge the given page.

Changed in version 8.0: Enable batch purge using APISite.purgepages()

Return type:

None

class scripts.touch.TouchBot(**kwargs)[source]#

Bases: MultipleSitesBot

Page touch bot.

Parameters:

kwargs (Any) – bot options

Keyword Arguments:

generator – a generator processed by run() method

treat(page)[source]#

Touch the given page.

Return type:

None

update_options: dict[str, Any] = {'botflag': False}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.touch.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

transferbot script#

This script transfers pages from a source wiki to a target wiki

It also copies edit history to a subpage.

The following parameters are supported:

-tolang:

The target site code.

-tofamily:

The target site family.

-prefix:

Page prefix on the new site.

-overwrite:

Existing pages are skipped by default. Use this option to overwrite pages.

-target

Use page generator of the target site

Warning

Internal links are not repaired!

Pages to work on can be specified using any of:

This script supports use of pagegenerators arguments.

Examples

Transfer all pages in category “Query service” from the English Wikipedia to the Arabic Wiktionary, adding “Wiktionary:Import enwp/” as prefix:

python pwb.py transferbot -family:wikipedia -lang:en -cat:"Query service" -tofamily:wiktionary -tolang:ar -prefix:"Wiktionary:Import enwp/"

Copy the template “Query service” from the English Wikipedia to the Arabic Wiktionary:

python pwb.py transferbot -family:wikipedia -lang:en -tofamily:wiktionary -tolang:ar -page:"Template:Query service"

Copy 10 wanted templates of German Wikipedia from English Wikipedia to German:

python pwb.py transferbot -family:wikipedia -lang:en -tolang:de -wantedtemplates:10 -target

scripts.transferbot.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

transwikiimport script#

This script transfers pages from a source wiki to a target wiki

It uses API:Import and it is also able to copy the full edit history.

The following parameters are supported:

-interwikisource:

The interwiki code of the source wiki.

-fullhistory:

Include all versions of the page.

-includealltemplates:

All templates and transcluded pages will be copied (dangerous).

-assignknownusers:

If the user exists on the target wiki, assign the edits to them

-correspondingnamespace:

The number of the corresponding namespace.

-rootpage:

Import as subpages of …

-summary:

Log entry import summary.

-tags:

Change tags to apply to the entry in the import log and to the null revision on the imported pages.

-overwrite:

Existing pages are skipped by default. Use this option to overwrite pages.

-target

Use page generator of the target site. This also affects the correspondingnamespace.

Warning

Internal links are not repaired!

Pages to work on can be specified using any of:

This script supports use of pagegenerators arguments.

Examples

Transfer all pages in category “Query service” from the English Wikipedia to the home Wikipedia, adding “Wikipedia:Import enwp/” as prefix:

python pwb.py transwikiimport -interwikisource:en -cat:"Query service" -prefix:"Wikipedia:Import enwp/" -fullhistory -assignknownusers

Copy the template “Query service” from the English Wikipedia to the home Wiktionary:

python pwb.py transwikiimport -interwikisource:w:en -page:"Template:Query service" -fullhistory -assignknownusers

Copy 10 wanted templates of the home Wikipedia from English Wikipedia to the home Wikipedia:

python pwb.py transwikiimport -interwikisource:en -wantedtemplates:10 -target -fullhistory -assignknownusers

Advice

The module gives access to all parameters of the API (and special page) and is compatible with the scripts.transferbot script. However, for most scenarios the parameters -overwrite, -target and -includealltemplates should be avoided; by default they are set to False.

The correspondingnamespace is used only if the namespaces on both wikis do not correspond one with another.

Correspondingnamespace and rootpage are mutually exclusive.

Target and rootpage are mutually exclusive. (This combination does not seem to be feasible.)

If the target page already exists, the target page will be overwritten if -overwrite is set or skipped otherwise.

The list of pages to be imported can be generated outside of Pywikibot:

for i in {1..10} ; do python3 pwb.py transwikiimport -interwikisource:mul -page:"Page:How to become famous.djvu/$i" -fullhistory -assignknownusers ; done

The pages Page:How to become famous.djvu/1, Page:How to become famous.djvu/2 .. Page:How to become famous.djvu/10 will be copied from Wikisource (mul) to the home Wikisource; all versions will be imported and the usernames will be identified (existing pages will be skipped).

Or generated using the usual pywikibot generators:

python3 pwb.py transwikiimport -interwikisource:mul -prefixindex:"Page:How to become famous.djvu" -fullhistory -assignknownusers -summary:"Book copied from oldwiki."

All pages whose titles start with Page:How to become famous.djvu will be copied from Wikisource (mul) to the home Wikisource; all versions will be imported and the usernames will be identified (existing pages will be skipped).

The global option -simulate disables the import, and the bot prints the names of the pages that would be imported. Since importing pages is a quite exceptional and potentially dangerous process, it should be done carefully and tested in advance.

The -simulate option can help to find out which pages would be moved and what would be the target of the import. However it does not print the titles of the transcluded pages (e.g. templates) if -includealltemplates is set.

The -includealltemplates option is quite dangerous. If the title of an existing page on the home wiki clashes with the title of one of the linked pages, the existing page would be overwritten. The histories would be merged (if the imported version is newer). The linked page can be overwritten even if -overwrite is not set.

Hints

The list of wikis that can be used as an interwiki source is defined in the variable $wgImportSources. It can be viewed on the Special:Import page.

Rights

To run the transwikiimport script, and even to access the Special:Import page, the account must have the appropriate flag set, usually administrator, transwiki importer or importer.

Added in version 8.2.

scripts.transwikiimport.api_query(site, params)[source]#

Request data from given site.

Parameters:

params (dict[str, str])

scripts.transwikiimport.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

unusedfiles script#

This bot appends some text to all unused images and notifies uploaders

Parameters:

-limit:n

(int) Specify the number of pages to work on, where n is the maximum number of articles to work on. If not used, all pages are processed.

-always

Don’t be asked every time.

This script is a ConfigParserBot. The following options can be set within a settings file which is scripts.ini by default:

-nouserwarning

Do not warn uploader about orphaned file.

-filetemplate:

(str) Use a custom template on unused file pages.

-usertemplate:

(str) Use a custom template to warn the uploader.

class scripts.unusedfiles.UnusedFilesBot(**kwargs)[source]#

Bases: SingleSiteBot, AutomaticTWSummaryBot, ConfigParserBot, ExistingPageBot

Unused files bot.

Changed in version 7.0: UnusedFilesBot is a ConfigParserBot

append_text(page, apptext)[source]#

Append apptext to the page.

skip_page(image)[source]#

Skip processing on repository images or if image is already tagged.

Use get_file_url() and file_is_shared() to confirm it is local media rather than a local page with the same name as shared media.

Parameters:

image (FilePage)

Return type:

bool

summary_key: str | None = 'unusedfiles-comment'#

Must be defined in subclasses.

treat(image)[source]#

Process one image page.

Return type:

None

update_options: dict[str, Any] = {'filetemplate': '', 'nouserwarning': False, 'usertemplate': ''}#

update_options can be used to update available_options; do not use it if the bot class is to be derived but use self.available_options.update(<dict>) initializer in such case.

Added in version 6.4.

scripts.unusedfiles.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

upload script#

Script to upload images to Wikipedia

The following parameters are supported:

-keep

Keep the filename as is

-filename:

(str) Target filename without the namespace prefix

-prefix:

(str) Add specified prefix to every filename.

-noverify

Do not ask for verification of the upload description if one is given

-abortonwarn:

Abort upload on the specified warning type. If no warning type is specified, aborts on any warning.

-ignorewarn:

Ignores specified upload warnings. If no warning type is specified, ignores all warnings. Use with caution

-chunked:

Upload the file in chunks (more overhead, but restartable). If no value is specified the chunk size is 1 MiB. The value must be a number which can be followed by a suffix. The units are:

No suffix: Bytes
'k': Kilobytes (1000 B)
'M': Megabytes (1000000 B)
'Ki': Kibibytes (1024 B)
'Mi': Mebibytes (1024x1024 B)

The suffixes are case insensitive.
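The unit rules above can be illustrated with a minimal sketch. The helper name parse_chunk_size and its regular expression are assumptions made for this example; they are not the script's own get_chunk_size() implementation, which may accept other spellings.

```python
import re

# Multipliers taken from the unit table above; keys are lowercase
# because the suffixes are case insensitive.
UNITS = {'': 1, 'k': 1000, 'm': 1000**2, 'ki': 1024, 'mi': 1024**2}
DEFAULT_CHUNK_SIZE = 1024**2  # 1 MiB, used when -chunked has no value

# Hypothetical pattern: an integer followed by an optional unit suffix.
_SIZE = re.compile(r'(\d+)[ \t]*(ki|mi|k|m)?', re.IGNORECASE)


def parse_chunk_size(value: str = '') -> int:
    """Return the chunk size in bytes (hypothetical helper)."""
    if not value:
        return DEFAULT_CHUNK_SIZE
    match = _SIZE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f'invalid chunk size: {value!r}')
    number, suffix = match.groups()
    return int(number) * UNITS[(suffix or '').lower()]
```

Under these assumptions, parse_chunk_size('5Ki') gives 5120 bytes, while parse_chunk_size('5k') gives 5000 bytes.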

-async

Make potentially large file operations asynchronous on the server side when possible.

-always

Don’t ask the user anything. This will imply -keep and -noverify and require that either -abortonwarn or -ignorewarn is defined for all. It will also require a valid file name and description. It’ll only overwrite files if -ignorewarn includes the ‘exists’ warning.

-recursive

When the filename is a directory it also uploads the files from the subdirectories.

-summary:

(str) Pick a custom edit summary for the bot.

-descfile:

(str) Specify a filename where the description is stored

It is possible to combine -abortonwarn and -ignorewarn so that if the specific warning is given it won’t apply the general one but more specific one. So if it should ignore specific warnings and abort on the rest it’s possible by defining no warning for -abortonwarn and the specific warnings for -ignorewarn. The order does not matter. If both are unspecific or a warning is specified by both, it’ll prefer aborting.
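The precedence between -abortonwarn and -ignorewarn described above can be sketched as a small decision function. This is only an illustration: the function name resolve_warning, the True/set encoding of the options, the 'ask' fallback for the case where neither option applies, and the warning code 'duplicate' used in the usage note are assumptions, not part of the upload script.

```python
def resolve_warning(warning, abort_on, ignore):
    """Decide what to do with one upload warning code (hypothetical helper).

    abort_on and ignore are either True (the option was given without
    a warning type, so it applies to every warning) or a set of warning
    codes (empty when the option was not given at all).
    """
    abort_specific = abort_on is not True and warning in abort_on
    ignore_specific = ignore is not True and warning in ignore
    # A specific setting beats a general one; if both options name the
    # warning, aborting is preferred.
    if abort_specific:
        return 'abort'
    if ignore_specific:
        return 'ignore'
    # Only general settings remain; if both are general, abort wins.
    if abort_on is True:
        return 'abort'
    if ignore is True:
        return 'ignore'
    return 'ask'  # assumption: neither option covers this warning
```

For example, with a general -abortonwarn and -ignorewarn:exists, resolve_warning('exists', True, {'exists'}) yields 'ignore', while any other warning, such as 'duplicate', yields 'abort'.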

If any other arguments are given, the first is either URL, filename or directory to upload, and the rest is a proposed description to go with the upload. If none of these are given, the user is asked for the directory, file or URL to upload. The bot will then upload the image to the wiki.

The script will ask for the location of an image(s), if not given as a parameter, and for a description.

scripts.upload.get_chunk_size(match)[source]#

Get chunk size.

Return type:

int

scripts.upload.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

watchlist script#

Allows access to the bot account’s watchlist

The watchlist can be updated manually by running this script.

Syntax:

python pwb.py watchlist [-all | -count | -count:all | -new]

Command line options:

-all

Reloads watchlists for all wikis where a watchlist is already present.

-count

Count only the total number of pages on the watchlist of the account the bot has access to.

-count:all

Count only the total number of pages on all wikis watchlists that the bot is connected to.

-new

Load watchlists for all wikis where accounts are set in the user config file

Changed in version 7.7: watchlist is retrieved in parallel tasks.

scripts.watchlist.count_watchlist(site=None)[source]#

Count only the total number of page(s) in watchlist for this wiki.

Return type:

None

scripts.watchlist.count_watchlist_all(quiet=False)[source]#

Count only the total number of page(s) in watchlist for all wikis.

Return type:

None

scripts.watchlist.get(site=None)[source]#

Load the watchlist, fetching it if necessary.

Return type:

list[str]

scripts.watchlist.isWatched(pageName, site=None)[source]#

Check whether a page is being watched.

scripts.watchlist.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

scripts.watchlist.refresh(site)[source]#

Fetch the watchlist.

scripts.watchlist.refresh_all()[source]#

Reload watchlists for all wikis where a watchlist is already present.

Return type:

None

scripts.watchlist.refresh_new()[source]#

Load watchlists of all wikis for accounts set in user config.

Return type:

None

weblinkchecker script#

This bot is used for checking external links found at the wiki

It checks several pages at once, with a limit set by the config variable max_external_links, which defaults to 50.

The bot won’t change any wiki pages, it will only report dead links such that people can fix or remove the links themselves.

The bot will store all links found dead in a .dat file in the deadlinks subdirectory. To avoid the removing of links which are only temporarily unavailable, the bot ONLY reports links which were reported dead at least two times, with a time lag of at least one week. Such links will be logged to a .txt file in the deadlinks subdirectory.

The .txt file uses wiki markup and so it may be useful to post it on the wiki and then exclude that page from subsequent runs. For example if the page is named Broken Links, exclude it with ‘-titleregexnot:^Broken Links$’

After running the bot and waiting for at least one week, you can re-check those pages where dead links were found, using the -repeat parameter.

In addition to the logging step, it is possible to automatically report dead links to the talk page of the article where the link was found. To use this feature, set report_dead_links_on_talk = True in your user config file, or specify “-talk” on the command line. Adding “-notalk” switches this off irrespective of the configuration variable.

When a link is found alive, it will be removed from the .dat file.
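The reporting rule described above (a link is reported only after it was found dead at least twice, at least one week apart, and is forgotten once it is alive again) can be sketched roughly as follows. The function names record_dead and record_alive and the in-memory dict are assumptions for illustration; the real script keeps this history in the .dat file and may apply additional checks.

```python
from datetime import datetime, timedelta


def record_dead(history, url, title, error, now, dead_days=7):
    """Record a failed check; return True if the link should be reported.

    history maps each URL to a list of (title, date, error) tuples,
    oldest first. Hypothetical helper, not the script's API.
    """
    entries = history.setdefault(url, [])
    first_seen = entries[0][1] if entries else now
    entries.append((title, now, error))
    # Report only on a repeated failure that is old enough.
    return len(entries) >= 2 and now - first_seen >= timedelta(days=dead_days)


def record_alive(history, url):
    """Forget a URL that is alive; return True if it was listed as dead."""
    return history.pop(url, None) is not None
```

In this sketch a first failure is only stored, a second failure three days later is still not reported, and a failure eight days after the first one is.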

These command line parameters can be used to specify which pages to work on:

-repeat

Work on all pages where dead links were found before. This is useful to confirm that the links are dead after some time (at least one week), which is required before the script will report the problem.

-namespace

Only process templates in the namespace with the given number or name. This parameter may be used multiple times.

-xml

Should be used instead of a simple page fetching method from pagegenerators.py for performance and load issues

-xmlstart

Page to start with when using an XML dump

-ignore

HTTP return codes to ignore. Can be provided several times: -ignore:401 -ignore:500

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-talk

Overrides the report_dead_links_on_talk config variable, enabling the feature.

-notalk

Overrides the report_dead_links_on_talk config variable, disabling the feature.

-day

Do not report a broken link if it was first found dead x days ago or less. If not set, the default is 7 days.

The following config variables are supported:

max_external_links

The maximum number of web pages that should be loaded simultaneously. You should change this according to your Internet connection speed. Be careful: if it is set too high, the script might get socket errors because your network is congested, and will then think that the page is offline.

report_dead_links_on_talk

If set to true, causes the script to report dead links on the article’s talk page if (and ONLY if) the linked page has been unavailable at least two times during a timespan of at least one week.

weblink_dead_days

Sets the timespan (default: one week) after which a dead link will be reported

Examples

Loads all wiki pages in alphabetical order using the Special:Allpages feature:

python pwb.py weblinkchecker -start:!

Loads all wiki pages using the Special:Allpages feature, starting at “Example page”:

python pwb.py weblinkchecker -start:Example_page

Loads all wiki pages that link to www.example.org:

python pwb.py weblinkchecker -weblink:www.example.org

Only checks links found in the wiki page “Example page”:

python pwb.py weblinkchecker Example page

Loads all wiki pages where dead links were found during a prior run:

python pwb.py weblinkchecker -repeat

class scripts.weblinkchecker.DeadLinkReportThread[source]#

Bases: Thread

A Thread that is responsible for posting error reports on talk pages.

There is only one DeadLinkReportThread, and it is using a semaphore to make sure that two LinkCheckerThreads cannot access the queue at the same time.

kill()[source]#

Kill thread.

Return type:

None

report(url, error_report, containing_page, archive_url)[source]#

Report error on talk page of the page containing the dead link.

Return type:

None

run()[source]#

Run thread.

Return type:

None

shutdown()[source]#

Finish thread.

Return type:

None

class scripts.weblinkchecker.History(report_thread, site=None)[source]#

Bases: object

Store previously found dead links.

The URLs are dictionary keys, and values are lists of tuples where each tuple represents one time the URL was found dead. Tuples have the form (title, date, error) where title is the wiki page where the URL was found, date is an instance of time, and error is a string with error code and message.

We assume that the first element in the list represents the first time we found this dead link, and the last element represents the last time.

Example:

dict = {
    'https://www.example.org/page': [
        ('WikiPageTitle', DATE, '404: File not found'),
        ('WikiPageName2', DATE, '404: File not found'),
    ]
}

log(url, error, containing_page, archive_url)[source]#

Log an error report to a text file in the deadlinks subdirectory.

Return type:

None

save()[source]#

Save the .dat file to disk.

Return type:

None

Add the fact that the link was found dead to the .dat file.

Return type:

None

Record that the link is now alive.

If link was previously found dead, remove it from the .dat file.

Returns:

True if previously found dead, else returns False.

Return type:

bool

class scripts.weblinkchecker.LinkCheckThread(page, url, history, http_ignores, day)[source]#

Bases: Thread

A thread responsible for checking one URL.

After checking the page, it will die.

classmethod get_delay(name)[source]#

Determine delay from class attribute.

Store the last call for a given hostname with an offset of 6 seconds to ensure there are no more than 10 calls per minute for the same host. Calculate the delay to start the run.

Parameters:

name (str) – The key for the hosts class attribute

Returns:

The calculated delay to start the run

Return type:

float
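The per-host throttling described for get_delay() can be sketched as below. This is a simplified illustration: the class name HostThrottle, the OFFSET constant and the optional now parameter (added so the behaviour can be checked without waiting) are assumptions and do not reproduce the LinkCheckThread code.

```python
import threading
import time


class HostThrottle:
    """Hypothetical stand-in for the class-level bookkeeping."""

    hosts: dict[str, float] = {}  # next free start time per host
    lock = threading.Lock()
    OFFSET = 6.0  # seconds between calls, i.e. at most 10 per minute

    @classmethod
    def get_delay(cls, name, now=None):
        """Return how long a new check for host `name` should wait."""
        if now is None:
            now = time.monotonic()
        with cls.lock:
            # Take the next free slot, then reserve the one after it.
            next_slot = max(now, cls.hosts.get(name, now))
            cls.hosts[name] = next_slot + cls.OFFSET
        return next_slot - now
```

Three checks requested at the same moment for one host would, in this sketch, be delayed by 0, 6 and 12 seconds, while a check for a different host starts immediately.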

hosts: dict[str, float] = {}#

Collecting start time of a thread for any host

lock = <unlocked _thread.lock object>#
run()[source]#

Run the bot.

exception scripts.weblinkchecker.NotAnURLError[source]#

Bases: BaseException

The link is not a URL.

scripts.weblinkchecker.RepeatPageGenerator()[source]#

Generator for pages in History.

class scripts.weblinkchecker.WeblinkCheckerRobot(http_ignores=None, day=7, **kwargs)[source]#

Bases: SingleSiteBot, ExistingPageBot

Bot which will search for dead weblinks.

It uses several LinkCheckThreads at once to process pages from generator.

Parameters:

day (int)

Count LinkCheckThread threads.

Returns:

number of LinkCheckThread threads

Return type:

int

teardown()[source]#

Finish remaining threads and save history file.

Return type:

None

treat_page()[source]#

Process one page.

Return type:

None

use_redirects: bool | None = False#

Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:

class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True

Added in version 7.2.

scripts.weblinkchecker.get_archive_url(url)[source]#

Get archive URL.

scripts.weblinkchecker.main(*args)[source]#

Process command line arguments and invoke bot.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

Yield web links from text.

Only used as text predicate for XmlDumpPageGenerator to speed up generator.

TODO: move to textlib

Parameters:
  • without_bracketed (bool)

  • only_bracketed (bool)

welcome script#

Script to welcome new users

This script works out of the box for Wikis that have been defined in the script.

Important

Ensure you have community support before running this bot!

Everything that needs customisation to support additional projects is indicated by comments.

Description of basic functionality

  • Request a list of new users every period (default: 3600 seconds). You can choose to break the script after the first check (see arguments)

  • Check if new user has passed a threshold for a number of edits (default: 1 edit)

  • Optional: check username for bad words in the username or if the username consists solely of numbers; log this somewhere on the wiki (default: False). A whitelist is also implemented (explanation below).

  • If user has made enough edits (it can be also 0), check if user has an empty talk page

  • If user has an empty talk page, add a welcome message.

  • Optional: Once the set number of users have been welcomed, add this to the configured log page, one for each day (default: True)

  • If no log page exists, create a header for the log page first.

This script uses two templates that need to be on the local wiki:

  • {{WLE}}: contains mark up code for log entries (just copy it from Commons)

  • {{welcome}}: contains the information for new users

This script understands the following command-line arguments:

-edit[:#]

(int) Define how many edits a new user needs to be welcomed (default: 1, max: 50)

-time[:#]

Define how many seconds the bot sleeps before restart (default: 3600)

-break

Use it if you don’t want the bot to restart at the end (it will break) (default: False)

-nlog

Use this parameter if you do not want the bot to log all welcomed users (default: False)

-limit[:#]

(int) Use this parameter to define how many users should be checked (default: 50)

-offset[:TIME]

Skip the latest new users (those newer than TIME) to give interactive users a chance to welcome the new users (default: now). The timezone is the server timezone, which is GMT for Wikimedia. TIME format: yyyymmddhhmmss or yyyymmdd

-timeoffset[:#]

Skip the latest new users, accounts newer than # minutes

-numberlog[:#]

The number of users to welcome before refreshing the welcome log (default: 4)

-filter

Enable the username checks for bad names (default: False)

-ask

Use this parameter if you want to confirm each possible bad username (default: False)

-random

Use a random signature, taking the signatures from a wiki page (for instruction, see below).

-file[:#]

Take the random signatures from a file instead of a wiki page. If you use this parameter, you don’t need to use -random.

-sign

Use one signature from command line instead of the default

-savedata

This feature saves the random signature index so that the next run can continue with the last signature used.

-sul

Welcome the auto-created users (default: False)

-quiet

Prevents users without contributions from being displayed
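As an illustration of the two TIME formats accepted by -offset (yyyymmddhhmmss or yyyymmdd), a value could be parsed along these lines. The helper name parse_offset is hypothetical, and the script's own argument handling may differ.

```python
from datetime import datetime

# Map the length of the value to the matching strptime format.
_FORMATS = {14: '%Y%m%d%H%M%S', 8: '%Y%m%d'}


def parse_offset(value: str) -> datetime:
    """Parse a -offset TIME value (hypothetical helper)."""
    fmt = _FORMATS.get(len(value))
    if fmt is None or not value.isdigit():
        raise ValueError(f'expected yyyymmddhhmmss or yyyymmdd, got {value!r}')
    return datetime.strptime(value, fmt)
```

The result is a naive datetime; as noted above, it is to be read in the server timezone.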

GUIDE#

Report, bad word list and whitelist guide

  1. Set in the code which pages the bot uses to load the bad word list, the whitelist and the report.

  2. On these pages, add a “tuple” with the names that you want in the two lists, for example: (‘cat’, ‘mouse’, ‘dog’). You can also write other text on the page; it will work without problems.

  3. What do the two pages do? The bot checks whether a bad word is in the username and, if so, sets the “warning” to True. Then the bot checks whether a word from the whitelist is in the username. If so, it removes that word and rechecks the bad word list to see whether any other bad word is in the username.

    Example:

    • dio is a badword

    • Claudio is a normal name

    • The username is “Claudio90 fuck!”

    • The Bot finds dio and sets “warning”

    • The Bot finds Claudio and sets “ok”

    • The Bot finds fuck at the end and sets “warning”

    • Result: The username is reported.

  4. When a user is reported, you have to check them and do the following:

    • If they are OK, add the {{welcome}} template

    • If they are not, block them

    • You can decide whether or not to add a “you are blocked, choose another username” template.

    • Delete the username from the page.

    Important

    The bot checks the user in this order:

    • Check whether they have a talk page (if yes, skip)

    • Check whether they are blocked (if yes, skip)

    • Check whether they are on the report page (if yes, skip)

    • If not, they will be reported.
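The bad word and whitelist check from step 3 can be sketched in a few lines. This is a simplified illustration, not the bot's bad_name_filer() method: the function name is_bad_name, the case-insensitive matching and the sample word lists are assumptions.

```python
def is_bad_name(username, badwords, whitelist):
    """Return True if the username should be reported (hypothetical helper)."""
    name = username.lower()
    bad = [word.lower() for word in badwords]
    # First pass: is any bad word contained in the username?
    if not any(word in name for word in bad):
        return False
    # Remove whitelisted words, then recheck what is left.
    for word in whitelist:
        name = name.replace(word.lower(), '')
    return any(word in name for word in bad)


# Illustrative lists only; real lists are loaded from wiki pages.
BADWORDS = ('dio', 'spam')
WHITELIST = ('claudio',)
```

With these lists, “Claudio90” is accepted because the only match, dio, is covered by the whitelisted Claudio, whereas “Claudio90 spam!” is still reported because a bad word remains after the whitelisted part is removed.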

Random signature guide

Some welcomed users will answer to the one who has signed the welcome message. When you welcome many new users, you might be overwhelmed with such answers. Therefore you can define usernames of other users who are willing to receive some of these messages from newbies.

  1. Set the page that the bot will load

  2. Add the signature lines in this way:

    • SIGNATURE

    Example of signatures:

    <pre>
    * [[User:Filnik|Filnik]]
    * [[User:Rock|Rock]]
    </pre>
    

    Note

    The white space after the asterisk and the <pre></pre> tags aren’t required, but using them is recommended.
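To show how signature lines in the format above could be read from the page text, here is a minimal sketch. The regular expression and the name extract_signatures are assumptions for illustration; the bot's define_sign() method may parse the page differently.

```python
import re

# One signature per line that starts with an asterisk; the white space
# after the asterisk is optional, and <pre> lines are simply not matched.
SIGN_LINE = re.compile(r'^\*[ \t]*(.+?)[ \t]*$', re.MULTILINE)


def extract_signatures(page_text: str) -> list[str]:
    """Return the signature of each bullet line (hypothetical helper)."""
    return SIGN_LINE.findall(page_text)


SAMPLE = """<pre>
* [[User:Filnik|Filnik]]
*[[User:Rock|Rock]]
</pre>
"""
```

Calling extract_signatures(SAMPLE) returns the two wiki links, from which a bot could then pick one at random.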

Badwords#

The list of bad words in the code is open. If you think that a word is international and must be blocked in all projects, feel free to add it. Likewise, if you think that a word isn’t so international, feel free to delete it.

However, you can also load the bad words of your project from a dynamic wiki page, or add them directly to the source code that you are using, without adding to or deleting from the shared list.

Some words, like “Administrator” or “Dio” (God in Italian) or “Jimbo”, aren’t bad words at all but can be used in bad nicknames.

exception scripts.welcome.FilenameNotSet(arg)[source]#

Bases: Error

An exception indicating that a signature filename was not specified.

Parameters:

arg (Exception | str)

Return type:

None

class scripts.welcome.Global[source]#

Bases: object

Container class for global settings.

attach_edit_count = 1#
confirm = False#
default_sign = '--~~~~'#
dump_to_log = 15#
filt_bad_name = False#
make_welcome_log = True#
offset = None#
query_limit = 50#
quiet = False#
random_sign = False#
recursive = True#
save_sign_index = False#
sign_file_name = None#
time_recur = 3600#
timeoffset = 0#
welcome_auto = False#
class scripts.welcome.Msg(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: Enum

Enum for show_status method providing message header and color.

DEFAULT = ('MSG', 'lightpurple')#
DONE = ('Done', 'lightblue')#
IGNORE = ('NoAct', 'lightaqua')#
MATCH = ('Match', 'lightgreen')#
MSG = ('MSG', 'lightpurple')#
SKIP = ('Skip', 'lightyellow')#
WARN = ('Warn', 'lightred')#
class scripts.welcome.WelcomeBot(**kwargs)[source]#

Bases: SingleSiteBot

Bot to add welcome messages on User pages.

bad_name_filer(name, force=False)[source]#

Check for bad names.

Parameters:

force (bool)

Return type:

bool

collect_bad_accounts(name)[source]#

Add bad account to queue.

Parameters:

name (str)

Return type:

None

define_sign(force=False)[source]#

Setup signature.

Parameters:

force (bool)

Return type:

list[str]

property generator: Generator[User, None, None]#

Retrieve new users.

makelogpage()[source]#

Make log page.

Return type:

None

report_bad_account()[source]#

Report bad account.

Return type:

None

static show_status(message=Msg.MSG)[source]#

Output colorized status.

Return type:

None

skip_page(user)[source]#

Check whether the user is to be skipped.

Changed in version 7.0: also skip if user is locked globally

Return type:

bool

teardown()[source]#

Some cleanups after run operation.

Return type:

None

treat(user)[source]#

Run the bot.

Return type:

None

write_log()[source]#

Write logfile.

Return type:

None

scripts.welcome.get_welcome_text(site)[source]#

Check that site is managed by the script and return the message.

Raises:

KeyError – site is not in WELCOME dict

Parameters:

site (BaseSite)

Return type:

str

scripts.welcome.handle_args(args)[source]#

Process command line arguments.

If args is an empty list, sys.argv is used.

Parameters:

args (str) – command line arguments

Return type:

None

scripts.welcome.load_word_function(raw)[source]#

Load the badword list and the whitelist.

Return type:

list[str]

scripts.welcome.main(*args)[source]#

Invoke bot.

Parameters:

args (str) – command line arguments

Return type:

None

Script subpackages#

For information on contents of subpackages, see