Main bot scripts

add_text script

Append text to the top or bottom of a page

By default this adds the text to the bottom above the categories and interwiki.

Use the following command line parameters to specify what to add:

-text: (str) Text to append. “\n” is interpreted as a newline.
-textfile: (str) Path to a file with text to append
-summary: (str) Change summary to use
-up: Append text to the top of the page rather than the bottom
-create: Create the page if necessary. Note that talk pages are created even without this option.
-createonly: Only create the page but do not edit existing ones
-always: If used, the bot won’t ask if it should add the specified text
-major: If used, the edit will be saved without the “minor edit” flag
-talk, -talkpage: Put the text onto the talk page instead
-excepturl: (str) Skip pages with a url that matches this regular expression
-noreorder: Place the text beneath the categories and interwiki
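
As a rough illustration of how -text and -up combine, the sketch below builds the new page text in plain Python. It is not the script's actual implementation: the function name build_new_text is hypothetical, it assumes that the two-character sequence \n in the argument is turned into a real newline, and it ignores the placement relative to categories and interwiki links that the script handles by default.

```python
def build_new_text(page_text: str, addition: str, up: bool = False) -> str:
    """Return page_text with addition placed at the top or the bottom."""
    # Assumption: a literal backslash followed by "n" stands for a newline.
    addition = addition.replace("\\n", "\n")
    if up:
        return addition + "\n" + page_text
    return page_text + "\n" + addition


print(build_new_text("Some article text.", "hello\\nworld"))
print(build_new_text("Some article text.", "{{Something}}", up=True))
```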

Furthermore, the following can be used to specify which pages to process…

This script supports use of pagegenerators arguments.

Examples

Append ‘hello world’ to the bottom of the sandbox:

python pwb.py add_text -page:Wikipedia:Sandbox -summary:"Bot: pywikibot practice" -text:"hello world"

Add a template to the top of the pages with ‘category:catname’:

python pwb.py add_text -cat:catname -summary:"Bot: Adding a template" -text:"{{Something}}" -except:"\{\{([Tt]emplate:|)[Ss]omething" -up
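
The -except pattern in this command is an ordinary regular expression. The following sketch uses Python's re module to show which page texts it would match, and therefore which pages would be skipped. The sample strings are made up for illustration.

```python
import re

# Pattern taken from the command above. It matches "{{Something" or
# "{{something", with or without a "Template:" / "template:" prefix.
pattern = re.compile(r"\{\{([Tt]emplate:|)[Ss]omething")

samples = [
    "{{Something}} Intro text",
    "Intro text {{template:something|arg}}",
    "Intro text without the template",
]

for text in samples:
    verdict = "skip" if pattern.search(text) else "edit"
    print(f"{verdict}: {text}")
```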

Command used on it.wikipedia to put the template in the page without any category:

python pwb.py add_text -except:"\{\{([Tt]emplate:|)[Cc]ategorizzare" -text:"{{Categorizzare}}" -excepturl:"class='catlinks'>" -uncat -summary:"Bot: Aggiungo template Categorizzare"

category script

Script to manage categories

Syntax:

python pwb.py category action [-option]

where action can be one of these

add: mass-add a category to a list of pages.
remove: remove category tag from all pages in a category. If a pagegenerators option is given, the intersection with category pages is processed.
move: move all pages in a category to another category. If a pagegenerators option is given, the intersection with category pages is processed.
tidy: tidy up a category by moving its pages into subcategories.
tree: show a tree of subcategories of a given category.
listify: make a list of all of the articles that are in a category.
clean: Removes redundant grandchildren from the specified category by removing the direct link to the grandparent. In other words, a grandchild should not also be a child.
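
The idea behind the clean action can be sketched with plain sets. This is a conceptual illustration only, not the script's code: the function name and the dictionary that maps a category to its direct subcategories are assumptions made for the example.

```python
def redundant_grandchildren(category, subcats):
    """Return subcategories that are both a child and a grandchild."""
    children = set(subcats.get(category, ()))
    grandchildren = set()
    for child in children:
        grandchildren.update(subcats.get(child, ()))
    return children & grandchildren


# Hypothetical category tree: parent -> direct subcategories.
tree = {
    "Animals": ["Mammals", "Bears"],
    "Mammals": ["Bears", "Whales"],
}

print(redundant_grandchildren("Animals", tree))
```

Here "Bears" is listed directly under "Animals" and also under "Mammals", so its direct link to "Animals" is the redundant one.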

and option can be one of these

Options for add action:

-person: Sort persons by their last name.
-create: If a page doesn’t exist, do not skip it, create it instead.
-redirect: Follow redirects.

Options for listify action:

-append: This appends the list to the page if it already exists (appending to the bottom by default).
-overwrite: This overwrites the current page with the list even if something is already there.
-showimages: This displays images rather than linking them in the list.
-talkpages: This outputs the links to talk pages of the pages to be listified in addition to the pages themselves.

-prefix:#: You may specify a list prefix like “#” for a numbered list or any other prefix. Default is a bullet list with prefix “*”.

Options for remove action:

-nodelsum: This specifies not to use the custom edit summary as the deletion reason. Instead, it uses the default deletion reason for the language, which is “Category was disbanded” in English.

Options for move action:

-hist: Creates a nice wikitable on the talk page of target category that contains detailed page history of the source category.
-nodelete: Don’t delete the old category after move.
-nowb: Don’t update the Wikibase repository.
-allowsplit: If that option is not set, it only moves the talk and main page together.
-mvtogether: Only move the pages/subcategories of a category, if the target page (and talk page, if -allowsplit is not set) doesn’t exist.
-keepsortkey: Use sortKey of the old category also for the new category. If not specified, sortKey is removed. An alternative method to keep sortKey is to use -inplace option.

Options for listify and tidy actions:

-namespaces, -namespace, -ns: Filter the articles in the specified namespaces. Separate multiple namespace numbers or names with commas. Examples: -ns:0,2,4, -ns:Help,MediaWiki

Options for clean action:

-always: The bot won’t ask for confirmation when putting a page.

Options for several actions:

-rebuild: Reset the database.

-from:: The category to move from (for the move option). Also, the category to remove from in the remove option. Also, the category to make a list of in the listify option.
-to:: The category to move to (for the move option). Also, the name of the list to make in the listify option.

-batch: Don’t prompt to delete emptied categories (do it automatically).

-summary:: Pick a custom edit summary for the bot.

-inplace: Use this flag to change categories in place rather than rearranging them.

-recurse[:<depth>]: Recurse through subcategories of the category to optional depth.

-pagesonly: While removing pages from a category, keep the subpage links and do not remove them.
-match: Only work on pages whose titles match the given regex (for move and remove actions).

-depth:: The max depth limit beyond which no subcategories will be listed.

Note

If the category names have spaces in them you may need to use a special syntax in your shell so that the names aren’t treated as separate parameters. For instance, in BASH, use single quotes, e.g. -from:'Polar bears'.
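
Python's shlex module splits a command line by POSIX shell rules, so it can be used to preview how the quoting affects the arguments the script receives. The category names in this sketch are only examples.

```python
import shlex

# Quoted: the category name stays in one argument.
quoted = shlex.split("python pwb.py category move -from:'Polar bears' -to:Bears")
# Unquoted: the name is split at the space into two arguments.
unquoted = shlex.split("python pwb.py category move -from:Polar bears -to:Bears")

print(quoted)
print(unquoted)
```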

If action is “add”, “move” or “remove”, the following additional options are supported:

This script supports use of pagegenerators arguments.

For the actions tidy and tree, the bot will store the category structure locally in category.dump. This saves time and server load, but if it uses these data later, they may be outdated; use the -rebuild parameter in this case.

For example, to create a new category from a list of persons, type:

python pwb.py category add -person

and follow the on-screen instructions.

Or to do it all from the command-line, use the following syntax:

python pwb.py category move -from:US -to:"United States"

This will move all pages in the category US to the category United States.

A pagegenerators option can be given with move and remove action:

pwb category -site:wikipedia:en remove -from:Hydraulics -cat:Pneumatics

The sample above would remove ‘Hydraulics’ category from all pages which are also in ‘Pneumatics’ category.
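
The intersection described here behaves like a set intersection. The sketch below uses invented page titles; a real run would get the members of both categories from the wiki.

```python
# Hypothetical page titles for the two categories.
hydraulics_members = {"Pump", "Valve", "Hydraulic press"}
pneumatics_members = {"Valve", "Compressor", "Pump"}

# Only pages that are in both categories are processed.
to_process = sorted(hydraulics_members & pneumatics_members)
print(to_process)
```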

Changed in version 8.0: pagegenerators are supported with “move” and “remove” action.

replace script

This bot will make direct text replacements

It will retrieve information on which pages might need changes either from an XML dump or a text file, or only change a single page.

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-mysqlquery: Retrieve information from a local database mirror. If no query specified, bot searches for pages with given replacements.
-xml: Retrieve information from a local XML dump (pages-articles or pages-meta-current, see https://dumps.wikimedia.org). Argument can also be given as “-xml:filename”.
-regex: Make replacements using regular expressions. If this argument isn’t given, the bot will make simple text replacements.
-nocase: Use case insensitive regular expressions.
-dotall: Make the dot match any character at all, including a newline. Without this flag, ‘.’ will match anything except a newline.
-multiline: ‘^’ and ‘$’ will match the beginning and end of each line.
-xmlstart: (Only works with -xml) Skip all articles in the XML dump before the one specified (may also be given as -xmlstart:Article).

-addcat:cat_name

Adds “cat_name” category to every altered page.

-excepttitle:XYZ

Skip pages with titles that contain XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.

-requiretitle:XYZ

Only do pages with titles that contain XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.

-excepttext:XYZ

Skip pages which contain the text XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.

-exceptinside:XYZ

Skip occurrences of the to-be-replaced text which lie within XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.

-exceptinsidetag:XYZ

Skip occurrences of the to-be-replaced text which lie within an XYZ tag.

-summary:XYZ

Set the summary message text for the edit to XYZ, bypassing the predefined message texts with original and replacements inserted. To add the replacements to your summary, use the %(description)s placeholder, for example: -summary:"Bot operated replacement: %(description)s". Can’t be used with -automaticsummary.

-automaticsummary

Uses an automatic summary for all replacements which don’t have a summary defined. Can’t be used with -summary.

-sleep:123

If you use -fix, several regular expressions may be checked on every page. This can waste a lot of CPU, because the bot checks every regex without pausing and uses all available resources. This option slows the bot down between one regex and the next in order to reduce CPU load.

-fix:XYZ

Perform one of the predefined replacements tasks, which are given in the dictionary ‘fixes’ defined inside the files fixes.py and user-fixes.py.

The available fixes are listed in pywikibot.fixes.

-manualinput

Request manual replacements via the command line input even if replacements are already defined. If this option is set (or no replacements are defined via -fix or the arguments) it’ll ask for additional replacements at start.

-pairsfile

Lines from the given file name(s) will be read as replacement arguments. For example, a file containing the lines “a” and “b”, used as:

python pwb.py replace -page:X -pairsfile:file c d

will replace ‘a’ with ‘b’ and ‘c’ with ‘d’.

-always

Don’t prompt you for each replacement

-quiet

Don’t print a message if a page remains unchanged

-nopreload

Do not preload pages. Useful if disabled on a wiki.

-recursive

Recurse replacement as long as possible. Be careful, this might lead to an infinite loop.

-allowoverlap

When occurrences of the pattern overlap, replace all of them. Be careful, this might lead to an infinite loop.

-fullsummary

Use one large summary for all command line replacements.
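
The %(description)s placeholder accepted by -summary has the form of a named field in Python's %-style string formatting. The sketch below shows that kind of expansion. The description text is invented for the example, and the exact wording the script inserts may differ.

```python
summary_template = "Bot operated replacement: %(description)s"

# Hypothetical description; the script derives the real one from the
# replacement pairs given on the command line.
description = "(-Errror +Error)"

print(summary_template % {"description": description})
```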

Replacement parameters: Replacement parameters are pairs of arguments given to the script. The first argument is the old text to be replaced, the second argument is the new text. If the -regex argument is given, the first argument will be regarded as a regular expression, and the second argument might contain expressions like \1 or \g<name>. The second parameter can also be specified as an empty string, usually "". It is possible to introduce more than one pair of replacement parameters.
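
The following sketch uses Python's re module to show the effect of the regex options and of a named group reference. It assumes that -nocase, -dotall and -multiline behave like re.IGNORECASE, re.DOTALL and re.MULTILINE. The sample text is made up.

```python
import re

text = "Hello World\nhello again"

# -nocase: ignore case when matching.
print(re.sub(r"hello", "Hi", text, flags=re.IGNORECASE))

# -multiline: "^" also matches at the start of every line.
print(re.findall(r"^hello", text, flags=re.IGNORECASE | re.MULTILINE))

# -dotall: "." also matches the newline character.
print(bool(re.search(r"World.hello", text, flags=re.DOTALL)))

# A named group is referenced as \g<name> in the replacement text.
print(re.sub(r"(?P<word>World)", r"[[\g<word>]]", text))
```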

Empty string arguments with PowerShell

PowerShell removes empty strings while it parses the command line. To pass an empty string with PowerShell, either escape the quotation marks with backticks in front of them, like `"`", or disable command line parsing for all following command parts with the --% symbol, as in python pwb replace --% -start:! foo "". This disables parsing for all replace options and arguments following the delimiter and preserves empty strings.

Examples

If you want to change templates from the old syntax, e.g. {{msg:Stub}}, to the new syntax, e.g. {{Stub}}, download an XML dump file (pages-articles) from https://dumps.wikimedia.org, then use this command:

python pwb.py replace -xml -regex "{{msg:(.*?)}}" "{{\1}}"
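
The effect of this pattern and of the \1 backreference in the replacement can be previewed with Python's re.sub. The sample text is invented for the example.

```python
import re

old_text = "Intro {{msg:Stub}} and {{msg:Cleanup}}"

# "(.*?)" lazily captures the template name; "\1" puts it back
# without the "msg:" prefix.
new_text = re.sub(r"{{msg:(.*?)}}", r"{{\1}}", old_text)
print(new_text)
```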

If you have a dump called foobar.xml and want to fix typos in articles, e.g. Errror -> Error, use this:

python pwb.py replace -xml:foobar.xml "Errror" "Error" -namespace:0

If you want to do more than one replacement at a time, use this:

python pwb.py replace -xml:foobar.xml "Errror" "Error" "Faail" "Fail" -namespace:0
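
Conceptually, the replacement arguments are consumed two at a time. The sketch below groups a flat argument list into (old, new) pairs and applies them as simple text replacements. The helper name pair_up is hypothetical and this is not the script's own parsing code.

```python
def pair_up(arguments):
    """Group a flat list of strings into (old, new) tuples."""
    if len(arguments) % 2:
        raise ValueError("replacement arguments must come in pairs")
    return list(zip(arguments[::2], arguments[1::2]))


pairs = pair_up(["Errror", "Error", "Faail", "Fail"])

text = "An Errror caused the test to Faail."
for old, new in pairs:
    text = text.replace(old, new)
print(text)
```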

If you have a page called ‘John Doe’ and want to fix the format of ISBNs, use:

python pwb.py replace -page:John_Doe -fix:isbn

This command will change ‘referer’ to ‘referrer’, but not in pages which talk about HTTP, where the typo has become part of the standard:

python pwb.py replace referer referrer -file:typos.txt -excepttext:HTTP

solve_disambiguation script

Script to help a human solve disambiguations by presenting a set of options

Specify the disambiguation page on the command line.

The program will pick up the page, and look for all alternative links, and show them with a number adjacent to them. It will then automatically loop over all pages referring to the disambiguation page, and show 30 characters of context on each side of the reference to help you make the decision between the alternatives. It will ask you to type the number of the appropriate replacement, and perform the change.
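
Showing a fixed amount of context around a reference amounts to slicing the page text. The sketch below is an illustration under that assumption. The helper name and sample text are made up, and the script's own output formatting is not reproduced.

```python
def context_around(text, start, end, width=30):
    """Return text[start:end] with up to `width` characters on each side."""
    return text[max(0, start - width):min(len(text), end + width)]


page_text = (
    "The jaguar is a large cat native to the Americas, while the "
    "[[Mercury]] entry here needs to point at the right article."
)
link = "[[Mercury]]"
start = page_text.index(link)
print(context_around(page_text, start, start + len(link)))
```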

It is possible to choose to replace only the link (just type the number) or replace both link and link-text (type ‘r’ followed by the number).

Multiple references in one page will be scanned in order, but typing ‘n’ (next) on any one of them will leave the complete page unchanged. To leave only some reference unchanged, use the ‘s’ (skip) option.
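
The difference between replacing only the link and replacing both link and link-text can be sketched as follows. This is one reading of the behaviour described above, not the script's code. The function relink, its replace_label flag and the sample titles are assumptions for the example.

```python
import re


def relink(text, old, new, replace_label=False):
    """Point [[old]] and [[old|label]] links at `new`."""
    pattern = re.compile(r"\[\[" + re.escape(old) + r"(?:\|([^\]]*))?\]\]")

    def substitute(match):
        if replace_label:
            # Replace link and link-text: the new title is shown.
            return f"[[{new}]]"
        # Replace only the link: keep the text the reader sees.
        label = match.group(1) or old
        return f"[[{new}|{label}]]"

    return pattern.sub(substitute, text)


sample = "See [[Mercury]] and [[Mercury|the planet]]."
print(relink(sample, "Mercury", "Mercury (planet)"))
print(relink(sample, "Mercury", "Mercury (planet)", replace_label=True))
```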

Command line options:

-pos:XXXX adds XXXX as an alternative disambiguation

-just

only use the alternatives given on the command line, do not read the page for other possibilities

-dnskip

Skip links already marked with a disambiguation-needed template (e.g., {{dn}})

-primary

“primary topic” disambiguation (Begriffsklärung nach Modell 2). This applies to titles where one topic is much more important: the disambiguation page is saved somewhere else, and the important topic gets the nice name.

-primary:XY

like the above, but use XY as the only alternative, instead of searching for alternatives in [[Keyword (disambiguation)]]. Note: this is the same as -primary -just -pos:XY

-file:XYZ

reads a list of pages from a text file. XYZ is the name of the file from which the list is taken. If XYZ is not given, the user is asked for a filename. Page titles should be inside [[double brackets]]. The -pos parameter won’t work if -file is used.

-always:XY

instead of asking the user what to do, always perform the same action. For example, XY can be “r0”, “u” or “2”. Be careful with this option, and check the changes made by the bot. Note that some choices for XY don’t make sense and will result in a loop, e.g. “l” or “m”.

-main

only check pages in the main namespace, not in the Talk, Project, User, etc. namespaces.

-first

Uses only the first link of every line on the disambiguation page that begins with an asterisk. Useful if the page is full of irrelevant links that are not subject to disambiguation. You won’t get all of them as options, just the first on each line. For a moderate example see https://en.wikipedia.org/wiki/Szerdahely. A really exotic one is https://hu.wikipedia.org/wiki/Brabant_(egyértelműsítő lap)

-start:XY

goes through all disambiguation pages in the category on your wiki that is defined (to the bot) as the category containing disambiguation pages, starting at XY. If only ‘-start’ or ‘-start:’ is given, it starts at the beginning.

-min:XX

(XX being a number) only work on disambiguation pages for which at least XX are to be worked on.

To complete a move of a page, one can use:

python pwb.py solve_disambiguation -just -pos:New_Name Old_Name

upload script

Script to upload images to Wikipedia

The following parameters are supported:

-keep: Keep the filename as is

-filename:: (str) Target filename without the namespace prefix
-prefix:: (str) Add specified prefix to every filename.

-noverify: Do not ask for verification of the upload description if one is given

-abortonwarn:

Abort upload on the specified warning type. If no warning type is specified, aborts on any warning.

-ignorewarn:

Ignores specified upload warnings. If no warning type is specified, ignores all warnings. Use with caution.

-chunked:

Upload the file in chunks (more overhead, but restartable). If no value is specified the chunk size is 1 MiB. The value must be a number which can be followed by a suffix. The units are:

No suffix: Bytes
'k': Kilobytes (1000 B)
'M': Megabytes (1000000 B)
'Ki': Kibibytes (1024 B)
'Mi': Mebibytes (1024x1024 B)

The suffixes are case insensitive.

-async: Make potentially large file operations asynchronous on the server side when possible.
-always: Don’t ask the user anything. This will imply -keep and -noverify and require that either -abortonwarn or -ignorewarn is defined for all. It will also require a valid file name and description. It’ll only overwrite files if -ignorewarn includes the ‘exists’ warning.
-recursive: When the filename is a directory it also uploads the files from the subdirectories.

-summary:: (str) Pick a custom edit summary for the bot.
-descfile:: (str) Specify a filename where the description is stored
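
The unit suffixes listed for -chunked can be turned into byte counts as sketched below. This is an illustration, not the script's parser. It assumes that the number comes first and the unit follows it, as in 1Mi, and the function name parse_chunk_size is hypothetical.

```python
import re

# Multipliers as listed above; keys are lower-cased because the
# suffixes are case insensitive.
UNITS = {"": 1, "k": 1000, "m": 1000**2, "ki": 1024, "mi": 1024**2}


def parse_chunk_size(value: str) -> int:
    """Convert a string such as '5Mi' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([a-zA-Z]*)", value.strip())
    if not match or match.group(2).lower() not in UNITS:
        raise ValueError(f"invalid chunk size: {value!r}")
    return int(match.group(1)) * UNITS[match.group(2).lower()]


for example in ("4096", "500k", "2M", "64Ki", "1mi"):
    print(example, "->", parse_chunk_size(example))
```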

It is possible to combine -abortonwarn and -ignorewarn: when one option names a specific warning and the other is general, the more specific one applies to that warning. So to ignore specific warnings and abort on the rest, give -abortonwarn without a warning type and list the specific warnings for -ignorewarn. The order does not matter. If both are unspecific, or a warning is specified by both, aborting is preferred.
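
The precedence rules in this paragraph can be written down as a small decision function. This is a sketch of the rules as stated here, not the code used by the script. The function name, the "ask" fallback and the "duplicate" warning name are assumptions made for the example.

```python
def decide(warning, abort_on, ignore):
    """Return 'abort', 'ignore' or 'ask' for one upload warning.

    abort_on and ignore are either True (option given without a
    warning type) or a set of warning names (empty if not given).
    """
    if abort_on is not True and warning in abort_on:
        return "abort"   # a specific abort wins, even against ignore
    if ignore is not True and warning in ignore:
        return "ignore"  # a specific ignore beats a general abort
    if abort_on is True:
        return "abort"   # general abort, also when both are general
    if ignore is True:
        return "ignore"
    return "ask"         # neither option covers this warning


# -abortonwarn without a type, combined with -ignorewarn:exists
print(decide("exists", True, {"exists"}))
print(decide("duplicate", True, {"exists"}))
```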

If any other arguments are given, the first is either URL, filename or directory to upload, and the rest is a proposed description to go with the upload. If none of these are given, the user is asked for the directory, file or URL to upload. The bot will then upload the image to the wiki.

The script will ask for the location of the image(s), if not given as a parameter, and for a description.

weblinkchecker script

This bot is used for checking external links found at the wiki

It checks several pages at once, with a limit set by the config variable max_external_links, which defaults to 50.

The bot won’t change any wiki pages, it will only report dead links such that people can fix or remove the links themselves.

The bot will store all links found dead in a .dat file in the deadlinks subdirectory. To avoid the removal of links which are only temporarily unavailable, the bot ONLY reports links which were found dead at least two times, with a time lag of at least one week. Such links will be logged to a .txt file in the deadlinks subdirectory.
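
The reporting rule can be expressed as a short check over the times at which a link was found dead. The sketch below is an assumption about how such a rule could look, not the bot's implementation. The function name and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def should_report(dead_checks, min_lag=timedelta(days=7)):
    """Report a link only if it was found dead at least twice and the
    first and last of those findings are at least `min_lag` apart."""
    if len(dead_checks) < 2:
        return False
    return max(dead_checks) - min(dead_checks) >= min_lag


first = datetime(2024, 1, 1)
print(should_report([first]))
print(should_report([first, first + timedelta(days=2)]))
print(should_report([first, first + timedelta(days=8)]))
```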

The .txt file uses wiki markup and so it may be useful to post it on the wiki and then exclude that page from subsequent runs. For example if the page is named Broken Links, exclude it with ‘-titleregexnot:^Broken Links$’

After running the bot and waiting for at least one week, you can re-check those pages where dead links were found, using the -repeat parameter.

In addition to the logging step, it is possible to automatically report dead links to the talk page of the article where the link was found. To use this feature, set report_dead_links_on_talk = True in your user config file, or specify “-talk” on the command line. Adding “-notalk” switches this off irrespective of the configuration variable.

When a link is found alive, it will be removed from the .dat file.

These command line parameters can be used to specify which pages to work on:

-repeat: Work on all pages where dead links were found before. This is useful to confirm that the links are dead after some time (at least one week), which is required before the script will report the problem.
-namespace: Only process pages in the namespace with the given number or name. This parameter may be used multiple times.
-xml: Should be used instead of a simple page fetching method from pagegenerators.py for performance and load issues
-xmlstart: Page to start with when using an XML dump
-ignore: HTTP return codes to ignore. Can be provided several times: -ignore:401 -ignore:500

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-talk: Overrides the report_dead_links_on_talk config variable, enabling the feature.
-notalk: Overrides the report_dead_links_on_talk config variable, disabling the feature.
-day: Do not report a broken link if it has been on the page for only x days or less. If not set, the default is 7 days.

The following config variables are supported:

max_external_links: The maximum number of web pages that should be loaded simultaneously. You should change this according to your Internet connection speed. Be careful: if it is set too high, the script might get socket errors because your network is congested, and will then think that the page is offline.

report_dead_links_on_talk: If set to true, causes the script to report dead links on the article’s talk page if (and ONLY if) the linked page has been unavailable at least two times during a timespan of at least one week.

weblink_dead_days: Sets the timespan (default: one week) after which a dead link will be reported.
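
A fragment for the user config file might look like the sketch below. The values for max_external_links and weblink_dead_days are the defaults described above, with one week written as 7 days. Talk page reporting is switched on here for illustration.

```python
# Settings read by weblinkchecker from the user config file.
max_external_links = 50           # web pages loaded simultaneously
report_dead_links_on_talk = True  # also report on the article's talk page
weblink_dead_days = 7             # days before a dead link is reported
```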

Examples

Loads all wiki pages in alphabetical order using the Special:Allpages feature:

python pwb.py weblinkchecker -start:!

Loads all wiki pages using the Special:Allpages feature, starting at “Example page”:

python pwb.py weblinkchecker -start:Example_page

Loads all wiki pages that link to www.example.org:

python pwb.py weblinkchecker -weblink:www.example.org

Only checks links found in the wiki page “Example page”:

python pwb.py weblinkchecker Example page

Loads all wiki pages where dead links were found during a prior run:

python pwb.py weblinkchecker -repeat