Main bot scripts

add_text script

Append text to the top or bottom of a page

By default this adds the text to the bottom above the categories and interwiki.

Use the following command line parameters to specify what to add:

-text             Text to append. "\n" are interpreted as newlines.

-textfile         Path to a file with text to append

-summary          Change summary to use

-up               Append text to the top of the page rather than the bottom

-create           Create the page if necessary. Note that talk pages are
                  created even without this option.

-createonly       Only create the page but do not edit existing ones

-always           If used, the bot won't ask if it should add the specified
                  text

-major            If used, the edit will be saved without the "minor edit" flag

-talkpage         Put the text onto the talk page instead
-talk

-excepturl        Skip pages with a URL that matches this regular expression

-noreorder        Place the text beneath the categories and interwiki

Furthermore, the following can be used to specify which pages to process…

This script supports use of pagegenerators arguments.
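The placement rules described above can be sketched in a few lines of Python. This is a simplified illustration, not the script's implementation: the function name place_text and the footer pattern (trailing lines that look like [[Category:...]] or [[xx:...]] links) are assumptions made for this example only.

```python
import re

# Assumption for this sketch: "categories and interwiki" are trailing lines
# such as [[Category:Foo]] or [[de:Foo]]. The real script relies on
# Pywikibot's own text handling, which is more thorough.
FOOTER_LINE = re.compile(
    r'^\[\[(?:Category|[a-z]{2,3}(?:-[a-z]+)?):[^\]]*\]\]$')


def place_text(page_text, new_text, up=False, noreorder=False):
    """Return page_text with new_text added at the top or the bottom."""
    # -text treats a literal backslash followed by "n" as a newline.
    new_text = new_text.replace('\\n', '\n')
    if up:
        return new_text + '\n' + page_text

    lines = page_text.split('\n')
    # Walk backwards over blank lines and category/interwiki lines.
    cut = len(lines)
    while cut > 0:
        line = lines[cut - 1]
        if line.strip() and not FOOTER_LINE.match(line):
            break
        cut -= 1
    footer = [line for line in lines[cut:] if line.strip()]

    if noreorder or not footer:
        # Plain append beneath everything.
        return page_text + '\n' + new_text
    body = '\n'.join(lines[:cut])
    return body + '\n' + new_text + '\n\n' + '\n'.join(footer)


page = 'Intro text.\n\n[[Category:Bears]]\n[[de:Bear]]'
print(place_text(page, 'hello world'))
```

With the default settings the new text lands between the article body and the category block; with up=True it is placed before everything else.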

Examples

Append ‘hello world’ to the bottom of the sandbox:

python pwb.py add_text -page:Wikipedia:Sandbox \
-summary:"Bot: pywikibot practice" -text:"hello world"

Add a template to the top of the pages with ‘category:catname’:

python pwb.py add_text -cat:catname -summary:"Bot: Adding a template" \
-text:"{{Something}}" -except:"\{\{([Tt]emplate:|)[Ss]omething" -up
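The pattern passed to -except in the command above is an ordinary regular expression. As a quick check, and assuming Python's re module behaves like the matcher the script uses, the sketch below shows that it matches the template with or without the "Template:" prefix and with either capitalization of the first letter. The sample strings are made up for illustration.

```python
import re

pattern = re.compile(r'\{\{([Tt]emplate:|)[Ss]omething')

# Sample page snippets (hypothetical) and whether the pattern should match.
samples = {
    '{{Something}}': True,
    '{{template:something|x}}': True,
    'Plain text about something': False,
}

results = {text: bool(pattern.search(text)) for text in samples}
for text, found in results.items():
    print(f'{text!r}: {found}')
```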

Command used on it.wikipedia to put the template in the page without any category:

python pwb.py add_text -except:"\{\{([Tt]emplate:|)[Cc]ategorizzare" \
-text:"{{Categorizzare}}" -excepturl:"class='catlinks'>" -uncat \
-summary:"Bot: Aggiungo template Categorizzare"

category script

Script to manage categories

Syntax:

python pwb.py category action [-option]

where action can be one of these

add: mass-add a category to a list of pages.
remove: remove category tag from all pages in a category. If a pagegenerators option is given, the intersection with category pages is processed.
move: move all pages in a category to another category. If a pagegenerators option is given, the intersection with category pages is processed.
tidy: tidy up a category by moving its pages into subcategories.
tree: show a tree of subcategories of a given category.
listify: make a list of all of the articles that are in a category.
clean: remove redundant grandchildren from the specified category by removing the direct link to the grandparent. In other words, a grandchild should not also be a child.
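The idea behind the clean action can be illustrated with a small graph example. This is only a sketch of the rule "a grandchild should not also be a child"; the function redundant_members and the children_of mapping are hypothetical and are not part of the script.

```python
def redundant_members(category, children_of):
    """Return direct members of category that are also its grandchildren."""
    direct = set(children_of.get(category, ()))
    grandchildren = set()
    for child in direct:
        grandchildren.update(children_of.get(child, ()))
    # A member reachable both directly and through a child is redundant.
    return direct & grandchildren


# Hypothetical category structure: category -> direct members.
children_of = {
    'Animals': ['Bears', 'Polar bears'],
    'Bears': ['Polar bears'],
}
print(redundant_members('Animals', children_of))
```

Here 'Polar bears' is listed directly in 'Animals' and again through 'Bears', so its direct link to 'Animals' is the one a cleanup would remove.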

and option can be one of these

Options for “add” action:

-person      - Sort persons by their last name.
-create      - If a page doesn't exist, do not skip it, create it instead.
-redirect    - Follow redirects.

Options for “listify” action:

-append      - This appends the list to the page if it already exists
               (appending to the bottom by default).
-overwrite   - This overwrites the current page with the list even if
               something is already there.
-showimages  - This displays images rather than linking them in the list.
-talkpages   - This outputs the links to talk pages of the pages to be
               listified in addition to the pages themselves.
-prefix:#    - You may specify a list prefix like "#" for a numbered list or
               any other prefix. Default is a bullet list with prefix "*".

Options for “remove” action:

-nodelsum    - This specifies not to use the custom edit summary as the
               deletion reason. Instead, it uses the default deletion reason
               for the language, which is "Category was disbanded" in
               English.

Options for “move” action:

-hist        - Creates a nice wikitable on the talk page of target category
               that contains detailed page history of the source category.
-nodelete    - Don't delete the old category after move.
-nowb        - Don't update the Wikibase repository.
-allowsplit  - If that option is not set, it only moves the talk and main
               page together.
-mvtogether  - Only move the pages/subcategories of a category, if the
               target page (and talk page, if -allowsplit is not set)
               doesn't exist.
-keepsortkey - Use sortKey of the old category also for the new category.
               If not specified, sortKey is removed.
               An alternative method to keep sortKey is to use -inplace
               option.

Options for “listify” and “tidy” actions:

-namespaces    Filter the articles in the specified namespaces. Separate
-namespace     multiple namespace numbers or names with commas. Examples:
-ns            -ns:0,2,4
               -ns:Help,MediaWiki

Options for “clean” action:

-always

Options for several actions:

-rebuild     - Reset the database.
-from:       - The category to move from (for the move action).
               Also, the category to remove from for the remove action.
               Also, the category to make a list of for the listify action.
-to:         - The category to move to (for the move action).
               Also, the name of the list to make for the listify action.

-batch       - Don't prompt to delete emptied categories (do it
               automatically).
-summary:    - Pick a custom edit summary for the bot.
-inplace     - Use this flag to change categories in place rather than
               rearranging them.
-recurse[:<depth>]
             - Recurse through subcategories of the category to
               optional depth.
-pagesonly   - While removing pages from a category, keep the subpage links
               and do not remove them.
-match       - Only work on pages whose titles match the given regex (for
               move and remove actions).
-depth:      - The max depth limit beyond which no subcategories will be
               listed.

Note

If the category names have spaces in them you may need to use a special syntax in your shell so that the names aren’t treated as separate parameters. For instance, in BASH, use single quotes, e.g. -from:'Polar bears'.
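The effect of the quoting can be checked with Python's shlex module, which splits a string using POSIX-shell-like rules. This only illustrates how a shell would split the arguments; it does not run the bot, and the target category name is made up.

```python
import shlex

quoted = shlex.split("category move -from:'Polar bears' -to:Bears")
unquoted = shlex.split("category move -from:Polar bears -to:Bears")

print(quoted)    # the category name stays inside one argument
print(unquoted)  # the name is split across two arguments
```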

If action is “add”, “move” or “remove”, the following additional options are supported:

This script supports use of pagegenerators arguments.

For the actions tidy and tree, the bot will store the category structure locally in category.dump. This saves time and server load, but if it uses these data later, they may be outdated; use the -rebuild parameter in this case.

For example, to create a new category from a list of persons, type:

python pwb.py category add -person

and follow the on-screen instructions.

Or to do it all from the command-line, use the following syntax:

python pwb.py category move -from:US -to:"United States"

This will move all pages in the category US to the category United States.

A pagegenerators option can be given with move and remove action:

pwb category -site:wikipedia:en remove -from:Hydraulics -cat:Pneumatics

The sample above would remove ‘Hydraulics’ category from all pages which are also in ‘Pneumatics’ category.

Changed in version 8.0: pagegenerators are supported with “move” and “remove” action.
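The "intersection" behaviour for move and remove can be pictured as follows. This is a minimal sketch with made-up page titles; pages_to_process is a hypothetical helper, not a function of the script.

```python
def pages_to_process(category_members, generator_pages):
    """Keep only generator pages that are also members of the category."""
    members = set(category_members)
    return [title for title in generator_pages if title in members]


# Hypothetical contents of the two categories from the example above.
hydraulics = ['Pump', 'Valve', 'Hydraulic press']
pneumatics = ['Valve', 'Compressor', 'Pump']

# Pages yielded by -cat:Pneumatics that are also in Hydraulics.
print(pages_to_process(hydraulics, pneumatics))
```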

replace script

This bot will make direct text replacements

It will retrieve information on which pages might need changes either from an XML dump or a text file, or only change a single page.

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-mysqlquery       Retrieve information from a local database mirror.
                  If no query specified, bot searches for pages with
                  given replacements.

-xml              Retrieve information from a local XML dump
                  (pages-articles or pages-meta-current, see
                  https://dumps.wikimedia.org). Argument can also
                  be given as "-xml:filename".

-regex            Make replacements using regular expressions. If this argument
                  isn't given, the bot will make simple text replacements.

-nocase           Use case insensitive regular expressions.

-dotall           Make the dot match any character at all, including a newline.
                  Without this flag, '.' will match anything except a newline.

-multiline        '^' and '$' will now match begin and end of each line.

-xmlstart         (Only works with -xml) Skip all articles in the XML dump
                  before the one specified (may also be given as
                  -xmlstart:Article).

-addcat:cat_name  Adds "cat_name" category to every altered page.

-excepttitle:XYZ  Skip pages with titles that contain XYZ. If the -regex
                  argument is given, XYZ will be regarded as a regular
                  expression.

-requiretitle:XYZ Only do pages with titles that contain XYZ. If the -regex
                  argument is given, XYZ will be regarded as a regular
                  expression.

-excepttext:XYZ   Skip pages which contain the text XYZ. If the -regex
                  argument is given, XYZ will be regarded as a regular
                  expression.

-exceptinside:XYZ Skip occurrences of the to-be-replaced text which lie
                  within XYZ. If the -regex argument is given, XYZ will be
                  regarded as a regular expression.

-exceptinsidetag:XYZ Skip occurrences of the to-be-replaced text which lie
                 within an XYZ tag.

-summary:XYZ      Set the summary message text for the edit to XYZ, bypassing
                  the predefined message texts with original and replacements
                  inserted. To add the replacements to your summary use the
                  %(description)s placeholder, for example:
                  -summary:"Bot operated replacement: %(description)s"
                  Can't be used with -automaticsummary.

-automaticsummary Uses an automatic summary for all replacements which don't
                  have a summary defined. Can't be used with -summary.

-sleep:123        If you use -fix, several regexes may be checked on every
                  page. Checking them back to back without waiting can use
                  a lot of CPU. This option makes the bot pause between one
                  regex and the next so that it does not waste too much CPU.

-fix:XYZ          Perform one of the predefined replacements tasks, which are
                  given in the dictionary 'fixes' defined inside the files
                  fixes.py and user-fixes.py.

                 The available fixes are listed in :py:mod:`pywikibot.fixes`.

-manualinput      Request manual replacements via the command line input even
                  if replacements are already defined. If this option is set
                  (or no replacements are defined via -fix or the arguments)
                  it'll ask for additional replacements at start.

-pairsfile        Lines from the given file name(s) will be read as replacement
                  arguments. For example, a file containing the lines "a" and
                  "b", used as:

                      python pwb.py replace -page:X -pairsfile:file c d

                  will replace 'a' with 'b' and 'c' with 'd'.

-always           Don't prompt you for each replacement

-quiet            Don't print a message if a page remains unchanged

-nopreload        Do not preload pages. Useful if disabled on a wiki.

-recursive        Recurse replacement as long as possible. Be careful, this
                  might lead to an infinite loop.

-allowoverlap     When occurrences of the pattern overlap, replace all of them.
                  Be careful, this might lead to an infinite loop.

-fullsummary      Use one large summary for all command line replacements.
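The options -nocase, -dotall and -multiline correspond to well-known regular expression flags. The sketch below uses Python's re module; the mapping to re.IGNORECASE, re.DOTALL and re.MULTILINE is an assumption based on the option descriptions above, and the sample text is made up.

```python
import re

text = 'First line\nsecond LINE'

# -nocase: case-insensitive matching.
nocase = re.findall(r'line', text, flags=re.IGNORECASE)

# -dotall: '.' also matches a newline.
without_dotall = re.search(r'First.*second', text)
with_dotall = re.search(r'First.*second', text, flags=re.DOTALL)

# -multiline: '^' and '$' match at the start and end of each line.
multiline = re.findall(r'^\w+', text, flags=re.MULTILINE)

print(nocase)
print(without_dotall is None, with_dotall is not None)
print(multiline)
```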

Replacement parameters: Replacement parameters are pairs of arguments given to the script. The first argument is the old text to be replaced, the second argument is the new text. If the -regex argument is given, the first argument will be regarded as a regular expression, and the second argument might contain expressions like \1 or \g<name>. The second parameter can also be specified as an empty string, usually "". It is possible to introduce more than one pair of replacement parameters.
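How pairs of replacement parameters are applied can be sketched with Python's re module. The helper apply_replacements is hypothetical and much simpler than the script itself (no exceptions, no summaries); it only shows the pairing of arguments, the use of \1 and \g<name>, and an empty replacement string.

```python
import re


def apply_replacements(text, args, regex=False):
    """Apply (old, new) pairs taken from a flat argument list."""
    if len(args) % 2:
        raise ValueError('replacement arguments must come in pairs')
    for old, new in zip(args[::2], args[1::2]):
        if regex:
            text = re.sub(old, new, text)
        else:
            text = text.replace(old, new)
    return text


# Two plain pairs, as in: replace "Errror" "Error" "Faail" "Fail"
print(apply_replacements('Errror and Faail', ['Errror', 'Error', 'Faail', 'Fail']))

# A regex pair with a numbered group reference.
print(apply_replacements('{{msg:Stub}}', [r'\{\{msg:(.*?)\}\}', r'{{\1}}'], regex=True))

# A regex pair with named groups.
print(apply_replacements('2024-05', [r'(?P<y>\d{4})-(?P<m>\d{2})', r'\g<m>/\g<y>'], regex=True))

# An empty second parameter removes the old text.
print(apply_replacements('foo bar', ['foo ', '']))
```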

Empty string arguments with PowerShell

PowerShell removes empty strings while it parses the command line. To pass an empty string from PowerShell, either escape the quotation marks with backticks (grave accents) in front of them, like `"`", or disable command line parsing for everything that follows with the --% symbol, like python pwb replace --% -start:! foo "". This disables parsing for all replace options and arguments after the delimiter and so preserves empty strings.

Examples

If you want to change templates from the old syntax, e.g. {{msg:Stub}}, to the new syntax, e.g. {{Stub}}, download an XML dump file (pages-articles) from https://dumps.wikimedia.org, then use this command:

python pwb.py replace -xml -regex "{{msg:(.*?)}}" "{{\1}}"

If you have a dump called foobar.xml and want to fix typos in articles, e.g. Errror -> Error, use this:

python pwb.py replace -xml:foobar.xml "Errror" "Error" -namespace:0

If you want to do more than one replacement at a time, use this:

python pwb.py replace -xml:foobar.xml "Errror" "Error" "Faail" "Fail" \
-namespace:0

If you have a page called ‘John Doe’ and want to fix the format of ISBNs, use:

python pwb.py replace -page:John_Doe -fix:isbn

This command will change ‘referer’ to ‘referrer’, but not in pages which talk about HTTP, where the typo has become part of the standard:

python pwb.py replace referer referrer -file:typos.txt -excepttext:HTTP

solve_disambiguation script

Script to help a human solve disambiguations by presenting a set of options

Specify the disambiguation page on the command line.

The program will pick up the page, and look for all alternative links, and show them with a number adjacent to them. It will then automatically loop over all pages referring to the disambiguation page, and show 30 characters of context on each side of the reference to help you make the decision between the alternatives. It will ask you to type the number of the appropriate replacement, and perform the change.

It is possible to choose to replace only the link (just type the number) or replace both link and link-text (type ‘r’ followed by the number).

Multiple references in one page will be scanned in order, but typing ‘n’ (next) on any one of them will leave the complete page unchanged. To leave only some reference unchanged, use the ‘s’ (skip) option.
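The two replacement modes and the context display described above can be sketched as follows. This is a simplified illustration under stated assumptions: replace_link and link_context are hypothetical helpers, only plain [[Target]] and [[Target|label]] links are handled, and the page titles are made up.

```python
import re


def link_context(text, start, end, width=30):
    """Return the link plus up to `width` characters on each side."""
    return text[max(0, start - width):end + width]


def replace_link(text, old, new, replace_label=False):
    """Point links to `old` at `new`, optionally dropping the old label."""
    pattern = re.compile(r'\[\[' + re.escape(old) + r'(?:\|([^\]]*))?\]\]')

    def repl(match):
        label = match.group(1) or old
        if replace_label:
            # 'r' + number: replace both link and link-text.
            return f'[[{new}]]'
        # Number only: replace the link, keep the visible text.
        return f'[[{new}|{label}]]'

    return pattern.sub(repl, text)


text = 'See [[Mercury]] now.'
print(replace_link(text, 'Mercury', 'Mercury (planet)'))
print(replace_link(text, 'Mercury', 'Mercury (planet)', replace_label=True))
```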

Command line options:

-pos:XXXX   adds XXXX as an alternative disambiguation

-just       only use the alternatives given on the command line, do not
            read the page for other possibilities

-dnskip     Skip links already marked with a disambiguation-needed
            template (e.g., {{dn}})

-primary    "primary topic" disambiguation (Begriffsklärung nach Modell 2).
            These are titles where one topic is much more important, the
            disambiguation page is saved somewhere else, and the important
            topic gets the nice name.

-primary:XY like the above, but use XY as the only alternative, instead of
            searching for alternatives in [[Keyword (disambiguation)]].
            Note: this is the same as -primary -just -pos:XY

-file:XYZ   reads a list of pages from a text file. XYZ is the name of the
            file from which the list is taken. If XYZ is not given, the
            user is asked for a filename. Page titles should be inside
            [[double brackets]]. The -pos parameter won't work if -file
            is used.

-always:XY  instead of asking the user what to do, always perform the same
            action. For example, XY can be "r0", "u" or "2". Be careful with
            this option, and check the changes made by the bot. Note that
            some choices for XY don't make sense and will result in a loop,
            e.g. "l" or "m".

-main       only check pages in the main namespace, not in the Talk,
            Project, User, etc. namespaces.

-first      Uses only the first link of every line on the disambiguation
            page that begins with an asterisk. Useful if the page is full
            of irrelevant links that are not subject to disambiguation.
            You won't get all of them as options, just the first on each
            line. For a moderate example see
            https://en.wikipedia.org/wiki/Szerdahely
            A really exotic one is
            https://hu.wikipedia.org/wiki/Brabant_(egyértelműsítő lap)

-start:XY   goes through all disambiguation pages in the category on your
            wiki that is defined (to the bot) as the category containing
            disambiguation pages, starting at XY. If only '-start' or
            '-start:' is given, it starts at the beginning.

-min:XX     (XX being a number) only work on disambiguation pages for which
            at least XX pages are to be worked on.

To complete a move of a page, one can use:

python pwb.py solve_disambiguation -just -pos:New_Name Old_Name

upload script

Script to upload images to Wikipedia

The following parameters are supported:

-keep         Keep the filename as is
-filename:    Target filename without the namespace prefix
-prefix:      Add specified prefix to every filename.
-noverify     Do not ask for verification of the upload description if one
              is given
-abortonwarn: Abort upload on the specified warning type. If no warning type
              is specified, aborts on any warning.
-ignorewarn:  Ignores specified upload warnings. If no warning type is
              specified, ignores all warnings. Use with caution
-chunked:     Upload the file in chunks (more overhead, but restartable). If
              no value is specified the chunk size is 1 MiB. The value must
              be a number which can be followed by a suffix. The units are:

                  No suffix: Bytes
                  'k': Kilobytes (1000 B)
                  'M': Megabytes (1000000 B)
                  'Ki': Kibibytes (1024 B)
                  'Mi': Mebibytes (1024x1024 B)

              The suffixes are case insensitive.
-async        Make potentially large file operations asynchronous on the
              server side when possible.
-always       Don't ask the user anything. This will imply -keep and
              -noverify and require that either -abortonwarn or -ignorewarn
              is defined for all. It will also require a valid file name and
              description. It'll only overwrite files if -ignorewarn includes
              the 'exists' warning.
-recursive    When the filename is a directory it also uploads the files from
              the subdirectories.
-summary:     Pick a custom edit summary for the bot.
-descfile:    Specify a filename where the description is stored
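The unit suffixes accepted by -chunked can be illustrated with a small parser. This is a sketch based only on the table above, not the script's own parsing code; parse_chunk_size and its error handling are assumptions.

```python
import re

# Multipliers from the table above; keys are lower-cased because the
# suffixes are case insensitive.
UNITS = {'': 1, 'k': 1000, 'm': 1000 ** 2, 'ki': 1024, 'mi': 1024 ** 2}
DEFAULT_CHUNK_SIZE = 1024 ** 2  # 1 MiB when no value is given


def parse_chunk_size(value):
    """Convert a -chunked value such as '5k' or '2Mi' to a byte count."""
    if not value:
        return DEFAULT_CHUNK_SIZE
    match = re.fullmatch(r'(\d+)([a-z]*)', value.strip().lower())
    if not match or match.group(2) not in UNITS:
        raise ValueError(f'invalid chunk size: {value!r}')
    return int(match.group(1)) * UNITS[match.group(2)]


for value in ('', '500', '5k', '2M', '4Ki', '1MI'):
    print(repr(value), parse_chunk_size(value))
```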

-abortonwarn and -ignorewarn can be combined. When one option names a specific warning and the other is general, the more specific one applies to that warning. So, to ignore specific warnings and abort on all others, give -abortonwarn without a warning type and -ignorewarn with the specific warnings. The order does not matter. If both are unspecific, or if both name the same warning, aborting is preferred.
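The precedence between the two options can be written down as a small decision function. This is a sketch of the rules in the paragraph above, not the script's code: the function decide, the way option values are represented, and the 'ask' result for warnings that neither option covers are assumptions. The warning name 'duplicate' is used only as an illustrative second warning type.

```python
def decide(warning, abort_on=None, ignore=None):
    """Return 'abort', 'ignore' or 'ask' for one upload warning.

    abort_on / ignore use this convention (assumed for the sketch):
    None = option not given, True = option given without a warning type,
    a set = option given with specific warning types.
    """
    abort_specific = isinstance(abort_on, (set, frozenset)) and warning in abort_on
    ignore_specific = isinstance(ignore, (set, frozenset)) and warning in ignore

    # A specific setting beats a general one; on a tie, abort wins.
    if abort_specific:
        return 'abort'
    if ignore_specific:
        return 'ignore'
    if abort_on is True:
        return 'abort'
    if ignore is True:
        return 'ignore'
    return 'ask'


# Ignore 'exists' but abort on everything else.
print(decide('exists', abort_on=True, ignore={'exists'}))
print(decide('duplicate', abort_on=True, ignore={'exists'}))
```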

If any other arguments are given, the first is either URL, filename or directory to upload, and the rest is a proposed description to go with the upload. If none of these are given, the user is asked for the directory, file or URL to upload. The bot will then upload the image to the wiki.

The script will ask for the location of the image(s), if not given as a parameter, and for a description.

weblinkchecker script

This bot is used for checking external links found at the wiki

It checks several pages at once, with a limit set by the config variable max_external_links, which defaults to 50.

The bot won’t change any wiki pages; it only reports dead links so that people can fix or remove the links themselves.

The bot will store all links found dead in a .dat file in the deadlinks subdirectory. To avoid the removal of links which are only temporarily unavailable, the bot ONLY reports links which were found dead at least two times, with a time lag of at least one week. Such links will be logged to a .txt file in the deadlinks subdirectory.

The .txt file uses wiki markup, so it may be useful to post it on the wiki and then exclude that page from subsequent runs. For example, if the page is named Broken Links, exclude it with ‘-titleregexnot:^Broken Links$’.

After running the bot and waiting for at least one week, you can re-check those pages where dead links were found, using the -repeat parameter.

In addition to the logging step, it is possible to automatically report dead links to the talk page of the article where the link was found. To use this feature, set report_dead_links_on_talk = True in your user config file, or specify “-talk” on the command line. Adding “-notalk” switches this off irrespective of the configuration variable.

When a link is found alive, it will be removed from the .dat file.
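The reporting rule described above (found dead at least twice, at least one week apart, and forgotten again once the link is alive) can be sketched as a small in-memory history. The class DeadLinkHistory is hypothetical; the real bot keeps its history in the .dat file, and the URL and dates below are made up.

```python
from datetime import datetime, timedelta


class DeadLinkHistory:
    """Track when each URL was first seen dead (in memory only)."""

    def __init__(self, min_days=7):
        self.min_gap = timedelta(days=min_days)
        self.first_dead = {}  # url -> time of the first failed check

    def record_dead(self, url, when):
        """Record a failed check; return True if the link should be reported."""
        first = self.first_dead.setdefault(url, when)
        return when - first >= self.min_gap

    def record_alive(self, url):
        """A working link is dropped from the history."""
        self.first_dead.pop(url, None)


history = DeadLinkHistory()
start = datetime(2024, 1, 1)
url = 'http://www.example.org/gone'
print(history.record_dead(url, start))                      # first failure
print(history.record_dead(url, start + timedelta(days=3)))  # too soon
print(history.record_dead(url, start + timedelta(days=8)))  # a week later
```

Only the third check would lead to a report, because it is the first failed check that comes at least a week after the link was first seen dead.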

These command line parameters can be used to specify which pages to work on:

-repeat      Work on all pages where dead links were found before. This is
             useful to confirm that the links are dead after some time (at
             least one week), which is required before the script will report
             the problem.

-namespace   Only process templates in the namespace with the given number or
             name. This parameter may be used multiple times.

-xml         Should be used instead of a simple page fetching method from
             pagegenerators.py for performance and load issues

-xmlstart    Page to start with when using an XML dump

-ignore      HTTP return codes to ignore. Can be provided several times:
                -ignore:401 -ignore:500

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-talk        Overrides the report_dead_links_on_talk config variable, enabling
             the feature.

-notalk      Overrides the report_dead_links_on_talk config variable, disabling
             the feature.

-day         Do not report a broken link if it was first found dead only
             x days ago or less. If not set, the default is 7 days.

The following config variables are supported:

max_external_links         The maximum number of web pages that should be
                           loaded simultaneously. You should change this
                           according to your Internet connection speed.
                           Be careful: if it is set too high, the script
                           might get socket errors because your network
                           is congested, and will then think that the page
                           is offline.

report_dead_links_on_talk  If set to true, causes the script to report dead
                           links on the article's talk page if (and ONLY if)
                           the linked page has been unavailable at least two
                           times during a timespan of at least one week.

weblink_dead_days          sets the timespan (default: one week) after which
                           a dead link will be reported
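Put together, the three variables could look like this in the user config file. This is only a fragment and assumes the config file uses plain Python assignments; the values shown are the defaults stated above, except for report_dead_links_on_talk, which is switched on here as an example.

```python
# Fragment for the user config file (assumed to use Python syntax).
max_external_links = 50            # default according to the text above
report_dead_links_on_talk = True   # same effect as passing -talk
weblink_dead_days = 7              # one week, the documented default
```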

Examples

Loads all wiki pages in alphabetical order using the Special:Allpages feature:

python pwb.py weblinkchecker -start:!

Loads all wiki pages using the Special:Allpages feature, starting at “Example page”:

python pwb.py weblinkchecker -start:Example_page

Loads all wiki pages that link to www.example.org:

python pwb.py weblinkchecker -weblink:www.example.org

Only checks links found in the wiki page “Example page”:

python pwb.py weblinkchecker Example page

Loads all wiki pages where dead links were found during a prior run:

python pwb.py weblinkchecker -repeat