Main bot scripts
add_text script
Append text to the top or bottom of a page
By default this adds the text to the bottom above the categories and interwiki.
Use the following command line parameters to specify what to add:
-text Text to append. "\n" are interpreted as newlines.
-textfile Path to a file with text to append
-summary Change summary to use
-up Append text to the top of the page rather than the bottom
-create Create the page if necessary. Note that talk pages are
created even without this option.
-createonly Only create the page but do not edit existing ones
-always If used, the bot won't ask if it should add the specified
text
-major If used, the edit will be saved without the "minor edit" flag
-talkpage, -talk Put the text onto the talk page instead
-excepturl Skip pages with a url that matches this regular expression
-noreorder Place the text beneath the categories and interwiki
Furthermore, the following can be used to specify which pages to process…
This script supports use of pagegenerators arguments.
Examples
Append ‘hello world’ to the bottom of the sandbox:
python pwb.py add_text -page:Wikipedia:Sandbox \
-summary:"Bot: pywikibot practice" -text:"hello world"
Add a template to the top of the pages with ‘category:catname’:
python pwb.py add_text -cat:catname -summary:"Bot: Adding a template" \
-text:"{{Something}}" -except:"\{\{([Tt]emplate:|)[Ss]omething" -up
Command used on it.wikipedia to add the template to pages without any category:
python pwb.py add_text -except:"\{\{([Tt]emplate:|)[Cc]ategorizzare" \
-text:"{{Categorizzare}}" -excepturl:"class='catlinks'>" -uncat \
-summary:"Bot: Aggiungo template Categorizzare"
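The effect of -text and -up can be pictured with a minimal Python sketch. It is not the script's implementation: the function name add_text and its parameters are chosen here for illustration, and the sketch ignores the default placement above categories and interwiki links.

```python
def add_text(page_text: str, text: str, up: bool = False) -> str:
    """Return page_text with text added at the top (up=True) or the bottom.

    Illustrative only: the real script also handles categories and
    interwiki links, which this sketch does not.
    """
    # A literal backslash followed by "n" in -text is read as a newline.
    text = text.replace("\\n", "\n")
    if up:
        return text + "\n" + page_text
    return page_text + "\n" + text


print(add_text("Existing content", "hello\\nworld"))
print(add_text("Existing content", "{{Something}}", up=True))
```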
category script
Script to manage categories
Syntax:
python pwb.py category action [-option]
where action can be one of these
- add
mass-add a category to a list of pages.
- remove
remove category tag from all pages in a category. If a pagegenerators option is given, the intersection with category pages is processed.
- move
move all pages in a category to another category. If a pagegenerators option is given, the intersection with category pages is processed.
- tidy
tidy up a category by moving its pages into subcategories.
- tree
show a tree of subcategories of a given category.
- listify
make a list of all of the articles that are in a category.
- clean
Removes redundant grandchildren from the specified category by removing the direct link to the grandparent. In other words, a grandchild should not also be a child.
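The idea behind the clean action can be sketched in Python on a plain dictionary. This is an assumption-laden illustration, not the script's code: the mapping children_of and the category names are hypothetical.

```python
def redundant_grandchildren(category, children_of):
    """Return members that are both a child and a grandchild of category."""
    children = children_of.get(category, set())
    grandchildren = set()
    for child in children:
        grandchildren |= children_of.get(child, set())
    # A member reachable through a child does not need the direct link.
    return children & grandchildren


# Hypothetical category structure: "Bears" is listed in "Animals" directly
# and again through "Mammals".
children_of = {
    "Animals": {"Mammals", "Bears"},
    "Mammals": {"Bears"},
}
print(redundant_grandchildren("Animals", children_of))
```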
and option can be one of these
Options for “add” action:
-person - Sort persons by their last name.
-create - If a page doesn't exist, do not skip it, create it instead.
-redirect - Follow redirects.
Options for “listify” action:
-append - This appends the list to the current page that is already
existing (appending to the bottom by default).
-overwrite - This overwrites the current page with the list even if
something is already there.
-showimages - This displays images rather than linking them in the list.
-talkpages - This outputs the links to talk pages of the pages to be
listified in addition to the pages themselves.
-prefix:# - You may specify a list prefix like "#" for a numbered list or
any other prefix. Default is a bullet list with prefix "*".
Options for “remove” action:
-nodelsum - This specifies not to use the custom edit summary as the
deletion reason. Instead, it uses the default deletion reason
for the language, which is "Category was disbanded" in
English.
Options for “move” action:
-hist - Creates a nice wikitable on the talk page of target category
that contains detailed page history of the source category.
-nodelete - Don't delete the old category after move.
-nowb - Don't update the Wikibase repository.
-allowsplit - If that option is not set, it only moves the talk and main
page together.
-mvtogether - Only move the pages/subcategories of a category, if the
target page (and talk page, if -allowsplit is not set)
doesn't exist.
-keepsortkey - Use sortKey of the old category also for the new category.
If not specified, sortKey is removed.
An alternative method to keep sortKey is to use -inplace
option.
Options for “listify” and “tidy” actions:
-namespaces, -namespace, -ns Filter the articles in the specified namespaces. Separate
multiple namespace numbers or names with commas. Examples:
-ns:0,2,4
-ns:Help,MediaWiki
Options for “clean” action:
-always
Options for several actions:
-rebuild - Reset the database.
-from: - The category to move from (for the move option)
Also, the category to remove from in the remove option
Also, the category to make a list of in the listify option.
-to: - The category to move to (for the move option).
Also, the name of the list to make in the listify option.
-batch - Don't prompt to delete emptied categories (do it
automatically).
-summary: - Pick a custom edit summary for the bot.
-inplace - Use this flag to change categories in place rather than
rearranging them.
-recurse[:<depth>]
- Recurse through subcategories of the category to
optional depth.
-pagesonly - While removing pages from a category, keep the subpage links
and do not remove them.
-match - Only work on pages whose titles match the given regex (for
move and remove actions).
-depth: - The max depth limit beyond which no subcategories will be
listed.
Note
If the category names have spaces in them you may need to use
a special syntax in your shell so that the names aren’t treated as
separate parameters. For instance, in Bash, use single quotes, e.g. -from:'Polar bears'.
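How a POSIX-style shell splits such a command line can be checked with Python's shlex module, which follows similar quoting rules. The category names are only examples.

```python
import shlex

# Quoted: the category name stays a single argument.
quoted = shlex.split("python pwb.py category move -from:'Polar bears' -to:Bears")

# Unquoted: the space splits the name into two separate arguments.
unquoted = shlex.split("python pwb.py category move -from:Polar bears -to:Bears")

print(quoted)
print(unquoted)
```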
If action is “add”, “move” or “remove”, the following additional options are supported:
This script supports use of pagegenerators arguments.
For the actions tidy and tree, the bot will store the category structure locally in category.dump. This saves time and server load, but if it uses these data later, they may be outdated; use the -rebuild parameter in this case.
For example, to create a new category from a list of persons, type:
python pwb.py category add -person
and follow the on-screen instructions.
Or to do it all from the command-line, use the following syntax:
python pwb.py category move -from:US -to:"United States"
This will move all pages in the category US to the category United States.
A pagegenerators option can be given with the move and remove actions:
pwb category -site:wikipedia:en remove -from:Hydraulics -cat:Pneumatics
The sample above would remove the ‘Hydraulics’ category from all pages that are also in the ‘Pneumatics’ category.
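The intersection can be pictured with Python sets. The page titles below are invented for illustration and do not come from any wiki.

```python
# Hypothetical members of the two categories.
hydraulics = {"Pump", "Valve", "Actuator"}
pneumatics = {"Valve", "Actuator", "Compressor"}

# Only pages in both categories would have 'Hydraulics' removed.
to_edit = sorted(hydraulics & pneumatics)
print(to_edit)
```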
Changed in version 8.0: pagegenerators are supported with the “move” and “remove” actions.
replace script
This bot will make direct text replacements
It will retrieve information on which pages might need changes either from an XML dump or a text file, or only change a single page.
These command line parameters can be used to specify which pages to work on:
This script supports use of pagegenerators arguments.
Furthermore, the following command line parameters are supported:
-mysqlquery Retrieve information from a local database mirror.
If no query specified, bot searches for pages with
given replacements.
-xml Retrieve information from a local XML dump
(pages-articles or pages-meta-current, see
https://dumps.wikimedia.org). Argument can also
be given as "-xml:filename".
-regex Make replacements using regular expressions. If this argument
isn't given, the bot will make simple text replacements.
-nocase Use case insensitive regular expressions.
-dotall Make the dot match any character at all, including a newline.
Without this flag, '.' will match anything except a newline.
-multiline '^' and '$' will now match begin and end of each line.
-xmlstart (Only works with -xml) Skip all articles in the XML dump
before the one specified (may also be given as
-xmlstart:Article).
-addcat:cat_name Adds "cat_name" category to every altered page.
-excepttitle:XYZ Skip pages with titles that contain XYZ. If the -regex
argument is given, XYZ will be regarded as a regular
expression.
-requiretitle:XYZ Only do pages with titles that contain XYZ. If the -regex
argument is given, XYZ will be regarded as a regular
expression.
-excepttext:XYZ Skip pages which contain the text XYZ. If the -regex
argument is given, XYZ will be regarded as a regular
expression.
-exceptinside:XYZ Skip occurrences of the to-be-replaced text which lie
within XYZ. If the -regex argument is given, XYZ will be
regarded as a regular expression.
-exceptinsidetag:XYZ Skip occurrences of the to-be-replaced text which lie
within an XYZ tag.
-summary:XYZ Set the summary message text for the edit to XYZ, bypassing
the predefined message texts with original and replacements
inserted. To add the replacements to your summary use the
%(description)s placeholder, for example:
-summary:"Bot operated replacement: %(description)s"
Can't be used with -automaticsummary.
-automaticsummary Uses an automatic summary for all replacements which don't
have a summary defined. Can't be used with -summary.
-sleep:123 If you use -fix, multiple regexes can be checked on every page.
This can waste a lot of CPU because the bot checks every regex
without waiting, using all available resources. This option makes
the bot pause between one regex and the next in order not to
waste too much CPU.
-fix:XYZ Perform one of the predefined replacements tasks, which are
given in the dictionary 'fixes' defined inside the files
fixes.py and user-fixes.py.
The available fixes are listed in pywikibot.fixes.
-manualinput Request manual replacements via the command line input even
if replacements are already defined. If this option is set
(or no replacements are defined via -fix or the arguments)
it'll ask for additional replacements at start.
-pairsfile Lines from the given file name(s) will be read as replacement
arguments. For example, a file containing lines "a" and "b", used as:
python pwb.py replace -page:X -pairsfile:file c d
will replace 'a' with 'b' and 'c' with 'd'.
-always Don't prompt you for each replacement
-quiet Don't print a message if a page remains unchanged
-nopreload Do not preload pages. Useful if disabled on a wiki.
-recursive Recurse replacement as long as possible. Be careful, this
might lead to an infinite loop.
-allowoverlap When occurrences of the pattern overlap, replace all of them.
Be careful, this might lead to an infinite loop.
-fullsummary Use one large summary for all command line replacements.
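The -nocase, -dotall and -multiline options read like the usual regular expression flags. The sketch below assumes they behave like Python's re.IGNORECASE, re.DOTALL and re.MULTILINE; the sample text is invented.

```python
import re

text = "Foo\nfoo bar\nbaz"

# -nocase: "foo" also matches "Foo".
nocase = re.sub("foo", "X", text, flags=re.IGNORECASE)

# Without -dotall, "." does not match the newline between the two lines.
no_dotall = re.sub("Foo.foo", "X", text)
dotall = re.sub("Foo.foo", "X", text, flags=re.DOTALL)

# Without -multiline, "^" only matches at the start of the whole text.
no_multiline = re.sub("^foo", "X", text)
multiline = re.sub("^foo", "X", text, flags=re.MULTILINE)

for result in (nocase, no_dotall, dotall, no_multiline, multiline):
    print(repr(result))
```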
Replacement parameters
Replacement parameters are pairs of arguments given to the script. The first argument is the old text to be replaced, the second argument is the new text. If the -regex argument is given, the first argument will be regarded as a regular expression, and the second argument might contain expressions like \1 or \g<name>. The second parameter can also be specified as an empty string, usually "". It is possible to introduce more than one pair of replacement parameters.
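The pairing of arguments can be sketched in Python as follows. The helper apply_replacements is hypothetical and only mimics the described behaviour with str.replace and re.sub; it is not taken from the script.

```python
import re


def apply_replacements(text, args, regex=False):
    """Apply (old, new) pairs taken from a flat argument list, in order."""
    if len(args) % 2:
        raise ValueError("replacement arguments must come in pairs")
    for old, new in zip(args[0::2], args[1::2]):
        if regex:
            # With -regex, 'new' may refer to groups via \1 or \g<name>.
            text = re.sub(old, new, text)
        else:
            text = text.replace(old, new)
    return text


print(apply_replacements("Errror: Faail", ["Errror", "Error", "Faail", "Fail"]))
print(apply_replacements("{{msg:Stub}}", [r"\{\{msg:(.*?)\}\}", r"{{\1}}"], regex=True))
print(apply_replacements("foo bar", ["foo ", ""]))  # empty string as new text
```

The braces are escaped in the pattern here to keep the example unambiguous as a Python regular expression.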
Empty string arguments with PowerShell
When PowerShell is used as the command shell, empty strings are removed during PowerShell’s command line parsing. To pass empty strings with PowerShell you either have to escape the quotation marks with backticks (grave accents) in front of them, like `"`", or disable command line parsing with the --% symbol for all following command parts, like
python pwb replace --% -start:! foo ""
which disables parsing for all replace options and arguments following this delimiter and enables empty strings.
Examples
If you want to change templates from the old syntax, e.g. {{msg:Stub}}, to the new syntax, e.g. {{Stub}}, download an XML dump file (pages-articles) from https://dumps.wikimedia.org, then use this command:
python pwb.py replace -xml -regex "{{msg:(.*?)}}" "{{\1}}"
If you have a dump called foobar.xml and want to fix typos in articles, e.g. Errror -> Error, use this:
python pwb.py replace -xml:foobar.xml "Errror" "Error" -namespace:0
If you want to do more than one replacement at a time, use this:
python pwb.py replace -xml:foobar.xml "Errror" "Error" "Faail" "Fail" \
-namespace:0
If you have a page called ‘John Doe’ and want to fix the format of ISBNs, use:
python pwb.py replace -page:John_Doe -fix:isbn
This command will change ‘referer’ to ‘referrer’, but not in pages which talk about HTTP, where the typo has become part of the standard:
python pwb.py replace referer referrer -file:typos.txt -excepttext:HTTP
See also
scripts.template to modify or remove templates.
solve_disambiguation script
Script to help a human solve disambiguations by presenting a set of options
Specify the disambiguation page on the command line.
The program will pick up the page, and look for all alternative links, and show them with a number adjacent to them. It will then automatically loop over all pages referring to the disambiguation page, and show 30 characters of context on each side of the reference to help you make the decision between the alternatives. It will ask you to type the number of the appropriate replacement, and perform the change.
It is possible to choose to replace only the link (just type the number) or replace both link and link-text (type ‘r’ followed by the number).
Multiple references in one page will be scanned in order, but typing ‘n’ (next) on any one of them will leave the complete page unchanged. To leave only some reference unchanged, use the ‘s’ (skip) option.
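The 30 characters of context on each side of a reference can be pictured with a small slicing helper. The function link_context and its parameters are hypothetical and only illustrate the idea; the script's own display logic may differ.

```python
def link_context(text, start, end, width=30):
    """Return the link at text[start:end] with up to width chars on each side."""
    left = max(0, start - width)  # do not run past the start of the page
    return text[left:end + width]


page = "a" * 50 + "[[Link]]" + "b" * 50
print(link_context(page, 50, 58))
```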
Command line options:
-pos:XXXX adds XXXX as an alternative disambiguation
-just only use the alternatives given on the command line, do not
read the page for other possibilities
-dnskip Skip links already marked with a disambiguation-needed
template (e.g., {{dn}})
-primary "primary topic" disambiguation (Begriffsklärung nach Modell 2).
These are titles where one topic is much more important; the
disambiguation page is saved somewhere else, and the important
topic gets the nice name.
-primary:XY like the above, but use XY as the only alternative, instead of
searching for alternatives in [[Keyword (disambiguation)]].
Note: this is the same as -primary -just -pos:XY
-file:XYZ reads a list of pages from a text file. XYZ is the name of the
file from which the list is taken. If XYZ is not given, the
user is asked for a filename. Page titles should be inside
[[double brackets]]. The -pos parameter won't work if -file
is used.
-always:XY instead of asking the user what to do, always perform the same
action. For example, XY can be "r0", "u" or "2". Be careful with
this option, and check the changes made by the bot. Note that
some choices for XY don't make sense and will result in a loop,
e.g. "l" or "m".
-main only check pages in the main namespace, not in the Talk,
Project, User, etc. namespaces.
-first Uses only the first link of every line on the disambiguation
page that begins with an asterisk. Useful if the page is full
of irrelevant links that are not subject to disambiguation.
You won't get all of them as options, just the first on each
line. For a moderate example see
https://en.wikipedia.org/wiki/Szerdahely
A really exotic one is
https://hu.wikipedia.org/wiki/Brabant_(egyértelműsítő lap)
-start:XY goes through all disambiguation pages in the category on your
wiki that is defined (to the bot) as the category containing
disambiguation pages, starting at XY. If only '-start' or
'-start:' is given, it starts at the beginning.
-min:XX (XX being a number) only work on disambiguation pages for which
at least XX pages are to be worked on.
To complete a move of a page, one can use:
python pwb.py solve_disambiguation -just -pos:New_Name Old_Name
upload script
Script to upload images to Wikipedia
The following parameters are supported:
-keep Keep the filename as is
-filename: Target filename without the namespace prefix
-prefix: Add specified prefix to every filename.
-noverify Do not ask for verification of the upload description if one
is given
-abortonwarn: Abort upload on the specified warning type. If no warning type
is specified, aborts on any warning.
-ignorewarn: Ignores specified upload warnings. If no warning type is
specified, ignores all warnings. Use with caution
-chunked: Upload the file in chunks (more overhead, but restartable). If
no value is specified the chunk size is 1 MiB. The value must
be a number which can be followed by a suffix. The units are:
No suffix: Bytes
'k': Kilobytes (1000 B)
'M': Megabytes (1000000 B)
'Ki': Kibibytes (1024 B)
'Mi': Mebibytes (1024x1024 B)
The suffixes are case insensitive.
-async Make potentially large file operations asynchronous on the
server side when possible.
-always Don't ask the user anything. This will imply -keep and
-noverify and require that either -abortonwarn or -ignorewarn
is defined for all. It will also require a valid file name and
description. It'll only overwrite files if -ignorewarn includes
the 'exists' warning.
-recursive When the filename is a directory it also uploads the files from
the subdirectories.
-summary: Pick a custom edit summary for the bot.
-descfile: Specify a filename where the description is stored
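The suffix rules for -chunked can be sketched as a small parser. This is an illustration built only from the unit table above; the function name parse_chunk_size is hypothetical and the script's own parsing may differ in details such as error handling.

```python
import re

# Multipliers from the unit table; keys are lowercase because the
# suffixes are case insensitive.
UNITS = {"": 1, "k": 1000, "m": 1000**2, "ki": 1024, "mi": 1024**2}
DEFAULT_CHUNK_SIZE = 1024**2  # 1 MiB when no value is specified


def parse_chunk_size(value=""):
    """Convert a -chunked value such as '500k' or '4Ki' to a byte count."""
    if not value:
        return DEFAULT_CHUNK_SIZE
    match = re.fullmatch(r"(\d+)([a-z]*)", value.strip().lower())
    if not match or match.group(2) not in UNITS:
        raise ValueError(f"invalid chunk size: {value!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


for sample in ("", "100", "500k", "2M", "4Ki", "1mi"):
    print(repr(sample), parse_chunk_size(sample))
```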
It is possible to combine -abortonwarn and -ignorewarn: if a warning is named specifically by one option, that specific setting applies rather than the general setting of the other. So to ignore specific warnings and abort on the rest, define no warning for -abortonwarn and the specific warnings for -ignorewarn. The order does not matter. If both are unspecific, or a warning is specified by both, aborting is preferred.
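The precedence between the two options can be written down as a short decision function. This is a sketch of the rules as described, not the script's code: the function should_abort is hypothetical, each option is modelled as None (not given), True (given without a warning type) or a set of warning types, and 'duplicate' is used only as a stand-in for some other warning type.

```python
def should_abort(warning, abort=None, ignore=None):
    """Return True to abort, False to ignore, None if neither option applies."""
    abort_specific = isinstance(abort, set) and warning in abort
    ignore_specific = isinstance(ignore, set) and warning in ignore
    if abort_specific:
        return True   # specific abort wins, even if ignore names it too
    if ignore_specific:
        return False  # specific ignore beats a general abort
    if abort is True:
        return True   # general abort wins over general ignore
    if ignore is True:
        return False
    return None


# Abort on everything except the 'exists' warning.
print(should_abort("exists", abort=True, ignore={"exists"}))
print(should_abort("duplicate", abort=True, ignore={"exists"}))
```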
If any other arguments are given, the first is either URL, filename or directory to upload, and the rest is a proposed description to go with the upload. If none of these are given, the user is asked for the directory, file or URL to upload. The bot will then upload the image to the wiki.
The script will ask for the location of the image(s), if not given as a parameter, and for a description.
weblinkchecker script
This bot is used for checking external links found on the wiki
It checks several pages at once, with a limit set by the config variable max_external_links, which defaults to 50.
The bot won’t change any wiki pages; it will only report dead links so that people can fix or remove the links themselves.
The bot will store all links found dead in a .dat file in the deadlinks subdirectory. To avoid the removal of links that are only temporarily unavailable, the bot ONLY reports links which were reported dead at least two times, with a time lag of at least one week. Such links will be logged to a .txt file in the deadlinks subdirectory.
The .txt file uses wiki markup, so it may be useful to post it on the wiki and then exclude that page from subsequent runs. For example, if the page is named Broken Links, exclude it with ‘-titleregexnot:^Broken Links$’.
After running the bot and waiting for at least one week, you can re-check those pages where dead links were found, using the -repeat parameter.
In addition to the logging step, it is possible to automatically report dead links to the talk page of the article where the link was found. To use this feature, set report_dead_links_on_talk = True in your user config file, or specify “-talk” on the command line. Adding “-notalk” switches this off irrespective of the configuration variable.
When a link is found alive, it will be removed from the .dat file.
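The reporting rule described above (dead at least twice, at least one week apart, and forgotten once alive again) can be sketched with an in-memory history. The class DeadLinkHistory is hypothetical and does not reflect the format of the .dat file; timestamps are plain seconds.

```python
WEEK = 7 * 24 * 60 * 60  # one week in seconds


class DeadLinkHistory:
    """Track when each URL was first seen dead."""

    def __init__(self, min_age=WEEK):
        self.min_age = min_age
        self.first_seen_dead = {}  # url -> timestamp of first failed check

    def record(self, url, alive, now):
        """Record one check; return True if the link should be reported."""
        if alive:
            # A link found alive is removed from the history.
            self.first_seen_dead.pop(url, None)
            return False
        first = self.first_seen_dead.setdefault(url, now)
        # Report only when an earlier failure is at least min_age old.
        return now - first >= self.min_age


history = DeadLinkHistory()
print(history.record("http://example.org/a", alive=False, now=0))
print(history.record("http://example.org/a", alive=False, now=WEEK))
```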
These command line parameters can be used to specify which pages to work on:
-repeat Work on all pages where dead links were found before. This is
useful to confirm that the links are dead after some time (at
least one week), which is required before the script will report
the problem.
-namespace Only process pages in the namespace with the given number or
name. This parameter may be used multiple times.
-xml Should be used instead of a simple page fetching method from
pagegenerators.py for performance and load reasons
-xmlstart Page to start with when using an XML dump
-ignore HTTP return codes to ignore. Can be provided several times:
-ignore:401 -ignore:500
This script supports use of pagegenerators arguments.
Furthermore, the following command line parameters are supported:
-talk Overrides the report_dead_links_on_talk config variable, enabling
the feature.
-notalk Overrides the report_dead_links_on_talk config variable, disabling
the feature.
-day Do not report a broken link if it has been there for only
x days or less. If not set, the default is 7 days.
The following config variables are supported:
max_external_links The maximum number of web pages that should be
loaded simultaneously. You should change this
according to your Internet connection speed.
Be careful: if it is set too high, the script
might get socket errors because your network
is congested, and will then think that the page
is offline.
report_dead_links_on_talk If set to true, causes the script to report dead
links on the article's talk page if (and ONLY if)
the linked page has been unavailable at least two
times during a timespan of at least one week.
weblink_dead_days Sets the timespan (default: one week) after which
a dead link will be reported.
Examples
Loads all wiki pages in alphabetical order using the Special:Allpages feature:
python pwb.py weblinkchecker -start:!
Loads all wiki pages using the Special:Allpages feature, starting at “Example page”:
python pwb.py weblinkchecker -start:Example_page
Loads all wiki pages that link to www.example.org:
python pwb.py weblinkchecker -weblink:www.example.org
Only checks links found in the wiki page “Example page”:
python pwb.py weblinkchecker Example page
Loads all wiki pages where dead links were found during a prior run:
python pwb.py weblinkchecker -repeat