Main bot scripts
add_text script
Append text to the top or bottom of a page
By default this adds the text to the bottom above the categories and interwiki.
Use the following command line parameters to specify what to add:
- -text
(str) Text to append. "\n" is interpreted as a newline.
- -textfile
(str) Path to a file with text to append
- -summary
(str) Change summary to use
- -up
Append text to the top of the page rather than the bottom
- -create
Create the page if necessary. Note that talk pages are created even without this option.
- -createonly
Only create the page but do not edit existing ones
- -always
If used, the bot won’t ask if it should add the specified text
- -major
If used, the edit will be saved without the “minor edit” flag
- -talk, -talkpage
Put the text onto the talk page instead
- -excepturl
(str) Skip pages with a url that matches this regular expression
- -noreorder
Place the text beneath the categories and interwiki
Furthermore, the following can be used to specify which pages to process…
This script supports use of pagegenerators arguments.
Examples
Append ‘hello world’ to the bottom of the sandbox:
python pwb.py add_text -page:Wikipedia:Sandbox \
-summary:"Bot: pywikibot practice" -text:"hello world"
Add a template to the top of the pages with ‘category:catname’:
python pwb.py add_text -cat:catname -summary:"Bot: Adding a template" \
-text:"{{Something}}" -except:"\{\{([Tt]emplate:|)[Ss]omething" -up
Command used on it.wikipedia to add the template to pages without any category:
python pwb.py add_text -except:"\{\{([Tt]emplate:|)[Cc]ategorizzare" \
-text:"{{Categorizzare}}" -excepturl:"class='catlinks'>" -uncat \
-summary:"Bot: Aggiungo template Categorizzare"
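The -except values in the examples above are regular expressions. As a rough illustration in plain Python (not the script's own code), the sketch below shows which wikitext the pattern from the second example matches. It assumes that a page is skipped when the pattern is found anywhere in its text; the helper name would_skip is hypothetical.

```python
import re

# Pattern copied from the example: matches "{{Something", "{{something",
# "{{Template:Something" and "{{template:something".
EXCEPT_PATTERN = re.compile(r"\{\{([Tt]emplate:|)[Ss]omething")


def would_skip(page_text: str) -> bool:
    """Hypothetical helper: True if the pattern occurs anywhere in the text."""
    return EXCEPT_PATTERN.search(page_text) is not None


print(would_skip("{{Something}}\nArticle text"))        # template already present
print(would_skip("Intro\n{{template:something|x}}"))    # prefixed, lower-case form
print(would_skip("Article text without the template"))  # pattern absent
```

The empty alternative in ([Tt]emplate:|) is what makes the "Template:" prefix optional.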
category script
Script to manage categories
Syntax:
python pwb.py category action [-option]
where action can be one of these
- add
mass-add a category to a list of pages.
- remove
remove category tag from all pages in a category. If a pagegenerators option is given, the intersection with category pages is processed.
- move
move all pages in a category to another category. If a pagegenerators option is given, the intersection with category pages is processed.
- tidy
tidy up a category by moving its pages into subcategories.
- tree
show a tree of subcategories of a given category.
- listify
make a list of all of the articles that are in a category.
- clean
Remove redundant grandchildren from the specified category by removing the direct link to the grandparent. In other words, a grandchild should not also be a child.
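To make the clean action concrete, here is a minimal sketch of the idea on a plain dictionary. It is not the script's implementation: the mapping format, the function names and the sample category names are all assumptions made for illustration.

```python
def redundant_children(category, subcats):
    """Return direct children of `category` that are also its grandchildren."""
    children = set(subcats.get(category, ()))
    grandchildren = set()
    for child in children:
        grandchildren.update(subcats.get(child, ()))
    return children & grandchildren


def clean(category, subcats):
    """Return a copy of the mapping with the redundant direct links removed."""
    redundant = redundant_children(category, subcats)
    cleaned = {name: list(members) for name, members in subcats.items()}
    cleaned[category] = [c for c in cleaned.get(category, []) if c not in redundant]
    return cleaned


# "Bears" is listed directly under "Animals" and also under "Mammals".
tree = {
    "Animals": ["Mammals", "Bears"],
    "Mammals": ["Bears"],
    "Bears": [],
}
print(redundant_children("Animals", tree))
print(clean("Animals", tree)["Animals"])
```

In this example only the direct link from "Animals" to "Bears" is dropped; "Bears" stays reachable through "Mammals".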
and option can be one of these
Options for add action:
- -person
Sort persons by their last name.
- -create
If a page doesn’t exist, do not skip it, create it instead.
- -redirect
Follow redirects.
Options for listify action:
- -append
This appends the list to the page if it already exists (at the bottom by default).
- -overwrite
This overwrites the current page with the list even if something is already there.
- -showimages
This displays images rather than linking them in the list.
- -talkpages
This outputs the links to talk pages of the pages to be listified in addition to the pages themselves.
- -prefix:#
You may specify a list prefix like “#” for a numbered list or any other prefix. Default is a bullet list with prefix “*”.
Options for remove action:
- -nodelsum
This specifies not to use the custom edit summary as the deletion reason. Instead, it uses the default deletion reason for the language, which is “Category was disbanded” in English.
Options for move action:
- -hist
Creates a nice wikitable on the talk page of target category that contains detailed page history of the source category.
- -nodelete
Don’t delete the old category after move.
- -nowb
Don’t update the Wikibase repository.
- -allowsplit
If that option is not set, it only moves the talk and main page together.
- -mvtogether
Only move the pages/subcategories of a category if the target page (and talk page, if -allowsplit is not set) doesn’t exist.
- -keepsortkey
Use sortKey of the old category also for the new category. If not specified, sortKey is removed. An alternative method to keep sortKey is to use the -inplace option.
Options for listify and tidy actions:
- -namespaces, -namespace, -ns
Filter the articles in the specified namespaces. Separate multiple namespace numbers or names with commas. Examples:
-ns:0,2,4
-ns:Help,MediaWiki
Options for clean action:
- -always
The bot won’t ask for confirmation when putting a page.
Options for several actions:
- -rebuild
Reset the database.
- -from:
The category to move from (for the move option). Also, the category to remove from in the remove option. Also, the category to make a list of in the listify option.
- -to:
The category to move to (for the move option). Also, the name of the list to make in the listify option.
- -batch
Don’t prompt to delete emptied categories (do it automatically).
- -summary:
Pick a custom edit summary for the bot.
- -inplace
Use this flag to change categories in place rather than rearranging them.
- -recurse[:<depth>]
Recurse through subcategories of the category to optional depth.
- -pagesonly
While removing pages from a category, keep the subpage links and do not remove them.
- -match
Only work on pages whose titles match the given regex (for move and remove actions).
- -depth:
The maximum depth beyond which no subcategories will be listed.
Note
If the category names have spaces in them you may need to use a special syntax in your shell so that the names aren’t treated as separate parameters. For instance, in BASH, use single quotes, e.g. -from:'Polar bears'.
If action is “add”, “move” or “remove”, the following additional options are supported:
This script supports use of pagegenerators arguments.
For the actions tidy and tree, the bot will store the category structure locally in category.dump. This saves time and server load, but if it uses these data later, they may be outdated; use the -rebuild parameter in this case.
For example, to create a new category from a list of persons, type:
python pwb.py category add -person
and follow the on-screen instructions.
Or to do it all from the command-line, use the following syntax:
python pwb.py category move -from:US -to:"United States"
This will move all pages in the category US to the category United States.
A pagegenerators option can be given with move and remove action:
pwb category -site:wikipedia:en remove -from:Hydraulics -cat:Pneumatics
The sample above would remove ‘Hydraulics’ category from all pages which are also in ‘Pneumatics’ category.
Changed in version 8.0: pagegenerators are supported with “move” and “remove” action.
replace script
This bot will make direct text replacements
It will retrieve information on which pages might need changes either from an XML dump or a text file, or only change a single page.
These command line parameters can be used to specify which pages to work on:
This script supports use of pagegenerators arguments.
Furthermore, the following command line parameters are supported:
- -mysqlquery
Retrieve information from a local database mirror. If no query specified, bot searches for pages with given replacements.
- -xml
Retrieve information from a local XML dump (pages-articles or pages-meta-current, see https://dumps.wikimedia.org). Argument can also be given as “-xml:filename”.
- -regex
Make replacements using regular expressions. If this argument isn’t given, the bot will make simple text replacements.
- -nocase
Use case insensitive regular expressions.
- -dotall
Make the dot match any character at all, including a newline. Without this flag, ‘.’ will match anything except a newline.
- -multiline
‘^’ and ‘$’ will now match begin and end of each line.
- -xmlstart
(Only works with -xml) Skip all articles in the XML dump before the one specified (may also be given as -xmlstart:Article).
- -addcat:cat_name
Adds “cat_name” category to every altered page.
- -excepttitle:XYZ
Skip pages with titles that contain XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.
- -requiretitle:XYZ
Only do pages with titles that contain XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.
- -excepttext:XYZ
Skip pages which contain the text XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.
- -exceptinside:XYZ
Skip occurrences of the to-be-replaced text which lie within XYZ. If the -regex argument is given, XYZ will be regarded as a regular expression.
- -exceptinsidetag:XYZ
Skip occurrences of the to-be-replaced text which lie within an XYZ tag.
- -summary:XYZ
Set the summary message text for the edit to XYZ, bypassing the predefined message texts with original and replacements inserted. To add the replacements to your summary use the %(description)s placeholder, for example: -summary:"Bot operated replacement: %(description)s" Can’t be used with -automaticsummary.
- -automaticsummary
Uses an automatic summary for all replacements which don’t have a summary defined. Can’t be used with -summary.
- -sleep:123
If you use -fix, several regular expressions may be checked on every page. This can waste a lot of CPU, because the bot checks every regex without waiting and uses all available resources. This option makes the bot pause between one regex and the next so that it does not use too much CPU.
- -fix:XYZ
Perform one of the predefined replacements tasks, which are given in the dictionary ‘fixes’ defined inside the files fixes.py and user-fixes.py.
The available fixes are listed in pywikibot.fixes.
- -manualinput
Request manual replacements via the command line input even if replacements are already defined. If this option is set (or no replacements are defined via -fix or the arguments) it’ll ask for additional replacements at start.
- -pairsfile
Lines from the given file name(s) will be read as replacement arguments. For example, a file containing the lines “a” and “b”, used as:
python pwb.py replace -page:X -pairsfile:file c d
will replace ‘a’ with ‘b’ and ‘c’ with ‘d’.
- -always
Don’t prompt you for each replacement
- -quiet
Don’t print a message if a page remains unchanged
- -nopreload
Do not preload pages. Useful if disabled on a wiki.
- -recursive
Recurse replacement as long as possible. Be careful, this might lead to an infinite loop.
- -allowoverlap
When occurrences of the pattern overlap, replace all of them. Be careful, this might lead to an infinite loop.
- -fullsummary
Use one large summary for all command line replacements.
- Replacement parameters
Replacement parameters are pairs of arguments given to the script. The first argument is the old text to be replaced, the second argument is the new text. If the -regex argument is given, the first argument will be regarded as a regular expression, and the second argument might contain expressions like \1 or \g<name>. The second parameter can also be specified as an empty string, usually "". It is possible to introduce more than one pair of replacement parameters.
Empty string arguments with PowerShell
PowerShell removes empty strings while it parses the command line. To pass an empty string with PowerShell, either escape the quotation marks with backticks (grave accents) in front of them, like `"`", or disable command line parsing for all following command parts with the --% symbol, like
python pwb replace --% -start:! foo ""
which disables parsing for all replace options and arguments following this delimiter and enables empty strings.
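The replacement pairs described above behave much like Python regular expression substitutions when -regex is given. The following sketch uses only the standard re module (it does not call the script) to show a numbered group, a named group and an empty replacement. The function names and sample strings are made up for illustration, and the braces are escaped here for clarity.

```python
import re


def convert_numbered(text: str) -> str:
    """Rewrite {{msg:X}} to {{X}} using a numbered group (\\1)."""
    return re.sub(r"\{\{msg:(.*?)\}\}", r"{{\1}}", text)


def convert_named(text: str) -> str:
    """Same rewrite using a named group (\\g<name>)."""
    return re.sub(r"\{\{msg:(?P<name>.*?)\}\}", r"{{\g<name>}}", text)


def strip_draft_marker(text: str) -> str:
    """An empty replacement string removes every match."""
    return re.sub(r"\s*\(draft\)", "", text)


sample = "{{msg:Stub}} and {{msg:Cleanup}}"
print(convert_numbered(sample))
print(convert_named(sample))
print(strip_draft_marker("Title (draft)"))
```

The non-greedy (.*?) keeps each match inside a single template, so two templates on one line are rewritten separately.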
Examples
If you want to change templates from the old syntax, e.g. {{msg:Stub}}, to the new syntax, e.g. {{Stub}}, download an XML dump file (pages-articles) from https://dumps.wikimedia.org, then use this command:
python pwb.py replace -xml -regex "{{msg:(.*?)}}" "{{\1}}"
If you have a dump called foobar.xml and want to fix typos in articles, e.g. Errror -> Error, use this:
python pwb.py replace -xml:foobar.xml "Errror" "Error" -namespace:0
If you want to do more than one replacement at a time, use this:
python pwb.py replace -xml:foobar.xml "Errror" "Error" "Faail" "Fail" -namespace:0
If you have a page called ‘John Doe’ and want to fix the format of ISBNs, use:
python pwb.py replace -page:John_Doe -fix:isbn
This command will change ‘referer’ to ‘referrer’, but not in pages which talk about HTTP, where the typo has become part of the standard:
python pwb.py replace referer referrer -file:typos.txt -excepttext:HTTP
See also
scripts.template to modify or remove templates.
solve_disambiguation script
Script to help a human solve disambiguations by presenting a set of options
Specify the disambiguation page on the command line.
The program will pick up the page, and look for all alternative links, and show them with a number adjacent to them. It will then automatically loop over all pages referring to the disambiguation page, and show 30 characters of context on each side of the reference to help you make the decision between the alternatives. It will ask you to type the number of the appropriate replacement, and perform the change.
It is possible to choose to replace only the link (just type the number) or replace both link and link-text (type ‘r’ followed by the number).
Multiple references in one page will be scanned in order, but typing ‘n’ (next) on any one of them will leave the complete page unchanged. To leave only some reference unchanged, use the ‘s’ (skip) option.
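The context display mentioned above (30 characters on each side of the reference) can be pictured with a short string-slicing sketch. This is only an illustration of the idea, not the script's code; the function name link_context and the sample sentence are hypothetical.

```python
def link_context(text: str, start: int, end: int, width: int = 30) -> str:
    """Return text[start:end] plus up to `width` characters on each side."""
    left = max(0, start - width)          # do not run past the start of the page
    right = min(len(text), end + width)   # do not run past the end of the page
    return text[left:right]


page = (
    "The planet Mercury is the closest to the Sun; see [[Mercury]] "
    "for other uses of the name in science."
)
link = "[[Mercury]]"
start = page.index(link)
print(link_context(page, start, start + len(link)))
```

Clamping with max and min keeps the slice valid when the link sits near the beginning or end of the page text.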
Command line options:
- -pos:XXXX
adds XXXX as an alternative disambiguation
- -just
only use the alternatives given on the command line, do not read the page for other possibilities
- -dnskip
Skip links already marked with a disambiguation-needed template (e.g., {{dn}})
- -primary
“primary topic” disambiguation (Begriffsklärung nach Modell 2). These are titles where one topic is much more important, the disambiguation page is saved somewhere else, and the important topic gets the nice name.
- -primary:XY
like the above, but use XY as the only alternative, instead of searching for alternatives in [[Keyword (disambiguation)]]. Note: this is the same as -primary -just -pos:XY
- -file:XYZ
reads a list of pages from a text file. XYZ is the name of the file from which the list is taken. If XYZ is not given, the user is asked for a filename. Page titles should be inside [[double brackets]]. The -pos parameter won’t work if -file is used.
- -always:XY
instead of asking the user what to do, always perform the same action. For example, XY can be “r0”, “u” or “2”. Be careful with this option, and check the changes made by the bot. Note that some choices for XY don’t make sense and will result in a loop, e.g. “l” or “m”.
- -main
only check pages in the main namespace, not in the Talk, Project, User, etc. namespaces.
- -first
Uses only the first link of every line on the disambiguation page that begins with an asterisk. Useful if the page is full of irrelevant links that are not subject to disambiguation. You won’t get all of them as options, just the first on each line. For a moderate example see https://en.wikipedia.org/wiki/Szerdahely A really exotic one is https://hu.wikipedia.org/wiki/Brabant_(egyértelműsítő lap)
- -start:XY
goes through all disambiguation pages in the category on your wiki that is defined (to the bot) as the category containing disambiguation pages, starting at XY. If only ‘-start’ or ‘-start:’ is given, it starts at the beginning.
- -min:XX
(XX being a number) only work on disambiguation pages for which at least XX pages are to be worked on.
To complete a move of a page, one can use:
python pwb.py solve_disambiguation -just -pos:New_Name Old_Name
upload script
Script to upload images to Wikipedia
The following parameters are supported:
- -keep
Keep the filename as is
- -filename:
(str) Target filename without the namespace prefix
- -prefix:
(str) Add specified prefix to every filename.
- -noverify
Do not ask for verification of the upload description if one is given
- -abortonwarn:
Abort upload on the specified warning type. If no warning type is specified, aborts on any warning.
- -ignorewarn:
Ignores specified upload warnings. If no warning type is specified, ignores all warnings. Use with caution
- -chunked:
Upload the file in chunks (more overhead, but restartable). If no value is specified the chunk size is 1 MiB. The value must be a number which can be followed by a suffix. The units are:
No suffix: Bytes
'k': Kilobytes (1000 B)
'M': Megabytes (1000000 B)
'Ki': Kibibytes (1024 B)
'Mi': Mebibytes (1024x1024 B)
The suffixes are case insensitive.
- -async
Make potentially large file operations asynchronous on the server side when possible.
- -always
Don’t ask the user anything. This will imply -keep and -noverify and require that either -abortonwarn or -ignorewarn is defined for all. It will also require a valid file name and description. It’ll only overwrite files if -ignorewarn includes the ‘exists’ warning.
- -recursive
When the filename is a directory it also uploads the files from the subdirectories.
- -summary:
(str) Pick a custom edit summary for the bot.
- -descfile:
(str) Specify a filename where the description is stored
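As an illustration of the -chunked units listed above, the sketch below converts a value such as 5k or 2Mi into a byte count. It is a simplified stand-in written for this page, not the script's parser: it assumes whole numbers only and a suffix written directly after the number, and the names UNITS, DEFAULT_CHUNK_SIZE and parse_chunk_size are hypothetical.

```python
import re

# Multipliers from the option description; keys are lower-cased because
# the suffixes are case insensitive.
UNITS = {"": 1, "k": 1000, "m": 1000**2, "ki": 1024, "mi": 1024**2}
DEFAULT_CHUNK_SIZE = 1024**2  # 1 MiB when no value is given


def parse_chunk_size(value: str = "") -> int:
    """Return the chunk size in bytes for a value like '5k' or '2Mi'."""
    value = value.strip()
    if not value:
        return DEFAULT_CHUNK_SIZE
    match = re.fullmatch(r"(\d+)\s*(ki|mi|k|m)?", value, re.IGNORECASE)
    if match is None:
        raise ValueError(f"invalid chunk size: {value!r}")
    number, suffix = match.groups()
    return int(number) * UNITS[(suffix or "").lower()]


for example in ("", "100", "5k", "1M", "3Ki", "2Mi"):
    print(repr(example), parse_chunk_size(example))
```

Note the difference between the decimal units (k, M) and the binary units (Ki, Mi): 1M is 1000000 bytes, while 1Mi is 1048576 bytes.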
It is possible to combine -abortonwarn and -ignorewarn. If a warning is named explicitly by one option, that specific setting takes precedence over a general (unspecific) setting of the other option. So to ignore specific warnings and abort on the rest, define no warning for -abortonwarn and the specific warnings for -ignorewarn. The order does not matter. If both are unspecific or a warning is specified by both, it’ll prefer aborting.
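The precedence between -abortonwarn and -ignorewarn described above can be written down as a small decision function. This is a sketch of the stated rules only, not the script's code. It assumes each option is modelled as False (not given), True (given without a warning type) or a set of warning names; the function name decide, the "undecided" result and the warning name "duplicate" are illustrative assumptions.

```python
def decide(warning, abort_on=False, ignore=False):
    """Return 'abort', 'ignore' or 'undecided' for one upload warning."""
    abort_specific = isinstance(abort_on, (set, frozenset)) and warning in abort_on
    ignore_specific = isinstance(ignore, (set, frozenset)) and warning in ignore
    if abort_specific:
        return "abort"      # also covers a warning named by both options
    if ignore_specific:
        return "ignore"     # a specific ignore beats an unspecific abort
    if abort_on is True:
        return "abort"      # also covers both options given without a type
    if ignore is True:
        return "ignore"
    return "undecided"      # neither option applies to this warning


# Ignore 'exists', abort on everything else.
print(decide("exists", abort_on=True, ignore={"exists"}))
print(decide("duplicate", abort_on=True, ignore={"exists"}))
# Both unspecific: aborting is preferred.
print(decide("exists", abort_on=True, ignore=True))
```

Checking the specific sets before the unspecific flags is what makes the order of the two options on the command line irrelevant.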
If any other arguments are given, the first is either URL, filename or directory to upload, and the rest is a proposed description to go with the upload. If none of these are given, the user is asked for the directory, file or URL to upload. The bot will then upload the image to the wiki.
The script will ask for the location of the image(s), if not given as a parameter, and for a description.
weblinkchecker script
This bot is used for checking external links found at the wiki
It checks several pages at once, with a limit set by the config variable max_external_links, which defaults to 50.
The bot won’t change any wiki pages, it will only report dead links such that people can fix or remove the links themselves.
The bot will store all links found dead in a .dat file in the deadlinks subdirectory. To avoid the removal of links which are only temporarily unavailable, the bot ONLY reports links which were found dead at least two times, with a time lag of at least one week. Such links will be logged to a .txt file in the deadlinks subdirectory.
The .txt file uses wiki markup and so it may be useful to post it on the wiki and then exclude that page from subsequent runs. For example if the page is named Broken Links, exclude it with ‘-titleregexnot:^Broken Links$’
After running the bot and waiting for at least one week, you can re-check those pages where dead links were found, using the -repeat parameter.
In addition to the logging step, it is possible to automatically report dead links to the talk page of the article where the link was found. To use this feature, set report_dead_links_on_talk = True in your user config file, or specify “-talk” on the command line. Adding “-notalk” switches this off irrespective of the configuration variable.
When a link is found alive, it will be removed from the .dat file.
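The reporting rule described above (dead at least twice, at least one week apart) can be sketched as follows. This models the rule only and says nothing about the format of the .dat file; the function name should_report and the idea of keeping a list of failed-check timestamps per link are assumptions for illustration.

```python
from datetime import datetime, timedelta


def should_report(dead_checks, min_lag=timedelta(days=7)):
    """True if a link failed at least twice, at least `min_lag` apart.

    `dead_checks` holds the timestamps of failed checks recorded since the
    link was last seen alive (an alive result would clear the list).
    """
    if len(dead_checks) < 2:
        return False
    return max(dead_checks) - min(dead_checks) >= min_lag


first = datetime(2024, 1, 1, 12, 0)
print(should_report([first]))                               # only one failure
print(should_report([first, first + timedelta(days=2)]))    # too close together
print(should_report([first, first + timedelta(days=8)]))    # old enough to report
```

A single failed check never triggers a report, which is why a second run with -repeat after the waiting period is needed.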
These command line parameters can be used to specify which pages to work on:
- -repeat
Work on all pages where dead links were found before. This is useful to confirm that the links are dead after some time (at least one week), which is required before the script will report the problem.
- -namespace
Only process pages in the namespace with the given number or name. This parameter may be used multiple times.
- -xml
Should be used instead of a simple page fetching method from pagegenerators.py for performance and server load reasons
- -xmlstart
Page to start with when using an XML dump
- -ignore
HTTP return codes to ignore. Can be provided several times: -ignore:401 -ignore:500
This script supports use of pagegenerators arguments.
Furthermore, the following command line parameters are supported:
- -talk
Overrides the report_dead_links_on_talk config variable, enabling the feature.
- -notalk
Overrides the report_dead_links_on_talk config variable, disabling the feature.
- -day
Do not report a broken link if it has been known to be dead for only x days or less. If not set, the default is 7 days.
The following config variables are supported:
- max_external_links
The maximum number of web pages that should be loaded simultaneously. You should change this according to your Internet connection speed. Be careful: if it is set too high, the script might get socket errors because your network is congested, and will then think that the page is offline.
- report_dead_links_on_talk
If set to true, causes the script to report dead links on the article’s talk page if (and ONLY if) the linked page has been unavailable at least two times during a timespan of at least one week.
- weblink_dead_days
Sets the timespan (default: one week) after which a dead link will be reported
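For orientation, the three variables above might appear in the user config file roughly as shown below. This is a sketch under assumptions: the file name user-config.py is not stated in the text (it only says "user config file"), the values for max_external_links and weblink_dead_days are the defaults mentioned above, and report_dead_links_on_talk is switched on purely as an example.

```python
# Fragment of a user config file (assumed here to be user-config.py).

# Number of web pages loaded simultaneously; 50 is the stated default.
max_external_links = 50

# Report confirmed dead links on the article's talk page.
report_dead_links_on_talk = True

# Days after which a dead link is reported; one week is the stated default.
weblink_dead_days = 7
```

The -talk and -notalk command line options override report_dead_links_on_talk for a single run.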
Examples
Loads all wiki pages in alphabetical order using the Special:Allpages feature:
python pwb.py weblinkchecker -start:!
Loads all wiki pages using the Special:Allpages feature, starting at “Example page”:
python pwb.py weblinkchecker -start:Example_page
Loads all wiki pages that link to www.example.org:
python pwb.py weblinkchecker -weblink:www.example.org
Only checks links found in the wiki page “Example page”:
python pwb.py weblinkchecker Example page
Loads all wiki pages where dead links were found during a prior run:
python pwb.py weblinkchecker -repeat