Wikibase scripts#

claimit script#

A script that adds claims to Wikidata items based on a list of pages

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Usage:

python pwb.py claimit [pagegenerators] P1 Q2 P123 Q456

You can use any typical pagegenerator (like categories) to provide a list of pages. Then list the property -> target pairs to add.
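As a rough illustration of how the trailing arguments pair up, the sketch below groups a flat argument list into (property, target) tuples. The helper name `pair_claims` is hypothetical and not part of claimit.py; the real script additionally handles coordinates, strings and the ‘exists’ option.

```python
def pair_claims(args):
    """Group a flat list like ['P1', 'Q2', 'P123', 'Q456'] into pairs."""
    if len(args) % 2:
        raise ValueError('each property needs exactly one target')
    # Even positions hold properties, odd positions hold their targets.
    return list(zip(args[0::2], args[1::2]))


print(pair_claims(['P1', 'Q2', 'P123', 'Q456']))
```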

For geographic coordinates:

python pwb.py claimit [pagegenerators] P625 [lat-dec],[long-dec],[prec]

[lat-dec] and [long-dec] represent the latitude and longitude respectively, and [prec] represents the precision. All values are in decimal degrees, not DMS. If [prec] is omitted, the default precision is 0.0001 degrees.

Example

python pwb.py claimit [pagegenerators] P625 -23.3991,-52.0910,0.0001
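The coordinate argument format can be sketched as follows. This is only an illustration of the documented `[lat-dec],[long-dec],[prec]` syntax with its 0.0001 degree default; `parse_coordinate` and `Coordinate` are hypothetical names, not the parser used by claimit.py.

```python
from typing import NamedTuple

DEFAULT_PRECISION = 0.0001  # degrees, the documented default


class Coordinate(NamedTuple):
    lat: float
    lon: float
    precision: float


def parse_coordinate(value: str) -> Coordinate:
    """Parse 'lat,long[,prec]' given in decimal degrees."""
    parts = [float(part) for part in value.split(',')]
    if len(parts) == 2:
        # Precision omitted: fall back to the default.
        parts.append(DEFAULT_PRECISION)
    if len(parts) != 3:
        raise ValueError(f'expected lat,long[,prec], got {value!r}')
    return Coordinate(*parts)


print(parse_coordinate('-23.3991,-52.0910,0.0001'))
print(parse_coordinate('-23.3991,-52.0910'))
```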

By default, claimit.py does not add a claim if one with the same property already exists on the page. To override this behavior, use the ‘exists’ option:

python pwb.py claimit [pagegenerators] P246 "string example" -exists:p

Suppose the claim you want to add has the same property as an existing claim and the “-exists:p” argument is used. claimit.py will then still skip the claim if it has the same target or sources as the existing claim, or if the existing claim has qualifiers. To override this behavior, add ‘t’ (target), ‘s’ (sources), or ‘q’ (qualifiers) to the ‘exists’ argument.

For instance, to add the claim to each page even if one with the same property and target and some qualifiers already exists:

python pwb.py claimit [pagegenerators] P246 "string example" -exists:ptq

Note that the ordering of the letters in the ‘exists’ argument does not matter, but ‘p’ must be included.
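The rules for the ‘exists’ argument can be read as a small decision function. The sketch below is a simplified interpretation of the description above, not the logic implemented in Pywikibot; the function name `may_add` and its keyword arguments are assumptions made for illustration.

```python
def may_add(exists, *, has_property, same_target=False,
            same_sources=False, has_qualifiers=False):
    """Decide whether a claim would be added under the documented rules."""
    flags = set(exists)  # letter order is irrelevant
    if not has_property:
        return True
    if 'p' not in flags:
        # Default behavior: never add a second claim for the property.
        return False
    if same_target and 't' not in flags:
        return False
    if same_sources and 's' not in flags:
        return False
    if has_qualifiers and 'q' not in flags:
        return False
    return True


# Same property and target, existing claim has qualifiers:
print(may_add('p', has_property=True, same_target=True, has_qualifiers=True))
print(may_add('ptq', has_property=True, same_target=True, has_qualifiers=True))
```

Because the letters are stored in a set, ‘ptq’ and ‘qtp’ behave identically, and without ‘p’ the other letters have no effect in this sketch.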

create_isbn_edition script#

Pywikibot client to load ISBN linked data into Wikidata

Pywikibot script to get ISBN data from a digital library, and create or amend the related Wikidata item for the edition (with P212, the ISBN number, as unique external ID).

Use digital libraries to get ISBN data in JSON format, and integrate the results into Wikidata.

Note

ISBN data should only be used for editions, and not for written works.

The resulting item number can then be used, e.g., to generate Wikipedia references using the Cite_Q template.

Parameters:

All parameters are optional:

*P1:*        digital library (default wiki "-")

    bnf      Catalogue General (France)
    bol      Bol.com
    dnb      German National Library (Deutsche Nationalbibliothek)
    goob     Google Books
    kb       National Library of the Netherlands
    loc      Library of Congress US
    mcues    Ministerio de Cultura (Spain)
    openl    OpenLibrary.org
    porbase  urn.porbase.org Portugal
    sbn      Servizio Bibliotecario Nazionale (Italy)
    wiki     wikipedia.org
    worldcat WorldCat (wc)

*P2:*        ISO 639-1 language code. Default LANG; e.g. en, nl,
             fr, de, es, it, etc.

*P3 P4...:*  P/Q pairs to add additional claims (repeated) e.g.
             P921 Q107643461 (main subject: database management
             linked to P2163, Fast ID 888037)

*stdin:*     List of ISBN numbers (International standard book
             number, version 10 or 13). Free text (e.g.
             Wikipedia references list, or publication list) is
             accepted. Identification is done via an ISBN regular
             expression.

Functionality:
  • Both ISBN-10 and ISBN-13 numbers are accepted as input.

  • Only ISBN-13 numbers are stored. ISBN-10 numbers are only used for identification purposes; they are not stored.

  • The ISBN number is used as a primary key; no two items can have the same P212 ISBN number. The item update is not performed when there is no unique match. Only editions are updated or created.

  • Individual statements are added or merged incrementally; existing data is not overwritten.

  • Authors and publishers are searched to get their item number; unknown or ambiguous items are skipped.

  • Book title and subtitle are separated with either ‘.’, ‘:’, or ‘-’ in that order.

  • Detects author, illustrator, preface writer, and afterword writer instances.

  • Adds the profession “author” to individual authors.

  • This script can be run incrementally.
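The ISBN handling described in this list (free-text identification, ISBN-10 accepted but only ISBN-13 stored) can be sketched with the standard library alone. The script itself relies on isbnlib; the regular expression and the helper names `find_isbns` and `to_isbn13` below are illustrative assumptions, not the script's own code.

```python
import re

# Loose pattern: optional 978/979 prefix, nine digits, one check character.
ISBN_RE = re.compile(r'\b(?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx]\b')


def find_isbns(text):
    """Return ISBN candidates found in free text, separators removed."""
    return [re.sub(r'[- ]', '', match.group(0)).upper()
            for match in ISBN_RE.finditer(text)]


def to_isbn13(isbn):
    """Convert an ISBN-10 to ISBN-13; ISBN-13 input is returned as is."""
    digits = re.sub(r'[- ]', '', isbn).upper()
    if len(digits) == 13:
        return digits
    if len(digits) != 10:
        raise ValueError(f'not an ISBN: {isbn!r}')
    # Drop the ISBN-10 check character, prepend 978, recompute the check
    # digit with alternating weights 1 and 3.
    body = '978' + digits[:9]
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(body))
    return body + str((10 - total % 10) % 10)


text = 'References: ISBN 978-0-596-10089-6 and ISBN 0596100892.'
print([to_isbn13(isbn) for isbn in find_isbns(text)])
```

The sketch does not validate check digits of the input; a library such as isbnlib is the better choice for real validation.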

Examples:

Default library, language (LANG), no additional statements:

pwb create_isbn_edition.py 9789042925564

Wikipedia, language English, main subject: database management:

pwb create_isbn_edition.py wiki en P921 Q107643461 978-0-596-10089-6

Data quality:
  • ISBN numbers (P212) are only assigned to editions.

  • A written work should not have an ISBN number (P212).

  • For targets of P629 (edition of), amend the “instance of Q47461344 (written work)” and inverse “P747 (work has edition)” statements.

  • Use https://query.wikidata.org/querybuilder/ to identify P212 duplicates. Merge duplicate items before running the script again.

  • The following properties should only be used for written works, not for editions:

    • P5331: OCLC work ID (editions should only have P243)

    • P8383: Goodreads work ID (editions should only have P2969)

Return status:

The following status codes are returned to the shell:

3   Invalid or missing parameter
4   Library not installed
12  Item does not exist
20  Network error

Standard ISBN properties for editions:

P31:Q3331189:  instance of edition (mandatory statement)
P50:           author
P123:          publisher
P212:          canonical ISBN number (with dashes; searchable
               via Wikidata Query)
P407:          language of work (Qnumber linked to ISO 639-1
               language code)
P577:          date of publication (year)
P1476:         book title
P1680:         subtitle

Other ISBN properties:

P921:   main subject (inverse lookup from external Fast ID P2163)
P629:   work for edition
P747:   edition of work

Qualifiers:

P248:   Source
P813:   Retrieval date
P1545:  (author) sequence number

External identifiers:

P243:   OCLC ID
P1036:  Dewey Decimal Classification
P2163:  Fast ID (inverse lookup via Wikidata Query)
        -> P921: main subject

(not implemented)
P2969:  Goodreads version/edition ID

(only for written works)
P5331:  OCLC work ID (editions should only have P243)

(not implemented)
P8383:  Goodreads work ID
        (editions should only have P2969)
P213:   ISNI ID
P496:   ORCID ID
P675:   Google Books ID

Unavailable properties from digital library:

(not implemented by isbnlib)
P98:    Editor
P110:   Illustrator/photographer
P291:   place of publication
P1104:  number of pages
?:      edition format (hardcover, paperback)

Author:

Geert Van Pamel (User:Geertivp), MIT License, 2022-08-04

Prerequisites:

In addition to Pywikibot the following ISBN lib package is mandatory; install it with:

pip install isbnlib

The following ISBN lib packages are optional; install them with:

pip install isbnlib-bnf
pip install isbnlib-bol
pip install isbnlib-dnb
pip install isbnlib-kb
pip install isbnlib-loc
pip install isbnlib-worldcat2

Restrictions:
  • Prefer to supply the ISO 639-1 language code parameter as a default; the language code is not always available from the digital library.

  • Publisher unknown:

    • Missing P31:Q2085381 statement, missing subclass in script

    • Missing alias

    • Create publisher

  • Unknown author: create author as a person

Known Problems:
  • Unknown ISBN, e.g. 9789400012820

  • If no ISBN data is available for an edition, the library either returns no output (goob = Google Books) or an error message (wiki, openl). The script handles both cases. Try another library.

  • Only 6 specific ISBN attributes are listed by the webservice(s); missing are, e.g., place of publication and number of pages.

  • Some digital libraries have more registrations than others.

  • Some digital libraries have data quality problems.

  • Not all ISBN attributes have data values (authors, publisher, date of publication); the language can be missing at the digital library.

  • How to add still more digital libraries?

    • This would require an additional isbnlib module

    • Does the KBR (Koninklijke Bibliotheek van België) have a public ISBN service?

  • The script uses multiple webservice calls, so it might take time, but it is automated.

  • ISBN items that have no author, publisher, or other required data need to be amended manually.

    • You could use another digital library

    • Which other services to use?

  • BibTeX service is currently unavailable

  • Filter for work properties: https://www.wikidata.org/wiki/Q63413107

    ['9781282557246', '9786612557248', '9781847196057', '9781847196040']
    P5331: OCLC identification code for work 793965595 (should only
           have P243)
    P8383: Goodreads identification code for work 13957943 (should
           only have P2969)
    
  • ERROR: an HTTP error has ocurred e.g. (503) Service Unavailable

  • error: externally-managed-environment

    isbnlib-kb cannot be installed via pip install command. It raises error: externally-managed-environment because this environment is externally managed.

    To install Python packages system-wide, try apt install python3-xyz, where xyz is the package you are trying to install.

    If you wish to install a non-Debian-packaged Python package, create a virtual environment using python3 -m venv path/to/venv. Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make sure you have python3-full installed.

    If you wish to install a non-Debian packaged Python application, it may be easiest to use pipx install xyz, which will manage a virtual environment for you. Make sure you have pipx installed.

    See also

    See Python Library venv for more information about virtual environments.

    Note

    If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages to pip.

    Hint

    See PEP 668 for the detailed specification.

    You need to install a local python environment:

    sudo -s
    apt install python3-full
    python3 -m venv /opt/python
    /opt/python/bin/pip install pywikibot
    /opt/python/bin/pip install isbnlib-kb
    /opt/python/bin/python ../userscripts/create_isbn_edition.py kb
    
Environment:

The python script can run on the following platforms:

  • Linux client

  • Google Chromebook (Linux container)

  • Toolforge Portal

  • PAWS

LANG: default ISO 639-1 language code

Applications:

Generate a book reference. Example for wp.en only:

{{Cite Q|Q63413107}}

Use the Visual editor reference with Qnumber.


Added in version 7.7.

Changed in version 9.6: several implementation improvements

dataextend script#

Script to add properties, identifiers and sources to WikiBase items

Usage:

dataextend <item> [<property>[+*]] [args]

In the basic usage, where no property is specified, item is the Q-number of the item to work on.

If a property (P-number, or the special value ‘Wiki’ or ‘Data’) is specified, only the data from that identifier are added. With a ‘+’ after it, work starts on that identifier, then goes on to identifiers after that (including new identifiers added while working on those identifiers). With a ‘*’ after it, the identifier itself is skipped, but those coming after it (not those coming before it) are included.
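The effect of the ‘+’ and ‘*’ suffixes can be sketched as a selection over an ordered list of identifiers. This is an illustration of the rule as described, under the assumption that identifiers are processed in a fixed order; `select_identifiers` is a hypothetical helper, and the sketch does not model identifiers that are newly added while the bot is working.

```python
def select_identifiers(ordered, spec=None):
    """Pick the identifiers to work on for a '<property>[+*]' argument."""
    ordered = list(ordered)
    if not spec:
        return ordered  # basic usage: work on everything
    suffix = spec[-1] if spec[-1] in '+*' else ''
    prop = spec.rstrip('+*')
    start = ordered.index(prop)  # raises ValueError if prop is unknown
    if suffix == '+':
        return ordered[start:]      # this identifier and all later ones
    if suffix == '*':
        return ordered[start + 1:]  # only the ones after it
    return [prop]                   # just this identifier


order = ['Wiki', 'P213', 'P214', 'P227']
print(select_identifiers(order, 'P214'))
print(select_identifiers(order, 'P213+'))
print(select_identifiers(order, 'P213*'))
```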

The following parameters are supported:

-always

If this is supplied, the bot will not ask for permission after each external link has been handled.

-showonly

Only show claims for a given ItemPage. Don’t try to add any properties.

The bot will load the corresponding pages for these identifiers and try to extract data from them. When it finds a string it cannot interpret, it asks for the Q-number that gives the meaning of that string for the specified type of thing (for example ‘city’ or ‘gender’). If you want to use it, but not save it (which can happen if the string specifies a certain value now, but might show another value elsewhere, or if it is so specific that you’re pretty sure it won’t occur a second time), you can provide the Q-number with X rather than Q. If you do not want to use the string, you can just hit enter, or give the special value ‘XXX’, which means that it will be skipped in each subsequent run as well.

After an identifier has been worked on, there might be a list of names that have been found, in lc:name format, where lc is a language code. You can accept all suggested names (answer Y), none (answer N), or choose to be asked about each name separately (answer S); the latter is the default if you do not fill in anything.

After all identifiers have been worked on, possible descriptions in various languages are presented, and you get to choose one. The default here is 0, which is always the current description for that language. Finally, for a number of identifiers, text is shown that usually gives parts of the description that are hard to parse automatically, so you can see if there are any additional pieces of data that can be added.

It is advisable to (re)load the item page that the bot has been working on in the browser afterward, to correct any mistakes it has made, or cases where a more precise and less precise value have both been included.

Added in version 7.2.

Deprecated since version 9.6: will be removed with Pywikibot 10.

harvest_template script#

Template harvesting script

Usage (see below for explanations and examples):

python pwb.py harvest_template -transcludes:"…" [default optional arguments] template_parameter PID [local optional arguments] [template_parameter PID [local optional arguments]]

python pwb.py harvest_template [generators] -template:"…" [default optional arguments] template_parameter PID [local optional arguments] [template_parameter PID [local optional arguments]]

This will work on all pages that transclude the template in the article namespace.

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

You can also use additional parameters:

-confirm

If used, the bot will ask if it should make changes.

-create

Create missing items before importing.

The following command line parameters can be used to change the bot’s behavior. If you specify them before all parameters, they are global and are applied to all param-property pairs. If you specify them after a param-property pair, they are local and are only applied to this pair. If you specify the same argument as both local and global, the local argument overrides the global one (see also examples):

-islink

Treat plain text values as links (“text” -> “[[text]]”).

-exists

If set to ‘p’, add a new value, even if the item already has the imported property but not the imported value. If set to ‘pt’, add a new value, even if the item already has the imported property with the imported value and some qualifiers.

-multi

If set, try to match multiple values from parameter.

-inverse

Import this property as the inverse claim.
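The global versus local rule for these options can be sketched as a small argument parser. It is a minimal illustration that assumes generator arguments such as -template have already been removed; `parse_pairs` is a hypothetical name and does not reflect how harvest_template itself parses its command line.

```python
def parse_pairs(args):
    """Return (parameter, PID, options) tuples with local overrides."""
    global_opts = {}
    pairs = []
    it = iter(args)
    for arg in it:
        if arg.startswith('-'):
            name, _, value = arg[1:].partition(':')
            # Before the first pair an option is global; afterwards it
            # belongs to the most recent pair only.
            target = pairs[-1][2] if pairs else global_opts
            target[name] = value or True
        else:
            pid = next(it, None)
            if pid is None:
                raise ValueError(f'missing property for {arg!r}')
            pairs.append((arg, pid, {}))
    # Local options win over global ones with the same name.
    return [(param, pid, {**global_opts, **local})
            for param, pid, local in pairs]


print(parse_pairs(['-islink', 'birth_place', 'P19', 'death_place', 'P20']))
print(parse_pairs(['birth_place', 'P19', '-islink', 'death_place', 'P20']))
```

In the first call -islink applies to both pairs; in the second it applies to birth_place only.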

Examples

The following command will try to import existing images from “image” parameter of “Infobox person” on English Wikipedia as Wikidata property “P18” (image):

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox person" image P18

The following command will behave the same as the previous example and also try to import [[links]] from “birth_place” parameter of the same template as Wikidata property “P19” (place of birth):

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox person" image P18 birth_place P19

The following command will import both “birth_place” and “death_place” params with the -islink modifier, i.e. the bot will try to import values even if it doesn’t find a [[link]]:

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox person" -islink birth_place P19 death_place P20

The following command will do the same but only “birth_place” can be imported without a link:

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox person" birth_place P19 -islink death_place P20

The following command will import an occupation from the “occupation” parameter of “Infobox person” on English Wikipedia as Wikidata property “P106” (occupation). The page won’t be skipped if the item already has that property but not the new value:

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox person" occupation P106 -exists:p

The following command will import band members from the “current_members” parameter of “Infobox musical artist” on English Wikipedia as Wikidata property “P527” (has part). This will only extract multiple band members if each is linked, and will not add duplicate claims for the same member:

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox musical artist" current_members P527 -exists:p -multi

The following command will import the category’s main topic from the first anonymous parameter of “Cat main” on English Wikipedia as Wikidata property “P301” (category’s main topic) and whenever a new value is imported, the inverse claim is imported to the topic item as Wikidata property “P910” (topic’s main category) unless a claim of that property is already there:

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:14 -template:"Cat main" 1 P301 -inverse:P910 -islink

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.

Added in version 7.5: the -inverse option.

illustrate_wikidata script#

Bot to add images to Wikidata items

The image is extracted from the page_props. For this to be available the PageImages extension (https://www.mediawiki.org/wiki/Extension:PageImages) needs to be installed.
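As a hedged illustration of the lookup, the sketch below works on a plain dictionary of page properties rather than on a live page. It assumes the PageImages property key `page_image_free`; the helper name `image_title_from_props` and the sample values are hypothetical and are not taken from the script.

```python
def image_title_from_props(props):
    """Return a 'File:' title from page properties, or None if absent."""
    # Assumption: PageImages stores the freely licensed page image
    # under the 'page_image_free' key.
    name = props.get('page_image_free')
    return f'File:{name}' if name else None


print(image_title_from_props({'page_image_free': 'Example.jpg'}))
print(image_title_from_props({'wikibase_item': 'Q42'}))
```

Pages without the property yield None, which corresponds to having nothing to add for that page.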

The following options are provided:

-always

Don’t prompt to make changes, just do them.

-property

The property to add. Should be of type commonsMedia.

Usage:

python pwb.py illustrate_wikidata <some generator>

This script supports use of pagegenerators arguments.

interwikidata script#

Script to handle interwiki links based on Wikibase

This script connects pages to Wikibase items using language links on the page. If multiple language links are present, and they are connected to different items, the bot skips. After connecting the page to an item, language links can be removed from the page.
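The skip rule for conflicting language links can be sketched as follows. The input format (a mapping from language link to the item ID of its target, or None when the target has no item) and the name `choose_item` are assumptions made for this example, not the script's internal data model.

```python
def choose_item(langlink_items):
    """Decide what to do based on the items behind a page's langlinks."""
    items = {qid for qid in langlink_items.values() if qid is not None}
    if len(items) > 1:
        # Language links point at different items: ambiguous, so skip.
        return 'skip', None
    if items:
        return 'connect', items.pop()
    # No language link leads to an item.
    return 'none', None


print(choose_item({'de:Berlin': 'Q64', 'fr:Berlin': 'Q64'}))
print(choose_item({'de:Berlin': 'Q64', 'fr:Berlin (homonymie)': 'Q821244'}))
```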

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-always

If used, the bot won’t ask if it should add the specified text.

-clean

Clean pages.

-create

Create items.

-merge

Merge items.

-summary:

(str) Use your own edit summary for cleaning the page.

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.

newitem script#

This script creates new items on Wikidata based on certain criteria

  • When was the (Wikipedia) page created?

  • When was the last edit on the page?

  • Does the page contain interwikis?

This script understands various command-line arguments:

-lastedit

The minimum number of days that have passed since the page was last edited.

-pageage

The minimum number of days that have passed since the page was created.

-touch

Do a null edit on every page which has a Wikibase item. Be careful: this option can trigger edit rate limits or captchas if your account is not autoconfirmed.
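The age criteria behind -pageage and -lastedit can be sketched with plain datetime arithmetic. This is a minimal illustration of the two thresholds as described; `old_enough` and its parameters are hypothetical names, and the timestamps would come from the page history in a real run.

```python
from datetime import datetime, timedelta, timezone


def old_enough(created, last_edited, *, pageage, lastedit, now=None):
    """Check both minimum ages (in days) against the current time."""
    now = now or datetime.now(timezone.utc)
    return (now - created >= timedelta(days=pageage)
            and now - last_edited >= timedelta(days=lastedit))


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
created = datetime(2024, 1, 1, tzinfo=timezone.utc)
edited = datetime(2024, 1, 28, tzinfo=timezone.utc)
# Page is 30 days old, but was edited only 3 days ago.
print(old_enough(created, edited, pageage=21, lastedit=7, now=now))
print(old_enough(created, edited, pageage=21, lastedit=2, now=now))
```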