Wikibase scripts#

claimit script#

A script that adds claims to Wikidata items based on a list of pages

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Usage:

python pwb.py claimit [pagegenerators] P1 Q2 P123 Q456

You can use any typical pagegenerator (like categories) to provide a list of pages. Then list the property-target pairs to add.

For geographic coordinates:

python pwb.py claimit [pagegenerators] P625 [lat-dec],[long-dec],[prec]

[lat-dec] and [long-dec] represent the latitude and longitude respectively, and [prec] represents the precision. All values are in decimal degrees, not DMS. If [prec] is omitted, the default precision is 0.0001 degrees.

Example

python pwb.py claimit [pagegenerators] P625 -23.3991,-52.0910,0.0001
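As an illustration only, the coordinate argument format described above can be parsed as in the following sketch. It is not the code used by claimit.py; the names `parse_coordinate` and `Coordinate` are hypothetical, and the range check is an added assumption.

```python
from typing import NamedTuple

DEFAULT_PRECISION = 0.0001  # degrees; the default stated in the text above


class Coordinate(NamedTuple):
    lat: float
    lon: float
    precision: float


def parse_coordinate(arg: str) -> Coordinate:
    """Parse '[lat-dec],[long-dec][,[prec]]' (decimal degrees) into floats."""
    parts = [part.strip() for part in arg.split(',')]
    if len(parts) not in (2, 3):
        raise ValueError(f'expected lat,long[,prec], got {arg!r}')
    lat, lon = float(parts[0]), float(parts[1])
    # Fall back to the documented default when [prec] is omitted.
    precision = float(parts[2]) if len(parts) == 3 else DEFAULT_PRECISION
    # Assumption: reject values outside the usual latitude/longitude ranges.
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError(f'coordinate out of range: {arg!r}')
    return Coordinate(lat, lon, precision)
```

For instance, `parse_coordinate('-23.3991,-52.0910')` yields the same precision as passing `0.0001` explicitly.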

By default, claimit.py does not add a claim if one with the same property already exists on the page. To override this behavior, use the ‘exists’ option:

python pwb.py claimit [pagegenerators] P246 "string example" -exists:p

Suppose the claim you want to add has the same property as an existing claim and the “-exists:p” argument is used. In that case, claimit.py still will not add the claim if it has the same target or the same source as the existing claim, or if the existing claim has qualifiers. To override this behavior, add ‘t’ (target), ‘s’ (sources), or ‘q’ (qualifiers) to the ‘exists’ argument.

For instance, to add the claim to each page even if one with the same property and target and some qualifiers already exists:

python pwb.py claimit [pagegenerators] P246 "string example" -exists:ptq

Note that the ordering of the letters in the ‘exists’ argument does not matter, but ‘p’ must be included.
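The rules for the letters of the ‘exists’ argument can be sketched as a small validator. This is a hypothetical helper (`parse_exists` is not a Pywikibot function), and whether claimit.py itself rejects invalid combinations is not stated here; the sketch only encodes the rules given above.

```python
ALLOWED_FLAGS = frozenset('ptsq')  # property, target, sources, qualifiers


def parse_exists(value: str) -> frozenset:
    """Return the set of override letters given after '-exists:'."""
    flags = frozenset(value)  # a set, so the order of letters is irrelevant
    unknown = flags - ALLOWED_FLAGS
    if unknown:
        raise ValueError(f'unknown exists flag(s): {sorted(unknown)}')
    # 't', 's' and 'q' only refine 'p', so 'p' has to be present.
    if flags and 'p' not in flags:
        raise ValueError("'p' must be included in the exists argument")
    return flags
```

With this helper, `parse_exists('ptq')` and `parse_exists('qtp')` compare equal.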

create_isbn_edition script#

Pywikibot script to load ISBN related data into Wikidata

Pywikibot script to get ISBN data from a digital library, and create or amend the related Wikidata item for edition (with the P212=ISBN number as unique external ID).

Use digital libraries to get ISBN data in JSON format, and integrate the results into Wikidata.

Then the resulting item number can be used e.g. to generate Wikipedia references using template Cite_Q.

param All parameters are optional:

P1: digital library (default goob “-“)

bnf       Catalogue General (France)
bol       Bol.com
dnb       Deutsche National Library
goob      Google Books
kb        National Library of the Netherlands
loc       Library of Congress US
mcues     Ministerio de Cultura (Spain)
openl     OpenLibrary.org
porbase   urn.porbase.org Portugal
sbn       Servizio Bibliotecario Nazionale
wiki      wikipedia.org
worldcat  WorldCat

P2: ISO 639-1 language code

Default LANG; e.g. en, nl, fr, de, es, it, etc.

P3 P4…: P/Q pairs to add additional claims (repeated)

e.g. P921 Q107643461 (main subject: database management linked to P2163 Fast ID)

param stdin:

ISBN numbers (International standard book number)

Free text (e.g. Wikipedia references list, or publication list) is accepted. Identification is done via an ISBN regex expression.
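How ISBN numbers might be picked out of free text can be sketched as follows. The pattern below is an assumption for illustration; the regular expression actually used by the script is not shown in this documentation, and `find_isbns` is a hypothetical name.

```python
import re
from typing import List

# Hypothetical pattern: ISBN-13 (978/979 prefix) or ISBN-10, with optional
# hyphens or spaces between digits; the ISBN-10 check character may be X.
ISBN_RE = re.compile(r'\b(?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx]\b')


def find_isbns(text: str) -> List[str]:
    """Return ISBN-like strings found in free text, separators removed."""
    return [re.sub(r'[- ]', '', match.group(0)).upper()
            for match in ISBN_RE.finditer(text)]
```

The sketch does not verify check digits; it only identifies candidates.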

Functionality:
  • The ISBN number is used as a primary key (P212), where no duplicates are allowed. The item update is not performed when there is no unique match.

  • Statements are added or merged incrementally; existing data is not overwritten.

  • Authors and publishers are searched to get their item number (ambiguous items are skipped)

  • Book title and subtitle are separated with ‘.’, ‘:’, or ‘-’

  • This script can be run incrementally with the same parameters. Caveat: take into account the Wikidata Query database replication delay. Wait at least 5 minutes to avoid creating duplicate objects.
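The primary-key rule in the list above can be read as a three-way decision on the result of the P212 lookup. The sketch below is one interpretation under stated assumptions: `decide_action` and its input (a list of Q-ids whose P212 equals the ISBN) are hypothetical and do not come from the script.

```python
from typing import List, Optional, Tuple


def decide_action(matching_items: List[str]) -> Tuple[str, Optional[str]]:
    """Map the P212 lookup result to an action.

    matching_items: Q-ids of items whose P212 equals the ISBN
    (hypothetical input, e.g. the result of a Wikidata Query lookup).
    """
    if not matching_items:
        return ('create', None)  # no item yet: a new edition item is created
    if len(matching_items) == 1:
        return ('amend', matching_items[0])  # unique match: add missing data
    return ('skip', None)  # duplicates: no update is performed
```

Because the lookup runs against a replicated database, a freshly created item may be missing from `matching_items` for a few minutes, which is why rerunning too early risks a second 'create'.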

Data quality:
  • Use https://query.wikidata.org/querybuilder/ to identify P212 duplicates. Merge duplicate items before running the script again.

  • The following properties should only be used for written works:
      P5331: OCLC work ID (editions should only have P243)
      P8383: Goodreads ID for work (editions should only have P2969)

Examples

Default library (Google Books), language (LANG), no additional statements:

pwb create_isbn_edition.py 9789042925564

Wikipedia as library, language English, main subject: database management:

pwb create_isbn_edition.py wiki en P921 Q107643461 978-0-596-10089-6

Standard ISBN properties:

P31:Q3331189:   instance of edition
P50:    author
P123:   publisher
P212:   canonical ISBN number (lookup via Wikidata Query)
P407:   language of work (Qnumber linked to ISO 639-1 language code)
P577:   date of publication (year)
P1476:  book title
P1680:  subtitle

Other ISBN properties:

P291:   place of publication
P921:   main subject (inverse lookup from external Fast ID P2163)
P629:   work for edition
P747:   edition of work
P1104:  number of pages

Qualifiers:

P1545:  (author) sequence number

External identifiers:

P213:   ISNI ID
P243:   OCLC ID
P496:   ORCID iD
P675:   Google Books ID
P1036:  Dewey Decimal Classification
P2163:  Fast ID (inverse lookup via Wikidata Query) -> P921: main subject
P2969:  Goodreads ID

(only for written works)
P5331:  OCLC work ID (editions should only have P243)
P8383:  Goodreads ID for work (editions should only
        have P2969)

Author:

Geert Van Pamel, 2022-08-04, GNU General Public License v3.0, User:Geertivp

Documentation:
Prerequisites:

pywikibot

Install the following isbnlib packages (see https://pypi.org/search/?q=isbnlib_):

pip install isbnlib (mandatory)

(optional)
pip install isbnlib-bol
pip install isbnlib-bnf
pip install isbnlib-dnb
pip install isbnlib-kb
pip install isbnlib-loc
pip install isbnlib-worldcat2
etc.

Restrictions:
  • Better use the ISO 639-1 language code parameter as a default

    The language code is not always available from the digital library.

  • SPARQL queries run on a replicated database

    There may be a significant replication delay; wait 5 minutes before retrying, otherwise there is a risk of creating duplicates.

Known problems:
  • Unknown ISBN, e.g. 9789400012820

  • No ISBN data available for an edition causes either no output (goob = Google Books) or an error message (wiki, openl). The script handles both cases.

  • Only 6 ISBN attributes are listed by the webservice(s); missing are e.g. place of publication and number of pages.

  • Not all ISBN attributes have data (authors, publisher, date of publication, language)

  • The script uses multiple webservice calls (script might take time, but it is automated)

  • Need to amend ISBN items that have no author, publisher, or other required data (which additional services to use?)

  • How to add still more digital libraries?
    • Does the KBR (Koninklijke Bibliotheek van België) have a public ISBN service?

  • Filter for work properties – need to amend Q47461344 (written work) instance and P629 (edition of) + P747 (has edition) statements
    https://www.wikidata.org/wiki/Q63413107
    [‘9781282557246’, ‘9786612557248’, ‘9781847196057’, ‘9781847196040’]
    P8383: Goodreads ID for work 13957943 (should have P2969)
    P5331: OCLC ID for work 793965595 (should have P243)

Algorithm:

  1. Get parameters
  2. Validate parameters
  3. Get ISBN data
  4. Convert ISBN data
  5. Get additional data
  6. Register ISBN data into Wikidata (create or amend items or claims)

Environment:

The Python script can run on the following platforms:

    Linux client
    Google Chromebook (Linux container)
    Toolforge Portal
    PAWS

LANG: ISO 639-1 language code

Applications:

Generate a book reference
    Example: {{Cite Q|Q63413107}} (wp.en)
    See also:
        https://meta.wikimedia.org/wiki/WikiCite
        https://www.wikidata.org/wiki/Q21831105 (WikiCite)
        https://www.wikidata.org/wiki/Q22321052 (Cite_Q)
        https://www.mediawiki.org/wiki/Global_templates
        https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData
        https://phabricator.wikimedia.org/tag/wikicite/
        https://meta.wikimedia.org/wiki/WikiCite/Shared_Citations

New in version 7.7.

dataextend script#

Script to add properties, identifiers and sources to WikiBase items

Usage:

dataextend <item> [<property>[+*]] [args]

In the basic usage, where no property is specified, item is the Q-number of the item to work on.

If a property (P-number, or the special value ‘Wiki’ or ‘Data’) is specified, only the data from that identifier are added. With a ‘+’ after it, work starts on that identifier, then goes on to identifiers after that (including new identifiers added while working on those identifiers). With a ‘*’ after it, the identifier itself is skipped, but those coming after it (not those coming before it) are included.
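The meaning of the ‘+’ and ‘*’ suffixes can be summarized in a short parsing sketch. `parse_property_arg` and `PropertyArg` are hypothetical names used only for illustration; the script's own argument handling may differ.

```python
import re
from typing import NamedTuple


class PropertyArg(NamedTuple):
    prop: str                # P-number, or the special value 'Wiki' or 'Data'
    include_self: bool       # work on this identifier itself
    include_following: bool  # continue with identifiers coming after it


def parse_property_arg(arg: str) -> PropertyArg:
    """Split '<property>[+*]' into the property and its suffix semantics."""
    match = re.fullmatch(r'(P\d+|Wiki|Data)([+*]?)', arg)
    if match is None:
        raise ValueError(f'not a valid property argument: {arg!r}')
    prop, suffix = match.groups()
    if suffix == '+':
        return PropertyArg(prop, True, True)   # start here, then go on
    if suffix == '*':
        return PropertyArg(prop, False, True)  # skip it, take what follows
    return PropertyArg(prop, True, False)      # only this identifier
```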

The following parameters are supported:

-always    If this is supplied, the bot will not ask for permission
           after each external link has been handled.

-showonly  Only show claims for a given ItemPage. Don't try to add any
           properties

The bot will load the corresponding pages for these identifiers and try to extract data from them. When it finds a string it does not recognize, it asks for the meaning of that string for the specified type of thing (for example ‘city’ or ‘gender’), given as a Q-number. If you want to use the value, but not save it (which can happen if the string specifies a certain value now, but might show another value elsewhere, or if it is so specific that you’re pretty sure it won’t occur a second time), you can provide the Q-number with X rather than Q. If you do not want to use the string, you can just hit enter, or give the special value ‘XXX’, which means that it will be skipped in each subsequent run as well.

After an identifier has been worked on, there might be a list of names that have been found, in lc:name format, where lc is a language code. You can accept all suggested names (answer Y), none (answer N) or be asked about each name separately (answer S), the latter being the default if you do not fill in anything.

After all identifiers have been worked on, possible descriptions in various languages are presented, and you get to choose one. The default here is 0, which is always the current description for that language. Finally, for a number of identifiers, text is shown that usually gives parts of the description that are hard to parse automatically, so you can see if there are any additional pieces of data that can be added.

It is advisable to (re)load the item page that the bot has been working on in the browser afterward, to correct any mistakes it has made, or cases where a more precise and less precise value have both been included.

New in version 7.2.

harvest_template script#

Template harvesting script

Usage (see below for explanations and examples):

python pwb.py harvest_template -transcludes:"..." \
   [default optional arguments] \
   template_parameter PID [local optional arguments] \
   [template_parameter PID [local optional arguments]]
python pwb.py harvest_template [generators] -template:"..." \
   [default optional arguments] \
   template_parameter PID [local optional arguments] \
   [template_parameter PID [local optional arguments]]

This will work on all pages that transclude the template in the article namespace.

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

You can also use additional parameters:

-confirm            If used, the bot will ask if it should make changes

-create             Create missing items before importing.

The following command line parameters can be used to change the bot’s behavior. If you specify them before all parameters, they are global and are applied to all param-property pairs. If you specify them after a param-property pair, they are local and are only applied to this pair. If you specify the same argument as both local and global, the local argument overrides the global one (see also examples):

-islink           Treat plain text values as links ("text" -> "[[text]]").

-exists           If set to 'p', add a new value, even if the item already
                  has the imported property but not the imported value.
                  If set to 'pt', add a new value, even if the item already
                  has the imported property with the imported value and
                  some qualifiers.

-multi            If set, try to match multiple values from parameter.

-inverse          Import this property as the inverse claim.
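The global/local rule for these options can be sketched as a small parser over the tokens that follow the template argument. This is an illustration under assumptions, not the script's own parser: `parse_pairs` and `parse_option` are hypothetical names, and flags without a value are represented as `True`.

```python
from typing import Dict, List, Tuple, Union

OptionValue = Union[str, bool]
Pair = Tuple[str, str, Dict[str, OptionValue]]


def parse_option(token: str) -> Tuple[str, OptionValue]:
    """Turn '-name' or '-name:value' into (name, value)."""
    name, sep, value = token[1:].partition(':')
    return name, (value if sep else True)


def parse_pairs(tokens: List[str]) -> List[Pair]:
    """Return (template_parameter, PID, effective options) per pair."""
    global_opts: Dict[str, OptionValue] = {}
    pairs: List[Pair] = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith('-'):
            name, value = parse_option(token)
            # Before the first pair the option is global, afterwards it is
            # local to the most recent pair.
            target = pairs[-1][2] if pairs else global_opts
            target[name] = value
            i += 1
        else:
            if i + 1 >= len(tokens):
                raise ValueError(f'parameter {token!r} has no property')
            pairs.append((token, tokens[i + 1], {}))
            i += 2
    # Local options override global ones of the same name.
    return [(param, pid, {**global_opts, **local})
            for param, pid, local in pairs]
```

For example, `['-islink', 'birth_place', 'P19', 'death_place', 'P20']` applies `islink` to both pairs, while `['birth_place', 'P19', '-islink', 'death_place', 'P20']` applies it to `birth_place` only.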

Examples

The following command will try to import existing images from “image” parameter of “Infobox person” on English Wikipedia as Wikidata property “P18” (image):

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 \
    -template:"Infobox person" image P18

The following command will behave the same as the previous example and also try to import [[links]] from “birth_place” parameter of the same template as Wikidata property “P19” (place of birth):

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 \
    -template:"Infobox person" image P18 birth_place P19

The following command will import both “birth_place” and “death_place” params with -islink modifier, i.e. the bot will try to import values, even if it doesn’t find a [[link]]:

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 \
    -template:"Infobox person" -islink birth_place P19 death_place P20

The following command will do the same but only “birth_place” can be imported without a link:

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 \
    -template:"Infobox person" birth_place P19 -islink death_place P20

The following command will import an occupation from “occupation” parameter of “Infobox person” on English Wikipedia as Wikidata property “P106” (occupation). The page won’t be skipped if the item already has that property but not the new value:

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 \
    -template:"Infobox person" occupation P106 -exists:p

The following command will import band members from the “current_members” parameter of “Infobox musical artist” on English Wikipedia as Wikidata property “P527” (has part). This will only extract multiple band members if each is linked, and will not add duplicate claims for the same member:

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 \
    -template:"Infobox musical artist" current_members P527 -exists:p \
    -multi

The following command will import the category’s main topic from the first anonymous parameter of “Cat main” on English Wikipedia as Wikidata property “P301” (category’s main topic) and whenever a new value is imported, the inverse claim is imported to the topic item as Wikidata property “P910” (topic’s main category) unless a claim of that property is already there:

python pwb.py harvest_template -lang:en -family:wikipedia -namespace:14 \
    -template:"Cat main" 1 P301 -inverse:P910 -islink

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.
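What such a settings file might contain, and how it can be read with the standard library, is sketched below. The section name (assumed here to match the script name) and the option names are assumptions for illustration; the sketch uses Python's `configparser` directly and does not reproduce how ConfigParserBot itself loads or type-converts options.

```python
import configparser
from typing import Dict

# Hypothetical scripts.ini content; section and option names are assumed.
SAMPLE_INI = """
[harvest_template]
create = true
exists = p
"""


def load_script_options(ini_text: str, section: str) -> Dict[str, str]:
    """Return the raw (string) options found in one section, if any."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return {}  # no settings for this script
    return dict(parser.items(section))
```

All values come back as strings; a caller would still have to convert `'true'` to a boolean, for example with `ConfigParser.getboolean`.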

New in version 7.5: the -inverse option.

illustrate_wikidata script#

Bot to add images to Wikidata items

The image is extracted from the page_props. For this to be available, the PageImages extension (https://www.mediawiki.org/wiki/Extension:PageImages) needs to be installed.

Usage:

python pwb.py illustrate_wikidata <some generator>

This script supports use of pagegenerators arguments.

interwikidata script#

Script to handle interwiki links based on Wikibase

This script connects pages to Wikibase items using language links on the page. If multiple language links are present, and they are connected to different items, the bot skips the page. After connecting the page to an item, language links can be removed from the page.
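The skip rule can be expressed as a small decision function. This is a sketch of the rule as described, not the script's code; `pick_item` is a hypothetical name, and its input is assumed to be a mapping from each language-linked page to the Q-id it is connected to (or `None` if it has no item).

```python
from typing import Dict, Optional


def pick_item(linked_items: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the single item shared by all language links, else None."""
    items = {qid for qid in linked_items.values() if qid is not None}
    if len(items) == 1:
        return next(iter(items))  # all connected links agree on one item
    return None  # no item found, or conflicting items: skip the page
```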

These command line parameters can be used to specify which pages to work on:

This script supports use of pagegenerators arguments.

Furthermore, the following command line parameters are supported:

-always           If used, the bot won't ask if it should add the specified
                  text

-clean            Clean pages.

-create           Create items.

-merge            Merge items.

-summary:         Use your own edit summary for cleaning the page.

Note

This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.

newitem script#

This script creates new items on Wikidata based on certain criteria

  • When was the (Wikipedia) page created?

  • When was the last edit on the page?

  • Does the page contain interwikis?

This script understands various command-line arguments:

-lastedit         The minimum number of days that have passed since the page was
                  last edited.

-pageage          The minimum number of days that have passed since the page was
                  created.

-touch            Do a null edit on every page which has a Wikibase item.
                  Be careful, this option can trigger edit rate limits or captchas
                  if your account is not autoconfirmed.
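The two age criteria, -pageage and -lastedit, amount to a simple date comparison. The following sketch assumes both thresholds must be met for a page to qualify; `old_enough` is a hypothetical helper, and no default values are assumed since none are given above.

```python
from datetime import datetime, timedelta


def old_enough(created: datetime, last_edited: datetime, now: datetime,
               pageage: int, lastedit: int) -> bool:
    """Check that the page is at least `pageage` days old and has been
    left unedited for at least `lastedit` days."""
    return (now - created >= timedelta(days=pageage)
            and now - last_edited >= timedelta(days=lastedit))
```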