Lua Task 7 - Wikibase client

Prerequisite: Lua Task 6 - MediaWiki libraries. This task requires research and independent learning. It is considerably harder than previous tasks and may be unsuitable for beginners to programming. The previous task examined some of the specialised functions that are available in the Scribunto libraries that we can use for working in MediaWiki projects. This task looks at how we can fetch data from Wikidata and use it in Wikipedia.

Wikidata is a database containing over 70 million items as of December 2019. The data is structured so that each item can have a number of properties, each of which can potentially have multiple values. For example, the entry for Douglas Adams (Q42) contains many facts about Douglas Adams. You can see that there is a property date of birth (P569), which has the value "11 March 1952". The type of that data is time; other common datatypes include string, url, and wikibase-item (a link to another item in Wikidata). Items in the database are referenced by a unique ID which consists of the letter 'Q' followed by a number – you may see this referred to as the "Q-number". Similarly, every property has an ID that consists of the letter 'P' followed by a number. Because items and properties are identified by ID rather than by name, each language can have its own label for each item and each property. You can find the ID for any entry by searching Wikidata for the topic and looking at the URL – the ID is the last part of the URL.

The Scribunto extension for MediaWiki can be enhanced further by an extension such as the Wikibase client. The documentation for this extension is at mw:Extension:Wikibase Client/Lua. You don't need to learn it, but you will find it useful to refer to when you write code to fetch data from Wikidata.

Fetching a date

You will write a simple function in your module sandbox to get Douglas Adams' date of birth from Wikidata.

1. In your module sandbox, before the final return p, add a comment -- Get date, and then add a new function called p.getdate. In the function, set the local variable qid to "Q42" (the Wikidata ID for Douglas Adams). Set the local variable prop to "P569" (the Wikidata property ID for "date of birth"). Use the following line to get a table of values:

local valtbl = mw.wikibase.getBestStatements(qid, prop)

Assume that there is one value returned. That will be in valtbl[1], which is also a table. The structure of the data in Wikidata is usually tables inside tables. The syntax to get the value for a date/time into a local variable called timestamp is

local timestamp = valtbl[1].mainsnak.datavalue.value.time

Add that line and then return timestamp from the function. Add an end to finish the function. Save your module sandbox.

Timestamps are a fairly standard way of storing a date and time as a string. For Douglas Adams' date of birth, the timestamp should look something like this:

  • +1952-03-11T00:00:00Z

Can you work out how that represents a date and what the date is? (The time is set to zero, i.e. not set.)
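One possible shape for the finished step 1 function is sketched below. It is a minimal sketch, not the only correct answer: it assumes your module sandbox already begins with local p = {} and ends with return p, and it assumes the property has at least one value, so it does no error checking.

```lua
local p = {}  -- already present at the top of your module sandbox

-- Get date
function p.getdate(frame)
	local qid = "Q42"    -- Douglas Adams
	local prop = "P569"  -- date of birth
	-- A table of the best-ranked statements for this property
	local valtbl = mw.wikibase.getBestStatements(qid, prop)
	-- Assume there is one value; its timestamp is nested several tables deep
	local timestamp = valtbl[1].mainsnak.datavalue.value.time
	return timestamp
end

-- the module's final "return p" stays at the very end of the sandbox
```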

2. At the end of your user sandbox, add a new second level heading, "Fetching a date".

Write the line needed to call the getdate function from your module sandbox. Preview the result and correct any errors in either the module or user sandbox. When you are satisfied that you have the right value, save your user sandbox.

3. Look back at your answers to Task 5, in particular the pattern matching. Work out what pattern you would need to get the year, month and day as a string like "1952-03-11" from the timestamp. We call that an "ISO-style" date. Alternatively, you could use string.sub to get the year, month and day if you prefer. Decide on which method you will use, and then modify your function in the appropriate places so that it returns just the year, month, day instead of the whole timestamp. Save your module sandbox, then refresh your user sandbox and make sure it shows just the year, month and day. Fix any errors.
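Both approaches mentioned in step 3 can be tried on a literal timestamp, without fetching anything from Wikidata. The helper names isodate_match and isodate_sub are invented for this illustration, and the sketch assumes a timestamp with a leading sign and a four-digit year, like the example above.

```lua
local timestamp = "+1952-03-11T00:00:00Z"

-- Pattern matching: capture four digits, two digits and two digits,
-- separated by literal hyphens (a hyphen is escaped as %- in a Lua pattern)
local function isodate_match(ts)
	local year, month, day = string.match(ts, "(%d%d%d%d)%-(%d%d)%-(%d%d)")
	return year .. "-" .. month .. "-" .. day
end

-- string.sub: characters 2 to 11 skip the leading "+" and stop before the "T"
local function isodate_sub(ts)
	return string.sub(ts, 2, 11)
end
```

The pattern version also gives you the year, month and day as separate variables, which becomes useful when you reformat the date later in this task.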

4. You can generalise your function so that it can read any date from any Wikidata item.

Modify your function so that it reads parameters called |qid= and |prop= from the frame, instead of hard-coding them as "Q42" and "P569" in the function. Save your module sandbox. Then modify your user sandbox call (which will now show an error) so that you pass |qid=Q42 and |prop=P569 in the #invoke. Make sure that works and save your user sandbox.

In your user sandbox, write two more test calls to show (a) the date of birth (P569) of Richard Burton (Q151973), and (b) the date of death (P570) of Elizabeth Taylor (Q34851). Save your user sandbox.
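For step 4, reading the parameters from the frame might look like the sketch below. It assumes the call passes both |qid= and |prop=, that the property holds a date with at least one value, and it uses the string.sub approach to trim the timestamp; a pattern would work equally well.

```lua
local p = {}  -- already present at the top of your module sandbox

function p.getdate(frame)
	-- Named parameters from the #invoke arrive in frame.args
	local qid = frame.args.qid    -- e.g. "Q42"
	local prop = frame.args.prop  -- e.g. "P569"
	local valtbl = mw.wikibase.getBestStatements(qid, prop)
	local timestamp = valtbl[1].mainsnak.datavalue.value.time
	-- Keep only the year, month and day
	return string.sub(timestamp, 2, 11)
end
```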

5. We often want dates in a more user-readable format, such as "11 March 1952", and Lua's table handling and pattern matching make this easy to achieve.

In your module sandbox, create a new function called p.getfulldate, which is a copy of the p.getdate function. You will now modify the p.getfulldate function and not change p.getdate any further.

6. Add this line near the start of the p.getfulldate function:

local monthname = { "January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December" }

You should be able to see that monthname[4] has the value "April", for example. Save your module sandbox.

7. Use either pattern matching or the string.sub function to extract the local variables year, month, and day. Remember that the month variable contains a string: convert it to a number and use that number to index monthname, so that you have the name of the month instead of its number.

8. Concatenate the day, monthname and year with spaces between them, and return that. Save your module sandbox.

9. Add three new tests to your user sandbox, calling getfulldate to show the full date of birth of Douglas Adams and Richard Burton, and the full date of death of Elizabeth Taylor. Save your user sandbox.
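The conversion described in steps 6 to 8 can be sketched on its own, separately from the Wikidata call. The helper name fulldate is invented for this illustration; in p.getfulldate the same lines would operate on the timestamp you fetched. The sketch assumes the date is precise to the day: a timestamp whose month is "00" would have no entry in monthname.

```lua
local monthname = { "January", "February", "March", "April", "May", "June",
	"July", "August", "September", "October", "November", "December" }

-- Hypothetical helper: turn a timestamp into a date like "11 March 1952"
local function fulldate(timestamp)
	local year, month, day = string.match(timestamp, "(%d%d%d%d)%-(%d%d)%-(%d%d)")
	-- month is a string such as "03"; tonumber gives 3, which indexes monthname
	local name = monthname[tonumber(month)]
	-- tonumber also drops a leading zero from the day ("05" becomes 5)
	return tonumber(day) .. " " .. name .. " " .. year
end
```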

Fetching another item

The format for values that are wikibase-item datatype is different from that for dates. We can see the structure of any property in any item using {{examine}}. For example, {{examine|Q42|P1196}} shows the manner of death (P1196) for Douglas Adams (Q42). This is the result:

table#1 {
    table#2 {
        ["id"] = "Q42$2CF6704F-527F-46F7-9A89-41FC0C9DF492",
        ["mainsnak"] = table#3 {
            ["datatype"] = "wikibase-item",
            ["datavalue"] = table#4 {
                ["type"] = "wikibase-entityid",
                ["value"] = table#5 {
                    ["entity-type"] = "item",
                    ["id"] = "Q3739104",
                    ["numeric-id"] = 3739104,
                },
            },
            ["property"] = "P1196",
            ["snaktype"] = "value",
        },
        ["rank"] = "normal",
        ["references"] = table#6 {
            table#7 {
                ["hash"] = "792c357be1391569a970da13099242a6ad44af96",
                ["snaks"] = table#8 {
                    ["P854"] = table#9 {
                        table#10 {
                            ["datatype"] = "url",
                            ["datavalue"] = table#11 {
                                ["type"] = "string",
                                ["value"] = "https://web.archive.org/web/20111010233102/http://www.laweekly.com/2001-05-24/news/lots-of-screamingly-funny-sentences-no-fish/",
                            },
                            ["property"] = "P854",
                            ["snaktype"] = "value",
                        },
                    },
                },
                ["snaks-order"] = table#12 {
                    "P854",
                },
            },
        },
        ["type"] = "statement",
    },
}

If you look at the entry Douglas Adams (Q42) in Wikidata, you will see that the value is "natural causes", which is linked to natural causes (Q3739104). As you can see above, the value stored for the property is the ID for the "natural causes" item, not the words themselves. This allows users to see the value in their own language and script.

10. In your module sandbox, create a new function called p.getitem. The function needs to read the parameters |qid= and |prop= from the frame into the local variables qid and prop. You can reuse much of the code from function p.getdate in your new function. Use the same mw.wikibase.getBestStatements(qid, prop) call as before to get the table of values for the property.

Assume there is one value returned. Work out from the structure provided by {{examine}} above what you need to write to get the id from the property values. Hint: it will be similar to the previous syntax, but will end in .id instead of .time. Add that line to your function so that you now have the id of the linked item.

Set your function to return the id and end your function. Save your module sandbox.
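A minimal sketch of step 10 follows, assuming the property has at least one value and that the value is of the wikibase-item datatype. The path to the id follows the nesting shown in the {{examine}} output above.

```lua
local p = {}  -- already present at the top of your module sandbox

function p.getitem(frame)
	local qid = frame.args.qid    -- e.g. "Q42"
	local prop = frame.args.prop  -- e.g. "P1196"
	local valtbl = mw.wikibase.getBestStatements(qid, prop)
	-- statement -> mainsnak -> datavalue -> value -> id
	local id = valtbl[1].mainsnak.datavalue.value.id
	return id
end
```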

11. At the end of your user sandbox, add a new second level heading, "Fetching an item".

Write a call to your getitem function that returns the id for the manner of death (P1196) of Douglas Adams (Q42). Preview it to make sure you have "Q3739104". When it is working, save your user sandbox.

12. The ID is useful for machines, but not for humans. You can convert an ID into words by using the mw.wikibase.getLabel() function as described in mw:Extension:Wikibase Client/Lua#mw.wikibase.getLabel.

In your module sandbox, modify your p.getitem function so that it uses the id (that you were returning) in a mw.wikibase.getLabel() call and assign that to a new local variable. Then return that variable from your function instead of the id. Save your module sandbox.
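With the label lookup added, p.getitem might look like the sketch below. As before, it assumes one wikibase-item value is present and does no error checking; the variable name label is just a choice made for this example.

```lua
local p = {}  -- already present at the top of your module sandbox

function p.getitem(frame)
	local qid = frame.args.qid
	local prop = frame.args.prop
	local valtbl = mw.wikibase.getBestStatements(qid, prop)
	local id = valtbl[1].mainsnak.datavalue.value.id
	-- Look up the human-readable label for the linked item
	local label = mw.wikibase.getLabel(id)
	return label
end
```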

13. Refresh your user sandbox to make sure you now get the label "natural causes" instead of "Q3739104". Fix any errors.

14. At the end of your user sandbox, add another test call that shows the name of Richard Burton's wife. You'll need Richard Burton (Q151973) and spouse (P26). Save your user sandbox.