Wikidata:WikiProject British Politicians/Building Queries
WikiProject British Politicians
|
This page will work through some simple examples to demonstrate the model used for UK MPs, and how to write queries to make best use of them, introducing a few specialised features that can be particularly useful. It is intended to complement the more general introduction at Wikidata:SPARQL query service/Wikidata Query Help and the associated tutorials, and the more elaborate set of examples at Wikidata:WikiProject British Politicians/Sample Queries.
Each section builds on the data presented in the previous one, to build up more complex features in (hopefully) a reasonably sensible order.
The first section covers the basic structure of the SPARQL triples, then an example of how we can use this to find people holding a position, and the model we use for British MPs.
The second section looks at qualifiers, demonstrating how you can use this to find current MPs, or ones in a particular party, introducing some new syntax features (OPTIONAL, FILTER NOT EXISTS, UNION).
The third section adds labels and shows how to select different data elements.
The fourth section brings in other data from the linked items (eg images, coordinates) and introduces GROUP BY, SAMPLE filters and the + property modifier.
The fifth section demonstrates how to use the data for older MPs, either in a single Parliament or over a wider timeframe. It introduces queries for different statements on the same property (?ps, ?ps2), MIN grouping, and numeric FILTER clauses.
Lastly, the sixth section discusses how the data model changes before 1832, and demonstrates queries to find all Members over time, members in History of Parliament, and how to use the richer data available in some periods. It introduces the * property modifier and how to bind dates to years, as well as how to filter on the properties of linked terms rather than statements.
1. Basic model and SPARQL syntax
[edit]First, let's look at a simple fragment of SPARQL. This shows the subject / property / value approached used in queries for a single "triple".
?person wdt:P31 wd:Q5 item property value #1 #2 #3
This says we want to 1) select an item with the variable ?person, which has a property statement using 2) property instance of (P31), with 3) value human (Q5). The "wdt" prefix indicates we want the best available value of a property, and the "wd" indicates the value should be a specified Wikidata item.
There is also a more complicated syntax which we can use where there are likely to be several different values for a property on an item:
?person p:P39 ?ps . ?ps ps:P39 wd:Q77685926 .
This matches a variable ?person which has a property statement using position held (P39), and then selects the specific statement using the p: prefix, to give us the statement as variable ?ps. Then it adds a second clause, to say the statement should have value (ps: prefix) member of the 59th Parliament of the United Kingdom (Q126063447) - a member of the current (2024) Parliament.
We can then link the two together to get a combined fragment:
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps . ?ps ps:P39 wd:Q126063447 .
This will give us values which are defined as people, and who are members of the current Parliament. In theory we might not need the first section - but it is good practice to include it, as Wikidata models a small number of fictional MPs!
The main reason to use this slightly more complicated "position statement" syntax, rather than the simple wdt:, is to allow us to access the statement's qualifiers. These are values attached to the main statement to improve or extend it in some way.
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps . ?ps ps:P39 wd:Q126063447 . ?ps pq:P580 ?start .
This goes into the position statement ?ps to find the start time qualifier ?start (note the pq: prefix).
You will note that we have been looking at the specific period member of the 59th Parliament of the United Kingdom (Q126063447). The data model used for the UK is constructed on a "per-Parliament" basis, where each position refers to the specific Parliamentary term. We can select a range of these by using a more detailed query:
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps . ?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
This finds all MPs who hold a position (?ps), where the value (?term) of that position is subclass of (P279) of Member of Parliament (Q16707842) - eg a member of the 55th Parliament, the 37th Parliament, etc.
Now that we have constructed our basic query syntax, we can look at putting it into practice. For our "current Parliament" query, this would be:
SELECT DISTINCT ?person WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps . ?ps ps:P39 wd:Q126063447 .
}
The SELECT DISTINCT ?person line says that we want to SELECT all values of ?person - ie the items representing MPs - and that we would like it to be de-duplicated (DISTINCT) should any items for whatever reason appear twice. Then WHERE { ... } encloses the query itself.
You will note that we are getting more results than the expected 650 - this is because it does not know about time, so is returning retired/deceased members as well as currently active ones.
To get the whole lot using our "all Parliaments" query:
SELECT DISTINCT ?person WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
}
This finds all MPs who hold a position (?ps), where the value of that position is any valid subclass of Member of Parliament (Q16707842) - eg a member of the 55th Parliament, the 37th Parliament, etc., right back to the members of the 1st Parliament in 1801.
Note that "UK MP" stops at the Act of Union in 1801. Prior to this we use "British MP" (1707-1801), and "English MP" (up to 1707). The modelling is a bit more complicated and variable in quality for MPs before 1832, and we will discuss that later on. For now, we can be content with two centuries!
2. Adding qualifiers and filtering to find current MPs
[edit]We saw earlier that modern MP statements have a wide range of qualifiers, which can be selected with the pq: syntax. These help refine details about a term and let us go beyond the very simple "was an MP".
SELECT DISTINCT ?person WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
?ps pq:P580 ?start . ?ps pq:P4100 ?party . ?ps pq:P768 ?seat .
?ps pq:P2715 ?election . ?ps pq:P582 ?end . ?ps pq:P1534 ?cause .
}
This is the full set of "normal qualifiers" for a term - start/end dates, party, seat represented, election that began the period, and end-cause to end it. However, asking for all qualifiers *requires* that they be present - which of course will cause problems for any current MP, as their terms have not ended!
SELECT DISTINCT ?person WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
?ps pq:P580 ?start . ?ps pq:P4100 ?party . ?ps pq:P768 ?seat .
OPTIONAL { ?ps pq:P2715 ?election } .
OPTIONAL { ?ps pq:P582 ?end } .
OPTIONAL { ?ps pq:P1534 ?cause } .
}
This gives us a little more flexibility. It means that each of the last three qualifiers are flexible - returned if known, blank if not. The use of three distinct OPTIONAL elements means the three are independent of each other; if they are in the same clause then they would only show up if all three exist.
NB: start dates and seats are complete back to 1832, parties back to about 1923 - so for older data you may need to make party OPTIONAL. Election data also requires careful handling - while it seems intuitive that all terms begin at an election, in practice some will reflect mid-term party changes and so do not have this value, so again OPTIONAL is useful.
If you want to explicitly only look for sitting MPs, you can forbid a variable:
SELECT DISTINCT ?person WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
?ps pq:P580 ?start . ?ps pq:P4100 ?party . ?ps pq:P768 ?seat .
FILTER NOT EXISTS { ?ps pq:P582 ?end } .
}
This will remove all terms which have an end date - and thus only return ones which have not ended, which means those that are current.
Some older historic data does not have end dates - but they also don't usually have start dates, so neatly don't show up here! In principle we could avoid the question of historic data by just asking for the current parliament (?ps ps:P39 wd:Q126063447) but this approach means it will also work after the next election - so we are future-proofing. Both approaches are valid.
To find all sitting MPs in a specific party, you would set a specific value for ?party :
SELECT DISTINCT ?person WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
?ps pq:P580 ?start . ?ps pq:P768 ?seat .
?ps pq:P4100 wd:Q9626 .
FILTER NOT EXISTS { ?ps pq:P582 ?end } .
}
This only finds statements where the party value is Conservative Party (Q9626). For Labour members, we currently index Labour Party (Q9630) and Labour Co-operative (Q6467393) seperately, which means you would need to select either-or party using a UNION clause:
SELECT DISTINCT ?person WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
?ps pq:P580 ?start . ?ps pq:P768 ?seat .
{ ?ps pq:P4100 wd:Q9630 } UNION { ?ps pq:P4100 wd:Q6467393 }
FILTER NOT EXISTS { ?ps pq:P582 ?end } .
}
This will find statements that match either party value, and do not have an end date.
3. Returning more data and labels
[edit]We now have a query that will return a list of specified MPs. However, it is not particularly helpful - it is just a single column of IDs with no further information. We can see the qualifier data by adding these variables to the SELECT line:
SELECT DISTINCT ?person ?seat ?party WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
?ps pq:P580 ?start . ?ps pq:P4100 ?party . ?ps pq:P768 ?seat .
FILTER NOT EXISTS { ?ps pq:P582 ?end } .
}
This is all sitting MPs, with their seat and party. But they are still Q-IDs and so not very informative. To get something human-readable we would add ?xxLabel values to the SELECT line:
SELECT DISTINCT ?person ?personLabel ?seat ?seatLabel ?party ?partyLabel WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
?ps pq:P580 ?start . ?ps pq:P4100 ?party . ?ps pq:P768 ?seat .
FILTER NOT EXISTS { ?ps pq:P582 ?end } .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
We are now accessing the "labels", which is the human-readable names of items. You can change the language (eg "fr" to get French labels) or add a series of languages as fallback ("hi,en" will give Hindi if known, then English).
You do not need to give the item to get the labels (eg you can SELECT ?seatLabel without ?seat):
SELECT DISTINCT ?person ?personLabel ?seatLabel ?partyLabel WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
?ps pq:P580 ?start . ?ps pq:P4100 ?party . ?ps pq:P768 ?seat .
FILTER NOT EXISTS { ?ps pq:P582 ?end } .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
4. Linking in other data
[edit]We can also access other data from these items. For example, an image. (It is "optional" here as not all items will have it)
SELECT DISTINCT ?person ?personLabel ?seat ?seatLabel ?party ?partyLabel ?image WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
?ps pq:P580 ?start . ?ps pq:P4100 ?party . ?ps pq:P768 ?seat .
FILTER NOT EXISTS { ?ps pq:P582 ?end } .
OPTIONAL { ?person wdt:P18 ?image } .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Counterintuitively, this may return more results than before - some items have two images, so they show up twice. DISTINCT prevents exact duplicate lines, but will allow two lines with different data. Ideally, this should not happen very often - for properties like the image, one should be set to "preferred" (see Help:Ranking) and thus only one value will be returned with wdt:.
If you do find these duplicate lines, one way around it is to sample the offending variables to pick a single value, like so:
SELECT DISTINCT ?person ?personLabel ?seat ?seatLabel ?party ?partyLabel (SAMPLE(?image) as ?imagesample) WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
?ps pq:P580 ?start . ?ps pq:P4100 ?party . ?ps pq:P768 ?seat .
FILTER NOT EXISTS { ?ps pq:P582 ?end } .
OPTIONAL { ?person wdt:P18 ?image } .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} group by ?person ?personLabel ?seat ?seatLabel ?party ?partyLabel
Here the "group by" at the end is all the variables you want to leave unchanged, and the SAMPLE command tells it to only select one value from ?image. You'll see that the number of results is now exactly what you expect.
(Depending on what you are doing with the data, it may also be possible to just accept the duplicate lines...)
We can also start using the variables we have identified to obtain data from elsewhere in the graph. For example, we might want to find the GSS code (2011) (P836) and coordinate location (P625) for the constituencies to help link into another dataset.
SELECT DISTINCT ?person ?personLabel ?partyLabel ?seatLabel ?coordinates ?gss_code WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
?ps pq:P580 ?start . ?ps pq:P4100 ?party . ?ps pq:P768 ?seat .
FILTER NOT EXISTS { ?ps pq:P582 ?end } .
?seat wdt:P625 ?coordinates .
?seat wdt:P836 ?gss_code .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Or we could ask what country a constituency is in.
SELECT DISTINCT ?person ?personLabel ?partyLabel ?seatLabel ?countryLabel WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
?ps pq:P580 ?start . ?ps pq:P4100 ?party . ?ps pq:P768 ?seat .
FILTER NOT EXISTS { ?ps pq:P582 ?end } .
?seat wdt:P131+ ?country . ?country wdt:P31 wd:Q3336843 .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
The wdt:P131+ syntax says "find the value for located in the administrative territorial entity (P131), then the value that item has for located in the administrative territorial entity (P131), and so on up the chain". The filter on ?country then constrains it to only match those items in the chain that are identified as constituent country of the United Kingdom (Q3336843). This is a much more "expensive"/inefficient query (it takes ~45s to run) as it is having to investigate a long chain of values from each seat.
5. Looking at older MPs
[edit]So far we have been concentrating on current MPs. But we can try to find all MPs in a broader period. A simple approach would be to look at the current Parliament.
SELECT DISTINCT ?person ?personLabel ?seatLabel ?partyLabel ?start ?electionLabel ?end ?causeLabel WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps . ?ps ps:P39 wd:Q126063447 .
?ps pq:P580 ?start . ?ps pq:P4100 ?party . ?ps pq:P768 ?seat .
OPTIONAL { ?ps pq:P2715 ?election } .
OPTIONAL { ?ps pq:P582 ?end } .
OPTIONAL { ?ps pq:P1534 ?cause } .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
This finds all MPs who served in the current Parliament, and their terms in that Parliament. Note that a small number are "completed" terms - sometimes because the MP has left the House, sometimes because they have changed party and so have two terms, one for the old party and one for the new.
We do not have to restrict ourselves to a single Parliament, however.
SELECT DISTINCT ?person ?personLabel ?seatLabel ?partyLabel ?start ?electionLabel ?end ?causeLabel WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
?ps pq:P580 ?start . ?ps pq:P4100 ?party . ?ps pq:P768 ?seat .
OPTIONAL { ?ps pq:P2715 ?election } .
OPTIONAL { ?ps pq:P582 ?end } .
OPTIONAL { ?ps pq:P1534 ?cause } .
FILTER(?start >= "1997-05-01T00:00:00Z"^^xsd:dateTime) .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
This combines the "any term" and a date filter to look for only MPs who have terms which began on or after 1 May 1997, the date of the 1997 general election. (We date all terms as beginning from the day of election; before 1918, where we had multi-day elections, usually this is the first day of the election period.)
We could link two sets of queries together to find MPs who were serving before 1997, and are still in Parliament today.
SELECT DISTINCT ?person ?personLabel (MIN(?start2) as ?first) ?seatLabel ?partyLabel ?start WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 .
?ps pq:P580 ?start . ?ps pq:P4100 ?party . ?ps pq:P768 ?seat .
FILTER NOT EXISTS { ?ps pq:P582 ?end } .
?person p:P39 ?ps2 . ?ps2 ps:P39 ?term2 . ?term2 wdt:P279 wd:Q16707842 .
?ps2 pq:P580 ?start2 .
FILTER(?start2 < "1997-05-01T00:00:00Z"^^xsd:dateTime) .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} group by ?person ?personLabel ?seatLabel ?partyLabel ?start
This approach seems at first glance to be running two independent queries (?ps and ?ps2), but this is intentional. We use multiple variables because the value for ?ps/?term that matches "current term" will of course not be the same as ?ps2/?term2, "term before 1997". But because both are linked to ?person, it will only return records that match on both aspects.
We use "<" rather than "<=" because we want terms which existed before 1 May 1997; using <= would include anyone first elected then.
Finally, the use of (MIN(?start2) as ?first) is a way to find out when they first entered parliament. ?start2 is the start date on the "earlier" terms, the before-1997 ones. MIN(?start2) selects the lowest value of ?start2 for each person, and then binds it to a new variable name. Note that when using MIN (or any other aggregator, like SAMPLE above) you need to group by all elements which are not being aggregated - hence the last line.
6. Going back before 1832...
[edit]As we noted earlier, the data model changes over time. Broadly speaking, parties are complete back to 1923, and term dates and seats are complete back to 1820, but before that things are less comprehensive. A lot of 1801-20 MPs do not have a full set of qualifiers, although all are present and have the relevant term memberships.
Prior to 1801, MPs are modelled as holding a position which is a subclass of Member of Parliament of Great Britain (Q18015642) (1707-1800) or Member of Parliament in the Parliament of England (Q18018860) (to 1707), eg Member of the 1393 Parliament (Q60576692) or member of the 14th Parliament of Great Britain (Q96776434). In some cases where data is unclear, they may hold the generic GB/English position directly rather than a per-term item, but this is being gradually phased out.
This query will obtain all of these MPs: note the use of the P279* format, where the asterisk (similar to the plus sign above) means "zero or more", and effectively says that we would like to find things that are a subclass of the value, or the value itself. It is needed for GB and English terms, but not post-1800 ones.
SELECT DISTINCT ?person WHERE
{
{ ?person wdt:P31 wd:Q5 . ?person p:P39 ?ps . ?ps ps:P39 ?term . ?term wdt:P279 wd:Q16707842 }
UNION
{ ?person wdt:P31 wd:Q5 . ?person p:P39 ?ps . ?ps ps:P39 ?term . ?term wdt:P279* wd:Q18015642 }
UNION
{ ?person wdt:P31 wd:Q5 . ?person p:P39 ?ps . ?ps ps:P39 ?term . ?term wdt:P279* wd:Q18018860 }
}
The use of {#1} UNION {#2} UNION {#3} here means that the ?person value has to match #1 OR #2 OR #3, but does not have to match all three. Each section is run independently and the results are combined.
Before 1800, not all items necessarily have complete sets of per-term holdings (the eighteenth century in particular is patchy) and those positions are frequently missing dates, constituencies, etc. However, there is at the very least an item for each MP who is found in the History of Parliament volumes, which are themselves linked with History of Parliament ID (P1614); these cover 1386-1421, 1509-1629, and 1660-1832. Before the mid sixteenth century, the History of Parliament records themselves are incomplete - the names of many MPs are simply lost to history.
This query will obtain all people found in the History of Parliament records, along with their corresponding entries. Note that many people have two or even three records, as they exist in multiple volumes.
SELECT DISTINCT ?person ?hop WHERE
{
?person wdt:P31 wd:Q5 . ?person wdt:P1614 ?hop .
}
Some periods have better coverage. For 1386-1421, and 1509-1629, the records for MPs list their specific terms with a qualifier for the associated seat (where known - occasionally this is a mystery). However, they do not (mostly) have individual start/end dates, as these are often not known with any precision, and Wikidata is very idiosyncratic with how it handles Julian-era dates, so day-precision data can be misleading. Dates can, however, be inferred from the dates on the term itself.
So to find all MPs who served in one of the two Parliaments of 1390 (things moved faster in those days...):
SELECT DISTINCT ?person ?personLabel ?termLabel ?seatLabel WHERE
{
?person wdt:P31 wd:Q5 . ?person p:P39 ?ps .
?ps ps:P39 ?term . ?term wdt:P279 wd:Q18018860 .
optional { ?ps pq:P768 ?seat }
?term wdt:P571 ?start . BIND(YEAR(?start) as ?startyear) .
FILTER(STR(?startyear) = '1390')
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
There are three new elements here. Firstly, terms use inception (P571) and dissolved, abolished or demolished date (P576) rather than the start time (P580)/end time (P582) - Wikidata makes a rather subtle distinction here between "thing coming into existence" and "period of time starting". Secondly, the YEAR(?start) syntax takes the date and breaks it down to extract only the year element, and BIND() sets the result as a value. Finally, the use of STR(?startyear) is to convert the year from being a date to being a string, which we can then compare to the string '1390'. This approach of year-based filtering is often a little easier to work with than day-precision, and also neatly avoids any problems with converting between Julian and proleptic Gregorian calendars.