Commons talk:Structured data/Modeling/Author
These earlier notes and resources may be inspiring:
- Notes from first data modelling discussion at Wikimania 2019
- Notes from second data modelling discussion at Wikimania 2019
- Properties table
- Interesting Wikimedia Commons files collected because of their structured data modelling challenges
Multichill (talk) 16:44, 7 September 2019 (UTC)
Authorship notes from Wikimania 2019
Copy here for future reference
Author (P50) can be used if the author has a Wikidata item. If the author does not have a Wikidata item, there are a couple of other possibilities to identify the author:
- Author name string (P2093) can be used if the author doesn't have a Wikidata item
- Wikimedia username (P4174) can be used if the author has a Wikimedia username --> COMMENT: May be best indicated using "author" = "somevalue" with qualifier "Wikimedia username" = Username .
- Conventions need to be established on how to designate anonymous and unknown authors
- Should we use the Anonymous (Q4233718) and/or Unknown (Q24238356) items? --> COMMENT: There is quite developed practise on Wikidata now for paintings of unknown/anonymous/pseudonymous authorship
- Should there be a way to set unknown value from the Add statement interface? --> COMMENT: "Unknown" in the Wikidata / SDC UI should be renamed "somevalue", as per the underlying software, because this is what the value actually means.
Author properties
- If the author has an item, most relevant data about the author should be pulled automatically from the author item
- If the author doesn't have an item, we should create qualifiers under the Author name string or Wikimedia username value, e.g. Date of death (P570), Official website (P856), Flickr user ID (P3267), etc. --> COMMENT: For consistency, use "somevalue" with qualifier "stated as" rather than "author name string" ?
Numerous Wikidata properties are available for author IDs in various databases, but we should probably pull this information automatically from the Author item in most cases. The role of each author could be specified as a qualifier to the Author or Author name string value using the Subject has role (P2868) property. For example, photographer, painter, architect, sculptor, etc. A new property is probably needed for Author attribution (which accepts a string). This will likely go under the licensing data, however, rather than the authorship data. There are only 13 attribution templates per https://commons.wikimedia.org/wiki/Category:Attribution_templates
- Creator templates : Probably should all be managed through Wikidata Author items, but we need a way to check if all the data from the templates is on Wikidata. Eventually, these will be replaced entirely
- --> COMMENT: Conversely, display extended Creator style information in a field on the file description page, if we have an author statement with a Q-item. Ultimately this approach could provide all the visible functionality of a creator template, without needing any creator templates
Uploader to be treated separately from authorship.
End of copy Multichill (talk) 16:47, 7 September 2019 (UTC)
Getting authorship started
Author is one of the core fields in our current templates. This was discussed quite extensively during Wikimania (notes in previous section). With a file on Commons, multiple people might be involved with multiple roles: The painter of the original work, the photographer, the person who uploaded it, etc. We discussed two possible approaches:
- Have a property for each role
- Use one generic property and a bunch of qualifiers
Consensus seemed to be towards the second approach. Some remarks:
- In the notes, talk is about author (P50). That property is actually only for written works, so we should use the more generic creator (P170). The creator property already has extensive usage on Wikidata for works of art
- In the notes, subject has role (P2868) is mentioned, but that should in fact be object of statement has role (P3831)
As mentioned above, we'll have a lot of cases where the person doesn't have a Wikidata item. We don't have to create items for all these persons, in fact I'm very much against that. We can just use "somevalue". If you look at this edit and what messages are used, you'll notice that MediaWiki:wikibase-snakview-snaktypeselector-somevalue is used. We should probably update that from "unknown value " to something like "some value without a Wikidata item" or "no Wikidata item available".
So we get the possible qualifiers when creator (P170) is set to "somevalue":
- object of statement has role (P3831) when we have multiple persons in different roles. We probably need to make a list of common roles and how to use them.
- author name string (P2093) to put in the name of the author. For users here that can be the username or the real name
- Wikimedia username (P4174) to link it to the username here
- URL (P2699) (or new property "Author url") to link to the author in a consistent way. That can be a link to a userpage here or some external website like Flickr. We could also just use
- Other person identifiers, like official website (P856) and Flickr user ID (P3267). Imho these shouldn't replace the new "author url" property because that offers a consistent way to construct links for re-use
- Any other properties that would be acceptable on persons on Wikidata. This is probably more rare.
So as an example, one of my painting uploads:
- creator (P170) -> some value (message we need to update)
- object of statement has role (P3831) -> photographer (Q33231)
- author name string (P2093) -> "Multichill" (I prefer to not have my real name in here)
- Wikimedia username (P4174) -> "Multichill"
- URL (P2699) -> https://commons.wikimedia.org/wiki/User:Multichill
What do you think? Multichill (talk) 17:38, 7 September 2019 (UTC)
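The example above can be sketched as Wikibase-style statement JSON. This is only a minimal illustration, assuming the standard snak layout (a "somevalue" main snak plus "value" qualifier snaks); the helper names `item_snak`, `string_snak` and `creator_without_item` are made up for this sketch, and nothing here writes to Commons.

```python
import json


def item_snak(prop, qid):
    """Build a qualifier snak whose value is a Wikidata item."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": qid},
        },
    }


def string_snak(prop, text):
    """Build a qualifier snak holding a plain string (also used for IDs and URLs)."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {"type": "string", "value": text},
    }


def creator_without_item(role_qid, name, username, url):
    """Hypothetical helper: creator (P170) = somevalue, described by qualifiers."""
    qualifiers = [
        item_snak("P3831", role_qid),    # object of statement has role
        string_snak("P2093", name),      # author name string
        string_snak("P4174", username),  # Wikimedia username
        string_snak("P2699", url),       # URL
    ]
    return {
        "type": "statement",
        "rank": "normal",
        # "somevalue": the work has a creator, but no item is given for it
        "mainsnak": {"snaktype": "somevalue", "property": "P170"},
        "qualifiers": {q["property"]: [q] for q in qualifiers},
        "qualifiers-order": [q["property"] for q in qualifiers],
    }


statement = creator_without_item(
    "Q33231",  # photographer
    "Multichill",
    "Multichill",
    "https://commons.wikimedia.org/wiki/User:Multichill",
)
print(json.dumps(statement, indent=2))
```

All qualifiers are optional in this shape, so a shorter statement with only the name string would use the same structure.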
I am also in favor of a more generic property, but what I see here is 4 qualifiers. People won't be adding this by hand. So the question is whether we can create it by bot. If so, I am in favor of that; if not, let's keep it as simple as possible and use "author" and "author (text)" only.
- I don't know how you would indicate the roles of those who do not have an item on Wikidata, such as the person who digitized the work, or the person who made a crop, which is in some cases indicated in the author field of the description template. Juandev (talk) 15:13, 13 September 2019 (UTC)
- See the example where I am the author and I don't have a Wikidata item. Multichill (talk) 09:26, 14 September 2019 (UTC)
- Well, object of statement has role (P3831) has an item data type. So if somebody just made a crop, how exactly would you indicate it? Juandev (talk) 06:30, 27 September 2019 (UTC)
- Find a suitable role on Wikidata or create one? Probably something like editor (Q1607826). Multichill (talk) 17:58, 1 October 2019 (UTC)
Alternative modelling
I think this method has too many qualifiers. Here is what I would prefer:
creator (P170) if the creator is notable for Wikidata and has an item
author name string (P2093) if creator is not notable for Wikidata
and a new property "Creator (Wikimedia-user)" with a Wikimedia-user-datatype if the creator has a Wikimedia account and is not notable
In addition, there should be an "Uploading user" property with Wikimedia-user-datatype.
Changes on files should have the same attributes with qualifiers. --GPSLeo (talk) 16:05, 13 September 2019 (UTC)
- That only covers a small subset of cases. Take for example Flickr, that wouldn't be covered by this.
- I agree that quite a few qualifiers are possible. From a data model point of view that's no problem, from an interface point of view some time should be spent on making it easier and convenient to enter this. Multichill (talk) 08:59, 14 September 2019 (UTC)
- I think these things could be done as source. The way I would prefer would be to create a new Wikibase instance with items for every creator with a file or file change on Commons. There we could link the user account here, a Flickr account or just the Wikidata item of the author. --GPSLeo (talk) 09:15, 14 September 2019 (UTC)
- I was afraid you were going to say that. That's a firm no. We're not going to create an item on Wikidata for every person who ever uploaded a file here or on Flickr. Those items are very much out of scope on Wikidata. Has been discussed before. Trying to go down that road is just a waste of effort and you'll hit a dead end. Multichill (talk) 09:24, 14 September 2019 (UTC)
- Well, if filling in all the lines would be optional, why not. Those who would like to add as much information as possible can, and those who would like to shorten it could shorten it. Juandev (talk) 05:53, 17 September 2019 (UTC)
- I do not want Wikidata-Items for every Commons creator. I want a separate database just for this, like the user-namespace. But not only for creators with a Wikimedia account. --GPSLeo (talk) 16:52, 22 September 2019 (UTC)
- And why do we need a database for Commons contributors? Juandev (talk) 06:41, 27 September 2019 (UTC)
- So that we have one place to link the Commons user account, Flickr, website, name, e-mail address and all the important information that is in templates now and would otherwise be needed at every image as separate statements. This way only one statement is needed to link all that information. --GPSLeo (talk) 08:06, 27 September 2019 (UTC)
Adding some authors
This page is not really getting a lot of useful feedback. Other people might be busy with something else or just haven't noticed. I'm adding some authorship information to some images now. This might encourage more people to give feedback. Multichill (talk) 18:01, 1 October 2019 (UTC)
- Be careful. The API seems to work like on Wikidata with full functionality but the GUI does not. The property does not get displayed if it contains non-item values. --GPSLeo (talk) 18:25, 1 October 2019 (UTC)
Datatype for Commons photographers
Moved from Commons talk:Structured data
A question that keeps coming up on Wikidata is: which property Commons should use for photographers (e.g. some string property). I had thought that this was being addressed by the project with a new datatype, but it seems that it is still open.
A possible approach that might not have been checked yet, could be to create a datatype that links to Commons user pages.
You might know that, in addition to the datatypes for files (d:Help:Data_type#Commons_media), Wikibase/Wikidata has a few datatypes for other Commons namespaces. These are:
For Commons user namespace, a new one could be defined fairly easily, as it would probably re-use the code of the above. Jura1 (talk) 12:56, 13 September 2019 (UTC)
- Look at #Let's do some modeling!. I think we should discuss it all together in one place. At least we, who identify ourselves as Commonists. Juandev (talk) 15:21, 13 September 2019 (UTC)
- I think there was already some prior discussion/proposal(s) not referenced there. Jura1 (talk) 15:40, 13 September 2019 (UTC)
End of move. Multichill (talk) 08:50, 14 September 2019 (UTC)
- We sure had previous discussions about this. At least a couple of years ago at the first kick off of structured data. Now you're probably looking for phab:T127929. It's quite a long discussion and it doesn't look like we figured out how the data type should work exactly. So here we just picked it apart in different qualifiers to see if we can get that to work. Maybe later on based on our experience the new data type can be implemented and replace the current structure. Waiting for this new data type will just get us stuck on implementing authorship. Multichill (talk) 09:06, 14 September 2019 (UTC)
- I removed the reference to phab:T127929 as this proposal is somewhat different and likely simple to implement. Jura1 (talk) 10:28, 14 September 2019 (UTC)
- User pages and namespaces are probably to be discarded as the use of user namespace can vary a lot (no user pages, various redirects, etc.). Something with the account name, maybe with a compulsory qualifier "stated as" for the user name used/wanted. Christian Ferrer (talk) 13:12, 14 September 2019 (UTC)
- This is a tricky one. Originally, there was talk about implementing Wikibase "virtual statements" for things like user accounts, but that never took off. Personally, I'm in favor of something similar to how Wikidata handles P856 ("official website"). It uses regexes to look at the domains in the entered URL, and automatically shifts the input to an appropriate property if necessary (ex: twitter.com values get automatically converted to Twitter Username [P2002] statements, instagram.com values get turned into Instagram username [P2003] statements, etc.). Since Commons has media from Flickr, YouTube, and Commons itself, finding a way to easily support all of that in one place seems prudent. It's ultimately up to you guys though :) RIsler (WMF) (talk) 19:00, 23 September 2019 (UTC)
- @RIsler (WMF): Something like this? What would be the appropriate ID for Commons users? Jura1 (talk) 09:45, 26 September 2019 (UTC)
- That could work. I see two possible scenarios here: 1.) We just have one field for Author/Photographer URL and it's just a simple URL with no fancy formatting. 2.) We allow for multiple URLs/usernames to account for cases where someone could have a Flickr page, AND an official website, AND whatever else. This would probably work more like the conditional property setting features of P856 (official website), but maybe with those values as qualifiers of a top level value. Option 1 is certainly easiest to implement. Option 2 may give the most flexibility for the data but involves a lot more effort. RIsler (WMF) (talk) 19:43, 26 September 2019 (UTC)
- @RIsler (WMF): to me, option (2) looks much like having items. A simple approach could be to create these at Wikidata (what some don't want) .. an alternative could be entities in creator or some other namespace. I guess I will check back in a year or so to see what was finally chosen. Jura1 (talk) 12:51, 29 September 2019 (UTC)
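The URL-based routing discussed in this thread could look roughly like the sketch below. It is only an illustration: the patterns in `URL_RULES` are assumptions made up for this example, not the rules Wikidata actually uses, and `route_author_url` is a hypothetical name. Unmatched URLs fall back to URL (P2699).

```python
import re

# Illustrative (pattern, property ID) pairs; assumptions for this sketch only.
URL_RULES = [
    # Twitter username (P2002)
    (re.compile(r"^https?://(?:www\.)?twitter\.com/([A-Za-z0-9_]+)/?$"), "P2002"),
    # Instagram username (P2003)
    (re.compile(r"^https?://(?:www\.)?instagram\.com/([A-Za-z0-9_.]+)/?$"), "P2003"),
    # Commons user page, mapped to Wikimedia username (P4174)
    (re.compile(r"^https?://commons\.wikimedia\.org/wiki/User:([^/?#]+)$"), "P4174"),
]
FALLBACK_PROPERTY = "P2699"  # URL


def route_author_url(url):
    """Return (property_id, value) for an author URL entered by a user."""
    for pattern, prop in URL_RULES:
        match = pattern.match(url)
        if match:
            # Known site: store just the account name under its own property
            return prop, match.group(1)
    # Unknown site: keep the full URL under the generic property
    return FALLBACK_PROPERTY, url


print(route_author_url("https://commons.wikimedia.org/wiki/User:Multichill"))
print(route_author_url("https://example.org/photographer"))
```

With a table like this, supporting another source site would mean adding one more rule rather than another input field.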
Creators without items
So for creators without items, one would use URL (P2699) and author name string (P2093) and Wikimedia username (P4174) instead of a simple property? [1] (It's currently not visible anywhere but in the diff.) Jura1 (talk) 13:08, 17 November 2019 (UTC)
- I think this way is still ugly because with this it is very hard to query for images of a creator. We need a better way that contains the idea of structured data. I think the only way to do this properly would be to create a separate Wikibase instance for creator data. --GPSLeo (talk) 14:31, 17 November 2019 (UTC)
- I think we need a new property "author's Wikimedia username" and use that for all the authors like [[user:Example|Example]]. In case of authors in form [[user:Example|my name]], I would store "my name" in author name string (P2093). URL is not needed as https://commons.wikimedia.org/wiki/User:Example can be used. --Jarekt (talk) 21:04, 20 December 2019 (UTC)
- The developers are working on the task to get this working. Messing up the data model just because some user interface is not ready sounds stupid to me. Multichill (talk) 10:18, 21 December 2019 (UTC)
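The mapping from a wikitext user link to a username, an optional display name and a user page URL can be sketched as follows. This is a minimal illustration, not an existing tool: `parse_user_link` and the regular expression are assumptions for this example, and it only handles the plain [[user:...]] form.

```python
import re
from urllib.parse import quote

# Matches [[user:Name]] and [[user:Name|label]], case-insensitively
USER_LINK = re.compile(
    r"\[\[\s*user:([^|\]]+?)\s*(?:\|([^\]]*))?\]\]", re.IGNORECASE
)


def parse_user_link(wikitext):
    """Extract username, derived user page URL and optional display name."""
    match = USER_LINK.search(wikitext)
    if match is None:
        return None
    username = match.group(1).strip()
    label = (match.group(2) or "").strip()
    result = {
        "username": username,
        # The URL is derived from the username, so it need not be stored
        "url": "https://commons.wikimedia.org/wiki/User:"
        + quote(username.replace(" ", "_")),
    }
    if label and label != username:
        # Only a differing label would go into author name string (P2093)
        result["name"] = label
    return result


print(parse_user_link("[[user:Example|Example]]"))
print(parse_user_link("[[user:Example|my name]]"))
```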
Pure text
I am sorry, I am not so familiar with Wikidata. Can I normally query pure text values and use them as conditions, as I can for item-like values? Juandev (talk) 05:56, 17 September 2019 (UTC)
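As a rough illustration of using a text value as a condition: once statement JSON is loaded, a string qualifier can be compared like any other value. The sketch below filters in-memory data only; the file titles and the function names are made up for this example, and it assumes the creator (P170) plus author name string (P2093) qualifier model discussed on this page.

```python
def has_author_name(statement, name):
    """True if the statement carries an author name string (P2093) equal to name."""
    snaks = statement.get("qualifiers", {}).get("P2093", [])
    return any(s.get("datavalue", {}).get("value") == name for s in snaks)


def files_by_author_name(files, name):
    """Return titles of files with a creator (P170) statement naming this author."""
    return [
        title
        for title, statements in files.items()
        if any(has_author_name(s, name) for s in statements.get("P170", []))
    ]


def creator_statement(name):
    """Hypothetical sample data: creator = somevalue with a name string qualifier."""
    return {
        "mainsnak": {"snaktype": "somevalue", "property": "P170"},
        "qualifiers": {
            "P2093": [
                {
                    "snaktype": "value",
                    "property": "P2093",
                    "datavalue": {"type": "string", "value": name},
                }
            ]
        },
    }


# Made-up file titles, for illustration only
files = {
    "File:Example A.jpg": {"P170": [creator_statement("Multichill")]},
    "File:Example B.jpg": {"P170": [creator_statement("Someone Else")]},
}
print(files_by_author_name(files, "Multichill"))
```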
Proposal for author's wikimedia username property
@Multichill, Hannolans, Jura1, GPSLeo, Juandev, and Mike Peel: I created a proposal for an author's wikimedia username property at d:Wikidata:Property proposal/author's wikimedia username, to be a companion to author (P50) and author name string (P2093). With it we can probably handle the majority of cases on Commons, without some complicated qualifiers. We might need more, like author's URL perhaps, but let's leave that for the future. Please comment, vote or correct the proposal if needed. --Jarekt (talk) 04:30, 21 December 2019 (UTC)
- @Jarekt: I don't get it. We already solved that problem. We have three parts:
- The Wikimedia username. For that we use Wikimedia username (P4174)
- The name to display (for example the real name). For that we use author name string (P2093)
- The link to the preferred user page (this can be another wiki than Commons like for example the English Wikipedia). For that we use URL (P2699)
- Multichill (talk) 10:15, 21 December 2019 (UTC)
- Working with qualifiers should be minimized. And here the main statement creator (P170) "unknown value" is definitely wrong because that is not unknown. --GPSLeo (talk) 10:50, 21 December 2019 (UTC)
- Multichill, I am sorry, I thought we were still looking for some simple solution to store the user names, which will be used on around 30M pages. I totally agree with the use of author name string (P2093) for an alternative name to be displayed, and URL (P2699) for an alternative project, but I do not like them as qualifiers of creator (P170) or author (P50) with "unknown value". The upload wizard default is the name in the form [[user:username|username]] and unless someone changed it, this is what we have on most of the files with the {{Own}} template, so my thinking was to create as simple a way as possible to encode that info. A designated property is as simple as it gets and the majority of files will be able to encode it in a simple statement without a need for any qualifiers, and we can add P2699 and P2093 qualifiers to it if we want to alter it. My desire to minimize the use of qualifiers to encode info we use a lot is to minimize the number of varieties and errors. Bots do not care and can add 30M statements equally well in either form, but when humans get involved then the number of errors and varieties goes up, and adding 30M creator (P170) or author (P50) with "unknown value" and qualifiers just asks for trouble. --Jarekt (talk) 15:49, 21 December 2019 (UTC)
- It's not "unknown value", it's "some value" defined as "A PropertySomeValueSnak describes that an Entity has some value for a certain Property, without saying anything about this value.". That's the case here. A work has a creator, we just don't have a (known) Wikidata item for that creator.
- I don't share this desire for a dumbed down system next to the normal system. I rather have a clean model with a good user interface. The UploadWizard can add it any way we like. Multichill (talk) 16:02, 21 December 2019 (UTC)
Multichill, our two options are:
[Statement example: author]
versus
[Statement example: author's wikimedia username]
in the simple case, and in the most complicated case:
[Statement example: author]
versus
[Statement example: author's wikimedia username]
I do not understand why you feel that using author's wikimedia username would produce a "dumbed down system". And why would it be "next to the normal system"? We are going to add it to a LOT of files and once we do, it is going to be the only system. I share your desire for a "clean model with a good user interface" and I thought my proposal would result in a simpler, more intuitive model. It is possible I overlooked some advantages of the solution using existing properties, but I cannot think of any. --Jarekt (talk) 16:30, 21 December 2019 (UTC)
- I think this version should be more like:
[Statement example: author's wikimedia username]
[Statement example: author name string]
- That is all. Very simple for humans and machines. --GPSLeo (talk) 20:02, 21 December 2019 (UTC)
This only covers local uploads, it doesn't cover uploads from other sources like Geograph or Flickr. We end up with two systems. Multichill (talk) 13:09, 25 December 2019 (UTC)
- It does not have to. There is just no author's wikimedia username property for this file. That is the only difference. --GPSLeo (talk) 14:28, 25 December 2019 (UTC)
I suggest the following, with a constraint requiring a value for the qualifier P4174 when "Wikimedian" is used as a value for "author":
[Statement example: author]
Christian Ferrer (talk) 17:57, 25 December 2019 (UTC)
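The suggested constraint can be sketched as a small check over statement JSON. This is only an illustration of the rule as described above, not an existing Wikibase constraint type; the function name is hypothetical and the statement layout is assumed to follow the usual snak structure.

```python
WIKIMEDIAN = "Q41546637"      # Wikimedian
USERNAME_QUALIFIER = "P4174"  # Wikimedia username


def violates_username_constraint(statement):
    """True if the value is Wikimedian but no Wikimedia username qualifier is set."""
    main = statement.get("mainsnak", {})
    if main.get("snaktype") != "value":
        return False  # somevalue / novalue statements are not covered by this rule
    value = main.get("datavalue", {}).get("value")
    if not isinstance(value, dict) or value.get("id") != WIKIMEDIAN:
        return False  # some other author item; the rule does not apply
    # An absent or empty qualifier list counts as a violation
    return not statement.get("qualifiers", {}).get(USERNAME_QUALIFIER)


def author_is_wikimedian(qualifiers):
    """Build sample data: author = Wikimedian with the given qualifiers."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": "P50",
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {"entity-type": "item", "id": WIKIMEDIAN},
            },
        },
        "qualifiers": qualifiers,
    }


username_snak = {
    "snaktype": "value",
    "property": USERNAME_QUALIFIER,
    "datavalue": {"type": "string", "value": "Example"},
}
print(violates_username_constraint(author_is_wikimedian({})))
print(violates_username_constraint(author_is_wikimedian({USERNAME_QUALIFIER: [username_snak]})))
```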
- I think that Christian Ferrer's model is better than the original proposal, and it can be extended to flickr, and other large communities (we would have to use different qualifier). --Jarekt (talk) 00:40, 3 January 2020 (UTC)
- I still do not understand why you absolutely want to use author (P50). Everything that is there as a qualifier could be a separate statement. --GPSLeo (talk) 11:47, 3 January 2020 (UTC)
- GPSLeo, my first preference was a new author's wikimedia username property, but nobody else shared my enthusiasm for it, so I withdrew. My second preference is the model proposed by Christian Ferrer. It is a bit more tricky to add properly by hand, and I can see that if we add it to 30M files there will be a lot of ways they can get messed up. The approach with some value seems to me the most confusing to users not familiar with Wikidata and SDC, which might be most users on Commons. --Jarekt (talk) 14:43, 3 January 2020 (UTC)
- Because of this I would say do not use author (P50) in any case where the author does not have a Wikidata item. What I would like much more would be to create a third database for all non-notable creators. --GPSLeo (talk) 15:05, 3 January 2020 (UTC)
- Please do not use author (P50) here, that property is scoped only to written works, use creator (P170). And use a property like Wikimedian (Q41546637) is a very bad plan. We have "somevalue" for that. Multichill (talk) 18:17, 3 January 2020 (UTC)
- Note that in fact I have no specific preference, I just added the example above because it was missing... I don't understand "use a property like Wikimedian (Q41546637) is a very bad plan" as this is a bit short. Why is it a bad plan? Why is "somevalue" better? Will this work be done by bot? Will this work one day be done at the same time as the upload? Note also that there is this kind of possibility:
[Statement example: attribution]
that can work as well for other cases:
[Statement example: attribution]
Christian Ferrer (talk) 20:36, 3 January 2020 (UTC)
stated as vs author name string
Is it better to use "stated as" as a qualifier of "creator"? See d:Wikidata:Property proposal/creator name string.--GZWDer (talk) 19:58, 3 January 2020 (UTC)