Wikidata:Project chat: Difference between revisions
Peter James (talk | contribs) →Missing a label and ridiculous area unit to sort out: seems to be the infobox, not Wikidata; a more specific infobox can be used |
::::Why split off British English? It is largely the same as English. "English" here does not mean "Welsh English" or the English preferred by Welsh-speakers, but the common term in English overall, not just the locally correct one. {{tq|I'm not aware of an Indonesian spelling for Wrexham, so just applied Welsh spelling}}: if you're not aware, then don't change it; leave it to actual Indonesians if they get to it. You do not seem to actually agree with Sionk, who is referring to English overall, not the new spin-off of "British English" where you demoted the "anglicised name". English Wikipedia is unfortunately the most copied, because most other languages are more likely to refer to English sources than to Welsh ones, so you're clearly trying to right the great wrong of place-names and promote Welsh place-names as much as you can. Individually, each language may start to use Welsh names at some point, but you have applied them to all languages in a blanket fashion. Both spellings are used, English and Welsh; the most common should take precedence in that language. If sources in that language use the Welsh one more, then yes, they should use that, but you provided no evidence, and it is likely they (unfortunately) use the English spelling because it is so much more widely used, given how many more people speak English. [[User:DankJae|DankJae]] ([[User talk:DankJae|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 14:07, 20 November 2023 (UTC)
:::::Here are two Indonesian sources using "Wrexham", not "Wrecsam". [https://www.cnnindonesia.com/olahraga/20230423063733-142-941071/klub-ryan-reynolds-wrexham-promosi-ke-league-two] [https://www.cnbcindonesia.com/lifestyle/20230423134742-33-431870/klub-bola-milik-ryan-reynolds-wrexham-lolos-ke-league-two] [[User:DankJae|DankJae]] ([[User talk:DankJae|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 14:10, 20 November 2023 (UTC)
::::::So taking [[Q1335466|Bala Lake]] as an example, perhaps someone could clarify for me the purpose of the 21 entries of language labels, where half of them just default to the Welsh word with no evidence. It seems to me, but I may be missing something, that if a language doesn't have a word used in that language for the lake, that there should not be an entry at all. Why tell me that the Dutch for Bala Lake is Llyn Tegid, when that is clearly not a Dutch word, and will not be pronounced correctly in Dutch? Because it is not just the "Ll" that is a problem there. In Dutch, Tegid is not going to be pronounced correctly unless a Dutch person applies ''English'' orthography rules (which they are much more likely to know than Welsh). In Dutch, that g is a different sound completely. |
::::::And do the Dutch have a word for Bala lake? Nope. Here's a source (an older one) that just calls it Bala-meer.[https://books.google.co.uk/books?id=l6eD6x8HJuIC&pg=PA73&dq=%22bala-meer+is+het+grootste+meer+in+Wales%22&hl=en&sa=X&ved=2ahUKEwjpmqDUzNOCAxVUUEEAHVPaD3YQ6AF6BAgHEAI#v=onepage&q=%22bala-meer%20is%20het%20grootste%20meer%20in%20Wales%22&f=false] which is Bala Lake, using the English word. Here is a newer interesting one: [https://www.goodbye.be/blog/wandelen-in-wales-de-drie-mooiste-routes]. It is interesting because it has "Llyn Tegid or in English, Bala lake." But that is not a vote for Llyn Tegid. The piece is telling you the ''Welsh'' and ''English'' names, but then says "Wandel het Bala meer rond" (walk around Bala lake), so there is no Dutch word. Incidentally the Dutch page on the lake is translated from German. So to my original question: shouldn't the Dutch word be blank? It doesn't exist. If it does exist, it is commonly Bala-meer, but that is just a borrow word. I don't see the benefit in specifying ''anything'' here. What am I missing? [[User:Sirfurboy|Sirfurboy]] ([[User talk:Sirfurboy|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 14:01, 26 November 2023 (UTC) |
== Emojis on Wiktionary == |
Revision as of 14:01, 26 November 2023
Wikidata project chat A place to discuss any and all aspects of Wikidata: the project itself, policy and proposals, individual data items, technical issues, etc.
On this page, old discussions are archived after 7 days. An overview of all archives can be found at this page's archive index. The current archive is located at 2024/11. |
Concept of bot edits
There is a problem I would like to ask the community about. The description will be long, I will ask the specific questions at the end.
Vojtěch Dostál imported data on hundreds of thousands of individuals from the Czech National Library's NKC database this summer. This is a big and important project. Although the data was incomplete and sometimes wrong, I think I was not the only one who was basically happy with the project, followed the import, and corrected and completed the items.
Another editor, Frettie, with the help of his bot (Frettiebot), started to add more data to the items: occupations, birthplaces, languages spoken, etc. This also meant forcing in a lot of problematic data. I say forcing because, as things stand, if someone corrects or deletes an erroneous value, Frettiebot adds the same value again; if it is deleted again, the bot adds it again, and this repeats in an endless cycle. Unfortunately, communication with Frettie is at a very low level, despite being told repeatedly and repeatedly that what he is doing is a problem, he neglects requests and usually gives a condescending answer: if you don't like the data added by the bot, change it to "deprecated". His edits have led to edit wars: between editors and Frettiebot on the one hand, and with other bots on the other (the latter has led to two bots being blocked from a page).
Let's look at the problem with occupation data: obviously all NKC-identified persons have at least one occupation, but it is common to have two or three statements for the P106 property. For an import of hundreds of thousands of persons, this means hundreds of thousands of statements. If only ten per cent of these are incorrect or redundant, that is still in the order of tens of thousands; if one per cent, in the order of thousands. The fundamental problem is that for data imports of this magnitude, it is the wrong methodology to build a project around correcting data 'manually' over and over again. The right thing to do would be for the bot not to overwrite data that has already been corrected.
Not a problem for me, but I note that if a source database gives this much erroneous data, the reason for deprecated rank (P2241) added to the "deprecated" flag will eventually include source known to be unreliable (Q22979588), which in turn casts doubt on the entire Czech National Library database. But I don't think the source is that unreliable; it's just a bad concept of data distribution, and the bot operator doesn't hear the problem signal.
The conceptual question is: where do we import from, and how much do we build on the source data? The Czech National Library's database of persons is not a biographical database, just as other library catalogues are not. The intention of the database creators was simply to be able to distinguish between identical forms of names in some way. Therefore they rarely, if ever, include detailed biographical data: they do not include exact dates and places of birth or death (perhaps only years), exact occupations or education, and they obviously cannot be used as archontological data. For example, Hrvatski biografski leksikon ID (P8581) or Vienna History Wiki ID (P7842) point to biographical databases, but NKC, VIAF and OSZK are not biographical databases. Data imported from the latter should be treated with a certain degree of caution, rather than forcibly rewritten to the items over and over again. Here, however, it seems that despite all the feedback requesting corrections, the NKC data are treated by the bot operator as if they were dead certain.
The countless incorrect or unnecessary statements added in this way will only turn Wikidata into a swamp. Why, for example, do you need five or ten occupations, three or five of which should be set to obsolete because they are either wrong or simply add nothing extra? Let's see what the typical mistakes are:
For example, if a person's occupation is Lutheran pastor (Q96236305), but is recorded in the NKC as priest (Q42603), parson (Q955464) or pastor (Q152002), Frettiebot will add it to the existing Lutheran pastor (Q96236305), sometimes all of them. If someone is known to have been born in the 18th arrondissement of Paris, but the NKC only records his place of birth as Paris, Frettiebot comes and adds Paris to the item, even though it already records that he was born in the 18th arrondissement of Paris. If an item represents not one person but several (e.g. a duo, twins, a married couple), then certain properties belong not on that item but on the items for the individual people (those with P31 Q5). One such property is P1412, which should be added to each person rather than to the group, but Frettiebot ignores this distinction.
These are just a few examples, obviously I have not brought this to the attention of the community just because of three problems, but because there are countless - in my opinion conceptually flawed, unnecessary - bot editing practices.
The specific question is: is it correct for a bot to repeatedly enter the same data into an item if that data is incorrect, redundant or out of place? Is it correct to extract specific biographical data not from a biographical database but from a non-specialised catalogue? Is it right to put the burden of correction so much on the users when it could be done by the bot operator?
Of course, I'm also waiting for Frettie's reply, because, although asked, he never described what justifies the bot having to re-enter redundant or incorrect data over and over again, i.e. why is it better for the user to set it to obsolete, rather than the bot changing the data entry? Pallor (talk) 20:18, 21 October 2023 (UTC)
- Bot is not fortune-teller, bot cannot know what has been deleted by other user. That's the main problem. It could find this from the history, but it would make the script run longer (a lot). Moreover, I personally think that deprecated values are better left, because we are able to detect that it is wrong in the source data and possibly have it fixed. Which is sometime done, the cooperation between WM CR and NK CR is mutual. By the way, I'm a man. --Frettie (talk) 21:40, 21 October 2023 (UTC)
- I would have thought a bot should only be used if its edits are correct. If it adds substandard information, for example, then please no. Automated undoing of corrections? Um, Maculosae tegmine lyncis (talk) 21:47, 21 October 2023 (UTC)
- Given you know there are problems with your data sources, can't you record your inputs for each tranche and only apply the differences each run? It seems very bad practice to reapply the complete dataset knowing that it will recreate errors. And deprecated values are for "often thought to be true, but actually not", not to act as a database of problems in your data sources. To create that, you should run a report from WD and compare it with your sources outside the WD system. Vicarage (talk) 22:05, 21 October 2023 (UTC)
- @Vicarage I don't think your definition of deprecated rank is correct. It's also used to mark sourced, but incorrect statements (i.e. information that was never correct, but was at some point thought to be). Vojtěch Dostál (talk) 18:14, 22 October 2023 (UTC)
- Yes, but I don't think WD should be used as a staging area for fixing other people's data, as @Frettie merely hints they might do, particularly in this case when the approach is irritating others. Vicarage (talk) 18:59, 22 October 2023 (UTC)
- Why shouldn't the bot be suspended until at least it stops edit warring? Assuming "Unfortunately, communication with Frettie is at a very low level, despite being told repeatedly and repeatedly that what he is doing is a problem, he neglects requests and usually gives a condescending answer: if you don't like the data added by the bot, change it to "deprecated". His edits have led to edit wars: between editors and Frettiebot on the one hand, and with other bots on the other." is accurate, are you unwilling or unable to resolve the problems, starting with stopping it from edit warring? I think Vicarage is right. RudolfoMD (talk) 03:52, 23 October 2023 (UTC)
- @Frettie? RudolfoMD (talk) 18:13, 23 October 2023 (UTC)
- Iam disagree with: "Unfortunately, communication with Frettie is at a very low level". It's fact, stopping of edit warring is by leave "mistake" with deprecate status. It is correct way. --Frettie (talk) 19:16, 23 October 2023 (UTC)
- @Frettie, I see you continue to refuse to explain why is it better for the user to set it to obsolete, rather than the bot changing the data entry. This is unacceptable: if a person's occupation is Lutheran pastor (Q96236305), but is recorded in the NKC as priest (Q42603), parson (Q955464) or pastor (Q152002), Frettiebot will add it to the existing Lutheran pastor (Q96236305), sometimes all of them. Draceane is right. As you refuse to fix the bot, it should be blocked. It would be bad if the bot added them with the flag obsolete, but at least that would make leaving the bot running defensible. Adding them as it's doing is indefensible. RudolfoMD (talk) 19:29, 23 October 2023 (UTC)
- I see that Frettiebot is still being run while complaints about its use are being discussed here. This is inexcusable. Vicarage (talk) 19:42, 23 October 2023 (UTC)
- If is people part of Lutherian pastor and priest, so it is ok, because, he is pastor AND priest, no Pastor OR priest, it's my point of view. So, if bot would be fixed – how? What is best practice? Do you have some ideas? If some value is imported and later removed, bot dont have this information. Bot can save pairs "QID" + "PROPERTY" + "VALUE" from all runs and if this is again ready to save, bot does not save this. It can be possible, but it will be slower. @Vojtěch Dostál: – what do you think? Adds new values only once. --Frettie (talk) 06:36, 24 October 2023 (UTC)
- @Pallor From my point of view, Wikidata is a database aggregator. We collect data (with a bot) and then we sometimes curate them (usually by manually setting ranks). That's how I understand Wikidata's general approach. P.S. I note that your examples with Lutheran pastor (Q96236305) and Paris (Q90) aren't in fact examples of incorrect data, am I right? Vojtěch Dostál (talk) 18:18, 22 October 2023 (UTC)
- Vojtěch Dostál yes, we collect data, but we are lucky that we are human beings, not machines, we can make decisions that machines cannot. We also operate the machines and we can tell them what to do and what not to do. With all this in mind, the aim obviously cannot be to put all the variations of all occupations, or all the occurrences of a settlement, on a data sheet and increase the noise to infinity, because that would turn the Wikidata database into a swamp. We can make good decisions and bad ones. The evangelical pastor, Paris, and all the other examples not listed here show that it is possible to pour data into Wikidata that makes a piece of data - which was previously precisely defined - redundant or ambiguous. I can give a particularly bad example, when a graphic artist/photographer's album of historical sites was written in the descriptive data that the author was a historian, but your vitalapod also had a case of incorrect data. All my examples support the point that you should not spread data like this, you should give users a chance to correct what the source does not know well, you should not force the issue of putting up incorrect and redundant data at all costs. Pallor (talk) 18:42, 22 October 2023 (UTC)
- You are obviously right about the importance of humans for Wikidata and I understand that. But I have a hard time understanding how the presence of "less precise" professions turns Wikidata into a swamp. How is the "profession:priest" statement preventing you from querying all of Wikidata for all Lutheran pastors? I see how it would be a problem in Wikipedia, but isn't it a purely aesthetic problem for Wikidata? And on the contrary, if the source for "Lutheran pastor" is later deemed incorrect and the corresponding statement deprecated, because the person actually was a priest but not Lutheran, we still have a rough idea about his profession from the less precise statement... Vojtěch Dostál (talk) 18:57, 22 October 2023 (UTC)
- I feel this is still our (my and Vojtěch's) ongoing dispute over data representation. IMO WD should be not only machine readable, but also human readable. For you it's just aesthetics, for many others this is the matter of usability. — Draceane talkcontrib. 14:47, 23 October 2023 (UTC)
- Yes, I have the same feeling about this discussion :) It's about the desire of a part of the Wikidata community to turn it into a second Wikipedia :-). Vojtěch Dostál (talk) 06:52, 24 October 2023 (UTC)
- WD is a curated database, yes we might reduce the workload by using bots for mass import, but if a human decides the information is wrong, I think they should remove it. There are clearly techniques in the AI world where a learning machine can absorb vast quantities of machine scraped information and do probabilistic assessment of which facts are most likely to be correct, but using them here would overwhelm the GUI we have, and I agree with @Pallor we'd have a swamp. Vicarage (talk) 19:07, 22 October 2023 (UTC)
- We agree that we have to record certain data even if it is not true: this could be, for example, a historical error or a poorly drawn conclusion that is widespread, and we help to refute it by indicating it. But we usually do this on the basis of reliable sources and thus help to refute incorrect/erroneous data. But here the source itself is not perfect either, since, as I explained above, we do not take the data from a biographical database, but from a library catalog. The aim of the librarian was not to position the person between the denominations, but to distinguish him from another person of the same name, perhaps born in the same year, and for this it was sufficient to describe a more general, schematic occupation. It's like the system of tags and descriptions in Wikidata: you don't have to be extremely precise there either, but when you fill in the P106 field, you're obviously trying to create the most accurate model of reality; you're not forced to rough out the description. If someone is a high school teacher, we don't have to state that he is an educator, an instructor, AND a high school teacher. The last one is enough; there is no need to add the other two, especially not if our source is not completely reliable in this regard. Pallor (talk) 19:33, 22 October 2023 (UTC)
- I personally don't think that this is a majority view. I would be surprised if the community here really thinks that we should remove incorrect sourced statements rather than deprecating them. Can we somehow determine what the consensus really is? Let's write it down somewhere afterwards, because I feel I already had this discussion somewhere. Vojtěch Dostál (talk) 19:15, 22 October 2023 (UTC)
- A bot is a machine. If some type of wrong edit is made very often, it is good to add an exception to the bot.
- But it would be good to have a universal solution, not only one for this case. What about a bot that deprecates statements which are one level above some other statement? When there is e.g. genre=adventure film (Q319221), the statement genre=film (Q11424) would be marked as deprecated. The same for occupation, place of birth, category combines topics, etc. JAn Dudík (talk) 07:44, 23 October 2023 (UTC)
- That bot job would be against Wikidata rules. True statements should never be deprecated. Vojtěch Dostál (talk) 14:12, 23 October 2023 (UTC)
- I don't generally agree. By applying this rule literally, we could add to all items instance of (P31) entity (Q35120), to all people place of birth (P19) Earth (Q2). Yeah, it's true, but um... If you added all superclasses of the statements, you would just made WikiSwamp, incomprehensible for humans. — Draceane talkcontrib. 14:47, 23 October 2023 (UTC)
- @Draceane That would be absurd, but I don't see a relevant source that collects all people born on Earth as opposed to people born on other planets :). Vojtěch Dostál (talk) 06:45, 24 October 2023 (UTC)
- That's exactly Draceane's point. The examples he gives are true statements, yet as absurd as the additions the bot owner is being asked to stop making, and you are saying should not be deprecated merely because they are true. Your argument makes no sense. It seems like Frettie is trying, hard, to not understand, but AGF makes me assume it's a language barrier. (For clarity, I'm referring to the notion that "It is difficult to get a man to understand something, when his ego depends upon his not understanding it!")
- Are the edits the bot is making so valuable as to outweigh the problems it's causing? I suggested an admin suspend the bot. RudolfoMD (talk) 00:45, 25 October 2023 (UTC)
- It is sadly not Frettie who does not understand. Actually, I think other people find it hard to understand elementary rules of Wikidata: 1) Wrong sourced claims should not be removed but deprecated and 2) Preferred claims are marked with ranks, not by removing less precise yet true claims. These rules are essential to the way Wikidata operates and cause no significant problems at all to reuse of Wikidata, but it is sometimes difficult for Wikipedians to get a grasp of them. Vojtěch Dostál (talk) 14:19, 25 October 2023 (UTC)
- After reading all this I feel a strong urge to express my agreement with Pallor and Vicarage. Not because I have new points to add in favor of their opinion, but as a counterpoint to Vojtěch Dostál’s claim that their point of view marks a misunderstanding of Wikidata's principles. I think this comes close to assaulting them and like-minded Wikidata users like me on a personal level. In my opinion, this discussion is too important to be bogged down like this. Let's try and keep the exchange productive and respectful, please.
- On the point of Frettie's alleged "not understanding": My argument applies here too (mutatis mutandis). But I must confess I have a hard time understanding what you are trying to say, Frettie, because of your English phrasing. Maybe the same is true for others? Jonathan Groß (talk) 16:38, 25 October 2023 (UTC)
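Frettie's suggestion above of saving "QID" + "PROPERTY" + "VALUE" pairs from all runs, and Vicarage's suggestion of applying only the differences each run, could be sketched roughly as follows. This is only an illustration in Python, not Frettiebot's actual code: the class name `ImportLedger`, the JSON file used for storage and the method names are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Iterable, List, Tuple

# One attempted statement: (item QID, property ID, value QID).
Triple = Tuple[str, str, str]


class ImportLedger:
    """Hypothetical record of every triple the bot has already tried to add.

    A triple that a human later removed or corrected stays in the ledger,
    so the bot never proposes it a second time.
    """

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.seen = set()
        if self.path.exists():
            stored = json.loads(self.path.read_text(encoding="utf-8"))
            self.seen = {tuple(triple) for triple in stored}

    def filter_new(self, candidates: Iterable[Triple]) -> List[Triple]:
        """Return only triples never attempted before, and remember them."""
        new = []
        for triple in candidates:
            triple = tuple(triple)
            if triple not in self.seen:
                self.seen.add(triple)
                new.append(triple)
        return new

    def save(self) -> None:
        """Persist the ledger so that the next run can consult it."""
        self.path.write_text(json.dumps(sorted(self.seen)), encoding="utf-8")
```

With something like this, each run would pass its full candidate list through `filter_new` and write only what comes back. The lookup is a set membership test, so the extra cost per statement should be small compared with checking the edit history of every item.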
Discussion after bot suspension
A day ago, following our request on the administrators' noticeboard (Wikidata:Administrators'_noticeboard#Suspend_a_bot;_remove_incorrect_admin_claims?), Frettiebot was suspended until this discussion is closed. I'd like to lay down some basics (although I've already mentioned some of them).
- The transfer of data from the NKC database to Wikidata is fundamentally good, so it benefits Wikidata.
- Frettiebot has some useful edits.
- The goal is not to make a rule that says a bot cannot fix or override a person's edit (see e.g. the {{Autofix}} template, which I think is useful).
- At the same time, we also don't want a bot to fill up Wikidata with unnecessary and/or wrong data in a way that cannot be overwritten.
If others agree with this point 4, then we respectfully ask Frettie to improve the operation of the bot, upload all data from NKC only once, and accept when this data is corrected or deleted. I am pinging a few people who have participated in the debate or have previously made a request to Frettie in a similar matter to write down if they can support point 4. Of course, VD and Frettie can also ping people who have previously commented on the question anywhere.
@Maculosae tegmine lyncis, Vicarage, RudolfoMD, Draceane, Jonathan Groß, GrandEscogriffe: @Emu, Canley, U. M. Owen, Andrew Gray, RAN, Jackie Bensberg: @Polarlys, Vanbasten 23: (I apologize to those who are no longer interested in the topic, but still had to come here) Pallor (talk) 23:00, 27 October 2023 (UTC)
- Support for point 4, although even if this were to become consensus (which it should), the assessment of what is "wrong data" will always be a point of contention. In any case, thank you for this clear and constructive comment. Jonathan Groß (talk) 05:40, 28 October 2023 (UTC)
- Oppose I find myself perfectly in accordance with Vojtěch's vision of what is Wikidata. We should aggregate first, sort (and not delete) later. I think Frettiebot is doing an important job of providing references to P106 that are too often not referenced, making them basically worthless. Frankly, I'd even wish other bots would do the same with LC or GND. Now sure, as it was said, NKC is only a library catalog, therefore it might not be the best source available, nevertheless it is a legitimate source. I think the real problem here isn't much the bot's edits but rather how do we model competing or hierarchical values for P106? The bot is only exposing the problem, but it would have come sooner or later. --Jahl de Vautban (talk) 06:38, 28 October 2023 (UTC)
- Yes, you have explained the problem very well. If a person is a footballer it makes no sense to also add that he is an "athlete" or "sport people", because we would be filling Wikidata with useless data. Many Wikipedias use this data for their templates and what we are achieving is that these files are full of professions that do not inform readers of anything; on the contrary, they confuse them more. These users see it as normal and there is no room for much discussion. --Vanbasten 23 (talk) 07:54, 28 October 2023 (UTC)
- Support importing bots should only attempt to add data once. @Frettie allows his bot to do this multiple times, while not engaging, or even pausing his bot after multiple complaints. The huge differential in human time in setting a bot in action, and reviewing and flagging the results means anyone running a bot needs to be cautious, and what we have here is reckless behaviour. Vicarage (talk) 08:40, 28 October 2023 (UTC)
- "importing bots should only attempt to add data once" is in practice super complicated and IMO not feasible in most situations. Use ranks to indicate which claims should be visible (and which ones not) to end users; and ask the bot operator not to import already existing values regardless of their ranks. —MisterSynergy (talk) 09:05, 28 October 2023 (UTC)
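A minimal sketch of the check suggested in the comment above (do not import a value that already exists, whatever its rank). The `Statement` type and `should_import` function are hypothetical names for illustration, not part of any real bot framework, and the string values stand in for item IDs. The sketch covers only statements that are still present on the item (for example as deprecated); it cannot see statements that were removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    prop: str   # property ID, e.g. "P106"
    value: str  # stand-in for an item ID
    rank: str   # "preferred", "normal" or "deprecated"


def should_import(existing: list[Statement], prop: str, value: str) -> bool:
    """Return False if the item already holds this property-value pair at any rank."""
    return not any(s.prop == prop and s.value == value for s in existing)


# Hypothetical item: "athlete" was imported earlier and then deprecated by a human.
existing = [
    Statement("P106", "footballer", "normal"),
    Statement("P106", "athlete", "deprecated"),
]

print(should_import(existing, "P106", "athlete"))    # False: present, even if deprecated
print(should_import(existing, "P106", "historian"))  # True: not on the item yet
```

The point of ignoring the rank in the comparison is that a human's decision to deprecate a value then survives later import runs.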
- Since I have been pinged: I probably don’t understand all nuances but it seems to boil down to “do import unless an issue is raised with an edit or a type of edit, in this case resolve manually”. That’s the general idea with mass edits anyway, so yeah, no reason to act differently in this case. On a more general note: Vojtěch Dostál is right, no notes from me on that issue. --Emu (talk) 09:10, 28 October 2023 (UTC)
- Support for point 4 of course. Also I agree with Emu that there are two different issues. First a general good practice of bot programming that bots should always accept human corrections, and never get into edit wars. This should be consensual. Second the more fundamental question of which kind of data should appear in Wikidata, which I am surprised has not been resolved earlier in the history of the project. There I am in the Pallor/Vicarage/Draceane/RudolfoMD/Jonathan Groß camp. I think that some statements are both true, sourceable, and useless because they are superseded by a more precise true statement, and that such statements should not appear in Wikidata at all. Yes Vojtěch Dostál my opinion is informed by Wikipedia, but what is the problem with that? Isn't supplying the Wikipedias the main original mission of Wikidata? Are there use cases of Wikidata where it is in fact useful to have large lists of redundant imprecise statements? --GrandEscogriffe (talk) 11:10, 28 October 2023 (UTC)
- Sure but that doesn’t mean that we have to answer to the whims of infobox programmers from other projects, to put it bluntly. I often find it quite helpful (when researching and/or disambiguating) to have many statements of varying precision and even accuracy. This gives me a fuller picture of what is generally known about a person – whether true or not, whether precise or not. It also sometimes helps to trace how inaccuracies over time evolved into falsehoods. This is different from Wikipedia where we generally only strive for the best available version of the received opinion of the truth. --Emu (talk) 11:41, 28 October 2023 (UTC)
- Can you give an example? GrandEscogriffe (talk) 12:18, 28 October 2023 (UTC)
- I don’t have a good example for occupation (P106) at hand (and most of those cases would be hard to explain since there is often an element of language dependency and I mostly work with German sources) but in the past (in a very, very similar discussion) I have mentioned Q94694204#P569 as an example in that direction. --Emu (talk) 22:18, 28 October 2023 (UTC)
- @Emu: I agree with you and Epìdosis below that keeping incorrect sourced statements as deprecated is useful. My problem is mostly with redundant (and therefore correct) statements. In your example, I do not see what can be the use of the correct, redundant statement date of birth (P569) 1831 — unlike the deprecated 1841s which inform users not to add 1841 at normal rank. Every user (human or bot) who is tempted to add the imprecise 1831 should already "see" that 1831-09-09 is present. So the imprecise 1831 does not play the safeguarding role that deprecated common falsehoods do.
- Also, this example has only one best-ranked value (as it should) so it does not clutter the external users*. A big problem with Frettiebot is that it put everything at the normal rank. I would be much less bothered if it upgraded the already existing more precise statement to preferred rank every time it adds a less precise statement. Although even then I would not really see the point.
- *Of these users, I am familiar with Wikipedia, but I guess other external users also rely on the rank system, and I am really curious who these other users are. Perhaps Wikidata should not be at the whims of infobox programmers specifically, but it should make/keep itself useful to the people who use it. GrandEscogriffe (talk) 21:25, 3 November 2023 (UTC)
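The "less precise but consistent" relation discussed here (1831 versus 1831-09-09) can be sketched as follows. This is only an illustration under stated assumptions: dates are modelled as plain tuples `(year,)`, `(year, month)` or `(year, month, day)`, and `refines` and `most_precise` are hypothetical helper names, not how any existing bot does it.

```python
def refines(fine: tuple, coarse: tuple) -> bool:
    """True if `fine` is strictly more precise than `coarse` and agrees with it."""
    return len(fine) > len(coarse) and fine[: len(coarse)] == coarse


def most_precise(values: list[tuple]):
    """Return the one value that refines all others, or None if a human must decide."""
    if not values:
        return None
    best = max(values, key=len)
    if all(v == best or refines(best, v) for v in values):
        return best
    return None


# 1831 is consistent with 1831-09-09, so the precise date can get the best rank.
print(most_precise([(1831,), (1831, 9, 9)]))   # (1831, 9, 9)

# A conflicting year means the values differ in more than precision.
print(most_precise([(1831,), (1831, 9, 9), (1841,)]))  # None
```

Returning `None` for conflicting values keeps the automatic step limited to cases where the values differ in precision only.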
- To take the example of Lina Wasserburger (Q94694204): The probably correct precise value is sourced with user-generated content and a primary source. The statement with year precision however has a secondary source, so do the other deprecated statements. In theory, you could also query statements that are sourced by Österreichische Schriftstellerinnen 1880–1938 (Q104601081) against our best guess, thereby estimating the accuracy (and precision) of a given source, which to me is quite an interesting use case. And finally: Precise values can be deleted for all sorts of legitimate reasons – resulting in missing statements instead of other sourced statements with lower precision.
- Don’t we have a bot job that periodically sets a preferred rank in those cases? Of course it would be ideal if Frettie took care but I imagine it’s not that simple. --Emu (talk) 21:53, 3 November 2023 (UTC)
- Oppose to point 4 (with one precisation at my point 4 below); first of all, I very much agree with @Jahl de Vautban: in the comment above: aggregating data from authoritative sources (among which national authority files are surely to be counted) and then ranking the statements; the phrase "The bot is only exposing the problem" (of managing competing or hierarchical values for P106) perfectly summarizes the situation (BTW, since these topics are clearly of general interest, I think they would deserve a RfC, in order to involve more users; the Project chat has tens of messages each day and is very difficult to follow). However, since I understand the concerns motivating users who have expressed critics on some aspects of the activity of Frettiebot, I would like to try to address these concerns proposing a few solutions alternative to the necessity of changing the present activity of the bot (points 1 and 2); I add a small comment about edit wars (point 3); finally, I would like to propose myself one change in the bot activity which, as far as I see, wasn't mentioned above (point 4). I apologize in advance because I will write a lot, but I think the importance of these themes deserves a detailed analysis.
- "is it correct for a bot to repeatedly enter the same data into an element if that data is incorrect, redundant or out of place?" (the initial question by @Pallor:): I think these three categories need to be considered separately (and, as it appears both from comments above, and from my personal experience, the most frequent problem is redundant data, so I will dedicate to this part more space):
- incorrect data can be entered repeatedly by a bot, if supported by an authoritative source (as I said above, IMHO national authority files are authoritative sources), for two reasons: 1) as a principle, "Wikidata simply provides information according to specific sources; those sources may or may not reflect contemporary thought or scientific consensus" (quotation from Help:Ranking); 2) technically, ""importing bots should only attempt to add data once" is in practice super complicated and IMO not feasible in most situations" (I'm not a bot operator, but I trust @MisterSynergy:, who is a bot operator, so I'm quoting his comment above). Given this premise, in order to avoid incorrect data being received by Wikipedia and other data reusers (a legitimate concern, which I obviously share), these incorrect data need to be set to deprecated rank (as stated by Help:Ranking#Deprecated rank), with qualifier reason for deprecated rank (P2241)error in referenced source or sources (Q29998666) (or typographical error (Q734832), useful in some specific cases). Of course keeping incorrect data as deprecated clutters the items, worsening their readability for humans (which is a legitimate concern, although I think it's rare to see more than 1 or 2 incorrect deprecated statements in the same item): this can be addressed in at least two ways, the first being collapsing not-best-ranked-values (see below point 2) and the second being data round-tripping, which I treat here at point 1.1.1.
- Data round-tripping (Wikidata:Data round-tripping) is crucial for Wikidata data quality because, if some authoritative database outside Wikidata contains mistakes, these mistakes risk to damage Wikidata in many ways as long as they exist (the most problematic way is e.g. a deprecated incorrect statement deriving from one import is removed on Wikidata, maybe from a user in good faith just judging it useless, and then another import readds it with normal rank, reintroducing the mistake in full power; the less problematic way, nevertheless problematic, is that deprecated incorrect statements clutter items); ideally we should have a workflow implying that a) when we notice that statement X, supported by an entry Z of the authoritative database Y, is incorrect, we are able to report this mistake to database Y; b) database Y reads our reports and solves them on a regular basis; c) once entry Z is fixed, we can remove statement X (I think that, once the supporting source is fixed, removing the statements has more advantages than keeping it as deprecated), ideally the removal should be performed by the curators of database Y at the same time as they fix entry Z. This workflow should be improved (see e.g. phab:T312718); the more efficient this workflow is, the less time incorrect statements remain on Wikidata. Of course improving this workflow is a task for Wikidata community and not for bot operators; however, if a bot operator has a longstanding collaboration with the curators of the database which they periodically import to Wikidata, they could encourage the curators of the database to improve this workflow (and to remove from Wikidata incorrect statements sourced by their entries, once they have fixed these entries).
- redundant data can be entered repeatedly by a bot, for the reason 1 quoted about incorrect data. Redundant data clutter the items, worsening their readability for humans (which is a legitimate concern, especially in the case of occupation (P106), and I very much share it; in fact I periodically remove unsourced redundant values of P106 to reduce a bit the issue, which is very serious): this can be addressed IMHO in one main way, i.e. collapsing not-best-ranked-values (see below point 2). Redundant data can also clutter Wikipedia and other data reusers receiving them (another legitimate concern), and this should be avoided using ranks. It needs to be noticed here that deprecated rank is designed "for statements that are known to include errors (i.e. data produced by flawed measurement processes, inaccurate statements) or that represent outdated knowledge (i.e. information that was never correct, but was at some point thought to be)" (quotation from Help:Ranking), so not for redundant statements, which aren't wrong stricto sensu. I propose two different procedures for ranking redundant values:
- for properties having single-best-value constraint (Q52060874) (mainly date of birth (P569), date of death (P570), place of birth (P19), place of death (P20)), if there are 2(+) values all supported by authoritative sources, the most precise one should get best rank; if the values only differ in precision (i.e. day vs year, or village vs municipality), the best rank can be motivated with qualifier reason for preferred rank (P7452)most precise value (Q71536040). I requested to do it for dates through a bot (preferably, but not necessarily operated by the same bot operator adding less precise values) a few years ago, and I think it is presently done by BorkedBot (per this task approved in 2021; @BrokenSegue: could you confirm?); programming a bot to do the same for places, on the basis of recursive located in the administrative territorial entity (P131), should be doable and I would support it; of course, the automatisation has some limitations both for dates and places (see the mentioned bot task), e.g. if a birth date has values 1948, 31/10/1948 and 1949 (or a birth place has values Paris, XVIII arr. of Paris and Saint-Denis) we need a human to choose if 31/10/1948 (or XVIII arr. of Paris) deserves best rank, but in fact a bot can safely operate in most cases.
- for properties allowing multiple values (mainly occupation (P106)), which are more seriously affected by the issue of redundancy, two choices are possible: a) set to best rank all "good" values (with qualifier reason for preferred rank (P7452)most precise value (Q71536040)), leaving redundant values in normal rank; b) set to deprecated rank all redundant values (with qualifier reason for deprecated rank (P2241)value to be decided), leaving "good" values in normal rank. Since the number of "good" values is in most cases higher than the number of redundant values, I would probably prefer solution b) just because it would imply to change fewer ranks than option a); however, solution b) has the drawback of deprecating statements which are redundant but not wrong stricto sensu, and this contradicts the present definition of deprecated rank. I think this choice deserves further reflection and discussion. Once we choose one option, it can be mostly applied by the bot, as in the previous case: we just need a bot operating on the basis of recursive subclass of (P279), which will allow it to know which values are redundant and which aren't; of course I would support such a bot.
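As a rough illustration of the last step (finding which values are redundant by following subclass of (P279) recursively), here is a minimal sketch. It assumes the subclass relation has already been fetched into a plain dictionary; the dictionary contents, the labels used instead of item IDs, and the function names are all hypothetical.

```python
def ancestors(cls: str, subclass_of: dict[str, set[str]]) -> set[str]:
    """All direct and indirect superclasses of `cls` (safe against cycles)."""
    seen: set[str] = set()
    stack = list(subclass_of.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(subclass_of.get(parent, ()))
    return seen


def redundant_values(values: set[str], subclass_of: dict[str, set[str]]) -> set[str]:
    """Values that are a superclass of some other value on the same item."""
    return {
        v
        for v in values
        if any(v in ancestors(other, subclass_of) for other in values if other != v)
    }


# Hypothetical excerpt of the occupation hierarchy, with labels standing in for QIDs.
subclass_of = {
    "art historian": {"historian"},
    "historian": {"scholar"},
    "university teacher": {"teacher"},
}

occupations = {"art historian", "historian", "poet"}
print(redundant_values(occupations, subclass_of))  # {'historian'}
```

Whether the redundant values are then deprecated (option b) or the remaining ones are set to preferred rank (option a) is exactly the open question above; the sketch only identifies the candidates.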
- out of place data (which I would define as values neither incorrect nor redundant, but problematic because they are placed under the wrong property) must not be entered by a bot, neither once nor multiple times. Given this principle, let's draw some practical consequences, outlining different responsibilities: 1) the community (not bot operators) should add constraints to properties, wherever possible, so that out of place data get marked as constraint violations; 2) bot operators must avoid adding data which trigger constraint violations, ideally using a mechanism which is always synchronized with constraints (which frequently are added, edited and sometimes removed); 3) if a guideline states that a certain combination of property-value is out of place and should be fixed to another one, but this guideline has not been "translated" into a constraint, bot operators are not required to know it (guidelines are scattered among various WikiProjects and it's often difficult to have in mind all of them); 4) however, if a user writes to a bot operator reporting to them that a certain combination of property-value is out of place according to a certain guideline and should be fixed to another one, the bot operator must comply with the mentioned guideline as soon as possible (I remember one such case, in which I had no complaint about Frettie's answer).
- "I feel this is still our (my and Vojtěch's) ongoing dispute over data representation. IMO WD should be not only machine readable, but also human readable. For you it's just aesthetics, for many others this is the matter of usability." (comment by @Draceane:). I very much agree with this comment of Draceane, Wikidata should be readable not only for machines but also for humans. In the points 1.1 and 1.3 I supported keeping inside items both incorrect statements (with deprecated rank) and redundant statements (either with deprecated rank, or in normal rank with most precise statements in best rank); the use of ranks I propose solve the issue of machine readability, meaning that Wikipedia and other data reusers can read only best-ranked data, thus avoiding incorrect and redundant data. In order to make items also easily readable for humans, I propose the solution of collapsing not-best-ranked values: if a property has 2(+) values and these values have 2 or 3 different ranks, a button appears near the property allowing the user to collapse (= hide) all values which haven't the best rank (i.e. if at least one value has preferred rank, all not-preferred-ranked values are collapsed; if a property has only normal and deprecated values, deprecated values are collapsed). I think a gadget like this would make items perfectly readable; the user should also be able to activate it by default (i.e. not-best-ranked values are collapsed when the item is loaded, and the user can just click the button near one or another property to show the not-best-ranked values for that property if they are interested).
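The collapsing rule described in this point can be sketched in a few lines. This is only an illustration of the selection logic, not a gadget: statements are modelled as hypothetical `(value, rank)` pairs, and the sample values are illustrative. One assumption goes beyond the description above: if all values share a single rank, nothing is collapsed.

```python
# Higher number means better rank.
RANK_ORDER = {"preferred": 2, "normal": 1, "deprecated": 0}


def split_by_best_rank(statements: list[tuple[str, str]]):
    """Split (value, rank) pairs into (shown, collapsed) by the best rank present."""
    if not statements:
        return [], []
    best = max(RANK_ORDER[rank] for _, rank in statements)
    shown = [s for s in statements if RANK_ORDER[s[1]] == best]
    collapsed = [s for s in statements if RANK_ORDER[s[1]] < best]
    return shown, collapsed


# Illustrative date-of-birth values with three different ranks.
statements = [
    ("1831-09-09", "preferred"),
    ("1831", "normal"),
    ("1841", "deprecated"),
]
shown, collapsed = split_by_best_rank(statements)
print(shown)      # [('1831-09-09', 'preferred')]
print(collapsed)  # [('1831', 'normal'), ('1841', 'deprecated')]
```

With only normal and deprecated values, the same function shows the normal ones and collapses the deprecated ones, as described above.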
- about edit wars between bots: of course they should not happen; I see basically two solutions: 1) the bot operators should encode in their bots some constraint like "if you make the same edit on the same item for a total of N times (with e.g. N = 3), stop editing the item" (I think we have no precise guideline about this, but it would be positive IMHO); 2) we probably need an admin bot which monitors items and, if an edit war between bots develops on one item (e.g. bots A and B adding and removing the same statement on the same item for N times, with e.g. N = 3, then block both bots indefinitely from editing that item and send a message to both bot operators about this). Solution 2 would make 1 not strictly necessary and I hope it's not too difficult to enact.
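Solution 1 (a bot stops repeating the same edit on the same item after N attempts) could look roughly like this. `RepeatGuard` is a hypothetical helper, the counter lives only in memory, and the property and value strings are placeholders; a real bot would need to persist the counts between runs, which this sketch does not attempt.

```python
from collections import Counter


class RepeatGuard:
    """Refuse an edit once the same (item, property, value) was made N times."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self._counts: Counter = Counter()

    def allow(self, item: str, prop: str, value: str) -> bool:
        key = (item, prop, value)
        if self._counts[key] >= self.max_repeats:
            return False  # limit reached: leave this item to humans
        self._counts[key] += 1
        return True


guard = RepeatGuard(max_repeats=3)
# The same edit is attempted four times on one item (placeholder IDs).
results = [guard.allow("Q_ITEM", "P106", "historian") for _ in range(4)]
print(results)  # [True, True, True, False]
```

A different value or a different item is counted separately, so the guard only stops the exact edit that keeps being reverted.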
- finally [precisation], @Frettie: my request of one improvement to Frettiebot's handling of some occupation (P106) values: I have noticed that, for "composite" occupations recorded in NKC, Frettiebot sometimes duplicates them, adding both the composite occupation (correctly) and the basic occupation (incorrectly introducing a redundancy absent in NKC). To be clearer, some examples: humans being both historians and art historians, often sources support both values (e.g. Renate Kohn (Q66685235)) and so everything is fine, but in other cases (e.g. Renata Zemanová (Q95156951) before my last edit) the source NKC has only "historičky umění" as occupation but the bot added also the basic occupation "historian", which in fact is wrong because it is absent in NKC - I have seen other similar cases with "historian" wrongly added where in fact NKC has only "historian of X"; another example, humans being both professors and university professors, in nearly all these cases (e.g. Elliott R. Jacobson (Q112427327) before my last edit) NKC has only "vysokoškolští učitelé" but Frettiebot also added the basic occupation "professor". In these cases the mistake lies in how Frettiebot imports the data from NKC; I would ask to avoid such mistakes when the bot will restart its activity and possibly to try to spot existing cases like the ones outlined above and remove these values (here there is no need of deprecation, because in fact the source mentioned in the references does not contain such values). This is the only change in the bot activity I would require.
- --Epìdosis 14:47, 28 October 2023 (UTC) P.S. I have added a subparagraph "Discussion after bot suspension" for better readability, feel free to edit it
- Thank you for your work! I agree, just two notes:
- As you said, deprecating true statements should be avoided – enforcing our ranking rules is difficult enough as it is now without an ad-hoc extra rule just for a set of cases.
- Do we have examples where “statement clutter” is a real problem for human readability? I would imagine that our current color coding for ranks (enabled per default AFAIR) is helpful. In some cases, rearranging values by hand (first best values followed by normal ranks, deprecated ranks at the end) might be helpful. Collapsed values always carry the danger of overlooking important data and even adding those statements a second time. --Emu (talk) 22:36, 28 October 2023 (UTC)
- @Epìdosis As for (4) - adding historian AND art historian and how it happens - we actually have a conversion table that prevents cases like this and tries to understand the whole phrase "art historian" in descriptions (see [1]). The occupation "historian" was added by me to that item two years ago, before this specific handling of occupations was possible. Vojtěch Dostál (talk) 15:31, 29 October 2023 (UTC)
- I'm still preparing for an answer, but it's slower because of my work (an anon archived the section) Pallor (talk) 09:37, 31 October 2023 (UTC)
- Thank you for your patience.
- I also thank Epidosis for the very detailed summary. Many strategic questions have now been emphasized, but I still feel that we would say yes to a data entry method that will gradually make Wikidata more difficult for both machines and humans to read in the long term. This situation is like when the waves of the sea wash over the shore, which we take for granted and do not put a stop to it. But when the water starts washing garbage ashore, we can't say again, "this is the order of nature" and let it happen. In this case, something must be done to keep the coast free from waste, we must install some kind of filter in order to save both the water and the coast from garbage.
- Let's start with the most important, the source.
- You write that you think the "national authority files" are the authentic data. I already wrote about this above: it is a library database, that is, it serves to record the descriptive data of the books. This is complemented by a database that lists the authors of the books to a depth that is absolutely necessary to distinguish authors of the same name. What this resource can be used for is to find out: what is the title of each book, who is the author, publisher, where and when it was published, what is the size, weight, number of pages, binding, what is the theme of the book, etc. In these data, the NKC is just as authentic as any other national library. However, this database cannot be used to find out what the authors' authentic and precise(!) biographical data are. Not only because the database does not take into account who studied where and when they obtained what education, or what their family relationships are, but because it does not even record the exact birth and death data. It is satisfied with the fact that a person was born and died in a certain year, but not where and on what day, simply because this is not needed in the NKC database; it fulfills the purpose it was created for without it. The same is the case with occupations: it is enough for the NKC to write about someone as a priest, teacher or athlete. This is a necessary superficiality that satisfies NKC's needs, but not Wikidata's.
- This situation is like using the database of a company that trades in agricultural products as a source of SI units, citing that they also use the terms ton and metric meter, or if we were to process the product range of a paint factory to support the values of the compounds as a source, citing that chemical engineering is also behind it. In addition, both example databases can be used, obviously only in the right place. The database of the Czech National Library can also be used when it comes to books; in fact, it can be used to create elements of persons missing from Wikidata, but with regard to precise data, a biographical source must be sought, rather than constantly rewriting superficial data just because someone somewhere on the world wide web belched them up. I would emphasize again that the problem is not that this data was found, but that, although it is trivial that it came from an inappropriate source, it was constantly rewritten.
- If, for example, we were to take over the birth data of the persons, then in addition to all the values entered precisely in the format year month day, we could also include the data containing only the year. Or, for example, for all places of birth or death that are narrowed down to a specific administrative unit, we could enter the data of the broader unit one above it. Could this data be wrong? No, they're just not as accurate as what's already in Wikidata. If we accept Epidosis's argument, we open the door to writing any more superficial data from any database. In fact, we could even do it automatically ourselves, since we don't lie with any of them, and sooner or later we're sure to find a source that supports it. Enter only the year of birth under each date of birth. Would Wikidata be better for that? We have to enforce practical aspects that preserve the coherence of Wikidata. And if we don't want to write 1720 next to the exact date of birth (May 8, 1720) in every element, then we have to follow a similar principle for the occupation: we don't want to write pastor next to Lutheran pastor, pedagogue next to secondary school teacher, or engineer next to hydraulic engineer, because it is completely unnecessary. This will just flood Wikidata with unnecessary and meaningless data.
- (I'm showing one more error in Frettiebot's editing, which someone may find correct, but I think it's grossly unnecessary. Some positions usually have an element that applies to a specific country and a specific position. For example, for the representatives of a country's parliament, using member of parliament (Q486839) as the value of position held (P39) is obviously not an error, but where a local element exists, we use it (see.
- Compared to this, Frettiebot mercilessly wrote that the person was also Q486839 for the persons for whom it was mentioned in the source, even though the more accurate element was already there. This query shows the current situation, i.e. those who are members of parliament, their position element has Q486839 as a subdivision, but P39:Q486839 is also specified. There are currently 958 results, of which 459 are Czech or Slovak. Let's look at two: Q1294312 or Q895898. Both have Q486839 with five or six sources, all of which are NKC. Do we need this? No. Can we expect the NKC to describe the precise position in the given context as we would use the Wikidata table? Again no. Whether Frettiebot added this unnecessary element or it was included, it is clear that the proportion of data added unnecessarily would decrease by 50 percent if the data Q486839 were deleted from them, or if we look at the reverse, then the number of meaningless data increased by 100 percent. If we project this onto the properties of birth dates, places and occupations, we can see how much Wikidata would swell if we accepted that superficial data should also be included. I only examined this for a single position, obviously if you look at the number of presidents, finance ministers, museum directors, fire chiefs, etc. is in the database, which can be titled as president, minister, director, commander using a more superficial database as a source, essentially we could "expand" Wikidata indefinitely, without adding a single meaningful piece of data. Not to mention that if a co-editor writes it in, we can correct it, but if it's a bot, we can't?)
- Of course, I understand the part of the argument that says that if a common biographical error needs to be corrected, an excellent method is to record the data, source it, make it obsolete, and indicate the correct (according to more recent research) data accurately and with sources, but I think it is quite clear that this is not the case in these cases.
- I still maintain that bot editing should end there: the bot uploads a piece of data once and then leaves it up to the community members (the people) to decide whether the data is important and necessary, and to act responsibly without having to fight a bot. Pallor (talk) 15:49, 3 November 2023 (UTC)
- Both of the positions expressed throughout this thread can be sympathised with, but I think Pallor's post here is an excellent representation of the general approach to weighing which sourced statements belong in Wikidata. My own opinion of this is formed by having read help pages over the years and finding their advice to be well reasoned and appropriately opinionated.
- In summary, this thread revolves around three concepts:
- Imprecise statements (to be unprioritised). There are infinitely many true statements under an open world assumption. As such, these are unnecessary where a more precise statement is available. The exception to this rule is when their sourcing makes the imprecision somehow notable (e.g. an imprecise year of birth thought to be irrecoverable and widely sourced as such, later discovered precisely in historic records).
- Incorrect statements (to be deprecated). There are infinitely many incorrect statements under an open world assumption. As such, these are unnecessary where their sourcing is insignificant or not authoritative.
- Appropriate sourcing. This is the crux of the issue discussed here, because it applies to both of the above. I think Pallor has covered it well in the message I'm responding to, but we probably shouldn't be pulling biographical data from a library database unless there is no better source already present.
- As for the question of the bot, I agree that not restoring statements removed seems appropriate. Adding them in the first place may generate some cruft, but that's not a huge deal - which is why removing it should also be respected. SilentSpike (talk) 09:20, 5 November 2023 (UTC)
- I’m not sure why everybody seems to be so hung up on the fact that NKČR is a library database (or at least has its origins in this field). Why does this make the database less authoritative? --Emu (talk) 11:37, 5 November 2023 (UTC)
- Let's look at a specific example to understand: Walt Whitman is perhaps a well-known American poet and essayist, so that we can use his data to examine whether the NKC data is suitable as a source. This is what the NKC data sheet looks like: jn19990009101
- This is what some other biographical database items look like:
- DAHR artist ID 102444
- Den Store Danske ID Walt Whitman
- Encyclopædia Britannica Online ID biography/Walt-Whitman
- It can also be determined at a glance that the NKC's data are incomplete and simplified. But not because the NKC is bad, but because the NKC has enough data for its own purposes to distinguish the American writer Walt Whitman from, for example, the American actor Walt Whitman. For Wikidata, however, this is not enough data, because Wikidata strives for completeness. More is needed here.
- But I would also like to add that it is not a problem that many new elements have been added based on the NKC, because each new data sheet opens a door to expand these data sheets, supplement and correct incorrect or incomplete data, and remove unnecessary data. The problem is that this data is written back again and again by the bot, you can't get rid of it. It is as if they want to convey through the bot that there is no more accurate data than the NKC data, although we can clearly see that the data is insufficient because it comes from a database that does not provide a complete biography. This makes the concept flawed. Pallor (talk) 12:20, 5 November 2023 (UTC)
- I had some problems with the bot over the summer but those were fixed. My thoughts on the general principles -
- Wikidata has our own data model, and it may not view the world in exactly the same way as other databases. This is fine - we don't need to mirror the exact structure and content of every other database. For example, whether a certain thing goes in occupation (P106) vs position held (P39) was the issue I had problems with. Similarly we may not want to have a generic item for something (like member of parliament (Q486839), mentioned above) when we can have a more specific one. So if Wikidata prefers to use a different property, or something more precise, we should not worry about imports being moved or updated afterwards.
- A bot should not be edit-warring with people or with autofix bots. If its edits are being repeatedly undone - especially on a day-by-day basis - it should not keep making them. It might be the autofix bot is wrong - so fix that instead, don't just keep making edits that will get undone.
- Considering 1 and 2 above, "Only upload data once" is a good rule of thumb to aim for. Reuploading data should only be done when you are doing it intentionally and you have a reason for doing it.
- Deprecating "wrong information" is good but it shouldn't be done just because we imported it in the wrong way - if it's something like "this value should be in P39 instead of P106" then it's just going to confuse people to keep a deprecated value around. It implies it is incorrect / outdated when it's simply misplaced. Andrew Gray (talk) 23:23, 3 November 2023 (UTC)
- Three notes to @Andrew Gray's points: 1) We *are* trying to get the bot to understand the Autofix templates and NOT editwar with the autofix bots. This is sometimes difficult for us (I think no other bot is trying the same thing as we are) and it would be better for everyone to come up with a systematic solution for all bots. Currently it is difficult for the bots to load all these autofix commands and keep them updated in our code. 2) We are not asking the community to deprecate our statements in cases where the value was just moved from property to property based on Autofix rules. 3) However, in basically all other cases, as MisterSynergy pointed out, it is virtually impossible for bots to avoid adding the statement unless we stick to our rules and deprecate wrong sourced statements. Therefore, we are asking the community to respect this rule, so that content-adding bots have a place in Wikidata. Vojtěch Dostál (talk) 09:49, 5 November 2023 (UTC)
- Vojtěch Dostál: let's be careful not to read what others have written one-sidedly. It's not the problem that sometimes the bot enters incorrect or unnecessary data (although we talked a lot about choosing the right source, didn't we). Sometimes people mess up the data entry, it happened to me too. That's not the problem, because it can be fixed.
- The problem is that the bot uploads incorrect and unnecessary data again and again and again, even though people delete it, which means it CANNOT BE CORRECTED. This should be changed. Pallor (talk) 10:08, 5 November 2023 (UTC)
- Actually, as you know, the bot does *not* reinsert the wrong statement if it is not removed but deprecated instead. So it is not true that the wrong data entered by the bot cannot be corrected. Vojtěch Dostál (talk) 10:29, 5 November 2023 (UTC)
- Then let's start over, because it seems the essence of the discussion didn't get through.
- The request is that the bot does not upload the same data over and over again. The bot is a machine and cannot decide whether that data is unnecessary or incorrect. Sometimes there are data that are both incorrect and unnecessary. Part of the reason for this is that the bot spreads them based on an inappropriate source.
- However, people can decide and have the ability to correct it. Either by making it obsolete or by deleting it. My proposal is to leave this decision to the people. Let the bot upload the data once and let people decide what to do with it.
- I see that there is a consensus that certain erroneous data should be preserved and marked out of date (at least that's what I communicated). Perhaps there is agreement that certain unnecessary data should simply be deleted. Should there be an agreement that this bot should decide, or should we leave it to the people? I prefer the latter. Pallor (talk) 11:25, 5 November 2023 (UTC)
- Respectfully, I know what your proposal is. However, if humans remove incorrect statements, Wikidata will be a much more difficult world for bots. I am merely suggesting that we humans agree to not remove unnecessary or incorrect data - and rather set ranks to them, as is the official Wikidata policy. I feel that we both already know what the other wants, and it's now on the community to either go my way or suggest amendments to Help:Ranking Vojtěch Dostál (talk) 17:06, 5 November 2023 (UTC)
- Oppose. Generally it is possible to remove an incorrect statement, because it might have been added by mistake (even with a source). But if some bot is re-adding it, it is better to deprecate the statement and prevent bot revert-warring. JAn Dudík (talk) 07:02, 8 November 2023 (UTC)
Summary
This conversation will be archived soon, so I would like to summarize it.
The claim raised was: "At the same time, we also don't want a bot to UNOVERWRITABLE fill up Wikidata with unnecessary and/or wrong data." In other words, Frettiebot should upload a piece of data only once, and then entrust the judgment and fate of that data to the (human) users.
Some of the contributors to the discussion expressed their agreement or opposition by using a template, in which opinions are equal:
- supported by Jonathan Groß, Vicarage, GrandEscogriffe
- opposed by Jahl de Vautban, Epidosis, JAn Dudík.
The others did not use a template, but you can reconstruct from their comments whether they supported or opposed it (if I drew the wrong conclusion, please let me know):
- opposed by @Emu, Vojtěch Dostál, Frettie:
- supported by @Vanbasten 23, SilentSpike, Andrew Gray: and finally myself, Pallor [edit: adding myself - RudolfoMD - I also expressed support]
I judged that @MisterSynergy: suggested a third, intermediate solution.
From the summary, I came to the conclusion that several people support the fact that the bot should add some value to Wikidata only once.
I hope this lesson can be used for future data dissemination by other bots. Questions such as choosing the right source or creating a project sheet to record a significant amount of data, for example, were not discussed, but this discussion may provide ammunition for a debate about these later. Pallor (talk) 10:11, 14 November 2023 (UTC)
- I honestly don't know what the lesson from this discussion is. Some people agree with our rules at Help:Ranking, some don't, but I don't see a consensus for change. Many prolific bot operators explained why it would not work to remove wrong statements instead of deprecating them. Still, one bot is blocked as a result of this inconclusive discussion. Vojtěch Dostál (talk) 20:22, 14 November 2023 (UTC)
- Also, could you please stop making the false claim that Frettiebot added 'unoverwritable' data again and again? I explained numerous time that this is not true, and Frettiebot would not add the data again if they are correctly deprecated. --Vojtěch Dostál (talk) 20:25, 14 November 2023 (UTC)
- Unfortunately, from the very first moment, I feel that the communication moves at the level where you react to whatever you want, but you do not write anything to those comments to which you do not have an adequate answer, in fact, you pretend that they were not written at all. This had already been expressed in me before, but I did not want to make this discourse personal. Thus, it is naturally hopeless to reach a consensus, this expectation is only good for delaying the conclusion of the debate. In this situation, of course, there is no other option than to accept the majority opinion.
- If it's really the case that you don't understand what the seven editors who agreed with my suggestion were trying to achieve, then at this point in the discussion I can't recommend anything other than re-read the conversation. If you really understand what it's about, you just want to dramatize the situation, then please find another partner, because I don't want to get involved in this play.
- I will propose to remove the restriction of the bot's operation, but with the guarantee that it will only write a piece of data to Wikidata once. Pallor (talk) 22:50, 14 November 2023 (UTC)
- I’m still puzzled by this discussion: What exactly seems to be the problem? I think we can all basically agree with the idea that a bot shouldn’t UNOVERWRITABLE fill up Wikidata with unnecessary and/or wrong data. I for one would support such a statement albeit not in the context it was put forward: It has been shown how to handle wrong data. As to “unnecessary“, well, there seems to be some disagreement about what constitutes necessity – but that’s not really a bot issue per se, is it? --Emu (talk) 00:19, 15 November 2023 (UTC)
- Was it ever established what fraction of the bot's changes are regarded as unhelpful? Before we re-enable it we need to know how much human time will be spent clearing up after it, so we can assess whether it is of net benefit to WD. Vicarage (talk) 04:43, 15 November 2023 (UTC)
- Is unnecessary the same as unhelpful? If so, the core of the problem still doesn’t seem to be the potential various misdeeds of the bot but rather different opinions about necessity and helpfulness … --Emu (talk) 06:27, 15 November 2023 (UTC)
- Emu: I want the bot to publish data for an item only once. After that, whatever happens to this data - community members make it obsolete, delete it or fix it - it would no longer publish this data. This is clearly a bot operation issue. Pallor (talk) 09:45, 15 November 2023 (UTC)
- Of course, I also oppose, by the way. Allowing the bot to work only by importing ONCE is a very dangerous precedent and may become a threat to Wikidata as an up-to-date database. It would de facto set a precedent where any human-edited item can never be overwritten or edited by a bot. And I see that as a huge threat.--Frettie (talk) 10:38, 15 November 2023 (UTC)
- I agree. This proposal would mean that each database could only be imported by a bot once. This would eliminate one of the main advantages of bot usage: Updating statements isn’t exactly fun and we need fun to attract and keep human volunteers. Therefore, this cumbersome process should be left to bots if possible. And they can’t do that if they can only touch statements or even items once. --Emu (talk) 11:07, 15 November 2023 (UTC)
- Sorry, but there is some fatal misunderstanding here. That's the summary, the discussion continued one stage higher. What you are writing here, you have already partially described above, I think it is unnecessary to describe it in every section. If I did not open a new section for the summary, the discussion would have been archived today.
- What has not been answered at all is, for example, the inappropriate choice of sources. The issue of mitigating the redundant data (for example - but not exclusively - what will be the fate of "National Assembly representatives"). Low-quality communication (even now I could point to a section on Frettiebot's discussion board that is unresolved). And of course I could give other examples that could have been discussed in the above section, but did not take place.
- For my part, I insist that Frettiebot not get his editing rights back as long as he is in danger of uploading unnecessary data to Wikidata, because that poses a greater threat to our database than the issue of updates, and as the dispute stands it seems that bothers several editors. Pallor (talk) 11:37, 15 November 2023 (UTC)
- But those are very different things. Not responding at all is a problem, that is true. Not fixing obvious problems is a problem too. But uploading data you consider to be low quality or unnecessary isn’t a problem per se. The discussion has clearly shown that your views on those concepts aren’t exactly consensus. --Emu (talk) 11:42, 15 November 2023 (UTC)
- Volunteers who find their accurate changes overridden by a bot won't stay. At least with a dispute with a person there can be discussion and the time devoted is equal both sides. That's not true with a bot, especially if the owner will not engage. Remember frettiebot continued to run well after the issues were raised. Vicarage (talk) 11:57, 15 November 2023 (UTC)
- Please be aware that accurate changes, when done properly, are never overridden by Frettiebot. Vojtěch Dostál (talk) 12:28, 15 November 2023 (UTC)
- Of course, I understand that, from the point of view of the storage space, uploading unnecessary data is not a problem, as it will fit. I also understand that it is not a problem from the point of view of some queries either, because whoever is looking for version "A" does not mind that there is also "A1" and "A2" and "A3". However, there is a problem when we perform maintenance and want to clarify the data entered with the A3 version and correct it to "A" or/and "B" or/and "C".
- It should also be seen that the description of Wikidata data states that the data should serve to better understand the given thing. If I write "pastor" next to the occupation of an evangelical pastor, do we understand better? I do not think so. If I write "member of parliament" next to the position of a member of the Czech parliament, will it be more understandable? No. This is a method that goes against the basic principles of Wikidata. Pallor (talk) 12:37, 15 November 2023 (UTC)
- I understand that you only want the most precise version of a given information to be in Wikidata. Several people including me have tried to explain why this might seem like a good idea at first glance but at the same time is a flawed concept and indeed in many cases detrimental to the project (the occupation (P106) issue is a little more nuanced, I’ll give you that, but we seem to be beyond nuances at this point). In any case, this wish can’t be grounds for blocking a bot. --Emu (talk) 16:51, 15 November 2023 (UTC)
SIMPLE: Lymantria blocked Frettiebot "Until resolution of issues on Frettiebot's editing". The consensus that issues with the bot's editing require code changes (which were not forthcoming) is what caused the block and is the reason it's still in place. Code changes to address the issues haven't been made. There are no grounds for unblocking the bot. The end.
Folks, if you don't understand what the issues are, then "re-read the conversation". If you still don't understand what it's about, leave it to those who do.
RudolfoMD (talk) 05:49, 19 November 2023 (UTC)
- Nope, this is not an accurate summary of this discussion. I am afraid I see no clear consensus for change. @Lymantria, what do you think the bot owner should do in this case to qualify for unblocking? Vojtěch Dostál (talk) 08:56, 23 November 2023 (UTC)
- Lymantria: we did not receive substantial answers to some of the questions that arose, so no consensus could be formed. At the same time, Vojtěch also admitted that some of the entered data was unnecessary. In the summary, I quantified that more people opposed the previous operation of the bot than supported it. Of course, the operation of the bot cannot be blocked forever, but the previous operating principle does not have adequate support. These aspects must prevail. Pallor (talk) 09:40, 23 November 2023 (UTC)
- @Pallor Yes, there indeed is some opposition to how the bot operates, but the discussion should not be evaluated by the number of 'votes'. Furthermore, that opinion collides with some of our key principles outlined in our written documentation, which I've linked before. To me, the fundamental question arising from this discussion is how we should operate bots when there are clashes on these fundamental principles - should this be further discussed here, or should the bot operator start a RfC on these fundamental topics, or should it be discussed via a Request for a Bot Permission? This is why I tagged @Lymantria who I think is experienced in these matters, but of course, anyone else's opinion is also appreciated. Vojtěch Dostál (talk) 10:00, 23 November 2023 (UTC)
- Is there anything concrete (beyond your rather far-reaching ideas about necessity and usefulness) that the bot operator could fix? --Emu (talk) 18:13, 23 November 2023 (UTC)
- Emu Yes, the data should be uploaded by the bot only once.
- I realize that I am not considered an old editor, because I have only been here for 5 years. But I have never, ever seen a data import that added the same data to an element multiple times. So far, the examples I have seen were ones where the bot entered data into the element once. After that, the editors decided whether that data was appropriate, relevant, or not. If Frettiebot also works this way, then I will be satisfied. Pallor (talk) 10:32, 24 November 2023 (UTC)
- I think it has been sufficiently shown by MisterSynergy that this is not a reasonable thing to ask from a bot operator. --Emu (talk) 11:16, 24 November 2023 (UTC)
- I don't think that a bot should judge whether sufficiently sourced data is "unnecessary" or not. I do think, however, that ranking correctly can be requested from a bot. A bot should not be asked to deprecate correct data, but it can be asked to give preferred rank to more (in fact the most) precise data, which it can determine by subclass of (P279). Is Frettie able to change the bot in order to take care of this? If data is wrong, but sourced, it should be deprecated if a (bot or a) human notices that. I noted that Frettiebot recognises deprecated data and does not change its ranking. Correct but possibly "unnecessary" data I judge as unproblematic if coming from a source that has shown to be useful, as is the case in this discussion. --Lymantria (talk) 19:52, 23 November 2023 (UTC)
- The request to assign a preferred rank if more precise information is already available seems fair to me. --Emu (talk) 09:17, 24 November 2023 (UTC)
- Lymantria I'm sorry that you see it this way, since I wrote at length about the fact that the source used is not the most optimal, there are better sources, and I supported this with examples. Much of the data entered is too imprecise, redundant or simply not Wikidata compatible. With such a decision, we are opening the door to adding, for all parliamentarians, ministers, mayors, ambassadors, etc., the general designation next to the already existing specific position, all this just to make it obsolete: Minister of Foreign Affairs in Belgium (Q1670832)=minister (Q83307) or Lord Mayor of London (Q73341) = mayor (Q30185), etc. We are sure to find a source where these common names are mentioned. And it's actually priceless to find a generic, unnecessary name for anything and fill Wikidata with it. So I still maintain that this is not a good source for the uploaded data.
- If I ask you to write an RfC for this, will you do it? I have not done this before and my English is not strong enough. Pallor (talk) 10:23, 24 November 2023 (UTC)
- I fixed the indentation on your comment, Pallor. Also, I think it falls to someone wanting the bot reactivated to make the case/write an RFC, rather than on Lymantria. My summary was accurate. RudolfoMD (talk) 11:39, 24 November 2023 (UTC)
- Okay, let’s be more specific: Are you suggesting that the bot shouldn't import certain positions that are unsuitable for occupation (P106) usage because they belong to position held (P39) and are too unspecific? And could you come up with a list of those values? This could be a compromise that is beneficial to Wikidata. --Emu (talk) 14:31, 24 November 2023 (UTC)
- The solution has been presented umpteen times. The bot should keep track of what it has added (or use wikidata history) to not override manual deletions. Again, this is not just about P106. RudolfoMD (talk) 21:55, 24 November 2023 (UTC)
- I repeat: I think it has been sufficiently shown by MisterSynergy that this is not a reasonable thing to ask from a bot operator. --Emu (talk) 22:03, 24 November 2023 (UTC)
- I don't. It wasn't shown. And he did NOT say it was infeasible in this situation. FS! RudolfoMD (talk) 22:24, 24 November 2023 (UTC)
- To expand on this: The "09:05, 28 October 2023 (UTC)" doesn't 'show' anything. It makes a claim. And not the one you present.
- Clarifying what is the most appropriate solution IS productive, IMO. RudolfoMD (talk) 22:32, 24 November 2023 (UTC)
- Also, the bot doesn't have to literally keep track of what it has added or use wikidata history; when re-run, it could only add new data by only extracting new data to add in the first place. RudolfoMD (talk) 23:48, 24 November 2023 (UTC)
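For readers trying to picture the "only upload once" idea debated in the comments above, here is a minimal Python sketch. It is not Frettiebot's code: the function name `plan_uploads`, the in-memory `ledger` set, and the placeholder values `Q_EXAMPLE_OCCUPATION` and `Q_NEW_OCCUPATION` are assumptions made purely for illustration, and a real bot would have to persist the ledger between runs (or derive it from item history).

```python
from typing import List, Set, Tuple

# A statement is modelled as (item, property, value); this shape is an assumption.
Statement = Tuple[str, str, str]


def plan_uploads(candidates: List[Statement], ledger: Set[Statement]) -> List[Statement]:
    """Return only the statements this bot has never uploaded before.

    The ledger remembers every statement the bot has already written, so a
    statement that a human later deleted is not written a second time.
    """
    to_add = []
    for statement in candidates:
        if statement not in ledger:
            to_add.append(statement)
            ledger.add(statement)  # remember it, whatever happens to it later
    return to_add


# Toy run: the second pass sees the same source data plus one new statement.
ledger: Set[Statement] = set()
s1 = ("Q1294312", "P39", "Q486839")
s2 = ("Q1294312", "P106", "Q_EXAMPLE_OCCUPATION")  # hypothetical value
s3 = ("Q1294312", "P106", "Q_NEW_OCCUPATION")      # hypothetical value

first_run = plan_uploads([s1, s2], ledger)
second_run = plan_uploads([s1, s2, s3], ledger)
```

In this sketch the second run would only propose `s3`, regardless of whether editors kept, deprecated or deleted `s1` and `s2` in the meantime. Whether that behaviour is desirable is exactly what is disputed in this thread, since it also prevents the bot from refreshing statements it wrote earlier.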
- I don't mind if we decide to run a bot which up-ranks the most precise occupations, and if this is the only issue standing in the way of unblocking the bot, I am sure Frettie would assist and we could together devise such a bot job. But can you please make this proposal clearer? Because we need to define such a job and we need the help of those who propose it. For example, we might want to up-rank all occupations when no other statement is subclass of that occupation. We probably want to do this for all statements, not just the sourced ones. However, we might want to skip the statements which already have a non-normal rank. And we might also skip all items where no occupation statement is a subclass of another occupation statement. This is already getting quite complicated and it shows why ranking is usually left to human editors... Vojtěch Dostál (talk) 19:58, 24 November 2023 (UTC)
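The job outlined in the comment above can be sketched roughly as follows, as a starting point for defining it more precisely. This is only an illustration over toy in-memory data: the names `superclasses` and `plan_preferred`, the `(value, rank)` statement shape, and the placeholder IDs `Q_SPECIFIC_MP` and `Q_OTHER` are assumptions, and the P279 hierarchy is passed in as a plain dictionary rather than fetched from Wikidata.

```python
from typing import Dict, List, Set, Tuple


def superclasses(qid: str, subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """All transitive superclasses of qid in a toy P279 map."""
    seen: Set[str] = set()
    stack = [qid]
    while stack:
        for parent in subclass_of.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def plan_preferred(statements: List[Tuple[str, str]],
                   subclass_of: Dict[str, Set[str]]) -> List[str]:
    """Return the values that would be up-ranked to preferred on one item.

    statements holds (value, rank) pairs for a single property of one item.
    """
    values = [value for value, _ in statements]
    # A value is "generic" if another value on the item is a subclass of it.
    generic = {
        v for v in values
        if any(v in superclasses(other, subclass_of) for other in values if other != v)
    }
    if not generic:
        return []  # no statement is a subclass of another: skip the item
    # Up-rank the most precise values, leaving non-normal ranks untouched.
    return [v for v, rank in statements if rank == "normal" and v not in generic]


TOY_SUBCLASS_OF = {"Q_SPECIFIC_MP": {"Q486839"}}  # hypothetical specific item

both = plan_preferred([("Q486839", "normal"), ("Q_SPECIFIC_MP", "normal")], TOY_SUBCLASS_OF)
unrelated = plan_preferred([("Q486839", "normal"), ("Q_OTHER", "normal")], TOY_SUBCLASS_OF)
already_ranked = plan_preferred([("Q486839", "normal"), ("Q_SPECIFIC_MP", "deprecated")], TOY_SUBCLASS_OF)
```

The sketch only plans which values would be up-ranked; it never deletes anything and never touches a statement whose rank is not normal. Open questions from the comment, such as whether unsourced statements should be treated the same way, are not decided by it.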
- I'm skeptical Frettie is willing to make such bot. Evidence is needed. (Also, my read is that there is much opposition to this solution, as there is a lot of support for not adding low-quality position info when there is high-quality position info; the bot should simply be modified to stop adding low-quality position info when there is high-quality position info. Many maintain it is the case that Frettiebot kept adding 'unoverwritable' data again and again because deprecation is not the correct solution; saying it is over and over doesn't make it so. And you've been chastised for pushing this over and over already, e.g. by Jonathan Groß.) There's already a ton of info on what the bot should not add for Frettie to act on, but no interest expressed in doing so that I have seen. RudolfoMD (talk) 21:17, 24 November 2023 (UTC)
- Please try to be productive. --Emu (talk) 22:12, 24 November 2023 (UTC)
- Please clarify. Clarifying what the situation and the most appropriate solution are IS productive, IMO. RudolfoMD (talk) 22:35, 24 November 2023 (UTC)
- Frettie hasn't yet replied to Vicarage's comment of 22:05, 21 October 2023 (UTC), far above. There is reason for skepticism. RudolfoMD (talk) 23:52, 24 November 2023 (UTC)
- Please bear in mind that we are all volunteers here and nobody is under any obligation to respond in a certain time frame or at all. --Emu (talk) 08:26, 25 November 2023 (UTC)
- I find that comment inappropriate. I asked you to clarify and you are avoiding/refusing to do so. On Wikipedia, at least, there is an expectation (PAG) that admins, especially, respond to reasonable questions. Not here. Your comment that I asked you to clarify was implicitly threatening me with your tools, and tersely/harshly critical, yet you refuse to clarify. I would ask that you strike it if you won't clarify it, or at least drop the matter. A comment below supports that my skepticism about willingness is well-founded. RudolfoMD (talk) 09:44, 26 November 2023 (UTC)
- Useless to fix a bot at a time when there is discussion about possibly banning all active (other than only insert once) bots. --Frettie (talk) 11:59, 25 November 2023 (UTC)
- Surely it's trivial to flag pairs of occupations where one is a subclass of the other, and remove the more generic one. Vicarage (talk) 22:50, 24 November 2023 (UTC)
- Removal where? --Emu (talk) 23:25, 24 November 2023 (UTC)
- From the person. But it equally well applies for all the military museums that are also instances of museum and tourist attraction. Vicarage (talk) 06:34, 25 November 2023 (UTC)
- Valid referenced statements should never be deleted Piecesofuk (talk) 08:12, 25 November 2023 (UTC)
- Exactly. --Emu (talk) 08:27, 25 November 2023 (UTC)
- As so often, other sources do not match the WD ontology; they can pollute as well as inform. WD needs to be a consistent, editable, queryable resource, not a rag-bag of others' facts. Vicarage (talk) 11:28, 25 November 2023 (UTC)
- I like this. This is a good direction. Everyone should think about this. Pallor (talk) 01:39, 25 November 2023 (UTC)
- This could be part of a solution. Would need to also address what other axes? locations? dates? remove the most generic, yes? (as you mentioned, Paris, year of death...) I'll be pleasantly surprised if it's easier than avoiding overrides. RudolfoMD (talk) 07:20, 25 November 2023 (UTC)
- I would not participate in developing a bot which *removes* sourced statements, as opposed to up-ranking. Vojtěch Dostál (talk) 07:12, 25 November 2023 (UTC)
- I agree. I also see a lot of possible criticism from other users who don't want to delete everything that three users here wish.--Frettie (talk) 12:01, 25 November 2023 (UTC)
- Frettie, I still see this as low level communication. On the one hand, because you know full well that it is not the wish of three users, since I have aggregated how many editors disagree with the editing principle of your bot. On the other hand, because what you read is a suggestion in the direction of compromise. You don't have to accept it, but not to discuss it is to reject the compromise. Please consider this to be the first suggestion in the debate between the two positions that points in the direction of a possible solution. (However, it is possible that RudolfoMD is right, and that it is a more complicated solution than setting the bot to edit once per data, but in a democracy sometimes the more complicated and costly solutions represent the consensus.) Pallor (talk) 12:15, 25 November 2023 (UTC)
- The request to assign a preferred rank if a more precise information is already available seems fair to me. --Emu (talk) 09:17, 24 November 2023 (UTC)
- I don't think that a bot should judge whether sufficiently sourced data is "unnecessary" or not. I do think however ranking correctly can be requested from a bot. A bot should not be asked to deprecate correct data, but it can be asked to give preferred rank to more (in fact the most) precise data, which it can determine by subclass of (P279). Is Frettie able to change the bot to take care of this? If data is wrong, but sourced, it should be deprecated if a (bot or a) human notices that. I noted that Frettiebot recognises deprecated data and does not change its ranking. Correct but possibly "unnecessary" data I judge as unproblematic if coming from a source that has shown to be useful, as is the case in this discussion. --Lymantria (talk) 19:52, 23 November 2023 (UTC)
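Illustrative note on the up-ranking rule discussed in this thread: the following is only a minimal sketch of how such a job might pick statements, not Frettiebot's code and not an agreed specification. It assumes the occupation statements of one item are available as simple dictionaries and that the subclass of (P279) relations have already been fetched into a plain mapping; the function names, the data shapes and the placeholder values ("poet", "writer", ...) are all hypothetical. It follows the conditions listed above: statements with a non-normal rank are left alone, and items where no occupation is a subclass of another are skipped. Ignoring deprecated statements when comparing values is an additional assumption.

```python
from typing import Dict, Iterable, List, Set


def ancestors(value: str, subclass_of: Dict[str, Iterable[str]]) -> Set[str]:
    """Return every transitive superclass of `value` in a P279-style parent map."""
    seen: Set[str] = set()
    stack = [value]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def occupations_to_uprank(
    statements: List[Dict[str, str]],
    subclass_of: Dict[str, Iterable[str]],
) -> List[str]:
    """Pick the occupation values that could be set to preferred rank.

    `statements` is a hypothetical shape: [{"value": ..., "rank": ...}, ...].
    """
    # Assumption: deprecated statements do not take part in the comparison.
    values = {s["value"] for s in statements if s["rank"] != "deprecated"}

    # A value is "generic" if some other value on the item is a subclass of it.
    generic: Set[str] = set()
    for value in values:
        generic |= (ancestors(value, subclass_of) & values) - {value}

    # Skip items where no occupation is a subclass of another one.
    if not generic:
        return []

    # Only touch statements that still have normal rank.
    return [
        s["value"]
        for s in statements
        if s["rank"] == "normal" and s["value"] not in generic
    ]
```

With the placeholder hierarchy poet → writer, an item holding writer, poet and politician at normal rank would yield poet and politician as up-rank candidates, while writer stays at normal rank. Whether unrelated occupations such as politician should be up-ranked too is exactly the kind of detail the proposal would need to settle.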
Second Summary
To sum up again: The bot is currently blocked [u]ntil resolution of issues on Frettiebot's editing. When questioned what specifically has to change, a few ideas emerged:
- Data should only be added once: MisterSynergy’s assessment of the impracticality of this request has not been substantially questioned, at least I haven’t found a rebuttal when reading the whole discussion again.
- The bot should keep track of what it has added and not override manual deletions: Do you have the same doubts that apply to your response to request #1, @MisterSynergy?
- The bot should set the most precise occupations to preferred rank: There seems to be no real opposition, but it is questionable whether that's really the problem.
- Certain values should be avoided: The interested parties haven’t come up with a list of those values.
- The bot should delete imprecise statements even when sourced.
- Low-quality communication should be improved upon.
The main problem with #5 seems to be that this goes against several Wikidata principles. The problem with #6 seems to be that it’s unclear what should change and how change would be measured. --Emu (talk) 14:42, 25 November 2023 (UTC)
- What I contributed earlier to this discussion still stands. Bot editing is effectively a stateless operation; a bot does not have sufficient access to its previous edits, or to edits others have made to a given page. While revision histories and contribution lists can be accessed to read revision metadata, it is super difficult to extract useful information from it regarding the actual editorial content of an edit. It is thus reasonable to assume that by default all bots do not know anything about past activity; and that every bot operates based only on the current state of an item page, and the content of an external source (in this case).
- In order to change that, a bot operator would somehow need to set up a shadow database regarding previous edits of their bot, but given the wide range of different edits a bot can make, it is unclear how this could work in a reliable way and there is no existing solution one could readily use. If a bot would be required to do this, it would effectively render its operation impossible.
- In other words: #1 and #2 would kill this bot, and set a dangerous precedent for future cases. —MisterSynergy (talk) 18:33, 25 November 2023 (UTC)
- I think #1 is a perfectly reasonable request in this context, which is a bot that got blocked mostly because it kept adding data even when people were removing it. I don't agree that we can't ask for it because it wouldn't work for all bots ever.
- Is it impossible for this bot to make a reasonable attempt to not upload the same data? (I don't think anyone has said yes or no on this - only talked about general precedents.) I don't know how its data is generated, but it feels like this should be achievable. No need for edit-history parsing or 100% accuracy, just a reasonable good-faith attempt to avoid pushing the same data into WD over and over again. Most bots & batch uploaders seem to manage it. Andrew Gray (talk) 00:47, 26 November 2023 (UTC)
- After rereading the whole thing again, I'm still not convinced that this is a bot problem at all.
- 1 and 2 at least wouldn't arise if people deprecated validly sourced statements instead of deleting them because they think that the source is worthless. I certainly do think that some sources are worthless, but not national library authority files. I have only seen Walt Whitman (Q81438) put forward as an example of why library authority files shouldn't be used, and apart from the dates that are year and not day level I don't see anything wrong with the data.
- 3 might be a good idea theoretically, but if that means pushing unsourced values to preferred rank, I don't think it's progress. As a baseline we also need to be sure of the modelling quality of our subclasses.
- 4 as for dates, following the example I took earlier, an improvement would be not to import dates when a more precise and sourced value is available, though I'm not sure if a bot can tell that a value is more precise than another without a qualifier to say so. However my main concern with Frettiebot is when it's edit warring with KrBot over autofixed values. This is what led to the László Szalay (Q1294312) or Ferdinand Friedensburg (Q895898) situation with member of parliament (Q486839). But Vojtěch said previously that keeping track of all autofix templates is difficult and I see no reason not to believe them. Still, that would be a really good thing.
- 5 is an absolute no.
- 6 frankly, I have seen a lot of passive-aggressive comments or outright mistrust of good faith in this thread and I think it isn't only on the bot operator side to improve their communication. --Jahl de Vautban (talk) 07:01, 26 November 2023 (UTC)
- A few notes on what @Andrew Gray and @Jahl de Vautban have written (thank you both for civil comments, appreciated). This discussion may create an impression that this bot's edits have a high revert rate. I don't think this is true - 99.99% of edits are OK. The largest share of the edit-warring is the aforementioned Autofix template, which develops over time and it is sometimes hard for us to keep pace with it - even though we try to do our best. This issue is relevant for all potential similar bots and I would like us to build a framework that provides easy access to Autofix commands to all bots in real-time. This is something we are actively thinking about. The last concern is the "add-only-once" policy. Adding our data only once is difficult (among other reasons) because the entries improve over time, and we want to make these updates appear in Wikidata. Therefore, a more complex system outlined by @MisterSynergy would be required. It is not impossible but I think that a better solution - more in touch with our existing policies - is to deprecate the (very few) outright-false statements which may appear in National Authority files. Vojtěch Dostál (talk) 07:44, 26 November 2023 (UTC)
- SIMPLE: Lymantria blocked Frettiebot "Until resolution of issues on Frettiebot's editing". The consensus that issues with the bot's editing require code changes (which were not forthcoming) is what caused the block and is the reason it's still in place. Code changes to address the issues haven't been made, and Frettie has expressed little interest in making any. There are no grounds for unblocking the bot. The end.
- 4: Misconstrues. There's a lack of consensus on those values.
- How long will we argue over how many angels can dance on the head of a pin? I think this conversation should be closed; at this rate, it'll be months to get anywhere close to resolution. Not worth it; seems more disruptive than productive. Unsubscribing. RudolfoMD (talk) 12:17, 26 November 2023 (UTC)
- It would be simple if you would use ranks, just as it is the norm in Wikidata and as it was suggested several times in this discussion. IMO the bot should immediately be unblocked without any requirements. —MisterSynergy (talk) 12:51, 26 November 2023 (UTC)
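Illustrative note on the "shadow database" idea raised in this thread: the sketch below shows one way a bot could make a good-faith attempt not to re-add a statement that it added before and that has since been removed. It is an assumption-laden illustration, not how Frettiebot or any existing bot works. The `EditLedger` class, the `plan_additions` function and the `(property, value)` tuple shape are hypothetical, and the ledger is a local SQLite table rather than anything derived from revision histories.

```python
import sqlite3
from typing import Iterable, List, Set, Tuple

Claim = Tuple[str, str]  # hypothetical shape: (property id, value)


class EditLedger:
    """Local record of statements this bot has already added."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS added ("
            "item TEXT, prop TEXT, value TEXT, "
            "PRIMARY KEY (item, prop, value))"
        )

    def was_added(self, item: str, prop: str, value: str) -> bool:
        row = self.db.execute(
            "SELECT 1 FROM added WHERE item = ? AND prop = ? AND value = ?",
            (item, prop, value),
        ).fetchone()
        return row is not None

    def record(self, item: str, prop: str, value: str) -> None:
        self.db.execute(
            "INSERT OR IGNORE INTO added VALUES (?, ?, ?)", (item, prop, value)
        )
        self.db.commit()


def plan_additions(
    item: str,
    current: Set[Claim],
    source: Iterable[Claim],
    ledger: EditLedger,
) -> List[Claim]:
    """Return the source claims that should be added to `item` now."""
    planned: List[Claim] = []
    for prop, value in source:
        if (prop, value) in current:
            continue  # already on the item, nothing to do
        if ledger.was_added(item, prop, value):
            continue  # added earlier and since removed: do not push it again
        planned.append((prop, value))
    return planned
```

Because the ledger is keyed on the value, an entry that improves in the external source still produces a new statement; only the exact statement that was already added once is held back. This does not address the wider concerns raised above about reliability across the many kinds of edits a bot can make.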
Boushaki family
I was trying to warn all wikipedias that Mustapha Ishak-Boushaki (Q25455134) has been creating articles for himself in more than a hundred wikipedias, when I realised this was only the tip of the iceberg. He has also been creating articles about his whole family, as you can see here. I've been posting the {{Delete}} template to dozens of articles like mg:Abdenour Boushaki. Is there a better way to fight against this kind of cross-wiki spam? Thanks in advance.
- More info: ca:Discussió:Mustapha Ishak Boushaki
- @Orland: Paucabot (talk) 14:48, 5 November 2023 (UTC)
- Thank you for fighting this war, @Paucabot. There might be some meta procedures fighting this, but I have not discovered them yet. It might however be a good idea to set up a central unit to describe and log the spamming and its deletion, like we did here and here. Then all local discussions can be pointed to one big description of the problem. Bw Orland (talk) 20:22, 5 November 2023 (UTC)
- Lately, I've been redirecting discussions here. Paucabot (talk) 20:53, 5 November 2023 (UTC)
Notability of one article subject with the name Boushaki
An innocent question: How is him having an article about himself in 93 wikis a problem if he meets notability criteria? (He is a full professor, after all). Jonathan Groß (talk) 20:59, 5 November 2023 (UTC)
- IMHO, he does not meet notability criteria: there are no independent sources covering his biography. All we can find are his published articles and news about him published by his own university. You can check the references yourself. And this is only talking about him: there are family members that are less notable than him but they are extremely overreferentiated, like sw:Mohamed Nassim Boushaki, sw:Feriel Boushaki, ha:Djilali Ishak Boushaki, es:Yahia Boushaki (político argelino) or sw:Khaled Boushaki. Others that I think that could be notable are es:Chahinez Boushaki or en:Sidi Boushaki, but I'm not entirely sure. Paucabot (talk) 21:11, 5 November 2023 (UTC)
- Well, @Jonathan Groß. It might seem like a "what's really the problem?" case. As one of those who are trying to stop interwiki spamming, some of my points are a) that these people are misusing Wikipedia to promote themselves. So there is first the Conflict of Interest question. And b) a huge number of IW is IMHO a false signal of significance. I list some cases at my English userpage. Like the 2009 case of the Italian actress who then had a wp-article in 43 languages, even though she had hardly played outside Italy. In comparison Roberto Benigni appeared in 34 and Isabella Rossellini in 19 languages. Bw Orland (talk) 22:10, 5 November 2023 (UTC)
- For this cross-wiki spam, at least users Shamilouv, Authentise, Touroukiyya, Iamovich, Soufiyoune, Tidjany, Thalayous, Zouaoui16, Lalitose, Futbalino, Bengalios, Malikose, Soufiyoune, Marinianse, Moscovas, Fatma0005, Mamillia, Dmytrouf, Amarusse, Lollita14, Soufiyya, Misticose, Mahayero, Boushakino, Versitioh, Houloumy, Maloubiou, Xylocopiya, Waloinia, Egoloun, Uppsalask, Tyfoulio, Djamiloub, Yaltabov, Xoulouj, Picosou, Tyrouly, Flanovoi, Zeaulov, Fixiouse, Buffalouse, Foulouha, Nigeriou, Rwandania, Congolya, Kuroundia, Bareedanou, Wakomba, Shonobua, Faliotas, Gualiofa, Sandtown, Ergoutch, Oulumins, Tyliboup, Hukumoya, Guluping, Quanqun, Noukolp, Dounida, Darjio, Fulioju, Uluiona, Pacifishto, Khantoush, Houyouf, Valuatio, Juliouto, Rezki213, Khostostan, Bolchotin, Bengaloure, Astroubot, Ulimop, Hoggariou, Mandelaya, Giousseppe, Luftanio, Texassiou, Hamidofic, Khotopov, Baloufink, Viloupin, Koulibalou, Chyprious, Suzanoa, Moulania, Fayyoumy, Relativition, Balouristov, Familinou, Lyndaouva, Mandelaou, Irishmania, Maninanos, Hollandstug, Merchicha, Theses1234, Luiggini, Noor1972, Soufismo, Acildz, Michellines and Robocopat are globally blocked. Paucabot (talk) 22:48, 5 November 2023 (UTC)
- Some of the discussions:
- AR: ar:تصنيف:صانعو دمى جوارب/Zouaoui16 + ar:ويكيبيديا:طلب_تدقيق_مستخدم/أرشيف/2022/أبريل#Zouaoui16
- AZ: az:Müzakirə:İbrahim Buşaki
- AST: ast:Alderique:Mustapha Ishak Boushaki
- BCL: bcl:Olay:Mustapha Ishak Boushaki
- CA: ca:Discussió:Mustapha Ishak Boushaki + ca:Tema:Xsdne5ugfbtjh1no
- COMMONS: commons:Commons:Deletion requests/Creator:Mustapha Ishak Boushaki
- DE: de:Diskussion:Mustapha Ishak-Boushaki + de:Diskussion:Mohamed Belhocine + de:Benutzer Diskussion:Paucabot + de:Wikipedia:Administratoren/Anfragen/Archiv/2023/November#Cross-wiki spam + de:Wikipedia:Löschkandidaten/7. November 2023#Mustapha Ishak-Boushaki
- EN: en:Wikipedia:Articles for deletion/Mustapha Ishak-Boushaki + en:Wikipedia:Administrators' noticeboard/3RRArchive336#User:Zouaoui16 reported by User:Xyaena (Result: Blocked 1 week) + en:Talk:Mustapha Ishak Boushaki + en:Talk:Brahim Boushaki + en:Wikipedia:Administrators' noticeboard/IncidentArchive1142#Boushaki family cross-wiki spam + en:Wikipedia:Sockpuppet investigations/Zouaoui16
- ES: es:Discusión:Brahim Boushaki + es:Wikipedia:Tablón de anuncios de los bibliotecarios/Portal/Archivo/Miscelánea/Actual#Cross-wiki spam
- EU: eu:Eztabaida:Mustapha Ishak Boushaki
- EXT: ext:Caraba:Mustapha Ishak Boushaki
- FA: fa:بحث:مصطفی اسحاق بوسحاقی
- FR: fr:Wikipédia:Bulletin des administrateurs/Juin 2019#Palette prières et mosquées + fr:Wikipédia:Faux-nez/Melha
- GN: gn:Puruhára myangekõi:Hugo.arg#Cross-wiki spam
- GV: gv:Resooney:Mustapha Ishak-Boushaki
- HA: ha:Talk:Ali Boushaki#Cross-wiki spam
- ID: id:Pembicaraan:Mustapha Ishak-Boushaki/Arsip#Cross-wiki spam
- IT: it:Wikipedia:Pagine da cancellare/Mustapha Ishak Boushaki
- KAB: kab:Amyannan umsqedac:Johannnes89
- KSH: ksh:Klaaf:Mustapha Ishak Boushaki
- LV: lv:Diskusija:Džīlāli Ishāks Būshāki + lv:Diskusija:Mustafa Ishāks Būshaki
- META: m:User talk:علاء#crosswiki spam? + m:Talk:Steward requests/Global#Boushaki family cross-wiki spam + m:Talk:Wikiproject:Antispam#Boushaki family + m:Talk:Steward requests/Miscellaneous#Cross-wiki spam Boushaki family
- MT: mt:Diskussjoni:Mustapha Ishak Boushaki
- MZN: mzn:گپ:مصطفی اسحاق بواسحاقی
- PCD: pcd:Discussion:Mustapha Ishak Boushaki
- PT: pt:Usuário(a) Discussão:Zouaoui16 + pt:Wikipédia:Pedidos a verificadores/Arquivo/2021/05#Lollitano
- RU: ru:Обсуждение:Мустафа Исхак Бусхаки + ru:Википедия:К_удалению/13_ноября_2023#c-QBA-II-bot-20231113060600-Бусхаки,_Мухаммад_Аль-Сагир
- SW: sw:Majadiliano ya mtumiaji:Paucabot
- TG: tg:Баҳс:Мустафо Исҳоқ Бушҳоқӣ
- UK: uk:Вікіпедія:Кнайпа (адміністрування)#c-Seva Seva-20231106211900-Крос-вікі спам і проблема бота Paucabot
- UR: ur:تبادلۂ خیال:ابراہیم بوسحاقى
- ZH: zh:Talk:穆斯塔法·伊沙克·布塞哈基 Paucabot (talk) 07:16, 6 November 2023 (UTC)
- Wikidata items with sitelinks: Ali Boushaki (Q28936494), Boushaki universe expansion tool (Q106525765), Mustapha Ishak-Boushaki (Q25455134), Djilali Ishak Boushaki (Q106376074), Q106509002, Mohamed Seghir Boushaki (Q24205953), Q112121520, Mohamed Seghir Boushaki (Q24205953), Amine Iben El Boushaki (Q25455322), Brahim Boushaki (Q25455059), Rabah Rahmoune (Q111521941), Yahia Boushaki (Q42331664), Khaled Boushaki (Q105953875), Sidi Boushaki (Q19629001), Shahnez Boushaki (Q106522361), Toufik Boushaki (Q28756434), Mohamed Nassim Boushaki (Q106299615), Abderahmane Boushaki (Q45107526), Bouzid Boushaki (Q111263400), Category:Boushaki family (Q105974898), Q112115658, Boushaki (Q18589908), Abdenour Boushaki (Q106313060), Brahim Boushaki Library (Q111904760), Yahia Boushaki Boulevard (Q65716406), Yahia Boushaki Tramway Station (Q111479348), 1920 Algerian Political Rights Petition (Q110233804), Zawiyet Sidi Boushaki (Q43033365), Mohamed Belhocine (Q19606689) and Boushaki (Q121781793)
- Wikidata items with no sitelinks: Q122239348, Zakari Ishak-Boushaki (Q122239503), Q112705217, Cheikh Mohamed Boushaki (Q111442083), Dijlali Boushaki (Q106451053), Q112623711, Q112624164, Q112630843, Fayçal Rahmoune (Q106418343), Adel Rahmoune (Q103320382), Q122227768, Rabah Boushaki (Q122920425) and Q122227804. Paucabot (talk) 13:31, 6 November 2023 (UTC)
- The main article, Mustapha Ishak-Boushaki (Q25455134), has been deleted in bgwiki, gcrwiki, xmfwiki, gvwiki, wawiki, pswiki, huwiki, bswiki, rowiki, kshwiki, srwiki, ukwiki, idwiki, afwiki, eswiki, uzwiki, lgwiki, nahwiki, azwiki, extwiki, euwiki, ptwikinews, enwikiquote, bnwiki, hifwiki, hiwiki, eswikinews, diqwiki, fiu_vrowiki, arwiki, gawiki, ruwikinews, cawiki, astwiki, nywiki, cywiki, mswiki, dewikinews, enwikinews, bat_smgwiki, nlwiki, enwikiversity, elwiki, sewiki, szlwiki, hsbwiki, plwiki, fiwiki, svwiki (twice), nowiki, lijwiki, tkwiki, arzwiki, dawiki, bewiki, jawiki, dewiki (twice), ukwiki, itwiki (twice), ptwiki, frwiki (three times), trwiki (twice), enwiki (twice), ruwiki and zhwiki. Paucabot (talk) 13:51, 6 November 2023 (UTC)
- I just locked another bunch of their socks! The projects need to be notified in some way. I have the same problem with other cross-wiki spammers, who try to create pages everywhere about two non-notable people (Q123257513 and Q123140531). It's difficult to notify everyone about this kind of spam! Superpes15 (talk) 17:15, 6 November 2023 (UTC)
- Irish Wikipedia (ga) sysop here. @Paucabot: - may I recommend that local sysops SALT the articles, per en:WP:SALT, to prevent their re-creation. Enough is enough - Alison (talk) 08:07, 8 November 2023 (UTC)
- I'm logging here my comment in the de:wp deletion discussion. An excerpt: This is a man/family who has spammed the whole Wikipedia community to a level I've never seen before. He has submitted personal and unsourced information about himself and his family to Wikidata as if Wikidata were Tinder and LinkedIn. Bw, Orland (talk) 11:42, 10 November 2023 (UTC)
- Granted that he's spamming himself, but he is genuinely a Fellow of the American Association for the Advancement of Science, which could justify claims of notability? DS (talk) 18:50, 11 November 2023 (UTC)
- It seems he is, indeed. But I don't think that distinction (which is given to more than five hundred people every year) grants him notability. I haven't seen any reliable source covering his biography significantly. All the references are his own works or news from his own university. Not even in https://www.aaas.org/fellows/historic is there any kind of biography of him. And there is even less evidence for his nephew Djilali Ishak Boushaki (Q106376074), cousin Bouzid Boushaki (Q111263400) or the majority of his relatives. Paucabot (talk) 19:57, 11 November 2023 (UTC)
- And there are even items of his two sons, Q122239348 and Zakari Ishak-Boushaki (Q122239503)... Paucabot (talk) 20:03, 11 November 2023 (UTC)
- I find it difficult to delete ckb:Mustapha_Ishak_Boushaki; I see him as a notable professor. Sakura emad (talk) 13:02, 12 November 2023 (UTC)
- @Sakura emad: This article only has three sources: two of them are from his own university and the third one only lists his name and university (Mustapha Ishak-Boushaki The University of Texas at Dallas). I think it's not nearly enough to be considered notable. There's not any reliable source (of course, external) covering his biography. Paucabot (talk) 13:12, 12 November 2023 (UTC)
- @Paucabot I anticipated you would mention that, but English Wikipedia clearly indicates that this professor is eligible for an independent article. When I mentioned finding it difficult to delete it, I consulted English Wikipedia sources, and it appears that the subject is notable enough to warrant staying on the platform rather than being deleted.
Well, unless you have any specific objections to English Wikipedia itself, I'm afraid the article will remain as it is. Sakura emad (talk) 16:23, 12 November 2023 (UTC)
- @Sakura emad: For me, the English article is a clear case of curriculumesque spam with the majority of references being his papers and news from his university, but I will not decide what to do at :en nor in :ckb. I only deleted this article in Catalan Wikipedia. A hundred other wikipedias have already deleted this article but some have decided not to do it. I can only inform all of them about it. And that's what I've done. Thanks anyway, Paucabot (talk) 16:47, 12 November 2023 (UTC)
- The English Wikipedia indicates no such thing. Once real editors got at it — indeed once the article subject xyrself became aware of it — a lot of things written by the creator and single contributor got edited out (by both the article subject and editors of long-standing) and it became very obvious that a lot of it was sourced to interpretations of primary source materials that were by the subject not about the subject. I suggest that you be thorough in your source reviews. English Wikipedia editors were, and the article turns out to have quite a lot of unsupported material. Uncle G (talk) 08:31, 14 November 2023 (UTC)
A series of "new" users
Some weeks ago, on October 7th, I rushed through some of the editions, leaving {{delete|Machine translated crosswiki spam}} at many of them. Today, someone in the Boushaki family, eager to keep this massive self-promoting spam operation running, has created a new user, in order to remove my deletion notices, with the false claim Removing spam vandalism of a banned user. Look at
- Hupolio (talk • contribs • logs) at https://guc.toolforge.org/?by=date&user=Hupolio
Hupolio also claims that Paucabot is a banned user. To @Jonathan Groß: In my experience, it is quite characteristic for these spamming operations that they are not arguing for notability, but are responding with lies and accusations against those of us standing up for the sake of Wikipedia's reliability. Bw Orland (talk) 10:26, 6 November 2023 (UTC)
- New users in this business today are Lalamoi (talk • contribs • logs) and Kontaktou (talk • contribs • logs). Orland (talk) 12:43, 6 November 2023 (UTC)
- Also WikipsBot (talk • contribs • logs). Paucabot (talk) 12:49, 6 November 2023 (UTC)
- Also Ljubjiano (talk • contribs • logs). This is really what we in Norway would call a Duracell bunny. Bw Orland (talk) 13:44, 6 November 2023 (UTC)
- And we have Ulmanous (talk • contribs • logs). Is anyone counting? Bw Orland (talk) 14:38, 6 November 2023 (UTC)
- Now the active one is JeanPaul02. Paucabot (talk) 11:48, 7 November 2023 (UTC)
- There is now a simulated bot, BideBot, making some strange edits on some articles: Special:CentralAuth/BideBot. Paucabot (talk) 09:55, 26 November 2023 (UTC)
The translation method
When I looked at the source code in the Faroese edition of fo:Mustapha Ishak Boushaki, I discovered something funny, strange and revealing: Boushaki seems to copy his article structure from Einstein biographies, and in this edition he left
{{DEFAULTSORT:Einstein, Albert}}
unaltered when he published. I've seen something similar some years ago, in en:Wikipedia:Articles for deletion/Curdy where an author used J.K.Rowling biographies as his structure. Bw Orland (talk) 12:55, 7 November 2023 (UTC)
Attribution
Have a care with attributing this to a named person. See w:en:Project:Administrators' noticeboard/Incidents#Boushaki family cross-wiki spam for what turns out to be the case upon deeper investigation. Uncle G (talk) 03:14, 12 November 2023 (UTC)
More cases
Next is Joao Grimaldo (Q108153171). It went from 16 interwikis in September to the 115 it has now. All the major edits were done by Sonia197881. The articles must be machine-translated, like this one: eu:Joao Grimaldo. Paucabot (talk) 09:38, 19 November 2023 (UTC)
- And I found yet another one: Percy Liza (Q107674650), which has 81 interwikis. Edits were made by different users: Abel2001, Club Sporting Cristal, Awindy1712, Genio2022, MrKDunleavy, LiviaQEZ, Augusto Martínez Rimarachín, Palabra de Gol, PERU2022 and Roberto2043. Both cases seem to be related, as the players play for the same team. Paucabot (talk) 09:48, 19 November 2023 (UTC)
- Another case, that seems unrelated to the previous two: Torsten Haß (Q108493269) (103 sitelinks now). Paucabot (talk) 09:55, 19 November 2023 (UTC)
- Hello, @Paucabot. Obviously spamming. And not to be recommended. But both Percy Liza (Q107674650) and Joao Grimaldo (Q108153171) are playing football at a level (national team and winning team in their own country) that makes them harder to delete. At home on no:wp we have a well-established policy about football player notability, and both of them are within it.
- When it comes to Torsten Haß (Q108493269), on the other hand, there seems to be an interesting discussion at de:Diskussion:Torsten_Haß: most of his books seem to be self-published, and the Wikipedia entries are thus blatant vanity. Bw Orland (talk) 21:14, 20 November 2023 (UTC)
- Of course. The two football players should not be deleted if they have good translations, which I don't know if it's the case.
- Torsten Haß (Q108493269): the Catalan article is a very bad translation and there's not one single source that covers his biography in detail. I suppose it will be speedily deleted. Paucabot (talk) 06:53, 21 November 2023 (UTC)
- While, as in the case above, the subjects can be notable, so the biography cannot be deleted on various projects, we have to counter the cross-wiki spamming and the machine translation. We cannot accept this behavior. Superpes15 (talk) 10:14, 21 November 2023 (UTC)
Clarify Wikidata:Account creators abilities and where to request
For now, Wikidata:Account creators says this user group has override-antispoof and tboverride-account (this one should be tboverride, I think?), but according to Special:ListGroupRights#accountcreator, they don't have these two rights. Second, we don't have this user group in Wikidata:Requests for permissions, maybe we should add it? (Or maybe Wikidata:Bureaucrats' noticeboard)--S8321414 (talk) 11:51, 15 November 2023 (UTC)
- How often is this needed? Wouldn't it be enough for admins to just do those things? ChristianKl ❪✉❫ 19:16, 15 November 2023 (UTC)
- I have no idea how often this is needed, but you can check its talk page: someone asked for this permission but there is nowhere to request it (though it is a 2021 request...). So I think clarifying how this permission can be requested would be a good idea.--S8321414 (talk) 23:49, 15 November 2023 (UTC)
- I believe this is used primarily for outreach events, like hackathons or educational courses. People need to be able to sign up for accounts without an administrator necessarily being on hand. This right is often given temporarily. Bovlb (talk) 00:37, 16 November 2023 (UTC)
- Should be, but we don't have related information on that page.--S8321414 (talk) 11:26, 17 November 2023 (UTC)
- Can you write an RFC for adding a page to request the user group (and a description about when the right should be granted), along with a request for the user group to have access to the two rights? ChristianKl ❪✉❫ 14:51, 19 November 2023 (UTC)
- Done--S8321414 (talk) 12:39, 20 November 2023 (UTC)
- @S8321414: Currently that page lists a series of questions. It would be better if you make a proposal that people can then either support or reject. Without a clear proposal, it's harder for it to move forward. ChristianKl ❪✉❫ 13:00, 23 November 2023 (UTC)
- Added :)--S8321414 (talk) 13:41, 23 November 2023 (UTC)
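The mismatch raised at the top of this thread (a documentation page listing rights that Special:ListGroupRights does not show) can be checked mechanically. Below is a minimal sketch, assuming the JSON shape returned by the MediaWiki API query `action=query&meta=siteinfo&siprop=usergroups`. The `missing_rights` helper and the sample data are hypothetical illustrations, not the live configuration of Wikidata.

```python
from typing import Any


def missing_rights(siteinfo: dict[str, Any], group: str, expected: list[str]) -> list[str]:
    """Return the expected rights that the given user group does not hold."""
    for entry in siteinfo.get("query", {}).get("usergroups", []):
        if entry.get("name") == group:
            held = set(entry.get("rights", []))
            return [right for right in expected if right not in held]
    # Group not present in the response: every expected right is missing.
    return list(expected)


# Hypothetical sample response, for illustration only.
sample = {
    "query": {
        "usergroups": [
            {"name": "accountcreator", "rights": ["noratelimit"]},
            {"name": "bot", "rights": ["bot", "sboverride"]},
        ]
    }
}

result = missing_rights(sample, "accountcreator", ["override-antispoof", "tboverride-account"])
print(result)
```

With real data, a non-empty result would mean the documentation page and the actual group configuration disagree, which is the situation described above.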
Is anyone happy that it would be some human (probably hardly notable)? Infovarius (talk) 15:52, 15 November 2023 (UTC)
- CC @Tol as item creator. Bovlb (talk) 16:43, 15 November 2023 (UTC)
- I was creating items for Proceedings for the 38th Annual Symposium on Telescope Science (Q123411364) articles and saw that this number was coming up; I thought I could time some item creations to use this item number for my (indeed barely notable) self. I'm still doing author disambiguation, so not everything is nicely linked yet, but this is for Classification of Washington Double Star Systems Using Escape Velocities Based on Measurements from Gaia DR2 (Q123457065) author (P50) Q123456789. I think this is the Wikidata equivalent of a vanity license plate — fun to have but completely useless, haha. Tol (talk | contribs) @ 17:35, 15 November 2023 (UTC)
- I'm not at all keen on the practice of creating bare items in order to reserve interesting Q numbers, but I see no justification here to repurpose this item. Please flesh it out soon. See also Wikidata:Project_chat/Archive/2023/10#Should_Q123456789_be_special?. Bovlb (talk) 17:43, 15 November 2023 (UTC)
- I'm in the process of getting author items linked and data added for all of the articles I created items for. Tol (talk | contribs) @ 17:55, 15 November 2023 (UTC)
- I frankly find the reasoning behind this creation highly questionable. It’s purely vanity. —-Jahl de Vautban (talk) 18:39, 15 November 2023 (UTC)
- Do we regard proceedings from conference notable? If so, I'll create hundreds of them for our institution. --Infovarius (talk) 18:49, 15 November 2023 (UTC)
- Hundreds of items is very little in the context of Wikidata. ChristianKl ❪✉❫ 19:02, 15 November 2023 (UTC)
- @Infovarius, Wikidata currently has more than 6000 proceedings (Q1143604) items (search) and more than 13000 conference paper (Q23927052) items (search). I would think that proceedings from reasonably notable conferences would be themselves notable under criterion 2. Proceedings for the Symposium on Telescope Science (Q123299064) is indexed on Astrophysics Data System (Q752099), so I would consider it notable. Tol (talk | contribs) @ 19:02, 15 November 2023 (UTC)
- Is there a reason not to delete this item? It‘s repurposed and doesn’t even seem to have any particular meaning in its first version. --Emu (talk) 19:22, 15 November 2023 (UTC)
- I would support deletion - Nikki (talk) 19:38, 15 November 2023 (UTC)
- It should not take any editor over an hour to add the first claim and over two hours to add the first label. Tol appears to have created dozens of such long-empty items in the span of 15s, presumably with the specific intention of gaining a prestige number. All of these items could easily have been deleted as empty. Now they've been filled in (or merged), notability is a question for RFD. While this is clearly bad editing practice on Tol's part, I don't think we need to bicker over this pointless "prize", and we can use our normal processes for notability and deletion. Bovlb (talk) 19:41, 15 November 2023 (UTC)
- The item had no meaning at first (plans purely in the minds of users aren’t relevant to me), then became a geographical object and then a specific human. That‘s repurposing and that’s grounds for deletion even if one or both meanings are notable. --Emu (talk) 13:54, 16 November 2023 (UTC)
To me, it looks like this was an item about a mountain pass (before that it was empty, no statements at all), which was repurposed to be about a human. We do not repurpose items, so I would say either we revert it to the mountain pass, or we delete it. Jean-Fred (talk) 07:23, 16 November 2023 (UTC)
- The repurposing was the other direction. Tol (talk | contribs) @ 11:53, 16 November 2023 (UTC)
- My take (as I said in my previous message) was that the item was empty before being a mountain pass: there were no statements at all, no labels. The English description does not count in my book. Jean-Fred (talk) 16:12, 16 November 2023 (UTC)
- When @OddlyAngled repurposed the item, it was 11 minutes after item creation. While it's not good editing practice to leave empty items sitting for even that long, this is well within the normal grace period we allow editors for fixing empty items. (I usually allow at least an hour after last edit before deleting.) While the item was empty, there was a description, and the repurposing was clearly not justified. It took Tol 36 minutes to revert the repurposing and another 53 minutes to add the first claim. Many admins would have deleted the item if they had come across it in that period, but that's not where we are now.
- There is no case here for forced repurposing. This is the wrong venue for a deletion discussion. Bovlb (talk) 17:29, 16 November 2023 (UTC)
- it looked like there were quite a lot of empty, otherwise useless, items created in an attempt to id squat. then they all sat empty for quite a while. it wasn't a single item. OddlyAngled (talk) 17:36, 16 November 2023 (UTC)
- clearly they were not intended to be used, nor valuable - many have already been replaced with forwards e.g. Q123456778 OddlyAngled (talk) 18:01, 16 November 2023 (UTC)
- There is no question that Tol was messing about here to "id squat", creating many empty items and duplicates. We see that a lot from new users, but Tol is experienced enough to know better. I can understand how that might cause annoyance, especially in anyone who had a proposal for this prestige number that was not mere self-promotion. We should not, however, be ignoring our normal processes in order to inflict some sort of punishment. The disruption is done and is (hopefully) unlikely to be repeated. We have a process to debate notability and delete items. We do not condone repurposing. Bovlb (talk) 18:33, 16 November 2023 (UTC)
- fair enough but deleting the item after it is only a day old seems not like punishment. a new item can easily be created. OddlyAngled (talk) 18:43, 16 November 2023 (UTC)
- It sounds like a bit of a punishment to me. The item has clear purpose and notability and there's no actual reason to delete it, as far as I can see. Even if it was created under such strange and unsound circumstances. Vojtěch Dostál (talk) 19:26, 16 November 2023 (UTC)
- I apologise for my conduct with regard to this incident (both with regards to 'squatting' on an interesting number, and not filling out the items quickly). I did not anticipate that this would cause disruption (as it has evidently done), and will not repeat either behaviour. Tol (talk | contribs) @ 19:43, 16 November 2023 (UTC)
- "This is the wrong venue for a deletion discussion." I’m not so sure about that. WD:RFD is generally used for notability discussions, not for other reasons for deletion like repurposing. It doesn’t even seem to work for “deletion because of sorted-out conflation requests” … --Emu (talk) 18:49, 16 November 2023 (UTC)
- We don't normally delete items for repurposing. Are you saying this entity is notable, but we should delete the item anyway? Bovlb (talk) 19:17, 16 November 2023 (UTC)
- We do when repurposing creates a situation similar to conflation, i. e. when it’s not really clear what the item is about. Granted, this doesn’t happen often and generally repurposing is just quietly reverted unless much time has passed. But I do think this situation is comparable although in a fraction of the usual time scale. --Emu (talk) 21:34, 16 November 2023 (UTC)
- @Bovlb: "We don't normally delete items for repurposing." Yes, we do. There were about 200 items deleted earlier this year as those items were being repurposed. 132.234.228.214 02:31, 17 November 2023 (UTC)
I handled Wikidata:Requests_for_deletions#Q123456789 and restored the original usage. Multichill (talk) 21:23, 16 November 2023 (UTC)
- In the interests of clear communication, it appears that Multichill has restored the second usage, the historical mountain pass, and not the original usage which, while technically empty, was nevertheless clearly inconsistent with that. Bovlb (talk) 22:02, 16 November 2023 (UTC)
- Please don’t be mad, I now restored the original non-meaning of the item. --Emu (talk) 22:22, 16 November 2023 (UTC)
- And I have restored the last good version, which is the one about which this discussion was opened. It's not for anyone - admin or otherwise - to pre-empt this community discussion. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:36, 16 November 2023 (UTC)
What a train wreck of a situation. We are now wasting our energy by having lengthy, totally avoidable discussions, and we are close to an edit war situation regarding "the right state" of the item in question. Just because one user couldn't leave a particular Q-ID alone by toying around with Wikidata—and accidentally tanking their own reputation. Folks, leave these "special" Q-IDs alone; this is not a playground, and it only causes trouble for everyone. Since we cannot agree what that Q-ID refers to anyways, it would be the best way out of this mess to delete the item page as a conflation. —MisterSynergy (talk) 23:19, 16 November 2023 (UTC)
- The door seems wide open for that solution: Special:Diff/2012596665 --Emu (talk) 23:52, 16 November 2023 (UTC)
- That seems like the best way forward.
- I note that Multichill has now restored the item to his preferred version and fully protected it in that version. Bovlb (talk) 05:56, 17 November 2023 (UTC)
- Disgraceful abuse of admin privileges. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:01, 18 November 2023 (UTC)
- At your service! It's always nice to have a confirmation I took the right decision. Multichill (talk) 18:52, 19 November 2023 (UTC)
- I’m not so sure about that, to be honest. I was about to unprotect the item but I didn’t want to start a wheel war. --Emu (talk) 21:26, 19 November 2023 (UTC)
- May I suggest to delete the item with the log message "This item caused more problems than I could count"? Thus the item creator gets to have a funny non-item. Infrastruktur (talk) 22:07, 18 November 2023 (UTC)
- simply "conflation" is enough, because this is exactly what the problem is here. no need to further the drama with emotional edit summaries for administrative actions. —MisterSynergy (talk) 22:26, 18 November 2023 (UTC)
Time for a useless comment, sorry. I also cared about that special ID and dreamed of 'catching' it, but to succeed I would have had to stay at my work PC until very late, which was not worth it. In the end, I created Head Count (Q123456889) ('count', lol), getting just one digit wrong. That's it.
To be serious, the edit history clearly shows a conflict of purpose, which leads to deletion of the item. I doubt that the community is going to use every possibility to keep it, like removal of revisions or such. It's not a catastrophe; both involved users should just draw certain conclusions. --Wolverène (talk) 06:02, 17 November 2023 (UTC)
- None of the reasons for WD:REVDEL seem to be met --Emu (talk) 08:38, 17 November 2023 (UTC)
- I should point out that, because it was deleted for procedural reasons, we now have a redlink on the front page. DS (talk) 01:19, 22 November 2023 (UTC)
Merge requested
Hi, I frequently come across pairs of Wikidata items that are obviously describing the same item/taxon/concept etc. Today, I added the merge tool to my user profile to try to merge an example pair (Q1191360 and Q25796510) and it failed. Why did it fail and can these items be merged? Loopy30 (talk) 21:26, 16 November 2023 (UTC)
- You can't merge items that both have sitelinks to the same project. Sjoerd de Bruin (talk) 22:10, 16 November 2023 (UTC)
- Wouldn't you want to though? Especially those items that are "in use"? What is the way ahead to resolve this? Should duplicate sitelinks be deleted so that the Wikidata items can be merged or is it OK to just accept that items are sometimes duplicated and just move on from there? Loopy30 (talk) 22:21, 16 November 2023 (UTC)
- I'm not an expert, but it appears to be both consensus and practice here not to merge taxons that have distinct formal names, regardless of any current scientific views about their distinctness. (We should probably have a policy document to point to regarding this.) You might consider using permanent duplicated item (P2959) or said to be the same as (P460) to link the two items. Bovlb (talk) 22:41, 16 November 2023 (UTC)
- Thanks Bovlb, I'm not looking at synonyms or items with distinct formal names though. Many times they are identical (as in the example given above), other times they are just different ways to describe the same concept (eg. Foo is a genus of beetles, and Foo is a genus of insects). Sometimes there are multiple statements and identifiers on one ID, and none on the other (or just a Google Knowledge graph ID#). Loopy30 (talk) 22:51, 16 November 2023 (UTC)
- Again, I am not an expert in this field, but notwithstanding the current English label, Hippopotamus lemerlei (Q1191360) appears to be about "Hippopotamus lemerlei", while Malagasy hippopotamus (Q25796510) is about "Malagasy hippopotamus". The ENWP article explains that "Lemerle's dwarf hippopotamus (Hippopotamus lemerlei) is an extinct species of Malagasy hippopotamus." Bovlb (talk) 22:58, 16 November 2023 (UTC)
- Ah, you are correct. Perhaps the problem is that Q1191360 should be relabeled to Hippopotamus lemerlei to prevent any further ambiguity. Cheers, Loopy30 (talk) 23:32, 16 November 2023 (UTC)
- The EnWiki article for Malagasy hippopotamus (Q25796510) suggests that it's a term that means multiple species: Hippopotamus lemerlei, Hippopotamus laloumena and Hippopotamus madagascariensis. That's also in line with it being organisms known by a particular common name (Q55983715). ChristianKl ❪✉❫ 14:37, 19 November 2023 (UTC)
- Yes, Bovlb has already pointed out that it was a poor example and it has now been changed. However, what about my original query for terms that are identical (eg. Foo is a genus of beetles, and Foo is a genus of insects), each with its own separate Wikidata entry? I will go back and look for an actual example here. Loopy30 (talk) 21:45, 22 November 2023 (UTC)
- New example: Murid betaherpesvirus 3 (Q70641768) and Murid betaherpesvirus 3 (Q85787101). How do I merge (or ask to merge) these items? Loopy30 (talk) 00:40, 23 November 2023 (UTC)
- @Loopy30 See Help:Merge. JAn Dudík (talk) 09:07, 23 November 2023 (UTC)
- Thanks JAn, the automated merge tool worked on this pair (it didn't on the original example I gave, for the reason mentioned by Sjoerd de Bruin) so I will try to use this in the future as I come across matching pairs of Wikidata items. Loopy30 (talk) 13:28, 23 November 2023 (UTC)
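The rule Sjoerd de Bruin describes above (two items cannot be merged when both have a sitelink to the same project) can be sketched as a simple pre-check. This is only an illustration: the `conflicting_sites` helper is hypothetical, the sitelinks are modelled as plain site-to-title mappings, and the "Foo" titles are placeholders rather than real data.

```python
def conflicting_sites(sitelinks_a: dict[str, str], sitelinks_b: dict[str, str]) -> list[str]:
    """Return the sites where both items link to different pages, which would block a merge."""
    shared = sitelinks_a.keys() & sitelinks_b.keys()
    return sorted(site for site in shared if sitelinks_a[site] != sitelinks_b[site])


# Placeholder sitelinks for two hypothetical items.
item_a = {"enwiki": "Foo (beetle)", "dewiki": "Foo"}
item_b = {"enwiki": "Foo (insect)", "frwiki": "Foo"}

blockers = conflicting_sites(item_a, item_b)
print(blockers)  # a non-empty list means the merge would be refused
```

An empty result, as with the Murid betaherpesvirus 3 pair that merged successfully, means sitelinks are no obstacle; a non-empty one points to the pages that need sorting out first (or to the items being genuinely distinct).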
Bypassing the Global Blacklist
Could someone explain to me why Reinheitsgebot has the ability to bypass the global blacklist? Trade (talk) 00:33, 19 November 2023 (UTC)
- @Trade per Special:ListGroupRights#bot, bots have the sboverride right, so they can bypass the spam blacklists. —Mdaniels5757 (talk • contribs) 01:25, 19 November 2023 (UTC)
- Is that really a good idea? Trade (talk) 01:37, 19 November 2023 (UTC)
- Presumably it's done for efficiency reasons. The global blacklist is 14500 entries long, and the bots often do high-volume edits. Infrastruktur (talk) 07:19, 19 November 2023 (UTC)
- It is a quite new feature. See phab:T313107 for the reasoning.
- It could backfire with issues like phab:T350480, but when phab:T337431 and phab:T349261 are done, we can set up a guideline for bots that they must not import links from the blacklist. --Matěj Suchánek (talk) 16:25, 20 November 2023 (UTC)
Update Kazakh language (kk) translations
Hello, please update:
'kk': 'улгі Уикимедиа'
to 'kk': 'Уикимедиа үлгісі'
Thanks in advance. Ұлы Тұран (talk) 10:22, 19 November 2023 (UTC)
- @Ұлы Тұран: Where? --Matěj Suchánek (talk) 14:58, 20 November 2023 (UTC)
- Description of Wikidata templates: 'kk': 'улгі Уикимедиа' to 'kk': 'Уикимедиа үлгісі' needs to be changed. Ұлы Тұран (talk) 19:06, 21 November 2023 (UTC)
WikiHooku - An application to visualize Wikipedia content
Hi, I'm developing a tool that aims to compare entries from Wikipedia. It is at a very early stage and so far it can only compare people's lifetimes. I'm trying to find a place to give it visibility so people can find it and use it. This will help to improve the tool, and more ideas may come up to make it grow in features. I've found this page: https://www.wikidata.org/wiki/Wikidata:Tools/Visualize_data that is related to Wikidata tools, but I'm not sure if mine is more related to Wikidata or Wikipedia. In case it is more related to Wikipedia, is there any place to publicize it?
The tool is called WikiHooku and can be found here: https://www.wikihooku.org/
It uses the Wikipedia API: https://www.wikipedia.org/w/api.php
Can anybody please shed some light on this?
Thank you very much in advance. Xcarol (talk) 21:30, 19 November 2023 (UTC)
Removal of pre-existing (English) names
Hi all, @Titus Gold has recently removed the previous labels, and even any trace of the pre-existing (usually English) names, for multiple Wikidata items relating to Wales, replacing them with the Welsh ones almost universally, even under English. Per Help:Label (although that seems to be a proposal/draft?), the label should be the "common name", especially for the English label, but they seem to be basing it either on recent events ("official Welsh names being only used from now", which is not necessarily the common name) or just replacing the English name entirely with the Welsh one, even on settlements. I have since reverted them at Red Wharf Bay (Q3405755), Bala Lake (Q1335466) and Bull Bay (Q4996547), but it may apply to many more entries or grow. The previous names weren't even added as aliases, so it is now impossible to search with the previous name. I would also note they seem to be moving a lot of articles on various (non-English) wikis.
Ping two Wales-based Wikidata editors who may be more accustomed to the issues, @Ham II and @Jason.nlw. Note: I am not opposed to using Welsh names overall, and support such if they become the common name, but not yet, surely? Any help appreciated. Apologies if this is the incorrect place. Even if a switch of the names is supported, surely keeping the old in alias would be helpful to users? DankJae (talk) 23:16, 19 November 2023 (UTC)
- @DankJae: I've added {{Q}} templates to your first paragraph for ease of reading; I hope you don't mind. Ham II (talk) 06:13, 20 November 2023 (UTC)
- I have some sympathy for the English label for Bala Lake (Q1335466) being Llyn Tegid with alias Lake Bala, but it's certainly wrong for the British English to only be the former. Vicarage (talk) 23:31, 19 November 2023 (UTC)
- Yes I'm not aware of any current finished guidance on this on wikidata.
- The reason for the lakes and waterfalls in Eryri (Snowdonia) change was that Welsh names are now the official names for lakes there as was recently in the news and also now seem to be commonly used.
- For any other pages not in Eryri, the official English name should be left under "British English" e.g I left Bull Bay as it was for "British English". (Thanks to DankJae for correcting Traeth Coch/Red Wharf Bay. Apologies for any similar mistakes.).
- I agree we should be using Welsh names overall. I also think we should be using the Welsh names only, for lakes and waterfalls in Eryri (Snowdonia) on Wikidata as they are now the only official names and seem to be more commonly used also.
- Of course, English names should remain the first name on Wikidata for places where bilingual names are both official (outside of Eryri). I would also support keeping English names as aliases if we are to establish a precedent.
- Thanks Titus Gold (talk) 02:52, 20 November 2023 (UTC)
- "British English" is largely still English? I doubt "English" refers to the "correct" English preferred by some locals in Wales. Or Turkey (Q43) should be Türkiye, in all languages?
- Additionally, on entries relating to Anglesey (Q168159), you changed the descriptions for many places from "in Anglesey" to "in Ynys Mon (constituency)"[2], which is also concerning.
- While you state "official names", you also changed St Asaph (Q548248) to Llanelwy?[3] The official name for that is clearly St Asaph right now, nothing to do with recent decisions on lakes in other parts of Wales. You even intentionally changed the [4] Encyclopædia Britannica Online ID (P1417) of that entry to Llanelwy, meaning it no longer works, as EB uses Saint Asaph NOT Llanelwy. So it's clear this "lakes and waterfalls" is not your only reason, but part of a wider initiative for English names to be removed, and it wasn't a "mistake" until I pointed it out, because you repeated it multiple times.[5] I am even fine with using Llyn Tegid etc in English labels (not sure on other languages), alongside an alias, but removing not only the name but anything that references it seems not only tedious but breaks a lot more things.
- Plus surely the recent "official names" event largely only applies to English? Not the dozens of other languages where the English name or its derivatives were removed. Otherwise the Welsh label for Ivory Coast (Q1008) should be Cote d'Ivoire and not this "Arfordir Ifori". DankJae (talk) 11:09, 20 November 2023 (UTC)
- @Titus Gold, while you argue "official names" for lakes and waterfalls, the official name for the village of Bull Bay (Q4996547) is Bull Bay, not Porth Llechog, and you removed the official name claim for Bull Bay in this edit. While you say it was a "mistake", this clearly seems an intentional edit because of how precise it needed to be. DankJae (talk) 11:20, 20 November 2023 (UTC)
- I believe that old labels are supposed to be kept as aliases as a rule anyway, though personally I wouldn't replicate a former label as an alias if I were correcting an error like a spelling mistake. These places should still be searchable by the English names which have been discredited in official usage, but that can be achieved by having, e.g., Portmadoc (which was dropped long ago) as an alias in a single language (i.e. English) in Porthmadog (Q950671). Then I would expect languages which aren't Welsh or English to use the Welsh names only where that's now the official preference. Ham II (talk) 06:13, 20 November 2023 (UTC)
- @Ham II, I fully agree with the fact that the old name should be an alias, which is why I raised it here. TG intentionally removed them, and on Identifiers that used the "old" name. Porthmadog was an accepted change even on en.Wikipedia, so makes sense for that now to be the label as the clear commonname. While there is the case for other languages to use the Welsh names, is there proof that they so far have? They may, but doubt a few days after the lakes announcement all collectively decided to drop the English name or the derivatives like Lago di Bala. DankJae (talk) 11:17, 20 November 2023 (UTC)
- The Eryri National Park Authority has decided to use the Welsh names only in the Eryri National Park. This is to encourage preservation and use of the Welsh names. This doesn't mean the English language names aren't the common name, or that the English language names have ceased to be used. The Welsh language names certainly aren't now 'English language' names. Sionk (talk) 12:47, 20 November 2023 (UTC)
- Yep, but another concern is TG has been changing the names in foreign languages into their Welsh names (even in Hebrew and Arabic) over the English ones. Surely and unfortunately most media relating to Wales is in English and would use the English version, (but it could change, but just not on a random day in November 2023 unannounced). Doubt Indonesians use the Welsh name Wrecsam over Wrexham as TG claimed here, especially with the TV show using Wrexham. DankJae (talk) 12:56, 20 November 2023 (UTC)
- Yes applied Porth Llechog to all languages except "British English" where I left Bull Bay. Same goes for Wrexham for languages except "British English" where I did not apply it. I'm not aware of an Indonesian spelling for Wrexham, so just applied Welsh spelling. I agree with Sionk that official "British English" names should be left as they are. For other languages other than Welsh or English they seem to mostly just copy the English wikipedia so I thought it would be better to apply the Welsh spelling/name. I didn't think that common name would apply in e.g Indonesian languages since there is no Indonesian spelling for Wrexham and again, the English wikipedia is usually copied.
- Again, I'm happy to conform immediately to any precedent that is set and make any corrections based on this. Titus Gold (talk) 13:34, 20 November 2023 (UTC)
- Why split off British English? That largely is the same as English? "English" here does not mean "Welsh English" or the English preferred by Welsh-speakers, but the common term in English overall, not just locally correct. "I'm not aware of an Indonesian spelling for Wrexham, so just applied Welsh spelling": if you're not aware then don't change it, leave it to actual Indonesians if they get to it. You do not seem to actually agree with Sionk, who is referring to English overall, not the new spin-off of "British English" where you demoted the "anglicised name". English Wikipedia is unfortunately the most copied because most other languages will more likely refer to English sources, not the Welsh ones, so you're clearly trying to right the great wrong of place-names and promoting Welsh place-names as much as you can. Individually, each language may start to use Welsh names at some point, but you have applied it to all of them as a blanket change. Both spellings are used, English and Welsh, and the most common should take precedence in that language. If sources in that language use the Welsh one more then yes, they should use that, but you provided no evidence, and it is likely they (unfortunately) use the English spelling because of how much more used it is because of how many more speak English. DankJae (talk) 14:07, 20 November 2023 (UTC)
- Here are two Indonesian sources using "Wrexham" not "Wrecsam". [6] [7] DankJae (talk) 14:10, 20 November 2023 (UTC)
- So taking Bala Lake as an example, perhaps someone could clarify for me the purpose of the 21 entries of language labels, where half of them just default to the Welsh word with no evidence. It seems to me, but I may be missing something, that if a language doesn't have a word used in that language for the lake, that there should not be an entry at all. Why tell me that the Dutch for Bala Lake is Llyn Tegid, when that is clearly not a Dutch word, and will not be pronounced correctly in Dutch? Because it is not just the "Ll" that is a problem there. In Dutch, Tegid is not going to be pronounced correctly unless a Dutch person applies English orthography rules (which they are much more likely to know than Welsh). In Dutch, that g is a different sound completely.
- And do the Dutch have a word for Bala lake? Nope. Here's a source (an older one) that just calls it Bala-meer.[8] which is Bala Lake, using the English word. Here is a newer interesting one: [9]. It is interesting because it has "Llyn Tegid or in English, Bala lake." But that is not a vote for Llyn Tegid. The piece is telling you the Welsh and English names, but then says "Wandel het Bala meer rond" (walk around Bala lake), so there is no Dutch word. Incidentally the Dutch page on the lake is translated from German. So to my original question: shouldn't the Dutch word be blank? It doesn't exist. If it does exist, it is commonly Bala-meer, but that is just a borrow word. I don't see the benefit in specifying anything here. What am I missing? Sirfurboy (talk) 14:01, 26 November 2023 (UTC)
Emojis on Wiktionary
I see no obvious way to get sitelinks through w:en:Module:Wd, but maybe I don't need them. I want to insert a link to Wiktionary if Wiktionary has an entry for a given emoji, see for example the fire emoji on Wikipedia. (which currently specifies the Wiktionary entry exists through a template parameter. Yuck)
I see two ways to accomplish this on Wikidata: either add a sitelink to Wiktionary (but we usually don't do that, do we?) and figure out how to obtain that sitelink through w:en:Module:Wd, or add an identifier like Wikidata already does for non-WMF projects like GlyphWiki and Emojipedia. I feel an identifier may be a better fit in this case? (and it would keep my template code simpler) — Alexis Jazz (talk or ping me) 23:54, 19 November 2023 (UTC)
- See also phab:T163734. GZWDer (talk) 10:50, 20 November 2023 (UTC)
- Hello @Alexis Jazz, GZWDer:
- I have 3 remarks:
- If you add the sitelink in 🔥 (Q87581001) towards wikt:en:🔥, it will stay, but you will receive a warning by the abuse filter 97.
- If you add the sitelink, the English wikipedia article will display the interwiki link towards the English wiktionary entry, but the English wiktionary entry won't show the interwiki link towards the English wikipedia article. mw:Extension:Cognate prevents it, cf. example in #Why some projects do not show some interlanguage links?.
- Instead of w:en:Module:Wd, perhaps you should look at w:en:Template:Interwiki extra. But if you add the sitelink in Wikidata, the English Wikipedia article won't need this template. And this template doesn't exist in the English Wiktionary (and even if it did, I suppose mw:Extension:Cognate would prevent it from working).
- Regards --NicoScribe (talk) 11:17, 20 November 2023 (UTC)
- NicoScribe,
1. Okay. I've created a request to make a property. I probably did it wrong but we'll see.
2. I don't care whether interwiki sitelinks are displayed or not, I was only interested in querying them from a template to show in the article body.
3. That template doesn't seem to do anything, but even if it did I don't think it would work. The whole idea was to query Wikidata to find links to other resources without having to enter template parameters. — Alexis Jazz (talk or ping me) 20:56, 20 November 2023 (UTC)
- @Alexis Jazz: great. Now I understand your need (but can't help you). Sorry for my first answer, off-topic. --NicoScribe (talk) 20:59, 22 November 2023 (UTC)
Property for judicial clerkships
I'd like to propose a property to express the relationship between a law clerk (Q883231) and the judge that they clerked for (e.g. Amy Coney Barrett (Q29863844) clerked for Antonin Scalia (Q11156)). I'm familiar with the US system but recognize that the systems for judicial clerkships vary by country. Before submitting a proposal, I'm interested in feedback on how such a property could be implemented without creating confusion. gobonobo + c 14:09, 20 November 2023 (UTC)
- In regards to your proposal, I could see a modelling using existing properties. Use supervised by (P7604) of Antonin Scalia (Q11156) and a qualifier subject has role (P2868) of law clerk (Q883231) (and possibly object of statement has role (P3831) of Associate Justice of the Supreme Court of the United States (Q11144) and court (P4884) of Supreme Court of the United States (Q11201)). Additional qualifiers would include start time and end time.
- I understand that modelling might be a little too intensive and a new property likely makes sense. If there was a proposal of a "clerked for" property, I could see myself supporting it.
- For information about properties and how to create them see Help:Properties and Wikidata:Property proposal. Another place to have further discussions might be the talk page of Wikidata:WikiProject Law. -- William Graham (talk) 19:26, 20 November 2023 (UTC)
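As a rough illustration of the existing-properties modelling suggested above, the statement could be written as a single QuickStatements V1 command. This is only a sketch: the helper name `quickstatement` is hypothetical, the item and property IDs are the ones quoted in this thread, and the start time and end time qualifiers are left out because no dates were given here.

```python
# Hypothetical helper: builds one QuickStatements V1 command, i.e. a
# tab-separated line of item, property, value, then qualifier pairs.
def quickstatement(item, prop, value, qualifiers=()):
    fields = [item, prop, value]
    for qualifier_prop, qualifier_value in qualifiers:
        fields.extend([qualifier_prop, qualifier_value])
    return "\t".join(fields)


# Amy Coney Barrett (Q29863844) supervised by (P7604) Antonin Scalia (Q11156),
# qualified as suggested in the comment above.
line = quickstatement(
    "Q29863844",
    "P7604",
    "Q11156",
    qualifiers=[
        ("P2868", "Q883231"),  # subject has role: law clerk
        ("P3831", "Q11144"),   # object of statement has role: Associate Justice
        ("P4884", "Q11201"),   # court: Supreme Court of the United States
    ],
)
print(line)
```

A dedicated "clerked for" property would reduce this to the first three fields, which is one argument for proposing it.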
Wikidata weekly summary #603
- Discussions
- Closed request for adminship: Congratulations to our new Admin! S8321414 - (See the closed request)
- New requests for permissions/Bot: KormiSKbot - (Task: Linking newly created pages on Slovak Wikipedia to the appropriate Wikidata items)
- Events
- Upcoming:
- Data Modelling Days, from November 30th to December 2nd: 3 days of online events to address data modelling challenges, discuss how to improve the way we structure data together, and discover the point of view of external reusers. Feel free to have a look at the program (under construction) and to sign up as a participant.
- Wikidata Lab XXXIX: Structuring the Wikimedia Ecosystem presented by Wiki Movimento Brasil. November 21 at 2:00 PM CEST. The presentation will be held in English by the wikimedian Mike Peel.
- Linked Data for Libraries LD4 Wikidata Affinity Group Working Hour November 20th, 2023: Over the summer and into the fall the LD4 Wikidata Affinity Group will be offering a series of Wikidata Working Hours to give folks an opportunity to try out various Wikidata-related skills and tools by assembling a data set of diverse library and information science (LIS) materials (articles, conference proceedings, books) and adding it to Wikidata. Wikidata Working Hours provide hands-on Wikidata experience in a supportive space. We hope you will join us if you are interested in learning more about Wikidata, exploring LIS literature, and have been looking for a fun Wikidata project to contribute to. The seventh Wikidata Working Hour will cover the Author Disambiguator tool, which helps users assign authors to articles. During the session we will demonstrate how to use the tool on an author who was created during a previous working hour, and another who doesn't exist in Wikidata yet. After the demonstration, participants are encouraged to try the tool themselves during the rest of the working hour. This session will build on the work done in previous Working Hours by connecting authors to the articles they have written. This session will be recorded and the recording shared on the event page.
- Wikibase for art and cultural data (German) - #9 of kuwiki tips & tools, taking place Thursday, 23 November 2023, 19-20.30
- Ongoing:
- Weekly Lexeme Challenge #118: Diseases (Challenge started on 2023-11-20 12:01:41)
- Past:
- ItWikiCon '23 (Italian) was hosted in Bari, Italy between the 17th - 19th November. Check the Programme for details on sessions and check for recordings or slidedecks of presentations.
- GLAM Wiki 2023 took place in Montevideo, Uruguay. There were several Wikidata-related sessions some of which are linked in the Videos section.
- Press, articles, blog posts, videos
- Blogs
- Papers
- Can you trust Wikidata? - is a paper exploring Wikidata's veracity and trustability for providing values to Knowledge Graphs. Written by V. Santos et al.
- Videos
- Wikidata for Cultural Heritage, available in Spanish, Portuguese, English
- Find-A-Grave of Swedish politicians in Magnus Sälgö's exploration of Wikidata, OpenRefine, SPARQL and Svenskagravar (Swedish).
- Wikidata Live Editing #111 with the Wikipedia Weekly Network, hosted by Ainali and Abbe98
- Wikidata for Wiki Loves Monuments w/ Content Partnerships presented by Hub Wikimedia Sweden & Wikimedia Uganda. Get essential Wikidata-editing skills for Listeria, OpenRefine, run queries and structured data on Commons.
- How to upload collections to Wikimedia Commons and Wikidata using Open Refine (in Portuguese). Training on uploading collections to Wikimedia Commons and Wikidata using Open Refine.
- Cohort 2 Graduation: AfLIA Wikidata Online Course. The session includes exciting testimonials from participants and goodwill messages.
- Enhancing Factuality in Large Language Models with Wikidata's 12B Facts. This HackerNews video discusses a paper that explores the use of Wikidata, which contains over 12 billion facts, to improve the factuality of large language models (LLMs)
- Back to basics and #SPARQL #Wikidata (in French). Using Wikidata SPARQL query by VIGNERON
- Wikidata for cultural heritage (ES) - GLAM Wiki Conference
- Presentations: "Wikidata Lexemes: Introduction to the Possibilities" - workshop at WikiConference North America, by User:Mahir256
- Notebooks:
- The End of an Era - A study on the major deaths that have occurred in our generation and the people that were left behind. (1997-2012)
- Relationship Between Senators and Political Parties - A sample network graph that depicts the political parties of Senators.
- Family Network of Female Horse - A sample network graph that depicts the relationships of female horses.
- Poets and the Monarchs they were appointed by
- Metro stations of The Metropolitan line (Q19891) of the London Underground Metro System - A sample network graph that depicts the stops and adjacent stops of metro stations of the London Underground's Metropolitan line.
- Tool of the week
- User-level gender statistics for Wikipedia - a tool that computes the number of articles created by gender has been repaired after some months of unavailability. It relies on xtools and P21 property.
- Luthor - tool for finding usage examples from Wikisource and adding them to lexemes on Wikidata.
- Other Noteworthy Stuff
- Did you know?
- Newest properties:
- General datatypes: none
- External identifiers: Fondazione Fiera ID, EIDR party ID, Gaming Wiki Network article ID, Cultural Heritage Online (Japan) heritage ID, Cultural Heritage Online (Japan) institution ID, Playdate community wiki ID, Danacode (short), Danacode (long), archINFORM ID (awards), shukach.com ID, BNE periodical SID, Moviepilot.de person ID, FilmAffinity person ID, Archiefpunt archive ID, Archiefpunt compiler ID, Archiefpunt curator ID, Capitolium Art artist ID, BioGRID ID, la Repubblica TV series ID, CECC Political Prisoner ID, One Earth ecoregion ID, Spectrum Computing ID, IDVT, BISAC Subject Heading
- New property proposals to review:
- General datatypes:
- Papers with Code URL (URL for subject in Papers with Code system)
- PAEnflowered taxon URL (URL for a plant taxon found in Pennsylvania on the PAEnflowered website)
- BioCyc ID (Pathway/Genome Databases) ()
- External identifiers: Star Citizen Tools Wiki ID, Alaska Women's Hall of Fame ID, Shanghai Library organization ID, Shanghai Library era ID, Shanghai Library surname ID, Paradox wikis article ID, Shanghai Library movie ID, Internet Dictionary of Polish Surnames ID, Thesaurus Linguae Aegyptiae lemma ID, Thesaurus Linguae Aegyptiae object ID, Rare Species Guide ID, TLA thesaurus ID, Minnesota Plant List ID, ELMCIP person ID, Flora of the Southeastern United States ID, SWERIK Person ID, Digital Atlas of the Virginia Flora ID, Go Botany taxon ID, Team Wales ID, Unified Saudi Occupational Classification, Pinakes work ID, Search System of Japanese Red Data ID, ELMCIP organization ID, Japan Search ID
- Query examples:
- Articles on English Wikipedia tagged 'Water pollution' with no equivalent article in Spanish (source)
- Highest point (in meters) per counties in Finland (source)
- Week 46, 2023: Top album languages found on Wikidata right now (source) "Basque are back on top for the second time this month"
- Newest WikiProjects:
- WikiProject Manuscripts - This WikiProject coordinates efforts on Wikidata to gather and curate structured data on manuscripts.
- WikiProject Grove Hall Black Women Lead - aims to shed light on the lives and stories of Black women leaders who have shaped Boston’s history from the colonial era to the present day.
- Newest database reports: User:Pasleim/projectmerge/enwiki-svwiki - 3875 merge candidates in English Wikipedia and Swedish Wikipedia based on same sitelink name.
- Showcase Items: University of Konstanz (Q835440) - University in Konstanz, Germany (feel free to suggest the next one for next week)
- Showcase Lexemes: rød - Danish word for red with features in many compounds and derivations (feel free to suggest the next one for next week)
- Development
- We are taking steps towards making many more languages available in Lexemes and monolingual text statement values. (phab:T341409)
- Nikki fixed a bug where the CSS class for a statement rank wasn't updated after a rank change (phab:T209138)
- We are continuing the work on improving EntitySchemas by making it possible to link to them in statements.
- We are migrating several tools from the Wikit design system to the Codex design system to be able to deprecate Wikit in the future.
You can see all open tickets related to Wikidata here. If you want to help, you can also have a look at the tasks needing a volunteer.
- Monthly Tasks
- Add labels, in your own language(s), for the new properties listed above.
- Comment on property proposals: all open proposals
- Contribute to a Showcase item.
- Help translate or proofread the interface and documentation pages, in your own language!
- Help merge identical items across Wikimedia projects.
- Help write the next summary!
Creating a property without discussion
Hello.
I wanted to create a property that would display the figure skater's position in the International Skating Union ranking. I have done this. Q123438743
But after I created it, I realized that I should have put the creation of this property up for discussion first, which I didn't do. What should I do now? Can I use this property? Beonus (talk) 14:32, 21 November 2023 (UTC)
- Hello, this Wikidata item (Q16222597) isn't a Wikidata property (Q18616576) yet (and so can't be used as one) —if you think there is potential (need/demand/use) for a related property, you can make a proposal here, thanks, Maculosae tegmine lyncis (talk) 14:42, 21 November 2023 (UTC)
- As a comparandum, see the use of ranking (P1352) on Novak Djokovic (Q5812), Maculosae tegmine lyncis (talk) 14:48, 21 November 2023 (UTC)
- Thank you.
Entries with names such as أنتونيو توبانغو (correct form: António Topango)
Hi all, I've come across entries such as Cubango (Q12180777) (meant to be António Topango) and not sure what the correct way of dealing with it is -- looks like when Arabic entries are "transliterated", you end up with these gibberish names. Does one manually change each entry? Plaça de Maig (talk) 22:55, 21 November 2023 (UTC)
- I think this is more to do with the blocked user User:&beer&love, who made large numbers of poor automated edits, including many with scrambled characters. The user seemed to be poor at communication, so I can't tell if a root cause for the scrambled character errors was identified. Someone more experienced with queries than I am may be able to construct a query that finds these scrambled labels. A first step to rectifying would be to blank the scrambled entries. Editors with automated tools or manual edits can then work on inserting valid labels. From Hill To Shore (talk) 23:35, 21 November 2023 (UTC)
How to ask API for QID with Commons Category
Hello, I want to ask the API to search by Commons category (P373), e.g. "Albert Einstein". I only know the name of the Commons category, "Category:Albert Einstein", and I want to find the QID of the item Albert Einstein (Q937). If I know the QID, it is an easy API request. But how can I ask without the QID? Something like this Cirrussearch. --sk (talk) 23:22, 21 November 2023 (UTC)
- This must be possible since Hub can do it: https://hub.toolforge.org/P373:Albert%20Einstein?site=wikidata&format=json (but I don't know how). Vojtěch Dostál (talk) 11:58, 22 November 2023 (UTC)
- The Cirrussearch query can be reproduced via API action=query. --Matěj Suchánek (talk) 09:18, 23 November 2023 (UTC)
- @Matěj Suchánek@Vojtěch Dostál: Great! Thanks for your help! --sk (talk) 19:58, 23 November 2023 (UTC)
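For later readers, the action=query approach mentioned above might look roughly like the following. It is a minimal sketch, not a tested recipe: the function names and the User-Agent string are made up for illustration, and quoting the whole `haswbstatement` expression to cope with the space in the category name is an assumption worth checking against the search documentation.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_params(property_id, value):
    """Parameters for action=query&list=search with a haswbstatement filter."""
    return {
        "action": "query",
        "list": "search",
        # Assumption: the expression is quoted because the value has a space.
        "srsearch": f'haswbstatement:"{property_id}={value}"',
        "format": "json",
    }


def extract_qids(response):
    """In the item namespace, each search hit's title is the QID."""
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]


def find_qids(property_id, value):
    """Perform the request (needs network access; not called on import)."""
    url = API_URL + "?" + urllib.parse.urlencode(build_search_params(property_id, value))
    request = urllib.request.Request(
        url, headers={"User-Agent": "commons-category-lookup-example/0.1"}
    )
    with urllib.request.urlopen(request) as reply:
        return extract_qids(json.load(reply))
```

Calling `find_qids("P373", "Albert Einstein")` would then be expected to return a list containing Q937, provided the search index matches the category string exactly.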
Synonymous taxons?
Endopterygota (Q304358) and Holometabola (Q37140800). I gather there is some confusion here, and I may have accidentally added to it. en:Endopterygota was recently moved to en:Holometabola; similarly for Commons. I see that on Wikidata these two items have no formal relationship at all, although do note
. Anyway, someone with more experience about modeling taxons here should be able to sort this out correctly. I suspect the correct modeling is that one (probably Holometabola (Q37140800)) is the "real" taxon and the other is treated as some sort of deprecated synonym. - Jmabel (talk) 00:09, 22 November 2023 (UTC)
- Wikidata:WikiProject Taxonomy might be a better place to ask. --Jahl de Vautban (talk) 08:25, 22 November 2023 (UTC)
Requesting input on community wishlist item
I'm planning to introduce this proposal in the 2024 Community Wishlist Survey, which starts in January. I would greatly appreciate feedback before then.
(I sought input from Wikiproject property constraints but didn't get much.) Thanks! Swpb (talk) 16:33, 23 November 2023 (UTC)
Made from material constraint
I think there is a problem with the constraint in padding (Q47415676). The values of made from material (P186) seem correct, but constraint violations are reported. --2A02:8108:50BF:C694:F16C:4A6D:33BD:455E 20:40, 23 November 2023 (UTC)
Wikidata games (old) not working for most?
So I am running a class and wanted to demonstrate some Wikidata:Games. Thing is, old game page (https://wikidata-game.toolforge.org/) works for me - I can display it for my students - but for all of them, it's blank (the list of games does not load). The new page (https://wikidata-game.toolforge.org/distributed/) does work for everyone, but it does not have the simple games (gender, job, nationality) that again work for me but the same links provide blank screens for students. What's going on? My account is old and has a bunch of edits, students have new accounts - could the games be restricted from new accounts, perhaps? Hanyangprofessor2 (talk) 06:06, 24 November 2023 (UTC)
- The games haven't been updated in years; the project is so big nowadays that the way they were programmed doesn't work anymore. Sjoerd de Bruin (talk) 08:10, 24 November 2023 (UTC)
Coming up soon: Wikidata Data Modelling Days, online, November 30-December 2
Hello all,
If you are regularly involved in adding, organizing or reusing data from Wikidata, you have certainly encountered some questions or issues related to data modelling: how to describe and structure information in a consistent way on Wikidata. This is a big topic for the community at large, and that's why we will address it together during a 3-day online event, the Data Modelling Days, that will take place next week, on November 30th, December 1st and 2nd.
During this online gathering, we will have lots of discussions on various topics that you can discover in the program: we will talk about Entity Schemas and how they can be useful to improve data quality and consistency on Wikidata, how to model heritage, gender, references or web fiction, the challenges encountered by people reusing Wikidata's data inside and outside the Wikimedia projects, how to model data on a fresh new Wikibase instance, and many other exciting topics.
Aside from attending sessions and joining the discussions, you can also join our Data Modelling Clinic sessions, where you can bring any topic you are working on, ask questions or ask the community for feedback or help. You will find these sessions on each day in the program.
The event is taking place online on the video conference platform Jitsi, it is free, no registration needed (although you are invited to add your name to the participants list). Most sessions will be recorded in video and have collaborative notes, and we will publish a list of outcomes and next steps for each session.
We are hoping to see a lot of you at the event!
If you have any questions, feel free to ask on the talk page or directly by writing to me. Best, Lea Lacroix (WMDE) (talk) 15:53, 24 November 2023 (UTC)
Mix'n'match (no preliminary matches)
Hello, why with this catalog, are there 0 preliminary matches? How do I get it to recheck and suggest some? Did I do something wrong when uploading? Thank you, Maculosae tegmine lyncis (talk) 16:52, 24 November 2023 (UTC)
synonym (Q42106) as class?
I have found that some 564 items have instance of (P31)synonym (Q42106), for example desire (Q775842).
Since synonymy is a relation between constructs in a language, not concepts (if two words or phrases are synonyms, they refer to the same concept), I consider this problematic. (In the example: Yes, desire may be a synonym of longing in English, but I completely fail to understand what the notion that Q775842 is a synonym of Q16513670 is supposed to mean. This might rather be a case for permanent duplicated item (P2959) or said to be the same as (P460), but there are other cases like the one I corrected earlier, which described a German word rather than the concept behind it.)
Thoughts? Opinions? --2A02:8108:50BF:C694:75FB:C6CC:43F8:CE91 20:50, 24 November 2023 (UTC)
- I agree Vojtěch Dostál (talk) 09:07, 25 November 2023 (UTC)
- If there are two articles for the same concept Wikimedia duplicated page (Q17362920) can be used unless there is consensus to keep separate pages or other reasons not to merge. I use P460 where they seem to be describing the same thing but I am not sure; P2959 is usually for pages with different writing systems in the same project. Peter James (talk) 09:50, 25 November 2023 (UTC)
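To review the affected items, a query along these lines could list everything currently typed as a synonym. This is a sketch only: the function names and the default limit are illustrative, while the endpoint and the `wdt:P31` pattern follow the usual Wikidata Query Service conventions.

```python
import urllib.parse

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def instances_query(class_qid, limit=1000):
    """SPARQL listing items whose instance of (P31) is the given class."""
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def query_url(sparql):
    """URL that asks the query service for JSON results."""
    return SPARQL_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": sparql, "format": "json"}
    )


# Items with instance of (P31) synonym (Q42106), as discussed above.
print(instances_query("Q42106"))
```

The printed query can be pasted into the query service directly, or `query_url` can be used to fetch the results from a script.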
person removing
There is a user who removed everything at Eineik Kaddabeen (Q5349739), Khalas Sameht (Q6399268) and a few other items. 115.188.140.167 21:53, 24 November 2023 (UTC)
- Probably an attempt to delete the items as either not notable or duplicates. Q5349739 and Q6399268 had no identifiers, no Arabic labels, and no sitelinks after the English articles were deleted. There were items with Arabic sitelinks for the same albums, so I merged them. Peter James (talk) 08:45, 25 November 2023 (UTC)
- Two more were duplicates (although one had an English sitelink that had not been moved to the new item). The reasons for the edits to Yama Alou (Q8047573) and Saharna Ya Leil (Q28717366) (where English sitelinks were removed from items that also had Arabic sitelinks) are unclear. Peter James (talk) 09:02, 25 November 2023 (UTC)
Can gender (Q48277) be a criterion (criterion used (P1013)) for sports?
We use "sex" in the title of mixed (Q1940854). biological sex (Q290) or sex or gender (Q18382802) looks like a more logical criterion for sport items. 대한민국 정치 (talk) 10:42, 25 November 2023 (UTC)
Policy on using deprecation for inactive social media accounts
The policy for using deprecation (or not) for inactive social media accounts seems a bit confusing right now. I've opened a thread over at Help talk:Ranking and would appreciate guidance from folks more experienced with the historical use of deprecation across Wikidata. Eloquence (talk) 00:34, 26 November 2023 (UTC)
Property opposite
We have related property; do we have "Property opposite of"? Related property can be a long list, and we should be able to see the opposite more readily. RAN (talk) 04:19, 26 November 2023 (UTC)
- inverse property (P1696), complementary property (P8882) and negates property (P11317).--GZWDer (talk) 06:16, 26 November 2023 (UTC)
Missing a label and ridiculous area unit to sort out
Hello and thanks to whomever takes the trouble of reading this and even more so to whom gives me the answer in an understandable language. Got this https://www.wikidata.org/wiki/Q5725076 with " No label defined (Q5725076) ". For now there is a page for it in es. wiki and I just made one in fr. wiki (now duly added to the page here). Before that could happen, tried to reach wkdata from the fr. page, got a blank wkdata page, and thanks to my complete distrust of both my ability to deal with wkdata and of wkdata itself (wayyy too complicated gibberish), decided to get to wkdata via the es. page. Thus found the page here linked (Q5725076). So ok so far so good i managed (just barely) to avoid creating a double. Now, looking for an answer to the simple question: "how to add a label" (so that others can find it more easily, etc), I erringly find this: https://www.wikidata.org/wiki/Help:QuickStatements#Adding_labels,_aliases,_descriptions_and_sitelinks which looks like it's supposed to give the answer but is definitely no help at all. See some of the "why" in next entry "Simple language".
Ok so there is a missing label there and the question is: "How to add one".
Second point: the place has an area of 150 km2. This... "thing" here, does not understand that when I put "km2" it means "km2", not "m2" (?!). Plus, the menu option is lying: it does say "square kilometre" but gives it out in square meters, with no warning, to top it all off. Result: the infobox says one hundred and fifty million square meters, which is so ridiculous that it's worth writing in words and in full. You'd think there would be an option for that km//m thingie, but no. And why not? Seems basic to me. (Also explains why I call it "gibberish".) So, is there a way to have it say km2 instead of m2, and if yes which is it?
Thank you for your answer/s. Pueblo89 (talk) 06:49, 26 November 2023 (UTC)
(P.S. please do ring me if you answer, i.e. add [[user:Pueblo89]] or similar so i 1) know there's an answer, 2) get a link to this page. Thanks.)
- Pueblo89 The Wikidata item shows it in km2. It is displayed in m2 in fr:Belic (Niquero) but that seems to be how the infobox converts it. I don't know if there are parameters that can be used in the infobox to change it; similar articles don't use the standard infobox, so a more specific infobox could be used - Infobox Localité displays km2. Peter James (talk) 13:41, 26 November 2023 (UTC)
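- As a check on the numbers in this thread, here is a small sketch of the arithmetic behind the two displays. The function names are hypothetical and this is not the infobox's actual code; it only shows that 150 km2 and 150,000,000 m2 are the same area written in different units.

```python
# One kilometre is 1,000 metres, so one square kilometre is
# 1,000 * 1,000 square metres.
SQUARE_METRES_PER_SQUARE_KILOMETRE = 1_000 * 1_000


def km2_to_m2(area_km2: float) -> float:
    """Convert an area in square kilometres to square metres."""
    return area_km2 * SQUARE_METRES_PER_SQUARE_KILOMETRE


def m2_to_km2(area_m2: float) -> float:
    """Convert an area in square metres to square kilometres."""
    return area_m2 / SQUARE_METRES_PER_SQUARE_KILOMETRE


area_m2 = km2_to_m2(150)
print(f"{area_m2:,} m2")             # the value the infobox displays
print(f"{m2_to_km2(area_m2)} km2")   # the value stored on the item
```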
Simple language, i.e. How to write understandable explanations
Example in this: https://www.wikidata.org/wiki/Help:QuickStatements#Adding_labels,_aliases,_descriptions_and_sitelinks. It starts with:
"As with adding simple text statements, each command must consist of an item, a command, and a string in double quotes."
The way i understand "simple text statements", I don't see any such thing in these pages apart from what's added in the box "Language / Label / Description / Also known as".
Those are the first words, which as you can see do not correspond to anything of the subject as far as i'm concerned, and the rest doesn't get any better. Honestly, wtf is all that coded language supposed to mean?!! Ex.: I know off-hand 4 definitions of what a tab is, but clearly none of them fits what is meant there so what does "tab" mean in this context? I won't even mention what "string" and "extra-large size upside-down" - as in "Lxx" - suggest when one is so obscenely lost that even those come to mind. Yes I can see that 'L' is for 'Language', the 'A' for 'Alias' and so on; but even assuming that the 'tab' question is solved - which it is not -, where are we supposed to add that? etc etc etc.
It really looks like these are written by people who have lost touch with "ground zero approach". Understandable, but not helpful - despite all the good will that clearly transpires too.
There's only one solution: have it proof-read by people who ARE at ground zero (= none of yous, obviously) and are willing to spend the time it takes to make you see what is not clear enough.
Yes i shall (only English and French). If asked. Pleasure, be happy to, etc. That's what I'm asking so now it's your turn to do the asking; will you be willing to do it, is the question.
Best wishes to you all. Pueblo89 (talk) 06:49, 26 November 2023 (UTC)
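- To make the quoted sentence concrete, here is a minimal sketch of what one such command looks like, assuming the V1 command syntax described on Help:QuickStatements. In that syntax a "tab" is the single character produced by the Tab key, a "string" is just a piece of text, and "Lfr" means "label in French". The helper name `label_command` and the label text "Belic" are assumptions for illustration only. For a single item, none of this is needed: the label can be typed directly through the edit link of the "Language / Label / Description / Also known as" box on the item page.

```python
def label_command(item_id: str, language: str, label: str) -> str:
    """Build one QuickStatements V1 line that sets a label.

    The line has three parts separated by tab characters:
    the item, the command ("L" plus a language code), and the
    label text in double quotes. Hypothetical helper for illustration.
    """
    if '"' in label:
        raise ValueError("this sketch does not handle quotes inside the label")
    return f'{item_id}\tL{language}\t"{label}"'


# "Belic" is an assumed label text, not taken from the item itself.
line = label_command("Q5725076", "fr", "Belic")
print(line)
```

The printed line is what would be pasted into the QuickStatements tool as one command; each further label would go on its own line.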