Welcome to Wikidata, Iamcarbon!

Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!

Need some help getting started? Here are some pages you can familiarize yourself with:

  • Introduction – An introduction to the project.
  • Wikidata tours – Interactive tutorials to show you how Wikidata works.
  • Community portal – The portal for community members.
  • User options – including the 'Babel' extension, to set your language preferences.
  • Contents – The main help page for editing and using the site.
  • Project chat – Discussions about the project.
  • Tools – A collection of user-developed tools to allow for easier completion of some tasks.

Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.

If you have any questions, don't hesitate to ask on Project chat. If you want to try out editing, you can use the sandbox. Once again, welcome, and I hope you quickly feel comfortable here and become an active editor for Wikidata.

Best regards!

Notability

Thank you for contributing to Wikidata. I see that you recently created an item that does not clearly indicate its notability. The Wikidata project only accepts items that meet its notability criteria, and your item is therefore likely to be deleted soon. In brief, items must have an associated Wikipedia article, must be needed for statements on another notable item, or must have both identifiers and serious sources. For the last case, a good indication of notability would be multiple articles about the subject in independent publications like newspapers or magazines. You can add such sources as references to specific claims using reference URL (P854), or as top-level claims using described at URL (P973).

Also, this may not apply in this specific case, but you should know that we discourage editors from contributing on topics with which they have a strong personal connection, as this may present a conflict of interest. If you are being paid to edit here, then you are obliged to disclose this. For a longer version, you might find it useful to read the essay "How to create an item on Wikidata so that it won't get deleted".  Madamebiblio (talk) 10:32, 20 March 2024 (UTC)Reply

Google Knowledge Graph Search API

I'm curious which knowledge graph API you're using. I recently imported a bunch of entries for people and was stuck with a 1QPS limit. I was using the enterprise API which is still in preview. Is the source for your bot public? I'm curious how you did the matching (e.g. in ambiguous cases).

I really wish Google's API here was better. They clearly have a lot of data tied to these identifiers that they aren't releasing. BrokenSegue (talk) 14:31, 23 March 2024 (UTC)

Hi @BrokenSegue!
I'm also using the same Google Knowledge Graph (GKG) API, but have been able to stay under the 60qpm API limit as the rest of our pipeline is VERY slow.
For context -- I have been building and annotating an internal dataset consisting of images (mostly in the Arts, Architecture, and Fashion fields) over the past ~10 years, which has grown to about 15M images. I recently ran these through several new AI models (GPT-4 and Claude), and looked up the overlapping labels on Google Knowledge Graph (to determine basic notability / significance). We cross-checked these against Wikidata to establish additional notability, where we discovered around 70K missing ids.
I was hoping to contribute the most notable of these entities and concepts back to Wikidata to improve interoperability with the Knowledge Graph -- but have been treading softly to make sure these contributions meet the project's goals and policies.
I've started this process by manually looking up each entity / concept and determining whether it already exists (and just needs an association), or deciding whether to add it.
Most of these knowledge graph entities DO exist. When there are only a few items with the same name, it's quick to find the right one and associate the id. When there are a lot of items, it takes time to manually look through them all to find the right one (particularly when the items don't have descriptions or labels). In the worst case, like "Vase of Flowers", there are hundreds of items with the exact same name, and we need to match the actual photo. Matching specific pieces of ART is the most tedious, and takes a lot of time. My plan is to automate this in the future using a vision model.
In the case where I have checked all the existing items with the same name, and various aliases and alternate identifiers, and am confident that they do not exist, I have been doing a basic notability check (e.g. making sure that they have multiple pages of Google results mentioning them, with the source coming up first). It's been more difficult to find and reference reputable sources that aren't written by the author, as most of the top-ranking websites are polluted with SEO or locked behind a paywall. It can take up to 15 minutes per item to find 1-2 good sources -- even when the item is well known.
I am estimating that out of 100 newly discovered knowledge graph labels that I've found, only 10 passed my basic notability check. The majority of these already exist (and just need the id association), and maybe 1-2 (of the 100) have been manually added. I've added around 250 items in total using this process. Iamcarbon (talk) 03:28, 24 March 2024 (UTC)
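For readers whose pipeline is not naturally slow, staying under the 60 queries-per-minute limit discussed in this thread can be handled with a small client-side throttle. The following is a minimal sketch, not anyone's actual bot code: the `Throttle` class and its parameter names are hypothetical, and the clock and sleep functions are injectable only so the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Spaces calls at least 60 / per_minute seconds apart (hypothetical helper)."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no call has been made yet

    def wait(self):
        """Block until the next call is allowed, then reserve that slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
```

Calling `throttle.wait()` immediately before each API request keeps requests at least one second apart when `per_minute` is 60.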
Very cool work. Is this part of a research project or a business, or is this just a hobby project? My imports of gkids were done much more conservatively, as I checked to see if the URL they had on file matched the enwiki sitelink. I've also been interested in trying to import Bing entity ID (P9885), but their API is too expensive last I checked. BrokenSegue (talk) 15:18, 24 March 2024 (UTC)
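The conservative check described in the comment above (accept a match only when the URL on file matches the enwiki sitelink) could be sketched as follows. This is an illustration under assumptions, not BrokenSegue's actual code: it assumes the Knowledge Graph result is a dict carrying the article URL under `detailedDescription.url`, and the function names are hypothetical.

```python
from urllib.parse import urlsplit, unquote


def normalize_wiki_url(url):
    """Reduce a Wikipedia URL to a comparable (host, title) pair."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = unquote(parts.path)
    if path.startswith("/wiki/"):
        path = path[len("/wiki/"):]
    # Wikipedia treats spaces and underscores in titles as equivalent.
    title = path.strip("/").replace(" ", "_")
    return host, title


def gkg_matches_sitelink(gkg_result, enwiki_sitelink_url):
    """True only when the result's article URL equals the item's enwiki sitelink."""
    # Assumption: the article URL lives under detailedDescription.url.
    gkg_url = gkg_result.get("detailedDescription", {}).get("url")
    if not gkg_url or not enwiki_sitelink_url:
        return False
    return normalize_wiki_url(gkg_url) == normalize_wiki_url(enwiki_sitelink_url)
```

Results without an article URL are rejected rather than guessed at, which trades coverage for precision in ambiguous cases.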
This is currently a personal project exploring how to improve AI/ML algorithms' ability to cite authoritative data. I'm hoping to apply this work to research / personal knowledge building in the future.
Comparing URLs makes a lot of sense. Once I get a bot going, I'll see if I can make any additional matches using this approach too.
I haven't looked at the Bing entity ids, but these would be great to get associated as well. I'll add this to my list to look into. Iamcarbon (talk) 21:13, 26 March 2024 (UTC)

Q125214215

Hello! I think iPhone 13 Pro Max (Q125214215) is the same as iPhone 13 Pro Max (Q108541741) and they should be merged. If you don't know how, I can! -wd-Ryan (Talk/Edits) 00:27, 30 March 2024 (UTC)Reply

Identical. Merged! (thank you!) Iamcarbon (talk) 01:54, 30 March 2024 (UTC)Reply

Edition, not Book

Please use version, edition or translation (Q3331189) instead of book edition (Q57933693). And please do not use "book" in an English description, WikiProject:Books decided long ago to avoid using "book" because it has too many meanings in English and is confusing. --EncycloPetey (talk) 03:33, 16 June 2024 (UTC)Reply

Hi EncycloPetey.
What would you suggest we use instead of "book" in the description to convey that the edition is a physical object that is printed, bound, protected by a cover (hard or soft), and assigned an ISBN (a numeric commercial book identifier) by a publisher?
Are there any discussions you can point me to here? Iamcarbon (talk) 04:09, 16 June 2024 (UTC)Reply
We are rarely talking about a specific physical object. If we did, then we would be talking about one person's book, on the shelf, in their home. We are almost never talking about that. We usually mean an edition published on a certain date, including all the copies that were printed then. Or we are talking about a translation into another language, including all copies of that particular translation, not just one book.
But a "book" can also be the work, in all its editions and translations. A "book" can be the volumes that make up a set, such as an encyclopedia. A "book" can be part of a literary work; and many classical texts are divided into "books". A "book" can also be an abstraction, such as the "book of love" or the "book of life". A "book" can be a major section of a work; The Lord of the Rings was published in three volumes, but it consisted of six "books".
Even in a library, "book" might mean the physical objects on the shelves, in which case two copies of the same edition would count as two different "books". Or it might refer to digital "books" accessed via download. Or audio "books" on cassette, disc, or downloaded.
So the reason we do not use "book" is that it is ambiguous. Once you avoid using "book", it becomes clear that there are many things that are "books", and so many in fact that the term is useless for Wikidata. This discussion comes up over and over and over, and WikiProject:Books is a frequent place where this discussion happens. --EncycloPetey (talk) 06:30, 16 June 2024 (UTC)Reply
Hi @EncycloPetey Thanks for the information! I've updated my scripts to use "version, edition or translation" and remove the use of book in the descriptions. Iamcarbon (talk) 00:16, 17 June 2024 (UTC)Reply

Please do not turn a data item for an edition into a data item for a work. The two have completely different data. If the data on an item are primarily information for an edition (publisher, ISBN, number of pages, Open Library edition ID, Goodreads edition ID, LCCN, etc.) then the data item is for an edition. --EncycloPetey (talk) 02:23, 20 June 2024 (UTC)Reply

For example, since you changed Lost Lives, Lost Art: Jewish Collectors, Nazi Art Theft, and the Quest for Justice (Q76574283) to a work instead of an edition, all of these data must be removed. They are data for an edition instead of a work. Please create a data item for the edition, and add all of the removed data to the edition data item. Otherwise, please restore the data and change it back to an edition data item. --EncycloPetey (talk) 02:27, 20 June 2024 (UTC)Reply
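The rule of thumb in the two comments above (an item whose data are mostly publisher, ISBN, page count, and edition-level identifiers is an edition) could be turned into a pre-edit check for a script. This is only a sketch under assumptions: the property IDs below are my best understanding of the relevant Wikidata properties and should be verified before use, the threshold of two is arbitrary, and `claims` is assumed to be a dict keyed by property ID.

```python
# Assumed property IDs for edition-level data; verify against Wikidata before relying on them.
EDITION_PROPERTIES = {
    "P123": "publisher",
    "P212": "ISBN-13",
    "P957": "ISBN-10",
    "P1104": "number of pages",
    "P648": "Open Library ID",
    "P2969": "Goodreads version/edition ID",
    "P1144": "Library of Congress Control Number (bibliographic)",
    "P724": "Internet Archive ID",
}


def edition_properties_on(claims):
    """Return the edition-level property IDs present on an item, sorted."""
    return sorted(set(claims) & set(EDITION_PROPERTIES))


def looks_like_edition(claims, threshold=2):
    """Heuristic: treat the item as an edition if it carries enough edition-level data."""
    return len(edition_properties_on(claims)) >= threshold
```

A script could refuse to change such an item into a work and instead flag it for splitting, so that the edition data move to a separate edition item.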

Nice spotting! I was actively in the process of breaking that object out into its own edition, but it looks like you already started updating this object (let me know if there's anything else to do on Q76574283). Great to see you watching over the data and to have your help bringing up data quality!
Here's another example of an object that I broke into its work and edition:
https://www.wikidata.org/wiki/Q19595428
I wouldn't mind a second set of eyes on my contributions, as I've been breaking out thousands of books into their editions. I'm treading lightly, but plan to eventually break out ALL of the editions / translations / etc. from their works. Feedback is always welcome! Iamcarbon (talk) 02:40, 20 June 2024 (UTC)
@EncycloPetey Curious, do you get pinged automatically when I respond? Or does this require a mention in the response? Iamcarbon (talk) 02:41, 20 June 2024 (UTC)Reply
I do not get pinged automatically, but I usually watch for additional conversation. On the Q19595428, note that the Internet Archive ID is for a scan of a specific edition, and should be placed on the correct edition data item. --EncycloPetey (talk) 03:07, 20 June 2024 (UTC)Reply

Mul

You might want to hold off mass deletion of labels in favour of mul until it beds in a bit. Are you following https://www.wikidata.org/wiki/Help_talk:Default_values_for_labels_and_aliases Vicarage (talk) 14:40, 30 July 2024 (UTC)Reply

@Vicarage Agreed. And am now subscribed to Default_values_for_labels_and_aliases directly! Iamcarbon (talk) 18:38, 30 July 2024 (UTC)Reply

Stop mul

Hi @Iamcarbon. Are you making mass changes without having requested bot permission? Please note that you do NOT need to delete the labels that were already there. Thank you. Madamebiblio (talk) 04:05, 28 September 2024 (UTC)

If you delete the 'Label' value from the names, there are gadgets that stop working. For this reason, I have now restored thousands of labels across a few hundred family names.
For those looking to recover deleted family-name labels in a given language, here's a query: it lists family names that have P407 set but are missing a label for the same language.
With User:Harmonia_Amanda/namescript.js, hundreds of labels can be restored with the click of a button.

Example German:

SELECT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?item wdt:P31 wd:Q101352;
  wdt:P407 wd:Q188.  
  MINUS { ?item rdfs:label ?hulabel FILTER ( lang(?hulabel) = "de" ) }
}

Example Finnish:

SELECT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?item wdt:P31 wd:Q101352;
  wdt:P407 wd:Q1412.  
  MINUS { ?item rdfs:label ?hulabel FILTER ( lang(?hulabel) = "fi" ) }
}

Pallor (talk) 09:05, 28 September 2024 (UTC)Reply
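The two example queries above differ only in the language item (Q188, Q1412) and the language code ("de", "fi"). A small generator for other languages might look like the sketch below; the function name is hypothetical, and the caller is assumed to supply a matching item/code pair.

```python
def missing_label_query(language_qid, language_code):
    """Build the family-name query above for an arbitrary language.

    language_qid: item used as the P407 value, e.g. "Q188" for German.
    language_code: code of the label that is expected, e.g. "de".
    """
    if not (language_qid.startswith("Q") and language_qid[1:].isdigit()):
        raise ValueError(f"not an item ID: {language_qid!r}")
    if not language_code.replace("-", "").isalnum():
        raise ValueError(f"not a language code: {language_code!r}")
    return f"""SELECT ?item ?itemLabel WHERE {{
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
  ?item wdt:P31 wd:Q101352;
        wdt:P407 wd:{language_qid}.
  MINUS {{ ?item rdfs:label ?label FILTER ( lang(?label) = "{language_code}" ) }}
}}"""
```

For example, `missing_label_query("Q188", "de")` reproduces the German query, apart from the variable name and indentation.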

Hi @Pallor
These deletions were intentional and are necessary to identify potential issues before larger-scale deletions occur. Selectively deleting a small batch of labels each day helps discover any tools that may break and identify bots that are not yet 'mul'-aware. This proactive approach helps us address problems early and minimize disruptions when mass deletions eventually happen.
There are already bots deleting labels in bulk across other less popular domains (e.g. Astronomical objects), and these deletions will also be taking place en masse for names (given and family) soon. By conducting a limited number of deletions now and engaging with tool owners, bot owners, and the community early, we can significantly reduce the impact of the upcoming mass removals that are planned to take place.
Note that these deletions have been intentionally limited to identify any new issues. We still have several open Phabricator issues to prevent new labels (and duplicate items) from being re-added, and the community is still becoming aware of this feature.
Are there any specific gadgets or tools that you have found not working as a result of these deletions? Sharing any details would be very helpful so we can assist in making them 'mul'-aware.
It would also be very helpful if you could share your thoughts and feedback in either of the following discussions:
WikiProject Names (proposal for 'mul' adoption, section "Mul_labels_-_proposal_of_massive_addition"): https://www.wikidata.org/wiki/Wikidata_talk
And the general discussion on deleting values and labels: https://www.wikidata.org/wiki/Help_talk
I have ceased making any additional deletions until your concerns can be addressed. Iamcarbon (talk) 00:49, 29 September 2024 (UTC)Reply

Missing mul constraint

I believe that, before massively removing the multi-language "reflexive" labels, a blocking constraint should be put in place to preemptively block adding identical language labels whenever a mul label is present. The same for aliases. Otherwise we have the risk that your changes are undone (explicit rollback, or implicit additions). There are users that don't know the Default values functionality. I have already updated my bots to no longer add duplicate language labels. --Geertivp 08:33, 28 September 2024 (UTC)Reply
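Until such a constraint exists, a bot can apply the same rule on its own side before writing. The following is a minimal sketch of that check, not Geertivp's actual bot code: the function name is hypothetical, and labels are assumed to be plain dicts mapping language code to text.

```python
def labels_to_write(existing_labels, proposed_labels):
    """Drop proposed labels that would only duplicate the item's mul label.

    Both arguments map language codes to label text. A proposed label is
    skipped when it equals the existing "mul" label, or when the item
    already has exactly that label in that language.
    """
    mul = existing_labels.get("mul")
    result = {}
    for lang, text in proposed_labels.items():
        if lang != "mul" and mul is not None and text == mul:
            continue  # the mul default already covers this language
        if existing_labels.get(lang) == text:
            continue  # nothing would change
        result[lang] = text
    return result
```

Labels that differ from the mul value, such as transliterations, still pass through unchanged.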

Hi @Geertivp! Do you know if we have a Phab / tracking issue for adding a constraint to prevent duplicates from being re-introduced? Without the constraint, I agree -- any removals will cause more trouble than they solve.
For some additional context: so far, my removals have been intended to identify any bots and tools that need to be made mul-aware, and to identify any other issues that we're not aware of yet before a broader rollout. I anticipate we may find additional issues over the next few months, and that all these issues can be worked on concurrently before any deletion rules are codified by bots. Also, thanks for updating your bot!!!
I also intended for my deletions to stir up some trouble, add pressure for tool and bot owners to update, and help us gather any critical feedback on any issues that need to be addressed by wikidata developers for us to fully adopt mul. Iamcarbon (talk) 01:15, 29 September 2024 (UTC)Reply