User:Epìdosis/Strategy
Connecting Wikidata and library catalogs: reciprocal benefits
Benefits for Wikidata
- Possibility of identifying people with more certainty (e.g. by comparing the works listed by different library catalogs)
- Discovery of duplicate items
- Source for aliases
- Source for descriptions
- Source for statements (e.g. dates of birth and death)
Benefits for library catalogs
- Possibility of comparing with plenty of other sources (e.g. authority IDs, encyclopedias, other databases)
- Possibility of importing these sources and/or showing them to the reader (e.g. AuthorityBox; cf. also Google Knowledge Graph)
- Discovery of duplicate authority IDs
- Source for aliases
- Source for descriptions
- Source for data (e.g. dates of birth and death)
Connecting Wikidata and library catalogs: effective actions
- Creation of a Mix'n'match (Q28054658) catalog with plenty of auxiliary data [note: this point should be expanded]
- 0: Use the Mix'n'match catalog to improve the interconnection between Wikidata and the library catalog
Note: all changes made on Wikidata should also be made on Mix'n'match
- When an authority ID is moved from one item to another, the change must always be made on Mix'n'match too (to avoid its automatic reinsertion into the wrong item)
- When an authority ID is deleted (because the library catalog has redirected or deleted it), the change must always be made on Mix'n'match too (i.e. by marking it as N/A, to avoid its automatic reinsertion)
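The note above can be sketched as a comparison between two snapshots of the authority-ID statements on Wikidata. This is only an illustrative sketch: the function name `changes_to_mirror`, the action labels and the sample IDs are hypothetical, and it assumes that an ID which disappeared from Wikidata was removed because the catalog redirected or deleted it. It does not call any Mix'n'match interface; it only lists what would need to be mirrored there by hand.

```python
def changes_to_mirror(before, after):
    """List the Wikidata changes that should be repeated on Mix'n'match.

    `before` and `after` map an authority ID to the item holding it
    (hypothetical snapshots taken at two different times).
    """
    actions = []
    for authority_id, old_item in sorted(before.items()):
        new_item = after.get(authority_id)
        if new_item is None:
            # Assumption: the ID was removed because the catalog
            # redirected or deleted it, so it should be marked as N/A.
            actions.append(("mark_na", authority_id, None))
        elif new_item != old_item:
            # The ID was moved to another item: update the match.
            actions.append(("rematch", authority_id, new_item))
    return actions


# Placeholder identifiers, not real catalog IDs or Wikidata items.
before = {"ID1": "item-a", "ID2": "item-b", "ID3": "item-c"}
after = {"ID1": "item-a", "ID2": "item-d"}
for action in changes_to_mirror(before, after):
    print(action)
```

IDs that stay on the same item produce no action, so the output only contains what still has to be done on Mix'n'match.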
Periodical revision on Wikidata [W]
- Check unique-value constraint violations (in order to find duplicate items)!
- Use queries to find all items that have the authority ID but lack some fundamental properties, and add the missing information manually (or import it semi-automatically)
- sex or gender (P21)
- date of birth (P569) or date of death (P570) or floruit (P1317)
- VIAF ID (P214)
- Aliases
- Description in the language of the cataloging agency
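The query-based check above could look roughly like the following sketch, which builds a SPARQL query for the Wikidata Query Service. The function name `missing_statement_query` and the `limit` parameter are assumptions made for illustration; the property IDs are the ones listed above. The query uses the `wdt:` prefix, so it only sees best-rank statements with a real value, which may or may not be what a given revision needs.

```python
import re

PID_PATTERN = re.compile(r"^P[1-9][0-9]*$")


def missing_statement_query(authority_pid, alternatives, limit=1000):
    """Build a query for items that have `authority_pid` but none of `alternatives`.

    `alternatives` is a group of properties of which at least one is expected,
    e.g. ("P569", "P570", "P1317") for birth date, death date or floruit.
    """
    if not alternatives:
        raise ValueError("at least one property to check is required")
    for pid in (authority_pid, *alternatives):
        if not PID_PATTERN.match(pid):
            raise ValueError(f"not a property ID: {pid!r}")
    # One FILTER per property: the item is reported only if it lacks all of them.
    filters = "\n".join(
        f"  FILTER NOT EXISTS {{ ?item wdt:{pid} [] }}" for pid in alternatives
    )
    return (
        "SELECT ?item ?id WHERE {\n"
        f"  ?item wdt:{authority_pid} ?id .\n"
        f"{filters}\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Items with a Pontificia Università della Santa Croce ID (P5739)
# but no date of birth, date of death or floruit.
print(missing_statement_query("P5739", ("P569", "P570", "P1317")))
```

Calling the function once per group, e.g. `("P21",)`, `("P569", "P570", "P1317")` and `("P214",)`, gives one work list per missing piece of information. Aliases and descriptions are not statements and would need a different query.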
Periodical revision on library catalogs [L]
- Check single-value constraint violations (in order to find duplicate authority IDs)!
- Note: if a great number of duplicates is found through Mix'n'match, it may be best to proceed as follows: merge all the duplicate authority IDs in the catalog; remove all the deleted authority IDs from Wikidata (if the property already exists); import a new Mix'n'match catalog and delete the old one
- Use some internal procedure to find all authority IDs that are present on Wikidata but lack some fundamental data, and add the missing information manually (or import it semi-automatically)
- sex or gender (P21)
- date of birth (P569) or date of death (P570) or floruit (P1317)
- VIAF ID (P214)
- Aliases
- Description in the language of the cataloging agency
- Use some internal procedure to compare the birth/death dates in the authority records with those in Wikidata (in order to find mismatches and to improve dates on both sides)
- Receive reports of possible errors from Wikidata users on the talk page of the property
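The date comparison mentioned above can be sketched as follows. It is a minimal sketch under stated assumptions: `PartialDate`, `compatible` and `find_date_mismatches` are hypothetical names, the two inputs are assumed to be already keyed by authority ID, and two dates are treated as compatible when they agree on every component that both of them specify (so a year-only date does not conflict with a full date in the same year).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartialDate:
    """A date known to year, month or day precision."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None


def compatible(a, b):
    """True when the two dates agree on every component both of them specify."""
    if a.year != b.year:
        return False
    if a.month is not None and b.month is not None:
        if a.month != b.month:
            return False
        if a.day is not None and b.day is not None and a.day != b.day:
            return False
    return True


def find_date_mismatches(catalog, wikidata):
    """Return the authority IDs whose date differs between catalog and Wikidata."""
    mismatches = []
    for authority_id, catalog_date in catalog.items():
        wikidata_date = wikidata.get(authority_id)
        if wikidata_date is None:
            continue  # nothing to compare on the Wikidata side
        if not compatible(catalog_date, wikidata_date):
            mismatches.append(authority_id)
    return sorted(mismatches)


# Placeholder IDs and invented dates, for illustration only.
catalog = {"A1": PartialDate(1850), "A2": PartialDate(1901, 3, 2), "A3": PartialDate(1700)}
wikidata = {"A1": PartialDate(1850, 6, 1), "A2": PartialDate(1902, 3, 2)}
print(find_date_mismatches(catalog, wikidata))
```

Only real disagreements are reported; pairs that differ just in precision are left out, since for those the more precise side can simply be used to improve the other.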
Connecting Wikidata and library catalogs: institutions
Institutions actually involved
Property | 0 | W1 | W2 | L1 | L2 | L3 | L4 |
---|---|---|---|---|---|---|---|
Pontificia Università della Santa Croce ID (P5739) | doing | periodically | periodically | periodically | periodically | ||
Unione Romana Biblioteche Scientifiche ID (P8750) | doing | scheduled | periodically | ||||
Portuguese National Library author ID (P1005) | scheduled | ||||||
Museo Galileo authority ID (P8947) | scheduled | periodically | periodically | ||||
Cyprus University of Technology ID (P9251) | scheduled | ||||||
Biblioteca Franco Serantini ID (P9178) | scheduled |
Engageable institutions
This list includes all the library catalogs for which a Mix'n'match catalog was scraped (or scraping is scheduled) in 2020-2021 by Bargioni, excluding the library catalogs with which contact has already been established (these are included in the previous list).
Good examples of error reporting
- GND ID (P227): de:Wikipedia:GND/Fehlermeldung
- NL CR AUT ID (P691): cs:Wikipedie:Wikidata/Nahlášené duplicity autoritních záznamů
P.S. About conflations
I would distinguish two types of conflation:
- just one ID is misplaced (conflation lato sensu)
- many parts of the item (labels/descriptions/aliases, statements, IDs) refer to two different entities (conflation stricto sensu)
If the misplaced ID is very important for reconciliation (mainly VIAF ID (P214) and ISNI (P213)), the risk of degenerating from conflation lato sensu to conflation stricto sensu is high.
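Constraint violation | If the ID is perfect (no duplications or conflations) | In practice (the ID has duplications and conflations) |
---|---|---|
unique-value constraint violation | duplicate items on Wikidata (to be merged), or conflation on Wikidata (the ID is misplaced on one or more items and should be removed from them) | the same causes, plus conflation in the ID (the ID covers two or more entities and should be split) |
single-value constraint violation | conflation on Wikidata (one or more of the IDs on the item are misplaced and should be removed) | the same cause, plus duplicate IDs (the database has two or more IDs for the same entity, which should be merged) |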
Another way to find conflations is to look at single-value constraint violations of date of birth (P569)/place of birth (P19)/date of death (P570)/place of death (P20), but these can mean:
- several dates with different precision (e.g. day vs. year; the most precise should have preferred rank)
- several dates supported by different sources (e.g. Wikipedia vs. an authority file; statements supported by the most authoritative source should have preferred rank, and statements supported by nothing or only by a Wikipedia should be removed)
- the item is conflated
In both cases (using IDs or using basic biographical statements), constraint violations are the best method for finding conflations. In practice, however, conflations (and specifically conflations stricto sensu) are a very small percentage of the violations, and they are mixed in with a very large number of problems that are less damaging but still annoying: they can degenerate into more serious problems (conflations lato sensu), they are a nuisance for Wikipedia infoboxes (double statements differing in precision or in the authority of their sources), and they hinder the discovery of more serious problems (conflations stricto sensu).
In my opinion, the two fundamental tools for catching conflations are precisely the constraint violations, in particular:
- a unique-value violation can indicate either 1) duplication on Wikidata (items to be merged), or 2) conflation on Wikidata (the ID is present on two or more items, but on one or more of them it does not belong and should be removed), or 3) conflation in the ID (the ID covers two or more items and should be split)
- a single-value violation can indicate either 1) conflation on Wikidata (one or more of the two or more IDs present do not belong and should be removed) or 2) duplication of the IDs (the database contains two or more IDs for the same item, which should be merged)
As the two points above show, it is essential that the community keeps the constraint violation lists as empty as possible, especially for those properties (I am thinking of VIAF and ISNI) that are typically used to reconcile new databases (if a VIAF ID is wrongly present on an item, it risks gathering other wrong IDs around itself; I have often seen this happen), and in my opinion this is not done enough.
But there is another relevant point that should be highlighted: many constraint violations, especially single-value ones, depend on errors (duplications, more rarely conflations) in the databases Wikidata links to. Establishing, especially with the most important of them, effective mechanisms for reporting and solving the problems identified through Wikidata (e.g. through tools on Toolforge or any other way that also suits the institution) would help both to improve their databases and to reduce constraint violations, making it easier to spot the constraint violations that concern Wikidata itself specifically (that is, duplications and, more rarely, conflations on Wikidata itself).
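The two kinds of constraint violation discussed in this section can be illustrated with a small in-memory sketch. The function name `find_violations` and the sample data are hypothetical; on Wikidata itself the same information comes from the constraint violation reports or from queries. The sketch only finds the violations: deciding whether each one is a duplication or a conflation, and on which side, still requires looking at the records.

```python
from collections import defaultdict


def find_violations(statements):
    """Group (item, authority_id) pairs into the two kinds of violation.

    Returns a pair of dicts:
    - unique-value violations: authority ID -> the items sharing it
    - single-value violations: item -> the authority IDs it holds
    """
    items_by_id = defaultdict(set)
    ids_by_item = defaultdict(set)
    for item, authority_id in statements:
        items_by_id[authority_id].add(item)
        ids_by_item[item].add(authority_id)
    unique_value = {
        authority_id: sorted(items)
        for authority_id, items in items_by_id.items()
        if len(items) > 1
    }
    single_value = {
        item: sorted(ids)
        for item, ids in ids_by_item.items()
        if len(ids) > 1
    }
    return unique_value, single_value


# Placeholder items and IDs, for illustration only.
statements = [
    ("item-a", "ID1"),
    ("item-b", "ID1"),  # same ID on two items: unique-value violation
    ("item-c", "ID2"),
    ("item-c", "ID3"),  # two IDs on one item: single-value violation
    ("item-d", "ID4"),
]
unique_value, single_value = find_violations(statements)
print(unique_value)
print(single_value)
```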