User:Lea Lacroix (WMDE)/List of lists of languages
Here's an attempt to build a list with all the lists of languages that are used by Wikidata. This is a work in progress, feel free to edit and improve if you have some information or useful links!
Uses of lists
Location of the language list/language use | How is it currently displayed? | Where does it come from? Where is the list stored? | What’s the process to update the list? (technical) | What’s the process to update the list? (community) | Image |
---|---|---|---|---|---|
Desktop termbox
List of languages for which one can read or edit labels, descriptions, aliases |
Reading & editing: name of the language in the UI’s language
Order and which languages are shown first are influenced by a mechanism (unclear which one) that is itself influenced by the BabelBox |
WikibaseContentLanguages::getDefaultTermsLanguages() (unless overridden by WikibaseContentLanguages hook), which returns the MediaWiki languages, which are the ones supported by MediaWiki directly (not sure where those are defined – Names.php?) plus the $wgExtraLanguageNames, plus some languages defined in getDefaultTermsLanguages. | Add to the MediaWiki languages if this should become a general interface language, or to wmgExtraLanguageNames in InitialiseSettings.php otherwise. |
|
|
Mobile termbox
List of languages for which one can read or edit labels, descriptions, aliases |
Reading & editing: name of the language in the UI’s language
Order and which languages are shown first are influenced by a mechanism (unclear which one) that is itself influenced by the BabelBox |
Same as Desktop termbox? | Same as Desktop termbox? | Same as Desktop termbox? | |
Language field in Special:NewItem | Language code (‘en’)
In the selector list: name + code. By default, the suggested language is the user’s current interface language |
Same as Desktop termbox? | Same as Desktop termbox? | Same as Desktop termbox? | |
Language field in Special:NewLexeme (also: Language field when editing a Lexeme) | Label & description in UI’s language | Freely picked from all Items, no restriction. The relevant item (a language or subclass of language?) is not necessarily shown first in the suggestor. | - | Create or update Items | |
“Spelling variant of the Lemma” field in Special:NewLexeme (also: Spelling variant field when editing a Lexeme) | Reading: code only, or “mis-X-Q1234”
Editing: name and language code. Only the code makes the selector find the relevant language. Also: when the language doesn’t exist, people are supposed to type “mis-X-Q1234”, without any help from the interface |
LexemeTermLanguages, which is the MediaWiki languages (see Desktop termbox, above) plus a hard-coded list of additional language codes. The -x-Q### part is validated in LexemeTermLanguageValidator.
Note: Currently there is no validation that the item exists (phab:T201084), as long as there are no more than ten digits after the Q. Also, the full code may not be a valid IETF language tag, since no part of an IETF language tag may be longer than 8 characters (phab:T167166). ... Also: under which conditions exactly does the field appear when creating a new Lexeme? When the language item has no ISO 639-1 code statement. |
Part of the answer here | For the additional languages a Phabricator ticket needs to be created. LangCom input is generally sought. | |
Language of monolingual text: appears when entering or editing a monolingual text value (e.g. for P1705) | Reading: name of the language in UI’s language.
Editing: name and language code. Problem in editing mode: some languages don’t appear in the selector, but still work once the edit is saved. |
WikibaseContentLanguages::getDefaultMonolingualTextLanguages() (unless overridden by WikibaseContentLanguages hook), which returns the MediaWiki languages (see Desktop termbox, above) plus a hard-coded set of additional language codes minus a hard-coded set of undesired language codes. | Existing documentation on Help:monolingual text languages. Current process:
|
||
Language field for a Gloss on Senses | Reading: name of the language.
Editing: name and code. |
Same as Spelling variant of the Lemma? Note that the -x-Q#### tag is not allowed in gloss languages; however, this is only verified in the frontend. It is still possible to add such glosses via the API. |
Same as Spelling variant of the Lemma? | Same as Spelling variant of the Lemma? | |
Spelling variant field for Forms | Reading: language code only.
Editing: language code, no selector available. |
Same as Spelling variant of the Lemma? | Same as Spelling variant of the Lemma? | Same as Spelling variant of the Lemma? | |
Language of the interface for logged-in users | To change temporarily on one page: add “?uselang=ar” in the URL (language code)
On the interface: switch command with a symbol, showing the name of each language in its own language. Suggestions are influenced by: the thing that is influenced by BabelBox? Most used languages? Note: for non-logged-in users, the interface stays in English. |
Same as “the MediaWiki languages” in Desktop termbox, above? | Same as Desktop termbox? | Same as Desktop termbox? | |
Languages of the Translate Extension: visible e.g. when a documentation page has translations enabled | Name of the language in its own language + icon indicating the level of translation done | Same as “the MediaWiki languages” in Desktop termbox, above?
Note that <languages/> only shows languages for which a (partial) translation exists; you can select other languages on Special:Translate |
Same as Desktop termbox? | Same as Desktop termbox? | |
Languages available in the BabelBox | Language code, level, generated sentence in the language
Has effect on: what languages are shown first on desktop & mobile termbox |
What languages are available depends on the babel box sub-templates that are available. | The user can edit their user page and add whatever language code they want to add, even unsupported ones. These are shown as red links and don’t have an effect, until support is added. | The community provides babel box sub-templates for languages they want to support. | |
Existing language versions of Wikipedias | On a Wikidata Item: indicated by the language code and the title of the article in each language. Same for other Wikimedia projects.
On Wikipedia: name of the language in its own language |
m:Special:SiteMatrix | I think it first needs to be added as an interface language, and then add a wiki, specifically populateSitesTable.php. | m:Requests for new languages (for proposal) and Incubator (for development)
Note this will be changed in the future, see phab:T228745 |
|
Interface language of the WDQS | On the interface: switch command with the symbol, name of the language in its own language | Languages supported by jquery.uls, which are periodically imported from wikimedia/language-data. | Follow wikimedia/language-data instructions, then follow jquery.uls instructions, then update version in wikidata/query/gui. (Then wait for a deployment, but the process for that is supposed to change soon anyways.) | Unclear. | |
Items for languages in Wikidata | Wikidata contains an extensive "list" of items about languages, langoids.
|
Wikidata items | create an item for the language | create an item for the language | |
Magic word {{#language:}} | MediaWiki can output the language name with a magic word. Sample: {{#language:en-gb}} renders British English | MediaWiki languages (the ones supported by MediaWiki directly, plus $wgExtraLanguageNames) | Same as Desktop termbox | Same as Desktop termbox | |
Page content language | In "page information" or with magic words {{CONTENTLANG}}, {{CONTENTLANGUAGE}}. By default the same as wiki language, can be changed by translation administrators. | in MediaWiki page properties | ask a translation administrator | ||
Language associated with Wikimedia sitelink | schema:inLanguage on WDQS, can be different from language code in URL: https://w.wiki/br3 . See phab:T145535 for similar. | ask at Wikidata:Contact the development team | |||
Language associated with property | Some properties have a language/writing system associated with them that is stated in its label or description. Samples: name in hiero markup (P7383), name in kana (P1814), transliteration properties | free text in label or description, possibly with statements on property | property creator or update of label/description | make a property proposal or suggest change on property talk page |
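
The "Spelling variant of the Lemma" row above mentions two constraints on codes such as "mis-x-Q1234": at most ten digits after the Q, and the 8-character limit for each part of an IETF language tag. The following Python sketch only illustrates those two checks. The pattern and the function names are assumptions made for this example and are not taken from LexemeTermLanguageValidator.

```python
import re

# Hypothetical pattern for "<base>-x-Q<digits>"; the real validator may differ.
# The ten-digit limit is the one mentioned in the table above.
PRIVATE_USE_QID = re.compile(
    r"^([A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)-x-(Q[0-9]{1,10})$"
)


def split_private_use(code):
    """Return (base_code, qid) for codes like 'mis-x-Q1234', else None."""
    match = PRIVATE_USE_QID.match(code)
    if match is None:
        return None
    return match.group(1), match.group(2)


def overlong_subtags(code):
    """List the parts of the code that exceed the 8-character limit."""
    return [part for part in code.split("-") if len(part) > 8]


if __name__ == "__main__":
    for code in ("mis-x-Q1234", "eo-x-Q3505590", "mis-x-Q123456789"):
        print(code, split_private_use(code), overlong_subtags(code))
```

A QID with more than seven digits passes the ten-digit check but yields a part longer than 8 characters, which is the mismatch described in phab:T167166.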
Lists used
- codes
- ISO 639-2/ISO 639-3 (sample: en) covers
- macro languages
- natural languages (living, dead, many or few native speakers)
- artificial languages
- special codes: mis, und, zxx, mul
- To request new ones: ..
- IANA language subtags
- To request new ones: ..
- script subtags
- region subtags (sample: nds-nl)
- IETF language tag (based on previous or other) (sample: en-gb)
- WMF specific (sample: simple, sr-el, zh-classical)
- legacy codes
- Wikidata items (sample: Q1860)
- wikt:en:Wiktionary:List of languages - "all language codes that are recognised by Wiktionary" - "currently 8163 language codes"
- Data stored in these Module: pages: wikt:en:Category:Language data modules
- Help:Wikimedia language codes/lists/all
- ULS supported languages: https://github.com/wikimedia/language-data/blob/master/data/langdb.yaml
- language names
- ..
- configuration
- labels of Wikidata items (sample: English)
General ideas
General ideas about the use of language codes on Wikidata:
- use available codes whenever possible
- identifying the language of a word in Wikidata doesn't require living native speakers nor a Wikipedia language edition
- avoid creating codes for macro languages when the word belongs to a specific, clearly identified language within that macro language
- add a script subtag (e.g. "-cyrl") when the writing system isn't the primary one for the language or it's unclear what that is
- use a region subtag (e.g. "-gb") to describe regional variants
- use lowercase for subtags
- ask for the addition of IETF language tags when existing ones don't describe the language of a monolingual string accurately
- use "mis" while the appropriate code is not yet available at Wikidata
- for lexemes lacking the appropriate language tag, use:
- the appropriate parent code (e.g. "mis" or an actual language code "eo"),
- followed by "-x-" to introduce a private use subtag,
- and the QID for the langoid.
- Sample: "eo-x-Q3505590" for System H (Q3505590). See query for others: https://w.wiki/cMB
- If non-private subtags for IETF language tags are available, these should be requested.
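
The recipe above for lexemes lacking an appropriate language tag (parent code, then "-x-", then the QID of the langoid) can be sketched in Python as follows. The function name `private_use_tag` and its input checks are hypothetical and only serve this illustration. The parent code is lowercased, following the "use lowercase for subtags" guideline, while the QID keeps its capital Q as in the sample.

```python
import re

# Assumed shape of a Wikidata item ID for this sketch: "Q" followed by digits.
QID_PATTERN = re.compile(r"^Q[0-9]+$")


def private_use_tag(parent_code, qid):
    """Build '<parent>-x-<QID>', e.g. 'eo-x-Q3505590' for System H."""
    if not parent_code:
        raise ValueError("a parent code such as 'mis' or 'eo' is required")
    if not QID_PATTERN.match(qid):
        raise ValueError(f"not a QID: {qid!r}")
    return f"{parent_code.lower()}-x-{qid}"


if __name__ == "__main__":
    print(private_use_tag("eo", "Q3505590"))
    print(private_use_tag("mis", "Q1234"))
```

The helper does not check that the item exists or that the result fits IETF length limits. Both remain open issues according to the table above.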