Wikidata:WikiProject CJKV character
Chinese characters (Q8201) and related characters (CJKV character (Q53764732)) are used in several Asian countries, including China, Japan, Korea, and Vietnam. This project maintains the items, properties, and ontology related to these CJKV characters.
Items under the scope of this project
Overview based on Unicode
As of Unicode 14.0 (2021), a total of 93,867 CJKV characters have been encoded in the ideograph blocks below (the 214 Kangxi Radicals are not counted in this total):
- CJK Unified Ideographs (4E00 to 9FFF) - 20,992 items
- CJK Compatibility Ideographs (F900 to FAD9) - 472 items, of which 12 are unique and not normalizable (FA0E, FA0F, FA11, FA13, FA14, FA1F, FA21, FA23, FA24, FA27, FA28, FA29)
- CJK Compatibility Ideographs Supplement (2F800 to 2FA1D) - 542 items
- CJK Unified Ideographs Extension A (3400 to 4DBF) - 6,592 items
- CJK Unified Ideographs Extension B (20000 to 2A6DF) - 42,720 items
- CJK Unified Ideographs Extension C (2A700 to 2B73F) - 4,153 items
- CJK Unified Ideographs Extension D (2B740 to 2B81D) - 222 items
- CJK Unified Ideographs Extension E (2B820 to 2CEA1) - 5,762 items
- CJK Unified Ideographs Extension F (2CEB0 to 2EBE0) - 7,473 items
- CJK Unified Ideographs Extension G (30000 to 3134A) - 4,939 items
- Kangxi Radicals (2F00 to 2FD5) - 214 items
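The ranges above can be turned into a simple lookup, for example to check which block a character belongs to before editing its item. The following is a minimal sketch: the name `CJKV_BLOCKS` and the function `block_of` are hypothetical, and the ranges and counts are copied from the list above. A code point inside a range is not necessarily an assigned character, since some ranges contain more code points than items.

```python
# Block ranges and item counts as listed above (Unicode 14.0).
# Tuple layout: (block name, first code point, last code point, item count).
CJKV_BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF, 20992),
    ("CJK Compatibility Ideographs", 0xF900, 0xFAD9, 472),
    ("CJK Compatibility Ideographs Supplement", 0x2F800, 0x2FA1D, 542),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF, 6592),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF, 42720),
    ("CJK Unified Ideographs Extension C", 0x2A700, 0x2B73F, 4153),
    ("CJK Unified Ideographs Extension D", 0x2B740, 0x2B81D, 222),
    ("CJK Unified Ideographs Extension E", 0x2B820, 0x2CEA1, 5762),
    ("CJK Unified Ideographs Extension F", 0x2CEB0, 0x2EBE0, 7473),
    ("CJK Unified Ideographs Extension G", 0x30000, 0x3134A, 4939),
    ("Kangxi Radicals", 0x2F00, 0x2FD5, 214),
]


def block_of(char):
    """Return the name of the listed block containing `char`, or None."""
    codepoint = ord(char)
    for name, first, last, _count in CJKV_BLOCKS:
        if first <= codepoint <= last:
            return name
    return None


# Sum of the item counts for every block except Kangxi Radicals.
ideograph_total = sum(
    count for name, _first, _last, count in CJKV_BLOCKS
    if name != "Kangxi Radicals"
)
```

For instance, `block_of("漢")` falls in the CJK Unified Ideographs range, and `ideograph_total` adds up to the 93,867 figure quoted above.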
Note: Items from CJK Compatibility Ideographs (Q2493848) and CJK Compatibility Ideographs Supplement (Q2493862) that are not unique are labeled with their normalized character followed by the code point in brackets. Additional properties to be used (only for compatibility ideographs): different from (P1889) and normalized Unicode character (P5591).
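The labeling rule can be illustrated with Python's standard `unicodedata` module. This is only a sketch: the function name `compatibility_label` is hypothetical, and the exact label format (here `U+XXXX` in round brackets) is an assumption, since the note only says "codepoint in brackets".

```python
import unicodedata


def compatibility_label(char):
    """Build a label for a compatibility ideograph.

    Assumed format: "<normalized character> (U+XXXX)".
    Characters that do not normalize to anything else are returned unchanged.
    """
    normalized = unicodedata.normalize("NFC", char)
    if normalized == char:
        # Unique compatibility ideograph (e.g. U+FA0E): nothing to normalize.
        return char
    return f"{normalized} (U+{ord(char):04X})"
```

For example, U+FA47 normalizes to 漢 (U+6F22), so its label would carry 漢 together with the code point FA47, while U+FA0E is one of the 12 unique characters and is left as it is.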
Note that CJKV characters not (yet) present in Unicode such as 𱁬 (Q7676480) are also under the scope of this project. See the relevant list in the English Wiktionary.
Priority for frequently used characters
The following characters will be prioritized (all regions have the same priority):
- Japan: 1006 items in Kyōiku Kanji (done)
- Japan: 2136 items in Jōyō Kanji (2010) (1006 Kyōiku Kanji + 1135 Jōyō Kanji - 5 deleted characters) (done)
- Japan: 212 items in Jinmeiyō Kanji (2015)
- Japan: 6355 items in JIS X 0208 (Q905260)
- Mainland China: 8105 items in Table of General Standard Chinese Characters (Q14941454) (done)
- Taiwan: 4808 items in Chart of Standard Forms of Common National Characters (Q6498184)
- Taiwan: 6341 items in Chart of Standard Forms of Less-Than-Common National Characters (Q11273197)
- Taiwan: 18388 items in Chart of Standard Forms of Rarely-Used National Characters (Q11608510)
- Hong Kong: 4762 items in List of Graphemes of Commonly-used Chinese Characters (Q6152418)
- Hong Kong: 4602 items in HKSCS (Q1627000)
- South Korea: 1800 items in Basic Hanja for educational use (Q485267) [1]
- South Korea: Approximately 8000 items in Table of hanja for personal names (Q56145673) [2] (Text version: [3])
- Vietnam: 17565 items listed by Vietnamese Nôm Preservation Foundation (Q17490564)
- Historical: 9831 items in Shuowen Jiezi (Q1072348)
- Historical: 49030 items in Kangxi Dictionary (Q850590)
Note that the lists above overlap: the same item may appear in more than one list.
Important Properties
Note: The values must be subclasses of CJKV character (Q53764732), which is a subclass of character (Q3241972). The source fields referred to are groupings of sources from the Ideographic Research Group (Q5988470).
- compulsory value to be included in every item
- almost always compulsory, except in rare instances
- only for any CJKV character (Q53764732) from a Chinese (Q7850) source, confirmed by the presence of Unihan Database (Q63443408) fields kIRG_GSource, kIRG_HSource, kIRG_MSource, kIRG_SSource and/or kIRG_TSource, or in some instances kIICore, kIRG_USource or kIRG_UKSource (though refer to the documentation)
- only for CJK unified ideograph (Q796156) used in mainland China (Q19188)
- only for any CJKV character (Q53764732) from a Japanese (Q5287) source, confirmed by the presence of Unihan Database (Q63443408) fields kIRG_JSource and/or kIRG_SSource, or in some instances kIICore, kIRG_USource or kIRG_UKSource (though refer to the documentation)
- subclass of kanji character (Q53764782) for any non-Chinese-origin CJKV ideograph (Q11420697) created in Japan (Q17)
- only for any CJKV character (Q53764732) from a Korean (Q9176) source, confirmed by the presence of Unihan Database (Q63443408) fields kIRG_KSource and/or kIRG_KPSource, or in some instances kIICore, kIRG_USource or kIRG_UKSource (though refer to the documentation)
- subclass of hanja character (Q55712979) for any non-Chinese-origin CJKV ideograph (Q11420697) created in Korea (Q18097)
- only for any CJKV character (Q53764732) from a Vietnamese (Q9199) source, confirmed by the presence of Unihan Database (Q63443408) field kIRG_VSource, or in some instances kIICore, kIRG_USource or kIRG_UKSource (though refer to the documentation)
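The field-based rules above can be sketched as a small lookup. This is an illustration only: the names `REGION_FIELDS`, `AMBIGUOUS_FIELDS`, and `source_regions` are hypothetical, the input is assumed to be the set of Unihan field names present for one character, and the fields that apply only "in some instances" are merely flagged for manual review against the Unihan documentation.

```python
# Unihan fields that confirm a regional source, as listed above.
REGION_FIELDS = {
    "Chinese": {"kIRG_GSource", "kIRG_HSource", "kIRG_MSource",
                "kIRG_SSource", "kIRG_TSource"},
    "Japanese": {"kIRG_JSource", "kIRG_SSource"},
    "Korean": {"kIRG_KSource", "kIRG_KPSource"},
    "Vietnamese": {"kIRG_VSource"},
}

# Fields that count only "in some instances"; they need a manual check.
AMBIGUOUS_FIELDS = {"kIICore", "kIRG_USource", "kIRG_UKSource"}


def source_regions(unihan_fields):
    """Return (confirmed regions, needs_manual_review) for one character.

    `unihan_fields` is any iterable of Unihan field names present for the
    character, for example the keys of a dict parsed from the database.
    """
    present = set(unihan_fields)
    regions = [region for region, fields in REGION_FIELDS.items()
               if present & fields]
    needs_review = bool(present & AMBIGUOUS_FIELDS)
    return regions, needs_review
```

A character with only kIRG_JSource and kIRG_KSource would be reported as having Japanese and Korean sources, while a character with only kIICore would return no confirmed region and be flagged for review.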
Other non-essential values:
Mostly obsolete encodings for specific character sets:
- Chinese
- Note that no reference to GB 18030 (Q1484877) is necessary as it re-encodes Unicode (Q8819).
- Japanese
- Korean
- Vietnamese
- kanji (Q82772)
- traditional Chinese characters (Q178528)
- simplified Chinese characters (Q185614)
- legacy Chinese character (Q10885554) - characters used in modern Chinese (Q7850) that are neither traditional nor simplified.
- Hanja (Q485619)
- chữ Nôm (Q875344) - for characters that use chữ Nôm reading (Q56066660)
- chữ Hán (Q1378119) - for characters that use Sino-Vietnamese reading (Q10805375)
- Sawndip (Q923677)
Note: the following are instances of standardized writing system (Q55692290).
- standard form of national characters (Q906019) - used in Taiwan (Q865). May include simplified Chinese characters (Q185614).
- standard hanzi character (Q8044489) - used in mainland China (Q19188). May include traditional Chinese characters (Q178528).
Note: Usually, the values of this property are authoritative dictionaries or encyclopedias. See the #Unihan Database section for examples of reputable sources according to Unicode.
- CJK Unified Ideographs (Q994386) and not CJK unified ideograph (Q796156)
- CJK Compatibility Ideographs (Q2493848)
- CJK Compatibility Ideographs Supplement (Q2493862)
- CJK Unified Ideographs Extension A (Q632516)
- CJK Unified Ideographs Extension B (Q545703)
- CJK Unified Ideographs Extension C (Q2494132)
- CJK Unified Ideographs Extension D (Q188379)
- CJK Unified Ideographs Extension E (Q20638416)
- CJK Unified Ideographs Extension F (Q30410190)
- CJK Unified Ideographs Extension G (Q85874480)
- CJK Unified Ideographs Extension H (Q113956966)
- CJK Unified Ideographs Extension I (Q122569272)
- Kangxi Radicals (Q2493936)
- CJK stroke (Q1689024)
- quantity (P1114)
- image (P18): for stroke order images, e.g. 雨-bw.png
- Table of Jōyō kanji (Q55502741)
- Chart of Standard Forms of Common National Characters (Q6498184)
- Table of General Standard Chinese Characters (Q14941454)
- Basic Hanja for educational use (Q485267)
- Table of hanja for personal names (Q56145673)
Note: Add the qualifier applies to jurisdiction (P1001) if there is more than one value for stroke count (P5205).
Note: All items having this property must be instances of kanji character (Q53764782). (Check for the violation)
- grade 1 kyōiku kanji (Q53785300)
- grade 2 kyōiku kanji (Q53785302)
- grade 3 kyōiku kanji (Q53785303)
- grade 4 kyōiku kanji (Q53785305)
- grade 5 kyōiku kanji (Q53785308)
- grade 6 kyōiku kanji (Q53785310)
- jōyō kanji (Q875368)
- jinmeiyō kanji (Q1439720)
- hyōgai kanji (Q4502863)
- values are instances of Chinese character radical (Q849778) (Query)
Note: The qualifier residual stroke count (P5281) must be used with this property.
Note: Add the qualifier applies to jurisdiction (P1001) if there is more than one value for residual stroke count (P5281).
Note: Must be 5 numeric characters.
Note: 1 to 5 alphabetic letters.
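The two format notes above (five numeric characters; one to five alphabetic letters) can be checked mechanically before a value is added. The sketch below assumes ASCII digits and ASCII letters, which the notes do not state explicitly; the function names are hypothetical.

```python
import re

# Assumption: "numeric characters" means ASCII digits 0-9.
FIVE_DIGITS = re.compile(r"[0-9]{5}")
# Assumption: "alphabetic letters" means ASCII letters in either case.
ONE_TO_FIVE_LETTERS = re.compile(r"[A-Za-z]{1,5}")


def is_five_digit_value(value):
    """True if `value` consists of exactly five digits."""
    return FIVE_DIGITS.fullmatch(value) is not None


def is_letter_value(value):
    """True if `value` consists of one to five letters."""
    return ONE_TO_FIVE_LETTERS.fullmatch(value) is not None
```

Using `fullmatch` rather than `match` ensures that trailing characters, such as a sixth digit, cause the check to fail.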
Example:
- 雪 (Q3595029) → 相 (Q54874870) (series ordinal (P1545) → 1), reference: stated in (P248) → Guangyun (Q2189818)
- 雪 (Q3595029) → 絕 (Q55753032) (series ordinal (P1545) → 2), reference: stated in (P248) → Guangyun (Q2189818)
Note: The value must be represented by two character items, with references from a rime dictionary (Q2191807) such as Guangyun (Q2189818), Jiyun (Q35792), or Qieyun (Q1271885), or from a historical dictionary such as Longkan Shoujian (Q6148139).
Compiled values are available from dictionaries such as Hanyu Da Zidian (Q1442751) and Kangxi Dictionary (Q850590). Please cite the original reference (rime dictionary (Q2191807)) from which the values are quoted.
Note: Values are Hangul syllables (Hangul syllable (Q55809450)) such as 우 (Q55809436). 480 of them can be found in Table of hanja for personal names (Q56145673), which is an extension of Basic Hanja for educational use (Q485267). Values from the Unihan database, which is based on KS X 1001 (Q489423) and KS X 1002 (Q12581371), are slightly dated and should be avoided.
Note: Any valid Quốc Ngữ syllable with mandatory qualifier: sinogram reading pattern (P5244) containing either one or both of the following values: chữ Nôm reading (Q56066660) and Sino-Vietnamese reading (Q10805375).
Additional qualifier of writing system (P282) with values of simplified Chinese characters (Q185614) or traditional Chinese characters (Q178528) may be used along with Sino-Vietnamese reading (Q10805375).
- Sino-Vietnamese reading (Q10805375) are literary Chinese readings derived from phiên thiết in Middle Chinese while chữ Nôm reading (Q56066660) are vernacular readings used in the pronunciation of chữ Nôm (Q875344). (See wikt:鈙#Vietnamese for example).
- Note that readings obtained from ① the Unihan database, ② Vietnamese Nôm Preservation Foundation (Q17490564) or ③ Vietnamese Wiktionary (Q33109114) (based on Template:R:WinVNKey:Lê Sơn Thanh (Q55889066)) may contain some errors.
- If possible, obtain readings from printed references such as Tự Điển Chữ Nôm Dẫn Giải (Q56070779) and Giúp đọc Nôm và Hán Việt (Q56070751).
Examples:
- 一 (Q4025820) → u4e00-j
applies to jurisdiction (P1001) → Japan (Q17) (The glyph does not vary by region, so applies to jurisdiction (P1001) is not needed.)
- 漢 (Q54872914) → u6f22-j
- 漢 (Q54872914) → ufa47
- 漢 (Q54872914) → u6f22-t
- 漢 (Q54872914) → u6f22-v
- 雨 (Q3595028) → u96e8
- 雨 (Q3595028) → koseki-478690
Note: The value of this property must refer to an actual glyph on GlyphWiki. For example, use u4e00-j instead of u4e00 or u4e00-g (u4e00 and u4e00-g are aliases of u4e00-j). Use ufa47 instead of u6f22-g (u6f22-g is an alias of ufa47). And use koseki-478690 instead of u96e8-t (u96e8-t is an alias of koseki-478690).
Note: Use applies to jurisdiction (P1001) as a qualifier, indicating the region where the glyph is used, if and only if the glyph of the character varies by country or region.
These are the regional codes used by GlyphWiki:
- G - mainland China (Q19188)
- T - Taiwan (Q865)
- J - Japan (Q17)
- K - South Korea (Q884)
- KP - North Korea (Q423)
- V - Vietnam (Q881)
- H - Hong Kong (Q8646)
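Judging from the examples above, GlyphWiki names for Unicode characters consist of `u`, the lowercase hexadecimal code point, and an optional lowercase regional suffix. The sketch below builds such candidate names; `REGION_SUFFIX` and `glyphwiki_candidate` are hypothetical names, and the lowercase suffix for each regional code (including `kp`) is inferred from the examples rather than taken from GlyphWiki documentation. The result is only a candidate: whether it is an actual glyph or an alias still has to be checked on GlyphWiki, and names such as koseki-478690 cannot be derived this way.

```python
# Regional codes listed above, mapped to the assumed lowercase suffix.
REGION_SUFFIX = {
    "G": "g",    # mainland China
    "T": "t",    # Taiwan
    "J": "j",    # Japan
    "K": "k",    # South Korea
    "KP": "kp",  # North Korea
    "V": "v",    # Vietnam
    "H": "h",    # Hong Kong
}


def glyphwiki_candidate(char, region=None):
    """Return a candidate GlyphWiki name such as 'u4e00-j' or 'u96e8'.

    This does not check whether the name is an alias of another glyph.
    """
    name = f"u{ord(char):04x}"
    if region is not None:
        name += "-" + REGION_SUFFIX[region]
    return name
```

For example, `glyphwiki_candidate("一", "J")` gives the first example value above, and calling the function without a region gives the plain form used for 雨.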
- instances of sinogram (Q53764738)
- instances of ideographic description character (Q55589899)
A link to the relevant lexeme. Note that it is not yet clear which data should go only in the lexeme, only in the Q-item, or in both.
Unihan Database
This is a mapping of Unihan Database (Q63443408) and Wikidata properties.
Participants
The participants listed below can be notified using the following template in discussions: {{Ping project|CJKV character}}