Wikidata:WikiProject CJKV character

Chinese characters (Q8201) and related characters (CJKV character (Q53764732)) are used, today or historically, in several Asian countries, including China, Japan, Korea, and Vietnam. This project maintains the items, properties, and ontology related to these CJKV characters.

Items under the scope of this project


Overview based on Unicode


As of Unicode 14.0 (2021), a total of 93,867 CJKV characters have been encoded in the ranges below (not counting the 214 Kangxi Radicals listed last):

  1. CJK Unified Ideographs (4E00 to 9FFF) - 20,992 items
  2. CJK Compatibility Ideographs (F900 to FAD9) - 472 items, of which 12 are unique, i.e. they have no canonical decomposition and are left unchanged by normalization (FA0E, FA0F, FA11, FA13, FA14, FA1F, FA21, FA23, FA24, FA27, FA28, FA29)
  3. CJK Compatibility Ideographs Supplement (2F800 to 2FA1D) - 542 items
  4. CJK Unified Ideographs Extension A (3400 to 4DBF) - 6,592 items
  5. CJK Unified Ideographs Extension B (20000 to 2A6DF) - 42,720 items
  6. CJK Unified Ideographs Extension C (2A700 to 2B73F) - 4,153 items
  7. CJK Unified Ideographs Extension D (2B740 to 2B81D) - 222 items
  8. CJK Unified Ideographs Extension E (2B820 to 2CEA1) - 5,762 items
  9. CJK Unified Ideographs Extension F (2CEB0 to 2EBE0) - 7,473 items
  10. CJK Unified Ideographs Extension G (30000 to 3134A) - 4,939 items
  11. Kangxi Radicals (2F00 to 2FD5) - 214 items
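As a quick check on the figures above, the following Python sketch records each listed range with its stated count, sums the counts, and looks up which range a character falls in. The ranges and counts are copied from the list; the names `RANGES` and `block_of` are hypothetical. Counts are taken from the list rather than computed as last minus first plus one, because not every range is fully assigned (the Compatibility Ideographs range, for instance, includes the unassigned code points FA6E and FA6F).

```python
from typing import Optional

# (name, first code point, last code point, number of characters), as listed above.
RANGES = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF, 20_992),
    ("CJK Compatibility Ideographs", 0xF900, 0xFAD9, 472),
    ("CJK Compatibility Ideographs Supplement", 0x2F800, 0x2FA1D, 542),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF, 6_592),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF, 42_720),
    ("CJK Unified Ideographs Extension C", 0x2A700, 0x2B73F, 4_153),
    ("CJK Unified Ideographs Extension D", 0x2B740, 0x2B81D, 222),
    ("CJK Unified Ideographs Extension E", 0x2B820, 0x2CEA1, 5_762),
    ("CJK Unified Ideographs Extension F", 0x2CEB0, 0x2EBE0, 7_473),
    ("CJK Unified Ideographs Extension G", 0x30000, 0x3134A, 4_939),
    ("Kangxi Radicals", 0x2F00, 0x2FD5, 214),
]


def block_of(char: str) -> Optional[str]:
    """Return the name of the listed range that contains `char`, if any."""
    cp = ord(char)
    for name, first, last, _count in RANGES:
        if first <= cp <= last:
            return name
    return None


# Total of the ideograph ranges, leaving out the Kangxi Radicals.
ideograph_total = sum(r[3] for r in RANGES if r[0] != "Kangxi Radicals")
# Total including the Kangxi Radicals.
grand_total = sum(r[3] for r in RANGES)

print(ideograph_total)  # 93867
print(grand_total)      # 94081
print(block_of("\U00020000"))
```

The sum of the first ten ranges is exactly the 93,867 stated above; adding the Kangxi Radicals gives 94,081.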

Note: Items from CJK Compatibility Ideographs (Q2493848) and CJK Compatibility Ideographs Supplement (Q2493862) that are not unique are labeled with their normalized character followed by the code point in brackets. Two additional properties are used for compatibility ideographs only: different from (P1889) and normalized Unicode character (P5591).
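The normalized character for such a label can be obtained with Unicode NFC normalization, which maps every non-unique compatibility ideograph to its unified counterpart and leaves the 12 unique ones unchanged. A minimal sketch using Python's standard `unicodedata` module; the label format `豈 (U+F900)` is an assumption about what "in brackets" means, and the helper names are hypothetical.

```python
import unicodedata


def normalized(char: str) -> str:
    """NFC form of a single character. Non-unique compatibility ideographs
    have a singleton canonical decomposition, so NFC maps them to a unified
    ideograph; the unique ones come back unchanged."""
    return unicodedata.normalize("NFC", char)


def compat_label(char: str) -> str:
    """Hypothetical label: normalized character plus the original code point."""
    return f"{normalized(char)} (U+{ord(char):04X})"


# Assigned code points (general category other than Cn) in F900..FAD9.
assigned = [
    cp for cp in range(0xF900, 0xFAD9 + 1)
    if unicodedata.category(chr(cp)) != "Cn"
]
# Those that normalization leaves unchanged, i.e. the unique ones.
unique = [cp for cp in assigned if normalized(chr(cp)) == chr(cp)]

print(len(assigned), len(unique))
print(compat_label("\uF900"))
```

For U+F900 this prints `豈 (U+F900)`, since U+F900 canonically decomposes to U+8C48 豈.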

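Note that CJKV characters not (yet) encoded in Unicode are also within the scope of this project (see the relevant list in the English Wiktionary). Some former examples have since been encoded: 𱁬 (Q7676480), for instance, is U+3106C in CJK Unified Ideographs Extension G.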

Priority for frequently used characters


Characters in the following lists are prioritized (all regions have equal priority):

  1. Japan: 1006 items in Kyōiku Kanji   Done
  2. Japan: 2136 items in Jōyō Kanji (2010)   Done (1006 Kyōiku Kanji + 1135 Jōyō Kanji - 5 deleted characters)
  3. Japan: 212 items in Jinmeiyō Kanji (2015)
  4. Japan: 6355 items in JIS X 0208 (Q905260)
  5. Mainland China: 8105 items in Table of General Standard Chinese Characters (Q14941454)   Done
  6. Taiwan: 4808 items in Chart of Standard Forms of Common National Characters (Q6498184)
  7. Taiwan: 6341 items in Chart of Standard Forms of Less-Than-Common National Characters (Q11273197)
  8. Taiwan: 18388 items in Chart of Standard Forms of Rarely-Used National Characters (Q11608510)
  9. Hong Kong: 4762 items in List of Graphemes of Commonly-used Chinese Characters (Q6152418)
  10. Hong Kong: 4602 items in HKSCS (Q1627000)
  11. South Korea: 1800 items in Basic Hanja for educational use (Q485267) [1]
  12. South Korea: Approximately 8000 items in Table of hanja for personal names (Q56145673) [2] (Text version: [3])
  13. Vietnam: 17565 items listed by Vietnamese Nôm Preservation Foundation (Q17490564)
  14. Historical: 9831 items in Shuowen Jiezi (Q1072348)
  15. Historical: 49030 items in Kangxi Dictionary (Q850590)

Note that these lists overlap: the same character may appear in more than one of them.


Important Properties


Note: The values must be subclasses of CJKV character (Q53764732), which is a subclass of character (Q3241972). The source fields referred to are groupings of sources from the Ideographic Research Group (Q5988470).

compulsory: must be included in every item
almost always compulsory, except in rare cases
only for CJKV characters (Q53764732) from a Chinese (Q7850) source, confirmed by the presence of the Unihan Database (Q63443408) fields kIRG_GSource, kIRG_HSource, kIRG_MSource, kIRG_SSource and/or kIRG_TSource, or in some instances kIICore, kIRG_USource or kIRG_UKSource (see the Unihan documentation)
only for CJK unified ideographs (Q796156) used in mainland China (Q19188)
only for CJKV characters (Q53764732) from a Japanese (Q5287) source, confirmed by the presence of the Unihan Database (Q63443408) fields kIRG_JSource and/or kIRG_SSource, or in some instances kIICore, kIRG_USource or kIRG_UKSource (see the Unihan documentation)
subclass of kanji character (Q53764782), for non-Chinese-origin CJKV ideographs (Q11420697) created in Japan (Q17)
only for CJKV characters (Q53764732) from a Korean (Q9176) source, confirmed by the presence of the Unihan Database (Q63443408) fields kIRG_KSource and/or kIRG_KPSource, or in some instances kIICore, kIRG_USource or kIRG_UKSource (see the Unihan documentation)
subclass of hanja character (Q55712979), for non-Chinese-origin CJKV ideographs (Q11420697) created in Korea (Q18097)
only for CJKV characters (Q53764732) from a Vietnamese (Q9199) source, confirmed by the presence of the Unihan Database (Q63443408) field kIRG_VSource, or in some instances kIICore, kIRG_USource or kIRG_UKSource (see the Unihan documentation)
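The source conditions above can be applied mechanically to a character's Unihan fields. A minimal sketch, assuming the fields are already available as a dictionary from field name to value; the function names are hypothetical, and the kIICore, kIRG_USource and kIRG_UKSource cases, which only count "in some instances", are flagged for manual review rather than decided.

```python
# Unihan fields that confirm each kind of source, per the conditions above.
SOURCE_FIELDS = {
    "Chinese": {"kIRG_GSource", "kIRG_HSource", "kIRG_MSource", "kIRG_SSource", "kIRG_TSource"},
    "Japanese": {"kIRG_JSource", "kIRG_SSource"},
    "Korean": {"kIRG_KSource", "kIRG_KPSource"},
    "Vietnamese": {"kIRG_VSource"},
}
# Fields that only confirm a source in some instances (see the documentation).
NEEDS_REVIEW = {"kIICore", "kIRG_USource", "kIRG_UKSource"}


def source_regions(fields: dict) -> set:
    """Regions confirmed by the character's Unihan fields (field name -> value)."""
    return {
        region
        for region, names in SOURCE_FIELDS.items()
        if names.intersection(fields)  # iterating a dict yields its keys
    }


def needs_review(fields: dict) -> bool:
    """True if no region is confirmed but a review-only field is present."""
    return not source_regions(fields) and bool(NEEDS_REVIEW.intersection(fields))


# Placeholder values; only the field names matter here.
print(source_regions({"kIRG_GSource": "...", "kIRG_KSource": "..."}))
print(needs_review({"kIRG_USource": "..."}))
```

Note that kIRG_SSource appears under both the Chinese and the Japanese conditions above, so it confirms both.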

Other non-essential values:

Mostly obsolete encodings for specific character sets:

  • Chinese
Note that no reference to GB 18030 (Q1484877) is needed, as it re-encodes all of Unicode (Q8819).
  • Japanese
  • Korean
  • Vietnamese
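The GB 18030 note can be illustrated with Python's built-in `gb18030` codec: ideographs inside and outside the Basic Multilingual Plane encode and decode back to themselves, so a GB 18030 code follows from the Unicode code point and needs no separate statement. A minimal sketch; it relies on the codec shipped with CPython rather than on the standard's own tables, and the helper names are hypothetical.

```python
def gb18030_bytes(char: str) -> bytes:
    """Encode one character with Python's built-in GB 18030 codec."""
    return char.encode("gb18030")


def round_trips(char: str) -> bool:
    """True if decoding the GB 18030 bytes gives back the same character."""
    return gb18030_bytes(char).decode("gb18030") == char


for ch in ["一", "\u3400", "\U00020000", "\U00030000"]:
    # BMP ideographs take 2 or 4 bytes; supplementary-plane ones always take 4.
    print(f"U+{ord(ch):04X}", gb18030_bytes(ch).hex(), round_trips(ch))
```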

Note: the following are instances of standardized writing system (Q55692290).

Note: The values of this property are usually authoritative dictionaries or encyclopedias. See the Unihan Database section for examples of sources that Unicode treats as reputable.

Note: If there is more than one value for stroke count (P5205), add the qualifier applies to jurisdiction (P1001) to each value.

Note: All items with this property must be instances of kanji character (Q53764782) (see the constraint violation report).

Note: The qualifier residual stroke count (P5281) must be used with this property.
Note: If there is more than one value for residual stroke count (P5281), also add the qualifier applies to jurisdiction (P1001).
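Radical-stroke values in the Unihan fields listed in the Unihan Database section (kRSUnicode and related fields) have the form `39.5`: a radical number, a dot, and the number of strokes outside the radical, with an apostrophe after the radical number marking a simplified form of the radical. The sketch below parses such values and pairs each radical with its mandatory residual stroke count (P5281) qualifier. The statement shape is simplified and hypothetical, and mapping a radical number to its radical item is left out.

```python
import re
from typing import List, NamedTuple


class RadicalStroke(NamedTuple):
    radical: int      # Kangxi radical number, 1-214
    simplified: bool  # True if the radical number carries an apostrophe
    residual: int     # strokes outside the radical


# "39.5", "149'.7": radical number, optional apostrophe(s), dot, residual strokes.
RS_TOKEN = re.compile(r"(\d+)('*)\.(\d+)")


def parse_rs(value: str) -> List[RadicalStroke]:
    """Parse a space-separated radical-stroke field value."""
    result = []
    for token in value.split():
        m = RS_TOKEN.fullmatch(token)
        if m is None:
            raise ValueError(f"unexpected radical-stroke token: {token!r}")
        result.append(RadicalStroke(int(m.group(1)), m.group(2) != "", int(m.group(3))))
    return result


def as_statements(value: str) -> List[dict]:
    """One radical (P5280) entry per token, each with its residual stroke
    count (P5281) qualifier. The radical stays a number (hypothetical shape)."""
    return [
        {"property": "P5280", "radical_number": rs.radical, "qualifiers": {"P5281": rs.residual}}
        for rs in parse_rs(value)
    ]


print(as_statements("39.5"))
```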

Note: Must consist of exactly 5 digits.

Note: Must consist of 1 to 5 letters.
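The two format notes above can be checked mechanically. A minimal sketch with anchored regular expressions; the function names are hypothetical, and reading "letters" as ASCII Latin letters is an assumption.

```python
import re

# [0-9] rather than \d, because \d in Python also matches non-ASCII digits
# such as full-width ones.
FIVE_DIGITS = re.compile(r"[0-9]{5}")
# Assumption: "letters" means ASCII Latin letters, in either case.
ONE_TO_FIVE_LETTERS = re.compile(r"[A-Za-z]{1,5}")


def is_five_digits(value: str) -> bool:
    """True if the whole value is exactly five ASCII digits."""
    return FIVE_DIGITS.fullmatch(value) is not None


def is_one_to_five_letters(value: str) -> bool:
    """True if the whole value is one to five ASCII letters."""
    return ONE_TO_FIVE_LETTERS.fullmatch(value) is not None


for v in ["10000", "1000.0", "M", "ABCDEF"]:
    print(v, is_five_digits(v), is_one_to_five_letters(v))
```

`fullmatch` is used so that a valid prefix followed by extra characters is rejected.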

Example:
  (Q3595029) (Q54874870) (series ordinal (P1545) → 1)
    reference: stated in (P248) Guangyun (Q2189818)
  (Q3595029) (Q55753032) (series ordinal (P1545) → 2)
    reference: stated in (P248) Guangyun (Q2189818)

Note: Must be represented by two character items, referenced to a rime dictionary (Q2191807) such as Guangyun (Q2189818), Jiyun (Q35792) or Qieyun (Q1271885), or to a historical dictionary such as Longkan Shoujian (Q6148139).
Compiled values are available in dictionaries such as Hanyu Da Zidian (Q1442751) and Kangxi Dictionary (Q850590), but please cite the original source (the rime dictionary (Q2191807)) from which the values are quoted.
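The two-value pattern in the example above can be written in a simplified form of the Wikibase JSON data model: one statement per character, each with a series ordinal (P1545) qualifier (a string value) and a stated in (P248) reference. This is a sketch, not a tested upload payload. The property ID `P_PLACEHOLDER` is hypothetical because the property's ID is not given here. The Q-IDs are taken from the example, and the optional `numeric-id` key is omitted.

```python
import json


def item(qid: str) -> dict:
    """Datavalue for an item reference (simplified Wikibase JSON shape)."""
    return {"type": "wikibase-entityid", "value": {"entity-type": "item", "id": qid}}


def snak(prop: str, datavalue: dict) -> dict:
    """A value snak for property `prop`."""
    return {"snaktype": "value", "property": prop, "datavalue": datavalue}


def two_part_statements(prop: str, first: str, second: str, source: str) -> list:
    """Two statements for `prop`, ordered by series ordinal (P1545) 1 and 2,
    each referenced with stated in (P248) -> `source`."""
    statements = []
    for ordinal, qid in enumerate((first, second), start=1):
        statements.append({
            "type": "statement",
            "rank": "normal",
            "mainsnak": snak(prop, item(qid)),
            "qualifiers": {"P1545": [snak("P1545", {"type": "string", "value": str(ordinal)})]},
            "references": [{"snaks": {"P248": [snak("P248", item(source))]}}],
        })
    return statements


# Values from the example above; "P_PLACEHOLDER" stands in for the real property ID.
claims = two_part_statements("P_PLACEHOLDER", "Q54874870", "Q55753032", "Q2189818")
print(json.dumps(claims, ensure_ascii=False, indent=2))
```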

Note: Values are Hangul syllables (Q55809450) such as (Q55809436), of which 480 can be found in the Table of hanja for personal names (Q56145673), an extension of Basic Hanja for educational use (Q485267). Values from the Unihan database, which are based on KS X 1001 (Q489423) and KS X 1002 (Q12581371), are slightly dated and should be avoided.
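Since the values here are Hangul syllables, a bot could first check that a candidate value is a single precomposed syllable (U+AC00 to U+D7A3). The sketch below also splits a syllable into its jamo with the arithmetic decomposition defined in the Unicode Standard (conjoining jamo behavior) and compares the result with Python's NFD. The function names are hypothetical.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11,172 precomposed syllables


def is_hangul_syllable(value: str) -> bool:
    """True for exactly one precomposed Hangul syllable (U+AC00..U+D7A3)."""
    return len(value) == 1 and 0 <= ord(value) - S_BASE < S_COUNT


def to_jamo(syllable: str) -> str:
    """Split a syllable into leading consonant, vowel and optional final consonant."""
    if not is_hangul_syllable(syllable):
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    s_index = ord(syllable) - S_BASE
    l_index, rest = divmod(s_index, N_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + l_index) + chr(V_BASE + v_index)
    if t_index:  # index 0 means "no final consonant"
        jamo += chr(T_BASE + t_index)
    return jamo


# The arithmetic result agrees with Unicode canonical decomposition (NFD).
print(to_jamo("한") == unicodedata.normalize("NFD", "한"))
```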

Note: The value may be any valid Quốc Ngữ syllable, with the mandatory qualifier sinogram reading pattern (P5244) set to one or both of the following values: chữ Nôm reading (Q56066660) and Sino-Vietnamese reading (Q10805375).
The additional qualifier writing system (P282), with the value simplified Chinese characters (Q185614) or traditional Chinese characters (Q178528), may be used together with Sino-Vietnamese reading (Q10805375).
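The qualifier rules above can be expressed as a small validator. A minimal sketch, assuming each statement's qualifiers are given as a dictionary from qualifier property ID to a list of item IDs (a simplification of the Wikibase model). The function name is hypothetical, checking that the value itself is a valid Quốc Ngữ syllable is out of scope, and treating writing system (P282) as allowed only alongside a Sino-Vietnamese reading is one interpretation of "may be used together with".

```python
from typing import Dict, List

NOM_READING = "Q56066660"               # chữ Nôm reading
SINO_VIETNAMESE = "Q10805375"           # Sino-Vietnamese reading
SCRIPT_VALUES = {"Q185614", "Q178528"}  # simplified / traditional Chinese characters


def check_reading_qualifiers(qualifiers: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems with the qualifiers of one reading statement."""
    problems = []
    patterns = set(qualifiers.get("P5244", []))
    if not patterns:
        problems.append("missing mandatory qualifier P5244")
    elif not patterns <= {NOM_READING, SINO_VIETNAMESE}:
        problems.append("P5244 must be Q56066660 and/or Q10805375")
    scripts = set(qualifiers.get("P282", []))
    if scripts:
        # Assumption: P282 is only meaningful alongside a Sino-Vietnamese reading.
        if SINO_VIETNAMESE not in patterns:
            problems.append("P282 used without Sino-Vietnamese reading")
        if not scripts <= SCRIPT_VALUES:
            problems.append("P282 must be Q185614 or Q178528")
    return problems


print(check_reading_qualifiers({"P5244": [SINO_VIETNAMESE], "P282": ["Q178528"]}))
```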

Examples:


Note: The value of this property must refer to an actual glyph on GlyphWiki. For example, use u4e00-j instead of u4e00 or u4e00-g (u4e00 and u4e00-g are aliases of u4e00-j). Use ufa47 instead of u6f22-g (u6f22-g is an alias of ufa47). And use koseki-478690 instead of u96e8-t (u96e8-t is an alias of koseki-478690).
Note: Only if the character's glyph varies by country or region, use applies to jurisdiction (P1001) as a qualifier to indicate the region where the glyph is used.
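The naming pattern in the GlyphWiki note (a lowercase hexadecimal code point prefixed with `u`, optionally followed by `-` and a regional code) and the rule of storing the actual glyph rather than an alias can be sketched as below. The alias table holds only the four aliases quoted in the note; real alias data would have to come from GlyphWiki itself, and the helper names are hypothetical.

```python
from typing import Optional

# Only the aliases quoted in the note above (alias -> actual glyph).
KNOWN_ALIASES = {
    "u4e00": "u4e00-j",
    "u4e00-g": "u4e00-j",
    "u6f22-g": "ufa47",
    "u96e8-t": "koseki-478690",
}


def glyphwiki_name(char: str, region: Optional[str] = None) -> str:
    """Build a GlyphWiki-style name: 'u' + lowercase hex code point,
    optionally followed by '-' and a regional code."""
    name = f"u{ord(char):x}"
    return f"{name}-{region}" if region else name


def resolve(name: str) -> str:
    """Follow known aliases until reaching a name that is not an alias."""
    seen = set()
    while name in KNOWN_ALIASES and name not in seen:
        seen.add(name)  # guard against alias cycles
        name = KNOWN_ALIASES[name]
    return name


print(resolve(glyphwiki_name("雨", "t")))
```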

These are the regional codes used by GlyphWiki:

A link to the relevant lexeme. It is not yet settled which data should go only in the lexeme, only in the Q-item, or in both.


Unihan Database


The following table maps Unihan Database (Q63443408) fields to Wikidata properties.

Field Wikidata property
kAccountingNumeric numeric value (P1181)
kBigFive code (P3295) with qualifier encoding (P3294)=Big5 (Q858372)
kCangjie Cangjie input (P5519)
kCantonese Wikidata:Property proposal/Jyutping, or transliteration or transcription (P2440) with qualifier determination method or standard (P459)=Jyutping (Q649913)
kCCCII code (P3295) with qualifier encoding (P3294)=CCCII
kCheungBauer described by source (P1343)=The Representation of Cantonese with Chinese Characters (Q7259605) with qualifier code (P3295)
kCheungBauerIndex described by source (P1343)=The Representation of Cantonese with Chinese Characters (Q7259605) with qualifier section, verse, paragraph, or clause (P958)
kCihaiT described by source (P1343)=Cihai (1983) (Q55687733) with qualifier page(s) (P304), column (P3903) and section, verse, paragraph, or clause (P958)
kCNS1986 (subset of CNS1992)
kCNS1992 code (P3295) with qualifier encoding (P3294)=CNS 11643 (separate item for 1992 version?)
kCompatibilityVariant
kCowles described by source (P1343)=A Pocket Dictionary of Cantonese (1999) (Q55686021) with qualifier section, verse, paragraph, or clause (P958)
kDaeJaweon described by source (P1343)=Dae Jaweon (1988) (Q41662105) with qualifier page(s) (P304) and section, verse, paragraph, or clause (P958)
kDefinition
kEACC (subset of CCCII)
kFenn described by source (P1343)=The Five Thousand Dictionary (1979) (Q55686473) with qualifier code (P3295)
kFennIndex described by source (P1343)=Fenn's Chinese-English Pocket Dictionary (1942) (Q55686451) with qualifier page(s) (P304) and section, verse, paragraph, or clause (P958)
kFourCornerCode four-corner method (P5518)
kFrequency
kGB0 code (P3295) with qualifier encoding (P3294)=GB 2312 (Q1421973)
kGB1 code (P3295) with qualifier encoding (P3294)=GB 12345 (Q10847246)
kGB3 code (P3295) with qualifier encoding (P3294)=GB/T 7589–87 (Q55682422)
kGB5 code (P3295) with qualifier encoding (P3294)=GB/T 7590–87 (Q55682721)
kGB7
kGB8 code (P3295) with qualifier encoding (P3294)=GB/T 8565.2–88 (Q55683269)
kGradeLevel
kGSR described by source (P1343)=Grammata Serica Recensa (1957) (Q55686626) with qualifier code (P3295)
kHangul Hangul pronunciation (P5537)
kHanYu described by source (P1343)=Hanyu Da Zidian (1986-1990) (Q55686683) with qualifier volume (P478), page(s) (P304) and section, verse, paragraph, or clause (P958)
kHanyuPinlu
kHanyuPinyin Hanyu Pinyin transliteration (P1721)
kHDZRadBreak
kHKGlyph catalog (P972)=List of Graphemes of Commonly-used Chinese Characters (Q6152418) with qualifier start time (P580)
kHKSCS code (P3295) with qualifier encoding (P3294)=HKSCS (Q1627000)
kIBMJapan
kIICore
kIRG_GSource
kIRG_HSource
kIRG_JSource
kIRG_KPSource
kIRG_KSource
kIRG_MSource
kIRG_TSource
kIRG_USource
kIRG_VSource
kIRGDaeJaweon (same as kDaeJaweon)
kIRGDaiKanwaZiten (same as kMorohashi)
kIRGHanyuDaZidian (same as kHanYu)
kIRGKangXi (same as kKangXi)
kJa
kJapaneseKun
kJapaneseOn
kJinmeiyoKanji grade of kanji (P5277)=jinmeiyō kanji (Q1439720) with qualifier start time (P580)
kJis0 code (P3295) with qualifier encoding (P3294)=JIS X 0208 (Q905260)
kJis1 code (P3295) with qualifier encoding (P3294)=JIS X 0212 (Q841021)
kJIS0213 code (P3295) with qualifier encoding (P3294)=JIS X 0213 (Q6108269)
kJoyoKanji catalog (P972)=Table of Jōyō kanji (Q55502741) with qualifier start time (P580)
kKangXi described by source (P1343)=Kangxi Dictionary (1989) (Q55686777) with qualifier page(s) (P304) and section, verse, paragraph, or clause (P958)
kKarlgren described by source (P1343)=Analytic Dictionary of Chinese and Sino-Japanese (1974) (Q55686809) with qualifier section, verse, paragraph, or clause (P958)
kKorean (deprecated)
kKoreanEducationHanja catalog (P972)=Basic Hanja for educational use (Q485267) with qualifier start time (P580)
kKoreanName
kKPS0 code (P3295) with qualifier encoding (P3294)=KPS 9566 (Q712676)
kKPS1 code (P3295) with qualifier encoding (P3294)=KPS 10721 (Q55683587)
kKSC0 code (P3295) with qualifier encoding (P3294)=KS X 1001 (Q489423)
kKSC1 code (P3295) with qualifier encoding (P3294)=KS X 1002 (Q12581371)
kLau described by source (P1343)=A Practical Cantonese-English Dictionary (1977) (Q55686864) with qualifier section, verse, paragraph, or clause (P958)
kMainlandTelegraph code (P3295) with qualifier encoding (P3294)=PRC telegraph code (Q55683724)
kMandarin Hanyu Pinyin transliteration (P1721)
kMatthews described by source (P1343)=Mathews' Chinese-English Dictionary (1975) (Q55687056) with qualifier section, verse, paragraph, or clause (P958)
kMeyerWempe described by source (P1343)=Student's Cantonese-English Dictionary (1947) (Q55687107) with qualifier section, verse, paragraph, or clause (P958)
kMorohashi described by source (P1343)=Dai Kan-Wa Jiten (1986) (Q55687276) with qualifier section, verse, paragraph, or clause (P958)
kNelson described by source (P1343)=The Modern Reader's Japanese-English Character Dictionary (1974) (Q55687372) with qualifier section, verse, paragraph, or clause (P958)
kOtherNumeric numeric value (P1181)
kPhonetic described by source (P1343)=Ten Thousand Characters: An Analytic Dictionary (1980) (Q55687844) with qualifier section, verse, paragraph, or clause (P958)
kPrimaryNumeric numeric value (P1181)
kPseudoGB1
kRSAdobe_Japan1_6 code (P3295) with qualifier encoding (P3294)=Adobe-Japan1 (Q55688369); radical (P5280) and residual stroke count (P5281)
kRSJapanese radical (P5280) and residual stroke count (P5281)
kRSKangXi radical (P5280) and residual stroke count (P5281)
kRSKanWa radical (P5280) and residual stroke count (P5281)
kRSKorean radical (P5280) and residual stroke count (P5281)
kRSUnicode radical (P5280) and residual stroke count (P5281)
kSBGY described by source (P1343)=Song Ben Guang Yun (Q55687631) with qualifier page(s) (P304) and section, verse, paragraph, or clause (P958)
kSemanticVariant
kSimplifiedVariant CJKV variant character (P5475)
kSpecializedSemanticVariant
kTaiwanTelegraph code (P3295) with qualifier encoding (P3294)=Taiwanese telegraph code (Q55683771)
kTang
kTGH catalog (P972)=Table of General Standard Chinese Characters (Q14941454) with qualifier start time (P580) and series ordinal (P1545)
kTotalStrokes stroke count (P5205)
kTraditionalVariant CJKV variant character (P5475)
kVietnamese Vietnamese reading (P5625)
kXerox code (P3295) with qualifier encoding (P3294)=?
kXHC1983 described by source (P1343)=Xiandai Hanyu Cidian (2 ed) (1983) (Q55688435) with qualifier page(s) (P304) and section, verse, paragraph, or clause (P958); Hanyu Pinyin transliteration (P1721)
kZVariant CJKV variant character (P5475)
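Applying the table in bulk starts with parsing the Unihan data files, whose lines have the form `U+XXXX<TAB>field<TAB>value`, with `#` comment lines. The sketch below groups fields by character and collects raw values for a few rows of the table whose values can be copied as plain strings. Variant fields, which hold code points such as `U+4E07` that would still need to be turned into items, are left out. The sample lines are written for illustration, and the helper names are hypothetical.

```python
from collections import defaultdict

# A few rows of the table above (Unihan field -> Wikidata property).
FIELD_TO_PROPERTY = {
    "kTotalStrokes": "P5205",  # stroke count
    "kMandarin": "P1721",      # Hanyu Pinyin transliteration
    "kCangjie": "P5519",       # Cangjie input
    "kVietnamese": "P5625",    # Vietnamese reading
}


def parse_unihan(lines):
    """Group Unihan data lines ('U+XXXX<TAB>field<TAB>value') by character."""
    records = defaultdict(dict)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code_point, field, value = line.split("\t", 2)
        records[chr(int(code_point[2:], 16))][field] = value
    return records


def candidate_values(fields):
    """Property -> list of raw values; multi-valued fields are space-separated."""
    values = defaultdict(list)
    for field, value in fields.items():
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is not None:
            values[prop].extend(value.split())
    return dict(values)


sample = [
    "# sample lines written for illustration",
    "U+4E00\tkCangjie\tM",
    "U+4E00\tkMandarin\tyī",
    "U+4E00\tkTotalStrokes\t1",
]
print(candidate_values(parse_unihan(sample)["一"]))
```

When kTotalStrokes yields two values, the note on stroke count (P5205) applies: each value then needs an applies to jurisdiction (P1001) qualifier.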

Participants


Participants can be notified in discussions using the following template:
{{Ping project|CJKV character}}