Wikidata:WikiProject CJKV character

A large number of Chinese characters (Q8201) and related characters (CJKV character (Q53764732)) are in common use in several Asian countries, including China, Japan, Korea, and Vietnam. The purpose of this project is to maintain the items, properties, and ontology related to these CJKV characters.

Items under the scope of this project

Overview based on Unicode

As of Unicode 14.0 (2021), a total of 93,867 CJKV characters have been encoded:

CJK Unified Ideographs (4E00 to 9FFF) - 20,992 items
CJK Compatibility Ideographs (F900 to FAD9) - 472 items, of which 12 are unique and not normalizable (FA0E, FA0F, FA11, FA13, FA14, FA1F, FA21, FA23, FA24, FA27, FA28, FA29)
CJK Compatibility Ideographs Supplement (2F800 to 2FA1D) - 542 items
CJK Unified Ideographs Extension A (3400 to 4DBF) - 6,592 items
CJK Unified Ideographs Extension B (20000 to 2A6DF) - 42,720 items
CJK Unified Ideographs Extension C (2A700 to 2B73F) - 4,153 items
CJK Unified Ideographs Extension D (2B740 to 2B81D) - 222 items
CJK Unified Ideographs Extension E (2B820 to 2CEA1) - 5,762 items
CJK Unified Ideographs Extension F (2CEB0 to 2EBE0) - 7,473 items
CJK Unified Ideographs Extension G (30000 to 3134A) - 4,939 items
Kangxi Radicals (2F00 to 2FD5) - 214 items
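
The figures above can be cross-checked with a short script. This is only a sketch: the table is copied by hand from the list above, and the names BLOCKS, block_of, stated_total and range_size_mismatches are hypothetical helpers, not part of any Wikidata or Unicode tool. It suggests that the stated total of 93,867 corresponds to the ideograph blocks without the Kangxi Radicals block.

```python
# Block name -> (first code point, last code point, item count), all as listed on this page.
BLOCKS = {
    "CJK Unified Ideographs": (0x4E00, 0x9FFF, 20992),
    "CJK Compatibility Ideographs": (0xF900, 0xFAD9, 472),
    "CJK Compatibility Ideographs Supplement": (0x2F800, 0x2FA1D, 542),
    "CJK Unified Ideographs Extension A": (0x3400, 0x4DBF, 6592),
    "CJK Unified Ideographs Extension B": (0x20000, 0x2A6DF, 42720),
    "CJK Unified Ideographs Extension C": (0x2A700, 0x2B73F, 4153),
    "CJK Unified Ideographs Extension D": (0x2B740, 0x2B81D, 222),
    "CJK Unified Ideographs Extension E": (0x2B820, 0x2CEA1, 5762),
    "CJK Unified Ideographs Extension F": (0x2CEB0, 0x2EBE0, 7473),
    "CJK Unified Ideographs Extension G": (0x30000, 0x3134A, 4939),
    "Kangxi Radicals": (0x2F00, 0x2FD5, 214),
}


def block_of(char):
    """Return the name of the listed block containing char, or None."""
    code_point = ord(char)
    for name, (first, last, _count) in BLOCKS.items():
        if first <= code_point <= last:
            return name
    return None


def stated_total(include_radicals=False):
    """Sum the item counts stated on this page."""
    return sum(
        count
        for name, (_first, _last, count) in BLOCKS.items()
        if include_radicals or name != "Kangxi Radicals"
    )


def range_size_mismatches():
    """Blocks whose listed range is larger than the stated item count."""
    return {
        name: (last - first + 1, count)
        for name, (first, last, count) in BLOCKS.items()
        if last - first + 1 != count
    }


print(stated_total())            # 93867
print(block_of("漢"))            # CJK Unified Ideographs
print(range_size_mismatches())   # ranges that contain unassigned code points
```

Two ranges come out larger than their stated counts (CJK Compatibility Ideographs and Extension C), which is consistent with those ranges containing unassigned code points.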

Note: Items from CJK Compatibility Ideographs (Q2493848) and CJK Compatibility Ideographs Supplement (Q2493862) that are not unique will be labeled with their normalized character followed by the code point in brackets. Additional properties to be used (only for compatibility ideographs): different from (P1889) and normalized Unicode character (P5591).

Note that CJKV characters not (yet) present in Unicode such as 𱁬 (Q7676480) are also under the scope of this project. See the relevant list in the English Wiktionary.

Priority for frequently used characters

The following characters will be prioritized (all regions have the same priority):

Japan: 1006 items in Kyōiku Kanji Done
Japan: 2136 items in Jōyō Kanji (2010) Done (1006 Kyōiku Kanji + 1135 Jōyō Kanji - 5 deleted characters)
Japan: 212 items in Jinmeiyō Kanji (2015)
Japan: 6355 items in JIS X 0208 (Q905260)
Mainland China: 8105 items in Table of General Standard Chinese Characters (Q14941454) Done
Taiwan: 4808 items in Chart of Standard Forms of Common National Characters (Q6498184)
Taiwan: 6341 items in Chart of Standard Forms of Less-Than-Common National Characters (Q11273197)
Taiwan: 18388 items in Chart of Standard Forms of Rarely-Used National Characters (Q11608510)
Hong Kong: 4762 items in List of Graphemes of Commonly-used Chinese Characters (Q6152418)
Hong Kong: 4602 items in HKSCS (Q1627000)
South Korea: 1800 items in Basic Hanja for educational use (Q485267) [1]
South Korea: Approximately 8000 items in Table of hanja for personal names (Q56145673) [2] (Text version: [3])
Vietnam: 17565 items listed by Vietnamese Nôm Preservation Foundation (Q17490564)
Historical: 9831 items in Shuowen Jiezi (Q1072348)
Historical: 49030 items in Kangxi Dictionary (Q850590)

Note that the lists above overlap: the same item may appear in more than one list.

Important Properties

instance of (P31)

Note: The values must be subclasses of CJKV character (Q53764732), which is a subclass of character (Q3241972). The source fields referred to are groupings of sources from the Ideographic Research Group (Q5988470).

sinogram (Q53764738)

compulsory value to be included in every item

Unicode character (Q29654788)

almost always compulsory, except in rare instances

sinogram (Q53764738)

only for any CJKV character (Q53764732) from a Chinese (Q7850) source, confirmed by the presence of Unihan Database (Q63443408) fields kIRG_GSource, kIRG_HSource, kIRG_MSource, kIRG_SSource and/or kIRG_TSource, or in some instances kIICore, kIRG_USource or kIRG_UKSource (consult the documentation in these cases)

standard hanzi in mainland China (Q15904197)

only for CJK unified ideograph (Q796156) used in mainland China (Q19188)

kanji character (Q53764782)

only for any CJKV character (Q53764732) from a Japanese (Q5287) source, confirmed by the presence of Unihan Database (Q63443408) fields kIRG_JSource and/or kIRG_SSource, or in some instances kIICore, kIRG_USource or kIRG_UKSource (consult the documentation in these cases)

kokuji (Q1185862)

subclass of kanji character (Q53764782) for any non-Chinese-origin CJKV ideograph (Q11420697) that was created in Japan (Q17)

hanja character (Q55712979)

only for any CJKV character (Q53764732) from a Korean (Q9176) source, confirmed by the presence of Unihan Database (Q63443408) fields kIRG_KSource and/or kIRG_KPSource, or in some instances kIICore, kIRG_USource or kIRG_UKSource (consult the documentation in these cases)

gukja (Q1554195)

subclass of hanja character (Q55712979) for any non-Chinese-origin CJKV ideograph (Q11420697) that was created in Korea (Q18097)

Nôm character (Q15100640)

only for any CJKV character (Q53764732) from a Vietnamese (Q9199) source, confirmed by the presence of Unihan Database (Q63443408) field kIRG_VSource, or in some instances kIICore, kIRG_USource or kIRG_UKSource (consult the documentation in these cases)
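
The source-dependent rules above could be sketched as a lookup from Unihan fields to candidate instance of (P31) values. This is a minimal illustration, not a bot implementation: the field lists are copied from the descriptions above, the function name suggest_instance_of is hypothetical, and the fields described as applying only "in some instances" are merely flagged for manual review rather than interpreted.

```python
# Candidate P31 value -> Unihan fields whose presence confirms it (copied from this page).
SOURCE_FIELDS = {
    "Q53764738": {"kIRG_GSource", "kIRG_HSource", "kIRG_MSource",
                  "kIRG_SSource", "kIRG_TSource"},      # Chinese source
    "Q53764782": {"kIRG_JSource", "kIRG_SSource"},      # kanji character
    "Q55712979": {"kIRG_KSource", "kIRG_KPSource"},     # hanja character
    "Q15100640": {"kIRG_VSource"},                      # Nôm character
}

# Fields that only count "in some instances"; a human should check the documentation.
AMBIGUOUS_FIELDS = {"kIICore", "kIRG_USource", "kIRG_UKSource"}


def suggest_instance_of(fields):
    """Return (sorted candidate QIDs, needs_manual_review) for a set of Unihan field names."""
    present = set(fields)
    suggested = sorted(
        qid for qid, confirming in SOURCE_FIELDS.items() if present & confirming
    )
    needs_review = bool(present & AMBIGUOUS_FIELDS)
    return suggested, needs_review


print(suggest_instance_of({"kIRG_GSource", "kIRG_JSource"}))
# (['Q53764738', 'Q53764782'], False)
print(suggest_instance_of({"kIRG_VSource", "kIICore"}))
# (['Q15100640'], True)
```

The sketch covers only the source-dependent values; the compulsory values and subclasses such as kokuji (Q1185862) or gukja (Q1554195) cannot be derived from these fields alone.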

Other non-essential values:

part of (P361)

Mostly obsolete encodings for specific character sets:

Chinese

Note that no reference to GB 18030 (Q1484877) is necessary as it re-encodes Unicode (Q8819).

Japanese

Korean

Vietnamese

NSCII (Q109349916)

image (P18)

stroke order (P6655)

catalog code (P528)

writing system (P282)

kanji (Q82772)
- shinjitai (Q1055887)
- kyūjitai (Q1147857)
traditional Chinese characters (Q178528)
simplified Chinese characters (Q185614)
legacy Chinese character (Q10885554) - characters used in modern Chinese (Q7850) that are neither traditional nor simplified.
Hanja (Q485619)
chữ Nôm (Q875344) - for characters that use chữ Nôm reading (Q56066660)
chữ Hán (Q1378119) - for characters that use Sino-Vietnamese reading (Q10805375)
Sawndip (Q923677)

Note: the following are instances of standardized writing system (Q55692290).

standard form of national characters (Q906019) - used in Taiwan (Q865). May include simplified Chinese characters (Q185614).
standard hanzi character (Q8044489) - used in mainland China (Q19188). May include traditional Chinese characters (Q178528).

Commons category (P373)

described by source (P1343)

Note: Usually, the values of this property are authoritative dictionaries or encyclopedias. See the #Unihan Database section for examples of reputable sources according to Unicode.

Unicode character (P487)

Unicode code point (P4213)

Unicode block (P5522)

has part(s) of the class (P2670)

CJK stroke (Q1689024)
- quantity (P1114)
- image (P18): For stroke order images, e.g. 雨-bw.png

catalog (P972)

CJKV variant character (P5475)

stroke count (P5205)

Note: Use the additional qualifier applies to jurisdiction (P1001) if there is more than one value for stroke count (P5205).

grade of kanji (P5277)

Note: All items having this property must be instances of kanji character (Q53764782). (Check for the violation)

radical (P5280)

values are instances of Chinese character radical (Q849778) (Query)

Note: The qualifier residual stroke count (P5281) must be used with this property.
Note: Use the additional qualifier applies to jurisdiction (P1001) if there is more than one value for residual stroke count (P5281).

four-corner method (P5518)

Note: Must be 5 numeric characters.

Cangjie input (P5519)

Note: 1 to 5 alphabetic letters.
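
The two format notes above (four-corner method and Cangjie input) can be expressed as simple checks. This is a sketch that follows the wording on this page only; the actual property constraints on Wikidata may differ, the function names are hypothetical, and accepting both upper- and lower-case letters is an assumption.

```python
import re

# "Must be 5 numeric characters" (ASCII digits only, as assumed here).
FOUR_CORNER_RE = re.compile(r"[0-9]{5}")

# "1 to 5 alphabetic letters" (case handling is an assumption).
CANGJIE_RE = re.compile(r"[A-Za-z]{1,5}")


def is_valid_four_corner(value):
    """True if value consists of exactly five ASCII digits."""
    return FOUR_CORNER_RE.fullmatch(value) is not None


def is_valid_cangjie(value):
    """True if value consists of one to five ASCII letters."""
    return CANGJIE_RE.fullmatch(value) is not None


print(is_valid_four_corner("10000"))  # True
print(is_valid_four_corner("1000"))   # False
print(is_valid_cangjie("M"))          # True
print(is_valid_cangjie("ABCDEF"))     # False
```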

fanqie (P5523)

Example: 雪 (Q3595029) → 相 (Q54874870) (series ordinal (P1545) → 1)

reference: stated in (P248) → Guangyun (Q2189818)

雪 (Q3595029) → 絕 (Q55753032) (series ordinal (P1545) → 2)

reference: stated in (P248) → Guangyun (Q2189818)

Note: Must be represented by two character items along with references from rime dictionary (Q2191807) such as Guangyun (Q2189818), Jiyun (Q35792), Qieyun (Q1271885) or historical dictionaries such as Longkan Shoujian (Q6148139).
Compiled values are available from dictionaries such as Hanyu Da Zidian (Q1442751) and Kangxi Dictionary (Q850590). Please cite the original reference (rime dictionary (Q2191807)) from which the values are quoted.
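
The shape of a fanqie entry, two character items ordered by series ordinal (P1545) and each referenced with stated in (P248), can be modelled as plain data. This is an illustrative sketch only: FanqieClaim, fanqie_claims and is_complete are hypothetical names, and the structure is not the Wikidata JSON format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FanqieClaim:
    subject: str          # item of the character being described
    value: str            # item of one of the two fanqie characters
    series_ordinal: int   # qualifier series ordinal (P1545): 1 or 2
    stated_in: str        # reference stated in (P248)


def fanqie_claims(subject, first_char, second_char, stated_in):
    """Build the two fanqie (P5523) claims for one character."""
    return [
        FanqieClaim(subject, first_char, 1, stated_in),
        FanqieClaim(subject, second_char, 2, stated_in),
    ]


def is_complete(claims):
    """True if there are exactly two claims, ordered 1 and 2, each with a reference."""
    ordinals = sorted(claim.series_ordinal for claim in claims)
    return ordinals == [1, 2] and all(claim.stated_in for claim in claims)


# 雪 (Q3595029) → 相 (Q54874870), 絕 (Q55753032), stated in Guangyun (Q2189818)
claims = fanqie_claims("Q3595029", "Q54874870", "Q55753032", "Q2189818")
print(is_complete(claims))      # True
print(is_complete(claims[:1]))  # False
```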

Hangul pronunciation (P5537)

Note: Values are Hangul syllables (Q55809450) such as 우 (Q55809436); 480 of them can be found in Table of hanja for personal names (Q56145673), which is an extension of Basic Hanja for educational use (Q485267). Values from the Unihan database, which is based on KS X 1001 (Q489423) and KS X 1002 (Q12581371), are slightly dated and should be avoided.

Vietnamese reading (P5625)

Note: Any valid Quốc Ngữ syllable with mandatory qualifier: sinogram reading pattern (P5244) containing either one or both of the following values: chữ Nôm reading (Q56066660) and Sino-Vietnamese reading (Q10805375).
Additional qualifier of writing system (P282) with values of simplified Chinese characters (Q185614) or traditional Chinese characters (Q178528) may be used along with Sino-Vietnamese reading (Q10805375).

A Sino-Vietnamese reading (Q10805375) is a literary Chinese reading derived from phiên thiết in Middle Chinese, while a chữ Nôm reading (Q56066660) is a vernacular reading used in the pronunciation of chữ Nôm (Q875344). (See wikt:鈙#Vietnamese for an example.)
Note that readings obtained from ① the Unihan database, ② Vietnamese Nôm Preservation Foundation (Q17490564) or ③ Vietnamese Wiktionary (Q33109114) (based on Template:R:WinVNKey:Lê Sơn Thanh (Q55889066)) may contain some errors.
If possible, obtain readings from printed references such as Tự Điển Chữ Nôm Dẫn Giải (Q56070779) and Giúp đọc Nôm và Hán Việt (Q56070751).

GlyphWiki ID (P5467)

Examples:

一 (Q4025820) → u4e00-j
~~applies to jurisdiction (P1001) → Japan (Q17)~~ (The glyph does not vary by region, so applies to jurisdiction (P1001) is not needed.)
漢 (Q54872914) → u6f22-j
applies to jurisdiction (P1001) → Japan (Q17)
漢 (Q54872914) → ufa47
applies to jurisdiction (P1001) → mainland China (Q19188)

applies to jurisdiction (P1001) → South Korea (Q884)
漢 (Q54872914) → u6f22-t
applies to jurisdiction (P1001) → Taiwan (Q865)

applies to jurisdiction (P1001) → Hong Kong (Q8646)
漢 (Q54872914) → u6f22-v
applies to jurisdiction (P1001) → Vietnam (Q881)
雨 (Q3595028) → u96e8
applies to jurisdiction (P1001) → mainland China (Q19188)

applies to jurisdiction (P1001) → Japan (Q17)

applies to jurisdiction (P1001) → South Korea (Q884)
雨 (Q3595028) → koseki-478690
applies to jurisdiction (P1001) → Taiwan (Q865)

applies to jurisdiction (P1001) → Hong Kong (Q8646)

applies to jurisdiction (P1001) → Vietnam (Q881)

Note: The value of this property must refer to an actual glyph on GlyphWiki. For example, use u4e00-j instead of u4e00 or u4e00-g (u4e00 and u4e00-g are aliases of u4e00-j). Use ufa47 instead of u6f22-g (u6f22-g is an alias of ufa47). And use koseki-478690 instead of u96e8-t (u96e8-t is an alias of koseki-478690).
Note: Use applies to jurisdiction (P1001) as a qualifier if and only if the glyph of the character varies by country or region; the qualifier indicates the region where the glyph is used.
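
The naming pattern and alias rule in the notes above can be sketched as follows. The alias table contains only the cases mentioned on this page; it is not a complete list, and whether a given name is an alias still has to be checked on GlyphWiki itself. The helper names glyph_name and resolve_alias are hypothetical.

```python
# Alias -> actual glyph name, limited to the examples given on this page.
KNOWN_ALIASES = {
    "u4e00": "u4e00-j",
    "u4e00-g": "u4e00-j",
    "u6f22-g": "ufa47",
    "u96e8-t": "koseki-478690",
}


def glyph_name(char, suffix=None):
    """Build a name of the form u<hex code point>[-<suffix>] for a character."""
    name = f"u{ord(char):04x}"
    return f"{name}-{suffix}" if suffix else name


def resolve_alias(name):
    """Replace a known alias with the actual glyph name; leave other names unchanged."""
    return KNOWN_ALIASES.get(name, name)


print(resolve_alias(glyph_name("一")))        # u4e00-j
print(resolve_alias(glyph_name("漢", "g")))   # ufa47
print(resolve_alias(glyph_name("雨", "t")))   # koseki-478690
print(resolve_alias(glyph_name("漢", "j")))   # u6f22-j
```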

These are the regional codes used by GlyphWiki:

ideographic description sequence (P5753)

instances of sinogram (Q53764738)
instances of ideographic description character (Q55589899)
- ⿰ (Q55589918), ⿱ (Q55589919), ⿲ (Q55589920), ⿳ (Q55589921), ⿴ (Q55589923), ⿵ (Q55589924), ⿶ (Q55589925), ⿷ (Q55589926), ⿸ (Q55589927), ⿹ (Q55589928), ⿺ (Q55589929), ⿻ (Q55589930), ⿼ (Q122584292), ⿽ (Q122584296), ⿾ (Q122584298), ⿿ (Q122584299)
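
An ideographic description sequence mixes the two kinds of values listed above. As a rough illustration, the sketch below separates description characters from components, assuming the sixteen description characters listed here occupy U+2FF0 to U+2FFF. The function name split_ids is hypothetical, and the sketch does not validate the structure of the sequence.

```python
# The sixteen ideographic description characters listed above (U+2FF0 to U+2FFF).
IDC = frozenset(chr(code_point) for code_point in range(0x2FF0, 0x3000))


def split_ids(ids):
    """Label each character of an IDS as an 'operator' or a 'component'."""
    return [("operator" if char in IDC else "component", char) for char in ids]


# 明 can be described as ⿰日月 (left-to-right composition of 日 and 月).
print(split_ids("⿰日月"))
# [('operator', '⿰'), ('component', '日'), ('component', '月')]
```

Each element labelled "operator" would be stored as an instance of ideographic description character (Q55589899), and each "component" as a character item.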

subject lexeme (P6254)

A link to the relevant lexeme. Note that it is not yet clear what kind of data should go only in the lexeme item or the Q-item, or in both.

Unihan Database

This is a mapping of Unihan Database (Q63443408) and Wikidata properties.

Field	Wikidata property
kAccountingNumeric	numeric value (P1181)
kBigFive	code (P3295) with qualifier encoding (P3294)=Big5 (Q858372)
kCangjie	Cangjie input (P5519)
kCantonese	Wikidata:Property proposal/Jyutping, or transliteration or transcription (P2440) with qualifier determination method or standard (P459)=Jyutping (Q649913)
kCCCII	code (P3295) with qualifier encoding (P3294)=Big5 (Q858372)
kCheungBauer	described by source (P1343)=The Representation of Cantonese with Chinese Characters (Q7259605) with qualifier code (P3295)
kCheungBauerIndex	described by source (P1343)=The Representation of Cantonese with Chinese Characters (Q7259605) with qualifier section, verse, paragraph, or clause (P958)
kCihaiT	described by source (P1343)=Cihai (1983) (Q55687733) with qualifier page(s) (P304), column (P3903) and section, verse, paragraph, or clause (P958)
kCNS1986	(subset of CNS1992)
kCNS1992	code (P3295) with qualifier encoding (P3294)=Dessert (Q73121) (separate item for 1992 version?)
kCompatibilityVariant
kCowles	described by source (P1343)=A Pocket Dictionary of Cantonese (1999) (Q55686021) with qualifier section, verse, paragraph, or clause (P958)
kDaeJaweon	described by source (P1343)=Dae Jawaeon (1988) (Q41662105) with qualifier page(s) (P304) and section, verse, paragraph, or clause (P958)
kDefinition
kEACC	(subset of CCCII)
kFenn	described by source (P1343)=The Five Thousand Dictionary (1979) (Q55686473) with qualifier code (P3295)
kFennIndex	described by source (P1343)=Fenn's Chinese-English Pocket Dictionary (1942) (Q55686451) with qualifier page(s) (P304) and section, verse, paragraph, or clause (P958)
kFourCornerCode	four-corner method (P5518)
kFrequency
kGB0	code (P3295) with qualifier encoding (P3294)=GB 2312 (Q1421973)
kGB1	code (P3295) with qualifier encoding (P3294)=GB 12345 (Q10847246)
kGB3	code (P3295) with qualifier encoding (P3294)=GB/T 7589–87 (Q55682422)
kGB5	code (P3295) with qualifier encoding (P3294)=GB/T 7590–87 (Q55682721)
kGB7
kGB8	code (P3295) with qualifier encoding (P3294)=GB/T 8565.2–88 (Q55683269)
kGradeLevel
kGSR	described by source (P1343)=Grammata Serica Recensa (1957) (Q55686626) with qualifier code (P3295)
kHangul	Hangul pronunciation (P5537)
kHanYu	described by source (P1343)=Hanyu Da Zidian (1986-1990) (Q55686683) with qualifier volume (P478), page(s) (P304) and section, verse, paragraph, or clause (P958)
kHanyuPinlu
kHanyuPinyin	Hanyu Pinyin transliteration (P1721)
kHDZRadBreak
kHKGlyph	catalog (P972)=List of Graphemes of Commonly-used Chinese Characters (Q6152418) with qualifier start time (P580)
kHKSCS	code (P3295) with qualifier encoding (P3294)=HKSCS (Q1627000)
kIBMJapan
kIICore
kIRG_GSource
kIRG_HSource
kIRG_JSource
kIRG_KPSource
kIRG_KSource
kIRG_MSource
kIRG_TSource
kIRG_USource
kIRG_VSource
kIRGDaeJaweon	(same as kDaeJaweon)
kIRGDaiKanwaZiten	(same as kMorohashi)
kIRGHanyuDaZidian	(same as kHanYu)
kIRGKangXi	(same as kKangXi)
kJa
kJapaneseKun
kJapaneseOn
kJinmeiyoKanji	grade of kanji (P5277)=jinmeiyō kanji (Q1439720) with qualifier start time (P580)
kJis0	code (P3295) with qualifier encoding (P3294)=JIS X 0208 (Q905260)
kJis1	code (P3295) with qualifier encoding (P3294)=JIS X 0212 (Q841021)
kJIS0213	code (P3295) with qualifier encoding (P3294)=JIS X 0213 (Q6108269)
kJoyoKanji	catalog (P972)=Table of Jōyō kanji (Q55502741) with qualifier start time (P580)
kKangXi	described by source (P1343)=Kangxi Dictionary (1989) (Q55686777) with qualifier page(s) (P304) and section, verse, paragraph, or clause (P958)
kKarlgren	described by source (P1343)=Analytic Dictionary of Chinese and Sino-Japanese (1974) (Q55686809) with qualifier section, verse, paragraph, or clause (P958)
kKorean	(deprecated)
kKoreanEducationHanja	catalog (P972)=Basic Hanja for educational use (Q485267) with qualifier start time (P580)
kKoreanName
kKPS0	code (P3295) with qualifier encoding (P3294)=KPS 9566 (Q712676)
kKPS1	code (P3295) with qualifier encoding (P3294)=KPS 10721 (Q55683587)
kKSC0	code (P3295) with qualifier encoding (P3294)=KS X 1001 (Q489423)
kKSC1	code (P3295) with qualifier encoding (P3294)=KS X 1002 (Q12581371)
kLau	described by source (P1343)=A Practical Cantonese-English Dictionary (1977) (Q55686864) with qualifier section, verse, paragraph, or clause (P958)
kMainlandTelegraph	code (P3295) with qualifier encoding (P3294)=PRC telegraph code (Q55683724)
kMandarin	Hanyu Pinyin transliteration (P1721)
kMatthews	described by source (P1343)=Mathews' Chinese-English Dictionary (1975) (Q55687056) with qualifier section, verse, paragraph, or clause (P958)
kMeyerWempe	described by source (P1343)=Student's Cantonese-English Dictionary (1947) (Q55687107) with qualifier section, verse, paragraph, or clause (P958)
kMorohashi	described by source (P1343)=Dai Kan-Wa Jiten (1986) (Q55687276) with qualifier section, verse, paragraph, or clause (P958)
kNelson	described by source (P1343)=The Modern Reader's Japanese-English Character Dictionary (1974) (Q55687372) with qualifier section, verse, paragraph, or clause (P958)
kOtherNumeric	numeric value (P1181)
kPhonetic	described by source (P1343)=Ten Thousand Characters: An Analytic Dictionary (1980) (Q55687844) with qualifier section, verse, paragraph, or clause (P958)
kPrimaryNumeric	numeric value (P1181)
kPseudoGB1
kRSAdobe_Japan1_6	code (P3295) with qualifier encoding (P3294)=Adobe-Japan1 (Q55688369); radical (P5280) and residual stroke count (P5281)
kRSJapanese	radical (P5280) and residual stroke count (P5281)
kRSKangXi	radical (P5280) and residual stroke count (P5281)
kRSKanWa	radical (P5280) and residual stroke count (P5281)
kRSKorean	radical (P5280) and residual stroke count (P5281)
kRSUnicode	radical (P5280) and residual stroke count (P5281)
kSBGY	described by source (P1343)=Song Ben Guang Yun (Q55687631) with qualifier page(s) (P304) and section, verse, paragraph, or clause (P958)
kSemanticVariant
kSimplifiedVariant	CJKV variant character (P5475)
kSpecializedSemanticVariant
kTaiwanTelegraph	code (P3295) with qualifier encoding (P3294)=Taiwanese telegraph code (Q55683771)
kTang
kTGH	catalog (P972)=Table of General Standard Chinese Characters (Q14941454) with qualifier start time (P580) and series ordinal (P1545)
kTotalStrokes	stroke count (P5205)
kTraditionalVariant	CJKV variant character (P5475)
kVietnamese	Vietnamese reading (P5625)
kXerox	code (P3295) with qualifier encoding (P3294)=?
kXHC1983	described by source (P1343)=Xiandai Hanyu Cidian (2 ed) (1983) (Q55688435) with qualifier page(s) (P304) and section, verse, paragraph, or clause (P958); Hanyu Pinyin transliteration (P1721)
kZVariant	CJKV variant character (P5475)
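
For the fields in the table that map directly to a single property, the mapping could be applied to Unihan data roughly as below. This is a sketch under stated assumptions: it assumes the usual tab-separated Unihan line layout (code point, field, value) with comment lines starting with "#", the dictionary covers only the one-to-one rows of the table, and the names parse_unihan_line and map_line are hypothetical. Raw values may need further normalization before they are usable as statements.

```python
# Unihan field -> Wikidata property, for the direct one-to-one rows of the table above.
FIELD_TO_PROPERTY = {
    "kCangjie": "P5519",
    "kFourCornerCode": "P5518",
    "kHangul": "P5537",
    "kHanyuPinyin": "P1721",
    "kMandarin": "P1721",
    "kSimplifiedVariant": "P5475",
    "kTraditionalVariant": "P5475",
    "kZVariant": "P5475",
    "kTotalStrokes": "P5205",
    "kVietnamese": "P5625",
}


def parse_unihan_line(line):
    """Parse 'U+XXXX<TAB>field<TAB>value' into (character, field, value), or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    code_point, field, value = line.split("\t", 2)
    return chr(int(code_point[2:], 16)), field, value


def map_line(line):
    """Return (character, property ID, raw value) if the field has a direct mapping."""
    parsed = parse_unihan_line(line)
    if parsed is None:
        return None
    char, field, value = parsed
    prop = FIELD_TO_PROPERTY.get(field)
    if prop is None:
        return None
    return char, prop, value


print(map_line("U+4E00\tkTotalStrokes\t1"))  # ('一', 'P5205', '1')
print(map_line("U+4E00\tkDefinition\tone"))  # None (no mapping in the table)
```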

Participants

The participants listed below can be notified using the following template in discussions:
{{Ping project|CJKV character}}