A Chinese whole character, or whole Chinese character (Pinyin: hànzì zhěngzì; Traditional Chinese: 漢字整字; Simplified Chinese: 汉字整字), is a complete Chinese character. It is the final level in the stroke-component-character hierarchy of Chinese character composition.[1]
According to their structures, Chinese characters can be divided into undecomposable characters (独体字) and decomposable characters (合体字). An undecomposable character is formed by one primitive component and is also called a single-component character. A decomposable character can be decomposed into two or more components and is also called a multi-component character. [2][3]
Undecomposable characters
Definition
An undecomposable character is formed directly from strokes and cannot be decomposed into smaller components, though it may itself be a component of a decomposable character. [2] For example, 人 is an undecomposable character formed by the strokes ㇓ and ㇏, and it is used to form the character 丛.
Lists of undecomposable characters
The following lists of undecomposable characters were compiled by different authors.
The "Chinese Character Information Dictionary" (漢字信息字典)[4] contains a total of 7,785 standardized characters used in Mainland China. By static statistics (counting each distinct character once), 323 of them are undecomposable, accounting for 4.149%. By dynamic statistics (counting every occurrence in the corpus), undecomposable characters account for 25.910% of the corpus, because many single-component characters are frequently used. The list of undecomposable characters is as follows (in Pinyin order):
Su [5] segmented 7,000 commonly-used characters and obtained 233 undecomposable characters, accounting for 3.4%. The list of undecomposable characters is as follows (in stroke-based order):
In the "Specification of the Undecomposable Characters Commonly Used in the Modern Chinese" (现代常用独体字规范), 256 modern commonly used undecomposable characters have been identified within the scope of modern Chinese characters, forming the "List of Modern Commonly Used Undecomposable Characters". [6] The list of undecomposable characters is as follows (in stroke-based order):
The "List of Commonly Used Character Components" in the "Specification of Common Modern Chinese Character Components and Component Names (现代常用字部件及部件名称规范)" includes a total of 311 commonly used character components, i.e., undecomposable characters. [7] The list of undecomposable characters is as follows (in Pinyin order):
Based on the above experimental results, undecomposable characters are estimated to account for approximately 4% of modern Chinese characters. [8]
Since each author has a slightly different understanding of components and often uses a different character set, the numbers of undecomposable characters obtained differ. Generally speaking, however, the results are quite similar, ranging from over 200 to over 300.
Decomposable characters
Definition
A decomposable character can be decomposed into more than one component. For example, "字" (character) is formed by two components (宀+子).
There are two frequently-used modes of component combination in the study of Chinese character structures: first-level component combination and primitive component combination.[9]
First-level component combination
The first-level component combination mode or pattern is what people often call the structures of Chinese characters. According to this analysis, the structures of decomposable characters can be divided into 4 major categories and 13 subcategories:[10][11]
Left to right structure
Left to right (⿰, 2FF0 [b]), for example: 部, 件, 結 and 構.
Left to middle and right (⿲, 2FF2): 衡, 班 and 辯.
Above to below structure
Above to below (⿱, 2FF1): 要, 思 and 想.
Above to middle and below (⿳, 2FF3): 鼻, 曼 and 率.
Surrounding structure
Full surround:
Surrounded from four sides (⿴, 2FF4): 圍, 國 and 囪.
Surrounded from three sides:
Surround from above (⿵, 2FF5): 問, 同 and 風.
Surround from below (⿶, 2FF6): 凶, 画 and 函.
Surround from left (⿷, 2FF7): 匡, 匠 and 匣.
Surrounded from two sides:
Surround from upper left (⿸, 2FF8): 廣, 居 and 病.
Surround from upper right (⿹, 2FF9): 句, 可 and 氧.
Surround from lower left (⿺, 2FFA): 這, 建 and 題.
Surround from lower right (N/A): 斗 and 头.
Overlaid structure
Overlaid (⿻, 2FFB): 巫, 爽 and 承.
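The structure symbols above are the Unicode ideographic description characters (IDCs) U+2FF0 to U+2FFB. As a minimal sketch, the following Python snippet lists them with their code points and Unicode names, and counts the leaf components of a description sequence written in prefix notation. The function name `count_components` and the two sample sequences are illustrative assumptions, not taken from the cited sources. The sequence for 字 follows the 宀+子 example given earlier, and the one for 部 assumes the decomposition 立, 口 and 阝.

```python
import unicodedata

# IDCs U+2FF0..U+2FFB. The two "middle" operators take three operands,
# all the others take two.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # left to middle and right
IDC_ARITY["\u2FF3"] = 3  # above to middle and below


def count_components(ids):
    """Count leaf components in a prefix-notation description sequence."""

    def parse(pos):
        # Returns (number of leaves, index just after the parsed subexpression).
        if pos >= len(ids):
            raise ValueError("sequence ends before all operands are given")
        ch = ids[pos]
        pos += 1
        if ch not in IDC_ARITY:
            return 1, pos  # an ordinary character is a single leaf
        leaves = 0
        for _ in range(IDC_ARITY[ch]):
            n, pos = parse(pos)
            leaves += n
        return leaves, pos

    total, end = parse(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete sequence")
    return total


if __name__ == "__main__":
    for idc in IDC_ARITY:
        print(f"U+{ord(idc):04X} {idc} {unicodedata.name(idc)}")
    print(count_components("\u2FF1宀子"))          # sequence for 字
    print(count_components("\u2FF0\u2FF1立口阝"))  # assumed sequence for 部
```

The parser only counts leaves. How finely a character is split still depends on which component inventory is used, so the counts reflect the sequences supplied rather than any fixed standard.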
Chinese character distribution by structures
The following data is excerpted from the "Chinese Character Information Dictionary", which covers 7,785 standard Mainland Chinese characters. [4]
According to statistics from the "Chinese Character Information Dictionary",[12] there are a total of 49 characters with an overlaid (or nested, including fully surrounded) structure among the 7,785 Mainland standard characters in the dictionary:
哀褒乘囱囤固国裹回困圃囚圈衰爽四田图团围巫因幽园圆衷噩囟胤兖袤亵裒囝囡囵囫囹囿圄圊圉圜豳囮囷奭圐㘥
If the fully surrounded characters are moved to the surrounding category, the number of overlaid characters will be even smaller.
Primitive component combination
According to the planar analysis by primitive components, Chinese character structures include the following modes or patterns:[13]
A. For characters composed of two primitive components, there are 9 different structures, as shown by the following example characters: 吕认压达勾问区凶团.
B. For characters composed of three components, there are 21 different structures, such as: 荣花型培树缠抛挺润抠捆部庶厢逞逊闾圄幽乖巫.
C. For characters composed of four components, there are 20 different structures, such as: 营蕊蓝寤嫠筐辔椁摄燃游榧额韶欧剩腐遮阔匿.
D. For characters composed of five components, there are 20 different structures, such as: 赢蒿膏寝蘧嚣篮樊搞澡缀渤漉髂齁敲酃戳魔噩.
E. For characters composed of six components, there are 10 different structures, such as: 臀翳麓瀛灌骥歌豁豌衢.
F. For characters composed of seven components, there are 3 different structures, such as: 戆麟饕.
G. For characters composed of eight components, there is 1 structure, such as: 齉.
H. For characters composed of nine components, there is 1 structure, such as: 懿.
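As a small consistency check on the list above, the following Python sketch stores the example characters for each component count and compares the number of examples with the stated number of structures. The names `EXAMPLES` and `total_structures` are illustrative assumptions. The strings are copied from the list, with one example character per structure.

```python
# component count -> (example characters, stated number of structures)
EXAMPLES = {
    2: ("吕认压达勾问区凶团", 9),
    3: ("荣花型培树缠抛挺润抠捆部庶厢逞逊闾圄幽乖巫", 21),
    4: ("营蕊蓝寤嫠筐辔椁摄燃游榧额韶欧剩腐遮阔匿", 20),
    5: ("赢蒿膏寝蘧嚣篮樊搞澡缀渤漉髂齁敲酃戳魔噩", 20),
    6: ("臀翳麓瀛灌骥歌豁豌衢", 10),
    7: ("戆麟饕", 3),
    8: ("齉", 1),
    9: ("懿", 1),
}


def total_structures(examples):
    """Sum the stated number of structures over all component counts."""
    return sum(stated for _, stated in examples.values())


def mismatches(examples):
    """Return component counts whose example list length differs from the stated number."""
    return [n for n, (chars, stated) in examples.items() if len(chars) != stated]


if __name__ == "__main__":
    print("total structures:", total_structures(EXAMPLES))
    print("mismatched rows:", mismatches(EXAMPLES))
```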
The level to which Chinese character components should be divided must be determined based on specific needs. For example, Chinese character teaching often uses a coarser level of analysis in order to be concise, while component-encoding Chinese character input methods often use relatively detailed analysis in order to reduce coding elements. [14]
Chinese character distribution by numbers of components
The following data is excerpted from "Chinese Character Information Dictionary". The components here refer to primitive components. [15]
Chinese character distribution by numbers of components

| Components | Characters | Character % | Character occurrences | Occurrence % |
|---|---|---|---|---|
| 1 | 323 | 4.149 | 5611317 | 25.910 |
| 2 | 2650 | 34.040 | 10191803 | 47.061 |
| 3 | 3139 | 40.321 | 4652330 | 21.482 |
| 4 | 1276 | 16.391 | 1046913 | 4.834 |
| 5 | 323 | 4.149 | 142005 | 0.656 |
| 6 | 70 | 0.899 | 11192 | 0.052 |
| 7 | 3 | 0.038 | 1017 | 0.005 |
| 8 | 1 | 0.013 | 1 | 0.002 |
| Total | 7785 | 100 | 21656578 | 100 |
The static distribution is concentrated in characters with 2, 3, or 4 components; the dynamic distribution is concentrated in characters with 1, 2, or 3 components.
In both the static and the dynamic statistics, more than 99% of characters have no more than 5 primitive components.
Note that the numbers of distinct characters with 1 component and with 5 components are the same (323), but their occurrence counts are very different (5,611,317 versus 142,005).
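The static and dynamic percentages can be recomputed from the character and occurrence counts in the table. The following Python sketch does so and also reports cumulative shares for characters with up to four and up to five components. The names `ROWS`, `shares`, and `cumulative_share` are illustrative assumptions. The counts are copied from the table.

```python
# (components, distinct characters, character occurrences), from the table
ROWS = [
    (1, 323, 5611317),
    (2, 2650, 10191803),
    (3, 3139, 4652330),
    (4, 1276, 1046913),
    (5, 323, 142005),
    (6, 70, 11192),
    (7, 3, 1017),
    (8, 1, 1),
]

STATIC, DYNAMIC = 1, 2  # column indexes into each row


def shares(rows, column):
    """Percentage of the column total contributed by each component count."""
    total = sum(row[column] for row in rows)
    return {row[0]: 100 * row[column] / total for row in rows}


def cumulative_share(rows, column, max_components):
    """Percentage of the column total with at most max_components components."""
    total = sum(row[column] for row in rows)
    part = sum(row[column] for row in rows if row[0] <= max_components)
    return 100 * part / total


if __name__ == "__main__":
    for label, column in (("static", STATIC), ("dynamic", DYNAMIC)):
        per_count = shares(ROWS, column)
        print(label, {n: round(p, 3) for n, p in per_count.items()})
        for limit in (4, 5):
            print(f"  up to {limit} components: "
                  f"{cumulative_share(ROWS, column, limit):.3f}%")
```

Recomputed this way, the occurrence share of the eight-component row comes out far smaller than the 0.002 printed in the table, so that cell may be worth checking against the dictionary.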
Chinese character fonts
Fonts
The popular fonts of modern Chinese characters include Song or Ming (宋體, 明體), FangSong (仿宋體), Kai (regular, 楷體), Li (clerical, 隸體), Hei (black, sans-serif, 黑體) and Wei (魏體).[16]
Internationally, font sizes are generally measured by "points".
In China, in addition to the point system, a unique "number" system is also used for Chinese characters. For example, the simplified Chinese version of MS Word allows font sizes to be set either in points or in numbers.[20]
The point system
After nearly three hundred years of development and refinement, the most influential point standards in the world today are the Didot point system of continental Europe (one point is approximately 0.3759 mm) and the Anglo-American point system (one point is approximately 0.3515 mm). China uses the latter.
In MS Word, the available point values are all multiples of 0.5 from 1 point to 1638 points, that is, the set {1, 1.5, 2, 2.5, ..., 1637, 1637.5, 1638}. These limits can be verified directly on a computer. [20]
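As a rough sketch of the arithmetic behind the two point systems and the Word size rule described above, the following Python snippet converts points to millimetres using the approximate values quoted in the text and checks whether a size falls in the set Word accepts. The constant and function names are illustrative assumptions. The millimetre constants are rounded, so results may differ in the last digit from figures computed with a more precise point length.

```python
# Approximate point lengths in millimetres, as quoted in the text.
DIDOT_POINT_MM = 0.3759
ANGLO_AMERICAN_POINT_MM = 0.3515

WORD_MIN_POINTS = 1
WORD_MAX_POINTS = 1638


def points_to_mm(points, point_mm=ANGLO_AMERICAN_POINT_MM):
    """Convert a size in points to millimetres for the given point system."""
    return points * point_mm


def is_word_point_size(points):
    """True if points is a multiple of 0.5 between 1 and 1638 inclusive."""
    in_range = WORD_MIN_POINTS <= points <= WORD_MAX_POINTS
    return in_range and float(points * 2).is_integer()


def word_point_sizes():
    """All sizes in the set {1, 1.5, 2, ..., 1637.5, 1638}."""
    return [half / 2 for half in range(2 * WORD_MIN_POINTS, 2 * WORD_MAX_POINTS + 1)]


if __name__ == "__main__":
    for size in (5, 10.5, 42):
        print(f"{size} pt = {points_to_mm(size):.3f} mm (Anglo-American), "
              f"{points_to_mm(size, DIDOT_POINT_MM):.3f} mm (Didot)")
    print("number of valid Word sizes:", len(word_point_sizes()))
```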
The number system
The font size options provided by the simplified Chinese versions of Windows and Word are listed below in ascending order of size:
No. 8 (八号), No. 7 (七号), Small No. 6 (小六号), No. 6 (六号), Small No. 5 (小五号), No. 5 (五号), Small No. 4 (小四号), No. 4 (四号), Small No. 3 (小三号), No. 3 (三号), Small No. 2 (小二号), No. 2 (二号), Small No. 1 (小一号), No. 1 (一号), Small initial number (小初号), Initial number (初号).
"No. 8" (or size 8) is the smallest, equivalent to 5 points (Anglo-American system), with a font height of about 1.757 mm; "Initial number" (or size A) is the largest, equivalent to 42 points, with a font height of about 14.761 mm.
Number-point correspondence
The following Chinese character font size "number-point" correspondence table was compiled by Dr. Zhang. [20]
Chinese character font size "number-point" correspondence table
Fu, Yonghe (傅永和) (1999). 中文信息处理 (Chinese Information Processing) (in Chinese) (3rd ed.). Guangzhou: 广东教育出版社 (Guangdong Education Press). p. 84. ISBN 978-7-5406-4080-4.
Li, Dasui (李大遂) (2013). 简明实用汉字学 (Concise and Practical Chinese Characters) (in Chinese) (3rd ed.). Beijing: Peking University Press. ISBN 978-7-301-21958-4.
Li, Gongyi (李公宜); Liu, Rushui (劉如水), eds. (1988). 漢字信息字典 (Chinese Character Information Dictionary) (in Chinese). Beijing: 科学出版社 (Science Press). ISBN 7-03-000862-6.
Peking University, Modern Chinese Language Teaching and Research Office (2004). 现代汉语 (Modern Chinese) (in Chinese). Beijing: Commercial Press. ISBN 7-100-00940-5.
Su, Peicheng (苏培成) (2014). 现代汉字学纲要 (Essentials of Modern Chinese Characters) (in Chinese) (3rd ed.). Beijing: 商务印书馆 (Commercial Press). ISBN 978-7-100-10440-1.
Zhang, Xiaoheng (张小衡) (2006). "The Number, Point and Metric Systems of Font Size (字形的"号制""点制"与"米制")". Computer Engineering and Applications (计算机工程与应用). 42 (10): 175–177, 215.
Notes
^ Here it is stipulated that a component in a multi-component character has more than one stroke.
^ Unicode U+2FF0, ideographic description character (IDC) LEFT TO RIGHT.