The Six Categories of Kanji (六書): Pictographs, Ideographs, and Phono-Semantic Compounds
The six categories of kanji, known in Japanese as 六書 (rikusho, literally "six writings"), are the traditional Chinese classification of how kanji were formed and reused. Xu Shen gave them their canonical definitions in the Shuowen Jiezi around 100 CE.12 One category, 形声 (phono-semantic compound), accounts for roughly 85% of all kanji.3 That is why a framework that looks like a dry taxonomy turns out to be the foundation of nearly every productive kanji-study technique.
Overview
六書 organizes every Chinese character, and therefore every kanji, under one of six headings. Four describe how a graph was originally made, and two describe how an existing graph was reused. The four formation principles are 象形 (pictograph), 指事 (indicative), 会意 (compound ideograph), and 形声 (phono-semantic compound). The two usage principles are 転注 (derivative cognate) and 仮借 (phonetic loan).45
For a learner, this four-plus-two structure is the most important reading of Xu Shen. It explains why 転注 and 仮借 feel different from the first four when you meet them.
What 六書 means and where the name comes from
The term 六書 (Mandarin liùshū) is best translated literally as "six writings."5 The name first appears in the Zhouli 周礼 (Rites of Zhou) as one of the six arts a noble child was taught, but the Zhouli never defines what the six categories actually are.46
The definitions and examples come from Xu Shen 許慎 (c. 58 to 148 CE) in the postface to his dictionary Shuowen Jiezi 説文解字. The dictionary was completed around 100 CE and presented to the Han court in 121 CE.12 It catalogues 9,353 character entries plus 1,163 graphic variants, organized into 540 radical sections, and is the direct ancestor of the modern radical-based lookup system.2
Wiktionary lists rikusho (六書, りくしょ) as the standard Japanese reading.5 If you have only heard the Mandarin liùshū in a Chinese-studies context, "rikusho" in a Japanese-language textbook can be confusing. Both refer to the same six categories.
Four formation principles plus two usage principles
The first four categories describe how a graph was originally built: a drawing of a thing, a drawing of a relationship, a fusion of two meanings, or a meaning component paired with a sound component. The last two describe how an already-existing graph was repurposed: to extend its meaning, or to borrow its sound.4
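The four-plus-two split can be written down as a small data structure. The sketch below is only an illustration: the names `Kind`, `Category`, `RIKUSHO`, and `by_kind` are hypothetical, and the English glosses are the ones used in this article.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    FORMATION = "formation"  # how a graph was originally built
    USAGE = "usage"          # how an existing graph was repurposed


@dataclass(frozen=True)
class Category:
    name: str   # category name in kanji
    gloss: str  # English gloss used in this article
    kind: Kind


# The six categories in their traditional order.
RIKUSHO = (
    Category("象形", "pictograph", Kind.FORMATION),
    Category("指事", "indicative", Kind.FORMATION),
    Category("会意", "compound ideograph", Kind.FORMATION),
    Category("形声", "phono-semantic compound", Kind.FORMATION),
    Category("転注", "derivative cognate", Kind.USAGE),
    Category("仮借", "phonetic loan", Kind.USAGE),
)


def by_kind(kind: Kind) -> list[str]:
    """Return the category names that belong to one side of the split."""
    return [c.name for c in RIKUSHO if c.kind is kind]


print(by_kind(Kind.FORMATION))  # the four formation principles
print(by_kind(Kind.USAGE))      # the two usage principles
```

Storing the kind as a field, rather than as two separate lists, keeps the six-item order intact while still making the split queryable.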
Wikipedia, summarizing the standard reading of Xu Shen, puts the split sharply: "Xu's categories are neither rigorously defined nor mutually exclusive: four refer to the structural composition of characters, while the other two refer to techniques of repurposing existing shapes."4
A textual hint reinforces the split. In the body of the Shuowen Jiezi, Xu Shen labels the four formation categories explicitly on individual entries, but he does not label 転注 or 仮借 instances directly.2 Even the original author of the framework seems to have treated the last two at a different level from the first four.
Why this framework is still taught
Outlier Linguistics gives the two-part defence directly: 六書 is the oldest systematic account of how Chinese characters work, and four of its six categories still describe live structural patterns you can use.7 Pictographs hint at meaning by drawing it; indicatives hint at meaning by pointing at it; compound ideographs hint at meaning by addition; and phono-semantic compounds hint at both reading and meaning by position.
The framework is not paleographically perfect, meaning it does not always match the oldest known character forms. Outlier names three structural problems: the constraint to exactly six categories was driven by Han-era numerology rather than by data, the categories are not mutually exclusive, and Xu Shen's definitions are underspecified.7 The Good to know section returns to the paleographic critique in more detail.
The practical takeaway is the one EDRDG names: 形声 alone accounts for "about 85% of characters," which is why the framework remains a study aid even where its edges are debated.3
象形文字 (shōkei moji): pictographs
Definition and historical shape arc
A 象形文字 is a graph that originated as a stylized drawing of a concrete object.47 The canonical examples Wikipedia gives are 日 ('sun') and 木 ('tree').4
The shape arc of a pictograph runs through five major script stages: oracle bone (甲骨文, c. 1250 BCE under King Wu Ding)8, bronze (金文), seal script (篆書, the form Xu Shen had available)2, clerical (隷書), and regular script (楷書, the modern printed form). Xu Shen wrote his etymologies from seal-script forms. Later oracle-bone evidence, which he did not have access to, has revised many of his readings.78
The canonical examples (日, 月, 山, 木)
| Kanji | On-yomi | Kun-yomi | Strokes | Grade |
|---|---|---|---|---|
| 木 | モク, ボク | き, こ- | 4 | G1 |
| 山 | サン | やま | 3 | G1 |
| 日 | ニチ, ジツ | ひ, -び | 4 | G1 |
| 月 | ゲツ, ガツ | つき | 4 | G1 |
Wiktionary describes 木 as "象形 (xiàngxíng): a tree: branches on top, roots on the bottom," directly attributing the pictographic origin.9 By the time the shape has been worn down through seal and clerical script into the modern 木, the picture is no longer visually obvious and has to be reconstructed from the script arc. Tracing that arc is what bridges the gap between the modern shape and the oracle-bone drawing.
山の上に木がある。9
"There is a tree on top of the mountain."
月が出た。10
"The moon came out."
Why there are so few pictographs
A writing system has to cover the entire lexicon, and the set of objects drawable as a single icon is small. Pictographs make up only a few hundred kanji.47 An oracle-bone-era breakdown reported in Wikipedia gives 23% pictographs at the earliest attested stage. That share fell to a low single-digit percentage of the modern lexicon as the system grew by combination.8
The structural reason the other five categories had to be invented is exactly this: drawable things run out fast.
指事文字 (shiji moji): indicative or abstract symbols
Definition: drawing an abstraction, not an object
A 指事文字 is a graph in which a mark or position annotates a base shape to point at a relationship, a count, or a location, rather than depicting a concrete object.47 Wikipedia's canonical examples are 一 ('one') and 上 ('up').4
The mechanism is the same each time: take a reference shape, such as a line or an existing graph, and place a mark on it. The mark turns the drawing into a comment about that shape rather than a picture of the shape itself.
Counted strokes (一, 二, 三) and positional marks (上, 下, 本, 末)
一, 二, and 三 are pure tally indicatives: one, two, and three horizontal strokes count themselves.4 上 and 下 mark "above" and "below" a reference line.4
The clearest demonstration of the principle is the pair 本 and 末. Both place a single extra stroke on the tree-pictograph 木. 本 puts the stroke at the base to point at the root (Wiktionary glosses 本 as "a tree with an extra stroke marking the base, emphasizing roots versus the top")11, and 末 puts the stroke at the top to point at the tip ("a tree with its top highlighted").12 Once you see this pair, the indicative mechanism clicks.
| Kanji | On-yomi | Kun-yomi | Strokes | Grade |
|---|---|---|---|---|
| 本 | ホン | もと | 5 | G1 |
| 末 | マツ, バツ | すえ | 5 | G4 |
本の値段。11
"The price of the book."
月末に支払う。12
"Pay at the end of the month."
The overlap with pictographs
Some characters are classified differently by different scholars. The Wikipedia summary is blunt: Xu Shen's categories "are neither rigorously defined nor mutually exclusive."4 Outlier flags the same point as one of the three structural problems with the framework: the categories overlap at the edges.7
This article does not need to adjudicate individual borderline cases. The point for learners is structural: 六書 is a useful map, not an airtight taxonomy, and the overlap is built in.
会意文字 (kaii moji): compound ideographs
Definition: two meanings, added
A 会意文字 is built from two or more component characters, each contributing semantic content. Together, they form a new character whose meaning is the sum or fusion of the parts. Pronunciation is not part of the principle.4
Wikipedia's canonical examples are 武 ('military') and 信 ('truthful').4 The classroom-Japanese canon shifts to compounds learners often half-know already: 林, 森, 休, 明, 好, and 男. Wiktionary explicitly classifies all of them as ideogrammic compounds.131415161718
The canonical examples (林, 森, 休, 明, 好, 男)
| Kanji | Components | Meaning | On-yomi | Grade |
|---|---|---|---|---|
| 林 | 木 + 木 | grove | リン | G1 |
| 森 | 木 + 木 + 木 | forest | シン | G1 |
| 休 | 亻 (person) + 木 (tree) | rest | キュウ | G1 |
| 明 | 日 (sun) + 月 (moon) | bright | メイ, ミョウ | G2 |
| 好 | 女 (woman) + 子 (child) | to like | コウ | G4 |
| 男 | 田 (field) + 力 (strength) | man | ダン, ナン | G1 |
The Wiktionary glosses spell out the additive logic directly: 林 is "duplication of 木 (tree) to represent a forest"13; 森 is "triplication of 木 (tree) to suggest a large number of trees"14; 休 is "人 (person) + 木 (tree), a man leaning against a tree, resting"15; 明 is "日 (sun) + 月 (moon), the sun just rising and the moon not yet set, dawn"16; 好 is "女 (woman) + 子 (child)"17; 男 is "田 (field) + 力 (strength)."18
森の中に小さな家がある。14
"There is a small house in the forest."
今日は休みです。15
"Today is a day off."
明日は明るくなる。16
"Tomorrow will be brighter."
Why this category is smaller than students assume
Many kanji that look like compound ideographs are in fact phono-semantic compounds with a phonetic component whose pronunciation has drifted over time. Wikipedia notes that "many characters formerly classed as compound ideographs are now believed to have been misidentified," with phono-semantic analysis usually providing the better explanation.4
Outlier flags the pedagogical risk: classroom and SRS (spaced repetition system) sources often re-cast a 形声 character as a 会意 character because a meaning story is easier to remember than a phonetic relationship.7 The Good to know section returns to this directly. For now, treat 会意 as a smaller live category than the textbook count of canonical examples suggests.
形声文字 (keisei moji): phono-semantic compounds
Definition: meaning component plus sound component
A 形声文字 has two functional parts: a semantic component (the 意符 ifu, a meaning hint, often the kanji's radical) and a phonetic component (the 音符 onpu, a sound hint inherited from the Chinese source).4319
Wikipedia's canonical examples are 沐 ('wash') and 菜 ('vegetable').4 The principle exists because there are not enough drawable things or addable meanings to cover the lexicon. But there are plenty of existing characters whose sound can be borrowed and paired with a semantic radical to spell something new.
Roughly 85% of kanji are phono-semantic
形声 accounts for about 85% of kanji, per EDRDG (Electronic Dictionary Research and Development Group, Jim Breen, Monash University): "by far the largest category, making up about 85% of characters."3 The Kanji Code reports a slightly lower figure for the jōyō set, noting that phonetic components appear in keisei moji "which make up 80% of the Joyo or daily use kanji."19
The figure looks different on different samples. Wikipedia, citing Yang Runlu's 2008 Xiandai Hanzixue, gives 58% phono-semantic among the 3,500 most frequent characters in Standard Chinese.420 The 58% and 85% figures are not contradictory. The more frequent a character is, the more likely it is to be an old pictograph or indicative, while the long tail of low-frequency characters is overwhelmingly 形声.420
The 85% share is what makes 形声 the framework's center of gravity. If you treat the six categories as six equal slices, you miss the headline. If you treat 形声 as the default case, with the other five categories as exceptions, you will calibrate your study habits correctly.3
Position rule: semantic left, phonetic right (mostly)
The default layout places the semantic component on the left and the phonetic on the right. Wikipedia summarizes the pattern this way: phono-semantic compounds "are typically composed of two characters, one (called the 形 xíng) suggesting the general semantic category to which the character belongs, and the other (the 聲 shēng) suggesting the sound."4
The principle permits other arrangements, and four layouts are common enough that a learner will meet them all early.
The left-right layout is the high-prior guess; the other three are common enough that a learner who fixes on left-right alone will start to miss the pattern. The verification step is the same in every case: identify the candidate phonetic, then check a dictionary.43
A worked phonetic series (青 sei: 清, 晴, 請, 精, 静)
The clearest demonstration of the principle is the 青 (sei) series. The phonetic component 青 anchors a small family of kanji. Each one adds a different semantic radical to produce a new meaning while inheriting the on-yomi セイ.
| Kanji | Composition | On-yomi | Gloss |
|---|---|---|---|
| 青 | base | セイ, ショウ | "blue / green" |
| 清 | 氵 (water) + 青 (phonetic) | セイ, ショウ | "clear, pure" |
| 晴 | 日 (sun) + 青 (phonetic) | セイ | "clear weather" |
| 請 | 言 (speech) + 青 (phonetic) | セイ, シン | "to request" |
| 精 | 米 (rice) + 青 (phonetic) | セイ, ショウ | "refined, essence" |
| 静 | 青 + 争 | セイ, ジョウ | "quiet, still" |
All six characters retain a primary on-yomi セイ. Some carry ショウ or ジョウ as secondary readings reflecting different on-yomi strata.212223242526 The Kanji Code lists 清, 晴, 請, 精 as the standard learner demonstration of the SEI series and notes "they all have an on-reading of SEI."19
Wiktionary's etymology for the traditional form 靜 treats it as a phono-semantic compound with both 青 and 争 contributing phonetic content. It also notes a separate Shuowen reading in which 青 is the semantic component.26 The on-yomi セイ is still inherited along the 青 series, so 静 belongs in a learner-facing 青-series demonstration. The caveat is that the component roles are not as clean as the surface layout suggests. This is exactly the kind of "categories are not mutually exclusive" case Outlier flags as a general weakness of the 六書 framework.726
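The 青-series table can be checked mechanically. The following is a minimal sketch, assuming the on-yomi exactly as listed in the table above; `SEI_SERIES`, `shared_readings`, and `extra_readings` are hypothetical names chosen for illustration.

```python
# On-yomi copied from the 青-series table in this article.
SEI_SERIES = {
    "青": ["セイ", "ショウ"],
    "清": ["セイ", "ショウ"],
    "晴": ["セイ"],
    "請": ["セイ", "シン"],
    "精": ["セイ", "ショウ"],
    "静": ["セイ", "ジョウ"],
}


def shared_readings(series: dict[str, list[str]]) -> set[str]:
    """Return the on-yomi carried by every member of the series."""
    reading_sets = [set(readings) for readings in series.values()]
    return set.intersection(*reading_sets)


def extra_readings(series: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per kanji, the readings that are not shared by the whole series."""
    shared = shared_readings(series)
    extras = {}
    for kanji, readings in series.items():
        leftover = [r for r in readings if r not in shared]
        if leftover:
            extras[kanji] = leftover
    return extras


print(shared_readings(SEI_SERIES))  # the reading the phonetic predicts
print(extra_readings(SEI_SERIES))   # secondary readings that still need a lookup
```

The intersection is the part of the series a learner gets for free from the phonetic. Everything in `extra_readings` is what still has to be learned per character.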
空が晴れた。23
"The sky cleared up."
清い水が流れる。22
"Clear water flows."
静かにしてください。26
"Please be quiet."
助けを請う。24
"To request help."
When the phonetic clue is reliable, and when it is not
The phonetic component is a hypothesis about the on-yomi, not a guarantee. The sound borrowing happened in Old Chinese. The surviving on-yomi reflects later sound changes in both Chinese and Japanese.7
The Japanese-specific complication is the on-yomi strata, or historical layers of Sino-Japanese readings. 呉音 (go-on), 漢音 (kan-on), and 唐音 (tō-on) entered Japanese at different times from different Chinese dialects, so a phonetic component can yield different on-yomi for different characters in the same series.4 The 青 series above includes both セイ (kan-on) and ショウ (go-on), inherited from the same phonetic.2122
The phonetic component does not predict kun-yomi at all; kun-yomi are native Japanese readings mapped onto the Chinese semantics.4
The conservative rule is: read the right side as a hypothesis for the on-yomi, then verify in a dictionary or a browser hover tool. A phonetic component is a strong prior, not a deterministic readout. The verification step is what keeps the heuristic safe across the on-yomi strata.319
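The guess-then-verify rule can be sketched as two small functions. This is an illustration under stated assumptions, not a real dictionary client: `ON_YOMI` is a hypothetical hand-entered table standing in for a dictionary lookup, and the jōyō readings given for 未, 味, and 妹 do not come from this article's tables, so check them against a dictionary before relying on them.

```python
from typing import Optional

# Hypothetical stand-in for a dictionary lookup (assumption, hand-entered).
ON_YOMI = {
    "青": ["セイ", "ショウ"],
    "清": ["セイ", "ショウ"],
    "晴": ["セイ"],
    "請": ["セイ", "シン"],
    "未": ["ミ"],
    "味": ["ミ"],
    "妹": ["マイ"],
}


def guess_from_phonetic(phonetic: str) -> Optional[str]:
    """Hypothesis step: propose the phonetic's first listed on-yomi."""
    readings = ON_YOMI.get(phonetic, [])
    return readings[0] if readings else None


def guess_and_verify(kanji: str, phonetic: str) -> tuple[Optional[str], bool]:
    """Verification step: confirm the guess against the kanji's own entry."""
    guess = guess_from_phonetic(phonetic)
    confirmed = guess is not None and guess in ON_YOMI.get(kanji, [])
    return guess, confirmed


for kanji, phonetic in [("晴", "青"), ("味", "未"), ("妹", "未")]:
    guess, ok = guess_and_verify(kanji, phonetic)
    print(kanji, guess, "confirmed" if ok else "check the dictionary entry")
```

In this toy table the guess holds for 晴 and 味 but not for 妹, which is the situation the verification step exists for: the phonetic narrows the search, and the dictionary settles it.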
転注文字 (tenchū moji) and 仮借文字 (kasha moji): the two usage principles
Why these two sit apart from the first four
The first four categories describe character creation: how a new shape was built. 転注 and 仮借 describe character reuse: what an already-existing shape was repurposed to do.47 No new graph is invented under either usage principle. An existing graph picks up a second job.
This is the structural split the Overview previewed, and it is the reason 転注 and 仮借 feel different from the first four when a learner meets them. They are not at the same logical level.
転注 (tenchū): a character's meaning extends to a related sense
In 転注, one character's original meaning extends into a related abstract or metaphorical sense, and the same shape carries both senses. The classical illustration Wikipedia cites is the pair 考 and 老, glossed as "derivative cognates," meaning related words whose meanings extend into each other.4
The Japanese-classroom illustration is 楽. The same graph carries the senses "music" (on-yomi ガク) and "comfort, pleasure" (on-yomi ラク). The on-yomi split tracks the meaning split.27 Wiktionary notes that 音楽 (ongaku) refers to music generally, while ラク conveys comfort and ease.27
| Kanji | On-yomi | Kun-yomi | Grade |
|---|---|---|---|
| 楽 | ガク (music), ラク (comfort) | たの-しい, たの-しむ | G2 |
音楽を聴く。27
"To listen to music."
楽な生活を送る。27
"To live a comfortable life."
Both Wikipedia and Outlier note that 転注 is the least-agreed-upon of the six categories even among traditional scholars; later commentators have proposed multiple competing readings of what Xu Shen meant.47
仮借 (kasha): a character is borrowed for sound alone
In 仮借 (Mandarin jiǎjiè), a character's shape and pronunciation are used to write an unrelated word that happens to be pronounced the same way.4 No meaning is carried across; only the sound.
The classical Chinese illustration is 来. The character was originally a pictograph of a wheat plant. It was borrowed phonetically to write the homophonous verb "to come." The wheat sense was later written with the dedicated character 麦, while 来 kept the borrowed "come" sense.428
The Japanese descendant of this principle is 当て字 (also written 宛字), or ateji: kanji used phonetically to represent native or borrowed words without regard to the underlying meaning of the characters.29
Canonical Japanese ateji include 寿司 sushi (literally "lifespan" plus "administer," semantically unrelated to the food), 珈琲 kōhī (coffee), 倶楽部 kurabu (club), and 亜米利加 Amerika, abbreviated as 米国 Beikoku (USA, literally "rice country").29 The country-name abbreviation pattern is the live 仮借 mechanism in modern Japanese. The kanji were chosen for sound, and the meaning of the chosen character is incidental.29
| Kanji | On-yomi | Kun-yomi | Grade |
|---|---|---|---|
| 来 | ライ | く-る, きた-る | G2 |
来年日本に行く。28
"Next year I will go to Japan."
寿司を食べる。29
"To eat sushi."
米国の大統領。29
"The president of the United States."
Why these categories are unstable in modern paleography
Modern paleography, based on the oracle-bone and bronze-inscription evidence accumulated since 1899, treats 転注 and 仮借 as historically real but conceptually overlapping with the first four.830
A quantitative consequence shows up in the oracle-bone breakdown. Wikipedia reports this structural distribution of oracle-bone characters: 23% pictographs, 2% simple indicatives, 32% associative compounds, 11% phonetic loans, 27% phono-semantic compounds, and 6% undetermined.8 That distribution differs materially from the modern lexicon, where 形声 dominates.
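The oracle-bone figures quoted above are easy to tabulate and compare with the modern 形声 share. The sketch below uses only the percentages given in this article; the dictionary and variable names are hypothetical. Note that the quoted values add up to 101 rather than 100, which is presumably a rounding effect in the source (an assumption, not something the source states).

```python
# Percentages as quoted in this article for oracle-bone characters.
ORACLE_BONE_SHARES = {
    "pictographs": 23,
    "simple indicatives": 2,
    "associative compounds": 32,
    "phonetic loans": 11,
    "phono-semantic compounds": 27,
    "undetermined": 6,
}

# Approximate modern share of 形声 quoted from EDRDG in this article.
MODERN_KEISEI_SHARE = 85

total = sum(ORACLE_BONE_SHARES.values())

# Rank categories from largest to smallest share.
ranked = sorted(ORACLE_BONE_SHARES.items(), key=lambda item: item[1], reverse=True)

# Gap between the modern 形声 share and its oracle-bone share, in points.
keisei_gap = MODERN_KEISEI_SHARE - ORACLE_BONE_SHARES["phono-semantic compounds"]

print("sum of quoted shares:", total)
print("largest oracle-bone category:", ranked[0][0])
print("形声 gap in percentage points:", keisei_gap)
```

At the oracle-bone stage, phono-semantic compounds are one large category among several. The roughly 58-point gap to the modern figure is the growth-by-combination the article describes.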
Many characters Xu Shen explained as 会意 on seal-script grounds have been reanalyzed as 形声 or 象形 once their oracle-bone shapes became available. The same is true in reverse for some traditional 仮借 assignments. Outlier names this as the central reason "六書 is a useful map, not a final taxonomy."7
What this means for your kanji study
Habit 1: assume the right side is a phonetic clue, then verify
Because roughly 85% of kanji are 形声, the "right side is sound" heuristic pays off across the lexicon.3 The right side (or the bottom, or the inside, depending on layout) is a strong prior for the on-yomi. The verification step, such as a dictionary lookup or a browser hover, is what keeps the heuristic safe when the on-yomi has drifted across the go-on / kan-on strata.4
Habit 2: assume the left side or the outside is a meaning clue
The semantic radical is the most learnable signal in kanji and the one that survives best across the on-yomi / kun-yomi split.3 Learning a short list of high-frequency semantic radicals repays itself within the first 200 kanji. Examples include 氵 (water, as in 清), 木 (tree, as in 林), 言 (speech, as in 請), 心 / 忄 (heart, as in 想), 日 (sun, as in 晴), 口 (mouth, as in 味), and 女 (woman, as in 妹).432223243132
Habit 3: be suspicious of any mnemonic that sells a character as a "story"
Many classroom and SRS mnemonics re-cast 形声 characters as 会意 stories because a meaning story is easier to remember than a phonetic relationship. The cost is the loss of the phonetic-series connection. Outlier names this as the central pedagogical critique of story-based kanji methods.7
The rule is additive, not substitutive: keep the mnemonic and note the phonetic. If you memorize 清 as "water plus blue = clear" without noticing that 青 is the phonetic セイ, you cut yourself off from 晴, 請, 精, and 静.
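One way to make the additive rule concrete is to give each flashcard a separate field for the phonetic component, so the meaning story and the series link coexist. The card layout below is a hypothetical sketch (the `Card` fields and `group_by_phonetic` are not taken from any particular SRS tool), and 静 is left out because its component roles are less clear-cut.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Card:
    kanji: str
    mnemonic: str  # the meaning story, kept as-is
    semantic: str  # semantic component (meaning hint)
    phonetic: str  # phonetic component (sound hint)


CARDS = [
    Card("清", "water plus blue = clear", "氵", "青"),
    Card("晴", "", "日", "青"),
    Card("請", "", "言", "青"),
    Card("精", "", "米", "青"),
    Card("妹", "", "女", "未"),
    Card("味", "", "口", "未"),
]


def group_by_phonetic(cards: list[Card]) -> dict[str, list[str]]:
    """Collect kanji that share a phonetic component into one series."""
    groups = defaultdict(list)
    for card in cards:
        groups[card.phonetic].append(card.kanji)
    return dict(groups)


print(group_by_phonetic(CARDS))
```

A deck that stores only the mnemonic cannot produce this grouping. Recording the phonetic alongside it costs one extra field per card.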
Good to know
The category names themselves diagram the principles
The six names are not opaque. Their parts point to the principles: 象 in 象形 means "image, shape" (and "elephant"); 指 in 指事 means "to point at"; 会 in 会意 means "to meet, to join"; 形 in 形声 means "shape, form" and 声 means "sound, voice"; 転 in 転注 means "to turn, shift" and 注 means "to pour, annotate"; 仮 in 仮借 means "temporary, borrowed" and 借 also means "to borrow."45
The names are a six-line summary of the framework, which is part of why they have survived more than two thousand years of pedagogy.
Japanese names use on-yomi, Chinese sources use Mandarin
The Japanese category names (shōkei, shiji, kaii, keisei, tenchū, kasha) are on-yomi readings; the Chinese-language equivalents (xiàngxíng, zhǐshì, huìyì, xíngshēng, zhuǎnzhù, jiǎjiè) refer to the same six categories.45
If you move between Chinese-character-studies sources and Japanese-linguistics sources, expect both name sets. Do not assume the two literatures are talking about different frameworks.
The 妹 and 味 phonetic-series shortcut
If you have met both 妹 (younger sister) and 味 (taste), you have seen the same phonetic component 未 (mi) twice, with two different semantic radicals (女 and 口). Wiktionary classifies both as 形声: 妹 = 女 (semantic) + 未 (phonetic); 味 = 口 (semantic) + 未 (phonetic), with reconstructed Old Chinese *mɯds for 味.3132
This single observation often turns keisei from one category among six into the framework's center of gravity.
Re-casting a 形声 character as a 会意 story
The most common classroom error is to treat a 形声 character as if it were 会意 in a mnemonic, then stop there. A common example is 清 as "water (氵) plus blue (青) equals clear water." The story is memorable, but it cuts 清 off from its phonetic series.
The corrected reading keeps both pieces in view: 氵 (semantic, "water") plus 青 (phonetic, セイ) yields 清, on-yomi セイ, gloss "clear, pure."22
Outlier names this as the most costly classroom error, because the same phonetic component (青) recurs in 晴, 請, 精, and 静, and a learner who has overwritten the phonetic with a meaning story will not see the pattern when those kanji arrive later.7
Trusting Xu Shen's seal-script etymology without checking the oracle-bone shape
Many Shuowen Jiezi etymologies have been revised by oracle-bone evidence that Xu Shen did not have access to. Oracle-bone inscriptions were recognized as ancient Chinese writing by Wang Yirong in 1899. The material was being sold as "dragon bones" in Beijing apothecaries and was later traced to Anyang in Henan. The discovery post-dates Xu Shen by nearly eighteen centuries, yet the inscriptions themselves pre-date his seal-script reference forms by more than a millennium.830
If you cite Xu Shen as a final authority, you are using a c. 100 CE source against material that did not surface until 1899. Pre-1899 etymologies are best treated as the first word, not the last.
国字 (kokuji) as a candidate seventh category
Kokuji (国字, also 和製漢字 wasei kanji) are kanji created in Japan rather than borrowed from China.33 Canonical examples include 峠 tōge ("mountain pass," 山 + 上 + 下, "going up and going down a mountain")3334 and 働 hatara-ku or ドウ ("to work," 亻 + 動, the most common kokuji and the rare kokuji with both an on-yomi and a kun-yomi).3335 Other examples are 込 ko-mu ("crowded into"), 畑 hatake ("dry field"), and 辻 tsuji ("crossroads").33
Most kokuji follow the compound ideograph (会意) principle in their formation. That is why traditional sources fold them into the existing six categories rather than treating kokuji as a separate type.33 The label "六書" rather than "七書" is, in part, a vote in favor of folding them in.
See also
- What Is Kanji? A Complete Beginner's Introduction
- How to Predict the Reading of an Unknown Kanji Compound: The On+On Default, Jūbako, Yutō, and the Look-It-Up Bucket
- Radicals vs. Components: Why They Are Not the Same Thing
- How Many Kanji Do You Need? A Realistic Count
- Kanji Stroke Order: The General Rules Behind Every Character