How to Predict the Reading of an Unknown Kanji Compound: The On+On Default, Jūbako, Yutō, and the Look-It-Up Bucket
Predicting the reading of a kanji compound is less a single rule than a short decision flow. Assume on+on, check a handful of named exceptions in a fixed order, and recognize the small set of compounds where prediction stops working and the dictionary takes over.12 If you run the flow on every unfamiliar two-kanji string, you train the right intuition even when you miss. Each branch you consider is itself a piece of structural knowledge.3
Overview
Japanese two-kanji compounds sort into four reading patterns, plus one off-flowchart exit. The four patterns are on + on, kun + kun, on + kun (called jūbako-yomi 重箱読み), and kun + on (called yutō-yomi 湯桶読み).24 The fifth route, jukujikun (熟字訓), gives the whole compound a reading that shares no segment with any of the kanji's normal readings. It cannot be derived by rule.56
The on+on pattern is the dominant default, reported in the 75–80% range across corpus and pedagogy literature.178 The two mixed patterns are minority routes named after their archetypal examples. Both archetypes are autological, meaning they demonstrate the pattern they name: 重箱 itself reads on+kun, and 湯桶 itself reads kun+on.24
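As a minimal sketch of this classification, the Python below maps the reading type of each kanji to the name of the pattern. The function `name_pattern` and its `whole_word` flag are illustrative names introduced here, not part of any standard tool.

```python
from typing import Optional


def name_pattern(first: Optional[str], second: Optional[str],
                 whole_word: bool = False) -> str:
    """Name the reading pattern of a two-kanji compound.

    first / second are "on" or "kun"; whole_word=True marks a jukujikun,
    whose reading cannot be split between the two kanji.
    """
    if whole_word:
        return "jukujikun"
    names = {
        ("on", "on"): "on+on",
        ("kun", "kun"): "kun+kun",
        ("on", "kun"): "jūbako-yomi",
        ("kun", "on"): "yutō-yomi",
    }
    return names[(first, second)]


print(name_pattern("on", "kun"))   # 重箱: jū (on) + bako (kun)
print(name_pattern("kun", "on"))   # 湯桶: yu (kun) + tō (on)
```

The two archetypes classify themselves, which is what makes them easy anchors for the mixed patterns.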
The decision in one minute
The four-pattern landscape
Together, the four patterns and the jukujikun exit form a five-way classification for any two-kanji compound a learner is likely to meet. The named patterns are well established in standard reference dictionaries.24
The on+on share rises in technical, legal, and news writing, where kango is more common. It falls in literary or everyday prose, where wago and mixed compounds are more common.19 The Sino-Japanese (kango) and native (wago) layers together account for roughly 88–92% of vocabulary across all BCCWJ (Balanced Corpus of Contemporary Written Japanese) registers. Foreign loans (gairaigo) take the rest.910
The most rigorous public source for the per-pattern distribution is kanjidatabase.com. It is built on the Mainichi Newspaper corpus 2000–2010, with approximately 282.8 million morpheme tokens after excluding proper nouns.7 The exact percentage depends on whether the count is by type (distinct compounds) or by token (running occurrences).7
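The type/token distinction can be made concrete with a toy count. The sample below is a hypothetical handful of occurrences invented for illustration, not corpus data; only the counting logic is the point.

```python
from collections import Counter

# Hypothetical toy sample: one (compound, pattern) pair per running occurrence.
tokens = [
    ("学校", "on+on"), ("学校", "on+on"), ("学校", "on+on"),
    ("経済", "on+on"), ("経済", "on+on"),
    ("手紙", "kun+kun"),
    ("場所", "yutō"),
    ("重箱", "jūbako"),
]


def share(pairs, pattern):
    """Fraction of the given (compound, pattern) pairs that have this pattern."""
    pairs = list(pairs)
    counts = Counter(p for _, p in pairs)
    return counts[pattern] / len(pairs)


token_share = share(tokens, "on+on")       # every occurrence counts
type_share = share(set(tokens), "on+on")   # each distinct compound counts once

print(f"by token: {token_share:.1%}, by type: {type_share:.1%}")
```

In this toy sample the frequent on+on words push the token share above the type share, which is one way the same text can yield two different percentages.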
Why a procedure beats memorization
Two-kanji compounds form a productive open class in Japanese. New ones enter the lexicon constantly through news, academia, and trade names, so a learner cannot memorize the whole inventory.13
A staged procedure, meaning a default plus exception checks, trains the right intuition even when the guess is wrong. Each branch the learner considers is itself a piece of structural knowledge.3 The goal is a confident first guess plus a clean rule for when to stop guessing and consult a dictionary.
Step 1: default to on+on
Why on+on dominates two-kanji compounds
Most Japanese two-kanji compounds entered the language as kango (Sino-Japanese vocabulary) borrowed from Middle Chinese between the 5th and 9th centuries. They arrived as whole compound units, with their Chinese-derived on-readings already locked in.11112
This historical fact, not a synchronic rule (a rule operating in the language now), is what makes the on+on pattern statistically dominant. The pattern also remains productive: coinages like 携帯 keitai, 環境 kankyō, and 情報 jōhō follow the same on+on template.112
Reports of the on+on share cluster around 75–80% of two-kanji compounds. The exact figure depends on the corpus (technical writing higher, literary writing lower) and on whether the count is by type or by token.178
学校13
"school"
電車3
"electric train"
図書館3
"library"
経済3
"economy"
What on+on looks like in practice
In a monolingual or learner dictionary, on'yomi are conventionally written in katakana while kun'yomi are written in hiragana. A compound whose components both appear with katakana readings in the entry is almost always read on+on.3
Having no okurigana on either kanji is necessary, but not sufficient, for on+on. An okurigana tail almost always signals a kun reading.8
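The two script cues above can be sketched in a few lines of Python. This is a rough illustration that relies only on Unicode code-point ranges; the function names are made up for this example, and the "." and "-" separators are an assumption about how some dictionaries mark okurigana boundaries in kun readings.

```python
def is_hiragana(ch: str) -> bool:
    return "\u3041" <= ch <= "\u3096"


def is_katakana(ch: str) -> bool:
    # Includes the long-vowel mark ー used in katakana readings.
    return "\u30a1" <= ch <= "\u30fa" or ch == "ー"


def reading_kind(reading: str) -> str:
    """Classify a dictionary reading by the script it is printed in."""
    chars = [ch for ch in reading if ch not in ".-"]
    if chars and all(is_katakana(ch) for ch in chars):
        return "on"
    if chars and all(is_hiragana(ch) for ch in chars):
        return "kun"
    return "unknown"


def has_okurigana(word: str) -> bool:
    """True if any hiragana appears in the written form."""
    return any(is_hiragana(ch) for ch in word)


def could_be_on_on(word: str) -> bool:
    # Necessary, not sufficient: kun+kun compounds such as 手紙 also pass.
    return len(word) == 2 and not has_okurigana(word)


print(reading_kind("ガク"), reading_kind("まな.ぶ"))
print(could_be_on_on("学校"), could_be_on_on("食べ物"), could_be_on_on("手紙"))
```

The last call returns `True` for 手紙 even though it is kun+kun, which is exactly why the okurigana test can only rule on+on out, never confirm it.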
Cross-link to the mental model
If you need the foundation before the flow chart, review the deep dive on the two-reading system. It explains why on+on, kun+kun, and mixed patterns exist in the first place.
Step 2: pick the right on-reading when several exist
Domain signals: Buddhist, classical, scientific
Japanese on'yomi fall into three main historical layers, plus a fourth catch-all. The same kanji often carries readings from more than one layer.1114
- Go-on (呉音): the oldest layer, borrowed from the Wu region of southern China between the 5th and 6th centuries; preserved heavily in Buddhist vocabulary.1114
- Kan-on (漢音): borrowed from the standard pronunciation of Chang'an during the Tang dynasty (7th–9th centuries); the dominant layer in modern compound coinage.1114
- Tō-on (唐音), also called Sō-on: later borrowings from the Song and Ming dynasties (from the 10th century onward), brought mainly by Zen monks and merchants; preserved in a small set of words such as 椅子 isu and 蒲団 futon.1114
- Kan'yō-on (慣用音): "customary" readings that drifted from any of the above and became standardized.14
A compound's layer usually signals its domain. The kanji 行 carries go-on gyō, kan-on kō, and tō-on an, and each surfaces in a different register.14
銀行3
"bank"
修行3
"Buddhist ascetic training"
行脚4
"pilgrimage on foot"
When the dictionary lists two on-readings, prefer the kan-on default
Kan-on is the modal, or most common, layer for modern jukugo. Among kanji with multiple on-readings, the kan-on entry is more often the one a learner will meet first in everyday compounds.14 It is not a strict majority, and no per-corpus count is available. So kan-on functions as a confident first guess rather than a guarantee.
When a kanji's dictionary entry lists multiple on'yomi and the compound is not flagged as Buddhist or culinary, kan-on is the safer first guess.14 Buddhist vocabulary (経 kyō "sutra" rather than kei; 行 gyō; 明 myō) and a small set of food and household items (布団 futon, 行灯 andon) carry go-on or tō-on, respectively.11144
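A small lookup makes the "kan-on unless the domain says otherwise" rule explicit. This is a sketch under stated assumptions: the domain labels `"buddhist"` and `"household"` and the function name are hypothetical, and the table holds only the readings for 行 and 経 mentioned in this section.

```python
# On-reading layers for two kanji, limited to the readings named in the text.
ON_LAYERS = {
    "行": {"go-on": "gyō", "kan-on": "kō", "tō-on": "an"},
    "経": {"go-on": "kyō", "kan-on": "kei"},
}

# Hypothetical domain flags mapped to the layer they tend to select.
DOMAIN_TO_LAYER = {"buddhist": "go-on", "household": "tō-on"}


def guess_on_reading(kanji, domain=None):
    """Pick a first-guess on-reading: domain layer if flagged, else kan-on."""
    layers = ON_LAYERS[kanji]
    preferred = DOMAIN_TO_LAYER.get(domain, "kan-on")
    # Fall back to kan-on when the kanji has no reading in the preferred layer.
    return layers.get(preferred, layers.get("kan-on"))


print(guess_on_reading("行"))              # unflagged compound such as 銀行
print(guess_on_reading("行", "buddhist"))  # Buddhist compound such as 修行
print(guess_on_reading("行", "household")) # household item such as 行灯
```

The result is still a first guess; the dictionary entry for the compound decides.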
Cross-link to the deep dive
The historical layers above earn their own treatment in the dedicated article on the Sino-Japanese reading strata.
Step 3: use the kanji's phonetic component (形声)
How phonetic series predict the on-reading
Keisei (形声) characters are phono-semantic compounds and one of the traditional 六書 categories. They pair a semantic element ("radical") with a phonetic element that signals the character's on-reading.215 In Shirakawa's classification, 形声 characters account for 61% of the 2,136 jōyō kanji overall. That breaks down as 49% of the 1,026 educational (kyōiku) kanji and 72% of the remaining 1,110 more advanced jōyō characters.16 Pedagogy sources report broader estimates of 70–90% depending on the inventory and classification scheme.15
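The overall figure follows from the two sub-inventories as a weighted average. A quick check in Python, using only the counts and percentages quoted above (the variable names are illustrative):

```python
kyoiku, advanced = 1026, 1110        # educational kanji, remaining jōyō kanji
total = kyoiku + advanced            # should equal the 2,136 jōyō kanji

# Estimated number of 形声 characters in each group, from the quoted shares.
keisei = 0.49 * kyoiku + 0.72 * advanced
overall = keisei / total

print(total, f"{overall:.1%}")       # the share rounds to 61%
```

Because the advanced group is slightly larger and has the higher share, the overall figure sits a little above the midpoint of 49% and 72%.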
The 青 sei ("blue/green") series is one of the most useful series for learners. The on-reading sei (sometimes the alternate shō) surfaces across compounds that contain 青 on the right-hand or enclosed side.154
| Kanji | Composition (semantic + phonetic) | On'yomi | Verified compound |
|---|---|---|---|
| 清 | 氵(water) + 青 | sei, shō | 清潔 seiketsu |
| 晴 | 日(sun) + 青 | sei | 晴天 seiten |
| 精 | 米(rice) + 青 | sei, shō | 精神 seishin |
| 請 | 言(speech) + 青 | sei, shin | 申請 shinsei |
| 静 | 青 + 争 | sei, jō | 静止 seishi |
| 情 | 忄(heart) + 青 | jō, sei | 感情 kanjō |
清潔3
"cleanliness, hygiene"
晴天3
"clear weather"
When the phonetic component lies
Sounds changed over the centuries between the original Middle Chinese borrowing and modern Sino-Japanese. As a result, a given phonetic element can carry different on-readings in different characters. The learner verifies, never trusts blindly.1115
A clean counter-example is 寺 ji as a phonetic component. The element predicts ji inside 持 ji ("hold") and 時 ji ("time"), but inside 詩 it shifts to shi ("poem").15 The first guess is still worth making. The phonetic match is evidence, not proof, and a quick verification step catches the divergent cases.
Hamilton's rule of thumb is that the phonetic gives a "first guess" with an estimated 50–70% success rate on jōyō kanji, depending on the series.15 Treat a phonetic match as a strong lead that survives or fails on the second check.
A 50–70% success rate means the phonetic component is right about as often as a coin flip weighted in the learner's favor. Always confirm the predicted on-reading against the kanji's dictionary entry or against a compound the learner already knows. Never overwrite a known reading on the strength of a phonetic match alone.15
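The guess-then-verify habit can be sketched as follows. The tables contain only the 青 and 寺 examples from this section, and `guess_and_verify` is an illustrative name; a real tool would draw the attested readings from a dictionary.

```python
# Phonetic element -> the on-reading it suggests as a first guess.
SERIES_GUESS = {"青": "sei", "寺": "ji"}

# Kanji -> (phonetic element, on-readings attested in the dictionary).
KANJI = {
    "清": ("青", ["sei", "shō"]),
    "晴": ("青", ["sei"]),
    "情": ("青", ["jō", "sei"]),
    "持": ("寺", ["ji"]),
    "時": ("寺", ["ji"]),
    "詩": ("寺", ["shi"]),
}


def guess_and_verify(kanji):
    """Return (first guess from the phonetic, whether the dictionary confirms it)."""
    phonetic, attested = KANJI[kanji]
    guess = SERIES_GUESS[phonetic]
    return guess, guess in attested


for k in KANJI:
    print(k, guess_and_verify(k))
```

For 詩 the guess ji fails verification, which is the cue to keep the dictionary reading shi and discard the phonetic lead.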
Cross-link to the six-categories article
The 形声 category and the other five categories are covered in the dedicated article on the traditional 六書 typology.
Step 4: check for rendaku on the second element
When rendaku fires
Rendaku (連濁, "sequential voicing") is a sound change in compounds. A voiceless initial consonant of the second element becomes voiced (k → g, s → z, t → d, h → b).1718
Rendaku applies productively to native (wago) second elements. In practice, that means mostly kun+kun compounds and the jūbako (on+kun) pattern, when the kun-read second element starts with a voiceless obstruent.1718
Sino-Japanese (kango) second elements generally resist rendaku, which is why pure on+on compounds usually keep the second element's underlying voiceless onset.17182
花火3
"fireworks"
手紙19
"letter"
朝日3
"morning sun"
When rendaku is blocked
Four blockers cover most cases.
- Lyman's Law: rendaku is blocked when the second element already contains a voiced obstruent (g, z, d, b) elsewhere. This is the strongest blocker identified in the literature.171820
- Sino-Japanese morphology: kango second elements resist rendaku, which is why on+on compounds keep their voiceless onsets.1718
- Recent loanwords: gairaigo second elements almost never undergo rendaku.18
- Dvandva ("A and B") compounds: coordinative compounds, where the two elements are joined as equals rather than arranged as modifier and head, tend not to voice.18
The textbook contrast for dvandva blocking is 山川 yamakawa ("mountains and rivers", coordinative, no rendaku) versus 山川 yamagawa ("mountain river", modifier-head, with rendaku). The Vance and Irwin (2016) volume is the safest reference for the direction of this contrast.18
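The voicing rule and its blockers can be sketched on Hepburn romaji. This is a simplified illustration, not a full model: the function name and flags are hypothetical, treating "j" as a voiced obstruent is an assumption about the romanization, and real rendaku has lexical exceptions (朝日 asahi stays unvoiced), so the output is a guess to verify.

```python
# Hepburn onsets, longest first so "sh", "ch", "ts" are tried before "s", "t".
VOICING = [("sh", "j"), ("ch", "j"), ("ts", "z"),
           ("k", "g"), ("s", "z"), ("t", "d"), ("h", "b"), ("f", "b")]
VOICED_OBSTRUENTS = set("gzdbj")


def join_compound(first, second, *, second_is_wago=True, coordinative=False):
    """Join two romaji elements, voicing the second onset when nothing blocks it."""
    # Blockers: kango or loanword second element, or an "A and B" compound.
    if not second_is_wago or coordinative:
        return first + second
    # Lyman's Law: a voiced obstruent already in the second element blocks voicing.
    if any(c in VOICED_OBSTRUENTS for c in second):
        return first + second
    for voiceless, voiced in VOICING:
        if second.startswith(voiceless):
            return first + voiced + second[len(voiceless):]
    return first + second


print(join_compound("hana", "hi"))                     # 花火
print(join_compound("yama", "kaze"))                   # 山風, blocked by z
print(join_compound("yama", "kawa"))                   # modifier-head
print(join_compound("yama", "kawa", coordinative=True))  # "mountains and rivers"
```

Checking the blockers before voicing mirrors the order of the list above: the voicing step only runs once none of them applies.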
山風17
"mountain wind"
Cross-link to the rendaku deep dive
The full rendaku picture, including the productive cases and the long tail of lexical exceptions, is covered in the dedicated article on sequential voicing in kanji compounds.
Step 5: recognize the kun+kun pattern
Signals of a native compound
Kun+kun compounds (also called wago compounds) typically belong to concrete, native-feeling semantic fields: nature, body parts, daily life, and kinship.112
A short native suffix attached to a one-kanji base is a strong kun+kun signal. The endings 〜火 -bi, 〜道 -michi, 〜月 -zuki, 〜人 -hito, and 〜日 -hi all tend to take kun first elements.312
山道3
"mountain path"
月見3
"moon-viewing"
朝日3
"morning sun"
花火3
"fireworks"
Why no okurigana does not mean on'yomi
The "two kanji, no kana between them, no okurigana = on+on" heuristic fails on kun+kun compounds. By definition, these compounds pack two native readings into kanji without okurigana on the second element.38
Okurigana absence is a necessary but not sufficient signal for on+on. The learner must also weigh semantic field, vocabulary register, and the presence or absence of rendaku.3
When kun+kun is also a name
Many surnames and place names are kun+kun compounds whose kanji carry their everyday native readings: 田中 Tanaka, 山田 Yamada, 中野 Nakano. The same surface compounds can also lexicalize as common nouns elsewhere.53
For proper nouns, an additional layer of irregular readings called nanori (名乗り) is sanctioned by the Jōyō Kanji List appendix and the Family-Name Kanji table. Nanori extends the kun inventory specifically for names.5
Cross-link to the names article
Proper-noun readings, including the nanori inventory and the place-name conventions that overlap with it, are covered in the dedicated article on name-only kanji readings.
Step 6: recognize the two mixed patterns
Jūbako (on + kun)
Jūbako-yomi (重箱読み) names the pattern in which the first kanji uses its on'yomi and the second uses its kun'yomi.221422 重箱 itself is the archetype that names itself: 重 jū is on'yomi, and 箱 (hako → bako with rendaku) is kun'yomi.422
重箱22
"tiered lacquer food box"
役場4
"town office"
額縁4
"picture frame"
Yutō (kun + on)
Yutō-yomi (湯桶読み) names the mirror pattern: kun'yomi first, on'yomi second.22142425 湯桶 itself is the archetype that names itself: 湯 yu is kun'yomi, and 桶 tō is on'yomi.214
Yutō-yomi is the less frequent of the two mixed patterns. Jūbako outnumbers yutō in standard reference dictionaries' worked lists. This reflects the historical asymmetry in which Sino-Japanese first elements naturalized more readily as modifiers than as heads.124
場所24
"place, location"
夕刊25
"evening newspaper edition"
How to detect a mixed reading before you confirm it
Three structural clues can signal a mixed pattern before the learner reaches a dictionary.
- A concrete, native-feeling first kanji (湯, 場, 手, 見, 夕) paired with an abstract Sino-Japanese head is a yutō clue.4
- A Sino-Japanese first element with a homely native second (重箱, 台所, 本屋) is a jūbako clue.4
- More generally, a register mismatch between the two halves is the first signal that the compound is not pure on+on or kun+kun: kango formality on one side, wago concreteness on the other.14
Step 7: stop guessing when it is jukujikun
Signals that the compound is jukujikun
A jukujikun (熟字訓) is a reading assigned to a kanji compound as a whole. The reading shares no segment with the individual kanji's standard on or kun readings.5627
The giveaway is a mismatch in surface shape. The spoken form is a short, native-sounding word, but the written form is two or three Chinese-looking characters.56
大人27
"adult"
今日6
"today"
田舎5
"the countryside"
紅葉5
"maple leaf; autumn-colored leaves"
明日5
"tomorrow"
Why the look-it-up branch is the right call
Jukujikun readings are stored with the compound as whole-word readings. They are not synthesized from its parts. The Jōyō Kanji List appendix of 116 jukujikun entries (常用漢字表 付表) is the authoritative finite inventory of the modern standard set. It exists precisely because these readings cannot be derived by rule.5
Spending more than a few seconds trying to predict a jukujikun reading wastes time and reinforces wrong intuitions. The correct branch is to recognize the off-flowchart status and consult a dictionary.53
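In code terms, the look-it-up branch is a whole-word table consulted before any rule runs. The sketch below is illustrative: the table holds only the five examples from this section with one reading each (some of these compounds also have other readings in other contexts), and `read_compound` and its `guess_by_rule` callback are hypothetical names.

```python
# Whole-word readings stored with the compound, not built from its kanji.
JUKUJIKUN = {
    "大人": "otona",
    "今日": "kyō",
    "田舎": "inaka",
    "紅葉": "momiji",
    "明日": "ashita",
}


def read_compound(compound, guess_by_rule):
    """Try the stored whole-word reading first; otherwise fall back to a rule."""
    whole = JUKUJIKUN.get(compound)
    if whole is not None:
        return whole, "jukujikun (stored whole)"
    return guess_by_rule(compound), "rule-based guess, verify in a dictionary"


# A stand-in rule that always answers with a fixed on+on guess.
print(read_compound("大人", lambda c: "daijin"))
print(read_compound("学校", lambda c: "gakkō"))
```

The rule-based guess for 大人 is never consulted, because the table lookup succeeds first.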
Cross-link to the dedicated treatment
The full inventory of jukujikun readings and the appendix's role in fixing the modern standard set are covered in the dedicated article on whole-word kanji readings.
The ateji boundary
Ateji (当て字) and jukujikun both sit outside the four-pattern flow chart. They break predictability for opposite reasons.284
Ateji chooses kanji for sound with disregard for meaning. The kanji's readings are honored, but the semantic match is loose or absent. Examples include 寿司 sushi (the kanji mean "long life" and "administer" but are picked for their sounds) and 珈琲 kōhī ("coffee," purely phonetic).284
Jukujikun chooses kanji for meaning with disregard for sound. The semantic match is exact (大人 = "big person" = adult), but the reading is the native word, not the sum of the kanji's readings.628
Ateji often appears in foreign loans, brand names, and pre-modern phonetic transcriptions, while jukujikun is concentrated in core native vocabulary listed in the Jōyō appendix.5284
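The "opposite reasons" contrast reduces to two yes/no questions about a compound. A minimal sketch, with the function name and flag names chosen here for illustration:

```python
def off_chart_kind(sound_from_kanji: bool, meaning_from_kanji: bool) -> str:
    """Classify a compound by which side of each kanji the spelling honors."""
    if sound_from_kanji and meaning_from_kanji:
        return "regular (stays on the flow chart)"
    if sound_from_kanji:
        return "ateji"        # kanji picked for sound, meaning loose or absent
    if meaning_from_kanji:
        return "jukujikun"    # kanji picked for meaning, reading is the native word
    return "unclassified"


print(off_chart_kind(sound_from_kanji=True, meaning_from_kanji=False))   # 寿司
print(off_chart_kind(sound_from_kanji=False, meaning_from_kanji=True))   # 大人
```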
Cross-link to ateji
The phonetic-kanji-selection process and its conventional inventory are covered in the dedicated article on sound-first kanji choice.
The full flow chart, in one place
The seven-step procedure, condensed
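Here is the full procedure, in the order a learner should run it on an unfamiliar two-kanji compound.
- Step 1: default to on+on.
- Step 2: if a kanji has several on-readings, pick by domain, with kan-on as the first guess.
- Step 3: use the phonetic component (形声) as a first guess at an unfamiliar on-reading, then verify it.
- Step 4: check for rendaku on a native second element.
- Step 5: reclassify as kun+kun when the semantic field is concrete and native.
- Step 6: check for a mixed pattern, jūbako (on + kun) or yutō (kun + on), when the two halves mismatch in register.
- Step 7: stop guessing and consult a dictionary when the compound is jukujikun.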
The flow is iterative, not one-shot. A learner often enters at Step 1, detects a register or semantic mismatch at Step 5 or 6, and loops back through Step 4 to reapply rendaku on the now-reclassified compound.3
How to use it on a real page
Two walk-throughs show how that iteration works.
朝食 chōshoku ("breakfast"). Step 1 hypothesis: use the on+on default and predict on-readings for both kanji.13 Step 2 check: 朝 has chō (kan-on), and 食 has shoku (kan-on). The kan-on default holds.13
Step 3 check: 食 stands here as a whole character rather than as a phonetic component, so the phonetic check adds no new evidence, and shoku is consistent.13 Steps 4–7 produce no detour. Read it as chōshoku; the dictionary verifies.13
朝食13
"breakfast"
手紙 tegami ("letter"). Step 1 hypothesis: the on+on default would predict shushi (手 shu + 紙 shi), a non-word a native speaker would not recognize.19 The detour starts immediately because the meaning belongs to daily life. Drop to Step 5 and reclassify it as kun+kun, te + kami.19
Step 4, the rendaku check, applies to the wago second element (k → g), giving tegami.1719 The dictionary verifies tegami.19
手紙19
"letter"
The walk-throughs show the iterative shape of the flow. Each branch supplies evidence for or against the current guess, and revising the hypothesis costs nothing because the earlier checks are still valid.3
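The two walk-throughs can be replayed as a small trace. This is a sketch under stated assumptions: the reading tables are toy data for the four kanji involved, `native_field` stands in for the learner's own judgment about the semantic field, and only Steps 1, 5, and 4 are modeled.

```python
# Toy reading tables; real entries would come from a dictionary.
ON = {"朝": "chō", "食": "shoku", "手": "shu", "紙": "shi"}
KUN = {"手": "te", "紙": "kami"}
VOICE = {"k": "g", "s": "z", "t": "d", "h": "b"}


def predict(compound: str, native_field: bool = False):
    """Return (guess, trace) for a two-kanji compound.

    native_field is the learner's judgment that the meaning belongs to a
    concrete, everyday domain (the Step 5 signal).
    """
    a, b = compound
    guess = ON[a] + ON[b]                            # Step 1: on+on default
    trace = [("Step 1", guess)]
    if native_field:
        first, second = KUN[a], KUN[b]               # Step 5: reclassify as kun+kun
        trace.append(("Step 5", first + second))
        blocked = any(c in "gzdb" for c in second)   # Lyman's Law
        if not blocked and second[0] in VOICE:
            second = VOICE[second[0]] + second[1:]   # Step 4: rendaku
        guess = first + second
        trace.append(("Step 4", guess))
    return guess, trace


print(predict("朝食"))
print(predict("手紙", native_field=True))
```

The trace keeps the discarded Step 1 guess alongside the revised one, so a wrong first guess remains visible as evidence of where the default failed.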
Good to know
The 75–80 percent figure is a guide, not a guarantee
The widely cited 75–80% on+on share for two-kanji compounds is a corpus average, not a promise. Technical, legal, and news writing skew higher because they use more kango. Literary prose and conversation skew lower because they use more wago and mixed compounds.19710 A learner who reads manga or fiction will hit kun+kun and mixed compounds far more often than one who is studying for a JLPT vocabulary list. Calibrate expectations to the input source.9
Three-kanji and four-kanji compounds change the math
Three-kanji compounds usually parse as a 2+1 or 1+2 morphological structure. The four-pattern flow applies to each chunk independently. 図書館 parses as (図書) + 館, all on; 大食漢 parses as (大食) + 漢, all on.13
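Chunking a three-kanji compound can be sketched as a lookup against known two-kanji words. The lexicon below is a hypothetical three-word sample; 小学校 is an extra 1+2 example added here for illustration and does not come from the text above.

```python
# Hypothetical lexicon of two-kanji words the learner already knows.
KNOWN_TWO_KANJI = {"図書", "大食", "学校"}


def chunk(compound: str):
    """Split a three-kanji compound as 2+1 or 1+2 around a known two-kanji word."""
    if len(compound) != 3:
        raise ValueError("expected exactly three kanji")
    if compound[:2] in KNOWN_TWO_KANJI:      # 2+1 structure
        return compound[:2], compound[2]
    if compound[1:] in KNOWN_TWO_KANJI:      # 1+2 structure
        return compound[0], compound[1:]
    return None                              # no known chunk: look it up


print(chunk("図書館"))   # 2+1
print(chunk("大食漢"))   # 2+1
print(chunk("小学校"))   # 1+2
```

Once the chunks are identified, the four-pattern flow runs on the two-kanji chunk and on the single kanji separately.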
Four-character compounds called yojijukugo (四字熟語) almost always read on+on+on+on with classical Chinese word order. That is because they are usually direct quotations or paraphrases from Classical Chinese sources.112
一石二鳥3
"kill two birds with one stone"
Place names and surnames are their own ecosystem
Place-name (chimei) and surname (myōji) readings draw heavily on nanori, a category of kanji readings sanctioned only for proper nouns. They also draw on historical reading conventions that pre-date the modern on/kun reform.53
Examples include 中野 Nakano, 新宿 Shinjuku, 大阪 Ōsaka, 神戸 Kōbe, and 日本橋 Nihonbashi (on+on+kun with rendaku).3 The four-pattern flow chart is built on the open-class common-noun lexicon and will misfire on proper nouns. Treat names as their own lookup category.53
Reading a kun+kun compound with on'yomi because it has no okurigana
A learner who treats both kanji in 山道 as on'yomi just because the compound has no okurigana will produce sandō instead of the everyday reading. That reading is yamamichi (kun+kun, with concrete native semantics overriding the on+on default). The implication runs one way only: on+on implies no okurigana, but no okurigana does not imply on+on. Kun+kun compounds are a recognized exception class.38
山道3
"mountain path"
Forgetting that on+on blocks rendaku
A learner who reaches for rendaku on every compound will incorrectly predict 学校 as a voiced-second-element form. The correct reading is gakkō, with no rendaku. Rendaku is a wago-specific process, and the kango second element 校 resists it.1718
学校13
"school"
Trying to predict a jukujikun reading
A learner who reads 大人 as daijin, tainin, or ōhito is applying the four-pattern flow to a compound outside that flow. The correct reading is otona, a jukujikun lexicalized to the whole compound. The reading is not synthesizable from the parts, and the compound belongs in the look-it-up bucket.5627
大人27
"adult"
Treating ateji as on+on
A learner who reads 寿司 as jushi or kotobuki-tsukasa is reaching for each kanji's most familiar reading in a compound where the kanji were chosen for sound, not meaning. The correct reading is sushi, an ateji form: phonetic kanji selection for an existing native or borrowed word. What ateji sets aside is the kanji's meanings, so the reading can hinge on a less common sound value, such as su for 寿, that the default guess would not reach.284
寿司4
"sushi"
The 75–80 percent figure as a confidence anchor
A useful default stance for an unfamiliar compound is this: assume on+on, kan-on, and no rendaku. The learner will be right roughly four times out of five on bare two-kanji compounds in a news article. The misses are concentrated in semantically native or register-mismatched compounds that flag themselves at Step 5 or Step 6.178 Anchoring the default keeps the learner from overfitting to the exception classes and freezing on every compound.
Pure on+on compounds carry a kango formality register
Kango is the prestige layer of the Japanese lexicon and the layer often used in formal writing. Substituting a wago synonym shifts the register noticeably. 食事 shokuji (kango) reads as neutral or formal "meal", while 食べ物 tabemono (wago) reads as everyday "food".112 The on+on / kun+kun split is not only a sound prediction. It is also a register signal.
Why jūbako outnumbers yutō
Sino-Japanese morphemes were borrowed predominantly as modifiers, meaning left-hand elements that modify native heads. Because Japanese is head-final, the kango modifier + wago head order fits the language's structure, producing jūbako. The mirror order (wago modifier + kango head) is structurally available but historically rarer. That is why yutō remains the minority pattern of the two mixed readings.14
The Jōyō appendix is a finite inventory of the standard jukujikun set
The 2010-revised 常用漢字表 lists 116 jukujikun and ateji forms in its appendix (付表). This is the closed list that a JLPT learner is expected to know on sight. The rest are learned ad hoc as the learner encounters them.5 Knowing that the appendix exists, and that it is finite, makes the look-it-up branch feel less open-ended.
Why your first guess is worth making even when wrong
Active prediction followed by verification produces stronger memory traces than passive lookup, even when the prediction is wrong. The prediction error itself is informative. A wrong guess identifies which branch of the flow chart the learner's intuition misfired on.3 The pedagogy literature on the testing effect (retrieval practice) supports this framing for vocabulary learning generally. The kanji-compound case is a domain-specific application.3
See also
- Jukugo (熟語): How Kanji Combine to Form Japanese Words
- The Four Jukugo Construction Patterns
- Rendaku: When K Becomes G in Compound Words
- The Jōyō Kanji List (常用漢字): The 2,136-Character Set Explained