Japanese Focus Prosody: Pitch Widening, Contrastive は, and Information Structure
Japanese focus prosody is the system that marks the emphasized part of a sentence by pitch shape, not by loudness as English often does. It widens the pitch range on the focused element and compresses everything that follows it.12 If you already control lexical pitch accent and basic sentence intonation, focus prosody is the next layer up. It explains why a Japanese sentence can carry emphasis without sounding louder, and why Japanese speech can seem flat to an ear that is listening for a stressed syllable.
Overview
Focus prosody sits on top of the two prosodic layers covered elsewhere in the pronunciation track: lexical pitch accent on each word, and boundary intonation at the right edge of phrases and utterances. This article treats focus as a phrase-level property. It reshapes the pitch range of the accentual phrase carrying new or contrastive information, without inserting any new tones.342
What "focus" means in linguistics, in one paragraph
In Japanese linguistics, 焦点 (shōten, "focus") names the part of a clause that conveys new information or contrasts with alternatives.45 Information focus is the new fact that answers a wh-question. Contrastive focus picks one element out against alternatives.
Both kinds are realized prosodically in Tokyo Japanese, and production studies find contrastive focus reliably the more prominent of the two.12 A short question-answer pair makes the information-focus case concrete.
何を買いましたか。1
"What did you buy?"
パンを買いました。1
"I bought bread."
The new information in the answer is パン. That is the focus, and the prosody of the answer should put pitch-range expansion on パン and compress the rest of the sentence into a low tail.
Why Japanese seems flat to English ears
English often combines loudness, length, and pitch into a single perceived stress on a focused word. Japanese keeps these dimensions more separate. Its main focus cue is pitch-range geometry, the size of the F0 excursion across a whole phrase, rather than localized loudness.341
Tokyo Japanese has no word stress in the English sense. The only word-level pitch event is a single H-to-L drop whose location is lexically fixed (the canonical 端 / 箸 / 橋 contrast), so emphasis cannot be realized by "stressing" a syllable the way English does.67
Even native Japanese listeners identify focus from F0 alone with surprisingly low accuracy: 31% for final focus, up to 57% for neutral or broad focus. Focus prosody in Japanese is real, but it operates within a narrower pitch geometry than the English ear is calibrated for.8
In experimental work comparing native and non-native speakers, L2 learners of Japanese "did not rely systematically on f0 nor duration cues" when identifying focus. That finding is the direct experimental counterpart of the "Japanese sounds flat" perception, and it is the gap the rest of this article aims to close.9
Two prosodic layers, again
Lexical pitch accent is encoded as a per-word H*+L drop. Its location distinguishes lexical items and is fixed in the dictionary.610 Phrase- and utterance-level intonation is encoded as boundary tones (H%, L%, LH%, HL%) at the right edge of the accentual phrase and the intonational phrase, independent of the lexical drops inside the phrase.31112
Focus prosody rides on both layers. It does not insert new tones; it manipulates the pitch range of the accentual phrase that carries the focused element and compresses the range of all following accentual phrases.321
Two examples make the layering visible. The first carries an accented word whose lexical H-to-L survives any focus pattern; the second carries an unaccented (heiban) word with no internal drop, so focus has to show up in range and post-focal compression.
雨が降りました。6
"It rained."
飴が好きです。6
"I like candy."
The two-component focus pattern
Focus prosody in Tokyo Japanese has two reliable acoustic correlates: on-focus pitch-range expansion and post-focal compression. The same string can carry either or both, depending on lexical accent type and discourse context. Production studies document each component independently.12
Component 1: on-focus pitch-range expansion
The focused word keeps its lexical accent shape, but the H*+L excursion is amplified: the peak is higher and the following low is lower. This expands the vertical pitch range of the accentual phrase that carries the focus.12 Lee and Xu's production study reports consistent expansion of F0 range on the focused item across speakers and focus positions in accented stimuli.1
The size of the expansion depends on lexical accent type. Accented words can expand the H–L drop directly. Heiban (unaccented) words have no internal drop to amplify, so the on-focus expansion shows up mostly as a higher overall pitch register on the focused accentual phrase rather than as a sharper drop.128
Lee, Chiu, and Xu's perception study confirms the production picture from the listener side. Focus identification was more accurate when the focused words were accented than when they were unaccented. All-accented sequences yielded the highest focus-identification accuracy (56%), compared with unaccented-accented-unaccented sequences (32%).8
The same sentence spoken under broad focus and then under narrow focus on the subject makes the expansion audible.
田中さんがパンを買いました。1
"Tanaka bought bread." (broad focus, neutral range)
田中さんがパンを買いました。1
"TANAKA bought bread." (range expansion on 田中さん)
Component 2: post-focal compression
Everything after the focused element is compressed into a narrow, low pitch band. Sugahara's dissertation labels this the "post-focus compression" of F0 movement and shows that it is the main cue distinguishing focused from non-focused versions of the same string.2
Maekawa's earlier production data on Tokyo Japanese wh-questions reported the same pattern. Even when the focused wh-phrase itself showed no F0 rise, the post-focal reduction was consistently observed. This supports treating post-focal compression as the more reliable of the two components.213
Lee and Xu's quantitative analysis finds that post-focus F0-range compression appears only in accented stimuli. In unaccented stimuli, focus is marked by minimum-F0 lowering. Post-focal mean F0 is significantly lower than in the neutral-focus baseline.114
English-trained listeners tend to anchor on a localized prominence on the focused word and pay little attention to what follows it. Japanese flattens and lowers the post-focus tail, and that tail carries much of the signal. If you ignore the tail and listen only for an emphasized word, you will miss most of the focus information in a Japanese sentence.19
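One way to make the two components measurable is to express each accentual phrase's F0 range in semitones and compare a narrow-focus rendition against a neutral one. The sketch below is a minimal illustration. The function names and all F0 values are invented for the example and are not data from the studies cited above.

```python
import math

def semitone_range(f0_hz):
    """F0 range of one phrase in semitones: 12 * log2(max / min)."""
    voiced = [f for f in f0_hz if f > 0]  # treat 0 as an unvoiced frame
    if not voiced:
        raise ValueError("no voiced frames")
    return 12 * math.log2(max(voiced) / min(voiced))

def range_change(neutral, focused):
    """Per-phrase range difference (focused minus neutral), in semitones."""
    return {name: semitone_range(focused[name]) - semitone_range(neutral[name])
            for name in neutral}

# Invented F0 samples in Hz, one list per accentual phrase (not measured data).
NEUTRAL = {
    "Tanaka-san ga": [150, 190, 180],
    "pan o": [185, 150, 140],
    "kaimashita": [145, 135, 120],
}
FOCUS_ON_SUBJECT = {
    "Tanaka-san ga": [150, 230, 200],
    "pan o": [150, 140, 135],
    "kaimashita": [135, 130, 125],
}

for phrase, delta in range_change(NEUTRAL, FOCUS_ON_SUBJECT).items():
    print(f"{phrase}: {delta:+.1f} st")
```

A positive difference on the focused phrase and negative differences on the phrases after it correspond to on-focus expansion and post-focal compression. Semitones are used here so that ranges stay comparable across speakers with different baseline pitch.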
Shifting the focus along the sentence moves the boundary between expanded and compressed regions.
田中さんがパンを買いました。1
"TANAKA bought bread." (post-focal compression on パンを買いました)
田中さんが学校でパンを買いました。1
"Tanaka bought bread AT SCHOOL." (post-focal compression on パンを買いました)
田中さんが学校でパンを買いました。1
"Tanaka bought BREAD at school." (post-focal compression on 買いました only)
What about pre-focal material?
Pre-focal accentual phrases are largely unchanged in F0 range. Lee and Xu find no consistent expansion of the pre-focal region across focus positions.1 Igarashi's chapter notes that pre-focal accentual phrases retain their normal cumulative downstep and lexical-accent shapes. The focus effect is localized to the focused phrase and propagates only rightward as post-focal compression.4
Some experimental work reports mild raising of the pre-focal region for certain focus positions, notably penultimate focus, but Lee and Xu treat the effect as position-dependent rather than as a general property of pre-focal material.1
田中さんがパンを買いました。1
"Tanaka bought BREAD." (pre-focal 田中さんが keeps its neutral shape; only the post-focus tail compresses)
A worked example with three focus placements
The standard worked example in the production literature is a transitive sentence with a clear subject-locative-object-verb order. The same sentence is recorded under three focus conditions, with focus on each major constituent in turn.12 In each case the focused phrase keeps its lexical accent pattern but shows vertical range expansion. Everything to its right is compressed into a narrow low band, and everything to its left is unchanged.12
田中さんが学校でパンを買いました。1
"TANAKA bought bread at school." (wide range on 田中さん, compressed 学校でパンを買いました)
田中さんが学校でパンを買いました。1
"Tanaka bought bread AT SCHOOL." (wide range on 学校で, compressed パンを買いました)
田中さんが学校でパンを買いました。1
"Tanaka bought BREAD at school." (wide range on パン, compressed 買いました)
An accented focused word can amplify its existing H–L drop. A heiban focused word has no drop to amplify, so the on-focus cue is reduced. The post-focal compression cue then carries proportionally more of the listener's identification work.128
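The three placements follow one rule, which can be written down as a small labeling function. This is a sketch of the descriptive pattern only. The function name and the region labels are illustrative choices, and the segmentation of the sentence into four phrases is an assumption made for the example.

```python
def label_regions(phrases, focus_index):
    """Label each phrase by its position relative to the focused phrase."""
    if not 0 <= focus_index < len(phrases):
        raise IndexError("focus_index out of range")
    labels = []
    for i, phrase in enumerate(phrases):
        if i < focus_index:
            region = "pre-focal: unchanged"
        elif i == focus_index:
            region = "focus: range expanded"
        else:
            region = "post-focal: compressed"
        labels.append((phrase, region))
    return labels

# Assumed phrase segmentation of the worked example sentence.
SENTENCE = ["田中さんが", "学校で", "パンを", "買いました"]

for focus in range(3):  # focus on subject, locative, object in turn
    for phrase, region in label_regions(SENTENCE, focus):
        print(f"{phrase}\t{region}")
    print()
```

Moving `focus_index` rightward shrinks the compressed region and grows the unchanged one, which is the boundary shift the three recordings illustrate.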
Contrastive は vs. thematic は
The particle は supports two readings. Their semantics have been distinguished since at least Kuno's 1973 treatment, and their prosodic correlates are formulated explicitly in Heycock's Oxford Handbook chapter.155 Thematic は is the unmarked ground-marker. Contrastive は is melodically prominent and carries the same two-component focus pattern described above.
Thematic は: the unmarked ground-marker
Thematic は picks an entity that is already established in the discourse, generic or anaphoric, and marks it as the ground against which the rest of the sentence (the comment) is asserted.15516 Kuno's foundational analysis states that themes marked by は must be either generic or anaphoric. Non-anaphoric, non-generic referents are typically marked by が, not by は.15
Prosodically, thematic は is unmarked. The は-phrase carries its ordinary accentual-phrase shape. The comment that follows carries the focus prosody appropriate to the discourse, broad focus by default.52 Heycock characterizes non-contrastive (thematic) topics as showing neither the on-topic pitch peak nor the radical post-topic lowering that contrastive topics show.5
私は学生です。5
"I'm a student."
田中さんは医者です。5
"Tanaka is a doctor."
Contrastive は: pitch boost and post-particle drop
Contrastive は forces a sharp pitch peak on the は-marked element and a deep drop on what follows. Heycock formulates the prosodic signature as "the presence of a prominent high-pitch accent on some part of themselves and a radical lowering of the pitch accent of the phrases following them."5
This is structurally the same two-component pattern as focus prosody more generally: on-focus expansion plus post-focal compression, realized on the は-marked constituent.52 Kuno's 1973 treatment already separated contrastive は (alternative-evoking, often translatable with "at least", "though", or implicit "but…") from thematic は.155
The same written は supports both readings. Prosody and context are the only consistent disambiguators when mapping writing to speech.516
私は学生です。5
"As for ME, I'm a student (whatever the others are)."
寿司は食べません。5
"Sushi, I don't eat (other things, maybe)."
田中さんは来ます。5
"TANAKA is coming (but others, who knows)."
A test you can apply
Heycock's diagnostic is the two-component prosodic test: would a native speaker raise the pitch range on the は-phrase and drop everything that follows it? If yes, the reading is contrastive. If no, the reading is thematic.5
Oshima's complementary semantic test is the "as for X, at least…" paraphrase. A sentence that admits an "at least" or "as for" hedge is contrastive; a sentence that does not is thematic.16 Both tests pull on the same underlying contrast: contrastive は evokes alternatives, thematic は does not.1615
雨は降りませんでした。5
"It didn't rain (though something else may have happened)."
The next sentence is ambiguous in writing. The thematic reading is "Rain, I hate it," with no alternative evoked. The contrastive reading is "RAIN, I hate (other weather, OK)," with a pitch peak on 雨は and lowering on 嫌いです.
雨は嫌いです。5
"I hate the rain." (ambiguous between thematic and contrastive without prosody)
When は contrasts something not in the sentence
The alternative that contrastive は evokes does not need to be stated out loud. The contrast can be implicit, and the discourse fills in the unspoken alternative.1516 This is the source of the "as for X, at least…" reading. The speaker marks X as one of several possible alternatives without naming the others, and the hearer infers the contrast set from context.16
Oshima documents that the implicit-alternative reading is the majority case of は in actual corpus use, supporting his title-claim that は "most often does not mark a topic" in the unmarked thematic sense.16
ビールは飲みます。16
"BEER I drink (wine, sake, who knows)."
今日は早く帰ります。16
"TODAY I'm going home early (unlike usual)."
Focus and the sentence-final particles
Focus prosody shapes the body of the utterance. Boundary tones shape the right edge. Because the two cues target different parts of the string, they can layer on the same sentence without interfering with each other.
Why focus and final-particle tunes don't fight
Focus prosody operates on the body of the utterance. It expands the range of the focused accentual phrase and compresses every accentual phrase after it. Boundary tones operate at the right edge, on the last mora or two of the accentual or intonational phrase.11124
The two layers are independent in the X-JToBI transcription system. The same string can carry any combination of focus prosody and final boundary tone (L%, LH%, H%, HL%) without their interfering with each other's primary cues.1112 The TUFS pronunciation module makes the same point for the question rise specifically: the rise is realized on the final mora only, while maintaining the accent patterns of the words in the sentence.17
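Because the two layers are independent, an utterance's prosodic specification can be pictured as two separate fields. The following sketch enumerates the combinations for a three-phrase sentence. The class and field names are hypothetical and are not X-JToBI notation, apart from the four boundary-tone labels listed above.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

BOUNDARY_TONES = ("L%", "LH%", "H%", "HL%")

@dataclass(frozen=True)
class ProsodicSpec:
    focus_index: Optional[int]  # None means broad focus
    boundary_tone: str          # realized at the right edge only

def all_specs(n_phrases):
    """Every pairing of a focus placement with a final boundary tone."""
    focus_options = [None, *range(n_phrases)]
    return [ProsodicSpec(f, t) for f, t in product(focus_options, BOUNDARY_TONES)]

specs = all_specs(3)
print(len(specs))  # 4 focus options x 4 tones = 16
```

Keeping the focus placement and the boundary tone in separate fields mirrors the claim that neither choice constrains the other.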
田中さんは来ましたよ。4
"TANAKA came." (focus on 田中さん; falling よ on a compressed tail)
田中さんは来ましたか。18
"Did TANAKA come?" (focus on 田中さん; LH% rises on か itself, post-focal compression on the body)
Focus before ね, よ, よね
When the focus falls earlier in the sentence, post-focal compression carries all the way to the sentence-final particle. The particle's own boundary tune (rise on ね, fall on よ, fused tune on よね) sits on top of a low, flat tail rather than on a neutral declination.412
The boundary pitch movement remains identifiable as a tonal event because it targets the final mora only, distinct from the compressed body of the utterance.111217 The practical consequence is that the particle's tune is unchanged in shape but realized at a lower absolute pitch than in the broad-focus version of the same sentence.4
田中さんが来ましたね。4
"TANAKA came, didn't he?" (small rise on ね sits on a compressed tail)
違いますよ。19
"That's WRONG." (focus on 違います; よ falls on a compressed tail)
Focus on the particle itself
The sentence-final particle can itself be the focused element in narrow contexts, most often when the speaker is overriding a prior claim. The よ particle in 違うよ can take its own prominence, with a pitch boost on the particle morpheme.194
Hirayama's account of rising declaratives notes that particles like よ vary their tune (yo-falling versus yo-rising) to convey distinct discourse moves on the same morpheme, which presupposes that the particle is a target for prosodic prominence rather than a tonally fixed segment.19
違うよ。19
"I'm telling you, it's wrong."
行きますよ。19
"I AM going."
Cross-reference the dedicated intonation pages
The dedicated intonation sibling pages lay out the boundary-tone inventory used throughout this section: the rises and falls on the final mora, the LH% on yes/no questions, and the L% on assertive よ. Two are already published: Japanese Sentence Intonation: Falls, Rises, ね, よ, よね, which covers the boundary-tone inventory and the politeness and discourse functions of the sentence-final particles; and Japanese Questions Without か: Rising Intonation and の, which covers the question rise on declarative-shaped strings. The X-JToBI labels referenced here are consistent with the inventories used there.1211
When focus is not pitch-boosted
Pitch boost is the prototypical focus cue, but it has exceptions. Two well-documented situations weaken or remove it: certain question types, and lexical-accent contexts where there is no H–L drop to amplify. In those situations Japanese uses secondary cues, including boundary insertion and segmental lengthening. Post-focal compression remains the more reliable signal.
The why-question case
Tomioka reports a surprising prosodic pattern in Japanese why-questions. The phrase that immediately follows a causal wh-phrase (the focus associate of なぜ or どうして) "can be considered as the focus associate without any focal prominence," contradicting the otherwise general claim that a focused phrase in Japanese receives a pitch boost.20
The earlier Maekawa 1991 production data, cited by Sugahara, reported the same effect for wh-questions more broadly: no significant F0-rise on the wh-phrase, while the post-focal reduction was consistently observed.2 Ishihara's ICPhS 2011 paper isolates the case where the wh-phrase is lexically unaccented and shows that on-focus expansion is reduced or absent there too. The post-focal compression cue still operates and remains the more reliable side of the two-component pattern.13
The take-away for learners is that pitch boost is the typical cue, but not an exceptionless one. A wh-question without the expected on-wh pitch rise is not a defective question. It is a known sub-pattern of focus prosody.2013
なぜ田中さんは来なかったの。20
"Why didn't Tanaka come?" (focus associate is 田中さんは, no pitch boost on it)
誰がパンを買いましたか。13
"Who bought bread?" (on-focus expansion on 誰が may be small; robust post-focal compression on パンを買いました)
Boundary insertion as a backup cue
Imai, Lee, and Xu's production study documents an "Edge-Reinforcing Strategy." When pitch-range expansion is unavailable or blocked (already-low pitch register, post-accentual context, dialectal mismatch), Tokyo-Japanese speakers signal focus through edge-reinforcing cues, including silence, segmental lengthening, and jaw-opening at prosodic boundaries.21
The study tested nine educated Tokyo-Japanese speakers producing genitive noun phrases under broad versus narrow focus. Acoustic measures included word duration and jaw-opening estimates. The sample is small, and the strategy as described has not yet been replicated outside that dataset, but speakers in the study reliably restructured prosodic domains through these boundary-based cues.21
If you listen only for pitch contour, you can miss focus entirely in stretches of speech where the F0-range cue is suppressed. A small prosodic break before or after the focused element is a separate, secondary cue.21 Earlier work converges from the F0 side: focus in unaccented stimuli is marked by minimum-F0 lowering rather than by an expanded HL excursion, showing that the system has more than one cue available.114
田中さんの本です。21
"It's Tanaka's book." (broad focus)
田中さんの本です。21
"It's TANAKA'S book." (narrow focus on 田中さん; lengthening on the genitive boundary, jaw-opening cue)
Downstep and accent type can mute the cue
On a heiban (unaccented) phrase there is no internal H–L drop to amplify, so the on-focus expansion cue reduces to a register raise rather than a sharper excursion. The post-focal compression and boundary cues become proportionally more important.128
Within a single intonational phrase, cumulative downstep lowers each subsequent accentual phrase relative to the previous one. An accented focus near the right edge of a long intonational phrase has less pitch headroom for expansion than one near the left edge.311
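The loss of headroom can be pictured with a toy model in which each accented phrase's peak sits a fixed fraction closer to the speaker's floor than the previous one. The geometric form, the 0.8 ratio, and the Hz values below are assumptions chosen for illustration, not figures from the cited studies, and the model treats every phrase as accented.

```python
def downstepped_peaks(first_peak_hz, floor_hz, ratio, n_phrases):
    """Peak F0 of successive accented phrases under a constant downstep ratio."""
    span = first_peak_hz - floor_hz
    return [floor_hz + span * ratio ** i for i in range(n_phrases)]

# Hypothetical speaker: 200 Hz initial peak, 100 Hz floor, downstep ratio 0.8.
FLOOR_HZ = 100.0
peaks = downstepped_peaks(200.0, FLOOR_HZ, 0.8, 4)

for i, peak in enumerate(peaks, start=1):
    print(f"phrase {i}: peak {peak:.1f} Hz, span above floor {peak - FLOOR_HZ:.1f} Hz")
```

In this toy setting the fourth phrase starts from roughly half the span available to the first, which is the sense in which a late focus has less room to expand.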
Lee, Chiu, and Xu's perception study quantifies the resulting asymmetry. Focus identification accuracy is lowest for final focus (31%) and highest for neutral or broad focus (57%), with accented focus consistently easier to identify than unaccented focus.8 The pedagogical consequence is that the same focus instruction, "widen the pitch on X," works well for an accented X near the left edge of a sentence but poorly for a heiban X near the right edge. In the harder case, the learner has to lean on post-focal compression and boundary cues rather than on range expansion.821
| Focus condition | Lexical accent type | Position | Identification accuracy |
|---|---|---|---|
| Neutral / broad focus | mixed | n/a | 57% |
| Narrow focus, accented sequence | accented | non-final | 56% |
| Narrow focus, unaccented-accented-unaccented sequence | mixed (target accented) | medial | 32% |
| Narrow focus, final | mixed | final | 31% |
Source: Lee, Chiu, and Xu's perception study.8
田中さんが歌を歌いました。8
"Tanaka sang a song." (broad focus)
田中さんが歌を歌いました。8
"Tanaka sang a SONG." (focus on 歌; where the focused word offers no sharp word-internal H–L drop to amplify, a register raise plus post-focal compression on 歌いました do the work)
Good to know
Loudness instead of pitch geometry
English speakers tend to mark a focused word by raising volume on it, the way they would say "TANAKA bought bread." In Japanese the right move is the opposite of localized loudness: widen the pitch range on the focused accentual phrase and compress the following phrases into a narrow low band, without changing volume.
Japanese focus prosody is realized in F0 range and post-focal compression, not in loudness. Loud emphasis without pitch geometry can come across as anger rather than emphasis.129 The corrected version of the example sentence keeps the volume even and lets pitch do the work.
田中さんがパンを買いました。1
"TANAKA bought bread." (range expansion on 田中さん, compression on パンを買いました; volume steady)
Over-applying contrastive は
A common beginner habit is to produce every は with a contrastive pitch peak and post-particle lowering, even when the discourse is unmarked. Thematic は is prosodically unmarked: the は-phrase keeps its ordinary accentual-phrase shape. Contrastive prosody belongs only where alternatives are evoked. Producing every は as contrastive makes the speaker sound as if they are constantly correcting the listener.516 The thematic version of the canonical example carries no on-particle peak.
私は学生です。5
"I'm a student." (thematic は, no pitch peak on 私は)
"Boost the bit, squash the rest."
A six-word summary of the two-component pattern: on-focus pitch-range expansion plus post-focal compression. The phrase captures the asymmetry between the focused accentual phrase and everything that follows it. It also works well as a self-coaching cue while practicing.12
Post-focal compression weakens in fast casual speech
Post-focal compression is most reliable in careful or news-register speech. In fast colloquial speech, speakers may neutralize the F0 cue and rely more on word order, particle choice, or edge-reinforcing boundary cues.211 If you train exclusively on news audio, you will hear the cue cleanly. In spontaneous conversation, the same cue may be partially absent, so recognition has to lean on the other components.
焦点 (shōten) "focus"
焦点 in Japanese means "focal point." It was originally a physics term for the focal point of a lens or mirror, and was adopted into Japanese linguistics as the standard translation for what English-language linguistics calls "focus." Both languages converged on the same optical metaphor for "the salient point of an utterance."4
Treating OJAD output as a focus-prosody guide
OJAD (Online Japanese Accent Dictionary) is a lexical-pitch-accent tool. Its rendered pitch contours and synthesized audio reflect word-level accent patterns and accentual-phrase boundaries, not sentence-level focus prosody. For ear-training on focus, use connected speech with varying focus placement, not isolated word lists.22
See also
- Topic vs. Subject in Japanese: The Hidden Slot
- は vs が in Japanese: A Beginner's First Pass
- Sentence-Final Particles in Japanese (終助詞): Overview
- Sentence-Level Prosody Practice in Japanese: Drilling Whole-Sentence Pitch Contours
- Japanese Filler Words and Hesitation Prosody: あの, えーと, まあ, and the Long-Vowel Stall
- Nakadaka (中高): The Middle-High Japanese Pitch-Accent Pattern