Stress vs. Pitch: Does Japanese Have Stress?
Before studying any Tokyo pitch pattern, English speakers should settle one typological question: does Japanese use stress or pitch accent? Japanese has no stress accent at all. It has only pitch accent.12
Prominence inside a Japanese word is carried by pitch alone. The loudness, length, and vowel-quality cues that English fuses into "stress" are not available to do the work.23
Overview
The one-line answer
Japanese has no stress accent. It has a pitch accent: the prominent unit of a word is marked by a pitch difference (typically a fall from High to Low) and by nothing else.12 Loudness, length, and vowel quality do not change at the accented mora.2
The Japanese term for the system is 高低アクセント (kōtei akusento, literally "high-and-low accent").1 Wikipedia's typology page defines a pitch-accent language as one with "certain syllables in words or morphemes that are prominent, as indicated by a distinct contrasting pitch rather than by volume or length."2
The two confusions English speakers bring in
An English speaker meeting Japanese phonology for the first time usually tangles two independent axes.
The prominence axis is about what makes a unit "stand out" inside a word: stress, pitch, or nothing at all.23 The rhythm axis is about how beats are spaced in time: stress-timed, syllable-timed, or mora-timed.45
These two axes are independent in principle. A language can be mora-timed and stress-accented. Japanese happens to be mora-timed and pitch-accented, and that combination compounds the difficulty for an L1-English learner.42
The article handles both axes, prominence first and rhythm woven through.
Where this sits in the phonology subcategory
This is the last article in the phonology fundamentals subcategory. Its prerequisite on the rhythm axis is Mora vs. Syllable: Why Japanese Is Mora-Timed. Readers who have not yet internalized the equal-mora beat will get less out of the contrast that follows.
The article immediately after this one opens the pitch-accent subcategory and walks through the four Tokyo pitch patterns themselves. Treat this article as the bridge.
What English bundles into "stress"
The four cues English fuses together
A stressed syllable in English is not marked by one thing. It is marked by four. Wikipedia's Stress (linguistics) article identifies the correlate set as "dynamic accent in the case of loudness, pitch accent in the case of pitch..., quantitative accent in the case of length, and qualitative accent in the case of differences in articulation."3 Ladefoged and Johnson identify the same inventory, noting that stressed vowels in English are "longer, closer, more rounded" than unstressed ones.6
English has what is called a variable stress accent: the position of stress differs from word to word, and wherever it lands the four cues move together. A stressed English syllable is usually louder, longer, higher in pitch, and full-vowelled all at once.3 Native English listeners hear "stress" as one perceptual category because in their language the four signals never have to be heard apart.36
| Cue | Stressed syllable in English | Unstressed syllable in English |
|---|---|---|
| Loudness | greater | lesser |
| Length | longer | shorter |
| Pitch | higher | lower |
| Vowel quality | full vowel | reduced, often schwa /ə/ |
The fact that all four cues line up on the same syllable is the habit a learner of Japanese is trying not to import.
Schwa reduction is the cost of stress
Because stressed English syllables get the four cues, unstressed syllables pay. Their vowels collapse toward the centre and most often become schwa /ə/, "the most common reduced vowel in English."7 The Stress (linguistics) article gives the standard pair: "the unstressed first syllable of the word photographer contains a schwa, whereas the stressed first syllable of photograph does not."3
The pattern is visible across the photo- family. Photograph /ˈfoʊ.tə.ˌɡræf/ has stress on the first syllable and a schwa in the middle.3 Photography /fəˈtɑː.ɡrə.fi/ shifts stress to the second syllable, and the first and third vowels collapse to schwa.7 Photographer /fəˈtɑː.ɡrə.fər/ shows the same pattern again.7
The morpheme photo- is spelled identically every time, but its vowels are driven entirely by where stress lands.37
Vowel reduction is not a universal property of human language; it is a property of stress-timed languages specifically. The isochrony literature states that "stress-timing is strongly related to vowel reduction processes."4 Stress-timed languages such as English, Russian, and Arabic reduce vowels in unstressed positions; mora-timed languages such as Japanese do not exhibit the same reduction.48 That asymmetry is why importing the English habit into Japanese destroys the mora count, which the Why English-accented Japanese sounds foreign section will show in detail.
Word stress is contrastive in English
English stress placement can carry lexical meaning on its own. The minimal pair insight /ˈɪnsaɪt/ versus incite /ɪnˈsaɪt/ is "distinguished in pronunciation only by the fact that the stress falls on the first syllable in the former and on the second syllable in the latter."3
The largest productive class of stress-driven minimal pairs in English is the noun-verb alternation: PREsent (noun) versus preSENT (verb), REcord versus reCORD, CONduct versus conDUCT, PERmit versus perMIT. When stress shifts, meaning shifts with it.3
This is the habit a learner of Japanese must not import. In Japanese, no word's meaning is ever distinguished by which mora is made louder or stretched for prominence. Vowel and consonant length are phonemic, but as properties of the segments themselves, not as an accent cue. The only suprasegmental feature that can carry a minimal-pair contrast is pitch, and even pitch is restricted to a single fall location per word.19
What Japanese uses instead
Pitch alone, no loudness change
The typological definition is precise: a pitch-accent language has "certain syllables in words or morphemes that are prominent, as indicated by a distinct contrasting pitch rather than by volume or length."2 Loudness and length are not part of the contrast. Pitch is the entire signal.
Japanese is the canonical example. Wikipedia's Pitch-accent language page names Japanese alongside Swedish, Norwegian, Lithuanian, Latvian, Serbo-Croatian, Slovenian, Basque, Ancient Greek, and Vedic Sanskrit.2 The Japanese pitch accent article gives the native term, 高低アクセント, and treats Tokyo Japanese as the reference variety.1
A pitch-accent system is culminative: it has at most one prominent unit per word. This is the property pitch accent shares with stress. It is also the property that separates pitch accent from tone languages, where "practically every syllable can have an independent tone."2
Hyman 2009 argues that "pitch accent" is not a coherent typological category and that the languages so labelled are better analyzed as restricted tone systems.10 The label is still the standard one in Japanese-as-a-second-language teaching, and this article uses it. The full caveat lives in Good to know below.
Mora-equal duration carries the rhythm
The mora-vs-syllable article carries the rhythm-axis half of the story in full. The short version is enough here.
Japanese is mora-timed: each mora is heard as an equal beat. The classification is standard in the isochrony literature.4 It is also supported by experimental evidence that native Japanese listeners segment speech moraically, while English listeners segment by stress and French listeners by syllable.8
Strict millisecond-level isochrony is contested. Han 1962 presents phonetic evidence for approximately equal mora duration;11 Beckman 1982 argues against strict isochrony and treats the mora as a phonological rather than phonetic unit.12
For learner purposes, the relevant fact is the perceptual equality documented by Otake et al. 1993: Japanese listeners hear each mora as an equal beat, regardless of whether the acoustic durations are millisecond-exact.8
The pedagogical consequence is direct. Because Japanese rhythm is mora-counted rather than stress-counted, there is no alternation of strong and weak beats. Every mora carries equal perceptual weight, and the four-cue English stress package has nowhere to land.48
The HL contour is the prominence cue
In Tokyo-dialect Japanese, pitch accent is the location of a single H-to-L fall, or the absence of such a fall. From the Japanese pitch accent article: "when the accent is on a mora other than the first or the last, then the pitch has an initial rise from a low starting point, reaches a near-maximum at the accented mora, then drops suddenly on any following morae"; the accent itself represents "a steep fall from a high tone to a low tone."1 Vance 2008 is the standard book-length English-language source for the same description and for the privative nature of Tokyo pitch accent: a word either has an accent kernel or does not.13
The four Tokyo pitch patterns themselves (heiban, atamadaka, nakadaka, odaka) are the subject of the dedicated pitch-accent article that follows this one. Read this section as the typological frame. Read the next subcategory for the patterns themselves.
Laid side by side, the two axes a learner is untangling place Japanese at the intersection of mora-timed and pitch-accent, and English at the intersection of stress-timed and stress-accent. The two languages disagree on both axes at once, which is why one-axis explanations under-deliver.
Two minimal pairs that prove pitch is the only signal
The classic demonstration that pitch alone carries the contrast is the ame pair. The morae, loudness, and length are the same. The pitch contour is the only difference.
雨 (ame)
"rain"
The pitch pattern is a(H)-me(L): high on the first mora, dropping to low on the second. If a particle follows (が, は), the particle stays low. NHK marks this as an accent kernel on mora 1, the pattern called atamadaka.14
飴 (ame)
"candy"
The pitch pattern is a(L)-me(H): low on the first mora, rising to high on the second. If a particle follows, it stays high (no fall). NHK marks this as heiban, accentless.14 Cutler and Otake 1999 use this exact pair in a lexical-decision experiment and confirm that the pitch contour is the only signal native listeners use to distinguish 雨 (ame, HL) from 飴 (ame, LH).9
The hashi pair makes the same point. It also adds a case where the contrast hides until a particle arrives.
箸 (hashi)
"chopsticks"
The pitch pattern is ha(H)-shi(L), accent kernel on mora 1 (atamadaka). The Japanese pitch accent article quotes this directly: "the sequence 'hashi' spoken in isolation can be accented in two ways, either háshi (accent on the first syllable, meaning 'chopsticks') or hashí (flat or accent on the second syllable, meaning either 'edge' or 'bridge')."1
橋 (hashi)
"bridge"
The pitch pattern is ha(L)-shi(H), with a fall on any following particle: ha(L)-shi(H)-ga(L). This is odaka, accent kernel on the final mora; the fall happens on the particle, not within the word itself. Bridge and edge (端, heiban / accentless) sound identical in isolation and separate only when a particle attaches: 橋が is ha(L)-shi(H)-ga(L), and 端が is ha(L)-shi(H)-ga(H).114
Across the two sets, two identical strings of segments yield five words: rain and candy for ame; chopsticks, bridge, and edge for hashi. Loudness and length are constant across each set. Pitch carries every distinction.
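The contours quoted for these pairs all follow from one number per word: the position of the accent kernel. The following is a minimal Python sketch, assuming a simplified two-level H/L model in which the first mora is low whenever the kernel is not on mora 1. The function name `tokyo_contour` and its parameters are illustrative only and do not come from NHK or from any dictionary API.

```python
def tokyo_contour(n_morae: int, accent: int, particle: bool = False) -> str:
    """Return a simplified H/L string for a Tokyo-style word.

    accent = 0 means accentless (heiban); accent = k (1-based) means the
    H-to-L fall comes immediately after mora k. If particle is True, one
    extra mora is appended for a following particle such as ga.
    Assumption: a plain two-level model, not a phonetic pitch track.
    """
    if n_morae < 1 or not 0 <= accent <= n_morae:
        raise ValueError("accent must lie between 0 and n_morae")
    total = n_morae + (1 if particle else 0)
    tones = []
    for i in range(1, total + 1):
        if accent == 1:
            # Kernel on mora 1: high first mora, low everywhere after it.
            tones.append("H" if i == 1 else "L")
        elif i == 1:
            # Otherwise the word starts low.
            tones.append("L")
        elif accent == 0 or i <= accent:
            # High up to and including the kernel, or to the end if accentless.
            tones.append("H")
        else:
            # Everything after the kernel is low, including a particle.
            tones.append("L")
    return "".join(tones)


# Hypothetical table of the five words discussed above: (morae, kernel).
words = {
    "ame 'rain'": (2, 1),
    "ame 'candy'": (2, 0),
    "hashi 'chopsticks'": (2, 1),
    "hashi 'bridge'": (2, 2),
    "hashi 'edge'": (2, 0),
}
for gloss, (n, k) in words.items():
    print(gloss, tokyo_contour(n, k), tokyo_contour(n, k, particle=True))
```

In this sketch `tokyo_contour(2, 2)` and `tokyo_contour(2, 0)` return the same string, which mirrors the point that bridge and edge are identical in isolation; only the `particle=True` call separates them.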
Why English-accented Japanese sounds foreign
Loudness on the wrong mora
The first symptom is loudness. An L1-English speaker reaches for one mora inside a Japanese word and "hits" it. This imports the loudness component of the stress-accent bundle into a language whose prosodic system does not use loudness for prominence.362 The result sounds emphatic rather than natural.
Muradás-Taylor 2022 documents how stubborn this pattern is. Twenty-one British-English-speaking learners showed 43% accuracy and 40% stability in Japanese pitch-accent production. There was no significant difference between a less-experienced group (250 instructional hours) and a more-experienced group (970 instructional hours); only 18% of words were produced both accurately and stably.15
The study measures pitch directly rather than loudness. But the underlying mechanism, transfer of the still-bundled stress package, is the same; the loudness symptom is one face of it.
Lengthening on the wrong mora
English stress lengthens vowels.36 Importing that lengthening cue into Japanese is more than a rhythm error. Vowel and consonant length are phonemic in Japanese, meaning they can distinguish words, and a wrong length can flip a minimal pair.
おじさん13
"uncle"
Four morae: o-ji-sa-n. The second mora, ji, contains a short /i/.
おじいさん13
"grandfather"
Five morae: o-ji-i-sa-n. The second and third morae together form a long /iː/ (bimoraic vowel). The two words differ only in the length of the /i/. An English speaker who reflexively lengthens a syllable they perceive as "stressed" can produce ojiisan when they meant ojisan, or the reverse.13
The same risk applies to おばさん and おばあさん (aunt and grandmother) and to any minimal pair where length is the only contrast. The full inventory of length contrasts is the business of The Japanese Vowel Inventory: Five Vowels, Done Right for vowels and Geminate Consonants (Sokuon っ): The Silent Pause for consonants. The point here is that lengthening cannot ride along with stress without doing damage.
Schwa reduction destroys morae
The most damaging transfer error is vowel reduction. Schwa reduction is the standard correlate of English stress37 and a property of stress-timed languages specifically, not of mora-timed ones.48 When an L1-English speaker imports it into Japanese, the equal-mora rhythm collapses into something that resembles English syllable structure.
A Japanese five-mora word like こんにちは has five equal beats: ko-n-ni-chi-wa. An L1-English speaker who imports stress-timing tends to render this as something like kuh-NEE-chee-wuh: four perceived syllables, with the second stressed, the first and last reduced to schwa, and the mora-N (ん) absorbed into the following n. The mora count is destroyed, the rhythm is lost, and the listener hears an accent.
The same collapse is easiest to hear in loanwords. English speakers say Tokyo as two syllables, but the Japanese name holds four equal morae. That contrast is drilled in Why "Tokyo" Is Two Syllables in English and Four Morae in Japanese: Loanwords as a Timing Drill.
The one place Japanese does suppress a vowel is devoicing, treated in Japanese Vowel Devoicing: Why です Sounds Like "Des". Devoicing is rule-governed (high vowels between voiceless obstruents), and the mora count survives intact. Schwa reduction in L1-English transfer is unconditioned and destroys the mora count itself. Devoicing can make a vowel inaudible while it still occupies its beat. Reduction deletes the beat.
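The difference between devoicing and deletion can be made concrete with a toy rule checker. This is a rough sketch under strong assumptions: input is a list of Hepburn-romanized morae, only the condition stated above (a high vowel between voiceless obstruents) is modelled, and the names `VOICELESS_ONSETS` and `devoiced_flags` are invented for illustration. Word-final cases such as the su of です are deliberately not covered.

```python
# Assumed inventory of voiceless onsets in Hepburn romanization (illustrative).
VOICELESS_ONSETS = {"k", "ky", "s", "sh", "t", "ch", "ts", "h", "hy", "f", "p", "py"}
HIGH_VOWELS = {"i", "u"}


def onset(mora: str) -> str:
    """Everything before the final letter of a romanized mora."""
    return mora[:-1]


def devoiced_flags(morae: list[str]) -> list[bool]:
    """Flag morae whose high vowel sits between two voiceless onsets.

    The result always has the same length as the input: a flagged mora is
    marked as devoiced but is never removed from the count.
    """
    flags = []
    for idx, mora in enumerate(morae):
        nxt = morae[idx + 1] if idx + 1 < len(morae) else None
        flags.append(
            mora[-1] in HIGH_VOWELS
            and onset(mora) in VOICELESS_ONSETS
            and nxt is not None
            and onset(nxt) in VOICELESS_ONSETS
        )
    return flags


for word in (["shi", "ta"], ["ku", "tsu"], ["ka", "sa"]):
    flags = devoiced_flags(word)
    print(word, flags, "mora count:", len(flags))
```

The design choice worth noticing is that the function returns one flag per mora rather than a shortened word, which is the property the paragraph above describes: the vowel may go silent, but its beat stays in the count.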
The empirical anchor is Muradás-Taylor 2022 again: persistent instability of L1-English learners' Japanese prosody even after about 1,000 hours of instruction.15 Her measure is pitch accuracy, but the underlying stress-package transfer is the same mechanism that drives schwa reduction.
Pitch as melody, not as a prominence marker
English speakers use pitch all the time, but as melody riding on top of the stress package, not as the prominence signal itself.23 English pitch is one of four bundled cues. Japanese pitch is the entire signal.2
The practical consequence is that a learner can already vary pitch competently in their L1 and still produce wrong Japanese. In their L1 system, pitch is glued to whichever syllable carries loudness and length. Unbundling pitch from the rest of the stress package is the actual learning task. Pitch practice without that unbundling appears not to be productive: that reading is consistent with Muradás-Taylor's finding that accuracy does not differ between 250 and 970 instructional hours,15 and it is the motivation for the drill in the next section.
The "flatten first" habit beginners should install
Why flatten before learning patterns
Layering correct Japanese pitch on top of an intact English stress habit is the documented sticking point for L1-English learners. Muradás-Taylor 2022 reports that at 970 hours of instruction, pitch-accent accuracy is still 43%, statistically indistinguishable from accuracy at 250 hours, and only 18% of words are produced both accurately and stably.15
The remedy in this article is to remove the English stress package before drilling specific pitch patterns. That is J-Compass's pedagogical position. Muradás-Taylor supplies the motivation (advanced L1-English learners' pitch is "inaccurate and unstable"15) but not the prescription.
More pitch drilling on top of an intact stress habit is what those 970 hours already represent. Removing the habit first changes what the drilling can do.
The drill: equal-mora delivery on a known word
Pick a greeting you already know and read it with deliberately equal loudness and equal length on every mora. Ignore pitch entirely for the moment. The goal is to feel the absence of a stress package, not to produce correct pitch.
こんにちは13
"Hello."
Five morae: ko-n-ni-chi-wa. The ん is its own mora, treated in full in The Mora-N (ん) and Its Four Allophones. This is the most diagnostic short word for the schwa-reduction error: five equal beats, not kuh-NEE-chee-wuh.
おはようございます13
"Good morning."
Nine morae: o-ha-yo-u-go-za-i-ma-su. The よう here is two morae (yo + u). The u is the long-vowel mora that lengthens the o, and the final su is often devoiced in Tokyo speech.
ありがとうございます13
"Thank you."
Ten morae: a-ri-ga-to-u-go-za-i-ma-su. Same long-vowel structure (とう = to + u). Deliver every mora at the same loudness and the same length.
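The mora counts in this drill can be checked mechanically. Below is a rough sketch, assuming hiragana or katakana input and the usual counting rule that small ゃ, ゅ, ょ (and other small vowel kana) share a beat with the preceding kana, while ん, っ, and ー each take their own. `split_morae` is a hypothetical helper written for this illustration, not a function from any library.

```python
import unicodedata

# Small kana that attach to the preceding kana instead of forming a new mora.
# Small っ/ッ is intentionally absent: it counts as a mora of its own.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def split_morae(kana: str) -> list[str]:
    """Split a kana string into morae under the simple rule described above."""
    # Normalize so that voiced kana such as ご are single code points.
    text = unicodedata.normalize("NFC", kana)
    morae: list[str] = []
    for ch in text:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch  # e.g. き + ょ -> きょ, one beat
        else:
            morae.append(ch)
    return morae


for word in ("こんにちは", "おはようございます", "ありがとうございます"):
    beats = split_morae(word)
    print(len(beats), "-".join(beats))
```

The same helper separates the length pair from earlier in the article: `split_morae("おじさん")` yields four beats and `split_morae("おじいさん")` yields five.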
Then layer pitch
Once flat delivery is stable, layer in the LH or HL contour the dictionary gives for that word.14 The four-pattern taxonomy itself is the subject of the next article in the pillar, which walks through the four Tokyo pitch patterns one by one. The flat baseline from this section is what the patterns layer onto.
What "flattening" is not
Flattening removes English word-stress prominence. It does not remove all melody.131 Japanese has rich phrase-level intonation on top of word-level pitch accent: question rise, emphasis, emotion. The drill above is a word-level drill and does not apply to phrasal melody. A robotic monotone delivery is not the goal or the destination.
Good to know
Pitch accent is not tone
Japanese pitch accent is not Mandarin- or Cantonese-style lexical tone. Tone languages assign a pitch contour to "practically every syllable" as part of the lexical entry. A pitch-accent language marks "only one prominent syllable per word, similar to stress languages, but achieving this through pitch rather than loudness."2 In Japanese, there is at most one accent kernel per word; in Mandarin, every syllable independently carries one of (typically) four tones. The shorthand: tone is per-syllable, pitch accent is per-word.2
Regional pitch systems are not the Tokyo system
Kansai (Keihan) pitch is a different system, not a broken Tokyo system. The Tokyo system is privative: it distinguishes the presence or absence of a single H-to-L fall. The Keihan system also distinguishes initial register (whether the word begins H or L), giving more contrastive patterns per word.131
The "flatten first" advice still applies to a Keihan-targeting learner because the underlying issue is the L1-English stress habit, not which Japanese pitch system the learner is aiming at.
Pitch accent is not required for being understood
Wrong pitch accent on ame or hashi will almost always be repaired by context. Cutler and Otake 1999 show that native Japanese listeners do use pitch accent in spoken-word recognition: priming with ame HL speeds responses to ame HL but not to ame LH. Pitch therefore matters at the level of word activation,9 but this is a laboratory effect measured on minimal pairs presented in isolation.
In ordinary conversation, context disambiguates almost every pitch-accent minimal pair before the listener has to commit.9
The honest threshold is this: pitch accent matters for sounding native, for JLPT N1 listening accuracy, and for public-facing professional Japanese. It does not block basic comprehension.
Stress-timing and pitch-accent are two different axes
The rhythm axis (mora-timed, syllable-timed, stress-timed) and the prominence axis (pitch-accent, stress-accent, tone) are independent.423 The rhythm-axis taxonomy is the standard three-way system that grew out of Pike 1945, as carried in the isochrony literature;54 the prominence-axis taxonomy is the standard stress/pitch/tone classification.23
One axis is about how beats are spaced. The other is about what makes a beat stand out.
A language could in principle be mora-timed and stress-accented. Japanese happens to be mora-timed and pitch-accented. The two axes compound the difficulty for an L1-English learner who is shifting on both at once.
Why the term "pitch accent" itself is contested
Larry Hyman, in a 2009 paper, argues that "pitch accent" is not a coherent typological category. Drawing on a database of approximately 600 tone systems, he shows that the properties commonly invoked to define pitch-accent systems (obligatoriness, culminativity, privativity, metricality, distributional restriction) are neither all present in any one "pitch-accent" language nor absent from canonical tone systems. He concludes that "pitch accent" is at best a descriptive shorthand for restricted tone systems, not a typological natural class.10
The label remains pedagogically useful for learners and is universally used in the Japanese-as-a-second-language literature.131514 The 2009 date is worth pinning so that future readers can check whether the typological consensus has moved.
See also
- Japanese Pitch Accent: A Complete Beginner's Guide
- Japanese Pitch-Accent Notation: How to Read 0, 1, 2, 3 and the Overline Diagrams
- How to Read OJAD: The Online Japanese Accent Dictionary
- Pitch Accent for Japanese Verbs and Adjectives: The Binary Class Rule and Conjugation Shifts
- Japanese Compound-Word Pitch Accent: How Two Words Combine into One Accent Pattern
- Japanese Pronunciation Drills: A Daily 5-Minute Protocol with Minimal Pairs, Shadowing, and Record-and-Compare