Long vs. Short Vowels in Japanese: The Distinction Beginners Miss
Japanese long vs short vowels are a phonemic contrast. Holding a vowel for one extra beat selects a different dictionary word, not a more emphatic version of the same word.12 For an N5 learner who has already met おばさん and おばあさん in a textbook list, that one extra mora is the difference between calling someone an aunt and calling them a grandmother.34
Overview
Japanese has five vowel phonemes, /a, i, u, e, o/. Every one contrasts short against long.12 The long member is not a stretched-out short. It is a separate phoneme that occupies two morae, the rhythmic units that drive Japanese timing.2
In connected speech, a long vowel lasts roughly 2.5 to 3 times as long as its short partner. That ratio is preserved when speakers talk faster or slower.5 The contrast survives speech-rate changes because both members scale together.
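A one-line sketch of why the ratio survives. It rests on a simplifying assumption: a change of speaking rate multiplies short and long vowel durations by the same factor $k$. Real rate changes are only approximately uniform.

$$
\frac{k\,d_{\text{long}}}{k\,d_{\text{short}}} = \frac{d_{\text{long}}}{d_{\text{short}}} \approx 2.5 \text{ to } 3, \qquad k > 0
$$

Because $k$ cancels, relative duration remains a usable cue at any tempo, even though absolute milliseconds do not.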
Vowel length involves four separate pieces: (a) the phoneme (the sound), (b) the mora count (the timing), (c) the orthography (which kana spell the long form: おう vs おお, ええ vs えい, the katakana chōonpu ー), and (d) perception (whether an English-trained ear catches the duration). This article covers the phoneme, the mora count, and perception. Orthography is covered in a sibling article.
Vowel length is phonemic: changing length changes the word
What "phonemic" means in plain terms
A contrast is phonemic when swapping one feature for another changes which word is being said. Japanese vowel length passes that test. The consonants are identical, the vowel quality is identical, only the duration differs, and the dictionary entry changes.12
Attested minimal pairs, word pairs that differ in only one sound feature, span all five vowels: obasan (aunt) / obāsan (grandmother),34 hiru (leech) / hīru (heel), kegen (dubious) / keigen (reduction), tokai (city) / tōkai (destruction), and ku (district) / kū (void).2
おばさんは元気です。4
"(My) aunt is well."
おばあさんは元気です。3
"(My) grandmother is well."
The two sentences differ by a single mora, yet they point at relatives one generation apart: an aunt versus a grandmother.34
おじさんはアメリカ人です。6
"(My) uncle is American."
おじいさんはアメリカ人です。7
"(My) grandfather is American."
Infants acquiring Japanese begin to notice vowel length between 4 and 9.5 months. By around 18 months, they treat duration as a phonological cue, not just a raw acoustic one.8 That developmental pattern is what a phonemic, language-specific contrast looks like, as opposed to a universal acoustic difference any baby could hear.
Why this is not "emphasis" or "drawing out a word"
English duration is a prosodic resource: a longer vowel signals stress, emphasis, or affect ("I'm sooo tired"), not a different word. Japanese duration is lexical: longer means a different word.18
Japanese still has expressive lengthening for affect, written in informal kana with the chōonpu ー. But it is optional and sits above the word's basic sound pattern, so the underlying word is unchanged.1 The contrast below is the same adjective in two emotional registers.
寒い!9
"(It's) cold!"
寒ーい!1
"(It's) cooold!"
The 寒ーい (さむーい) above stretches the vowel of む for emphasis and does not change the word. The おばさん / おばあさん contrast works differently: the long vowel is part of the word's stored form, and shortening it picks out a different lexical entry. English speakers tend to assume every vowel stretch is the first kind. Japanese has both kinds, and they run on separate channels.
How to count: a long vowel is two morae
One kana, one beat: counting おばさん vs おばあさん
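Japanese is mora-timed: the mora is its rhythmic unit. Kana spelling is a near-perfect mora script. Each full-size kana is one mora, and so are the small っ and the long-vowel mark ー. The small ゃ, ゅ, ょ merge with the kana before them, so きょ is one mora, not two.10 A mora is a beat of timing, not a syllable.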
Counting kana gives the mora count. お-ば-さ-ん is 4 morae. お-ば-あ-さ-ん is 5 morae.102 The extra あ is the second mora of the long vowel /aː/. It is not a silent or decorative letter; it pays for itself in timing.
これはおばさんです。4
"This is (my) aunt." (4 morae: お-ば-さ-ん)
これはおばあさんです。3
"This is (my) grandmother." (5 morae: お-ば-あ-さ-ん)
A physical mora tap makes the count external. Tap the table once per mora, and give the long vowel's second mora its own tap. The 4-beat versus 5-beat difference then registers in the hand before it has to register in the ear.510 The technique uses the same mora-isochrony principle that defines mora-timed languages.
Elementary learners whose first language is British English systematically under-produce the durational difference. Experimental data put them at an 86.5% short-to-long ratio against native speakers' roughly 75%. The closer the ratio is to 100%, the smaller the gap between the two forms. In other words, learners shrink the long form toward the short one even when they think they have produced the contrast.10
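The kana-counting rule can be sketched in Python. This is a minimal illustration that assumes kana-only input, with no kanji or romaji. `split_morae` and `count_morae` are hypothetical names, not a library API. Small ゃ, ゅ, ょ and the small vowel kana used in katakana loanwords merge into the previous beat. Every other kana, including っ, ん, and ー, gets a beat of its own.

```python
# Small kana that attach to the preceding kana instead of adding a beat.
MERGING_SMALL_KANA = set("ゃゅょぁぃぅぇぉャュョァィゥェォ")


def split_morae(word: str) -> list[str]:
    """Split a kana-only string into morae, one list item per timing beat."""
    morae: list[str] = []
    for ch in word:
        if ch in MERGING_SMALL_KANA and morae:
            morae[-1] += ch  # きょ is one beat, not two
        else:
            morae.append(ch)  # full-size kana, small っ, ん and ー each count
    return morae


def count_morae(word: str) -> int:
    """Number of timing beats (morae) in a kana-only word."""
    return len(split_morae(word))


for w in ["おばさん", "おばあさん", "とうきょう", "きって"]:
    print(w, count_morae(w), split_morae(w))
```

Running it gives 4 for おばさん, 5 for おばあさん, 4 for とうきょう (と-う-きょ-う), and 3 for きって. The count matches the number of finger taps described above.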
The five long-vowel slots: aa, ii, uu, ee, oo
Each of the five vowels has a long counterpart that contributes a second mora. The table below maps each vowel to a typical hiragana spelling and one minimal-pair anchor.
| Long vowel | Hiragana realization (typical) | N5/N4 example | Mora count | Short partner | Source |
|---|---|---|---|---|---|
| /aː/ | あ + あ | おばあさん "grandmother" | 5 | おばさん (4) | 34 |
| /iː/ | い + い | おじいさん "grandfather" | 5 | おじさん (4) | 76 |
| /uː/ | う + う | くうき "air" | 3 | くき "stem" (2) | 1112 |
| /eː/ | え + い (most) / え + え (some native) | せんせい "teacher" | 4 | (no minimal pair at N5) | 1 |
| /oː/ | お + う (most) / お + お (some native) | こうこう "high school" | 4 | ここ "here" (2) | 11314 |
For /eː/ and /oː/, the most common spelling uses い and う as the second mora (せんせい, こうこう, とうきょう). A closed list of native words instead uses ええ and おお (おねえさん, とおい, おおきい).1 The spelling difference does not change the sound; both spell a single long vowel.
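The spelling patterns in the table can be turned into a rough detector. The sketch below is a heuristic under stated assumptions: input is hiragana plus the katakana ー, the kana table is partial (basic and voiced rows only), and the code knows nothing about word boundaries. As a result, a け + い sequence split across two morphemes would be misread as a long vowel. The sketch marks which kana serve as a long vowel's second mora, then uses those marks to test whether two words differ only in vowel length. All names are hypothetical.

```python
# Hiragana grouped by vowel (basic and voiced rows only; a partial table).
VOWEL_ROWS = {
    "a": "あかさたなはまやらわがざだばぱ",
    "i": "いきしちにひみりぎじぢびぴ",
    "u": "うくすつぬふむゆるぐずづぶぷ",
    "e": "えけせてねへめれげぜでべぺ",
    "o": "おこそとのほもよろをごぞどぼぽ",
}
VOWEL_OF = {kana: v for v, row in VOWEL_ROWS.items() for kana in row}
# A small ゃ/ゅ/ょ replaces the vowel of the kana it attaches to (きょ ends in o).
SMALL_Y_VOWEL = {"ゃ": "a", "ゅ": "u", "ょ": "o"}
# Kana that can be the second mora of a long vowel, keyed by the first vowel.
SECOND_MORA = {"a": "あ", "i": "い", "u": "う", "e": "えい", "o": "おう"}


def long_vowel_marks(word: str) -> list[int]:
    """Return indices of kana that lengthen the preceding vowel (heuristic)."""
    marks: list[int] = []
    prev_vowel = None
    for i, ch in enumerate(word):
        if ch == "ー" and i > 0:
            marks.append(i)  # the chōonpu always adds a long-vowel mora
            prev_vowel = None
        elif ch in SMALL_Y_VOWEL:
            prev_vowel = SMALL_Y_VOWEL[ch]
        elif prev_vowel is not None and ch in SECOND_MORA[prev_vowel]:
            marks.append(i)
            prev_vowel = None  # a long vowel has exactly two morae; don't chain
        else:
            prev_vowel = VOWEL_OF.get(ch)  # None for ん, っ, katakana, etc.
    return marks


def shorten(word: str) -> str:
    """Drop every long-vowel second mora, leaving the short skeleton."""
    drop = set(long_vowel_marks(word))
    return "".join(ch for i, ch in enumerate(word) if i not in drop)


def is_length_pair(a: str, b: str) -> bool:
    """True if the two words differ only in vowel length."""
    return a != b and shorten(a) == shorten(b)


print(long_vowel_marks("とうきょう"))
print(is_length_pair("とけい", "とうけい"))
```

The first call reports positions 1 and 4, which are the two う of とうきょう. The second reports `True`, because both words shorten to とけ. Treating えい and おう as long vowels follows the table's "most" spelling. Morpheme-aware handling would need a dictionary, which is beyond this sketch.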
Orthography is a separate problem
This article handles the sound and the count. Spelling follows orthographic rules covered in dedicated articles: see Long Vowels in Hiragana for おう vs おお and ええ vs えい, and Long Vowels in Katakana for the chōonpu ー.1 You can recognize a long vowel by ear and by mora count without yet knowing why こう is written one way and こお another.
The most-confused pairs beginners must hear
Family-vocab pairs: おばさん / おばあさん, おじさん / おじいさん
The two kinship pairs are the standard textbook introduction to phonemic vowel length in English-language Japanese pedagogy.9 They are core N5 vocabulary,3476 keep every other segment identical, and carry a meaning gap (aunt vs grandmother, uncle vs grandfather) that students immediately recognize as socially costly to miss.
A single missed mora shifts the person you are referring to by a generation: short = parent's sibling, long = grandparent.3476
おばさんとおばあさんはちがいます。43
"Aunt and grandmother are different (people)."
おじさんとおじいさんはちがいます。67
"Uncle and grandfather are different (people)."
Everyday content-word pairs: ゆき / ゆうき, くき / くうき, ここ / こうこう
Beyond kinship terms, length-only pairs appear across everyday vocabulary. Each word here is a real dictionary headword, not a contrived example: ゆき "snow" (N5) versus ゆうき "courage" (N3),1516 くき "stalk" (N1) versus くうき "air" (N4),1112 and ここ "here" (N5) versus こうこう "high school" (N4).1314 In the last pair, both vowels are lengthened, so the words differ by two morae rather than one.
雪がふっています。15
"It is snowing."
勇気があります。16
"(He/She) has courage."
ここでまってください。13
"Please wait here."
高校で日本語をならいました。14
"I learned Japanese in high school."
Kanji-compound and proper-noun traps: 時計 vs 統計, 東京 in romaji
時計 (とけい, "clock") and 統計 (とうけい, "statistics") are a real-word minimal pair: the consonants are the same, and only the first vowel's length differs.1718 とけい is N5 vocabulary; とうけい is N2 vocabulary.
時計は十時です。17
"The clock says ten o'clock."
統計を勉強します。18
"(I) study statistics."
東京 (とうきょう, "Tokyo") has four morae (と-う-きょ-う), not the two ("toh-kyo") that the unmarked English romaji "Tokyo" suggests.19 Its standard spoken form is とうきょう /toːkjoː/, with a long /oː/ in both the first and second halves.
Unmarked English-language romaji is the form most beginners meet first. It systematically drops the macrons from Japanese place names and from Japanese words borrowed into English ("Tokyo," "Osaka," "Kyushu," "sumo"). That hides the long vowel and builds the error into English speakers' mental representations of the words.1 Modified Hepburn with macrons (Tōkyō, Ōsaka, Kyūshū, sumō) preserves the contrast.1
東京にすんでいます。19
"(I) live in Tokyo." (4 morae: と-う-きょ-う)
Why English speakers often fail to hear it
English uses duration as a stress cue, not a phoneme
In English, vowel duration varies with stress and with tense/lax vowel quality, but it does not by itself distinguish phonemes.18 English minimal pairs are built on vowel quality (bit / beat, full / fool), not on duration alone.
English listeners' phonological grammar treats duration as prosody. As a result, they tend to hear a Japanese long vowel as a stressed version of the same word and, on first exposure, miss the lexical contrast.810 The mistake is structural, not laziness: the L1 filter does exactly what it was trained to do, and for Japanese that sends duration to the wrong category.
Mora-timing reorders that priority
Japanese rhythm is mora-timed: word duration scales with mora count, not with syllable count.510 A long-vowel mora is rhythmically equivalent to any other mora. That is why a 5-mora word like おばあさん is reliably longer than a 4-mora word like おばさん in connected speech.
The long-vowel mora's extra beat is the same kind of timed unit as the geminate-consonant mora (the silent gap in きって) and the mora-N (the ん in しんぶん). All three are special morae that a mora-timed grammar counts.10
What native-Japanese infants do that adult learners must redo
Infants acquiring Japanese are sensitive to the short/long vowel distinction by about 9.5 months. They treat duration as a phonological cue by about 18 months; the contrast is present in infant-directed speech with duration as the consistent cue.8
Adult L2 learners arrive with their L1 phonological filter already in place. Explicit minimal-pair training is the tool for re-categorizing duration from prosodic to lexical in the learner's mental phonology.10 Without that retraining, English-L1 learners under-produce long vowels in measurable ways. Elementary British-English-L1 learners show an 86.5% short-to-long ratio against native speakers' roughly 75%.10
How to train your ear and mouth
Tap the morae while you say the word
Say each word aloud and tap once per mora, giving the long vowel's second mora its own tap. The 4-beat おばさん and the 5-beat おばあさん then feel different in the hand before the ear can reliably tell them apart.510
Drill with minimal pairs, not isolated words
Contrast pairs (obasan / obāsan, yuki / yūki, koko / kōkō) force the brain to treat duration as the only variable.10 Drilling one word in isolation does not train discrimination, because nothing in the input forces the learner's category boundary to sharpen.
L2 learners who improve on phonemic length contrasts do so after targeted exposure that pairs perception and production with feedback. This is the protocol implicit in the pedagogy literature on Japanese phonemic length acquisition.10
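One way to build such a drill is an AX same/different task. This is a standard discrimination format, swapped in here for illustration; the article does not prescribe a specific design. The learner hears item A, then item X, and answers "same" or "different." The sketch assumes a partner or a recording supplies the audio, and `make_ax_trials` is a hypothetical helper. Including "same" trials matters: a drill built only from different pairs can be answered "different" every time without listening.

```python
import random

# Length-only pairs from this article: (short form, long form).
PAIRS = [
    ("おばさん", "おばあさん"),
    ("おじさん", "おじいさん"),
    ("ゆき", "ゆうき"),
    ("くき", "くうき"),
    ("ここ", "こうこう"),
    ("とけい", "とうけい"),
]


def make_ax_trials(pairs, seed=None):
    """Return shuffled (A, X, answer) trials, balanced between same and different."""
    trials = []
    for short, long_form in pairs:
        trials.append((short, short, "same"))
        trials.append((long_form, long_form, "same"))
        trials.append((short, long_form, "different"))
        trials.append((long_form, short, "different"))
    random.Random(seed).shuffle(trials)  # seeded so a session can be repeated
    return trials


for a, x, answer in make_ax_trials(PAIRS, seed=1)[:4]:
    print(a, x, answer)
```

Each pair contributes both orders of the "different" trial. This keeps a learner from succeeding by always guessing that the second item is the long one.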
Record-and-compare with a native reference
Comparing your own recording to a native model helps you notice the durational gap your L1 filter is quietly closing.10 The cleanest objective check is the 4-mora versus 5-mora total word duration: if the recorded "obāsan" is not noticeably longer than "obasan," the long vowel is being collapsed.
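A minimal sketch of that check. It assumes you have already measured the two whole-word durations in milliseconds, for example by marking word boundaries in an audio editor. It also assumes that the 75% and 86.5% figures quoted earlier are short-word to long-word duration ratios. That reading is an interpretation, not something the article states. The function names and sample durations are hypothetical.

```python
# Reference short-to-long ratios quoted in this article. ASSUMPTION: they are
# read here as whole-word duration ratios (e.g. obasan / obāsan).
NATIVE_RATIO = 0.75
LEARNER_RATIO = 0.865


def short_to_long_ratio(short_ms: float, long_ms: float) -> float:
    """Duration of the short-vowel word divided by that of the long-vowel word."""
    if short_ms <= 0 or long_ms <= 0:
        raise ValueError("durations must be positive")
    return short_ms / long_ms


def nearer_reference(short_ms: float, long_ms: float) -> str:
    """Say which of the two reference ratios a recording is closer to."""
    ratio = short_to_long_ratio(short_ms, long_ms)
    if abs(ratio - NATIVE_RATIO) <= abs(ratio - LEARNER_RATIO):
        return "native-like"
    return "learner-like"


# Hypothetical measurements, not real data.
print(nearer_reference(600, 800))  # ratio 0.75 -> native-like
print(nearer_reference(690, 800))  # ratio 0.8625 -> learner-like
```

Comparing against both reference points avoids inventing a pass/fail threshold that the article does not give.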
Good to know
Long vowels are not "stressed" in Japanese
Adding extra loudness or an automatic pitch peak to the long mora is an English-L1 stress reflex. Standard Japanese does not mark long vowels that way. Vowel length and pitch accent are independent contrasts.1 A long vowel may or may not carry the accent kernel, depending on the word's pitch-accent pattern, and long vowels do not systematically attract the accent.
Long vowels and geminate consonants both contain a special mora
The long-vowel mora (the second half of おばあさん's /aː/) and the geminate-consonant mora (the silent gap in きって) are both special morae. Each carries a beat without contributing a full CV syllable.10 They sit in different slots in the mora template (vocalic vs consonantal) but follow the same mora-timing principle. The same family includes the mora-N (the ん in しんぶん), the third special mora.
Devoicing of a short vowel is not the same as a long vowel
High vowels /i/ and /u/ between voiceless consonants, or word-finally after a voiceless consonant, are routinely devoiced in standard Tokyo Japanese (です sounds like des, すき sounds like ski).1 Devoicing reduces a short vowel's audibility. It never turns a long vowel /uː/ or /iː/ into a short one, because the second mora is still timed even when the vowel is whispered. "The う is silent" (devoicing of a short vowel) and "the う lengthens the previous vowel" (chōon) are two different stories that share a kana.
Romaji traps: Tokyo, Osaka, sumo
Standard English spelling of Japanese names and borrowed words systematically omits the long-vowel signal: Tokyo for Tōkyō,19 Osaka for Ōsaka, sumo for sumō, judo for jūdō.1 This is not a transliteration error but a conventional simplification, and it conceals the long vowel from English-speaking learners. The resulting two-syllable English pronunciation ("toe-kyo") encodes the wrong mora count.
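Restoring the macrons from a doubled-vowel spelling is mostly mechanical. The sketch below assumes lowercase input in the doubled-vowel style used by Japanese keyboard input (toukyou, oosaka), and `add_macrons` is a hypothetical helper. It follows the Modified Hepburn convention: aa, uu, ee, oo, and ou become macron vowels, while ii and ei are left alone. The code cannot see morpheme boundaries. A verb like omou (思う), where the o and u belong to different parts of the word, would therefore be wrongly written omō.

```python
import re

# Doubled-vowel spellings that Modified Hepburn writes with a macron.
# "ii" and "ei" are left as-is, following the usual Hepburn convention.
MACRONS = {"aa": "ā", "uu": "ū", "ee": "ē", "oo": "ō", "ou": "ō"}
DOUBLED_VOWEL = re.compile("|".join(MACRONS))


def add_macrons(romaji: str) -> str:
    """Rewrite lowercase doubled-vowel romaji with macrons (heuristic)."""
    return DOUBLED_VOWEL.sub(lambda m: MACRONS[m.group(0)], romaji)


for word in ["toukyou", "oosaka", "kyuushuu", "juudou", "obaasan", "ojiisan"]:
    print(word, "->", add_macrons(word))
```

This prints tōkyō, ōsaka, kyūshū, jūdō, and obāsan, and leaves ojiisan unchanged. The regular expression scans left to right without overlaps, so kyuushuu yields two separate ū rather than a merged sequence.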
See also
- Difficult Japanese Sounds by Native Language: An L1-by-L1 Pronunciation Guide
- Japanese Pitch-Accent Minimal Pairs: The Drill List You Must Hear
- The Japanese Consonant Inventory: Phonemes, Allophones, and the Kana Chart
- The Small つ (Sokuon): How to Read and Pronounce the Geminate Consonant