Why "Tokyo" Is Two Syllables in English and Four Morae in Japanese: Loanwords as a Timing Drill
Why Tokyo is four morae in Japanese but two syllables in English comes down to a unit mismatch. Japanese counts beats by mora, while English counts beats by syllable. When words travel between the two languages, they often arrive with a beat count an English ear has never heard.12 This article counts the morae in Tokyo and a handful of other words English speakers already say, then turns the gap into a daily timing drill.
Overview
The promise in one sentence
Words you already say in English carry a hidden Japanese mora count. Tokyo has four morae in Japanese (と・う・きょ・う) and two syllables in standard English; that four-versus-two gap is the worked headline of this article.23 Karaoke, Pokémon, judo, and other Japanese words now used in English get the same count below.
A mora is a timing unit, not a syllable. Japanese is mora-timed: each mora fills its own slot in time. English is stress-timed: unstressed material compresses. Treating Japanese words with English timing is the single most common source of foreign-accented pronunciation for English first-language learners. The loanwords you already know are the easiest place to feel the difference.14
Who this is for
This article is for an absolute beginner, pre-N5 through N5. You have met hiragana and heard the word mora, but you cannot yet hear long vowels or the moraic nasal ん in real time.5 Measured data backs the difficulty: elementary British learners of Japanese produce two-mora words at 86.5% of the duration of three-mora words, while native speakers produce them at 73.2%. In other words, English first-language learners compress the moraic gap.5 The gap is a pattern, not a stereotype.
What a mora is (one-minute version)
A mora is a sub-syllabic unit of timing. The duration of a Japanese utterance scales linearly with how many morae it contains; this is the foundational mora-timing finding.4 Mora isochrony, or near-equal mora timing, is approximate rather than strict. Even so, the trend is strong enough that Japanese speakers themselves count by mora when they compose haiku in 5-7-5.2
One kana, one beat, with three additions
In Japanese, each non-yōon kana is one mora.12 On top of that base rule, three "special morae" (特殊拍 tokushuhaku) each count as a full mora, even though they do not carry their own syllable vowel:1267
| Special mora | Symbol | Counts as | Example |
|---|---|---|---|
| Long vowel (chōon) | ー / repeat | 1 mora | とう = と + う (2 morae); カー = カ + ー (2 morae) |
| Moraic nasal | ん / ン | 1 mora | ほん = ほ + ん (2 morae) |
| Geminate (sokuon) | っ / ッ | 1 mora | きって = き + っ + て (3 morae) |
Yōon (拗音), or contracted sounds, are the one place where two kana fuse into a single mora. An i-row kana with a consonant (き, し, ち, and so on) plus small ゃ/ゅ/ょ, as in きゃ, しゅ, きょ, is one mora, not two.78 In 東京, きょ is a single yōon mora.
Each full-size kana is one tap. Small ゃ/ゅ/ょ attach to the previous kana to make one yōon mora. Small っ is its own silent tap. Chōon (the う in とう, the second い in いい, the ー in カー) is its own tap.79
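The tap rules above can be sketched as a small splitter. This is a minimal illustration, not a full tokenizer: `split_morae` and `SMALL_YOON` are names made up for this example, the input is assumed to be pure hiragana or katakana, and the small vowel kana used in newer loanwords (ァ, ィ, and so on) are not handled.

```python
# Small kana that fuse with the preceding kana into one yōon mora.
SMALL_YOON = set("ゃゅょャュョ")


def split_morae(kana: str) -> list[str]:
    """Split a kana-only string into morae, one list item per tap."""
    morae: list[str] = []
    for ch in kana:
        if ch in SMALL_YOON and morae:
            # Small ya/yu/yo attaches to the previous kana: きょ is one beat.
            morae[-1] += ch
        else:
            # Everything else is its own beat, including っ, ん, ー,
            # and a vowel kana that lengthens the previous vowel.
            morae.append(ch)
    return morae


for word in ["とうきょう", "きって", "ほん", "カー", "じゅうどう"]:
    beats = split_morae(word)
    print(word, "・".join(beats), len(beats))
```

Because the sketch works on the kana spelling alone, it counts a written う or い after a vowel as a beat without checking whether it is really a long vowel. For the words in this article the two coincide.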
At normal speech rate, a Japanese long vowel measures roughly 138 ms and a short vowel roughly 86 ms. That is a long-to-short ratio near 1.6.10 Absolute values shift with speaking speed, but the ratio holds.
Why English speakers undercount
English is stress-timed: spacing falls between stressed syllables, and unstressed material compresses to fit. Japanese is mora-timed: each mora gets its own slot. An English ear trained to compress unstressed material treats long vowels, ん, and っ as "nothing" because English has no equivalent contrast.14
Measured data confirms the bias: British learners whose first language is English have smaller relative spacing between two-mora and three-mora words than native speakers do. That is exactly what syllable-counting carry-over predicts.5
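To put numbers on "smaller relative spacing", the two ratios quoted earlier (73.2% for native speakers, 86.5% for elementary learners) can be compared with a baseline. The sketch below assumes, purely for illustration, that strictly equal mora timing would make a two-mora word last exactly 2/3 as long as a three-mora word. The source data does not claim that baseline, and the constant and function names are invented here.

```python
# Ratios quoted in this article: duration of two-mora words / three-mora words.
NATIVE_RATIO = 0.732
LEARNER_RATIO = 0.865

# Illustrative assumption: with perfectly equal mora timing,
# two morae would take exactly 2/3 of the time of three morae.
IDEAL_RATIO = 2 / 3


def points_above_ideal(ratio: float) -> float:
    """Percentage points by which a measured ratio exceeds the 2/3 baseline."""
    return (ratio - IDEAL_RATIO) * 100


for label, ratio in [("native", NATIVE_RATIO), ("learner", LEARNER_RATIO)]:
    gap = points_above_ideal(ratio)
    print(f"{label}: {ratio:.1%} measured, {gap:.1f} points above 2/3")
```

Under that assumption the learners sit roughly three times as far from the baseline as native speakers do, which is the compression the paragraph describes.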
Counting Tokyo: the worked example
English "Tokyo" as two syllables
The standard English pronunciation /ˈtoʊ.kjoʊ/ (commonly transcribed TOH-kyoh) has two syllables for most speakers; some break /kjoʊ/ into /ki.oʊ/ and get three.11 The two-syllable form is the usual one.
The written form Tokyo, the passport spelling, strips the macrons that Hepburn writes over both long vowels.1213 The spelling hides the two beats that the Japanese form carries. An English reader has no spelling cue telling them where those long vowels are.
Japanese 東京 (とうきょう) as four morae
The kana breakdown of 東京 is と・う・きょ・う, four morae. きょ is one yōon mora because the small ょ fuses with き. Each う is a separate chōon mora lengthening the preceding /o/.378
東京3
"Tokyo."
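A learner-friendly way to see the mismatch is to lay the two counts side by side.
| Form | Units | Count |
|---|---|---|
| Japanese とうきょう | と・う・きょ・う | 4 morae |
| English Tokyo | TOH・kyoh | 2 syllables |
The comparison shows the article's main point in miniature: same word, two units, two different counts.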
Standard Tokyo Japanese gives 東京 a heiban (type-0) pitch contour: flat, with no drop. The first mora is low, and the remaining morae plus any following particle are high.14153 Pitch is a separate layer from mora timing. The four equal-time beats sit underneath whichever contour the dialect assigns.
東京駅で会いましょう。14
"Let's meet at Tokyo Station."
東京は日本の首都です。14
"Tokyo is the capital of Japan."
Why the English form lost two beats
There are three romanizations of the same word. Each one tells the learner something different about the morae:1213
| Notation | Spelling | What it shows |
|---|---|---|
| Modified Hepburn | Tōkyō | Both long vowels marked with macrons |
| Wāpuro | Toukyou | Both long vowels spelled out as ou, following the kana |
| Passport / road sign | Tokyo | Both long vowels invisible |
All three encode the same とうきょう. None of them is wrong. The bare Tokyo spelling is the only one that hides the morae. It is also the spelling almost every English speaker meets first.
More worked loanwords
Karaoke: 4 English syllables vs 4 Japanese morae
カラオケ has four morae: カ・ラ・オ・ケ. There are no long vowels, no ん, and no っ, so the beat count survives the trip into English; what changes is vowel quality and stress.16 Standard Tokyo pitch is heiban (type 0).1416
カラオケに行きませんか。14
"Want to go to karaoke?"
The word is wasei-eigo, or Japanese-made English: a Japanese compound of kara 空 ("empty") + oke (clipped from オーケストラ, "orchestra"). It was coined in 1970s Japan and borrowed back into English from around 1979.1617 The common English pronunciation /ˌkæriˈoʊki/ turns /ka/ + /ra/ into "carry", changes the final /e/ to /i/, and puts primary stress on the third syllable. That is still four English syllables for four Japanese morae, but English stress stretches /oʊ/ and shortens its neighbours, so the four beats are no longer equal.11
Pokémon: 3 English syllables vs 4 Japanese morae
ポケモン has four morae: ポ・ケ・モ・ン. The final ん is its own mora: a full beat, not a coda riding on /mo/.1812
ポケモンが好きです。14
"I like Pokémon."
The etymology is wasei-eigo again: a clipped compound of ポケット (poketto, "pocket") + モンスター (monsutā, "monster"). The franchise debuted in Japan in 1996 and in English in 1998.18 English /ˈpoʊ.keɪ.mɒn/ assigns three syllables (POH-kay-mon). ん is absorbed into the final English syllable instead of getting its own beat.11
The moraic nasal ん has several allophones (uvular, alveolar, bilabial, velar) depending on what follows it. Before a pause, as in ポケモン at the end of a sentence, ん is typically realized with a uvular or velar closure made toward the back of the mouth, not the alveolar [n] English speakers default to. Whatever its allophone, it is a full mora with its own beat in time.1
Judo: 2 English syllables vs 4 Japanese morae
じゅうどう (柔道) has four morae: じゅ・う・ど・う. じゅ is one yōon mora; each う is a chōon mora lengthening the preceding vowel.12717 The English spelling judo collapses both long vowels and renders the word as two syllables.
柔道を習っています。14
"I'm learning judo."
The etymology is 柔 jū ("soft, gentle") + 道 dō ("way"). The word naturalized into English without macrons by the early twentieth century.17 The same shape recurs across Japanese vocabulary: a two-kanji compound where every kanji carries a long-vowel mora. English routinely flattens it.
Bonus passes: sake, sumo, anime, Honshu, Shinjuku
Five more words show where the mora count matches and where it does not:
| Word | Kana | Morae | English syllables | What English loses |
|---|---|---|---|---|
| 酒 sake | さ・け | 2 | 2 ("sake") | Final vowel quality (/e/ to /i/), not mora count1417 |
| 相撲 sumō | す・も・う | 3 | 2 ("sumo") | The chōon う lengthening /o/12 |
| アニメ anime | ア・ニ・メ | 3 | 3 ("anime") | Vowel qualities only; mora count matches19 |
| 本州 Honshū | ほ・ん・しゅ・う | 4 | 2 ("Honshu") | Both ん and the chōon う127 |
| 新宿 Shinjuku | し・ん・じゅ・く | 4 | 3 ("Shinjuku") | The ん rides on /ʃɪn/127 |
本州は日本の島です。14
"Honshu is an island of Japan."
The pattern across the table is the same one Tokyo, Pokémon, and judo show. When the Japanese form carries a chōon, an ん, or both, the English form drops the special-mora beats and keeps only the obvious vowel syllables.
How to retrain the timing
Mora timing is slow to install because the English stress-timed pattern is overlearned. The shadowing literature reports measurable pronunciation gains within 2–4 weeks of daily 10–20 minute practice. Even advanced English first-language learners narrow the gap with native durations but do not fully close it.520 The three drills below come from established pronunciation-training practice and move from the most mechanical to the most generative.
Drill 1: count beats by tapping
Tap one finger per mora while reading a printed kana word at a steady pace, around 100–150 ms per tap. Equal time per tap matters more than the exact tempo. Tanaka and Kubozono's pronunciation teaching text recommends explicit mora counting and beat tapping as the foundational step for learners whose first language is not mora-timed.4109
For とうきょう, that means four taps in equal time: と・う・きょ・う. If two of the taps want to merge, that is the English habit speaking. Start the word over.
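As a rough sketch of the tapping drill, the helper below lays out when each tap should fall if every mora gets the same slot. The function name `tap_schedule` and the default of 125 ms (the midpoint of the 100–150 ms range suggested above) are choices made for this example only.

```python
def tap_schedule(morae: list[str], ms_per_mora: int = 125) -> list[tuple[str, int]]:
    """Return (mora, onset in ms) pairs with one equal-length slot per mora."""
    return [(mora, index * ms_per_mora) for index, mora in enumerate(morae)]


# 東京 split by hand into its four morae.
for mora, onset in tap_schedule(["と", "う", "きょ", "う"]):
    print(f"{onset:>4} ms  tap  {mora}")
```

The onsets are evenly spaced whatever tempo is chosen, which is the point of the drill: changing `ms_per_mora` moves every tap, but never lets two taps share a slot.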
Drill 2: minimal pairs that hurt to miss
A small set of minimal pairs brings the contrast into focus because the wrong duration produces the wrong word.
| Short form | Long form | What changes |
|---|---|---|
| おばさん "aunt" | おばあさん "grandmother" | One chōon mora (3 vs 4 morae)10 |
| おじさん "uncle" | おじいさん "grandfather" | One chōon mora (3 vs 4 morae)9 |
| ここ "here" | こうこう "high school" | Two chōon morae (2 vs 4 morae)9 |
| きて "come (te-form)" | きって "stamp" | One sokuon mora (2 vs 3 morae)17 |
Record yourself reading each pair. Compare it against a native recording, and keep going until the duration ratio between the two columns matches.
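One way to make "until the duration ratio matches" concrete is to measure each recording and compare short-to-long ratios. The sketch below assumes you have already measured word durations in milliseconds with an audio editor. All numbers are hypothetical, and the function names and the 0.05 tolerance are arbitrary choices for illustration, not values from the literature.

```python
def duration_ratio(short_ms: float, long_ms: float) -> float:
    """Duration of the short form divided by the duration of the long form."""
    return short_ms / long_ms


def matches_native(learner: tuple[float, float],
                   native: tuple[float, float],
                   tolerance: float = 0.05) -> bool:
    """True when the learner's short/long ratio is within tolerance of the native one."""
    return abs(duration_ratio(*learner) - duration_ratio(*native)) <= tolerance


# Hypothetical measurements for おばさん / おばあさん, as (short form, long form).
native_pair = (450.0, 600.0)
first_attempt = (450.0, 500.0)   # long form too short: the chōon was compressed
later_attempt = (450.0, 590.0)

print(matches_native(first_attempt, native_pair))
print(matches_native(later_attempt, native_pair))
```

Comparing ratios, not raw durations, means your own speaking speed does not have to match the native recording, only the proportion between the two words.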
Drill 3: re-romanize your loanword list
Take every English word you know that comes from Japanese, write its kana spelling beside it, and re-romanize the kana into wāpuro form: Tokyo → Toukyou, judo → juudou, Honshu → Honshuu, sumo → sumou.1213 If the wāpuro form has more letters than the passport form, the extra letters mark hidden morae you have been dropping. Add each such word to the daily tap-and-shadow list until the longer form feels natural.
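The letter-count check in this drill can be sketched as a short script. The word list is typed in by hand from the examples in this article, and `LOANWORDS` and `hidden_letters` are names invented for the sketch. Nothing here converts kana automatically.

```python
# Passport-style spelling -> wāpuro spelling, entered by hand.
LOANWORDS = {
    "Tokyo": "Toukyou",
    "judo": "juudou",
    "Honshu": "Honshuu",
    "sumo": "sumou",
    "sake": "sake",
}


def hidden_letters(passport: str, wapuro: str) -> int:
    """Number of extra letters the wāpuro spelling carries."""
    return len(wapuro) - len(passport)


# Words whose wāpuro form is longer go on the daily tap-and-shadow list.
drill_list = [word for word, wapuro in LOANWORDS.items()
              if hidden_letters(word, wapuro) > 0]

for word in drill_list:
    print(word, "->", LOANWORDS[word], f"(+{hidden_letters(word, LOANWORDS[word])})")
```

A letter count only surfaces long vowels. The ん in Honshu is already spelled n in both forms, so its beat has to be caught by counting morae in the kana.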
How long before this sticks
Two to four weeks of daily 5–10 minute drilling brings noticeable timing change on familiar words. Unfamiliar vocabulary takes longer because the first-language stress-timed pattern reasserts itself whenever the brain is busy retrieving meaning.520 The goal is not to count beats forever. It is to build enough timing reflex that the count happens without thought.
Good to know
Tokyo, Tōkyō, and Toukyou are the same word
The three forms are three notations for one Japanese form. Modified Hepburn writes the macron (Tōkyō). Wāpuro spells each long vowel out as the kana do (Toukyou). The road-sign and passport variant drops both the macrons and the extra vowel letters (Tokyo).
All three encode the same とうきょう, and none of them is wrong in its own context.1213 Only the bare Tokyo hides the morae from an English reader.
English stress is not Japanese timing
The English form Tokyo takes lexical stress on the first syllable (TOH-kyoh). The Japanese form 東京 gives each mora equal time and does not stack English-style stress.1415 Importing the English stress pattern produces a foreign-accented Japanese pronunciation. The first step is to give every mora its own slot, with no compression on the unstressed beats.
The silent っ is also a mora
The sokuon (っ) is a hold mora: a full mora-length silence or closure that English routinely ignores. kite (きて, 2 morae, "come") and kitte (きって, 3 morae, "stamp") are different words. The second mora in kitte is the silent っ. Learners undertime it because English has no comparable contrast.17 The correct form holds the closure on /t/ for one mora before release:
切手を買いました。14
"I bought a stamp."
Some "Japanese" English words hide morae
Tycoon (1857 in English) is from 大君 taikun ("great lord"), a title used for the Tokugawa shogun in foreign correspondence. たいくん has 4 morae (た・い・く・ん). English folds them into 2 syllables: the い merges into the diphthong /aɪ/, and the ん becomes the coda of /kuːn/.2117
Honcho (1947, U.S. military English) is from 班長 hanchō ("squad leader"). はんちょう has 4 morae (は・ん・ちょ・う). English collapses it to 2 syllables, dropping both the moraic ん and the chōon う.2217
Umami was coined in 1908 by chemist Ikeda Kikunae at Tokyo Imperial University. うまみ has 3 morae. English borrowed it with the mora count preserved, then flattened the vowel qualities in actual pronunciation.23
Kyoto and Osaka hide long vowels too
Kyoto (京都, Kyōto) has 3 morae (きょ・う・と) but only 2 English syllables. Its one chōon is dropped. Osaka (大阪, Ōsaka) has 4 morae (お・お・さ・か) but only 3 English syllables. The initial chōon is dropped.1213 The pattern is identical to Tokyo: the passport spelling strips the long-vowel marker, and the English pronunciation drops the mora.
Tokyo means "eastern capital" and dates from 1868
東 (east) + 京 (capital). The name marks the 1868 imperial proclamation that renamed Edo and moved the capital from Kyōto to the new eastern capital.324 The four-mora pronunciation has been stable since then.
See also
- The Small つ (Sokuon): How to Read and Pronounce the Geminate Consonant
- Japanese Pitch Accent: A Complete Beginner's Guide
- The Japanese Vowel Inventory: Five Vowels, Done Right
- Japanese Pitch-Accent Notation: How to Read 0, 1, 2, 3 and the Overline Diagrams
- Japanese Speech Rate: How Fast Do Native Speakers Actually Talk?