
Common Romaji Mistakes That Mislead Pronunciation

The common romaji mistakes that mislead English readers all share one root: romaji recycles the five vowel letters and the consonants r and f for Japanese sounds that those letters do not spell in English. As a result, sounding out a hiragana chart in romaji produces five predictable misfires.12 This article previews those five pitfalls: vowel quality, the flap r, the bilabial f in ふ, devoiced u, and the long-vowel notation crisis. It also points each one to the deeper treatment in the pronunciation pillar. It is written for absolute beginners, perhaps two weeks in, with kana still in progress, who have noticed that reading Japanese from a romaji chart produces something a Japanese listener struggles to recognize.3

The Short Answer

Romaji uses English letters for sounds that are not English

The same Latin letters that romaji uses for Japanese kana spell different sounds in English. The five Japanese vowels /a i u e o/ are pure, short, and consistent in every position, while their English counterparts vary widely with stress and frequently glide into diphthongs.12

The Japanese liquid phoneme /r/ is realized in standard Tokyo Japanese as an apical alveolar tap [ɾ], a single quick contact between the tongue tip and the alveolar ridge. It is neither the English approximant [ɹ] nor a Spanish trill.452

The kana ふ is the voiceless bilabial fricative [ɸ], produced with both lips, not the English labiodental [f] produced with the lower lip against the upper teeth.562

High vowels /i/ and /u/ are near-obligatorily devoiced in standard Tokyo Japanese between two voiceless consonants and after a voiceless consonant before a pause. This is why です sounds like "des" and すき sounds like "ski."789

Long-vowel notation in romaji fractures into at least four conventions (the macron ō, the doubled ou or oo, the MOFA passport OH for personal names, and the macron-stripped Hepburn fallback o), all spelling the same Japanese sound /oː/.3101112

What this article is and is not

This article previews each of the five pitfalls and points you toward the right next step. It is not a full pronunciation lesson. The full phonetic treatment of each sound belongs to the pronunciation pillar: vowel quality, the flap r, the bilabial ɸ, devoicing rules, and the realization of long vowels. Each is linked from the pitfall that introduces it.3

The University of Tokyo Komaba style guide supports this beginner-level (pre-JLPT N5) framing. The guide notes that learners encountering Japanese for the first time can "guess the actual sounds of these syllables more accurately from the Hepburn romanizations" than from other systems. It also warns that the approximation is not exact.3

Romaji is a kana mapping, not a pronunciation guide

Romaji (ローマ字, literally "Roman letters") is a kana-to-Latin mapping. Its design goal is to make Japanese readable to non-Japanese audiences who cannot read kana, not to teach the phonetic value of each kana.313

Why Romaji Misleads English Readers in the First Place

Romaji is a transliteration of kana, not a pronunciation guide

Romaji reuses the five Latin vowel letters because they are the available alphabet, not because Japanese vowels match English ones. The Komaba style guide frames the trade-off directly: "English speakers who do not know Japanese can guess the actual sounds of these syllables more accurately from the Hepburn romanizations" than from Kunrei-shiki. But the same guide treats this as a design compromise, not as phonetic accuracy.3

For the system-level policy framing, including how Hepburn diverges from Kunrei-shiki and Nihon-shiki, see the J-Compass article Romaji Explained: Hepburn, Kunrei-shiki, and Nihon-shiki. This article assumes that framing and focuses only on what romaji does to a learner's ear.

Hepburn was designed for English speakers, but the English-speaker assumption is the trap

Modified Hepburn is the system most learners encounter because Japanese government documents, passport offices, JR station signs, and the December 22, 2025 Cabinet notification all standardize on it for foreign-facing communication.101411 The system was chosen because an English reader sounding it out lands closer to the Japanese than other systems would. That is exactly what lulls the reader into thinking the Latin letters work the way they look.

The trap is not Hepburn's failure. It is the gap between "closer than other systems" and "phonetically accurate." Every pitfall in this article lives in that gap.3

The same kana, the same sound, two different romaji conventions:

東京(とうきょう)は大(おお)きい都市(とし)です。310
"Tokyo is a big city."

Modified Hepburn writes that sentence as Tōkyō wa ōkii toshi desu, while Kunrei-shiki writes Tôkyô wa ôkii tosi desu. The kana and the spoken word are identical; only the romanization disagrees.310

The Five Pitfalls

Pitfall 1: The five vowels are not English vowels

Japanese has five short vowel phonemes, /a i u e o/, each with a length contrast (long counterparts /aː iː uː eː oː/). Each vowel keeps a consistent quality across every position. There is no English-style reduction to schwa in unstressed syllables and no diphthong glide on /e/ or /o/.12

In standard Tokyo Japanese, the IPA realizations are: /a/ as an open central or near-central [ä], /i/ as a close front [i], /u/ as a close near-back vowel with unrounded or compressed lips [ɯ̟] or [ɯ̟ᵝ] (not the rounded English [u] of "boot"), /e/ as a mid front [e̞], and /o/ as a mid back rounded [o̞].52

The English-reader trap has two parts. First, readers may treat romaji a as the English [eɪ] of "say" or the [æ] of "cat" instead of [ä]. They may read e as the English [iː] of "see" or the schwa of "the" instead of [e̞], and o as the diphthong [oʊ] of "go" instead of the pure [o̞].1 Second, readers may treat romaji u as the rounded English [u] of "boot," forgetting that Japanese /u/ is unrounded or lip-compressed.52

Pure short vowels throughout a short sentence:

カラオケをうたう。1516
"I sing karaoke."

The diphthong-free o in a place name, and the schwa-free e of the direction particle へ:

京都(きょうと)へ行(い)きます。3
"I am going to Kyoto."

The unrounded u:

うみはきれいです。52
"The sea is beautiful."

This pitfall is only a preview. The deep treatment of vowel quality, including the unrounded u and the pure mid e and o, belongs to the pronunciation pillar.

Pitfall 2: The Japanese r is a flap, not the English r (and not the Spanish trill)

The Japanese liquid phoneme /r/ is most commonly realized as an apical alveolar tap [ɾ]. The tongue tip makes a single brief contact with the alveolar ridge (the bony ridge just behind the upper teeth) and is released by the airflow.452

The same articulation produces the English [ɾ] that occurs as the t or d in American "butter," "ladder," and "city" between vowels. It is not the English approximant [ɹ] of "red," nor the Spanish trill [r] of "perro."42

Allophonic variation (variation in how the same phoneme is pronounced) is wide in Tokyo Japanese. The lateral approximant [l], the apical postalveolar tap [ɾ̠], and other variants occur, but the canonical realization for the romaji-reading learner to internalize is the tap [ɾ].4172

The English-reader trap is to harden /r/ into the English approximant (ら as "raw") or to trill it on the Spanish model (ら as "rrra"). The closer English analogue is the t in "butter," not the r in "red."42

The flap in a common verb:

食(た)べる4
"to eat."

The flap in a frequent noun:

さくら4
"cherry blossom."

The flap reduplicated in a mimetic word:

ぱらぱら418
"(rain) pattering / drizzling."

The English r and Japanese r trap is bidirectional

Japanese learners of English often struggle to keep English /r/ and /l/ distinct because Japanese has a single liquid phoneme, /r/, typically realized as the apical alveolar tap [ɾ], and English [ɹ] and [l] both map onto it perceptually. English learners of Japanese over-correct in the opposite direction, hardening every Japanese flap into an English [ɹ].42

Pitfall 3: The fu in ふ is made with the lips, not the lip-and-teeth

The kana ふ realizes /h/ before /u/ as the voiceless bilabial fricative [ɸ]. It is produced by forcing air through a narrow gap between nearly closed lips, with no involvement of the upper teeth.562

The English consonant written f is the voiceless labiodental fricative [f], produced by placing the lower lip against the edge of the upper teeth. The two sounds are acoustically and articulatorily distinct.6

ふ is the only kana in the は-row whose Modified Hepburn romanization breaks the row's h- pattern: the row is ha, hi, fu, he, ho. Kunrei-shiki keeps the regular hu because ふ is phonemically /hu/, with [ɸ] as the allophone of /h/ before /u/. If you see fu in one source and hu in another, you are not looking at two different sounds.532
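The phonemic-versus-phonetic split can be pictured as a minimal Python sketch. The helper names `kunrei_row` and `hepburn_ha_row` are hypothetical, and this is an illustration rather than an official conversion table: Kunrei-shiki fills the row mechanically as the consonant /h/ plus each vowel, while Hepburn starts from the same grid and overrides the single cell for ふ.

```python
VOWELS = "aiueo"        # vowel order of a kana row: a, i, u, e, o
HA_ROW = "はひふへほ"


def kunrei_row(kana_row: str, consonant: str) -> dict[str, str]:
    """Kunrei-style (phonemic): every cell is simply consonant + vowel."""
    return {kana: consonant + vowel for kana, vowel in zip(kana_row, VOWELS)}


def hepburn_ha_row() -> dict[str, str]:
    """Hepburn-style: start from the regular grid, then patch the one cell
    whose sound [ɸɯ] an English ear hears as closer to f than to h."""
    row = kunrei_row(HA_ROW, "h")
    row["ふ"] = "fu"    # overwrites in place, so the row order is kept
    return row


print(list(kunrei_row(HA_ROW, "h").values()))  # ['ha', 'hi', 'hu', 'he', 'ho']
print(list(hepburn_ha_row().values()))         # ['ha', 'hi', 'fu', 'he', 'ho']
```

Both rows describe the same five kana and the same five sounds; only the spelling policy for one cell differs.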

The English-reader trap is to pronounce romaji fu with a clean English [f]. The kana represents [ɸɯ], which an English speaker can approximate by shaping the lips as for blowing out a candle.52

The canonical word with the bilabial fu:

富士山(ふじさん)が見(み)える。5
"Mt. Fuji is visible."

Native vocabulary with ふ:

布団(ふとん)を敷(し)く。5
"I lay out the futon."

ふ in a frequent noun:

服(ふく)を着(き)る。52
"I put on clothes."

Pitfall 4: The u in です and ます is often silent

Tanner, Sonderegger, and Torreira (2019), following Maekawa and Kikuchi (2005) and Fujimoto (2015), state the Tokyo-Japanese devoicing rule this way: high vowels /i/ and /u/ are near-obligatorily devoiced between two voiceless consonants or after a voiceless consonant before a pause.789 The rule applies only to the short high vowels /i/ and /u/. Long vowels, non-high vowels (/a e o/), and high vowels next to a voiced consonant or another vowel are not devoiced under the same rule.78
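The environment the rule describes can be checked mechanically. The sketch below is a simplification under stated assumptions: words are given as lists of Hepburn-style morae (a hypothetical input format), the end of the list stands for a pause, and the hypothetical `devoiced_positions` helper ignores factors such as pitch accent and runs of consecutive devoicable vowels. It only flags the environment; it does not predict individual speakers.

```python
VOWELS = set("aiueo")
# Hepburn-style onsets of voiceless consonants (simplified; palatalized forms included).
VOICELESS_ONSETS = {"k", "ky", "s", "sh", "t", "ch", "ts", "h", "hy", "f", "p", "py"}


def split_mora(mora: str) -> tuple[str, str]:
    """Split a mora such as 'shi' into onset 'sh' and vowel 'i'.
    A mora without a final vowel (e.g. syllabic 'n') returns (mora, '')."""
    if mora[-1] in VOWELS:
        return mora[:-1], mora[-1]
    return mora, ""


def devoiced_positions(morae: list[str]) -> list[int]:
    """Indices of short /i/ or /u/ in the devoicing environment:
    after a voiceless consonant AND (before a voiceless consonant OR before a pause)."""
    positions = []
    for i, mora in enumerate(morae):
        onset, vowel = split_mora(mora)
        if vowel not in ("i", "u") or onset not in VOICELESS_ONSETS:
            continue
        if i + 1 == len(morae):  # end of the list stands for a pause
            positions.append(i)
            continue
        next_onset, _ = split_mora(morae[i + 1])
        # A bare vowel next (e.g. 'su' + 'u') is a long vowel or vowel sequence,
        # which the rule excludes; its onset is '' so the test below fails.
        if next_onset in VOICELESS_ONSETS:
            positions.append(i)
    return positions
```

For example, `devoiced_positions(["de", "su"])` flags index 1, the す of です, and `devoiced_positions(["su", "ki", "de", "su"])` flags both す morae (indices 0 and 3), matching the "ski" and "des" pronunciations.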

The NHK Pronunciation Accent Dictionary, the authoritative reference for standard Tokyo broadcasting Japanese, marks devoiced vowels in its entries as the normative realization in those environments.18

The English-reader trap is to pronounce every romanized vowel because every letter is on the page. The romaji desu and masu write u because the kana す is on the page. But in the standard pre-pausal environment (after the voiceless [s], before a pause), the vowel is realized voiceless or absent.78918

The copula です in pre-pausal position:

学生(がくせい)です。7818
"I am a student."

The polite suffix ます in pre-pausal position:

行(い)きます。7818
"I will go."

The vowel /u/ between two voiceless consonants:

スキーが好(す)きです。789
"I like skiing."

The vowel /i/ between two voiceless consonants:

雨(あめ)が降(ふ)りました。789
"It rained."

Devoicing is a feature, not a casual contraction

A learner who hears "des" instead of "desu" sometimes assumes the speaker is being casual or sloppy. The opposite is true. The devoiced form is the standard Tokyo realization, marked as such in the NHK Pronunciation Accent Dictionary. Over-articulating the final /u/ of です and ます produces a non-native pronunciation, not a more formal one.7818

Pitfall 5: The long-vowel notation crisis (ō vs ou vs oo vs oh)

The single Japanese sound /oː/ appears in print under at least four conventions. The disagreement is between romanization systems, not between dialects or speakers.310

Modified Hepburn writes long /oː/ with a macron: Tōkyō, Kyōto, Ōsaka. The Komaba style guide accepts the circumflex (Tôkyô) as the typographic fallback when macrons cannot be set.3 The doubled-letter form (oo or ou) is the wāpuro / IME typing convention, in which the letters follow the kana spelling: typing toukyou produces とうきょう, and typing ookii produces おおきい. The 1954 Cabinet Notification accepted oo for native long vowels, and the December 22, 2025 Cabinet notification continues to allow both macron and doubled-letter forms.3101413
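Why the typed letters must mirror the kana becomes clear from how the conversion works. The sketch below is a toy greedy longest-match converter over a hypothetical six-entry table; real IMEs use far larger tables and more rules, and `to_kana` and `WAPURO_TABLE` are illustrative names only.

```python
# Hypothetical, deliberately tiny wāpuro table: only the kana needed here.
WAPURO_TABLE = {
    "kyo": "きょ", "to": "と", "ki": "き",
    "o": "お", "u": "う", "i": "い",
}
MAX_KEY = max(len(key) for key in WAPURO_TABLE)


def to_kana(typed: str) -> str:
    """Convert typed Latin letters to kana by always taking the longest match."""
    out, i = [], 0
    while i < len(typed):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(MAX_KEY, len(typed) - i), 0, -1):
            chunk = typed[i:i + size]
            if chunk in WAPURO_TABLE:
                out.append(WAPURO_TABLE[chunk])
                i += size
                break
        else:
            raise ValueError(f"no kana for input at position {i}: {typed[i:]!r}")
    return "".join(out)


print(to_kana("toukyou"))  # とうきょう
print(to_kana("ookii"))    # おおきい
print(to_kana("tookii"))   # とおきい, not おおきい
```

Because every letter maps to a kana, the typing form has to write ou where the kana has う and oo where it has お, which is why wāpuro spellings give Toukyou but ookii.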

Macron-stripped Hepburn (Tokyo, Kyoto, Osaka) is the most common public-facing form. Most English-language books, signage, and news outlets drop diacritics that their typesetting tools cannot reliably set. The underlying kana and sound are unchanged.3

The MOFA passport "OH" form is a personal-name convention only. Japanese passport applicants may render long ō in their surname as OH (permitted from 2000-04-01) or as OO / OU (permitted from 2008-02-01). This gives four legal spellings of 大野 on a passport: Ono, Ohno, Oono, Ouno.1112

The English-reader trap is to treat Tōkyō, Toukyou, Tookyoo, and Tokyo as four different words. Another trap is to read the macron-stripped Tokyo as if both o's were short, shrinking the word to a two-mora to.kyo. The actual word is the four-mora /toː.kjoː/ (とう・きょう).310
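A mora count like the four of とうきょう can be read straight off the kana. A minimal sketch, assuming kana input and the usual convention that every kana is one mora except the small glide and small vowel kana (ゃ, ゅ, ょ, ぁ and so on), which fuse with the kana before them; small っ and the long-vowel mark ー still count. `count_morae` is a hypothetical helper.

```python
# Small kana that fuse with the preceding kana instead of forming their own mora.
FUSING_SMALL_KANA = set("ゃゅょぁぃぅぇぉャュョァィゥェォ")


def count_morae(kana: str) -> int:
    """Count morae in a kana string: one per kana, except fusing small kana.
    Small っ/ッ (geminate) and ー (long-vowel mark) each count as one mora."""
    return sum(1 for ch in kana if ch not in FUSING_SMALL_KANA)


for word in ["とうきょう", "カラオケ", "おおきい", "せんせい"]:
    print(word, count_morae(word))  # each word prints 4
```

The five letters of the macron-stripped Tokyo give no such signal, which is exactly where the short-vowel misreading comes from.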

The same kana spelled five ways:

東京(とうきょう)3101112
"Tokyo."

Modified Hepburn renders that as Tōkyō. The wāpuro typing form is Toukyou. The 1954 native-long-vowel doubled form is Tookyoo. The macron-stripped Hepburn fallback used on most signage and in most English-language print is Tokyo. A personal-name-only OH form would be Tohkyoh if the word were a surname. It is not, so this fifth form is illustrative only.
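The four non-passport spellings can be derived mechanically from one kana-faithful mora list, which underlines that they are a single word. The sketch below is a simplification under stated assumptions: input is a list of wāpuro-style morae (one Latin chunk per kana, a hypothetical format), `render` and its style names are illustrative only, and morpheme boundaries are ignored, so it would wrongly merge the o + u of a verb such as おもう (omou). Long vowels are formed from a+a, u+u, e+e, o+o, and o+u; i+i and e+i are left alone, matching Modified Hepburn's ōkii and sensei.

```python
VOWELS = set("aiueo")
# Vowel pairs that merge into one long vowel (e+i and i+i deliberately excluded).
LONG_PAIRS = {("a", "a"), ("u", "u"), ("e", "e"), ("o", "o"), ("o", "u")}
MACRON = {"a": "ā", "u": "ū", "e": "ē", "o": "ō"}
CIRCUMFLEX = {"a": "â", "u": "û", "e": "ê", "o": "ô"}
STYLES = {"wapuro", "macron", "circumflex", "doubled", "stripped"}


def merge_long_vowels(morae: list[str]) -> list[tuple[str, str, bool]]:
    """Turn morae like ['to', 'u', 'kyo', 'u'] into (onset, vowel, is_long) syllables."""
    syllables = []
    for mora in morae:
        if syllables and mora in VOWELS:
            onset, vowel, is_long = syllables[-1]
            if not is_long and (vowel, mora) in LONG_PAIRS:
                syllables[-1] = (onset, vowel, True)
                continue
        syllables.append((mora[:-1], mora[-1], False))
    return syllables


def render(morae: list[str], style: str) -> str:
    """Spell one word under a given long-vowel convention."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style}")
    if style == "wapuro":  # kana-faithful: keep every mora as typed
        return "".join(morae)
    parts = []
    for onset, vowel, is_long in merge_long_vowels(morae):
        if not is_long or style == "stripped":  # short vowel, or length dropped
            parts.append(onset + vowel)
        elif style == "macron":
            parts.append(onset + MACRON[vowel])
        elif style == "circumflex":
            parts.append(onset + CIRCUMFLEX[vowel])
        else:  # "doubled"
            parts.append(onset + vowel * 2)
    return "".join(parts)


TOKYO = ["to", "u", "kyo", "u"]
for style in ["macron", "wapuro", "doubled", "stripped", "circumflex"]:
    print(style, render(TOKYO, style))
```

Run as written, the loop prints tōkyō, toukyou, tookyoo, tokyo, and tôkyô, the last being the circumflex fallback. The passport OH form is deliberately left out because it applies only to personal names.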

A personal name with four legal passport spellings:

大野(おおの)さん1112
"Mr. / Ms. Ono."

On a Japanese passport, that surname may be rendered Ono (standard passport Hepburn, which leaves the long vowel unmarked), Ohno (MOFA "OH" allowance), Oono (MOFA "OO" allowance), or Ouno (MOFA "OU" allowance). All four spellings are legally valid for the same person.1112

A common adjective where the doubled-letter form is native, not typing-only:

おおきいいえ313
"A big house."

The two romaji renderings are Ōkii ie (Modified Hepburn with macron) and Ookii ie (1954 Cabinet doubled form).313 For the system-level policy explanation, see the J-Compass article Romaji Explained: Hepburn, Kunrei-shiki, and Nihon-shiki. It covers the long-vowel notation crisis at the level of the systems themselves.

The OH passport form is for personal names only

The MOFA "OH" allowance is restricted to personal names on Japanese passports. It is not a legal spelling for place names or common nouns. The five-spelling illustration for Tokyo above lists Tohkyoh only to show what the convention would produce. The actual public-facing spelling of 東京 varies among Tōkyō, Tokyo, and Toukyou, never Tohkyoh.1112

Preview: Where Each Pitfall Gets Fixed Properly

Each of the five pitfalls has a dedicated treatment in the pronunciation pillar: vowel quality for Pitfall 1, the flap r for Pitfall 2, the bilabial fricative [ɸ] for Pitfall 3, the high-vowel devoicing rule for Pitfall 4, and the realization of long vowels for Pitfall 5. Each is linked from the pitfall above where it first comes up.

The practical fix is twofold.

First, hear the sound, do not read it. A kana app with audio teaches the vowels, the flap, and the bilabial fu faster than any chart of letters. A romaji chart can only show Latin letters, and every romaji chart is a source of the misfires this article catalogs.3

Second, move off romaji entirely once kana is reliable. The disagreements between romaji systems disappear the moment you read kana. That is the deeper argument running through every pitfall above.310

Good to know

Karaoke is the canonical worked example

The Japanese word カラオケ (karaoke) is a clipped compound of 空 (kara, "empty") and オケ (oke, short for オーケストラ ōkesutora "orchestra"). It is pronounced as four morae, /ka.ɾa.o.ke/.1516 The word packs four of this article's points into one breath: pure short vowels, the apical alveolar flap [ɾ] for the r, no diphthong on the final e, and no long vowels to confuse.411516

The English mispronunciation KAR-ee-oh-kee illustrates the vowel and flap traps the learner is trying to escape: English [æ] in the first syllable, [iː] in place of the second a, an English approximant [ɹ] for the flap, an English diphthong [oʊ] on the third, and [iː] on the final mora that should be a short [e̞].12 A learner who can say ka-ra-o-ke correctly has already internalized most of the English-reader vowel-and-flap traps.

The correct realization:

カラオケ1516
"karaoke."

Why fu looks like an exception in the kana chart

The は-row in hiragana romanizes as ha, hi, fu, he, ho in Modified Hepburn, not hu. Hepburn writes fu because, to an English ear, the sound is closer to f than to h. Kunrei-shiki writes hu because ふ phonemically belongs to the h-row, with [ɸ] appearing as the allophone of /h/ before /u/.532 If you encounter both fu and hu in different sources, you are not looking at two different sounds.

For the policy-level treatment of why the two systems diverge here, see the J-Compass article Romaji Explained: Hepburn, Kunrei-shiki, and Nihon-shiki.

Why the Tokyo problem is a romanization problem, not a pronunciation problem

The four common romaji spellings of Tokyo (Tōkyō, Toukyou, Tookyoo, Tokyo) all represent the same four-mora Japanese word とうきょう. A fifth Tohkyoh-style spelling appears only on Japanese passports for personal names.101112 The disagreement is between romanization conventions, not between dialects or speakers. Once you see the kana, the confusion disappears. That is the deeper argument for moving off romaji as soon as kana is reliable.3

The えい sequence is the pure-vowels rule's one exception worth flagging

The sequence えい (ei) in modern Tokyo Japanese is most often realized as a long [eː]. That means せんせい (sensei) sounds phonetically like [senseː]. The standard romanizations (Hepburn, Kunrei-shiki, and the December 22, 2025 Cabinet notification) all keep the spelling ei, not ē, because Japanese phonology and orthography treat the sequence as e + i rather than as a long vowel.310

The romaji notation is faithful to the kana, not to the pronunciation. Here, sounding out the letters exactly as written (e followed by i) drifts away from the usual Tokyo realization [eː]. The sequence is also the cleanest small example of the broader point that romaji is a kana mapping, not a pronunciation guide.3


References

Footnotes

  1. Tsujimura, Natsuko. An Introduction to Japanese Linguistics. 3rd edition. Wiley-Blackwell, 2014.

  2. Wikipedia contributors. "Japanese phonology." Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/wiki/Japanese_phonology (limitation: used only as cross-verification for primary-source claims pinned to Vance, Okada, Tsujimura, and the IPA Handbook).

  3. Department of English Language / Komaba Organization for Educational Development, The University of Tokyo, Komaba. Recommended System for Romanizing Japanese, v1 (2009-04). http://park.itc.u-tokyo.ac.jp/eigo/UT-Komaba-Romanization-of-Japanese-v1.pdf

  4. Vance, Timothy J. The Sounds of Japanese. Cambridge University Press, 2008.

  5. Okada, Hideo. "Japanese." In Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet. Cambridge University Press, 1999, pp. 117–119.

  6. International Phonetic Association. Handbook of the International Phonetic Association. Cambridge University Press, 1999. Chart of pulmonic consonants (entry ⟨ɸ⟩, voiceless bilabial fricative, IPA number 126).

  7. Maekawa, Kikuo, and Hideaki Kikuchi. "Corpus-based analysis of vowel devoicing in spontaneous Japanese: an interim report." In Voicing in Japanese, edited by Jeroen van de Weijer, Kensuke Nanjo, and Tetsuo Nishihara. Mouton de Gruyter, 2005, pp. 205–228.

  8. Fujimoto, Masako. "Vowel devoicing." In Handbook of Japanese Phonetics and Phonology, edited by Haruo Kubozono. De Gruyter Mouton, 2015, pp. 167–214.

  9. Tanner, James, Morgan Sonderegger, and Francisco Torreira. "Durational Evidence That Tokyo Japanese Vowel Devoicing Is Not Gradient Reduction." Frontiers in Psychology 10 (2019): 821. https://www.frontiersin.org/articles/10.3389/fpsyg.2019.00821/full

  10. Agency for Cultural Affairs (文化庁). 「改定ローマ字のつづり方(答申)」, presented to the Minister of Education on August 20, 2025 (令和7年8月20日). https://www.bunka.go.jp/seisaku/bunkashingikai/sokai/pdf/94261201_01.pdf

  11. Ministry of Foreign Affairs of Japan (外務省). 「ヘボン式ローマ字綴方表」, official passport romanization table. https://www.ezairyu.mofa.go.jp/passport/hebon.html

  12. Tokyo Metropolitan Government, Citizens, Culture and Sports Bureau (東京都生活文化スポーツ局). 「氏名の表記(ヘボン式ローマ字等)」, passport-application guidance documenting MOFA non-Hepburn allowances including "OH" for long ō and "OO/OU" for おお/おう (effective 2000-04-01 and 2008-02-01 respectively). https://www.seikatubunka.metro.tokyo.lg.jp/passport/documents/0000000485

  13. Cabinet of Japan (内閣). 「ローマ字のつづり方」, Cabinet Notification No. 1 of 1954 (内閣告示第1号, 昭和29年12月9日), with Cabinet Directive No. 1 of the same date (内閣訓令第1号「ローマ字のつづり方の実施について」). Wikisource reproduction: https://ja.wikisource.org/wiki/ローマ字のつづり方_(昭和29年内閣告示第1号)

  14. 時事通信 (Jiji Press). 「ローマ字新表記、22日告示 約70年ぶり改定、ヘボン式基本に」, December 16, 2025. https://www.jiji.com/jc/article?k=2025121600387&g=soc

  15. 新村出 (Shinmura Izuru), ed. 『広辞苑』 (Kōjien), 7th edition. Iwanami Shoten, 2018. Entry: カラオケ (karaoke).

  16. 大辞林 (Daijirin), 4th edition. Sanseidō, 2019. Entry: カラオケ (karaoke).

  17. Akamatsu, Tsutomu. Japanese Phonetics: Theory and Practice. LINCOM Europa, 1997.

  18. 日本放送協会放送文化研究所 (NHK Broadcasting Culture Research Institute). 『NHK日本語発音アクセント新辞典』. NHK出版, 2016. https://www.monokakido.jp/ja/dictionaries/nhkaccent2/index.html