The Japanese ら-Row (ら り る れ ろ): The Alveolar Tap That Is Neither R Nor L
The Japanese ら-row (ら り る れ ろ) is built on a single consonant phoneme, conventionally written /r/. Its citation-form realization is the voiced alveolar tap [ɾ]: a single, brief contact of the tongue tip against the ridge behind the upper teeth.12 English ears tend to file each token under either /r/ or /l/ because English contrasts those sounds and Japanese does not. That is why "neither R nor L" is the most honest one-line description of the sound.13
Overview
The five kana ら り る れ ろ in hiragana and ラ リ ル レ ロ in katakana all share the same onset consonant, followed by /a, i, u, e, o/.14 That onset is one phoneme, not two. The perceptual scatter that English speakers hear comes from the listener, not the sound.35
Why this sound trips English speakers
English has two contrastive liquid phonemes (R-like and L-like sounds): /r/, a postalveolar approximant (typically [ɹ̠] in General American, often with rhotic constriction), and /l/, an alveolar lateral approximant.63 Swapping one for the other changes the word, as in right vs. light or row vs. low.6
Japanese has exactly one liquid phoneme.178 Its phonetic realization lands acoustically between the two English categories. English ears therefore tend to assign each token to whichever English bin it most resembles, producing the impression that the same kana is "sometimes R, sometimes L."
Best and Strange (1992) document the mismatch in both directions. Japanese listeners hear English /r/, /l/, and /w/ as variants of a single Japanese category, while English listeners hear Japanese /r/ tokens scatter between English /r/, /l/, and /d/.3
The diagram below shows the asymmetry. English maintains two boxes where Japanese maintains one; both languages cover roughly the same articulatory region, but they slice it differently.
Where you meet it
The same /r/ phoneme appears in the particle より (yori, "from / than"), in the conditional and provisional endings ~たら and ~れば, in Sino-Japanese morphemes such as 六 (roku, "six") and 料理 (ryōri, "cooking"), and in English loanwords such as ラジオ (rajio, "radio").94 Standard Modified Hepburn romanization writes the row as ra ri ru re ro, and Kunrei-shiki uses the same letters.1
The kana row also carries English /l/ and prevocalic English /r/ when words are borrowed into Japanese. The two source phonemes collapse onto one Japanese phoneme in loanword adaptation.9 (Postvocalic English /r/, as in car → カー, usually surfaces as vowel length rather than as a ら-row kana.) This is why ライト covers both English light and English right with no spelling distinction.
The phoneme: one Japanese /r/, not two
IPA: [ɾ] as the citation form
The standard citation form of Japanese /r/ is the voiced alveolar tap [ɾ]. This means a single, brief contact of the tongue tip against the alveolar ridge, with voicing throughout.126 The IPA Handbook's Japanese illustration (Okada 1999) transcribes the phoneme as [ɾ] in citation. It notes that the prototype is "an apical tap, either alveolar [ɾ] or postalveolar [ɾ̠]."2
Labrune summarizes the same point in one sentence: Japanese /r/ is "consistently voiced, apical, alveolar or post-alveolar, with a short or weak closure."7 Ladefoged and Maddieson classify the sound as a tap (a single quick gesture) rather than a flap (a glancing movement from a different starting position). The two IPA terms are often used interchangeably for [ɾ].6
The Japanese phonetic-terminology label for this category is 弾き音 (hajikion, "flicked sound"), which is the term used in the NHK pronunciation reference and in mainstream Japanese phonetics teaching.10
Why "neither R nor L" is a phonemic statement, not just acoustic
The R/L contrast in English is phonemic: swapping the two changes the word, as in right vs. light, row vs. low, and correct vs. collect.63 In Japanese, no minimal pair distinguishes a hypothetical /r/ from a hypothetical /l/, because there is only one phoneme. The contrast is not merely unused; Japanese phonology gives it nothing to attach to.178
Loanword adaptation makes the collapse visible. English light and English right both borrow as ライト (raito). English glass and English grass both borrow as グラス (gurasu). English lock and English rock both borrow as ロック (rokku).9 No mechanism inside Japanese spelling can keep the source distinction.
Vance frames the conclusion sharply: the Japanese phoneme is not "in between" English /r/ and /l/ in any phonological sense; it simply does not participate in a contrast that English maintains.1
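The loanword collapse described above can be made concrete with a small lookup. The following is a minimal sketch that uses only the three pairs cited in this section; the names `LOANWORDS` and `collisions` are illustrative, not part of any library.

```python
from collections import defaultdict

# English source word -> katakana spelling, taken from the examples above.
LOANWORDS = {
    "light": "ライト",
    "right": "ライト",
    "glass": "グラス",
    "grass": "グラス",
    "lock": "ロック",
    "rock": "ロック",
}

def collisions(loanwords):
    """Group English source words by the katakana spelling they share."""
    by_kana = defaultdict(list)
    for english, kana in loanwords.items():
        by_kana[kana].append(english)
    # Keep only spellings that more than one source word maps onto.
    return {kana: sorted(words) for kana, words in by_kana.items() if len(words) > 1}

for kana, words in collisions(LOANWORDS).items():
    print(kana, "<-", " / ".join(words))
```

Every katakana key ends up with two English sources, which is the many-to-one mapping the section describes: the spelling cannot be inverted to recover R or L.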
Articulation walkthrough
Step 1: find the alveolar ridge
The alveolar ridge is the bony shelf immediately behind the upper front teeth. In IPA terminology, "alveolar" means an articulation that places the tongue tip or blade at that ridge.611 For Japanese /r/, the active articulator is the tongue tip (apex), and the passive articulator is the alveolar ridge. This is what "apical alveolar" means in Labrune's description.712
English /r/, by contrast, is articulated with a bunched tongue body raised toward the postalveolar region, with no contact at the alveolar ridge.63 English /l/ holds the tongue tip in sustained contact at the alveolar ridge with airflow around the sides of the tongue.6 The Japanese target sits at the same place as English /l/ but with the contact released almost immediately.
Step 2: a single voiced tap, not a sustained contact
A tap is one quick gesture: the tongue tip moves to the alveolar ridge, makes momentary contact, and releases.611 The contact is too brief to build up oral pressure for a stop release, which is what keeps [ɾ] acoustically distinct from [d].26
Voicing is continuous through the gesture, which is what distinguishes [ɾ] from a voiceless tap.72 Spanish single-tap /ɾ/ (as in pero, "but") is articulatorily very close to Japanese /r/. Spanish trilled /r/ (as in perro, "dog") involves multiple taps and is not the Japanese target.13
Step 3: the "butter / water" English-tap anchor and its limits
In American English, /t/ and /d/ between vowels often surface as an alveolar tap [ɾ]. This is the intervocalic flapping rule: butter [ˈbʌɾɚ], water [ˈwɑɾɚ], city [ˈsɪɾi], ladder [ˈlæɾɚ].6 This intervocalic [ɾ] is phonetically the same gesture as the Japanese /r/ tap and is widely used as a pedagogical anchor for English-speaking learners.8
The limit of the anchor: American English never starts a word with [ɾ]; intervocalic flapping is a context-sensitive allophone of /t/ and /d/, not a word-initial option.6 Japanese uses /r/ in word-initial position freely in Sino-Japanese vocabulary and loanwords.4 Pure Yamato (native Japanese) vocabulary historically restricted morpheme-initial /r/. This is one reason native-stem ら-row words are sparse compared with Sino-Japanese and loanword ら-row words.14
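The anchor and its limit can be sketched as a toy rewrite rule. This is a deliberately simplified illustration: it works on hand-written phone lists, knows only the vowels needed for the examples, and ignores stress, which real American English flapping is sensitive to. The names `VOWELS` and `flap` are hypothetical.

```python
# Just enough vowel symbols for the example words; not a full inventory.
VOWELS = {"ʌ", "ɚ", "ɑ", "ɪ", "i", "æ"}

def flap(phones):
    """Replace /t/ or /d/ with [ɾ] when it sits between two vowels.

    Simplifying assumption: stress is ignored. The loop never visits the
    first or last phone, so a word-initial /t/ or /d/ is never flapped.
    """
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if phones[i] in ("t", "d") and phones[i - 1] in VOWELS and phones[i + 1] in VOWELS:
            out[i] = "ɾ"
    return out

print(flap(["b", "ʌ", "t", "ɚ"]))  # butter: the medial /t/ becomes a tap
print(flap(["t", "ɑ", "p"]))       # top: the initial /t/ is left alone
```

The second call shows the gap the learner has to bridge: the English rule never produces the tap at the start of a word, whereas Japanese needs it there.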
Step 4: word-initial vs. intervocalic positions
Intervocalic /r/ (between two vowels) is the closest to the textbook tap, as in あれ (are), それ (sore), 桜 (sakura), and 心 (kokoro).12
桜の花がきれいです。15
"The cherry blossoms are pretty."
Word-initial /r/ and /r/ after the moraic nasal /N/ are often realized with momentary alveolar-ridge contact before release. They are "described variably as a tap, a 'variant of [ɾ]', 'a kind of weak plosive', and 'an affricate with short friction, [d̠ɹ̝̆]'."27 Arai (2013) documents that Japanese-acquiring children commonly substitute a stop for word-initial /r/, producing りんご "apple" as [dingo] or [gingo]. This is consistent with word-initial position being articulatorily closer to a stop than the intervocalic tap is.16
来年、東京に行きます。15
"I'm going to Tokyo next year."
For learners, the safe target in all positions is the textbook tap [ɾ]; the [d]-like onset is a native-speaker variant, not a pedagogical goal.48
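The positional split in this step can be checked mechanically on romanized words. The sketch below assumes lowercase Hepburn input for a single word and relies on the fact that an n directly before r in Hepburn can only be the moraic nasal ん. The function name `r_positions` is hypothetical, and 便利 (benri, "convenient") is an added example that does not come from the text above.

```python
VOWELS = set("aiueoāīūēō")

def r_positions(romaji):
    """Label each 'r' in one lowercase Hepburn word by its position."""
    labels = []
    for i, ch in enumerate(romaji):
        if ch != "r":
            continue
        if i == 0:
            labels.append("word-initial")
        elif romaji[i - 1] == "n":
            # 'n' before 'r' is the moraic nasal in Hepburn spelling.
            labels.append("after moraic nasal")
        elif romaji[i - 1] in VOWELS:
            labels.append("intervocalic")
        else:
            labels.append("other")
    return labels

for word in ["sakura", "kokoro", "rainen", "benri", "ryōri"]:
    print(word, r_positions(word))
```

Words labelled "intervocalic" are the ones where the textbook tap is easiest to hear; the other two labels mark the contexts where native speakers may show the stop-like onset.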
Allophones and contexts
The single Japanese /r/ phoneme surfaces in a range of phonetic variants depending on position, speaker, and register. The map below collects the variants discussed in the descriptive literature. The citation form sits at the centre, and the others are positional or stylistic deviations from it.
Word-initial [d]-like onset
Okada describes the word-initial realization as one in which "the tip of the tongue is at first momentarily in light contact with the alveolar ridge before being released rapidly by airflow." He gives the IPA transcription [d̠ɹ̝̆] for this affricate-like variant.2 Labrune treats this as the same phoneme: the increased closure duration in initial position is a positional allophone, not a separate sound.7
The variant is most audible in careful or emphatic word-initial speech. In fluent connected speech, the difference between initial and medial tokens compresses.72
Retroflex flap [ɽ] in some speakers
Some speakers, particularly in informal or emphatic registers, retract the tongue tip (curl it slightly back) and produce a retroflex flap [ɽ] instead of the alveolar tap.17 An even more extreme variant is the alveolar trill [r] (multiple rapid taps), which Japanese phoneticians call 巻き舌 (makijita, "rolled tongue"); it is marked as rough or aggressive and is associated with stereotyped "tough-guy" speech.17
These variants appear in Vance and Labrune as part of the phoneme's allophonic range. They are not targets for learners and are not the citation form for broadcast Japanese.1710
Lateral-like realizations
Vance notes that "the apical alveolar or postalveolar lateral approximant [l] is a common variant in all conditions" of Japanese /r/.1 Akamatsu argues this lateral variant is better described as a lateral flap [ɺ] than as a sustained lateral approximant [l]. The reason is that the contact duration is too brief to match the English /l/ profile.12 A retroflex lateral [ɭ] also appears, particularly before /i/ and /j/, in some speakers.7
The existence of these lateral variants is why an English /l/ substitution by a learner sometimes sounds "acceptable" to a native ear. Still, the substitution is acceptable as a phonetically nearby variant, not as the citation form.112
Why the L1 substitutions fail
English /r/ substitution: the bunched-tongue trap
General American /r/ uses a bunched or retroflex tongue body, with no tongue-tip contact at the alveolar ridge and frequent pharyngeal constriction.63 Substituting this articulation into Japanese yields a long, rhotic vowel-like onset that bears no resemblance to a single tap. The tongue is in the wrong place and stays there too long.6
The corrective drill is mechanical: move the active articulator from the tongue body (English /r/) to the tongue tip (Japanese /r/). Then time the contact as a single touch, not a sustained gesture.84
Pronouncing 寿司を食べる as [sɯɕi o tabeɹɯ] (English bunched [ɹ] for る) is the clearest English-L1 error. The target is [sɯɕi o tabeɾɯ], with one tongue-tip touch at the alveolar ridge for る.63
English /l/ substitution: sustained contact
English /l/ holds the tongue-tip contact at the alveolar ridge for the duration of the consonant, with lateral airflow.6 The Japanese tap releases the contact almost immediately. Sustaining contact produces a mushy, English-accented impression.14
The corrective drill is to shorten the contact rather than change the place of articulation. The alveolar ridge is correct; the duration is not.8
Spanish or Italian trill: too many taps
Spanish has a contrast between /ɾ/ (single tap, as in pero) and /r/ (trill, multiple taps, as in perro).13 Japanese has only the single-tap variant. Substituting a Spanish-style trill is hyper-articulated and lands in the marked makijita register rather than neutral speech.17
Spanish-L1 learners typically have the right articulator and place but need to suppress the multi-tap habit on every ら-row kana.13
Korean ㄹ and Chinese r: close but not the same
Korean ㄹ is a single phoneme with two allophones (context-based pronunciations): an alveolar tap [ɾ] between vowels (e.g. 나라 nara, "country") and an alveolar lateral approximant [l] syllable-finally and before consonants (e.g. 말 mal, "word").17 The Korean intervocalic tap is articulatorily very close to Japanese /r/. Korean learners' main adjustment is to use the tap variant wherever Japanese /r/ occurs, including word-initially and after ん, rather than letting the lateral allophone surface. Japanese /r/ is always a syllable onset, so the Korean syllable-final [l] context never arises.17
Mandarin /r/ (pinyin r, as in 日 rì) is a voiced retroflex approximant [ɻ] or fricative [ʐ], articulated with the tongue tip curled back without contact. It is acoustically and articulatorily distant from a tap.18 Mandarin-L1 learners need to flatten the retroflex curl and add a brief tongue-tip contact, rather than fine-tune the existing gesture.18
| L1 background | Default reflex | What to change |
|---|---|---|
| English | Bunched [ɹ] (postalveolar approximant) | Move from tongue body to tongue tip; touch ridge once and release. |
| English | Sustained [l] (lateral approximant) | Keep the place of contact at the ridge; shorten it to one brief touch. |
| Spanish, Italian | Trilled [r] (multiple taps) | Cap at exactly one tap per kana. |
| Korean | [ɾ] intervocalic, [l] elsewhere | Use the tap variant in every position. |
| Mandarin | Retroflex approximant [ɻ] or fricative [ʐ] | Flatten the curl; add a brief alveolar contact. |
Drills and minimal contrasts
The ら-row tongue twister
Sequential single-mora practice is the standard Japanese elementary-school drill for the ら-row. It is presented as a 早口言葉 (hayakuchi-kotoba, "fast-mouth words") ladder.4 The drill target is one clean tap per kana with no carry-over of contact between adjacent moras.
ら り る れ ろ。4
"ra ri ru re ro (recited as a row, like an English A B C D E)."
A full-speed tongue twister then tests whether crisp, evenly timed moras survive at speed. The classic example below contains no ら-row kana, so treat it as a timing warm-up and return to the row itself for the tap:
すもももももももものうち。1915
"Plums and peaches are both kinds of peaches."
Words that flush out the English-R reflex
Drill these so the tap appears both word-initially and between vowels. Recite slowly enough to feel one touch per kana, then build up speed.
りんごを食べます。15
"I eat an apple."
これは桜の花です。15
"This is a cherry blossom."
よろしくお願いします。15
"Pleased to meet you."
Borrowed words that look like English
Loanwords are the highest-risk environment for this sound. The source word still echoes in the learner's ear and pulls the gesture back toward the English /r/ or /l/. The corrective tactic is to commit to the Japanese tap, not to bend back toward the source word.
レストランで晩ご飯を食べました。15
"I ate dinner at a restaurant."
ラジオを聞きます。15
"I listen to the radio."
Good to know
The butter / water / city anchor and its word-initial limit
The intervocalic flap in American English (the [ɾ] in butter, water, city, ladder) is articulatorily identical to Japanese intervocalic /r/. That is why it is the standard pedagogical anchor in introductory Japanese-phonology texts.68 The anchor has a limit. American English never starts a word with [ɾ], so drilling らりるれろ in word-initial position with the butter anchor still requires moving the gesture from "between vowels" to "at the start of a syllable."6
弾き音 hajikion as the Japanese-phonetics name
The Japanese phonetic-terminology label for the tap category is 弾き音 (hajikion, literally "flicked sound"). It comes from the verb 弾く (hajiku, "to flick, to strike off").10 The Japanese label captures the motor instruction ("flick the tongue") more directly than the English "flap" or "tap" does. Keeping the Japanese term in a study notebook keeps the articulation in mind.10
Sustaining the contact (English /l/ leakage)
A common error is to pronounce ありがとう with the tongue holding contact at the alveolar ridge through り, as in an English /l/. The correct form releases the contact immediately:
ありがとう。14
"Thank you." (with a single, brief tap on り, not a sustained lateral contact)
The rhythm of Japanese is mora-timed. If the consonant of one mora bleeds into the next, the word's timing collapses along with its sound.14
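Mora timing can be made visible by splitting a kana string into its timing units. The sketch below assumes kana-only input and uses the usual convention that a small ゃ/ゅ/ょ (or another small vowel kana) attaches to the preceding kana, while っ, ん, and ー each count as a mora of their own. The names `SMALL_KANA` and `morae` are hypothetical.

```python
# Small kana that merge with the preceding kana into a single mora.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")

def morae(kana):
    """Split a kana-only string into a list of morae."""
    units = []
    for ch in kana:
        if ch in SMALL_KANA and units:
            units[-1] += ch      # e.g. り + ょ -> りょ (one mora)
        else:
            units.append(ch)     # includes っ, ん and ー as separate morae
    return units

for word in ["ありがとう", "りょうり", "レストラン", "ロック"]:
    print(word, morae(word), len(morae(word)))
```

Each item in the returned list gets roughly one beat. The り in ありがとう is therefore one of five beats, and holding its contact into the next unit stretches that beat.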
Bunching the tongue (English /r/ leakage)
A second common error is to pronounce 寿司を食べる with English-style bunched [ɹ] on the る. This gives [sɯɕi o tabeɹɯ] instead of the target [sɯɕi o tabeɾɯ]. The English approximant /r/ bunches the tongue body and never makes alveolar-ridge contact. Substituting it produces a rhotic colouring across the vowel that natives hear as a foreign accent.63 The corrective is mechanical: move the active articulator from the tongue body to the tongue tip. Time the contact as a single touch.
The trilled R (巻き舌) is socially marked
The multi-tap alveolar trill [r] is a real allophone of Japanese /r/, but it is socially marked. It is associated with rough, aggressive, or stereotyped "tough-guy" speech (the so-called "yakuza R").7 Learners producing a Spanish-style trill on every ら-row token sound either comical or threatening depending on context. The unmarked target is the single tap.713
Loanwords collapse English R and L
English light and right both become ライト. Grass and glass both become グラス. Lock and rock both become ロック.9 The collapse is direct evidence that the Japanese phoneme covers both English categories at once, not that it sits "between" them. Memorising one collision pair, such as ライト for light and right, anchors the abstract phonemic claim to something concrete.9
Don't trust romaji for this sound
Writing the kana as r in Hepburn is a one-way mapping. It tells a Japanese reader which kana is meant, but it tells an English reader to produce an English /r/, which is the wrong articulation.1 Treat the romaji r as an arbitrary label for the alveolar tap, not as a phonetic instruction.
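One way to keep the romaji label and the articulation apart is to store them side by side. The table below is a minimal sketch: the Hepburn spellings follow the row as given earlier, the IPA strings are broad citation forms with [ɯ] for /u/, and palatalized sequences such as りょ are not handled. The names `RA_ROW` and `ra_row_in` are illustrative only.

```python
# kana -> (Hepburn label, broad IPA citation form)
RA_ROW = {
    "ら": ("ra", "ɾa"), "り": ("ri", "ɾi"), "る": ("ru", "ɾɯ"),
    "れ": ("re", "ɾe"), "ろ": ("ro", "ɾo"),
    "ラ": ("ra", "ɾa"), "リ": ("ri", "ɾi"), "ル": ("ru", "ɾɯ"),
    "レ": ("re", "ɾe"), "ロ": ("ro", "ɾo"),
}

def ra_row_in(text):
    """List every ら-row kana in text with its romaji label and IPA form.

    Limitation: a り or リ followed by small ゃ/ゅ/ょ is reported with its
    plain citation form, which is not accurate for palatalized syllables.
    """
    return [(ch, *RA_ROW[ch]) for ch in text if ch in RA_ROW]

print(ra_row_in("ありがとう"))
print(ra_row_in("レストラン"))
```

The middle element of each tuple is only a spelling; the last element is the sound to aim for.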
See also
- How to Pronounce つ (tsu) in Japanese: The Voiceless Alveolar Affricate English Lacks
- How to Pronounce ふ (fu) in Japanese: The Voiceless Bilabial Fricative English Lacks
- Long vs. Short Vowels in Japanese: The Distinction Beginners Miss
- Why "Tokyo" Is Two Syllables in English and Four Morae in Japanese: Loanwords as a Timing Drill
- The Japanese Vowel Inventory: Five Vowels, Done Right
- Geminate Consonants (Sokuon っ): The Silent Pause