Minimal-Pair Production Drills in Japanese: Train Your Mouth to Say the Difference

Minimal-pair production drills in Japanese train your mouth to say two near-identical words distinctly: おじさん versus おじいさん, きて versus きって, はし versus はし. You may already hear that these differ, yet your own output may flatten them the moment you speak at speed.¹

This article is the production-side complement to ear-training on these same pairs. It gives you minimal-pair sets for the three contrasts learners collapse most: vowel length, geminate consonants, and pitch. It also gives you a say-record-compare loop that tells you whether your output actually matches the model.

Overview

The drills here are about output, not perception. The goal is Japanese minimal-pair speaking practice: produce the single differing feature reliably, then check your own recording against a native model.¹²

Perception and the underlying phonological theory live elsewhere and are assumed here: what a mora is, why Japanese is mora-timed, and the acoustics of each contrast. If you cannot yet hear a contrast cleanly, build that ear first, then come back to drill the mouth.

Production vs perception: why you can hear it but not say it

Perception and production are linked, but they are not the same skill. A learner can perceive a contrast reliably while still failing to produce it. That is exactly the situation these drills address.¹²

What a minimal pair is, in one line

A minimal pair is two words that differ in exactly one phonological unit, so the contrast between them isolates that single feature.³ In Japanese, this article drills three units: vowel length, the geminate consonant (sokuon っ, a doubled consonant), and pitch-accent placement.³⁴

The mora is the timing unit behind two of these contrasts: a long vowel and a geminate each add exactly one beat. This article assumes that unit rather than re-deriving it.

The output gap: ear ahead of mouth

Perception can run ahead of production. Targeted training on one side can transfer to the other, but the two skills are linked rather than identical. A sharp ear does not automatically deliver a sharp mouth.¹²

Bradlow and colleagues trained Japanese listeners only on perceiving English /r/ and /l/. The listeners' production of the same contrast improved as well, even though production was never trained, which shows that the two skills move together.² The transfer also runs the other way: Linebaugh and Roche gave learners explicit articulatory production training and found their perceptual discrimination of the trained contrasts improved as a side effect.¹

That two-way coupling is the rationale for a production drill. Practising the output also sharpens the ear, so a say-and-compare loop reinforces both skills at once.¹²

Why drilling output is worth the effort

The gains are durable. Bradlow and colleagues found Japanese trainees kept their improved /r/–/l/ production three months after training ended, as judged by native listeners.⁵ Skill built by these drills is meant to stick, not wash out after a session.

High-variability phonetic training, the research approach behind these drills, trains a contrast using many talkers and contexts. The key early study in this line trained Japanese listeners to identify English /r/ and /l/. It used a minimal-pair identification task built from multiple talkers and phonetic environments.⁶ Barriuso and Hayes-Harb review the method as an effective, research-backed bridge from lab research to classroom practice.⁷

The honest framing is modest. Whether perceptual training reliably transfers to production is still debated. Effect sizes vary across studies and contrasts. So the claim is "production training also helps perception, and vice versa," not "either one guarantees the other."⁷¹

The three contrasts you will drill

Three contrasts cause most of the collapses: vowel length, the geminate consonant, and pitch accent. Each fails in its own way, and each has a distinct mouth target.

The vowel-length and geminate pairs below are written in plain kana on purpose. Writing them in kanji would hide the very length or pause being drilled. The pitch pairs use kanji because their kana spelling is identical, and only the meaning distinguishes the words.

Vowel length: おじさん vs おじいさん

The contrast is a short vowel (one mora) against a long vowel (two moras) on じ. The only physical difference is how long you hold that vowel; everything else is identical.³

A Japanese long vowel is two moras of the same vowel quality. So おじいさん is o-ji-i-sa-n (five moras), compared with おじさん o-ji-sa-n (four moras). The mouth does not change shape; it sustains.³
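The mora counts used in this section can be checked mechanically. The sketch below is a minimal illustration, not a full phonological analysis: the function name `count_moras` is hypothetical, and it assumes the input is written purely in kana. Every kana counts as one beat, including ん, the small っ, and the long-vowel mark ー, while the small glide kana (ゃ, ゅ, ょ and similar) merge with the kana before them.

```python
# Small kana that combine with the preceding kana and add no beat of their own.
# Note: the small っ (sokuon) is NOT in this set, because it is a full mora.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def count_moras(kana: str) -> int:
    """Count moras in a kana-only string (hypothetical helper for drill prep)."""
    return sum(1 for ch in kana if ch not in SMALL_KANA)


if __name__ == "__main__":
    for short, long_ in [("おじさん", "おじいさん"), ("ゆき", "ゆうき")]:
        # Each long member should come out exactly one mora longer.
        print(short, count_moras(short), "/", long_, count_moras(long_))
```

Running it on a pair before you drill confirms that the two members really differ by one beat, which is the difference your mouth has to produce.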

Pair	Meanings	What differs	Length note
おじさん / おじいさん	uncle, middle-aged man / grandfather, old man	length of じ	4 moras vs 5 moras (long じい)⁸⁹
おばさん / おばあさん	aunt, middle-aged woman / grandmother, old woman	length of ば	4 moras vs 5 moras (long ばあ)⁸⁹
ゆき / ゆうき	snow / courage, bravery	length of ゆ	2 moras vs 3 moras (long ゆう)⁸⁹

All three pairs are everyday N5/N4-level words, which makes them safe drill material. You already know both meanings, so a slip is about pronunciation, not vocabulary.

A collapsed long vowel changes the meaning

Shortening the long vowel turns "grandfather" into "uncle" and "grandmother" into "aunt." This is a meaning change, not just an accent slip, and it is the standard textbook illustration of why length is phonemic in Japanese.³

Geminate (sokuon っ): きて vs きって

The contrast is a plain consonant against a geminate, or doubled, consonant. The geminate っ is a full mora of held closure or silence before the consonant is released.³

To produce きって, close at the /t/, hold a silent beat, then release into て. For きて there is no hold. The "extra sound" is really an extra pause.³

Pair	Meanings	What differs	Length note
きて / きって	come (て-form of 来る) / postage stamp	held beat before て	2 moras vs 3 moras (geminate っ)⁸
かこ / かっこ	the past / parenthesis, bracket	held beat before こ	2 moras vs 3 moras (geminate っ)⁸
さか / さっか	slope, hill / writer, author	held beat before か	2 moras vs 3 moras (geminate っ)⁸
いた / いった	was, existed (past of いる) / went (past of 行く)	held beat before た	2 moras vs 3 moras (geminate っ)⁸

The さか/さっか and いた/いった pairs are useful because they geminate different consonants: /k/ in さっか and /t/ in いった. This lets you drill the held-closure target at more than one place of articulation.³

The held beat is a target, not a gap

Learners whose first language does not use geminates often skip the held beat, merging きって into きて. The mora of silence is a positive timing target, not the absence of one. Over-hold the pause at first, then dial it back.³

Pitch: はし vs はし vs はし (橋 / 箸 / 端)

The moras are identical (は-し). The length, consonants, and vowels are identical too. The only difference is where the pitch drops. This is a three-way set, not a pair. It is the standard illustration of phonemic pitch accent, where pitch changes word meaning, in standard (Tokyo) Japanese.⁴¹⁰

The target is pitch placement, not duration or force. You change which mora is high and where the voice steps down. You do not lengthen or stress anything.¹⁰

Because the words are only two moras, the contrast is easiest to produce and check with a following particle. The particle reveals the accent kernel, the point where the pitch drop is anchored. NINJAL (the National Institute for Japanese Language and Linguistics) states the standard behaviour directly: 「標準語では、『橋』は『し』の後ろで、『箸』は『は』と『し』の間で音が下がります。『端』は音が下がりません。」 ("In the standard language, 橋 drops after し, 箸 drops between は and し, and 端 does not drop.")¹⁰

The が test below is the diagnostic tool: add the particle が and listen for where the pitch falls.⁴¹⁰

Word	Kana	Accent type	Pitch with が (はしが)	Where it falls
箸 (chopsticks)	はし	頭高 atamadaka (mora 1)	は↓しが (high-low-low)	between は and し
橋 (bridge)	はし	尾高 odaka (last mora)	はし↓が (low-high-low)	after し, onto が
端 (edge, end)	はし	平板 heiban (no kernel)	はしが (low-high-high)	does not drop

In isolation, 橋 (odaka) and 端 (heiban) sound the same: both low-high. The が test separates them, because only 橋 drops the pitch onto が.⁴¹⁰ This is why the drill tells you to practise the pair with a particle, not just the bare word.
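The が test can be written out as a small rule. The sketch below is a simplified high/low model, assuming the common dictionary convention that an accent number of 0 means no drop (heiban) and a number n means the pitch drops after mora n. The function name `pitch_pattern` and the H/L string output are illustrative choices, not part of any accent dictionary.

```python
def pitch_pattern(n_moras: int, kernel: int, particle: bool = True) -> str:
    """Return a simplified Tokyo-style H/L pattern for a word.

    kernel = 0 means heiban (no drop); kernel = n means the pitch
    drops after mora n. With particle=True, one extra mora (e.g. が)
    is appended so that odaka and heiban can be told apart.
    """
    if n_moras < 1 or not 0 <= kernel <= n_moras:
        raise ValueError("kernel must lie between 0 and the mora count")

    pattern = []
    for i in range(1, n_moras + 1):
        if kernel == 1:
            high = i == 1            # atamadaka: only the first mora is high
        elif kernel == 0:
            high = i > 1             # heiban: low start, then stays high
        else:
            high = 1 < i <= kernel   # high from mora 2 up to the kernel
        pattern.append("H" if high else "L")

    if particle:
        # The particle stays high only when the word has no drop at all.
        pattern.append("H" if kernel == 0 else "L")
    return "".join(pattern)


if __name__ == "__main__":
    for word, kernel in [("箸", 1), ("橋", 2), ("端", 0)]:
        print(word, pitch_pattern(2, kernel, particle=False), pitch_pattern(2, kernel))
```

Without the particle, 橋 and 端 both come out as LH. With it, they separate into LHL and LHH, which is the difference the が test asks you to listen for.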

The same pattern shows up in two more common pitch pairs you can drill with the が test.

Pair	Meanings	Accent types	Pitch note
雨 (あめ) / 飴 (あめ)	rain / candy	頭高 / 平板	雨 drops after mora 1 (HL); 飴 does not drop (LH)⁸⁴¹⁰
今 (いま) / 居間 (いま)	now / living room	頭高 / 平板	今 drops after mora 1 (HL); 居間 does not drop (LH)⁸⁴

Pitch is height and fall, not loudness or length

Learners often try to fix a pitch pair by making one member louder or longer. Japanese pitch accent is about pitch height and the location of the drop. Lengthening は in 箸 does not make it more 頭高; it just makes it wrong.⁴¹⁰

How to run a production drill

The method is a say-record-compare loop. It applies the perception-production link above to output practice: produce the contrast, compare your output to a model, then produce it again. This tight feedback loop trains the motor target and the perceptual category at once.¹²

The loop has four steps, then repeats: say the pair back to back, record yourself and the model, diagnose what collapsed, and re-record.

Step 1: contrast first, say the pair back to back

Say both members of the pair one after the other, exaggerating the single difference. Saying them back to back makes the one differing feature especially clear in your own mouth. That is the minimal-pair logic applied to output.³

The point is to feel the one thing that changes (length, pause, or pitch fall) with everything else held constant.

Step 2: record yourself and the native model

Comparison with a model gives you the feedback signal. Record the native clip and your own attempt back to back, so the gap is audible on playback.¹²

The perception-production studies that show production gains all measure the learner's output against a native target. Recording both at home is the practical analogue. The full self-correction method belongs to a dedicated record-and-compare treatment. Here, just capture both and listen.

Step 3: diagnose which contrast collapsed

The three contrast types fail in three recognizable ways, so you can name the symptom and map it to the contrast.

A long vowel that came out short is a vowel-length collapse.
A geminate whose held beat vanished is a sokuon collapse.
A pitch pair where the drop landed on the wrong mora, or disappeared entirely, is a pitch-accent collapse.

For pitch specifically, the が test isolates the error: listen to whether the fall landed on the right mora when you added the particle.⁴¹⁰
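The symptom-to-contrast mapping above can be sketched as a small checklist function. This is only an illustration under stated assumptions: you transcribe what you actually hear in your own recording into kana (and, for pitch pairs, note where you hear the drop), then compare that against the target. The function name `diagnose` and the label strings are hypothetical, and the length check is a rough character count rather than a real acoustic measurement.

```python
def diagnose(target_kana, heard_kana, target_kernel=None, heard_kernel=None):
    """Name which contrast collapsed, given what you heard in your recording.

    target_kana / heard_kana: kana for the intended word and for what
    the recording actually sounds like. Kernels are optional accent
    positions (0 = no drop, n = drop after mora n).
    """
    problems = []

    # Geminate: the held beat (っ) is missing from the attempt.
    if heard_kana.count("っ") < target_kana.count("っ"):
        problems.append("sokuon collapse")

    # Vowel length: ignoring っ, the attempt is shorter than the target.
    if len(heard_kana.replace("っ", "")) < len(target_kana.replace("っ", "")):
        problems.append("vowel-length collapse")

    # Pitch: the drop landed on a different mora, or disappeared.
    if target_kernel is not None and heard_kernel is not None:
        if target_kernel != heard_kernel:
            problems.append("pitch-accent collapse")

    return problems


if __name__ == "__main__":
    print(diagnose("おじいさん", "おじさん"))   # long vowel came out short
    print(diagnose("きって", "きて"))           # held beat vanished
    print(diagnose("はし", "はし", 1, 0))       # 箸 produced with no drop
```

An empty list means the attempt matched the target on all three checks, so the pair is ready for the next repetition.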

Step 4: re-record until the pair is distinct

Do a few quick repetitions, stopping when both members are reliably distinct rather than "perfect." The training literature shows that gains come from repeated contrastive practice with feedback, and that those gains are retained. Short, reliable loops beat one long perfectionist take.⁵⁷

Building your own pair sets

To find more attested pairs, look up each suspected minimal pair in a standard dictionary (大辞林, 広辞苑) and confirm that both members are real words. For pitch, confirm the accent in the NHK accent dictionary or an accent resource of similar authority, such as NINJAL's.⁴¹⁰⁸

Start from your own confusions. The words you personally mix up are the highest-value drill material, and a dictionary check keeps you from drilling a "word" that does not exist.

Fitting drills into a routine

Short, frequent contrastive practice with feedback is the evidence-backed shape of effective pronunciation training. It also slots into a broader sense of what to prioritize first. The high-variability training literature reports durable gains from repeated short sessions rather than one-off long ones.⁷⁵

A few minutes of say-record-compare per day, cycling through a handful of pairs, fits this profile. Keep the set small and rotate it as pairs become reliable.
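One way to keep the set small and rotating is sketched below. It is a minimal illustration, not a prescribed schedule: the function name `rotate_set`, the idea of a separate backlog list, and the fixed set size are all assumptions made for the example.

```python
from collections import deque


def rotate_set(active, backlog, reliable):
    """Build the next session's drill set.

    Pairs marked reliable leave the active set, and the freed slots are
    refilled from the front of the backlog so the set size stays constant
    while the backlog lasts.
    """
    queue = deque(backlog)
    next_active = [pair for pair in active if pair not in reliable]
    while len(next_active) < len(active) and queue:
        next_active.append(queue.popleft())
    return next_active, list(queue)


if __name__ == "__main__":
    active = ["きて/きって", "かこ/かっこ", "ゆき/ゆうき"]
    backlog = ["さか/さっか", "いた/いった"]
    # Suppose きて/きって came out reliably distinct in today's loop.
    active, backlog = rotate_set(active, backlog, reliable={"きて/きって"})
    print(active)
    print(backlog)
```

Keeping the rotation explicit makes it easy to see which pairs are still being drilled and which have been retired for now.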

These drills target the sound and word level: length, sokuon, and single-word pitch. They complement, not replace, mora-timing work and shadowing, which extend the same skills into connected speech.³ Treat minimal-pair drills as the precision component alongside those rhythm-level and sentence-level practices.

The research supports the claim that the training works and that its gains are retained. It does not show that a fixed number of weeks yields a fixed result.⁵⁷ Aim for consistency, not a deadline.

Good to know

Drill in connected speech, not just isolation

A pair you produce correctly in isolation can still collapse mid-sentence, because connected speech adds timing pressure and coarticulation, where nearby sounds influence each other.¹²

The remedy is to move from the bare pair to a short phrase containing it once the isolated contrast is reliable. The two-way perception-production research supports continued contrastive practice as the way to consolidate the gain, rather than stopping at the single-word level.¹²

Pitch is dialect-bound; pick one standard

The はし pitch values above use standard (Tokyo) accent. Pitch accent differs across dialects. Kansai, for example, assigns different patterns, so drilling pitch only makes sense within one chosen system.¹⁰

Choose one standard (Tokyo is the usual default for learners) and keep all your pitch pairs in it. Mixing systems makes the "right" answer undefinable.¹⁰

Perception and production reinforce each other

If a pair will not come out no matter how many reps you do, the bottleneck may be perceptual: you may not yet be hearing the contrast cleanly enough to target it.¹²

Because the two skills are linked, looping back to ear-training on that specific contrast can unblock production. Training either side has been shown to lift the other.¹²

Exaggerate, then dial back

Over-articulate the contrast first: hold the long vowel too long, pause too long on the geminate, and drop the pitch too hard. Then relax toward natural.³⁷

Overshooting the length or pause early is a feature, not a bug. It establishes the target clearly before you tune it down. This matches the contrastive-exaggeration logic of minimal-pair training.³⁷

References

1. Linebaugh, Gary, and Thomas B. Roche. "Evidence that L2 production training can enhance perception." Journal of Academic Language and Learning, vol. 9, no. 1, 2015, pp. A1–A17. https://journal.aall.org.au/index.php/jall/article/view/326
2. Bradlow, Ann R., David B. Pisoni, Reiko Akahane-Yamada, and Yoh'ichi Tohkura. "Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production." The Journal of the Acoustical Society of America, vol. 101, no. 4, 1997, pp. 2299–2310. https://doi.org/10.1121/1.418276
3. Vance, Timothy J. The Sounds of Japanese. Cambridge University Press, 2008.
4. NHK放送文化研究所, ed. 『NHK日本語発音アクセント新辞典』. 日本放送出版協会 (NHK出版), 2016.
5. Bradlow, Ann R., Reiko Akahane-Yamada, David B. Pisoni, and Yoh'ichi Tohkura. "Training Japanese listeners to identify English /r/ and /l/: Long-term retention of learning in perception and production." Perception & Psychophysics, vol. 61, no. 5, 1999, pp. 977–985. https://doi.org/10.3758/BF03206911
6. Logan, John S., Scott E. Lively, and David B. Pisoni. "Training Japanese listeners to identify English /r/ and /l/: A first report." The Journal of the Acoustical Society of America, vol. 89, no. 2, 1991, pp. 874–886. https://doi.org/10.1121/1.1894649
7. Barriuso, Taylor Anne, and Rachel Hayes-Harb. "High Variability Phonetic Training as a Bridge from Research to Practice." The CATESOL Journal, vol. 30, no. 1, 2018, pp. 177–194. https://files.eric.ed.gov/fulltext/EJ1174231.pdf
8. 松村明編. 『大辞林』第四版. 三省堂, 2019.
9. 新村出編. 『広辞苑』第七版. 岩波書店, 2018.
10. 国立国語研究所 (NINJAL). 「英語にはアクセントがありますが、日本語にもあるのでしょうか」, ことば研究館ことばの疑問 Q&A. https://kotoba.ninjal.ac.jp/qa/yokuaru/qa-204/
