Geminate Consonants (Sokuon っ): The Silent Pause
Japanese geminate consonants are single consonant closures held for one extra mora before release. They are written with the small っ in kana and analyzed phonologically as the moraic obstruent /Q/ plus a following obstruent.12 The contrast is the difference between 肩 kata "shoulder" and 買った katta "bought". Missing it is one of the fastest ways for a learner's spoken Japanese to be misheard or rejected outright.23
Spelling rules (when to write っ), romanization (kk, pp, tt, ss, tch), IME input (typing the doubled consonant, "xtu", "ltu"), and the emphatic word-final っ are handled by the writing-systems article on the small つ. This page covers the phonology: how a geminate is produced, which consonants can geminate, and how long the hold needs to be.4
Overview
What "geminate" means in Japanese
A geminate consonant is a single articulatory gesture (one closure or one constriction) held for roughly one extra mora before its release.12 In phonology, it is treated as a sequence of the moraic obstruent /Q/ plus a following obstruent /C/, surfacing as [Cː].1245
The contrast is phonemic. Consonant length alone distinguishes pairs of common words across the voiceless-stop series, such as 肩 kata and 買った katta in the two sentences below.
肩が痛いです。3
"My shoulder hurts."
新しい本を買った。6
"I bought a new book."
Native Japanese speakers control the contrast mainly through closure duration. In careful speech the geminate closure is roughly 2.5 to 3.2 times as long as the singleton closure, with Han's 1992 measurements clustering near 2.8 times.73 English, by comparison, has "doubled" consonants only across word or morpheme boundaries (book-case, un-known). The ratio there sits around 1.3 to 1.9 : 1, far short of what Japanese listeners need to hear.3
Phonology vs. orthography: how this article splits with the small つ article
The hand-off is real, not just editorial. The phonological object /Q/ is the same whether the small っ sits inside a native word, at a morpheme boundary inside a Sino-Japanese compound, or in a loanword where the small ッ marks a voiced geminate.28 Spelling is one realization of that object, not the object itself.
This article covers the articulatory mechanism, the manner-class inventory of consonants that actually geminate, the audible-versus-silent hold contrast, and the durational target.12 The writing-systems piece on the small つ covers kana spelling, romanization conventions, IME input mappings, and the casual word-final っ as an emphatic glottal stop.4
Where it sits on your JLPT timeline
Sokuon-bearing words appear in the first weeks of N5 study: 学校 gakkō, 切手 kitte, 雑誌 zasshi, ちょっと chotto, 待って matte, いっぱい ippai.3 The contrast is not a tested grammar item, but it is a perceptual gatekeeper. It is graded indirectly through oral production, shadowing, and dictation.73 Under-holding is often the first thing tutors and automated speech-assessment systems flag.
The articulatory mechanism: how a geminate is produced
A Japanese geminate is one consonant whose closure or constriction is sustained through the /Q/ mora and into the onset of the following mora, with a single release at the end.25 Three stages make the production target concrete.
The moraic obstruent /Q/: one mora of held closure
Japanese phonology posits a moraic-consonant slot conventionally transcribed /Q/ (a non-IPA placeholder). It occupies one mora of timing but carries no place or manner features of its own.1245 It takes its phonetic identity from the obstruent that follows it: /Q/ + /p/ surfaces as [pː], /Q/ + /k/ as [kː], /Q/ + /s/ as [sː], and so on.
Three analyses of /Q/ coexist in the literature. It may be treated as an underlyingly placeless obstruent that copies the place and manner of the following segment; as a sequence of two identical consonant phonemes with no separate /Q/ slot; or as identical to /ʔ/, with the glottal-stop realization surfacing only when no following obstruent is available.5 All three analyses converge on the same surface fact: one mora of timed closure or constriction, then a release identical to that of the corresponding singleton.29
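The feature-copying view of /Q/ can be sketched in a few lines of Python. This is an illustrative toy, not a phonological model: the function name `realize_q` and the list-of-symbols input format are assumptions made for this example, and the word-final glottal stop follows the third analysis above.

```python
GLOTTAL_STOP = "ʔ"
LENGTH_MARK = "ː"


def realize_q(phonemes):
    """Turn a phoneme list containing 'Q' into a surface string.

    'Q' has no features of its own, so it is rendered as a length
    mark on the consonant that follows it. A 'Q' with nothing after
    it is rendered as a glottal stop.
    """
    surface = []
    i = 0
    while i < len(phonemes):
        if phonemes[i] == "Q":
            if i + 1 < len(phonemes):
                # Copy the following consonant and lengthen it.
                surface.append(phonemes[i + 1] + LENGTH_MARK)
                i += 2
                continue
            # No host consonant: bare /Q/ at the end of the word.
            surface.append(GLOTTAL_STOP)
        else:
            surface.append(phonemes[i])
        i += 1
    return "".join(surface)
```

With this sketch, `realize_q(["g", "a", "Q", "k", "o", "o"])` returns `gakːoo`, and a bare final /Q/ such as `["a", "Q"]` comes out as `aʔ`.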
学校に行きます。3
"I'm going to school."
一杯だけ飲みました。8
"I had just one drink."
Stage 1: form the closure or constriction of the following consonant
The articulators move into the position of the following consonant during the /Q/ mora itself.29 The lips close for /Q/ + /p/. The tongue tip contacts the alveolar ridge for /Q/ + /t/. The tongue body raises to the soft palate for /Q/ + /k/, and the tongue blade forms a sibilant constriction for /Q/ + /s/.
There is no glottal stop, no schwa, and no breath between the preceding vowel and the closure. The transition into the closure happens early and quietly. Electropalatography data (measurements of tongue contact with the palate) across five speakers and over eight thousand tokens show that geminates are produced with greater linguopalatal contact than singletons. The difference is on the order of four to five additional electrodes on a 62-electrode artificial palate, confirming that the gesture is not weaker, just longer.9
Stage 2: hold the closure for roughly one extra mora
The hold lasts approximately two to three times as long as the corresponding singleton closure.7 Han's 1992 production study reports absolute mean closure durations of /pp/ ≈ 200.6 ms, /tt/ ≈ 198.6 ms, and /kk/ ≈ 184.2 ms in carrier sentences. Singleton closures cluster between 65 and 75 ms.73
| Geminate | Mean closure (Han 1992) | Singleton closure | Approx. ratio (at the 70 ms singleton midpoint) |
|---|---|---|---|
| /pp/ | ~201 ms | ~65–75 ms | ~2.9 : 1 |
| /tt/ | ~199 ms | ~65–75 ms | ~2.8 : 1 |
| /kk/ | ~184 ms | ~65–75 ms | ~2.6 : 1 |
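Because the singleton closure is given only as a range, each ratio is really a band rather than a single number. The sketch below recomputes that band from the figures quoted above; the names `GEMINATE_MS`, `SINGLETON_RANGE_MS`, and `ratio_bounds` are made up for this example, and treating the 65 to 75 ms range as applying equally to all three stops is an assumption.

```python
# Mean geminate closure durations in ms, as quoted from Han 1992 above.
GEMINATE_MS = {"pp": 200.6, "tt": 198.6, "kk": 184.2}

# Singleton closures are reported only as a range (assumed shared by all stops).
SINGLETON_RANGE_MS = (65.0, 75.0)


def ratio_bounds(geminate_ms, singleton_range=SINGLETON_RANGE_MS):
    """Return (lowest, highest) geminate-to-singleton ratio."""
    shortest, longest = singleton_range
    # The longest singleton gives the smallest ratio, and vice versa.
    return geminate_ms / longest, geminate_ms / shortest


if __name__ == "__main__":
    for label, ms in GEMINATE_MS.items():
        low, high = ratio_bounds(ms)
        print(f"/{label}/: {low:.2f} to {high:.2f} : 1")
```

Under these assumptions every band falls between roughly 2.5 and 3.1, which is consistent with the native range cited earlier.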
A learner does not need to count milliseconds. The practical rule is that the hold takes one full kana-beat of timing, the same beat that a vowel or any other mora would occupy.
Count out loud "ki – [silent tap] – te" while tapping the table once per mora. The held closure of 切手 kitte gets one tap of timed silence between き and て, just like every other mora in the word. If your taps match your kana count, the hold is in the right place.
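The tapping drill can be written out as a simple timeline. The sketch below assumes the word is already split into morae and uses a hypothetical beat length of 150 ms; `tap_schedule` and `beat_ms` are names invented for this example, not values from the cited studies.

```python
SOKUON = ("っ", "ッ")


def tap_schedule(morae, beat_ms=150):
    """Return one (start_ms, label) pair per mora.

    Every mora, including the small っ, gets exactly one beat.
    The sokuon beat is labelled as a silent tap.
    """
    schedule = []
    for index, mora in enumerate(morae):
        label = "[silent tap]" if mora in SOKUON else mora
        schedule.append((index * beat_ms, label))
    return schedule
```

For 切手, `tap_schedule(["き", "っ", "て"])` gives three evenly spaced taps, with the middle one silent. The point of the sketch is only that the hold occupies its own slot on the same grid as the other morae.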
Stage 3: release into the following vowel
Place of articulation is the same for singleton and geminate. Only the hold duration differs, with constriction strength as a secondary cue.29 Stop and affricate geminates are pronounced with a single release: the first half of the geminate is an unreleased stop, and the second half is the same stop with a normal release burst into the following vowel.5
This is why phoneticians describe the geminate as "the same sound, held longer," not as a new sound.29 The learner is not adding an articulation but extending one.
Why the place of articulation matters: you cannot geminate a sound you have not formed yet
The /Q/ slot is, by definition, the prolongation of the following consonant's closure or constriction. It has no independent articulator of its own.25 If the articulators do not arrive at the next consonant's place during the /Q/ mora, there is nothing to hold and no gemination is realized.
This phonetic fact drives the production-side fix taught in pronunciation pedagogy: move the articulators into closure at the start of /Q/, not at the end.73 If a learner lets the preceding vowel coast and only starts to form the next consonant when the hold should end, the result is, at best, a slightly longer vowel followed by a normal singleton. That is exactly what Japanese listeners parse as the non-geminate word.
Which consonants can geminate
Native Japanese phonotactics allow /Q/ before a narrow set of consonants and ban it elsewhere. Loanword phonology relaxes those bans in specific, well-documented ways.
The native-vocabulary inventory: voiceless obstruents only
In native (Yamato) and Sino-Japanese (kango) vocabulary, /Q/ appears only before voiceless obstruents: the stops /p t k/, the fricatives /s ɕ/, and the affricate /tɕ/.1284 It does not come before /n m r w j h/ or any voiced consonant.
The native prohibition against voiced geminates is captured in the literature as the markedness constraint *VoiGem. Native vocabulary has no /gg dd bb/ sequences, and emphatic forms use voiceless gemination instead (the emphatic of /ta.da/ "only" is /tat.ta/, never */tad.da/).8 Native /h/ does not geminate as [hh] either. In compound formation, the historical /p/ origin of modern /h/ resurfaces, so the geminate alternant is [pp]. The compounds 一 ichi + 杯 hai → 一杯 ippai, 一 ichi + 夫 fu → 一夫 ippu, and 葉 ha + 葉 ha → 葉っぱ happa all show this pattern.8
| Following consonant | Manner | Example |
|---|---|---|
| /p/ | voiceless stop | 一杯 ippai "one cup" |
| /t/ | voiceless stop | 切手 kitte "stamp" |
| /k/ | voiceless stop | 学校 gakkō "school" |
| /s/ | voiceless fricative | 雑誌 zasshi "magazine" |
| /ɕ/ | voiceless fricative | 一緒 issho "together" |
| /tɕ/ | voiceless affricate | 一致 itchi "agreement" |
学校で勉強します。3
"I study at school."
雑誌を一冊買いました。6
"I bought one magazine."
ちょっと待ってください。3
"Please wait a moment."
一致した意見です。2
"It's a unanimous opinion."
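The three manner classes, the /h/ gap, and how they sound different
The three manner classes you will meet share the held-closure logic, but the hold sounds different in each one. The fourth entry below, /h/, is a gap in the inventory rather than a class of its own.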
- Stops /pp tt kk/. A silent oral closure of roughly 150 to 200 ms followed by a single release burst.75 The acoustic signature during the hold is silence, not friction.
- Fricatives /ss ɕɕ/. Continuous, audible frication sustained throughout the hold. The /ss/ frication in experimental stimuli is on the order of 260 ms, against a singleton /s/ of roughly 100 ms.10
- Affricates /ttɕ/. A silent closure portion is followed by a release into the fricative tail of the affricate. Hepburn romanizes the geminate of /tɕ/ as "tch" (一致 itchi, 抹茶 matcha) rather than the ambiguous "cch".4
- The /h/ gap. Native /h/ does not geminate as [hh]; the historical /p/ surfaces, giving [pp] in compounds.8
一切食べませんでした。8
"I didn't eat any at all."
一緒に行きましょう。3
"Let's go together."
抹茶アイスが好きです。4
"I like matcha ice cream."
Loanwords break two of the rules
Gairaigo (loanword) phonology licenses geminate voiced obstruents that native phonology bans. Common examples include /baggu/ バッグ "bag", /beddo/ ベッド "bed", /doggu/ ドッグ "dog", and /heddo/ ヘッド "head".8 These voiced geminates often vary freely with their devoiced counterparts (/baggu/ ~ /bakku/, /beddo/ ~ /betto/). The literature treats them as exceptions to the native *VoiGem constraint sanctioned by the more permissive loanword stratum.8
Loanwords from languages with [x] also allow a geminate that does not exist in native Japanese: German Bach is borrowed as /bahha/ バッハ, and Dutch (van) Gogh appears as /gohho/ ゴッホ. These are the only contexts in which "hh" surfaces in modern Japanese spelling.8
Loanword gemination is conditioned by prosodic shape rather than by the source-language consonant alone. Gemination occurs when it produces a preferred Heavy-Light or Heavy-Heavy word-final shape. That is why cap → kyappu geminates but captain → kyaputen does not.8
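The inventory described in this section, native hosts plus the loanword-only additions, can be summarized as a small lookup. This is a sketch under simplifying assumptions: the name `geminate_form` and the two stratum labels are invented here, the /h/ to [pp] mapping stands in for the compound alternation only, and the prosodic conditioning of loanword gemination is not modelled.

```python
# Hosts for /Q/ in native and Sino-Japanese vocabulary, per the table above.
NATIVE_HOSTS = frozenset({"p", "t", "k", "s", "ɕ", "tɕ"})

# Extra hosts that the text documents for loanwords only.
LOANWORD_ONLY_HOSTS = frozenset({"b", "d", "g", "h"})


def geminate_form(consonant, stratum="native"):
    """Return the geminate for /Q/ + consonant, or None if it is banned."""
    if stratum not in ("native", "loanword"):
        raise ValueError(f"unknown stratum: {stratum!r}")
    if consonant in NATIVE_HOSTS:
        # Double the first symbol so that /tɕ/ becomes /ttɕ/.
        return consonant[0] + consonant
    if stratum == "native":
        # Native /h/ surfaces as [pp] in compounds; all else is banned.
        return "pp" if consonant == "h" else None
    if consonant in LOANWORD_ONLY_HOSTS:
        return consonant + consonant
    return None
```

For example, `geminate_form("g")` returns `None` while `geminate_form("g", "loanword")` returns `"gg"`, mirroring the contrast between the native ban and バッグ.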
新しいベッドを買いました。8
"I bought a new bed."
このバッグは重いです。8
"This bag is heavy."
バッハの音楽を聞きます。8
"I listen to Bach's music."
Why /Q/ cannot appear at the start or before a vowel
/Q/ has no independent articulators; it is by definition the prolongation of the following obstruent's closure or constriction.25 Word-initial /Q/ would have nothing to prolong. /Q/ followed by a vowel would likewise have no closure to extend. The native and Sino-Japanese phonotactic statements "no word-initial sokuon" and "no sokuon before a vowel or a sonorant" therefore follow as theorems, not stipulations.45
The casual word-final っ in interjections such as あっ, やめろっ, and うっ is the same /Q/ slot. Because it lacks a following obstruent to host it, it surfaces as a glottal stop [ʔ].5 This is the limiting realization predicted by the /Q/-as-/ʔ/ analysis cited above and is treated in detail in the writing-systems article on the small つ.
Audible holds vs. silent holds: what the ear actually catches
The manner-class split above has a direct perceptual consequence: stop and affricate geminates sound like timed silence, while fricative geminates sound like a long, continuous hiss. A learner who has internalized only the "silence" model will often mis-parse fricative geminates, and vice versa.
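As a compact reminder of which cue to listen for, the manner-to-percept mapping can be written as a lookup. The table and function names below are illustrative assumptions, and only the native voiceless hosts are covered.

```python
# Manner class of each native host consonant.
MANNER = {
    "p": "stop",
    "t": "stop",
    "k": "stop",
    "tɕ": "affricate",
    "s": "fricative",
    "ɕ": "fricative",
}

# What the hold sounds like for each manner class.
HOLD_BY_MANNER = {
    "stop": "silent",
    "affricate": "silent",
    "fricative": "audible",
}


def expected_hold(consonant):
    """Return 'silent' or 'audible' for the hold of a geminate."""
    return HOLD_BY_MANNER[MANNER[consonant]]
```

A learner drilling 雑誌 zasshi would look up `expected_hold("ɕ")` and listen for a long hiss, not a gap.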
Silent holds: kk, pp, tt, tch
During a stop-geminate closure, the vocal tract is sealed and there is no airflow. The acoustic signature is silence, or near-silence with at most low-amplitude voicing-into-closure decay for voiced stops.5 In Han's data, the silent closure portion of /pp tt kk/ falls in the 180 to 200 ms range. In Takeuchi's perception stimuli, a representative /kk/ closure is 220 ms.710
Beginners commonly hear the closure as "nothing happening" and either skip the mora or insert a vowel to fill it.3 The correct interpretation is timed silence followed by a release that is sharper than the singleton release, because pressure has been building behind the closure for longer.
もっと大きい声でお願いします。3
"Please speak louder."
学校までは遠いです。3
"It's a long way to school."
Audible holds: ss, ssh
During a fricative geminate, the constriction stays narrow and airflow continues. This produces a long, audible hiss throughout the hold rather than silence.85 In experimentally controlled stimuli, /ss/ frication lasts on the order of 260 ms against a singleton /s/ of roughly 100 ms.10
Because there is no silence to misinterpret, beginners typically pick up fricative geminates faster than stop geminates. The perceptual task is to hear "longer hiss" rather than "extra silence."3
一切お酒を飲みません。8
"I don't drink alcohol at all."
雑誌を読むのが好きです。6
"I like reading magazines."
The pre-geminate vowel: what the phonetic evidence actually shows
A widely repeated piece of folk advice tells English speakers to shorten the vowel before a geminate. That instruction is correct for Italian and Bengali, where pre-geminate vowels do shorten, but it is the wrong target for Japanese.211 In Japanese, the vowel before a geminate is about as long as, or slightly longer than, the vowel before the corresponding singleton. The effect is small and is not the primary durational cue.211
The robust durational cue is the closure itself. The ratio of closure duration to preceding-vowel duration (C/V1) is the most stable classifier across speaking rates. Idemaru and Guion-Anderson find that once C/V1 exceeds approximately 1.69, listeners shift toward geminate identification.11 Japanese listeners are sensitive to preceding-mora duration. A longer preceding mora at the same closure duration biases them toward "singleton" because the C/V1 ratio falls. L1-English learners do not show this sensitivity.113
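The C/V1 cue is easy to compute once the two durations have been measured. The sketch below treats the approximate 1.69 boundary as a hard cut-off, which is a simplification: real listener responses shift gradually around the boundary. The function names and the example durations are assumptions for illustration, not data from the cited study.

```python
CV1_BOUNDARY = 1.69  # approximate boundary cited in the text


def cv1_ratio(closure_ms, preceding_vowel_ms):
    """Closure duration divided by preceding-vowel duration."""
    if preceding_vowel_ms <= 0:
        raise ValueError("preceding vowel duration must be positive")
    return closure_ms / preceding_vowel_ms


def likely_percept(closure_ms, preceding_vowel_ms, boundary=CV1_BOUNDARY):
    """Crude two-way guess at how a listener would categorize the token."""
    ratio = cv1_ratio(closure_ms, preceding_vowel_ms)
    return "geminate" if ratio > boundary else "singleton"
```

With hypothetical values, a 200 ms closure after an 80 ms vowel gives a ratio of 2.5 and is classed as geminate, while the same 200 ms closure after a 130 ms vowel falls to about 1.54 and is classed as singleton. That is the bias described above: stretching the preceding mora undoes the hold.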
The single biggest production-side L1-English transfer error is not "wrong vowel length" but under-hold of the closure. That under-hold pulls the geminate-to-singleton closure ratio toward English-like values around 1.3 to 1.9 : 1, lowers C/V1 along with it, and pushes the percept toward the singleton category.73 Keep the vowel at its natural length. The work is on the held closure.
Mora count is preserved even during silence
The silent hold still counts as one full mora for prosodic purposes, including word length, song lyrics, haiku, and rhythmic chanting.1126 Dropping the hold shortens a word by one mora unit. がっこう has four morae (が-っ-こ-う). Reduce it to がこう (three morae), and the word is no longer recognizable as 学校.123
Relational timing means the proportion of the word taken up by the geminate hold. Its constancy across speaking rates is the acoustic correlate of the mora-timing intuition.116 Even when speakers speed up, the geminate keeps its share of the word.
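Mora counting from a kana spelling can be sketched as follows. The function assumes the input is pure hiragana or katakana, and the name `split_morae` is invented for this example. Small glide and vowel kana are attached to the preceding kana, while っ, ん, and ー each count as a mora of their own.

```python
# Small kana that merge with the preceding kana into one mora.
COMBINING_SMALL = set("ゃゅょぁぃぅぇぉャュョァィゥェォ")


def split_morae(kana):
    """Split a kana string into a list of morae."""
    morae = []
    for char in kana:
        if char in COMBINING_SMALL and morae:
            # きゃ, ちょ, ファ and similar count as a single mora.
            morae[-1] += char
        else:
            # Everything else, including っ, ん, and ー, is its own mora.
            morae.append(char)
    return morae
```

Here `len(split_morae("がっこう"))` is 4 and `len(split_morae("がこう"))` is 3, matching the counts given above.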
Minimal pairs and ear training
The fastest way to internalize the contrast is to drill it on words you already know. The pairs below span the three manner classes and use minimal pairs documented in the literature.
The stop drill: kata vs katta, oto vs otto, kako vs kakko
Voiceless-stop geminates support frequent minimal pairs; the ones below cover /t/ and /k/. For /t/ vs /tt/: 肩 kata "shoulder" vs 買った katta "bought"; 来た kita "came" vs 切った kitta "cut (past)"; 音 oto "sound" vs 夫 otto "husband". For /k/ vs /kk/: 過去 kako "past" vs 括弧 kakko "parentheses".283
Each pair differs only by the presence of a silent /Q/ mora. The C/V1 ratio is the cue that flips the category.113
肩が痛いです。3
"My shoulder hurts."
新しい本を買った。6
"I bought a new book."
大きな音がしました。6
"There was a loud sound."
夫はまだ起きていません。6
"My husband isn't up yet."
The fricative drill: isai vs issai, kasen vs kassen
For /s/ vs /ss/: 異彩 isai "distinctive (literally, different color)" vs 一切 issai "(not) at all, everything"; 下線 kasen "underline" vs 合戦 kassen "battle". For /ɕ/ vs /ɕɕ/: 貨車 kasha "freight car" vs 滑車 kassha "pulley".8 You should hear the difference as "shorter hiss" vs "longer hiss", with audible frication throughout the geminate. This is acoustically and perceptually distinct from the silent-closure drill above.810
一切質問はありません。8
"I have no questions at all."
重要な合戦の話です。8
"It is a story about an important battle."
The affricate drill: ichi vs itchi
For /tɕ/ vs /ttɕ/: 位置 ichi "position" vs 一致 itchi "agreement". The compound 抹茶 matcha "matcha tea" shows the same /Q/ + /tɕ/ pattern. Hepburn romanizes the geminate of /tɕ/ as "tch" rather than the ambiguous "cch".4 The acoustic signature is a silent closure portion followed by a release into the fricative tail of [tɕ]: a silent hold plus a normal affricate release at the end.25
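The Hepburn spelling convention for geminates, including the "tch" exception, fits in one small function. This is a sketch only: `romanize_geminate` is a hypothetical helper, and it assumes the caller passes the Hepburn spelling of the following consonant (for example `k`, `sh`, or `ch`).

```python
def romanize_geminate(onset):
    """Return the Hepburn spelling of /Q/ plus the given onset."""
    if not onset:
        raise ValueError("onset must be a non-empty string")
    if onset == "ch":
        # Hepburn writes "tch", not "cch".
        return "tch"
    # Otherwise double the first letter: k -> kk, sh -> ssh.
    return onset[0] + onset
```

So 一致 is built as `"i" + romanize_geminate("ch") + "i"`, giving `itchi`, while 雑誌 uses `romanize_geminate("sh")` to give the `ssh` in `zasshi`.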
二人の意見が一致しました。6
"The two of them reached agreement."
抹茶を一杯ください。4
"One cup of matcha, please."
How to self-test with a recorder
The geminate-to-singleton closure ratio in trained native speech is roughly 2.5 to 3.2 : 1. Absolute geminate closures are in the 150 to 220 ms range, and singleton closures are in the 60 to 80 ms range.73 L1-English learners typically produce a ratio of 1.67 to 2.06 : 1 with geminate closures around 170 to 180 ms. The gap to the native target is reliably visible on a free waveform editor such as Praat or Audacity.73
Because the relevant cue is relational, the practical test is not "is my geminate 200 ms?" but "is my geminate closure visibly longer relative to my preceding vowel than the singleton version is?"116 Record both members of a minimal pair. Line them up, and compare the silent stretch (for stops) or the frication band (for fricatives) with the preceding vowel.
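Once the durations have been read off the waveform by hand, the comparison itself is simple arithmetic. The sketch below assumes four manually measured values per minimal pair; the `Token` class, the `compare_pair` function, and the use of the 2.5 to 3.2 band as a pass/fail check are illustrative choices, not a validated assessment procedure.

```python
from dataclasses import dataclass

# Native geminate-to-singleton closure ratio band cited in the text.
NATIVE_RATIO_RANGE = (2.5, 3.2)


@dataclass
class Token:
    closure_ms: float  # silent stretch (stops) or frication band (fricatives)
    vowel_ms: float    # duration of the preceding vowel


def compare_pair(singleton, geminate):
    """Summarize the durational relationship within one minimal pair."""
    closure_ratio = geminate.closure_ms / singleton.closure_ms
    low, high = NATIVE_RATIO_RANGE
    return {
        "closure_ratio": closure_ratio,
        "cv1_singleton": singleton.closure_ms / singleton.vowel_ms,
        "cv1_geminate": geminate.closure_ms / geminate.vowel_ms,
        "in_native_range": low <= closure_ratio <= high,
    }
```

With made-up measurements such as `Token(70, 80)` for 肩 and `Token(196, 85)` for 買った, the closure ratio is 2.8 and the pair lands inside the native band. A recording with a 126 ms geminate closure against the same singleton would come out at 1.8 and be flagged as under-held.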
Good to know
Under-holding the closure is the universal L1-English error
L1-English speakers reliably under-hold the closure and let the preceding vowel stretch to compensate. To native ears, the result sounds like the singleton form with an over-long vowel, not like the geminate. Han 1992 and Takeuchi's perception study both find the same pattern: learners' geminate-to-singleton closure ratio drops from the native 2.5 to 3.2 : 1 down to 1.7 to 2.1 : 1. That puts the production into the range where Japanese listeners parse it as singleton.73
The fix is mechanical. Keep the preceding vowel at its natural length, and treat the closure as a separate beat to be held. The correct form of 買った has roughly 200 ms of silent closure between the first /a/ and the released /t/, with the /a/ itself kept at its natural length rather than stretched or clipped.7
本を買った。6
"I bought a book."
Two-release "double-t" articulation is the second-most-common error
English compounds with "doubled" consonants (night-time, hot-tub) get two distinct stop releases, one for each morpheme. A learner who carries that habit into Japanese pronounces /tt/ as two separate /t/ releases. That is wrong on both counts: there is one closure, held longer, and one release, at the end.5 The romaji spelling "tt" is a convention, not a phonetic instruction to articulate two t's.
The silent mora is a real mora; do not skip it
The /Q/ slot is one full mora and counts for rhythm in the same way that any kana does. Dropping the hold changes the word's mora count (がっこう has four morae; がこう has three) and, in many cases, changes the word itself.126 For any new word, count every kana on the page, including the small っ, and check that the spoken version has the same number of beats.12
"Same sound, held longer" is the right mental model
The articulatory geometry of a geminate is identical to that of the singleton. Only the closure or constriction is sustained for one extra mora.29 The mnemonic discourages learners from inventing a new articulation when none is needed. There is no special "tt" gesture distinct from /t/, only a longer hold of the same /t/.
Word-final っ is informal and emotive
The casual interjections あっ, やめろっ, and うっ end in a glottal stop that is the bare /Q/ slot with no following obstruent to host it.5 This realization is restricted to speech, manga, and informal writing. It does not appear in formal prose or in standard dictionary headwords.4 The mechanism is the same /Q/ that produces a geminate elsewhere. The surface form differs only because nothing follows.
Why /h/ alternates with /pp/ in compounds
Modern Japanese /h/ derives historically from /p/. When a compound puts /Q/ in front of an /h/-initial second element, the historical /p/ resurfaces in the geminate: Sino-Japanese 一 + 杯 → 一杯 ippai and 一 + 夫 → 一夫 ippu, and native 葉 + 葉 → 葉っぱ happa.8 This is why "hh" does not appear in native vocabulary, even though /Q/ + /h/ would otherwise be the predicted pattern.
Why voiced geminates exist in loanwords
The native phonology constraint *VoiGem bans voiced geminate obstruents, which is why no native word has /gg/, /dd/, or /bb/.8 The gairaigo stratum is exempt from this constraint, so /baggu/, /beddo/, /doggu/, and /heddo/ are licit. Even within gairaigo, several of these forms vary with devoiced counterparts (/bakku/, /betto/, /dokku/). The devoicing is usually described for words that contain another voiced obstruent, so ヘッド is not a typical case. A learner who hears バッグ pronounced with a half-voiced hold is not hearing an error.8
See also
- The Mora-N (ん) and Its Four Allophones
- Rendaku: When K Becomes G in Compound Words
- The Japanese Vowel Inventory: Five Vowels, Done Right
- Japanese Vowel Devoicing: Why です Sounds Like "Des"
- Long vs. Short Vowels in Japanese: The Distinction Beginners Miss
- Stress vs. Pitch: Does Japanese Have Stress?