Lookalike Katakana: How to Tell the Most-Confused Kana Apart
Lookalike katakana cluster around a few near-identical shapes. The differences often come down to the direction of one stroke or the count of one bar. For the two headline pairs (シ vs ツ and ソ vs ン), the master test is stroke direction. The shape family around ク, ワ, ウ, and フ resolves by stroke count and the presence of a top bar.12
Overview
Katakana confusion is different from hiragana confusion. For the worst-offending pairs, it has one underlying mechanism.1 The pairs that beginners stall on are not random visual coincidences. They follow from how katakana were derived from kanji in the 9th century, and they all yield to either a stroke-direction test or a stroke-count test.12
Why katakana confusion is its own problem
Katakana was developed in the 9th century by Buddhist monks in Nara to transliterate texts from India. They did this by taking parts of man'yōgana characters, early kanji used to write Japanese sounds, as a form of shorthand.1 The script's name reflects this origin: kata (片) means "partial" or "fragmented." Each sign is one component of a kanji rather than a whole kanji simplified down.1 The Wikipedia entry gives カ (ka) as the worked example: it "comes from the left side of ka (加; lit. 'increase')."1
Hiragana developed by a different route. Each hiragana is "a simplified cursive rendering of a whole kanji," derived through the sōsho cursive style, with あ (a) coming from 安 (an).3
This component-extraction process gives katakana its short, angular, near-isolated strokes. Several characters end up with two or three strokes inside near-identical outer shapes. That is why the same handful of pairs reappear on every learner's confusion list.1
Wikipedia names the four-kana cluster explicitly: "Characters shi シ, tsu ツ, so ソ, and n ン look very similar in print except for the slant and stroke shape. These differences in slant and shape are more prominent when written with an ink brush."1
The source kanji for the ten kana covered in this article are listed below. The table is useful when a mnemonic later in the article refers to the source shape.
| Kana | Source kanji | Notes |
|---|---|---|
| シ | 之 | Simplified in the Heian period from man'yōgana 之.4 |
| ツ | 川 / 州 / 津 / 闘 | Disputed; the Nihon Kokugo Daijiten lists these four candidates.5 |
| ソ | 曽 | Simplified in the Heian period from man'yōgana 曽.6 |
| ン | 尓 (disputed) | Possibly from the first two strokes of man'yōgana 尓, or from a symbol indicating the nasal sound (撥音, hatsuon).7 |
| ク | 久 | Simplified in the Heian period from man'yōgana 久.8 |
| ワ | 和 | Simplified in the Heian period from man'yōgana 和.9 |
| ウ | 宇 (top part) | Taken from the top part of the character.10 |
| フ | 不 | Simplified in the Heian period from man'yōgana 不.11 |
| ヲ | 乎 | Simplified in the Heian period from man'yōgana 乎.12 |
| ノ | 乃 | Simplified in the Heian period from man'yōgana 乃.13 |
The master rule: stroke direction
The two headline confusions (シ vs ツ and ソ vs ン) share one underlying mechanism. シ and ン finish with an upward sweep. ツ and ソ finish with a downward sweep.214 Every other surface cue, such as slant, mark angle, and stroke-end thickness in brush fonts, follows from that direction choice.115
The sci.lang.japan FAQ pairs the four on the same horizontal-vs-vertical axis. That print-shape pattern follows from the underlying stroke direction.2 LearnTheKana groups the same kana the same way: シ and ン are "more HORIZONTAL in every aspect (both the bottom line and the dots)," while ツ, ソ, and ノ are "more VERTICAL in every aspect (both the bottom line and the dots)."14
The diagram above gives the full diagnostic for the four worst-offending kana. Stroke count then separates the 3-stroke pair (シ, ツ) from the 2-stroke pair (ソ, ン).16
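The two tests combine into a four-cell lookup. The following is a minimal sketch of that lookup, assuming the reader has already counted the strokes and judged the direction of the long stroke; the names `FOUR_KANA` and `identify` are hypothetical and exist only for illustration.

```python
# Hypothetical lookup table: (stroke count, direction of the long stroke) -> kana.
# The keys restate the two tests described in the text; nothing here is a
# standard library or a published algorithm.
FOUR_KANA = {
    (3, "up"): "シ",
    (3, "down"): "ツ",
    (2, "up"): "ン",
    (2, "down"): "ソ",
}


def identify(stroke_count: int, long_stroke_direction: str) -> str:
    """Return the kana matching a stroke count and a long-stroke direction."""
    key = (stroke_count, long_stroke_direction)
    if key not in FOUR_KANA:
        raise ValueError(f"not one of the four lookalike kana: {key}")
    return FOUR_KANA[key]


print(identify(3, "up"))    # three strokes, long stroke ends upward
print(identify(2, "down"))  # two strokes, long stroke ends downward
```

Each cell of the table holds exactly one kana, which is why the two tests together are enough for this cluster.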
How this article is organized
Each of the four core confusion groups follows the same pattern: what the kana share, the structural diagnostic, the stroke-order anchor, and one durable distinguisher. A short cross-script note at the end points to hiragana-katakana confusions, which are out of scope here, and shows where they are treated separately.
What this article does not cover
Hiragana-katakana cross-script lookalikes (り vs リ, へ vs ヘ, か vs カ, や vs ヤ) appear in the Wiktionary "Easily confused Japanese kana" appendix as a separate inventory.17 The diagnostic question there is "which script is this?" It uses a different framework: word context, surrounding script, and font weight. The within-katakana rules in this article do not transfer to it.17
Per-kana stroke instruction belongs in the dedicated katakana stroke-order article on this site. Full-chart reading drills belong in the katakana chart article. Both are referenced where relevant rather than rewritten here.
The four core within-katakana lookalike groups
The within-katakana problem reduces to four groups. The first two yield to stroke direction. The last two yield to stroke count and the presence or absence of a top bar.
The four groups at a glance
| # | Group | One-line structural diagnostic |
|---|---|---|
| 1 | シ vs ツ | Two short marks stacked on the left in シ; side by side across the top in ツ.214 |
| 2 | ソ vs ン | Long stroke runs top-down in ソ; bottom-up in ン.1815 |
| 3 | ク vs ワ (with ウ, フ) | Stroke counts: フ = 1, ク = 2, ワ = 2, ウ = 3; ワ adds a top horizontal bar, ウ adds a top crown.1916 |
| 4 | ヲ vs フ | Top-bar count: ヲ has 2 horizontal bars on top (3 strokes); フ has none (1 stroke).1612 |
シ vs ツ: the stroke-direction headline pair
シ (shi) and ツ (tsu) are the single most-cited katakana confusion in beginner teaching. Both are 3-stroke kana with two short marks and one long sweeping stroke. At small print sizes, they share an almost identical outer shape.114
What they share
Both kana have stroke count 3.16 Both consist of two short marks plus one long sweeping stroke. In print, their outer shapes are nearly identical at small sizes.114
The structural difference: stroke direction
The third stroke of シ runs bottom-up, from south to north, and ends with an upward flick. The third stroke of ツ runs top-down, from north to south, and ends with a downward sweep. The sci.lang.japan FAQ states the visible consequence directly: "The lines in シ (shi) are more horizontal than vertical, whereas ツ (tsu) is more vertical than horizontal."2
LearnTheKana groups the broader cluster on the same axis: シ is in the "more HORIZONTAL" subgroup; ツ is in the "more VERTICAL" subgroup.14 Wikipedia confirms the print-vs-handwriting consequence: "Characters shi シ, tsu ツ, so ソ, and n ン look very similar in print except for the slant and stroke shape. These differences in slant and shape are more prominent when written with an ink brush."1
The short-mark orientation as a print-font cue
In static text, readers cannot watch the stroke direction. The two short marks are the visible clue. In シ, the marks stack vertically on the left and lean horizontally, aligning to the left vertical edge of the outer shape. In ツ, the marks sit across the top edge and lean vertically.14
The "S for Side, T for Top" cue is the print-font shorthand widely cited in beginner pedagogy: the short marks of Shi sit on the side; the short marks of Tsu sit on the top.1420
Stroke-order anchor
Both kana are 3 strokes.16 The first two strokes are the short marks. The third stroke is the long sweeping stroke whose direction is the diagnostic.16
A reader who can write both correctly already knows the difference, because the writing motion encodes the direction. The disambiguation mainly fails for readers who learned to recognize the kana from print without ever drawing them.
One durable distinguisher
Tofugu's image is the most-cited mnemonic for the pair: シ "looks like a smiley face, but something is wrong with it. Both eyes are sideways and stacked on top of each other like some deep sea fish," whereas ツ has the same two marks rotated so they sit "across the top," yielding "two needles and thread" rather than a face.20
The mnemonic-free test is shorter. Look at the two short marks. Stacked vertically on the left, leaning horizontal: シ. Side by side across the top, leaning vertical: ツ.214
The pair shows up adjacent inside the everyday loanword シャツ ("shirt"), so the diagnostic gets exercised inside one common word.
シャツの色は何色?21
"What color is your shirt?"
シャツを脱いだ。21
"I took off my shirt."
Beginner learning communities summarize the シ/ツ slip as the "sushi vs sutsu" reading error: スシ ("sushi") misread as スツ ("sutsu"). The mistake happens because every on-page difference between シ and ツ traces back to the third-stroke direction, and small print flattens those differences. The framing is widespread in forum posts and YouTube tutorials. Treat it as practitioner consensus rather than an academic claim.
ソ vs ン: the same rule applied to two strokes
ソ (so) and ン (n) repeat the シ/ツ diagnostic in a smaller form. Each kana has only two strokes, so the long second stroke carries the direction signal clearly.
What they share
Both kana have stroke count 2.16 Both consist of one short mark plus one longer sweeping stroke. At small sizes, their shapes can look interchangeable.118
The structural difference: stroke direction (again)
The Japanese Page states the rule cleanly: "The small dash for 'so' points South (down). … The small dash for 'n' almost points North (up)."18 The stroke-order consequence is just as clean: "SO ソ starts at the top" and "N ン starts at the bottom."18
SoraNews adds the brush-font cue on the longer stroke: "For 'so,' that first stroke curves slightly downwards, while for 'n' it curves up. The more significant difference, though, is in the direction you write the longer stroke. For 'so,' it's a downward stroke, and for 'n' it's an upwards one. That makes 'so's' longer stroke thick at the top, and 'n's' thicker at the bottom."15
The sci.lang.japan FAQ summarizes the print-shape axis: "The lines in ン (n) are more horizontal than vertical, whereas ソ (so) is more vertical than horizontal."2
The short-mark angle as a print-font cue
The Japanese Page's alignment rule gives two clues at once. "The dash in 'so' is lower and lines up at top (almost). Also, the dash in the 'n' is higher and lines up to the left."18 In other words, ソ's short mark lines up with the top of the long stroke and points south. ン's short mark lines up with the left end of the long stroke and points north.
The "ン looks like a lowercase n" image is implicit in the bottom-up long stroke: the rightward climb of the long stroke ends with the upward flick that resembles the right diagonal of the Latin lowercase n.18
Stroke-order anchor
Both kana are 2 strokes.16 The first stroke is the short mark. The second stroke is the long stroke, and its direction is the diagnostic.18
ン cannot start a word in Japanese.18 Position therefore helps in real text: when the choice is between these two, a word-initial kana has to be ソ. The reverse does not hold at word-final position, because both kana can sit at the end of a katakana word, though ン is far more common there.
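The positional rule can be sketched as a small filter. This example assumes a word in which one or more glyphs could not be told apart and are marked with `?`; the function name `candidates` and the `?` placeholder are hypothetical conventions chosen for illustration, not part of any library.

```python
from itertools import product


def candidates(pattern: str, unknown: str = "?") -> list[str]:
    """Expand each unknown glyph into ソ or ン, dropping words that start with ン."""
    slots = [i for i, ch in enumerate(pattern) if ch == unknown]
    results = []
    for combo in product("ソン", repeat=len(slots)):
        chars = list(pattern)
        for index, kana in zip(slots, combo):
            chars[index] = kana
        word = "".join(chars)
        # Positional rule from the text: ン does not start a word.
        if word.startswith("ン"):
            continue
        results.append(word)
    return results


print(candidates("?ファー"))  # word-initial slot: only ソ survives
print(candidates("パ?"))      # word-final slot: both readings remain
```

The filter removes a candidate only at the start of a word, which mirrors the asymmetry described above: position settles the initial case and leaves the final case to the stroke-direction test.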
One durable distinguisher
The mnemonic-free test is binary: trace the long stroke's final direction. Ends going up = ン. Ends going down = ソ.1815
The word パソコン ("PC") puts ソ and ン inside the same four-mora loanword, both in non-initial positions. The up/down test fires twice in one word.
ソファーに座った。21
"I sat on the sofa."
パソコンは使える?21
"Do you know how to use a computer?"
パン買った。21
"I bought some bread."
The ソ vs ン error matters most when readers sound out unfamiliar transliterated names, such as ジョンソン or ハンソン. In those cases, the kana carries sound information that cannot be recovered from context.15
ク vs ワ (and the ウ/フ neighborhood): the silhouette family
The third group does not yield to stroke direction. ク, ワ, ウ, and フ share an outer shape: a top-right corner curving down-left into a sweeping stroke. They differ in what sits inside or on top of that shape.219 The diagnostic is shape and stroke count.
What they share
All four kana share an outer shape: a top-right corner curving down-left into a sweeping stroke.219 Wiktionary's "Easily confused Japanese kana" inventory groups five kana on this shape axis (ウ, ワ, フ, ラ, ヲ). The in-scope four for this section are ウ, ワ, フ, plus ク from the adjacent タ/ク/ヌ cluster.17
The four diagnostics
Stroke counts are the cleanest single diagnostic in this group. フ = 1 stroke, ク = 2 strokes, ワ = 2 strokes, ウ = 3 strokes.16
Each kana's structural features follow from those counts:
- フ is the envelope alone, drawn as a single continuous stroke. No top crown, no internal stroke.16
- ク is the envelope plus a short diagonal stroke inside the top of the envelope (2 strokes). No top horizontal bar, no top crown.16
- ワ is the envelope plus a horizontal bar across the top of the envelope (2 strokes). No top crown.19
- ウ is the envelope plus the top horizontal bar plus a small crown stroke on top of the bar (3 strokes). The crown is the diagnostic for ウ specifically.1916
LearnTheKana frames the ウ/ワ split directly: ウ and ワ share "a slight vertical dip coming down from the horizontal line towards the left side." ウ has "a small vertical stick coming up from the middle" while ワ "lacks this element."19 The sci.lang.japan FAQ phrases the same split more tightly: "ウ (u) has a small line on the top but ワ (wa) has none."2
The diagram captures the roof-and-hat hierarchy. フ has neither. ク has neither but adds an internal diagonal. ワ has a roof. ウ has a roof with a hat on top.
Stroke-order anchor
Stroke counts differ usefully across the four: フ = 1, ク = 2, ワ = 2, ウ = 3.16 Counting strokes alone disambiguates フ and ウ. The ク/ワ pair is the only stroke-count tie in the group. The presence or absence of the top horizontal bar is the structural diagnostic between them.
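The stroke-count test and its single tie-break can be written out as a short sketch. The feature table below restates this section's own description of the four kana; the dictionary layout and the names `FAMILY`, `by_stroke_count`, and `classify` are hypothetical, chosen only for illustration.

```python
# Feature table restating the section's description of the four kana.
# The key names ("strokes", "top_bar") are illustrative assumptions.
FAMILY = {
    "フ": {"strokes": 1, "top_bar": False},
    "ク": {"strokes": 2, "top_bar": False},
    "ワ": {"strokes": 2, "top_bar": True},
    "ウ": {"strokes": 3, "top_bar": True},
}


def by_stroke_count(strokes: int) -> list[str]:
    """Return every kana in the family with the given stroke count."""
    return [kana for kana, feats in FAMILY.items() if feats["strokes"] == strokes]


def classify(strokes: int, top_bar: bool) -> str:
    """Resolve a kana by stroke count, using the top bar only to break a tie."""
    matches = by_stroke_count(strokes)
    if len(matches) > 1:
        matches = [k for k in matches if FAMILY[k]["top_bar"] == top_bar]
    if len(matches) != 1:
        raise ValueError(f"no unique kana for strokes={strokes}, top_bar={top_bar}")
    return matches[0]


print(by_stroke_count(2))         # the one stroke-count tie in the group
print(classify(2, top_bar=True))  # the top bar breaks the tie
```

The top-bar feature is consulted only when stroke count leaves more than one candidate, which matches the order in which the two tests are applied in the text.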
One durable distinguisher
The roof-and-hat hierarchy is the entire diagnostic, and the image is size-independent: it works at handwriting size and at print size.19
フ has no roof. ク has no roof and adds an internal short diagonal. ワ has a roof (the top horizontal bar). ウ has a hat on top of the roof.1916
The image lines up with the etymology for ウ. The source kanji 宇 contributed its "top part" to the modern shape, so the crown-on-bar shape preserves the upper structure of the source.10
ワインです。21
"It's wine."
どこのクラスなの?21
"Which class are you in?"
The word クラス places ク next to ラ, so the eye must distinguish the no-roof envelope (ク) from an adjacent shape immediately. ワイン puts ワ in initial position, where the roof bar is visually most prominent.
ヲ vs フ: the rare-kana mirror
ヲ (wo) and フ (fu) share the same outer shape as the previous group. ヲ stacks two horizontal bars on top of it. The diagnostic is just the count of those top bars, and the confusion is small in practice because ヲ is rare in modern text.
What they share
Both end with a sweeping curve to the lower-left.16 ヲ adds two horizontal bars across the top of the same outer shape that ク, ワ, and ウ also share.1612
The structural difference
ヲ has 3 strokes: two horizontal bars on top plus the sweeping bottom stroke. フ has 1 stroke: the bare outer shape only.16 The horizontal-bar count is the entire diagnostic.
Where you will actually see ヲ
ヲ derives from the man'yōgana kanji 乎, simplified in the Heian period.12 In modern Japanese, ヲ is "seldom used." The hiragana を "is used almost exclusively as the direct object particle, and as particles are usually written in hiragana," the katakana form rarely has occasion to appear. The "wo" sound in foreign words is rendered with ウォ rather than ヲ.12
All-katakana text, which would force every particle including を into katakana, is rare in modern usage.22 ヲ survives in retro all-katakana video games, such as Downtown Nekketsu Monogatari / River City Ransom, the original Metal Gear, and Moero!! Junior Basket. It also appears in stylized contexts like ヲタク, a katakana respelling of otaku for ironic or subcultural effect.22
For a beginner reader, the practical result is uneven exposure: a learner reading mostly contemporary material will encounter フ many times for each ヲ.1222
Hiragana-katakana cross-script lookalikes
Cross-script confusions (り vs リ, へ vs ヘ, か vs カ, や vs ヤ) are listed separately in the Wiktionary "Easily confused Japanese kana" appendix.17 The diagnostic question is different: "which script is this?" rather than "which kana within katakana?"
It depends on word context, surrounding script, and font weight rather than on a single structural feature within one script.17 The within-katakana rules in this article (stroke direction, stroke count, top-bar count) do not transfer to cross-script confusion.
The hiragana side of the broader script-confusion problem (within-hiragana lookalikes) is covered in this site's lookalike-hiragana article. The diagnostic framework there uses loop count, crossbar count, and stroke-ending features, not stroke direction.
Good to know
Stroke direction is the diagnostic, not just a writing tip
The stroke-direction rule that distinguishes ソ/ン and シ/ツ is the same rule writers follow to draw the kana correctly. Print fonts preserve the directional cue as slant, mark angle, and stroke-end thickness.15 A reader who learns to write the four kana correctly automatically learns the diagnostic for reading them.15
The stroke-end thickness cue is most visible in brush fonts. SoraNews notes: "that makes 'so's' longer stroke thick at the top, and 'n's' thicker at the bottom."15 Modern Gothic and Mincho fonts compress that thickness signal, but the slant and the dot orientation survive.15
Reading シ as ツ on a low-resolution screen
At small sizes on low-resolution screens, the slant differences between シ/ツ and ソ/ン can compress until the kana look nearly identical. A reader who scans シャツ and momentarily sees ツャツ is being defeated by the typeface, not by their eyes.
The fix is to zoom in or switch to a Japanese-targeted font, such as Hiragino Kaku Gothic, Yu Gothic, or Noto Sans JP. SoraNews specifically warns that "those stroke order/thickness clues can often disappear with modern, blockier fonts," and beginner forums report the same problem inside small-font menus and vending-machine displays.15
The "sushi vs sutsu" beginner error
Reading the シ in スシ ("sushi") as ツ yields the nonsense word スツ ("sutsu"), the framing widely cited in learner communities. The fix is the stroke-direction rule: the third stroke of シ runs bottom-up. The third stroke of ツ runs top-down.215
The シ/ツ pair is the single most-cited katakana confusion in beginner teaching, and the "sushi vs sutsu" framing is the standard example in learner communities.115 It is practitioner consensus rather than an academic finding.
Mnemonics work, but they are scaffolding
Tofugu's smiling-face image for シ/ツ, the lowercase-n image for ン, and the roof-and-hat hierarchy for ク/ワ/ウ are all useful at the absolute-beginner stage.21920 They are teaching aids, not part of the language. Discard them once recognition is automatic.
LearnTheKana's mnemonic set covers the full シ/ン/ツ/ソ/ノ cluster as a family. シ is a "female face" with "a mouth and two eyes tilted on the side." ツ is the "tsunami" image with "the strongest wave of all." ソ is the "soft" wave with one dot. ン is "the same face except one of the eyes are closed." ノ has "no waves" at all.14
Handwriting exaggerates the diagnostic; print suppresses it
The slant and stroke-end-direction cues are most prominent in brush calligraphy and in hand-printed teaching fonts like Kyokasho-tai.1 Wikipedia makes the point directly: "these differences in slant and shape are more prominent when written with an ink brush."1
Mincho and Gothic fonts on screen normalize the shapes. That is why a kana reader who has no trouble on a textbook page sometimes stalls on a vending-machine display. SoraNews's observation that stroke-end thickness "can often disappear with modern, blockier fonts" describes the same compression effect.15
Font matters: pick a teaching font for early reading
Japanese-targeted fonts, such as Hiragino Kaku Gothic, Yu Gothic, Noto Sans JP, and Kyokasho-tai for textbook style, render the diagnostic features more crisply than Latin-first fallback fonts. If a study text shows two katakana as visually identical, suspect the typeface first. The directional cues that survive in native Japanese fonts are exactly the ones that compress in fallback fonts that fake Japanese glyphs.15
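When the text is selectable and the typeface is the suspect, one way to settle which kana is on the page is to look at the underlying character instead of the rendered glyph. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(char: str) -> tuple[str, str]:
    """Return the code point (as U+XXXX) and the Unicode name of one character."""
    return f"U+{ord(char):04X}", unicodedata.name(char)


# Paste the ambiguous characters here; the four headline kana are used as a demo.
for kana in "シツソン":
    code_point, name = describe(kana)
    print(kana, code_point, name)
```

Unicode character names follow a different romanization from the one used in this article, so シ is listed as KATAKANA LETTER SI and ツ as KATAKANA LETTER TU. The code points differ regardless of how closely a fallback font draws the two glyphs.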
The confusion fades with reading volume, not flashcard volume
After the first month, lookalike-katakana errors usually come from reading speed inside connected text, not from per-kana recognition. Once the binary tests above feel automatic in isolation, replace isolated-kana flashcard time with reading volume: loanword-heavy menus, product packaging, and brand names.
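One way to pick reading material that exercises the binary tests is to filter a word list for words containing both members of a lookalike pair. This is a minimal sketch under that assumption; the names `PAIRS` and `pairs_exercised` are hypothetical, and the word list reuses example words from this article.

```python
# The two stroke-direction pairs covered in this article.
PAIRS = [("シ", "ツ"), ("ソ", "ン")]


def pairs_exercised(word: str) -> list[tuple[str, str]]:
    """Return the lookalike pairs whose two members both occur in the word."""
    return [pair for pair in PAIRS if all(kana in word for kana in pair)]


words = ["シャツ", "パソコン", "ソファー", "パン", "ワイン", "クラス"]
for word in words:
    hits = pairs_exercised(word)
    if hits:
        # Only words that force a side-by-side comparison are printed.
        print(word, hits)
```

With this list, only シャツ and パソコン are printed, since each contains both halves of a pair within a single word.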
Hiragana lookalikes are a different problem
Within-hiragana confusions (ぬ vs め, わ vs れ vs ね, and so on) are disambiguated by loop count, crossbar count, and stroke-ending features, not by stroke direction.17 The diagnostic framework in this article does not transfer to hiragana, and the hiragana framework does not transfer back to katakana.17
See also
- Long Vowels in Katakana: How the Chōonpu ー Works and Why Hiragana Doesn't Use It
- Extended Katakana for Loanwords (ファ, ヴィ, ティ, トゥ, and the Full Small-Vowel System)
- Hiragana Mnemonics That Actually Work
- Hiragana Stroke Order: Why It Matters Even If You Type
- What Is Gairaigo? A Guide to Loanwords in Japanese