Japanese Shadowing Materials by JLPT Level: What to Shadow from N5 to N1
Japanese shadowing materials by JLPT level are audio sources screened for one task: reproducing native speech aloud, a beat behind, from sound alone.1 Picking the wrong source is a quiet reason shadowing practice stalls. Material that is too fast, too long, or missing a transcript turns the drill into parroting noise.1
Overview
This article is a materials map, not a technique manual. It screens every pick against four shadowing-fitness criteria. Then it routes each JLPT band from N5 to N1 to a level-appropriate source already covered elsewhere on J-Compass.
Two anchors hold the whole map together. The first is the set of four fitness tests below. The second is a small set of measured speech-rate numbers. These let "by level" mean something concrete rather than an "easy/hard" hunch.23
No published study assigns a specific resource to a specific JLPT level, and the JLPT itself publishes no media or vocabulary list. The per-level picks are calibrated J-Compass recommendations. They are anchored to the official can-do descriptors plus measured speech-rate bands, not official classifications.4 Read each row as a starting point you adjust, not a wall you must clear first.
What makes good shadowing material
The four fitness tests: short, clear, transcribed, single-speaker
Shadowing means reproducing heard speech aloud in real time, a short interval behind, from sound alone with no text in view, while the audio keeps playing.15 The four fitness criteria below come from that drill. They are the spine the rest of this map hangs on.
Criterion 1, short. Material should be a short segment, not a long clip. Kadota models shadowing as pushing processing through the phonological loop, the auditory-rehearsal part of working memory that holds sound traces for only a few seconds before they fade.65 A long unbroken clip overruns that buffer, so segments must be short enough to hold. "Short" follows from the working-memory account. It is not a fixed second-count: aim for "short enough to hold," not "exactly N seconds."17
Criterion 2, clear. Audio must be clearly enunciated. Shadowing's best-supported gain is bottom-up phoneme perception, meaning recognition of sounds from the audio itself. The technique trains the ear to whatever input it is fed.18 Muddy or noisy audio gives the perceptual system nothing clean to lock onto, and clarity is also what makes a single clear-voiced host comprehensible in the first place.1
Criterion 3, has a transcript. A transcript lets you check what you mis-shadowed afterward. Shadowing itself is done with no text in view, because reading while shadowing "will change the cognitive process... so it becomes a different practice."7 But Kadota and Tamai's stepped procedure depends on having a script available. You use it to mumble and synchronize before the text is removed at the prosody and content stages.9 You shadow without the text in the moment, then verify against the transcript afterward. With no transcript at any point, there is no audit trail for your errors.
Criterion 4, single speaker (relaxes as level rises). A single speaker is easiest to track. Single-speaker audio gives you three supports at once: one voice, full untruncated turns, and a read or semi-scripted register.2 Overlapping multi-speaker spontaneous speech removes all three. The constraint is strictest at low levels and loosens upward. By N2 to N1, a mature ear can tolerate multi-speaker material. "Single-speaker at lower levels, relaxing upward" is a reasoned recommendation built from counting those three supports, not a sourced per-level rule.92
Material should sit slightly below your comprehension ceiling so you can track it in real time. Comprehensible input is input a learner can understand even when it sits slightly beyond their current level. For shadowing, the bar is stricter because production runs alongside perception.1 Hamada's data show shadowing's listening payoff concentrated in lower-proficiency learners working with level-appropriate material. An over-level text is "too demanding... cognitive overload."1 So the calibration for shadowing skews easier than for passive listening.
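As a rough illustration, the four tests can be written down as a checklist. The sketch below is a hypothetical helper, not a J-Compass tool: `Clip` and `failed_criteria` are names invented here, each field is the learner's own yes/no judgment (no fixed second-count is set for "short"), and relaxing the single-speaker test at N2 and N1 encodes the recommendation above rather than a sourced rule.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """A candidate shadowing segment, described by the learner's own judgments."""
    title: str
    short_enough_to_hold: bool  # judgment call; there is no fixed second-count
    clearly_enunciated: bool
    has_transcript: bool
    single_speaker: bool


# Levels at which the single-speaker test is allowed to relax (assumption
# drawn from the "relaxes as level rises" recommendation, not a sourced rule).
RELAXED_SINGLE_SPEAKER = {"N2", "N1"}


def failed_criteria(clip: Clip, level: str) -> list[str]:
    """Return the names of the fitness tests this clip fails at the given level."""
    failures = []
    if not clip.short_enough_to_hold:
        failures.append("short")
    if not clip.clearly_enunciated:
        failures.append("clear")
    if not clip.has_transcript:
        failures.append("transcript")
    if not clip.single_speaker and level not in RELAXED_SINGLE_SPEAKER:
        failures.append("single-speaker")
    return failures


# Hypothetical example: a short, clear, subtitled two-speaker scene.
scene = Clip("two-speaker drama scene", True, True, True, False)
print(failed_criteria(scene, "N5"))  # fails only the single-speaker test
print(failed_criteria(scene, "N2"))  # passes every test
```

An empty list means the clip clears all four tests at that level; anything else names what to fix, for example by isolating one speaker's turn.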
How "by level" is calibrated, not absolute
The match between level and material is a recommendation, not a hard gate. The assignments are built from three evidence-anchored axes rather than impressions:
- Approximate speech rate (morae per second). Spontaneous native conversation averages 8.01 morae/s (SD 2.07). Read or careful speech averages 7.11 morae/s. Native broadcast is roughly 450 to 570 morae/min (about 7.5 to 9.5 morae/s). The rate non-native listeners perceive as close to ideal for "easy Japanese" is 320 to 360 morae/min (about 5.3 to 6.0 morae/s).23 These are the anchors every level recommendation leans on.
- Register, ranging from graded-learner through casual, formal or news, to slang and dialect.
- Topic-vocabulary load, expressed as a JLPT-equivalent band.
A mora is the rhythmic beat of Japanese; か, ん, and the small っ each count as one mora, which is why morae per second, not words per minute, is the unit here.
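To make the unit concrete, the sketch below counts morae in a kana-only string and turns a clip's duration into morae per second. It is a simplified illustration under stated assumptions: the input must already be written in kana (kanji would be miscounted), small ゃ/ゅ/ょ-type kana are treated as fusing with the preceding kana, and `count_morae` and `morae_per_second` are hypothetical helper names, not part of any library. The two reference bands are the figures quoted above, and the 2.0-second duration is an invented example value.

```python
# Small kana that fuse with the preceding kana into a single mora (きょ, ファ).
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")
PUNCTUATION = set("、。・「」！？")

# Reference bands quoted in the article, converted to morae per second.
EASY_JAPANESE_BAND = (320 / 60, 360 / 60)     # about 5.3 to 6.0
NATIVE_BROADCAST_BAND = (450 / 60, 570 / 60)  # about 7.5 to 9.5


def count_morae(kana: str) -> int:
    """Count morae in a kana-only string (assumption: no kanji or Latin text)."""
    count = 0
    for ch in kana:
        if ch.isspace() or ch in PUNCTUATION:
            continue  # spacing and punctuation carry no mora
        if ch in SMALL_KANA:
            continue  # already counted together with the preceding kana
        count += 1  # ordinary kana, ん, small っ, and the long mark ー count as one
    return count


def morae_per_second(kana: str, seconds: float) -> float:
    """Speech rate of a clip whose kana transcript and duration are known."""
    return count_morae(kana) / seconds


sentence = "きょうは いい てんきです"
print(count_morae(sentence))            # 10 morae
print(morae_per_second(sentence, 2.0))  # 5.0 morae/s for a 2-second reading
```

A rate near 5 morae/s sits just under the easy-Japanese band; the same sentence read in about 1.2 seconds would land inside the native broadcast band.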
The JLPT-audio caveat applies throughout. JLPT listening is deliberately slowed, over-articulated, and contraction-free. The slowing shrinks from N5 ("spoken slowly") to N1 ("natural speed") but is present below N1.4 No published morae/s figure exists for JLPT tracks, so none is given for them.4
A learner who shadows only JLPT-style audio trains JLPT ears, not native ears.
You may also shadow material one notch below your reading or comprehension level. Because production runs alongside perception, the shadowing-fit level sits below the passive-comprehension level. This is reasoned from the proficiency-dependence finding, not from a study that directly tested difficulty calibration.1
How to use this map
Shadowing material vs. comprehension-listening material
The two libraries overlap, but the selection bar differs. Comprehension (passive) listening tolerates multi-speaker audio, no transcript, and material at or slightly above your ceiling. Shadowing is stricter: it needs the four fitness criteria and material a notch easier than your comprehension ceiling.19
The bar is stricter because shadowing is an online, high-cognitive-load task with no pauses. Passive comprehension, and even listen-pause-repeat, are gentler because a pause gives you time to process meaning.7 A clip that is fine to listen to passively can be too fast or too long to shadow.
In practice the same resource can appear in both a general listening map and this shadowing map for different jobs. A multi-speaker drama scene is good comprehension input at N2, but for shadowing you isolate a single short turn from it rather than shadowing the crosstalk.
The per-level table below is the heart of the map. Each row pairs a level with what to shadow, why it fits, whether a transcript is available, and the main watch-out.
| Level | What to shadow | Why it fits | Transcript? | Watch-out |
|---|---|---|---|---|
| N5 | Textbook audio (Genki, Minna no Nihongo); simplest NHK News Web Easy stories | Scripted, slow, single or clean two-speaker, fully transcribed; all four criteria at once | Yes (printed script; on-screen news text) | NHK Easy audio is synthesized; shadow for pace, not pitch |
| N4 | Teppei Beginner and main Nihongo Con Teppei; YuYu Nihongo (on-screen-transcript version) | Single host, short episodes, clear delivery | Teppei: not confirmed; YuYu: only on the YouTube on-screen-text version | Use short isolated segments; lean transcript-critical work on Sakura Tips / NHK Easy |
| N3 | Sakura Tips; calm single-speaker anime scenes | Scripted, short, single clear host with free JP+EN transcript (Sakura Tips) | Sakura Tips: yes; anime: n/a | Anime carries role language; isolate calm slice-of-life turns only |
| N2 | Contemporary slice-of-life J-drama (single turns); NHK radio/web news | Real vocal-tract prosody (drama); clear formal announcer register (news) | Drama: inconsistent JP subs; news: yes (読むらじる。) | News is formal 書き言葉; does not transfer to casual speech |
| N1 | Native vloggers and to-camera YouTube; short isolated variety turns | Self-recorded single-speaker native audio; full native ceiling | Vlogs: partial (on-screen text / JP auto-captions); variety: no | テロップ are not a transcript; variety sits above the JLPT ceiling |
N5: textbook audio and simplified news
The N5 tier carries the strictest single-speaker and transcript demand of the whole map. As everywhere here, the pick is a recommendation, not a gate.4
Textbook audio (Genki, Minna no Nihongo)
Textbook audio is the ideal first shadowing source because it is scripted, slow, single-speaker (or clean two-speaker), and fully transcribed, with the dialogue printed in the book. It satisfies all four fitness criteria at once, which no native material does at N5.
Genki: An Integrated Course in Elementary Japanese (The Japan Times) includes accompanying audio for its scripted elementary dialogues and listening exercises.10 Minna no Nihongo (みんなの日本語, 3A Corporation) likewise provides scripted dialogue and listening audio keyed to its elementary lessons.11
What can be cited is that both series pair printed (transcribed) scripts with recorded audio.1011 The N5 fit is a calibrated recommendation: elementary textbook audio is graded for beginners by design. No measured speech-rate figure is published for either series, so none is given.
The printed script is the built-in transcript. The scripted, slow, single or clean-speaker delivery covers short, clear, and single-speaker. Textbook dialogue is polite-register scripted speech, useful precisely because it is clean and checkable; it is not a model of fast casual speech, but at N5 that is the point.
Simplified news and beginner podcasts
NHK News Web Easy is the N5 simplified-news pick. It is a free NHK site that rewrites real news into the simplest Japanese, with furigana on every kanji, an on-demand Japanese hard-word gloss, and a 「ニュースを聞く」 read-aloud audio track.1213 Because the article text is on screen and the audio reads it, the transcript criterion is automatic. The delivery is slow and clear over short articles, and it is a single voice.1314
One caveat decides how you use it. The read-aloud is synthesized speech (合成音): a computer-generated voice, not a human one, produced automatically and read slowly and evenly, with adjustable playback speed.14 That makes it good for training segmentation and reading or articulation pace, where it excels. It is not a reliable model of natural prosody, and a synthesized voice cannot teach you pitch accent, so take your model of intonation and pitch from a human source instead.14
The scripts are written for the 中級準備レベル (N3合格) learner and draw on roughly 1,600 words from the old JLPT 3・4級 range, so the site as a whole sits in N4 to N3 territory. Because it rewrites authentic news, harder proper nouns and terms leak through, and difficulty varies from story to story.14 For shadowing at N5, pick the simplest stories. For absolute-beginner podcasts, start with the recommended-podcasts list and Teppei Beginner, covered in the N4 section. Teppei Beginner's single-host, short-episode format is what makes it shadow-able for a near-beginner.1516
N4: Nihongo Con Teppei and slow learner podcasts
The N4 tier favors single-host, slow-ish, short audio. Recommendation, not a gate.4
Nihongo Con Teppei and Teppei Beginner
Nihongo Con Teppei is a free, single-host Japanese podcast. One teacher, Teppei, talks in natural Japanese about one everyday topic per episode, with little or no background audio.1516 The genuine entry point is the spinoff "Japanese podcast for beginners (Nihongo con Teppei)," commonly called Teppei Beginner, calibrated around N5. The main "Nihongo Con Teppei" series sits one band higher at roughly N4 to N3.15171816
Teppei Beginner episodes run roughly 3 to 5 minutes (Spotify shows about 3 minutes; Tofugu's beginner review reports about 4 minutes); main-series episodes run roughly 10 to 20 minutes.1716 Short beginner episodes satisfy the "short" criterion well. One voice with no co-presenter satisfies the single-speaker criterion that matters most at this level.1516
A common myth is worth correcting: Teppei does not slow his speech much. Comprehensibility comes from simple vocabulary, one topic, and repetition of the key word, all delivered clearly by a single speaker. Tofugu notes he "doesn't really slow his speech much," and "simply speaks clearly, only really pausing around the title or any other key phrases."16 No measured speech rate is published, so none is given. The N5 and N4 to N3 bands describe register and topic vocabulary, not official ratings.151816
The transcript criterion needs an honest flag. The show is confirmed to be free across Apple, Spotify, and RSS. A free, full per-episode transcript for Teppei Beginner is not confirmed.151716
So Teppei is a strong fit on short, clear, and single-speaker, but a weaker fit on transcript than transcript-bundled resources. For transcript-critical shadowing, lean on Sakura Tips and NHK News Web Easy instead.
YuYu Nihongo and other comprehensible-input shows
YuYu Nihongo (YUYUの日本語) is a comprehensible-input resource by Yuusuke Takemori (竹森悠介). It is delivered as both an audio podcast and YouTube uploads that show the Japanese transcript on screen; the register is casual single-host monologue.192021 The on-screen Japanese transcript is the key support: the YouTube version displays the spoken Japanese as readable text. That supplies the transcript criterion and makes it the gentler entry point. The audio-only feed strips that support and is harder.1921
YuYu trends harder than a pure N4 pick. A resource database describes the host as having "very clear audio, and an easy-to-understand way of speaking," but lists the show under intermediate Japanese. Episodes also run long, from about 10 minutes to a half-hour or more.21
As an N4 shadowing source, use the on-screen-transcript YouTube version on short isolated segments, not the long audio-only monologues. Treat YuYu as a level curve: gentler with the transcript, advanced without it. The delivery is well-enunciated but at or near native conversational tempo, with no measured rate published.1921
N3: Sakura Tips and clean anime clips
The N3 tier is the first native-adjacent band. Single-speaker audio is still preferred. Recommendation, not a gate.4
Sakura Tips: slow-but-natural single-host audio
Sakura Tips is a short, scripted Japanese podcast by a single host, Mari. Every episode comes with a free Japanese-and-English transcript posted on sakuratips.com.222324 Episodes run about 4 to 5 minutes each and are numbered in sequence.2324
This is the best fit on the whole map for the transcript criterion: scripted (so no filler), short, single clear-voiced host, and a free per-episode transcript. That satisfies all four fitness criteria in one resource, which is why it anchors N3.222324 Mari "speaks in a very clear voice, and slows down her speech, while not overdoing it so much to make it odd," striking "a nice balance between comprehensibility and naturalness."24 No measured speech rate is published. Qualitatively, it is slower than native conversation yet faster and less robotic than from-zero drilled audio.24
The clear, slowed delivery feels beginner-friendly. The topic vocabulary (family, work, seasons, temples and religion, decluttering, Japan-culture asides) reaches the N4 to N3 band. That label is a J-Compass calibration against the JLPT N4 and N3 can-do descriptors, not an official level.22244
Anime clips: pick calm, single-speaker scenes
Anime is multi-speaker, fast, and register-distorted, so it does not satisfy the criteria as a whole. The shadowing-fit move is to isolate calm, single-speaker scenes: monologue or narration rather than banter or crosstalk. This recovers the single-speaker and short criteria from an otherwise unfit source.
No corpus measures anime's own dialogue rate. It is, however, performed by professional voice actors at what is reasonably taken to be natural broadcast pace or faster, that is, in or above the 450 to 570 morae/min band. That is well above the slowed 320 to 360 morae/min "easy Japanese" target.3 Anime is fast; that is why you isolate short calm turns and why N3 is the floor for this pick.
The register trap matters most here. Register means the style of language you use in a particular situation. Anime carries 役割語 (role language), a stylized fictional speech tied to character types that is "usually partially or entirely distinct from the real-life language" of the people it depicts.2526 Markers like わし and 〜じゃ (old man), わたくし and 〜ですわ (refined lady), 拙者 and 〜でござる (samurai), and gendered sentence-enders appear much more often in fiction than in real speech.2526
Do not shadow these into your real-speech default. Pick slice-of-life or contemporary-setting scenes, which carry less role-marked speech because role language indexes the archetypes that period and fantasy genres foreground.25
N2: J-drama and NHK news
The N2 tier tolerates multi-speaker audio and widens into formal register. Recommendation, not a gate.4
Japanese drama (dorama): realistic conversational rhythm
Live-action drama (実写ドラマ) is performed by real actors. Prosody, articulation, breath, and timing therefore come from a real vocal tract. Contemporary-setting dramas default to 標準語 (standard Japanese) rather than role language.272526 What transfers is realistic intonation and conversational rhythm. That is why live-action beats anime for transferable prosody.
Scripted drama is cleaner than spontaneous speech. CSJ (Corpus of Spontaneous Japanese) measured spontaneous speech at 8.01 morae/s (SD 2.07) against 7.11 morae/s for read or performed speech. So drama sits one rung below spontaneous speech and above textbook audio.28 Native broadcast and drama pace (about 450 to 570 morae/min) is well above the learner-comfortable band (about 320 to 360 morae/min). That is why N3-plus is the floor and N2 the comfortable shadowing tier.3
Drama is multi-speaker, which N2 ears can tolerate. For shadowing, isolate short single turns and shadow those, not overlapping exchanges. The transcript dependency is real. Japanese subtitles, which shadowing needs, are inconsistent and region-dependent across platforms. English-only subs do not serve shadowing.27
For calibrated picks, contemporary slice-of-life and everyday-speech titles are the shadowing-friendly band, such as 深夜食堂 (Midnight Diner, 2009, slice-of-life 標準語) and 逃げ恥 / 逃げるは恥だが役に立つ (TBS, 2016, everyday adult and workplace speech).2729 Treat these as illustrative difficulty anchors, not a streaming catalog. Avoid theatrical (半沢直樹) or genre-jargon (ドクターX, medical) titles, whose register and vocabulary do not generalize.
NHK Radio/Web News: formal-register shadowing
らじる★らじる (NHK ONE) streams NHK's hourly radio newscasts with a 同時配信 live simulcast and 聴き逃し catch-up, plus a 読むらじる。 read-along text companion. NHK News Web carries the same native register in text with attached audio or video.3031 The read-along text is the transcript pairing that makes intensive shadowing practical.
The announcer reads clearly and to a standard. NHK's news-reading standard is about 300 字/分, with a former-announcer working range of 300 to 350 字/分. Converting on the roughly 1 字 ≈ 1 mora basis lands at about 5 to 6 morae/s. This is a calibration from a reading standard rather than a measured newscast rate.32 Clean, standard enunciation is precisely what makes news good shadowing practice once listening is solid.33
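The conversion is simple enough to check by hand. The sketch below reproduces it, with the rough 1 字 ≈ 1 mora ratio exposed as a parameter so the assumption stays visible; `chars_to_morae_per_sec` is a hypothetical helper name introduced for illustration.

```python
def chars_to_morae_per_sec(chars_per_min: float, morae_per_char: float = 1.0) -> float:
    """Convert a reading rate in 字/分 to morae per second.

    morae_per_char defaults to the article's rough 1 字 ≈ 1 mora assumption.
    """
    return chars_per_min * morae_per_char / 60


low = chars_to_morae_per_sec(300)   # NHK news-reading standard
high = chars_to_morae_per_sec(350)  # top of the former-announcer working range
print(f"{low:.2f} to {high:.2f} morae/s")  # 5.00 to 5.83 morae/s
```

Because a single kanji is often read with more than one mora, a `morae_per_char` above 1.0 would push the estimate up, which is one more reason to read the figure as a calibration and not a measurement.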
News audio is formal written-style (書き言葉), heavy with 漢語 (Sino-Japanese vocabulary) and reporting forms like 〜とみられます (reporting passive) and 〜ということです (hearsay). It does not transfer to casual conversation.33 JLPT listening is also slower and cleaner than a real NHK newscast. Passing N3 listening does not predict you can shadow NHK news.4 Shadow it for formal register and articulation, then pair it with casual input.
One access note: らじる★らじる is 日本国内限定 (Japan-only, IP-enforced); from abroad, use NHK News Web, NHK WORLD-JAPAN audio, and NEWS WEB EASY instead.3031
N1: variety shows and native vloggers
The N1 tier means full native speed, slang, and overlapping speech. It sits above the JLPT ceiling. Recommendation, not a gate.4
Variety shows: the unscripted native ceiling
バラエティ番組 (variety shows) stack every hard variable at once: spontaneous overlapping speech, rapid wordplay, dialect, on-screen テロップ, and dense cultural reference.34 The genre as a whole fails the single-speaker and short criteria. At N1, the shadowing-fit move is to shadow short, isolated single turns, not crosstalk.
Spontaneous speech averages 8.01 morae/s and is far more variable than scripted speech (SD 2.07). In the fast tail, about 0.1% of utterances exceed 14.2 morae/s. That is the measured basis for the rapid-fire feel.2835 CSJ consists largely of academic presentations and simulated public speaking, so it is an anchor rather than a measurement of variety TV. The JLPT includes no overlapping multi-speaker, dialect-dense listening section, so passing N1 does not predict variety comprehension; treat variety as above the test ceiling.428
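As a sanity check on the fast-tail figure, the sketch below asks what share of utterances would exceed 14.2 morae/s if speech rate were normally distributed with the quoted mean and SD. The normal shape is an assumption made here for illustration, not something the corpus figures claim, and `tail_share` is a hypothetical helper name.

```python
from statistics import NormalDist


def tail_share(mean: float, sd: float, threshold: float) -> float:
    """Share of a normal distribution lying above the threshold."""
    return 1 - NormalDist(mean, sd).cdf(threshold)


MEAN_RATE = 8.01   # morae/s, spontaneous speech (quoted above)
SD_RATE = 2.07     # morae/s
THRESHOLD = 14.2   # morae/s, fast-tail cutoff (quoted above)

z = (THRESHOLD - MEAN_RATE) / SD_RATE
share = tail_share(MEAN_RATE, SD_RATE, THRESHOLD)
print(f"z = {z:.2f}, share above threshold = {share:.2%}")
```

Under that assumption the cutoff sits about three standard deviations above the mean and the tail comes out near 0.14%, the same order of magnitude as the reported 0.1%.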
On-screen テロップ are not a transcript. They are selective, stylized, affect-driven open captions pitched at native reading speed and often kanji-heavy. They emphasize and editorialize rather than transcribe, so they do not satisfy the transcript criterion.34
Much TV comedy talent also arrives through the Osaka お笑い pipeline, so variety carries a baseline of 関西弁 (Kansai dialect, such as copula や and negative 〜へん) that standard-dialect study never teaches. One long-running show wears it in its title: ガキの使いやあらへんで (NTV, 1989).3536 Recognize that dialect, but do not necessarily shadow it into standard output. 水曜日のダウンタウン (TBS, 2014) and ガキの使いやあらへんで are cited here as difficulty anchors only, not as a viewing prescription.3736
Native vloggers and YouTube channels
A to-camera native vlog keeps the single-speaker support that variety removes. So self-recorded single-speaker native audio is the more shadow-able native tier.38 Three picks span the range:
- HikakinTV (Hikakin) offers broad daily vlogging at full conversational speed, with frequent on-screen Japanese text but no guaranteed native subtitles. The on-screen text therefore only partially serves the transcript criterion.38
- That Japanese Man Yuta runs subtitled street interviews. Interviewees speak at natural conversational speed, but the subtitles keep it followable. This makes it a transcript-supported bridge for isolating short native turns.39
- NHK きょうの料理, a long-running NHK cooking program (since 1957), has narrow, repeating cooking vocabulary and a calmer instructional pace. It is a gentler native on-ramp than vlogger channels and a durable public-broadcaster source.40
Two flags close the tier. Entertainer and vlogger register (exaggerated reactions, slang, gendered or anime-adjacent phrasing) is authentic but does not transfer cleanly to formal or workplace Japanese. Shadow for the ear, and study register separately.38 For native YouTube, turn on Japanese (not English) auto-captions where available. English subs convert shadowing-prep into reading and do not serve verification.3839
Good to know
The transcript is non-negotiable for shadowing
You shadow from sound with no text in the moment, but you cannot diagnose what you mis-shadowed afterward without a reference text. Kadota and Tamai's procedure builds on having a script available to synchronize against before it is removed. With no transcript at any stage, there is no audit trail for your errors.97
Do not waive this criterion at any level. It is why transcript-bundled resources (Sakura Tips, NHK News Web Easy, 読むらじる。, textbook scripts) outrank transcript-poor ones for shadowing specifically.9132330
Shadow one notch below your comprehension level
If the audio is above your level, you fall off the stream and produce noise rather than meaning. In Hamada's data, shadowing's listening payoff was concentrated in lower-proficiency learners working with level-appropriate material, and an over-level text causes cognitive overload.1 Because production runs alongside perception, the shadowing-fit level sits below the passive-comprehension level. "JLPT-correct but too fast" becomes parroting noise.
The "one notch below" heuristic is a pedagogical recommendation; the supporting datum is the proficiency-dependence finding, not a study that directly tested difficulty calibration.1
Don't ship anime/variety register into real speech
Anime carries 役割語 (role language) "usually partially or entirely distinct from the real-life language" it depicts (わし and 〜じゃ, 拙者 and 〜でござる, gendered enders). Variety and vlogger speech carry slang and gendered or anime-adjacent phrasing.25263438 A learner who shadows these as defaults imports a fictional or rough register into real speech.
The fix is to shadow these sources for the ear, then run a "would a real person say this to a real person in my situation?" filter before adopting any phrase. Shadow contemporary 標準語 (drama, news, vlogs) for production-safe register.2526
The JLPT-audio trap
JLPT listening is deliberately slowed, over-articulated, and contraction-free. The slowing shrinks from N5 to N1 but is present below N1.4 Shadowing trains the ear to whatever input it is fed. Practicing only slow, over-articulated test audio builds perception of slow, over-articulated speech, not native-rate connected speech.18 Real speech is much faster: spontaneous speech averages about 8.0 morae/s and broadcast runs about 450 to 570 morae/min (about 7.5 to 9.5 morae/s), compared with the learner-preferred 320 to 360 morae/min (about 5.3 to 6.0 morae/s).23
The fix is to keep native-rate material (drama, news, vlogs) in the shadowing rotation alongside any test-style audio. There is no single study pitting JLPT audio against native audio for shadowing; this is a reasoned implication of the mechanism.18 Put plainly: shadowing JLPT audio trains JLPT ears, not native ears.
See also
- Bilingual News and Other Native-Level Japanese Podcasts: Listening with No Learner Accommodation
- Transcription Drills for Japanese Listening: Using Dictation to Train Your Ear
- The Daily Listening Loop: A 30-Minute Japanese Routine
- Active vs. Passive Listening in Japanese: When Each Actually Works
- Japanese Pronunciation Drills: A Daily 5-Minute Protocol with Minimal Pairs, Shadowing, and Record-and-Compare