How to Learn Japanese Vocabulary: A Strategy by Level

How to learn Japanese vocabulary comes down to one question beginners rarely state out loud: where does the next word come from? There are three viable sourcing pipelines: pre-made frequency decks, sentence mining, and textbook word lists. A deliberate Japanese vocabulary acquisition strategy is mostly about sequencing them as your level rises, not crowning one as the best way to learn Japanese vocabulary.

Overview: three approaches, one goal

The goal is the same for every learner: a large, usable vocabulary. The disagreement is about sourcing. The three answers are complementary pipelines, not rival philosophies.¹²

Pre-made frequency decks hand you a fixed, ordered list of the most common words. Sentence mining turns the words you trip over in real material into your own cards. Textbook word lists bundle vocabulary into the chapter of a grammar course you are already studying.

Most successful learners run two of these at once and shift the balance as they advance. The strategic question is the order in which you lean on each, not which single method wins.

Why "which words, in which order, from where" is the real question

The vocabulary problem breaks down into three sub-questions: which words to learn, in what order, and where the next batch comes from. The first two have a principled answer only for frequency-ordered sources.

Learn the most frequent words first, because they recur most often and therefore return the most comprehension per card reviewed. The corpus math behind this is the rationale for frequency decks: the first ~1,000 words cover roughly 80% of running text. The companion article "Word Frequency in Japanese: Why the First 1,000 Words Cover ~80%" treats that in depth.
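To make the coverage idea concrete, here is a minimal Python sketch of how top-N coverage can be computed from word counts. The function name `coverage_of_top_n` and the ten-token sample are illustrative assumptions, not data from the companion article; a real measurement would need a large corpus that has already been tokenized.

```python
from collections import Counter


def coverage_of_top_n(tokens, n):
    """Share of running tokens accounted for by the n most frequent word types."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)


# Toy, pre-tokenized sample (hypothetical). Real Japanese text has no spaces,
# so it must be run through a tokenizer before counting.
sample_tokens = ["の", "は", "の", "に", "の", "は", "猫", "の", "は", "犬"]

for n in (1, 2, 5):
    print(n, coverage_of_top_n(sample_tokens, n))
```

Run against a real corpus, the same function answers the question the frequency argument rests on: what share of running text do the top 1,000 words account for?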

"How many words" is a separate question with its own answer, covered in "How Many Japanese Words Do You Need to Be Fluent?" This hub states the conclusions and points to those two articles rather than reproducing the math.

The metric that actually matters: turning recognition into production

All three approaches, in their default form, build receptive vocabulary first: words you can understand when you see or hear them. An SRS card or a textbook gloss trains recognition, which is not the same as being able to produce the word in speech or writing.

Converting passive recognition into active production is a separate, deliberate step that every method must eventually address. Naming it up front matters, because "I can recognize 6,000 words in Anki" is not the same claim as "I can use 6,000 words."

The activation problem is shared by all three approaches

No sourcing method exempts you from the recognition-to-production gap. It is the common limitation of decks, mining, and textbook lists alike, and the mechanism for closing it is the subject of "Passive vs. Active Vocabulary in Japanese: The Two-Speed Problem."

The three pipelines are easiest to see side by side.

Approach 1: Pre-made frequency decks (Core 2K-6K / Amenokori)

How it works and why it is fast at the start

A pre-made frequency deck is a fixed, ordered list of the most common words, pre-built into spaced-repetition (SRS) cards, typically in Anki. SRS means reviews are scheduled just before you are likely to forget. You review the deck rather than curate it. The ordering, with highest-frequency words first, is the deck's whole value proposition.
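A toy scheduler shows the basic idea of spacing. This is a deliberately simplified sketch, not FSRS and not Anki's actual algorithm: the `Card` class, the `review` function, and the fixed growth factor of 2.5 are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Card:
    word: str
    interval_days: float = 1.0  # days until the next review


def review(card, remembered, growth=2.5):
    """Toy rule: grow the interval after a success, reset it after a lapse."""
    if remembered:
        card.interval_days *= growth
    else:
        card.interval_days = 1.0
    return card.interval_days


card = Card("猫")
# Two successful reviews, one lapse, then another success.
history = [review(card, outcome) for outcome in (True, True, False, True)]
print(history)
```

Real schedulers estimate the growth per card and per learner rather than using one fixed factor, but the shape is the same: remembered words are shown less and less often, and forgotten ones come back soon.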

The "Core" lineage originates with iKnow! (Cerego Japan). The iKnow! Japanese Core courses are explicitly frequency-ranked: Core 1000 covers the top 1,000 most commonly used words, Core 2000 covers the next 1,000, and the series continues by successive thousand-word frequency bands through Core 6000.³ Each iKnow! item ships with an example sentence and recorded audio.³

The widely circulated "Core 2k/6k/10k" Anki decks are community re-packagings of this iKnow! content, and their frequency ordering is inherited from the iKnow! ranking.³⁴

The beginner community now often recommends Kaishi 1.5k in place of the older Core 2k. It is described as "a modern, modular Japanese Anki deck made for beginners," with 1,500 words "sorted by frequency using various Yomichan/Yomitan frequency dictionaries."⁴ Its word pool was drawn from Core 2k, Core 10k, Tango N4, and Tango N5, then trimmed to about 1,500 entries.⁴ Tango here refers to the JLPT Tango vocabulary decks, not the dance.

Kaishi exists because older decks had documented defects. Its README notes that Core 2k contained "multiple mistranslations, missing or unrelated pictures" and weak example sentences, and that the Tango decks included "obscure words" such as ナンプラー (Thai fish sauce); Kaishi rewrote roughly 120 sentences, verified pitch-accent data, and normalized audio and images.⁴

A related but distinct line is the JLPT Tango decks, community ports of ASK Publishing's 『日本語単語スピードマスター』 (Nihongo Tango Speed Master, "Japanese Vocabulary Speed Master") series. These teach each word inside a short sentence built from previously introduced vocabulary. That is a graded-sentence format rather than an isolated-word format, and the decks are organized by JLPT band rather than by a single global frequency rank.⁵

Tango decks are JLPT-graded, not frequency-ordered

The Core and Kaishi decks are frequency-ordered by a corpus ranking, which is a genuine, citable basis rather than a marketing slogan.³⁴ The Tango series is graded by JLPT level instead, so it does not belong in the "frequency-ordered" bucket.⁵

A frequency deck is fast at the start because the first thousand or so words are both the most frequent and shared across nearly all content. At the very beginning, before the learner can read enough native material to source words on their own, it delivers the steepest comprehension gain per review.

Where it performs best: N5–N4 foundation building

The pre-made frequency deck is the right primary tool for the early beginner stage, roughly the first 1,000 to 2,000 words. At that point, the learner cannot yet read native material well enough to mine it. The most-frequent words are exactly the highest-value ones to front-load.

This reflects broad community consensus. Refold states that for a beginner, pre-made frequency decks are "ideal" because "they give you high-quality cards for the most common words,"¹ and the donkuri immersion guide likewise treats a deck such as Kaishi 1.5k as the standard first step before mining.⁴²

N5–N4 is a competence proxy, not a hard gate

The claim that a pre-made deck is best for beginners is community consensus and pedagogically reasonable, but the specific JLPT band boundary is a heuristic, not an evidenced threshold. The JLPT publishes no per-level vocabulary list to calibrate against,⁶ so read this as "early beginner stage," not a fixed N4 cutoff.

Where it breaks down: the i+1 sourcing problem at intermediate level

A generic frequency deck delivers i+1 only in aggregate, not personally. Its "+1" is the next word in a global frequency list, which is not necessarily the next word you need for the specific material you are trying to read.

The idea of i+1 comes from Stephen Krashen's input hypothesis: learners progress when they understand input slightly beyond their current level. Here, i is current competence and +1 is the next increment they are ready to acquire.⁷ A pre-made deck approximates +1 statistically, since frequency predicts likely recurrence, but the words you actually stumble over while reading are your true, personalized +1.

As the learner advances, the gap between "next frequent word in the deck" and "next word in my immersion" widens. This diminishing relevance is the trigger to hand off toward mining. Refold's staging makes the same point: once a learner reaches the intermediate stages, making cards from immersion is more effective than continuing on generic frequency cards alone.¹
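The widening gap can be sketched in a few lines of Python. Everything here is hypothetical: the two helper functions, the five-word "global ranking," and the cooking-blog tokens are invented for illustration and are not taken from any real deck or corpus.

```python
from collections import Counter


def next_deck_word(ranked_words, known):
    """First word in the global frequency ranking that is not yet known."""
    for word in ranked_words:
        if word not in known:
            return word
    return None


def next_personal_word(material_tokens, known):
    """Most frequent unknown word in the material the learner is reading."""
    counts = Counter(t for t in material_tokens if t not in known)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical data: a tiny global ranking and tokens from a cooking blog.
ranked = ["する", "ある", "行く", "経済", "政府"]
known = {"する", "ある", "行く"}
cooking_tokens = ["する", "炒める", "ある", "炒める", "鍋", "炒める"]

print(next_deck_word(ranked, known))              # the deck's "+1"
print(next_personal_word(cooking_tokens, known))  # the reader's "+1"
```

In this toy case the deck would serve 経済 (economy) next, while the word actually blocking the reader is 炒める (to stir-fry). Both are legitimate words to learn; only one is the reader's personal +1.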

Primary recommendation: Amenokori

For a learner who wants to open an app and start reviewing the most common words without building or vetting a deck, Amenokori is the J-Compass pick for this approach. Its scheduler is FSRS (Free Spaced Repetition Scheduler), which the landing page describes as "the same algorithm powering modern Anki, calibrated to your personal memory." The scheduler alone is not the draw, though. The draw is that it comes wrapped around a curated library, graded quizzes, and per-entry explanations, all of which you would otherwise have to assemble and maintain yourself. Each explanation covers meaning, register, usage notes, and a per-kanji breakdown of every compound rather than a bare gloss.⁸⁹

The appeal is zero setup plus a curated library. Amenokori advertises "10K entries ready on day one" with "no deck hunting, no formatting, no maintenance," and a library that spans N5 through N1.⁸⁹ The per-level breakdown shown on the landing-page collection cards is: N5 (801 entries), N4 (750), N3 (3,355), N2 (1,477 plus 855 extended), and N1 (3,239 plus 803 extended).⁸

These per-level counts are Amenokori's leveling, not official JLPT targets

The JLPT administrators publish functional can-do descriptions only, never a vocabulary list or per-level word count.⁶ The figures above are Amenokori's own leveling decisions for its library,⁸ not numbers endorsed by the test, and the same caution applies to any "X words at level N" figure circulating in the community.
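As a quick arithmetic check, the per-level figures quoted above can be totaled in a few lines of Python. The split into "base" and "extended" columns is this sketch's own labeling of the numbers as quoted, not Amenokori's terminology.

```python
# Per-level entry counts as quoted from the landing page: (base, extended).
counts = {
    "N5": (801, 0),
    "N4": (750, 0),
    "N3": (3355, 0),
    "N2": (1477, 855),
    "N1": (3239, 803),
}

base_total = sum(base for base, _ in counts.values())
extended_total = sum(ext for _, ext in counts.values())

print(base_total, extended_total, base_total + extended_total)
```

The base collections sum to 9,622 entries and the extended ones to 1,658, for 11,280 in all, which is consistent with the rounded "10K" headline figure.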

What sets the tool apart from a plain flashcard deck is its quiz design. The mobile-app page lists the question types as "reading, usage, cloze, particles, synonyms, antonyms, meaning," totaling "150K+ unique questions."⁹ Because these graded, typed questions check an actual answer, they counter the self-grading weakness of a plain recognition card. With a plain card, the learner silently rates their own recall. Every entry also ships with "contextual sentences with audio."⁹

The free tier is functional rather than a teaser: Amenokori is "100% free to start," with the full N5–N1 library available on the free tier at a daily limit of 20 new cards and 150 reviews, and paid monthly, annual, and lifetime plans layered on top.⁹

Approach 2: Sentence mining (build your own deck from what you read)

How it works: i+1 sentences from content you actually consume

Sentence mining means taking sentences or words from your own immersion material and turning them into SRS cards.²

Because the cards come from material you are actually consuming, relevance and i+1 are built in. The word you mine is, by definition, the "+1" you just hit at your current level i.⁷²

The standard setup is a pop-up dictionary (Yomitan, formerly Yomichan) plus Anki. It often includes a one-key card creator that captures the sentence, audio, and a screenshot from the source. JPDB and Migaku are the other commonly named platforms.² The step-by-step setup belongs in the deep-dive article "Sentence Mining: Building Your Own Japanese Anki Deck From What You Read," so this hub does not repeat it.

Where it performs best: N3+ once you have a base

Mining is efficient only after you have a base vocabulary. You must be able to read most of a sentence for the one unknown word to be a true i+1 rather than an i+5. The immersion community's own guidance is that mining is "usually something one does after going through a vocabulary deck like Kaishi 1.5k."⁴²
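The i+1 versus i+5 distinction can be expressed as a simple count of unknown words per sentence. The sketch below assumes sentences are already tokenized and that "known" is just a set of words; both helper names and the example sentences are hypothetical.

```python
def unknown_words(sentence_tokens, known):
    """Distinct unknown words in a sentence, in order of first appearance."""
    unknown = []
    for token in sentence_tokens:
        if token not in known and token not in unknown:
            unknown.append(token)
    return unknown


def is_i_plus_one(sentence_tokens, known):
    """True when the sentence contains exactly one unknown word."""
    return len(unknown_words(sentence_tokens, known)) == 1


known = {"私", "は", "を", "毎日", "飲む"}
easy = ["私", "は", "毎日", "コーヒー", "を", "飲む"]
hard = ["政府", "は", "経済", "対策", "を", "発表", "する"]

print(unknown_words(easy, known), is_i_plus_one(easy, known))
print(unknown_words(hard, known), is_i_plus_one(hard, known))
```

With a small base, most native sentences look like the second example: five unknowns at once, which is too many for any single word to be learned from context. A larger base turns more sentences into the first kind.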

This resolves the i+1 sourcing problem raised in Approach 1. Once the learner can read native material, immersion becomes a self-replenishing, personalized source of +1 words. A static frequency deck cannot match that.⁷¹²

The real trigger is reading competence, not an N-number

"N3+" as the mining-onset band is a community heuristic from Refold and AJATT staging, not an evidenced threshold.¹² The true trigger is being able to read native material with only occasional unknowns, which correlates with an N-level without being defined by one.

The trade-off: setup cost and over-mining

Mining solves relevance but adds cost. You have tools to set up: dictionary, Anki, and card templates. You also need ongoing discipline to mine without disrupting the flow of reading.²

The characteristic pitfall is over-mining. Turning every unknown word into a card makes immersion a card factory, balloons the review queue, and burns the learner out. The discipline is to mine selectively. The full step-by-step setup is left to the sentence-mining article.
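One way to picture selective mining is as a filter with a daily cap. The rule below is a hypothetical policy, not one prescribed by the sources cited here: it keeps only words met at least twice in a session and stops at a fixed number of new cards. The function name and both thresholds are assumptions.

```python
from collections import Counter


def select_to_mine(unknown_sightings, daily_cap=10, min_sightings=2):
    """Pick at most daily_cap unknown words, favoring those seen most often."""
    counts = Counter(unknown_sightings)
    repeated = [(word, n) for word, n in counts.items() if n >= min_sightings]
    # Stable sort: words with equal counts keep their first-seen order.
    repeated.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in repeated[:daily_cap]]


# Hypothetical log of unknown words met during one reading session.
sightings = ["炒める", "鍋", "炒める", "蓋", "鍋", "炒める", "刻む"]

print(select_to_mine(sightings))
print(select_to_mine(sightings, daily_cap=1))
```

Words seen only once are skipped for now. If they matter, they will come up again, and the review queue stays bounded in the meantime.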

Approach 3: Textbook word lists (Genki / Tobira)

How it works: vocabulary bound to a grammar syllabus

A textbook word list is the vocabulary assigned to each chapter of a structured course. Words arrive tied to the grammar points and readings of the chapter you are on, rather than ordered by global frequency.

GENKI: An Integrated Course in Elementary Japanese (The Japan Times) has 23 lessons across two volumes. Each lesson bundles a situational dialogue, the vocabulary drawn from it, grammar with exercises, and reading and writing practice.¹⁰ Its vocabulary serves the lesson's situation: the official description states that "in order that students can immediately put the grammar and expressions they learned into use in real situations, the vocabulary selections are predominantly words frequently used in everyday life."¹⁰ The selection favors common everyday words, but the ordering is syllabus-driven, not a global frequency rank.

TOBIRA: Gateway to Advanced Japanese (Kurosio Publishers) is organized into 15 chapters built around content themes such as Japanese geography, technology, food, religion, pop culture, and history, with reading, listening, writing, and speaking activities per chapter.¹¹ Its stated aim is to solidify the grammar, vocabulary, and kanji foundation up to intermediate level.¹¹ Its vocabulary is therefore bound to each chapter's topic, again not frequency-ranked.

The key contrast is the ordering principle. Textbook ordering is grammar-driven in Genki and theme-driven in Tobira, whereas frequency-deck ordering is corpus-frequency-driven.

The same word can appear early in a textbook because its grammar or topic comes early, yet appear late in a frequency deck, or the reverse. Coherence with what you are studying is the textbook's selling point; frequency-optimal coverage is not its goal.

Where it performs best: classroom and structured self-study

A textbook list works best when the learner is already working through that textbook, whether in a class or in disciplined self-study. The vocabulary then reinforces the exact grammar and readings of the same chapter. This maximizes coherence and retention of that chapter's material.¹⁰

This is a coherence argument, not a claim of superiority. The textbook list is strongest as a complement to a frequency deck rather than as a sole vocabulary source.

The trade-off: coverage ceiling and pacing

Textbook lists are small and slow compared with frequency decks. Genki I and II together cover roughly 1,700 words across 23 lessons.¹⁰ That is useful as a foundation, but far short of the several-thousand-word base needed for comfortable native reading.

A frequency deck reaches comparable counts faster because it is not gated by chapter pacing. The conclusion is that a textbook list is a good complement for coherence with your course, but a weak sole source, because it caps out and moves slowly. Pair it with a frequency deck or mining for volume; the fluency target itself is covered in "How Many Japanese Words Do You Need to Be Fluent?"
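The pacing difference is easy to estimate. In the Python sketch below, the 1,700-word and 23-lesson figures come from the text above, while the pace values (one lesson per week, 20 new cards per day) are hypothetical assumptions chosen only to make the comparison concrete.

```python
import math

GENKI_WORDS = 1700    # approximate combined total for Genki I and II
GENKI_LESSONS = 23


def words_per_lesson(total_words=GENKI_WORDS, lessons=GENKI_LESSONS):
    """Average vocabulary load per lesson."""
    return total_words / lessons


def days_via_textbook(lessons=GENKI_LESSONS, days_per_lesson=7):
    """Days to finish every lesson at an assumed pace per lesson."""
    return lessons * days_per_lesson


def days_via_deck(total_words=GENKI_WORDS, new_cards_per_day=20):
    """Days to introduce the same number of words at an assumed daily rate."""
    return math.ceil(total_words / new_cards_per_day)


print(round(words_per_lesson(), 1))
print(days_via_textbook(), days_via_deck())
```

Under these assumptions the textbook route takes 161 days and the deck route 85. The exact numbers matter less than the structure: the textbook's pace is set by lessons, while the deck's is set only by the daily new-card rate.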

Choosing by level: a decision guide

The JLPT bands below are familiar shorthand for learner stage. The JLPT publishes no official per-level vocabulary list,⁶ and the staging that follows is community consensus from Refold and AJATT. It is pedagogically reasonable, but heuristic rather than experimentally established. Read the N-levels as proxies for reading competence, not as hard thresholds.

N5–N4 (beginner): pre-made frequency deck first

Start with a curated frequency deck, such as Kaishi 1.5k or Amenokori's N5–N4 collections, for fast, high-value coverage. The beginner cannot yet mine efficiently, so the deck does the heavy lifting.⁴¹⁸

If you are enrolled in or self-studying a textbook such as Genki, use its word list as a coherence complement. The vocabulary will reinforce the grammar you are studying.¹⁰ Refold's position that pre-made frequency decks are "ideal" for beginners is the basis here. Read the band boundary as a heuristic, as noted above.¹

N3 (intermediate): the hand-off, keep the deck, start mining

Run the pre-made deck and sentence mining in parallel. Keep finishing and maintaining the frequency deck while you begin mining the words you actually encounter in early immersion.⁴¹²

This is where the i+1 sourcing problem from Approach 1 gets resolved. Immersion now supplies personalized +1 words, while the deck still backfills common words you have not met yet.⁷¹ A leveled tool like Amenokori continues to backfill structured N3+ coverage while mining handles the personalized long tail. This complementary pairing is what this stage depends on.⁸⁹ Refold frames the same transition, noting that in the intermediate stages, making your own cards from immersion becomes more effective.¹

N2–N1 (advanced): mining-led, deck as backfill

Mining becomes the primary engine for new words. Pre-made or specialized decks backfill domain gaps, such as news or JLPT-specific vocabulary. Meanwhile, the emphasis shifts from acquiring new recognition vocabulary to activating passive vocabulary into production.¹²

At this stage, the bottleneck is production, not recognition count. That is the activation problem treated in "Passive vs. Active Vocabulary in Japanese: The Two-Speed Problem." AJATT and Refold immersion staging both treat advanced learners as mining-led, with immersion as the main word source.¹² This reflects community practice rather than a controlled study.
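The decision guide above can be summarized as a small lookup table. This Python sketch only restates the staging in this section; the structure and the `plan_for` helper are illustrative assumptions, and the N-levels remain proxies for reading competence rather than hard thresholds.

```python
# (levels, primary source, supporting sources), restating the guide above.
PLAN_BY_STAGE = [
    (("N5", "N4"), "pre-made frequency deck",
     ["textbook word list, if you use a textbook"]),
    (("N3",), "pre-made frequency deck and sentence mining in parallel",
     []),
    (("N2", "N1"), "sentence mining",
     ["pre-made or specialized deck as backfill", "output practice"]),
]


def plan_for(level):
    """Return the suggested vocabulary sources for a JLPT-style level label."""
    for levels, primary, supporting in PLAN_BY_STAGE:
        if level.upper() in levels:
            return {"primary": primary, "supporting": supporting}
    raise ValueError(f"unknown JLPT level: {level!r}")


for level in ("N5", "N3", "N1"):
    print(level, plan_for(level))
```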

Good to know

These three approaches are complementary, not a competition

Most successful learners use two of the three at once, commonly a pre-made deck plus mining. The strategic question is sequencing across levels rather than declaring a single winner. Both the mining-centric immersion community and the staged Refold roadmap treat the pre-made deck as the on-ramp and mining as the destination. In other words, they are phases of one path rather than rivals.¹²

Why receptive reviews alone will not make you fluent

Recognition-only reviews, whether from a pre-made deck, a default mining card, or a textbook gloss, build passive vocabulary: words you can recognize but may not be able to use. Production requires separate, deliberate output practice. Treating recognition reps as sufficient is the pitfall shared by all three approaches. The mechanism for converting recognition into production is covered in "Passive vs. Active Vocabulary in Japanese: The Two-Speed Problem."

The "perfect tool" trap and the tool-hopping tax

Switching SRS systems or decks mid-stream discards the scheduling state the algorithm has built for your personal memory of each card. SRS intervals, including those computed by FSRS, are per-user and per-card.⁸⁹ Every switch resets that calibration and wastes accumulated review history.

The "calibrated to your personal memory" framing is exactly what you throw away when you hop.⁹ The discipline is to pick deliberately and commit, rather than chase the next tool.

Frequency tells you which words, not how many you need

Beginners conflate two distinct questions. Frequency ordering answers which words to learn and in what order: most-frequent first. It does not answer how many you need for a given goal.

The coverage question is treated in "Word Frequency in Japanese: Why the First 1,000 Words Cover ~80%." The target question is treated in "How Many Japanese Words Do You Need to Be Fluent?" Keep the two questions separate, and route each to its own article.

References

1. Refold. "Anki Setup" (Stage 1 roadmap): staging guidance on pre-made frequency decks vs. sentence mining. https://refold.la/roadmap/stage-1/a/anki-setup/
2. donkuri (Kuri). "Mining - Immersion-Based Japanese Learning." https://donkuri.github.io/learn-japanese/mining/
3. iKnow! (Cerego Japan). "Japanese Core" course catalog (Core 1000 / 2000 / 3000 / 4000 / 6000). https://iknow.jp/content/japanese
4. donkuri (Kuri). "Kaishi 1.5k" deck repository and README. https://github.com/donkuri/kaishi and AnkiWeb listing https://ankiweb.net/shared/info/1196762551
5. JLPT Tango N5 / N4 vocabulary decks (community ports of the TRY!/Tango Nihongo Tango Speed Master series), DJT/AJATT community distribution. AnkiWeb listing: https://ankiweb.net/shared/info/1295779105 (limitation: community port; underlying textbook is 『日本語単語スピードマスター』, ASK Publishing).
6. The Japan Foundation & Japan Educational Exchanges and Services (JEES). "N1-N5: Summary of Linguistic Competence Required for Each Level." Official JLPT site. https://www.jlpt.jp/e/about/levelsummary.html
7. Stephen Krashen. The Input Hypothesis: Issues and Implications. Longman, 1985. (The book-length statement of the input hypothesis and the i+1 formula.) See also Krashen, Principles and Practice in Second Language Acquisition, Pergamon Press, 1982. Summary and quotations via Wikipedia, "Input hypothesis," https://en.wikipedia.org/wiki/Input_hypothesis
8. Amenokori. Product landing page. https://amenokori.com
9. Amenokori. "Mobile App" page. https://amenokori.com/mobile-app/
10. The Japan Times. "What is GENKI? / Introduction." GENKI: An Integrated Course in Elementary Japanese (3rd ed.) official site. https://genki3.japantimes.co.jp/en/intro/
11. Kurosio Publishers / 9640.jp. "TOBIRA: Gateway to Advanced Japanese" official catalog and chapter listing. https://www.9640.jp/nihongo/en/detail/?447= and https://tobiraweb.9640.jp/introduction/book01/
