
Sentence Mining: Building Your Own Japanese Anki Deck From What You Read

Sentence mining means pulling sentences with unknown words from Japanese you are already reading or watching. You look those words up and save the sentences as flashcards for review.1 The deck you end up with is built entirely from material you encountered yourself, so every card carries the context, the source, and the surrounding words you already knew.

Overview

Sentence mining sits at the center of an immersion-based study loop: you consume native content, extract sentences that are just beyond your reach, and feed them into a spaced-repetition system (SRS) so the new words stick. The method rests on one selection rule and a small set of dictionary-plus-Anki tools. It works only if you keep a daily discipline that stops your review pile from growing out of control.

This guide connects the underlying theory to a concrete daily routine. It compares the three common mining stacks, names the classic failure mode that derails learners, and frames mining your own deck and using a pre-built one as complementary paths rather than rivals.

What Sentence Mining Is

Sentence mining is the process of finding sentences with unknown words in your immersion content, looking up the words, and saving them as flashcards for later review.1 The name fits: you extract useful sentences from authentic input you are already consuming, rather than studying a deck someone else assembled.1

The result is a deck unlike a downloaded wordlist. A mined card holds a sentence you actually encountered, complete with the scene it came from and the known words around it. It is not a bare term stripped of context.1

The practice was popularized for Japanese by the All Japanese All The Time (AJATT) blog and its "10,000 Sentences" method. In that method, the learner collects and reviews thousands of full example sentences from dictionaries, books, film, and other native material rather than memorizing word lists.2 AJATT's rationale was that "a correct example sentence is nothing other than a set of words arranged according to grammar rules," so a sentence carries both the word and the sense in which to use it.2

The same input-plus-SRS workflow carried forward through the Mass Immersion Approach and then Refold, which restated it in a language-agnostic form.13 The Refold roadmap pages remain a standing community reference for the mechanics.13

The i+1 Principle: Why You Mine Sentences, Not Words

The i+1 principle traces back to Stephen Krashen's Input Hypothesis. Krashen's claim is that "we acquire by understanding language that contains structure a bit beyond our current level of competence (i + 1)," helped by context or extra-linguistic information.4 Here i is the learner's current competence, and i+1 is the next step just beyond it.4

In Krashen's original model, i+1 is a property of a whole stream of comprehensible input, not of any single sentence. He states that input "need not contain only i + 1," and that if the acquirer understands enough input, "i + 1 will be provided automatically."4 He goes further, arguing that deliberately aiming input at i+1 through a "structure of the day" syllabus "is not necessary" and "may even be harmful."4

The sentence-mining community uses the same label more narrowly. In applied mining, an "i+1 sentence" means a single sentence with exactly one unknown item. This one-sentence-equals-i+1 reading is the community's working definition, not Krashen's original meaning.41

Two senses of the same label

When a mining guide calls a sentence "i+1," it means one unknown word in an otherwise understood sentence. When Krashen wrote i+1, he meant the comprehensibility of a whole body of input. Both uses are useful, but they are not the same claim.41

Refold gives the applied target a memorable name: the "1T sentence" (one target). It means a sentence "where you already understand most of the words and only need to learn one or two new ones."1

What Counts as an i+1 Sentence

A 1T (one-target) sentence is low-hanging fruit: you can mostly already read it, and it holds a single unknown word or grammar point.1 If the only thing you do not know in the sentence below is the verb 読む (yomu, "to read"), it is a clean i+1 candidate.

かれまえほんむ。5
"He reads before bedtime."

Sentences with several unknowns at once are the opposite of this. In community shorthand, a sentence with two unknowns is i+2 and one with three is i+3. Refold's guidance is explicit: do not mine sentences with three or more unknown words, because at that density you do not even understand the sentence.3 Even a two-unknown sentence misses the 1T target. The line below is an i+2 sentence, not a clean card, if both かかさず (kakasazu, "without fail") and 新聞 (shinbun, "newspaper") are unknown.

かれ毎朝まいあさかかさず新聞しんぶんむ。6
"He never misses reading the papers every morning."

The reason to hold the line at one target is practical. If a card tests several unknowns at once, you may not know which word it is meant to teach, and review becomes slow and ambiguous.1 A clean single-target card teaches its word faster.1
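The selection rule is simple enough to express as code. The sketch below is illustrative only. It assumes each sentence has already been split into dictionary-form words, for example by a Japanese morphological analyzer (not shown). It also assumes your known vocabulary is a plain set of strings. The function name `classify` and the labels it returns are hypothetical.

```python
from __future__ import annotations


def classify(tokens: list[str], known: set[str]) -> str:
    """Label a tokenized sentence by how many distinct unknown words it holds."""
    unknown = {t for t in tokens if t not in known}
    n = len(unknown)
    if n == 0:
        return "known"  # nothing new to learn, so nothing to mine
    if n == 1:
        return "1T"     # one target: a clean mining candidate
    if n == 2:
        return "i+2"    # not 1T: the card's target would be ambiguous, usually skip
    return "i+3+"       # three or more unknowns: Refold says do not mine
```

The function counts distinct unknowns rather than unknown tokens. That way, a word that appears twice in one sentence still counts as a single target.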

When You're Ready to Start Mining

Mining is efficient only once most sentences in your input are already close to understandable. Before that point, almost every sentence is i+2 or worse, and the search for 1T sentences stalls.13

This is why building a foundation of high-frequency vocabulary comes before mining pays off. The mining-specific point is simple: you cannot find one-target sentences if you do not yet know most of the words in your content.1 The coverage math behind that foundation belongs to the companion article on word-frequency coverage. It explains how a small core of frequent words accounts for most of the running text you meet.

Test your readiness on a page, not a number

Open a page of the content you want to mine and read a handful of sentences. If you can follow most of them and get stuck on roughly one word each, you are ready. If nearly every sentence has several unknowns, spend more time on foundational vocabulary first. Mining will only frustrate you.13
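The page test can be approximated in the same way. This hypothetical sketch assumes pre-tokenized sentences and a known-word set. It counts a sentence as followable when it has at most one unknown word. It also treats "most of them" as a share of at least one half; that threshold is an assumption, not a figure from Refold.

```python
from __future__ import annotations


def readiness(sentences: list[list[str]], known: set[str],
              threshold: float = 0.5) -> tuple[float, bool]:
    """Share of sampled sentences with at most one unknown word, and whether it meets threshold."""
    if not sentences:
        raise ValueError("sample at least one sentence")
    followable = sum(
        1 for tokens in sentences
        if len({t for t in tokens if t not in known}) <= 1
    )
    share = followable / len(sentences)
    return share, share >= threshold
```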

The Mining Toolchain

Three tool stacks dominate Japanese sentence mining. They differ mainly in how much they automate the path from "word I do not know" to "card in my review queue."

The diagram shows the same loop every tool follows. The sections below differ only in which steps you do by hand and which the tool does for you.

Yomitan + Anki (the DIY browser-reading stack)

Yomitan is a free browser extension for language learners and the maintained successor to the Yomichan project.7 Its core feature is pop-up lookup: hover over a word and press a key to see definitions, word frequencies, native audio, and example sentences without leaving the page.7

It handles audio and card creation directly. Yomitan can play native-speaker recordings, pull in custom sources such as Forvo, and export that audio to your flashcards.7 A single keypress turns a lookup into a rich Anki card and sends it to your deck.7

The Anki side of that handoff runs over AnkiConnect, an Anki plugin that exposes a local API so external tools can add notes and cards programmatically.8 Together they form the manual DIY baseline: read in the browser, hover to look up the unknown word, and send the sentence into Anki with one keypress.78
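To make the handoff concrete, here is a minimal sketch of the kind of request a tool sends to AnkiConnect. It rests on a few assumptions:

- Anki is running with AnkiConnect installed and listening on its default address (`127.0.0.1:8765`).
- A deck named `Mining` already exists.
- You use Anki's stock `Basic` note type with its `Front` and `Back` fields.

The deck name, the tag, and the helper function names are illustrative choices. Yomitan builds and sends its own requests, so this is not its internal code.

```python
import json
import urllib.request

ANKICONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect's default listen address


def build_add_note(sentence: str, back: str,
                   deck: str = "Mining", model: str = "Basic") -> dict:
    """Build an AnkiConnect `addNote` request (API version 6) for one mined sentence."""
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": deck,
                "modelName": model,
                "fields": {"Front": sentence, "Back": back},
                "options": {"allowDuplicate": False},
                "tags": ["mined"],  # hypothetical tag for filtering mined cards later
            }
        },
    }


def send(payload: dict, url: str = ANKICONNECT_URL) -> object:
    """POST a request to AnkiConnect and return its `result`, raising on `error`."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]  # for addNote, the new note's ID
```

Calling `send(build_add_note(sentence, back))` only works while Anki is open. If AnkiConnect rejects the note, for example as a duplicate, the call raises instead of failing silently.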

JPDB (auto-mining and difficulty-graded media)

jpdb describes itself as a Japanese dictionary and all-in-one learning system. It uses its own spaced-repetition algorithm, which it bases on machine-learning techniques rather than a fixed interval formula.9

Its media library is graded by difficulty. jpdb offers tens of thousands of prebuilt decks covering vocabulary from over a thousand anime. It also offers decks for visual novels, light novels, and web novels, presented in difficulty-ordered lists.9

Its mining is largely automatic. Paste in text, and jpdb extracts the vocabulary. It then generates automatic i+1 sentence cards that show new words in context with the known words around them, drawing on a database it describes as over 130 million Japanese sentences.9 It tracks progress across all decks and imposes no daily new-card cap.9
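jpdb does not publish how it picks its i+1 sentences, so the following only sketches the general idea under stated assumptions. You supply a target word, a pre-tokenized sentence corpus, and your known-word set. The function keeps the sentences whose only unknown word is the target. Preferring shorter sentences is a hypothetical tie-breaking choice, not jpdb's documented behavior.

```python
from __future__ import annotations


def one_target_sentences(target: str, corpus: list[list[str]],
                         known: set[str], limit: int = 3) -> list[list[str]]:
    """Up to `limit` corpus sentences whose only unknown word is `target`, shortest first."""
    candidates = [
        tokens for tokens in corpus
        if {t for t in tokens if t not in known} == {target}
    ]
    # Hypothetical tie-break: shorter sentences tend to make quicker cards.
    return sorted(candidates, key=len)[:limit]
```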

Migaku (the streamlined all-in-one option)

Migaku is a Chrome extension that turns the websites and videos you already watch into learning material.10 You click words in websites and subtitles to look up definitions, pronunciations, images, and AI-generated explanations.10

Each card it builds captures the new word, its context sentence, audio, a screenshot, the definition, and the AI explanation, all in a few clicks.10 A built-in SRS then schedules those cards for review at spaced intervals.10

Migaku works across YouTube, Netflix, Twitter, and other sites, with iOS and Android apps and a web review tool alongside the extension.10

The Daily Mining Routine

Theory and tools only matter if the habit holds. A workable routine keeps mining bounded inside a much larger amount of plain immersion.

Set a Card Quota, Then Keep Immersing

The discipline is to mine a fixed number of cards, then stop mining and return to free immersion. Refold lists "only mining, never freeflowing" as a mistake: mining time has to stay balanced against unstructured input.3

A self-set daily new-card quota is the standard defense against backlog. No primary source gives a single universal number, so the figure is yours to choose. The rule that matters is that there is a cap and you respect it.3
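A quota only works if something enforces it. As a hedged illustration, the hypothetical `DailyQuota` class below accepts new cards until a self-chosen limit is reached. It resets the count when the date changes. The limit value is whatever you pick.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DailyQuota:
    limit: int                                     # self-chosen daily cap on new cards
    day: date = field(default_factory=date.today)  # day the current count belongs to
    added: int = 0                                 # cards accepted so far on `day`

    def try_add(self, today: Optional[date] = None) -> bool:
        """Count one new card if today's quota allows it; False means stop mining."""
        today = today or date.today()
        if today != self.day:  # a new day starts a fresh count
            self.day, self.added = today, 0
        if self.added >= self.limit:
            return False       # quota reached: go back to free immersion
        self.added += 1
        return True
```

A mining tool, or you, would call `try_add()` before saving each card and stop as soon as it returns `False`.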

What a Good Mined Card Looks Like

A good mined card carries the target word, the full sentence it appeared in, the reading, the definition of the target word, and, where available, native audio and a screenshot of the source scene.1710 In Refold's basic format, the sentence goes on the front. The back gives the definition of the target word and, optionally, a screenshot of the scene the sentence came from.1

If 読む (yomu, "to read") is the target word, a mined card would put a context line like this on the front. The reading and gloss would stay on the back.

しゅうに3さつほんむよ。11
"I read three books a week."
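The same card layout can be written down as a small data structure. The `MinedCard` class and its field names are hypothetical. Following Refold's basic format, the front shows only the sentence and the back shows the target's reading and definition. The guide does not say where optional audio goes, so this sketch lists it on the back after the screenshot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MinedCard:
    target: str                       # the one unknown word the card teaches
    sentence: str                     # full sentence as it appeared in the source
    reading: str                      # reading of the target word
    definition: str                   # definition of the target word
    screenshot: Optional[str] = None  # optional frame from the source scene
    audio: Optional[str] = None       # optional native-audio clip

    def front(self) -> str:
        # Refold's basic format: the sentence alone on the front.
        return self.sentence

    def back(self) -> str:
        # Target reading and definition, then any optional media.
        lines = [f"{self.target}【{self.reading}】", self.definition]
        if self.screenshot:
            lines.append(f"[screenshot: {self.screenshot}]")
        if self.audio:
            lines.append(f"[audio: {self.audio}]")
        return "\n".join(lines)
```

For example, `MinedCard("読む", "彼は本を読む。", "よむ", "to read").back()` yields the two lines `読む【よむ】` and `to read`.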

Mined cards are receptive first: they train you to recognize the word when you meet it again.1 Converting that recognition into productive, active use comes later. The move from passive to active vocabulary is its own topic.

Common Pitfalls

Over-Mining and Backlog Burnout

The classic failure mode is mining far more cards than you can review. Refold puts it plainly: it is easy to save 30 or more cards in a session. If you cannot review them all, they pile up. Delete cards that are not useful rather than letting them create a backlog.13

The fix is the quota, paired with a willingness to delete low-value cards instead of hoarding them.1 Deck size is not the goal. Reviewable cards are.
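The arithmetic behind a backlog is worth seeing once. This simplified sketch tracks only new cards waiting to be introduced. Each day you mine some cards, introduce at most your cap, and optionally delete low-value ones. It ignores reviews of cards already introduced, which grow as well. The example figures are hypothetical, apart from the 30-card session Refold warns about.

```python
from __future__ import annotations


def simulate_backlog(mined_per_day: int, new_cap: int, days: int,
                     pruned_per_day: int = 0) -> list[int]:
    """New cards still waiting to be introduced at the end of each simulated day."""
    backlog = 0
    history = []
    for _ in range(days):
        backlog += mined_per_day
        backlog -= min(backlog, new_cap)         # cards you actually start learning today
        backlog -= min(backlog, pruned_per_day)  # low-value cards deleted instead of hoarded
        history.append(backlog)
    return history
```

`simulate_backlog(30, 10, 5)` returns `[20, 40, 60, 80, 100]`. Mining one 30-card session a day against a cap of 10 grows the pile by 20 every day. By contrast, `simulate_backlog(30, 10, 5, pruned_per_day=20)` stays at zero.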

Mining i+2 and i+3 Sentences

Mining sentences with multiple unknowns produces cards with a high failure rate and an unclear target. Refold's rule stands: do not mine sentences with three or more unknown words, because you do not even understand the sentence yet.3

A multi-unknown card forces you to guess which item the card is testing, which slows review and weakens retention.1 One target per card is the corrective.1

When NOT to Mine: The Pre-Built Deck Alternative

Mining is the do-it-yourself path. It requires a tooling setup, whether a browser extension plus Anki or an all-in-one, and the ongoing work of building and pruning your own deck.13

When you would rather not build or mine your own deck, the alternative is a pre-built, curated, pre-scheduled one, and J-Compass recommends Amenokori for this. Framed as "Anki results, zero setup," it ships vocabulary, grammar, and kanji decks scheduled with the FSRS algorithm, more than ten thousand pre-optimized entries, and graded coverage from N5 through N1. The breadth layer is ready the day you want it, with no tooling setup and no pruning.12
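Some readers may wonder what "scheduled with FSRS" means in practice. FSRS models each card's memory with a stability value and a forgetting curve. It then schedules the next review for when predicted recall falls to a target retention. The sketch below uses the FSRS-4.5 curve published by the open-spaced-repetition project, with a fixed decay of -0.5. Later FSRS versions change this parameterization, and Amenokori's exact configuration is not stated, so treat this as a general illustration.

```python
# FSRS-4.5 forgetting curve: R(t, S) = (1 + FACTOR * t / S) ** DECAY
DECAY = -0.5
FACTOR = 0.9 ** (1 / DECAY) - 1  # equals 19/81, so that R(S, S) == 0.9


def retrievability(t_days: float, stability: float) -> float:
    """Predicted probability of recall t_days after the last review."""
    return (1 + FACTOR * t_days / stability) ** DECAY


def next_interval(stability: float, desired_retention: float = 0.9) -> float:
    """Days until predicted recall falls to desired_retention (inverse of the curve)."""
    return stability / FACTOR * (desired_retention ** (1 / DECAY) - 1)
```

At the default 90% target, the next interval equals the card's stability. Asking for higher retention shortens it.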

Two ways to fill the same gap

A self-mined deck is maximally personal: your context, your gaps, at the cost of setup and upkeep. A pre-built FSRS deck is zero-setup and curated, but it is not drawn from your own immersion. The two are complementary, and many learners run a pre-built deck for breadth while mining the words their own reading brings up.12

Good to know

Where "i+1" comes from

The term is Krashen's: "We acquire by understanding language that contains structure a bit beyond our current level of competence (i + 1)."4 The immersion community, from AJATT through the Mass Immersion Approach to Refold, borrowed the label and applied it to a single mined sentence, meaning one sentence with one unknown. That applied use is narrower than Krashen's original whole-input model.412

"1T" means one target

Refold's name for the ideal mining sentence, the "1T sentence," is a mnemonic: one target word or grammar point per card.1 If a sentence has more than one thing you do not know, it is not 1T and should usually be skipped.13

Treating mining as a numbers game

The wrong mindset is saving 30 or more cards a session to feel productive, then drowning in reviews.1 The right mindset is to mine a small, bounded set of clean 1T cards, then immerse, and delete cards that turn out useless.13 The value lives in reviewable, single-target cards, not in deck size. Backlog is the documented burnout trigger.3

Mining and pre-built decks are complementary, not rival

A self-mined deck is maximally personal but costs setup and upkeep. A pre-built FSRS-scheduled deck is zero-setup and curated but not drawn from your own immersion. Amenokori's mobile app, for instance, bundles vocabulary, grammar nuance, and kanji in a single app. It ships more than ten thousand entries already built and optimized, attaches contextual sentences with audio to every entry, and schedules them with FSRS across N5 to N1.13 Running a pre-built deck for breadth while mining for the words your own reading surfaces is a common and reasonable combination.1213


References


  1. Refold. "2A: Basic Sentence Mining." Refold Roadmap, Stage 2. https://refold.la/roadmap/stage-2/a/basic-sentence-mining/

  2. AJATT (Khatzumoto). "10,000 Sentences: Why." All Japanese All The Time. https://alljapanesealltheti.me/10000-sentences-why/index.html

  3. Refold. "2B: Advanced Sentence Mining." Refold Roadmap, Stage 2. https://refold.la/roadmap/stage-2/b/advanced-sentence-mining/

  4. Krashen, Stephen D. Principles and Practice in Second Language Acquisition. Pergamon Press, 1982 (internet edition, sdkrashen.com). The Input Hypothesis: pp. 20–23. https://www.sdkrashen.com/content/books/principles_and_practice.pdf

  5. Tatoeba Project. Sentence #873959 (jpn) with English translation. CC-BY 2.0 FR. https://tatoeba.org/en/sentences/show/873959

  6. Tatoeba Project. Sentence #99824 (jpn) with English translation. CC-BY 2.0 FR. https://tatoeba.org/en/sentences/show/99824

  7. Yomitan. Product site and documentation. https://yomitan.wiki/

  8. AnkiConnect (foosoft). Project repository and documentation. https://git.sr.ht/~foosoft/anki-connect

  9. jpdb. Product landing page and feature description. https://jpdb.io/

  10. Migaku. Product landing page and feature description. https://migaku.com/

  11. Tatoeba Project. Sentence #10129724 (jpn) with English translation. CC-BY 2.0 FR. https://tatoeba.org/en/sentences/show/10129724

  12. Amenokori. Product landing page. https://amenokori.com

  13. Amenokori. Mobile app page. https://amenokori.com/mobile-app/