Mokuro: OCR for Japanese Manga
Mokuro is a free, open-source command-line tool that runs optical character recognition (OCR) on Japanese manga.12 The recognized text lets a pop-up dictionary like Yomitan read words directly off the page. For a learner who can read prose with a hover dictionary but gets stuck on image-based manga, Mokuro is the missing layer that turns scanned panels into lookup-able pages.
Overview
A manga page is a raster image: a grid of colored pixels. To a browser, a speech bubble is just pixels, with no text underneath for a hover dictionary to scan. Mokuro closes that gap by recognizing the Japanese on each page ahead of time and laying an invisible, selectable text layer over the artwork.
What Mokuro does
Its author describes Mokuro plainly as a tool to "Read Japanese manga with selectable text inside a browser," one "aimed towards Japanese learners, who want to read manga in Japanese with a pop-up dictionary like Yomitan."2 Mokuro is open-source and runs entirely from the terminal. You install it with pip3 and run it with the mokuro command.12
Mokuro does not perform the character recognition itself. It coordinates two underlying dependencies: comic-text-detector, which finds where text sits on the page, and manga-ocr, which reads the detected text.2
The practical result is a positioned overlay. Mokuro records the recognized text plus the bounding-box coordinates, or text-region positions, of each bubble or panel. The page can then render the original image with invisible, selectable text aligned to it.2 All of that recognition happens up front: "All processing is done offline (before reading)."2
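The geometry behind a positioned overlay can be sketched in a few lines. This is an illustration of the idea, not Mokuro's actual rendering code: the `OverlayBox` type, the `box_to_overlay` helper, and the `(x1, y1, x2, y2)` pixel-box convention are all assumptions made for the example. Expressing each region as a percentage of the page size is one way to keep the invisible text aligned when the image is scaled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverlayBox:
    """Placement of one text region, in percent of the page image size."""
    left: float
    top: float
    width: float
    height: float


def box_to_overlay(box, img_width, img_height):
    """Convert a pixel box (x1, y1, x2, y2) to percentages of the image size.

    The box convention is an assumption for this sketch.
    """
    if img_width <= 0 or img_height <= 0:
        raise ValueError("image dimensions must be positive")
    x1, y1, x2, y2 = box
    return OverlayBox(
        left=100 * x1 / img_width,
        top=100 * y1 / img_height,
        width=100 * (x2 - x1) / img_width,
        height=100 * (y2 - y1) / img_height,
    )


if __name__ == "__main__":
    # Hypothetical speech bubble on a 1000 x 1500 pixel page.
    print(box_to_overlay((100, 300, 250, 600), 1000, 1500))
```

Because the values are relative, the same numbers place the text correctly whether the page is shown at full size or shrunk to fit a phone screen.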
The engine doing the reading, manga-ocr, is "Optical character recognition for Japanese text, with the main focus being Japanese manga." It is built as a custom end-to-end model on the Transformers Vision Encoder Decoder framework.3 It is trained for manga-specific conditions: "both vertical and horizontal text," "text with furigana," "text overlaid on images," a "wide variety of fonts and font styles," and "low quality images." It also "supports recognizing multi-line text in a single forward pass."3
Provenance matters for a tool you install and run. Both Mokuro and manga-ocr come from the same author, kha-white. The Mokuro repository is licensed GPL-3.0, while manga-ocr carries a separate Apache-2.0 license.14
Where it fits in the reading toolkit
Mokuro is the manga-specific member of a digital-reading toolkit. For image-based manga, it plays the same role that an in-browser e-reader plays for EPUB prose: it exposes selectable text where there was none, so the dictionary layer has something to read. (That comparison is positioning, not an equivalence. The prose-side tool and the dictionary itself are separate articles.)
It sits upstream of the dictionary, not in competition with it. Mokuro produces the selectable text, and the hover dictionary reads it. The README states the intended pairing outright: the tool is for reading manga "with a pop-up dictionary like Yomitan."2
How the processing pipeline works
Processing a volume is a one-time setup step you run before you open the manga to read. The flow moves from a folder of page images, through detection and OCR, to a .mokuro file that a web reader can open.
Requirements and install
Mokuro requires "Python 3.10 or newer."2 Before you install, note one caveat: "The newest Python release might not be supported due to a PyTorch dependency." A just-released Python version can break the install, so a slightly older release that still meets the 3.10 minimum is the safer choice.2
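A quick way to confirm the lower bound before installing is to ask the interpreter itself. The sketch below is a convenience check, not part of Mokuro; the `meets_minimum` helper is a hypothetical name. It checks only the 3.10 minimum, because the README does not name a maximum supported version for the PyTorch caveat.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum stated in the Mokuro README


def meets_minimum(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} meets the 3.10 minimum.")
    else:
        print(f"Python {current} is too old; Mokuro needs 3.10 or newer.")
```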
The install itself is a single command:
pip3 install mokuro
A GPU is optional: Mokuro runs on CPU. Installing PyTorch with CUDA support for GPU acceleration is a step the repository notes "can be skipped."12 CPU-only processing works but is slower, and upstream publishes no timing figures to quote.
The commands, flags, and Python requirement here reflect the upstream repository. Version-specific behavior is pinned to the release that introduced it. A tool under active development can change its interface, so if the README disagrees with this article, treat the README as the source of truth.2
Processing a volume
Mokuro works with a folder-per-volume layout: you point the mokuro command at a directory of page images for a single volume. For one volume, the call is:
mokuro /path/to/manga/vol1
Several volumes can be processed in one call by listing them:
mokuro /path/to/manga/vol1 /path/to/manga/vol2 /path/to/manga/vol3
If the volumes sit under one parent directory, point Mokuro at the parent instead. The README's structure example is:
manga_title/
├─vol1/
├─vol2/
├─vol3/
└─vol4/
Then run:
mokuro --parent_dir manga_title/
This is a one-time, offline step per volume, done before reading: "All processing is done offline (before reading)."2 You pay the processing cost once. After that, you can read as many times as you like with no further OCR.
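Before a long processing run, it can help to confirm which subfolders actually look like volumes. The following is a minimal sketch, not something Mokuro provides: `find_volume_dirs` is a hypothetical helper, and the list of image extensions is an assumption that may not match what Mokuro itself accepts.

```python
from pathlib import Path

# Assumed set of page-image extensions; adjust to match your files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def find_volume_dirs(parent_dir):
    """Return sorted subdirectories of parent_dir that directly contain images."""
    volumes = []
    for child in sorted(Path(parent_dir).iterdir()):
        if not child.is_dir():
            continue
        has_images = any(
            f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            for f in child.iterdir()
        )
        if has_images:
            volumes.append(child)
    return volumes
```

Calling `find_volume_dirs("manga_title")` on the layout above would return the volume folders that hold at least one image, which makes an empty or misnamed folder easy to spot before processing starts.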
What Mokuro outputs
The maintained output is the .mokuro file. That format "was introduced in version 0.2.0" and "contains only the OCR results and metadata." In other words, it keeps the recognized text and its positions separate from the images themselves.2
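Keeping OCR results apart from the images means the recognized text can be inspected on its own. The sketch below assumes the .mokuro file is JSON with a `pages` list, each page holding `blocks`, and each block holding a `box` and a list of `lines`. Those field names are assumptions for illustration, not a documented schema, so check them against a file produced by your Mokuro version. `iter_text_blocks` is a hypothetical helper.

```python
import json
from pathlib import Path


def iter_text_blocks(mokuro_path):
    """Yield (page_index, box, text) for each text block in a .mokuro file.

    Assumes a JSON layout of pages -> blocks -> lines; the field names
    are assumptions and should be verified against a real file.
    """
    data = json.loads(Path(mokuro_path).read_text(encoding="utf-8"))
    for page_index, page in enumerate(data.get("pages", [])):
        for block in page.get("blocks", []):
            text = "".join(block.get("lines", []))
            yield page_index, block.get("box"), text
```

If the assumed layout holds, looping over `iter_text_blocks("vol1.mokuro")` gives a plain-text dump of every bubble in reading-tool order, which is handy for spot-checking OCR quality without opening the reader.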
Mokuro still emits an HTML file for backward compatibility, but the README marks that path as legacy and says it "will not be developed further."2 New work should follow the .mokuro plus web-reader path. The HTML output is there for older setups, not for new ones.
A few flags are worth knowing, quoted from the README's option list:2
| Flag | What it does |
|---|---|
| --force_cpu | "Force the use of CPU even if CUDA is available." |
| --disable_ocr | "Generate mokuro/HTML files without OCR results." |
| --disable_html | "Disable legacy HTML output." |
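For scripted runs, the command line can be assembled programmatically. This is a rough sketch rather than an official wrapper: `build_mokuro_command` is a hypothetical helper, it uses only flags named in this article, and it assumes the boolean flags can be passed bare (without a value), which is worth confirming against the README for your version.

```python
import shlex


def build_mokuro_command(volumes=(), parent_dir=None, force_cpu=False,
                         disable_html=False):
    """Build an argv list for the mokuro CLI without running it."""
    if not volumes and parent_dir is None:
        raise ValueError("give at least one volume path or a parent_dir")
    cmd = ["mokuro", *map(str, volumes)]
    if parent_dir is not None:
        cmd += ["--parent_dir", str(parent_dir)]
    if force_cpu:
        # Assumption: boolean flags are accepted without an explicit value.
        cmd.append("--force_cpu")
    if disable_html:
        cmd.append("--disable_html")
    return cmd


if __name__ == "__main__":
    cmd = build_mokuro_command(parent_dir="manga_title/", disable_html=True)
    # Print the command for review; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building a list instead of a single string avoids shell-quoting problems when a folder name contains spaces, which is common for manga titles.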
The reading experience
Once a volume is processed, you read it in a browser. The selectable overlay is what makes ordinary text selection, and therefore hover-dictionary lookup, work on top of the artwork.
Opening a processed volume
For the maintained path, you load the .mokuro file together with the manga images in the web reader at reader.mokuro.app.25 The README says it directly: "Load the .mokuro file together with manga images in web reader."2
The legacy HTML path works differently: the HTML file opens directly in a browser. Per the README, "You can transfer the resulting HTML file together with manga images to another device (e.g. your mobile phone) and read there."2
Either way, recognized text appears as a selectable overlay positioned over each panel or bubble. That lets panel navigation and ordinary browser text selection both work over the image.2
Hover-to-lookup with Yomitan
This is the payoff. With selectable text overlaid on the page, a hover dictionary scans it just as it would scan any other web text. Hover over a word, and the dictionary's popup shows its reading and definition, plus pitch accent if a pitch-accent dictionary is installed.2 The article "Yomitan (Yomichan): The Hover-Dictionary Workflow" covers that dictionary layer in full.
The dependency runs one direction. Mokuro supplies only the selectable text. It does not carry definitions of its own, so the lookup itself comes from Yomitan or an equivalent dictionary reading that text.2
From lookup to Anki
Once you look up a word over the manga overlay, you can mine it into an Anki card through the dictionary's card-creation flow, just like a word looked up in any other text. The mechanics of that one-click step live in "Yomitan + Anki: One-Click Card Creation". The broader method of building a deck from what you read is covered in "Sentence Mining: Building Your Own Japanese Anki Deck From What You Read".
What Mokuro adds to that workflow is reach. Manga that was previously un-mineable because it was a flat image becomes a source of cards once the overlay is in place.
Good to know
The "Allow access to file URLs" gotcha
The legacy local HTML path has a Chromium-specific catch. When a Mokuro HTML file is opened from a file:// address, the Yomitan extension cannot scan it until you grant file-URL access for the extension.6 Yomitan's own documentation states the requirement in its "Scanning local files and PDFs" section: "In order to use Yomitan with local files in Chrome, you must first tick the Allow access to file URLs checkbox for Yomitan on the extensions page."6
The hosted web reader at reader.mokuro.app sidesteps this entirely. It is served over https:// rather than file://, so the file-URL toggle never enters the picture.256 The actual checkbox steps are documented in "Yomitan (Yomichan): The Hover-Dictionary Workflow". If you are on the local HTML path, follow them there.
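The distinction comes down to the URL scheme of the page being read. As a small illustration, assuming only what is described above (the toggle matters for file:// pages in Chromium-based browsers), the check can be written with the standard library. `needs_file_url_access` is a hypothetical helper name, and the local path in the example is made up.

```python
from urllib.parse import urlparse


def needs_file_url_access(url):
    """Return True if the page is a local file:// page.

    Such pages are the case where Chromium-based browsers require the
    "Allow access to file URLs" toggle for the extension.
    """
    return urlparse(url).scheme.lower() == "file"


if __name__ == "__main__":
    for url in ("file:///home/user/manga/vol1.html",  # hypothetical local path
                "https://reader.mokuro.app/"):
        print(url, needs_file_url_access(url))
```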
OCR is good, not perfect
manga-ocr is purpose-trained, but its author names clear limits. Occasional errors are normal rather than a sign something is broken. On text length: "OCR supports multi-line text, but the longer the text, the more likely some errors are to occur."3 On handwriting: "It probably won't be able to handle handwritten text though."3
There is also a quirk on empty regions: "The model always attempts to recognize some text on the image, even if there is none ... it might even 'dream up' some realistically looking sentences." The author notes this is unlikely to cause practical problems in normal use.3
None of this needs a workaround beyond the overlay's own design. Because the recognized text is selectable, a reader who hits a misrecognized word can re-select it or look it up manually. No accuracy percentage is claimed upstream, so none is asserted here.
What to read with it
Mokuro makes raw manga lookup-able, but it does not choose the manga or rank its difficulty. It is a processing and reading tool, not a recommendation engine.12 For picking titles at the right level, the article "Manga for Japanese Learners: A Difficulty-Sorted Guide" handles difficulty selection.
See also
- Yomitan (Yomichan): The Hover-Dictionary Workflow
- Yomitan + Anki: One-Click Card Creation
- ttu-reader: The In-Browser E-Reader for Japanese
- ASBPlayer: Subtitle-Based Sentence Mining for Anime
- Manga for Japanese Learners: A Difficulty-Sorted Guide
- Sentence Mining: Building Your Own Japanese Anki Deck From What You Read