Mokuro: OCR for Japanese Manga

Mokuro is a free, open-source command-line tool that runs optical character recognition (OCR) on Japanese manga.¹² That lets a popup dictionary like Yomitan read the text directly off the page. For a learner who can read prose with a hover dictionary but gets stuck on image-based manga, Mokuro is the missing layer that turns scanned panels into lookup-able pages.

Overview

A manga page is a raster image: a grid of colored pixels. To a browser, a speech bubble is just pixels, with no text underneath for a hover dictionary to scan. Mokuro closes that gap by recognizing the Japanese on each page ahead of time and laying an invisible, selectable text layer over the artwork.

What Mokuro does

Its author describes Mokuro plainly as a tool to "Read Japanese manga with selectable text inside a browser," one "aimed towards Japanese learners, who want to read manga in Japanese with a pop-up dictionary like Yomitan."² Mokuro is open-source and runs entirely from the terminal. You install it with pip3 and run it with the mokuro command.¹²

Mokuro does not perform the character recognition itself. It coordinates two underlying dependencies: comic-text-detector, which finds where text sits on the page, and manga-ocr, which reads the detected text.²
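The division of labor can be pictured as a two-stage loop: detect regions first, then read each one. The following is a minimal sketch of that shape only. The names `TextBlock`, `process_page`, `fake_detect`, and `fake_recognize` are hypothetical stand-ins, not Mokuro's actual code or the real APIs of comic-text-detector or manga-ocr.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) in image pixels.
Box = Tuple[int, int, int, int]


@dataclass
class TextBlock:
    box: Box
    text: str


def process_page(
    image: object,
    detect: Callable[[object], List[Box]],
    recognize: Callable[[object, Box], str],
) -> List[TextBlock]:
    """Run detection first, then OCR on each detected region."""
    return [TextBlock(box=b, text=recognize(image, b)) for b in detect(image)]


# Stand-in stages so the sketch runs without any model installed.
def fake_detect(image: object) -> List[Box]:
    return [(10, 20, 110, 220), (300, 40, 380, 400)]


def fake_recognize(image: object, box: Box) -> str:
    return f"text@{box[0]},{box[1]}"


blocks = process_page("page001.jpg", fake_detect, fake_recognize)
```

Passing the two stages in as callables mirrors the point of the paragraph: the coordinating layer does not need to know how either stage works internally.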

The practical result is a positioned overlay. Mokuro records the recognized text plus the bounding-box coordinates, or text-region positions, of each bubble or panel. The page can then render the original image with invisible, selectable text aligned to it.² All of that recognition happens up front: "All processing is done offline (before reading)."²
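To see why stored coordinates are enough to align text with artwork, consider converting a pixel box into percentages of the page size, which a web page could use to position an element at any zoom level. This is an illustrative sketch: the function name `box_to_css`, the `(x1, y1, x2, y2)` box layout, and the percentage approach are assumptions, not a description of how Mokuro's reader is implemented.

```python
def box_to_css(box, img_width, img_height):
    """Convert a pixel bounding box into CSS-style percentage offsets.

    `box` is assumed to be (x1, y1, x2, y2) in pixels of the original image.
    """
    x1, y1, x2, y2 = box
    return {
        "left": f"{100 * x1 / img_width:.2f}%",
        "top": f"{100 * y1 / img_height:.2f}%",
        "width": f"{100 * (x2 - x1) / img_width:.2f}%",
        "height": f"{100 * (y2 - y1) / img_height:.2f}%",
    }


# A 200x400 px bubble on a 1000x2000 px page.
style = box_to_css((100, 200, 300, 600), img_width=1000, img_height=2000)
```

Because the offsets are relative to the page dimensions, the invisible text stays over the same bubble when the image is scaled.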

manga-ocr is purpose-built for the page, not generic OCR

The engine doing the reading, manga-ocr, is "Optical character recognition for Japanese text, with the main focus being Japanese manga." It is built as a custom end-to-end model on the Transformers Vision Encoder Decoder framework.³ It is trained for manga-specific conditions: "both vertical and horizontal text," "text with furigana," "text overlaid on images," a "wide variety of fonts and font styles," and "low quality images." It also "supports recognizing multi-line text in a single forward pass."³
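manga-ocr can also be called on its own from Python. The manga-ocr README documents a `MangaOcr` class whose instance is called with an image path (or a PIL image) and returns the recognized string. The wrapper below is a hedged sketch around that call: `read_text` and its `ocr` parameter are hypothetical names added here so the function can be exercised without loading the model.

```python
def read_text(image_path, ocr=None):
    """Return the text recognized in one cropped text region.

    `ocr` is any callable taking an image path and returning a string.
    When omitted, a real manga-ocr model is created.
    """
    if ocr is None:
        # Imported lazily: constructing MangaOcr loads the model
        # (fetching it on first use), which is slow.
        from manga_ocr import MangaOcr

        ocr = MangaOcr()
    return ocr(image_path)


# Exercise the wrapper with a stand-in recognizer instead of the real model.
sample = read_text("bubble.png", ocr=lambda path: "こんにちは")
```

With manga-ocr installed, `read_text("/path/to/img")` would use the real model. Creating one `MangaOcr` instance and reusing it across many crops avoids reloading the model each time.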

Provenance matters for a tool you install and run. Both Mokuro and manga-ocr come from the same author, kha-white. The Mokuro repository is licensed GPL-3.0, while manga-ocr carries a separate Apache-2.0 license.¹⁴

Where it fits in the reading toolkit

Mokuro is the manga-specific member of a digital-reading toolkit. For image-based manga, it plays the same role that an in-browser e-reader plays for EPUB prose: it exposes selectable text where there was none, so the dictionary layer has something to read. (That comparison is positioning, not an equivalence. The prose-side tool and the dictionary itself are separate articles.)

It sits upstream of the dictionary, not in competition with it. Mokuro produces the selectable text, and the hover dictionary reads it. Mokuro's stated purpose is explicitly pairing "with a pop-up dictionary like Yomitan."²

How the processing pipeline works

Processing a volume is a one-time setup step you run before you open the manga to read. The flow moves from a folder of page images, through detection and OCR, to a .mokuro file that a web reader can open.

Requirements and install

Mokuro requires "Python 3.10 or newer."² Before you install, note one caveat: "The newest Python release might not be supported due to a PyTorch dependency."² A just-released Python version can therefore break the install, so a slightly older release that is still 3.10 or newer is the safer choice.
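A quick way to confirm the lower bound before installing is to check the interpreter version. This sketch covers only the documented minimum of 3.10. It cannot tell you whether the newest release is too new for PyTorch, since the README names no upper bound. `MIN_PYTHON` and `meets_minimum` are names chosen for this example.

```python
import sys

MIN_PYTHON = (3, 10)  # the minimum stated in the Mokuro README


def meets_minimum(version=None):
    """Return True if `version` (default: the running interpreter) is >= 3.10."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


if not meets_minimum():
    print("This interpreter is older than Python 3.10.")
```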

The install itself is a single command:

pip3 install mokuro

A GPU is optional. Mokuro runs on CPU. Installing PyTorch with CUDA support to enable GPU acceleration is a step the repository notes "can be skipped."¹² CPU-only operation works but is slower. Upstream publishes no timing figures, so none are quoted here.
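If you want to know which path your machine would take, PyTorch exposes `torch.cuda.is_available()`. The helper below is a small sketch built on that call. `pick_device` is a hypothetical name, and this is not a claim about how Mokuro itself selects a device.

```python
def pick_device():
    """Return "cuda" if PyTorch reports a usable CUDA GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"OCR would run on: {device}")
```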

Commands may drift; treat the repo README as the source of truth

The commands, flags, and Python requirement here reflect the upstream repository. Version-specific behavior is pinned to the release that introduced it. A tool under active development can change its interface, so treat the README as the source of truth wherever it disagrees with this article.²

Processing a volume

Mokuro works with a folder-per-volume layout: you point the mokuro command at a directory of page images for a single volume. For one volume, the call is:

mokuro /path/to/manga/vol1

Several volumes can be processed in one call by listing them:

mokuro /path/to/manga/vol1 /path/to/manga/vol2 /path/to/manga/vol3

If the volumes sit under one parent directory, point Mokuro at the parent instead. The README's structure example is:

manga_title/
├─vol1/
├─vol2/
├─vol3/
└─vol4/

Then run:

mokuro --parent_dir manga_title/

This is a one-time, offline step per volume, done before reading: "All processing is done offline (before reading)."² You pay the processing cost once. After that, you can read as many times as you like with no further OCR.
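If you batch-process from a script, the two call shapes above can be assembled as argument lists and handed to `subprocess`. This is a sketch under stated assumptions: `mokuro_argv` and `run_mokuro` are hypothetical helper names, and the helper deliberately builds only the two forms shown in this section (explicit volume paths, or `--parent_dir`).

```python
import subprocess


def mokuro_argv(volumes=(), parent_dir=None):
    """Build the argument list for one of the two call shapes shown above."""
    volumes = [str(v) for v in volumes]
    if (parent_dir is None) == (not volumes):
        # Keep the sketch to exactly one shape per call.
        raise ValueError("pass either volume paths or parent_dir, not both or neither")
    if parent_dir is not None:
        return ["mokuro", "--parent_dir", str(parent_dir)]
    return ["mokuro", *volumes]


def run_mokuro(argv):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(argv, check=True)


single = mokuro_argv(["/path/to/manga/vol1"])
batch = mokuro_argv(parent_dir="manga_title/")
```

Passing a list rather than a single string avoids shell quoting problems when a volume path contains spaces.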

What Mokuro outputs

The maintained output is the .mokuro file. That format "was introduced in version 0.2.0" and "contains only the OCR results and metadata." In other words, it keeps the recognized text and its positions separate from the images themselves.²
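Because the file holds only results and metadata, it is small enough to inspect by hand. The sketch below assumes the .mokuro file is a JSON document with an object at the top level. That is an assumption about the on-disk format, not something this article's quotes confirm, so check a real file before relying on it. `summarize_mokuro` is a hypothetical helper that reports top-level keys and their types without assuming any particular schema.

```python
import json
from pathlib import Path


def summarize_mokuro(path):
    """Map each top-level key of a .mokuro file to the type name of its value.

    Assumes the file is UTF-8 JSON with an object at the top level.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {key: type(value).__name__ for key, value in data.items()}
```

Calling `summarize_mokuro("vol1.mokuro")` on a processed volume gives a quick view of what metadata is present without opening the web reader.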

Mokuro also still emits a legacy HTML file for backward compatibility, but the README marks that path as legacy and says it "will not be developed further."² New work should follow the .mokuro plus web-reader path. The HTML output is there for older setups, not for new ones.

A few flags are worth knowing, quoted from the README's option list:²

Flag	What it does
`--force_cpu`	"Force the use of CPU even if CUDA is available."
`--disable_ocr`	"Generate mokuro/HTML files without OCR results."
`--disable_html`	"Disable legacy HTML output."

The reading experience

Once a volume is processed, you read it in a browser. The selectable overlay is what makes ordinary text selection, and therefore hover-dictionary lookup, work on top of the artwork.

Opening a processed volume

For the maintained path, you load the .mokuro file together with the manga images in the web reader at reader.mokuro.app.²⁵ The README says it directly: "Load the .mokuro file together with manga images in web reader."²

The legacy HTML path opens differently. That older HTML output opens directly in a browser. Per the README, "You can transfer the resulting HTML file together with manga images to another device (e.g. your mobile phone) and read there."²

Either way, recognized text appears as a selectable overlay positioned over each panel or bubble. That lets panel navigation and ordinary browser text selection both work over the image.²

Hover-to-lookup with Yomitan

This is the payoff. With selectable text overlaid on the page, a hover dictionary scans it just as it would scan any other web text. Hover over a word, and its reading and definition appear, along with pitch accent if a pitch dictionary is installed in Yomitan.² The article "Yomitan (Yomichan): The Hover-Dictionary Workflow" covers that dictionary layer in full.

The dependency runs one direction. Mokuro supplies only the selectable text. It does not carry definitions of its own, so the lookup itself comes from Yomitan or an equivalent dictionary reading that text.²

From lookup to Anki

Once you look up a word over the manga overlay, you can mine it into an Anki card through the dictionary's card-creation flow, just like a word looked up in any other text. The mechanics of that one-click step live in "Yomitan + Anki: One-Click Card Creation". The broader method of building a deck from what you read is covered in "Sentence Mining: Building Your Own Japanese Anki Deck From What You Read".

What Mokuro adds to that workflow is reach. Manga that was previously un-mineable because it was a flat image becomes a source of cards once the overlay is in place.

Good to know

The "Allow access to file URLs" gotcha

The legacy local HTML path has a Chromium-specific catch. When a Mokuro HTML file is opened from a file:// address, the Yomitan extension cannot scan it until you grant file-URL access for the extension.⁶ Yomitan's own documentation states the requirement in its "Scanning local files and PDFs" section: "In order to use Yomitan with local files in Chrome, you must first tick the Allow access to file URLs checkbox for Yomitan on the extensions page."⁶

The hosted web reader at reader.mokuro.app sidesteps this entirely. It is served over https:// rather than file://, so the file-URL toggle never enters the picture.²⁵⁶ The actual checkbox steps are documented in "Yomitan (Yomichan): The Hover-Dictionary Workflow". If you are on the local HTML path, follow them there.
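The distinction comes down to the URL scheme of the page you are reading. As a small illustration, the check below uses the standard library's `urllib.parse.urlparse`. `needs_file_url_access` is a name invented for this example, and the local path is a made-up sample.

```python
from urllib.parse import urlparse


def needs_file_url_access(url):
    """Return True when the page is loaded from a local file:// address."""
    return urlparse(url).scheme == "file"


local_page = needs_file_url_access("file:///home/me/manga/vol1.html")
hosted_page = needs_file_url_access("https://reader.mokuro.app")
```

Here `local_page` is `True` and `hosted_page` is `False`, matching the two reading paths described above.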

OCR is good, not perfect

manga-ocr is purpose-trained, but its author names clear limits. Occasional errors are normal rather than a sign something is broken. On text length: "OCR supports multi-line text, but the longer the text, the more likely some errors are to occur."³ On handwriting: "It probably won't be able to handle handwritten text though."³

There is also a quirk on empty regions: "The model always attempts to recognize some text on the image, even if there is none ... it might even 'dream up' some realistically looking sentences." The author notes this is unlikely to cause practical problems in normal use.³

None of this needs a workaround beyond the overlay's own design. Because the recognized text is selectable, a reader who hits a misrecognized word can re-select it or look it up manually. No accuracy percentage is claimed upstream, so none is asserted here.

What to read with it

Mokuro makes raw manga lookup-able, but it does not choose the manga or rank its difficulty. It is a processing and reading tool, not a recommendation engine.¹² For picking titles at the right level, the article "Manga for Japanese Learners: A Difficulty-Sorted Guide" handles difficulty selection.

References

1. kha-white. mokuro (source repository). GPL-3.0. https://github.com/kha-white/mokuro
2. kha-white. mokuro README. https://github.com/kha-white/mokuro/blob/master/README.md (raw: https://raw.githubusercontent.com/kha-white/mokuro/master/README.md)
3. kha-white. manga-ocr README. https://github.com/kha-white/manga-ocr/blob/master/README.md
4. kha-white. manga-ocr (source repository). Apache-2.0. https://github.com/kha-white/manga-ocr
5. mokuro web reader. https://reader.mokuro.app
6. Yomitan documentation. "Scanning local files and PDFs," Getting Started / Basic Usage. https://yomitan.wiki/getting-started/