Aozora Bunko: Free Classic Japanese Literature

Aozora Bunko (青空文庫) is a volunteer-built digital library of copyright-expired Japanese texts, freely readable on the internet.[1] For advanced learners, it is the largest free entry point into Japan's literary canon, from Sōseki and Akutagawa to Dazai.

Overview

What Aozora Bunko is

Aozora Bunko describes itself as "インターネット上で自由に読める、電子本のアーカイブ" (an archive of electronic books freely readable on the internet).[1] It was started in 1997 and has grown into the standard free source for classic Japanese literature online.[1]

The corpus is built from two streams: works whose copyright term has expired and entered the public domain, and works the rights-holder has expressly permitted for release.[1] The collection runs to well over ten thousand works and keeps growing as volunteers digitize more of the titles that are already free.[1]

Why no single work count appears here

Published snapshots have put the catalog in the rough range of fifteen thousand-plus works, and the figure climbs every year. Treat the size as "well over ten thousand and growing" rather than a fixed number, which ages the moment it is printed.

The library is often called the "Project Gutenberg of Japan." That is a useful analogy for a volunteer-digitized public-domain text collection, not a formal equivalence.

Why the great pre-modern authors are all here

A work enters the Japanese public domain when its copyright term expires, after which anyone may legally digitize and redistribute it. Aozora Bunko is, by construction, a public-domain corpus.[1]

Since 30 December 2018, copyright in Japan runs for 70 years after the author's death, counted from 1 January of the year following death.[2][3] Before that reform the term was 50 years post-mortem.[3] The extension came out of the Trans-Pacific Partnership process and a requirement of the EU–Japan Economic Partnership Agreement. It applied to every work still under copyright at the end of 2018.[3]

The reform was not retroactive. "Works that had entered the public domain between 1999 and 29 December 2018 (inclusive) due to expiration remained in the public domain."[3] Nothing already free in Aozora Bunko was pulled back under copyright. The practical effect of the 50→70 change is a 20-year freeze on new authors entering the public domain, not a clawback.

The 50→70 change is the article's most date-sensitive fact

The term length is the only item here that future legislation could change. The 2018 reform date, the death years, and the non-revival rule are fixed. If any single fact in this article needs a future refresh, it is the term length.

This is why the canonical literary giants are reliably present. Under the old 50-year rule, authors who died in 1967 or earlier had already entered the public domain before the 2018 cutoff: for a 1967 death, protection ran through the end of 1967 + 50 = 2017, so the works were free from 1 January 2018, before 30 December 2018. They stay free under the non-revival rule.[3] The major names all died well before that line:

| Author | Died |
| --- | --- |
| 夏目漱石 (Natsume Sōseki) | 1916 [4] |
| 森鴎外 (Mori Ōgai) | 1922 [4] |
| 芥川龍之介 (Akutagawa Ryūnosuke) | 1927 [4] |
| 宮沢賢治 (Miyazawa Kenji) | 1933 [4] |
| 中島敦 (Nakajima Atsushi) | 1942 [4] |
| 太宰治 (Dazai Osamu) | 1948 [4] |

Authors who died in 1968 or later had not finished their 50-year clock when the reform took effect at the end of 2018. They now wait out the full 70 years: for a 1968 death, protection runs through the end of 1968 + 70 = 2038. The leading edge of newly public authors is therefore paused until 1 January 2039.
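The term arithmetic above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not legal advice: the function name `public_domain_year` is made up for this example, and it models only the 50-year and 70-year post-mortem rules described here. Earlier statutes with shorter terms, wartime extensions, and rules for anonymous or corporate works are ignored, so for authors who died long before 1967 the result is best read as a latest-possible year.

```python
REFORM_YEAR = 2018  # the 70-year term took effect on 30 December 2018


def public_domain_year(death_year: int) -> int:
    """First calendar year in which the author's works are free in Japan.

    Simplified model: the term is counted from 1 January of the year
    after death, so protection runs through 31 December of
    death_year + term.
    """
    if death_year + 50 < REFORM_YEAR:
        # The 50-year term had already run out before the reform,
        # and expired works were not revived.
        return death_year + 51
    # Still protected at the end of 2018: the full 70 years apply.
    return death_year + 71


for name, died in [("太宰治", 1948), ("died in 1967", 1967), ("died in 1968", 1968)]:
    print(name, public_domain_year(died))
```

The 1967 and 1968 cases sit on either side of the cutoff, which is where the 20-year gap in new entrants comes from.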

Who this is for

This is an N1-and-up access guide. It assumes a strong base in modern grammar. The value here is navigation, the orthography map, and tooling, not grammar instruction.

Original-orthography texts layer historical kana and old-form kanji on top of literary vocabulary. That can outrun even a solid N1 grammar base.[5] This guide states that difficulty directly and expands on it in "Good to know." A reader still working through contemporary fiction is better served starting with modern literary prose before reaching for pre-modern texts.

The orthography variants: which version to read

Aozora Bunko registers a work in up to three orthographic forms. They are defined along two independent axes: kanji shape (旧字 old-form versus 新字 modern-form) and kana spelling (旧仮名 historical versus 新仮名 modern).[6][7] Because the two axes combine, the three named variants fall out as points on a grid.

The three character forms

| Form | Kanji | Kana spelling | What the reader sees |
| --- | --- | --- | --- |
| 旧字旧仮名 (kyūji-kyūkana) | old-form (e.g. 圓, 佛, 舊) | historical (ゐ, ゑ, 〜ふ for 〜う, けふ) | the text as originally printed; maximal authenticity |
| 新字旧仮名 (shinji-kyūkana) | modern-form (円, 仏, 旧) | historical kana retained | a middle form: only the kanji are modernized |
| 新字新仮名 (shinji-shinkana) | modern-form | modern kana | fully modernized; reads like contemporary print |

Aozora's conversion guideline sets its baseline as "新字、現代仮名づかいへの変更を基本とする" (modernize to new-form kanji and modern kana as the default). It also adds that "ただし、漢字のみを書き換えた、新字、旧仮名づかいへの変更も拒まない" (a kanji-only modernization, that is 新字旧仮名, is also accepted).[6] This is why all three forms coexist.

Which form a given title uses is left to the contributor and the source edition. Not every work exists in all three.[1][6] A reader picks among the forms listed on that title's library card.
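The two-axis grid can be made concrete with a rough classifier. The sketch below is a heuristic, not Aozora's own metadata: `guess_variant` is a hypothetical helper, the old-kanji set holds only sample characters mentioned in this guide rather than a full 旧字体 list, and the kana check looks only for ゐ and ゑ. A text with none of these markers may still be in historical kana, so the library card remains the authoritative source.

```python
# Tiny sample sets taken from this guide; real coverage would need
# a complete 旧字体 table and a proper historical-kana detector.
OLD_KANJI = set("圓佛舊廣")
HISTORICAL_KANA = set("ゐゑ")


def guess_variant(text: str) -> str:
    """Name the orthographic form suggested by marker characters in text."""
    chars = set(text)
    kanji_axis = "旧字" if chars & OLD_KANJI else "新字"
    kana_axis = "旧仮名" if chars & HISTORICAL_KANA else "新仮名"
    # The variant name is simply the two axis labels joined together.
    return kanji_axis + kana_axis


print(guess_variant("廣い門の下には、この男の外に誰もゐない。"))
print(guess_variant("広い門の下には、この男のほかに誰もいない。"))
```

Each axis is decided independently, which mirrors how the variant names are built.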

The opening of 芥川龍之介『羅生門』 exists in Aozora Bunko in both a 旧字旧仮名 edition and a 新字新仮名 edition. This makes a clean side-by-side contrast.

In the old-form, historical-kana edition:

廣《ひろ》い門《もん》の下《した》には、この男《おとこ》の外《ほか》に誰《たれ》もゐない。[8]
"Under the wide gate there was no one but this man."

The same sentence in the modern edition:

広《ひろ》い門《もん》の下《した》には、この男《おとこ》のほかに誰《だれ》もいない。[9]
"Under the wide gate there was no one but this man."

Two differences carry across the pair: the old-form kanji 廣 becomes 広, and the historical kana ゐ in ゐない becomes い in いない. The first paragraph of the old-form file shows the same pattern again and again: 待つてゐた against modern 待っていた, 云ふ against 云う, 災《わざはひ》 against 災《わざわい》, 圓柱 against 円柱, and 地震《ぢしん》 against じしん.[8][9]

Old-form kanji and kana are faithful, not corrupted

A text showing 廣, ゐ, or 〜ふ where you expect 広, い, or 〜う is not a typo or an encoding error. It is the work reproduced as it was originally printed, in 旧字旧仮名.

What changes in historical kana orthography

A handful of concrete shifts explain most of what an N1 reader meets in 旧仮名 texts. The rules below come from the historical-kana reference and from Aozora's own modernization guideline.[5][6]

| Historical (旧仮名) | Modern (新仮名) | Note |
| --- | --- | --- |
| ゐ / ゑ | い / え | obsolete kana wi/we; today read as i/e [5] |
| は・ひ・ふ・へ・ほ (mid-word) | わ・い・う・え・お | ha-row kana spell wa/i/u/e/o inside many words (e.g. 災 わざはひ → わざわい) [5][8] |
| づ / ぢ | ず / じ | merged in pronunciation; modern kana usually uses ず/じ [5] |
| full-size つ, as in 待つて | small っ, as in 待って | the sokuon was written full-size historically [5][8] |
| けふ | きょう | 今日 "today"; the spelling けふ is read kyō, and yōon was written with full-size kana historically [5] |
| てふ | ちょう | 蝶 "butterfly" [5] |
| 〜む (auxiliary, terminal) | 〜ん | classical -mu auxiliary read as -n in modern Japanese [5] |

Historical kana is etymological, not phonetic: it preserves older word shapes and grammar rather than current pronunciation. Grammar takes precedence over pronunciation, so the verb warau "to laugh" is written わらふ, and its volitional waraō is written わらはう.[5]

An N1 reader should learn to recognize these forms, rather than trying to derive sound from spelling each time. The shifts work as quick teaching pairs:

今日: old kana けふ → modern きょう.
"today"

蝶: old kana てふ → modern ちょう.
"butterfly"

居る: old kana ゐる → modern いる.
"to be (animate)"

These pairs are illustrative, not quotations

The three mappings above are assembled from the table to show the pattern in isolation. For the same forms inside real running text, the 羅生門 old-kana passage earlier carries ゐない, 待つてゐた, わざはひ, and ぢしん exactly as printed.[8]

The goal here is recognition, not full 古文 (classical grammar): enough to read the text, not to master conjugation tables.
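Recognition of this kind is closer to a lookup table than to a rule engine, and a short sketch shows why. The helper below, `modernize_reading`, is hypothetical and deliberately narrow: it handles whole-word entries from this guide (けふ, てふ) plus the one-to-one kana ゐ and ゑ. It does not attempt the mid-word ha-row shifts or full-size sokuon, because those depend on word boundaries and etymology that a character-level replacement cannot see.

```python
# Whole-word spellings that must be memorized rather than derived.
WORD_MAP = {"けふ": "きょう", "てふ": "ちょう"}

# Obsolete kana that map one-to-one onto modern kana.
KANA_MAP = str.maketrans({"ゐ": "い", "ゑ": "え"})


def modernize_reading(reading: str) -> str:
    """Return a modern-kana spelling for a historical reading, if known."""
    if reading in WORD_MAP:
        return WORD_MAP[reading]
    # Fall back to the safe character-level replacements only.
    return reading.translate(KANA_MAP)


for old in ["けふ", "てふ", "ゐる"]:
    print(old, "→", modernize_reading(old))
```

Keeping the whole-word table separate from the character map avoids rewriting ふ inside words where it is simply ふ.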

Choosing a version for your goal

For comprehension, SRS, or sentence-mining, pick the 新字新仮名 variant when a title offers one. It matches the modern orthography your dictionaries and SRS expect, so lookups and card-making stay frictionless.[6]

For authenticity or literary study, the 旧字旧仮名 form preserves the text as printed, with old kanji shapes and historical kana intact. This matters when studying an author's actual orthography or working from the original 底本 (source edition).[6][8]

The 新字旧仮名 middle form is the compromise: modern kanji shapes, which are easier to look up, but historical kana retained, so you still meet ゐ, 〜ふ, and けふ.[6]

There is no single best version. The right choice depends on whether you optimize for lookup-friendliness or for fidelity. Not every title offers all three, so the practical answer is often to take the most modern form that title actually provides.[1]

How to read it: the browser workflow

Reading on the site directly

Each work is served in two ways: as an XHTML page, with ルビ (furigana) rendered as HTML ruby, and as a plain-text (.txt) file carrying Aozora's own markup notation.[10][7]

The plain-text ruby notation wraps the reading in double angle brackets 《 》 placed directly after the base word, as in 親仁《おやじ》.[11] When the base string mixes character types and the parser cannot tell where the ruby attaches, a full-width vertical bar ｜ marks the ruby's starting point, as in 霧の｜ロンドン警視庁《スコットランドヤード》.[11] Editor and input annotations use a bracket-with-sharp form ［＃…］, for example ［＃「喋」に「ママ」の注記］, to flag an as-printed or uncertain character.[11]

The XHTML display pipeline is built "この書き方を前提に動作します" (on the assumption of this notation). The symbols you see in the .txt file are therefore the raw form of what the HTML renders as ruby and notes.[10] On the HTML page, that bracket-ruby appears as ordinary furigana over the base word:

メロスは激怒《げきど》した。必《かなら》ず、かの邪智暴虐《じゃちぼうぎゃく》の王《おう》を除《のぞ》かなければならぬと決意《けつい》した。[12]
"Melos was enraged. He resolved that he must remove that tyrant of vicious cruelty, no matter what."

Here the four-kanji compound 邪智暴虐 takes 《》 ruby with no ｜ needed, because the base is all kanji and the parser can locate the attachment unambiguously.
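The attachment rule described above can be sketched as a small converter from the plain-text notation to HTML ruby. This is an approximation for illustration, not Aozora's own conversion code: `to_html_ruby` is a hypothetical name, the kanji character class is a rough guess at what counts as a base character, ［＃…］ annotations are left untouched, and no HTML escaping is done. It accepts both the full-width bar ｜ and an ASCII | in case a file has been normalized.

```python
import re

# Rough kanji class: CJK unified ideographs, extension A,
# compatibility ideographs, plus a few iteration marks.
KANJI = r"[\u4E00-\u9FFF\u3400-\u4DBF\uF900-\uFAFF々〆ヶ]"

# Either an explicit base after a bar, or the run of kanji
# immediately before the opening 《.
RUBY = re.compile(rf"(?:[｜|]([^｜|《》]+)|({KANJI}+))《([^《》]+)》")


def to_html_ruby(line: str) -> str:
    """Convert 《》 ruby in one line of text to HTML <ruby> elements."""
    def repl(match: re.Match) -> str:
        base = match.group(1) or match.group(2)
        return f"<ruby>{base}<rt>{match.group(3)}</rt></ruby>"
    return RUBY.sub(repl, line)


print(to_html_ruby("親仁《おやじ》"))
print(to_html_ruby("霧の｜ロンドン警視庁《スコットランドヤード》"))
```

In the second call the bar is what lets the katakana ロンドン join the base; without it, only 警視庁 would be captured.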

The site is organized by author and title, not by difficulty, so finding a level-appropriate text is the main friction. A third-party level-sorted gateway exists as a finding aid. Treat it as a navigation convenience rather than part of Aozora itself. Literary prose is conventionally 縦書き (vertical). The HTML page and downstream readers can present text vertically, which is the expected reading direction for these works.

Pairing with a pop-up dictionary

At literary and classical register, a hover (pop-up) dictionary is close to mandatory for sustainable reading. Here, the choice between looking a word up and inferring it tilts hard toward lookup. The Aozora XHTML page is itself a lookup surface: because it is ordinary selectable HTML text, a browser pop-up dictionary can read words straight off the page.[13]

Yomitan is the browser pop-up-dictionary extension used for this, the successor to Yomichan. It looks up Japanese words on hover or selection over any web page, including Aozora HTML.[13] This section explains why to use it and what it does. Installation and configuration belong to a dedicated tooling guide.

Pop-up dictionaries stumble on historical kana

A modern pop-up dictionary is tuned for modern orthography and deinflection. With 旧仮名 text such as ゐる, 待つてゐた, or わらふ, lookups frequently miss or mis-segment, because the historical forms are not the dictionary's headword shapes.[5] This is a concrete reason to steer comprehension reading toward the 新字新仮名 variant.

Long-form reading in a browser e-reader

ttu reader (ッツ Ebook Reader, ttu-ttu/ebook-reader) is the standard browser-based e-reader for this workflow. It is an online reader that supports Yomitan and Yomichan pop-up lookups while you read.[13]

It loads EPUB, plain-text, and HTMLZ files. It also offers furigana toggling, font and theme settings, vertical-text support, a book manager, and per-book progress tracking.[13] In effect, it turns a loaded Aozora text into a paginated, resumable long-form reading surface. Deeper setup is left to a dedicated tooling guide.

The project is described as being in maintenance-only status, not under active development. That is a durable caveat worth stating once, not a version-by-version update.[13]

Good to know

The furigana you get depends on the source edition

ルビ density in an Aozora text reflects the 底本, the source print edition the volunteer transcribed, not the learner's level. Aozora's notation faithfully reproduces the ruby that was in the source. It does not add pedagogical furigana.[10][7]

Two texts can therefore differ widely in furigana coverage simply because their source editions did. A reader cannot assume a given work will be furigana-rich, which is another reason to use a pop-up dictionary.

"Free" means public-domain, which skews the catalog older

Because the catalog is built from copyright-expired works, living and recent authors are largely absent. This is a classical and pre-modern corpus by design, not a contemporary-fiction library.[1][3]

The 50→70-year extension in 2018 froze the leading edge. New authors no longer enter the public domain in Japan on the old schedule, so the catalog's newest cohort is paused until the longer term starts expiring in 2039, beginning with authors who died in 1968.[3]

Literary register is a different difficulty axis than JLPT

The JLPT tests modern-standard grammar and vocabulary. Aozora's original-orthography texts add historical kana (歴史的仮名遣い) and old-form kanji (旧字体) that the JLPT never tests, plus literary and archaic verb endings and dense kanji.[5]

A reader can hold N1 and still stall on 旧仮名 prose, because the obstacle is orthography and literary style, not the grammar the JLPT measured. Treat literary register as a separate axis from JLPT level, not a higher rung on the same ladder.

A name that doubles as a memory hook

The variant names encode their own contents: 新字新仮名 is modern kanji plus modern kana, the lookup-friendly study version; 旧字旧仮名 is old kanji plus old kana, the as-printed authentic version; 新字旧仮名 sits in between. In short: new-new for studying, old-old for the real thing.[6]

References

Footnotes

  1. 青空文庫. 「青空文庫FAQ」(青空文庫編). https://www.aozora.gr.jp/guide/aozora_bunko_faq.html

  2. 文化庁. "Frequently Asked Questions concerning the Extension of the Term of Copyright Protection." Agency for Cultural Affairs, Government of Japan. https://www.bunka.go.jp/english/policy/copyright/pdf/93468601_03.pdf

  3. Wikipedia contributors. "Copyright law of Japan." Wikipedia. https://en.wikipedia.org/wiki/Copyright_law_of_Japan

  4. Death years cross-checked against a standard Japanese author chronology (作家・文豪 生没年表). Each year (1916/1922/1927/1933/1942/1948) is independently well attested; for a single citable locator, use 国立国会図書館 (NDL) authority records per author. These six are uncontroversial public-record dates.

  5. Wikipedia contributors. "Historical kana orthography." Wikipedia. https://en.wikipedia.org/wiki/Historical_kana_orthography

  6. 青空文庫. 「旧字、旧仮名で書かれた作品を、現代表記にあらためる際の作業指針」. https://www.aozora.gr.jp/KOSAKU/genndaihyouki.html

  7. 青空文庫. 「青空文庫作業マニュアル【入力編】」. https://www.aozora.gr.jp/aozora-manual/index-input.html

  8. 芥川龍之介『羅生門』. 青空文庫(旧字旧仮名版、図書カード No.128). https://www.aozora.gr.jp/cards/000879/files/128_15261.html

  9. 芥川龍之介『羅生門』. 青空文庫(新字新仮名版、図書カード No.127). https://www.aozora.gr.jp/cards/000879/files/127_15260.html

  10. 青空文庫. 「青空文庫 注記一覧」. https://www.aozora.gr.jp/annotation/

  11. 青空文庫. 注記一覧「ルビとルビのように付く文字」. https://www.aozora.gr.jp/annotation/etc.html

  12. 太宰治『走れメロス』. 青空文庫(新字新仮名版). https://www.aozora.gr.jp/cards/000035/files/1567_14913.html

  13. ttu-ttu. "ebook-reader (ッツ Ebook Reader)." GitHub repository, README. https://github.com/ttu-ttu/ebook-reader