
Solo Roleplay in Japanese: Restaurant, Interview, and Phone Call

Solo roleplay is a Japanese conversation drill in which you play both sides of a fixed scenario yourself: you and the waiter, you and the interviewer, or you and the person on the phone. You script the exchange from a known phrase set, perform both roles out loud, then move beyond the script by extemporizing and varying it.

This is the capstone of solo practice. It combines the pressure of forming your own sentences in self-talk with the turn structure of a real exchange. It also leans on the conversation-pattern articles for the actual phrases, rather than handing you a fixed dialogue to memorize.

Overview

Most published roleplay material assumes a partner: a tutor, a class, or a paid app that plays the other role. The method here removes that dependency. You generate the script yourself from phrases you already have, speak both turns, and use a recording as your only honest feedback.

The method is the same across scenarios; the scenarios differ only in register and difficulty. Three are worth starting with: the restaurant, the job interview, and the phone call.

Why solo two-role roleplay works

It forces both halves of a real exchange

Producing language, not just receiving comprehensible input, helps drive second-language development. Swain advanced this claim, the output hypothesis, in deliberate contrast to an input-only account. She drew on Canadian French immersion learners who, despite years of rich input, still produced language that was not targetlike compared with same-age native speakers.[1]

Output also performs a noticing function. While producing language, a learner runs into the gap between what they want to say and what they can say so far, which can prompt them to notice the missing form.[2] Output also lets the learner test a hypothesis about how the language works and reflect on the result.[2]

Self-talk produces only your own line. Roleplay also forces you to produce the prompt you must then answer. That means you rehearse the comprehension-then-response cycle, not just a monologue.

Comprehensible output is the technical anchor

"Comprehensible output," sometimes called "pushed output," is the standard term in the literature, usually set against Krashen's comprehensible-input account.[1] The two-role drill combines several of these mechanisms; it does not reproduce a setup validated by any single study.

It is the safest place to fail

L2 learners use private speech and inner speech as mental rehearsal. One core function of this covert speech is gaining control over one's own performance, most visibly when learners edit their language before performing it aloud.[3][4]

The documented functions of private and inner speech include planning, self-evaluation, storage and retrieval, self-instruction, and language play.[4] These are the same self-control operations that a no-stakes solo rehearsal exercises.

A solo drill carries no social cost. You can edit before performing, retry as often as you like without penalty, and talk yourself through the task, which is exactly the rehearsal-and-control function the research documents.

Where it sits among the solo methods

Skill acquisition theory supplies the frame for treating the solo methods as deliberate practice. Practice moves declarative knowledge toward proceduralized, then automatic, performance, and different practice modalities build different, skill-specific procedures.[5][6]

Seen in that frame, the four solo methods divide cleanly. Self-talk is free narration with no fixed frame. Journaling is slow written output. AI conversation is a partner that talks back and improvises. Solo roleplay is you scripting and performing a bounded scenario from end to end.

Solo roleplay is the capstone because it pulls in the others: the demand of forming sentences in self-talk, the deliberate editing of journaling, and the turn structure that AI conversation supplies through a partner.

The four-step method: script, perform, extemporize, vary

The loop has four steps. Run them in order, then repeat with variation. Each pass tightens the script and pushes more of it from conscious recall into automatic performance.

Step 1: Build the script from a phrase set

In skill acquisition theory, learners begin in a declarative stage, working consciously from rules and facts; procedural knowledge then develops through practice.[5] A written script is exactly this declarative scaffold: the starting artifact, not the end goal.[5][6]

Do not invent the phrases here. Pull both sides from the article that owns the scenario: the restaurant and phone phrase sets from their conversation-pattern articles, and the self-introduction and keigo material from the articles that own them.

Write both roles. Script the customer, applicant, or caller line, and the staff, interviewer, or recipient line. Keep it short: a beginning, a middle, and an end.
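If you keep your scripts in a file, a two-role script can be modeled as an ordered list of turns. The sketch below is a hypothetical structure, not a format from any tool: the `Turn` class, the stage names, and the bracketed placeholder lines are assumptions, and each placeholder stands in for a line you would pull from the relevant phrase-set article. The check confirms the two things this step asks for: both roles are written, and the exchange has a beginning, a middle, and an end.

```python
from dataclasses import dataclass

# Hypothetical stage labels for the short beginning/middle/end arc.
STAGES = ("beginning", "middle", "end")


@dataclass
class Turn:
    stage: str  # one of STAGES
    role: str   # e.g. "customer" or "staff"
    line: str   # the line to speak; pull the real text from a phrase set


def check_script(turns, roles):
    """Return a list of problems; an empty list means the script passes."""
    problems = []
    spoken_roles = {t.role for t in turns}
    for role in roles:
        if role not in spoken_roles:
            problems.append(f"role never speaks: {role}")
    for stage in STAGES:
        if not any(t.stage == stage for t in turns):
            problems.append(f"missing stage: {stage}")
    for t in turns:
        if t.role not in roles:
            problems.append(f"unknown role: {t.role}")
    return problems


# Placeholder restaurant script: bracketed text marks where a phrase-set line goes.
restaurant = [
    Turn("beginning", "staff", "[greeting from the restaurant phrase set]"),
    Turn("beginning", "customer", "[reply: number of people]"),
    Turn("middle", "customer", "[order a dish]"),
    Turn("middle", "staff", "[confirm the order]"),
    Turn("end", "customer", "[ask for the bill]"),
    Turn("end", "staff", "[thank the customer]"),
]

print(check_script(restaurant, roles=("customer", "staff")))  # prints []
```

Returning a list of problems rather than raising on the first one lets you fix a draft script in a single pass.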

Step 2: Perform both sides out loud

Performing aloud externalizes the rehearsal so it can be heard and corrected; learners edit and gain control of their performance through this externalized speech.[3][4] Proceduralization requires actually performing the behavior, not just knowing the rule, so reading both roles aloud is the step that begins to proceduralize the script.[5]

Read at speaking speed, not reading speed. Switch voice, posture, or seat between the two roles so they stay distinct. This is your rehearsal pass.

Step 3: Extemporize off-script

Once the script is fluent, force a deviation. The waiter is out of the dish, the interviewer asks an unscripted follow-up, or the line drops mid-call. When you get stuck, paraphrase instead of stopping.

Repeating a task frees cognitive resources for formulation and articulation, which supports more fluent, extended production; the planned deviation adds variation to that repetition.[7] The hypothesis-testing function of output matters most when you must reformulate on the fly rather than recite, because that pressure surfaces the gap between what you intend to say and what you can actually produce.[2]
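To keep the deviation from being one you planned yourself, you can let a small script pick it. A minimal sketch, assuming a hypothetical per-scenario list of deviations (taken from the examples above) and a turn position chosen at random; the function and dictionary names are illustrative, and the seeded `random.Random` only makes a run repeatable.

```python
import random

# Hypothetical deviation lists, one per scenario, drawn from the examples above.
DEVIATIONS = {
    "restaurant": ["the dish you ordered is sold out"],
    "interview": ["the interviewer asks an unscripted follow-up question"],
    "phone": ["the line drops mid-call and you must call back"],
}


def inject_deviation(lines, scenario, rng):
    """Return a copy of the script with one deviation cue inserted after a random turn.

    The cue never goes before the first turn or after the last, so the exchange
    still opens and closes normally; it marks where you must leave the script
    and paraphrase.
    """
    if len(lines) < 2:
        raise ValueError("script needs at least two turns")
    deviation = rng.choice(DEVIATIONS[scenario])
    position = rng.randint(1, len(lines) - 1)  # insert before turn `position`
    return lines[:position] + [f"*** DEVIATION: {deviation} ***"] + lines[position:]


script = ["staff: greet", "customer: order", "staff: confirm", "customer: pay"]
rng = random.Random(7)  # fixed seed so a re-run reproduces the same deviation
for line in inject_deviation(script, "restaurant", rng):
    print(line)
```

Dropping the fixed seed gives a different deviation point on each pass, which is closer to a real exchange.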

Step 4: Vary and re-run

Swap register where it fits, and swap the variable too: a different dish, a different interview question, or a different reason for the call. Then re-run. Repeating the task with one variable swapped is meant to generalize the pattern rather than fossilize a single dialogue; across re-runs, your resources shift from retrieving content toward fluent formulation.[7]

Automatization comes from intensive, repeated practice that turns proceduralized routines into fast, effortless ones, and re-running with variation is that kind of practice.[5][6]
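One way to make "swap one variable per re-run" concrete is to describe each run by a few slots and generate variants that change exactly one slot from a base run. The slot names and values below are hypothetical examples for the restaurant scenario, not taken from a phrase set; an interview version would leave out any casual-register option.

```python
# Hypothetical base settings: the values every re-run starts from.
BASE = {"dish": "ラーメン", "customer_register": "です/ます"}

# Hypothetical alternatives per slot; an interview script would get no casual option.
OPTIONS = {
    "dish": ["ラーメン", "カレーライス", "天ぷら"],
    "customer_register": ["です/ます", "casual"],
}


def one_slot_variants(base, options):
    """Yield re-run settings that each differ from `base` in exactly one slot."""
    for slot, values in options.items():
        for value in values:
            if value == base[slot]:
                continue  # identical to the base run, so not a variation
            variant = dict(base)  # copy so the base stays untouched
            variant[slot] = value
            yield variant


for n, settings in enumerate(one_slot_variants(BASE, OPTIONS), start=1):
    print(f"re-run {n}: {settings}")
```

Changing one slot at a time keeps the rest of the frame familiar, so each re-run isolates a single new demand.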

Record yourself playing both sides

Why recording is non-negotiable here

Speakers monitor their own speech through both the voice they produce and an internal speech representation. Auditory feedback of one's own voice is an established channel for monitoring, including higher-level semantic monitoring.[8]

The noticing hypothesis holds that a learner must consciously notice a feature for it to become intake.[8] Recording and replaying your output creates a second, deliberate pass. In that pass, you can consciously notice dropped particles, hesitations, or register breaks in a way that is hard to manage in real time while you are still formulating.[8]

You cannot hear your own hesitations as you make them. In a partnerless drill, the recording is the only honest feedback you get.

The record-both-sides loop

Record the whole two-role take. Play it back, mark the spots where you hesitated or broke register, then re-run just those spots. Output's noticing function, combined with deliberate replay, supplies the self-feedback the drill would otherwise lack.[2][8]
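The mark-and-re-run step can be kept as a simple log. A minimal sketch, assuming you jot down each problem while listening back as a timestamp, the turn it falls in, and an issue label; the `Mark` class and the labels are hypothetical. The functions turn that log into the short list of turns to re-run, in script order and each listed once, plus what went wrong in each.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    seconds: float  # position in the recording
    turn: int       # which scripted turn the problem falls in (0-based)
    issue: str      # e.g. "hesitation", "register break", "dropped particle"


def turns_to_rerun(marks):
    """Return the distinct marked turns in script order."""
    return sorted({m.turn for m in marks})


def issues_by_turn(marks):
    """Group issue labels per turn, in recording order, to guide the re-run."""
    grouped = {}
    for m in sorted(marks, key=lambda mark: mark.seconds):
        grouped.setdefault(m.turn, []).append(m.issue)
    return grouped


marks = [
    Mark(12.5, 3, "hesitation"),
    Mark(4.0, 1, "dropped particle"),
    Mark(14.2, 3, "register break"),
]
print(turns_to_rerun(marks))  # prints [1, 3]
print(issues_by_turn(marks))  # prints {1: ['dropped particle'], 3: ['hesitation', 'register break']}
```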

This section covers only the two-role take. Pronunciation and prosody review of the recording belongs to the record-and-compare loop, and the shadowing material covers the script-then-perform precursor.

Three scenarios to start with

The three scenarios share one method and differ in register and difficulty. The table maps each scenario to the roles you speak, where the phrases come from, and what to drill.

Scenario | Your two roles | Phrase source | Drill focus
Restaurant (レストラン / 注文) | Customer and staff | Restaurant article | Predictable turn structure
Job interview (面接) | Applicant and interviewer | Self-introduction and business-phrase articles | です/ます plus keigo; 御社 aloud
Phone call (電話) | Caller and recipient | Phone-call article | Openings, closings, and message-taking with no visual channel

Restaurant (レストラン / 注文)

The restaurant is the lowest-stakes scenario because its turn structure is predictable: greet, order, confirm, pay. A bounded, predictable frame makes it a good first task for task-repetition practice, because the fixed frame lets you repeat the exchange and vary one slot at a time.[7]

This section teaches how to turn a restaurant phrase set into a two-role script, not the phrases themselves. Pull the actual lines from the article that covers ordering phrases, then assign them to the customer and the staff roles.

Job interview (面接)

The interview is the highest-register scenario. The baseline is です/ます plus keigo, never casual form. If you drill an interview script in plain form, you train the wrong register for the one scenario where register is graded.[9][10] The recurring interviewer prompts worth scripting are 自己紹介 (self-introduction), 志望動機 (reason for applying), and 長所・短所 (strengths and weaknesses).

The self-introduction frame and the keigo phrasing are covered in their own articles. This section routes you to them rather than reproducing them. The one register fact worth teaching here in depth is the choice between 御社 and 貴社.

The 御社 / 貴社 distinction

御社 (おんしゃ) is the spoken form for "your esteemed company." It is used aloud, in interviews and phone calls (話し言葉, spoken language).[11] 貴社 (きしゃ) is the written form, used in documents, résumés, and email (書き言葉, written language).[11]

The split exists because of homophony: different words can share the same pronunciation. きしゃ collides with several common words written differently, including 記者 ("reporter"), 汽車 ("steam train"), and 帰社 ("returning to the office"). As a result, 貴社 risks being misheard when spoken, so it is reserved for writing.[11]

Drill 御社 aloud, not 貴社

Using the written form in speech is generally intelligible, but in a hiring context it reads as a lapse in business manners.[11] Because an interview roleplay is a spoken drill, rehearse 御社.

The line below shows the register in context. It is not from a phrase set; it was constructed to display 御社 and the polite baseline.

御社を志望した理由は、語学力を活かせる環境だからです。[9]
(おんしゃをしぼうしたりゆうは、ごがくりょくをいかせるかんきょうだからです。)
"The reason I applied to your company is that it is an environment where I can make use of my language skills." (Constructed example.)

Phone call (電話)

The phone call is the hardest of the three because there is no body language. You must hold the openings, closings, and any message-taking in your head with no visual cues to lean on, which raises cognitive load. This is exactly where prior scripting and task repetition pay off most, because they free resources for formulation.[7]

This section teaches scripting both the caller and the recipient. Pull the actual phone phrases from the article that covers them, then split them across the two roles and run the same four-step loop.

Good to know

Do not memorize a fixed script

A script is the declarative scaffold of skill acquisition: it is meant to be proceduralized and then discarded as automatic performance takes over.[5][6] Treating the memorized dialogue as the goal leaves you stuck at the declarative stage, and the performance collapses the moment reality deviates.

The extemporize step is the point of the whole drill. Fluency comes from repeated, varied performance, not from reciting one fixed text.[7]

Interview register is a trap

The interview is the one scenario where register is graded, and three errors recur. The first is using the written 貴社 in speech where the spoken 御社 belongs; drilling the wrong one aloud trains a manner error.[11] The second is drilling the script in casual form, when the baseline is です/ます plus keigo.[9][10]

The third is over-correcting into 二重敬語 (double keigo): applying the same type of honorific to one word twice to sound more polite. The 文化庁 (Agency for Cultural Affairs) keigo guidance states this is generally not appropriate (一般に適切ではない), though a few set forms such as お召し上がりになる and お見えになる are conventionally accepted.[9] Piling on honorifics sounds worse than clean です/ます plus a single honorific layer.

For "reads," the doubled お読みになられる layers ~られる on top of the already honorific お…になる. The single-layer form is correct:

お読みになる (およみになる)[9]
"reads" (honorific, single layer; お読みになられる over-marks it.)

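If your interview script lives in a text file, two of the three errors above are mechanical enough to flag before you rehearse. This is a rough heuristic sketch, not a keigo checker: it flags 貴社 in lines meant to be spoken, and it flags the お…になられる doubling with a regular expression. The pattern is an assumption that catches only this one doubled shape; it misses other forms of 二重敬語, can produce false positives, and cannot judge drift into casual form, so treat a clean result as "nothing obvious," not "correct."

```python
import re

# Heuristic only: お + a short kana/kanji stem + になられ matches the doubled
# shape of お読みになられる. It does not detect 二重敬語 in general.
DOUBLE_HONORIFIC = re.compile(r"お[\u3040-\u30ff\u4e00-\u9fff]{1,6}?になられ")


def lint_spoken_line(line):
    """Return warnings for one line that will be said aloud in the interview."""
    warnings = []
    if "貴社" in line:
        warnings.append("貴社 is the written form; say 御社 aloud")
    if DOUBLE_HONORIFIC.search(line):
        warnings.append("doubled honorific (お…になられる); use お…になる")
    return warnings


print(lint_spoken_line("貴社を志望した理由は…"))  # prints ['貴社 is the written form; say 御社 aloud']
print(lint_spoken_line("資料をお読みになられましたか。"))  # prints ['doubled honorific (お…になられる); use お…になる']
print(lint_spoken_line("御社を志望した理由は、語学力を活かせる環境だからです。"))  # prints []
```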
Voicing both roles is a feature, not a gimmick

Learners use private and inner speech to plan and to control performance.[3][4] Physically separating the two roles by seat, voice, or posture externalizes that control. Without the switch, both roles blur into the same flat, single-voice self-narration, and the turn-taking the drill is meant to build never gets exercised.

This is a fallback, not a replacement

Procedural knowledge is skill-specific, and training in one modality does not automatically transfer to another.[5][6] Solo roleplay is self-scripted and self-paced. It does not substitute for the interactive modality of a real or AI partner who improvises the other role; it closes the gap until that partner is available.

Pair it with AI conversation practice, where a partner improvises the other side, and with real output whenever you can get it. Solo roleplay is the bridge, not the destination.


References

Footnotes

  1. Swain, Merrill. "Communicative competence: Some roles of comprehensible input and comprehensible output in its development." In S. Gass & C. Madden (Eds.), Input in Second Language Acquisition, pp. 235–253. Rowley, MA: Newbury House, 1985.

  2. Swain, Merrill, and Sharon Lapkin. "Problems in Output and the Cognitive Processes They Generate: A Step Towards Second Language Learning." Applied Linguistics 16, no. 3 (1995): 371–391.

  3. de Guerrero, María C. M. Inner Speech – L2: Thinking Words in a Second Language. Educational Linguistics series. New York: Springer, 2005.

  4. de Guerrero, María C. M. "Going covert: Inner and private speech in language learning." Language Teaching 51, no. 1 (2018): 1–35. Cambridge University Press.

  5. DeKeyser, Robert M., and Yuichi Suzuki. "Skill acquisition theory." In B. VanPatten, G. D. Keating, & S. Wulff (Eds.), Theories in Second Language Acquisition: An Introduction, 4th ed., pp. 157–182. New York: Routledge, 2025.

  6. DeKeyser, Robert M. "Automatization, Skill Acquisition, and Practice in Second Language Acquisition." In C. A. Chapelle (Ed.), The Encyclopedia of Applied Linguistics. Oxford: Wiley-Blackwell, 2013.

  7. Bygate, Martin (Ed.). Learning Language through Task Repetition. Task-Based Language Teaching series, vol. 11. Amsterdam: John Benjamins, 2018.

  8. Schmidt, Richard W. "The role of consciousness in second language learning." Applied Linguistics 11, no. 2 (1990): 129–158. Oxford University Press.

  9. 文化審議会 (Council for Cultural Affairs). 『敬語の指針』 [Guidelines on Keigo]. 文化庁 (Agency for Cultural Affairs), 2007 (report submitted 2 February 2007; 平成19年2月2日答申). https://www.bunka.go.jp/seisaku/bunkashingikai/sokai/sokai_6/pdf/keigo_tousin.pdf

  10. 文化庁 (Agency for Cultural Affairs). 「敬語おもしろ相談室」 [Fun Keigo Advice Room] (keigo explanation page). https://www.bunka.go.jp/seisaku/kokugo_nihongo/kokugo_shisaku/keigo/index.html

  11. Indeed Japan. 「御社と貴社はメールでどちらを使うのが正解?それぞれの違いを解説」 [御社 or 貴社: which is correct in email? The difference explained]. https://jp.indeed.com/career-advice/career-development/which-is-correct-to-use-onsya-or-kisya-in-email (Limitation: recruitment-advice outlet; used only for the spoken-versus-written usage convention, which is widely and consistently attested.)