Expand README: foreground the prompt-personalization workflow

The household context and tuning lessons are the main thing here. Add a dedicated section showing what to edit, why each rule exists, and how the emotion library plugs into the prompt at runtime.
2026-05-12 09:48:21 +02:00
parent 5a04a7133a
commit d18e22cd2a
1 changed files with 40 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -80,9 +80,47 @@ The model can emit bracketed tags anywhere in its reply. They are parsed out, ex
 The emotion library is listed in the system prompt at startup so the model can pick by name. To avoid overuse, the runtime drops `[emotion:...]` tags whenever the user's utterance does not include an emotion trigger word (`happy`, `sad`, `look ...`, etc.). Configure the trigger list in `LocalConversationStream.EMOTION_TRIGGERS`.
-## Customizing the personality
+## Personalizing the prompt — this is where the project lives
-The system prompt lives at the top of `conversation.py` (`SYSTEM_PROMPT`). It follows the same structure as Pollen Robotics' official prompt (identity / response rules / tools) and currently bakes in a household context (names, location). Edit those lines to match your environment — or wire it to read from a config file.
+A generic robot says "I am an AI assistant. How can I help you?" A robot that knows your house says "The dogs are out back, the kids should be home from hockey around six." This section is the difference, and it's the part that actually took time to get right.
 The full `SYSTEM_PROMPT` lives at the top of `conversation.py`. It follows the same shape Pollen Robotics use in their official conversation app — IDENTITY / RESPONSE RULES / CORE TRAITS / HOUSEHOLD CONTEXT / BEHAVIOR RULES / TOOL & MOVEMENT RULES / FINAL REMINDER — but tuned for a local model that has no realtime memory and no native tool calling.
 ### What you edit
 Open `conversation.py` and find the `## HOUSEHOLD CONTEXT` block. Replace the placeholder with your household:
 ```markdown
 ## HOUSEHOLD CONTEXT
 You live with a family. Edit this section with the details you want the robot
 to know about your household — for example: who lives there, where roughly,
 ages of kids if relevant, pets, the family's hobbies or shared interests,
 anything else worth remembering.
 You cannot recognize voices or faces — you don't know who is talking.
 Address the speaker as "you"; only use a name if they introduce themselves
 this turn.
 ```
 That single section is the difference between a chatbot and a robot that feels like it belongs in your room.
 ### Lessons learned tuning it
 These rules are baked into the rest of `SYSTEM_PROMPT` because Gemma kept getting them wrong. Worth keeping when you fork it:
 - **Don't address the speaker by name.** Gemma will latch onto the first name in the household context and start every reply with it. The prompt explicitly forbids name-addressing unless the speaker introduces themselves *this turn*.
 - **Don't project moods or appearance.** With a camera attached, Gemma will hallucinate "you look comfy" or "you seem tired" every turn. The prompt forbids it; the code only passes a frame to the VLM when the speaker uses a *look trigger* word (`see`, `show`, `what color`, `describe`, …).
 - **Don't ask a follow-up question every reply.** Stock LLM behavior is exhausting in voice — every line ends with "What do you think?" The prompt caps follow-ups at roughly one in three turns.
 - **Mirror short acks.** If the speaker says "ok", "mhm", or "cool", the robot replies with a one-word ack of its own, not a probing question.
 - **Default English, follow on switch.** Multilingual is great until the model randomly switches mid-conversation. The rule is: mirror the speaker's language; default to English.
 - **Gate the emotion tool.** The model loves emoting on every reply ("[emotion:curious1] I am Reachy Mini."). The gate is applied twice: the prompt tells the model to hold back, and the runtime strips emotion tags unless the user's utterance contains an emotion trigger word (see `EMOTION_TRIGGERS` in `conversation.py`).
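
 The two runtime gates in the list above (emotion tags and camera frames) share one pattern: check the user's utterance for a trigger word before letting anything through. Below is a minimal sketch of that pattern, not the code in `conversation.py`. The trigger tuples are partial, taken only from the words named in this README, and the helper names `contains_trigger`, `gate_emotion_tags`, and `should_attach_frame` are hypothetical.

```python
import re
from typing import Iterable

# Assumed, partial trigger lists: only the words this README names.
# The real lists live in conversation.py and are likely longer.
EMOTION_TRIGGERS = ("happy", "sad")
LOOK_TRIGGERS = ("see", "show", "what color", "describe")

# Matches one [emotion:...] tag plus any whitespace that follows it.
EMOTION_TAG = re.compile(r"\[emotion:[^\]]*\]\s*")


def contains_trigger(utterance: str, triggers: Iterable[str]) -> bool:
    """Return True if any trigger occurs as a whole word or phrase."""
    text = utterance.lower()
    return any(re.search(rf"\b{re.escape(t)}\b", text) for t in triggers)


def gate_emotion_tags(user_utterance: str, model_reply: str) -> str:
    """Strip [emotion:...] tags unless the user used an emotion trigger."""
    if contains_trigger(user_utterance, EMOTION_TRIGGERS):
        return model_reply
    return EMOTION_TAG.sub("", model_reply).strip()


def should_attach_frame(user_utterance: str) -> bool:
    """Only pass a camera frame to the VLM when the user asks to look."""
    return contains_trigger(user_utterance, LOOK_TRIGGERS)
```

 Whole-word matching is a design choice in this sketch: a plain substring test would let "seen" fire the `see` trigger, which brings back the over-eager behavior the gate is meant to stop.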
 ### Action examples that actually fire
 The tool tags only get used if the prompt shows the model what "good" looks like. The examples in `## RESPONSE EXAMPLES` use `[nod]`, `[shake]`, `[wiggle]` inline — keep them concrete, not abstract.
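
 For reference, here is one way the inline tags could be separated from the spoken text before it goes to text-to-speech. This is a sketch under assumptions: `KNOWN_ACTIONS` holds only the three tags named above, and `split_actions` is a hypothetical helper, not a function from `conversation.py`.

```python
import re
from typing import List, Tuple

# Assumed action set: only the three tags this README mentions.
KNOWN_ACTIONS = {"nod", "shake", "wiggle"}

# A bare bracketed word such as [nod]; [emotion:...] has a colon and is not matched.
ACTION_TAG = re.compile(r"\[([a-z_]+)\]")


def split_actions(reply: str) -> Tuple[str, List[str]]:
    """Return (text to speak, actions in the order they appeared)."""
    actions: List[str] = []

    def _collect(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in KNOWN_ACTIONS:
            actions.append(name)
            return ""  # remove the tag from the spoken text
        return match.group(0)  # leave unknown brackets untouched

    spoken = ACTION_TAG.sub(_collect, reply)
    # Collapse the double spaces left behind by removed tags.
    spoken = re.sub(r"\s{2,}", " ", spoken).strip()
    return spoken, actions


spoken, actions = split_actions("[nod] Yes, the dogs are out back. [wiggle]")
print(spoken)   # Yes, the dogs are out back.
print(actions)  # ['nod', 'wiggle']
```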
 ### Where the emotion library plugs into runtime
 At startup `LocalConversationStream.__init__` loads `pollen-robotics/reachy-mini-emotions-library` and injects the list of available emotion names into the prompt via the `{emotion_list}` placeholder. So the model sees, by name, every emotion it can actually play. If you swap the library, the prompt updates itself.
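
 The placeholder substitution can be pictured roughly as follows. This sketch skips loading the library and starts from a plain list of names; `PROMPT_TEMPLATE` is a shortened stand-in for the real `SYSTEM_PROMPT`, `build_system_prompt` is a hypothetical helper, and the emotion names other than `curious1` are made up for illustration.

```python
# Hypothetical, shortened template; the real SYSTEM_PROMPT is much longer.
PROMPT_TEMPLATE = """## TOOL & MOVEMENT RULES
Available emotions (use the exact name): {emotion_list}
"""


def build_system_prompt(template: str, emotion_names) -> str:
    """Fill the {emotion_list} placeholder with a sorted, comma-separated list."""
    listing = ", ".join(sorted(emotion_names))
    # str.replace instead of str.format, so any other braces in the
    # prompt text are left alone and cannot raise a KeyError.
    return template.replace("{emotion_list}", listing)


# Illustrative names only; the real list comes from the emotion library.
prompt = build_system_prompt(PROMPT_TEMPLATE, ["sad1", "curious1", "happy1"])
print(prompt)
```

 Because the list is built from whatever names are passed in, swapping the library changes the prompt without touching the template.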
 ## Helper scripts