Add fully local conversational AI pipeline for Reachy Mini

Local STT (Qwen3-ASR), VLM (Gemma 4 26B-A4B), and TTS (Spark-TTS) running on Apple Silicon via MLX, with a bracket-tag action system for nod, shake, wiggle, dance, photo, and pre-recorded emotions.
2026-05-12 09:24:02 +02:00
parent 3a8a8e3145
commit 5a04a7133a
12 changed files with 4074 additions and 0 deletions
--- a/speak.sh
+++ b/speak.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+# Make Reachy Mini speak a sentence via local Spark-TTS on the Mac,
+# then play it back through the robot's speaker over SSH.
+#
+# Usage: ./speak.sh "Hello world"
+
+TEXT="${1:-Hello, I am Reachy Mini.}"
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+
+# Generate audio with Spark-TTS (mlx-audio) on the Mac.
+"$SCRIPT_DIR/.venv/bin/python" -m mlx_audio.tts.generate \
+    --model mlx-community/Spark-TTS-0.5B-bf16 \
+    --text "$TEXT" \
+    --file_prefix /tmp/reachy_speech || exit 1
+
+# Copy to the robot and play it through its speaker.
+sshpass -p 'root' scp -o StrictHostKeyChecking=no /tmp/reachy_speech_000.wav pollen@reachy-mini.local:/tmp/speech.wav
+sshpass -p 'root' ssh -o StrictHostKeyChecking=no pollen@reachy-mini.local "/venvs/mini_daemon/bin/python -c \"
+import time
+from reachy_mini import ReachyMini
+with ReachyMini() as mini:
+    mini.media.play_sound('/tmp/speech.wav')
+    time.sleep(10)  # fixed wait for playback; clips longer than 10 s may be cut off
+\""
+
+echo "Spoke: $TEXT"
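
Note on the generate-then-copy flow above: the copy step reads a fixed path in /tmp, so a clip left over from an earlier run could be sent to the robot if generation writes nothing. A minimal sketch of one way to guard against that follows. The helper name `require_wav` and the `WAV` variable are hypothetical and not part of this commit; only the file path is taken from the script.

```shell
# Hypothetical guard (not in the commit): refuse to continue unless the
# expected audio file exists and is non-empty.
WAV=/tmp/reachy_speech_000.wav   # path the script above copies to the robot

require_wav() {
    local path="$1"
    # -s is true only for a file that exists and has a size greater than zero.
    if [ ! -s "$path" ]; then
        echo "Missing or empty audio file: $path" >&2
        return 1
    fi
}

# Intended placement, left commented out so the sketch has no side effects:
#   rm -f "$WAV"                   # before the generate step: drop any stale clip
#   require_wav "$WAV" || exit 1   # after it, before the scp step
```

Removing the old file first is what makes the check meaningful: without it, a clip from a previous run would still pass the size test.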