Initial

2026-01-27 19:26:12 +01:00
parent a9d8be7441
commit 53559c06cb
6 changed files with 430 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,176 @@
 # mlx-video-maker

+Generate multi-scene AI videos with seamless transitions using I2V (Image-to-Video) chaining on Apple Silicon.
+
+## Example Output
+
+https://github.com/user-attachments/assets/REPLACE_WITH_VIDEO_ASSET_ID
+
+> Sample scene from "The Local AI Revolution" - see [examples/](examples/) for the video file and [stories/local_ai_revolution.txt](stories/local_ai_revolution.txt) for the full story prompts.
+
+## The Power: LLM + Prompting Guide
+
+The real magic is combining an LLM (Claude, local models, etc.) with the included **[promptguide.md](promptguide.md)**. Feed the guide to your LLM, describe your scene in plain language, and get optimized prompts following best practices:
+
+- Flowing narrative paragraphs (not bullet lists)
+- Present-tense action verbs
+- Explicit camera movements and lens choices
+- Audio cues for synchronized generation
+- The six essential elements for every scene
+
+Example workflow:
+```
+You: "Write me a 5-scene story about a detective investigating an abandoned warehouse"
+LLM: [Uses promptguide.md to craft cinematic, LTX-2 optimized prompts]
+```
+
+The guide covers lens language, shutter terminology, video type strategies, and troubleshooting tips.
+
+## The Technique
+
+```
+Scene 1 (T2V) → Extract Last Frame → Scene 2 (I2V) → Extract Last Frame → Scene 3 (I2V) → ...
+```
+
+Each scene's last frame becomes the input image for the next scene, creating visual continuity across an entire movie. In theory, you could generate hour-long films this way.
+
+## How It Works
+
+1. **First scene**: Text-to-Video generation (no image input)
+2. **Frame extraction**: `ffprobe` counts frames, `ffmpeg` extracts the last frame
+3. **Subsequent scenes**: Image-to-Video with `--image` pointing to the previous scene's last frame
+4. **Final concat**: High-quality `ffmpeg` merge of all scenes
+
+## Quick Start
+
+```bash
+# Clone the repo
+git clone https://github.com/YOUR_USERNAME/mlx-video-maker.git
+cd mlx-video-maker
+
+# Create a story file (one prompt per line)
+cat > stories/my_story.txt << 'EOF'
+# Scene 1
+Wide aerial shot of misty mountains at dawn, camera slowly descending, cinematic, 4K
+
+# Scene 2
+Camera continues revealing a lone hiker on the ridge, steady tracking shot, cinematic, 4K
+
+# Scene 3
+Close-up of hiker's face illuminated by golden sunrise, emotional, cinematic, 4K
+EOF
+
+# Generate the movie
+./generate_story.sh stories/my_story.txt output/
+```
+
+For long generations:
+```bash
+nohup ./generate_story.sh stories/my_story.txt output/ > output/nohup.out 2>&1 &
+tail -f output/nohup.out  # Monitor progress
+```
+
+## Options
+
+```
+./generate_story.sh <story_file> [output_dir] [options]
+
+Options:
+  --width       Video width (default: 1920, must be divisible by 64)
+  --height      Video height (default: 1088, must be divisible by 64)
+  --frames      Frames per scene (default: 121, must satisfy 1 + 8*k)
+  --strength    I2V conditioning strength 0.0-1.0 (default: 0.7)
+  --fps         Output framerate (default: 24)
+  --python      Python executable (default: ./venv/bin/python)
+```
+
+### Image Strength Guide
+
+| Value | Effect |
+|-------|--------|
+| 0.5-0.6 | Strong visual continuity, less motion freedom |
+| **0.7** | **Sweet spot** - balanced continuity and new content |
+| 0.8-0.9 | More variation, potential visual jumps |
+
+## Story File Format
+
+Plain text, one prompt per line:
+- Lines starting with `#` are comments (ignored)
+- Empty lines are ignored
+- Each non-comment line = one scene
+
+**Pro tip**: Use consistent style suffixes across all prompts for visual coherence:
+```
+..., cinematic, nature documentary style, 4K
+```
+
+## Prompt Engineering
+
+See **[promptguide.md](promptguide.md)** for the complete guide. Key points:
+
+- Write flowing narrative paragraphs, not lists
+- Use present-tense verbs: "walks", "turns", "reaches"
+- Specify camera explicitly: "slow dolly forward", "steady tracking shot"
+- Include audio cues: "distant traffic hum", "footsteps echoing"
+- Add consistent style suffixes across all scenes
+
+**Example prompt:**
+```
+A lone fisherman rows across a foggy lake before sunrise, the boat creaking softly
+as water laps at its sides. The camera glides overhead in a slow aerial tracking shot,
+following his steady progress from behind and slightly above. His lantern casts a warm
+circle of light that reflects in gentle ripples, while tall reeds sway on the distant
+shoreline.
+```
+
+## Performance
+
+On M3 Max (128GB RAM):
+- ~20-22 minutes per scene at 1920x1088, 121 frames
+- ~4 hours for a 10-scene movie (~50 seconds)
+- ~75GB peak memory usage
+
+**Scaling**: 720 scenes = 1 hour movie = ~10 days generation time
+
+## Requirements
+
+- Apple Silicon Mac (M1/M2/M3/M4)
+- 64GB+ RAM recommended (32GB minimum at lower resolution)
+- [mlx-video](https://github.com/Blaizzy/mlx-video) with LTX-2 model
+- ffmpeg and ffprobe
+- Python 3.11+
+
+### Installation
+
+```bash
+# Install mlx-video (if not already)
+pip install mlx-video
+
+# Install ffmpeg
+brew install ffmpeg
+
+# Make script executable
+chmod +x generate_story.sh
+```
+
+## Future Ideas
+
+- Scene quality detection using image-to-text models (auto-regenerate poor scenes)
+- Different transition styles (fade, match cut)
+- Branching narratives (generate multiple versions, pick best)
+- Audio continuity chaining
+- Checkpoint recovery for long generations
+
+## Credits
+
+- [LTX-Video (LTX-2)](https://github.com/Lightricks/LTX-Video) by Lightricks - The 2B parameter DiT model that powers the video generation
+- [mlx-video](https://github.com/Blaizzy/mlx-video) by Prince Canuma ([@Blaizzy](https://github.com/Blaizzy)) - MLX port enabling Apple Silicon native inference
+- [MLX](https://github.com/ml-explore/mlx) by Apple - The ML framework for Apple Silicon
+
+## License
+
+MIT
+
+---
+
+*Built with [mlx-video](https://github.com/Blaizzy/mlx-video) and [LTX-2](https://github.com/Lightricks/LTX-Video) on Apple Silicon*