Go to file

Norbert Schmidt 02b8c27835 Upgrade to LTX-2.3 with audio generation

- Switch from mlx_video.generate_av to mlx_video.models.ltx_2.generate
- Use prince-canuma/LTX-2.3-distilled model with google/gemma-3-12b-it text encoder
- Add --audio flag for joint audio-video generation
- Add auto-background execution with nohup logging
- Add CLAUDE.md and test stories

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-31 13:55:39 +02:00

examples

Add example

2026-01-27 19:27:25 +01:00

output

Initial

2026-01-27 19:26:12 +01:00

stories

Upgrade to LTX-2.3 with audio generation

2026-03-31 13:55:39 +02:00

.gitattributes

Initial commit

2026-01-27 19:16:05 +01:00

.gitignore

Add example

2026-01-27 19:27:25 +01:00

CLAUDE.md

Upgrade to LTX-2.3 with audio generation

2026-03-31 13:55:39 +02:00

generate_story.sh

Upgrade to LTX-2.3 with audio generation

2026-03-31 13:55:39 +02:00

promptguide.md

Create promptguide.md

2026-01-27 19:26:18 +01:00

README.md

Initial

2026-01-27 19:26:12 +01:00

requirements.txt

Initial

2026-01-27 19:26:12 +01:00

README.md

mlx-video-maker

Generate multi-scene AI videos with seamless transitions using I2V (Image-to-Video) chaining on Apple Silicon.

Example Output

https://github.com/user-attachments/assets/REPLACE_WITH_VIDEO_ASSET_ID

Sample scene from "The Local AI Revolution" - see examples/ for the video file and stories/local_ai_revolution.txt for the full story prompts.

The Power: LLM + Prompting Guide

The real magic is combining an LLM (Claude, local models, etc.) with the included promptguide.md. Feed the guide to your LLM, describe your scene in plain language, and get optimized prompts following best practices:

Flowing narrative paragraphs (not bullet lists)
Present-tense action verbs
Explicit camera movements and lens choices
Audio cues for synchronized generation
The six essential elements for every scene

Example workflow:

You: "Write me a 5-scene story about a detective investigating an abandoned warehouse"
LLM: [Uses promptguide.md to craft cinematic, LTX-2 optimized prompts]

The guide covers lens language, shutter terminology, video type strategies, and troubleshooting tips.

The Technique

Scene 1 (T2V) → Extract Last Frame → Scene 2 (I2V) → Extract Last Frame → Scene 3 (I2V) → ...

Each scene's last frame becomes the input image for the next scene, creating visual continuity across an entire movie. In theory, you could generate hour-long films this way.

How It Works

First scene: Text-to-Video generation (no image input)
Frame extraction: ffprobe counts frames, ffmpeg extracts the last frame
Subsequent scenes: Image-to-Video with --image pointing to the previous scene's last frame
Final concat: High-quality ffmpeg merge of all scenes

Quick Start

# Clone the repo
git clone https://github.com/YOUR_USERNAME/mlx-video-maker.git
cd mlx-video-maker

# Create a story file (one prompt per line)
cat > stories/my_story.txt << 'EOF'
# Scene 1
Wide aerial shot of misty mountains at dawn, camera slowly descending, cinematic, 4K

# Scene 2
Camera continues revealing a lone hiker on the ridge, steady tracking shot, cinematic, 4K

# Scene 3
Close-up of hiker's face illuminated by golden sunrise, emotional, cinematic, 4K
EOF

# Generate the movie
./generate_story.sh stories/my_story.txt output/

For long generations:

nohup ./generate_story.sh stories/my_story.txt output/ > output/nohup.out 2>&1 &
tail -f output/nohup.out  # Monitor progress

Options

./generate_story.sh <story_file> [output_dir] [options]

Options:
  --width       Video width (default: 1920, must be divisible by 64)
  --height      Video height (default: 1088, must be divisible by 64)
  --frames      Frames per scene (default: 121, must satisfy 1 + 8*k)
  --strength    I2V conditioning strength 0.0-1.0 (default: 0.7)
  --fps         Output framerate (default: 24)
  --python      Python executable (default: ./venv/bin/python)

Image Strength Guide

Value	Effect
0.5-0.6	Strong visual continuity, less motion freedom
0.7	Sweet spot - balanced continuity and new content
0.8-0.9	More variation, potential visual jumps

Story File Format

Plain text, one prompt per line:

Lines starting with # are comments (ignored)
Empty lines are ignored
Each non-comment line = one scene

Pro tip: Use consistent style suffixes across all prompts for visual coherence:

..., cinematic, nature documentary style, 4K

Prompt Engineering

See promptguide.md for the complete guide. Key points:

Write flowing narrative paragraphs, not lists
Use present-tense verbs: "walks", "turns", "reaches"
Specify camera explicitly: "slow dolly forward", "steady tracking shot"
Include audio cues: "distant traffic hum", "footsteps echoing"
Add consistent style suffixes across all scenes

Example prompt:

A lone fisherman rows across a foggy lake before sunrise, the boat creaking softly
as water laps at its sides. The camera glides overhead in a slow aerial tracking shot,
following his steady progress from behind and slightly above. His lantern casts a warm
circle of light that reflects in gentle ripples, while tall reeds sway on the distant
shoreline.

Performance

On M3 Max (128GB RAM):

~20-22 minutes per scene at 1920x1088, 121 frames
~4 hours for a 10-scene movie (~50 seconds)
~75GB peak memory usage

Scaling: 720 scenes = 1 hour movie = ~10 days generation time

Requirements

Apple Silicon Mac (M1/M2/M3/M4)
64GB+ RAM recommended (32GB minimum at lower resolution)
mlx-video with LTX-2 model
ffmpeg and ffprobe
Python 3.11+

Installation

# Install mlx-video (if not already)
pip install mlx-video

# Install ffmpeg
brew install ffmpeg

# Make script executable
chmod +x generate_story.sh

Future Ideas

Scene quality detection using image-to-text models (auto-regenerate poor scenes)
Different transition styles (fade, match cut)
Branching narratives (generate multiple versions, pick best)
Audio continuity chaining
Checkpoint recovery for long generations

Credits

LTX-Video (LTX-2) by Lightricks - The 2B parameter DiT model that powers the video generation
mlx-video by Prince Canuma (@Blaizzy) - MLX port enabling Apple Silicon native inference
MLX by Apple - The ML framework for Apple Silicon

License

MIT

Built with mlx-video and LTX-2 on Apple Silicon