mlx-video

MLX-Video is a package for running inference and fine-tuning of image, video, and audio generation models on your Mac using MLX.

Installation

Option 1: Install with pip (requires git):

pip install git+https://github.com/Blaizzy/mlx-video.git

Option 2: Install with uv (ultra-fast package manager, optional):

uv pip install git+https://github.com/Blaizzy/mlx-video.git

Supported Models

LTX-2

LTX-2 is a 19B-parameter video generation model from Lightricks.

Features

Text-to-video generation with the LTX-2 19B DiT model
Two-stage generation pipeline for high-quality output
2x spatial upscaling for images and videos
Optimized for Apple Silicon using MLX

Usage

ℹ️ Info: Currently, only the distilled variant is supported. Full LTX-2 feature support is coming soon.

Text-to-Video Generation

# Text-to-Video (distilled, fastest)
uv run mlx_video.generate --prompt "Two dogs wearing sunglasses, cinematic, sunset" -n 97 --width 768

# Image-to-Video
uv run mlx_video.generate --prompt "A person dancing" --image photo.jpg

# Audio-to-Video
uv run mlx_video.generate --audio-file music.wav --prompt "A band playing music"

# Dev pipeline with CFG (higher quality)
uv run mlx_video.generate --pipeline dev --prompt "A cinematic scene" --cfg-scale 3.0

# Dev two-stage HQ (highest quality)
uv run mlx_video.generate --pipeline dev-two-stage-hq \
    --prompt "A cinematic scene of ocean waves at golden hour" \
    --model-repo prince-canuma/LTX-2-dev

Converting weights:

Pre-converted weights are available on HuggingFace (LTX-2-distilled, LTX-2-dev, LTX-2.3-distilled, LTX-2.3-dev), or convert from the original Lightricks checkpoint:

python -m mlx_video.generate \
    --prompt "Ocean waves crashing on a beach at sunset" \
    --height 768 \
    --width 768 \
    --num-frames 65 \
    --seed 123 \
    --output my_video.mp4
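To compare several seeds for the same prompt, the command above can be scripted. The sketch below only builds `python -m mlx_video.generate` invocations from flags listed in the CLI Options table. The helper names (`build_command`, `sweep_seeds`) and the `waves_seed<N>.mp4` output naming are hypothetical. By default it prints the commands instead of running them.

```python
import shlex
import subprocess
import sys


def build_command(prompt, seed, output, height=768, width=768, num_frames=65):
    """Return the argv list for one `python -m mlx_video.generate` run."""
    # Only flags from the CLI Options table are used here.
    return [
        sys.executable, "-m", "mlx_video.generate",
        "--prompt", prompt,
        "--height", str(height),
        "--width", str(width),
        "--num-frames", str(num_frames),
        "--seed", str(seed),
        "--output", output,
    ]


def sweep_seeds(prompt, seeds, dry_run=True):
    """Build one command per seed; print them (dry run) or execute them in order."""
    commands = []
    for seed in seeds:
        # Hypothetical naming scheme: one output file per seed.
        cmd = build_command(prompt, seed, output=f"waves_seed{seed}.mp4")
        commands.append(cmd)
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failed generation
    return commands


if __name__ == "__main__":
    sweep_seeds("Ocean waves crashing on a beach at sunset", [123, 124, 125])
```

Pass `dry_run=False` to actually launch the generations one after another.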

CLI Options

Option	Default	Description
`--prompt`, `-p`	(required)	Text description of the video
`--height`, `-H`	512	Output height (must be divisible by 64)
`--width`, `-W`	512	Output width (must be divisible by 64)
`--num-frames`, `-n`	100	Number of frames
`--seed`, `-s`	42	Random seed for reproducibility
`--fps`	24	Frames per second
`--output`, `-o`	output.mp4	Output video path
`--save-frames`	false	Save individual frames as images
`--model-repo`	Lightricks/LTX-2	HuggingFace model repository
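The option table can be mirrored with `argparse` to catch bad arguments before a long run starts. This is a hypothetical re-creation for illustration, not the package's own parser. It uses only the flags and defaults listed above and enforces the divisible-by-64 rule for height and width.

```python
import argparse


def make_parser():
    # Hypothetical mirror of the documented options, not mlx_video's own parser.
    p = argparse.ArgumentParser(prog="generate-sketch")
    p.add_argument("--prompt", "-p", required=True)
    p.add_argument("--height", "-H", type=int, default=512)
    p.add_argument("--width", "-W", type=int, default=512)
    p.add_argument("--num-frames", "-n", type=int, default=100)
    p.add_argument("--seed", "-s", type=int, default=42)
    p.add_argument("--fps", type=int, default=24)
    p.add_argument("--output", "-o", default="output.mp4")
    p.add_argument("--save-frames", action="store_true")
    p.add_argument("--model-repo", default="Lightricks/LTX-2")
    return p


def parse_and_validate(argv):
    """Parse argv and reject sizes that are not multiples of 64."""
    p = make_parser()
    args = p.parse_args(argv)
    for name in ("height", "width"):
        value = getattr(args, name)
        if value % 64 != 0:
            p.error(f"--{name} must be divisible by 64, got {value}")  # exits with status 2
    return args


if __name__ == "__main__":
    print(parse_and_validate(["-p", "Two dogs wearing sunglasses", "-W", "768"]))
```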

How It Works

The pipeline uses a two-stage generation process:

Stage 1: Generate at half resolution (e.g., 384x384) with 8 denoising steps
Upsample: 2x spatial upsampling via LatentUpsampler
Stage 2: Refine at full resolution (e.g., 768x768) with 3 denoising steps
Decode: VAE decoder converts latents to RGB video
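The resolution bookkeeping for the two stages can be worked out ahead of time. This sketch assumes an LTX-style video VAE that compresses 32x spatially and 8x temporally, with the first frame kept as its own latent; those factors are assumptions and are not stated in this README. Under that assumption, the divisible-by-64 rule is what guarantees that the half-resolution Stage 1 still maps to a whole number of latent cells.

```python
from dataclasses import dataclass

# Assumed LTX-style VAE compression factors (not stated in this README).
SPATIAL_COMPRESSION = 32
TEMPORAL_COMPRESSION = 8
STAGE1_STEPS, STAGE2_STEPS = 8, 3  # denoising steps listed above


@dataclass
class TwoStagePlan:
    stage1_size: tuple    # (height, width) in pixels for Stage 1
    stage1_latent: tuple  # latent grid denoised in Stage 1
    stage2_latent: tuple  # latent grid after 2x upsampling, refined in Stage 2
    latent_frames: int    # temporal length of the latent video
    total_steps: int


def plan_two_stage(height, width, num_frames):
    if height % 64 or width % 64:
        raise ValueError("height and width must be divisible by 64")
    h1, w1 = height // 2, width // 2  # Stage 1 runs at half resolution
    lat1 = (h1 // SPATIAL_COMPRESSION, w1 // SPATIAL_COMPRESSION)
    lat2 = (2 * lat1[0], 2 * lat1[1])  # 2x spatial upsampling of the latent grid
    # First frame gets its own latent; remaining frames are grouped by 8 (assumed).
    latent_frames = 1 + (num_frames - 1) // TEMPORAL_COMPRESSION
    return TwoStagePlan((h1, w1), lat1, lat2, latent_frames, STAGE1_STEPS + STAGE2_STEPS)


if __name__ == "__main__":
    print(plan_two_stage(768, 768, 65))
```

For a 768x768, 65-frame request this gives a 12x12 latent grid in Stage 1, 24x24 after upsampling, and 9 latent frames, under the assumed compression factors.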

Requirements

macOS with Apple Silicon
Python >= 3.11
MLX >= 0.22.0

Model Specifications

Transformer: 48 layers, 32 attention heads, 128 dim per head
Latent channels: 128
Text encoder: Gemma 3 with 3840-dim output
RoPE: Split mode with double precision
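The attention shape and the rotary embedding can be illustrated in a few lines: 32 heads with 128 dims per head give a 4096-wide attention projection. The sketch applies RoPE to a single 128-dim head vector using Python floats, which are IEEE-754 double precision. Assumptions: "split mode" is read here as rotating the pairs `(x[i], x[i + d/2])` rather than interleaved neighbours, the base `theta=10000.0` is the common default rather than a value from this README, and the position is one scalar, whereas a video model would derive positions from frame and spatial coordinates.

```python
import math

NUM_HEADS = 32
HEAD_DIM = 128
MODEL_WIDTH = NUM_HEADS * HEAD_DIM  # 4096


def rope_angles(position, head_dim=HEAD_DIM, theta=10000.0):
    """One rotation angle per pair: position * theta^(-i / (head_dim / 2))."""
    half = head_dim // 2
    return [position * theta ** (-i / half) for i in range(half)]


def apply_rope_split(x, position, theta=10000.0):
    """Rotate pairs (x[i], x[i + d/2]), i.e. the split-halves layout (assumed)."""
    d = len(x)
    if d % 2:
        raise ValueError("head dimension must be even")
    half = d // 2
    first, second = x[:half], x[half:]
    out_first, out_second = [], []
    for a, b, t in zip(first, second, rope_angles(position, d, theta)):
        c, s = math.cos(t), math.sin(t)
        out_first.append(a * c - b * s)   # 2D rotation of the pair by angle t
        out_second.append(a * s + b * c)
    return out_first + out_second


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


if __name__ == "__main__":
    q = [float(i % 7) - 3.0 for i in range(HEAD_DIM)]
    k = [math.sin(i) for i in range(HEAD_DIM)]
    # Same offset (2) at different absolute positions gives the same score.
    print(dot(apply_rope_split(q, 3), apply_rope_split(k, 1)))
    print(dot(apply_rope_split(q, 10), apply_rope_split(k, 8)))
```

The two printed scores agree up to rounding: because each pair is only rotated, the attention score depends on the relative offset between positions, not on the absolute positions.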

Project Structure

mlx_video/
├── generate.py             # Video generation pipeline
├── convert.py              # Weight conversion (PyTorch -> MLX)
├── postprocess.py          # Video post-processing utilities
├── utils.py                # Helper functions
└── models/
    └── ltx/
        ├── ltx.py          # Main LTXModel (DiT transformer)
        ├── config.py       # Model configuration
        ├── transformer.py  # Transformer blocks
        ├── attention.py    # Multi-head attention with RoPE
        ├── text_encoder.py # Text encoder
        ├── upsampler.py    # 2x spatial upsampler
        └── video_vae/      # VAE encoder/decoder

License

MIT
