n0p/mlx-video

Prince Canuma 957093c29b use numpy for improved float64 precision and performance

2026-01-14 00:03:00 +01:00

examples

Replace poodles.mp4 with poodles.gif in examples directory

2026-01-12 17:14:12 +01:00

mlx_video

use numpy for improved float64 precision and performance

2026-01-14 00:03:00 +01:00

.gitignore

fix git ignore

2026-01-12 16:35:41 +01:00

.pre-commit-config.yaml

Add pre-commit configuration for code formatting and linting with Black, isort, and autoflake

2026-01-12 16:47:34 +01:00

LICENSE

Initial commit

2025-05-07 12:21:09 +02:00

pyproject.toml

Add frame number validation in video generation and update Gemma3 text encoder to use validated mlx-vlm implementation

2026-01-13 17:12:11 +01:00

README.md

Revise README for text-to-video generation example

2026-01-12 17:21:54 +01:00

uv.lock

Add frame number validation in video generation and update Gemma3 text encoder to use validated mlx-vlm implementation

2026-01-13 17:12:11 +01:00

README.md

mlx-video

MLX-Video is a package for inference and fine-tuning of image, video, and audio generation models on your Mac using MLX.

Installation

Install from source:

Option 1: Install with pip (requires git):

pip install git+https://github.com/Blaizzy/mlx-video.git

Option 2: Install with uv (ultra-fast package manager, optional):

uv pip install git+https://github.com/Blaizzy/mlx-video.git

Supported models:

LTX-2

LTX-2 is a 19B-parameter video generation model from Lightricks.

Features

Text-to-video generation with the LTX-2 19B DiT model
Two-stage generation pipeline for high-quality output
2x spatial upscaling for images and videos
Optimized for Apple Silicon using MLX

Usage

ℹ️ Info: Currently, only the distilled variant is supported. Full LTX-2 feature support is coming soon.

Text-to-Video Generation

uv run mlx_video.generate --prompt "Two dogs of the poodle breed wearing sunglasses, close up, cinematic, sunset" -n 100 --width 768

With custom settings:

python -m mlx_video.generate \
    --prompt "Ocean waves crashing on a beach at sunset" \
    --height 768 \
    --width 768 \
    --num-frames 65 \
    --seed 123 \
    --output my_video.mp4
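
To drive the same command from a script, the argument list can be assembled in Python. This is a minimal sketch: `build_generate_command` and `clip_seconds` are hypothetical helpers, not part of mlx-video, and they use only the flags shown in this README. The clip length assumes that duration is simply frames divided by frames per second.

```python
import shlex


def build_generate_command(prompt, height=512, width=512, num_frames=100,
                           seed=42, fps=24, output="output.mp4"):
    """Return the argv list for a `python -m mlx_video.generate` call.

    Hypothetical helper: it only assembles the flags documented in this README.
    """
    return [
        "python", "-m", "mlx_video.generate",
        "--prompt", prompt,
        "--height", str(height),
        "--width", str(width),
        "--num-frames", str(num_frames),
        "--seed", str(seed),
        "--fps", str(fps),
        "--output", output,
    ]


def clip_seconds(num_frames, fps=24):
    """Clip length in seconds, assuming duration = frames / fps."""
    return num_frames / fps


cmd = build_generate_command(
    "Ocean waves crashing on a beach at sunset",
    height=768, width=768, num_frames=65, seed=123, output="my_video.mp4",
)
print(shlex.join(cmd))               # shell-safe rendering of the command
print(f"{clip_seconds(65):.2f} s")   # 2.71 s at 24 fps
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would launch the generation on a machine where mlx-video is installed.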

CLI Options

Option	Default	Description
`--prompt`, `-p`	(required)	Text description of the video
`--height`, `-H`	512	Output height (must be divisible by 64)
`--width`, `-W`	512	Output width (must be divisible by 64)
`--num-frames`, `-n`	100	Number of frames
`--seed`, `-s`	42	Random seed for reproducibility
`--fps`	24	Frames per second
`--output`, `-o`	output.mp4	Output video path
`--save-frames`	false	Save individual frames as images
`--model-repo`	Lightricks/LTX-2	HuggingFace model repository
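
The option table above maps naturally onto an `argparse` parser. The sketch below is an illustration of that mapping, not the actual parser in `generate.py`; `build_parser` is a hypothetical name, and the argument types are assumptions inferred from the defaults.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="mlx_video.generate")
    parser.add_argument("--prompt", "-p", required=True,
                        help="Text description of the video")
    parser.add_argument("--height", "-H", type=int, default=512,
                        help="Output height (must be divisible by 64)")
    parser.add_argument("--width", "-W", type=int, default=512,
                        help="Output width (must be divisible by 64)")
    parser.add_argument("--num-frames", "-n", type=int, default=100,
                        help="Number of frames")
    parser.add_argument("--seed", "-s", type=int, default=42,
                        help="Random seed for reproducibility")
    parser.add_argument("--fps", type=int, default=24,
                        help="Frames per second")
    parser.add_argument("--output", "-o", default="output.mp4",
                        help="Output video path")
    parser.add_argument("--save-frames", action="store_true",
                        help="Save individual frames as images")
    parser.add_argument("--model-repo", default="Lightricks/LTX-2",
                        help="Hugging Face model repository")
    return parser


# Options that are not passed fall back to the documented defaults.
args = build_parser().parse_args(["-p", "A red kite over a field", "-W", "768"])
print(args.height, args.width, args.num_frames)  # 512 768 100
```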

How It Works

The pipeline uses a two-stage generation process:

Stage 1: Generate at half resolution (e.g., 384x384) with 8 denoising steps
Upsample: 2x spatial upsampling via LatentUpsampler
Stage 2: Refine at full resolution (e.g., 768x768) with 3 denoising steps
Decode: VAE decoder converts latents to RGB video
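
The resolution schedule implied by these steps can be sketched as follows. `plan_stages` is a hypothetical helper, not a function in mlx-video; it assumes stage 1 runs at exactly half the requested height and width, and it applies the divisible-by-64 rule from the CLI options.

```python
def plan_stages(height, width):
    """Return the per-stage (height, width) and denoising step counts.

    Hypothetical helper: assumes stage 1 uses exactly half the target size.
    """
    for name, value in (("height", height), ("width", width)):
        if value % 64 != 0:
            raise ValueError(f"{name}={value} must be divisible by 64")
    return [
        {"stage": "stage 1", "size": (height // 2, width // 2), "denoising_steps": 8},
        {"stage": "upsample", "size": (height, width), "denoising_steps": 0},
        {"stage": "stage 2", "size": (height, width), "denoising_steps": 3},
    ]


for step in plan_stages(768, 768):
    print(step["stage"], step["size"], step["denoising_steps"])
```

For a 768x768 request this gives 384x384 for stage 1 and 768x768 for the upsample and refinement stages, matching the example sizes in the list above.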

Requirements

macOS with Apple Silicon
Python >= 3.11
MLX >= 0.22.0

Model Specifications

Transformer: 48 layers, 32 attention heads, 128 dim per head
Latent channels: 128
Text encoder: Gemma 3 with 3840-dim output
RoPE: Split mode with double precision
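
As a quick sanity check on these numbers, the attention width follows from the head count and per-head dimension. The sketch below assumes the common convention that the transformer's inner attention width equals heads times per-head dim; the README does not state the hidden size directly, so treat the result as derived rather than documented.

```python
# Values taken from the specification list above.
num_layers = 48
num_heads = 32
head_dim = 128

# Assumption: inner attention width = heads * per-head dim.
attention_width = num_heads * head_dim
print(attention_width)  # 4096
```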

Project Structure

mlx_video/
├── generate.py             # Video generation pipeline
├── convert.py              # Weight conversion (PyTorch -> MLX)
├── postprocess.py          # Video post-processing utilities
├── utils.py                # Helper functions
└── models/
    └── ltx/
        ├── ltx.py          # Main LTXModel (DiT transformer)
        ├── config.py       # Model configuration
        ├── transformer.py  # Transformer blocks
        ├── attention.py    # Multi-head attention with RoPE
        ├── text_encoder.py # Text encoder
        ├── upsampler.py    # 2x spatial upsampler
        └── video_vae/      # VAE encoder/decoder

License

MIT
