Refactor and remove Wan2.1/2.2 model files; update README.md to include new model features and usage instructions for LTX-2 and Wan2 models.

This commit is contained in:
Prince Canuma
2026-03-18 17:34:57 +01:00
parent 95d7c81b20
commit 3e33172c12
20 changed files with 137 additions and 72 deletions

View File

@@ -70,7 +70,7 @@ The conversion script auto-detects the model version from the directory structur
#### Wan2.1 T2V 1.3B
```bash
python -m mlx_video.convert_wan \
python -m mlx_video.wan2.convert \
--checkpoint-dir ./Wan2.1-T2V-1.3B \
--output-dir ./Wan2.1-T2V-1.3B-MLX
```
@@ -78,7 +78,7 @@ python -m mlx_video.convert_wan \
#### Wan2.1 T2V 14B
```bash
python -m mlx_video.convert_wan \
python -m mlx_video.wan2.convert \
--checkpoint-dir ./Wan2.1-T2V-14B \
--output-dir ./Wan2.1-T2V-14B-MLX
```
@@ -86,7 +86,7 @@ python -m mlx_video.convert_wan \
#### Wan2.2 T2V 14B
```bash
python -m mlx_video.convert_wan \
python -m mlx_video.wan2.convert \
--checkpoint-dir ./Wan2.2-T2V-A14B \
--output-dir ./Wan2.2-T2V-A14B-MLX
```
@@ -94,7 +94,7 @@ python -m mlx_video.convert_wan \
#### Wan2.2 I2V 14B
```bash
python -m mlx_video.convert_wan \
python -m mlx_video.wan2.convert \
--checkpoint-dir ./Wan2.2-I2V-A14B \
--output-dir ./Wan2.2-I2V-A14B-MLX
```
@@ -104,7 +104,7 @@ The I2V model is auto-detected from `config.json`; the output will include a `va
#### Wan2.2 TI2V 5B
```bash
python -m mlx_video.convert_wan \
python -m mlx_video.wan2.convert \
--checkpoint-dir ./Wan2.2-TI2V-5B \
--output-dir ./Wan2.2-TI2V-5B-MLX
```
@@ -144,7 +144,7 @@ wan_mlx/
#### Wan2.1 T2V 1.3B
```bash
python -m mlx_video.generate_wan \
python -m mlx_video.wan2.gemer \
--model-dir ./Wan2.1-T2V-1.3B-MLX \
--prompt "A cat playing piano in a cozy living room, cinematic lighting" \
--width 832 --height 480 --num-frames 81 \
@@ -156,7 +156,7 @@ python -m mlx_video.generate_wan \
#### Wan2.1 T2V 14B
```bash
python -m mlx_video.generate_wan \
python -m mlx_video.wan2.gemer \
--model-dir ./Wan2.1-T2V-14B-MLX \
--prompt "A woman walks through a misty forest at dawn, slow motion, cinematic" \
--width 1280 --height 704 --num-frames 81 \
@@ -172,7 +172,7 @@ python -m mlx_video.generate_wan \
Wan2.2 uses a dual-model pipeline (separate high-noise and low-noise transformers) and takes guidance as a `high,low` pair:
```bash
python -m mlx_video.generate_wan \
python -m mlx_video.wan2.generate \
--model-dir ./Wan2.2-T2V-A14B-MLX \
--prompt "Two astronauts playing chess on the surface of the moon, dramatic lighting, 8K" \
--negative-prompt "low quality, blurry, distorted" \
@@ -189,7 +189,7 @@ python -m mlx_video.generate_wan \
Image-to-video: animates a starting image guided by a text prompt. Pass the image with `--image`:
```bash
python -m mlx_video.generate_wan \
python -m mlx_video.wan2.generate \
--model-dir ./Wan2.2-I2V-A14B-MLX \
--image ./my_photo.png \
--prompt "The person slowly turns their head and smiles, cinematic, natural lighting" \
@@ -207,7 +207,7 @@ python -m mlx_video.generate_wan \
Text+image-to-video: a single-model variant with a larger VAE (`z_dim=48`). Resolution must be divisible by **32** (not 16 as with other models):
```bash
python -m mlx_video.generate_wan \
python -m mlx_video.wan2.generate \
--model-dir ./Wan2.2-TI2V-5B-MLX \
--image ./my_photo.png \
--prompt "The subject waves hello, warm sunlight, film grain" \
@@ -251,27 +251,27 @@ Quantize the transformer weights to reduce memory usage by ~3.4×. Quantization
```bash
# Convert with 4-bit quantization (works for any variant)
python -m mlx_video.convert_wan \
python -m mlx_video.wan2.convert \
--checkpoint-dir ./Wan2.1-T2V-1.3B \
--output-dir ./Wan2.1-T2V-1.3B-MLX-Q4 \
--quantize --bits 4 --group-size 64
python -m mlx_video.convert_wan \
python -m mlx_video.wan2.convert \
--checkpoint-dir ./Wan2.1-T2V-14B \
--output-dir ./Wan2.1-T2V-14B-MLX-Q4 \
--quantize --bits 4 --group-size 64
python -m mlx_video.convert_wan \
python -m mlx_video.wan2.convert \
--checkpoint-dir ./Wan2.2-T2V-A14B \
--output-dir ./Wan2.2-T2V-A14B-MLX-Q4 \
--quantize --bits 4 --group-size 64
python -m mlx_video.convert_wan \
python -m mlx_video.wan2.convert \
--checkpoint-dir ./Wan2.2-I2V-A14B \
--output-dir ./Wan2.2-I2V-A14B-MLX-Q4 \
--quantize --bits 4 --group-size 64
python -m mlx_video.convert_wan \
python -m mlx_video.wan2.convert \
--checkpoint-dir ./Wan2.2-TI2V-5B \
--output-dir ./Wan2.2-TI2V-5B-MLX-Q4 \
--quantize --bits 4 --group-size 64
@@ -280,7 +280,7 @@ python -m mlx_video.convert_wan \
You can also quantize an already-converted MLX model without re-converting from PyTorch:
```bash
python -m mlx_video.convert_wan \
python -m mlx_video.wan2.convert \
--checkpoint-dir ./Wan2.2-T2V-A14B-MLX \
--output-dir ./Wan2.2-T2V-A14B-MLX-Q4 \
--quantize-only --bits 4
@@ -289,7 +289,7 @@ python -m mlx_video.convert_wan \
Quantized models are used exactly the same way — the quantization is auto-detected from `config.json`:
```bash
python -m mlx_video.generate_wan \
python -m mlx_video.wan2.generate \
--model-dir ./Wan2.2-T2V-A14B-MLX-Q4 \
--prompt "A cat playing piano"
```
@@ -330,7 +330,7 @@ LoRA's can be used with the `--lora-high` and `--lora-low` command line switches
For example, for using the the distilled [Wan2.2-Lightning](https://huggingface.co/lightx2v/Wan2.2-Lightning) LoRA, use the following command. Lightning speeds up generation by using only 4 steps and a CFG scale of 1.
```bash
python -m mlx_video.generate_wan \
python -m mlx_video.wan2.generate \
--model-dir /Volumes/SSD/Wan-AI/Wan2.2-T2V-A14B-MLX \
--width 480 \
--height 704 \