Refactor and remove Wan2.1/2.2 model files; update README.md to include new model features and usage instructions for LTX-2 and Wan2 models.

2026-03-18 17:34:57 +01:00
parent 95d7c81b20
commit 3e33172c12
20 changed files with 137 additions and 72 deletions
--- a/mlx_video/models/wan2/README.md
+++ b/mlx_video/models/wan2/README.md
@@ -70,7 +70,7 @@ The conversion script auto-detects the model version from the directory structur
 #### Wan2.1 T2V 1.3B

 ```bash
-python -m mlx_video.convert_wan \
+python -m mlx_video.wan2.convert \
    --checkpoint-dir ./Wan2.1-T2V-1.3B \
    --output-dir ./Wan2.1-T2V-1.3B-MLX
 ```
@@ -78,7 +78,7 @@ python -m mlx_video.convert_wan \
 #### Wan2.1 T2V 14B

 ```bash
-python -m mlx_video.convert_wan \
+python -m mlx_video.wan2.convert \
    --checkpoint-dir ./Wan2.1-T2V-14B \
    --output-dir ./Wan2.1-T2V-14B-MLX
 ```
@@ -86,7 +86,7 @@ python -m mlx_video.convert_wan \
 #### Wan2.2 T2V 14B

 ```bash
-python -m mlx_video.convert_wan \
+python -m mlx_video.wan2.convert \
    --checkpoint-dir ./Wan2.2-T2V-A14B \
    --output-dir ./Wan2.2-T2V-A14B-MLX
 ```
@@ -94,7 +94,7 @@ python -m mlx_video.convert_wan \
 #### Wan2.2 I2V 14B

 ```bash
-python -m mlx_video.convert_wan \
+python -m mlx_video.wan2.convert \
    --checkpoint-dir ./Wan2.2-I2V-A14B \
    --output-dir ./Wan2.2-I2V-A14B-MLX
 ```
@@ -104,7 +104,7 @@ The I2V model is auto-detected from `config.json`; the output will include a `va
 #### Wan2.2 TI2V 5B

 ```bash
-python -m mlx_video.convert_wan \
+python -m mlx_video.wan2.convert \
    --checkpoint-dir ./Wan2.2-TI2V-5B \
    --output-dir ./Wan2.2-TI2V-5B-MLX
 ```
@@ -144,7 +144,7 @@ wan_mlx/
 #### Wan2.1 T2V 1.3B

 ```bash
-python -m mlx_video.generate_wan \
+python -m mlx_video.wan2.generate \
    --model-dir ./Wan2.1-T2V-1.3B-MLX \
    --prompt "A cat playing piano in a cozy living room, cinematic lighting" \
    --width 832 --height 480 --num-frames 81 \
@@ -156,7 +156,7 @@ python -m mlx_video.generate_wan \
 #### Wan2.1 T2V 14B

 ```bash
-python -m mlx_video.generate_wan \
+python -m mlx_video.wan2.generate \
    --model-dir ./Wan2.1-T2V-14B-MLX \
    --prompt "A woman walks through a misty forest at dawn, slow motion, cinematic" \
    --width 1280 --height 704 --num-frames 81 \
@@ -172,7 +172,7 @@ python -m mlx_video.generate_wan \
 Wan2.2 uses a dual-model pipeline (separate high-noise and low-noise transformers) and takes guidance as a `high,low` pair:

 ```bash
-python -m mlx_video.generate_wan \
+python -m mlx_video.wan2.generate \
    --model-dir ./Wan2.2-T2V-A14B-MLX \
    --prompt "Two astronauts playing chess on the surface of the moon, dramatic lighting, 8K" \
    --negative-prompt "low quality, blurry, distorted" \
@@ -189,7 +189,7 @@ python -m mlx_video.generate_wan \
 Image-to-video: animates a starting image guided by a text prompt. Pass the image with `--image`:

 ```bash
-python -m mlx_video.generate_wan \
+python -m mlx_video.wan2.generate \
    --model-dir ./Wan2.2-I2V-A14B-MLX \
    --image ./my_photo.png \
    --prompt "The person slowly turns their head and smiles, cinematic, natural lighting" \
@@ -207,7 +207,7 @@ python -m mlx_video.generate_wan \
 Text+image-to-video: a single-model variant with a larger VAE (`z_dim=48`). Resolution must be divisible by **32** (not 16 as with other models):

 ```bash
-python -m mlx_video.generate_wan \
+python -m mlx_video.wan2.generate \
    --model-dir ./Wan2.2-TI2V-5B-MLX \
    --image ./my_photo.png \
    --prompt "The subject waves hello, warm sunlight, film grain" \
@@ -251,27 +251,27 @@ Quantize the transformer weights to reduce memory usage by ~3.4×. Quantization

 ```bash
 # Convert with 4-bit quantization (works for any variant)
-python -m mlx_video.convert_wan \
+python -m mlx_video.wan2.convert \
    --checkpoint-dir ./Wan2.1-T2V-1.3B \
    --output-dir ./Wan2.1-T2V-1.3B-MLX-Q4 \
    --quantize --bits 4 --group-size 64

-python -m mlx_video.convert_wan \
+python -m mlx_video.wan2.convert \
    --checkpoint-dir ./Wan2.1-T2V-14B \
    --output-dir ./Wan2.1-T2V-14B-MLX-Q4 \
    --quantize --bits 4 --group-size 64

-python -m mlx_video.convert_wan \
+python -m mlx_video.wan2.convert \
    --checkpoint-dir ./Wan2.2-T2V-A14B \
    --output-dir ./Wan2.2-T2V-A14B-MLX-Q4 \
    --quantize --bits 4 --group-size 64

-python -m mlx_video.convert_wan \
+python -m mlx_video.wan2.convert \
    --checkpoint-dir ./Wan2.2-I2V-A14B \
    --output-dir ./Wan2.2-I2V-A14B-MLX-Q4 \
    --quantize --bits 4 --group-size 64

-python -m mlx_video.convert_wan \
+python -m mlx_video.wan2.convert \
    --checkpoint-dir ./Wan2.2-TI2V-5B \
    --output-dir ./Wan2.2-TI2V-5B-MLX-Q4 \
    --quantize --bits 4 --group-size 64
@@ -280,7 +280,7 @@ python -m mlx_video.convert_wan \
 You can also quantize an already-converted MLX model without re-converting from PyTorch:

 ```bash
-python -m mlx_video.convert_wan \
+python -m mlx_video.wan2.convert \
    --checkpoint-dir ./Wan2.2-T2V-A14B-MLX \
    --output-dir ./Wan2.2-T2V-A14B-MLX-Q4 \
    --quantize-only --bits 4
@@ -289,7 +289,7 @@ python -m mlx_video.convert_wan \
 Quantized models are used exactly the same way — the quantization is auto-detected from `config.json`:

 ```bash
-python -m mlx_video.generate_wan \
+python -m mlx_video.wan2.generate \
    --model-dir ./Wan2.2-T2V-A14B-MLX-Q4 \
    --prompt "A cat playing piano"
 ```
@@ -330,7 +330,7 @@ LoRA's can be used with the `--lora-high` and `--lora-low` command line switches
 For example, to use the distilled [Wan2.2-Lightning](https://huggingface.co/lightx2v/Wan2.2-Lightning) LoRA, use the following command. Lightning speeds up generation by using only 4 steps and a CFG scale of 1.

 ```bash
-python -m mlx_video.generate_wan \
+python -m mlx_video.wan2.generate \
    --model-dir /Volumes/SSD/Wan-AI/Wan2.2-T2V-A14B-MLX \
    --width 480 \
    --height 704 \
--- a/mlx_video/models/wan2/__init__.py
+++ b/mlx_video/models/wan2/__init__.py
--- a/mlx_video/models/wan2/attention.py
+++ b/mlx_video/models/wan2/attention.py
--- a/mlx_video/models/wan2/config.py
+++ b/mlx_video/models/wan2/config.py
--- a/mlx_video/models/wan2/convert.py
+++ b/mlx_video/models/wan2/convert.py
--- a/mlx_video/models/wan2/docs/DIAGNOSTICS.md
+++ b/mlx_video/models/wan2/docs/DIAGNOSTICS.md
--- a/mlx_video/models/wan2/docs/IMPLEMENTATION_NOTES.md
+++ b/mlx_video/models/wan2/docs/IMPLEMENTATION_NOTES.md
--- a/mlx_video/models/wan2/generate.py
+++ b/mlx_video/models/wan2/generate.py
--- a/mlx_video/models/wan2/i2v_utils.py
+++ b/mlx_video/models/wan2/i2v_utils.py
--- a/mlx_video/models/wan2/loading.py
+++ b/mlx_video/models/wan2/loading.py
--- a/mlx_video/models/wan2/model.py
+++ b/mlx_video/models/wan2/model.py
--- a/mlx_video/models/wan2/postprocess.py
+++ b/mlx_video/models/wan2/postprocess.py
--- a/mlx_video/models/wan2/rope.py
+++ b/mlx_video/models/wan2/rope.py
--- a/mlx_video/models/wan2/scheduler.py
+++ b/mlx_video/models/wan2/scheduler.py
--- a/mlx_video/models/wan2/text_encoder.py
+++ b/mlx_video/models/wan2/text_encoder.py
--- a/mlx_video/models/wan2/tiling.py
+++ b/mlx_video/models/wan2/tiling.py
--- a/mlx_video/models/wan2/transformer.py
+++ b/mlx_video/models/wan2/transformer.py
--- a/mlx_video/models/wan2/vae.py
+++ b/mlx_video/models/wan2/vae.py
--- a/mlx_video/models/wan2/vae22.py
+++ b/mlx_video/models/wan2/vae22.py
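
Scripts that still call the old entry points could be updated mechanically using the two module renames visible in the README hunks above (`mlx_video.convert_wan` to `mlx_video.wan2.convert`, and `mlx_video.generate_wan` to `mlx_video.wan2.generate`). The following is a minimal sketch and is not part of this commit; `MODULE_RENAMES` and `migrate_command` are hypothetical names introduced here for illustration, and the mapping covers only the renames shown in this diff.

```python
import re

# Old -> new module paths, copied from the README changes in this diff.
# (Assumption: no other entry points were renamed.)
MODULE_RENAMES = {
    "mlx_video.convert_wan": "mlx_video.wan2.convert",
    "mlx_video.generate_wan": "mlx_video.wan2.generate",
}

# Match an old module path only when it is not part of a longer dotted
# or word-like identifier (e.g. "mlx_video.generate_wan_extra" is skipped).
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(old) for old in MODULE_RENAMES)
    + r")(?![\w.])"
)


def migrate_command(text: str) -> str:
    """Return `text` with old module paths replaced by their new names."""
    return _PATTERN.sub(lambda m: MODULE_RENAMES[m.group(1)], text)


if __name__ == "__main__":
    old = "python -m mlx_video.generate_wan --model-dir ./Wan2.2-T2V-A14B-MLX-Q4"
    print(migrate_command(old))
```

Text that already uses the new paths passes through unchanged, so the function can be run repeatedly over the same script.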