Commit Graph

102 Commits

Author SHA1 Message Date
Prince Canuma
95d7c81b20 Remove deprecated stubs for video conversion and generation; introduce new weight conversion and generation scripts for Wan2.2 models in MLX. 2026-03-18 17:20:36 +01:00
Prince Canuma
7b9d0a5e44 Merge branch 'main' into pc/unify-apis 2026-03-18 17:14:17 +01:00
Prince Canuma
fea0f87df9 Fix token handling in LTX-2 text encoder by directly appending response tokens to the generated tokens list, improving clarity and consistency in token generation. 2026-03-18 13:50:33 +01:00
Prince Canuma
f5e311a77c Update default values for STG and modality scales in LTX-2 video generation; enhance help descriptions for command-line arguments 2026-03-18 12:17:47 +01:00
Prince Canuma
f8e371e9ce Enhance upsampler weight detection logic in LTX-2 model; improve clarity in comments and streamline spatial scale determination for x1.5 and x2 formats 2026-03-17 15:14:57 +01:00
Prince Canuma
57f66bcae2 Add custom spatial upscaling support to LTX-2 video generation; introduce spatial_upscaler parameter and enhance resolution handling for two-stage pipelines 2026-03-17 02:23:47 +01:00
Prince Canuma
cc302d79b0 Refactor comments and optimize key skipping logic in LTX-2 model conversion; improve clarity in code documentation 2026-03-17 00:39:52 +01:00
Prince Canuma
643f250195 Update README.md with installation instructions, supported models, and usage examples; add new LTX-2 model documentation for pipelines and features. 2026-03-16 23:03:05 +01:00
Prince Canuma
f9880a0683 Add audio encoder sanitization and configuration inference to LTX-2 model conversion process; update conversion print statements for new encoder step 2026-03-16 22:35:27 +01:00
Prince Canuma
7a576bfbf4 Refactor weight loading and utility functions for LTX-2 model; remove deprecated weight loading file and update imports accordingly 2026-03-16 22:25:22 +01:00
Prince Canuma
dd573d53d2 Refactor audio VAE directory structure and update related paths in conversion and loading functions 2026-03-16 21:53:37 +01:00
Prince Canuma
a6a6bb2166 Move weight loading functions to a new file for better organization and maintainability 2026-03-16 17:28:06 +01:00
Prince Canuma
3a0da19adb Refactor LTX-2 model structure 2026-03-16 14:50:01 +01:00
Prince Canuma
6f6105b715 Add audio to video conditioning 2026-03-16 01:42:11 +01:00
Prince Canuma
f53b9e0807 Add Dev Two-Stage HQ pipeline mode 2026-03-16 00:34:13 +01:00
Prince Canuma
df81bc852f fix save tensors 2026-03-15 23:08:12 +01:00
Prince Canuma
cecd68197c fix tiling, rope precision and weights 2026-03-15 22:58:55 +01:00
Prince Canuma
ebcd5dd4e4 optimize memory usage by batching weight updates 2026-03-15 03:12:47 +01:00
Prince Canuma
53bae534e7 fix LTX-2.3 audio 2026-03-15 02:06:35 +01:00
Prince Canuma
eb0d1355e4 Fix LTX-2.3 decoder grainy bug 2026-03-14 21:56:03 +01:00
Prince Canuma
5644492f7d Update generate.py to enhance denoising functionality with optional Spatiotemporal Guidance (STG) support. Modify DEFAULT_NEGATIVE_PROMPT for improved clarity and detail. Implement auto-detection of STG blocks based on transformer configuration. Refactor denoise_dev function to incorporate STG parameters, allowing for more flexible audio-visual integration during video generation. 2026-03-14 20:02:42 +01:00
Prince Canuma
ffe271699a Refactor LoRA loading for v2.3 in generate.py to prioritize distilled-lora files over full model weights, enhancing model flexibility. Update key sanitization logic to utilize a replacement list for improved readability and maintainability. Modify denoise_dev_av function to include sigma parameters for audio and video modalities, ensuring consistent handling of latent variables during processing. Adjust Vocoder weight loading to allow for non-strict loading, accommodating additional keys in model weights. 2026-03-14 15:24:50 +01:00
Prince Canuma
9cba2ea7cd Enhance README.md with new usage examples for STG and modality scale parameters in video generation. Update generate.py to support STG and modality guidance in the denoising process, allowing for improved audio-visual integration. Refactor attention mechanisms in the transformer to include options for skipping self-attention, facilitating STG perturbation and modality isolation. Update LTXModel and transformer block processing to accommodate new parameters for enhanced flexibility in model configurations. 2026-03-14 10:26:12 +01:00
Prince Canuma
f346e09de4 Refactor audio handling in generate_video function to preserve stage 1 audio latents during stage 2 processing. Remove redundant audio re-denoising steps, ensuring audio integrity while refining video output. Update comments for clarity on audio processing logic. 2026-03-13 16:09:07 +01:00
Prince Canuma
387d4fc301 improve dev color and quality 2026-03-13 09:51:24 +01:00
Prince Canuma
835ba33202 Enhance README.md with detailed descriptions of LTX-2 features, pipeline options, and usage examples for text-to-video, image-to-video, and audio-video generation. Update generate.py to improve LoRA loading functionality, allowing for local files, directories, or HuggingFace repos. This update improves flexibility in model configurations and enhances user guidance in the documentation. 2026-03-13 01:39:39 +01:00
Prince Canuma
7435facc52 Add support for DEV_TWO_STAGE pipeline and implement LoRA merging functionality in generate.py. Enhance video generation capabilities by allowing LoRA weights to be loaded and merged into the model, improving flexibility in model configurations. Update pipeline handling to accommodate the new two-stage generation process. 2026-03-13 01:22:45 +01:00
Prince Canuma
e0aafd72fc Refactor generate.py to ensure temporal coordinates and position grids are processed in bfloat16 for consistency with PyTorch's precision behavior. Update denoise_dev_av function to apply standard ratio rescaling for audio and video guidance, enhancing numerical fidelity and model compatibility. 2026-03-12 21:26:38 +01:00
Prince Canuma
b07b1e3213 Update .gitignore to exclude additional configuration and model files. Modify generate.py to enhance console output with rescale parameter and adjust default values for inference steps and CFG scale. Refactor text encoder to align positional embedding max position with PyTorch defaults, improving compatibility and performance. 2026-03-12 17:13:43 +01:00
Prince Canuma
d1fa47722b Fix timestep_conditioning logic in infer_vae_decoder_config to ensure consistent behavior based on has_timestep flag. 2026-03-11 18:30:29 +01:00
Daniel
33dd3c2edd Revert small change to mlx_video/generate.py 2026-03-11 12:41:44 +01:00
Daniel
281750f0a9 Revert changes to existing files by copying some code. 2026-03-11 12:35:47 +01:00
Daniel
ae410f3121 Update Wan2.1/Wan2.2 README.md 2026-03-11 12:24:59 +01:00
Daniel
c144c8817c refactor(wan): move causal_temporal tiling to wan/tiling.py (Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>) 2026-03-11 12:02:54 +01:00
Daniel
1cf878f5e0 More poodles 2026-03-11 09:24:06 +01:00
Daniel
d207275fea fix(wan): Fix scheduler sigma schedule and add debug flags 2026-03-11 09:18:01 +01:00
Daniel
afd15018b7 chore: Cleanup -- reorganize README and docs 2026-03-11 09:17:25 +01:00
Daniel
061ae4407c feat(wan): Add chunked VAE encoding and TI2V-5B support 2026-03-11 09:16:52 +01:00
Daniel
9bdda9f22e feat(wan): Add tiled VAE decoding and fix TI2V quality 2026-03-11 09:16:22 +01:00
Daniel
9597b7c9c5 perf(wan): Add mx.compile and fix first-frame artifacts 2026-03-11 09:14:43 +01:00
Daniel
849cc45d84 feat(wan): Add LoRA with improved quantization pipeline 2026-03-11 09:13:20 +01:00
Daniel
dbab95ec45 fix(wan): Fix RoPE frequency construction 2026-03-11 09:12:19 +01:00
Daniel
f4195f0118 feat(wan): Add I2V-14B dual-model support 2026-03-11 09:12:19 +01:00
Daniel
2bb95c61ed feat(wan): Add Wan2.2 I2V support 2026-03-11 09:08:10 +01:00
Daniel
93da550f65 feat(wan): Add DPM++ 2M and UniPC schedulers 2026-03-11 09:08:10 +01:00
Daniel
e64483a66a feat(wan): Add Wan2.1/2.2 T2V with quantization support 2026-03-11 09:08:10 +01:00
Prince Canuma
207c223354 Add LTX-2.3 model architecture with prompt-conditioned adaptive layer normalization (adaln) support. Introduce gating mechanisms in attention modules and update transformer configurations to accommodate new parameters. Refactor video and audio processing to utilize adaptive normalization, improving model flexibility and performance. Update weight loading and initialization logic to support dynamic block structures in the decoder. 2026-03-10 16:47:36 +01:00
Prince Canuma
d028b239fb Update LTX conversion script to support LTX-2.3 safetensors format. Enhance documentation and improve file matching logic for variant detection in local directories. 2026-03-10 08:01:26 +01:00
Prince Canuma
576e01da14 Implement linking of text encoder and tokenizer directories in conversion process. Enhance error handling in LTX2TextEncoder for tokenizer loading, providing a fallback model if the specified path is unavailable. 2026-03-09 18:25:32 +01:00
Prince Canuma
41ed62f7e8 Add LTX-2 conversion script for safetensors to MLX directory layout. Implement modular structure 2026-03-09 18:16:20 +01:00