9ab4826d20
Merge pull request #22 from Blaizzy/pc/unify-apis
main
Prince Canuma
2026-03-18 22:04:21 +01:00
996a542011
Remove Wan2 model files, including configuration, attention mechanisms, and utility functions, to streamline the codebase and eliminate unused components. This cleanup enhances maintainability and focuses on the core functionality of the Wan2 module.
Prince Canuma
2026-03-18 17:59:43 +01:00
b029668cd2
Refactor Wan model structure by renaming and relocating model imports from model.py to wan2.py, enhancing code organization and clarity across the Wan2 module.
Prince Canuma
2026-03-18 17:57:29 +01:00
6c63163671
Refactor Wan model imports and update script paths in pyproject.toml; transition from wan to wan2 module structure for improved organization and clarity.
Prince Canuma
2026-03-18 17:52:30 +01:00
17397da70c
format
Prince Canuma
2026-03-18 17:40:05 +01:00
78bcfba31b
Update README.md to reflect changes in command usage for Wan2.1 and Wan2.2 models, consolidating generation commands under the new mlx_video.wan2 module.
Prince Canuma
2026-03-18 17:38:49 +01:00
3e33172c12
Refactor and remove Wan2.1/2.2 model files; update README.md to include new model features and usage instructions for LTX-2 and Wan2 models.
Prince Canuma
2026-03-18 17:34:57 +01:00
95d7c81b20
Remove deprecated stubs for video conversion and generation; introduce new weight conversion and generation scripts for Wan2.2 models in MLX.
Prince Canuma
2026-03-18 17:20:36 +01:00
7b9d0a5e44
Merge branch 'main' into pc/unify-apis
Prince Canuma
2026-03-18 17:14:17 +01:00
fea0f87df9
Fix token handling in LTX-2 text encoder by directly appending response tokens to the generated tokens list, improving clarity and consistency in token generation.
Prince Canuma
2026-03-18 13:50:33 +01:00
f5e311a77c
Update default values for STG and modality scales in LTX-2 video generation; enhance help descriptions for command-line arguments
Prince Canuma
2026-03-18 12:17:47 +01:00
f8e371e9ce
Enhance upsampler weight detection logic in LTX-2 model; improve clarity in comments and streamline spatial scale determination for x1.5 and x2 formats
Prince Canuma
2026-03-17 15:14:57 +01:00
57f66bcae2
Add custom spatial upscaling support to LTX-2 video generation; introduce spatial_upscaler parameter and enhance resolution handling for two-stage pipelines
Prince Canuma
2026-03-17 02:23:47 +01:00
cc302d79b0
Refactor comments and optimize key skipping logic in LTX-2 model conversion; improve clarity in code documentation
Prince Canuma
2026-03-17 00:39:52 +01:00
643f250195
Update README.md with installation instructions, supported models, and usage examples; add new LTX-2 model documentation for pipelines and features.
Prince Canuma
2026-03-16 23:03:05 +01:00
f9880a0683
Add audio encoder sanitization and configuration inference to LTX-2 model conversion process; update conversion print statements for new encoder step
Prince Canuma
2026-03-16 22:35:27 +01:00
7a576bfbf4
Refactor weight loading and utility functions for LTX-2 model; remove deprecated weight loading file and update imports accordingly
Prince Canuma
2026-03-16 22:25:22 +01:00
dd573d53d2
Refactor audio VAE directory structure and update related paths in conversion and loading functions
Prince Canuma
2026-03-16 21:53:37 +01:00
a6a6bb2166
Move weight loading functions to a new file for better organization and maintainability
Prince Canuma
2026-03-16 17:28:06 +01:00
3a0da19adb
Refactor LTX-2 model structure
Prince Canuma
2026-03-16 14:50:01 +01:00
decb3eb9e5
Add librosa dependency and enhance A2V documentation with additional pipeline options
Prince Canuma
2026-03-16 02:02:13 +01:00
6f6105b715
Add audio to video conditioning
Prince Canuma
2026-03-16 01:42:11 +01:00
f53b9e0807
Add Dev Two-Stage HQ pipeline mode
Prince Canuma
2026-03-16 00:34:13 +01:00
df81bc852f
fix save tensors
Prince Canuma
2026-03-15 23:08:12 +01:00
38d46a6eda
Implement regression tests for RoPE position precision using NumPy float64 reference. Add a new function to compute reference values and validate that float32 results closely match expected outputs, addressing high-frequency amplification issues. Update imports to include LTXModelConfig for enhanced configuration management.
Prince Canuma
2026-03-15 23:00:38 +01:00
cecd68197c
fix tiling, rope precision and weights
Prince Canuma
2026-03-15 22:58:55 +01:00
ebcd5dd4e4
optimize memory usage by batching weight updates
Prince Canuma
2026-03-15 03:12:47 +01:00
53bae534e7
fix LTX-2.3 audio
Prince Canuma
2026-03-15 02:06:35 +01:00
5644492f7d
Update generate.py to enhance denoising functionality with optional Spatiotemporal Guidance (STG) support. Modify DEFAULT_NEGATIVE_PROMPT for improved clarity and detail. Implement auto-detection of STG blocks based on transformer configuration. Refactor denoise_dev function to incorporate STG parameters, allowing for more flexible audio-visual integration during video generation.
Prince Canuma
2026-03-14 20:02:42 +01:00
ffe271699a
Refactor LoRA loading for v2.3 in generate.py to prioritize distilled-lora files over full model weights, enhancing model flexibility. Update key sanitization logic to utilize a replacement list for improved readability and maintainability. Modify denoise_dev_av function to include sigma parameters for audio and video modalities, ensuring consistent handling of latent variables during processing. Adjust Vocoder weight loading to allow for non-strict loading, accommodating additional keys in model weights.
Prince Canuma
2026-03-14 15:24:50 +01:00
9cba2ea7cd
Enhance README.md with new usage examples for STG and modality scale parameters in video generation. Update generate.py to support STG and modality guidance in the denoising process, allowing for improved audio-visual integration. Refactor attention mechanisms in the transformer to include options for skipping self-attention, facilitating STG perturbation and modality isolation. Update LTXModel and transformer block processing to accommodate new parameters for enhanced flexibility in model configurations.
Prince Canuma
2026-03-14 10:26:12 +01:00
f346e09de4
Refactor audio handling in generate_video function to preserve stage 1 audio latents during stage 2 processing. Remove redundant audio re-denoising steps, ensuring audio integrity while refining video output. Update comments for clarity on audio processing logic.
Prince Canuma
2026-03-13 16:09:07 +01:00
387d4fc301
improve dev color and quality
Prince Canuma
2026-03-13 09:51:24 +01:00
835ba33202
Enhance README.md with detailed descriptions of LTX-2 features, pipeline options, and usage examples for text-to-video, image-to-video, and audio-video generation. Update generate.py to improve LoRA loading functionality, allowing for local files, directories, or HuggingFace repos. This update improves flexibility in model configurations and enhances user guidance in the documentation.
Prince Canuma
2026-03-13 01:39:39 +01:00
7435facc52
Add support for DEV_TWO_STAGE pipeline and implement LoRA merging functionality in generate.py. Enhance video generation capabilities by allowing LoRA weights to be loaded and merged into the model, improving flexibility in model configurations. Update pipeline handling to accommodate the new two-stage generation process.
Prince Canuma
2026-03-13 01:22:45 +01:00
e0aafd72fc
Refactor generate.py to ensure temporal coordinates and position grids are processed in bfloat16 for consistency with PyTorch's precision behavior. Update denoise_dev_av function to apply standard ratio rescaling for audio and video guidance, enhancing numerical fidelity and model compatibility.
Prince Canuma
2026-03-12 21:26:38 +01:00
b07b1e3213
Update .gitignore to exclude additional configuration and model files. Modify generate.py to enhance console output with rescale parameter and adjust default values for inference steps and CFG scale. Refactor text encoder to align positional embedding max position with PyTorch defaults, improving compatibility and performance.
Prince Canuma
2026-03-12 17:13:43 +01:00
3618966625
Wan2.1 and Wan2.2 model support, including LoRA support & more Poodles
Prince Canuma
2026-03-11 19:08:14 +01:00
d1fa47722b
Fix timestep_conditioning logic in infer_vae_decoder_config to ensure consistent behavior based on has_timestep flag.
Prince Canuma
2026-03-11 18:30:29 +01:00
33dd3c2edd
Revert small change to mlx_video/generate.py
Daniel
2026-03-11 12:41:44 +01:00
281750f0a9
Revert changes to existing files by copying some code.
Daniel
2026-03-11 12:34:28 +01:00
ae410f3121
Update Wan2.1/Wan2.2 README.md
Daniel
2026-03-11 12:24:59 +01:00
c144c8817c
refactor(wan): move causal_temporal tiling to wan/tiling.py
Daniel
2026-03-11 12:02:54 +01:00
1cf878f5e0
More poodles
Daniel
2026-03-11 08:14:12 +01:00
d207275fea
fix(wan): Fix scheduler sigma schedule and add debug flags
Daniel
2026-03-11 07:52:07 +01:00
afd15018b7
chore: Cleanup -- reorganize README and docs
Daniel
2026-03-11 07:32:35 +01:00
061ae4407c
feat(wan): Add chunked VAE encoding and TI2V-5B support
Daniel
2026-03-09 20:47:37 +01:00
967218b7c1
feat(wan): Add diagnostic scripts and porting guide
Daniel
2026-03-06 20:46:43 +01:00
9bdda9f22e
feat(wan): Add tiled VAE decoding and fix TI2V quality
Daniel
2026-03-04 14:32:45 +01:00
9597b7c9c5
perf(wan): Add mx.compile and fix first-frame artifacts
Daniel
2026-03-01 18:15:25 +01:00
849cc45d84
feat(wan): Add LoRA with improved quantization pipeline
Daniel
2026-02-28 14:11:13 +01:00
dbab95ec45
fix(wan): Fix RoPE frequency construction
Daniel
2026-02-28 11:20:36 +01:00
f4195f0118
feat(wan): Add I2V-14B dual-model support
Daniel
2026-02-27 23:43:42 +01:00
2bb95c61ed
feat(wan): Add Wan2.2 I2V support
Daniel
2026-02-27 13:46:23 +01:00
93da550f65
feat(wan): Add DPM++ 2M and UniPC schedulers
Daniel
2026-02-27 10:28:33 +01:00
e64483a66a
feat(wan): Add Wan2.1/2.2 T2V with quantization support
Daniel
2026-02-26 16:16:07 +01:00
207c223354
Add LTX-2.3 model architecture with prompt-conditioned adaptive layer normalization (adaln) support. Introduce gating mechanisms in attention modules and update transformer configurations to accommodate new parameters. Refactor video and audio processing to utilize adaptive normalization, improving model flexibility and performance. Update weight loading and initialization logic to support dynamic block structures in the decoder.
Prince Canuma
2026-03-10 16:47:36 +01:00
d028b239fb
Update LTX conversion script to support LTX-2.3 safetensors format. Enhance documentation and improve file matching logic for variant detection in local directories.
Prince Canuma
2026-03-10 08:01:26 +01:00
576e01da14
Implement linking of text encoder and tokenizer directories in conversion process. Enhance error handling in LTX2TextEncoder for tokenizer loading, providing a fallback model if the specified path is unavailable.
Prince Canuma
2026-03-09 18:25:32 +01:00
41ed62f7e8
Add LTX-2 conversion script for safetensors to MLX directory layout. Implement modular structure
Prince Canuma
2026-03-09 18:16:20 +01:00
9f37dab076
Refactor model loading in generate.py to use dynamic model paths for audio and video components. Simplify weight loading logic in LTX2TextEncoder to accommodate both monolithic and reformatted model structures. Introduce a check for existing model paths in get_model_path function to enhance robustness.
Prince Canuma
2026-03-09 15:51:21 +01:00
d1dd30cbac
Add Adaptive Projected Guidance (APG) support to denoising functions. Introduce apg_delta function for stable guidance by decomposing into parallel and orthogonal components. Update denoise_dev and generate_video functions to accept APG parameters, enhancing flexibility in video generation. Modify command-line arguments for APG integration.
Prince Canuma
2026-01-26 21:35:58 +01:00
87962c7f83
Enhance precision in denoising functions by ensuring all latents and calculations are consistently handled in float32. Update model input casting and return types to maintain dtype integrity across audio and video processing. Add precision parameter to video generation for improved memory management.
Prince Canuma
2026-01-24 15:40:42 +01:00
cb2d19c84d
fix loading
Prince Canuma
2026-01-24 01:37:38 +01:00
ef76ec0921
add from pretrained
Prince Canuma
2026-01-23 18:13:51 +01:00
ce39e744c3
Refactor VideoEncoder to initialize from VideoEncoderModelConfig, enhancing configuration management. Add methods for weight sanitization and loading from pretrained models, improving model usability and integration with existing workflows.
Prince Canuma
2026-01-23 17:59:57 +01:00
f8f78aeab5
Add LTXModel with a from_pretrained class method for loading model weights from a specified path. Update weight sanitization to handle positional embeddings and dtype consistency. Refactor timestep and context preparation methods to accept hidden_dtype, improving flexibility in model processing.
Prince Canuma
2026-01-23 17:45:50 +01:00
df753312c7
Refactor video generation and model loading processes to utilize from_pretrained methods for VideoEncoder and VideoDecoder. Update denoising functions to include a cfg_rescale parameter for improved artifact reduction. Ensure consistent dtype handling across audio and video processing, enhancing precision and aligning with PyTorch behavior.
Prince Canuma
2026-01-23 17:39:02 +01:00
02bfa228d9
Refactor weight loading and sanitization processes for audio models
Prince Canuma
2026-01-23 17:31:25 +01:00
7a74946c57
Merge pull request #14 from Blaizzy/pc/add-streaming
Prince Canuma
2026-01-21 15:42:55 +01:00
ffdeec72a6
Merge branch 'main' into pc/add-streaming
Prince Canuma
2026-01-21 15:42:16 +01:00
7ad14e18ca
Merge pull request #12 from Blaizzy/pc/add-vae-tiling
Prince Canuma
2026-01-21 15:41:46 +01:00
2681f75d2f
Refactor LTXModel to include a from_pretrained class method for loading and sanitizing model weights. Update generate.py to utilize this method, streamlining the transformer loading process and improving code clarity.
Prince Canuma
2026-01-20 12:56:29 +01:00
bbb3de6aa7
Update audio decoder configuration to disable mid-block attention and ensure audio waveform is converted to float32 for consistency in processing.
Prince Canuma
2026-01-19 17:05:59 +01:00
8a2ea38c88
Refactor denoising functions in generate.py and utils.py to use float32 for improved precision, aligning with PyTorch behavior. Update calculations for latents and denoised outputs to ensure consistent dtype handling across audio and video processing.
Prince Canuma
2026-01-19 09:13:04 +01:00
e0ee934b99
Update video generation completion message to display elapsed time in a more user-friendly format, showing minutes and seconds instead of just seconds.
Prince Canuma
2026-01-19 02:23:51 +01:00
4cd58f8b26
Refactor LTX2TextEncoder to utilize Rich for progress tracking during token generation. Replace tqdm with Rich's Progress for enhanced console output and user experience. Clean up imports and streamline the generation process.
Prince Canuma
2026-01-19 02:13:10 +01:00
ac67ee8b1e
Remove the generate_dev.py file, consolidating its functionality into generate.py. Enhance the video generation pipeline to support both distilled and dev models, integrating dynamic sigma scheduling and classifier-free guidance (CFG) for improved video quality. Update command-line interface to accommodate new pipeline options and refactor related functions for better maintainability.
Prince Canuma
2026-01-19 02:13:00 +01:00
0538af6554
Enhance video generation pipeline by integrating Rich for styled console output and progress tracking. Update dependencies in pyproject.toml to include Rich. Refactor print statements to use console methods for improved user experience during video and audio processing.
Prince Canuma
2026-01-19 01:43:14 +01:00
cae11291a9
Remove the audio-video generation pipeline from generate_av.py and integrate audio capabilities into generate.py. This includes adding audio position grid creation, audio frame computation, and updating the denoising function to handle audio latents. Enhance the command-line interface to support audio generation options and update the model configuration accordingly.
Prince Canuma
2026-01-19 01:28:53 +01:00
749762a0b9
Update audio decoder configuration to use an empty set for attention resolutions in both generate_av.py and generate_dev.py. Add a print statement for loading audio VAE decoder weights in generate_dev.py.
Prince Canuma
2026-01-18 21:55:38 +01:00
7069cc39c9
Add audio generation capabilities to video pipeline, including audio position grid creation, audio frame computation, and integration of audio VAE and vocoder. Update tests to cover new audio functionalities.
Prince Canuma
2026-01-18 21:28:56 +01:00
b36ad1e22d
add tests
Prince Canuma
2026-01-18 11:18:18 +01:00
e483eab039
Optimize positional embedding handling in TransformerArgsPreprocessor and improve RoPE frequency computation in _precompute_freqs_cis_double_precision for enhanced performance and precision.
Prince Canuma
2026-01-18 11:13:32 +01:00
62fc4805a0
Add LTX-2 Dev Model video generation pipeline
Prince Canuma
2026-01-18 11:13:11 +01:00
b1bf9e2dc0
Enhance video generation with progress bar for streaming and remove debug prints from tiling decoder
Prince Canuma
2026-01-17 23:53:53 +01:00
f256c5fb25
add tests
Prince Canuma
2026-01-17 23:36:39 +01:00
7f20840dc7
Add streaming support to video generation
Prince Canuma
2026-01-17 23:17:08 +01:00
f33f496fba
Merge branch 'main' into pc/add-vae-tiling
Prince Canuma
2026-01-17 19:37:21 +01:00
e692b7a6b3
Add i2v
Prince Canuma
2026-01-17 19:37:06 +01:00
785b0b955d
Merge branch 'main' into pc/add-i2v
Prince Canuma
2026-01-17 19:36:28 +01:00
26fa8919ed
Merge pull request #13 from Blaizzy/Blaizzy-patch-1
Prince Canuma
2026-01-17 19:36:14 +01:00
c89de996eb
Update GitHub Sponsors username in FUNDING.yml
Prince Canuma
2026-01-17 19:35:24 +01:00
0669998e15
Add audio support
Prince Canuma
2026-01-17 19:31:21 +01:00
61c56cd989
Add RoPE tests and warning for bfloat16 precision loss in RoPE calculations
Prince Canuma
2026-01-17 19:28:05 +01:00
78244a2d66
Cast dtype to bf16 in video and audio generation processes
Prince Canuma
2026-01-17 17:20:22 +01:00
883c6b0ad8
ensure dtype cast
Prince Canuma
2026-01-17 13:03:48 +01:00
e4cdbb7eab
add vae tiling
Prince Canuma
2026-01-17 07:51:54 +01:00
f607112407
Refactor video and audio latent generation in generate_video and generate_video_with_audio
Prince Canuma
2026-01-17 01:38:53 +01:00