mlx-video

n0p/mlx-video

9ab4826d20 Merge pull request #22 from Blaizzy/pc/unify-apis main Prince Canuma 2026-03-18 22:04:21 +01:00
996a542011 Remove Wan2 model files, including configuration, attention mechanisms, and utility functions, to streamline the codebase and eliminate unused components. This cleanup enhances maintainability and focuses on the core functionality of the Wan2 module. Prince Canuma 2026-03-18 17:59:43 +01:00
b029668cd2 Refactor Wan model structure by renaming and relocating model imports from model.py to wan2.py, enhancing code organization and clarity across the Wan2 module. Prince Canuma 2026-03-18 17:57:29 +01:00
6c63163671 Refactor Wan model imports and update script paths in pyproject.toml; transition from wan to wan2 module structure for improved organization and clarity. Prince Canuma 2026-03-18 17:52:30 +01:00
17397da70c format Prince Canuma 2026-03-18 17:40:05 +01:00
78bcfba31b Update README.md to reflect changes in command usage for Wan2.1 and Wan2.2 models, consolidating generation commands under the new mlx_video.wan2 module. Prince Canuma 2026-03-18 17:38:49 +01:00
3e33172c12 Refactor and remove Wan2.1/2.2 model files; update README.md to include new model features and usage instructions for LTX-2 and Wan2 models. Prince Canuma 2026-03-18 17:34:57 +01:00
95d7c81b20 Remove deprecated stubs for video conversion and generation; introduce new weight conversion and generation scripts for Wan2.2 models in MLX. Prince Canuma 2026-03-18 17:20:36 +01:00
7b9d0a5e44 Merge branch 'main' into pc/unify-apis Prince Canuma 2026-03-18 17:14:17 +01:00
fea0f87df9 Fix token handling in LTX-2 text encoder by directly appending response tokens to the generated tokens list, improving clarity and consistency in token generation. Prince Canuma 2026-03-18 13:50:33 +01:00
f5e311a77c Update default values for STG and modality scales in LTX-2 video generation; enhance help descriptions for command-line arguments Prince Canuma 2026-03-18 12:17:47 +01:00
f8e371e9ce Enhance upsampler weight detection logic in LTX-2 model; improve clarity in comments and streamline spatial scale determination for x1.5 and x2 formats Prince Canuma 2026-03-17 15:14:57 +01:00
57f66bcae2 Add custom spatial upscaling support to LTX-2 video generation; introduce spatial_upscaler parameter and enhance resolution handling for two-stage pipelines Prince Canuma 2026-03-17 02:23:47 +01:00
cc302d79b0 Refactor comments and optimize key skipping logic in LTX-2 model conversion; improve clarity in code documentation Prince Canuma 2026-03-17 00:39:52 +01:00
643f250195 Update README.md with installation instructions, supported models, and usage examples; add new LTX-2 model documentation for pipelines and features. Prince Canuma 2026-03-16 23:03:05 +01:00
f9880a0683 Add audio encoder sanitization and configuration inference to LTX-2 model conversion process; update conversion print statements for new encoder step Prince Canuma 2026-03-16 22:35:27 +01:00
7a576bfbf4 Refactor weight loading and utility functions for LTX-2 model; remove deprecated weight loading file and update imports accordingly Prince Canuma 2026-03-16 22:25:22 +01:00
dd573d53d2 Refactor audio VAE directory structure and update related paths in conversion and loading functions Prince Canuma 2026-03-16 21:53:37 +01:00
a6a6bb2166 Move weight loading functions to a new file for better organization and maintainability Prince Canuma 2026-03-16 17:28:06 +01:00
3a0da19adb Refactor LTX-2 model structure Prince Canuma 2026-03-16 14:50:01 +01:00
decb3eb9e5 Add librosa dependency and enhance A2V documentation with additional pipeline options Prince Canuma 2026-03-16 02:02:13 +01:00
6f6105b715 Add audio to video conditioning Prince Canuma 2026-03-16 01:42:11 +01:00
f53b9e0807 Add Dev Two-Stage HQ pipeline mode Prince Canuma 2026-03-16 00:34:13 +01:00
df81bc852f fix save tensors Prince Canuma 2026-03-15 23:08:12 +01:00
38d46a6eda Implement regression tests for RoPE position precision using NumPy float64 reference. Add a new function to compute reference values and validate that float32 results closely match expected outputs, addressing high-frequency amplification issues. Update imports to include LTXModelConfig for enhanced configuration management. Prince Canuma 2026-03-15 23:00:38 +01:00
cecd68197c fix tiling, rope precision and weights Prince Canuma 2026-03-15 22:58:55 +01:00
ebcd5dd4e4 optimize memory usage by batching weight updates Prince Canuma 2026-03-15 03:12:47 +01:00
53bae534e7 fix LTX-2.3 audio Prince Canuma 2026-03-15 02:06:35 +01:00
eb0d1355e4 Fix LTX-2.3 decoder grainy bug Prince Canuma 2026-03-14 21:56:03 +01:00
5644492f7d Update generate.py to enhance denoising functionality with optional Spatiotemporal Guidance (STG) support. Modify DEFAULT_NEGATIVE_PROMPT for improved clarity and detail. Implement auto-detection of STG blocks based on transformer configuration. Refactor denoise_dev function to incorporate STG parameters, allowing for more flexible audio-visual integration during video generation. Prince Canuma 2026-03-14 20:02:42 +01:00
ffe271699a Refactor LoRA loading for v2.3 in generate.py to prioritize distilled-lora files over full model weights, enhancing model flexibility. Update key sanitization logic to utilize a replacement list for improved readability and maintainability. Modify denoise_dev_av function to include sigma parameters for audio and video modalities, ensuring consistent handling of latent variables during processing. Adjust Vocoder weight loading to allow for non-strict loading, accommodating additional keys in model weights. Prince Canuma 2026-03-14 15:24:50 +01:00
9cba2ea7cd Enhance README.md with new usage examples for STG and modality scale parameters in video generation. Update generate.py to support STG and modality guidance in the denoising process, allowing for improved audio-visual integration. Refactor attention mechanisms in the transformer to include options for skipping self-attention, facilitating STG perturbation and modality isolation. Update LTXModel and transformer block processing to accommodate new parameters for enhanced flexibility in model configurations. Prince Canuma 2026-03-14 10:26:12 +01:00
f346e09de4 Refactor audio handling in generate_video function to preserve stage 1 audio latents during stage 2 processing. Remove redundant audio re-denoising steps, ensuring audio integrity while refining video output. Update comments for clarity on audio processing logic. Prince Canuma 2026-03-13 16:09:07 +01:00
387d4fc301 improve dev color and quality Prince Canuma 2026-03-13 09:51:24 +01:00
835ba33202 Enhance README.md with detailed descriptions of LTX-2 features, pipeline options, and usage examples for text-to-video, image-to-video, and audio-video generation. Update generate.py to improve LoRA loading functionality, allowing for local files, directories, or HuggingFace repos. This update improves flexibility in model configurations and enhances user guidance in the documentation. Prince Canuma 2026-03-13 01:39:39 +01:00
7435facc52 Add support for DEV_TWO_STAGE pipeline and implement LoRA merging functionality in generate.py. Enhance video generation capabilities by allowing LoRA weights to be loaded and merged into the model, improving flexibility in model configurations. Update pipeline handling to accommodate the new two-stage generation process. Prince Canuma 2026-03-13 01:22:45 +01:00
e0aafd72fc Refactor generate.py to ensure temporal coordinates and position grids are processed in bfloat16 for consistency with PyTorch's precision behavior. Update denoise_dev_av function to apply standard ratio rescaling for audio and video guidance, enhancing numerical fidelity and model compatibility. Prince Canuma 2026-03-12 21:26:38 +01:00
b07b1e3213 Update .gitignore to exclude additional configuration and model files. Modify generate.py to enhance console output with rescale parameter and adjust default values for inference steps and CFG scale. Refactor text encoder to align positional embedding max position with PyTorch defaults, improving compatibility and performance. Prince Canuma 2026-03-12 17:13:43 +01:00
3618966625 Wan2.1 and Wan2.2 model support, including LoRA support & more Poodles Prince Canuma 2026-03-11 19:08:14 +01:00
d1fa47722b Fix timestep_conditioning logic in infer_vae_decoder_config to ensure consistent behavior based on has_timestep flag. Prince Canuma 2026-03-11 18:30:29 +01:00
33dd3c2edd Revert small change to mlx_video/generate.py Daniel 2026-03-11 12:41:44 +01:00
281750f0a9 Revert changes to existing files by copying some code. Daniel 2026-03-11 12:34:28 +01:00
ae410f3121 Update Wan2.1/Wan2.2 README.md Daniel 2026-03-11 12:24:59 +01:00
c144c8817c refactor(wan): move causal_temporal tiling to wan/tiling.py Daniel 2026-03-11 12:02:54 +01:00
1cf878f5e0 More poodles Daniel 2026-03-11 08:14:12 +01:00
d207275fea fix(wan): Fix scheduler sigma schedule and add debug flags Daniel 2026-03-11 07:52:07 +01:00
afd15018b7 chore: Cleanup -- reorganize README and docs Daniel 2026-03-11 07:32:35 +01:00
061ae4407c feat(wan): Add chunked VAE encoding and TI2V-5B support Daniel 2026-03-09 20:47:37 +01:00
967218b7c1 feat(wan): Add diagnostic scripts and porting guide Daniel 2026-03-06 20:46:43 +01:00
9bdda9f22e feat(wan): Add tiled VAE decoding and fix TI2V quality Daniel 2026-03-04 14:32:45 +01:00
9597b7c9c5 perf(wan): Add mx.compile and fix first-frame artifacts Daniel 2026-03-01 18:15:25 +01:00
849cc45d84 feat(wan): Add LoRA with improved quantization pipeline Daniel 2026-02-28 14:11:13 +01:00
dbab95ec45 fix(wan): Fix RoPE frequency construction Daniel 2026-02-28 11:20:36 +01:00
f4195f0118 feat(wan): Add I2V-14B dual-model support Daniel 2026-02-27 23:43:42 +01:00
2bb95c61ed feat(wan): Add Wan2.2 I2V support Daniel 2026-02-27 13:46:23 +01:00
93da550f65 feat(wan): Add DPM++ 2M and UniPC schedulers Daniel 2026-02-27 10:28:33 +01:00
e64483a66a feat(wan): Add Wan2.1/2.2 T2V with quantization support Daniel 2026-02-26 16:16:07 +01:00
207c223354 Add LTX-2.3 model architecture with prompt-conditioned adaptive layer normalization (adaln) support. Introduce gating mechanisms in attention modules and update transformer configurations to accommodate new parameters. Refactor video and audio processing to utilize adaptive normalization, improving model flexibility and performance. Update weight loading and initialization logic to support dynamic block structures in the decoder. Prince Canuma 2026-03-10 16:47:36 +01:00
d028b239fb Update LTX conversion script to support LTX-2.3 safetensors format. Enhance documentation and improve file matching logic for variant detection in local directories. Prince Canuma 2026-03-10 08:01:26 +01:00
576e01da14 Implement linking of text encoder and tokenizer directories in conversion process. Enhance error handling in LTX2TextEncoder for tokenizer loading, providing a fallback model if the specified path is unavailable. Prince Canuma 2026-03-09 18:25:32 +01:00
41ed62f7e8 Add LTX-2 conversion script for safetensors to MLX directory layout. Implement modular structure Prince Canuma 2026-03-09 18:16:20 +01:00
9f37dab076 Refactor model loading in generate.py to use dynamic model paths for audio and video components. Simplify weight loading logic in LTX2TextEncoder to accommodate both monolithic and reformatted model structures. Introduce a check for existing model paths in get_model_path function to enhance robustness. Prince Canuma 2026-03-09 15:51:21 +01:00
d1dd30cbac Add Adaptive Projected Guidance (APG) support to denoising functions. Introduce apg_delta function for stable guidance by decomposing into parallel and orthogonal components. Update denoise_dev and generate_video functions to accept APG parameters, enhancing flexibility in video generation. Modify command-line arguments for APG integration. Prince Canuma 2026-01-26 21:35:58 +01:00
87962c7f83 Enhance precision in denoising functions by ensuring all latents and calculations are consistently handled in float32. Update model input casting and return types to maintain dtype integrity across audio and video processing. Add precision parameter to video generation for improved memory management. Prince Canuma 2026-01-24 15:40:42 +01:00
cb2d19c84d fix loading Prince Canuma 2026-01-24 01:37:38 +01:00
ef76ec0921 add from pretrained Prince Canuma 2026-01-23 18:13:51 +01:00
ce39e744c3 Refactor VideoEncoder to initialize from VideoEncoderModelConfig, enhancing configuration management. Add methods for weight sanitization and loading from pretrained models, improving model usability and integration with existing workflows. Prince Canuma 2026-01-23 17:59:57 +01:00
f8f78aeab5 Add LTXModel with a from_pretrained class method for loading model weights from a specified path. Update weight sanitization to handle positional embeddings and dtype consistency. Refactor timestep and context preparation methods to accept hidden_dtype, improving flexibility in model processing. Prince Canuma 2026-01-23 17:45:50 +01:00
df753312c7 Refactor video generation and model loading processes to utilize from_pretrained methods for VideoEncoder and VideoDecoder. Update denoising functions to include a cfg_rescale parameter for improved artifact reduction. Ensure consistent dtype handling across audio and video processing, enhancing precision and aligning with PyTorch behavior. Prince Canuma 2026-01-23 17:39:02 +01:00
02bfa228d9 Refactor weight loading and sanitization processes for audio models Prince Canuma 2026-01-23 17:31:25 +01:00
7a74946c57 Merge pull request #14 from Blaizzy/pc/add-streaming Prince Canuma 2026-01-21 15:42:55 +01:00
ffdeec72a6 Merge branch 'main' into pc/add-streaming Prince Canuma 2026-01-21 15:42:16 +01:00
7ad14e18ca Merge pull request #12 from Blaizzy/pc/add-vae-tiling Prince Canuma 2026-01-21 15:41:46 +01:00
2681f75d2f Refactor LTXModel to include a from_pretrained class method for loading and sanitizing model weights. Update generate.py to utilize this method, streamlining the transformer loading process and improving code clarity. Prince Canuma 2026-01-20 12:56:29 +01:00
bbb3de6aa7 Update audio decoder configuration to disable mid-block attention and ensure audio waveform is converted to float32 for consistency in processing. Prince Canuma 2026-01-19 17:05:59 +01:00
8a2ea38c88 Refactor denoising functions in generate.py and utils.py to use float32 for improved precision, aligning with PyTorch behavior. Update calculations for latents and denoised outputs to ensure consistent dtype handling across audio and video processing. Prince Canuma 2026-01-19 09:13:04 +01:00
e0ee934b99 Update video generation completion message to display elapsed time in a more user-friendly format, showing minutes and seconds instead of just seconds. Prince Canuma 2026-01-19 02:23:51 +01:00
4cd58f8b26 Refactor LTX2TextEncoder to utilize Rich for progress tracking during token generation. Replace tqdm with Rich's Progress for enhanced console output and user experience. Clean up imports and streamline the generation process. Prince Canuma 2026-01-19 02:13:10 +01:00
ac67ee8b1e Remove the generate_dev.py file, consolidating its functionality into generate.py. Enhance the video generation pipeline to support both distilled and dev models, integrating dynamic sigma scheduling and classifier-free guidance (CFG) for improved video quality. Update command-line interface to accommodate new pipeline options and refactor related functions for better maintainability. Prince Canuma 2026-01-19 02:13:00 +01:00
0538af6554 Enhance video generation pipeline by integrating Rich for styled console output and progress tracking. Update dependencies in pyproject.toml to include Rich. Refactor print statements to use console methods for improved user experience during video and audio processing. Prince Canuma 2026-01-19 01:43:14 +01:00
cae11291a9 Remove the audio-video generation pipeline from generate_av.py and integrate audio capabilities into generate.py. This includes adding audio position grid creation, audio frame computation, and updating the denoising function to handle audio latents. Enhance the command-line interface to support audio generation options and update the model configuration accordingly. Prince Canuma 2026-01-19 01:28:53 +01:00
749762a0b9 Update audio decoder configuration to use an empty set for attention resolutions in both generate_av.py and generate_dev.py. Add a print statement for loading audio VAE decoder weights in generate_dev.py. Prince Canuma 2026-01-18 21:55:38 +01:00
7069cc39c9 Add audio generation capabilities to video pipeline, including audio position grid creation, audio frame computation, and integration of audio VAE and vocoder. Update tests to cover new audio functionalities. Prince Canuma 2026-01-18 21:28:56 +01:00
b36ad1e22d add tests Prince Canuma 2026-01-18 11:18:18 +01:00
e483eab039 Optimize positional embedding handling in TransformerArgsPreprocessor and improve RoPE frequency computation in _precompute_freqs_cis_double_precision for enhanced performance and precision. Prince Canuma 2026-01-18 11:13:32 +01:00
62fc4805a0 Add LTX-2 Dev Model video generation pipeline Prince Canuma 2026-01-18 11:13:11 +01:00
b1bf9e2dc0 Enhance video generation with progress bar for streaming and remove debug prints from tiling decoder Prince Canuma 2026-01-17 23:53:53 +01:00
f256c5fb25 add tests Prince Canuma 2026-01-17 23:36:39 +01:00
7f20840dc7 Add streaming support to video generation Prince Canuma 2026-01-17 23:17:08 +01:00
f33f496fba Merge branch 'main' into pc/add-vae-tiling Prince Canuma 2026-01-17 19:37:21 +01:00
e692b7a6b3 Add i2v Prince Canuma 2026-01-17 19:37:06 +01:00
785b0b955d Merge branch 'main' into pc/add-i2v Prince Canuma 2026-01-17 19:36:28 +01:00
26fa8919ed Merge pull request #13 from Blaizzy/Blaizzy-patch-1 Prince Canuma 2026-01-17 19:36:14 +01:00
c89de996eb Update GitHub Sponsors username in FUNDING.yml Prince Canuma 2026-01-17 19:35:24 +01:00
0669998e15 Add audio support Prince Canuma 2026-01-17 19:31:21 +01:00
61c56cd989 Add RoPE tests and warning for bfloat16 precision loss in RoPE calculations Prince Canuma 2026-01-17 19:28:05 +01:00
78244a2d66 Cast dtype to bf16 in video and audio generation processes Prince Canuma 2026-01-17 17:20:22 +01:00
883c6b0ad8 ensure dtype cast Prince Canuma 2026-01-17 13:03:48 +01:00
e4cdbb7eab add vae tiling Prince Canuma 2026-01-17 07:51:54 +01:00
f607112407 Refactor video and audio latent generation in generate_video and generate_video_with_audio Prince Canuma 2026-01-17 01:38:53 +01:00
