mlx-video

Author	SHA1	Message	Date
Prince Canuma	b029668cd2	Refactor Wan model structure by renaming and relocating model imports from `model.py` to `wan2.py`, enhancing code organization and clarity across the Wan2 module.	2026-03-18 17:57:29 +01:00
Prince Canuma	6c63163671	Refactor Wan model imports and update script paths in pyproject.toml; transition from wan to wan2 module structure for improved organization and clarity.	2026-03-18 17:52:30 +01:00
Prince Canuma	17397da70c	format	2026-03-18 17:40:05 +01:00
Prince Canuma	78bcfba31b	Update README.md to reflect changes in command usage for Wan2.1 and Wan2.2 models, consolidating generation commands under the new `mlx_video.wan2` module.	2026-03-18 17:38:49 +01:00
Prince Canuma	3e33172c12	Refactor and remove Wan2.1/2.2 model files; update README.md to include new model features and usage instructions for LTX-2 and Wan2 models.	2026-03-18 17:34:57 +01:00
Prince Canuma	95d7c81b20	Remove deprecated stubs for video conversion and generation; introduce new weight conversion and generation scripts for Wan2.2 models in MLX.	2026-03-18 17:20:36 +01:00
Prince Canuma	7b9d0a5e44	Merge branch 'main' into pc/unify-apis	2026-03-18 17:14:17 +01:00
Prince Canuma	fea0f87df9	Fix token handling in LTX-2 text encoder by directly appending response tokens to the generated tokens list, improving clarity and consistency in token generation.	2026-03-18 13:50:33 +01:00
Prince Canuma	f5e311a77c	Update default values for STG and modality scales in LTX-2 video generation; enhance help descriptions for command-line arguments	2026-03-18 12:17:47 +01:00
Prince Canuma	f8e371e9ce	Enhance upsampler weight detection logic in LTX-2 model; improve clarity in comments and streamline spatial scale determination for x1.5 and x2 formats	2026-03-17 15:14:57 +01:00
Prince Canuma	57f66bcae2	Add custom spatial upscaling support to LTX-2 video generation; introduce spatial_upscaler parameter and enhance resolution handling for two-stage pipelines	2026-03-17 02:23:47 +01:00
Prince Canuma	cc302d79b0	Refactor comments and optimize key skipping logic in LTX-2 model conversion; improve clarity in code documentation	2026-03-17 00:39:52 +01:00
Prince Canuma	643f250195	Update README.md with installation instructions, supported models, and usage examples; add new LTX-2 model documentation for pipelines and features.	2026-03-16 23:03:05 +01:00
Prince Canuma	f9880a0683	Add audio encoder sanitization and configuration inference to LTX-2 model conversion process; update conversion print statements for new encoder step	2026-03-16 22:35:27 +01:00
Prince Canuma	7a576bfbf4	Refactor weight loading and utility functions for LTX-2 model; remove deprecated weight loading file and update imports accordingly	2026-03-16 22:25:22 +01:00
Prince Canuma	dd573d53d2	Refactor audio VAE directory structure and update related paths in conversion and loading functions	2026-03-16 21:53:37 +01:00
Prince Canuma	a6a6bb2166	Move weight loading functions to a new file for better organization and maintainability	2026-03-16 17:28:06 +01:00
Prince Canuma	3a0da19adb	Refactor LTX-2 model structure	2026-03-16 14:50:01 +01:00
Prince Canuma	decb3eb9e5	Add librosa dependency and enhance A2V documentation with additional pipeline options	2026-03-16 02:02:13 +01:00
Prince Canuma	6f6105b715	Add audio to video conditioning	2026-03-16 01:42:11 +01:00
Prince Canuma	f53b9e0807	Add Dev Two-Stage HQ pipeline mode	2026-03-16 00:34:13 +01:00
Prince Canuma	df81bc852f	fix save tensors	2026-03-15 23:08:12 +01:00
Prince Canuma	38d46a6eda	Implement regression tests for RoPE position precision using NumPy float64 reference. Add a new function to compute reference values and validate that float32 results closely match expected outputs, addressing high-frequency amplification issues. Update imports to include LTXModelConfig for enhanced configuration management.	2026-03-15 23:00:38 +01:00
Prince Canuma	cecd68197c	fix tiling, rope precision and weights	2026-03-15 22:58:55 +01:00
Prince Canuma	ebcd5dd4e4	optimize memory usage by batching weight updates	2026-03-15 03:12:47 +01:00
Prince Canuma	53bae534e7	fix LTX-2.3 audio	2026-03-15 02:06:35 +01:00
Prince Canuma	eb0d1355e4	Fix LTX-2.3 decoder grainy bug	2026-03-14 21:56:03 +01:00
Prince Canuma	5644492f7d	Update generate.py to enhance denoising functionality with optional Spatiotemporal Guidance (STG) support. Modify DEFAULT_NEGATIVE_PROMPT for improved clarity and detail. Implement auto-detection of STG blocks based on transformer configuration. Refactor denoise_dev function to incorporate STG parameters, allowing for more flexible audio-visual integration during video generation.	2026-03-14 20:02:42 +01:00
Prince Canuma	ffe271699a	Refactor LoRA loading for v2.3 in generate.py to prioritize distilled-lora files over full model weights, enhancing model flexibility. Update key sanitization logic to utilize a replacement list for improved readability and maintainability. Modify denoise_dev_av function to include sigma parameters for audio and video modalities, ensuring consistent handling of latent variables during processing. Adjust Vocoder weight loading to allow for non-strict loading, accommodating additional keys in model weights.	2026-03-14 15:24:50 +01:00
Prince Canuma	9cba2ea7cd	Enhance README.md with new usage examples for STG and modality scale parameters in video generation. Update generate.py to support STG and modality guidance in the denoising process, allowing for improved audio-visual integration. Refactor attention mechanisms in the transformer to include options for skipping self-attention, facilitating STG perturbation and modality isolation. Update LTXModel and transformer block processing to accommodate new parameters for enhanced flexibility in model configurations.	2026-03-14 10:26:12 +01:00
Prince Canuma	f346e09de4	Refactor audio handling in generate_video function to preserve stage 1 audio latents during stage 2 processing. Remove redundant audio re-denoising steps, ensuring audio integrity while refining video output. Update comments for clarity on audio processing logic.	2026-03-13 16:09:07 +01:00
Prince Canuma	387d4fc301	improve dev color and quality	2026-03-13 09:51:24 +01:00
Prince Canuma	835ba33202	Enhance README.md with detailed descriptions of LTX-2 features, pipeline options, and usage examples for text-to-video, image-to-video, and audio-video generation. Update generate.py to improve LoRA loading functionality, allowing for local files, directories, or HuggingFace repos. This update improves flexibility in model configurations and enhances user guidance in the documentation.	2026-03-13 01:39:39 +01:00
Prince Canuma	7435facc52	Add support for DEV_TWO_STAGE pipeline and implement LoRA merging functionality in generate.py. Enhance video generation capabilities by allowing LoRA weights to be loaded and merged into the model, improving flexibility in model configurations. Update pipeline handling to accommodate the new two-stage generation process.	2026-03-13 01:22:45 +01:00
Prince Canuma	e0aafd72fc	Refactor generate.py to ensure temporal coordinates and position grids are processed in bfloat16 for consistency with PyTorch's precision behavior. Update denoise_dev_av function to apply standard ratio rescaling for audio and video guidance, enhancing numerical fidelity and model compatibility.	2026-03-12 21:26:38 +01:00
Prince Canuma	b07b1e3213	Update .gitignore to exclude additional configuration and model files. Modify generate.py to enhance console output with rescale parameter and adjust default values for inference steps and CFG scale. Refactor text encoder to align positional embedding max position with PyTorch defaults, improving compatibility and performance.	2026-03-12 17:13:43 +01:00
Prince Canuma	3618966625	Wan2.1 and Wan2.2 model support, including LoRA support & more Poodles	2026-03-11 19:08:14 +01:00
Prince Canuma	d1fa47722b	Fix timestep_conditioning logic in infer_vae_decoder_config to ensure consistent behavior based on has_timestep flag.	2026-03-11 18:30:29 +01:00
Daniel	33dd3c2edd	Revert small change to mlx_video/generate.py	2026-03-11 12:41:44 +01:00
Daniel	281750f0a9	Revert changes to existing files by copying some code.	2026-03-11 12:35:47 +01:00
Daniel	ae410f3121	Update Wan2.1/Wan2.2 README.md	2026-03-11 12:24:59 +01:00
Daniel	c144c8817c	refactor(wan): move causal_temporal tiling to wan/tiling.py Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-11 12:02:54 +01:00
Daniel	1cf878f5e0	More poodles	2026-03-11 09:24:06 +01:00
Daniel	d207275fea	fix(wan): Fix scheduler sigma schedule and add debug flags	2026-03-11 09:18:01 +01:00
Daniel	afd15018b7	chore: Cleanup -- reorganize README and docs	2026-03-11 09:17:25 +01:00
Daniel	061ae4407c	feat(wan): Add chunked VAE encoding and TI2V-5B support	2026-03-11 09:16:52 +01:00
Daniel	967218b7c1	feat(wan): Add diagnostic scripts and porting guide	2026-03-11 09:16:22 +01:00
Daniel	9bdda9f22e	feat(wan): Add tiled VAE decoding and fix TI2V quality	2026-03-11 09:16:22 +01:00
Daniel	9597b7c9c5	perf(wan): Add mx.compile and fix first-frame artifacts	2026-03-11 09:14:43 +01:00
Daniel	849cc45d84	feat(wan): Add LoRA with improved quantization pipeline	2026-03-11 09:13:20 +01:00

133 Commits