mlx-video

Author	SHA1	Message	Date
Prince Canuma	a6a6bb2166	Move weight loading functions to a new file for better organization and maintainability	2026-03-16 17:28:06 +01:00
Prince Canuma	3a0da19adb	Refactor LTX-2 model structure	2026-03-16 14:50:01 +01:00
Prince Canuma	6f6105b715	Add audio to video conditioning	2026-03-16 01:42:11 +01:00
Prince Canuma	f53b9e0807	Add Dev Two-Stage HQ pipeline mode	2026-03-16 00:34:13 +01:00
Prince Canuma	df81bc852f	fix save tensors	2026-03-15 23:08:12 +01:00
Prince Canuma	cecd68197c	fix tiling, rope precision and weights	2026-03-15 22:58:55 +01:00
Prince Canuma	ebcd5dd4e4	optimize memory usage by batching weight updates	2026-03-15 03:12:47 +01:00
Prince Canuma	53bae534e7	fix LTX-2.3 audio	2026-03-15 02:06:35 +01:00
Prince Canuma	eb0d1355e4	Fix LTX-2.3 decoder grainy bug	2026-03-14 21:56:03 +01:00
Prince Canuma	5644492f7d	Update generate.py to enhance denoising functionality with optional Spatiotemporal Guidance (STG) support. Modify DEFAULT_NEGATIVE_PROMPT for improved clarity and detail. Implement auto-detection of STG blocks based on transformer configuration. Refactor denoise_dev function to incorporate STG parameters, allowing for more flexible audio-visual integration during video generation.	2026-03-14 20:02:42 +01:00
Prince Canuma	ffe271699a	Refactor LoRA loading for v2.3 in generate.py to prioritize distilled-lora files over full model weights, enhancing model flexibility. Update key sanitization logic to utilize a replacement list for improved readability and maintainability. Modify denoise_dev_av function to include sigma parameters for audio and video modalities, ensuring consistent handling of latent variables during processing. Adjust Vocoder weight loading to allow for non-strict loading, accommodating additional keys in model weights.	2026-03-14 15:24:50 +01:00
Prince Canuma	9cba2ea7cd	Enhance README.md with new usage examples for STG and modality scale parameters in video generation. Update generate.py to support STG and modality guidance in the denoising process, allowing for improved audio-visual integration. Refactor attention mechanisms in the transformer to include options for skipping self-attention, facilitating STG perturbation and modality isolation. Update LTXModel and transformer block processing to accommodate new parameters for enhanced flexibility in model configurations.	2026-03-14 10:26:12 +01:00
Prince Canuma	f346e09de4	Refactor audio handling in generate_video function to preserve stage 1 audio latents during stage 2 processing. Remove redundant audio re-denoising steps, ensuring audio integrity while refining video output. Update comments for clarity on audio processing logic.	2026-03-13 16:09:07 +01:00
Prince Canuma	387d4fc301	improve dev color and quality	2026-03-13 09:51:24 +01:00
Prince Canuma	835ba33202	Enhance README.md with detailed descriptions of LTX-2 features, pipeline options, and usage examples for text-to-video, image-to-video, and audio-video generation. Update generate.py to improve LoRA loading functionality, allowing for local files, directories, or HuggingFace repos. This update improves flexibility in model configurations and enhances user guidance in the documentation.	2026-03-13 01:39:39 +01:00
Prince Canuma	7435facc52	Add support for DEV_TWO_STAGE pipeline and implement LoRA merging functionality in generate.py. Enhance video generation capabilities by allowing LoRA weights to be loaded and merged into the model, improving flexibility in model configurations. Update pipeline handling to accommodate the new two-stage generation process.	2026-03-13 01:22:45 +01:00
Prince Canuma	e0aafd72fc	Refactor generate.py to ensure temporal coordinates and position grids are processed in bfloat16 for consistency with PyTorch's precision behavior. Update denoise_dev_av function to apply standard ratio rescaling for audio and video guidance, enhancing numerical fidelity and model compatibility.	2026-03-12 21:26:38 +01:00
Prince Canuma	b07b1e3213	Update .gitignore to exclude additional configuration and model files. Modify generate.py to enhance console output with rescale parameter and adjust default values for inference steps and CFG scale. Refactor text encoder to align positional embedding max position with PyTorch defaults, improving compatibility and performance.	2026-03-12 17:13:43 +01:00
Prince Canuma	d1fa47722b	Fix timestep_conditioning logic in infer_vae_decoder_config to ensure consistent behavior based on has_timestep flag.	2026-03-11 18:30:29 +01:00
Prince Canuma	207c223354	Add LTX-2.3 model architecture with prompt-conditioned adaptive layer normalization (adaln) support. Introduce gating mechanisms in attention modules and update transformer configurations to accommodate new parameters. Refactor video and audio processing to utilize adaptive normalization, improving model flexibility and performance. Update weight loading and initialization logic to support dynamic block structures in the decoder.	2026-03-10 16:47:36 +01:00
Prince Canuma	d028b239fb	Update LTX conversion script to support LTX-2.3 safetensors format. Enhance documentation and improve file matching logic for variant detection in local directories.	2026-03-10 08:01:26 +01:00
Prince Canuma	576e01da14	Implement linking of text encoder and tokenizer directories in conversion process. Enhance error handling in LTX2TextEncoder for tokenizer loading, providing a fallback model if the specified path is unavailable.	2026-03-09 18:25:32 +01:00
Prince Canuma	41ed62f7e8	Add LTX-2 conversion script for safetensors to MLX directory layout. Implement modular structure	2026-03-09 18:16:20 +01:00
Prince Canuma	9f37dab076	Refactor model loading in generate.py to use dynamic model paths for audio and video components. Simplify weight loading logic in LTX2TextEncoder to accommodate both monolithic and reformatted model structures. Introduce a check for existing model paths in get_model_path function to enhance robustness.	2026-03-09 15:51:21 +01:00
Prince Canuma	d1dd30cbac	Add Adaptive Projected Guidance (APG) support to denoising functions. Introduce apg_delta function for stable guidance by decomposing into parallel and orthogonal components. Update denoise_dev and generate_video functions to accept APG parameters, enhancing flexibility in video generation. Modify command-line arguments for APG integration.	2026-01-26 21:35:58 +01:00
Prince Canuma	87962c7f83	Enhance precision in denoising functions by ensuring all latents and calculations are consistently handled in float32. Update model input casting and return types to maintain dtype integrity across audio and video processing. Add precision parameter to video generation for improved memory management.	2026-01-24 15:40:42 +01:00
Prince Canuma	cb2d19c84d	fix loading	2026-01-24 01:37:38 +01:00
Prince Canuma	ef76ec0921	add from pretrained	2026-01-23 18:13:51 +01:00
Prince Canuma	ce39e744c3	Refactor VideoEncoder to initialize from VideoEncoderModelConfig, enhancing configuration management. Add methods for weight sanitization and loading from pretrained models, improving model usability and integration with existing workflows.	2026-01-23 17:59:57 +01:00
Prince Canuma	f8f78aeab5	Add LTXModel with a from_pretrained class method for loading model weights from a specified path. Update weight sanitization to handle positional embeddings and dtype consistency. Refactor timestep and context preparation methods to accept hidden_dtype, improving flexibility in model processing.	2026-01-23 17:45:50 +01:00
Prince Canuma	df753312c7	Refactor video generation and model loading processes to utilize from_pretrained methods for VideoEncoder and VideoDecoder. Update denoising functions to include a cfg_rescale parameter for improved artifact reduction. Ensure consistent dtype handling across audio and video processing, enhancing precision and aligning with PyTorch behavior.	2026-01-23 17:39:02 +01:00
Prince Canuma	02bfa228d9	Refactor weight loading and sanitization processes for audio models	2026-01-23 17:31:25 +01:00
Prince Canuma	2681f75d2f	Refactor LTXModel to include a from_pretrained class method for loading and sanitizing model weights. Update generate.py to utilize this method, streamlining the transformer loading process and improving code clarity.	2026-01-20 12:56:29 +01:00
Prince Canuma	bbb3de6aa7	Update audio decoder configuration to disable mid-block attention and ensure audio waveform is converted to float32 for consistency in processing.	2026-01-19 17:05:59 +01:00
Prince Canuma	8a2ea38c88	Refactor denoising functions in generate.py and utils.py to use float32 for improved precision, aligning with PyTorch behavior. Update calculations for latents and denoised outputs to ensure consistent dtype handling across audio and video processing.	2026-01-19 09:13:04 +01:00
Prince Canuma	e0ee934b99	Update video generation completion message to display elapsed time in a more user-friendly format, showing minutes and seconds instead of just seconds.	2026-01-19 02:23:51 +01:00
Prince Canuma	4cd58f8b26	Refactor LTX2TextEncoder to utilize Rich for progress tracking during token generation. Replace tqdm with Rich's Progress for enhanced console output and user experience. Clean up imports and streamline the generation process.	2026-01-19 02:13:10 +01:00
Prince Canuma	ac67ee8b1e	Remove the generate_dev.py file, consolidating its functionality into generate.py. Enhance the video generation pipeline to support both distilled and dev models, integrating dynamic sigma scheduling and classifier-free guidance (CFG) for improved video quality. Update command-line interface to accommodate new pipeline options and refactor related functions for better maintainability.	2026-01-19 02:13:00 +01:00
Prince Canuma	0538af6554	Enhance video generation pipeline by integrating Rich for styled console output and progress tracking. Update dependencies in pyproject.toml to include Rich. Refactor print statements to use console methods for improved user experience during video and audio processing.	2026-01-19 01:43:14 +01:00
Prince Canuma	cae11291a9	Remove the audio-video generation pipeline from generate_av.py and integrate audio capabilities into generate.py. This includes adding audio position grid creation, audio frame computation, and updating the denoising function to handle audio latents. Enhance the command-line interface to support audio generation options and update the model configuration accordingly.	2026-01-19 01:28:53 +01:00
Prince Canuma	749762a0b9	Update audio decoder configuration to use an empty set for attention resolutions in both generate_av.py and generate_dev.py. Add a print statement for loading audio VAE decoder weights in generate_dev.py.	2026-01-18 21:55:38 +01:00
Prince Canuma	7069cc39c9	Add audio generation capabilities to video pipeline, including audio position grid creation, audio frame computation, and integration of audio VAE and vocoder. Update tests to cover new audio functionalities.	2026-01-18 21:28:56 +01:00
Prince Canuma	e483eab039	Optimize positional embedding handling in TransformerArgsPreprocessor and improve RoPE frequency computation in _precompute_freqs_cis_double_precision for enhanced performance and precision.	2026-01-18 11:13:32 +01:00
Prince Canuma	62fc4805a0	Add LTX-2 Dev Model video generation pipeline	2026-01-18 11:13:11 +01:00
Prince Canuma	b1bf9e2dc0	Enhance video generation with progress bar for streaming and remove debug prints from tiling decoder	2026-01-17 23:53:53 +01:00
Prince Canuma	7f20840dc7	Add streaming support to video generation	2026-01-17 23:17:08 +01:00
Prince Canuma	61c56cd989	Add RoPE tests and warning for bfloat16 precision loss in RoPE calculations	2026-01-17 19:28:05 +01:00
Prince Canuma	78244a2d66	Cast dtype to bf16 in video and audio generation processes	2026-01-17 17:20:22 +01:00
Prince Canuma	883c6b0ad8	ensure dtype cast	2026-01-17 13:03:48 +01:00
Prince Canuma	e4cdbb7eab	add vae tiling	2026-01-17 07:51:54 +01:00

75 Commits