mlx-video

Author	SHA1	Message	Date
Prince Canuma	576e01da14	Implement linking of text encoder and tokenizer directories in conversion process. Enhance error handling in LTX2TextEncoder for tokenizer loading, providing a fallback model if the specified path is unavailable.	2026-03-09 18:25:32 +01:00
Prince Canuma	41ed62f7e8	Add LTX-2 conversion script for safetensors to MLX directory layout. Implement modular structure	2026-03-09 18:16:20 +01:00
Prince Canuma	9f37dab076	Refactor model loading in generate.py to use dynamic model paths for audio and video components. Simplify weight loading logic in LTX2TextEncoder to accommodate both monolithic and reformatted model structures. Introduce a check for existing model paths in get_model_path function to enhance robustness.	2026-03-09 15:51:21 +01:00
Prince Canuma	cb2d19c84d	fix loading	2026-01-24 01:37:38 +01:00
Prince Canuma	ef76ec0921	add from pretrained	2026-01-23 18:13:51 +01:00
Prince Canuma	ce39e744c3	Refactor VideoEncoder to initialize from VideoEncoderModelConfig, enhancing configuration management. Add methods for weight sanitization and loading from pretrained models, improving model usability and integration with existing workflows.	2026-01-23 17:59:57 +01:00
Prince Canuma	f8f78aeab5	Add LTXModel with a from_pretrained class method for loading model weights from a specified path. Update weight sanitization to handle positional embeddings and dtype consistency. Refactor timestep and context preparation methods to accept hidden_dtype, improving flexibility in model processing.	2026-01-23 17:45:50 +01:00
Prince Canuma	df753312c7	Refactor video generation and model loading processes to utilize from_pretrained methods for VideoEncoder and VideoDecoder. Update denoising functions to include a cfg_rescale parameter for improved artifact reduction. Ensure consistent dtype handling across audio and video processing, enhancing precision and aligning with PyTorch behavior.	2026-01-23 17:39:02 +01:00
Prince Canuma	02bfa228d9	Refactor weight loading and sanitization processes for audio models	2026-01-23 17:31:25 +01:00
Prince Canuma	2681f75d2f	Refactor LTXModel to include a from_pretrained class method for loading and sanitizing model weights. Update generate.py to utilize this method, streamlining the transformer loading process and improving code clarity.	2026-01-20 12:56:29 +01:00
Prince Canuma	4cd58f8b26	Refactor LTX2TextEncoder to utilize Rich for progress tracking during token generation. Replace tqdm with Rich's Progress for enhanced console output and user experience. Clean up imports and streamline the generation process.	2026-01-19 02:13:10 +01:00
Prince Canuma	e483eab039	Optimize positional embedding handling in TransformerArgsPreprocessor and improve RoPE frequency computation in _precompute_freqs_cis_double_precision for enhanced performance and precision.	2026-01-18 11:13:32 +01:00
Prince Canuma	b1bf9e2dc0	Enhance video generation with progress bar for streaming and remove debug prints from tiling decoder	2026-01-17 23:53:53 +01:00
Prince Canuma	7f20840dc7	Add streaming support to video generation	2026-01-17 23:17:08 +01:00
Prince Canuma	61c56cd989	Add RoPE tests and warning for bfloat16 precision loss in RoPE calculations	2026-01-17 19:28:05 +01:00
Prince Canuma	883c6b0ad8	ensure dtype cast	2026-01-17 13:03:48 +01:00
Prince Canuma	e4cdbb7eab	add vae tiling	2026-01-17 07:51:54 +01:00
Prince Canuma	d52e567c56	Enhance precision in denormalization and normalization processes - Updated `denormalize` and `pixel_norm` methods in `LTX2VideoDecoder` and `PerChannelStatistics` classes to cast mean and standard deviation to float32 for improved precision. - Ensured that the output of normalization operations retains the original data type of the input tensor.	2026-01-17 01:14:29 +01:00
Prince Canuma	146f5d2981	Add image-to-video (I2V) conditioning support - Introduced `load_image`, `prepare_image_for_encoding`, and `apply_conditioning` functions for handling image inputs and conditioning during video generation. - Enhanced `generate_video` and `denoise_av` functions to accept optional image inputs for I2V conditioning. - Updated command-line interface to include parameters for image conditioning, such as `--image`, `--image-strength`, and `--image-frame-idx`. - Added new `VideoConditionByLatentIndex` and `LatentState` classes for managing latent states with conditioning. - Implemented VAE encoder loading and image encoding for conditioning in the video generation process.	2026-01-17 00:19:52 +01:00
Prince Canuma	5f86e881d7	Update top_p parameter in sampler function to 1.0 for enhanced sampling control in LTX2TextEncoder	2026-01-16 21:08:14 +01:00
Prince Canuma	f6e0e5d5a4	Update av_ca_timestep_scale_multiplier to 1000 in model configuration for consistency across modules	2026-01-16 15:59:22 +01:00
Prince Canuma	a658911f98	add audio	2026-01-16 01:15:22 +01:00
Prince Canuma	81daf3f67d	Add prompt enhancement feature to video generation - Introduced `enhance_prompt`, `max_tokens`, and `temperature` parameters in `generate_video` function for improved prompt handling. - Implemented prompt enhancement logic using the new `enhance_t2v` method in the text encoder. - Added command-line arguments for prompt enhancement options. - Created new system prompt files for T2V and I2V generation to guide the enhancement process.	2026-01-15 14:31:00 +01:00
Prince Canuma	f5134fa172	adjust gelu and precision	2026-01-15 12:49:21 +01:00
Prince Canuma	349a82f763	Refactor GroupNorm3d: Optimize data type handling by casting input, weight, and bias to float32 for consistency and performance	2026-01-15 04:46:56 +01:00
Prince Canuma	09c2b460a7	Refactor LTX2VideoDecoder and ResBlockGroup: Change up_blocks and res_blocks from lists to dictionaries for better parameter tracking in MLX	2026-01-15 03:48:16 +01:00
Prince Canuma	3fcd8f90be	Refactor LTXModel: Change transformer_blocks from list to dictionary	2026-01-15 03:47:52 +01:00
Prince Canuma	e7067fea11	Refactor LTX2VideoDecoder: Remove redundant comments for residual parameter	2026-01-14 01:21:43 +01:00
Prince Canuma	957093c29b	use numpy for improved float64 precision and performance	2026-01-14 00:03:00 +01:00
Prince Canuma	74af04718d	Remove commented-out code and clean up text encoder initialization	2026-01-13 23:31:54 +01:00
Prince Canuma	ea063f7550	Cast LM weights to bfloat16	2026-01-13 23:30:26 +01:00
Prince Canuma	fc6ef20c1b	Add custom text encoder with quantization Co-authored-by: HimanshU Mourya <40685364+codingstark-dev@users.noreply.github.com>	2026-01-13 22:56:51 +01:00
Prince Canuma	01d895bc77	Add frame number validation in video generation and update Gemma3 text encoder to use validated mlx-vlm implementation	2026-01-13 17:12:11 +01:00
Prince Canuma	7114b023bd	- Refactor video generation script - Introduced argparse for parameter handling, streamlined model loading, and enhanced denoising functions. - Updated VAE weight sanitization for compatibility and improved activation function handling in text projection. - Added support for saving individual frames and refined output video generation process.	2026-01-12 14:04:53 +01:00
Prince Canuma	d1ca36a315	initial commit (LTX-2)	2026-01-11 23:48:33 +01:00

35 Commits