Commit Graph

76 Commits

Author SHA1 Message Date
Prince Canuma
2681f75d2f Refactor LTXModel to include a from_pretrained class method for loading and sanitizing model weights. Update generate.py to utilize this method, streamlining the transformer loading process and improving code clarity. 2026-01-20 12:56:29 +01:00
Prince Canuma
4cd58f8b26 Refactor LTX2TextEncoder to utilize Rich for progress tracking during token generation. Replace tqdm with Rich's Progress for enhanced console output and user experience. Clean up imports and streamline the generation process. 2026-01-19 02:13:10 +01:00
Prince Canuma
e483eab039 Optimize positional embedding handling in TransformerArgsPreprocessor and improve RoPE frequency computation in _precompute_freqs_cis_double_precision for enhanced performance and precision. 2026-01-18 11:13:32 +01:00
Prince Canuma
b1bf9e2dc0 Enhance video generation with progress bar for streaming and remove debug prints from tiling decoder 2026-01-17 23:53:53 +01:00
Prince Canuma
7f20840dc7 Add streaming support to video generation 2026-01-17 23:17:08 +01:00
Prince Canuma
61c56cd989 Add RoPE tests and warning for bfloat16 precision loss in RoPE calculations 2026-01-17 19:28:05 +01:00
Prince Canuma
883c6b0ad8 ensure dtype cast 2026-01-17 13:03:48 +01:00
Prince Canuma
e4cdbb7eab add vae tiling 2026-01-17 07:51:54 +01:00
Prince Canuma
d52e567c56 Enhance precision in denormalization and normalization processes
- Updated `denormalize` and `pixel_norm` methods in `LTX2VideoDecoder` and `PerChannelStatistics` classes to cast mean and standard deviation to float32 for improved precision.
- Ensured that the output of normalization operations retains the original data type of the input tensor.
2026-01-17 01:14:29 +01:00
Prince Canuma
146f5d2981 Add image-to-video (I2V) conditioning support
- Introduced `load_image`, `prepare_image_for_encoding`, and `apply_conditioning` functions for handling image inputs and conditioning during video generation.
- Enhanced `generate_video` and `denoise_av` functions to accept optional image inputs for I2V conditioning.
- Updated command-line interface to include parameters for image conditioning, such as `--image`, `--image-strength`, and `--image-frame-idx`.
- Added new `VideoConditionByLatentIndex` and `LatentState` classes for managing latent states with conditioning.
- Implemented VAE encoder loading and image encoding for conditioning in the video generation process.
2026-01-17 00:19:52 +01:00
Prince Canuma
5f86e881d7 Update top_p parameter in sampler function to 1.0 for enhanced sampling control in LTX2TextEncoder 2026-01-16 21:08:14 +01:00
Prince Canuma
f6e0e5d5a4 Update av_ca_timestep_scale_multiplier to 1000 in model configuration for consistency across modules 2026-01-16 15:59:22 +01:00
Prince Canuma
a658911f98 add audio 2026-01-16 01:15:22 +01:00
Prince Canuma
81daf3f67d Add prompt enhancement feature to video generation
- Introduced `enhance_prompt`, `max_tokens`, and `temperature` parameters in `generate_video` function for improved prompt handling.
- Implemented prompt enhancement logic using the new `enhance_t2v` method in the text encoder.
- Added command-line arguments for prompt enhancement options.
- Created new system prompt files for T2V and I2V generation to guide the enhancement process.
2026-01-15 14:31:00 +01:00
Prince Canuma
f5134fa172 adjust gelu and precision 2026-01-15 12:49:21 +01:00
Prince Canuma
349a82f763 Refactor GroupNorm3d: Optimize data type handling by casting input, weight, and bias to float32 for consistency and performance 2026-01-15 04:46:56 +01:00
Prince Canuma
09c2b460a7 Refactor LTX2VideoDecoder and ResBlockGroup: Change up_blocks and res_blocks from lists to dictionaries for better parameter tracking in MLX 2026-01-15 03:48:16 +01:00
Prince Canuma
3fcd8f90be Refactor LTXModel: Change transformer_blocks from list to dictionary 2026-01-15 03:47:52 +01:00
Prince Canuma
e7067fea11 Refactor LTX2VideoDecoder: Remove redundant comments for residual parameter 2026-01-14 01:21:43 +01:00
Prince Canuma
957093c29b use numpy for improved float64 precision and performance 2026-01-14 00:03:00 +01:00
Prince Canuma
74af04718d Remove commented-out code and clean up text encoder initialization 2026-01-13 23:31:54 +01:00
Prince Canuma
ea063f7550 Cast LM weights to bfloat16 2026-01-13 23:30:26 +01:00
Prince Canuma
fc6ef20c1b Add custom text encoder with quantization
Co-authored-by: HimanshU Mourya <40685364+codingstark-dev@users.noreply.github.com>
2026-01-13 22:56:51 +01:00
Prince Canuma
01d895bc77 Add frame number validation in video generation and update Gemma3 text encoder to use validated mlx-vlm implementation 2026-01-13 17:12:11 +01:00
Prince Canuma
7114b023bd - Refactor video generation script
- Introduced argparse for parameter handling, streamlined model loading, and enhanced denoising functions.
- Updated VAE weight sanitization for compatibility and improved activation function handling in text projection.
- Added support for saving individual frames and refined output video generation process.
2026-01-12 14:04:53 +01:00
Prince Canuma
d1ca36a315 initial commit (LTX-2) 2026-01-11 23:48:33 +01:00
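
Note on the RoPE precision commits (61c56cd989, 957093c29b, e483eab039): the log mentions a warning for bfloat16 precision loss in RoPE calculations and a move to float64 for the frequency computation. The following is a minimal sketch of why that matters. It is not the repository's code. It assumes the textbook RoPE angle `position * theta ** (-2 * i / head_dim)`, which may differ from what LTX-2 actually computes, and the helper names (`to_bfloat16`, `rope_angles`, `rope_angles_bf16`, `max_cos_error`) are hypothetical. bfloat16 is emulated with the standard library by rounding the float32 bit pattern to its top 16 bits.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Round a Python float to the nearest bfloat16 value (ties to even).

    Hypothetical helper: bfloat16 is emulated by keeping the top 16 bits
    of the float32 bit pattern.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a bias so that truncating the low 16 bits rounds to nearest even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def rope_angles(position: int, head_dim: int, theta: float = 10000.0) -> list:
    """Rotation angles for one position in float64 (Python floats are doubles).

    Assumes the textbook RoPE formula, not necessarily the LTX-2 variant.
    """
    return [position * theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def rope_angles_bf16(position: int, head_dim: int, theta: float = 10000.0) -> list:
    """Same angles, but with the position, frequency, and product rounded to bfloat16."""
    angles = []
    for i in range(head_dim // 2):
        inv_freq = to_bfloat16(theta ** (-2.0 * i / head_dim))
        angles.append(to_bfloat16(to_bfloat16(float(position)) * inv_freq))
    return angles


def max_cos_error(position: int, head_dim: int) -> float:
    """Largest absolute difference in cos(angle) between float64 and emulated bfloat16."""
    exact = rope_angles(position, head_dim)
    approx = rope_angles_bf16(position, head_dim)
    return max(abs(math.cos(a) - math.cos(b)) for a, b in zip(exact, approx))


if __name__ == "__main__":
    # bfloat16 has 8 significant bits, so position 1001 is stored as 1000.
    print(to_bfloat16(1001.0))
    print(max_cos_error(1001, 64))
```

Because bfloat16 cannot represent every integer position beyond 256, the rotation angle at long sequence positions can be off by a radian or more, which changes the cosine and sine terms substantially. Computing the angles in higher precision and casting only the final values avoids this.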