Prince Canuma
a6a6bb2166
Move weight loading functions to a new file for better organization and maintainability
2026-03-16 17:28:06 +01:00
Prince Canuma
3a0da19adb
Refactor LTX-2 model structure
2026-03-16 14:50:01 +01:00
Prince Canuma
6f6105b715
Add audio to video conditioning
2026-03-16 01:42:11 +01:00
Prince Canuma
cecd68197c
fix tiling, rope precision and weights
2026-03-15 22:58:55 +01:00
Prince Canuma
53bae534e7
fix LTX-2.3 audio
2026-03-15 02:06:35 +01:00
Prince Canuma
eb0d1355e4
Fix LTX-2.3 decoder grainy bug
2026-03-14 21:56:03 +01:00
Prince Canuma
ffe271699a
Refactor LoRA loading for v2.3 in generate.py to prioritize distilled-lora files over full model weights, enhancing model flexibility. Update key sanitization logic to utilize a replacement list for improved readability and maintainability. Modify denoise_dev_av function to include sigma parameters for audio and video modalities, ensuring consistent handling of latent variables during processing. Adjust Vocoder weight loading to allow for non-strict loading, accommodating additional keys in model weights.
2026-03-14 15:24:50 +01:00
Prince Canuma
9cba2ea7cd
Enhance README.md with new usage examples for STG and modality scale parameters in video generation. Update generate.py to support STG and modality guidance in the denoising process, allowing for improved audio-visual integration. Refactor attention mechanisms in the transformer to include options for skipping self-attention, facilitating STG perturbation and modality isolation. Update LTXModel and transformer block processing to accommodate new parameters for enhanced flexibility in model configurations.
2026-03-14 10:26:12 +01:00
Prince Canuma
e0aafd72fc
Refactor generate.py to ensure temporal coordinates and position grids are processed in bfloat16 for consistency with PyTorch's precision behavior. Update denoise_dev_av function to apply standard ratio rescaling for audio and video guidance, enhancing numerical fidelity and model compatibility.
2026-03-12 21:26:38 +01:00
Prince Canuma
b07b1e3213
Update .gitignore to exclude additional configuration and model files. Modify generate.py to enhance console output with rescale parameter and adjust default values for inference steps and CFG scale. Refactor text encoder to align positional embedding max position with PyTorch defaults, improving compatibility and performance.
2026-03-12 17:13:43 +01:00
Prince Canuma
d1fa47722b
Fix timestep_conditioning logic in infer_vae_decoder_config to ensure consistent behavior based on has_timestep flag.
2026-03-11 18:30:29 +01:00
Prince Canuma
207c223354
Add LTX-2.3 model architecture with prompt-conditioned adaptive layer normalization (adaln) support. Introduce gating mechanisms in attention modules and update transformer configurations to accommodate new parameters. Refactor video and audio processing to utilize adaptive normalization, improving model flexibility and performance. Update weight loading and initialization logic to support dynamic block structures in the decoder.
2026-03-10 16:47:36 +01:00
Prince Canuma
d028b239fb
Update LTX conversion script to support LTX-2.3 safetensors format. Enhance documentation and improve file matching logic for variant detection in local directories.
2026-03-10 08:01:26 +01:00
Prince Canuma
576e01da14
Implement linking of text encoder and tokenizer directories in conversion process. Enhance error handling in LTX2TextEncoder for tokenizer loading, providing a fallback model if the specified path is unavailable.
2026-03-09 18:25:32 +01:00
Prince Canuma
41ed62f7e8
Add LTX-2 conversion script for safetensors to MLX directory layout. Implement modular structure
2026-03-09 18:16:20 +01:00
Prince Canuma
9f37dab076
Refactor model loading in generate.py to use dynamic model paths for audio and video components. Simplify weight loading logic in LTX2TextEncoder to accommodate both monolithic and reformatted model structures. Introduce a check for existing model paths in get_model_path function to enhance robustness.
2026-03-09 15:51:21 +01:00
Prince Canuma
cb2d19c84d
fix loading
2026-01-24 01:37:38 +01:00
Prince Canuma
ef76ec0921
add from pretrained
2026-01-23 18:13:51 +01:00
Prince Canuma
ce39e744c3
Refactor VideoEncoder to initialize from VideoEncoderModelConfig, enhancing configuration management. Add methods for weight sanitization and loading from pretrained models, improving model usability and integration with existing workflows.
2026-01-23 17:59:57 +01:00
Prince Canuma
f8f78aeab5
Add LTXModel with a from_pretrained class method for loading model weights from a specified path. Update weight sanitization to handle positional embeddings and dtype consistency. Refactor timestep and context preparation methods to accept hidden_dtype, improving flexibility in model processing.
2026-01-23 17:45:50 +01:00
Prince Canuma
df753312c7
Refactor video generation and model loading processes to utilize from_pretrained methods for VideoEncoder and VideoDecoder. Update denoising functions to include a cfg_rescale parameter for improved artifact reduction. Ensure consistent dtype handling across audio and video processing, enhancing precision and aligning with PyTorch behavior.
2026-01-23 17:39:02 +01:00
Prince Canuma
02bfa228d9
Refactor weight loading and sanitization processes for audio models
2026-01-23 17:31:25 +01:00
Prince Canuma
2681f75d2f
Refactor LTXModel to include a from_pretrained class method for loading and sanitizing model weights. Update generate.py to utilize this method, streamlining the transformer loading process and improving code clarity.
2026-01-20 12:56:29 +01:00
Prince Canuma
4cd58f8b26
Refactor LTX2TextEncoder to utilize Rich for progress tracking during token generation. Replace tqdm with Rich's Progress for enhanced console output and user experience. Clean up imports and streamline the generation process.
2026-01-19 02:13:10 +01:00
Prince Canuma
e483eab039
Optimize positional embedding handling in TransformerArgsPreprocessor and improve RoPE frequency computation in _precompute_freqs_cis_double_precision for enhanced performance and precision.
2026-01-18 11:13:32 +01:00
Prince Canuma
b1bf9e2dc0
Enhance video generation with progress bar for streaming and remove debug prints from tiling decoder
2026-01-17 23:53:53 +01:00
Prince Canuma
7f20840dc7
Add streaming support to video generation
2026-01-17 23:17:08 +01:00
Prince Canuma
61c56cd989
Add RoPE tests and warning for bfloat16 precision loss in RoPE calculations
2026-01-17 19:28:05 +01:00
Prince Canuma
883c6b0ad8
ensure dtype cast
2026-01-17 13:03:48 +01:00
Prince Canuma
e4cdbb7eab
add vae tiling
2026-01-17 07:51:54 +01:00
Prince Canuma
d52e567c56
Enhance precision in denormalization and normalization processes
- Updated `denormalize` and `pixel_norm` methods in `LTX2VideoDecoder` and `PerChannelStatistics` classes to cast mean and standard deviation to float32 for improved precision.
- Ensured that the output of normalization operations retains the original data type of the input tensor.
2026-01-17 01:14:29 +01:00
Prince Canuma
146f5d2981
Add image-to-video (I2V) conditioning support
- Introduced `load_image`, `prepare_image_for_encoding`, and `apply_conditioning` functions for handling image inputs and conditioning during video generation.
- Enhanced `generate_video` and `denoise_av` functions to accept optional image inputs for I2V conditioning.
- Updated command-line interface to include parameters for image conditioning, such as `--image`, `--image-strength`, and `--image-frame-idx`.
- Added new `VideoConditionByLatentIndex` and `LatentState` classes for managing latent states with conditioning.
- Implemented VAE encoder loading and image encoding for conditioning in the video generation process.
2026-01-17 00:19:52 +01:00
Prince Canuma
5f86e881d7
Update top_p parameter in sampler function to 1.0 for enhanced sampling control in LTX2TextEncoder
2026-01-16 21:08:14 +01:00
Prince Canuma
f6e0e5d5a4
Update av_ca_timestep_scale_multiplier to 1000 in model configuration for consistency across modules
2026-01-16 15:59:22 +01:00
Prince Canuma
a658911f98
add audio
2026-01-16 01:15:22 +01:00
Prince Canuma
81daf3f67d
Add prompt enhancement feature to video generation
- Introduced `enhance_prompt`, `max_tokens`, and `temperature` parameters in `generate_video` function for improved prompt handling.
- Implemented prompt enhancement logic using the new `enhance_t2v` method in the text encoder.
- Added command-line arguments for prompt enhancement options.
- Created new system prompt files for T2V and I2V generation to guide the enhancement process.
2026-01-15 14:31:00 +01:00
Prince Canuma
f5134fa172
adjust gelu and precision
2026-01-15 12:49:21 +01:00
Prince Canuma
349a82f763
Refactor GroupNorm3d: Optimize data type handling by casting input, weight, and bias to float32 for consistency and performance
2026-01-15 04:46:56 +01:00
Prince Canuma
09c2b460a7
Refactor LTX2VideoDecoder and ResBlockGroup: Change up_blocks and res_blocks from lists to dictionaries for better parameter tracking in MLX
2026-01-15 03:48:16 +01:00
Prince Canuma
3fcd8f90be
Refactor LTXModel: Change transformer_blocks from list to dictionary
2026-01-15 03:47:52 +01:00
Prince Canuma
e7067fea11
Refactor LTX2VideoDecoder: Remove redundant comments for residual parameter
2026-01-14 01:21:43 +01:00
Prince Canuma
957093c29b
use numpy for improved float64 precision and performance
2026-01-14 00:03:00 +01:00
Prince Canuma
74af04718d
Remove commented-out code and clean up text encoder initialization
2026-01-13 23:31:54 +01:00
Prince Canuma
ea063f7550
Cast LM weights to bfloat16
2026-01-13 23:30:26 +01:00
Prince Canuma
fc6ef20c1b
Add custom text encoder with quantization
Co-authored-by: HimanshU Mourya <40685364+codingstark-dev@users.noreply.github.com>
2026-01-13 22:56:51 +01:00
Prince Canuma
01d895bc77
Add frame number validation in video generation and update Gemma3 text encoder to use validated mlx-vlm implementation
2026-01-13 17:12:11 +01:00
Prince Canuma
7114b023bd
Refactor video generation script
- Introduced argparse for parameter handling, streamlined model loading, and enhanced denoising functions.
- Updated VAE weight sanitization for compatibility and improved activation function handling in text projection.
- Added support for saving individual frames and refined output video generation process.
2026-01-12 14:04:53 +01:00
Prince Canuma
d1ca36a315
initial commit (LTX-2)
2026-01-11 23:48:33 +01:00