Commit Graph

106 Commits

Author SHA1 Message Date
Prince Canuma
207c223354 Add LTX-2.3 model architecture with prompt-conditioned adaptive layer normalization (adaln) support. Introduce gating mechanisms in attention modules and update transformer configurations to accommodate new parameters. Refactor video and audio processing to utilize adaptive normalization, improving model flexibility and performance. Update weight loading and initialization logic to support dynamic block structures in the decoder. 2026-03-10 16:47:36 +01:00
Prince Canuma
d028b239fb Update LTX conversion script to support LTX-2.3 safetensors format. Enhance documentation and improve file matching logic for variant detection in local directories. 2026-03-10 08:01:26 +01:00
Prince Canuma
576e01da14 Implement linking of text encoder and tokenizer directories in conversion process. Enhance error handling in LTX2TextEncoder for tokenizer loading, providing a fallback model if the specified path is unavailable. 2026-03-09 18:25:32 +01:00
Prince Canuma
41ed62f7e8 Add LTX-2 conversion script for safetensors to MLX directory layout. Implement modular structure 2026-03-09 18:16:20 +01:00
Prince Canuma
9f37dab076 Refactor model loading in generate.py to use dynamic model paths for audio and video components. Simplify weight loading logic in LTX2TextEncoder to accommodate both monolithic and reformatted model structures. Introduce a check for existing model paths in get_model_path function to enhance robustness. 2026-03-09 15:51:21 +01:00
Prince Canuma
d1dd30cbac Add Adaptive Projected Guidance (APG) support to denoising functions. Introduce apg_delta function for stable guidance by decomposing into parallel and orthogonal components. Update denoise_dev and generate_video functions to accept APG parameters, enhancing flexibility in video generation. Modify command-line arguments for APG integration. 2026-01-26 21:35:58 +01:00
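Note on d1dd30cbac: the parallel/orthogonal decomposition named in this commit can be sketched as below. This is a minimal illustration, not the repository's `apg_delta`. The function name `apg_delta_sketch`, its signature, the `parallel_scale` parameter, and the choice to project onto the conditional prediction are all assumptions, and NumPy stands in for MLX so the snippet runs anywhere.

```python
import numpy as np


def apg_delta_sketch(cond, uncond, parallel_scale=0.0):
    """Hypothetical sketch: split (cond - uncond) into parts parallel and
    orthogonal to cond, then down-weight the parallel part."""
    cond32 = np.asarray(cond, dtype=np.float32)
    diff = cond32 - np.asarray(uncond, dtype=np.float32)

    # Reduce over every axis except the leading (batch) axis.
    axes = tuple(range(1, cond32.ndim))

    # Unit vector along the conditional prediction (assumed reference direction).
    norm = np.sqrt(np.sum(cond32 * cond32, axis=axes, keepdims=True))
    unit = cond32 / np.maximum(norm, 1e-12)

    # Projection of the guidance difference onto that direction.
    parallel = np.sum(diff * unit, axis=axes, keepdims=True) * unit
    orthogonal = diff - parallel

    # parallel_scale=1.0 recovers the plain guidance difference.
    return orthogonal + parallel_scale * parallel
```

With `parallel_scale=1.0` the result equals the ordinary guidance difference, so the sketch only changes behavior when the parallel part is scaled down.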
Prince Canuma
87962c7f83 Enhance precision in denoising functions by ensuring all latents and calculations are consistently handled in float32. Update model input casting and return types to maintain dtype integrity across audio and video processing. Add precision parameter to video generation for improved memory management. 2026-01-24 15:40:42 +01:00
Prince Canuma
cb2d19c84d fix loading 2026-01-24 01:37:38 +01:00
Prince Canuma
ef76ec0921 add from pretrained 2026-01-23 18:13:51 +01:00
Prince Canuma
ce39e744c3 Refactor VideoEncoder to initialize from VideoEncoderModelConfig, enhancing configuration management. Add methods for weight sanitization and loading from pretrained models, improving model usability and integration with existing workflows. 2026-01-23 17:59:57 +01:00
Prince Canuma
f8f78aeab5 Add LTXModel with a from_pretrained class method for loading model weights from a specified path. Update weight sanitization to handle positional embeddings and dtype consistency. Refactor timestep and context preparation methods to accept hidden_dtype, improving flexibility in model processing. 2026-01-23 17:45:50 +01:00
Prince Canuma
df753312c7 Refactor video generation and model loading processes to utilize from_pretrained methods for VideoEncoder and VideoDecoder. Update denoising functions to include a cfg_rescale parameter for improved artifact reduction. Ensure consistent dtype handling across audio and video processing, enhancing precision and aligning with PyTorch behavior. 2026-01-23 17:39:02 +01:00
Prince Canuma
02bfa228d9 Refactor weight loading and sanitization processes for audio models 2026-01-23 17:31:25 +01:00
Prince Canuma
2681f75d2f Refactor LTXModel to include a from_pretrained class method for loading and sanitizing model weights. Update generate.py to utilize this method, streamlining the transformer loading process and improving code clarity. 2026-01-20 12:56:29 +01:00
Prince Canuma
bbb3de6aa7 Update audio decoder configuration to disable mid-block attention and ensure audio waveform is converted to float32 for consistency in processing. 2026-01-19 17:05:59 +01:00
Prince Canuma
8a2ea38c88 Refactor denoising functions in generate.py and utils.py to use float32 for improved precision, aligning with PyTorch behavior. Update calculations for latents and denoised outputs to ensure consistent dtype handling across audio and video processing. 2026-01-19 09:13:04 +01:00
Prince Canuma
e0ee934b99 Update video generation completion message to display elapsed time in a more user-friendly format, showing minutes and seconds instead of just seconds. 2026-01-19 02:23:51 +01:00
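Note on e0ee934b99: a minutes-and-seconds completion message could be produced roughly as follows. The helper name `format_elapsed` and the exact output format are assumptions for illustration; the commit does not state the format it uses.

```python
def format_elapsed(seconds: float) -> str:
    """Hypothetical helper: render a duration as minutes and seconds."""
    # Round to whole seconds, then split into minutes and leftover seconds.
    minutes, secs = divmod(int(round(seconds)), 60)
    if minutes:
        return f"{minutes}m {secs:02d}s"
    return f"{secs}s"
```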
Prince Canuma
4cd58f8b26 Refactor LTX2TextEncoder to utilize Rich for progress tracking during token generation. Replace tqdm with Rich's Progress for enhanced console output and user experience. Clean up imports and streamline the generation process. 2026-01-19 02:13:10 +01:00
Prince Canuma
ac67ee8b1e Remove the generate_dev.py file, consolidating its functionality into generate.py. Enhance the video generation pipeline to support both distilled and dev models, integrating dynamic sigma scheduling and classifier-free guidance (CFG) for improved video quality. Update command-line interface to accommodate new pipeline options and refactor related functions for better maintainability. 2026-01-19 02:13:00 +01:00
Prince Canuma
0538af6554 Enhance video generation pipeline by integrating Rich for styled console output and progress tracking. Update dependencies in pyproject.toml to include Rich. Refactor print statements to use console methods for improved user experience during video and audio processing. 2026-01-19 01:43:14 +01:00
Prince Canuma
cae11291a9 Remove the audio-video generation pipeline from generate_av.py and integrate audio capabilities into generate.py. This includes adding audio position grid creation, audio frame computation, and updating the denoising function to handle audio latents. Enhance the command-line interface to support audio generation options and update the model configuration accordingly. 2026-01-19 01:28:53 +01:00
Prince Canuma
749762a0b9 Update audio decoder configuration to use an empty set for attention resolutions in both generate_av.py and generate_dev.py. Add a print statement for loading audio VAE decoder weights in generate_dev.py. 2026-01-18 21:55:38 +01:00
Prince Canuma
7069cc39c9 Add audio generation capabilities to video pipeline, including audio position grid creation, audio frame computation, and integration of audio VAE and vocoder. Update tests to cover new audio functionalities. 2026-01-18 21:28:56 +01:00
Prince Canuma
e483eab039 Optimize positional embedding handling in TransformerArgsPreprocessor and improve RoPE frequency computation in _precompute_freqs_cis_double_precision for enhanced performance and precision. 2026-01-18 11:13:32 +01:00
Prince Canuma
62fc4805a0 Add LTX-2 Dev Model video generation pipeline 2026-01-18 11:13:11 +01:00
Prince Canuma
b1bf9e2dc0 Enhance video generation with progress bar for streaming and remove debug prints from tiling decoder 2026-01-17 23:53:53 +01:00
Prince Canuma
7f20840dc7 Add streaming support to video generation 2026-01-17 23:17:08 +01:00
Prince Canuma
61c56cd989 Add RoPE tests and warning for bfloat16 precision loss in RoPE calculations 2026-01-17 19:28:05 +01:00
Prince Canuma
78244a2d66 Cast dtype to bf16 in video and audio generation processes 2026-01-17 17:20:22 +01:00
Prince Canuma
883c6b0ad8 ensure dtype cast 2026-01-17 13:03:48 +01:00
Prince Canuma
e4cdbb7eab add vae tiling 2026-01-17 07:51:54 +01:00
Prince Canuma
f607112407 Refactor video and audio latent generation in generate_video and generate_video_with_audio
- Removed direct initialization of latents with random noise, replacing it with a conditional approach based on I2V (Image-to-Video) conditioning.
- Introduced a structured flow for applying noise during the latent state creation, enhancing the conditioning process for both video and audio.
- Updated the noise application logic to ensure proper handling of conditioned and unconditioned frames in both stages of video generation.
- Improved code clarity and maintainability by consolidating latent shape definitions and restructuring noise application logic.
2026-01-17 01:38:53 +01:00
Prince Canuma
d52e567c56 Enhance precision in denormalization and normalization processes
- Updated `denormalize` and `pixel_norm` methods in `LTX2VideoDecoder` and `PerChannelStatistics` classes to cast mean and standard deviation to float32 for improved precision.
- Ensured that the output of normalization operations retains the original data type of the input tensor.
2026-01-17 01:14:29 +01:00
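Note on d52e567c56: the pattern described here (compute with float32 statistics, return the input's dtype) can be sketched as below. The function name `denormalize_sketch` and its signature are hypothetical, NumPy stands in for MLX, and float16 stands in for bfloat16 because NumPy has no bfloat16 type. Upcasting the input as well is an assumption of this sketch; the commit mentions only the mean and standard deviation.

```python
import numpy as np


def denormalize_sketch(x, mean, std):
    """Hypothetical sketch: undo per-channel normalization in float32,
    then cast the result back to the dtype of the input."""
    out = (
        x.astype(np.float32) * std.astype(np.float32)
        + mean.astype(np.float32)
    )
    # Preserve the caller's dtype so downstream code sees no change.
    return out.astype(x.dtype)
```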
Prince Canuma
146f5d2981 Add image-to-video (I2V) conditioning support
- Introduced `load_image`, `prepare_image_for_encoding`, and `apply_conditioning` functions for handling image inputs and conditioning during video generation.
- Enhanced `generate_video` and `denoise_av` functions to accept optional image inputs for I2V conditioning.
- Updated command-line interface to include parameters for image conditioning, such as `--image`, `--image-strength`, and `--image-frame-idx`.
- Added new `VideoConditionByLatentIndex` and `LatentState` classes for managing latent states with conditioning.
- Implemented VAE encoder loading and image encoding for conditioning in the video generation process.
2026-01-17 00:19:52 +01:00
Prince Canuma
5f86e881d7 Update top_p parameter in sampler function to 1.0 for enhanced sampling control in LTX2TextEncoder 2026-01-16 21:08:14 +01:00
Prince Canuma
f6e0e5d5a4 Update av_ca_timestep_scale_multiplier to 1000 in model configuration for consistency across modules 2026-01-16 15:59:22 +01:00
Prince Canuma
e1bff927df Auto-detect timestep_cond from model metadata () 2026-01-16 14:55:50 +01:00
Prince Canuma
a658911f98 add audio 2026-01-16 01:15:22 +01:00
Prince Canuma
81daf3f67d Add prompt enhancement feature to video generation
- Introduced `enhance_prompt`, `max_tokens`, and `temperature` parameters in `generate_video` function for improved prompt handling.
- Implemented prompt enhancement logic using the new `enhance_t2v` method in the text encoder.
- Added command-line arguments for prompt enhancement options.
- Created new system prompt files for T2V and I2V generation to guide the enhancement process.
2026-01-15 14:31:00 +01:00
Prince Canuma
f5134fa172 adjust gelu and precision 2026-01-15 12:49:21 +01:00
Prince Canuma
349a82f763 Refactor GroupNorm3d: Optimize data type handling by casting input, weight, and bias to float32 for consistency and performance 2026-01-15 04:46:56 +01:00
Prince Canuma
09c2b460a7 Refactor LTX2VideoDecoder and ResBlockGroup: Change up_blocks and res_blocks from lists to dictionaries for better parameter tracking in MLX 2026-01-15 03:48:16 +01:00
Prince Canuma
3fcd8f90be Refactor LTXModel: Change transformer_blocks from list to dictionary 2026-01-15 03:47:52 +01:00
Prince Canuma
e7067fea11 Refactor LTX2VideoDecoder: Remove redundant comments for residual parameter 2026-01-14 01:21:43 +01:00
Prince Canuma
957093c29b use numpy for improved float64 precision and performance 2026-01-14 00:03:00 +01:00
Prince Canuma
74af04718d Remove commented-out code and clean up text encoder initialization 2026-01-13 23:31:54 +01:00
Prince Canuma
ea063f7550 Cast LM weights to bfloat16 2026-01-13 23:30:26 +01:00
Prince Canuma
fc6ef20c1b Add custom text encoder with quantization
Co-authored-by: HimanshU Mourya <40685364+codingstark-dev@users.noreply.github.com>
2026-01-13 22:56:51 +01:00
Prince Canuma
01d895bc77 Add frame number validation in video generation and update Gemma3 text encoder to use validated mlx-vlm implementation 2026-01-13 17:12:11 +01:00
Prince Canuma
4f6fc8252c Add example usage to README and enhance console output in generate.py with ANSI colors 2026-01-12 16:45:09 +01:00