Prince Canuma
53bae534e7
fix LTX-2.3 audio
2026-03-15 02:06:35 +01:00
Prince Canuma
5644492f7d
Update generate.py to enhance denoising functionality with optional Spatiotemporal Guidance (STG) support. Modify DEFAULT_NEGATIVE_PROMPT for improved clarity and detail. Implement auto-detection of STG blocks based on transformer configuration. Refactor denoise_dev function to incorporate STG parameters, allowing for more flexible audio-visual integration during video generation.
2026-03-14 20:02:42 +01:00
Prince Canuma
ffe271699a
Refactor LoRA loading for v2.3 in generate.py to prioritize distilled-lora files over full model weights, enhancing model flexibility. Update key sanitization logic to utilize a replacement list for improved readability and maintainability. Modify denoise_dev_av function to include sigma parameters for audio and video modalities, ensuring consistent handling of latent variables during processing. Adjust Vocoder weight loading to allow for non-strict loading, accommodating additional keys in model weights.
2026-03-14 15:24:50 +01:00
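Note on ffe271699a: the replacement-list approach to key sanitization can be sketched as below. This is a minimal illustration, not the code in generate.py; `KEY_REPLACEMENTS`, `sanitize_key`, `sanitize_weights`, and the example substrings are all hypothetical.

```python
# Hypothetical (old, new) pairs; the real list in generate.py is not shown in this log.
KEY_REPLACEMENTS = [
    ("wrapper.", ""),
    (".proj_out.0.", ".proj_out."),
]


def sanitize_key(key, replacements=KEY_REPLACEMENTS):
    """Apply each (old, new) substring replacement to a weight key, in order."""
    for old, new in replacements:
        key = key.replace(old, new)
    return key


def sanitize_weights(weights, replacements=KEY_REPLACEMENTS):
    """Return a new dict with every key sanitized; values are left untouched."""
    return {sanitize_key(k, replacements): v for k, v in weights.items()}
```

Keeping the pairs in one ordered list means a new rename is a one-line change, and the order of application stays explicit.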
Prince Canuma
9cba2ea7cd
Enhance README.md with new usage examples for STG and modality scale parameters in video generation. Update generate.py to support STG and modality guidance in the denoising process, allowing for improved audio-visual integration. Refactor attention mechanisms in the transformer to include options for skipping self-attention, facilitating STG perturbation and modality isolation. Update LTXModel and transformer block processing to accommodate new parameters for enhanced flexibility in model configurations.
2026-03-14 10:26:12 +01:00
Prince Canuma
f346e09de4
Refactor audio handling in generate_video function to preserve stage 1 audio latents during stage 2 processing. Remove redundant audio re-denoising steps, ensuring audio integrity while refining video output. Update comments for clarity on audio processing logic.
2026-03-13 16:09:07 +01:00
Prince Canuma
387d4fc301
improve dev color and quality
2026-03-13 09:51:24 +01:00
Prince Canuma
835ba33202
Enhance README.md with detailed descriptions of LTX-2 features, pipeline options, and usage examples for text-to-video, image-to-video, and audio-video generation. Update generate.py to improve LoRA loading functionality, allowing for local files, directories, or HuggingFace repos. This update improves flexibility in model configurations and enhances user guidance in the documentation.
2026-03-13 01:39:39 +01:00
Prince Canuma
7435facc52
Add support for DEV_TWO_STAGE pipeline and implement LoRA merging functionality in generate.py. Enhance video generation capabilities by allowing LoRA weights to be loaded and merged into the model, improving flexibility in model configurations. Update pipeline handling to accommodate the new two-stage generation process.
2026-03-13 01:22:45 +01:00
Prince Canuma
e0aafd72fc
Refactor generate.py to ensure temporal coordinates and position grids are processed in bfloat16 for consistency with PyTorch's precision behavior. Update denoise_dev_av function to apply standard ratio rescaling for audio and video guidance, enhancing numerical fidelity and model compatibility.
2026-03-12 21:26:38 +01:00
Prince Canuma
b07b1e3213
Update .gitignore to exclude additional configuration and model files. Modify generate.py to enhance console output with rescale parameter and adjust default values for inference steps and CFG scale. Refactor text encoder to align positional embedding max position with PyTorch defaults, improving compatibility and performance.
2026-03-12 17:13:43 +01:00
Prince Canuma
207c223354
Add LTX-2.3 model architecture with prompt-conditioned adaptive layer normalization (adaln) support. Introduce gating mechanisms in attention modules and update transformer configurations to accommodate new parameters. Refactor video and audio processing to utilize adaptive normalization, improving model flexibility and performance. Update weight loading and initialization logic to support dynamic block structures in the decoder.
2026-03-10 16:47:36 +01:00
Prince Canuma
9f37dab076
Refactor model loading in generate.py to use dynamic model paths for audio and video components. Simplify weight loading logic in LTX2TextEncoder to accommodate both monolithic and reformatted model structures. Introduce a check for existing model paths in get_model_path function to enhance robustness.
2026-03-09 15:51:21 +01:00
Prince Canuma
d1dd30cbac
Add Adaptive Projected Guidance (APG) support to denoising functions. Introduce apg_delta function for stable guidance by decomposing into parallel and orthogonal components. Update denoise_dev and generate_video functions to accept APG parameters, enhancing flexibility in video generation. Modify command-line arguments for APG integration.
2026-01-26 21:35:58 +01:00
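Note on d1dd30cbac: the parallel/orthogonal decomposition behind projected guidance can be sketched as below, on plain Python lists. This is an assumption-laden illustration, not the repository's `apg_delta`; the name `apg_delta_sketch`, the `eta` parameter, and the choice to project onto the conditional prediction are all assumptions.

```python
def apg_delta_sketch(cond, uncond, eta=0.0):
    """Split the guidance delta (cond - uncond) into components parallel and
    orthogonal to `cond`, then down-weight the parallel part by `eta`.

    Assumption: eta=0 keeps only the orthogonal component; eta=1 returns the
    plain delta unchanged.
    """
    delta = [c - u for c, u in zip(cond, uncond)]
    norm_sq = sum(c * c for c in cond)
    if norm_sq == 0.0:
        # No direction to project onto; fall back to the raw delta.
        return delta
    # Scalar projection coefficient of delta onto cond.
    scale = sum(d * c for d, c in zip(delta, cond)) / norm_sq
    parallel = [scale * c for c in cond]
    orthogonal = [d - p for d, p in zip(delta, parallel)]
    return [o + eta * p for o, p in zip(orthogonal, parallel)]
```

In a guidance step, the returned delta would be scaled and added to the prediction in place of the usual `cond - uncond` term.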
Prince Canuma
87962c7f83
Enhance precision in denoising functions by ensuring all latents and calculations are consistently handled in float32. Update model input casting and return types to maintain dtype integrity across audio and video processing. Add precision parameter to video generation for improved memory management.
2026-01-24 15:40:42 +01:00
Prince Canuma
cb2d19c84d
fix loading
2026-01-24 01:37:38 +01:00
Prince Canuma
df753312c7
Refactor video generation and model loading processes to utilize from_pretrained methods for VideoEncoder and VideoDecoder. Update denoising functions to include a cfg_rescale parameter for improved artifact reduction. Ensure consistent dtype handling across audio and video processing, enhancing precision and aligning with PyTorch behavior.
2026-01-23 17:39:02 +01:00
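Note on df753312c7: a common form of CFG rescaling matches the standard deviation of the guided output to that of the conditional prediction, then blends with the unrescaled result. The sketch below assumes that form; `cfg_with_rescale` and its parameters are hypothetical names, and the repository's `cfg_rescale` handling may differ.

```python
from statistics import pstdev


def cfg_with_rescale(cond, uncond, cfg_scale, rescale=0.7):
    """Classifier-free guidance followed by std-ratio rescaling (sketch).

    rescale=0 returns plain CFG; rescale=1 fully matches the std of `cond`.
    """
    guided = [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]
    std_guided = pstdev(guided)
    if rescale == 0.0 or std_guided == 0.0:
        return guided
    # Ratio that would bring the guided std back to the conditional std.
    ratio = pstdev(cond) / std_guided
    # Blend between the rescaled (ratio) and original (1.0) magnitudes.
    factor = rescale * ratio + (1.0 - rescale)
    return [g * factor for g in guided]
```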
Prince Canuma
2681f75d2f
Refactor LTXModel to include a from_pretrained class method for loading and sanitizing model weights. Update generate.py to utilize this method, streamlining the transformer loading process and improving code clarity.
2026-01-20 12:56:29 +01:00
Prince Canuma
bbb3de6aa7
Update audio decoder configuration to disable mid-block attention and ensure audio waveform is converted to float32 for consistency in processing.
2026-01-19 17:05:59 +01:00
Prince Canuma
8a2ea38c88
Refactor denoising functions in generate.py and utils.py to use float32 for improved precision, aligning with PyTorch behavior. Update calculations for latents and denoised outputs to ensure consistent dtype handling across audio and video processing.
2026-01-19 09:13:04 +01:00
Prince Canuma
e0ee934b99
Update video generation completion message to display elapsed time in a more user-friendly format, showing minutes and seconds instead of just seconds.
2026-01-19 02:23:51 +01:00
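Note on e0ee934b99: formatting elapsed time as minutes and seconds can be done with `divmod`. The helper below is a hypothetical sketch; the exact message format used in generate.py is not shown in this log.

```python
def format_elapsed(seconds: float) -> str:
    """Render a duration as minutes and seconds, e.g. 125.4 -> '2m 05s'."""
    total = int(round(seconds))
    minutes, secs = divmod(total, 60)
    if minutes:
        return f"{minutes}m {secs:02d}s"
    return f"{secs}s"
```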
Prince Canuma
ac67ee8b1e
Remove the generate_dev.py file, consolidating its functionality into generate.py. Enhance the video generation pipeline to support both distilled and dev models, integrating dynamic sigma scheduling and classifier-free guidance (CFG) for improved video quality. Update command-line interface to accommodate new pipeline options and refactor related functions for better maintainability.
2026-01-19 02:13:00 +01:00
Prince Canuma
0538af6554
Enhance video generation pipeline by integrating Rich for styled console output and progress tracking. Update dependencies in pyproject.toml to include Rich. Refactor print statements to use console methods for improved user experience during video and audio processing.
2026-01-19 01:43:14 +01:00
Prince Canuma
cae11291a9
Remove the audio-video generation pipeline from generate_av.py and integrate audio capabilities into generate.py. This includes adding audio position grid creation, audio frame computation, and updating the denoising function to handle audio latents. Enhance the command-line interface to support audio generation options and update the model configuration accordingly.
2026-01-19 01:28:53 +01:00
Prince Canuma
b1bf9e2dc0
Enhance video generation with progress bar for streaming and remove debug prints from tiling decoder
2026-01-17 23:53:53 +01:00
Prince Canuma
7f20840dc7
Add streaming support to video generation
2026-01-17 23:17:08 +01:00
Prince Canuma
78244a2d66
Cast dtype to bf16 in video and audio generation processes
2026-01-17 17:20:22 +01:00
Prince Canuma
e4cdbb7eab
add vae tiling
2026-01-17 07:51:54 +01:00
Prince Canuma
f607112407
Refactor video and audio latent generation in generate_video and generate_video_with_audio
- Removed direct initialization of latents with random noise, replacing it with a conditional approach based on I2V (Image-to-Video) conditioning.
- Introduced a structured flow for applying noise during the latent state creation, enhancing the conditioning process for both video and audio.
- Updated the noise application logic to ensure proper handling of conditioned and unconditioned frames in both stages of video generation.
- Improved code clarity and maintainability by consolidating latent shape definitions and restructuring noise application logic.
2026-01-17 01:38:53 +01:00
Prince Canuma
146f5d2981
Add image-to-video (I2V) conditioning support
- Introduced `load_image`, `prepare_image_for_encoding`, and `apply_conditioning` functions for handling image inputs and conditioning during video generation.
- Enhanced `generate_video` and `denoise_av` functions to accept optional image inputs for I2V conditioning.
- Updated command-line interface to include parameters for image conditioning, such as `--image`, `--image-strength`, and `--image-frame-idx`.
- Added new `VideoConditionByLatentIndex` and `LatentState` classes for managing latent states with conditioning.
- Implemented VAE encoder loading and image encoding for conditioning in the video generation process.
2026-01-17 00:19:52 +01:00
Prince Canuma
e1bff927df
Auto-detect timestep_cond from model metadata ()
2026-01-16 14:55:50 +01:00
Prince Canuma
a658911f98
add audio
2026-01-16 01:15:22 +01:00
Prince Canuma
81daf3f67d
Add prompt enhancement feature to video generation
- Introduced `enhance_prompt`, `max_tokens`, and `temperature` parameters in `generate_video` function for improved prompt handling.
- Implemented prompt enhancement logic using the new `enhance_t2v` method in the text encoder.
- Added command-line arguments for prompt enhancement options.
- Created new system prompt files for T2V and I2V generation to guide the enhancement process.
2026-01-15 14:31:00 +01:00
Prince Canuma
fc6ef20c1b
Add custom text encoder with quantization
Co-authored-by: HimanshU Mourya <40685364+codingstark-dev@users.noreply.github.com>
2026-01-13 22:56:51 +01:00
Prince Canuma
01d895bc77
Add frame number validation in video generation and update Gemma3 text encoder to use validated mlx-vlm implementation
2026-01-13 17:12:11 +01:00
Prince Canuma
4f6fc8252c
Add example usage to README and enhance console output in generate.py with ANSI colors
2026-01-12 16:45:09 +01:00
Prince Canuma
7eac6ae7de
Replace imageio with OpenCV for video saving in generate.py; update default frame count to 100.
2026-01-12 16:12:41 +01:00
Prince Canuma
666e1f2e0c
Refactor model path handling: moved get_model_path function to utils.py and updated generate.py to use the new import.
2026-01-12 15:54:32 +01:00
Prince Canuma
75511a0b17
Remove main.py and refactor video generation logic into generate.py.
2026-01-12 14:23:02 +01:00
Prince Canuma
d1ca36a315
initial commit (LTX-2)
2026-01-11 23:48:33 +01:00