freqs_dtype = torch.float32
```
## prompting guide

LTX-2 works best with detailed, flowing paragraph prompts rather than comma-separated tags. describe what happens in the video like you're writing a screenplay.
### prompt structure

write prompts as flowing paragraphs that include:

1. **scene setting** - location, time of day, weather
2. **camera work** - shot type, movement, framing
3. **subject action** - what's happening, how it moves
4. **visual style** - lighting, colors, atmosphere
5. **audio cues** - ambient sounds, music mood (LTX-2 generates audio too!)
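
if you build prompts in a script, the five elements can be joined into one flowing paragraph instead of a tag list. this is a minimal sketch: `build_prompt` and its parameter names are hypothetical helpers for illustration, not part of diffusers or LTX-2.

```python
def build_prompt(scene: str, camera: str, action: str, style: str, audio: str = "") -> str:
    """Join the five prompt elements into one flowing paragraph (hypothetical helper)."""
    sentences = []
    for part in (scene, camera, action, style, audio):
        part = part.strip()
        if not part:
            continue  # skip elements you left empty
        if part[-1] not in ".!?":
            part += "."  # make every element read as a full sentence
        sentences.append(part)
    return " ".join(sentences)


prompt = build_prompt(
    scene="EXT. SNOWY FOREST - DUSK",
    camera="A cinematic tracking shot follows a lone grey wolf at eye level",
    action="The wolf walks through deep powder snow, its breath visible in the cold air",
    style="Soft blue twilight, shallow depth of field, gentle film grain",
    audio="Paws crunch softly in the snow while wind hisses through the pines",
)
print(prompt)
```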
### example prompts

**bad prompt:**
```
wolf, snow, forest, walking, cinematic
```

**good prompt:**
```
EXT. SNOWY FOREST - DUSK. A cinematic tracking shot follows a lone grey wolf
walking through deep powder snow between towering pine trees. The camera moves
alongside at eye level as soft blue twilight filters through the branches.
The wolf's breath is visible in the cold air, paws crunching softly in the snow.
Atmospheric and moody, shallow depth of field with gentle film grain.
```
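since tag-style prompts like the bad example are the most common mistake, a quick heuristic check can catch them before you spend GPU time. this is a rough sketch under our own assumptions: `looks_like_tag_list` is a hypothetical helper, and the rule (three or more short comma-separated pieces with no sentence punctuation) and its threshold are arbitrary choices, not anything LTX-2 defines.

```python
def looks_like_tag_list(prompt: str, max_avg_words: float = 2.5) -> bool:
    """Rough heuristic (hypothetical): flag prompts that read as comma-separated tags."""
    segments = [s.strip() for s in prompt.split(",") if s.strip()]
    if len(segments) < 3:
        return False  # too few comma-separated pieces to be a tag list
    if any(ch in prompt for ch in ".!?"):
        return False  # sentence punctuation present, treat it as prose
    avg_words = sum(len(s.split()) for s in segments) / len(segments)
    return avg_words <= max_avg_words


bad = "wolf, snow, forest, walking, cinematic"
good = ("EXT. SNOWY FOREST - DUSK. A cinematic tracking shot follows a lone grey wolf "
        "walking through deep powder snow.")
print(looks_like_tag_list(bad))   # True
print(looks_like_tag_list(good))  # False
```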
### cinematography terms that work well

- **shot types:** wide establishing shot, medium shot, close-up, extreme close-up, overhead shot
- **camera movement:** tracking shot, dolly in/out, pan, crane up, handheld, steadicam
- **framing:** shallow depth of field, rack focus, silhouette, rule of thirds
- **lighting:** golden hour, blue hour, rim light, volumetric light, natural lighting
- **style:** cinematic, documentary style, film grain, anamorphic, photorealistic
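
to see which of these categories a draft prompt already covers, a keyword lookup over the terms above is enough. this is a minimal sketch: `CINEMATOGRAPHY_TERMS` and `missing_categories` are hypothetical names, and whole-word matching is a simplification that misses synonyms (e.g. "blue twilight" does not count as lighting).

```python
import re

# the term lists from this section, keyed by category
CINEMATOGRAPHY_TERMS = {
    "shot types": ["wide establishing shot", "medium shot", "close-up", "extreme close-up", "overhead shot"],
    "camera movement": ["tracking shot", "dolly in", "dolly out", "pan", "crane up", "handheld", "steadicam"],
    "framing": ["shallow depth of field", "rack focus", "silhouette", "rule of thirds"],
    "lighting": ["golden hour", "blue hour", "rim light", "volumetric light", "natural lighting"],
    "style": ["cinematic", "documentary style", "film grain", "anamorphic", "photorealistic"],
}


def missing_categories(prompt: str) -> list[str]:
    """Return categories with no matching term in the prompt (case-insensitive, whole words)."""
    text = prompt.lower()

    def mentions(term: str) -> bool:
        # \b boundaries stop short terms like "pan" from matching inside "company"
        return re.search(r"\b" + re.escape(term) + r"\b", text) is not None

    return [
        category
        for category, terms in CINEMATOGRAPHY_TERMS.items()
        if not any(mentions(term) for term in terms)
    ]


print(missing_categories("A handheld close-up of a chef at golden hour, cinematic film grain."))  # ['framing']
```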
### negative prompts

always include a negative prompt to avoid common issues:

```
blurry, low quality, distorted, deformed, ugly, bad anatomy, text, watermark, signature
```

if you're getting unwanted artistic styles, add:

```
cartoon, anime, illustration, painting, drawing, sketch, cgi, 3d render, digital art, stylized
```
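both lists are plain comma-separated strings, so you can keep them as constants and combine them when you want to suppress stylization. this is a minimal sketch: the constant names and `build_negative_prompt` are hypothetical, and passing the result as `negative_prompt=` assumes your LTX-2 pipeline version accepts that argument (check its call signature).

```python
BASE_NEGATIVE = "blurry, low quality, distorted, deformed, ugly, bad anatomy, text, watermark, signature"
STYLE_NEGATIVE = "cartoon, anime, illustration, painting, drawing, sketch, cgi, 3d render, digital art, stylized"


def build_negative_prompt(avoid_artistic_styles: bool = False) -> str:
    """Combine the base negative prompt with the optional anti-stylization terms (hypothetical helper)."""
    parts = [BASE_NEGATIVE]
    if avoid_artistic_styles:
        parts.append(STYLE_NEGATIVE)
    return ", ".join(parts)


negative_prompt = build_negative_prompt(avoid_artistic_styles=True)
# then, assuming the argument exists in your version: t2v_pipe(prompt=..., negative_prompt=negative_prompt)
```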
## multi-scene films with image-to-video

LTX-2 supports image-to-video generation using `LTX2ImageToVideoPipeline`. you can create continuity between scenes by using the last frame of scene N as the input image for scene N+1.

### important warnings

- **style corruption can propagate** - if one scene produces artifacts or the wrong style, it will affect all subsequent scenes
- **the prompt still applies**, but the input image has a strong influence on visual style
- **use a higher `guidance_scale` (5.0+)** to give the prompt more weight over the image
- **if a scene goes wrong**, use the last frame from an earlier good scene instead

### example workflow
```python
import torch
from diffusers import LTX2Pipeline, LTX2ImageToVideoPipeline

# scene 1: text-to-video
t2v_pipe = LTX2Pipeline.from_pretrained("Lightricks/LTX-2", torch_dtype=torch.bfloat16)
result1 = t2v_pipe(
    prompt="...",
    guidance_scale=4.0,
    # ...your other generation arguments
)
last_frame = result1.frames[0][-1]  # final frame of the first video in the batch

# scene 2+: image-to-video for continuity
i2v_pipe = LTX2ImageToVideoPipeline.from_pretrained("Lightricks/LTX-2", torch_dtype=torch.bfloat16)
result2 = i2v_pipe(
    image=last_frame,
    prompt="...",  # prompt still matters!
    guidance_scale=5.0,  # higher to enforce prompt style
    # ...your other generation arguments
)
```
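with more than two scenes the same idea becomes a loop, and the "use the last frame from an earlier good scene" rule can be built in. this is a pipeline-agnostic sketch: `chain_scenes`, `generate_t2v`, `generate_i2v` and `scene_ok` are hypothetical names, the three callables are yours to supply (e.g. thin wrappers around the two pipelines above plus your own manual or automated check), and frames are whatever objects your pipeline returns.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def chain_scenes(
    prompts: Sequence[str],
    generate_t2v: Callable[[str], List[Frame]],
    generate_i2v: Callable[[Frame, str], List[Frame]],
    scene_ok: Callable[[List[Frame]], bool],
) -> List[Tuple[List[Frame], bool]]:
    """Generate scenes in order, seeding each one from the last frame of the latest good scene."""
    results: List[Tuple[List[Frame], bool]] = []
    last_good_frame: Optional[Frame] = None
    for prompt in prompts:
        if last_good_frame is None:
            frames = generate_t2v(prompt)  # no good scene yet: start from text
        else:
            frames = generate_i2v(last_good_frame, prompt)  # continue from a known-good frame
        ok = scene_ok(frames)
        if ok:
            last_good_frame = frames[-1]  # only good scenes may seed later scenes
        results.append((frames, ok))
    return results
```

a failed scene is still returned (flagged `False`) so you can regenerate it, but its frames never seed later scenes. if the very first scene fails, the next one falls back to text-to-video.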
## distilled model warning

there's a distilled version available (`blanchon/LTX-2-Distilled-diffusers`) that promises faster generation with fewer steps.

**do not use it for production** - in our testing it produces severe artifacts, cartoon-style corruption, and generally unusable output. stick with the full `Lightricks/LTX-2` model.
## troubleshooting

**out of memory** - reduce resolution/frames or close other apps

**import errors** - make sure you installed diffusers from the git source (e.g. `pip install git+https://github.com/huggingface/diffusers`), not the PyPI release

**cartoon/artistic style when you wanted photorealistic:**
- add "photorealistic, cinematic film look, real world footage" to your prompt
- add "cartoon, anime, illustration, painting, drawing" to your negative prompt
- increase `guidance_scale` to 5.0 or higher
- if using image-to-video, check whether the input image has style issues
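
the first three fixes are just changes to the call arguments, so you can apply them in one place. this is a minimal sketch over a plain dict of keyword arguments: `apply_photoreal_fixes` is a hypothetical helper, and the keys mirror the argument names used in this README (`prompt`, `negative_prompt`, `guidance_scale`).

```python
PHOTOREAL_SUFFIX = "photorealistic, cinematic film look, real world footage"
ANTI_CARTOON_NEGATIVE = "cartoon, anime, illustration, painting, drawing"


def apply_photoreal_fixes(kwargs: dict, min_guidance: float = 5.0) -> dict:
    """Return a copy of the pipeline kwargs nudged toward a photorealistic look (hypothetical helper)."""
    fixed = dict(kwargs)  # never mutate the caller's dict
    fixed["prompt"] = f"{kwargs.get('prompt', '').rstrip()} {PHOTOREAL_SUFFIX}".strip()
    negative = kwargs.get("negative_prompt", "")
    fixed["negative_prompt"] = f"{negative}, {ANTI_CARTOON_NEGATIVE}" if negative else ANTI_CARTOON_NEGATIVE
    # raise guidance to at least min_guidance, but never lower a value that is already higher
    fixed["guidance_scale"] = max(kwargs.get("guidance_scale", 0.0), min_guidance)
    return fixed
```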

**scene continuity problems in multi-scene films:**
- check each scene individually before combining
- if a scene has artifacts, regenerate it with text-to-video or use a different input frame
- style corruption from bad frames propagates to all subsequent scenes
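
when checking scenes before combining them, a crude numeric signal can tell you where to look first: the mean absolute pixel difference between the last frame of one scene and the first frame of the next. this is our own rough heuristic, not something LTX-2 provides: `frame_difference` and `continuity_gaps` are hypothetical helpers, the threshold of 40 (on a 0-255 scale) is arbitrary, and frames are assumed to be same-sized PIL images or numpy arrays, such as the entries of `result.frames[0]`.

```python
import numpy as np


def frame_difference(a, b) -> float:
    """Mean absolute per-pixel difference between two same-sized frames, on a 0-255 scale."""
    a_arr = np.asarray(a, dtype=np.float32)  # float avoids uint8 wrap-around on subtraction
    b_arr = np.asarray(b, dtype=np.float32)
    if a_arr.shape != b_arr.shape:
        raise ValueError(f"frame shapes differ: {a_arr.shape} vs {b_arr.shape}")
    return float(np.abs(a_arr - b_arr).mean())


def continuity_gaps(scenes, threshold: float = 40.0) -> list[int]:
    """Return indices i where the cut from scene i to scene i + 1 exceeds the threshold."""
    return [
        i
        for i in range(len(scenes) - 1)
        if frame_difference(scenes[i][-1], scenes[i + 1][0]) > threshold
    ]
```

expect large gaps where a scene was regenerated with text-to-video; an unexpected gap after an image-to-video scene may mean it drifted away from its input frame.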

## credits

- [lightricks](https://github.com/Lightricks) for ltx-2