mlx-video-maker/promptguide.md
2026-01-27 19:26:18 +01:00

LTX-2 Prompt Engineering Guide

This guide helps you (or an LLM) craft effective prompts for LTX-2 video generation.

LTX-2 Capabilities

  • Resolution: Native 4K support
  • Frame rates: Up to 50 fps
  • Duration: Up to 20 seconds continuous video
  • Audio: Synchronized audio-video generation
  • Prompt interpretation: Reads prompts the way a cinematographer reads director's notes
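The limits above can be checked before a prompt is ever written. The following is a minimal sketch, assuming only the frame-rate and duration ceilings listed in this guide; `check_request` and the constant names are hypothetical, and a particular runtime may impose tighter limits.

```python
# Ceilings taken from the capability list in this guide.
MAX_FPS = 50
MAX_DURATION_S = 20


def check_request(fps: float, duration_s: float) -> list[str]:
    """Return a list of problems; an empty list means the request is within limits."""
    problems = []
    if not 0 < fps <= MAX_FPS:
        problems.append(f"fps must be in (0, {MAX_FPS}], got {fps}")
    if not 0 < duration_s <= MAX_DURATION_S:
        problems.append(f"duration must be in (0, {MAX_DURATION_S}] seconds, got {duration_s}")
    return problems


print(check_request(fps=24, duration_s=8))   # within limits
print(check_request(fps=60, duration_s=30))  # both values exceed the ceilings
```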

Core Principle: Complete Story Picture

Write prompts as flowing narratives, not descriptive lists. Think mini-screenplay: each action leads naturally to the next, camera movements have purpose, and all elements contribute to temporal flow.

The Six Essential Elements

Every effective prompt should include:

1. Shot Establishment

Opening framing and camera position.

  • Examples: "Wide shot from across the street," "Extreme close-up on weathered hands," "Bird's eye view"
  • Match terminology to genre: documentary = handheld language; cinematic = "dolly," "crane"

2. Scene Setting

Environment with lighting, color, texture, atmosphere.

  • "Golden hour bounce," "harsh overhead fluorescents," "soft window light"
  • "Desaturated blues and grays," "warm amber tones"
  • "Thick morning fog," "dust particles in sunbeams"

3. Action Description

Present-tense verbs describing motion sequentially.

  • Use: "walks," "turns," "reaches" (not "walked," "turning")
  • Show cause and effect: "The door swings open, revealing..."
  • Include small physical details: "His fingers drum against the table"

4. Character Definition

Physical details, clothing, emotional cues through body language.

  • Show emotion through action, not labels: "tears welling in her eyes" not "sad"

5. Camera Movement

Specify how and when the camera moves.

  • Pan, tilt, dolly, track, crane movements with speed descriptors
  • "Slow dolly forward," "steady tracking shot," "tripod-locked"

6. Audio Description

Ambient sounds, music, dialogue in quotes, vocal characteristics.

  • "Distant traffic hum," "footsteps echoing on tile," "clock ticking"
  • Dialogue: "Hello, is anyone there?" she calls out

Structural Guidelines

Single Continuous Paragraph: No line breaks or lists within a prompt. Flow naturally from beginning to end.

Present Tense Action Verbs: Essential for conveying dynamic motion.

Explicit Camera Behavior: Never assume the model will infer camera movement. Specify angle, movement, and speed.

Precise Physical Details: Use measurable movements and specific gestures.

  • Generic: "She looks surprised"
  • Precise: "Her eyebrows lift, eyes widen, lips part slightly as she inhales sharply"

Temporal Connectors: Use "as," "then," "while," "before," "after," "when" to create smooth flow.
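Several of these guidelines are mechanical enough to check automatically. The sketch below is a rough heuristic linter, not part of any LTX-2 tooling; `lint_prompt` and its word lists are assumptions, and substring matching on camera terms can produce false positives.

```python
import re

# Connectors listed in this guide.
TEMPORAL_CONNECTORS = {"as", "then", "while", "before", "after", "when"}
# Illustrative list of words that suggest the camera is described at all.
CAMERA_TERMS = ("camera", "shot", "dolly", "pan", "tilt", "track", "crane", "tripod")


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for structural guidelines the prompt appears to break."""
    warnings = []
    if "\n" in prompt.strip():
        warnings.append("contains line breaks; use a single continuous paragraph")
    # Bullet or numbered-list markers at the start of any line.
    if re.search(r"^\s*(?:[-*•]|\d+[.)])\s", prompt, flags=re.MULTILINE):
        warnings.append("contains list markers; write a flowing narrative")
    words = set(re.findall(r"[a-z']+", prompt.lower()))
    if not words & TEMPORAL_CONNECTORS:
        warnings.append("no temporal connectors (as, then, while, ...)")
    lowered = prompt.lower()
    if not any(term in lowered for term in CAMERA_TERMS):
        warnings.append("no explicit camera behavior")
    return warnings


good = (
    "A woman opens a wooden door, pauses as sunlight streams past her "
    "silhouette, then steps inside. The camera tracks forward, following "
    "from exterior to interior."
)
print(lint_prompt(good))
print(lint_prompt("A sad woman.\n- she cries"))
```

A clean result only means the prompt has the right shape; it says nothing about whether the description itself is precise.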

What to Avoid

  • Emotional labels without visual cues: "A sad woman" → "A woman with slumped shoulders, downcast eyes"
  • Text and logos: The model cannot reliably generate readable text
  • Complex physics: Avoid collisions, liquid simulations, chaotic crowds
  • Scene overload: Too many simultaneous elements create confusion
  • Conflicting lighting: Don't mix incompatible light sources
  • Overly complicated camera work: Keep movements clear and purposeful
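The first item on this list, bare emotional labels, is easy to scan for. A minimal sketch follows, assuming a small hand-picked word list; `EMOTION_LABELS` and `find_emotion_labels` are hypothetical names, and the list should be extended for real use.

```python
import re

# Hypothetical, deliberately small word list; extend it for your own prompts.
EMOTION_LABELS = {"sad", "happy", "angry", "surprised", "scared", "nervous"}


def find_emotion_labels(prompt: str) -> list[str]:
    """Return bare emotion words found in the prompt, in order of first appearance."""
    found = []
    for word in re.findall(r"[a-z]+", prompt.lower()):
        if word in EMOTION_LABELS and word not in found:
            found.append(word)
    return found


print(find_emotion_labels("A sad woman looks surprised"))
print(find_emotion_labels("A woman with slumped shoulders, downcast eyes"))
```

Each hit is a cue to replace the label with visible behavior, as in the rewrite shown in the list above.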

Example Prompts

Nature Scene

A lone fisherman rows across a foggy lake before sunrise, the boat creaking softly as water laps at its sides. The camera glides overhead in a slow aerial tracking shot, following his steady progress from behind and slightly above. His lantern casts a warm circle of light that reflects in gentle ripples, while tall reeds sway on the distant shoreline. A distant bird call echoes across the water as mist rolls slowly across the glassy surface.

Character Scene

A woman stands at a kitchen counter slicing vegetables in afternoon light streaming through a nearby window. The camera begins in a medium close-up at shoulder height, then slowly pushes forward to focus on her hands. As she hears a creak from the hallway, her eyebrows lift slightly and the blade pauses mid-air. The camera holds steady with shallow depth of field while ambient kitchen sounds, a refrigerator hum and distant traffic, create a quiet domestic atmosphere.

Advanced Techniques

Lens and Shutter Language

Focal lengths:

  • 24mm wide-angle - expansive, environmental shots
  • 50mm standard - natural perspective
  • 85mm portrait - compressed, intimate
  • 200mm telephoto - compressed depth, isolated subject

Shutter/motion:

  • 180° shutter equivalent - cinematic motion blur
  • Natural motion blur - realistic movement
  • Fast shutter, crisp motion - sports/action feel

Keywords for Smooth 50 FPS Motion

Use these for fluid motion:

  • Camera: "steady dolly," "smooth gimbal," "tripod-locked," "constant speed pan"
  • Motion: "natural motion blur," "fluid movement," "controlled motion," "stable tracking"

Avoid these (they can cause warping and artifacts):

  • "handheld chaotic," "shaky cam," "erratic movement"

Six-Part Structured Prompt (4K)

For optimal 4K generation, include these six parts:

  1. Scene Anchor: Location, time, atmosphere
    • Dawn over a misty alpine lake, light fog, glassy water
  2. Subject + Action: Who/what and a verb
    • A red canoe gliding across, single rower in a yellow raincoat
  3. Camera + Lens: Movement, focal length, aperture, framing
    • Slow dolly-right, 50mm, f/2.8, medium-wide, stable rig
  4. Visual Style: Color science, grading, film emulation
    • Soft contrast, rich primaries, Kodak 2383 print look
  5. Motion Cues: Speed, frame intent, shutter feel
    • Natural motion blur, 50 fps feel, 180° shutter equivalent
  6. Guardrails: What to avoid
    • No flicker, no high-frequency patterns, no text overlays
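The six parts above can be assembled programmatically so that none is forgotten. This is a minimal sketch using the example values from the list; the key names and `build_structured_prompt` are assumptions for illustration, not an LTX-2 API.

```python
# Hypothetical key names, in the order the guide lists the six parts.
SIX_PART_ORDER = (
    "scene_anchor", "subject_action", "camera_lens",
    "visual_style", "motion_cues", "guardrails",
)


def build_structured_prompt(parts: dict[str, str]) -> str:
    """Join the six parts into one paragraph, failing loudly if any is missing."""
    missing = [key for key in SIX_PART_ORDER if not parts.get(key, "").strip()]
    if missing:
        raise ValueError(f"missing parts: {', '.join(missing)}")
    # Normalize each part to end with exactly one period, then join with spaces.
    return " ".join(parts[key].strip().rstrip(".") + "." for key in SIX_PART_ORDER)


prompt = build_structured_prompt({
    "scene_anchor": "Dawn over a misty alpine lake, light fog, glassy water",
    "subject_action": "A red canoe gliding across, single rower in a yellow raincoat",
    "camera_lens": "Slow dolly-right, 50mm, f/2.8, medium-wide, stable rig",
    "visual_style": "Soft contrast, rich primaries, Kodak 2383 print look",
    "motion_cues": "Natural motion blur, 50 fps feel, 180° shutter equivalent",
    "guardrails": "No flicker, no high-frequency patterns, no text overlays",
})
print(prompt)
```

The result is a checklist-style draft. Rewriting it into connected sentences keeps it in line with the narrative style recommended earlier in this guide.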

Tips for Different Video Lengths

Short (<5 seconds):

  • Single action, simple camera movement or static shot
  • Example: A coffee cup lifts from a saucer, steam rising. Close-up, shallow DOF, soft morning light.

Medium (5-10 seconds):

  • 2-3 connected actions, one camera movement, clear progression
  • Example: A woman opens a wooden door, pauses as sunlight streams past her silhouette, then steps inside. The camera tracks forward, following from exterior to interior.

Long (10-20 seconds):

  • Mini-narrative with multiple beats, camera movement changes
  • Pro tip: Start with a close-up and pull out. This helps retain facial and material detail, since wider shots can soften likeness.
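The tiers above can be turned into a small lookup that returns the matching advice for a target duration. A minimal sketch follows; `length_tier` and `TIER_GUIDANCE` are hypothetical names, and placing exactly 10 seconds in the medium tier is an assumption, since the ranges in this guide overlap at that boundary.

```python
# Guidance text copied from the three tiers in this guide.
TIER_GUIDANCE = {
    "short": "single action, simple camera movement or static shot",
    "medium": "2-3 connected actions, one camera movement, clear progression",
    "long": "mini-narrative with multiple beats, camera movement changes",
}


def length_tier(duration_s: float) -> str:
    """Map a clip duration in seconds to the tier names used in this guide."""
    if not 0 < duration_s <= 20:
        raise ValueError("duration must be in (0, 20] seconds")
    if duration_s < 5:
        return "short"
    if duration_s <= 10:  # assumption: the shared 10 s boundary counts as medium
        return "medium"
    return "long"


for seconds in (3, 8, 15):
    tier = length_tier(seconds)
    print(f"{seconds}s -> {tier}: {TIER_GUIDANCE[tier]}")
```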

Audio-Video Synchronization

LTX-2 generates audio and video simultaneously. Use timing cues:

  • On the downbeat - sync action to music
  • Constant speed pan - predictable motion for rhythm
  • Rhythmic footsteps - regular intervals

What Works Well

  • Controlled camera movements (dolly, crane, tracking shots)
  • Subtle facial expressions and natural body language
  • Atmospheric settings with weather effects (fog, rain, snow)
  • Film emulation looks and color grading styles
  • Multilingual voice work with accent specifications

Video Type Strategies

Marketing/Product Videos

  • Start with product close-ups, controlled camera movements (dolly, crane)
  • Lighting that highlights product features
  • Include human element (hand interacting) for relatability
  • Keywords: "premium aesthetic," "shallow depth of field," "clean whites"

Educational Content

  • Medium shots for presenter visibility, steady tripod-locked camera
  • Deliberate pacing, explicit gestures that support the teaching
  • Keywords: "educational pacing," "clear voice," "professional lighting"

Social Media Clips

  • Immediate high-impact opening, dynamic movements (whip pan, quick zoom)
  • Vibrant saturated colors, high contrast
  • Keywords: "crushed blacks," "lens flare," "trendy aesthetic," "bass drop synchronized"

Cinematic Sequences

  • Film terminology: "anamorphic," "bokeh," "film grain"
  • Subtle micro-expressions, longer sequences with narrative arc
  • Reference film stocks: "Kodak 2383 print emulation," "ARRI Alexa look"

Multi-Shot Continuity

Scene Transitions

  • Match cut: Match visual elements (spinning wheel → spinning record)
  • Action match: Continue action across cut (hand reaching → door opening)
  • Light/color match: Maintain lighting consistency across shots
  • Audio bridge: Use sound to connect shots

Character Consistency Across Shots

  • Provide identical detailed descriptions in every prompt
  • Include specific clothing, hair, physical details
  • Reference "same person as previous shot" when applicable
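One way to guarantee identical descriptions is to store the character text once and substitute it into every shot template. The sketch below assumes a simple `{character}` placeholder convention; the description, the templates, and `shot_prompt` are all illustrative.

```python
# Illustrative character description, written once and reused verbatim.
CHARACTER = (
    "a woman in her thirties with short black hair, "
    "a yellow raincoat, and silver hoop earrings"
)


def shot_prompt(template: str, character: str = CHARACTER) -> str:
    """Insert the shared character description into a shot template."""
    if "{character}" not in template:
        raise ValueError("template must contain a {character} placeholder")
    return template.replace("{character}", character)


shots = [
    "Wide shot of {character} walking along a rain-slicked pier as gulls call overhead.",
    "Close-up on {character} as she turns toward the camera, then exhales slowly.",
]
prompts = [shot_prompt(shot) for shot in shots]
for text in prompts:
    print(text)
```

Because the description is substituted rather than retyped, a later change to clothing or hair propagates to every shot at once.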

Troubleshooting

Motion Blur Issues

  • Add "natural motion blur" and "180° shutter equivalent"
  • Avoid "fast shutter" unless intentional
  • For action: "appropriate motion blur for speed"

Moiré/Artifact Problems

Common triggers: brick walls, mesh, and other fine patterns.

  • Add "avoid high-frequency patterns" to prompts
  • Use "smooth textures" or "soft focus on background"
  • Apply shallow DOF to blur problematic areas

Audio-Video Sync

  • Use timing cues: "on the downbeat," "at 2.5 seconds"
  • Describe rhythmic actions: "footsteps in steady rhythm"
  • Specify regular patterns: "constant speed," "even intervals"
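Explicit timestamps for evenly spaced actions follow directly from a tempo: beats fall every 60 / BPM seconds. The sketch below computes those times and formats them as cues; `beat_times` and `timing_cue` are hypothetical helpers, and how closely the model honors exact timestamps is not guaranteed by this guide.

```python
def beat_times(bpm: float, duration_s: float) -> list[float]:
    """Timestamps in seconds of every beat inside the clip, starting at 0."""
    if bpm <= 0 or duration_s <= 0:
        raise ValueError("bpm and duration_s must be positive")
    interval = 60.0 / bpm  # seconds between consecutive beats
    times = []
    beat = 0
    while beat * interval < duration_s:
        times.append(round(beat * interval, 2))
        beat += 1
    return times


def timing_cue(action: str, at_s: float) -> str:
    """Format an action with an explicit timestamp, e.g. '... at 2.5 seconds'."""
    return f"{action} at {at_s:.1f} seconds"


times = beat_times(bpm=120, duration_s=2)
print(times)
print(timing_cue("her heel strikes the floor", 2.5))
```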

Based on the official LTX-Video prompt guidance