Upgrade to LTX-2.3 with audio generation

- Switch from mlx_video.generate_av to mlx_video.models.ltx_2.generate
- Use prince-canuma/LTX-2.3-distilled model with google/gemma-3-12b-it text encoder
- Add --audio flag for joint audio-video generation
- Add auto-background execution with nohup logging
- Add CLAUDE.md and test stories

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Norbert Schmidt
2026-03-31 13:55:39 +02:00
parent e49c273b94
commit 02b8c27835
5 changed files with 117 additions and 3 deletions
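
The commit message mentions automatic background execution with nohup logging, but the launcher itself is not part of the files shown here. As a minimal sketch of that idea only, the hypothetical helper below (the name `launch_in_background` and the log path are assumptions, not taken from this repository) starts a command in its own session and appends its output to a log file, which is roughly the effect of `nohup cmd >> log 2>&1 &` on POSIX systems:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def launch_in_background(cmd, log_path):
    """Start cmd detached from the terminal; append stdout and stderr to log_path.

    Hypothetical helper for illustration, not the repository's launcher.
    """
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            # POSIX: run the child in a new session so a terminal hangup
            # is not delivered to it.
            start_new_session=True,
        )
    # The child keeps its own copy of the log file descriptor.
    return proc


if __name__ == "__main__":
    # Stand-in command; a real run would pass the generation command instead.
    log_file = Path(tempfile.mkdtemp()) / "logs" / "run.log"
    child = launch_in_background([sys.executable, "-c", "print('started')"], log_file)
    print(f"pid {child.pid}, logging to {log_file}")
```

Returning the `Popen` object lets the caller record the PID or wait on the job later, while the log file remains the place to follow progress.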

stories/local_runners.txt Normal file

@@ -0,0 +1,12 @@
# Local Runners - MLX community showcase
# Theme: devs generating AI video locally, raw energy, no cloud needed
A cinematic wide shot of a dimly lit room filled with glowing Apple laptops on a long wooden table. Multiple developers sit intensely focused, screens reflecting off their faces in blue and purple light. The camera slowly dollies forward between the rows. Sound of mechanical keyboards clicking rapidly and a low electronic hum building tension.
A close-up tracking shot moves across laptop screens showing colorful terminal output scrolling rapidly with progress bars and neural network visualizations. Code and numbers cascade down the displays. Green text on black backgrounds. The camera glides smoothly left to right. Sound of digital processing tones and a pulsing synthetic beat growing stronger.
A medium shot of a developer leaning back in their chair with a confident grin as a fully rendered AI video plays on their MacBook screen. The room behind them is dark with ambient RGB lighting. They tap the spacebar triumphantly. The camera slowly pushes in on their satisfied expression. Sound of a cinematic bass drop and the crowd murmuring in amazement.
A dramatic wide aerial shot pulling back to reveal an entire warehouse space filled with hundreds of developers at glowing workstations, all generating video simultaneously. Streams of light rise from each screen into the air like digital aurora borealis. The camera rises and tilts upward. Sound of an epic orchestral swell mixed with electronic beats reaching a powerful crescendo.
A slow motion close-up of a MacBook Pro with the Apple logo glowing. The screen displays a beautiful AI generated landscape video playing flawlessly. Binary code and particles of light float upward from the keyboard like embers from a fire. The camera holds steady with shallow depth of field. Sound of a deep resonant tone fading into silence with a final satisfying click.

stories/test_ltx23.txt Normal file

@@ -0,0 +1,8 @@
# Quick LTX 2.3 test - 3 scenes, low res
# Testing: T2V, I2V continuity, audio consistency
A wide establishing shot of a quiet cobblestone alley in a European village at dawn. Warm golden light spills between old brick buildings with wooden shutters. A tabby cat sits on a windowsill grooming itself. The camera slowly pushes forward through the alley. Soft ambient sound of distant church bells and birdsong.
A medium tracking shot follows the tabby cat as it leaps from the windowsill and trots along the cobblestones. Morning light casts long shadows across the wet stones. The camera tracks alongside the cat at ground level. Sound of soft paws on stone and gentle wind rustling through hanging laundry.
A close-up shot of the tabby cat stopping at a wooden door and looking up expectantly. The door creaks open revealing a warm interior with a fireplace glow. The cat slips inside. The camera holds steady at the doorway. Sound of a creaking door hinge and a crackling fireplace inside.

stories/test_person.txt Normal file

@@ -0,0 +1,7 @@
# LTX 2.3 test - person speaking, testing audio speech quality
A medium shot of a young woman standing in a sunlit kitchen. She looks directly at the camera and says hello, welcome to my cooking show. Her voice is clear and warm. The kitchen has white tiles and wooden countertops. The camera is static. Natural indoor ambient sound with her speaking voice.
A close-up of the woman chopping vegetables on a wooden cutting board. She explains that today we are making a simple pasta dish. Her hands move confidently with the knife. Sound of chopping and her narrating voice clearly audible.
A wide shot of the woman stirring a pot on the stove. Steam rises from the pot. She turns to the camera and says this is the secret ingredient, and holds up a jar of spices with a smile. The camera slowly zooms in. Sound of bubbling water and her clear speaking voice.
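
The three story files above share one layout: lines starting with `#` carry notes, and every other non-blank line is a single scene prompt. How the repository's script actually reads them is not shown in this commit, so the following is only a sketch under that assumption (the function name `parse_story` is hypothetical):

```python
def parse_story(text):
    """Return the scene prompts in a story file.

    Assumes one scene per line; blank lines and lines starting
    with '#' are treated as comments and skipped.
    """
    scenes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        scenes.append(line)
    return scenes
```

Applied to stories/test_ltx23.txt as shown, this would yield three prompts, matching the "3 scenes" note in that file's header comment.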