Lightricks just dropped LTX 2.3 — a 22-billion-parameter open-source Diffusion Transformer that generates 4K video and synchronized audio in a single forward pass. Released on March 5, 2026, it's the highest-performing open-weight video model available and the only open-source option generating native 4K with built-in audio.
This isn't incremental. It's a category shift.
## What LTX 2.3 Does
At its core, LTX 2.3 is a dual-stream Diffusion Transformer (DiT): roughly 14B parameters handle video generation while 5B parameters handle audio — all within a single unified architecture. You give it a text prompt, and it produces a video clip with matching sound.
Key specs at a glance:
- 22B parameters
- Native 4K at up to 50 FPS
- Synchronized stereo audio at 24 kHz
- Clips up to 20 seconds
- Apache 2.0 license
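To put those headline numbers in perspective, here's the raw, pre-codec size of a maximum-spec clip. This is simple arithmetic from the stated specs, assuming 8-bit RGB frames (an assumption; the model actually works in a compressed latent space):

```python
# Raw, uncompressed size of a maximum-spec LTX 2.3 clip (video only).
width, height = 3840, 2160   # 4K UHD
fps, seconds = 50, 20        # max frame rate and clip length from the specs
bytes_per_pixel = 3          # assumed 8-bit RGB

frames = fps * seconds
raw_gb = width * height * bytes_per_pixel * frames / 1e9
print(f"{frames} frames, ~{raw_gb:.0f} GB uncompressed")  # 1000 frames, ~25 GB
```

Actual output is codec-compressed, of course; the point is only the scale of data the model has to reason about per clip.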
Seven generation modes are available:
- Text-to-video (standard and fast)
- Image-to-video (standard and fast)
- Audio-to-video — feed it a soundtrack, get visuals that match the energy
- Extend-video — seamlessly continue an existing clip
- Retake-video — regenerate portions while keeping the rest
The audio-to-video mode is particularly interesting. It shapes the visual pacing and energy based on the audio input — though it doesn't do frame-accurate beat alignment or lip-sync yet.
## The Technical Leap from LTX 2.0
If you used LTX 2.0 (released October 2025), the improvements are substantial across three areas:
### Sharper Visual Encoder
A rebuilt high-fidelity VAE produces noticeably sharper output. Textures, facial features, and small objects retain detail across the full frame. The texture drift that plagued LTX 2.0 in longer clips is significantly reduced.
### Better Prompt Understanding
A 4x larger text connector dramatically improves prompt adherence. Complex multi-clause prompts — "a woman in a red coat walks through a snowy forest while birds fly overhead" — now reliably render all described elements. LTX 2.0 would frequently drop secondary details.
### Cleaner Audio
An improved HiFi-GAN vocoder delivers cleaner stereo output at 24 kHz. Users report reduced phasing and warble artifacts, especially with rhythmic audio cues.
## Benchmarks and Performance
LTX 2.3 ranks as the top open-source model on the Artificial Analysis video benchmark. Against closed-source competitors, it trails Kling 3.0 and Runway Gen-4.5 on the Elo leaderboard, but it's the best you can run on your own hardware.
Speed matters too: LTX 2.3 is approximately 10–14x faster than Wan 2.2 in typical scenarios (up to 18x on simple scenes) at equivalent quality settings. That's the difference between a ten-minute wait and roughly a minute.
| Metric | LTX 2.3 | Wan 2.2 | Kling 3.0 |
|---|---|---|---|
| Parameters | 22B | 14B | Closed |
| Max Resolution | 4K | 1080p | 4K |
| Native Audio | Yes | No | No |
| Open Source | Yes (Apache 2.0) | Yes | No |
| Speed (relative) | 10–14x faster | 1x baseline | N/A |
## Hardware Requirements
Here's where it gets real. This is a 22B-parameter model, and video generation is memory-hungry:
- 4K generation (fp16): ~44GB VRAM — you're looking at an A100 or dual high-end consumer GPUs
- Quantized (FP8): ~24GB VRAM — viable on an RTX 4090 or RTX 5090
- 720p base resolution: 24GB VRAM handles it comfortably with room for small batches
- 12–16GB GPUs: Possible with tiling and CPU offload, but noticeably slower
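A quick way to sanity-check those figures is weights-only arithmetic: parameter count times bytes per parameter. This deliberately ignores activations, attention buffers, and the VAE, which is why real usage lands above the raw weight size:

```python
# Weights-only VRAM estimate: params * bytes per param.
# Activations, attention buffers, and the VAE push real usage higher.
def weight_vram_gib(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 2**30

fp16 = weight_vram_gib(22, 2)  # ~41 GiB of weights alone -> the ~44GB figure
fp8  = weight_vram_gib(22, 1)  # ~20.5 GiB -> why a 24GB card is the floor
print(f"fp16: {fp16:.1f} GiB, fp8: {fp8:.1f} GiB")
```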
Two model variants are available:
- LTX 2.3-22B-dev — Full bf16 precision for fine-tuning and research
- LTX 2.3-22B-distilled — 8-step distilled version for faster inference with reduced memory
Both are on Hugging Face under Apache 2.0.
## Running It Locally
Lightricks released LTX Desktop Beta alongside the model — a desktop application for Windows and Mac that wraps the model in a local video editor. On Apple Silicon Macs, it runs in API-only mode (inference happens on a cloud backend).
For direct model usage:
```bash
pip install ltx-core ltx-pipelines

ltx-generate --model ltx-2.3-22b-distilled \
  --prompt "a timelapse of a city skyline at sunset" \
  --resolution 1080p --fps 24
```
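If you're scripting batches, the CLI call can be driven from Python. A minimal sketch, assuming the flag names shown above (the released tool's interface may differ):

```python
# Sketch of scripting the ltx-generate CLI from Python.
# Flag names mirror the example above; they are assumptions, not a confirmed API.
import subprocess

def build_command(prompt: str, resolution: str = "1080p", fps: int = 24,
                  model: str = "ltx-2.3-22b-distilled") -> list[str]:
    """Build the argv list for one generation run."""
    return [
        "ltx-generate",
        "--model", model,
        "--prompt", prompt,
        "--resolution", resolution,
        "--fps", str(fps),
    ]

prompts = [
    "a timelapse of a city skyline at sunset",
    "rain streaking down a window at night",
]
for p in prompts:
    cmd = build_command(p)
    # subprocess.run(cmd, check=True)  # uncomment once ltx-generate is installed
```

Running prompts sequentially like this is deliberate: at 22B parameters, a single generation already saturates the GPU, so there's little to gain from parallel invocations on one card.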
Cloud inference is also available through fal.ai and other API providers if you don't have the local hardware.
## The Licensing Nuance
The Apache 2.0 license comes with a catch worth knowing: it permits unrestricted commercial use for companies under $10 million in annual revenue. Larger commercial deployments require a direct license from Lightricks.
This is a pragmatic approach — open enough for startups, indie developers, and researchers, while giving Lightricks a revenue path from enterprise users.
## What's Still Missing
Let's not oversell it:
- No lip-sync — The audio-to-video mode matches energy and pacing, but characters won't move their lips to speech
- Complex spatial logic struggles — Intricate physical interactions (like pouring liquid into a glass) remain inconsistent
- No in-scene text — Text rendering within generated video is still unreliable
- Object persistence gaps — Objects maintain stability for about 6–10 seconds, but longer clips can see drift
- First-token latency — The initial generation step takes longer than LTX 2.0, though total throughput is faster
## Who This Is For
LTX 2.3 fits three clear use cases:
- Content creators who need quick B-roll, establishing shots, or visual effects without stock footage
- Developers building video generation into products — the Apache 2.0 license and local inference make this viable for the first time
- Researchers who need a high-quality open baseline for video generation experiments
If you're comparing it to closed APIs like Runway or Kling — LTX 2.3 won't match their absolute quality ceiling, but you get full control, no per-generation costs, and the ability to fine-tune for your specific domain.
## The Bottom Line
LTX 2.3 is the first open-source video model that's genuinely production-viable for a wide range of use cases. The combination of 4K output, native audio, seven generation modes, and Apache 2.0 licensing is unmatched in the open-source space.
The 24GB VRAM floor for the distilled model means it's accessible to anyone with a current-gen high-end GPU. And the 10–14x speed advantage over Wan 2.2 makes iterative workflows actually practical.
Video generation just crossed the threshold from "impressive demo" to "useful tool." The gap between open-source and closed-source video AI narrowed dramatically this month — and Lightricks deserves credit for pushing that boundary while keeping the weights open.
