Lightricks just dropped LTX 2.3 — a 22-billion-parameter open-source Diffusion Transformer that generates 4K video and synchronized audio in a single forward pass. Released on March 5, 2026, it's the highest-performing open-weight video model available and the only open-source option generating native 4K with built-in audio.
This isn't incremental. It's a category shift.
## What LTX 2.3 Does
At its core, LTX 2.3 is a dual-stream Diffusion Transformer (DiT): roughly 14B parameters handle video generation while 5B parameters handle audio — all within a single unified architecture. You give it a text prompt, and it produces a video clip with matching sound.
Key specs at a glance:
- 22B parameters
- Native 4K at up to 50 FPS
- Synchronized stereo audio at 24 kHz
- Clips up to 20 seconds
- Apache 2.0 license
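To put those headline numbers in perspective, here's the raw, pre-codec size of a maximum-spec clip. This is simple arithmetic from the stated specs, assuming 8-bit RGB frames (an assumption; the model actually works in a compressed latent space):

```python
# Raw, uncompressed size of a maximum-spec LTX 2.3 clip (video only).
width, height = 3840, 2160   # 4K UHD
fps, seconds = 50, 20        # max frame rate and clip length from the specs
bytes_per_pixel = 3          # assumed 8-bit RGB

frames = fps * seconds
raw_gb = width * height * bytes_per_pixel * frames / 1e9
print(f"{frames} frames, ~{raw_gb:.0f} GB uncompressed")  # 1000 frames, ~25 GB
```

Actual output is codec-compressed, of course; the point is only the scale of data the model has to reason about per clip.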
Seven generation modes are available:
- Text-to-video (standard and fast)
- Image-to-video (standard and fast)
- Audio-to-video — feed it a soundtrack, get visuals that match the energy
- Extend-video — seamlessly continue an existing clip
- Retake-video — regenerate portions while keeping the rest
The audio-to-video mode is particularly interesting. It shapes the visual pacing and energy based on the audio input — though it doesn't do frame-accurate beat alignment or lip-sync yet.
## The Technical Leap from LTX 2.0
If you used LTX 2.0 (released October 2025), the improvements are substantial across three areas:
### Sharper Visual Encoder
A rebuilt high-fidelity VAE produces noticeably sharper output. Textures, facial features, and small objects retain detail across the full frame. The texture drift that plagued LTX 2.0 in longer clips is significantly reduced.
### Better Prompt Understanding
A 4x larger text connector dramatically improves prompt adherence. Complex multi-clause prompts — "a woman in a red coat walks through a snowy forest while birds fly overhead" — now reliably render all described elements. LTX 2.0 would frequently drop secondary details.
### Cleaner Audio
An improved HiFi-GAN vocoder delivers cleaner stereo output at 24 kHz. Users report reduced phasing and warble artifacts, especially with rhythmic audio cues.
## Benchmarks and Performance
LTX 2.3 ranks as the top open-source model on the Artificial Analysis video benchmark. Against closed-source competitors, it trails Kling 3.0 and Runway Gen-4.5 on the Elo leaderboard, but it's the best you can run on your own hardware.
Speed matters too: LTX 2.3 is approximately 10–14x faster than Wan 2.2 in typical scenarios (up to 18x on simple scenes) at equivalent quality settings. That's the difference between a ten-minute wait and roughly a minute.
| Metric | LTX 2.3 | Wan 2.2 | Kling 3.0 |
|---|---|---|---|
| Parameters | 22B | 14B | Closed |
| Max Resolution | 4K | 1080p | 4K |
| Native Audio | Yes | No | No |
| Open Source | Yes (Apache 2.0) | Yes | No |
| Speed (relative) | 10–14x faster | 1x baseline | N/A |
## Hardware Requirements
Here's where it gets real. This is a 22B-parameter model, and video generation is memory-hungry:
- 4K generation (fp16): ~44GB VRAM — you're looking at an A100 or dual high-end consumer GPUs
- Quantized (FP8): ~24GB VRAM — viable on an RTX 4090 or RTX 5090
- 720p base resolution: 24GB VRAM handles it comfortably with room for small batches
- 12–16GB GPUs: Possible with tiling and CPU offload, but noticeably slower
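A quick way to sanity-check those figures is weights-only arithmetic: parameter count times bytes per parameter. This deliberately ignores activations, attention buffers, and the VAE, which is why real usage lands above the raw weight size:

```python
# Weights-only VRAM estimate: params * bytes per param.
# Activations, attention buffers, and the VAE push real usage higher.
def weight_vram_gib(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 2**30

fp16 = weight_vram_gib(22, 2)  # ~41 GiB of weights alone -> the ~44GB figure
fp8  = weight_vram_gib(22, 1)  # ~20.5 GiB -> why a 24GB card is the floor
print(f"fp16: {fp16:.1f} GiB, fp8: {fp8:.1f} GiB")
```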
Two model variants are available:
- LTX 2.3-22B-dev — Full bf16 precision for fine-tuning and research
- LTX 2.3-22B-distilled — 8-step distilled version for faster inference with reduced memory
Both are on Hugging Face under Apache 2.0.
## Running It Locally
Lightricks released LTX Desktop Beta alongside the model — a desktop application for Windows and Mac that wraps the model in a local video editor. On Apple Silicon Macs, it runs in API-only mode (inference happens on a cloud backend).
For direct model usage:
```bash
pip install ltx-core ltx-pipelines

ltx-generate --model ltx-2.3-22b-distilled \
  --prompt "a timelapse of a city skyline at sunset" \
  --resolution 1080p --fps 24
```
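If you're scripting batches, the CLI call can be driven from Python. A minimal sketch, assuming the flag names shown above (the released tool's interface may differ):

```python
# Sketch of scripting the ltx-generate CLI from Python.
# Flag names mirror the example above; they are assumptions, not a confirmed API.
import subprocess

def build_command(prompt: str, resolution: str = "1080p", fps: int = 24,
                  model: str = "ltx-2.3-22b-distilled") -> list[str]:
    """Build the argv list for one generation run."""
    return [
        "ltx-generate",
        "--model", model,
        "--prompt", prompt,
        "--resolution", resolution,
        "--fps", str(fps),
    ]

prompts = [
    "a timelapse of a city skyline at sunset",
    "rain streaking down a window at night",
]
for p in prompts:
    cmd = build_command(p)
    # subprocess.run(cmd, check=True)  # uncomment once ltx-generate is installed
```

Running prompts sequentially like this is deliberate: at 22B parameters, a single generation already saturates the GPU, so there's little to gain from parallel invocations on one card.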
Cloud inference is also available through fal.ai and other API providers if you don't have the local hardware.
## The Licensing Nuance
The Apache 2.0 license comes with a catch worth knowing: it permits unrestricted commercial use for companies under $10 million in annual revenue. Larger commercial deployments require a direct license from Lightricks.
This is a pragmatic approach — open enough for startups, indie developers, and researchers, while giving Lightricks a revenue path from enterprise users.
## What's Still Missing
Let's not oversell it:
- No lip-sync — The audio-to-video mode matches energy and pacing, but characters won't move their lips to speech
- Complex spatial logic struggles — Intricate physical interactions (like pouring liquid into a glass) remain inconsistent
- No in-scene text — Text rendering within generated video is still unreliable
- Object persistence gaps — Objects maintain stability for about 6–10 seconds, but longer clips can see drift
- First-token latency — The initial generation step takes longer than LTX 2.0, though total throughput is faster
## Who This Is For
LTX 2.3 fits three clear use cases:
- Content creators who need quick B-roll, establishing shots, or visual effects without stock footage
- Developers building video generation into products — the Apache 2.0 license and local inference make this viable for the first time
- Researchers who need a high-quality open baseline for video generation experiments
If you're comparing it to closed APIs like Runway or Kling — LTX 2.3 won't match their absolute quality ceiling, but you get full control, no per-generation costs, and the ability to fine-tune for your specific domain.
## The Bottom Line
LTX 2.3 is the first open-source video model that's genuinely production-viable for a wide range of use cases. The combination of 4K output, native audio, seven generation modes, and Apache 2.0 licensing is unmatched in the open-source space.
The 24GB VRAM floor for the distilled model means it's accessible to anyone with a current-gen high-end GPU. And the 10–14x speed advantage over Wan 2.2 makes iterative workflows actually practical.
Video generation just crossed the threshold from "impressive demo" to "useful tool." The gap between open-source and closed-source video AI narrowed dramatically this month — and Lightricks deserves credit for pushing that boundary while keeping the weights open.
