Daniel e64483a66a feat(wan): Add Wan2.1/2.2 T2V with quantization support

2026-03-11 09:08:10 +01:00

name	description
fast-mlx	Optimize MLX code for performance and memory. Use when asked to implement or speed up MLX models or algorithms, reduce latency/throughput bottlenecks, tune lazy evaluation, type promotion, fast ops, compilation, memory use, or profiling.

Fast MLX

Workflow

Look for opportunities to compile functions made up mostly of elementwise operations.
For models with fixed-shape inputs, or whose shapes rarely change, compile the entire graph.
Replace slow implementations with MLX fast ops.
Identify evaluation boundaries and unintended sync points (mx.eval, item(), NumPy conversions).
Check dtype promotion and scalar usage; keep precision consistent with intent.
Review compilation strategy; avoid unnecessary recompiles and closure captures.
Reduce peak memory via lazy loading order and releasing temporaries before mx.eval.
Suggest profiling steps if the bottleneck is unclear.

Read references/fast-mlx-guide.md for detailed tips and examples. Use it as the source of truth.

Provide concrete code changes with a brief rationale.
Call out changes that need user confirmation (e.g., enabling async eval or shapeless compile).