mlx-video/.github/skills/fast-mlx/SKILL.md

name: fast-mlx
description: Optimize MLX code for performance and memory. Use when asked to implement or speed up MLX models or algorithms, reduce latency/throughput bottlenecks, tune lazy evaluation, type promotion, fast ops, compilation, memory use, or profiling.

Fast MLX

Workflow

  • Look for opportunities to compile functions that consist mostly of elementwise operations.
  • For models with fixed-shape inputs, or where the shapes rarely change, compile the entire graph.
  • Replace slow implementations with MLX fast ops.
  • Identify evaluation boundaries and unintended sync points (mx.eval, item(), NumPy conversions).
  • Check dtype promotion and scalar usage; keep precision consistent with intent.
  • Review compilation strategy; avoid unnecessary recompiles and closure captures.
  • Reduce peak memory via lazy loading order and releasing temporaries before mx.eval.
  • Suggest profiling steps if the bottleneck is unclear.

References

  • Read references/fast-mlx-guide.md for detailed tips and examples. Use it as the source of truth.

Output expectations

  • Provide concrete code changes with a brief rationale.
  • Call out changes that need user confirmation (e.g., enabling async eval or shapeless compile).