| name | description |
|---|---|
| fast-mlx | Optimize MLX code for performance and memory. Use when asked to implement or speed up MLX models or algorithms, reduce latency/throughput bottlenecks, tune lazy evaluation, type promotion, fast ops, compilation, memory use, or profiling. |
Fast MLX
Workflow
- Look for opportunities to compile functions made up mostly of elementwise operations.
- For models with fixed-shape inputs, or whose input shapes rarely change, compile the entire graph.
- Replace slow implementations with MLX fast ops.
- Identify evaluation boundaries and unintended sync points (`mx.eval`, `item()`, NumPy conversions).
- Check dtype promotion and scalar usage; keep precision consistent with intent.
- Review compilation strategy; avoid unnecessary recompiles and closure captures.
- Reduce peak memory via lazy loading order and releasing temporaries before `mx.eval`.
- Suggest profiling steps if the bottleneck is unclear.
References
- Read `references/fast-mlx-guide.md` for detailed tips and examples. Use it as the source of truth.
Output expectations
- Provide concrete code changes with a brief rationale.
- Call out changes that need user confirmation (e.g., enabling async eval or shapeless compile).