feat(wan): Add Wan2.1/2.2 T2V with quantization support
26
.github/skills/fast-mlx/SKILL.md
vendored
Normal file
@@ -0,0 +1,26 @@
---
name: fast-mlx
description: Optimize MLX code for performance and memory. Use when asked to implement or speed up MLX models or algorithms, reduce latency/throughput bottlenecks, tune lazy evaluation, type promotion, fast ops, compilation, memory use, or profiling.
---

# Fast MLX

## Workflow

- Look for opportunities to compile functions made up mostly of elementwise operations.
- For models with fixed-shape inputs, or where the shapes rarely change, compile the entire graph.
- Replace slow implementations with MLX fast ops.
- Identify evaluation boundaries and unintended sync points (`mx.eval`, `item()`, NumPy conversions).
- Check dtype promotion and scalar usage; keep precision consistent with intent.
- Review compilation strategy; avoid unnecessary recompiles and closure captures.
- Reduce peak memory via lazy loading order and releasing temporaries before `mx.eval`.
- Suggest profiling steps if the bottleneck is unclear.

## References

- Read `references/fast-mlx-guide.md` for detailed tips and examples. Use it as the source of truth.

## Output expectations

- Provide concrete code changes with a brief rationale.
- Call out changes that need user confirmation (e.g., enabling async eval or shapeless compile).