MoE Expert Pruning: What Works, What Doesn't, and What We Still Don't Know
2026-05-11
A survey of expert pruning techniques for sparse Mixture-of-Experts language models, covering why pruning works, the pruning-vs-merging debate, how to score expert importance, the strategies that matter, and where the standard story breaks.
moe, expert-pruning, sparsification, efficient-inference, model-deployment, mixtral, nllb