LLM/VLM Compression Foundations
2026-05-10
A working notebook on compressing LLMs and VLMs: why overparameterization is the precondition, how pruning, quantization, and knowledge distillation interact, why the prune-then-distill-then-quantize (P-KD-Q) ordering dominates, and where compression breaks.
pruning, quantization, knowledge-distillation, token-compression, vision-language-models, neural-architecture-search, hardware-aware-ml