LLM/VLM Compression Foundations
2026-05-10
A working notebook on compressing LLMs and VLMs: why overparameterization is the precondition, how pruning, quantization, and knowledge distillation interact, why the prune-then-distill-then-quantize (P-KD-Q) ordering dominates, and where compression breaks.
pruning, quantization, knowledge-distillation, token-compression, vision-language-models, neural-architecture-search, hardware-aware-ml