Sparse KV Caches Cut Attention Scaling
Sparse key‑value caches replace the quadratic cost of softmax attention with one that grows near‑linearly in sequence length. Each query attends only to a small top‑k subset of blockwise KV memories, so the per‑query work no longer scales with the full context. This tiny change flips the scal
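
As a rough illustration of the top‑k blockwise idea, here is a minimal pure‑Python sketch. It is not taken from any particular implementation. The names `block_summaries`, `sparse_attention`, and `dense_attention` are hypothetical, and so are the choices to summarize each block by the mean of its keys and to rank blocks by a dot product with the query. The text does not specify how blocks are scored, so treat those choices as assumptions.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_summaries(keys, block_size):
    """Mean key per block (an assumed summary, not specified by the text)."""
    dim = len(keys[0])
    out = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        out.append([sum(k[d] for k in block) / len(block) for d in range(dim)])
    return out


def attend(query, keys, values, idx):
    """Scaled softmax attention restricted to the key/value positions in idx."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[j]) * scale for j in idx])
    return [sum(w * values[j][d] for w, j in zip(weights, idx))
            for d in range(len(values[0]))]


def sparse_attention(query, keys, values, block_size, top_k):
    """Attend only to keys in the top_k best-scoring blocks.

    Returns (output, query_dot_products). Summaries are rebuilt here for
    simplicity; a real cache would presumably maintain them incrementally.
    """
    summaries = block_summaries(keys, block_size)
    scores = [dot(query, s) for s in summaries]
    chosen = sorted(range(len(summaries)), key=lambda b: scores[b], reverse=True)[:top_k]
    idx = [j for b in sorted(chosen)
           for j in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    return attend(query, keys, values, idx), len(summaries) + len(idx)


def dense_attention(query, keys, values):
    """Baseline: attend to every position."""
    return attend(query, keys, values, list(range(len(keys))))


random.seed(0)
n, d, block_size, top_k = 1024, 8, 32, 4
keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
query = [random.gauss(0, 1) for _ in range(d)]

_, cost = sparse_attention(query, keys, values, block_size, top_k)
print(cost, "query dot products (sparse) vs", n, "(dense)")  # 160 vs 1024
```

In this sketch the query still scores every block summary, so the per‑query count is n/B + k·B dot products (32 + 128 = 160 here) rather than n. The n/B term shrinks with larger blocks but does not vanish. Removing it entirely would need some additional index over the summaries, which is outside what this example shows.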









