I Built a Neural Network Inference Engine From Scratch in C++ (No PyTorch, No ONNX, Just AVX2)
Why does inference need a framework at all? Every time I ran a tiny linear model through PyTorch, I felt like I was driving a go-kart with a jet engine strapped to it. The model was a few hundred KB. PyTorch's runtime was gigabytes. Somewhere between model(x) and the actual floating-point math, an e


