New algorithm slashes AI energy demand by two orders of magnitude
Researchers replaced standard matrix multiplications with a sparse, block-wise method that skips 90 percent of the arithmetic while matching or improving accuracy. Tested on transformer models, the technique cut energy use from 100 joules per inference to roughly one joule. The paper reports that the change applies to both training and inference workloads.
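To make the idea of skipping blocks concrete, here is a minimal NumPy sketch of a block-wise sparse matrix multiplication. It is an illustration of the general technique, not the researchers' implementation: the function name block_sparse_matmul, the random block mask, and the sizes (80 by 80 weights, 8 by 8 blocks, 10 of 100 blocks kept) are all assumptions chosen so that exactly 90 percent of the multiply-adds are skipped.

```python
import numpy as np


def block_sparse_matmul(x, w, block_mask, block_size):
    """Compute x @ w, skipping every weight block whose mask entry is False.

    x:          (batch, in_features) activations
    w:          (in_features, out_features) weights
    block_mask: (in_features // block_size, out_features // block_size) booleans
    Returns the output and the number of multiply-adds actually performed.
    """
    batch = x.shape[0]
    out = np.zeros((batch, w.shape[1]), dtype=x.dtype)
    macs = 0
    # Visit only the blocks that are kept; masked-out blocks cost nothing.
    for bi, bj in zip(*np.nonzero(block_mask)):
        rows = slice(bi * block_size, (bi + 1) * block_size)
        cols = slice(bj * block_size, (bj + 1) * block_size)
        out[:, cols] += x[:, rows] @ w[rows, cols]
        macs += batch * block_size * block_size
    return out, macs


# Illustrative sizes (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
batch, in_features, out_features, block_size = 4, 80, 80, 8
x = rng.standard_normal((batch, in_features))
w = rng.standard_normal((in_features, out_features))

# Keep 10 of the 100 weight blocks, chosen at random for the demo.
n_blocks = (in_features // block_size) * (out_features // block_size)
block_mask = np.zeros(n_blocks, dtype=bool)
block_mask[rng.choice(n_blocks, size=10, replace=False)] = True
block_mask = block_mask.reshape(in_features // block_size, out_features // block_size)

out, macs = block_sparse_matmul(x, w, block_mask, block_size)

# Reference: a dense multiply against the same weights with dropped blocks zeroed.
elem_mask = np.repeat(np.repeat(block_mask, block_size, axis=0), block_size, axis=1)
dense = x @ (w * elem_mask)

dense_macs = batch * in_features * out_features
skipped = 1 - macs / dense_macs
print(f"multiply-adds: {macs} of {dense_macs} ({skipped:.0%} skipped)")
```

The sketch produces the same result as a dense multiply against the masked weights while doing one tenth of the multiply-adds. Skipping 90 percent of the arithmetic accounts for a tenfold reduction in operations; the article does not say where the rest of the reported hundredfold energy saving comes from.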
Teams can now train or serve a model of the same size on far smaller GPUs or on edge devices, which undercuts the assumption that higher accuracy must cost more power. As a result, cost and sustainability calculations shift from scaling hardware to redesigning algorithms.
A group at MIT published the method in April 2024. Applied to BERT-base on an A100, it cut training energy from 2.4 kWh to 24 Wh while raising the GLUE score by 0.3 points.
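The training figures mix two units, so a quick check of the arithmetic helps. The short Python sketch below uses only the two numbers quoted above plus standard unit conversions; the variable names are illustrative.

```python
import math

WH_PER_KWH = 1000      # 1 kWh = 1000 Wh
JOULES_PER_WH = 3600   # 1 Wh = 1 W sustained for 3600 s

before_wh = 2.4 * WH_PER_KWH   # 2.4 kWh reported before the change
after_wh = 24.0                # 24 Wh reported after the change

reduction_factor = before_wh / after_wh
orders_of_magnitude = math.log10(reduction_factor)

# The same figures in joules, for comparison with the per-inference numbers.
before_joules = before_wh * JOULES_PER_WH
after_joules = after_wh * JOULES_PER_WH

print(f"reduction: {reduction_factor:.0f}x ({orders_of_magnitude:.0f} orders of magnitude)")
print(f"before: {before_joules / 1e6:.2f} MJ, after: {after_joules / 1e3:.1f} kJ")
```

The training reduction works out to a factor of 100, the same ratio as the 100 joules to one joule reported per inference.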
Step 1: Clone the repository at https://github.com/mit-han-lab/sparse-gemm.
Step 2: Replace the linear layers in your PyTorch model with the supplied SparseLinear module.
Step 3: Run the same training script and compare watt-hours on the same GPU before and after the swap.
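For the comparison in Step 3, one way to arrive at watt-hours is to log GPU power draw at intervals during each run and integrate it over time. The sketch below assumes you already have (time in seconds, power in watts) samples, for example collected by polling nvidia-smi --query-gpu=power.draw --format=csv during training. The watt_hours helper and the sample readings are hypothetical and are not part of the repository.

```python
from typing import Sequence, Tuple


def watt_hours(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate (time_seconds, power_watts) samples with the trapezoidal rule.

    Returns the energy in watt-hours. Samples must be sorted by time.
    """
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        joules += 0.5 * (p0 + p1) * (t1 - t0)
    return joules / 3600.0  # 1 Wh = 3600 J


# Made-up readings for illustration only; substitute your own logs.
dense_run = [(0.0, 300.0), (1800.0, 300.0), (3600.0, 300.0)]  # 300 W for one hour
sparse_run = [(0.0, 100.0), (900.0, 100.0)]                   # 100 W for 15 minutes

dense_wh = watt_hours(dense_run)
sparse_wh = watt_hours(sparse_run)
ratio = dense_wh / sparse_wh

print(f"dense: {dense_wh:.1f} Wh, sparse: {sparse_wh:.1f} Wh, ratio: {ratio:.1f}x")
```

Sampling both runs on the same GPU, with the same script and the same polling interval, keeps the two totals comparable.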