2026-05-31 BREAKTHROUGHS PM

New algorithm slashes AI energy by 100x while raising accuracy

📰 THE BRIEF

Researchers replaced standard matrix multiplications with a sparse, event-driven method that activates only 1 percent of weights per token. On GPT-2-scale models the technique cut energy from 0.8 joules per token to 0.008 joules per token, a 100x reduction, while lifting GLUE scores by 1.4 points.
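To make "activates only 1 percent of weights per token" concrete, here is a minimal sketch in plain Python. The brief does not say how the method chooses which weights fire, so the selection rule below (keep the largest-magnitude inputs and read only their weight columns) is an assumption for illustration, and `sparse_event_matvec` is a hypothetical name, not a function from the released code.

```python
import random


def sparse_event_matvec(weights, x, active_fraction=0.01):
    """Approximate weights @ x using only the largest-magnitude inputs.

    Assumed selection rule for illustration; the published method may differ.
    """
    k = max(1, round(active_fraction * len(x)))
    # The k inputs with the largest magnitude are treated as the "events".
    active = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k]
    # Each output reads only the weight columns that belong to active inputs.
    y = [sum(row[i] * x[i] for i in active) for row in weights]
    return y, active


random.seed(0)
n_in, n_out = 1000, 8
weights = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
x = [random.gauss(0, 1) for _ in range(n_in)]

y, active = sparse_event_matvec(weights, x)
touched = len(active) * n_out / (n_in * n_out)
print(f"active inputs: {len(active)}, weights touched: {touched:.1%}")
```

With 1,000 inputs and `active_fraction=0.01`, ten inputs fire and each output row reads ten of its 1,000 weights. A dense layer would read all of them, which is where a saving of this kind would come from if the hardware can skip the idle weights.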

💡 WHY IT MATTERS

Energy cost per inference becomes a first-class optimization target rather than an afterthought. Builders should audit which layers actually fire for each task and prune the ones that stay idle. This reframes model selection from accuracy alone to accuracy per joule.
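One way to act on "accuracy per joule" is to rank candidate models by that ratio once they clear a minimum accuracy bar. The sketch below assumes that framing. The energy figures are the ones quoted in the brief; the baseline score of 80.0 is a placeholder (the brief only reports a 1.4-point lift), and `Candidate` and `pick_model` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float           # benchmark score; the values below are placeholders
    joules_per_token: float   # measured energy per generated token


def accuracy_per_joule(candidate):
    return candidate.accuracy / candidate.joules_per_token


def pick_model(candidates, min_accuracy):
    """Return the most energy-efficient candidate that meets the accuracy bar."""
    eligible = [c for c in candidates if c.accuracy >= min_accuracy]
    if not eligible:
        raise ValueError("no candidate meets the accuracy bar")
    return max(eligible, key=accuracy_per_joule)


candidates = [
    Candidate("dense", accuracy=80.0, joules_per_token=0.8),     # placeholder score
    Candidate("sparse", accuracy=81.4, joules_per_token=0.008),  # placeholder + 1.4
]

best = pick_model(candidates, min_accuracy=80.0)
print(best.name, round(accuracy_per_joule(best)))
```

Keeping the accuracy floor separate from the ratio stops a very cheap but inaccurate model from winning on efficiency alone.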

👥 WHO'S DOING IT

The Sparse Inference Lab at MIT published the method and open-sourced the training script at github.com/mit-sparse/sparse-llm. Early adopters at Stanford’s Hazy Research group reproduced the 100x saving on a 7B Llama variant running on an A100.

⚡ TRY IT

Step 1: Clone github.com/mit-sparse/sparse-llm and install it with `pip install -e .`.

Step 2: Run `python train_sparse.py --model gpt2 --sparsity 0.99 --dataset wikitext` to produce a sparse checkpoint.

Step 3: On an NVIDIA A100, run `python infer.py --checkpoint sparse-gpt2.pt` while logging power draw with `nvidia-smi`. Because `nvidia-smi` reports watts rather than joules, multiply average power by run time, divide by the number of tokens generated, and compare the resulting joules per token with the dense baseline.
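The measurement in Step 3 can be sketched as a small polling script. It samples board power through `nvidia-smi --query-gpu=power.draw`, integrates watts over time to get joules, and divides by a token count. The helper names are hypothetical, and the token count is something you supply yourself, since the brief does not say what `infer.py` prints.

```python
import subprocess
import time


def read_power_watts(gpu_index=0):
    """Current board power draw in watts, as reported by nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}", "--query-gpu=power.draw",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip().splitlines()[0])


def integrate_energy(samples):
    """Trapezoidal integral of (seconds, watts) samples, returned in joules."""
    return sum(
        (t1 - t0) * (p0 + p1) / 2.0
        for (t0, p0), (t1, p1) in zip(samples, samples[1:])
    )


def joules_per_token(command, tokens_generated, interval_s=0.1,
                     read_power=read_power_watts):
    """Run `command`, sample GPU power until it exits, return joules per token."""
    samples = []
    proc = subprocess.Popen(command)
    while True:
        samples.append((time.monotonic(), read_power()))
        if proc.poll() is not None:  # process finished; last sample is recorded
            break
        time.sleep(interval_s)
    if proc.returncode != 0:
        raise RuntimeError(f"command exited with status {proc.returncode}")
    return integrate_energy(samples) / tokens_generated
```

A call such as `joules_per_token(["python", "infer.py", "--checkpoint", "sparse-gpt2.pt"], tokens_generated=1000)` would give the sparse figure, where 1000 stands in for however many tokens your run actually produced. Repeat with the dense checkpoint for the baseline. The reading covers the whole board, idle draw included, and polling at this rate is coarse, so treat the result as an estimate and use runs long enough to average out the noise.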
