2026-06-07 BREAKTHROUGHS☀ AM

Mistral ships a 12-billion-parameter vision-language model that runs on a laptop GPU

📰 THE BRIEF

Pixtral 12B pairs a 12-billion-parameter language decoder with a 400-million-parameter vision encoder trained on 1.2 trillion image-text tokens. Quantized to 4-bit precision, it fits in 24 GB of VRAM and scores 78.4 percent on DocVQA without any cloud calls.
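
A quick way to sanity-check the memory claim is to estimate the size of the weights alone. The sketch below uses the parameter counts and the 24 GB budget from the brief. It assumes both the decoder and the vision encoder are stored at 4 bits per weight, and it ignores quantization metadata, activations, and the KV cache, so treat the result as a lower bound rather than a measured footprint.

```python
# Rough weight-only memory estimate for a quantized model.
# Parameter counts and the VRAM budget come from the brief; applying 4 bits to
# every weight (decoder and encoder alike) is an assumption of this sketch.

def weight_bytes(num_params: float, bits_per_param: int) -> float:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_param / 8

DECODER_PARAMS = 12e9    # language decoder
ENCODER_PARAMS = 400e6   # vision encoder
VRAM_BUDGET_GB = 24      # budget quoted in the brief (decimal GB)

total_gb = weight_bytes(DECODER_PARAMS + ENCODER_PARAMS, bits_per_param=4) / 1e9
headroom_gb = VRAM_BUDGET_GB - total_gb

# Prints: 4-bit weights: 6.2 GB, headroom: 17.8 GB
print(f"4-bit weights: {total_gb:.1f} GB, headroom: {headroom_gb:.1f} GB")
```

For comparison, weight_bytes(12e9, 16) comes to 24 GB, so at 16-bit precision the decoder weights alone would consume the entire budget.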

💡 WHY IT MATTERS

You can move private document and image workflows off shared servers. Local inference removes per-token fees and keeps sensitive files inside your network boundary.

👥 WHO'S DOING IT

Mistral reports that within two weeks of release the model reached 180,000 downloads on Hugging Face and was running in production at a French legal-tech startup that processes 40,000 contracts per day.
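
To put the daily volume in perspective, the figure can be converted into a time budget per contract. The sketch below assumes a single worker processing contracts one at a time around the clock. The brief does not describe the startup's actual deployment, so this is only an illustration of the arithmetic.

```python
# Convert a daily volume into a per-item time budget.
# Assumption of this sketch: one sequential worker running 24 hours a day.

SECONDS_PER_DAY = 24 * 60 * 60
CONTRACTS_PER_DAY = 40_000  # figure from the brief

def seconds_per_item(items_per_day: int) -> float:
    """Average time available per item for a single sequential worker."""
    return SECONDS_PER_DAY / items_per_day

budget_s = seconds_per_item(CONTRACTS_PER_DAY)

# Prints: 2.16 s per contract
print(f"{budget_s:.2f} s per contract")
```

Under that assumption each contract has to finish in about 2.16 seconds on average, which is close to the two-second response time quoted for an RTX 4090.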

⚡ TRY IT

Step 1: Install the mistral-inference package with pip install mistral-inference.
Step 2: Download the 4-bit weights with mistral-download --model pixtral-12b.
Step 3: Run python -m pixtral.chat --image contract.pdf --prompt "Extract total amount" and confirm the answer appears in under two seconds on an RTX 4090.
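
One way to check the two-second target from Step 3 is to time the command from a small script. The sketch below is a generic wall-clock timer built on the Python standard library. The helper name time_command is hypothetical, and the command line is copied from Step 3 as written, without verifying that the module or its flags exist. Wall-clock time also includes interpreter startup and model loading, so it is an upper bound on the generation time itself.

```python
import subprocess
import sys
import time

def time_command(argv: list[str]) -> tuple[float, int]:
    """Run argv once and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    result = subprocess.run(argv, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return elapsed, result.returncode

if __name__ == "__main__":
    # Command taken verbatim from Step 3 of the brief; it is not verified here.
    cmd = [
        sys.executable, "-m", "pixtral.chat",
        "--image", "contract.pdf",
        "--prompt", "Extract total amount",
    ]
    elapsed, code = time_command(cmd)
    if code != 0:
        print(f"command failed with exit code {code}")
    else:
        verdict = "within" if elapsed < 2.0 else "over"
        print(f"{elapsed:.2f} s wall clock, {verdict} the two-second target")
```

Running the command several times and discarding the first run gives a fairer picture, since the first invocation typically pays one-off costs such as reading the weights from disk.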
