Meta drops Llama 3.1 405B, the largest open-weights model yet
Meta released Llama 3.1 405B on July 23, 2024. Meta reports that the model matches GPT-4 on benchmarks such as MMLU and HumanEval. Because the weights are open, the model can run fully on local hardware or through low-cost hosted endpoints from Groq and Together AI, so users are no longer tied to a closed lab's per-token billing.
Access to frontier-grade weights removes the pay-per-token barrier. Teams can now fine-tune on private data and run inference on their own hardware without external rate limits. This shifts cost control and data privacy decisions back to the builder.
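Running a model on your own hardware is mostly a memory question. Here is a back-of-the-envelope sketch, not a sizing tool. It counts weight memory only (parameters times bytes per parameter). It assumes 80 GB GPUs with an assumed 90 percent usable fraction and ignores KV cache, activations, and runtime overhead. The helper names `weight_memory_gb` and `min_gpus` are hypothetical.

```python
import math

# Assumed bytes per parameter for common weight formats (weights only).
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def min_gpus(num_params: float, dtype: str, gpu_mem_gb: float = 80.0,
             usable_fraction: float = 0.9) -> int:
    """Lower bound on GPU count whose combined usable memory holds the weights.

    usable_fraction is an assumed headroom factor. KV cache, activations,
    and framework overhead are ignored, so real deployments need more.
    """
    usable_per_gpu = gpu_mem_gb * usable_fraction
    return math.ceil(weight_memory_gb(num_params, dtype) / usable_per_gpu)


if __name__ == "__main__":
    params = 405e9  # nominal parameter count of Llama 3.1 405B
    for dtype in ("bf16", "fp8"):
        print(f"{dtype}: {weight_memory_gb(params, dtype):.0f} GB of weights, "
              f"at least {min_gpus(params, dtype)} x 80 GB GPUs")
```

Under these assumptions, the BF16 weights (810 GB) need at least 12 such GPUs, which is more than one 8-GPU node. FP8 halves the weight footprint and brings the lower bound down to 6.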
Hugging Face hosts the official 405B weights and reports over 1.2 million downloads in the first week. Startups like Perplexity have already deployed distilled versions to serve enterprise search at 60 percent lower inference cost.
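Step 1: Visit huggingface.co/meta-llama/Meta-Llama-3.1-405B and accept the license. The repository is gated, so also authenticate locally with `huggingface-cli login` using a Hugging Face access token.
Step 2: Run `huggingface-cli download meta-llama/Meta-Llama-3.1-405B --local-dir ./llama-405b`. The BF16 weights alone are more than 800 GB, so check disk space first.
Step 3: Launch an OpenAI-compatible server with vLLM using `python -m vllm.entrypoints.openai.api_server --model ./llama-405b`, adding `--tensor-parallel-size` set to your GPU count, because the model will not fit on a single GPU. In BF16 the weights exceed one 8x80 GB node, so plan for multi-node serving or a quantized FP8 variant. Then send a test prompt to http://localhost:8000/v1/completions.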
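You can also send the test prompt from Step 3 from code. Here is a minimal client sketch that uses only Python's standard library. It assumes the vLLM server runs on its default port 8000 and exposes the OpenAI-compatible `/v1/completions` route. It also assumes the server uses the value passed to `--model` (here `./llama-405b`) as the model name. The function names are hypothetical. The sketch targets the plain completions route, which suits a base (non-Instruct) checkpoint.

```python
import json
import urllib.request

# Assumptions: vLLM's default port and the served name taken from --model.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL_NAME = "./llama-405b"


def build_completion_request(prompt: str, max_tokens: int = 64,
                             temperature: float = 0.0) -> bytes:
    """Encode an OpenAI-style /v1/completions payload as JSON bytes."""
    payload = {
        "model": MODEL_NAME,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,  # 0.0 gives greedy, repeatable output
    }
    return json.dumps(payload).encode("utf-8")


def complete(prompt: str, timeout: float = 300.0) -> str:
    """POST the prompt to the local server and return the generated text."""
    req = urllib.request.Request(
        BASE_URL,
        data=build_completion_request(prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Completions responses list generated outputs under "choices".
    return body["choices"][0]["text"]
```

Once the server is up, `print(complete("The capital of France is"))` should print the model's continuation. The first request to a model this size can be slow, which is why the timeout is generous.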