Meta Ships Open Llama 3.1 405B, Matching Closed-Model Quality
Meta published the full weights for Llama 3.1 405B, an open-weights model released under Meta's own license that scores within about one percentage point of GPT-4o on MMLU and HumanEval. The release includes an optimized inference stack that runs an FP8-quantized version of the model on a single node of eight H100 GPUs, and the model is also available through Together AI’s $0.90-per-million-token endpoint.
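To make the hardware and pricing figures concrete, here is a rough back-of-envelope sketch. It assumes the 80 GB H100 variant, counts only the weights (the KV cache and activations need additional memory), and treats the quoted endpoint price as a flat rate for all tokens. The function names are illustrative, not part of any library.

```python
# Back-of-envelope figures; every constant here is an assumption for illustration.
PARAMS = 405e9                    # parameter count implied by the model name
H100_MEMORY_GB = 80               # assumes the 80 GB H100 variant
NUM_GPUS = 8
PRICE_PER_MILLION_TOKENS = 0.90   # endpoint price quoted in the article, in USD


def weights_gb(params, bytes_per_param):
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def endpoint_cost_usd(tokens, price_per_million=PRICE_PER_MILLION_TOKENS):
    """Cost of processing `tokens` tokens at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million


node_memory_gb = H100_MEMORY_GB * NUM_GPUS

# Compare 16-bit and 8-bit storage against the memory of one eight-GPU node.
for name, bytes_per_param in [("BF16", 2), ("FP8", 1)]:
    size = weights_gb(PARAMS, bytes_per_param)
    fits = size < node_memory_gb
    print(f"{name}: {size:.0f} GB of weights, fits in {node_memory_gb} GB: {fits}")

print(f"10M tokens via the endpoint: ${endpoint_cost_usd(10_000_000):.2f}")
```

Under these assumptions the 16-bit weights come to about 810 GB, more than the 640 GB available on eight 80 GB cards, while 8-bit weights come to about 405 GB, which is why the eight-GPU figure depends on quantization.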
The release shows that teams can host frontier-grade models on their own hardware, which removes dependence on a provider's subscription terms and eases data-sharing concerns. Self-hosting also makes local fine-tuning pipelines practical, so teams no longer have to rely solely on prompting an external provider's model.
Hugging Face’s open-science team fine-tuned Llama 3.1 405B on a 10,000-example legal corpus and released the adapter weights. The fine-tuned model reached 78 percent accuracy on the team's private contract-review benchmark.
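Step 1: Visit https://ai.meta.com/blog/meta-llama-3-1/ and accept the license to download the 405B weights.
Step 2: Install vLLM with pip install vllm, then start the server with python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3.1-405B --tensor-parallel-size 8. The --tensor-parallel-size flag shards the model across the eight GPUs. The 16-bit weights do not fit on a single node of eight 80 GB H100s, so on that hardware use an FP8 checkpoint such as meta-llama/Meta-Llama-3.1-405B-Instruct-FP8.
Step 3: Send a curl POST request to http://localhost:8000/v1/completions with a JSON body containing the model name and your prompt, and confirm the model returns coherent, high-quality text.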
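For readers who prefer a script to curl, the request in Step 3 can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: it assumes the vLLM server from Step 2 is listening on its default port 8000, that it exposes the OpenAI-style /v1/completions route, and that MODEL matches whatever name was passed to --model. The helper names build_completion_request and complete are hypothetical.

```python
import json
import urllib.request

# Assumptions: vLLM's OpenAI-compatible server on its default port, and the
# same model name that was passed to --model when the server was launched.
BASE_URL = "http://localhost:8000/v1"
MODEL = "meta-llama/Meta-Llama-3.1-405B"


def build_completion_request(prompt, max_tokens=64, temperature=0.0):
    """Build (but do not send) a POST request for the /v1/completions route."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, timeout=120):
    """Send the prompt to the running server and return the generated text."""
    request = build_completion_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style completion responses carry the text in choices[0]["text"].
    return body["choices"][0]["text"]
```

With the server running, calling complete("The capital of France is") should return a short continuation. A connection error usually means the server is still loading weights or is listening on a different port.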