Meta Drops the 405-Billion-Parameter Llama 3.1 for Free Commercial Use
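Meta published the Llama 3.1 405B weights under the Llama 3.1 Community License, which permits commercial use, fine-tuning, and redistribution subject to the license's conditions. With 4-bit quantization the weights fit on a single node of eight 80 GB H100 GPUs. Meta reports 87.3 percent on 5-shot MMLU for the instruction-tuned model, matching or exceeding GPT-4 on several academic benchmarks. Those scores are for the unquantized model, and quantization can lower them, so they should not be assumed to carry over unchanged to a 4-bit deployment.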
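The hardware claim can be sanity-checked with simple arithmetic. The sketch below estimates the memory needed for the weights alone at different precisions and compares it with the 640 GB available across eight 80 GB GPUs. It is a rough estimate under stated assumptions: it ignores the KV cache, activations, and quantization metadata, all of which need additional headroom, and the helper names are illustrative.

```python
# Rough weight-memory estimate for a 405B-parameter model.
# Assumption: only the weights are counted; KV cache, activations, and
# quantization metadata need extra memory on top of these figures.

PARAMS = 405e9        # parameter count, from the model name
GPU_MEMORY_GB = 80    # memory per H100 80 GB card
NUM_GPUS = 8


def weight_footprint_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


def fits_on_node(num_params: float, bits_per_param: int,
                 num_gpus: int = NUM_GPUS,
                 gpu_memory_gb: float = GPU_MEMORY_GB) -> bool:
    """True if the weights alone fit in the node's aggregate GPU memory."""
    return weight_footprint_gb(num_params, bits_per_param) < num_gpus * gpu_memory_gb


for bits in (16, 8, 4):
    size = weight_footprint_gb(PARAMS, bits)
    verdict = "fits" if fits_on_node(PARAMS, bits) else "does not fit"
    print(f"{bits:>2}-bit weights: {size:6.1f} GB, {verdict} in "
          f"{NUM_GPUS * GPU_MEMORY_GB} GB")
```

At 16 bits the weights alone come to 810 GB, more than one such node holds. At 4 bits they take 202.5 GB, which leaves room for the KV cache and activations.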
Developers can now reach frontier-level performance without a hosted API key or usage caps. Local deployment keeps prompts and data on your own hardware and lets teams iterate on custom fine-tunes without per-token costs.
Hugging Face hosts the weights and reports over 250,000 downloads in the first week. LMSYS, an independent lab, added the model to its Chatbot Arena, where it ranks within the top five public models on blind user votes.
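Step 1: Visit https://huggingface.co/meta-llama/Meta-Llama-3.1-405B and accept the license terms. The weights are gated, so the download in Step 2 also needs a Hugging Face access token from the account that accepted the license.
Step 2: Run the Hugging Face text-generation-inference Docker image with --model-id meta-llama/Meta-Llama-3.1-405B-Instruct --quantize bitsandbytes-nf4. The launcher's flags are --model-id and --quantize. The plain bitsandbytes option selects 8-bit quantization, while bitsandbytes-nf4 selects 4-bit.
Step 3: Send a prompt to the localhost endpoint. The expected outcome is coherent multi-paragraph output generated entirely on your own hardware.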
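Step 3 can be sketched in Python using only the standard library. The example posts a prompt to the server's /generate route and reads back the generated_text field of the JSON response. The address http://localhost:8080 is an assumption that depends on how the container's port was published, and the helper names build_request and generate are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumption: the container's HTTP port is published on localhost:8080.
# Change this to match the port mapping used when starting the container.
TGI_URL = "http://localhost:8080/generate"


def build_request(prompt: str, max_new_tokens: int = 256,
                  url: str = TGI_URL) -> urllib.request.Request:
    """Build the POST request for the text-generation-inference /generate route."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, max_new_tokens: int = 256,
             url: str = TGI_URL, timeout: float = 300.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_request(prompt, max_new_tokens, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["generated_text"]
```

With the server running, a call such as generate("Explain quantization in two paragraphs.") returns the model's reply as a string. The generous default timeout is a guess, on the assumption that a model of this size can take a while to produce long outputs.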