Meta Open-Sources Llama 3 8B and 70B for Local Deployment
Meta released Llama 3 8B and 70B weights under a permissive license on April 18, 2024. The 8B model fits on a MacBook M1 with 16 GB RAM while the 70B model runs on a single A100 80 GB GPU. Both models match or exceed GPT-3.5 performance on MMLU and HumanEval.
Developers realize they can host private models without sending data to external APIs. This changes risk calculations around data privacy and recurring inference costs. Local deployment also enables offline experimentation and fine-tuning on proprietary datasets.
The Ollama project packaged Llama 3 8B for one-command local install and reports over 1.2 million monthly downloads with average inference latency of 28 tokens per second on consumer laptops.
Step 1: Install Ollama from ollama.com. Step 2: Run the command ollama run llama3:8b in your terminal. Step 3: Enter prompts directly in the chat interface or connect via the local REST API at http://localhost:11434 for custom applications.