## Why Run AI Locally?
Running LLMs on your own hardware gives you complete privacy, zero API costs, and offline access. With Apple Silicon, it's surprisingly capable.
## Prerequisites
- MacBook with M1/M2/M3/M4 chip (16GB+ RAM recommended)
- Homebrew installed
- ~20GB free disk space
## Step 1: Install Ollama

The easiest way to get started:

```shell
# Install and start the server (listens on localhost:11434 by default)
brew install ollama
ollama serve
```
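Before pulling models, it's worth confirming the server is actually up. A small Python sketch (standard library only; the function name is mine) that probes Ollama's default port:

```python
import urllib.error
import urllib.request

def ollama_running(host="http://localhost:11434", timeout=2):
    """Return True if a local Ollama server answers on its default port."""
    try:
        with urllib.request.urlopen(host, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

print(ollama_running())  # True once `ollama serve` is running
```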
## Step 2: Pull a Model

```shell
# Fast and capable general-purpose model
ollama pull llama3.2

# For coding tasks
ollama pull deepseek-coder-v2

# Smaller, faster option
ollama pull phi3
```
## Step 3: Start Chatting

```shell
ollama run llama3.2
```
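Beyond the interactive CLI, the same server exposes an HTTP API, which makes scripting easy. A minimal sketch using only the standard library (the helper names are mine; `/api/generate` returns a single complete reply when `stream` is false):

```python
import json
import urllib.request

def build_payload(prompt, model="llama3.2"):
    """Request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="llama3.2", host="http://localhost:11434"):
    """Send one prompt to a local Ollama server and return the full reply."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires `ollama serve` running:
# print(generate("Explain quantization in one sentence."))
```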
## Performance Benchmarks
| Model | RAM Usage | Tokens/sec (M3 Pro) |
|---|---|---|
| Llama 3.2 3B | 5.2 GB | 42 t/s |
| DeepSeek Coder V2 | 8.1 GB | 28 t/s |
| Phi-3 Mini | 2.8 GB | 65 t/s |
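To reproduce numbers like these on your own machine, note that each Ollama API response includes `eval_count` (tokens generated) and `eval_duration` (nanoseconds), from which throughput follows directly:

```python
def tokens_per_second(eval_count, eval_duration_ns):
    """Generation throughput from the counters in an Ollama API response."""
    return eval_count / eval_duration_ns * 1e9

# e.g. 126 tokens generated in 3.0e9 ns -> 42.0 tokens/sec
print(tokens_per_second(126, 3.0e9))
```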
## Pro Tips
- Use quantized models (Q4_K_M) for the best speed/quality balance
- Close memory-heavy apps before running larger models
- Use Open WebUI for a ChatGPT-like interface on localhost
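To judge whether a model will fit before pulling it, here is a back-of-envelope estimate (my own rule of thumb, not an official figure): weight memory is roughly parameter count times bits per weight, with Q4_K_M averaging around 4.5 bits per weight. Actual usage is higher once the KV cache and runtime overhead are added.

```python
def approx_weight_ram_gb(params_billions, bits_per_weight=4.5):
    """Rough weight footprint in GB; excludes KV cache and runtime overhead."""
    return params_billions * bits_per_weight / 8

print(approx_weight_ram_gb(7))  # ~3.9 GB of weights for a 7B model at Q4_K_M
```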
The local AI revolution is here, and your MacBook is more than ready for it.