Prerequisites
- A Mac Mini M4 (16 GB or 24 GB unified memory recommended)
- macOS 15.x (Sequoia)
- An Apple ID and admin access
- A Tailscale account if you want remote access
Steps
- Install Ollama
Ollama ships a native macOS app. Download it from the official site, drag it to Applications, and launch it once so the menu bar app starts the background server.
    # Verify the install at the CLI:
    ollama --version   # 0.4.7 or later expected
- Pull a sensible default model
qwen2.5-coder:7b-instruct-q4_K_M is a strong starting point on the M4. It uses about 5 GB of resident memory, which leaves headroom for other applications even on the 16 GB model, and it generates roughly 25–30 tokens/s.
    ollama pull qwen2.5-coder:7b-instruct-q4_K_M
    ollama run qwen2.5-coder:7b-instruct-q4_K_M "Write a haiku about Apple Silicon."
- Make Ollama listen on the Tailscale interface
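By default Ollama binds to 127.0.0.1, so only the Mac Mini itself can reach it. To reach it over Tailscale, set OLLAMA_HOST to 0.0.0.0. Be aware that 0.0.0.0 listens on every interface, including the local network, while Tailscale ACLs only govern traffic that arrives over the tailnet; use the macOS firewall, or a LAN you trust, to cover the rest. If your install runs Ollama from a launchd plist, set the variable there; otherwise set it with launchctl setenv.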
    # Quick option: set it for the current login session (it does not survive a reboot):
    launchctl setenv OLLAMA_HOST 0.0.0.0
    # Then restart Ollama (cmd-Q the menu bar app and relaunch).
    # Verify:
    curl http://$(hostname -s):11434/api/tags
- Install Tailscale
Use the Mac App Store version and sign in; the Mac Mini then joins your tailnet. Other devices on the tailnet can now reach Ollama at the Mac Mini's Tailscale IP on port 11434.
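As a rough sketch of what a client on another tailnet device might run, the helpers below list the installed models and send one non-streaming prompt. The function names and the 100.64.0.1 address are placeholders invented for illustration (substitute the address the Tailscale app shows for the Mac Mini); /api/tags and /api/generate are endpoints of Ollama's HTTP API.

```shell
#!/bin/sh
# Hypothetical helpers for talking to Ollama from another tailnet device.
# None of these function names come from Ollama or Tailscale.

# Build the base URL. $1 = host or IP, $2 = port (defaults to Ollama's 11434).
ollama_base_url() {
    printf 'http://%s:%s\n' "$1" "${2:-11434}"
}

# List the models installed on the server. $1 = host or IP.
ollama_list_models() {
    curl -sS "$(ollama_base_url "$1")/api/tags"
}

# Send one non-streaming prompt. $1 = host or IP, $2 = model name.
# The prompt is fixed so this sketch needs no JSON escaping.
ollama_haiku() {
    curl -sS "$(ollama_base_url "$1")/api/generate" \
        -H 'Content-Type: application/json' \
        -d "{\"model\": \"$2\", \"prompt\": \"Write a haiku about Apple Silicon.\", \"stream\": false}"
}
```

For example, ollama_haiku 100.64.0.1 qwen2.5-coder:7b-instruct-q4_K_M should return a single JSON object whose response field holds the generated text. A connection error here usually points at the bind address or the tailnet ACL rather than the model.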
- Set thermal expectations
The M4 Mac Mini is near-silent under most workloads, but sustained 7B–13B inference will spin up the fan. The chip throttles only at extreme ambient temperatures. For 24/7 deployment, place it on a hard surface, not a soft pad, and leave 5 cm of clearance behind it for airflow.
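To check the throughput figure quoted for the default model, and to see whether it holds under sustained load, one option is to compute tokens per second from the eval_count and eval_duration fields that a non-streaming /api/generate response reports (eval_duration is in nanoseconds). The function name below is made up for this sketch.

```shell
#!/bin/sh
# tokens_per_second: hypothetical helper, not part of Ollama.
# $1 = eval_count (tokens generated), $2 = eval_duration (nanoseconds).
tokens_per_second() {
    awk -v count="$1" -v ns="$2" 'BEGIN {
        # Guard against a zero or missing duration.
        if (ns <= 0) { print "0.0"; exit }
        # Convert nanoseconds to seconds, then divide.
        printf "%.1f\n", count / (ns / 1e9)
    }'
}
```

A response with eval_count 280 and eval_duration 10000000000 works out to 28.0 tokens/s. Running ollama run with the --verbose flag prints a comparable eval rate interactively, which is quicker for a one-off check.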
Troubleshooting
- Ollama uses unified memory aggressively and slows the desktop
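- Set the environment variable OLLAMA_NUM_PARALLEL=1 so that each loaded model serves one request at a time; every extra parallel slot enlarges the memory reserved for context. To also stop several models from being resident at once, set OLLAMA_MAX_LOADED_MODELS=1. Restart Ollama after changing either.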
- Models slower than expected after a model swap
- Ollama unloads a model once it has been idle (5 minutes by default), so the next request after a swap or a pause reloads it from disk. To keep a model resident, set OLLAMA_KEEP_ALIVE=24h. The model then stays loaded until 24 hours after its last use, or until it has to be unloaded to free memory.
- Tailscale ACL refusing the connection
- The default Tailscale ACL permits all-to-all traffic; if you've tightened it, ensure your client device is allowed to reach the Mac Mini on TCP 11434.
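The OLLAMA_NUM_PARALLEL and OLLAMA_KEEP_ALIVE settings above can be applied the same way as OLLAMA_HOST. The sketch below only prints the launchctl commands so they can be reviewed before anything changes; the function name is invented for this example, and values set with launchctl setenv are assumed to last only until the next logout or reboot.

```shell
#!/bin/sh
# Hypothetical helper: print the launchctl commands for the settings
# suggested above, one per line, without running them.
ollama_tuning_commands() {
    # printf reuses its format string for each name/value pair.
    printf 'launchctl setenv %s %s\n' \
        OLLAMA_NUM_PARALLEL 1 \
        OLLAMA_KEEP_ALIVE 24h
}
```

Piping the output to sh on the Mac Mini (ollama_tuning_commands | sh) applies the settings. Quit and relaunch Ollama afterwards; ollama ps then lists which models are loaded and when each is due to be unloaded.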
Where to go from here
Pair this with the Hermes Agent installation guide on the same Mac Mini for a complete self-hosted AI host.