AI MODELS
NestAI runs open-source AI models directly on your server using Ollama. You control which models are installed — and your data never leaves your infrastructure. All speeds below are for a single user on CPU. Concurrent users share CPU and will see lower speeds.
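Because the models run behind Ollama on your own server, you can also call them from your own server-side code over Ollama's local HTTP API (default port 11434). A minimal sketch, assuming the default endpoint and an illustrative model name — adjust both to your setup:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> dict:
    """Build a non-streaming generate request body for the Ollama API."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """Send the prompt to a locally installed model and return its reply."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running Ollama instance with the model already pulled:
# print(ask("llama3.2:3b", "Summarize our refund policy in two sentences."))
```

Since everything runs on localhost, the prompt and the reply never leave the server.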
Latest & Recommended Models (2026)
These models run well on standard and dedicated servers, and offer the best balance of speed, quality, and reliability.
| Model | Size | Best for | Speed (single user) |
|---|---|---|---|
| Qwen 3.5 4B ⭐ | 2.8 GB | Chat, vision, multilingual | ~15-20 tok/s |
| Qwen 3.5 9B | 5.5 GB | All-round, vision, reasoning | ~8-12 tok/s |
| Llama 3.2 3B | 2.0 GB | Fast chat, Q&A, drafting | ~15-25 tok/s |
| Phi-4 Mini | 2.5 GB | Reasoning, beats most 7B | ~15-20 tok/s |
| Gemma 3 4B | 3.3 GB | Multilingual, instruction | ~12-18 tok/s |
| Llama 3.1 8B | 4.7 GB | Balanced general use | ~8-12 tok/s |
| DeepSeek R1 7B | 4.7 GB | Reasoning, math, logic | ~10-14 tok/s |
| Mistral 7B | 4.1 GB | Business, code, structured | ~10-15 tok/s |
| Qwen 2.5 Coder 7B | 4.5 GB | Coding | ~10-14 tok/s |
| Gemma 3 12B | 7.5 GB | High-quality all-round | ~6-8 tok/s |
| Mistral Nemo 12B | 7.1 GB | Long documents (128K ctx) | ~6-8 tok/s |
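To turn the speed column into an expected wait time: divide the reply length in tokens by the tokens-per-second figure, and remember that concurrent users split the CPU between them. A rough back-of-the-envelope helper (the 300-token reply length is just an example):

```python
def response_time_s(tokens: int, tok_per_s: float, users: int = 1) -> float:
    """Approximate wall-clock seconds for a reply of `tokens` tokens.

    Concurrent users share the CPU, so per-user throughput is
    roughly divided by the number of simultaneous users.
    """
    return tokens / (tok_per_s / users)

# A ~300-token answer on Llama 3.1 8B at ~10 tok/s, single user:
print(round(response_time_s(300, 10)))     # 30 seconds
# The same answer with three users generating at once:
print(round(response_time_s(300, 10, 3)))  # 90 seconds
```

This is why the smaller 3-4B models are the better default when several people use the server at the same time.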
Advanced Models (14B-20B)
These models produce higher-quality output but run slower on standard servers; they work best with dedicated resources.
| Model | Size | Best for | Speed (single user) |
|---|---|---|---|
| GPT-OSS 20B 🆕 | 12 GB | OpenAI open-weight, adjustable reasoning | ~3-5 tok/s |
| Phi-4 14B | 8.9 GB | STEM, reasoning (GPT-4o-mini level) | ~4-6 tok/s |
| Phi-4 Reasoning 14B 🆕 | 9.0 GB | Math olympiad, complex logic | ~4-6 tok/s |
| DeepSeek R1 14B | 9.0 GB | Advanced math & logic | ~4-6 tok/s |
| Qwen 2.5 14B | 8.5 GB | Multilingual, analysis | ~3-5 tok/s |
| Qwen 2.5 Coder 14B | 9.0 GB | Complex coding, multi-file | ~4-6 tok/s |
Flagship Models (Dedicated Resources Required)
These models deliver the highest quality but require dedicated server resources (32GB+ RAM). Add dedicated resources from Dashboard → Billing.
| Model | Size | Best for | Speed on CPU |
|---|---|---|---|
| DeepSeek R1 32B | 19 GB | Near GPT-4 reasoning | ~4-6 tok/s |
| Qwen 2.5 32B | 19 GB | Multilingual + general | ~4-6 tok/s |
| Qwen 2.5 Coder 32B | 19 GB | Best coding model | ~4-6 tok/s |
| Llama 3.3 70B | 43 GB | Highest quality overall | ~2-3 tok/s |
Embeddings & RAG
| Model | Size | Purpose |
|---|---|---|
| Nomic Embed Text | 274 MB | Required for Knowledge Base (RAG) and document search |
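Embedding models map text to vectors so the Knowledge Base can rank document chunks by similarity to a query. A sketch of how that works against the local Ollama embeddings endpoint, assuming its default URL and the Nomic model name as installed by Ollama; the cosine function is the standard similarity measure used in RAG retrieval:

```python
import json
import math
import urllib.request

EMBED_URL = "http://localhost:11434/api/embeddings"  # Ollama's default endpoint

def embed(text: str, model: str = "nomic-embed-text") -> list:
    """Return the embedding vector for `text` from the local Ollama server."""
    body = json.dumps({"model": model, "prompt": text}).encode()
    req = urllib.request.Request(
        EMBED_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]

def cosine(a: list, b: list) -> float:
    """Cosine similarity: 1.0 for identical direction, 0.0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# With a running server: rank stored chunks by cosine(embed(query), embed(chunk)).
```

The chunks with the highest similarity to the user's question are the ones handed to the chat model as context.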
Installing Models
Go to Dashboard → Models and click Pull →. Downloads run in the background and typically take 2–15 minutes depending on model size. You can install multiple models and switch between them anytime.
Removing Models
Click Remove next to any installed model to free disk space. You can reinstall models anytime.
Disk Usage
| Setup | Disk Usage |
|---|---|
| System + base setup | ~8 GB |
| 1-2 small models (3-4B) | ~5-8 GB |
| 2-3 mid-size models (7B) | ~10-15 GB |
| Mixed set of 5 models | ~25 GB |
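Total footprint is roughly the base setup plus the sum of the installed model sizes from the tables above. A quick estimator, using a few illustrative entries (sizes copied from the tables; the dictionary keys are hypothetical tag names):

```python
# Sizes in GB, taken from the model tables above.
MODEL_SIZES_GB = {
    "llama3.2:3b": 2.0,
    "mistral:7b": 4.1,
    "gemma3:12b": 7.5,
    "nomic-embed-text": 0.274,
}

BASE_SETUP_GB = 8.0  # "System + base setup" row above

def estimated_disk_gb(installed: list) -> float:
    """Rough disk footprint: base setup plus every installed model."""
    return BASE_SETUP_GB + sum(MODEL_SIZES_GB[m] for m in installed)

print(round(estimated_disk_gb(["llama3.2:3b", "mistral:7b"]), 1))  # 14.1
```

This lands inside the ~10-15 GB range the table gives for a 2-3 model setup, so the table rows can be read as "base setup + models", not models alone.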