NestAI Docs

AI MODELS

NestAI runs open-source AI models directly on your server using Ollama. You control which models are installed — and your data never leaves your infrastructure. All speeds below are for a single user on CPU. Concurrent users share CPU and will see lower speeds.
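Under the hood, every installed model is served through Ollama's local HTTP API (port 11434 by default). A minimal sketch of a chat request body, assuming Llama 3.1 8B is installed under the tag `llama3.1:8b` — substitute whatever model tag is installed on your server:

```python
import json

# Request body for Ollama's /api/generate endpoint.
# "llama3.1:8b" is an example tag; use any model installed on your server.
payload = {
    "model": "llama3.1:8b",
    "prompt": "Summarise our refund policy in two sentences.",
    "stream": False,  # return one JSON object instead of a token stream
}

# Sent from the server itself, e.g. with urllib:
#   urllib.request.urlopen("http://localhost:11434/api/generate",
#                          data=json.dumps(payload).encode())
print(json.dumps(payload, indent=2))
```

Because the API only listens locally, requests like this never leave the server.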

Models are grouped by tier. Your standard server (8 vCPU, 16GB RAM, 160GB SSD) runs most models well. Add dedicated resources from the billing page for 32B+ models.

Latest & Recommended Models (2026)

These models run well on standard and dedicated servers. Best balance of speed, quality, and reliability.

| Model | Size | Best for | Speed (single user) |
|---|---|---|---|
| Qwen 3.5 4B ⭐ | 2.8 GB | Chat, vision, multilingual | ~15-20 tok/s |
| Qwen 3.5 9B | 5.5 GB | All-round, vision, reasoning | ~8-12 tok/s |
| Llama 3.2 3B | 2.0 GB | Fast chat, Q&A, drafting | ~15-25 tok/s |
| Phi-4 Mini | 2.5 GB | Reasoning, beats most 7B | ~15-20 tok/s |
| Gemma 3 4B | 3.3 GB | Multilingual, instruction | ~12-18 tok/s |
| Llama 3.1 8B | 4.7 GB | Balanced general use | ~8-12 tok/s |
| DeepSeek R1 7B | 4.7 GB | Reasoning, math, logic | ~10-14 tok/s |
| Mistral 7B | 4.1 GB | Business, code, structured | ~10-15 tok/s |
| Qwen 2.5 Coder 7B | 4.5 GB | Coding | ~10-14 tok/s |
| Gemma 3 12B | 7.5 GB | High-quality all-round | ~6-8 tok/s |
| Mistral Nemo 12B | 7.1 GB | Long documents (128K ctx) | ~6-8 tok/s |
Qwen 3.5 4B is our top recommendation for most users — it outperforms many 7B models while being faster and smaller.

Advanced Models (14B-20B)

Higher quality outputs but slower on standard servers. Best with dedicated resources.

| Model | Size | Best for | Speed |
|---|---|---|---|
| GPT-OSS 20B 🆕 | 12 GB | OpenAI open-weight, adjustable reasoning | ~3-5 tok/s |
| Phi-4 14B | 8.9 GB | STEM, reasoning (GPT-4o-mini level) | ~4-6 tok/s |
| Phi-4 Reasoning 14B 🆕 | 9.0 GB | Math olympiad, complex logic | ~4-6 tok/s |
| DeepSeek R1 14B | 9.0 GB | Advanced math & logic | ~4-6 tok/s |
| Qwen 2.5 14B | 8.5 GB | Multilingual, analysis | ~3-5 tok/s |
| Qwen 2.5 Coder 14B | 9.0 GB | Complex coding, multi-file | ~4-6 tok/s |
These models fit in 16GB RAM but are significantly slower on shared servers (~3-6 tok/s). Add dedicated resources for a better experience.

Flagship Models (Dedicated Resources Required)

These models deliver the highest quality but require dedicated server resources (32GB+ RAM). Add dedicated resources from Dashboard → Billing.

| Model | Size | Best for | Speed on CPU |
|---|---|---|---|
| DeepSeek R1 32B | 19 GB | Near GPT-4 reasoning | ~4-6 tok/s |
| Qwen 2.5 32B | 19 GB | Multilingual + general | ~4-6 tok/s |
| Qwen 2.5 Coder 32B | 19 GB | Best coding model | ~4-6 tok/s |
| Llama 3.3 70B | 43 GB | Highest quality overall | ~2-3 tok/s |
70B models run at ~2-3 tokens/sec on CPU — usable for document analysis and batch tasks, but not ideal for real-time chat. 32B models are the sweet spot for quality vs speed.
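A quick back-of-envelope calculation shows why. Using illustrative rates from the table above (and ignoring prompt-processing time):

```python
# Approximate wall-clock time to generate a response of a given length.
# Rates are single-user CPU estimates from the table above; prompt
# processing time is not included.
def generation_time_s(output_tokens: int, tok_per_s: float) -> float:
    return output_tokens / tok_per_s

# A 400-token document summary on Llama 3.3 70B at ~2.5 tok/s:
print(round(generation_time_s(400, 2.5)))  # 160 seconds -- fine for batch jobs

# The same summary on DeepSeek R1 32B at ~5 tok/s:
print(round(generation_time_s(400, 5.0)))  # 80 seconds
```

At interactive chat lengths the gap feels even wider, which is why 32B is the practical ceiling for conversational use on CPU.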

Embeddings & RAG

| Model | Size | Purpose |
|---|---|---|
| Nomic Embed Text | 274 MB | Required for Knowledge Base (RAG) and document search |
Install Nomic Embed to enable document search, semantic retrieval, and knowledge base features.
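Retrieval works by embedding your query and your documents as vectors, then ranking documents by cosine similarity. A sketch with made-up 4-dimensional vectors — real Nomic Embed vectors have many more dimensions, but the ranking logic is the same:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of a query and two documents.
query = [0.9, 0.1, 0.0, 0.2]
doc_refunds = [0.8, 0.2, 0.1, 0.3]  # similar topic -> high similarity
doc_recipes = [0.0, 0.9, 0.1, 0.0]  # unrelated -> low similarity

# The refund-policy document outranks the unrelated one:
print(cosine_similarity(query, doc_refunds) > cosine_similarity(query, doc_recipes))  # True
```

NestAI handles the embedding and ranking for you; this only illustrates what the Knowledge Base does with the vectors Nomic Embed produces.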

Installing Models

Go to Dashboard → Models and click Pull →. Downloads run in the background and typically take 2–15 minutes depending on model size. You can install multiple models and switch between them anytime.

Do not restart your server while a model is downloading.

Removing Models

Click Remove next to any installed model to free disk space. You can reinstall models anytime.

Disk Usage

| Setup | Disk Usage |
|---|---|
| 1-2 small models (3-4B) | ~5-8 GB |
| 2-3 models (7B) | ~10-15 GB |
| 5 models mixed | ~25 GB |
| System + base setup | ~8 GB |
Standard servers include 160GB SSD. Dedicated resource tiers include 240-960GB depending on tier.
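To sanity-check a specific combination before installing, add the model sizes from the tables above to the ~8 GB base. A small sketch (sizes copied from the tables; the model keys are just labels for this example):

```python
# Model sizes in GB, taken from the tables above.
MODEL_SIZES_GB = {
    "qwen3.5-4b": 2.8,
    "llama3.1-8b": 4.7,
    "qwen2.5-coder-7b": 4.5,
    "nomic-embed-text": 0.274,
}

BASE_SYSTEM_GB = 8.0  # "System + base setup" row from the disk-usage table

def disk_needed_gb(models: list[str]) -> float:
    """Base system plus the sum of the chosen models' download sizes."""
    return BASE_SYSTEM_GB + sum(MODEL_SIZES_GB[m] for m in models)

total = disk_needed_gb(["qwen3.5-4b", "llama3.1-8b", "nomic-embed-text"])
print(round(total, 1))  # 15.8 GB -- comfortably within a 160 GB SSD
```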