NestAI Docs

GPU SERVERS

NestAI offers GPU-accelerated servers powered by NVIDIA GPUs via TensorDock. GPU servers deliver 5-10x faster inference than CPU servers: running 7B models at 60-90 tokens per second feels like using ChatGPT.

GPU plans are standalone: they include the server, the GPU, and the full managed stack (Ollama, Open WebUI, SSL, monitoring). No separate plan purchase is needed.

GPU Tiers

Tier        | GPU       | VRAM  | vCPU | RAM   | Storage | Price   | Best for
GPU Starter | RTX A4000 | 16 GB | 4    | 16 GB | 150 GB  | $119/mo | 7B-14B models at ~60 tok/s
GPU Pro     | RTX 4090  | 24 GB | 8    | 32 GB | 200 GB  | $349/mo | 14B-32B models at ~25-35 tok/s
GPU Max     | A100 80GB | 80 GB | 16   | 64 GB | 300 GB  | $999/mo | 70B models at ~30-50 tok/s

Speed Comparison — CPU vs GPU

Model           | CPU (shared) | CPU (dedicated) | GPU Starter       | GPU Pro
Qwen 3.5 4B     | ~15-20 tok/s | ~15-20 tok/s    | ~80+ tok/s        | ~90+ tok/s
Mistral 7B      | ~10-15 tok/s | ~12-15 tok/s    | ~55-65 tok/s      | ~70-80 tok/s
Phi-4 14B       | ~4-6 tok/s   | ~7-10 tok/s     | ~30-40 tok/s      | ~40-50 tok/s
DeepSeek R1 32B | Too slow     | ~3-5 tok/s      | Won't fit (16 GB) | ~15-20 tok/s
Llama 3.3 70B   | Too slow     | ~2-3 tok/s      | Won't fit         | Won't fit
For Llama 3.3 70B at usable speed, you need GPU Max (A100 80GB), which runs it at ~30-50 tok/s.
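You can verify throughput figures like those above on your own server: Ollama's /api/generate endpoint reports eval_count (tokens generated) and eval_duration (time spent generating, in nanoseconds) in its final response object. A minimal sketch of the calculation, shown here against a fabricated sample payload rather than a live server:

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation speed from an Ollama /api/generate response.

    Ollama reports eval_count (tokens generated) and eval_duration
    (nanoseconds spent generating them) in the final response object.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)

# Fabricated sample payload: 128 tokens generated in 2 seconds.
sample = {"eval_count": 128, "eval_duration": 2_000_000_000}
print(tokens_per_second(sample))  # 64.0
```

Run the same calculation against a real response from your server to see where your model lands in the table above.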

How GPU Servers Work

1. Choose GPU tier during onboarding

Select "GPU" infrastructure at step 2, then pick your GPU tier based on the models you want to run.

2. Server is provisioned

NestAI provisions a dedicated VM on TensorDock with NVIDIA drivers, CUDA, Ollama (GPU mode), Open WebUI, and SSL. Provisioning takes about 5-10 minutes.

3. Start chatting

Your team accesses the AI at yourteam.nestai.chirai.dev. Same interface as CPU servers, but responses are 5-10x faster.

Which Model Fits Which GPU?

VRAM             | Models that fit            | GPU Tier
16 GB (A4000)    | 3B, 4B, 7B, 8B, 12B (Q4)   | GPU Starter
24 GB (RTX 4090) | 3B-14B, 32B (Q4)           | GPU Pro
80 GB (A100)     | All models up to 70B (Q4)  | GPU Max

VRAM determines the largest model you can load. All models use Q4_K_M quantization by default, which gives the best balance of quality and speed.
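A rough rule of thumb behind the table above: Q4_K_M averages about 4.5 bits per weight, and the runtime needs extra headroom for the KV cache and CUDA overhead. The sketch below uses these approximations for illustration; the 4.5 bits/weight and 2 GB overhead figures are assumptions, not NestAI-published numbers:

```python
def fits_in_vram(params_billion: float, vram_gb: float,
                 bits_per_weight: float = 4.5, overhead_gb: float = 2.0) -> bool:
    """Rough check whether a Q4-quantized model fits in a GPU's VRAM.

    Assumes Q4_K_M averages ~4.5 bits per weight; overhead_gb reserves
    room for the KV cache and CUDA runtime. Both figures are estimates.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb <= vram_gb

print(fits_in_vram(7, 16))   # True  — 7B fits on the 16 GB A4000
print(fits_in_vram(32, 16))  # False — matches "Won't fit (16 GB)"
print(fits_in_vram(70, 80))  # True  — 70B Q4 fits on the A100 80GB
```

The estimate matches the tiers above: a 32B model needs ~20 GB (GPU Pro's 24 GB), and a 70B model needs ~41 GB (only GPU Max has room).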

GPU Regions

GPU servers are deployed on TensorDock's global network across 100+ locations in 20+ countries. During onboarding, you select a region preference (Europe, US, or Asia-Pacific) and NestAI automatically selects the best available datacenter.

GPU availability varies by region and model. If your preferred GPU is not available in your region, NestAI will suggest an alternative location.

GPU vs CPU — When to Choose What

Factor           | CPU (from $39/mo)                             | GPU (from $119/mo)
Speed (7B)       | 10-20 tok/s                                   | 60-90 tok/s
Best for         | Light usage, internal tools, document processing | Real-time chat, demos, production, multiple users
Concurrent users | 1-2 comfortable                               | 4-8 comfortable
70B models       | Unusable (2-3 tok/s)                          | Usable on GPU Max (30-50 tok/s)
Region options   | Germany, US, Singapore                        | Global (20+ countries)
Deploy time      | 20-25 min                                     | 5-10 min
Infrastructure   | Hetzner Cloud                                 | TensorDock
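The decision factors above can be condensed into a simple sizing sketch. This is illustrative only: the function name and thresholds are made up for this example (loosely mirroring the published tables), not an official NestAI sizing tool:

```python
def suggest_tier(model_params_b: float, concurrent_users: int) -> str:
    """Suggest a NestAI tier from model size and team concurrency.

    Illustrative thresholds based on the comparison tables above;
    not an official NestAI sizing API.
    """
    if model_params_b >= 33:
        return "GPU Max"      # 70B-class models need the A100 80GB
    if model_params_b >= 15 or concurrent_users > 4:
        return "GPU Pro"      # 14B-32B models, or heavier concurrency
    if concurrent_users > 2:
        return "GPU Starter"  # real-time chat for small teams
    return "CPU"              # light usage, internal tools

print(suggest_tier(7, 2))   # CPU
print(suggest_tier(7, 6))   # GPU Pro
print(suggest_tier(70, 1))  # GPU Max
```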