GPU SERVERS
NestAI offers GPU-accelerated servers powered by NVIDIA GPUs via TensorDock. GPU servers deliver 5-10x faster inference than CPU servers: running 7B models at 60-90 tokens per second feels like using ChatGPT.
GPU Tiers
| Tier | GPU | VRAM | vCPU | RAM | Storage | Price | Best for |
|---|---|---|---|---|---|---|---|
| GPU Starter | RTX A4000 | 16 GB | 4 | 16 GB | 150 GB | $119/mo | 7B-14B models at ~60 tok/s |
| GPU Pro | RTX 4090 | 24 GB | 8 | 32 GB | 200 GB | $349/mo | 14B-32B models at ~25-35 tok/s |
| GPU Max | A100 80GB | 80 GB | 16 | 64 GB | 300 GB | $999/mo | 70B models at ~30-50 tok/s |
Speed Comparison — CPU vs GPU
| Model | CPU (shared) | CPU (dedicated) | GPU Starter | GPU Pro |
|---|---|---|---|---|
| Qwen 3.5 4B | ~15-20 tok/s | ~15-20 tok/s | ~80+ tok/s | ~90+ tok/s |
| Mistral 7B | ~10-15 tok/s | ~12-15 tok/s | ~55-65 tok/s | ~70-80 tok/s |
| Phi-4 14B | ~4-6 tok/s | ~7-10 tok/s | ~30-40 tok/s | ~40-50 tok/s |
| DeepSeek R1 32B | Too slow | ~3-5 tok/s | Won't fit (16 GB) | ~15-20 tok/s |
| Llama 3.3 70B | Too slow | ~2-3 tok/s | Won't fit | Won't fit |
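To translate tokens-per-second into wait time, divide the answer length by the generation rate. A minimal sketch, using mid-range Mistral 7B rates from the table above (the 400-token answer length is an illustrative assumption):

```python
def response_seconds(tokens: int, tok_per_s: float) -> float:
    """Seconds to stream a full answer at a given generation rate."""
    return tokens / tok_per_s

# A typical ~400-token answer, using mid-range rates from the table:
for label, rate in [("CPU shared", 12), ("GPU Starter", 60), ("GPU Pro", 75)]:
    print(f"{label:12s} ~{response_seconds(400, rate):.0f} s")
```

At 12 tok/s a full answer takes over half a minute; on GPU it arrives in a handful of seconds, which is why real-time chat is listed as a GPU use case.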
How GPU Servers Work
1. Choose a GPU tier during onboarding. Select "GPU" infrastructure at step 2, then pick your GPU tier based on the models you want to run.
2. Server is provisioned. NestAI provisions a dedicated VM on TensorDock with NVIDIA drivers, CUDA, Ollama (GPU mode), Open WebUI, and SSL. Takes about 5-10 minutes.
3. Start chatting. Your team accesses the AI at yourteam.nestai.chirai.dev. Same interface as CPU servers, but responses are 5-10x faster.
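Beyond the web interface, Open WebUI deployments typically expose an OpenAI-compatible HTTP API. A minimal sketch of building such a request with the Python standard library; the subdomain, model name, and API key are placeholders, and the exact endpoint path may differ on your deployment:

```python
import json
import urllib.request

# Placeholder base URL -- replace with your team's actual subdomain.
BASE_URL = "https://yourteam.nestai.chirai.dev"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request (constructed, not sent)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

req = build_chat_request("mistral:7b", "Summarize this quarter's report.", "sk-placeholder")
# urllib.request.urlopen(req) would send it; omitted here.
```

An API key is generated from the Open WebUI account settings; the same request works against CPU servers, only the response speed differs.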
Which Model Fits Which GPU?
| VRAM | Models that fit | GPU Tier |
|---|---|---|
| 16 GB (A4000) | 3B, 4B, 7B, 8B, 12B (Q4) | GPU Starter |
| 24 GB (RTX 4090) | 3B-14B, 32B (Q4) | GPU Pro |
| 80 GB (A100) | All models up to 70B (Q4) | GPU Max |
VRAM determines the largest model you can load. All models use Q4_K_M quantization by default — this gives the best balance of quality and speed.
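As a back-of-envelope check on the table above: Q4_K_M stores roughly 4.8 bits per weight, so model weights take about 0.6 GB per billion parameters, plus a few GB for the KV cache and runtime buffers. A rough sketch (the per-weight figure and overhead constant are general rules of thumb, not NestAI-published numbers):

```python
def q4_vram_estimate_gb(params_billion: float, overhead_gb: float = 2.0) -> float:
    """Rough VRAM needed to load a Q4_K_M model.

    Q4_K_M averages roughly 4.8 bits per weight; the flat overhead
    approximates the KV cache and CUDA buffers. Rule of thumb only.
    """
    weights_gb = params_billion * 4.8 / 8  # ~0.6 GB per billion params
    return weights_gb + overhead_gb

for size in (7, 14, 32, 70):
    print(f"{size}B -> ~{q4_vram_estimate_gb(size):.1f} GB VRAM")
```

The estimates line up with the tiers: ~10 GB for 14B fits the 16 GB A4000, ~21 GB for 32B just fits the 24 GB RTX 4090, and ~44 GB for 70B needs the 80 GB A100. Longer context windows grow the KV cache, so leave headroom.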
GPU Regions
GPU servers are deployed on TensorDock's global network across 100+ locations in 20+ countries. During onboarding, you select a region preference (Europe, US, or Asia-Pacific) and NestAI automatically selects the best available datacenter.
GPU vs CPU — When to Choose What
| Factor | CPU (from $39/mo) | GPU (from $119/mo) |
|---|---|---|
| Speed (7B) | 10-20 tok/s | 60-90 tok/s |
| Best for | Light usage, internal tools, document processing | Real-time chat, demos, production, multiple users |
| Concurrent users | 1-2 comfortable | 4-8 comfortable |
| 70B models | Unusable (2-3 tok/s) | Usable on GPU Max (30-50 tok/s) |
| Region options | Germany, US, Singapore | Global (20+ countries) |
| Deploy time | 20-25 min | 5-10 min |
| Infrastructure | Hetzner Cloud | TensorDock |