☁️ API Providers
OpenRouter
Universal API gateway to 200+ models from every major provider. One API key gives access to everything — Claude, GPT-4, Gemini, DeepSeek, Llama, Mistral, and more. The recommended starting point for most users.
Models: 200+ models — Anthropic, OpenAI, Google, Meta, Mistral, DeepSeek, and dozens more
Pricing: Pay-per-token at provider rates. No monthly fee. Free tier models available.
Best for: Best default provider. One key, all models. Easy switching between providers.
Free tier: ✅ Yes
Pros:
- One key for 200+ models
- Free tier available
- Easy model switching
- No monthly fee
- Credential pooling supported
- Rate limit fallback
Cons:
- Slight latency overhead vs direct API
- Some niche models not available
- Dependent on OpenRouter uptime
Setup guide
Sign up at openrouter.ai → create API key → `echo 'OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.hermes/.env` → `hermes model` → pick a model → start chatting
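The key-setup step above can be sketched as a shell session. To keep the sketch safe to run, it writes into a throwaway temp directory standing in for `~/.hermes` (the key value is a placeholder):

```shell
# Append the OpenRouter key to hermes's env file. HERMES_DIR is a throwaway
# temp directory standing in for ~/.hermes; the key value is a placeholder.
HERMES_DIR="$(mktemp -d)"
echo 'OPENROUTER_API_KEY="sk-or-v1-REPLACE_ME"' >> "$HERMES_DIR/.env"
# Confirm the key landed in the env file before running `hermes model`.
grep OPENROUTER_API_KEY "$HERMES_DIR/.env"
```

With the key in the real `~/.hermes/.env`, `hermes model` can then list the OpenRouter catalog and let you pick a default.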
☁️ API Providers
Anthropic
Claude models — Sonnet, Haiku, and Opus. Best-in-class for coding, complex reasoning, and nuanced instruction following. Hermes's recommended premium model.
Models: Claude Sonnet 4, Claude Haiku 3.5, Claude Opus 4
Pricing: Sonnet: ~$15/1M tokens. Haiku: ~$3/1M. Opus: ~$75/1M.
Best for: Complex coding, deep reasoning, nuanced tasks. The smartest model for hard problems.
Free tier: No — paid API only
Pros:
- Best coding & reasoning model
- Excellent instruction following
- Large context window (200K)
- Reliable API
- Good safety guardrails
Cons:
- Expensive (Opus especially)
- No free tier
- Rate limits on lower tiers
- Vision quality varies
Setup guide
Sign up at anthropic.com → create API key → `echo 'ANTHROPIC_API_KEY="sk-ant-..."' >> ~/.hermes/.env` → `hermes config set model.default "claude-sonnet-4-20250514"` → `hermes config set model.provider "anthropic"`
☁️ API Providers
OpenAI
GPT-4o, GPT-4o-mini, and o-series reasoning models. Broad capabilities with excellent tool-use performance. The original LLM API.
Models: GPT-4o, GPT-4o-mini, o1, o3, o4-mini
Pricing: GPT-4o: ~$10/1M tokens. GPT-4o-mini: ~$1/1M. o-series: $15-60/1M.
Best for: General purpose. Good tool-use, broad knowledge, fast responses with mini models.
Free tier: No — paid API only
Pros:
- Excellent tool-use performance
- Fast mini models (cheap)
- Broad model selection
- Reliable infrastructure
- Great documentation
Cons:
- No free API tier
- o-series models are expensive
- Usage limits on new accounts
- Less nuanced than Claude for reasoning
Setup guide
Sign up at platform.openai.com → create API key → `echo 'OPENAI_API_KEY="sk-..."' >> ~/.hermes/.env` → `hermes config set model.default "gpt-4o"` → `hermes config set model.provider "openai"`
☁️ API Providers
DeepSeek
DeepSeek Chat and DeepSeek Reasoner. Excellent quality-to-price ratio. Strong coding performance comparable to Claude at a fraction of the cost.
Models: DeepSeek Chat V3, DeepSeek Reasoner R1
Pricing: Chat: ~$0.27/1M tokens. Reasoner: ~$0.55/1M. A fraction of premium providers.
Best for: Best value provider. Near-premium quality at budget prices. Excellent for coding.
Free tier: ✅ Yes
Pros:
- Excellent price/performance
- Strong coding ability
- Free tier available
- Reasoner model for complex tasks
- Chinese language support
Cons:
- Fewer models than OpenAI/Anthropic
- API reliability varies
- Less ecosystem/support
- Some rate limits on free tier
Setup guide
Sign up at platform.deepseek.com → create API key → `echo 'DEEPSEEK_API_KEY="sk-..."' >> ~/.hermes/.env` → `hermes config set model.default "deepseek-chat"` → `hermes config set model.provider "deepseek"`
☁️ API Providers
Google Gemini
Gemini 2.0 Flash, Pro, and experimental models. Very fast, very cheap, huge context window (1M+ tokens). Great for processing large documents.
Models: Gemini 2.0 Flash, Gemini 2.0 Pro, Gemini 2.5 Pro (experimental)
Pricing: Flash: ~$0.10/1M tokens (one of the cheapest). Pro: ~$1.25/1M.
Best for: Cost-effective general use. Processing large documents (1M+ token context). Fast responses.
Free tier: ✅ Yes
Pros:
- Very cheap (Flash)
- 1M+ token context window
- Fast response times
- Free tier available
- Google ecosystem integration
Cons:
- Less consistent than Claude for coding
- Fewer third-party tools
- Experimental models can be unstable
- Some features lag behind OpenAI
Setup guide
Get API key at aistudio.google.com → `echo 'GOOGLE_API_KEY="AIza..."' >> ~/.hermes/.env` → `hermes config set model.default "gemini-2.0-flash"` → `hermes config set model.provider "google"`
☁️ API Providers
xAI (Grok)
xAI's Grok models. Good general performance with real-time knowledge. Integrated with X/Twitter ecosystem.
Models: Grok 3, Grok 3-mini, Grok Vision
Pricing: Grok 3: ~$15/1M tokens. Mini: ~$3/1M.
Best for: X/Twitter integration. Real-time knowledge. Image generation via Grok-Imagine.
Free tier: Limited — available to X Premium subscribers
Pros:
- Real-time knowledge
- X/Twitter integration
- Grok-Imagine image gen
- Competitive pricing
Cons:
- Newer provider, less mature
- Fewer models
- Smaller ecosystem
- Limited free options
Setup guide
Sign up at x.ai → create API key → `echo 'XAI_API_KEY="xai-..."' >> ~/.hermes/.env` → `hermes config set model.default "grok-3"` → `hermes config set model.provider "xai"`
🔑 OAuth / Platform
GitHub Copilot
OAuth ★★★★☆
GitHub Copilot as LLM provider. Uses Copilot's backend models (currently based on GPT-4o and Claude). Free for verified students, teachers, and OSS maintainers.
Models: Copilot models (GPT-4o based, Claude Sonnet)
Pricing: $10/month (Pro) or included in GitHub Enterprise. Free for students/teachers/OSS.
Best for: Free/cheap access to premium models. Perfect if you already have Copilot.
Free tier: ✅ Yes
Pros:
- Free for students/teachers/OSS
- Good model quality
- No separate billing
- OAuth setup is smooth
- Copilot CLI integration
Cons:
- Not standalone — requires GitHub account
- No rate limit guarantees
- Limited model selection
- OAuth only (no API key option)
Setup guide
Do NOT use `gh auth login`. Run `hermes model` → select GitHub Copilot → follow OAuth device code flow → authenticate in browser → done. Or set `COPILOT_GITHUB_TOKEN` for token-based auth.
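For the token-based alternative mentioned above, the variable only needs to be present in hermes's environment. A minimal sketch (the token value is a placeholder; a real token comes from your GitHub account):

```shell
# Token-based Copilot auth: export the variable named in the setup guide.
# The value below is a placeholder, not a real token.
export COPILOT_GITHUB_TOKEN="ghu_REPLACE_ME"
# hermes reads the variable from its environment at startup.
echo "COPILOT_GITHUB_TOKEN is ${#COPILOT_GITHUB_TOKEN} chars long"
```

The OAuth device-code flow via `hermes model` remains the simpler path for most users.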
🔑 OAuth / Platform
OpenAI Codex
OpenAI Codex CLI integration. OAuth-based access to OpenAI models via Codex subscription. Alternative to direct OpenAI API key billing.
Models: GPT-4o, GPT-4o-mini (Codex hosted)
Pricing: Included with Codex subscription (~$20/month)
Best for: Users who already subscribe to Codex and want unified billing.
Free tier: No — requires Codex subscription
Pros:
- Unified billing with Codex
- OAuth — no API key management
- Good model quality
Cons:
- Requires Codex subscription
- Higher cost than direct API
- Limited model selection
- No free tier
Setup guide
Run `hermes login --provider openai-codex` → follow browser auth → models are available via Codex provider
🔑 OAuth / Platform
Nous Research
Nous Research's own model hosting. Access to Nous models and community finetunes. OAuth-based authentication.
Models: Nous models and community finetunes
Pricing: Varies by model — generally competitive
Best for: Access to Nous Research's latest models and community finetunes.
Free tier: ✅ Yes
Pros:
- Access to Nous models
- OAuth-based
- Community finetunes available
- Supports Hermes development
Cons:
- Smaller model selection
- Newer service, less mature
- Variable model quality
Setup guide
Run `hermes login --provider nous` → follow browser auth → select models via `hermes model`
💻 Local Inference
Ollama (Local)
Local ★★★★★
Run LLMs locally on your own hardware. Hundreds of models available — Llama 3.2, Mistral, Phi-4, Gemma, DeepSeek. Zero API cost, fully private.
Models: Hundreds — Llama 3.2, Mistral, Phi-4, Gemma, Qwen, DeepSeek, and community models
Pricing: $0 (electricity only)
Best for: Zero-cost inference, privacy, offline use. 3B models on modest hardware, 7B+ on better machines.
Free tier: N/A — it's always free (no API calls)
Pros:
- Completely free (no API costs)
- Fully private — no data leaves your machine
- Hundreds of models
- Works offline
- Fast for small models on modest hardware
Cons:
- Requires local hardware (RAM/GPU)
- Small models struggle with complex tasks
- 10-50 tok/s on CPU, 50-100 on GPU
- GGUF file sizes: 2-8GB per model
- Setup needed per machine
Setup guide
Install Ollama: `curl -fsSL https://ollama.ai/install.sh | sh` → pull a model: `ollama pull llama3.2` → `hermes config set model.default "ollama/llama3.2"` → `hermes config set model.provider "ollama"` → start chatting
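To decide between the 3B and 7B tiers mentioned above, a rough rule of thumb (an approximation, not a figure from Ollama's docs) is that a Q4-quantized model needs about 0.6 GB of RAM per billion parameters, plus a couple of GB of overhead:

```shell
# Rough RAM estimate for a Q4-quantized GGUF model. The 0.6 GB per billion
# parameters factor and the 2 GB overhead are approximations.
estimate_ram_gb() {
  params_b=$1                          # model size in billions of parameters
  echo $(( params_b * 6 / 10 + 2 ))    # integer GB, rounded down
}
echo "3B model: ~$(estimate_ram_gb 3) GB RAM"
echo "7B model: ~$(estimate_ram_gb 7) GB RAM"
```

These estimates line up with the 2-8 GB GGUF file sizes noted in the cons list: if the file fits comfortably in free RAM, the model will generally run.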
💻 Local Inference
llama.cpp
GGUF model inference via llama.cpp. Lower-level than Ollama but gives more control. Supports quantized models for running on limited hardware.
Models: Any GGUF format model — thousands available on Hugging Face
Pricing: $0 (electricity only)
Best for: Power users who want full control over inference parameters. Custom quantization levels.
Free tier: N/A — always free
Pros:
- Full control over inference
- Custom quantization
- Runs on CPU efficiently
- Supports any GGUF model
- Lower overhead than Ollama
Cons:
- More setup than Ollama
- No model download management
- Manual configuration
- Less user-friendly
Setup guide
Download or compile llama.cpp → download GGUF model from Hugging Face → `hermes config set model.default "llama.cpp/path/to/model.gguf"` → `hermes config set model.provider "llama.cpp"`
☁️ API Providers
Hugging Face
Hugging Face Inference API and Inference Endpoints. Access to thousands of community and first-party models. Serverless or dedicated endpoints.
Models: Thousands — all models on Hugging Face Hub with Inference API support
Pricing: Serverless: free tier with rate limits. Dedicated endpoints: pay-per-hour for GPU.
Best for: Access to niche or community models not available on other providers. Experimenting with new models.
Free tier: ✅ Yes
Pros:
- Thousands of models
- Free serverless tier
- Community models available
- Dedicated endpoints for production
Cons:
- Serverless is slow (cold starts)
- Model quality varies wildly
- Less curated than other providers
- Dedicated endpoints are expensive
Setup guide
Sign up at huggingface.co → create access token → `echo 'HF_TOKEN="hf_..."' >> ~/.hermes/.env` → select models via `hermes model`
☁️ API Providers
Groq
Groq's LPU inference engine. Extremely fast inference speed on open models (Llama, Mixtral, Gemma). Best token-per-second throughput available.
Models: Llama 3, Mixtral 8x7B, Gemma 2, Whisper (STT)
Pricing: Free tier with rate limits. Paid plans for higher throughput.
Best for: Fastest inference speed. Free STT (Whisper). Open models at blazing speed.
Free tier: ✅ Yes
Pros:
- Blazing fast inference
- Free tier available
- Free Whisper STT
- Good for chat/streaming
Cons:
- Limited model selection
- No premium/closed models
- Rate limits on free tier
- Newer provider
Setup guide
Sign up at console.groq.com → create API key → `echo 'GROQ_API_KEY="gsk_..."' >> ~/.hermes/.env` → select models via `hermes model`
☁️ API Providers
Together AI
Together AI's inference platform. Good selection of open models with competitive pricing. Fine-tuning API also available.
Models: Llama 3, Mistral, DeepSeek, Qwen, and other open models
Pricing: Competitive with other open-model providers. ~$0.10-1.00/1M tokens.
Best for: Open model inference with fine-tuning capabilities.
Free tier: ✅ Yes
Pros:
- Good open model selection
- Fine-tuning API
- Competitive pricing
- Free tier
Cons:
- Smaller selection than OpenRouter
- No premium/closed models
- Less well-known
Setup guide
Sign up at together.ai → create API key → set env var → select models via `hermes model`
☁️ API Providers
Novita AI
Novita AI inference platform. Supports a wide range of open models with competitive pricing. Includes image generation models.
Models: Llama 3, Mistral, DeepSeek, Stable Diffusion, and others
Pricing: Competitive pricing on open models. Image generation additional.
Best for: Open models plus image generation in one provider.
Free tier: ✅ Yes
Pros:
- Open models + image gen
- Competitive pricing
- Free signup credits
Cons:
- Newer provider
- Smaller ecosystem
- Variable reliability
Setup guide
Sign up at novita.ai → create API key → set env var → select models via `hermes model`
☁️ API Providers
Zhipu AI (GLM)
Zhipu AI's GLM models. Leading Chinese LLM provider. Strong Chinese language performance with competitive English capabilities.
Models: GLM-4, GLM-4V (vision), GLM-4-Plus
Pricing: Competitive with international providers
Best for: Chinese language tasks. Access to GLM models. Chinese enterprise deployments.
Free tier: ✅ Yes
Pros:
- Strong Chinese language
- Vision model available
- Competitive pricing
- Free tier
Cons:
- Chinese-focused
- English performance lags behind
- Limited international docs
Setup guide
Sign up at zhipu.ai → create API key → `echo 'GLM_API_KEY="..."' >> ~/.hermes/.env` → select models
☁️ API Providers
MiniMax
MiniMax LLM and TTS models. Chinese provider with competitive language models and high-quality text-to-speech.
Models: MiniMax models, MiniMax TTS
Pricing: Competitive pricing. TTS also available.
Best for: Chinese language tasks. High-quality Chinese TTS.
Free tier: ✅ Yes
Pros:
- Good Chinese language models
- High-quality TTS
- Competitive pricing
- Free tier
Cons:
- Chinese-focused
- Smaller model selection
- Limited English support
Setup guide
Sign up at minimax.com → create API key → set env var → select models
☁️ API Providers
Alibaba / DashScope
API ★★★★☆
Alibaba Cloud's Qwen models via DashScope API. Strong Chinese and English performance. Qwen2.5 models are competitive globally.
Models: Qwen2.5-72B, Qwen2.5-Coder, Qwen2-VL (vision), and smaller Qwen models
Pricing: Very competitive — Qwen models offer excellent value
Best for: Qwen models — excellent quality-to-price ratio. Both Chinese and English.
Free tier: ✅ Yes
Pros:
- Qwen models are top-tier
- Excellent value
- Strong bilingual (CN/EN)
- Vision model available
- Coder model for programming
Cons:
- Alibaba Cloud signup can be complex
- Less community adoption in West
- Documentation mostly in Chinese
Setup guide
Sign up at dashscope.aliyun.com → create API key → `echo 'DASHSCOPE_API_KEY="sk-..."' >> ~/.hermes/.env` → select models
☁️ API Providers
Kimi / Moonshot
API ★★★☆☆
Moonshot AI's Kimi models. Known for very long context windows. Strong Chinese language performance.
Models: Kimi models with long context support
Pricing: Competitive
Best for: Very long context tasks. Chinese language applications.
Free tier: ✅ Yes
Pros:
- Very long context windows
- Good Chinese performance
- Competitive pricing
- Free tier
Cons:
- Chinese-focused
- Smaller international presence
- Limited model variety
Setup guide
Sign up at moonshot.cn → create API key → set env var → select models