☁️ API Providers
OpenRouter
Universal API gateway to 200+ models from every major provider. One API key gives access to everything — Claude, GPT-4, Gemini, DeepSeek, Llama, Mistral, and more. The recommended starting point for most users.
Models: 200+ models — Anthropic, OpenAI, Google, Meta, Mistral, DeepSeek, and dozens more
Pricing: Pay-per-token at provider rates. No monthly fee. Free tier models available.
Best for: Best default provider. One key, all models. Easy switching between providers.
Free tier: ✅ Yes
Pros:
- One key for 200+ models
- Free tier available
- Easy model switching
- No monthly fee
- Credential pooling supported
- Rate limit fallback
Cons:
- Slight latency overhead vs direct API
- Some niche models not available
- Dependent on OpenRouter uptime
Setup guide
Sign up at openrouter.ai → create API key → `echo 'OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.hermes/.env` → `hermes model` → pick a model → start chatting
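The key-setup step above can be sketched as a shell session. To keep the sketch safe to run, it writes into a throwaway temp directory standing in for `~/.hermes` (the key value is a placeholder):

```shell
# Append the OpenRouter key to hermes's env file. HERMES_DIR is a throwaway
# temp directory standing in for ~/.hermes; the key value is a placeholder.
HERMES_DIR="$(mktemp -d)"
echo 'OPENROUTER_API_KEY="sk-or-v1-REPLACE_ME"' >> "$HERMES_DIR/.env"
# Confirm the key landed in the env file before running `hermes model`.
grep OPENROUTER_API_KEY "$HERMES_DIR/.env"
```

With the key in the real `~/.hermes/.env`, `hermes model` can then list the OpenRouter catalog and let you pick a default.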
☁️ API Providers
Anthropic
Claude models — Sonnet, Haiku, and Opus. Best-in-class for coding, complex reasoning, and nuanced instruction following. Hermes's recommended premium model.
Models: Claude Sonnet 4, Claude Haiku 3.5, Claude Opus 4
Pricing: Sonnet: ~$15/1M tokens. Haiku: ~$3/1M. Opus: ~$75/1M.
Best for: Complex coding, deep reasoning, nuanced tasks. The smartest model for hard problems.
Free tier: No — paid API only
Pros:
- Best coding & reasoning model
- Excellent instruction following
- Large context window (200K)
- Reliable API
- Good safety guardrails
Cons:
- Expensive (Opus especially)
- No free tier
- Rate limits on lower tiers
- Vision quality varies
Setup guide
Sign up at anthropic.com → create API key → `echo 'ANTHROPIC_API_KEY="sk-ant-..."' >> ~/.hermes/.env` → `hermes config set model.default "claude-sonnet-4-20250514"` → `hermes config set model.provider "anthropic"`
☁️ API Providers
OpenAI
GPT-4o, GPT-4o-mini, and o-series reasoning models. Broad capabilities with excellent tool-use performance. The original LLM API.
Models: GPT-4o, GPT-4o-mini, o1, o3, o4-mini
Pricing: GPT-4o: ~$10/1M tokens. GPT-4o-mini: ~$1/1M. o-series: $15-60/1M.
Best for: General purpose. Good tool-use, broad knowledge, fast responses with mini models.
Free tier: No — paid API only
Pros:
- Excellent tool-use performance
- Fast mini models (cheap)
- Broad model selection
- Reliable infrastructure
- Great documentation
Cons:
- No free API tier
- o-series models are expensive
- Usage limits on new accounts
- Less nuanced than Claude for reasoning
Setup guide
Sign up at platform.openai.com → create API key → `echo 'OPENAI_API_KEY="sk-..."' >> ~/.hermes/.env` → `hermes config set model.default "gpt-4o"` → `hermes config set model.provider "openai"`
☁️ API Providers
DeepSeek
DeepSeek Chat and DeepSeek Reasoner. Excellent quality-to-price ratio. Strong coding performance comparable to Claude at a fraction of the cost.
Models: DeepSeek Chat V3, DeepSeek Reasoner R1
Pricing: Chat: ~$0.27/1M tokens. Reasoner: ~$0.55/1M. A fraction of premium providers.
Best for: Best value provider. Near-premium quality at budget prices. Excellent for coding.
Free tier: ✅ Yes
Pros:
- Excellent price/performance
- Strong coding ability
- Free tier available
- Reasoner model for complex tasks
- Chinese language support
Cons:
- Fewer models than OpenAI/Anthropic
- API reliability varies
- Less ecosystem/support
- Some rate limits on free tier
Setup guide
Sign up at platform.deepseek.com → create API key → `echo 'DEEPSEEK_API_KEY="sk-..."' >> ~/.hermes/.env` → `hermes config set model.default "deepseek-chat"` → `hermes config set model.provider "deepseek"`
☁️ API Providers
Google Gemini
Gemini 2.0 Flash, Pro, and experimental models. Very fast, very cheap, huge context window (1M+ tokens). Great for processing large documents.
Models: Gemini 2.0 Flash, Gemini 2.0 Pro, Gemini 2.5 Pro (experimental)
Pricing: Flash: ~$0.10/1M tokens (one of the cheapest). Pro: ~$1.25/1M.
Best for: Cost-effective general use. Processing large documents (1M+ token context). Fast responses.
Free tier: ✅ Yes
Pros:
- Very cheap (Flash)
- 1M+ token context window
- Fast response times
- Free tier available
- Google ecosystem integration
Cons:
- Less consistent than Claude for coding
- Fewer third-party tools
- Experimental models can be unstable
- Some features lag behind OpenAI
Setup guide
Get API key at aistudio.google.com → `echo 'GOOGLE_API_KEY="AIza..."' >> ~/.hermes/.env` → `hermes config set model.default "gemini-2.0-flash"` → `hermes config set model.provider "google"`
☁️ API Providers
xAI (Grok)
xAI's Grok models. Good general performance with real-time knowledge. Integrated with X/Twitter ecosystem.
Models: Grok 3, Grok 3-mini, Grok Vision
Pricing: Grok 3: ~$15/1M tokens. Mini: ~$3/1M.
Best for: X/Twitter integration. Real-time knowledge. Image generation via Grok-Imagine.
Free tier: Limited — available to X Premium subscribers
Pros:
- Real-time knowledge
- X/Twitter integration
- Grok-Imagine image gen
- Competitive pricing
Cons:
- Newer provider, less mature
- Fewer models
- Smaller ecosystem
- Limited free options
Setup guide
Sign up at x.ai → create API key → `echo 'XAI_API_KEY="xai-..."' >> ~/.hermes/.env` → `hermes config set model.default "grok-3"` → `hermes config set model.provider "xai"`
🔑 OAuth / Platform
GitHub Copilot
OAuth ★★★★☆
GitHub Copilot as LLM provider. Uses Copilot's backend models (currently based on GPT-4o and Claude). Free for verified students, teachers, and OSS maintainers.
Models: Copilot models (GPT-4o based, Claude Sonnet)
Pricing: $10/month (Pro) or included in GitHub Enterprise. Free for students/teachers/OSS.
Best for: Free/cheap access to premium models. Perfect if you already have Copilot.
Free tier: ✅ Yes
Pros:
- Free for students/teachers/OSS
- Good model quality
- No separate billing
- OAuth setup is smooth
- Copilot CLI integration
Cons:
- Not standalone — requires GitHub account
- No rate limit guarantees
- Limited model selection
- OAuth only (no API key option)
Setup guide
Do NOT use `gh auth login`. Run `hermes model` → select GitHub Copilot → follow OAuth device code flow → authenticate in browser → done. Or set `COPILOT_GITHUB_TOKEN` for token-based auth.
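For the token-based alternative mentioned above, the variable only needs to be present in hermes's environment. A minimal sketch (the token value is a placeholder; a real token comes from your GitHub account):

```shell
# Token-based Copilot auth: export the variable named in the setup guide.
# The value below is a placeholder, not a real token.
export COPILOT_GITHUB_TOKEN="ghu_REPLACE_ME"
# hermes reads the variable from its environment at startup.
echo "COPILOT_GITHUB_TOKEN is ${#COPILOT_GITHUB_TOKEN} chars long"
```

The OAuth device-code flow via `hermes model` remains the simpler path for most users.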
🔑 OAuth / Platform
OpenAI Codex
OpenAI Codex CLI integration. OAuth-based access to OpenAI models via Codex subscription. Alternative to direct OpenAI API key billing.
Models: GPT-4o, GPT-4o-mini (Codex hosted)
Pricing: Included with Codex subscription (~$20/month)
Best for: Users who already subscribe to Codex and want unified billing.
Free tier: No — requires Codex subscription
Pros:
- Unified billing with Codex
- OAuth — no API key management
- Good model quality
Cons:
- Requires Codex subscription
- Higher cost than direct API
- Limited model selection
- No free tier
Setup guide
Run `hermes login --provider openai-codex` → follow browser auth → models are available via Codex provider
🔑 OAuth / Platform
Nous Research
Nous Research's own model hosting. Access to Nous models and community finetunes. OAuth-based authentication.
Models: Nous models and community finetunes
Pricing: Varies by model — generally competitive
Best for: Access to Nous Research's latest models and community finetunes.
Free tier: ✅ Yes
Pros:
- Access to Nous models
- OAuth-based
- Community finetunes available
- Supports Hermes development
Cons:
- Smaller model selection
- Newer service, less mature
- Variable model quality
Setup guide
Run `hermes login --provider nous` → follow browser auth → select models via `hermes model`
💻 Local Inference
Ollama (Local)
Local ★★★★★
Run LLMs locally on your own hardware. Hundreds of models available — Llama 3.2, Mistral, Phi-4, Gemma, DeepSeek. Zero API cost, fully private.
Models: Hundreds — Llama 3.2, Mistral, Phi-4, Gemma, Qwen, DeepSeek, and community models
Pricing: $0 (electricity only)
Best for: Zero-cost inference, privacy, offline use. 3B models on modest hardware, 7B+ on better machines.
Free tier: N/A — it's always free (no API calls)
Pros:
- Completely free (no API costs)
- Fully private — no data leaves your machine
- Hundreds of models
- Works offline
- Fast for small models on modest hardware
Cons:
- Requires local hardware (RAM/GPU)
- Small models struggle with complex tasks
- 10-50 tok/s on CPU, 50-100 on GPU
- GGUF file sizes: 2-8GB per model
- Setup needed per machine
Setup guide
Install Ollama: `curl -fsSL https://ollama.ai/install.sh | sh` → pull a model: `ollama pull llama3.2` → `hermes config set model.default "ollama/llama3.2"` → `hermes config set model.provider "ollama"` → start chatting
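To decide between the 3B and 7B tiers mentioned above, a rough rule of thumb (an approximation, not a figure from Ollama's docs) is that a Q4-quantized model needs about 0.6 GB of RAM per billion parameters, plus a couple of GB of overhead:

```shell
# Rough RAM estimate for a Q4-quantized GGUF model. The 0.6 GB per billion
# parameters factor and the 2 GB overhead are approximations.
estimate_ram_gb() {
  params_b=$1                          # model size in billions of parameters
  echo $(( params_b * 6 / 10 + 2 ))    # integer GB, rounded down
}
echo "3B model: ~$(estimate_ram_gb 3) GB RAM"
echo "7B model: ~$(estimate_ram_gb 7) GB RAM"
```

These estimates line up with the 2-8 GB GGUF file sizes noted in the cons list: if the file fits comfortably in free RAM, the model will generally run.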
💻 Local Inference
llama.cpp
GGUF model inference via llama.cpp. Lower-level than Ollama but gives more control. Supports quantized models for running on limited hardware.
Models: Any GGUF format model — thousands available on Hugging Face
Pricing: $0 (electricity only)
Best for: Power users who want full control over inference parameters. Custom quantization levels.
Free tier: N/A — always free
Pros:
- Full control over inference
- Custom quantization
- Runs on CPU efficiently
- Supports any GGUF model
- Lower overhead than Ollama
Cons:
- More setup than Ollama
- No model download management
- Manual configuration
- Less user-friendly
Setup guide
Download or compile llama.cpp → download GGUF model from Hugging Face → `hermes config set model.default "llama.cpp/path/to/model.gguf"` → `hermes config set model.provider "llama.cpp"`
☁️ API Providers
Hugging Face
Hugging Face Inference API and Inference Endpoints. Access to thousands of community and first-party models. Serverless or dedicated endpoints.
Models: Thousands — all models on Hugging Face Hub with Inference API support
Pricing: Serverless: free tier with rate limits. Dedicated endpoints: pay-per-hour for GPU.
Best for: Access to niche or community models not available on other providers. Experimenting with new models.
Free tier: ✅ Yes
Pros:
- Thousands of models
- Free serverless tier
- Community models available
- Dedicated endpoints for production
Cons:
- Serverless is slow (cold starts)
- Model quality varies wildly
- Less curated than other providers
- Dedicated endpoints are expensive
Setup guide
Sign up at huggingface.co → create access token → `echo 'HF_TOKEN="hf_..."' >> ~/.hermes/.env` → select models via `hermes model`
☁️ API Providers
Groq
Groq's LPU inference engine. Extremely fast inference speed on open models (Llama, Mixtral, Gemma). Best token-per-second throughput available.
Models: Llama 3, Mixtral 8x7B, Gemma 2, Whisper (STT)
Pricing: Free tier with rate limits. Paid plans for higher throughput.
Best for: Fastest inference speed. Free STT (Whisper). Open models at blazing speed.
Free tier: ✅ Yes
Pros:
- Blazing fast inference
- Free tier available
- Free Whisper STT
- Good for chat/streaming
Cons:
- Limited model selection
- No premium/closed models
- Rate limits on free tier
- Newer provider
Setup guide
Sign up at console.groq.com → create API key → `echo 'GROQ_API_KEY="gsk_..."' >> ~/.hermes/.env` → select models via `hermes model`
☁️ API Providers
Together AI
Together AI's inference platform. Good selection of open models with competitive pricing. Fine-tuning API also available.
Models: Llama 3, Mistral, DeepSeek, Qwen, and other open models
Pricing: Competitive with other open-model providers. ~$0.10-1.00/1M tokens.
Best for: Open model inference with fine-tuning capabilities.
Free tier: ✅ Yes
Pros:
- Good open model selection
- Fine-tuning API
- Competitive pricing
- Free tier
Cons:
- Smaller selection than OpenRouter
- No premium/closed models
- Less well-known
Setup guide
Sign up at together.ai → create API key → set env var → select models via `hermes model`
☁️ API Providers
Novita AI
Novita AI inference platform. Supports a wide range of open models with competitive pricing. Includes image generation models.
Models: Llama 3, Mistral, DeepSeek, Stable Diffusion, and others
Pricing: Competitive pricing on open models. Image generation additional.
Best for: Open models plus image generation in one provider.
Free tier: ✅ Yes
Pros:
- Open models + image gen
- Competitive pricing
- Free signup credits
Cons:
- Newer provider
- Smaller ecosystem
- Variable reliability
Setup guide
Sign up at novita.ai → create API key → set env var → select models via `hermes model`
☁️ API Providers
Zhipu AI (GLM)
Zhipu AI's GLM models. Leading Chinese LLM provider. Strong Chinese language performance with competitive English capabilities.
Models: GLM-4, GLM-4V (vision), GLM-4-Plus
Pricing: Competitive with international providers
Best for: Chinese language tasks. Access to GLM models. Chinese enterprise deployments.
Free tier: ✅ Yes
Pros:
- Strong Chinese language
- Vision model available
- Competitive pricing
- Free tier
Cons:
- Chinese-focused
- English performance lags behind
- Limited international docs
Setup guide
Sign up at zhipu.ai → create API key → `echo 'GLM_API_KEY="..."' >> ~/.hermes/.env` → select models
☁️ API Providers
MiniMax
MiniMax LLM and TTS models. Chinese provider with competitive language models and high-quality text-to-speech.
Models: MiniMax models, MiniMax TTS
Pricing: Competitive pricing. TTS also available.
Best for: Chinese language tasks. High-quality Chinese TTS.
Free tier: ✅ Yes
Pros:
- Good Chinese language models
- High-quality TTS
- Competitive pricing
- Free tier
Cons:
- Chinese-focused
- Smaller model selection
- Limited English support
Setup guide
Sign up at minimax.com → create API key → set env var → select models
☁️ API Providers
Alibaba / DashScope
API ★★★★☆
Alibaba Cloud's Qwen models via DashScope API. Strong Chinese and English performance. Qwen2.5 models are competitive globally.
Models: Qwen2.5-72B, Qwen2.5-Coder, Qwen2-VL (vision), and smaller Qwen models
Pricing: Very competitive — Qwen models offer excellent value
Best for: Qwen models — excellent quality-to-price ratio. Both Chinese and English.
Free tier: ✅ Yes
Pros:
- Qwen models are top-tier
- Excellent value
- Strong bilingual (CN/EN)
- Vision model available
- Coder model for programming
Cons:
- Alibaba Cloud signup can be complex
- Less community adoption in West
- Documentation mostly in Chinese
Setup guide
Sign up at dashscope.aliyun.com → create API key → `echo 'DASHSCOPE_API_KEY="sk-..."' >> ~/.hermes/.env` → select models
☁️ API Providers
Kimi / Moonshot
API ★★★☆☆
Moonshot AI's Kimi models. Known for very long context windows. Strong Chinese language performance.
Models: Kimi models with long context support
Pricing: Competitive
Best for: Very long context tasks. Chinese language applications.
Free tier: ✅ Yes
Pros:
- Very long context windows
- Good Chinese performance
- Competitive pricing
- Free tier
Cons:
- Chinese-focused
- Smaller international presence
- Limited model variety
Setup guide
Sign up at moonshot.cn → create API key → set env var → select models