Tools — Hermes Agent Tutorials


Terminal

Default On

★★★★★

Execute shell commands, manage background processes, run scripts. The primary way Hermes interacts with your system. Supports local, Docker, SSH, and Modal backends.

Shell execution · Process management · PTY mode · Multi-backend (local/docker/ssh/modal)

PTY mode has line-ending issues (\r vs. \n) with prompt_toolkit apps. Prefer tmux for spawning interactive programs.

⚙️ Core Tools

File System

Default On

★★★★★

Read, write, search, and patch files on the local filesystem. Replaces cat/grep/sed with agent-friendly structured operations.

File read/write · Content search (ripgrep-backed) · Find-and-replace patching · Syntax linting on write

write_file completely overwrites files. Use patch for targeted edits to avoid losing content.
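Here is a minimal Python sketch of find-and-replace patching that shows why a targeted patch is safer than a full overwrite. `patch_text` and `apply_patch` are hypothetical helpers, not the Hermes `patch` tool's actual interface. The sketch assumes a patch should change exactly one unique match and fail otherwise, leaving every other line of the file untouched.

```python
from pathlib import Path


def patch_text(text: str, old: str, new: str) -> str:
    """Hypothetical find-and-replace: change exactly one occurrence of `old`."""
    if not old:
        raise ValueError("search text must not be empty")
    count = text.count(old)
    if count == 0:
        raise ValueError("search text not found; nothing was changed")
    if count > 1:
        raise ValueError(f"search text is ambiguous ({count} matches); add more context")
    # Only the matched span changes; the rest of the text is returned as-is.
    return text.replace(old, new, 1)


def apply_patch(path: Path, old: str, new: str) -> None:
    """Rewrite the file with only the matched span changed."""
    original = path.read_text(encoding="utf-8")
    path.write_text(patch_text(original, old, new), encoding="utf-8")
```

A full overwrite requires the model to reproduce every unchanged line correctly. A patch only has to reproduce the span it edits, and the uniqueness check keeps it from silently editing the wrong occurrence.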

⚙️ Core Tools

Web Search

Default On

★★★★★

Internet search and content extraction. The primary research tool — finds URLs, fetches content, and extracts structured data from web pages.

Web search · Content extraction · URL fetching · Multi-backend (Firecrawl, Tavily, SearXNG, etc.)

Requires: FIRECRAWL_API_KEY or TAVILY_API_KEY

Requires at least one search backend configured. Some sites block automated access.

⚙️ Core Tools

Skills

Default On

★★★★★

Browse, install, create, and manage skills. Skills are reusable procedure documents that teach the agent how to do specific tasks.

Skill search/install · Skill creation · Skill management · Hub publishing

None — fully self-contained.

⚙️ Core Tools

Memory

Default On

★★★★☆

Persistent cross-session memory. Stores facts about the user, environment, and lessons learned. Pluggable backends (built-in SQLite, Honcho, Mem0).

Fact storage/retrieval · User preference learning · Cross-session persistence · Pluggable backends

Memory is bounded (~2KB). Old entries are evicted when full. Cloud backends need API keys.
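The bounded-memory behavior can be pictured with a small sketch. `BoundedMemory` is hypothetical. It assumes the budget is measured in UTF-8 bytes (2048 by default, to match the ~2KB figure) and that eviction is strictly oldest-first. Hermes's real storage and eviction policy may differ.

```python
from collections import deque


class BoundedMemory:
    """Hypothetical fact store capped at a byte budget; oldest entries go first."""

    def __init__(self, max_bytes: int = 2048) -> None:
        self.max_bytes = max_bytes
        self._entries: deque[str] = deque()
        self._size = 0  # running total of UTF-8 bytes held

    def add(self, fact: str) -> None:
        size = len(fact.encode("utf-8"))
        if size > self.max_bytes:
            raise ValueError("entry is larger than the whole memory budget")
        # Evict the oldest facts until the new one fits.
        while self._size + size > self.max_bytes:
            evicted = self._entries.popleft()
            self._size -= len(evicted.encode("utf-8"))
        self._entries.append(fact)
        self._size += size

    def facts(self) -> list[str]:
        return list(self._entries)

    @property
    def size(self) -> int:
        return self._size
```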

⚙️ Core Tools

Session Search

Default On

★★★★☆

Search past conversations using FTS5 full-text search. Retrieves summaries of matching sessions. Essential for cross-session context.

FTS5 full-text search · Session summaries · Recent session browsing · Role-based filtering

Search uses OR between keywords by default (AND for phrases). Recent sessions mode has no LLM cost.
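You can try the difference between keyword and phrase matching directly with SQLite's FTS5 from Python's standard library. This illustrates FTS5 query semantics only and is not Hermes's schema: the `sessions` table and the helper names are hypothetical. It also assumes your Python's SQLite was compiled with FTS5, which is true of most modern builds. Note that an FTS5 phrase query is stricter than a plain AND, because the terms must appear adjacent and in order.

```python
import sqlite3


def build_index(sessions: dict[str, str]) -> sqlite3.Connection:
    """Load session summaries into an in-memory FTS5 table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE sessions USING fts5(session_id UNINDEXED, summary)")
    conn.executemany(
        "INSERT INTO sessions (session_id, summary) VALUES (?, ?)", sessions.items()
    )
    return conn


def keywords_to_or_query(text: str) -> str:
    """Quote each keyword (doubling embedded quotes) and join them with OR."""
    terms = ['"' + word.replace('"', '""') + '"' for word in text.split()]
    return " OR ".join(terms)


def search(conn: sqlite3.Connection, match_expr: str) -> list[str]:
    rows = conn.execute(
        "SELECT session_id FROM sessions WHERE sessions MATCH ? ORDER BY rank",
        (match_expr,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_index({
        "s1": "debugging the docker compose setup",
        "s2": "cleaning up a spotify playlist",
        "s3": "compose a cron job for docker backups",
    })
    # OR query: any keyword matches, so s1 and s3 are both returned.
    print(search(conn, keywords_to_or_query("docker compose")))
    # Phrase query: the words must be adjacent and in order, so only s1 matches.
    print(search(conn, '"docker compose"'))
```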

⚙️ Core Tools

Delegation

Default On

★★★★☆

Spawn subagents with isolated contexts and terminal sessions. Supports parallel batch execution (up to 3 concurrent children).

Subagent spawning · Parallel batch execution · Isolated context/terminal · Leaf and orchestrator roles

Not durable — children are cancelled if parent is interrupted. Use cron jobs or background terminal for persistent work.
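The two properties above (at most 3 concurrent children, and children dying with their parent) can be modeled with a small asyncio sketch. This is an analogy, not Hermes's scheduler. `child`, `run_batch`, and `interrupted_parent` are hypothetical, and the `sleep` stands in for a subagent's real work.

```python
import asyncio

MAX_CHILDREN = 3  # documented limit on concurrent children


async def child(task: str, limiter: asyncio.Semaphore, stats: dict[str, int]) -> str:
    """Stand-in for one subagent running in its own isolated context."""
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await asyncio.sleep(0.01)  # placeholder for the subagent's real work
            return f"{task}: done"
        finally:
            stats["active"] -= 1


async def run_batch(tasks: list[str]) -> tuple[list[str], int]:
    """Run every task, but never more than MAX_CHILDREN at the same time."""
    limiter = asyncio.Semaphore(MAX_CHILDREN)
    stats = {"active": 0, "peak": 0}
    # If this coroutine is cancelled while awaiting gather(), gather()
    # cancels every child that has not finished yet.
    results = await asyncio.gather(*(child(t, limiter, stats) for t in tasks))
    return list(results), stats["peak"]


async def interrupted_parent() -> bool:
    """Start a batch, then interrupt the parent; returns True if it was cancelled."""
    parent = asyncio.create_task(run_batch([f"t{i}" for i in range(6)]))
    await asyncio.sleep(0)  # give the batch a chance to start
    parent.cancel()  # simulate the user interrupting the parent agent
    try:
        await parent
    except asyncio.CancelledError:
        return True
    return False
```

Because the children live only inside the parent's task, nothing survives an interruption. Work that has to outlive the parent belongs in a cron job or a background terminal process instead.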

⚙️ Core Tools

Cron Jobs

Default On

★★★★★

Built-in scheduler for recurring tasks. Supports durations, cron expressions, and ISO timestamps. Jobs run autonomously with configurable model/skills/delivery.

Scheduled execution · Per-job model override · Script pre-run · Multi-platform delivery · Watchdog pattern (no_agent)

Schedule format: a duration (30m), a cron expression (0 9 * * *), or an ISO timestamp. Natural-language phrases such as "every sunday" are not supported.
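The three accepted formats can be told apart mechanically. Here is a minimal sketch of such a classifier. `classify_schedule` is hypothetical and is not Hermes's parser. It assumes single-letter duration units (s, m, h, d), numeric-only five-field cron expressions (no names like MON or JAN), and ISO timestamps readable by `datetime.fromisoformat`.

```python
import re
from datetime import datetime, timedelta

DURATION_RE = re.compile(r"(\d+)([smhd])")  # assumed units: s, m, h, d
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
CRON_FIELD_RE = re.compile(r"[\d*/,-]+")    # numeric cron fields only


def classify_schedule(spec: str) -> tuple[str, object]:
    """Hypothetical parser: return ("duration" | "cron" | "iso", parsed value)."""
    spec = spec.strip()
    match = DURATION_RE.fullmatch(spec)
    if match:
        amount, unit = match.groups()
        return "duration", timedelta(**{UNITS[unit]: int(amount)})
    fields = spec.split()
    if len(fields) == 5 and all(CRON_FIELD_RE.fullmatch(f) for f in fields):
        return "cron", fields  # minute hour day-of-month month day-of-week
    try:
        return "iso", datetime.fromisoformat(spec)
    except ValueError:
        raise ValueError(
            f"unsupported schedule {spec!r}: use a duration (30m), "
            "a cron expression (0 9 * * *), or an ISO timestamp"
        ) from None
```

A phrase like "every sunday" falls through all three checks and raises, which mirrors the documented limitation. The cron equivalent of that phrase is a day-of-week field such as `0 9 * * 0`.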

⚙️ Core Tools

Clarify

Default On

★★★★☆

Ask the user clarifying questions when a task is ambiguous. Supports multiple choice (up to 4 options) and open-ended modes.

Multiple choice prompts · Open-ended questions · Inline "Other" option

Overuse can be annoying. Prefer making a reasonable default when the decision is low-stakes.

⚙️ Core Tools

Task List

Default On

★★★★☆

In-session task tracking with priority ordering. Supports create, update, mark complete, cancel, and merge operations. One task in_progress at a time.

Task CRUD · Priority ordering · Merge/replace modes · Progress tracking

Tasks are session-scoped — not persistent across sessions. Use kanban for durable task management.
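The merge/replace modes and the one-task-in-progress rule can be sketched as a small in-memory structure. Everything here is hypothetical: the `Task` and `TaskList` names, the status strings, and the use of list order as priority order are assumptions for illustration, not the tool's actual schema. The object lives only as long as the session does, which matches the session-scoped caveat.

```python
from dataclasses import dataclass

STATUSES = {"pending", "in_progress", "completed", "cancelled"}  # assumed names


@dataclass
class Task:
    task_id: str
    content: str
    status: str = "pending"


class TaskList:
    """Hypothetical in-memory task list; it disappears with the session object."""

    def __init__(self) -> None:
        self.tasks: dict[str, Task] = {}  # insertion order doubles as priority order

    def write(self, updates: list[Task], merge: bool = True) -> None:
        # merge=True updates existing ids in place and appends new ones;
        # merge=False replaces the whole list.
        staged = dict(self.tasks) if merge else {}
        for task in updates:
            if task.status not in STATUSES:
                raise ValueError(f"unknown status {task.status!r}")
            staged[task.task_id] = task
        active = [t.task_id for t in staged.values() if t.status == "in_progress"]
        if len(active) > 1:
            raise ValueError(f"only one task may be in_progress, got {active}")
        self.tasks = staged  # commit only after every check has passed

    def ordered(self) -> list[Task]:
        return list(self.tasks.values())
```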

⚙️ Core Tools

Code Execution

Default On

★★★★☆

Sandboxed Python execution with access to file/search/patch/terminal tools. Use for multi-step processing, data filtering, and conditional logic between tool calls.

Python execution · Tool library access · 5-minute timeout · 50KB stdout cap

Foreground-only (no background/pty). 50 tool calls per script. Stdout capped at 50KB.
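The timeout and output cap are the kind of limits you can enforce around a child process. Here is a minimal sketch using the standard `subprocess` module. `run_script` is hypothetical. It assumes "50KB" means 50 × 1024 bytes, and a plain subprocess like this is not a sandbox: it shows only how the limits behave.

```python
import subprocess
import sys

TIMEOUT_SECONDS = 300        # the documented 5-minute limit
STDOUT_CAP = 50 * 1024       # "50KB", assumed here to mean 50 KiB


def run_script(source: str, timeout: float = TIMEOUT_SECONDS, cap: int = STDOUT_CAP) -> str:
    """Run Python source in a child process with a timeout and a stdout cap.

    NOTE: a plain subprocess is NOT a sandbox; this only illustrates the limits.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired.
        return f"[killed after {timeout}s timeout]"
    out = proc.stdout
    if len(out) > cap:
        out = out[:cap] + b"\n[output truncated]"
    return out.decode("utf-8", errors="replace")
```

Capping stdout keeps a runaway loop or a huge dump from flooding the agent's context. This is also why scripts should print filtered summaries rather than raw data.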

🌐 Web & Browser

Browser

Opt-in

★★★★★

Full browser automation — navigate pages, click elements, type text, take screenshots, read console output, and execute JavaScript. Supports local Chromium, Browserbase, and Camofox backends.

Page navigation · Element interaction · Screenshot + vision analysis · Console output · JavaScript evaluation · Scroll/click/type

Requires: BROWSERBASE_API_KEY or local Chromium

Resource-heavy. Prefer web_search for simple lookups. Local Chromium must be installed separately.

🌐 Web & Browser

Vision

Default On

★★★★☆

Image analysis — load and describe images from URLs, file paths, or data URIs. Falls back to an auxiliary vision model if the main model lacks vision capabilities.

Image loading (URL/file/data URI) · Visual description · Fallback to auxiliary model

Some models lack native vision — falls back to slower auxiliary model. File paths must be absolute.
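Since data URIs are one of the accepted inputs, a local image can also be passed by encoding its bytes directly. Here is a minimal sketch of that encoding. `bytes_to_data_uri` and `image_file_to_data_uri` are hypothetical helpers, and whether Hermes converts files this way internally is an assumption. The absolute-path check mirrors the caveat above.

```python
import base64
import mimetypes
from pathlib import Path


def bytes_to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw image bytes as a base64 data URI."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def image_file_to_data_uri(path: str) -> str:
    p = Path(path)
    # Reject relative paths up front, mirroring the "paths must be absolute" caveat.
    if not p.is_absolute():
        raise ValueError(f"expected an absolute path, got {path!r}")
    mime, _ = mimetypes.guess_type(p.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {p.name!r}")
    return bytes_to_data_uri(p.read_bytes(), mime)
```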

🎵 Media

Image Generation

Opt-in

★★★★☆

AI image generation via multiple backends. Supports OpenAI gpt-image-2, xAI Grok-Imagine, and more via plugins.

Text-to-image · Multi-backend · Image caching

Requires: OPENAI_API_KEY or XAI_API_KEY

Requires a backend plugin with API key. Not available on all platforms.

🎵 Media

Video

Opt-in

★★★☆☆

Video analysis and generation. Supports FAL.ai multi-model (Veo 3.1, Kling, Pixverse) and xAI Grok-Imagine backends.

Text-to-video · Image-to-video · Video analysis

Requires: FAL_KEY or XAI_API_KEY

Expensive API costs. FAL is the more mature backend.

🎵 Media

Text-to-Speech

Default On

★★★★☆

Convert text to spoken audio. Supports Edge TTS (free, default), ElevenLabs, OpenAI, MiniMax, Mistral, and local NeuTTS.

Text-to-audio · Multi-provider · Voice memo saving

Requires: Provider-dependent

Edge TTS works out of the box. Cloud providers need API keys. Each provider enforces its own character limit, between 4,096 and 15,000 characters.
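Because every provider caps input length, long text has to be split before synthesis. Here is a minimal sketch, assuming that splitting at sentence boundaries is acceptable and that a single over-long sentence may be cut mid-word. `chunk_text` is hypothetical. Pass whatever limit your provider actually enforces (for example 4096).

```python
import re


def chunk_text(text: str, limit: int) -> list[str]:
    """Split text into pieces of at most `limit` characters, preferring sentence ends."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split by characters.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the audio files concatenated in order.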

⚡ Automation

Kanban

Opt-in

★★★★☆

Durable SQLite-based work queue for multi-agent coordination. Each task has a lifecycle (create → assign → complete/block), comments, and links. A dispatcher auto-assigns tasks to worker profiles.

Task lifecycle · Multi-profile assignment · Comments and links · Auto-dispatch · Heartbeat monitoring

Best used with multi-profile setups. Single-user kanban adds overhead without benefit.
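The create → assign → complete/block lifecycle is essentially a small state machine. Here is a minimal sketch of the allowed transitions. The `Status` names, the `TRANSITIONS` table, and the rule that blocked work returns to an assigned state are assumptions, and the real Kanban tool may track more states.

```python
from enum import Enum


class Status(Enum):
    CREATED = "created"
    ASSIGNED = "assigned"
    BLOCKED = "blocked"
    COMPLETED = "completed"


# Allowed moves in a simplified create -> assign -> complete/block lifecycle.
TRANSITIONS: dict[Status, set[Status]] = {
    Status.CREATED: {Status.ASSIGNED},
    Status.ASSIGNED: {Status.COMPLETED, Status.BLOCKED},
    Status.BLOCKED: {Status.ASSIGNED},  # assumption: unblocked work goes back to a worker
    Status.COMPLETED: set(),  # terminal state
}


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the lifecycle forbids the move."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Storing the status in SQLite with a check like this is what lets a task survive restarts. A dispatcher can then safely pick up anything left in the created state.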

💬 Messaging

Messaging

Default On

★★★★☆

Cross-platform message sending. Routes messages through the gateway to any connected platform — Telegram, Discord, Slack, Signal, and more.

Cross-platform send · Gateway routing · Platform-specific formatting

Depends on gateway being active. Not available in CLI-only mode.

💬 Messaging

Discord

Opt-in

★★★★☆

Discord integration tools for the gateway. Enables the Hermes Discord bot to read and respond in channels and DMs.

Channel messaging · DM handling · Message history reading

Requires: Discord bot token

Requires the Message Content Intent to be enabled in the Discord Developer Portal.

💬 Messaging

Discord Admin

Opt-in

★★★☆☆

Discord admin and moderation tools — manage users, roles, channels, and server settings through the agent.

User management · Role management · Channel management · Moderation actions

Requires: Discord bot token with admin permissions

Requires elevated Discord permissions. Use with caution — moderation actions are irreversible.

🧠 AI / ML

Reinforcement Learning

Opt-in

★★☆☆☆

Reinforcement learning tools for training and evaluating AI models. Off by default — niche use case for ML researchers.

RL training loops · Model evaluation

Requires: ML framework dependencies

Experimental. Not recommended for general use.

🧠 AI / ML

Mixture of Agents

Opt-in

★★★☆☆

Mixture of Agents pattern — runs multiple model instances in parallel and aggregates their outputs for improved quality. Off by default.

Parallel model inference · Output aggregation · Quality improvement

Requires: Multiple API keys

Token cost scales with the number of agents. Experimental feature.
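The fan-out-then-aggregate pattern, and the reason its cost grows with the number of agents, fit in a short asyncio sketch. `mixture_of_agents`, the `Model` alias, and `make_stub` are hypothetical. Real model calls would replace the stubs, and the aggregation prompt wording is an assumption.

```python
import asyncio
from typing import Awaitable, Callable

Model = Callable[[str], Awaitable[str]]  # any async "prompt in, text out" function


async def mixture_of_agents(prompt: str, proposers: list[Model], aggregator: Model) -> str:
    # Layer 1: every proposer answers the same prompt in parallel.
    drafts = await asyncio.gather(*(model(prompt) for model in proposers))
    # Layer 2: one aggregator call sees every draft and writes the final answer.
    numbered = "\n".join(f"[{i}] {draft}" for i, draft in enumerate(drafts, start=1))
    return await aggregator(
        f"Question: {prompt}\nCandidate answers:\n{numbered}\n"
        "Combine these into the single best answer."
    )


def make_stub(answer: str, calls: list[str]) -> Model:
    """Fake model for offline testing; records every prompt it receives."""
    async def stub(prompt: str) -> str:
        calls.append(prompt)
        return answer
    return stub
```

With N proposers, each request makes N + 1 model calls, and the aggregator's prompt carries all N drafts. That is where the multiplied token cost comes from.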

🔧 Developer

Debugging

Opt-in

★★★☆☆

Extra introspection and debugging tools. Adds verbose logging, state inspection, and diagnostic capabilities. Off by default.

Verbose logging · State inspection · Diagnostic output

Generates a lot of output. Enable only when debugging specific issues.

🔧 Developer

Safe Mode

Opt-in

★★★☆☆

Minimal, low-risk toolset for locked-down sessions. Strips dangerous tools (terminal, browser, delegation) for safe exploration.

Read-only operations · Minimal tool surface · Reduced risk profile

Very limited functionality. Use only for untrusted or shared environments.

🔗 Integrations

Spotify

Opt-in

★★★★★

Spotify playback control — play, pause, skip, queue, search, manage playlists and library. Uses Spotify Web API with PKCE OAuth via the Spotify plugin.

Playback control · Device management · Queue management · Search · Playlist/library management

Requires: Spotify Premium, hermes auth spotify

Requires Spotify Premium. One-time OAuth setup via hermes auth spotify.

🔗 Integrations

Home Assistant

Opt-in

★★★☆☆

Smart home control via Home Assistant integration. Control lights, switches, sensors, and automations through the agent.

Device control · State queries · Automation triggers

Requires: Home Assistant URL + token

Requires a running Home Assistant instance. Off by default for security reasons.

🔗 Integrations

Feishu Docs

Opt-in

★★★☆☆

Feishu (Lark) document tools — create, read, and edit Feishu documents through the agent.

Document CRUD · Block operations

Requires: Feishu API credentials

Feishu-specific. Only useful if you use the Feishu/Lark platform.

🔗 Integrations

Feishu Drive

Opt-in

★★★☆☆

Feishu (Lark) drive tools — manage files and folders in Feishu Cloud Drive.

File management · Folder operations

Requires: Feishu API credentials

Feishu-specific. Requires separate API setup from Feishu Docs.

🔗 Integrations

Yuanbao

Opt-in

★★★☆☆

Yuanbao (Tencent) integration — @mention users in groups, query member information and group details.

Group member queries · @mention support · Group information

Requires: Yuanbao API credentials

China-specific platform. Requires a Yuanbao account.
