Tools Reference

Every Hermes Agent tool across 8 categories — 14 enabled by default, 15 opt-in.

29 Total Tools · 8 Categories · 14 Default On · 19 Top Rated ★4+
⚙️ Core Tools

Terminal

Default On
★★★★★

Execute shell commands, manage background processes, run scripts. The primary way Hermes interacts with your system. Supports local, Docker, SSH, and Modal backends.

Shell execution · Process management · PTY mode · Multi-backend (local/docker/ssh/modal)
PTY mode has \r vs \n issues with prompt_toolkit apps. Prefer tmux for interactive spawning.
⚙️ Core Tools

File System

Default On
★★★★★

Read, write, search, and patch files on the local filesystem. Replaces cat/grep/sed with agent-friendly structured operations.

File read/write · Content search (ripgrep-backed) · Find-and-replace patching · Syntax linting on write
write_file completely overwrites files. Use patch for targeted edits to avoid losing content.
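The difference between overwriting and patching can be sketched as follows. This is a minimal illustration, not Hermes's implementation: `patch_text` is a hypothetical helper, and the rule that the target must match exactly once is an assumption chosen to keep edits unambiguous.

```python
def patch_text(content: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new` (hypothetical helper)."""
    count = content.count(old)
    if count != 1:
        # Assumption: refuse missing or ambiguous targets rather than guess.
        raise ValueError(f"expected exactly 1 match for target, found {count}")
    return content.replace(old, new, 1)


original = "host = localhost\nport = 8080\ndebug = false\n"

# Targeted edit: only the matched span changes; the other lines survive.
patched = patch_text(original, "port = 8080", "port = 9090")

# Whole-file overwrite: anything not re-supplied is lost.
overwritten = "port = 9090\n"
```

After the patch, the `host` and `debug` lines are still present; after the overwrite they are gone.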
⚙️ Core Tools

Web Search

Default On
★★★★★

Internet search and content extraction. The primary research tool — finds URLs, fetches content, and extracts structured data from web pages.

Web search · Content extraction · URL fetching · Multi-backend (Firecrawl, Tavily, SearXNG, etc.)
Requires: FIRECRAWL_API_KEY or TAVILY_API_KEY
Requires at least one search backend configured. Some sites block automated access.
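As a rough sketch of what "at least one search backend configured" could mean in practice, the check below looks for the two API keys named above. The backend labels, the priority order, and the `pick_search_backend` helper are assumptions for illustration, not Hermes's actual selection logic.

```python
import os

# Env var names come from the card above; the backend labels and the
# priority order are assumptions for illustration.
BACKEND_ENV_VARS = (
    ("firecrawl", "FIRECRAWL_API_KEY"),
    ("tavily", "TAVILY_API_KEY"),
)


def pick_search_backend(env=None):
    """Return the first backend whose API key is set (hypothetical helper)."""
    env = os.environ if env is None else env
    for backend, var in BACKEND_ENV_VARS:
        if env.get(var):
            return backend
    wanted = " or ".join(var for _, var in BACKEND_ENV_VARS)
    raise RuntimeError(f"no search backend configured: set {wanted}")


chosen = pick_search_backend({"TAVILY_API_KEY": "example-key"})
```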
⚙️ Core Tools

Skills

Default On
★★★★★

Browse, install, create, and manage skills. Skills are reusable procedure documents that teach the agent how to do specific tasks.

Skill search/install · Skill creation · Skill management · Hub publishing
No known caveats: fully self-contained.
⚙️ Core Tools

Memory

Default On
★★★★☆

Persistent cross-session memory. Stores facts about the user, environment, and lessons learned. Pluggable backends (built-in SQLite, Honcho, Mem0).

Fact storage/retrieval · User preference learning · Cross-session persistence · Pluggable backends
Memory is bounded (~2KB). Old entries are evicted when full. Cloud backends need API keys.
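The bounded-memory behavior can be pictured with a small sketch. `BoundedMemory` is a hypothetical class: the oldest-first eviction order and the byte accounting are assumptions, and only the approximate 2KB bound comes from the card above.

```python
from collections import OrderedDict


class BoundedMemory:
    """Hypothetical byte-bounded store that evicts the oldest entries first."""

    def __init__(self, max_bytes: int = 2048):
        self.max_bytes = max_bytes
        self._entries = OrderedDict()  # key -> fact, oldest first

    def _size(self) -> int:
        # Count UTF-8 bytes of every key and fact currently stored.
        return sum(len(k.encode()) + len(v.encode()) for k, v in self._entries.items())

    def remember(self, key: str, fact: str) -> None:
        self._entries.pop(key, None)   # re-adding a key makes it the newest
        self._entries[key] = fact
        # Assumption: drop the oldest entries until the store fits again,
        # but always keep the entry that was just written.
        while self._size() > self.max_bytes and len(self._entries) > 1:
            self._entries.popitem(last=False)

    def recall(self, key: str):
        return self._entries.get(key)


mem = BoundedMemory(max_bytes=64)
mem.remember("editor", "x" * 30)   # 36 bytes stored
mem.remember("shell", "y" * 30)    # 71 bytes total, so "editor" is evicted
```

The practical consequence is that a fact written early in a long-running setup may no longer be there later, so important facts are worth re-stating.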
⚙️ Core Tools

Session Search

Default On
★★★★☆

Search past conversations using FTS5 full-text search. Retrieves summaries of matching sessions. Essential for cross-session context.

FTS5 full-text search · Session summaries · Recent session browsing · Role-based filtering
Search uses OR between keywords by default (AND for phrases). Recent sessions mode has no LLM cost.
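The keyword-versus-phrase distinction maps onto SQLite FTS5 query syntax roughly as sketched below. The `sessions` table, its `summary` column, and `build_match_query` are hypothetical; the sketch also interprets "AND for phrases" as a quoted FTS5 phrase, which is an assumption about how the tool builds its queries. It requires a Python build whose SQLite includes FTS5.

```python
import sqlite3


def build_match_query(keywords, phrase=False):
    """Build an FTS5 MATCH string (hypothetical helper).

    Keywords are OR-ed; with phrase=True they become one quoted phrase,
    so the words must appear adjacent and in order.
    """
    cleaned = [k.replace('"', '""') for k in keywords]  # escape embedded quotes
    if phrase:
        return '"' + " ".join(cleaned) + '"'
    return " OR ".join(f'"{k}"' for k in cleaned)


def search(conn, match_query):
    rows = conn.execute(
        "SELECT rowid FROM sessions WHERE sessions MATCH ? ORDER BY rowid",
        (match_query,),
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE sessions USING fts5(summary)")
conn.executemany(
    "INSERT INTO sessions(summary) VALUES (?)",
    [
        ("Set up docker backend for terminal",),
        ("Debugged ssh tunnel timeouts",),
        ("Configured docker ssh context",),
    ],
)

either = search(conn, build_match_query(["docker", "ssh"]))
exact = search(conn, build_match_query(["docker", "ssh"], phrase=True))
```

Here `either` matches every session that mentions docker or ssh, while `exact` matches only the session containing the two words side by side.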
⚙️ Core Tools

Delegation

Default On
★★★★☆

Spawn subagents with isolated contexts and terminal sessions. Supports parallel batch execution (up to 3 concurrent children).

Subagent spawning · Parallel batch execution · Isolated context/terminal · Leaf and orchestrator roles
Not durable — children are cancelled if parent is interrupted. Use cron jobs or background terminal for persistent work.
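A concurrency cap like the one described can be sketched with an `asyncio.Semaphore`. This is an illustration of the pattern only: `run_child` stands in for a real subagent, and queueing extra tasks until a slot frees up (rather than rejecting them) is an assumption.

```python
import asyncio

MAX_CONCURRENT_CHILDREN = 3  # limit stated in the card above


async def run_child(task_id, limiter, stats):
    """Stand-in for a subagent; a real child would run its own agent loop."""
    async with limiter:
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)  # simulated work
        stats["running"] -= 1
        return f"result-{task_id}"


async def run_batch(task_ids):
    limiter = asyncio.Semaphore(MAX_CONCURRENT_CHILDREN)
    stats = {"running": 0, "peak": 0}
    # gather preserves input order in its result list.
    results = await asyncio.gather(*(run_child(t, limiter, stats) for t in task_ids))
    return results, stats["peak"]


results, peak = asyncio.run(run_batch(range(7)))
```

If the coroutine awaiting `asyncio.gather` is cancelled, the unfinished children are cancelled with it, which mirrors the non-durable behavior noted above.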
⚙️ Core Tools

Cron Jobs

Default On
★★★★★

Built-in scheduler for recurring tasks. Supports durations, cron expressions, and ISO timestamps. Jobs run autonomously with configurable model/skills/delivery.

Scheduled execution · Per-job model override · Script pre-run · Multi-platform delivery · Watchdog pattern (no_agent)
Schedule formats: duration (30m), cron expression (0 9 * * *), or ISO timestamp. Natural-language phrases such as "every sunday" are not supported.
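To make the three accepted schedule shapes concrete, here is a small classifier sketch. `classify_schedule` is hypothetical and not the scheduler's real parser: the unit suffixes other than `m` are assumptions, and the cron check is simplified to five numeric or wildcard fields.

```python
import re
from datetime import datetime, timedelta

_DURATION = re.compile(r"^(\d+)([smhd])$")
# Only "m" appears in the card above; the other suffixes are assumptions.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_CRON_FIELD = re.compile(r"[\d*/,\-]+")


def classify_schedule(spec: str):
    """Return (kind, value) for a schedule string (hypothetical helper)."""
    spec = spec.strip()
    match = _DURATION.match(spec)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return "duration", timedelta(**{_UNITS[unit]: amount})
    fields = spec.split()
    if len(fields) == 5 and all(_CRON_FIELD.fullmatch(f) for f in fields):
        return "cron", spec
    try:
        return "iso", datetime.fromisoformat(spec)
    except ValueError:
        # Natural-language phrases such as "every sunday" end up here.
        raise ValueError(f"unsupported schedule: {spec!r}") from None


kinds = [classify_schedule(s)[0] for s in ("30m", "0 9 * * *", "2025-01-15T09:00:00")]
```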
⚙️ Core Tools

Clarify

Default On
★★★★☆

Ask the user clarifying questions when a task is ambiguous. Supports multiple choice (up to 4 options) and open-ended modes.

Multiple choice prompts · Open-ended questions · Inline "Other" option
Overuse can be annoying. Prefer choosing a reasonable default when the decision is low-stakes.
⚙️ Core Tools

Task List

Default On
★★★★☆

In-session task tracking with priority ordering. Supports create, update, mark complete, cancel, and merge operations. One task in_progress at a time.

Task CRUD · Priority ordering · Merge/replace modes · Progress tracking
Tasks are session-scoped — not persistent across sessions. Use kanban for durable task management.
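The "one task in_progress at a time" rule can be sketched as a simple invariant check. `TaskList` below is a hypothetical in-memory model: the `pending` and `completed` status names are assumptions, and raising an error on a second start (rather than demoting the active task) is also an assumption.

```python
class TaskList:
    """Hypothetical session-scoped task list with a single in_progress slot."""

    def __init__(self):
        self._tasks = {}   # task id -> {"title": str, "status": str}
        self._next_id = 1

    def create(self, title):
        task_id = self._next_id
        self._next_id += 1
        self._tasks[task_id] = {"title": title, "status": "pending"}
        return task_id

    def start(self, task_id):
        active = [i for i, t in self._tasks.items() if t["status"] == "in_progress"]
        if active and active != [task_id]:
            # Assumption: reject the request instead of switching tasks silently.
            raise RuntimeError(f"task {active[0]} is already in_progress")
        self._tasks[task_id]["status"] = "in_progress"

    def complete(self, task_id):
        self._tasks[task_id]["status"] = "completed"

    def status(self, task_id):
        return self._tasks[task_id]["status"]


tasks = TaskList()
first = tasks.create("Write parser")
second = tasks.create("Add tests")
tasks.start(first)
tasks.complete(first)   # frees the in_progress slot
tasks.start(second)
```

Because the state lives only in the object, it disappears with the session, which is the limitation the card points out.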
⚙️ Core Tools

Code Execution

Default On
★★★★☆

Sandboxed Python execution with access to file/search/patch/terminal tools. Use for multi-step processing, data filtering, and conditional logic between tool calls.

Python execution · Tool library access · 5-minute timeout · 50KB stdout cap
Foreground-only (no background/pty). 50 tool calls per script. Stdout capped at 50KB.
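The timeout and stdout cap can be illustrated with a plain subprocess runner. This sketch provides no sandboxing and is not how Hermes runs scripts; `run_script` is hypothetical, and treating 50KB as 50 × 1024 bytes is an assumption.

```python
import subprocess
import sys

STDOUT_CAP_BYTES = 50 * 1024   # 50KB cap from the card above (1024-byte KB assumed)
TIMEOUT_SECONDS = 5 * 60       # 5-minute timeout from the card above


def run_script(source, timeout=TIMEOUT_SECONDS, cap=STDOUT_CAP_BYTES):
    """Run Python source in a child process (hypothetical runner, no real sandbox)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "stdout": b"", "truncated": False}
    out = proc.stdout
    return {
        "status": "ok" if proc.returncode == 0 else "error",
        "stdout": out[:cap],            # keep only the first `cap` bytes
        "truncated": len(out) > cap,
    }


noisy = run_script("print('x' * 100000)")
slow = run_script("import time; time.sleep(5)", timeout=0.5)
```

Scripts that produce large results are better off writing them to a file and printing a short summary, since anything past the cap is dropped.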
🌐 Web & Browser

Browser

Opt-in
★★★★★

Full browser automation — navigate pages, click elements, type text, take screenshots, read console output, and execute JavaScript. Supports local Chromium, Browserbase, and Camofox backends.

Page navigation · Element interaction · Screenshot + vision analysis · Console output · JavaScript evaluation · Scroll/click/type
Requires: BROWSERBASE_API_KEY or local Chromium
Resource-heavy. Prefer web_search for simple lookups. Local Chromium must be installed separately.
🌐 Web & Browser

Vision

Default On
★★★★☆

Image analysis — load and describe images from URLs, file paths, or data URIs. Falls back to an auxiliary vision model if the main model lacks vision capabilities.

Image loading (URL/file/data URI) · Visual description · Fallback to auxiliary model
Some models lack native vision — falls back to slower auxiliary model. File paths must be absolute.
🎵 Media

Image Generation

Opt-in
★★★★☆

AI image generation via multiple backends. Supports OpenAI gpt-image-2, xAI Grok-Imagine, and more via plugins.

Text-to-image · Multi-backend · Image caching
Requires: OPENAI_API_KEY or XAI_API_KEY
Requires a backend plugin with API key. Not available on all platforms.
🎵 Media

Video

Opt-in
★★★☆☆

Video analysis and generation. Supports FAL.ai multi-model (Veo 3.1, Kling, Pixverse) and xAI Grok-Imagine backends.

Text-to-video · Image-to-video · Video analysis
Requires: FAL_KEY or XAI_API_KEY
Expensive API costs. FAL is the more mature backend.
🎵 Media

Text-to-Speech

Default On
★★★★☆

Convert text to spoken audio. Supports Edge TTS (free, default), ElevenLabs, OpenAI, MiniMax, Mistral, and local NeuTTS.

Text-to-audio · Multi-provider · Voice memo saving
Requires: Provider-dependent
Edge TTS works out of the box. Cloud providers need API keys. Character limits range from 4,096 to 15,000 depending on the provider.
⚡ Automation

Kanban

Opt-in
★★★★☆

Durable SQLite-based work queue for multi-agent coordination. Tasks have lifecycle (create → assign → complete/block), comments, and links. Dispatcher auto-assigns to worker profiles.

Task lifecycle · Multi-profile assignment · Comments and links · Auto-dispatch · Heartbeat monitoring
Best used with multi-profile setups. Single-user kanban adds overhead without benefit.
💬 Messaging

Messaging

Default On
★★★★☆

Cross-platform message sending. Routes messages through the gateway to any connected platform — Telegram, Discord, Slack, Signal, and more.

Cross-platform send · Gateway routing · Platform-specific formatting
Depends on gateway being active. Not available in CLI-only mode.
💬 Messaging

Discord

Opt-in
★★★★☆

Discord integration tools for the gateway. Enables the Hermes Discord bot to read and respond in channels and DMs.

Channel messaging · DM handling · Message history reading
Requires: Discord bot token
Requires Message Content Intent enabled in Discord Developer Portal.
💬 Messaging

Discord Admin

Opt-in
★★★☆☆

Discord admin and moderation tools — manage users, roles, channels, and server settings through the agent.

User management · Role management · Channel management · Moderation actions
Requires: Discord bot token with admin permissions
Requires elevated Discord permissions. Use with caution — moderation actions are irreversible.
🧠 AI / ML

Reinforcement Learning

Opt-in
★★☆☆☆

Reinforcement learning tools for training and evaluating AI models. Off by default — niche use case for ML researchers.

RL training loops · Model evaluation
Requires: ML framework dependencies
Experimental. Not recommended for general use.
🧠 AI / ML

Mixture of Agents

Opt-in
★★★☆☆

Mixture of Agents pattern — runs multiple model instances in parallel and aggregates their outputs for improved quality. Off by default.

Parallel model inference · Output aggregation · Quality improvement
Requires: Multiple API keys
Token cost multiplies by number of agents. Experimental feature.
🔧 Developer

Debugging

Opt-in
★★★☆☆

Extra introspection and debugging tools. Adds verbose logging, state inspection, and diagnostic capabilities. Off by default.

Verbose logging · State inspection · Diagnostic output
Generates a lot of output. Enable only when debugging specific issues.
🔧 Developer

Safe Mode

Opt-in
★★★☆☆

Minimal, low-risk toolset for locked-down sessions. Strips dangerous tools (terminal, browser, delegation) for safe exploration.

Read-only operations · Minimal tool surface · Reduced risk profile
Very limited functionality. Use only for untrusted or shared environments.
🔗 Integrations

Spotify

Opt-in
★★★★★

Spotify playback control — play, pause, skip, queue, search, manage playlists and library. Uses Spotify Web API with PKCE OAuth via the Spotify plugin.

Playback control · Device management · Queue management · Search · Playlist/library management
Requires: Spotify Premium, hermes auth spotify
Requires Spotify Premium. One-time OAuth setup via hermes auth spotify.
🔗 Integrations

Home Assistant

Opt-in
★★★☆☆

Smart home control via Home Assistant integration. Control lights, switches, sensors, and automations through the agent.

Device control · State queries · Automation triggers
Requires: Home Assistant URL + token
Requires running Home Assistant instance. Off by default for security reasons.
🔗 Integrations

Feishu Docs

Opt-in
★★★☆☆

Feishu (Lark) document tools — create, read, and edit Feishu documents through the agent.

Document CRUD · Block operations
Requires: Feishu API credentials
Feishu-specific. Only useful if you use the Feishu/Lark platform.
🔗 Integrations

Feishu Drive

Opt-in
★★★☆☆

Feishu (Lark) drive tools — manage files and folders in Feishu Cloud Drive.

File management · Folder operations
Requires: Feishu API credentials
Feishu-specific. Requires separate API setup from Feishu Docs.
🔗 Integrations

Yuanbao

Opt-in
★★★☆☆

Yuanbao (Tencent) integration — @mention users in groups, query member information and group details.

Group member queries · @mention support · Group information
Requires: Yuanbao API credentials
China-specific platform. Requires Yuanbao account.