Goal: Build the intent → tool execution skeleton and verify it works with the daemon + CLI.
- Cargo workspace + 5 crate structure
-
nerva-core: CapabilityBus, ToolRegistry, PolicyEngine, Skill trait -
nerva-os: process, clipboard, screenshot, wayland integration stubs -
nerva-skills: 9 built-in skillslaunch_app(Safe)list_windows(Safe)capture_screen(Safe)clipboard_read(Safe)run_command_safe(Caution)notify(Safe)open_path(Safe)get_active_window(Safe)focus_window(Caution)
-
nerva-daemon: Unix socket server (nervad) -
nerva-cli: CLI client (nerva) - JSON-lines protocol
- PolicyEngine with risk tiers
- Execution audit log
- Configuration file support (TOML) —
~/.config/nerva/config.toml - Graceful shutdown (SIGTERM/SIGINT signal handling)
- systemd user service unit file (
dist/nervad.service) - Integration tests (daemon + CLI end-to-end, 5 tests)
- Unit tests (CapabilityBus, PolicyEngine, config, 9 tests)
- Dynamic skill registration / plugin directory
- Script-based plugins:
skill.tomlmanifest + executable - Plugin directory:
~/.config/nerva/skills/ - ID conflict detection (plugins cannot override built-in skills)
- Configurable via
[plugins]section in config
- Script-based plugins:
Goal: Understand screen state and enable context-aware operations.
-
get_active_windowskill via compositor API (hyprctl) - Improved
capture_screen— portal (xdg-desktop-portal) with grim fallback, region/window capture - OCR integration —
ocr_screenskill via tesseract - VLM integration — Ollama HTTP client with vision model support (moondream, llava, etc.)
-
summarize_screenskill — capture → VLM (with OCR fallback) - Context assembly —
DesktopContextstruct +gather_contextskill (active window + clipboard + screen text)
Goal: Automate the natural language → tool invocation pipeline with an LLM.
- LLM client — Ollama chat API with tool calling support (
llm.rs) - AgentRuntime — unified interpret → plan → execute loop (
agent.rs)- Interpreter: LLM with system prompt + tool definitions → tool calls
- Planner: multi-round tool calling (up to 10 rounds)
- Orchestrator: sequential execution via CapabilityBus with result feedback
- LLM backend abstraction — configurable Ollama URL and model via
[llm]config -
askcommand — natural language queries via daemon protocol and CLI - Execution memory — conversation history maintained across tool-call rounds
- Confirmation flow — headless auto-approve with risk-tier logging (UI confirmation deferred to Phase 6)
Goal: React to state changes and provide suggestions as a resident agent.
- Watcher framework —
Watchertrait +WatcherManagerwith per-watcher poll intervals - Clipboard watcher — detects URLs and file paths, suggests relevant actions
- Active window watcher — detects app switches, suggests context-aware actions
- Suggestion mode — buffered suggestions accessible via
get_suggestionscommand and CLI - Notification integration — optional desktop notifications via notify-send
- Daemon integration — watcher lifecycle management, suggestion store, configurable via
[watchers]
Goal: Safely execute multi-step workflows.
- Workflow definition format
- Browser control via Chrome DevTools Protocol
- Editor integration — VS Code / Neovim
- Extended safe shell — dynamic allowlist management
- Limited input automation via ydotool / uinput
- Killer workflows
- "Read the error on screen and suggest causes + next command"
- "Summarize the current article/PDF in 3 lines and save to notes"
- "Pre-meeting setup" (check calendar → show notes → copy meeting URL)
Goal: Build a graphical launcher UI.
- Command palette UI with GTK4 / Tauri
- Global shortcut (XDG Desktop Portal GlobalShortcuts)
- Candidate display / incremental search
- Execution confirmation dialog
- Execution log viewer
- Overlay mode
Goal: Learn user preferences and optimize operations.
- Preference store — frequent apps, preferred operations, confirmation overrides
- Usage analytics — track skill usage frequency
- Smart defaults — optimize candidate ordering based on frequency
Goal: Package as a distributable NixOS module.
- Nix flake
- NixOS module (
services.nerva.enable = true) - Home Manager module (keybindings, watcher rules, agent config)
- NixOS VM test (
testScriptto verify agentd startup) - CI/CD pipeline
The following are intentionally deferred:
- Fully autonomous agent (executing anything without human confirmation)
- Arbitrary bash execution
- Dependence on click-coordinate-based automation
- Elaborate memory / RAG systems
- Complex multi-LLM router
- Building a custom compositor