Nerva Roadmap

Phase 1: Foundation (Current)

Goal: Build the intent → tool execution skeleton and verify it works with the daemon + CLI.

Done

  • Cargo workspace with a five-crate structure
  • nerva-core: CapabilityBus, ToolRegistry, PolicyEngine, Skill trait
  • nerva-os: process, clipboard, screenshot, wayland integration stubs
  • nerva-skills: 9 built-in skills
    • launch_app (Safe)
    • list_windows (Safe)
    • capture_screen (Safe)
    • clipboard_read (Safe)
    • run_command_safe (Caution)
    • notify (Safe)
    • open_path (Safe)
    • get_active_window (Safe)
    • focus_window (Caution)
  • nerva-daemon: Unix socket server (nervad)
  • nerva-cli: CLI client (nerva)
  • JSON-lines protocol
  • PolicyEngine with risk tiers
  • Execution audit log
  • Configuration file support (TOML) — ~/.config/nerva/config.toml
  • Graceful shutdown (SIGTERM/SIGINT signal handling)
  • systemd user service unit file (dist/nervad.service)
  • Integration tests (daemon + CLI end-to-end, 5 tests)
  • Unit tests (CapabilityBus, PolicyEngine, config, 9 tests)
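
As a rough illustration of how the pieces listed above could fit together, here is a minimal sketch of a skill interface, a registry, and a tier-based policy check. The trait methods, the `auto_approve_up_to` setting, and the `Notify` / `RunCommandSafe` stand-ins are assumptions made for this example; the actual nerva-core types may look different.

```rust
use std::collections::HashMap;

/// Risk tiers that appear in the skill list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum RiskTier {
    Safe,
    Caution,
}

/// Hypothetical skill interface; the real `Skill` trait may differ.
pub trait Skill {
    fn id(&self) -> &str;
    fn risk(&self) -> RiskTier;
    fn execute(&self, input: &str) -> Result<String, String>;
}

/// Auto-approves skills up to a configured tier (assumed behavior).
pub struct PolicyEngine {
    auto_approve_up_to: RiskTier,
}

impl PolicyEngine {
    pub fn new(auto_approve_up_to: RiskTier) -> Self {
        Self { auto_approve_up_to }
    }

    pub fn allows(&self, skill: &dyn Skill) -> bool {
        skill.risk() <= self.auto_approve_up_to
    }
}

#[derive(Default)]
pub struct ToolRegistry {
    skills: HashMap<String, Box<dyn Skill>>,
}

impl ToolRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn register(&mut self, skill: Box<dyn Skill>) {
        self.skills.insert(skill.id().to_string(), skill);
    }

    /// Look up the skill, check policy, then execute it.
    pub fn dispatch(&self, policy: &PolicyEngine, id: &str, input: &str) -> Result<String, String> {
        let skill = self
            .skills
            .get(id)
            .ok_or_else(|| format!("unknown skill: {id}"))?;
        if !policy.allows(skill.as_ref()) {
            return Err(format!("skill {id} requires confirmation"));
        }
        skill.execute(input)
    }
}

/// Stand-in skills used only to exercise the sketch.
pub struct Notify;

impl Skill for Notify {
    fn id(&self) -> &str { "notify" }
    fn risk(&self) -> RiskTier { RiskTier::Safe }
    fn execute(&self, input: &str) -> Result<String, String> {
        Ok(format!("notified: {input}"))
    }
}

pub struct RunCommandSafe;

impl Skill for RunCommandSafe {
    fn id(&self) -> &str { "run_command_safe" }
    fn risk(&self) -> RiskTier { RiskTier::Caution }
    fn execute(&self, input: &str) -> Result<String, String> {
        Ok(format!("would run: {input}"))
    }
}
```

With a policy created via `PolicyEngine::new(RiskTier::Safe)`, dispatching `notify` succeeds, while `run_command_safe` is rejected until the caller confirms or raises the threshold.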

TODO

  • Dynamic skill registration / plugin directory
    • Script-based plugins: skill.toml manifest + executable
    • Plugin directory: ~/.config/nerva/skills/
    • ID conflict detection (plugins cannot override built-in skills)
    • Configurable via [plugins] section in config
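
The ID conflict rule above could be enforced at registration time. The sketch below is one possible shape, assuming a hypothetical `PluginRegistry` that knows the built-in skill IDs up front; the type names and the error enum are illustrative, not part of the planned design.

```rust
use std::collections::{HashMap, HashSet};
use std::path::PathBuf;

#[derive(Debug, PartialEq, Eq)]
pub enum RegisterError {
    /// A plugin tried to reuse the ID of a built-in skill.
    ConflictsWithBuiltin(String),
    /// Two plugins declared the same ID.
    DuplicatePlugin(String),
}

/// Hypothetical registry mapping plugin skill IDs to their executables.
pub struct PluginRegistry {
    builtin_ids: HashSet<String>,
    plugins: HashMap<String, PathBuf>,
}

impl PluginRegistry {
    pub fn new(builtin_ids: &[&str]) -> Self {
        Self {
            builtin_ids: builtin_ids.iter().map(|s| s.to_string()).collect(),
            plugins: HashMap::new(),
        }
    }

    /// Register a plugin; built-in IDs always win.
    pub fn register(&mut self, id: &str, executable: PathBuf) -> Result<(), RegisterError> {
        if self.builtin_ids.contains(id) {
            return Err(RegisterError::ConflictsWithBuiltin(id.to_string()));
        }
        if self.plugins.contains_key(id) {
            return Err(RegisterError::DuplicatePlugin(id.to_string()));
        }
        self.plugins.insert(id.to_string(), executable);
        Ok(())
    }

    pub fn executable_for(&self, id: &str) -> Option<&PathBuf> {
        self.plugins.get(id)
    }
}
```

Rejecting duplicates between plugins as well is an extra assumption here; the roadmap only requires that plugins cannot override built-in skills.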

Phase 2: Screen Context

Goal: Understand screen state and enable context-aware operations.

  • get_active_window skill via compositor API (hyprctl)
  • Improved capture_screen — portal (xdg-desktop-portal) with grim fallback, region/window capture
  • OCR integration — ocr_screen skill via tesseract
  • VLM integration — Ollama HTTP client with vision model support (moondream, llava, etc.)
  • summarize_screen skill — capture → VLM (with OCR fallback)
  • Context assembly — DesktopContext struct + gather_context skill (active window + clipboard + screen text)
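
To make the context assembly item more concrete, here is a small sketch of what a `DesktopContext` might hold and how the VLM-then-OCR fallback could be expressed. The field names, the `to_prompt` method, and the `screen_text_with_fallback` helper are assumptions for illustration only.

```rust
/// Snapshot of desktop state; field names are assumptions.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct DesktopContext {
    pub active_window: Option<String>,
    pub clipboard: Option<String>,
    pub screen_text: Option<String>,
}

impl DesktopContext {
    /// Render only the parts that were gathered successfully.
    pub fn to_prompt(&self) -> String {
        let mut lines = Vec::new();
        if let Some(w) = &self.active_window {
            lines.push(format!("Active window: {w}"));
        }
        if let Some(c) = &self.clipboard {
            lines.push(format!("Clipboard: {c}"));
        }
        if let Some(t) = &self.screen_text {
            lines.push(format!("Screen text: {t}"));
        }
        lines.join("\n")
    }
}

/// Try the vision model first and fall back to OCR if it fails.
/// Both sources are passed in as closures so the sketch stays free of I/O.
pub fn screen_text_with_fallback<V, O>(vlm: V, ocr: O) -> Option<String>
where
    V: FnOnce() -> Result<String, String>,
    O: FnOnce() -> Result<String, String>,
{
    vlm().or_else(|_| ocr()).ok()
}
```

Keeping every field optional lets a partially failed gather (for example, no clipboard access) still produce a usable context.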

Phase 3: Agent Runtime (LLM Integration)

Goal: Automate the natural language → tool invocation pipeline with an LLM.

  • LLM client — Ollama chat API with tool calling support (llm.rs)
  • AgentRuntime — unified interpret → plan → execute loop (agent.rs)
    • Interpreter: LLM with system prompt + tool definitions → tool calls
    • Planner: multi-round tool calling (up to 10 rounds)
    • Orchestrator: sequential execution via CapabilityBus with result feedback
  • LLM backend abstraction — configurable Ollama URL and model via [llm] config
  • ask command — natural language queries via daemon protocol and CLI
  • Execution memory — conversation history maintained across tool-call rounds
  • Confirmation flow — headless auto-approve with risk-tier logging (UI confirmation deferred to Phase 6)
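
The interpret → plan → execute loop described above can be sketched as a bounded loop that feeds tool results back into the conversation history. Everything below is a simplified model: the `LlmBackend` trait, the message types, and `ScriptedLlm` (a canned backend for experimentation) are hypothetical, and the real client would talk to Ollama over HTTP instead.

```rust
use std::collections::VecDeque;

/// Upper bound on tool-calling rounds, as stated in the roadmap.
pub const MAX_ROUNDS: usize = 10;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Message {
    User(String),
    ToolResult { tool: String, output: String },
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToolCall {
    pub tool: String,
    pub args: String,
}

pub enum LlmReply {
    ToolCalls(Vec<ToolCall>),
    Final(String),
}

/// Hypothetical backend interface.
pub trait LlmBackend {
    fn chat(&mut self, history: &[Message]) -> LlmReply;
}

/// Run up to MAX_ROUNDS rounds, executing tool calls sequentially and
/// appending each result to the history before asking the model again.
pub fn run_agent<L, E>(llm: &mut L, mut execute: E, query: &str) -> Result<String, String>
where
    L: LlmBackend,
    E: FnMut(&ToolCall) -> Result<String, String>,
{
    let mut history = vec![Message::User(query.to_string())];
    for _ in 0..MAX_ROUNDS {
        match llm.chat(&history) {
            LlmReply::Final(text) => return Ok(text),
            LlmReply::ToolCalls(calls) => {
                for call in &calls {
                    // Tool failures are reported back to the model, not raised.
                    let output = match execute(call) {
                        Ok(out) => out,
                        Err(e) => format!("error: {e}"),
                    };
                    history.push(Message::ToolResult { tool: call.tool.clone(), output });
                }
            }
        }
    }
    Err(format!("no final answer after {} rounds", MAX_ROUNDS))
}

/// Canned backend that replays a fixed list of replies.
pub struct ScriptedLlm {
    pub replies: VecDeque<LlmReply>,
}

impl LlmBackend for ScriptedLlm {
    fn chat(&mut self, _history: &[Message]) -> LlmReply {
        self.replies
            .pop_front()
            .unwrap_or_else(|| LlmReply::Final(String::new()))
    }
}
```

In the daemon, the `execute` closure would presumably route each call through the CapabilityBus, which is where the policy check and audit logging would apply.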

Phase 4: Watchers & Suggestions

Goal: React to state changes and provide suggestions as a resident agent.

  • Watcher framework — Watcher trait + WatcherManager with per-watcher poll intervals
  • Clipboard watcher — detects URLs and file paths, suggests relevant actions
  • Active window watcher — detects app switches, suggests context-aware actions
  • Suggestion mode — buffered suggestions accessible via get_suggestions command and CLI
  • Notification integration — optional desktop notifications via notify-send
  • Daemon integration — watcher lifecycle management, suggestion store, configurable via [watchers]
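
A watcher framework with per-watcher poll intervals might look roughly like the sketch below. The trait methods, the `tick` function driven by elapsed time, the one-second interval, and the prefix-based URL/path detection are all assumptions chosen to keep the example small and deterministic.

```rust
use std::time::Duration;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Suggestion {
    pub source: String,
    pub text: String,
}

/// Hypothetical watcher interface.
pub trait Watcher {
    fn name(&self) -> &str;
    fn interval(&self) -> Duration;
    fn poll(&mut self) -> Option<Suggestion>;
}

struct Entry {
    watcher: Box<dyn Watcher>,
    next_due: Duration,
}

#[derive(Default)]
pub struct WatcherManager {
    entries: Vec<Entry>,
    suggestions: Vec<Suggestion>,
}

impl WatcherManager {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn add(&mut self, watcher: Box<dyn Watcher>) {
        self.entries.push(Entry { watcher, next_due: Duration::ZERO });
    }

    /// Poll every watcher whose interval has elapsed; `now` is time since start.
    pub fn tick(&mut self, now: Duration) {
        for entry in &mut self.entries {
            if now >= entry.next_due {
                if let Some(s) = entry.watcher.poll() {
                    self.suggestions.push(s);
                }
                entry.next_due = now + entry.watcher.interval();
            }
        }
    }

    /// Drain the buffered suggestions.
    pub fn take_suggestions(&mut self) -> Vec<Suggestion> {
        std::mem::take(&mut self.suggestions)
    }
}

/// Clipboard watcher that reads through a caller-supplied closure.
pub struct ClipboardWatcher<F: FnMut() -> Option<String>> {
    read: F,
    last: Option<String>,
}

impl<F: FnMut() -> Option<String>> ClipboardWatcher<F> {
    pub fn new(read: F) -> Self {
        Self { read, last: None }
    }
}

impl<F: FnMut() -> Option<String>> Watcher for ClipboardWatcher<F> {
    fn name(&self) -> &str { "clipboard" }
    fn interval(&self) -> Duration { Duration::from_secs(1) }

    fn poll(&mut self) -> Option<Suggestion> {
        let current = (self.read)()?;
        if self.last.as_deref() == Some(current.as_str()) {
            return None; // unchanged since the last poll
        }
        let text = if current.starts_with("http://") || current.starts_with("https://") {
            Some(format!("Open URL {current}?"))
        } else if current.starts_with('/') || current.starts_with("~/") {
            Some(format!("Open path {current}?"))
        } else {
            None
        };
        self.last = Some(current);
        text.map(|text| Suggestion { source: self.name().to_string(), text })
    }
}
```

Passing elapsed time into `tick` rather than reading the clock inside the manager keeps the scheduling logic testable; a daemon would call it from its event loop with the real elapsed time.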

Phase 5: Workflow Automation

Goal: Safely execute multi-step workflows.

  • Workflow definition format
  • Browser control via Chrome DevTools Protocol
  • Editor integration — VS Code / Neovim
  • Extended safe shell — dynamic allowlist management
  • Limited input automation via ydotool / uinput
  • Killer workflows
    • "Read the error on screen and suggest causes + next command"
    • "Summarize the current article/PDF in 3 lines and save to notes"
    • "Pre-meeting setup" (check calendar → show notes → copy meeting URL)
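
For the extended safe shell item, one plausible approach is to keep an allowlist of program names that can be changed at runtime and to split commands into an argument vector without ever handing them to a shell. The sketch below assumes that approach; the `Allowlist` type, its methods, and the set of rejected characters are illustrative choices, not a specification.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum ShellError {
    Empty,
    ForbiddenCharacter(char),
    NotAllowed(String),
}

/// Hypothetical allowlist of program names that may be executed.
pub struct Allowlist {
    programs: HashSet<String>,
}

impl Allowlist {
    pub fn new(initial: &[&str]) -> Self {
        Self { programs: initial.iter().map(|s| s.to_string()).collect() }
    }

    /// Add a program at runtime; returns false if it was already allowed.
    pub fn allow(&mut self, program: &str) -> bool {
        self.programs.insert(program.to_string())
    }

    /// Remove a program at runtime; returns false if it was not allowed.
    pub fn revoke(&mut self, program: &str) -> bool {
        self.programs.remove(program)
    }

    /// Split a command into argv, rejecting shell metacharacters and
    /// programs that are not on the allowlist.
    pub fn parse<'a>(&self, command: &'a str) -> Result<Vec<&'a str>, ShellError> {
        const FORBIDDEN: &[char] = &[';', '|', '&', '$', '`', '>', '<', '(', ')', '\n'];
        if let Some(c) = command.chars().find(|c| FORBIDDEN.contains(c)) {
            return Err(ShellError::ForbiddenCharacter(c));
        }
        let argv: Vec<&str> = command.split_whitespace().collect();
        let program = *argv.first().ok_or(ShellError::Empty)?;
        if !self.programs.contains(program) {
            return Err(ShellError::NotAllowed(program.to_string()));
        }
        Ok(argv)
    }
}
```

Whitespace splitting does not handle quoted arguments; a real implementation would need a proper tokenizer, but the allowlist check itself would stay the same.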

Phase 6: Launcher UI

Goal: Build a graphical launcher UI.

  • Command palette UI with GTK4 / Tauri
  • Global shortcut (XDG Desktop Portal GlobalShortcuts)
  • Candidate display / incremental search
  • Execution confirmation dialog
  • Execution log viewer
  • Overlay mode

Phase 7: Preference Memory & Personalization

Goal: Learn user preferences and optimize operations.

  • Preference store — frequent apps, preferred operations, confirmation overrides
  • Usage analytics — track skill usage frequency
  • Smart defaults — optimize candidate ordering based on frequency
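
Frequency-based candidate ordering can be as simple as counting invocations and sorting by the count. The following sketch assumes an in-memory `UsageStats` type (a hypothetical name) and leaves persistence aside.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// In-memory usage counter keyed by skill ID.
#[derive(Default)]
pub struct UsageStats {
    counts: HashMap<String, u64>,
}

impl UsageStats {
    /// Record one invocation of a skill.
    pub fn record(&mut self, skill_id: &str) {
        *self.counts.entry(skill_id.to_string()).or_insert(0) += 1;
    }

    pub fn count(&self, skill_id: &str) -> u64 {
        self.counts.get(skill_id).copied().unwrap_or(0)
    }

    /// Order candidates by descending usage count. The sort is stable,
    /// so candidates with equal counts keep their original order.
    pub fn rank<'a>(&self, candidates: &[&'a str]) -> Vec<&'a str> {
        let mut ranked = candidates.to_vec();
        ranked.sort_by_key(|id| Reverse(self.count(id)));
        ranked
    }
}
```

Because ties preserve the incoming order, the launcher could pass candidates pre-sorted by search relevance and let frequency act only as the primary key.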

Phase 8: NixOS Module & Distribution

Goal: Package as a distributable NixOS module.

  • Nix flake
  • NixOS module (services.nerva.enable = true)
  • Home Manager module (keybindings, watcher rules, agent config)
  • NixOS VM test (testScript to verify nervad startup)
  • CI/CD pipeline

Non-Goals (for now)

The following are intentionally deferred:

  • Fully autonomous agent (executing anything without human confirmation)
  • Arbitrary bash execution
  • Dependence on click-coordinate-based automation
  • Elaborate memory / RAG systems
  • Complex multi-LLM router
  • Building a custom compositor