
feat(channels): add Max Messenger channel #1109

Open: HumanGoClaude wants to merge 27 commits into nextlevelbuilder:dev from HumanGoClaude:feat/max-channel

HumanGoClaude commented May 6, 2026

Summary

Adds Max (https://max.ru) as a first-class channel
under internal/channels/max/, following the same factory-registration
pattern as Telegram, Discord, Slack, Feishu, WhatsApp, Zalo OA, Zalo
Personal, Facebook, and Pancake. The new package supplies long-poll
and webhook transports, in-place streaming edits, allowlist / pairing
access control, and inbound / outbound media. Admin UI integration is
included so operators can create Max channels through the dashboard.

The backend wires through three small touch-points (one line each in
cmd/gateway.go, internal/channels/channel.go, and
internal/http/channel_instances.go). The admin dashboard receives Max
through five entries in the existing schema-driven channel forms — no
new components, just data added to the schemas already used by Discord,
Facebook, and Pancake.

The diff totals 48 files / +8,076 / −21 — large because the new
backend package is test-heavy: 3,188 LOC of production Go with
3,559 LOC of unit/integration tests + 173 LOC of JSON fixtures
captured from real API responses (a 1.12× test-to-production ratio).
The remaining ~1,140 LOC are documentation. The UI portion is 20 LOC of
TypeScript schema entries — no React components — so no UI test layer
was added (same pattern as Discord / Facebook / Pancake schemas).

Type

  • Feature
  • Bug fix
  • Hotfix (targeting main)
  • Refactor
  • Docs
  • CI/CD

Target Branch

dev (per CONTRIBUTING.md).

Checklist

  • go build ./... passes
  • go build -tags sqliteonly ./... passes
  • go vet ./... passes
  • Tests pass: go test -race ./...
  • Web UI builds: pnpm lint + pnpm build pass
  • No hardcoded secrets or credentials
  • SQL queries use parameterized $1, $2 (no new SQL — uses the
    existing channel_instances table only)
  • New user-facing strings added to all 3 locales (en/vi/zh)
    (only the pairing-reply text and "💭 Thinking..."
    streaming placeholder are new; both are English defaults
    identical in approach to Telegram, Discord, Slack — none of
    the existing channels localize these strings via
    i18n.Catalog)
  • Migration version bumped in internal/upgrade/version.go (if new migration)
    (no migration — uses existing channel_instances table with
    channel_type='max')

Test Plan

Automated

  • 241 unit tests in the new internal/channels/max/ package,
    covering: HTTP client, inbound translator, outbound dispatcher,
    chunker, streaming state machine, webhook signature verification,
    HMAC for request_contact, media download/upload, reactions,
    policy translator (DM + group), and a lifecycle integration_test.go
    exercising start → poll → policy → dispatch → stream → finalize.
  • All tests pass under go test -race -count=1 -timeout=120s ./internal/channels/max/... (~20 s).
  • Full repo go test -count=1 -short ./... is green — no regression
    in any other package.
  • Build matrices: go build ./... and go build -tags sqliteonly ./...
    both pass. go vet ./... clean. gofmt clean on every file
    touched by this PR.
  • Frontend toolchain: Node 22 + pnpm 10 (matching CI's web job).
    pnpm install --frozen-lockfile resolves cleanly, pnpm lint reports
    zero errors, tsc -b && vite build transforms 6,077 modules without
    errors.

Manual

End-to-end validation against a real Max bot in a local goclaw stack:

  • Channel created through the admin UI (Channels → Add Channel →
    Max), POST /api/channel-instances with channel_type=max.
    Channel reaches healthy state after probing GET /me.
  • DM from a test user → policy check (dm_policy=open) → agent loop
    → streaming placeholder edits with throttled markdown finalisation
    → final reply delivered with citations from web_fetch tool.
  • DM under dm_policy=pairing from an unpaired user → pairing-code
    reply sent via Max API; agent loop is not triggered (verified
    the security gate works).
  • Group code is implemented but not exercised live — Max platform
    does not yet permit adding bots to chats. Group policy logic is
    unit-tested.
  • Webhook mode is implemented and unit-tested but has not been
    exercised against the production Max API yet. Polling is the
    recommended primary mode and the only one validated end-to-end.

Why this design

Max (https://max.ru) is a messaging platform popular
in Russia (>140M MAU), with a documented Bot API at
platform-api.max.ru. There is no widely-adopted Go SDK for it, so the
channel uses a small net/http client directly. The API surface in use
is deliberately narrow (~10 endpoints), which keeps the client simple
enough to maintain in-tree without a third-party dependency — same
reasoning by which goclaw uses stdlib + a thin client for several
existing channels.

The package mirrors existing channels' conventions:

  • Embeds BaseChannel for shared allowlist/pairing/health logic
  • Self-registers through instanceLoader.RegisterFactory
  • Publishes inbound to MessageBus; consumes outbound from
    MessageBus
  • Uses the same PairingStore and PendingMessageStore shared
    services
  • Implements StreamingChannel, WebhookChannel, ReactionChannel,
    BlockReplyChannel — the same optional interfaces other channels
    expose

Notable choices specific to Max:

  • Two transports in one package (polling + webhook) selected by
    config.mode. Splitting them would duplicate the inbound translator
    and the policy wiring; existing channels with multiple transport
    modes (WhatsApp bridge vs. native, Slack bot vs. user) keep them
    in one package.
  • Streaming uses plain text during edits, then re-emits as
    markdown on final Send(). Max's markdown parser breaks on partial
    syntax mid-stream.
  • request_contact HMAC verified with hmac.Equal for
    constant-time comparison; distinct error types for malformed hex
    vs. tampered payload.
  • Admin UI integration is data, not code. The dashboard generates
    channel forms from configSchema and credentialsSchema. Adding
    Max meant a few schema entries and channel-type list additions
    across 5 existing files (~20 LOC total) — no new TSX components,
    mirroring how Discord, Facebook, and Pancake are integrated.
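
The request_contact check described above can be sketched as follows. This is a minimal illustration, not the PR's auth.go: it assumes the bot token is the HMAC key, the vCard text is the message, and the hash arrives hex-encoded. The names verifyContactHash, signContact, errMalformedHash, and errHashMismatch are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// Hypothetical error values: the PR only states that malformed hex and a
// tampered payload produce distinct errors.
var (
	errMalformedHash = errors.New("contact hash is not valid hex")
	errHashMismatch  = errors.New("contact hash does not match payload")
)

// contactMAC computes HMAC-SHA256 with the bot token as key (an assumption).
func contactMAC(botToken, vcfInfo string) []byte {
	mac := hmac.New(sha256.New, []byte(botToken))
	mac.Write([]byte(vcfInfo))
	return mac.Sum(nil)
}

// signContact returns the hex form of the MAC; used here only to build a demo input.
func signContact(botToken, vcfInfo string) string {
	return hex.EncodeToString(contactMAC(botToken, vcfInfo))
}

// verifyContactHash decodes the supplied hash and compares it in constant time.
func verifyContactHash(botToken, vcfInfo, hashHex string) error {
	got, err := hex.DecodeString(hashHex)
	if err != nil {
		return errMalformedHash
	}
	if !hmac.Equal(got, contactMAC(botToken, vcfInfo)) {
		return errHashMismatch
	}
	return nil
}

func main() {
	token, vcf := "demo-token", "BEGIN:VCARD\nFN:Test User\nEND:VCARD"
	good := signContact(token, vcf)
	fmt.Println(verifyContactHash(token, vcf, good))     // valid payload
	fmt.Println(verifyContactHash(token, vcf+"x", good)) // tampered payload
	fmt.Println(verifyContactHash(token, vcf, "zz"))     // malformed hex
}
```

Decoding before comparing keeps the two failure modes separate, so a caller can log a malformed request differently from a forged one.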

What's in this PR

| Category | Files | Notes |
| --- | --- | --- |
| Backend channel | 28 .go files in internal/channels/max/ (+ README.md, 7 testdata/*.json) | Self-contained package |
| Backend wiring | cmd/gateway.go, internal/channels/channel.go, internal/http/channel_instances.go | One added line each (factory registration, TypeMax constant, validator entry) |
| Admin UI | 5 files in ui/web/src/ (constants/channels.ts, pages/channels/channel-schemas.ts, pages/channels/channels-status-utils.ts, pages/config/sections/bindings-section.tsx, pages/contacts/contacts-page.tsx) | Schema-driven; +20 LOC, no new components |
| Docs | docs/05-channels-messaging.md, README.md, CLAUDE.md, CHANGELOG.md | Channel-count bump, new section, file-tree entry, Unreleased entry |
| Other | .gitignore (+3 lines for local smoke binary) | Keeps the repo root clean |

The diff in cmd/gateway.go shows the 1 new factory-registration
line plus a few lines of goimports re-grouping that the formatter
applied automatically when the file was opened. Happy to revert the
re-grouping if you prefer the patch strictly minimal.

Backwards compatibility

  • New channel type. No existing behaviour changes.
  • Schema unchanged (see migration note in the Checklist above).
  • New feat(channels/max) Conventional-Commits scope follows the
    per-channel scoping convention.

Follow-ups (separate PRs, after this one merges)

  1. User-facing setup guide: docs(channels): add Max setup guide in
    nextlevelbuilder/goclaw-docs (modeled on channels/telegram.md)
  2. Webhook end-to-end validation — once a public HTTPS endpoint
    is available, exercise webhook mode against the live API and
    tighten the "known limitations" wording
  3. Channel Comparison matrix — add a Max column to the table at
    docs/05-channels-messaging.md § 4

Reviewer suggested order

If you'd like to slice the review:

  1. internal/channels/channel.go (1 line) — confirm constant placement
  2. cmd/gateway.go (1 substantive line — the RegisterFactory
    call) — confirm wiring matches the pattern
  3. internal/channels/max/factory.go — config + creds shape
  4. internal/channels/max/max.go — lifecycle and embedded
    BaseChannel
  5. internal/channels/max/inbound.go + policy.go — request flow,
    policy enforcement
  6. internal/channels/max/outbound.go + stream.go + format.go
    — response flow + chunking
  7. internal/channels/max/webhook.go + auth.go — alt transport
    and HMAC
  8. Tests — *_test.go, especially integration_test.go
  9. ui/web/src/pages/channels/channel-schemas.ts: credentialsSchema.max
    and configSchema.max mirror factory.go's InstanceConfig
  10. Other UI files — small enum/array additions
  11. Docs — docs/05-channels-messaging.md Max section + extended
    interfaces table

Happy to squash the commit history or drop the webhook mode if you'd
prefer a narrower scope.

Origin

The 20 commits split into 15 code (feat/fix/test/chore)
and 5 docs. Branch is linear, no merge commits. Branch is based
on the public v3.11.2 tag — every commit is reachable from it. The
work was developed on a separate fork
(HumanGoClaude/goclaw_max_channel)
to allow iterative testing before upstreaming; this branch
(feat/max-channel) is the version targeting dev.

HumanGoClaude added 20 commits May 6, 2026 09:19

Skeleton implementation of channels.Channel interface for Max
Messenger (https://max.ru). Lays groundwork for full integration
to be filled in by subsequent commits.

Day 1 deliverables:
- internal/channels/max/ package with factory + Channel struct
- TypeMax constant in internal/channels/channel.go
- Factory registration in cmd/gateway.go
- 7 unit tests covering factory parsing and Channel interface

Day 2-5 will add:
- Max API HTTP client (GET /me, GET /updates, POST /messages, etc.)
- Inbound polling loop and message translator
- Outbound dispatcher with chunking and Markdown formatting
- Streaming preview via PUT /messages
- Webhook mode (POST /subscriptions, HTTPS-only)
- request_contact HMAC verification
- Reactions, callback buttons, media upload
- Documentation in docs/05-channels-messaging.md

Refs: https://dev.max.ru/docs-api

Adds production implementation of Max Messenger inbound:
- types.go: API types matching dev.max.ru/docs-api
- client.go: HTTP client with auth, retries, JSON enc/dec
- inbound.go: long-poll loop, update dispatch, translator
- max.go: real Start/Stop lifecycle with bot probe + handler pool

Send is still a stub (Day 3). Media downloads stubbed (Day 4).
Webhook mode (POST /subscriptions) is Day 4.

Refs: https://dev.max.ru/docs-api

Critical findings from live PoC test against platform-api.max.ru:

1. Inner message content is returned under JSON key 'message',
   not 'body' as suggested in some docs sections. Real payload
   structure: {message: {sender, recipient, message: {mid, seq, text}}}.

2. Recipient.user_id and chat_id are BOTH populated for direct
   messages (user_id = bot's id, chat_id = dialog thread id).
   Authoritative discriminator is chat_type ('dialog' vs 'chat').

3. DM chat_id is the dialog thread ID, not the sender's user_id —
   stable per-conversation identifier suitable for goclaw session keys.

Patches:
- types.go: Message.Body json tag 'body' -> 'message'
- types.go: Recipient.IsDialog() now uses ChatType
- inbound.go: DM chatID = recipient.chat_id

Verified against live bot: text 'проверка фикса' ("fix check"), 'второй тест' ("second test"),
and image attachment all parse correctly.
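
The three findings above can be sketched as Go types. This is an illustration rather than the PR's types.go: the field types, the Sender shape, and the sample payload values are assumptions; only the JSON keys and the chat_type discriminator come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Update wraps one inbound event (shape assumed from the payload quoted above).
type Update struct {
	Message Message `json:"message"`
}

// Message carries the envelope; the inner content sits under "message", not "body".
type Message struct {
	Sender    Sender    `json:"sender"`
	Recipient Recipient `json:"recipient"`
	Body      Body      `json:"message"`
}

// Sender is reduced to a single assumed field for the sketch.
type Sender struct {
	UserID int64 `json:"user_id"`
}

// Recipient has both IDs populated in a DM, so chat_type decides the kind.
type Recipient struct {
	ChatID   int64  `json:"chat_id"`
	UserID   int64  `json:"user_id"`
	ChatType string `json:"chat_type"`
}

// IsDialog reports whether the message belongs to a direct conversation.
func (r Recipient) IsDialog() bool { return r.ChatType == "dialog" }

// Body is the inner message content.
type Body struct {
	Mid  string `json:"mid"`
	Seq  int64  `json:"seq"`
	Text string `json:"text"`
}

// parseUpdate decodes one raw JSON update.
func parseUpdate(raw string) (Update, error) {
	var u Update
	err := json.Unmarshal([]byte(raw), &u)
	return u, err
}

// sampleDM is a synthetic payload, not a capture from the live API.
const sampleDM = `{"message":{"sender":{"user_id":111},"recipient":{"chat_id":999,"user_id":42,"chat_type":"dialog"},"message":{"mid":"mid.abc","seq":1,"text":"hello"}}}`

func main() {
	u, err := parseUpdate(sampleDM)
	if err != nil {
		panic(err)
	}
	// The DM session key is the dialog thread ID (recipient.chat_id).
	fmt.Println(u.Message.Body.Text, u.Message.Recipient.IsDialog(), u.Message.Recipient.ChatID)
}
```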

Refs: https://dev.max.ru/docs-api

Adds 39 unit tests covering Day 2 production code:

client_test.go (18 tests):
- GetMe success/auth-error/network-error
- GetUpdates with/without marker, type filtering, rate-limit retry,
  context cancellation
- SendMessage DM/group/validation
- EditMessage, PostAction, AnswerCallback
- SubscribeWebhook, UnsubscribeWebhook

inbound_test.go (21 tests):
- DM text translator with real PoC fixture
- DM with image attachment
- Group with @mention via markup — passes mention gate
- Group without mention — filtered when require_mention=true
- Group without mention — passes when require_mention=false
- Self-loop guard, empty messages, malformed messages
- detectMention: markup-based, textual fallback, case-insensitive
- stripBotMention table-driven
- Recipient.IsDialog discriminator
- chatIDFromUpdate fallbacks
- buildMetadata: basic, edited flag, linked reply
- Marker get/set with copy-safety

Test data includes real captures from PoC live test plus synthetic
fixtures for group/callback scenarios.

Refs: https://dev.max.ru/docs-api

Implements channels.Channel.Send for the Max channel.

outbound.go (new, 134 LOC):
- Send: parses ChatID from bus.OutboundMessage, chunks content,
  sends each chunk via POST /messages with format=markdown
- parseChatID: validates bus string -> int64 conversion
- lastMessageIDFor: lookup last sent message_id per chat (used by
  streaming preview in Day 4)

format.go (new, 107 LOC):
- chunkText: paragraph -> line -> sentence -> word -> codepoint-safe hard cut
- safeUTF8Cut: prevents splitting multi-byte UTF-8 sequences
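
A simplified sketch of the chunking cascade, under assumptions: the real format.go may pick boundaries differently, and the separator list here ("\n\n", "\n", ". ", " ") is only an approximation of paragraph, line, sentence, and word breaks. The function names reuse those in the commit message, but the bodies are illustrative.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// safeUTF8Cut splits s at or before max bytes without cutting a multi-byte rune.
func safeUTF8Cut(s string, max int) (head, tail string) {
	if max < 1 {
		max = 1
	}
	if len(s) <= max {
		return s, ""
	}
	cut := max
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	if cut == 0 {
		// max is smaller than the first rune: emit that rune whole to make progress.
		_, cut = utf8.DecodeRuneInString(s)
	}
	return s[:cut], s[cut:]
}

// chunkText prefers the coarsest boundary that fits, falling back to a hard cut.
func chunkText(s string, max int) []string {
	if max < 1 {
		max = 1
	}
	var out []string
	for len(s) > max {
		head, tail := safeUTF8Cut(s, max)
		for _, sep := range []string{"\n\n", "\n", ". ", " "} {
			if i := strings.LastIndex(head, sep); i > 0 {
				cut := i + len(sep)
				head, tail = s[:cut], s[cut:]
				break
			}
		}
		out = append(out, head)
		s = tail
	}
	if s != "" {
		out = append(out, s)
	}
	return out
}

func main() {
	for _, c := range chunkText("привет мир", 7) {
		fmt.Printf("%q\n", c)
	}
}
```

Separators stay attached to the preceding chunk, so concatenating the chunks reproduces the input byte for byte.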

types.go:
- SendMessageResponse extended with top-level chat_id, recipient_id,
  message_id (matches real API response observed in PoC)

client.go:
- SendMessage / EditMessage now return *SendMessageResponse instead
  of *Message — needed to expose message_id for streaming edits

max.go:
- Channel struct gains placeholders sync.Map and sentCount counter
- Send() wired to outbound.send

Tests: 20 new (11 outbound + 9 format), 3 existing client tests
adjusted for new return type. Total 66 unit tests, all green.

Day 4 will add: media upload/download, streaming preview (PUT /messages),
webhook mode, request_contact HMAC verification.

Refs: https://dev.max.ru/docs-api

Day 4 of Sprint 9: adds production features short of streaming
(StreamingChannel deferred to Day 4.5 due to its complexity).

New files:

  auth.go, auth_test.go (60 + 90 LOC, 8 tests)
  - VerifyContactHash for request_contact button auth
  - HMAC-SHA256(bot_token, vcf_info) per Max docs
  - Constant-time comparison; distinct errors for bad-format vs bad-hash

  webhook.go, webhook_test.go (144 + 222 LOC, 12 tests)
  - WebhookChannel impl: WebhookHandler() returns (path, http.Handler)
  - HTTPS-only enforcement, body size cap (1 MiB), proper status codes
  - Always 200 OK once parsed (Max retries non-2xx — avoid retry storms)

  media_download.go, media_test.go (285 + 413 LOC, 15 tests)
  - HTTP fetch with retry, size cap (25 MiB), Content-Length pre-check
  - Per-attachment failures logged and skipped — partial-success preferred
  - Sanitized filenames; type-prefixed temp files (goclaw_max_image_*.jpg)

  media_upload.go (343 LOC, tests in media_test.go)
  - Two-step Max upload: POST /uploads -> multipart push -> attach token
  - Handles image (photo_ids/photos), video, audio, file
  - Failed uploads logged but don't fail send — text still ships

  reactions.go, reactions_test.go (204 + 238 LOC, 10 tests)
  - ReactionChannel impl mapping goclaw status -> Max actions API
  - Per-chat refresher goroutines (4s interval keeps typing visible)
  - Graceful Stop via stopAllReactionRefreshers()

Updated:

  max.go — Channel struct gains runCtxMu/pollRunCtx/reactionRefreshers
           Start saves pollRunCtx; Stop drains reaction refreshers
           BlockReplyEnabled() method (BlockReplyChannel impl)

  inbound.go — downloadInboundMedia stub removed (real impl in
               media_download.go)

  outbound.go — Send wires uploadAndAttachMedia; media attached only to
                first chunk; sendOneChunk helper for media-only case

Tests: 45 new (8 auth + 12 webhook + 15 media + 10 reactions),
       111 total all green.

Day 4.5 will add StreamingChannel implementation.

Refs: https://dev.max.ru/docs-api

Day 4.5 of Sprint 9: adds StreamingChannel implementation.

Architectural decision: Option 2 (answer-only streaming).
Reasoning lanes (Option 3) deferred to a future PR. Slack and Max share
this minimal pattern; Telegram extends it with per-lane draft transport.

stream.go (312 LOC):
- maxStream struct implementing channels.ChannelStream:
  * Update: throttle (800ms) + dedup + accumulator pattern
  * Stop: final flush, idempotent, no message deletion
  * MessageID: returns 0 (Max uses string mids, handed off via type assertion)
- StreamingChannel methods on *Channel:
  * StreamEnabled: cfg.DMStream/GroupStream with default true/false
  * CreateStream: POST '💭 Печатаю...' ("Typing...") eagerly, returns stream w/ mid
  * FinalizeStream: type-asserts maxStream, stores mid in c.placeholders
  * ReasoningStreamEnabled: false (Option 2)

Lifecycle:
  CreateStream → POST /messages '💭 Печатаю...' → returns stream w/ messageID
  Update(text) → PUT /messages (throttled, plain text)
  Stop → final PUT if pending text remains
  FinalizeStream → c.placeholders.Store(chatID, messageID)
  Send (next call) → consumePlaceholder → PUT /messages with markdown format

Configuration (factory.go):
- dm_stream: default true (modern UX expected for new channel)
- group_stream: default false (Max doesn't yet support bots in groups)

outbound.go updates:
- Send detects pre-stored placeholder via consumePlaceholder
- First chunk routes to editPlaceholder (PUT) instead of SendMessage (POST)
- Subsequent chunks always POST as before
- Media-on-first-chunk path: deletes placeholder, sends fresh (Max API
  doesn't support attachments on EditMessage)
- Edited placeholder mid is NOT re-stored — would cause next Send to
  overwrite the user-visible answer instead of producing a new message
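
The placeholder handoff can be sketched with sync.Map. This is an assumption-laden illustration: the channel type, the method signatures, and the string returned by send are hypothetical; only the Store-then-consume flow and the rule that an edited placeholder is not stored again come from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// channel holds the handoff map: chatID (int64) -> placeholder message ID (string).
type channel struct {
	placeholders sync.Map
}

// finalizeStream records the streamed message so the next send edits it in place.
func (c *channel) finalizeStream(chatID int64, mid string) {
	c.placeholders.Store(chatID, mid)
}

// consumePlaceholder removes and returns the stored ID, so it is used at most once.
func (c *channel) consumePlaceholder(chatID int64) (string, bool) {
	v, ok := c.placeholders.LoadAndDelete(chatID)
	if !ok {
		return "", false
	}
	mid, ok := v.(string)
	return mid, ok
}

// send describes the API call the first chunk would use (no network in this sketch).
func (c *channel) send(chatID int64, text string) string {
	if mid, ok := c.consumePlaceholder(chatID); ok {
		return "PUT /messages mid=" + mid
	}
	return "POST /messages"
}

func main() {
	c := &channel{}
	c.finalizeStream(7, "mid.1")
	fmt.Println(c.send(7, "final answer")) // edits the placeholder
	fmt.Println(c.send(7, "next message")) // nothing stored, so a fresh message
}
```

Because the lookup deletes the entry, a second send in the same chat posts a new message instead of overwriting the answer the user already sees.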

Decisions documented:
- Throttle 800ms: balance UX vs Max 30 rps limit (1.25 rps even at full
  utilization, headroom for parallel runs)
- Plain text during stream, markdown final: avoids partial-token rendering
  glitches mid-stream (e.g. unclosed **bold)
- Edit errors logged at debug, not propagated: stream is best-effort UX;
  the next Update will retry with fresher text
- Eager placeholder (in CreateStream): user sees '💭 Печатаю...' immediately
  on receiving their message — better UX than lazy-on-first-chunk
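
The throttle and dedup decisions above can be sketched as a small state machine. Assumptions: Update receives the full accumulated text, the clock is injected so the example is deterministic, and all names (stream, newStream, fakeClock) are hypothetical rather than the PR's maxStream.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fakeClock lets the example advance time manually instead of sleeping.
type fakeClock struct{ t time.Time }

func newFakeClock() *fakeClock       { return &fakeClock{t: time.Unix(1700000000, 0)} }
func (c *fakeClock) Now() time.Time  { return c.t }
func (c *fakeClock) Advance(ms int)  { c.t = c.t.Add(time.Duration(ms) * time.Millisecond) }

// stream keeps the latest text and forwards it at most once per interval.
type stream struct {
	mu       sync.Mutex
	interval time.Duration
	now      func() time.Time
	edit     func(text string) error // stands in for the throttled PUT /messages edit
	pending  string
	lastSent string
	lastAt   time.Time
}

func newStream(intervalMs int, now func() time.Time, edit func(string) error) *stream {
	return &stream{interval: time.Duration(intervalMs) * time.Millisecond, now: now, edit: edit}
}

// Update stores the newest text and flushes only if the throttle window has passed.
func (s *stream) Update(text string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pending = text
	if !s.lastAt.IsZero() && s.now().Sub(s.lastAt) < s.interval {
		return
	}
	s.flushLocked()
}

// Stop performs the final flush; calling it again is a no-op thanks to dedup.
func (s *stream) Stop() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.flushLocked()
}

// flushLocked sends pending text unless it is empty or identical to the last edit.
func (s *stream) flushLocked() {
	if s.pending == "" || s.pending == s.lastSent {
		return
	}
	if err := s.edit(s.pending); err != nil {
		return // best-effort: the next Update retries with fresher text
	}
	s.lastSent = s.pending
	s.lastAt = s.now()
}

func main() {
	clock := newFakeClock()
	var edits []string
	s := newStream(800, clock.Now, func(t string) error {
		edits = append(edits, t)
		return nil
	})
	s.Update("He")
	clock.Advance(100)
	s.Update("Hello") // inside the 800ms window, held back
	clock.Advance(800)
	s.Update("Hello, world")
	s.Stop()
	fmt.Println(edits)
}
```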

Tests: 22 new (throttle, dedup, lifecycle, accumulation, FinalizeStream
handoff, full E2E with Send placeholder edit). Total 133 unit tests, all green.

Refs: https://dev.max.ru/docs-api

docs/05-channels-messaging.md:
- New '## 12. Max Messenger' section, structured in Slack/WhatsApp style
- Key Behaviors covering both transports (polling + webhook), markdown
  forwarding, streaming, reactions, media, request_contact verification,
  group support status, lifecycle
- API Findings table documenting three live-API discrepancies vs docs
  (body→message, recipient discriminator, DM chat_id semantics)
- Configuration shapes for channel_instances.config and credentials JSONB
- Streaming lifecycle Mermaid diagram showing CreateStream → Update →
  Stop → FinalizeStream → Send placeholder edit handoff

Cross-channel sections (Channel-Isolated Workspaces, Local Key Propagation,
Per-User Isolation, Pairing System) renumbered 12-15 → 13-16 to keep the
channel-specific group (5-12) contiguous.

internal/channels/max/README.md (new):
- File map describing each .go file's purpose
- Architecture decisions: minimal upstream fork (5 LOC), custom HTTP client
  rationale, answer-only streaming choice, placeholder ownership invariant,
  chat_type discriminator
- Live test commands via cmd/max-smoke
- Guide for adding new endpoints
- Known limitations (groups, voice, callbacks, request_contact outbound)

Refs: https://dev.max.ru/docs-api

Addresses 5 of 7 valid issues from a four-role expert review (Architect,
Product Owner, Analyst, Team Lead Go). Two issues (concurrent runs race,
media-upload-failure UX) were rejected after maintainer reverification.

Code fixes:

  webhook.go — independent context for dispatch
  Webhook handler used to dispatch on c.pollRunCtx, sharing the polling
  goroutine's lifecycle. A webhook arriving mid-Stop saw a just-cancelled
  context and dropped the message AFTER we'd already 200-OK'd Max →
  silent message loss during rolling restarts. Fix: per-delivery
  context.Background with 5-minute timeout, dispatched on a fresh
  goroutine. Renamed runContext → pollContext (still used by reaction
  refreshers, which correctly stop on channel Stop).
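
A minimal sketch of the detached dispatch, assuming a handler shape that is not taken from the PR. The 1 MiB body cap and the 5-minute timeout come from the commit messages; webhookHandler, deliverOnce, and the status chosen for oversized bodies are hypothetical, and JSON parsing is omitted.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

const maxWebhookBody = 1 << 20 // 1 MiB cap on the request body

// webhookHandler acknowledges the delivery and dispatches it on a fresh goroutine
// whose context is independent of any polling or channel lifecycle context.
func webhookHandler(dispatch func(ctx context.Context, body []byte)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, maxWebhookBody))
		if err != nil {
			http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
			return
		}
		go func() {
			ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
			defer cancel()
			dispatch(ctx, body)
		}()
		w.WriteHeader(http.StatusOK)
	})
}

// deliverOnce sends one request through the handler and reports the HTTP status,
// whether the dispatch context was still live, and the body the dispatcher saw.
func deliverOnce(payload string) (status int, ctxAlive bool, got string) {
	type result struct {
		alive bool
		body  string
	}
	done := make(chan result, 1)
	h := webhookHandler(func(ctx context.Context, body []byte) {
		done <- result{ctx.Err() == nil, string(body)}
	})
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/hook", strings.NewReader(payload)))
	select {
	case r := <-done:
		return rec.Code, r.alive, r.body
	case <-time.After(2 * time.Second):
		return rec.Code, false, ""
	}
}

func main() {
	fmt.Println(deliverOnce(`{"update_type":"message_created"}`))
}
```

Since the dispatch context derives from context.Background, cancelling the channel's own context during Stop cannot drop a delivery that has already been acknowledged.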

  client.go, media_download.go — DownloadFile helper
  media_download.go reached into c.client.httpClient directly, breaking
  encapsulation. New Client.DownloadFile uses the same transport stack;
  future middleware (otelhttp, retry) will apply uniformly to both API
  calls and asset downloads.

  stream.go — orphan placeholder cleanup
  When a stream's CreateStream succeeds but no Update fires (agent crash
  before first chunk), the '💭 Печатаю...' placeholder lived in chat
  forever. Fix: FinalizeStream now deletes the placeholder if lastSent
  is empty.

  outbound.go, stream.go — panic safety
  defer recover in send() restores the consumed placeholder before
  re-raising — preserves runtime visibility while preventing UI orphans
  if a slog handler or transport panics. defer recover in flushLocked()
  swallows (stream is best-effort UX; the next Update retries).

  media_upload.go — single transient retry
  RequestUploadURL and UploadFile now retry once on transient errors
  (context.DeadlineExceeded, net.OpError, 5xx, EOF, connection reset).
  Permanent errors (4xx, local file errors) are not retried. Trade-off
  documented: a transient retry may produce one orphan upload, which
  Max storage TTLs out within 24-48h.
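
The single-retry policy can be sketched as below. The classification follows the list in the commit message, but the httpStatusError type, isTransient, and retryOnce are hypothetical names, and the real media_upload.go may classify errors differently.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net"
	"syscall"
)

// httpStatusError is a hypothetical error type carrying the response status code.
type httpStatusError struct{ Code int }

func (e *httpStatusError) Error() string {
	return fmt.Sprintf("unexpected HTTP status %d", e.Code)
}

// isTransient reports whether an error is worth one more attempt.
func isTransient(err error) bool {
	var statusErr *httpStatusError
	if errors.As(err, &statusErr) {
		return statusErr.Code >= 500 // 4xx is permanent
	}
	var opErr *net.OpError
	return errors.Is(err, context.DeadlineExceeded) ||
		errors.Is(err, io.EOF) ||
		errors.Is(err, io.ErrUnexpectedEOF) ||
		errors.Is(err, syscall.ECONNRESET) ||
		errors.As(err, &opErr)
}

// retryOnce runs op and repeats it a single time if the first error is transient.
func retryOnce(op func() error) (attempts int, err error) {
	attempts = 1
	if err = op(); err == nil || !isTransient(err) {
		return attempts, err
	}
	attempts++
	return attempts, op()
}

func main() {
	calls := 0
	n, err := retryOnce(func() error {
		calls++
		if calls == 1 {
			return &httpStatusError{Code: 503}
		}
		return nil
	})
	fmt.Println(n, err)
}
```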

Test additions:

  stream_test.go — TestStreaming_ConcurrentRuns_DoNotInterfere
  Documents the known limitation that c.placeholders is keyed only on
  chatID, so the second FinalizeStream in a chat overwrites the first.
  Practically unreachable in DM (debounce + per-session run limits);
  will need a per-RunContext key once Max enables group bots.

  stream_test.go — TestFinalizeStream_DeletesOrphanPlaceholder
  Regression check for the orphan cleanup fix.

  webhook_test.go — TestServeWebhook_DispatchSurvivesStop
  Verifies ServeHTTP returns promptly (dispatch is async). Full
  Start/Stop lifecycle aspect of the fix is exercised in
  integration_test.go.

  integration_test.go (NEW)
  Full pipeline: Channel.Start → mock backend serves /me + /updates →
  bus delivery → Channel.Send → Channel.Stop. The safety net we lacked.

Doc updates:

  internal/channels/max/README.md
  - 'Concurrent runs in same chat' under Known limitations
  - New 'Webhook security' section with required and recommended
    operator controls (hard-to-guess URL, TLS, dm_policy: allowlist,
    ingress allowlist, rate limiting)

  docs/05-channels-messaging.md
  - 'Webhook security' subsection appended to Max Messenger section,
    summarizing the operator guidance.

Tooling:

  apply-day5b.sh runs go test with -race. The Max channel has multiple
  goroutines (polling, reaction refreshers, webhook dispatch); race
  detector coverage is non-negotiable for production.

Rejected after maintainer reverification:

  Concurrent runs placeholder race: documented as known limitation.
  Goclaw's debounce + per-session run limits make this practically
  unreachable in DM. Adding a per-chat mutex would block legitimate
  parallel work without solving the actual problem (which is the
  chatID-only handoff key, not concurrency itself).

  Media upload failure UX (Variant A: append '(не удалось прикрепить
  файл)', i.e. '(failed to attach file)', to text): rejected. Hardcoded Russian, modifies agent output
  without agent's knowledge, may corrupt conversation history. Better
  long-term fix is to surface failure metadata on a follow-up agent
  turn — out of scope for this PR.

Tests: 4 new (orphan delete, concurrent runs limitation, webhook async
verify, full integration). Total 137 unit + 1 integration test, all
green with -race detector enabled.

Refs: https://dev.max.ru/docs-api

isValidChannelType() in internal/http/channel_instances.go had a
hardcoded switch over known channel-type string literals. Although
the factory was registered in cmd/gateway.go:470 and channels.TypeMax
defined in internal/channels/channel.go:77, POST /api/channel-instances
returned HTTP 400 'invalid channel_type' when admin UI tried to
create a Max channel.

Added 'max' to the validator's allow-list, matching the existing
literal-based convention in the function.

Live PoC validation in local goclaw stack revealed that dm_policy and
group_policy settings were silently ignored: the inbound handler called
BaseChannel.HandleMessage directly, skipping the CheckDMPolicy /
CheckGroupPolicy flow that all other channels (whatsapp, telegram,
discord, slack, feishu, zalo) explicitly wire in their handlers.
Anyone could DM the bot regardless of dm_policy.

Day 1 left a misleading comment 'Hand off to BaseChannel — this
enforces allowlist + publishes to bus' which was incorrect:
HandleMessage only filters allowlist as a fallback and never consults
the pairing service. The README also claimed the channel 'rejects
unauthorized senders before invoking the agent' — this was false
until this patch.

Changes:
  - policy.go: checkDMPolicy / checkGroupPolicy / sendPairingReply,
    modeled on whatsapp/policy.go. PolicyNeedsPairing triggers a
    pairing-code reply via Max API; PolicyDeny silently drops.
    Debounce reuses BaseChannel.CanSendPairingNotif so repeated
    messages from one unpaired sender don't burn fresh codes.
  - inbound.go: parses chatID -> int64 and calls checkDMPolicy /
    checkGroupPolicy before HandleMessage. Returns early on deny or
    pairing-needed so the agent loop is never invoked for
    unauthorized senders.
  - policy_test.go: 12 tests covering open / disabled / allowlist /
    pairing for DM, plus group disabled / open / allowlist. Tests
    use the existing mockMaxBackend pattern from outbound_test.go,
    so SendMessage calls hit a real httptest.Server. Verifies that
    paired senders proceed, unpaired senders get a pairing reply,
    allowlisted senders bypass pairing, and repeat sends from one
    unpaired user are debounced.
  - README.md: 'Production policy' section now accurately describes
    what the channel enforces.
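
The gate described in the changes above can be sketched as a decision function that runs before the agent is invoked. Everything here is hypothetical (the gate type, the decision constants, the string policy values, the placeholder reply text); the behaviour mirrors the tested cases: open allows, disabled denies, allowlist bypasses pairing, and unpaired senders get a pairing reply without reaching the agent. Debouncing of repeated pairing replies is left out.

```go
package main

import "fmt"

type decision int

const (
	policyAllow decision = iota
	policyDeny
	policyNeedsPairing
)

// gate is a stand-in for the channel's policy state.
type gate struct {
	dmPolicy  string          // "open", "disabled", "allowlist", or "pairing"
	allowFrom map[string]bool // allowlisted sender IDs
	paired    map[string]bool // senders that completed pairing
}

// checkDM maps the configured policy and the sender to a decision.
func (g gate) checkDM(senderID string) decision {
	switch g.dmPolicy {
	case "open":
		return policyAllow
	case "allowlist":
		if g.allowFrom[senderID] {
			return policyAllow
		}
		return policyDeny
	case "pairing":
		if g.allowFrom[senderID] || g.paired[senderID] {
			return policyAllow
		}
		return policyNeedsPairing
	default: // "disabled" and unknown values fail closed
		return policyDeny
	}
}

// handleInbound returns early on deny or pairing-needed, so the agent never runs
// for unauthorized senders.
func handleInbound(g gate, senderID, text string, agent func(string), reply func(string)) {
	switch g.checkDM(senderID) {
	case policyAllow:
		agent(text)
	case policyNeedsPairing:
		reply("pairing code: PAIRING-CODE") // placeholder text, not the real reply
	}
}

func main() {
	g := gate{dmPolicy: "pairing", paired: map[string]bool{"u1": true}}
	agent := func(t string) { fmt.Println("agent got:", t) }
	reply := func(t string) { fmt.Println("reply:", t) }
	handleInbound(g, "u1", "hi", agent, reply)
	handleInbound(g, "u2", "hi", agent, reply)
}
```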

Group policy is implemented for forward compatibility — Max does not
yet expose adding bots to groups via the public API.

Run gofmt -w on 5 files to bring them in line with the project's
formatting check. No semantic changes: whitespace and line-break
alignment only.

Files:
- internal/channels/max/format_test.go
- internal/channels/max/outbound_test.go
- internal/channels/max/stream.go
- internal/channels/max/stream_test.go
- internal/channels/max/types.go

Bumps the messaging-channels count from 7 to 8 and adds Max to the
three places it's listed:

  - the feature bullet
  - the Lite vs Standard comparison table
  - the documentation-links table

Adds Max to the channel-manager comment list and registers the new
max/ subpackage in the file-tree breakdown. Mirrors the existing
whatsapp/ entry.

Max implements all four optional channel interfaces (StreamingChannel,
WebhookChannel, ReactionChannel, BlockReplyChannel). Updates the
Extended Interfaces table in docs/05-channels-messaging.md so the
list of implementers is complete and matches the code.

Adds Max Messenger to the Unreleased / New Features section. Entry
documents transports (polling + webhook), streaming, reactions,
media, access control, and current validation status (tested
locally end-to-end; webhook mode not yet live-validated).

Replace the Russian "💭 Печатаю..." streaming placeholder with
"💭 Thinking..." to match the English-default convention used by
other channels (Telegram, Discord, Slack). The Russian text was a
local artefact from PoC deployment for a Russian-speaking user
base; in upstream goclaw, English is the appropriate default.

Per-deployment localized placeholders can be addressed in a
follow-up that introduces a configurable string (out of scope for
this PR).

Backend factory was registered in cmd/gateway.go:470, but the React
admin dashboard had hardcoded enums in 5 files that omitted 'max',
making the channel invisible in the create-channel dropdown and
contacts/permissions/bindings UI.

Files:
  - constants/channels.ts: CHANNEL_TYPES (used by setup wizard and
    create form)
  - channel-schemas.ts: credentialsSchema.max (bot_token) and
    configSchema.max mirroring internal/channels/max/types.go
    (mode, polling_timeout, dm_policy, group_policy, dm_stream,
    group_stream, history_limit, media_max_mb, allow_from,
    block_reply)
  - channels-status-utils.ts: channelTypeLabels.max for status
    badges
  - contacts-page.tsx: CHANNEL_TYPES filter and PERM_CHANNELS
    permissions list
  - bindings-section.tsx: <SelectItem value=max> in the channel
    binding selector

group_policy defaults to 'disabled' since Max does not yet expose a
platform API for adding bots to groups (see
internal/channels/max/README.md known limitations).

channels-section.tsx CHANNEL_META was intentionally NOT modified:
that section drives legacy config.json-based gateway-level env-var
secret overrides (GOCLAW_TELEGRAM_TOKEN etc.); Max stores its
bot_token in the channel_instances credentials column like all other
modern channels.
@HumanGoClaude

Author

@nguyennguyenit Hello Plateau, could I ask you to review and approve my PR? As you can see, a lot of files were added or changed, but only a few of them are existing GoClaw source files. Most are new files, clearly separated by functionality. I can prepare a demo of all the functionality.
Thank you

Agamariel

This comment was marked as duplicate.

@HumanGoClaude

Author

Hello @viettranx Viet, we are still waiting for approval from a maintainer of the GoClaw repo. May I ask you to look at my PR? We would like to get it approved and through the workflow.

HumanGoClaude added 6 commits May 29, 2026 11:19

Root cause: isCtxDone() classified http.Client.Timeout error as polling
context cancellation. errors.Is(err, context.DeadlineExceeded) returns
true for HTTP timeouts (which wrap an internal child context) even when
the main polling context is alive. pollLoop returned silently on any
HTTP timeout, leaving the bot unresponsive until pod restart.

Evidence in production logs (2026-05-12 to 2026-05-14):
- Pod alive 42+ hours, all 5 Telegram channels healthy
- Max channel zero events after startup log line
- Bot unresponsive until kubectl rollout restart

Fixes:
1. isCtxDone() now checks ctx.Err() only — HTTP timeouts are transient
   errors retried with backoff
2. defer recover() in pollLoop with stack trace logging
3. runPollSupervisor restarts pollLoop on unexpected exit, rate-limited
   to 5 restarts per 5 minutes; marks channel FAILED past limit
4. lastPollAt atomic heartbeat for health monitoring
5. Logs on every pollLoop exit path — no more silent deaths
6. Panic recovery in handleUpdate and spawnMessageHandler goroutines

Signed-off-by: HumanGoClaude <HumanGoClaude@yandex.ru>
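
The root cause in the polling fix above can be reproduced in a few lines. This is a simulation under assumptions: a child context with a short deadline stands in for the HTTP client timeout, and isCtxDoneBuggy is an approximation of the earlier check, not the original code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// isCtxDoneBuggy approximates the earlier check: it inspects the error, so any
// request-level deadline looks like cancellation of the polling context.
func isCtxDoneBuggy(err error) bool {
	return errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled)
}

// isCtxDone is the fixed form: only the polling context itself decides.
func isCtxDone(ctx context.Context) bool {
	return ctx.Err() != nil
}

// simulateRequestTimeout fails the way a timed-out request does: a child
// context expires while the parent polling context stays alive.
func simulateRequestTimeout(pollCtx context.Context) error {
	reqCtx, cancel := context.WithTimeout(pollCtx, time.Millisecond)
	defer cancel()
	<-reqCtx.Done()
	return fmt.Errorf("GET /updates: %w", reqCtx.Err())
}

// demo returns what each check concludes after one simulated timeout.
func demo() (buggy, fixed bool) {
	pollCtx, stop := context.WithCancel(context.Background())
	defer stop()
	err := simulateRequestTimeout(pollCtx)
	return isCtxDoneBuggy(err), isCtxDone(pollCtx)
}

func main() {
	buggy, fixed := demo()
	fmt.Println("error-based check says stop polling:", buggy)
	fmt.Println("context-based check says stop polling:", fixed)
}
```

With the context-based check, a timed-out request is treated as a transient error and retried with backoff, while a real Stop still ends the loop.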
…connections

Production observed bursts of 5-15 minutes where every GET /updates
request hung for 120s (Client.Timeout) before returning 'while awaiting
headers'. The pollfix that landed earlier kept the polling goroutine
alive across these timeouts, but each user-visible delay was 1-2 minutes
per attempt.

Root cause: the default net/http transport is wrong for long-poll
workloads.

  - No HTTP/2 PING active health checks — dead connections in the
    multiplex pool are only detected when Client.Timeout fires.
  - net.Dialer.KeepAlive default behaviour leaves OS TCP keepalive
    parameters in effect (Linux: 7200s before first probe), which
    never fire during a 30s long-poll, so NAT/firewall flow eviction
    is invisible until the request times out.

Fix has three layers:

1. internal/channels/max/client.go: replace the bare
   http.Client{Timeout: 120s} with newDefaultHTTPClient() that:
     - sets net.Dialer.KeepAlive = 15s (overrides OS default)
     - calls http2.ConfigureTransports and sets
         ReadIdleTimeout = 15s (PING after 15s of no inbound bytes)
         PingTimeout     = 10s (mark conn dead if PING not ACKed)
     - sets ResponseHeaderTimeout = 45s (down from implicit 120s)
     - sets Client.Timeout = 45s (down from 120s)
   Detection of half-broken HTTP/2 connections drops from ~120s to ~25s.

2. internal/channels/max/inbound.go: in handlePollError, after the
   error is classified as a timeout/unknown (transient), call
   c.client.CloseIdleConnections() before the next attempt so the
   retry establishes a fresh TCP/TLS session instead of reusing a
   pooled connection that may be the one that just failed.

3. internal/channels/max/client.go: expose
   (*Client).CloseIdleConnections() as the public seam for layer 2.

Evidence collected before this change:
  - /proc/sys/net/ipv4/tcp_keepalive_time = 7200 in pod
  - Single ESTABLISHED TCP connection to Max API in netstat
  - 'Client.Timeout exceeded while awaiting headers' in error
  - Backoff series 1->2->4->8->16->30s, all on same marker

No public API changes. Existing tests use WithHTTPClient injection
and are unaffected.

Signed-off-by: HumanGoClaude <HumanGoClaude@yandex.ru>
A file sent to the Max bot with no accompanying text reached the agent as
an empty message and was ignored ("you sent an empty message"), while a
file sent WITH a caption worked. Images were unaffected because they are
attached inline via the vision pipeline.

Root cause: unlike the Telegram channel, the Max channel never built the
shared <media:*> content tags and never inlined document text. It only
passed raw file paths to BaseChannel.HandleMessage. With an empty caption
the agent's user message was therefore empty, so the model never learned a
file was attached and never called read_document (agent ran iterations=0).

Fix (Max plugin only, reusing the shared internal/channels/media package
exactly as the Telegram channel does):

  - media_download.go: add downloadInboundMediaInfo() returning
    []media.MediaInfo (path + kind + original filename + MIME).
    downloadInboundMedia() now delegates to it via mediaPathsFromInfos(),
    so its []string contract and all existing tests are unchanged.
  - add maxAttachmentToMediaType() mapping Max attachment types to the
    shared media kinds (image/video/audio/document).
  - add enrichContentWithMedia(): prepends media.BuildMediaTags() output to
    the message content and, for text documents, inlines the file body via
    media.ExtractDocumentContent() (binary files get a read_document hint).
  - inbound.go: handleMessage builds the enriched content before handing off
    to BaseChannel.HandleMessage.

Result: a file with no caption now yields a <media:document name=...> tag
(plus inlined text for .txt/.csv/.json/...), so the agent sees the
attachment and reads it. Text documents work with any provider since their
content is inlined; binary files are surfaced via read_document. Images
keep working (vision inline) and additionally gain a <media:image> tag,
matching the Telegram channel.

No shared code changed; the media package is only imported, as Telegram
already does. Existing Max media tests pass unchanged.

Signed-off-by: HumanGoClaude <HumanGoClaude@yandex.ru>
Builds on c1b2a60. Two architectural changes that complete the file-handling
fix and guarantee each file reaches the correct pipeline:

1. MIME-based classification of "file" attachments

   A Max "file" attachment is a polymorphic container — the user can attach
   any kind of file under it. Previously maxAttachmentToMediaType always
   mapped "file" -> document, so a PNG sent as a file landed in the
   read_document path instead of the vision pipeline. The kind now follows
   the filename MIME via media.MediaKindFromMime(media.DetectMIMEType(...)):
     image/* -> media.TypeImage
     audio/* -> media.TypeAudio
     video/* -> media.TypeVideo
     other   -> media.TypeDocument

2. Direct bus publish with per-file MimeType (parity with Telegram)

   BaseChannel.HandleMessage([]string) drops MimeType when it builds
   bus.MediaFile, so persistMedia classifies every file as kind=document
   (mediaKindFromMime("") falls to the default branch). Result: images
   sent as files never reach the vision pipeline. The Telegram channel
   already worked around this by publishing bus.InboundMessage directly
   with per-file MimeType — Max now does the same.

The full file-handling pipeline after this change (Max channel scope only,
no shared code modified, exactly mirrors Telegram's approach):

  text (.txt/.csv/.json/.md/.py/...) -> inlined into content (any provider)
  image/* sent as file               -> kind=image  -> vision pipeline
  application/pdf, application/vnd.* -> kind=document -> read_document tool
  audio/* sent as file               -> kind=audio  -> STT pipeline
  Max inline image attachment        -> kind=image  -> vision (unchanged)

Tests: all internal/channels/max + internal/channels/media + internal/bus
tests pass. The legacy downloadInboundMedia([]string) shim is preserved
unchanged for the existing test suite; the message flow uses the new
downloadInboundMediaInfo + direct bus publish path. handleCallback still
uses BaseChannel.HandleMessage because callback payloads have no media.

Signed-off-by: HumanGoClaude <HumanGoClaude@yandex.ru>
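
The MIME-to-kind mapping in point 1 amounts to a prefix switch. The sketch below uses a hypothetical kindFromMIME function with plain string kinds in place of the media.Type* constants; it follows the table in the commit message but is not the shared package's code.

```go
package main

import (
	"fmt"
	"strings"
)

// kindFromMIME maps a MIME type to a media kind. An empty or unknown
// MIME type falls to the default branch, which is why dropping MimeType
// classifies every file as a document.
func kindFromMIME(mimeType string) string {
	switch {
	case strings.HasPrefix(mimeType, "image/"):
		return "image"
	case strings.HasPrefix(mimeType, "audio/"):
		return "audio"
	case strings.HasPrefix(mimeType, "video/"):
		return "video"
	default:
		return "document"
	}
}

func main() {
	for _, m := range []string{"image/png", "audio/ogg", "video/mp4", "application/pdf", ""} {
		fmt.Printf("%q -> %s\n", m, kindFromMIME(m))
	}
}
```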
…name

Production test revealed that inline image attachments (a photo sent without
a custom filename) arrived at the agent with kind=document instead of
kind=image. The vision pipeline filters by kind=image MediaRefs, so the
image was never attached to the LLM; the model received only a textual
<media:image> tag with no image data and hallucinated a response (described
an airplane when shown an architectural floor plan).

Root cause: Max attachment payloads for inline media often have an empty
Filename — a user just taps the camera/gallery button, no name is set.
media.DetectMIMEType("") returns "application/octet-stream", for which
mediaKindFromMime falls through to its default branch ("document").

Before this commit's predecessors, the legacy HandleMessage([]string) path
left bus.MediaFile.MimeType empty, and persistMedia's ext-from-path fallback
recovered: guessExtension always supplies a type-based extension (.jpg for
image, .ogg for audio, .mp4 for video, .webp for sticker) so the downloaded
path carries the correct MIME hint. The direct-bus-publish change
accidentally bypassed that fallback by writing octet-stream into MimeType.

Fix: compute mime once from the downloaded path (always has the right
extension via guessExtension) and reuse it for both the shared
media.MediaInfo.Type (tag-builder/agent pipeline kind) and the
bus.MediaFile.MimeType (persistMedia kind detection). "file" attachments
still classify by MIME, so PNG/audio sent as files keep their correct kind.

Single-file change in internal/channels/max/media_download.go.

Signed-off-by: HumanGoClaude <HumanGoClaude@yandex.ru>
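
The difference between detecting MIME from the payload filename and from the downloaded path can be shown with the standard library alone. In this sketch mimeFromName is a hypothetical stand-in for media.DetectMIMEType, and the download path is made up for illustration.

```go
package main

import (
	"fmt"
	"mime"
	"path/filepath"
	"strings"
)

// mimeFromName guesses a MIME type from a file extension and falls back
// to application/octet-stream when there is no usable extension.
func mimeFromName(name string) string {
	if ext := filepath.Ext(name); ext != "" {
		if t := mime.TypeByExtension(ext); t != "" {
			return t
		}
	}
	return "application/octet-stream"
}

func isImage(mimeType string) bool {
	return strings.HasPrefix(mimeType, "image/")
}

func main() {
	payloadFilename := ""                             // inline photo: no filename set
	downloadedPath := "/tmp/max-media/attachment.jpg" // hypothetical path with a type-based extension

	fmt.Println("from payload filename:", mimeFromName(payloadFilename), isImage(mimeFromName(payloadFilename)))
	fmt.Println("from downloaded path: ", mimeFromName(downloadedPath), isImage(mimeFromName(downloadedPath)))
}
```

Computing the value once from the downloaded path and reusing it in both places keeps the tag-builder kind and the persisted kind from diverging.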
Brings in Bitrix24 channel, vault fixes, secure-cli, skills updates,
and other dev advances accumulated since the original Max PR was
opened. Conflicts in cmd/gateway.go between Max and Bitrix24 factory
registrations are resolved additively (both channels coexist).
@HumanGoClaude

Author

Added since the original Max channel implementation

Network resilience (commits 7a1ce9f, d45de7d)

  • Long-poll receives now survive half-broken HTTP/2 connections
  • HTTP/2 PING health checks (15s read-idle, 10s ping timeout), 15s TCP keepalive, pool eviction on transient errors
  • Production validated: 12 days of stable polling on Cloud.ru deployment

File handling — complete pipeline (commits 175b4b1, c1656b6, 88e49f1)

File attachments sent without caption now reach the agent correctly, with
proper routing per MIME type:

  1. 175b4b1 — build shared <media:*> tags + inline text-document content
    via internal/channels/media (parity with Telegram). Fixes bug where a file
    sent without text arrived at the agent as an empty message.

  2. c1656b6 — classify "file" attachments by MIME (not by Max's attachment
    label) and publish bus.InboundMessage directly with per-file MimeType
    (mirrors Telegram). Ensures persistMedia routes each file to the right
    pipeline (vision/STT/read_document).

  3. 88e49f1 — detect MIME from the downloaded path (always has the
    correct extension via guessExtension) instead of a.Payload.Filename
    (often empty for inline media). Ensures inline images get kind=image
    and reach the vision pipeline.

Production validation

  • Text files: ✅ demo_tables.txt analyzed correctly
  • Inline images: ✅ floor plan described with all 7 rooms, areas, totals
  • Architecture verified end-to-end: media: persisted file kind=image
    vision: attached images inline to main provider count=1

Sync with dev

Merge commit d6ec599f brings in:

  • Bitrix24 channel (conflicts in cmd/gateway.go resolved additively —
    both Max and Bitrix24 factories registered)
  • Vault enrichment fixes, secure-cli adapters, skills updates, etc.

… aggregator

Max delivers a logically-single user action (e.g. text + attachment sent
together) as N separate Update events arriving within ~50-500ms of each
other. Each event reached handleMessage independently and triggered its
own agent run, producing two disjoint responses: a stale response based
on text-only context, then a delayed correct response with the file.

This adds a Max-specific inbound aggregator at the channel layer,
mirroring the architecture used by Telegram's album_aggregator. Each
Push delays dispatch by aggregatorWindow (800ms); subsequent Push calls
within the window for the same (chat, sender) extend the silence timer
and append to the buffer. When the timer fires, the buffered messages
are merged into ONE synthesized inbound and dispatched downstream.

Architecture:
  - inbound_aggregator.go: new file with the aggregator (190 lines)
  - max.go: Channel struct field + init in New() + Stop() drain
  - inbound.go: split handleMessage into preconditions+Push and
    dispatchMessage (the original download/build/publish pipeline);
    dispatchMessage is the aggregator's flushFn so coalesced messages
    follow the same path as direct ones
  - inbound_test.go: handleMessage tests disable the aggregator for
    synchronous drain() semantics (no behaviour change in production)
  - inbound_aggregator_test.go: 12 unit tests covering single message,
    coalesce (text+file, 3 rapid), per-sender/chat isolation, DoS caps
    (per-buffer + global), Stop discipline (drain + idempotent),
    post-stop rejection, nil-field rejection, timer-reset semantics

Why channel-specific instead of bus-level:
The shared internal/bus/inbound_debounce.go has a media-bypass shortcut
in the currently deployed version (fix exists upstream in f771cff but
that touches the consumer dedup and web chat surfaces too). Coalescing
one layer earlier — at the Max channel — sidesteps the bypass entirely
without requiring shared-code changes.