
Dev Console DLQ: badge shows N but 'No failed messages' panel renders empty when topic-based queue is also in queue_configs #1653

Description

@racso-dev

Summary

On the dev console (v0.11.6, builtin queue adapter), a queue that is both subscribed via durable:subscriber (topic-based) and listed in queue_configs (purely to set max_retries / backoff_ms) shows the correct DLQ depth in the queues list badge, but the "Dead Letters" tab renders empty — blocking inspect / redrive / discard.

Bonus: clicking Redrive All silently returns { redriven: 0 }, with no warning to the operator that the redrive missed every message actually sitting in the DLQ.

In practice this turns the console DLQ surface into a no-op for any queue that uses queue_configs only as a "retry policy" record — currently the standard pattern for everyone migrating off Motia / RabbitMQ retry conventions.

Reproduction

  1. iii-config.yaml:
    - name: iii-queue
      config:
        adapter: { name: builtin }
        queue_configs:
          my.topic:
            max_retries: 3
            backoff_ms: 10000
  2. Register a topic-based subscriber on the same name:
    iii.registerTrigger({
      type: 'durable:subscriber',
      function_id: 'my.handler',
      config: { topic: 'my.topic' },
    })
  3. Let some jobs exhaust their retries → real failures end up in the DLQ.
  4. Open the console queues page:
    • Two homonymous my.topic rows are listed.
    • Both show the ⚠ N badge with the correct count (e.g. 69).
  5. Click the row → Dead Letters tab → No failed messages.
  6. Click Redrive All → API returns { queue: "my.topic", redriven: 0 }.

Screenshot

[Screenshot of the console queues page, image not reproduced]

Root cause

Asymmetric routing in engine/src/workers/queue/queue.rs:

  • Count path (works) — console_dlq_topics iterates over list_topics() and calls adapter.dlq_count(&topic.name) with the bare topic name. The builtin adapter then:

    // engine/src/workers/queue/adapters/builtin/adapter.rs
    async fn dlq_count(&self, topic: &str) -> anyhow::Result<u64> {
        let tf = self.topic_functions.read().await;
        if let Some(fids) = tf.get(topic) {                // ← "my.topic" hits here
            let mut total = 0u64;
            for fid in fids {
                let iq = format!("{}::{}", topic, fid);    // ← real DLQ key
                total += self.queue.dlq_count(&iq).await;
            }
            Ok(total)                                      // ← 69 ✓
        } else {
            Ok(self.queue.dlq_count(topic).await)
        }
    }
  • Messages path (broken) — console_dlq_messages calls resolve_queue_key(topic) first:

    fn resolve_queue_key(&self, name: &str) -> String {
        if self._config.queue_configs.contains_key(name) {
            format!("__fn_queue::{}", name)                // ← unconditional prefix
        } else {
            name.to_string()
        }
    }

    console_dlq_messages then calls adapter.dlq_peek("__fn_queue::my.topic", …). The builtin adapter's topic_functions map is keyed on the bare topic, so the lookup misses and falls through to self.queue.dlq_peek("__fn_queue::my.topic", …). That reads an entirely different (empty) key namespace instead of the my.topic::<function_id> keys that actually hold the failures, so the result is [].

The same resolve_queue_key call sits on the redrive, discard, and single-message redrive paths, so every DLQ action silently no-ops on these queues (hence { redriven: 0 }).
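To make the asymmetry concrete, here is a minimal synchronous model of the two paths. It is a sketch, not engine code: `Model`, `internal_keys` and `repro_model` are hypothetical names, the real adapter methods are async and return `anyhow::Result`, and a plain `HashMap` stands in for `self.queue`. Only the key shapes (`__fn_queue::<name>` and `<topic>::<function_id>`) come from the code quoted above.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, synchronous model of the builtin adapter's DLQ routing plus
/// the console's resolve_queue_key. Not engine code.
struct Model {
    /// Names listed in queue_configs (used only for retry policy here).
    queue_configs: HashSet<String>,
    /// Bare topic -> subscribed function ids, like topic_functions.
    topic_functions: HashMap<String, Vec<String>>,
    /// Internal queue key -> dead-lettered payloads (stands in for self.queue).
    dlq: HashMap<String, Vec<String>>,
}

impl Model {
    /// Mirrors resolve_queue_key: any queue_configs name gets the prefix.
    fn resolve_queue_key(&self, name: &str) -> String {
        if self.queue_configs.contains(name) {
            format!("__fn_queue::{}", name)
        } else {
            name.to_string()
        }
    }

    /// Keys an adapter call reads: a known topic fans out to
    /// "<topic>::<function_id>", anything else is used verbatim.
    fn internal_keys(&self, name: &str) -> Vec<String> {
        match self.topic_functions.get(name) {
            Some(fids) => fids.iter().map(|fid| format!("{}::{}", name, fid)).collect(),
            None => vec![name.to_string()],
        }
    }

    fn dlq_count(&self, name: &str) -> usize {
        self.internal_keys(name)
            .iter()
            .map(|k| self.dlq.get(k).map_or(0, Vec::len))
            .sum()
    }

    fn dlq_peek(&self, name: &str, limit: usize) -> Vec<String> {
        self.internal_keys(name)
            .iter()
            .flat_map(|k| self.dlq.get(k).cloned().unwrap_or_default())
            .take(limit)
            .collect()
    }

    /// Removes every DLQ entry under the resolved keys and reports how many
    /// were moved (re-enqueueing is elided in this model).
    fn redrive_dlq(&mut self, name: &str) -> usize {
        let keys = self.internal_keys(name);
        keys.iter()
            .map(|k| self.dlq.remove(k).map_or(0, |msgs| msgs.len()))
            .sum()
    }
}

/// The reproduction: "my.topic" is both a queue_configs entry and a topic
/// with one subscriber, and its failures live under "my.topic::my.handler".
fn repro_model() -> Model {
    let mut model = Model {
        queue_configs: HashSet::new(),
        topic_functions: HashMap::new(),
        dlq: HashMap::new(),
    };
    model.queue_configs.insert("my.topic".to_string());
    model
        .topic_functions
        .insert("my.topic".to_string(), vec!["my.handler".to_string()]);
    model.dlq.insert(
        "my.topic::my.handler".to_string(),
        vec!["a".to_string(), "b".to_string(), "c".to_string()],
    );
    model
}
```

In this model, `dlq_count("my.topic")` sees the three messages under `my.topic::my.handler`, while peeking or redriving with `resolve_queue_key("my.topic")` only touches `__fn_queue::my.topic` and finds nothing: the console's badge-versus-empty-tab split in miniature.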

Why two homonymous rows

console_list_topics strips the __fn_queue:: prefix after merging list_topics() with queue_configs, which produces two visually identical entries (one from trigger_function_map, one from the poll_intervals / config merge). The frontend's dlqCountMap is keyed on topic.name, which is no longer unique, so both rows display the same badge. This is a separate but related issue: it is the visible symptom of the underlying routing ambiguity.
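The duplicate-row mechanism can be reproduced in a few lines. This is a hypothetical model: `ConsoleRow`, `merged_rows` and the two `distinct_*` helpers are illustrative names, and it assumes the config-derived entry carries the `__fn_queue::` prefix until display. After stripping, two rows share a name, so any map keyed on the name collapses them while their internal keys stay distinct.

```rust
use std::collections::HashSet;

const FN_QUEUE_PREFIX: &str = "__fn_queue::";

/// Hypothetical console row: `name` is what the UI displays, `queue_key` is
/// the internal key the row came from before the prefix was stripped.
#[derive(Debug, Clone)]
struct ConsoleRow {
    name: String,
    queue_key: String,
}

/// Merge bare topic names with prefixed config keys, then strip the prefix
/// for display, as the issue describes console_list_topics doing.
fn merged_rows(topic_names: &[&str], config_keys: &[&str]) -> Vec<ConsoleRow> {
    topic_names
        .iter()
        .chain(config_keys.iter())
        .map(|&key| ConsoleRow {
            name: key.strip_prefix(FN_QUEUE_PREFIX).unwrap_or(key).to_string(),
            queue_key: key.to_string(),
        })
        .collect()
}

/// Distinct entries a map keyed on the display name would hold
/// (the frontend's dlqCountMap is keyed on topic.name).
fn distinct_by_name(rows: &[ConsoleRow]) -> usize {
    rows.iter().map(|r| r.name.as_str()).collect::<HashSet<_>>().len()
}

/// The same count, keyed on the internal queue key instead.
fn distinct_by_queue_key(rows: &[ConsoleRow]) -> usize {
    rows.iter().map(|r| r.queue_key.as_str()).collect::<HashSet<_>>().len()
}
```

For `merged_rows(&["my.topic"], &["__fn_queue::my.topic"])`, both rows display as `my.topic`, so a name-keyed map holds one entry for two rows, while a key-keyed map keeps them apart.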

Suggested fix

Backend, most localized:

  • (A) Drop the resolve_queue_key call from console_dlq_messages / redrive_dlq / redrive_message / discard_message. The builtin adapter's topic_functions lookup already routes correctly; mirroring console_dlq_topics (which doesn't pre-resolve) restores symmetry and fixes the count↔messages divergence.
  • (B) Have console_dlq_topics return a stable queue_key (the internal key it just read from) alongside topic, and accept that queue_key on the messages / redrive / discard endpoints. Removes ambiguity by construction.
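A sketch of options (A) and (B) together, assuming a simplified synchronous adapter trait (the real methods are async and return `anyhow::Result`). `DlqAdapter`, `MemAdapter`, `DlqTopicEntry` and the exact `queue_key` field layout are hypothetical. The point is the invariant: whatever key the count was read with is the key the messages endpoint reads with, so the badge and the Dead Letters tab cannot disagree.

```rust
use std::collections::HashMap;

/// Hypothetical synchronous subset of the adapter's DLQ surface.
trait DlqAdapter {
    fn dlq_count(&self, key: &str) -> u64;
    fn dlq_peek(&self, key: &str, limit: usize) -> Vec<String>;
}

/// In-memory adapter that fans a bare topic out to "<topic>::<function_id>"
/// keys, the way the issue describes the builtin adapter's dlq_count.
struct MemAdapter {
    topic_functions: HashMap<String, Vec<String>>,
    dlq: HashMap<String, Vec<String>>,
}

impl MemAdapter {
    fn keys_for(&self, key: &str) -> Vec<String> {
        match self.topic_functions.get(key) {
            Some(fids) => fids.iter().map(|f| format!("{}::{}", key, f)).collect(),
            None => vec![key.to_string()],
        }
    }
}

impl DlqAdapter for MemAdapter {
    fn dlq_count(&self, key: &str) -> u64 {
        self.keys_for(key)
            .iter()
            .map(|k| self.dlq.get(k).map_or(0, |v| v.len() as u64))
            .sum()
    }

    fn dlq_peek(&self, key: &str, limit: usize) -> Vec<String> {
        self.keys_for(key)
            .iter()
            .flat_map(|k| self.dlq.get(k).cloned().unwrap_or_default())
            .take(limit)
            .collect()
    }
}

/// Option (B): each row carries the exact key its count was read from.
#[derive(Debug)]
struct DlqTopicEntry {
    topic: String,
    queue_key: String,
    count: u64,
}

fn console_dlq_topics(adapter: &impl DlqAdapter, topics: &[&str]) -> Vec<DlqTopicEntry> {
    topics
        .iter()
        .map(|&t| {
            // The count path passes the bare name; record that same key.
            let queue_key = t.to_string();
            DlqTopicEntry {
                topic: t.to_string(),
                count: adapter.dlq_count(&queue_key),
                queue_key,
            }
        })
        .collect()
}

/// Options (A) + (B): forward the key verbatim, with no resolve_queue_key.
fn console_dlq_messages(adapter: &impl DlqAdapter, queue_key: &str, limit: usize) -> Vec<String> {
    adapter.dlq_peek(queue_key, limit)
}
```

Redrive and discard endpoints would take the same `queue_key` and forward it unchanged, which is what removes the need for `resolve_queue_key` on those paths.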

Frontend, cosmetic but removes the user-visible confusion:

  • Expose broker_type on each row and/or deduplicate when a topic exists both as topic-based and as a config-only entry.
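The frontend suggestion can be sketched the same way. It is written in Rust only for consistency with the engine code above; the console frontend would do this in its own stack. `QueueRow`, `BrokerType` (with `Topic` / `ConfigOnly` variants) and `dedupe_rows` are hypothetical names; only `broker_type` comes from the suggestion itself. Keeping the topic-based row when a config-only twin exists is one reasonable policy, not a decided design.

```rust
use std::collections::HashMap;

/// Hypothetical row origin, exposed to the UI as broker_type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BrokerType {
    /// Backed by a durable:subscriber topic.
    Topic,
    /// Present only because the name appears in queue_configs.
    ConfigOnly,
}

#[derive(Debug, Clone)]
struct QueueRow {
    name: String,
    broker_type: BrokerType,
    dlq_count: u64,
}

/// Collapse rows that share a display name, preferring the topic-based row
/// over a config-only twin. First-seen order is preserved.
fn dedupe_rows(rows: Vec<QueueRow>) -> Vec<QueueRow> {
    let mut out: Vec<QueueRow> = Vec::new();
    let mut index_by_name: HashMap<String, usize> = HashMap::new();
    for row in rows {
        match index_by_name.get(&row.name).copied() {
            Some(i) => {
                // Replace a config-only row if its topic-based twin shows up.
                if out[i].broker_type == BrokerType::ConfigOnly
                    && row.broker_type == BrokerType::Topic
                {
                    out[i] = row;
                }
            }
            None => {
                index_by_name.insert(row.name.clone(), out.len());
                out.push(row);
            }
        }
    }
    out
}
```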

Versions

  • iii engine 0.11.6
  • iii-console v0.11.6
  • Queue adapter: builtin

Affected real-world example

apps/workflows in Tendiz declares ~20 topic-based queues in queue_configs for retry / backoff policy only. Every one of them is unreachable from the console DLQ view despite carrying real failed messages, e.g. extraction.agentic-triage with 69 messages in the screenshot above. This is the standard pattern for anyone who wants per-queue retry tuning today, since there is no documented way to attach max_retries / backoff_ms to a durable:subscriber trigger directly.

If a better surface for that exists, please point me to it and the engine docs can be updated — the bug remains either way (the console should not silently mismatch its own counts).

Metadata

Labels: bug (Something isn't working)
