gdb/target: revert target_async change by simark · Pull Request #176 · ROCm/ROCgdb

simark · 2026-06-17T20:44:00Z

While working a multi-inferior fixes series [1], I tried to apply said series on the master branch, but the test hits an internal error when trying to start a second inferior:

/home/smarchi/src/binutils-gdb/gdb/amd-dbgapi-target.c:466: internal-error: async_event_handler_clear: Assertion `amd_dbgapi_async_event_handler != nullptr' failed.

This made me realize that we have this diff between master and downstream, in the target_async function (- is master, + is amd-staging), which would be needed to avoid the internal error.

@@ -4329,7 +4424,17 @@ target_async (bool enable)
      async mode is possible for this target.  */
   gdb_assert (!enable || target_can_async_p ());
   infrun_async (enable);
-  current_inferior ()->top_target ()->async (enable); +
+  process_stratum_target *proc_target = current_inferior ()->process_target ();
+  scoped_restore_current_thread restore_thread; +
+  for (inferior *inf : all_inferiors (proc_target))
+    {
+      if (current_inferior () != inf)
+       switch_to_inferior_no_thread (inf);
+
+      inf->top_target ()->async (enable);
+    }
 }

This change was introduced in the downstream-only commit ff546ec ("gdb/rocm: fix vfork handling"), but now I am not sure it makes sense.

It was meant to handle the following situation. Imagine we have two inferiors, currently stopped, with the following targets pushed:

inferior 1, rocm target + linux nat target
inferior 2, linux nat target

or, visually:

    inf 1         inf 2
+------------+
| amd-dbgapi |
+--------------------------+
|        linux-nat         |
+--------------------------+

When you resume only inf 2, then GDB calls target_async(true) on inf 2, which has the effect of making the linux-nat target async-enabled. The amd-dbgapi target remains async-disabled.

Once an event happens, do_target_wait randomly calls wait on all inferiors, and it happens that it calls wait on inf 1. This call will reach amd_dbgapi_target::wait, more particularly this code:

/* Flush the async handler first.  */                                                                                                                                                                │
if (target_is_async_p ())                                                                                                                                                                            │
  async_event_handler_clear ();

The amd-dbgapi target does not implement target_ops::is_async_p() (which is probably a mistake), so the target_is_async_p() call hits linux-nat's implementation (which is in fact inf_ptrace_target::is_async_p()). That returns true, so we call async_event_handler_clear(). That function checks:

gdb_assert (amd_dbgapi_async_event_handler != nullptr);

which is a synonym for "I am currently async-enabled". And that is false, because the amd-dbgapi remained async-disabled, as mentioned above.

The target_async change we have downstream makes it so that when calling target_async(true) with inf 2 as the current inferior, then we'll look at all the inferiors sharing its process target (linux-nat) and make them async-enabled as well. This will avoid the assert, because it will make the amd-dbgapi async-enabled. But that doesn't seem like a good fix to today me, because inferior 2 is not running, there is no good reason to make the amd-dbgapi target async-enabled. That fix forced the amd-dbgapi target to be async-enabled before calling wait on it, but that's not a good reason. There is nothing wrong with calling wait on a target that isn't currently async-enabled.

This patch restores target_async to what it is upstream (which alone would introduce some test failures) and then tries to fix the problem in a more focused way.

I think the most obvious problem is that the target_is_async_p() call returns true while the amd-dbgapi target is not currently async-enabled, which leads us on a wrong path.

As a start, this patch changes the conditions that gate the calls to async_event_handler_mark and async_event_handler_clear with (abstracted in a helper function):

amd_dbgapi_async_event_handler != nullptr

instead of

target_is_async_p ()

I think this reflects what we want to know at this very moment: is the amd-dbgapi target, in isolation, async-enabled.

Further questions:

Should we give amd_dbgapi_target its own target_ops::is_async_p implementation, and if so, what should it be? If the amd-dbgapi target is async-disabled and the underlying target is async-enabled, what should it return?
Is there maybe a reason why we don't want to be in a state where a target in a stack is async-enabled and another is async-disabled?
Looking at the uses of target_is_async_p(), I saw one in gdb_readline_wrapper_cleanup that could be problematic with target stacks with async half enabled / half disabled. gdb_readline_wrapper_cleanup records the result of target_is_async_p() on entry, then calls target_async(false), then (if async was enabled) restores it on exit with target_async(true). If we were in a state where amd-dbgapi is async-disabled and linux-nat is async-enabled, I could see how the "restored" state might not match the original state.

I don't know why gdb_readline_wrapper_cleanup does that, so I can't judge what impact it would have to either fail to disable async here, or if the restored state differs from the original state.

[1] #106

While working a multi-inferior fixes series [1], I tried to apply said series on the master branch, but the test hits an internal error when trying to start a second inferior: /home/smarchi/src/binutils-gdb/gdb/amd-dbgapi-target.c:466: internal-error: async_event_handler_clear: Assertion `amd_dbgapi_async_event_handler != nullptr' failed. This made me realize that we have this diff between master and downstream, in the `target_async` function (- is master, + is amd-staging), which would be needed to avoid the internal error. @@ -4329,7 +4424,17 @@ target_async (bool enable) async mode is possible for this target. */ gdb_assert (!enable || target_can_async_p ()); infrun_async (enable); - current_inferior ()->top_target ()->async (enable); + + process_stratum_target *proc_target = current_inferior ()->process_target (); + scoped_restore_current_thread restore_thread; + + for (inferior *inf : all_inferiors (proc_target)) + { + if (current_inferior () != inf) + switch_to_inferior_no_thread (inf); + + inf->top_target ()->async (enable); + } } This change was introduced in the downstream-only commit ff546ec ("gdb/rocm: fix vfork handling"), but now I am not sure it makes sense. It was meant to handle the following situation. Imagine we have two inferiors, currently stopped, with the following targets pushed: - inferior 1, rocm target + linux nat target - inferior 2, linux nat target or, visually: inf 1 inf 2 +------------+ | amd-dbgapi | +--------------------------+ | linux-nat | +--------------------------+ When you resume only inf 2, then GDB calls `target_async(true)` on inf 2, which has the effect of making the linux-nat target async-enabled. The amd-dbgapi target remains async-disabled. Once an event happens, do_target_wait randomly calls wait on all inferiors, and it happens that it calls wait on inf 1. This call will reach amd_dbgapi_target::wait, more particularly this code: /* Flush the async handler first. */ │ if (target_is_async_p ()) │ async_event_handler_clear (); The amd-dbgapi target does not implement `target_ops::is_async_p()` (which is probably a mistake), so the `target_is_async_p()` call hits linux-nat's implementation (which is in fact `inf_ptrace_target::is_async_p()`). That returns true, so we call `async_event_handler_clear()`. That function checks: gdb_assert (amd_dbgapi_async_event_handler != nullptr); which is a synonym for "I am currently async-enabled". And that is false, because the amd-dbgapi remained async-disabled, as mentioned above. The `target_async` change we have downstream makes it so that when calling `target_async(true)` with inf 2 as the current inferior, then we'll look at all the inferiors sharing its process target (linux-nat) and make them async-enabled as well. This will avoid the assert, because it will make the amd-dbgapi async-enabled. But that doesn't seem like a good fix to today me, because inferior 2 is not running, there is no good reason to make the amd-dbgapi target async-enabled. That fix forced the amd-dbgapi target to be async-enabled before calling wait on it, but that's not a good reason. There is nothing wrong with calling wait on a target that isn't currently async-enabled. This patch restores `target_async` to what it is upstream (which alone would introduce some test failures) and then tries to fix the problem in a more focused way. I think the most obvious problem is that the `target_is_async_p()` call returns true while the amd-dbgapi target is not currently async-enabled, which leads us on a wrong path. As a start, this patch changes the conditions that gate the calls to `async_event_handler_mark` and `async_event_handler_clear` with (abstracted in a helper function): amd_dbgapi_async_event_handler != nullptr instead of target_is_async_p () I think this reflects what we want to know at this very moment: is the amd-dbgapi target, in isolation, async-enabled. Further questions: - Should we give amd_dbgapi_target its own target_ops::is_async_p implementation, and if so, what should it be? If the amd-dbgapi target is async-disabled and the underlying target is async-enabled, what should it return? - Is there maybe a reason why we don't want to be in a state where a target in a stack is async-enabled and another is async-disabled? - Looking at the uses of `target_is_async_p()`, I saw one in gdb_readline_wrapper_cleanup that could be problematic with target stacks with async half enabled / half disabled. gdb_readline_wrapper_cleanup records the result of `target_is_async_p()` on entry, then calls `target_async(false)`, then (if async was enabled) restores it on exit with `target_async(true)`. If we were in a state where amd-dbgapi is async-disabled and linux-nat is async-enabled, I could see how the "restored" state might not match the original state. I don't know why gdb_readline_wrapper_cleanup does that, so I can't judge what impact it would have to either fail to disable async here, or if the restored state differs from the original state. [1] ROCm#106

simark · 2026-06-17T20:46:44Z

This PR is a way to discuss the downstream change we have in function target_async. The outcome will be either:

this downstream change is good and we should send it upstream, but then we need a good justification to say why it's needed and why it's correct
this downstream change doesn't make sense and we should seek an alternative (this patch provides one)

lancesix · 2026-06-17T20:54:08Z

Hi, I have not really looked much in the details, but I expect there will be (minor) conflicts with #175. Might need to watch for this.

simark · 2026-06-17T22:17:18Z

Hi, I have not really looked much in the details, but I expect there will be (minor) conflicts with #175. Might need to watch for this.

It rebased cleanly on that PR, built fine and ran a smoke test fine.

lancesix · 2026-06-17T22:23:11Z

Hi, I have not really looked much in the details, but I expect there will be (minor) conflicts with #175. Might need to watch for this.

It rebased cleanly on that PR, built fine and ran a smoke test fine.

Indeed, my bad.

simark requested review from lancesix and palves June 17, 2026 20:44

simark requested a review from a team as a code owner June 17, 2026 20:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gdb/target: revert target_async change#176

gdb/target: revert target_async change#176
simark wants to merge 1 commit into
ROCm:amd-stagingfrom
simark:revert-target-async

simark commented Jun 17, 2026

Uh oh!

simark commented Jun 17, 2026

Uh oh!

lancesix commented Jun 17, 2026

Uh oh!

simark commented Jun 17, 2026

Uh oh!

lancesix commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

simark commented Jun 17, 2026

Uh oh!

simark commented Jun 17, 2026

Uh oh!

lancesix commented Jun 17, 2026

Uh oh!

simark commented Jun 17, 2026

Uh oh!

lancesix commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants