gdb/infrun.c: fix handle_step_into_function lane divergence#181
Open
lancesix wants to merge 1 commit into
Open
gdb/infrun.c: fix handle_step_into_function lane divergence#181lancesix wants to merge 1 commit into
lancesix wants to merge 1 commit into
Conversation
Pending compiler changes (current tip of `amd-staging` of the
compiler[1]) shows failures in gdb.rocm/lane-execution.exp:
Running .../gdb/testsuite/gdb.rocm/lane-execution.exp ...
FAIL: gdb.rocm/lane-execution.exp: step: else: step to " if_3_cond " (timeout)
FAIL: gdb.rocm/lane-execution.exp: step: else: step to " if_3_else " (timeout)
FAIL: gdb.rocm/lane-execution.exp: step: else: step to "atomicAdd " (timeout)
In this test, we have:
101 if (gid % 2) /* if_2_cond */
102 elem = foo (const_array[gid]); /* if_2_then */
103 else
104 elem = const_array[gid]; /* if_2_else */
105 /* if_2_end */
106
107 /* This condition is always false. */
108 if (gid == -1) /* if_3_cond */
The program stops at L104 (active lane is 2), set scheduler-locking on
and issues a "step" command. Because the compiler emits the "else"
branch before the "then" branch, the "step" command will single step
through L104, then will change the active lanes and single step through
L102. The expectation is that we single step through L102 until we
reach L108, which will be the first line with lane 2 active we will
execute next.
However, something goes wrong when stepping L102, and the wave will run
to completion. Because we have scheduler-locking on, once the wave has
terminated, no further events will reach GDB, hanging it, causing the
timeout.
When stepping with lane divergence support enabled, GDB tries to single
step until the current lane becomes active again and the PC of the wave
reaches a new line. If we follow this process, here is what happens:
First GDB steps through L104 as expected, and does not change as it is
still on the starting line. We then go to the code changing the EXEC
mask (mapped to L101, but not marked as statement so skipped), and
end-up on L102. When executing L102, lane 2 is inactive, so we continue
single stepping instructions, which includes calling the function foo.
In the function foo, we have:
Dump of assembler code for function _Z3fooi:
.../gdb/testsuite/gdb.rocm/lane-execution.cpp:
68 {
0x00007ffff5f29dc0 <+0>: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
0x00007ffff5f29dc4 <+4>: s_mov_b32 s25, s33
0x00007ffff5f29dc8 <+8>: s_mov_b32 s33, s32
0x00007ffff5f29dcc <+12>: s_xor_saveexec_b32 s16, -1
0x00007ffff5f29dd0 <+16>: buffer_store_dword v3, off, s[0:3], s33 offset:4
0x00007ffff5f29dd8 <+24>: s_mov_b32 exec_lo, s16
[...]
When single stepping the first 5 instructions, because the current lane
is inactive, we just ignore the stop and continue single stepping
directly. However, the 5th instruction (at offset foo+12) activates all
the lanes of the wave for spilling using a full VGPR, so lane 2 is
active when executing the buffer_store_dword instruction. As a
consequence, GDB cannot dismiss this instruction as the others, and
needs to consider it fully.
At this point, in process_event_stop_test, GDB will realize that the
current frame changed compared to when we started the step command.
The process from this point is quite simple, we just stepped inside a
new function, we are almost done. Last thing to do is to skip the
function prologue, and continue execution. This is done by
handle_step_into_function: we set a breakpoint after the prologue (at
foo+104: 0x00007ffff7fa9f28), resume execution and hope to be done.
We resume execution, which executes the buffer_store, then restores the
EXEC mask (disabling lane 2) and execute until the end of the prologue.
Because lane 2 is now inactive, we have to continue stepping, so we
execute all of the foo function. This includes a call to the bar
function which never touches the EXEC mask, so nothing important
happening there.
After we return from bar, we need to execute the epilogue of function
foo which looks like:
0x00007ffff5f29e80 <+192>: s_xor_saveexec_b32 s4, -1
0x00007ffff5f29e84 <+196>: buffer_load_dword v3, off, s[0:3], s33 offset:4
0x00007ffff5f29e8c <+204>: s_mov_b32 exec_lo, s4
0x00007ffff5f29e90 <+208>: s_mov_b32 s33, s25
0x00007ffff5f29e94 <+212>: s_waitcnt vmcnt(0)
0x00007ffff5f29e98 <+216>: s_setpc_b64 s[30:31]
End of assembler dump.
Between offset +192 and +204, we need to restore the state that was
spilled during the prologue, so we do the same thing: activate all the
lanes, do the load and restore lanes. When stopping on the buffer_load
instruction, GDB sees that lane 2 is active again, and gets back to
"normal" processing. We see that we are on a frame different from the
one we had when starting the step, so are about to stop and try to skip
the prologue. However, this goes quite wrong, as shown by the infrun
logs:
[infrun] handle_signal_stop: stop_pc=0x7ffff5f29e84
[infrun] process_event_stop_test: stepped into subroutine
[infrun] insert_step_resume_breakpoint_at_sal_1: inserting step-resume breakpoint at 0x7ffff5f29e34
[infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [108550.1.1] at 0x7ffff5f29e84
We insert the breakpoint at the end of the function prologue and resume
execution. But the PC is currently in the epilogue, not in the
prologue! Because of this, we resume execution without single stepping,
with a breakpoint inserted at an instruction we will not execute, which
allows the wave to run to completion.
I noted earlier that those failures are regressions with a new compiler,
so why didn't we see this issue before? With the previous version of
the compiler, the bar function (called by foo) also had some spill /
restore logic, changing slightly the sequence. Because bar was also
spilling, it has an instruction where lane 2 was active, causing GDB to
consider the stop without the lane divergence bypass.
When evaluating if we frame changed, we executed this condition:
if ((get_stack_frame_id (frame)
!= ecs->event_thread->control.step_stack_frame_id)
&& get_frame_type (frame) != SIGTRAMP_FRAME
&& ((frame_unwind_caller_id (frame)
== ecs->event_thread->control.step_stack_frame_id)
&& ((ecs->event_thread->control.step_stack_frame_id
!= outer_frame_id)
|| !ecs->event_thread->control.in_step_start_function (frame))))
Here, the first two predicates hold (current frame is not the same as
when we started stepping, and it is not a SIGTRAMP frame), but things go
south when checking
frame_unwind_caller_id (frame) == ecs->event_thread->control.step_stack_frame_id)
This one is not true. Here frame is bar, so the caller is foo, and the
step_stack_frame we compare it to is lane_pc_test. As a consequence,
this check fails, which gets us to fallback to another case: stepping
got us out of the original step_range we set, but the current
instruction is not a statement either. So we reset the stepping range
information (set_step_info) and continue single stepping through the
current range. This gets us all the way through bar (let's ignore the
spill-restore bit), then in foo where we also do the spill-restore.
Because the same check is done, and the current frame (foo) is not
called by where we started (bar), we continue stepping, and eventually
finish the operation normally.
So now the question is: how do we fix this? I see 2 approaches for
this. One is to reset the "step range" when we see lane 2 active in foo
the first time. This would avoid us the incorrect detection of a new
frame when doing the spill-restore. However, there is something even
more broken exposed with this: why do we allow resuming the wave to skip
the function prologue if we are not in the function prologue?
Because the resume seems universally wrong, this patch proposes to fix
this first. When doing the resume, check that the thread's PC is not
past the end of the prologue before we use
insert_step_resume_breakpoint_at_sal_1. With this simple adjustment,
the gdb.rocm/lane-execution.exp testcase passes correctly.
I will explore reseting the stepping range in followup work.
Tested on amdgpu GFX1031.
[1] ROCm/llvm-project@a6fa68a
Change-Id: I7f57c51a430e9a87bddc4877ebddf5deec8a0de8
Bug: LCOMPILER-2337
Contributor
|
Function foo is called in an execution path where lane 2 is inactive. The lane becomes active temporarily for spilling/restoring, but that sounds like an implementation detail. Can we not hide these temporary state changes, by noticing that the frame for foo is deeper than the frame in which we started stepping? |
Collaborator
Author
|
Will probably need few discussions for this but:
That second bit would probably be the most efficient approach to address stepping with divergent lanes. I can look into this. |
Contributor
This makes sense to me. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pending compiler changes (current tip of
amd-stagingof the compiler[1]) shows failures in gdb.rocm/lane-execution.exp:In this test, we have:
101 if (gid % 2) /* if_2_cond /
102 elem = foo (const_array[gid]); / if_2_then /
103 else
104 elem = const_array[gid]; / if_2_else /
105 / if_2_end /
106
107 / This condition is always false. /
108 if (gid == -1) / if_3_cond */
The program stops at L104 (active lane is 2), set scheduler-locking on and issues a "step" command. Because the compiler emits the "else" branch before the "then" branch, the "step" command will single step through L104, then will change the active lanes and single step through L102. The expectation is that we single step through L102 until we reach L108, which will be the first line with lane 2 active we will execute next.
However, something goes wrong when stepping L102, and the wave will run to completion. Because we have scheduler-locking on, once the wave has terminated, no further events will reach GDB, hanging it, causing the timeout.
When stepping with lane divergence support enabled, GDB tries to single step until the current lane becomes active again and the PC of the wave reaches a new line. If we follow this process, here is what happens:
First GDB steps through L104 as expected, and does not change as it is still on the starting line. We then go to the code changing the EXEC mask (mapped to L101, but not marked as statement so skipped), and end-up on L102. When executing L102, lane 2 is inactive, so we continue single stepping instructions, which includes calling the function foo.
In the function foo, we have:
When single stepping the first 5 instructions, because the current lane is inactive, we just ignore the stop and continue single stepping directly. However, the 5th instruction (at offset foo+12) activates all the lanes of the wave for spilling using a full VGPR, so lane 2 is active when executing the buffer_store_dword instruction. As a consequence, GDB cannot dismiss this instruction as the others, and needs to consider it fully.
At this point, in process_event_stop_test, GDB will realize that the current frame changed compared to when we started the step command. The process from this point is quite simple, we just stepped inside a new function, we are almost done. Last thing to do is to skip the function prologue, and continue execution. This is done by handle_step_into_function: we set a breakpoint after the prologue (at foo+104: 0x00007ffff7fa9f28), resume execution and hope to be done.
We resume execution, which executes the buffer_store, then restores the EXEC mask (disabling lane 2) and execute until the end of the prologue. Because lane 2 is now inactive, we have to continue stepping, so we execute all of the foo function. This includes a call to the bar function which never touches the EXEC mask, so nothing important happening there.
After we return from bar, we need to execute the epilogue of function foo which looks like:
Between offset +192 and +204, we need to restore the state that was spilled during the prologue, so we do the same thing: activate all the lanes, do the load and restore lanes. When stopping on the buffer_load instruction, GDB sees that lane 2 is active again, and gets back to "normal" processing. We see that we are on a frame different from the one we had when starting the step, so are about to stop and try to skip the prologue. However, this goes quite wrong, as shown by the infrun logs:
We insert the breakpoint at the end of the function prologue and resume execution. But the PC is currently in the epilogue, not in the prologue! Because of this, we resume execution without single stepping, with a breakpoint inserted at an instruction we will not execute, which allows the wave to run to completion.
I noted earlier that those failures are regressions with a new compiler, so why didn't we see this issue before? With the previous version of the compiler, the bar function (called by foo) also had some spill / restore logic, changing slightly the sequence. Because bar was also spilling, it has an instruction where lane 2 was active, causing GDB to consider the stop without the lane divergence bypass.
When evaluating if we frame changed, we executed this condition:
Here, the first two predicates hold (current frame is not the same as when we started stepping, and it is not a SIGTRAMP frame), but things go south when checking
This one is not true. Here frame is bar, so the caller is foo, and the step_stack_frame we compare it to is lane_pc_test. As a consequence, this check fails, which gets us to fallback to another case: stepping got us out of the original step_range we set, but the current instruction is not a statement either. So we reset the stepping range information (set_step_info) and continue single stepping through the current range. This gets us all the way through bar (let's ignore the spill-restore bit), then in foo where we also do the spill-restore. Because the same check is done, and the current frame (foo) is not called by where we started (bar), we continue stepping, and eventually finish the operation normally.
So now the question is: how do we fix this? I see 2 approaches for this. One is to reset the "step range" when we see lane 2 active in foo the first time. This would avoid us the incorrect detection of a new frame when doing the spill-restore. However, there is something even more broken exposed with this: why do we allow resuming the wave to skip the function prologue if we are not in the function prologue?
Because the resume seems universally wrong, this patch proposes to fix this first. When doing the resume, check that the thread's PC is not past the end of the prologue before we use
insert_step_resume_breakpoint_at_sal_1. With this simple adjustment, the gdb.rocm/lane-execution.exp testcase passes correctly.
I will explore reseting the stepping range in followup work.
Tested on amdgpu GFX1031.
[1] ROCm/llvm-project@a6fa68a
Change-Id: I7f57c51a430e9a87bddc4877ebddf5deec8a0de8
Bug: LCOMPILER-2337