GDB loops forever until it crashes when it runs out of memory
Raphael Zulliger
2014-08-13 05:11:39 UTC
Hi

I had to debug an embedded target (ARMv7, extended-remote) running an
RTOS with roughly 30 threads. Due to a programming error, the stack of
one of the threads was completely screwed up, and a 'bt' on that thread
terminated GDB after a few seconds with the following error:

---
Recursive internal problem.

This application has requested the Runtime to terminate it in an unusual
way.
Please contact the application's support team for more information.
---

That crash occurred because GDB looped forever in 'int value_fetch_lazy
(struct value *val)', within 'while (VALUE_LVAL (new_val) == lval_register
&& value_lazy (new_val))', until it ran out of memory, because one of
the called functions allocated heap memory on every call.

The issue was that, due to the screwed up thread stack, 'new_val =
get_frame_register_value (frame, regnum);' happened to return the same
data on every call (new_val was a different allocation each time, but
contained the same data as the one returned before). As you know, if you
have to debug a crashed system, you're happy to see just anything rather
than a crashing debugger. The point is that with a graphical debugger
such as Eclipse/CDT, you can't really avoid the 'bt' on that thread, so
the only thing you notice is that GDB crashes without giving you a
chance to investigate anything.
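
To make the failure mode concrete, here is a small stand-alone C model of
the loop described above. It is only a sketch and not GDB code: struct
toy_value, toy_get_frame_register_value and toy_follow_chain are
hypothetical names, and the max_steps cap exists only so that the model
terminates. The report says heap use grew on every call; the model frees
each value and merely counts the allocations.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for GDB's struct value, reduced to what the
   loop condition looks at. */
struct toy_value {
    bool is_register;   /* models VALUE_LVAL (v) == lval_register */
    bool lazy;          /* models value_lazy (v) */
    int frame_level;    /* frame the register should be read from */
    int regnum;         /* register number */
};

/* Models the lookup on a corrupt stack: every call returns a fresh
   allocation that again says "lazy register regnum in this frame". */
static struct toy_value *toy_get_frame_register_value(int frame_level,
                                                      int regnum)
{
    struct toy_value *v = malloc(sizeof *v);
    if (v == NULL)
        abort();
    v->is_register = true;
    v->lazy = true;
    v->frame_level = frame_level;
    v->regnum = regnum;
    return v;
}

/* Follows the chain of lazy register values for at most max_steps
   allocations and returns how many allocations were made. Reaching
   max_steps means the chain never resolved to a concrete value. */
static int toy_follow_chain(int frame_level, int regnum, int max_steps)
{
    int allocations = 1;
    struct toy_value *new_val =
        toy_get_frame_register_value(frame_level, regnum);

    while (new_val->is_register && new_val->lazy
           && allocations < max_steps) {
        int next_frame = new_val->frame_level;
        int next_regnum = new_val->regnum;

        free(new_val);  /* the model only counts, it does not leak */
        new_val = toy_get_frame_register_value(next_frame, next_regnum);
        allocations++;
    }
    free(new_val);
    return allocations;
}
```

Calling toy_follow_chain (9, 14, 1000) uses up all 1000 allowed
allocations, because the loop condition never changes. Without the cap
the loop would not terminate.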

I don't know about the philosophy of GDB, whether it is supposed to
handle such a situation. However, for me, the following additional code
avoided the GDB crash, which gave me a chance to inspect the rest
of the system with Eclipse/CDT:

    new_val = get_frame_register_value (frame, regnum);
    if( (regnum == VALUE_REGNUM(new_val))
        && (frame == frame_find_by_id (VALUE_FRAME_ID (new_val)) )) {
      set_value_lazy (val, 0);
      mark_value_bytes_unavailable (val,
                                    value_embedded_offset (val),
                                    TYPE_LENGTH (type));
      return 0;
    }

As I'm unfamiliar with GDB internals, I don't know whether I compared
the right properties of new_val and whether the implementation is
"ok" like this, but at least this code made GDB properly stop
trying to unwind the stack after it received the same information
twice.
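
The idea behind the check can also be shown outside of GDB. The
following stand-alone C sketch is an assumption-laden model, not the
patch itself: struct reg_ref, lookup_fn, fetch_with_guard and the two
lookup functions are hypothetical names. It stops as soon as a lookup
answers with the same (frame, register) pair that was just asked for
and reports the value as unavailable.

```c
#include <stdbool.h>

/* Hypothetical answer of a register lookup: either a concrete value
   (lazy == false) or a reference to another frame/register pair. */
struct reg_ref {
    int frame_level;
    int regnum;
    bool lazy;
    unsigned value;   /* only meaningful when lazy is false */
};

typedef struct reg_ref (*lookup_fn)(int frame_level, int regnum);

enum fetch_status { FETCH_OK, FETCH_UNAVAILABLE };

/* Resolves a register by following lazy references. If a lookup points
   straight back at the pair that was asked for, give up instead of
   looping. */
static enum fetch_status fetch_with_guard(lookup_fn lookup, int frame_level,
                                          int regnum, unsigned *out)
{
    for (;;) {
        struct reg_ref next = lookup(frame_level, regnum);

        if (!next.lazy) {
            *out = next.value;
            return FETCH_OK;
        }
        /* Guard: same frame and same register as the request. */
        if (next.frame_level == frame_level && next.regnum == regnum)
            return FETCH_UNAVAILABLE;

        frame_level = next.frame_level;
        regnum = next.regnum;
    }
}

/* Healthy stack: each frame refers to the next lower level, and
   level 0 holds the concrete value 42. */
static struct reg_ref healthy_lookup(int frame_level, int regnum)
{
    struct reg_ref r = { frame_level - 1, regnum, frame_level > 0, 0 };
    if (!r.lazy)
        r.value = 42;
    return r;
}

/* Corrupt stack: every lookup refers back to the request itself. */
static struct reg_ref corrupt_lookup(int frame_level, int regnum)
{
    struct reg_ref r = { frame_level, regnum, true, 0 };
    return r;
}
```

With healthy_lookup the fetch walks down to level 0 and yields 42. With
corrupt_lookup it returns FETCH_UNAVAILABLE on the first iteration. Note
that this guard only catches a reference that points directly back at
itself. A longer cycle (A refers to B, B refers to A) would still loop in
this sketch.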

---

Last lines of the GDB trace ('set debug frame 1') without the patch,
after executing a 'bt' on the screwed up thread:

{ frame_id_p (l={stack=0x316800,code=0x1c15c0,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316800,code=0x1c15c0,!special},r={stack=0x316800,code=0x1c15c0,!special})
-> 1 }
{ frame_unwind_register_value (frame=1,regnum=98(s7),...) {
frame_unwind_register_value (frame=0,regnum=98(s7),...) { frame_id_p
(l={stack=0x3167d0,code=0x199850,!special}) -> 1 }
-> register=98 lazy }
-> register=98 lazy }
{ frame_id_p (l={stack=0x3167d0,code=0x199850,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x3167d0,code=0x199850,!special},r={stack=0x3167d0,code=0x199850,!special})
-> 1 }
{ frame_unwind_register_value (frame=-1,regnum=98(s7),...) ->
register=98 bytes=[00000000] }
{ frame_id_p (l={stack=0x316938,code=0x20a18,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x20a18,!special},r={stack=0x316938,code=0x20a18,!special})
-> 1 }
{ value_fetch_lazy (frame=8,regnum=98(s7),...) -> register=98
bytes=[00000000] }
#9 0x00050f2c in CINOSBaseRamp::Pull (this=0x10da0
<CHcsController::Register_AllAutoComt(unsigned short)+36>,
arS=16.000000000000057, arV=0, arA=0, arJ=0) at
../../inos/os/inos/src/cinosbaseramp.cpp:4844
{ get_prev_frame_1 (this_frame=9) ->
{level=10,type=<unknown>,unwind=<unknown>,pc=0x50f2c,id=<unknown>,func=<unknown>}
// cached
{ get_frame_func (this_frame=10) -> 0x50f28 }
{ frame_unwind_register_value (frame=9,regnum=13(sp),...) -> computed
bytes=[38693100] }
{ frame_unwind_arch (next_frame=10) -> arm }
{ frame_unwind_register_value (frame=10,regnum=15(pc),...) {
frame_unwind_register_value (frame=10,regnum=14(lr),...) { get_frame_id
(fi=10) { frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> {stack=0x316938,code=0x50f28,!special} }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> register=14 lazy }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
{ frame_unwind_register_value (frame=9,regnum=14(lr),...) { frame_id_p
(l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> register=14 lazy }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
{ frame_unwind_register_value (frame=9,regnum=14(lr),...) { frame_id_p
(l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> register=14 lazy }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
{ frame_unwind_register_value (frame=9,regnum=14(lr),...) { frame_id_p
(l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> register=14 lazy }
... (this repeats endlessly)


With the patch, it ends like this:

-> register=97 lazy }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
{ frame_unwind_register_value (frame=9,regnum=97(s6),...) { frame_id_p
(l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> register=97 lazy }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
{ frame_unwind_register_value (frame=9,regnum=98(s7),...) { frame_id_p
(l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> register=98 lazy }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
{ frame_unwind_register_value (frame=9,regnum=98(s7),...) { frame_id_p
(l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> register=98 lazy }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
#10 0x00050f2c in CINOSBaseRamp::Pull (this=<unavailable>,
arS=<unavailable>, arV=<unavailable>, arA=<unavailable>,
arJ=<unavailable>) at ../../inos/os/inos/src/cinosbaseramp.cpp:4844
{ get_prev_frame_1 (this_frame=10) -> <NULL frame> // cached
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

Raphael
Pedro Alves
2014-08-19 14:12:32 UTC
Post by Raphael Zulliger
> I don't know about the philosophy of GDB, whether it is supposed to
> handle such situation. However, for me, the following additional code
> helped to avoid the GDB crash which gave me a chance to inspect the rest
>
>     new_val = get_frame_register_value (frame, regnum);
>     if( (regnum == VALUE_REGNUM(new_val))
>         && (frame == frame_find_by_id (VALUE_FRAME_ID (new_val)) )) {
>       set_value_lazy (val, 0);
>       mark_value_bytes_unavailable (val,
>                                     value_embedded_offset (val),
>                                     TYPE_LENGTH (type));
>       return 0;
>     }
>
> As I'm unfamiliar with GDB internals, I don't know whether I compared
> the right properties of and new_val and whether the implementation is
> "ok" like this - but at least this code helped to make GDB properly
> abort trying to unwind the stack after it received the same information
> twice.

This sounds like the issue addressed by 33f8fe58 (and follow ups).
What version of GDB are you using?
--
Thanks,
Pedro Alves