GDB loops forever until it crashes when it runs out of memory

Raphael Zulliger

2014-08-13 05:11:39 UTC

Hi

I had to debug an embedded target (ARMv7, extended-remote) running an
RTOS with roughly 30 threads. Due to a programming error, the stack of
one of the threads was completely screwed up, and a 'bt' on that thread
caused GDB to terminate after a few seconds with the following error:

---
Recursive internal problem.

This application has requested the Runtime to terminate it in an unusual
way.
Please contact the application's support team for more information.
---

That crash occurred because GDB looped forever in 'int value_fetch_lazy
(struct value *val)' within 'while (VALUE_LVAL (new_val) == lval_register
&& value_lazy (new_val))' and eventually ran out of memory, because one
of the called functions allocated heap memory on every call.

The issue was that, due to the screwed up thread stack, 'new_val =
get_frame_register_value (frame, regnum);' happened to return the same
data on every call (new_val was a different allocation each time, but
contained the same data as the previously returned one). As you know, if you
have to debug a crashed system, you're happy to see just anything rather
than a crashing debugger. The point is that using a graphical debugger,
such as Eclipse/CDT, you can't really avoid the 'bt' on that thread and
therefore the only thing you notice is that GDB crashes without the
chance of investigating anything.
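
To make the failure mode concrete, here is a minimal toy model of the
loop described above. It does not use any GDB code: 'struct toy_value',
'toy_unwind' and 'toy_resolve' are hypothetical names invented for this
sketch, and the 'max_steps' cap exists only so that the demonstration
terminates (the loop in the report has no such cap).

```c
#include <assert.h>

/* Toy stand-in for a lazy register value: it only records where the
   real contents would have to be fetched from.  Not a GDB type. */
struct toy_value {
    int frame_level;  /* frame to ask next */
    int regnum;       /* register to ask for */
    int lazy;         /* nonzero: contents not fetched yet */
};

/* Hypothetical unwinder.  On a healthy stack, frame N defers to frame
   N-1 until frame 0 returns real contents.  With corrupt != 0 the
   frame keeps answering with the very same lazy request. */
static struct toy_value toy_unwind(int frame_level, int regnum, int corrupt)
{
    struct toy_value v;

    v.regnum = regnum;
    if (corrupt) {
        v.frame_level = frame_level;      /* points back at itself */
        v.lazy = 1;
    } else if (frame_level > 0) {
        v.frame_level = frame_level - 1;  /* defer to next inner frame */
        v.lazy = 1;
    } else {
        v.frame_level = 0;                /* innermost frame: real data */
        v.lazy = 0;
    }
    return v;
}

/* Follows lazy values in the same shape as the loop quoted above.
   Returns the number of unwind calls needed, or -1 if the value is
   still lazy after max_steps calls. */
int toy_resolve(int frame_level, int regnum, int corrupt, int max_steps)
{
    struct toy_value v = toy_unwind(frame_level, regnum, corrupt);
    int steps = 1;

    assert(max_steps > 0);
    while (v.lazy) {
        if (steps >= max_steps)
            return -1;                    /* would have looped forever */
        v = toy_unwind(v.frame_level, v.regnum, corrupt);
        steps++;
    }
    return steps;
}
```

In this model, toy_resolve (3, 14, 0, 100) finishes after a handful of
calls, while toy_resolve (9, 14, 1, 1000) never makes progress and only
stops because of the artificial cap.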

I don't know the philosophy of GDB well enough to say whether it is
supposed to handle such a situation. However, for me, the following
additional code avoided the GDB crash, which gave me a chance to inspect
the rest of the system with Eclipse/CDT:

    new_val = get_frame_register_value (frame, regnum);
    if( (regnum == VALUE_REGNUM(new_val))
        && (frame == frame_find_by_id (VALUE_FRAME_ID (new_val)) )) {
        set_value_lazy (val, 0);
        mark_value_bytes_unavailable (val,
                                      value_embedded_offset (val),
                                      TYPE_LENGTH (type));
        return 0;
    }

As I'm unfamiliar with GDB internals, I don't know whether I compared
the right properties of new_val and whether the implementation is "ok"
like this, but at least this code made GDB properly stop trying to
unwind the stack after it received the same information twice.
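
The idea behind the check can be sketched outside of GDB as follows.
This is only an illustration under stated assumptions: 'struct
sketch_value', 'sketch_unwind', 'sketch_fetch' and the 'stuck_level'
parameter are hypothetical and do not exist in GDB. Like the patch, the
sketch only notices the case where the answer is identical to the
request that was just made; it would not catch a longer cycle.

```c
#include <assert.h>

/* Toy lazy register value; not a GDB type. */
struct sketch_value {
    int frame_level;  /* frame to ask next */
    int regnum;       /* register to ask for */
    int lazy;         /* nonzero: contents not fetched yet */
};

/* Hypothetical unwinder.  The frame at stuck_level answers with the
   same lazy request it was given (the corrupt case).  Other frames
   defer to the next inner frame; frame 0 returns real contents.
   Pass stuck_level = -1 for a healthy stack. */
static struct sketch_value sketch_unwind(int frame_level, int regnum,
                                         int stuck_level)
{
    struct sketch_value v = { frame_level, regnum, 1 };

    if (frame_level == stuck_level)
        return v;                         /* same request again */
    if (frame_level > 0) {
        v.frame_level = frame_level - 1;  /* defer to inner frame */
        return v;
    }
    v.lazy = 0;                           /* innermost frame: real data */
    return v;
}

/* Returns 1 if the register contents were resolved, 0 if the value
   had to be treated as unavailable because no progress was made. */
int sketch_fetch(int frame_level, int regnum, int stuck_level)
{
    struct sketch_value v;

    assert(frame_level >= 0);
    v = sketch_unwind(frame_level, regnum, stuck_level);
    while (v.lazy) {
        int asked_frame = v.frame_level;
        int asked_reg = v.regnum;
        struct sketch_value next =
            sketch_unwind(asked_frame, asked_reg, stuck_level);

        /* The answer is the question we just asked: give up instead
           of looping and allocating forever. */
        if (next.lazy && next.frame_level == asked_frame
            && next.regnum == asked_reg)
            return 0;
        v = next;
    }
    return 1;
}
```

With a healthy stack (stuck_level = -1) the value resolves as before,
while a request that reaches the stuck frame returns 0 right away, which
corresponds to the '<unavailable>' arguments in the patched backtrace.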

---

Last lines of the GDB trace ('set debug frame 1') without the patch,
after executing a 'bt' on the screwed up thread:

{ frame_id_p (l={stack=0x316800,code=0x1c15c0,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316800,code=0x1c15c0,!special},r={stack=0x316800,code=0x1c15c0,!special})
-> 1 }
{ frame_unwind_register_value (frame=1,regnum=98(s7),...) {
frame_unwind_register_value (frame=0,regnum=98(s7),...) { frame_id_p
(l={stack=0x3167d0,code=0x199850,!special}) -> 1 }
-> register=98 lazy }
-> register=98 lazy }
{ frame_id_p (l={stack=0x3167d0,code=0x199850,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x3167d0,code=0x199850,!special},r={stack=0x3167d0,code=0x199850,!special})
-> 1 }
{ frame_unwind_register_value (frame=-1,regnum=98(s7),...) ->
register=98 bytes=[00000000] }
{ frame_id_p (l={stack=0x316938,code=0x20a18,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x20a18,!special},r={stack=0x316938,code=0x20a18,!special})
-> 1 }
{ value_fetch_lazy (frame=8,regnum=98(s7),...) -> register=98
bytes=[00000000] }
#9 0x00050f2c in CINOSBaseRamp::Pull (this=0x10da0
<CHcsController::Register_AllAutoComt(unsigned short)+36>,
arS=16.000000000000057, arV=0, arA=0, arJ=0) at
../../inos/os/inos/src/cinosbaseramp.cpp:4844
{ get_prev_frame_1 (this_frame=9) ->
{level=10,type=<unknown>,unwind=<unknown>,pc=0x50f2c,id=<unknown>,func=<unknown>}
// cached
{ get_frame_func (this_frame=10) -> 0x50f28 }
{ frame_unwind_register_value (frame=9,regnum=13(sp),...) -> computed
bytes=[38693100] }
{ frame_unwind_arch (next_frame=10) -> arm }
{ frame_unwind_register_value (frame=10,regnum=15(pc),...) {
frame_unwind_register_value (frame=10,regnum=14(lr),...) { get_frame_id
(fi=10) { frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> {stack=0x316938,code=0x50f28,!special} }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> register=14 lazy }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
{ frame_unwind_register_value (frame=9,regnum=14(lr),...) { frame_id_p
(l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> register=14 lazy }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
{ frame_unwind_register_value (frame=9,regnum=14(lr),...) { frame_id_p
(l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> register=14 lazy }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
{ frame_unwind_register_value (frame=9,regnum=14(lr),...) { frame_id_p
(l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> register=14 lazy }
... (this repeats endlessly)

With the patch, it ends like this:

-> register=97 lazy }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
{ frame_unwind_register_value (frame=9,regnum=97(s6),...) { frame_id_p
(l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> register=97 lazy }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
{ frame_unwind_register_value (frame=9,regnum=98(s7),...) { frame_id_p
(l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> register=98 lazy }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
{ frame_unwind_register_value (frame=9,regnum=98(s7),...) { frame_id_p
(l={stack=0x316938,code=0x50f28,!special}) -> 1 }
-> register=98 lazy }
{ frame_id_p (l={stack=0x316938,code=0x50f28,!special}) -> 1 }
{ frame_id_eq
(l={stack=0x316938,code=0x50f28,!special},r={stack=0x316938,code=0x50f28,!special})
-> 1 }
#10 0x00050f2c in CINOSBaseRamp::Pull (this=<unavailable>,
arS=<unavailable>, arV=<unavailable>, arA=<unavailable>,
arJ=<unavailable>) at ../../inos/os/inos/src/cinosbaseramp.cpp:4844
{ get_prev_frame_1 (this_frame=10) -> <NULL frame> // cached
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

Raphael