How to catch GDB crash

Discussion:

Dmitry Smirnov

2008-06-23 16:31:53 UTC

Hi,

I've encountered a very annoying problem with GDB. While debugging, it crashes for some reason, and I cannot catch the moment it happens. I can attach to the running process with another GDB, but that does not help. Perhaps there is some way to catch a system exception or something similar?

I'm using a somewhat complex setup, so I don't have much freedom (at least, I cannot simplify the situation). First, I'm debugging from Eclipse (and it looks like the crash happens only when running from Eclipse; I've tried from the command line and there is no crash). The debugger I'm using is a cross-compiled arm-elf-gdb, built on a Windows (i686) platform:
GNU gdb (GDB) 6.8.50.20080620
...
This GDB was configured as "--host=i686-pc-cygwin --target=arm-elf".

I doubt this is tied to the GDB version. I recall seeing this crash earlier with arm-elf-gdb 6.5 from GNUARM. Back then I somehow worked around it, but now I cannot continue my work.

This arm-elf-gdb is running against the skyeye simulator as a remote target.

As you can see, there are many candidates for the malfunctioning software (Eclipse, GDB, skyeye). But the only way I can find the root cause is to debug the crash in arm-elf-gdb. While crashing, it attempts to create a crash dump file, but the file is incomplete and Cygwin gdb cannot recognize it. Typically it contains just three lines:

Stack trace:
Frame Function Args
0022E268 7C802532 (00000058

If attached, Cygwin GDB just reports:
Program exited with code 037777777777.

Is there any way to stop arm-elf-gdb on some critical error?

Dmitry

Aleksandar Ristovski

2008-06-23 16:56:44 UTC

Post by Dmitry Smirnov
Hi,
I've encountered a very annoying problem with GDB. While debugging, it crashes for some reason, and I cannot catch the moment it happens. I can attach to the running process with another GDB, but that does not help. Perhaps there is some way to catch a system exception or something similar?
GNU gdb (GDB) 6.8.50.20080620
...
This GDB was configured as "--host=i686-pc-cygwin --target=arm-elf".
I doubt this is tied to the GDB version. I recall seeing this crash earlier with arm-elf-gdb 6.5 from GNUARM. Back then I somehow worked around it, but now I cannot continue my work.

Hello, I am working on a problem that might be related (although the gdb version I am using is 6.7).

Could you try this sample code? (From the command line, set a breakpoint in printSimple, and once the breakpoint is hit, do 'print p' and see whether the output makes sense.)

Thanks,

Aleksandar Ristovski
QNX Software Systems

#include <stdlib.h>
#include <stdio.h>

struct participant {
    char name[15];
    char country[15];
    float score;
    int age;
};

void printSimpleP(struct participant *p)
{
    printf("Name: %s, Country: %s, Score: %f, Age: %d\n",
           p->name, p->country, p->score, p->age);
}

/* p is passed by value: break here and 'print p' */
void printSimple(struct participant p)
{
    printSimpleP(&p);
}

int main(int argc, char *argv[]) {
    struct participant p = { "Foo", "Bar", 1.2, 45 };
    printSimple(p);
    return 0;
}

Michael Snyder

2008-06-23 17:12:04 UTC

Post by Dmitry Smirnov
Hi,
I've encountered a very annoying problem with GDB. While debugging, it crashes for some reason, and I cannot catch the moment it happens. I can attach to the running process with another GDB, but that does not help. Perhaps there is some way to catch a system exception or something similar?
GNU gdb (GDB) 6.8.50.20080620
...
This GDB was configured as "--host=i686-pc-cygwin --target=arm-elf".
I doubt this is tied to the GDB version. I recall seeing this crash earlier with arm-elf-gdb 6.5 from GNUARM. Back then I somehow worked around it, but now I cannot continue my work.
This arm-elf-gdb is running against the skyeye simulator as a remote target.
Frame Function Args
0022E268 7C802532 (00000058
Program exited with code 037777777777.
Is there any way to stop arm-elf-gdb on some critical error?

Sounds annoying.

You're running on a Windows host, right? Doesn't Windows have
some mechanism for automatically catching a program that is
crashing, and holding it for the debugger? Like on a Mac?

If you can attach to the gdb before the crash, you might
try setting breakpoints on exit and _exit, abort, things
like that, and see if you can intercept it that way.

What about libsegfault? Is something like that available
on windows?
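For readers wondering what a libSegFault-style hook boils down to, here is a minimal sketch, assuming a POSIX signal API such as the one Cygwin provides. It is not libSegFault's code: it installs a handler for a few fatal signals that writes a short note with the async-signal-safe write(2), then restores the default action and re-raises the signal so the process still terminates as it would have. The names install_fatal_handlers, fatal_signal_handler and safe_puts are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Write a string with write(2), which is async-signal-safe (printf is not). */
static void safe_puts(const char *s)
{
    ssize_t n = write(STDERR_FILENO, s, strlen(s));
    (void) n; /* nothing useful to do on failure inside a handler */
}

/* Report the fatal signal, restore the default action and re-raise it,
   so the process still terminates (or dumps core) as it would have. */
static void fatal_signal_handler(int sig)
{
    safe_puts(sig == SIGSEGV ? "fatal: SIGSEGV\n"
              : sig == SIGABRT ? "fatal: SIGABRT\n"
              : "fatal: other signal\n");
    signal(sig, SIG_DFL);
    raise(sig); /* delivered once the handler returns and sig is unblocked */
}

/* Install the handler for a few fatal signals; returns 0 on success. */
static int install_fatal_handlers(void)
{
    static const int sigs[] = { SIGSEGV, SIGABRT, SIGILL, SIGFPE };
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fatal_signal_handler;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    return 0;
}
```

Calling install_fatal_handlers() early in main() is enough; a real tool would typically also print a backtrace or the PID at that point, which is where the platform-specific work lies.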

Eli Zaretskii

2008-06-23 18:23:03 UTC

Date: Mon, 23 Jun 2008 10:12:04 -0700
You're running on a Windows host, right? Doesn't Windows have
some mechanism for automatically catching a program that is
crashing, and holding it for the debugger?

That's true, but you need special code in the debugger to be able to
work like that (it's called JIT debugging, btw). And GDB doesn't
(yet) have such code.

Michael Snyder

2008-06-23 18:31:45 UTC

Post by Eli Zaretskii

Date: Mon, 23 Jun 2008 10:12:04 -0700
You're running on a Windows host, right? Doesn't Windows have
some mechanism for automatically catching a program that is
crashing, and holding it for the debugger?

That's true, but you need special code in the debugger to be able to
work like that (it's called JIT debugging, btw). And GDB doesn't
(yet) have such code.

My failing memory. Didn't I publish a patch for glibc about
a year ago that would allow that? I'm going to hate myself
if I never posted it...

Pedro Alves

2008-06-23 18:36:31 UTC

Post by Eli Zaretskii

Date: Mon, 23 Jun 2008 10:12:04 -0700
You're running on a Windows host, right? Doesn't Windows have
some mechanism for automatically catching a program that is
crashing, and holding it for the debugger?

That's true, but you need special code in the debugger to be able to
work like that (it's called JIT debugging, btw). And GDB doesn't
(yet) have such code.

You can set error_start in the CYGWIN environment variable
to point at GDB's executable to have GDB start automatically
on an exception (the OP reported --host=i686-pc-cygwin), and,
for native Windows apps, there is some registry key (I don't
remember which) you can set to point to a JIT debugger. Probably
a little exe wrapper is needed to translate the incoming args
to GDB args, that's all. I can't see what changes in GDB
would be required?

--
Pedro Alves

Brian Dessent

2008-06-23 19:39:04 UTC

Post by Pedro Alves
for native Windows apps, there is some registry key (I don't
remember which) you can set to point to a JIT debugger. Probably

For the sake of the archives, it's HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug;
see <http://support.microsoft.com/kb/103861>.

Post by Pedro Alves
a little exe wrapper is needed to translate the incoming args
to GDB args, that's all.

The first %ld in the command expands to the faulting PID so
"path-to\gdb.exe -p %ld" ought to work without need for a wrapper.

Post by Pedro Alves
I can't see what changes in GDB
would be required?

The main problem is that gdb thinks that it's attaching to a normal
running process, rather than a faulted app. Thus the current thread is
an artificial one with %eip in ntdll!DbgUiConnectToDbg, which is just a
convenient 'int3' that lives in ntdll.dll, as discussed. That's easily
fixed just by switching to the thread of interest; however, the state of
that thread often looks bogus, something like this:

(gdb) bt
#0 0x7c90eb94 in ntdll!LdrAccessResource ()
from C:\WINXP\system32\ntdll.dll
#1 0x7c90e9ab in ntdll!ZwWaitForMultipleObjects ()
from C:\WINXP\system32\ntdll.dll
#2 0x7c86372c in UnhandledExceptionFilter ()
from C:\WINXP\system32\kernel32.dll
#3 0x00000002 in ?? ()
#4 0x0022f560 in ?? ()
#5 0x00000001 in ?? ()
#6 0x00000001 in ?? ()
#7 0x00000000 in ?? ()

The actual location of the fault in the user code is nowhere evident (in
this testcase, it was at eip 00401304) because gdb doesn't know that it
has picked up in the middle of a fault. You can get things back on
track by re-triggering the same fault again by continuing, but this
really isn't pretty and doesn't always work. For this to work correctly
gdb would need some command line switch to tell it that it's taking over
a faulted process as the JIT, rather than breaking in on a normally
executing one.

Brian

Dr. Rolf Jansen

2008-06-23 20:50:02 UTC

Hi,

Here is a poor man's approach.

1. Add the following code to the file gdb/main.c at line 661:

#if defined (POOR_MAN_DEBUG)
printf_filtered ("[GDB PID %d]\n", getpid());
t0 = time(NULL);
while ((diff = difftime(time(NULL), t0)) <= 60);
#endif

and this at line 43:

#if defined (POOR_MAN_DEBUG)
#include <sys/time.h>
double diff;
time_t t0 = time(NULL);
#endif

2. Before running configure, issue:
export CFLAGS="-g -O0 -DPOOR_MAN_DEBUG"
export CXXFLAGS=$CFLAGS

3. Configure and make your cross arm-elf-gdb (host i686-pc-cygwin, target arm-elf).

4. Start the cross-debugging session.

5. You now have 60 seconds to attach your host gdb
to the cross-gdb, using the PID it just printed, and to
set a reasonable breakpoint near the
location where you expect the crash to happen.

6. After 60 s, the cross-gdb will continue running, and
your host gdb should stop the cross-gdb at the
breakpoint that you set.

7. Now step through until the crash occurs.
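The steps above rely on a fixed 60-second busy loop. A common variant, sketched here under the assumption of a POSIX environment such as Cygwin, sleeps instead of spinning and also polls a flag that the attached debugger can clear, so you do not have to wait out the whole timeout. The names keep_waiting and wait_for_debugger are hypothetical; the call would sit where the POOR_MAN_DEBUG block goes in step 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical flag: from the attached host gdb, "set var keep_waiting = 0"
   releases the process before the timeout expires. */
static volatile int keep_waiting = 1;

/* Print our PID, then sleep in one-second steps until the debugger clears
   keep_waiting or max_seconds have elapsed. Returns seconds waited. */
static int wait_for_debugger(int max_seconds)
{
    time_t start = time(NULL);

    printf("[GDB PID %ld] waiting up to %d s for a debugger\n",
           (long) getpid(), max_seconds);
    fflush(stdout);
    while (keep_waiting && difftime(time(NULL), start) < max_seconds)
        sleep(1); /* sleep rather than spin, so the CPU stays idle */
    return (int) difftime(time(NULL), start);
}
```

After attaching the host gdb and setting your breakpoints, `set var keep_waiting = 0` followed by `continue` lets the cross-gdb proceed immediately (this needs the -g build from step 2 so the variable is visible).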

Hopefully this helps.

Best regards

Rolf Jansen

Dr. Rolf Jansen

2008-06-23 20:59:10 UTC

Sorry,

the code that needs to be added at line 43 of gdb/main.c should read:

#if defined (POOR_MAN_DEBUG)
#include <time.h>
double diff;
time_t t0;
#endif

Make the addition at line 661 before adding this at line 43;
otherwise, the line-661 insertion point shifts to line 667.

Best regards

Rolf Jansen

Dmitry Smirnov

2008-06-24 08:51:36 UTC

Hi,

I'm sorry guys, but I believe we are going the wrong way.
First, I suppose I do not need JIT: as I said, I can attach to the
running arm-elf-gdb before the crash. I assumed that GDB is smart
enough to catch system exceptions. Moreover, I have some indication
of that: my skyeye is a Cygwin-compiled program, and when I run it in
Eclipse (which uses Cygwin GDB as the debugger for this program), I can
see the following stack:
-----------------
Skyeye_1.2.4 Cygwin GCC [C/C++ Local Application]
Cygwin gdb Debugger (23.06.08 20:12) (Suspended)
Thread [1] (Suspended: Signal 'SIGSEGV' received. Description: Segmentation fault.)
3 RpcRaiseException() 0x77ea27ea
2 h_errno() 0x662b7258
1 <symbol is not available> 0x00000000
Thread [2] (Suspended)
gdb (23.06.08 20:12)
D:\Dvs\Project\Skyeye_1.2.4\binary\skyeye.exe (23.06.08 20:12)
----------------------
Here is the stack of command-line GDB:
----------------------
Program received signal SIGSEGV, Segmentation fault.
0x77ea27ea in RpcRaiseException () from /c/WINDOWS/system32/rpcrt4.dll
(gdb) info stack
#0 0x77ea27ea in RpcRaiseException () from /c/WINDOWS/system32/rpcrt4.dll
#1 0x662b7258 in h_errno () from /c/WINDOWS/system32/hnetcfg.dll
#2 0x00000000 in ?? () from
------------------------

I'm 99% sure that Cygwin GDB can intercept these signals.

Am I wrong? Perhaps this is a signal generated by the Cygwin libraries?

Second, as I mentioned in my first mail, something is trying to save the
crash dump file. What is it? Why does it fail to save? Can I set a breakpoint
just before it starts saving (e.g., when it detects the crash)?

Is it possible that arm-elf-gdb itself handles some signals, thus preventing
Cygwin GDB from handling them? Where is that code?

Dmitry

Dmitry Smirnov

2008-06-24 12:38:48 UTC

Hi!

I have to say that your advice was useful. I was able to catch the crash.
Maybe it will be interesting to you.
First, after I attached to the process, I set 3 breakpoints: exit, _exit, and abort (which is in fact cygwin1!abort).
Next, I noticed that the program was stopped in that weird state Brian wrote about:

(gdb) bt
#0 0x7c901231 in ntdll!DbgUiConnectToDbg () from /c/WINDOWS/system32/ntdll.dll
#1 0x7c9507a8 in ntdll!KiIntSystemCall () from /c/WINDOWS/system32/ntdll.dll
#2 0x00000005 in ?? ()

I tried to step through the code: three or four 'ni' commands and, oops... it looks like the program was resumed.
After that, I performed the actions that lead to the crash (just issuing 'ni' to arm-elf-gdb from Eclipse), and I was brought to the following situation:

Breakpoint 2, 0x61084819 in cygwin1!abort () from /usr/bin/cygwin1.dll
(gdb)
0x6108481e in cygwin1!abort () from /usr/bin/cygwin1.dll
(gdb) info th
3 thread 5424.0x114c 0x7c90eb94 in ntdll!LdrAccessResource ()
from /c/WINDOWS/system32/ntdll.dll
2 thread 5424.0x1108 0x7c90eb94 in ntdll!LdrAccessResource ()
from /c/WINDOWS/system32/ntdll.dll
* 1 thread 5424.0x1bc 0x6108481e in cygwin1!abort ()
from /usr/bin/cygwin1.dll
(gdb) bt
#0 0x6108481e in cygwin1!abort () from /usr/bin/cygwin1.dll
#1 0x61086e60 in sigfillset () from /usr/bin/cygwin1.dll
#2 0x0040b6b7 in internal_verror (file=0x62380b ".././gdb/mi/mi-interp.c",
line=340, fmt=0x6237ed "%s: Assertion `%s' failed.", ap=0x22e64c "-7b")
at utils.c:809
#3 0x0040b6f6 in internal_error (file=0x62380b ".././gdb/mi/mi-interp.c",
line=340, string=0x6237ed "%s: Assertion `%s' failed.") at utils.c:818
#4 0x0048cfec in mi_on_resume (ptid={pid = 42000, lwp = 0, tid = 0})
at .././gdb/mi/mi-interp.c:340
#5 0x0046377f in observer_target_resumed_notification_stub (data=0x48cf40,
args_data=0x22e6b0) at observer.inc:378
#6 0x00463052 in generic_observer_notify (subject=0x1, args=0x5e74ac)
at observer.c:166
#7 0x004637fe in observer_notify_target_resumed (ptid=
{pid = 42000, lwp = 0, tid = 0}) at observer.inc:402
#8 0x0047bd22 in set_running (ptid={pid = 42000, lwp = 0, tid = 0},
running=1) at thread.c:435
#9 0x004265a0 in resume (step=1, sig=TARGET_SIGNAL_HUP) at infrun.c:1063
#10 0x004294c0 in proceed (addr=4294967295, siggnal=TARGET_SIGNAL_DEFAULT,
step=1) at infrun.c:1265
#11 0x00412564 in step_1 (skip_subroutines=1, single_inst=1, count_string=0x0)
at infcmd.c:789
#12 0x00402433 in execute_command (p=0x22e872 "", from_tty=1) at top.c:466
#13 0x004116e2 in catch_exception (uiout=0x100f0c80,
func=0x48e040 <do_captured_execute_command>, func_args=0x22e898, mask=6)
at exceptions.c:463
#14 0x0048e0e6 in cli_interpreter_exec (data=0x0, command_str=0x103230b0 "ni")
at .././gdb/cli/cli-interp.c:130
#15 0x0041acfb in interp_exec (interp=0x100f0ce8, command_str=0x103230b0 "ni")
at interps.c:325
#16 0x0048cc49 in mi_cmd_interpreter_exec (
command=0x64d54e "-interpreter-exec", argv=0x22e998, argc=2)
at .././gdb/mi/mi-interp.c:209
#17 0x00505a45 in captured_mi_execute_command (uiout=0x100f1660,
data=0x22ea40) at .././gdb/mi/mi-main.c:1104
#18 0x004116e2 in catch_exception (uiout=0x100f1660,
func=0x5057a0 <captured_mi_execute_command>, func_args=0x22ea40, mask=6)
at exceptions.c:463
#19 0x0050558a in mi_execute_command (cmd=0x10ef9ea0 "229 ni", from_tty=1)
at .././gdb/mi/mi-main.c:1159
#20 0x0048cd49 in mi_execute_command_wrapper (cmd=0x10ef9ea0 "229 ni")
at .././gdb/mi/mi-interp.c:265
#21 0x00436e5a in handle_file_event (event_file_desc=0) at event-loop.c:732
#22 0x004368c2 in process_event () at event-loop.c:341
#23 0x004371a5 in gdb_do_one_event (data=0x0) at event-loop.c:378
#24 0x0041192b in catch_errors (func=0x437020 <gdb_do_one_event>,
func_args=0x0, errstring=0x607b20 "", mask=6) at exceptions.c:509
#25 0x00436914 in start_event_loop () at event-loop.c:404
#26 0x004010ab in captured_command_loop (data=0x0) at .././gdb/main.c:99
#27 0x0041192b in catch_errors (func=0x4010a0 <captured_command_loop>,
func_args=0x0, errstring=0x5f7139 "", mask=6) at exceptions.c:509
#28 0x00401914 in captured_main (data=0x22ee10) at .././gdb/main.c:882
#29 0x0041192b in catch_errors (func=0x4010f0 <captured_main>,
func_args=0x22ee10, errstring=0x5f7139 "", mask=6) at exceptions.c:509
#30 0x00402113 in gdb_main (args=0x22ee10) at .././gdb/main.c:891
#31 0x0040109b in main (argc=8, argv=0x100301a0) at gdb.c:33
(gdb)

Now I can start investigations.

Dmitry

Pedro Alves

2008-06-24 12:58:20 UTC

#3 0x0040b6f6 in internal_error (file=0x62380b ".././gdb/mi/mi-interp.c",
line=340, string=0x6237ed "%s: Assertion `%s' failed.") at utils.c:818
#4 0x0048cfec in mi_on_resume (ptid={pid = 42000, lwp = 0, tid = 0})
at .././gdb/mi/mi-interp.c:340

42000 looks a lot like remote.c:MAGIC_NULL_PID, and it wasn't on the
thread list at this point:

static void
mi_on_resume (ptid_t ptid)
{
  if (PIDGET (ptid) == -1)
    fprintf_unfiltered (raw_stdout, "*running,thread-id=\"all\"\n");
  else
    {
      struct thread_info *ti = find_thread_pid (ptid);
      gdb_assert (ti);   /* <<<<<<<<<<<< assert here */
      fprintf_unfiltered (raw_stdout, "*running,thread-id=\"%d\"\n", ti->num);
    }
}

And, it looks like your stub does not implement any thread support?

It seems we either need to make sure remote.c always registers
a thread, or remove that assert. I would prefer the former,
as it's a requirement for getting rid of context switching
on the core side.
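To make the failure mode concrete, here is a toy model in C, not GDB's code: a tiny thread list, a lookup that returns NULL for an unregistered ptid, and an on_resume function shaped like mi_on_resume that returns -1 where the real gdb_assert (ti) would fire. The hypothetical target_connected_without_threads illustrates the preferred fix, registering the single thread of a thread-less target under the magic pid (42000, as in the backtrace) so the lookup cannot fail. The struct layouts, MAX_THREADS and all function names other than find_thread_pid are assumptions made for the sketch.

```c
#include <stdio.h>
#include <stddef.h>

/* Toy stand-ins for GDB's ptid and thread list (not GDB's definitions). */
typedef struct { int pid; long lwp; long tid; } ptid_t;
struct thread_info { ptid_t ptid; int num; };

#define MAGIC_NULL_PID 42000 /* the pid seen in the backtrace */
#define MAX_THREADS 8

static struct thread_info threads[MAX_THREADS];
static int thread_count;

static int ptid_equal(ptid_t a, ptid_t b)
{
    return a.pid == b.pid && a.lwp == b.lwp && a.tid == b.tid;
}

/* Linear search; NULL means the ptid was never registered. */
static struct thread_info *find_thread_pid(ptid_t ptid)
{
    for (int i = 0; i < thread_count; i++)
        if (ptid_equal(threads[i].ptid, ptid))
            return &threads[i];
    return NULL;
}

static struct thread_info *add_thread(ptid_t ptid)
{
    if (thread_count == MAX_THREADS)
        return NULL;
    threads[thread_count].ptid = ptid;
    threads[thread_count].num = thread_count + 1; /* 1-based ids */
    return &threads[thread_count++];
}

/* Hypothetical fix: a target without thread support registers its one
   thread under the magic pid when the connection is made. */
static void target_connected_without_threads(void)
{
    ptid_t main_ptid = { MAGIC_NULL_PID, 0, 0 };

    if (find_thread_pid(main_ptid) == NULL)
        add_thread(main_ptid);
}

/* Shaped like mi_on_resume, but writes into buf and returns -1 where the
   real code would trip gdb_assert (ti). */
static int on_resume(ptid_t ptid, char *buf, size_t len)
{
    struct thread_info *ti;

    if (ptid.pid == -1)
        return snprintf(buf, len, "*running,thread-id=\"all\"\n");
    ti = find_thread_pid(ptid);
    if (ti == NULL)
        return -1;
    return snprintf(buf, len, "*running,thread-id=\"%d\"\n", ti->num);
}
```

Before registration, resuming the magic ptid hits the "assert" path; after target_connected_without_threads() it formats thread-id="1".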

--
Pedro Alves

Dmitry Smirnov

2008-06-24 17:02:49 UTC

Finally, I was able to gather some logs from Cygwin GDB. For this, I switched the debug configuration in Eclipse from gdbServer to Cygwin GDB. The two differ in how they start the program: gdbServer connects to the remote target by itself, whereas Cygwin GDB lets me do it manually from its console.
Below is the log. I set an additional breakpoint at mi_execute_command to see which commands Eclipse issues.
As you can see, the first time, mi_on_resume is called with ptid={pid = -1, lwp = 0, tid = 0}. Something happens between this call and the second call, where ptid={pid = 42000, lwp = 0, tid = 0}.

I'm going to figure out which command is followed by the wrong pid. I suspect memory corruption.

I have to note that I didn't see some of these commands while debugging gdbServer as Java code (the CDT debugger) from Eclipse. I saw the following commands, which are common to both cases:
info threads
-stack-info-depth
-stack-list-frames
-data-list-changed-registers
-stack-list-arguments 0 0 0
info signal SIGHUP
-data-disassemble -f <my_file> -l 333 -n 100 -- 1
-data-disassemble -s 0x8c4a8e -e 0x8c4af2 -- 0
-stack-list-locals 0

I will try to reproduce these commands from the console (without Eclipse), so if you know the equivalent GDB console commands, please let me know.

BTW, the command 'info threads' gives me:
(gdb) info threads
warning: RMT ERROR : failed to get remote thread list.

Dmitry

(gdb) attach 4760
Attaching to program `/cygdrive/d/Install/GDB/gdb-6.8.50.20080620/gdb-6.8.50.20080620/gdb/gdb.exe', process 4760
[Switching to thread 4760.0xcbc]
(gdb) ni
0x7c9507a8 in ntdll!KiIntSystemCall () from /c/WINDOWS/system32/ntdll.dll
(gdb)
0x7c9507bb in ntdll!KiIntSystemCall () from /c/WINDOWS/system32/ntdll.dll
(gdb)
0x7c9507bf in ntdll!KiIntSystemCall () from /c/WINDOWS/system32/ntdll.dll
(gdb)
0x7c9507c1 in ntdll!KiIntSystemCall () from /c/WINDOWS/system32/ntdll.dll
(gdb)
[Switching to thread 4760.0x1150]

Breakpoint 5, mi_execute_command (
cmd=0x113a4610 "182-interpreter-exec console \"info b\"", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {
(gdb) c
Continuing.

Breakpoint 5, mi_execute_command (cmd=0x1392c8c8 "183-exec-continue",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {
(gdb) c
Continuing.

Breakpoint 4, mi_on_resume (ptid={pid = -1, lwp = 0, tid = 0})
at .././gdb/mi/mi-interp.c:335
335 if (PIDGET (ptid) == -1)
(gdb) c
Continuing.

Breakpoint 5, mi_execute_command (cmd=0x138c2700 "184 info threads",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {
(gdb) c
Continuing.

Breakpoint 5, mi_execute_command (cmd=0x13931a38 "185-stack-info-depth",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {
(gdb) c
Continuing.

Breakpoint 5, mi_execute_command (
cmd=0x10d2abf0 "186-stack-list-frames 0 11", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {
(gdb) c
Continuing.

Breakpoint 5, mi_execute_command (
cmd=0x10c3d120 "187-data-list-changed-registers", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {
(gdb) c
Continuing.

Breakpoint 5, mi_execute_command (cmd=0x10c3d2c0 "188 info sharedlibrary",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {
(gdb) c
Continuing.

Breakpoint 5, mi_execute_command (cmd=0x10cfd660 "189 info signal SIGHUP",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {
(gdb) c
Continuing.

Breakpoint 5, mi_execute_command (
cmd=0x1144e0a8 "190-data-disassemble -f <my_file> -l 333 -n 100 -- 1", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {
(gdb)
Continuing.

Breakpoint 5, mi_execute_command (
cmd=0x10d438b8 "191-stack-list-arguments 0 0 0", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {
(gdb)
Continuing.

Breakpoint 5, mi_execute_command (
cmd=0x10d2a770 "192-data-disassemble -s 0x8c4a8e -e 0x8c4af2 -- 0",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {
(gdb)
Continuing.

Breakpoint 5, mi_execute_command (cmd=0x10d2a820 "193-stack-list-locals 0",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {
(gdb)
Continuing.

Breakpoint 5, mi_execute_command (cmd=0x10d2a8d0 "194 whatis i", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {
(gdb)
Continuing.

Breakpoint 5, mi_execute_command (cmd=0x10f3ea98 "195 whatis rt_status",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {
(gdb)
Continuing.

Breakpoint 5, mi_execute_command (cmd=0x10f34780 "196 whatis __result",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {
(gdb)
Continuing.

Breakpoint 5, mi_execute_command (cmd=0x10f3e748 "197-var-create - * i",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {
(gdb)
Continuing.

Breakpoint 5, mi_execute_command (
cmd=0x10f39150 "198-var-evaluate-expression var1", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {
(gdb)
Continuing.

Breakpoint 5, mi_execute_command (
cmd=0x1131f9e8 "199-var-create - * rt_status", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {
(gdb)
Continuing.

Breakpoint 5, mi_execute_command (
cmd=0x11244350 "200-var-evaluate-expression var2", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {
(gdb)
Continuing.

Breakpoint 5, mi_execute_command (
cmd=0x1127bb70 "201-var-create - * __result", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {
(gdb)
Continuing.

Breakpoint 5, mi_execute_command (
cmd=0x1127bdc8 "202-var-evaluate-expression var3", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {
(gdb)
Continuing.

Breakpoint 5, mi_execute_command (
cmd=0x112d6500 "203-exec-next-instruction 1", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {
(gdb) c
Continuing.

Breakpoint 4, mi_on_resume (ptid={pid = 42000, lwp = 0, tid = 0})
at .././gdb/mi/mi-interp.c:335
335 if (PIDGET (ptid) == -1)
(gdb) c
Continuing.

Breakpoint 2, 0x61084819 in cygwin1!abort () from /usr/bin/cygwin1.dll
(gdb)

Pedro Alves

2008-06-24 17:29:28 UTC

Post by Dmitry Smirnov
As you can see, the first time,
mi_on_resume is called with ptid={pid = -1, lwp = 0, tid = 0}. Something
happens between this call and the second call, where ptid={pid = 42000, lwp = 0,
tid = 0}.
I'm going to figure out which command is followed by the wrong pid. I
suspect memory corruption.

This pid is used by GDB internally when you are connected to a target
that does not support threads at all, like small embedded systems.
If the remote side doesn't have a notion of thread ids or pids, that
magic number is what GDB will use internally. This code path can
be triggered for example while stepping over a breakpoint (doing
nexti / -exec-next-instruction while stopped at a breakpoint).

Post by Dmitry Smirnov
BTW, command 'info threads' gives me
(gdb) info threads
warning: RMT ERROR : failed to get remote thread list.

... which seems to corroborate this.

This is a new bug in GDB that was introduced recently.

I've already said in my last reply what needs to be done
to fix it.

If your target *does* support threads and the remote protocol
thread-related packets, then there's yet another bug somewhere
else.

--
Pedro Alves

Dmitry Smirnov

2008-06-25 08:02:33 UTC

Hi Pedro,

I'll try to figure out whether skyeye (the remote target) supports the notion of thread IDs or PIDs. For now, I just assume it does not.
Nevertheless, I do not believe this is related to the crash.

As I said previously, I have been debugging this program (ARM code) for some time. It is a very complex program consisting of several ELF files. Before this one, I debugged a couple of others. I used 'ni' and/or -exec-next-instruction a lot and didn't see many crashes. I remember seeing a couple of them (in a similar scenario: when I set a BP and ran until it was hit), but I eliminated them by moving the breakpoint to another location.
It is just a feeling, nothing more, that the crash is somehow connected with the stack trace: previously I didn't see long stack traces. In most cases there were 2 or 3 frames (in fact, the call stack should have more frames, but GDB was not able to detect them correctly; this may be caused by assembler code that does not follow the ABI).

BTW, I've just realized that the command-line interface does not use the mi_* interface (neither mi_on_resume nor mi_execute_command was hit), and this is most likely why I cannot reproduce this test case with the CLI.

Dmitry

Pedro Alves

2008-06-25 23:27:58 UTC

Post by Dmitry Smirnov
Hi Pedro,
I'll try to figure out whether skyeye (the remote target) supports
the notion of thread IDs or PIDs. For now, I just assume it does not.
Nevertheless, I do not believe this is related to the crash.

Yes it is. :-)

Post by Dmitry Smirnov
As I said previously, I have been debugging this program (ARM code) for some
time.

But you've certainly upgraded your GDB recently (I can tell by your log
output on your original post). As I said, this is a recently introduced
regression.

I was able to reproduce the problem by connecting to a local
gdbserver with a GDB that had all thread support hacked out of the
remote target.

Post by Dmitry Smirnov
BTW, I've just realized that the command-line interface does not use the mi_*
interface (neither mi_on_resume nor mi_execute_command was hit), and this
is most likely why I cannot reproduce this test case with the CLI.

Yes, that's exactly the reason.

Anyway, I've posted a patch that fixes the issue in your case
(it was actually a side effect of something else I was doing),
although we may need to get rid of the assert you were tripping
on for the time being (there are targets other than
remote that will also trip on the assert).

Vladimir, not sure if you noticed the issue, as it's buried in
this long thread? We can always leave the crash in place to
force targets to follow our evil plot of always registering the
main thread. :-)
I'd post a patch for it, but I don't know whether we should output
thread-id=0 in that case, or not output thread-id
at all ...
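The open question here, what the *running record should say when the resumed thread is not registered, can be sketched as two output policies replacing the assert. This is a hypothetical illustration, not a GDB patch: format_running and the policy enum are invented names, and a thread_num of 0 or less stands for "not on the thread list".

```c
#include <stdio.h>
#include <stddef.h>

/* The two options discussed for an unregistered thread. */
enum missing_thread_policy { EMIT_THREAD_ID_ZERO, OMIT_THREAD_ID };

/* Format an MI *running record. thread_num <= 0 means the resumed
   thread is not on the thread list; apply the policy instead of asserting.
   Returns the snprintf result (characters that would be written). */
static int format_running(char *buf, size_t len, int thread_num,
                          enum missing_thread_policy policy)
{
    if (thread_num > 0)
        return snprintf(buf, len, "*running,thread-id=\"%d\"\n", thread_num);
    if (policy == EMIT_THREAD_ID_ZERO)
        return snprintf(buf, len, "*running,thread-id=\"0\"\n");
    return snprintf(buf, len, "*running\n");
}
```

The trade-off is between a front end seeing a thread id that matches no thread (thread-id="0") and a record that lacks a field front ends may expect.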

--
Pedro Alves

Dmitry Smirnov

2008-06-26 13:56:26 UTC

I still have some doubts :-)

Below is a new log of my debug session. I set breakpoints on the same mi_execute_command and mi_on_resume; the latter prints the value of 'inferior_ptid' when hit. Also, from Eclipse I issued the command 'ni' before 'c'. As you can see, 'inferior_ptid' is equal to {pid = 42000, lwp = 0, tid = 0} all the time, whereas mi_on_resume is called with {pid = -1, lwp = 0, tid = 0} in all cases except the last one.

To my mind, this indicates that while executing the last 'ni', the function resume() in infrun.c takes a different path and assigns 'inferior_ptid' to 'resume_ptid' instead of the default RESUME_ALL.

Digging...

Dmitry

GNU gdb 6.3.50_2004-12-28-cvs (cygwin-special)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-cygwin".
(gdb) attach 2348
Attaching to process 2348
Reading symbols from /cygdrive/d/Install/GDB/gdb-6.8.50.20080620/gdb-6.8.50.20080620/gdb/gdb.exe...done.
[Switching to thread 2348.0x15bc]
(gdb) info b
No breakpoints or watchpoints.
(gdb) b mi_on_resume
Breakpoint 1 at 0x48cf46: file .././gdb/mi/mi-interp.c, line 335.
(gdb) b mi_execute_command
Breakpoint 2 at 0x505537: file .././gdb/mi/mi-main.c, line 1135.
(gdb) commands 1
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just "end".

print inferior_ptid
c
end

(gdb) commands 2
Type commands for when breakpoint 2 is hit, one per line.
End with a line saying just "end".

c
end

(gdb) ni
0x7c9507a8 in ntdll!KiIntSystemCall () from /c/WINDOWS/system32/ntdll.dll
(gdb)
0x7c9507bb in ntdll!KiIntSystemCall () from /c/WINDOWS/system32/ntdll.dll
(gdb)
0x7c9507bf in ntdll!KiIntSystemCall () from /c/WINDOWS/system32/ntdll.dll
(gdb)
0x7c9507c1 in ntdll!KiIntSystemCall () from /c/WINDOWS/system32/ntdll.dll
(gdb)
[Switching to thread 2348.0x740]

Breakpoint 2, mi_execute_command (cmd=0x138f74a0 "303 ni", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 1, mi_on_resume (ptid={pid = -1, lwp = 0, tid = 0})
at .././gdb/mi/mi-interp.c:335
335 if (PIDGET (ptid) == -1)
$1 = {pid = 42000, lwp = 0, tid = 0}

Breakpoint 2, mi_execute_command (cmd=0x1393df78 "304 info threads",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (cmd=0x1393dfd0 "305-stack-info-depth",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (cmd=0x1395a678 "306-stack-list-frames 0 1",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (
cmd=0x1395a6d0 "307-data-list-changed-registers", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (cmd=0x13962898 "308 info sharedlibrary",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (
cmd=0x139628f0 "309-stack-list-arguments 0 0 0", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (cmd=0x13927738 "310-stack-list-locals 0",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (
cmd=0x13927790 "311-interpreter-exec console \"info b\"", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (cmd=0x13977ce8 "312 c", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 1, mi_on_resume (ptid={pid = -1, lwp = 0, tid = 0})
at .././gdb/mi/mi-interp.c:335
335 if (PIDGET (ptid) == -1)
$2 = {pid = 42000, lwp = 0, tid = 0}

Breakpoint 2, mi_execute_command (cmd=0x13974bd8 "313 info threads",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (cmd=0x13980e98 "314-stack-info-depth",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (
cmd=0x14770128 "315-stack-list-frames 0 11", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (
cmd=0x14770010 "316-data-list-changed-registers", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (cmd=0x10ef8958 "317 info sharedlibrary",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (
cmd=0x10ef89b0 "318-data-disassemble -f /c/p4/views/KSW_S4000_WP_DEV_unknown
/qct/drivers/parb/parb.c -l 333 -n 100 -- 1", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (
cmd=0x1476fe18 "319-data-disassemble -s 0x8c4a8e -e 0x8c4af2 -- 0",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (
cmd=0x1172a8d0 "320-stack-list-arguments 0 0 0", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (cmd=0x11738a00 "321-stack-list-locals 0",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (cmd=0x1174b920 "322 whatis i", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (cmd=0x11758f10 "323 whatis rt_status",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (cmd=0x11761360 "324 whatis __result",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (cmd=0x11761548 "325-var-create - * i",
from_tty=1) at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (
cmd=0x117616e8 "326-var-evaluate-expression var1", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (
cmd=0x117c3978 "327-var-create - * rt_status", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (
cmd=0x11785568 "328-var-evaluate-expression var2", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (
cmd=0x1178b4b8 "329-var-create - * __result", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (
cmd=0x1178b550 "330-var-evaluate-expression var3", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 2, mi_execute_command (cmd=0x11795130 "331 ni", from_tty=1)
at .././gdb/mi/mi-main.c:1135
1135 {

Breakpoint 1, mi_on_resume (ptid={pid = 42000, lwp = 0, tid = 0})
at .././gdb/mi/mi-interp.c:335
335 if (PIDGET (ptid) == -1)
$3 = {pid = 42000, lwp = 0, tid = 0}

Program exited with code 037777777777.
(gdb)

Pedro Alves

2008-06-26 14:20:54 UTC

Post by Dmitry Smirnov
I still have some doubts :-)

Well, the doubts would go away if you tried the patches. :-)

I'm hoping to get to commit them today, though...

Post by Dmitry Smirnov
Below is a new log of my debug session. I've set the same breakpoints on
mi_execute_command and mi_on_resume; the latter prints the value of
'inferior_ptid' when hit. Also, from Eclipse I've issued the command 'ni'
before 'c'. As you can see, 'inferior_ptid' is equal to {pid = 42000,
lwp = 0, tid = 0} all the time, whereas mi_on_resume is called with {pid =
-1, lwp = 0, tid = 0} in all cases except the last one.
To my mind this indicates that while executing the last 'ni', function resume()
in file infrun.c takes a different path and assigns 'inferior_ptid' to
'resume_ptid' instead of the default RESUME_ALL.

Did you actually look at the function that is asserting? Here it is again:

static void
mi_on_resume (ptid_t ptid)
{
  if (PIDGET (ptid) == -1)
    fprintf_unfiltered (raw_stdout, "*running,thread-id=\"all\"\n");
  else
    {
      struct thread_info *ti = find_thread_pid (ptid);
      gdb_assert (ti);
      fprintf_unfiltered (raw_stdout, "*running,thread-id=\"%d\"\n", ti->num);
    }
}

Calling the resume functions with {-1,0,0} means "let all threads execute",
while {42000,0,0} means "let only this thread execute". The latter
case happens normally when GDB is trying to step over a breakpoint:

- remove breakpoints
- step only the thread of interest, leaving others stopped, so if they happen
to be executing the same code, they don't miss the breakpoint
- reinsert breakpoints
- now safe to resume all threads
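As an illustration of the choice described above, here is a minimal C sketch. It assumes a simplified three-field `ptid_t` and a hypothetical function `choose_resume_ptid`; GDB's real resume() in infrun.c weighs more conditions than this, so treat it only as a model of the {-1,0,0} versus {42000,0,0} decision.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for GDB's ptid_t of that era: {pid, lwp, tid}. */
typedef struct { int pid; long lwp; long tid; } ptid_t;

/* pid == -1 means "let all threads execute". */
static const ptid_t minus_one_ptid = { -1, 0, 0 };

/* Hypothetical model of the decision: while stepping over a breakpoint
   (breakpoints removed), only the thread of interest may run; otherwise
   every thread is resumed. */
ptid_t choose_resume_ptid(ptid_t inferior_ptid, bool stepping_over_breakpoint)
{
    assert(inferior_ptid.pid != -1);   /* the current inferior is a real process */
    if (stepping_over_breakpoint)
        return inferior_ptid;          /* e.g. {42000, 0, 0}: this thread only */
    return minus_one_ptid;             /* {-1, 0, 0}: all threads */
}
```

With `stepping_over_breakpoint` set, the single-thread ptid is what gets passed on, which matches the {pid = 42000, lwp = 0, tid = 0} value seen at mi_on_resume in the failing case.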

It just happens that in your case there's only ever one "thread",
but the core of inferior control in GDB doesn't care and sends {42000,0,0}
anyway. The problem is that this function asserts because it
assumes threads are always registered in the thread table, which
unfortunately is still not true for all of GDB's supported
targets.

Post by Dmitry Smirnov
Breakpoint 1, mi_on_resume (ptid={pid = 42000, lwp = 0, tid = 0})
at .././gdb/mi/mi-interp.c:335
335 if (PIDGET (ptid) == -1)
$3 = {pid = 42000, lwp = 0, tid = 0}
Program exited with code 037777777777.
(gdb)

--
Pedro Alves

Dmitry Smirnov

2008-06-26 14:32:46 UTC

OK, you've convinced me :-)
You are right, the last resume() is executed with stepping_over_breakpoint equal to 1, whereas previously it was 0.

I'll try your patch (I can find it in gdb-patches, right?).

Dmitry

Dmitry Smirnov

2008-06-30 15:56:56 UTC

Hi,

I've upgraded to gdb-6.8.50.20080630 and tested it.
Unfortunately, I was totally unable to run my test case from Eclipse CDT.
I've figured out that the problem is in what GDB responds to the -exec-run command.

Here is the Eclipse session log (it shows the requests to GDB and the responses):

84-exec-run
84^error,msg="Don't know how to run. Try \"help target\"."

I've tried an older version of arm-elf-gdb (6.5):

123-exec-run
123^running
&"Don't know how to run. Try \"help target\".\n"
Don't know how to run. Try "help target".
123^error,msg="Don't know how to run. Try \"help target\"."

It seems the responses differ in the 123^running line (or maybe in the order too?).

Is it possible to fix this?

Dmitry

Dmitry Smirnov

2008-07-02 11:05:28 UTC

Hi,

Regarding the -exec-run problem, I've suspended the investigation: I've found that 6.8.50.20080630 simply doesn't respond with "running", and this seems reasonable. So perhaps the previous version was misbehaving and causing Eclipse to behave the wrong way. It is still not clear, though, why the gdbServer CDT debugger also fails. Postponed for now...

I've found another Eclipse CDT debugger variant that can run the way I want.
And here I have the problem that I reported to Pedro earlier: Eclipse is unable to disassemble the code, show variables, etc.

I've noticed that GDB's response to "info threads" contains an error message. Below is the stack dump of this situation.
I suspect this response prevents Eclipse from working correctly.
What is that "T1"?
Just in case, here is the console log of Eclipse:

&"info threads\n"
Reply contains invalid hex digit 84
info threads
&"warning: RMT ERROR : failed to get remote thread list.\n"
warning: RMT ERROR : failed to get remote thread list.
Reply contains invalid hex digit 84
~"* 1 Thread <main>"
* 1 Thread <main>&"Reply contains invalid hex digit 84\n"
553^error,msg="Reply contains invalid hex digit 84"

Breakpoint 1, fromhex (a=84) at remote.c:3007
3007 error (_("Reply contains invalid hex digit %d"), a);
(gdb) bt
#0 fromhex (a=84) at remote.c:3007
#1 0x004ea0db in hex2bin (hex=0x1004ace8 "T1", bin=0x697701 "", count=1)
at remote.c:3023
#2 0x004ebe3a in remote_threads_extra_info (tp=0x1170cc60) at remote.c:2054
#3 0x0047c53f in print_thread_info (uiout=0x1006b278, requested_thread=-1)
at thread.c:537
#4 0x00402473 in execute_command (p=0x22c6bc "", from_tty=1) at top.c:466
#5 0x00411722 in catch_exception (uiout=0x1006b278,
func=0x48e3c0 <do_captured_execute_command>, func_args=0x22c6d8, mask=6)
at exceptions.c:463
#6 0x0048e466 in cli_interpreter_exec (data=0x0,
command_str=0x10ed21c0 "info threads") at .././gdb/cli/cli-interp.c:130
#7 0x0041ad3b in interp_exec (interp=0x1006b2e0,
command_str=0x10ed21c0 "info threads") at interps.c:325
#8 0x0048cf39 in mi_cmd_interpreter_exec (
command=0x64d5fb "-interpreter-exec", argv=0x22c7d8, argc=2)
at .././gdb/mi/mi-interp.c:209
#9 0x0050626a in captured_mi_execute_command (uiout=0x1006bc58,
data=0x11572ce0) at .././gdb/mi/mi-main.c:974
#10 0x00411722 in catch_exception (uiout=0x1006bc58,
func=0x505f30 <captured_mi_execute_command>, func_args=0x11572ce0, mask=6)
at exceptions.c:463
#11 0x00505d23 in mi_execute_command (cmd=0x1170ccb8 "551 info threads",
from_tty=1) at .././gdb/mi/mi-main.c:1026
#12 0x0048d019 in mi_execute_command_wrapper (
cmd=0x1170ccb8 "551 info threads") at .././gdb/mi/mi-interp.c:254
#13 0x0043703a in handle_file_event (event_file_desc=0) at event-loop.c:732
#14 0x00436aa2 in process_event () at event-loop.c:341
#15 0x00437385 in gdb_do_one_event (data=0x0) at event-loop.c:378
#16 0x0041196b in catch_errors (func=0x437200 <gdb_do_one_event>,
func_args=0x0, errstring=0x607b30 "", mask=6) at exceptions.c:509
#17 0x00436af4 in start_event_loop () at event-loop.c:404
#18 0x004010ab in captured_command_loop (data=0x0) at .././gdb/main.c:104
#19 0x0041196b in catch_errors (func=0x4010a0 <captured_command_loop>,
func_args=0x0, errstring=0x5f7139 "", mask=6) at exceptions.c:509
#20 0x00401c94 in captured_main (data=0x22cc40) at .././gdb/main.c:891
#21 0x0041196b in catch_errors (func=0x4010f0 <captured_main>,
func_args=0x22cc40, errstring=0x5f7139 "", mask=6) at exceptions.c:509
#22 0x00402153 in gdb_main (args=0x22cc40) at .././gdb/main.c:900
#23 0x0040109b in main (argc=6, argv=0x1002aee8) at gdb.c:33

Pedro Alves

2008-07-02 11:52:15 UTC

Hi Dmitry, thanks for taking the time to investigate deeper.

Post by Dmitry Smirnov
Hi,
Regarding the -exec-run problem, I've suspended the investigation: I've
found that 6.8.50.20080630 simply doesn't respond with "running", and this seems
reasonable. So perhaps the previous version was misbehaving and causing Eclipse
to behave the wrong way. It is still not clear, though, why the gdbServer CDT
debugger also fails. Postponed for now...

I don't know what to say about this failure. The command was already
failing before, but it was outputting a spurious "^running". Why Eclipse
is even trying to run a process with "target remote", I don't know.

Post by Dmitry Smirnov
I've found another Eclipse CDT debugger variant that can run the way I want.
And here I have the problem that I reported to Pedro earlier: Eclipse is
unable to disassemble the code, show variables, etc.
I've noticed that GDB's response to "info threads" contains an error message.
Below is the stack dump of this situation. I suspect this response
prevents Eclipse from working correctly.
What is that "T1"?

I don't know. It looks like a bug in the stub/simulator.
Do you have a way to activate remote protocol debugging?
"set debug remote 1" is the GDB command to use.
The stub embedded in the simulator you're using doesn't seem to
support any other thread related packets, and I would expect
it to reply "" to this packet too.
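Why the reply "T1" produces "invalid hex digit 84": a qThreadExtraInfo reply is expected to be hex-encoded ASCII, and 'T' is ASCII 84, so decoding fails on the very first character (the backtrace shows hex2bin called with hex="T1"). A minimal sketch of such a decoder; `hex_digit_value` and `hex_to_bytes` are hypothetical names, and GDB's own fromhex()/hex2bin() in remote.c raise an error instead of returning -1.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
int hex_digit_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes count bytes (2 * count hex digits) from hex into out.
   Returns count on success; on failure returns -1 and stores the
   offending character in *bad (the number GDB prints in its error). */
int hex_to_bytes(const char *hex, unsigned char *out, size_t count, int *bad)
{
    assert(hex != NULL && out != NULL && bad != NULL);
    for (size_t i = 0; i < count; i++) {
        int hi = hex_digit_value(hex[2 * i]);
        if (hi < 0) { *bad = hex[2 * i]; return -1; }
        int lo = hex_digit_value(hex[2 * i + 1]);
        if (lo < 0) { *bad = hex[2 * i + 1]; return -1; }
        out[i] = (unsigned char)(hi * 16 + lo);
    }
    return (int)count;
}
```

A well-formed reply such as "6d61696e" decodes to the text "main"; the stub's "T1" stops at 'T' (84).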

In the meantime, I think the attached patch (dont_query_internal_threads.diff,
untested other than building it) is sensible. Could you give it a try?

--
Pedro Alves

Dmitry Smirnov

2008-07-02 12:50:00 UTC

Hi Pedro,

Here is the log:

(gdb) set debug remote 1
(gdb) info threads
Sending packet: $qL1160000000000000000#55...Ack
Packet received:
warning: RMT ERROR : failed to get remote thread list.
Sending packet: $qThreadExtraInfo,ffffffff#b5...Ack
Packet received: T1
* 1 Thread <main>Reply contains invalid hex digit 84

I'm wondering why GDB is trying to get ThreadExtraInfo if the stub has responded that it does not support threads.
BTW, I didn't see this "Reply contains invalid hex digit 84" in older GDBs.

Does the stub HAVE to support it?

P.S. I'll test your patch a little bit later and come back with results.

Dmitry

Pedro Alves

2008-07-05 03:14:43 UTC

Post by Dmitry Smirnov
I'm wondering why GDB is trying to get ThreadExtraInfo if the stub has
responded that it does not support threads. BTW, I didn't see this "Reply
contains invalid hex digit 84" in older GDBs.

That's because versions of GDB until about a week ago didn't register
a main thread/task in the internal thread table if the target didn't
report thread support. Now we're always registering a main
thread even in that case. When you do "info threads", GDB queries
the target side for more info on each of the registered
threads. Since previously there was 0 threads registered, there
were 0 qThreadExtraInfo queries performed. Now, there's a thread, so
GDB core does 1 query. But, since this thread was added internally
by GDB behind the remote side's back, we need to bar it just
before letting the query out to the remote side. That's what
this new patch does.
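A minimal sketch of the idea behind this patch, assuming a hypothetical `internal` flag on each thread record; the actual patch (dont_query_internal_threads.diff) may identify such threads differently, and all names below are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thread record: 'internal' marks a thread that GDB added to
   its own table because the target never reported any threads. */
struct thread_rec { int num; bool internal; const char *extra; };

/* Counts packets that would have gone out to the remote stub. */
static int queries_sent = 0;

/* Stand-in for sending qThreadExtraInfo and reading the stub's reply. */
static const char *query_remote_extra_info(const struct thread_rec *tp)
{
    queries_sent++;
    return tp->extra;
}

/* Never ask the stub about a thread it has never told us about. */
const char *thread_extra_info(const struct thread_rec *tp)
{
    assert(tp != NULL);
    if (tp->internal)
        return NULL;              /* registered behind the stub's back */
    return query_remote_extra_info(tp);
}
```

With this check, "info threads" on a thread-less target lists the internally registered main thread without sending any qThreadExtraInfo packet.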

Post by Dmitry Smirnov
Does stub HAVE to support it?

No, it's optional. But if the stub doesn't support it, it MUST
reply with an empty response (an empty C string, as for all
unsupported packets). Failing to do that is what looks
like a bug in the simulator.
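To make "an empty response" concrete: remote protocol packets are framed as `$data#cs`, where `cs` is the sum of the data bytes modulo 256 written as two hex digits, so an empty reply goes over the wire as `$#00`. A minimal sketch; `frame_packet` and `stub_reply` are hypothetical names, and the stub answers only `?` here purely for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Frames a packet as "$<data>#<cs>", cs = sum of data bytes mod 256,
   printed as two lowercase hex digits. */
void frame_packet(const char *data, char *out, size_t len)
{
    unsigned int sum = 0;
    for (const char *p = data; *p != '\0'; p++)
        sum += (unsigned char)*p;
    snprintf(out, len, "$%s#%02x", data, sum % 256);
}

/* Hypothetical stub dispatcher: it implements only "?" here; anything
   else, including qThreadExtraInfo, gets the empty response. */
void stub_reply(const char *request, char *out, size_t len)
{
    assert(request != NULL);
    if (strcmp(request, "?") == 0)
        frame_packet("S05", out, len);   /* stopped with signal 5 (SIGTRAP) */
    else
        frame_packet("", out, len);      /* unsupported packet: "$#00" */
}
```

Both request packets in the `set debug remote 1` log above (`#55` for the qL packet, `#b5` for qThreadExtraInfo) check out against this checksum rule.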

Post by Dmitry Smirnov
P.S. I'll test your patch a little bit later and come back with results.

Thanks!

--
Pedro Alves

Dmitry Smirnov

2008-07-07 08:35:43 UTC

Didn't have much time to dig deeper.
It seems that the Eclipse CDT debugger is confused by the delayed "running" response.
CDT can connect to the remote debugger, retrieve the stack (it is absent at this point, in fact), retrieve variables and disassemble. But when I try to run the target ("-exec-continue"), GDB responds with nothing (i.e. no "running") until it hits the breakpoint. Since in my case it takes a significant time to get to the breakpoint, Eclipse decides that the target is not responding and either terminates ("gdbServer" CDT debugger) or behaves oddly ("Hardware" CDT debugger): it sends commands as usual but does not show the retrieved data (GDB itself responds correctly).

Post by Dmitry Smirnov
P.S. I'll test your patch a little bit later and come back with results.

Pedro Alves

2008-07-07 14:28:57 UTC

Post by Dmitry Smirnov
Didn't have much time to dig deeper.
It seems that the Eclipse CDT debugger is confused by the delayed "running" response.

Oh, sorry if I didn't make it clear. This patch was to fix the "info threads"
issue you found, not the "^running" problem.

Post by Dmitry Smirnov
CDT can connect to the remote debugger, retrieve the stack (it is
absent at this point, in fact), retrieve variables and disassemble. But when
I try to run the target ("-exec-continue"), GDB responds with nothing
(i.e. no "running") until it hits the breakpoint. Since in my case it takes
a significant time to get to the breakpoint, Eclipse decides that the target
is not responding and either terminates ("gdbServer" CDT debugger) or behaves
oddly ("Hardware" CDT debugger): it sends commands as usual but does not
show the retrieved data (GDB itself responds correctly).

I think you said earlier that when you switched to the other Eclipse setup you
had around, the one that got past the "^running" problem, you weren't even able
to inspect the stack or do any disassembling; Eclipse would get
stuck on the "info threads" command. I take it, since you
report you can now retrieve variables etc., that the "info threads"
issue is fixed? I'll post this patch for review at gdb-***@.

Thanks,

--
Pedro Alves

Dmitry Smirnov

2008-07-07 15:47:26 UTC

-----Original Message-----
From: Pedro Alves <***@codesourcery.com>
To: ***@sourceware.org, Dmitry Smirnov <***@mail.ru>
Date: Mon, 7 Jul 2008 15:28:57 +0100
Subject: Re: How to catch GDB crash

Post by Pedro Alves
I think you said earlier that when you switched to the other eclipse you
had around that got over the "^running" problem, you weren't even able
to inspect the stack or do any disassembling -- eclipse would get
stuck on the "info threads" command. I take it that since you
report you can now retrieve variables etc, that the "info threads"

Perhaps "info threads" is fixed. But since both the "info threads" and "running" problems were [most probably] caused by the "main thread registering" fix, maybe it is better to investigate the "^running" problem before submission? What if they are connected? ;-)

I have to say that my goal is not just to report issues; I would like to help fix them. Unfortunately, I do not have much time to learn GDB, so I'm just asking for hints: what can I do to discover the root cause? For example, what is responding with "^running"? Which functions/files should I debug to figure out the problem?

Dmitry

Pedro Alves

2008-07-07 16:00:37 UTC

Post by Dmitry Smirnov
Perhaps "info threads" is fixed. But since both the "info threads" and
"running" problems were [most probably] caused by the "main thread
registering" fix, maybe it is better to investigate the "^running" problem
before submission? What if they are connected? ;-)

I seriously doubt they are connected. The code to output "^running"
has nothing to do with having threads or not.

Post by Dmitry Smirnov
I have to say that my goal is not just to report issues; I would like to help
fix them.

Welcome on board! We need all the help we can get.

Post by Dmitry Smirnov
Unfortunately, I do not have much time to learn GDB, so I'm
just asking for hints: what can I do to discover the root cause.

The best way is to do a binary search on the CVS HEAD sources, to
find the patch that caused your issue.
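A binary search over snapshots can be sketched as below, assuming the snapshots are ordered oldest to newest, the oldest is known good and the newest known bad. `first_bad_snapshot` and the example predicate `bad_from` are hypothetical; in practice each probe means checking out, building and testing one GDB tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test hook: does snapshot number i reproduce the bug? */
typedef bool (*is_bad_fn)(size_t snapshot, void *ctx);

/* Snapshots are ordered oldest..newest; snapshot 0 must be good and
   snapshot n-1 bad. Returns the index of the first bad snapshot using
   about log2(n) probes. */
size_t first_bad_snapshot(size_t n, is_bad_fn is_bad, void *ctx)
{
    assert(n >= 2);
    size_t good = 0, bad = n - 1;   /* invariant: 'good' passes, 'bad' fails */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Example predicate for trying the search without a real build:
   every snapshot at or after *(size_t *)ctx is "bad". */
bool bad_from(size_t snapshot, void *ctx)
{
    return snapshot >= *(const size_t *)ctx;
}
```

With 100 snapshots this needs about 7 builds instead of up to 99.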

Post by Dmitry Smirnov
For
example, who is responding "^running"? What functions/files should I debug
to figure out the problem?

Grepping for "^running" should get you there.

See here; your issue was most likely introduced by this:
http://sourceware.org/ml/gdb-patches/2008-06/msg00247.html

--
Pedro Alves

Dmitry Smirnov

2008-07-08 08:27:20 UTC

Thanks,

Most likely I've found the root cause: mi_on_resume() does not flush raw_stdout. I've added "gdb_flush (raw_stdout)" at the end of this function and everything seems to work fine.
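To see why a missing flush can look like a missing response, here is a minimal POSIX C sketch. It assumes that raw_stdout ultimately writes through a fully buffered stdio stream when GDB's standard output is a pipe to a frontend such as Eclipse; the pipe, `demo_running_record` and `readable_now` are illustrative, not GDB code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

struct flush_result { ssize_t before_flush; ssize_t after_flush; };

/* Bytes a reader could get right now from a non-blocking pipe (0 if none). */
static ssize_t readable_now(int fd, char *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return n;
}

/* Writes an MI "*running" record through a fully buffered stdio stream
   into a pipe and reports what the reading end (the "frontend") sees
   before and after fflush(). */
struct flush_result demo_running_record(void)
{
    struct flush_result r = { -1, -1 };
    int fds[2];
    char buf[128];

    if (pipe(fds) != 0)
        return r;
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK);

    FILE *out = fdopen(fds[1], "w");        /* plays the role of raw_stdout */
    assert(out != NULL);
    setvbuf(out, NULL, _IOFBF, 4096);       /* full buffering, as for a pipe */

    fputs("*running,thread-id=\"all\"\n", out);
    r.before_flush = readable_now(fds[0], buf, sizeof buf);  /* still buffered */

    fflush(out);                            /* analogous to gdb_flush (raw_stdout) */
    r.after_flush = readable_now(fds[0], buf, sizeof buf);

    fclose(out);                            /* also closes fds[1] */
    close(fds[0]);
    return r;
}
```

Before the flush the reader sees nothing, even though the record was "printed"; after it, the full line arrives.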

Also, I've found a solution for the Eclipse problem where it was unable to work correctly after this "^running". To my mind it is not related to GDB (nor to the delayed "^running" :-) ), so I have no problems with GDB at this moment.

M-m-m, there is just one weird and minor observation: just after "target remote:12345" is executed, Eclipse shows two identical threads, Thread[0] and Thread[1], stopped at the same address. After I issue "-exec-continue" and GDB hits the breakpoint, Eclipse shows just one thread, Thread[1], with a correct call stack. Perhaps I will have some time to debug it, and I will get back if this is related to GDB.

Thanks!
Dmitry

Vladimir Prus

2008-07-01 11:37:35 UTC

Post by Pedro Alves

Post by Dmitry Smirnov
Hi Pedro,
I'll try to figure out whether skyeye (which is the remote target) supports
the notion of thread ids or pids. For now I just assume it does not.
Nevertheless, I do not believe this is related to a crash.

Yes it is. :-)

Post by Dmitry Smirnov
As I said before, I had been debugging this program (ARM code) for some
time.

But you've certainly upgraded your GDB recently (I can tell from the log
output in your original post). As I said, this is a recently introduced
regression.
I was able to reproduce the problem by connecting to a local
gdbserver with a GDB that had all thread support hacked out of the
remote target.

Post by Dmitry Smirnov
BTW, I've just realized that the command-line interface does not use the mi_*
functions (neither mi_on_resume nor mi_execute_command was hit), and this
is most likely the reason why I cannot reproduce this test case with the CLI.

Yes, that's exactly the reason.
Anyway, I've posted a patch that fixes the issue in your case
(it was actually a side effect of something else I was doing),
although we may need to get rid of the assert you were tripping
on for the time being (there are targets other than
remote that will also trip on the assert).
Vladimir, not sure if you noticed the issue, as it's buried in
this long thread? We can always leave the crash in place to
force targets to follow our evil plot of always registering the
main thread. :-)
I'd post a patch for it, but I don't know if we should output
thread-id=0 in that case, or not output thread-id
at all ...

I think that for the time being, we can change the assert to check that
either the program is single-threaded, or the thread is known. If the
program is single-threaded and the thread id is not registered, omitting
thread-id completely seems right.
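A minimal sketch of the relaxed check proposed above, assuming a hypothetical thread table and treating "single-threaded" as "at most one registered thread"; the names are illustrative, not GDB's, and the eventual fix may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct { int pid; long lwp; long tid; } ptid_t;

/* Hypothetical thread-table entry: the ptid plus the user-visible number. */
struct thread_info { ptid_t ptid; int num; };

static int ptid_equal(ptid_t a, ptid_t b)
{
    return a.pid == b.pid && a.lwp == b.lwp && a.tid == b.tid;
}

/* Hypothetical lookup: NULL when the ptid was never registered. */
static struct thread_info *
find_thread(struct thread_info *table, int count, ptid_t ptid)
{
    for (int i = 0; i < count; i++)
        if (ptid_equal(table[i].ptid, ptid))
            return &table[i];
    return NULL;
}

/* Formats the MI "*running" record into buf, omitting thread-id instead
   of asserting when a single-threaded program's thread is unregistered. */
void mi_on_resume_sketch(ptid_t ptid, struct thread_info *table, int count,
                         char *buf, size_t len)
{
    if (ptid.pid == -1) {
        snprintf(buf, len, "*running,thread-id=\"all\"\n");
        return;
    }
    struct thread_info *ti = find_thread(table, count, ptid);
    if (ti == NULL) {
        assert(count <= 1);                 /* only tolerated when single-threaded */
        snprintf(buf, len, "*running\n");   /* omit thread-id entirely */
    } else {
        snprintf(buf, len, "*running,thread-id=\"%d\"\n", ti->num);
    }
}
```

In the reported scenario (ptid {42000,0,0}, nothing registered) this emits a bare "*running" record instead of tripping the assert.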

I can make this change, or do you have something already?

- Volodya

Pedro Alves

2008-07-01 11:40:41 UTC

Post by Vladimir Prus
I think that for the time being, we can change the assert to check that
either the program is single-threaded, or the thread is known. If the
program is single-threaded and the thread id is not registered, omitting
thread-id completely seems right.
I can make this change, or do you have something already?

Please go ahead. Thanks.

--
Pedro Alves
