gdb very slow during 'step into'
dodji Seketeli
2007-01-02 15:23:52 UTC
Hello gdb hackers and users,

First of all, I would like to wish a happy new year to you and your families.
I hope this year will be full of achievements (again) for gdb people.

Now the real meat of my post :-)

I have noticed that gdb is very slow (using 100% of the CPU for
several tens of seconds) when I step into some functions (with the
step command, in the CLI).

I am not sure, but I think it happens when I step into functions that
are defined in certain shared libraries. Once in the function,
subsequent stepping into functions of the same library is okay. Once
out of the library, stepping into a function of that library can be
slow again.

The trouble is that I cannot reproduce it reliably.

I ran strace on gdb during one of these slow steps. You can find the
log at http://dodji.seketeli.free.fr/gdb/slow-step-into-trace.txt.

The debugged language is C++. I have noticed the problem with gdb
6.4.90, 6.5 and 6.6.
My system is Debian testing.

Is this information useful for investigating the problem? If so, I
can file a bug in the gnatsweb application if you wish. If not, please
tell me what else I can provide to help you investigate.

Cheers,

Dodji.
Jim Blandy
2007-01-02 19:30:16 UTC
Post by dodji Seketeli
Hello gdb hackers and users,
First of all, I would like to wish a happy new year to you and your families.
I hope this year will be full of achievements (again) for gdb people.
Now the real meat of my post :-)
I have noticed that gdb is very slow (using 100% of the CPU for
several tens of seconds) when I step into some functions (with the
step command, in the CLI).
I am not sure, but I think it happens when I step into functions that
are defined in certain shared libraries. Once in the function,
subsequent stepping into functions of the same library is okay. Once
out of the library, stepping into a function of that library can be
slow again.
If you set the environment variable LD_BIND_NOW to a non-empty value
before running your program (use GDB's 'set env' command), does that
eliminate the slow steps?
Daniel Jacobowitz
2007-01-02 19:30:49 UTC
Post by Jim Blandy
If you set the environment variable LD_BIND_NOW to a non-empty value
before running your program (use GDB's 'set env' command), does that
eliminate the slow steps?
Is this where we step through the dynamic linker? We really should
avoid that...
--
Daniel Jacobowitz
CodeSourcery
Jim Blandy
2007-01-02 19:48:35 UTC
Post by Daniel Jacobowitz
Post by Jim Blandy
If you set the environment variable LD_BIND_NOW to a non-empty value
before running your program (use GDB's 'set env' command), does that
eliminate the slow steps?
Is this where we step through the dynamic linker? We really should
avoid that...
I'm pretty sure we set a breakpoint at the function's true entry point
(since we know it too), and wait for that to hit. I believe I made
that change myself years ago. But maybe something broke.
Mark Kettenis
2007-01-02 21:36:40 UTC
Date: Tue, 02 Jan 2007 11:48:35 -0800
Post by Daniel Jacobowitz
Post by Jim Blandy
If you set the environment variable LD_BIND_NOW to a non-empty value
before running your program (use GDB's 'set env' command), does that
eliminate the slow steps?
Is this where we step through the dynamic linker? We really should
avoid that...
I'm pretty sure we set a breakpoint at the function's true entry point
(since we know it too), and wait for that to hit. I believe I made
that change myself years ago. But maybe something broke.
Quite likely. The code in glibc-tdep.c does rely on some knowledge
about the internals of the implementation. The glibc developers have
been quite aggressive about not exporting symbols in the recent past.
So it might very well be that the lookup for "_dl_runtime_resolve" or
"fixup" fails, especially on a system without debug info for glibc.

Mark
Andreas Schwab
2007-01-02 22:59:29 UTC
Post by Mark Kettenis
So it might very well be that the lookup for "_dl_runtime_resolve" or
"fixup" fails, especially on a system without debug info for glibc.
In current glibc versions the function is called _dl_fixup anyway.

Andreas.
--
Andreas Schwab, SuSE Labs, ***@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Jim Blandy
2007-01-02 23:45:44 UTC
Post by Andreas Schwab
Post by Mark Kettenis
So it might very well be that the lookup for "_dl_runtime_resolve" or
"fixup" fails, especially on a system without debug info for glibc.
In current glibc versions the function is called _dl_fixup anyway.
I think those are two different things:

$ nm /lib/ld-linux.so.2 | grep _dl_fixup
0087ea30 t _dl_fixup
$ nm /lib/ld-linux.so.2 | grep _dl_runtime_resolve
00883ec0 t _dl_runtime_resolve

(Fedora Core 6)
Daniel Jacobowitz
2007-01-02 23:56:55 UTC
Post by Jim Blandy
Post by Andreas Schwab
Post by Mark Kettenis
So it might very well be that the lookup for "_dl_runtime_resolve" or
"fixup" fails, especially on a system without debug info for glibc.
In current glibc versions the function is called _dl_fixup anyway.
$ nm /lib/ld-linux.so.2 | grep _dl_fixup
0087ea30 t _dl_fixup
$ nm /lib/ld-linux.so.2 | grep _dl_runtime_resolve
00883ec0 t _dl_runtime_resolve
He's talking about fixup, not about _dl_runtime_resolve. They're
different things, IIRC.
--
Daniel Jacobowitz
CodeSourcery
Andreas Schwab
2007-01-03 15:03:31 UTC
Post by Daniel Jacobowitz
Post by Jim Blandy
Post by Andreas Schwab
Post by Mark Kettenis
So it might very well be that the lookup for "_dl_runtime_resolve" or
"fixup" fails, especially on a system without debug info for glibc.
In current glibc versions the function is called _dl_fixup anyway.
$ nm /lib/ld-linux.so.2 | grep _dl_fixup
0087ea30 t _dl_fixup
$ nm /lib/ld-linux.so.2 | grep _dl_runtime_resolve
00883ec0 t _dl_runtime_resolve
He's talking about fixup, not about _dl_runtime_resolve. They're
different things, IIRC.
_dl_runtime_resolve is just a wrapper around _dl_fixup. The latter has
been renamed because the wrapper has been moved to its own source file.

Andreas.
dodji Seketeli
2007-01-03 09:14:22 UTC
Post by Jim Blandy
If you set the environment variable LD_BIND_NOW to a non-empty value
before running your program (use GDB's 'set env' command), does that
eliminate the slow steps?
Wow, yes. It does eliminate the problem. Thank you!

Cheers,

Dodji.
Jim Blandy
2007-01-03 19:52:25 UTC
Post by dodji Seketeli
Post by Jim Blandy
If you set the environment variable LD_BIND_NOW to a non-empty value
before running your program (use GDB's 'set env' command), does that
eliminate the slow steps?
Wow, yes. It does eliminate the problem. Thank you!
I'm glad that helped! But setting LD_BIND_NOW is just a workaround;
GDB ought to work properly without that. Could you apply the
following patch to GDB and see if it makes the problem go away, even
with LD_BIND_NOW left unset?

(Tested without regressions on Fedora Core 6 IA-32. I haven't been
able to reproduce the problem myself, so I'm just guessing that this
is the patch.)
Smith, Stephen (SWCOE)
2007-01-03 19:59:12 UTC
Sounds good, but I am experiencing the same problem. The trouble is
that on my platform, neither symbol is defined (I am running on an
embedded platform that does not have glibc ported to it.) What should
be the generic fix?

Jim Blandy
2007-01-03 21:19:49 UTC
Post by Smith, Stephen (SWCOE)
Sounds good, but I am experiencing the same problem. The trouble is
that on my platform, neither symbol is defined (I am running on an
embedded platform that does not have glibc ported to it.) What should
be the generic fix?
It looks as if you need to supply some appropriate function for
gdbarch_skip_solib_resolver. See the comments for SKIP_SOLIB_RESOLVER
at the top of infrun.c.
Andreas Schwab
2007-01-03 20:59:17 UTC
@@ -90,8 +90,14 @@
if (resolver)
{
+ /* This is the name used in the dynamic linker at the beginning
+ of 2007. */
struct minimal_symbol *fixup
- = lookup_minimal_symbol ("fixup", NULL, objfile);
+ = lookup_minimal_symbol ("_dl_fixup", NULL, objfile);
JFTR, the function was renamed (almost exactly) 2 years ago.

Andreas.
Jim Blandy
2007-01-03 21:17:42 UTC
Post by Andreas Schwab
@@ -90,8 +90,14 @@
if (resolver)
{
+ /* This is the name used in the dynamic linker at the beginning
+ of 2007. */
struct minimal_symbol *fixup
- = lookup_minimal_symbol ("fixup", NULL, objfile);
+ = lookup_minimal_symbol ("_dl_fixup", NULL, objfile);
JFTR, the function was renamed (almost exactly) 2 years ago.
Great --- I'll update the comment, if this fix actually works (or if
it looks right enough to others that they suggest just putting it in
anyway).
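
For illustration only, here is a minimal standalone sketch of the idea being discussed: look for the newer name "_dl_fixup" first and fall back to the older "fixup". It does not use GDB's real lookup_minimal_symbol; the toy_symbol table and both helper functions are hypothetical stand-ins, and the thread does not show whether the actual patch keeps such a fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one symbol table entry; not a GDB type.  */
struct toy_symbol
{
  const char *name;
  unsigned long address;
};

/* Return the entry named NAME in TABLE (COUNT entries), or NULL.  */
const struct toy_symbol *
toy_lookup (const struct toy_symbol *table, size_t count, const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < count; i++)
    if (strcmp (table[i].name, name) == 0)
      return &table[i];
  return NULL;
}

/* Try each candidate name in order and return the first match, or NULL
   if none is present (for example because the file is stripped).  */
const struct toy_symbol *
find_first_symbol (const struct toy_symbol *table, size_t count,
                   const char *const *candidates, size_t n_candidates)
{
  for (size_t i = 0; i < n_candidates; i++)
    {
      const struct toy_symbol *sym = toy_lookup (table, count, candidates[i]);
      if (sym != NULL)
        return sym;
    }
  return NULL;
}
```

A caller would pass the candidate list { "_dl_fixup", "fixup" } and treat a NULL result as "cannot skip the resolver quickly".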
dodji Seketeli
2007-01-03 21:58:04 UTC
Hello Jim,
Post by Jim Blandy
Could you apply the following patch to GDB and see if it makes the
problem go away, even with LD_BIND_NOW left unset?
(Tested without regressions on Fedora Core 6 IA-32. I haven't been
able to reproduce the problem myself, so I'm just guessing that this
is the patch.)
I have applied the patch and it doesn't help, unfortunately.

I have tried to add some printf() calls around the patch to see if I
could trace anything, but nothing got printed on stdout. Is that normal?
If yes, how can I add logging to understand what is going on?

Here is the content of the dynamic symbol table of my /lib/ld-linux.so:

***@tintin:~$ objdump -T /lib/ld-linux.so.2 | grep dl
0000e6c0 g DF .text 00000117 GLIBC_PRIVATE _dl_make_stack_executable
0000d9c0 g DF .text 00000058 GLIBC_PRIVATE _dl_deallocate_tls
0000d990 g DF .text 00000022 GLIBC_PRIVATE _dl_get_tls_static_info
0000c010 g DF .text 00000005 GLIBC_PRIVATE _dl_debug_state
00015ca0 g DO .data.rel.ro 00000004 GLIBC_PRIVATE _dl_argv
0000df40 g DF .text 0000020a GLIBC_PRIVATE _dl_allocate_tls_init
0000e480 g DF .text 000000cb GLIBC_PRIVATE _dl_tls_setup
00006460 g DF .text 00000195 GLIBC_PRIVATE _dl_rtld_di_serinfo
00011b99 g DO .rodata 0000000e GLIBC_PRIVATE _dl_out_of_memory
0000cec0 g DF .text 0000022d GLIBC_2.1 _dl_mcount
0000e240 g DF .text 00000027 GLIBC_PRIVATE _dl_allocate_tls

I don't see any _dl_fixup symbol in there, but maybe that is not relevant.
Also, /lib/ld-linux.so.2 is stripped on my system, so nm doesn't
show anything.

Here is my version of GNU ld:

***@tintin:~$ ld -v
GNU ld version 2.17 Debian GNU/Linux .

Cheers,

Dodji.
Jim Blandy
2007-01-03 23:37:53 UTC
Post by dodji Seketeli
I have applied the patch and it doesn't help, unfortunately.
Rats.
Post by dodji Seketeli
I have tried to add some printf() calls around the patch to see if I
could trace anything, but nothing got printed on stdout. Is that normal?
If yes, how can I add logging to understand what is going on?
No; ordinary fprintf (stderr, ...) should work in GDB. You're using
GDB's command line interface, right? If that function isn't getting
called at all, then that may be part of the problem.

First, what kind of system are you using? You've mentioned Debian,
and binutils 2.17, but what architecture is it?

Could you set a breakpoint (or put an fprintf %p) in
set_gdbarch_skip_solib_resolver, and see what function gets passed
there, if anything? It should be glibc_skip_solib_resolver.
Post by dodji Seketeli
Here is the content of the dynamic symbol table of my
0000e6c0 g DF .text 00000117 GLIBC_PRIVATE _dl_make_stack_executable
0000d9c0 g DF .text 00000058 GLIBC_PRIVATE _dl_deallocate_tls
0000d990 g DF .text 00000022 GLIBC_PRIVATE _dl_get_tls_static_info
0000c010 g DF .text 00000005 GLIBC_PRIVATE _dl_debug_state
00015ca0 g DO .data.rel.ro 00000004 GLIBC_PRIVATE _dl_argv
0000df40 g DF .text 0000020a GLIBC_PRIVATE _dl_allocate_tls_init
0000e480 g DF .text 000000cb GLIBC_PRIVATE _dl_tls_setup
00006460 g DF .text 00000195 GLIBC_PRIVATE _dl_rtld_di_serinfo
00011b99 g DO .rodata 0000000e GLIBC_PRIVATE _dl_out_of_memory
0000cec0 g DF .text 0000022d GLIBC_2.1 _dl_mcount
0000e240 g DF .text 00000027 GLIBC_PRIVATE _dl_allocate_tls
I don't see any _dl_fixup symbol in there, but maybe that is not relevant.
Also, /lib/ld-linux.so.2 is stripped on my system, so nm doesn't
show anything.
Is /lib/ld-linux.so.2 normally installed stripped on your system? If
GDB can't find the address of the 'fixup' function, then it can't do
anything but single-step through the dynamic linker as it looks up the
symbol, which is where we're guessing you're spending your time.
Daniel Jacobowitz
2007-01-03 23:42:26 UTC
Post by Jim Blandy
Is /lib/ld-linux.so.2 normally installed stripped on your system? If
GDB can't find the address of the 'fixup' function, then it can't do
anything but single-step through the dynamic linker as it looks up the
symbol, which is where we're guessing you're spending your time.
Yes, generally ld.so is installed stripped on Debian.
--
Daniel Jacobowitz
CodeSourcery
Jim Blandy
2007-01-04 00:01:12 UTC
Post by Daniel Jacobowitz
Post by Jim Blandy
Is /lib/ld-linux.so.2 normally installed stripped on your system? If
GDB can't find the address of the 'fixup' function, then it can't do
anything but single-step through the dynamic linker as it looks up the
symbol, which is where we're guessing you're spending your time.
Yes, generally ld.so is installed stripped on Debian.
Hmm. That kind of scuttles that strategy for skipping the resolver
quickly, then, doesn't it. Unless someone has a better idea, I guess
setting LD_BIND_NOW is the best solution.
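
To check whether a given ld.so still carries the full symbol table this strategy depends on, something like the following sketch could be used. The earlier nm output marks _dl_fixup and _dl_runtime_resolve with a lowercase 't' (local symbols), so they are not in the dynamic symbol table that objdump -T prints and disappear once the file is stripped. The function name elf_has_symtab is made up for this example; it assumes a Linux system with <elf.h>, a file in the host's byte order, and fewer than 0xff00 sections.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Scan the section headers of a 32-bit ELF file for SHT_SYMTAB.
   Returns 1 if found, 0 if not, -1 on a read or format error.  */
static int
scan_sections32 (FILE *f)
{
  Elf32_Ehdr eh;
  Elf32_Shdr sh;

  if (fseek (f, 0L, SEEK_SET) != 0 || fread (&eh, sizeof eh, 1, f) != 1)
    return -1;
  if (eh.e_shnum == 0)
    return 0;
  if (eh.e_shentsize != sizeof sh)
    return -1;
  for (unsigned int i = 0; i < eh.e_shnum; i++)
    {
      long offset = (long) eh.e_shoff + (long) i * (long) sizeof sh;
      if (fseek (f, offset, SEEK_SET) != 0 || fread (&sh, sizeof sh, 1, f) != 1)
        return -1;
      if (sh.sh_type == SHT_SYMTAB)
        return 1;
    }
  return 0;
}

/* Same scan for a 64-bit ELF file.  */
static int
scan_sections64 (FILE *f)
{
  Elf64_Ehdr eh;
  Elf64_Shdr sh;

  if (fseek (f, 0L, SEEK_SET) != 0 || fread (&eh, sizeof eh, 1, f) != 1)
    return -1;
  if (eh.e_shnum == 0)
    return 0;
  if (eh.e_shentsize != sizeof sh)
    return -1;
  for (unsigned int i = 0; i < eh.e_shnum; i++)
    {
      long offset = (long) eh.e_shoff + (long) i * (long) sizeof sh;
      if (fseek (f, offset, SEEK_SET) != 0 || fread (&sh, sizeof sh, 1, f) != 1)
        return -1;
      if (sh.sh_type == SHT_SYMTAB)
        return 1;
    }
  return 0;
}

/* Return 1 if the ELF file at PATH has a full symbol table (.symtab),
   0 if it has none (stripped), -1 if it cannot be read or is not ELF.  */
int
elf_has_symtab (const char *path)
{
  unsigned char ident[EI_NIDENT];
  int result = -1;
  FILE *f;

  assert (path != NULL);
  f = fopen (path, "rb");
  if (f == NULL)
    return -1;
  if (fread (ident, sizeof ident, 1, f) == 1
      && memcmp (ident, ELFMAG, SELFMAG) == 0)
    {
      if (ident[EI_CLASS] == ELFCLASS32)
        result = scan_sections32 (f);
      else if (ident[EI_CLASS] == ELFCLASS64)
        result = scan_sections64 (f);
    }
  fclose (f);
  return result;
}
```

On a system where ld.so is installed stripped, elf_has_symtab ("/lib/ld-linux.so.2") would presumably return 0, which is the case where GDB has no address for the fixup function.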
dodji Seketeli
2007-01-04 10:04:17 UTC
Post by Jim Blandy
Post by dodji Seketeli
I have tried to add some printf() calls around the patch to see if I
could trace anything, but nothing got printed on stdout. Is that normal?
If yes, how can I add logging to understand what is going on?
No; ordinary fprintf (stderr, ...) should work in GDB.
Okay, I was foolishly doing printf(). My fault.
fprintf(stderr,...) does work.
Post by Jim Blandy
You're using GDB's command line interface, right? If that function
isn't getting called at all, then that may be part of the problem.
It is actually getting called.
This code fails:

struct minimal_symbol *resolver
= find_minsym_and_objfile ("_dl_runtime_resolve", &objfile);

So the _dl_runtime_resolve symbol is not found.
Post by Jim Blandy
First, what kind of system are you using? You've mentioned Debian,
and binutils 2.17, but what architecture is it?
x86.
Post by Jim Blandy
Is /lib/ld-linux.so.2 normally installed stripped on your system?
Yes.


Cheers,

Dodji.
Daniel Jacobowitz
2007-01-04 03:28:23 UTC
Post by dodji Seketeli
Hello Jim,
Could you apply the following patch to GDB and see if it makes the problem go away,
even with LD_BIND_NOW left unset?
(Tested without regressions on Fedora Core 6 IA-32. I haven't been
able to reproduce the problem myself, so I'm just guessing that this
is the patch.)
I have applied the patch and it doesn't help, unfortunately.
You said that this is a Debian system, right? Could you try using the
system GDB, and making sure that the libc6-dbg package is installed?

That adds separated debugging symbols for libc, including ld.so. It
may help. If so, you can make GDBs you build yourself use them too
by configuring with --prefix=/usr. You don't need to install them
there; it just sets the default for the debug-file-directory variable.
--
Daniel Jacobowitz
CodeSourcery
dodji Seketeli
2007-01-04 10:10:29 UTC
Post by Daniel Jacobowitz
You said that this is a Debian system, right? Could you try using the
system GDB, and making sure that the libc6-dbg package is installed?
I have installed the libc6-dbg package and it does solve the problem.
So this explains why the issue appears only on Debian-based distros.

I am actually writing a gdb front end (yeah, another one) so I think I
will stick to the LD_BIND_NOW solution because I cannot force users to
install the libc6-dbg package. For the record, the front end I am
writing is http://home.gna.org/nemiver.

Maybe I should file a bug with Debian asking if they could make
libc6-dbg a dependency of gdb? Does that make sense?

Thank you very much.

Dodji.
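
As a rough sketch of how a front end might apply the LD_BIND_NOW workaround, the helper below builds the 'set environment' command suggested earlier in the thread and writes it to gdb. The function names are invented for this example, and it assumes the front end holds a pipe to gdb's stdin, that gdb accepts CLI commands on it, and that the command is sent before 'run'.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write into BUF the GDB CLI command that sets NAME to VALUE in the
   environment of the program being debugged.  Returns 0 on success,
   -1 if NAME is unusable or the command does not fit in SIZE bytes.  */
int
format_set_env_command (char *buf, size_t size, const char *name,
                        const char *value)
{
  int n;

  assert (buf != NULL && name != NULL && value != NULL);
  /* Reject empty names and names GDB would split at a space or '='.  */
  if (name[0] == '\0' || strpbrk (name, " =\t\n") != NULL)
    return -1;
  n = snprintf (buf, size, "set environment %s %s\n", name, value);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Send "set environment LD_BIND_NOW 1" on STREAM, assumed to be a pipe
   connected to gdb's stdin.  Returns 0 on success, -1 on failure.  */
int
send_bind_now (FILE *stream)
{
  char cmd[64];

  if (format_set_env_command (cmd, sizeof cmd, "LD_BIND_NOW", "1") != 0)
    return -1;
  if (fputs (cmd, stream) == EOF)
    return -1;
  return fflush (stream) == 0 ? 0 : -1;
}
```

Sending the command to gdb, rather than exporting the variable in the front end's own environment, keeps the setting limited to the debugged program.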
Daniel Jacobowitz
2007-01-04 13:52:58 UTC
Post by dodji Seketeli
Maybe I should file a bug with Debian asking if they could make
libc6-dbg a dependency of gdb? Does that make sense?
We don't because it's too big, but maybe we can make a few changes in
the next release.
--
Daniel Jacobowitz
CodeSourcery