Discussion:
recursion limit exceeded in Python API, but there's only one function in traceback
Ömer Sinan Ağacan
2014-10-16 10:45:52 UTC
Hi all,

I'm setting some breakpoints and then running some actions when the
program reaches those points, using the Python API.

After a few breaks, GDB starts printing these lines:

Traceback (most recent call last):
  File "/home/omer/gdb_script/script.py", line 71, in handle_breakpoint
    self.breakpoint_jump_addrs[bp.location].add(addr)
RuntimeError: maximum recursion depth exceeded

The weird thing about this is that it says "maximum recursion depth
exceeded" but there's only one function in the traceback. This doesn't
make sense. I think something else is going wrong.

One detail about this script is that it does a lot of bookkeeping,
such as collecting information when GDB stops at a breakpoint. This
information is held in memory, as Python data structures. I'm
wondering whether this may be the cause of the error, i.e. maybe the
memory allocated by GDB for the Python interpreter is full or something
like that.

Any ideas about this?

Thanks.
Phil Muldoon
2014-10-16 12:45:51 UTC
Post by Ömer Sinan Ağacan
Hi all,
I'm setting some breakpoints and then running some actions when the
program reaches those points, using the Python API.
File "/home/omer/gdb_script/script.py", line 71, in handle_breakpoint
self.breakpoint_jump_addrs[bp.location].add(addr)
RuntimeError: maximum recursion depth exceeded
That's an error from Python. It tells me something in script.py is
not quite right. Impossible to tell without seeing script.py in
general. You can increase the recursion depth by doing something like

import sys
sys.setrecursionlimit(9000)

Where '9000' is a limit you can set and vary. However, this might
indeed be papering over the cracks, and not fixing the fault.

Cheers

Phil
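
A minimal sketch of why raising the limit only moves the ceiling, assuming the recursion is unbounded (as it turns out to be later in this thread). The names `recurse_forever` and `hits_limit` are made up for illustration, and the code is written for Python 3, where RecursionError is a subclass of RuntimeError:

```python
import sys

def recurse_forever(depth=0):
    # Stand-in for a handler that keeps re-entering itself.
    return recurse_forever(depth + 1)

def hits_limit(limit):
    """Return True if unbounded recursion still fails under `limit`."""
    old = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        recurse_forever()
    except RuntimeError:  # RecursionError subclasses RuntimeError on Python 3.5+
        return True
    finally:
        sys.setrecursionlimit(old)  # restore the previous limit
    return False

print(hits_limit(500), hits_limit(2000))
```

With unbounded recursion both calls fail the same way; a larger limit only delays the error.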
P***@Dell.com
2014-10-16 14:28:39 UTC
Post by Phil Muldoon
Post by Ömer Sinan Ağacan
Hi all,
I'm setting some breakpoints and then running some actions when the
program reaches those points, using the Python API.
File "/home/omer/gdb_script/script.py", line 71, in handle_breakpoint
self.breakpoint_jump_addrs[bp.location].add(addr)
RuntimeError: maximum recursion depth exceeded
That's an error from Python. It tells me something in script.py is
not quite right. Impossible to tell without seeing script.py in
general. You can increase the recursion depth by doing something like
import sys
sys.setrecursionlimit(9000)
Where '9000' is a limit you can set and vary. However, this might
indeed be papering over the cracks, and not fixing the fault.
I would expect that you could wrap the script in a try/except block, to catch the stack overflow and print a Python stack trace when that happens.

Should GDB print a Python backtrace when the Python script fails, just as executing a Python script standalone would do?
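
A rough sketch of the try/except idea, runnable outside GDB. It assumes the goal is to see the traceback of the exception itself; `run_guarded` and `failing_action` are hypothetical names, and the raised error is only a stand-in for the real failure:

```python
import traceback

def run_guarded(action):
    """Run `action`; return the formatted traceback if it raises, else None."""
    try:
        action()
    except Exception:
        # format_exc() describes the exception being handled, unlike
        # print_stack(), which dumps the call stack at the point of the call.
        return traceback.format_exc()
    return None

def failing_action():
    # Stand-in for the code that fails inside the script.
    raise RuntimeError("maximum recursion depth exceeded")

report = run_guarded(failing_action)
print(report)
```

Inside a GDB script the same pattern would wrap the body of the handler; whether it reveals more than GDB's own backtrace depends on where the exception is raised.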
Ömer Sinan Ağacan
2014-10-16 14:55:50 UTC
Thanks for the tips. I managed to blow the stack using a minimal script.
Here's the code:

import gdb
import traceback

def handler(ev):
    try:
        print "handling a stop"
        gdb.execute("si")
        gdb.execute("c")
    except:
        traceback.print_stack()

gdb.events.stop.connect(handler)

Output:

[...snip...]
  File "/home/omer/gdb_script/stackoverflow.py", line 10, in handler
    gdb.execute("si")
  File "/home/omer/gdb_script/stackoverflow.py", line 10, in handler
    gdb.execute("si")
  File "/home/omer/gdb_script/stackoverflow.py", line 10, in handler
    gdb.execute("si")
  File "/home/omer/gdb_script/stackoverflow.py", line 10, in handler
    gdb.execute("si")
  File "/home/omer/gdb_script/stackoverflow.py", line 10, in handler
    gdb.execute("si")
  File "/home/omer/gdb_script/stackoverflow.py", line 10, in handler
    gdb.execute("si")
  File "/home/omer/gdb_script/stackoverflow.py", line 10, in handler
    gdb.execute("si")
  File "/home/omer/gdb_script/stackoverflow.py", line 10, in handler
    gdb.execute("si")
  File "/home/omer/gdb_script/stackoverflow.py", line 10, in handler
    gdb.execute("si")
  File "/home/omer/gdb_script/stackoverflow.py", line 10, in handler
    gdb.execute("si")
  File "/home/omer/gdb_script/stackoverflow.py", line 10, in handler
    gdb.execute("si")
  File "/home/omer/gdb_script/stackoverflow.py", line 10, in handler
    gdb.execute("si")
[...snip...]

I don't understand why I'm getting nested `gdb.execute("si")` calls
here. Does `gdb.execute("si")` call itself? Otherwise, how can I have
this stack?

Any ideas?
P***@Dell.com
2014-10-16 15:03:24 UTC
Is that the handler for a breakpoint? Does the completion of the “si” command invoke the breakpoint handler? If yes, that’s your answer.

paul
Post by Ömer Sinan Ağacan
Thanks for tips. I managed to blow the stack using a minimal script.
import gdb
import traceback
print "handling a stop"
gdb.execute("si")
gdb.execute("c")
traceback.print_stack()
gdb.events.stop.connect(handler)
[...snip..
Ömer Sinan Ağacan
2014-10-16 15:14:50 UTC
Post by P***@Dell.com
Is that the handler for a breakpoint? Does the completion of the “si” command invoke the breakpoint handler? If yes, that’s your answer.
paul
Interesting, but I don't think that's causing the problem. I changed
the script to:

import gdb
import traceback

def handler(ev):
    try:
        print "handling a stop"
        gdb.execute("stepi")
        gdb.execute("continue")
    except:
        traceback.print_stack()

gdb.events.stop.connect(handler)

When I first attach to the process, I'm getting:

[..snip..]
handling a stop
0x080eecea in UpdateInput() ()
handling a stop
0x080eecef in UpdateInput() ()
handling a stop
0x080eece0 in UpdateInput() ()
handling a stop
0x080eece7 in UpdateInput() ()
Traceback (most recent call last):
  File "/home/omer/gdb_script/stackoverflow.py", line 10, in handler
    traceback.print_stack()
  File "/usr/lib64/python2.7/traceback.py", line 269, in print_stack
    print_list(extract_stack(f, limit), file)
  File "/usr/lib64/python2.7/traceback.py", line 304, in extract_stack
    linecache.checkcache(filename)
RuntimeError: maximum recursion depth exceeded

Again a weird "recursion error" with just 3 stack frames.

When I ignore that and continue with `c`, it fails with this:

[.. snip ..]
  File "/home/omer/gdb_script/stackoverflow.py", line 7, in handler
    gdb.execute("stepi")
  File "/home/omer/gdb_script/stackoverflow.py", line 7, in handler
    gdb.execute("stepi")
  File "/home/omer/gdb_script/stackoverflow.py", line 10, in handler
    traceback.print_stack()
  File "/home/omer/gdb_script/stackoverflow.py", line 7, in handler
    gdb.execute("stepi")
[.. snip ..]

There are thousands of identical lines like this.
Ömer Sinan Ağacan
2014-10-16 15:18:05 UTC
Post by P***@Dell.com
Is that the handler for a breakpoint? Does the completion of the “si” command invoke the breakpoint handler? If yes, that’s your answer.
I see what you mean now. I think you're right...
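
The re-entrancy can be reproduced without GDB. The following is a toy model, not GDB's implementation: `FakeDebugger` is a hypothetical class whose `execute` reports a stop to the registered handlers before it returns, which is the behaviour Paul describes for "si". The `max_stops` bound exists only so the demo terminates.

```python
class FakeDebugger:
    """Toy model: every resume command ends in a stop that notifies handlers."""

    def __init__(self, max_stops):
        self.handlers = []
        self.stops_left = max_stops  # a real inferior has no such bound
        self.depth = 0               # handlers currently on the stack
        self.max_depth = 0

    def execute(self, command):
        if self.stops_left == 0:
            return
        self.stops_left -= 1
        # The stop is reported before execute() returns to its caller.
        for handler in self.handlers:
            handler(self)

def handler(dbg):
    dbg.depth += 1
    dbg.max_depth = max(dbg.max_depth, dbg.depth)
    dbg.execute("stepi")  # re-enters handler() before returning
    dbg.depth -= 1

dbg = FakeDebugger(max_stops=50)
dbg.handlers.append(handler)
dbg.execute("continue")
print(dbg.max_depth)  # prints 50: one nested handler call per stop
```

Every stop adds a handler frame that cannot return until all later stops have been handled, so an unbounded run eventually reaches the recursion limit.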
Ömer Sinan Ağacan
2014-10-17 09:30:47 UTC
I'm still having this problem. I just tried this:

def handler(ev):
    gdb.execute("continue")
    print "continue returned"

This doesn't print anything, until the script fails with "maximum
recursion depth". Then it prints lots of "continue returned" lines.

So the problem is that `gdb.execute` doesn't return immediately, and
that's causing the Python stack to grow: GDB calls the handler again
before the previous calls have returned.

I think I need a version of `gdb.execute` that returns immediately.
ie. async version or something like that. Is such a thing possible?

Thanks again.
Phil Muldoon
2014-10-17 10:11:05 UTC
Post by Ömer Sinan Ağacan
gdb.execute("continue")
print "continue returned"
This doesn't print anything, until the script fails with "maximum
recursion depth". Then it prints lots of "continue returned" lines.
So the problem is that `gdb.execute` doesn't return immediately, and
that's causing the Python stack to grow: GDB calls the handler again
before the previous calls have returned.
I think I need a version of `gdb.execute` that returns immediately.
ie. async version or something like that. Is such a thing possible?
Thanks again.
Right. gdb.execute won't return until the command has completed.
Also the Python GIL has been acquired (as this is coming from the
Python interpreter) and so now Python is also blocked too. So in
effect the only thing running at this point is the gdb.execute command
that was invoked (in your case, the continue command). That will
return, and then the Python GIL will be released and the rest of the
script will continue.

I have a patch I need to upstream that adds a release_gil keyword to
gdb.execute. This optionally releases the GIL before executing the
command. But I have not got around to that yet.

A workaround would be to post any gdb.execute statements into the GDB
event loop. See gdb.post_event. That will return immediately and the
gdb.execute function will be scheduled to be called in the event loop.
Note there is no guarantee when this is. But as long as GDB is not
busy processing other events it usually means right away.

I'll work on posting that GIL patch soon.

Cheers

Phil
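
A minimal sketch of the gdb.post_event workaround, assuming the handler only needs to resume the inferior. `make_stop_handler` and `run_commands` are hypothetical names; `gdb.post_event`, `gdb.execute` and `gdb.events.stop.connect` are the GDB Python API calls mentioned in this thread. The import is guarded so the file can also be loaded outside GDB, and the sketch has not been verified against any particular GDB version.

```python
def make_stop_handler(post_event, execute, commands=("continue",)):
    """Return a stop handler that defers `commands` instead of running them inline."""

    def run_commands():
        # Runs later, from the event loop, after the handler has returned.
        for command in commands:
            execute(command)

    def handler(event):
        # Only queue the work; do not resume the inferior from inside the handler.
        post_event(run_commands)

    return handler

try:
    import gdb  # only available inside GDB's embedded interpreter
except ImportError:
    gdb = None

if gdb is not None:
    gdb.events.stop.connect(make_stop_handler(gdb.post_event, gdb.execute))
```

Because the handler returns right after queuing, the Python stack unwinds between stops instead of growing by one frame per stop. As noted above, there is no guarantee about when the queued call actually runs.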
Ömer Sinan Ağacan
2014-10-17 10:52:15 UTC
Right. gdb.execute won't return until the command has completed. Also the
Python GIL has been acquired (as this is coming from the Python interpreter)
and so now Python is also blocked too. So in effect the only thing running
at this point is the gdb.execute command that was invoked (in your case, the
continue command). That will return, and then the Python GIL will be released
and the rest of the script will continue.
I have a patch I need to upstream that adds a release_gil keyword to
gdb.execute. This optionally releases the GIL before executing the command.
But I have not got around to that yet.
IMO, something like dont_block would be more useful for me. What I would expect
from that argument is that when it's True then `gdb.execute` would return
immediately after GDB starts running the command.
A workaround would be to post any gdb.execute statements into the GDB event
loop. See gdb.post_event. That will return immediately and the gdb.execute
function will be scheduled to be called in the event loop. Note there is no
guarantee when this is. But as long as GDB is not busy processing other
events it usually means right away.
Thanks for the tip. I'll try that.


Do you think adding something like `dont_block` would be hard? Maybe I can hack
on that this weekend.
Phil Muldoon
2014-10-17 14:20:07 UTC
Post by Ömer Sinan Ağacan
IMO, something like dont_block would be more useful for me. What I would expect
from that argument is that when it's True then `gdb.execute` would return
immediately after GDB starts running the command.
A workaround would be to post any gdb.execute statements into the GDB event
loop. See gdb.post_event. That will return immediately and the gdb.execute
function will be scheduled to be called in the event loop. Note there is no
guarantee when this is. But as long as GDB is not busy processing other
events it usually means right away.
Do you think adding something like `dont_block` would be hard? Maybe I can hack
on that this weekend.
Hi,

The patch has already been written (I had to fix it for a RH bugzilla
entry). I just have not gotten around to posting it upstream yet. I
will do that very soon. But if you are interested, the patch is here:


https://bugzilla.redhat.com/show_bug.cgi?id=1116957

Cheers

Phil
Ömer Sinan Ağacan
2014-10-17 14:26:30 UTC
Post by Phil Muldoon
The patch has already been written (I had to fix it for a RH bugzilla
entry). I just have not gotten around to posting it upstream yet. I
https://bugzilla.redhat.com/show_bug.cgi?id=1116957
Can anyone explain how the GIL is related to my problem? A blocking call
will still block no matter what happens to the GIL. I don't understand how
`gdb.execute("continue")` would stop filling the stack once the GIL is released.
Phil Muldoon
2014-10-17 15:02:38 UTC
Post by Ömer Sinan Ağacan
Post by Phil Muldoon
The patch has already been written (I had to fix it for a RH bugzilla
entry). I just have not gotten around to posting it upstream yet. I
https://bugzilla.redhat.com/show_bug.cgi?id=1116957
Can anyone explain how the GIL is related to my problem? A blocking call
will still block no matter what happens to the GIL. I don't understand how
`gdb.execute("continue")` would stop filling the stack once the GIL is released.
The GIL is only part of the problem. You are seeing the recursion
limit as you are recursively entering the handler.

If you could expand on what you are trying to do, and with what codebase,
that would be best.

Cheers,

Phil
P***@Dell.com
2014-10-17 15:04:06 UTC
Post by Phil Muldoon
...
Right. gdb.execute won't return until the command has completed.
Also the Python GIL has been acquired (as this is coming from the
Python interpreter) and so now Python is also blocked too. So in
effect the only thing running at this point is the gdb.execute command
that was invoked (in your case, the continue command). That will
return, and then the Python GIL will be released and the rest of the
script will continue.
I have a patch I need to upstream that adds a release_gil keyword to
gdb.execute. This optionally releases the GIL before executing the
command. But I have not got around to that yet.
Could you explain why gdb.execute should ever hold onto the GIL while performing the command? I view gdb.execute as akin to an I/O operation, which releases the GIL around the I/O. Another way to look at it is that execute is performing a GDB command. Either that isn’t a Python operation — in which case the GIL is not needed since the data it protects won’t be touched. Or it is a command that (possibly indirectly) invokes another Python operation — in which case the GIL has to be released or you end up with a deadlock.

What am I missing?

paul
Phil Muldoon
2014-10-17 17:31:07 UTC
Post by P***@Dell.com
Post by Phil Muldoon
...
Right. gdb.execute won't return until the command has completed.
Also the Python GIL has been acquired (as this is coming from the
Python interpreter) and so now Python is also blocked too. So in
effect the only thing running at this point is the gdb.execute command
that was invoked (in your case, the continue command). That will
return, and then the Python GIL will be released and the rest of the
script will continue.
I have a patch I need to upstream that adds a release_gil keyword to
gdb.execute. This optionally releases the GIL before executing the
command. But I have not got around to that yet.
Could you explain why gdb.execute should ever hold onto the GIL while performing the command? I view gdb.execute as akin to an I/O operation, which releases the GIL around the I/O. Another way to look at it is that execute is performing a GDB command. Either that isn’t a Python operation — in which case the GIL is not needed since the data it protects won’t be touched. Or it is a command that (possibly indirectly) invokes another Python operation — in which case the GIL has to be released or you end up with a deadlock.
It (GDB) is not holding the GIL, Python is. The gdb.execute call at that
point has been called from the Python interpreter, and it has managed
the GIL until that point.

This means that, with the current behavior, if you had three Python
threads running, they would all be suspended during the call to gdb.execute. A user
submitted a request that we release the GIL (even though GDB did not
acquire it). The patch that I will submit (soon) just releases the GIL
so that on long-lived operations Python threads can still continue to
execute. It does this with SaveThread/RestoreThread. There is more
detail on this in the bugzilla posted.

Cheers

Phil
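
The effect of releasing the GIL around a blocking call can be illustrated outside GDB. In this sketch `time.sleep` is only a stand-in for a long-running `gdb.execute` that releases the GIL, which per the message above is what the patch would make possible; `count_while_blocked` is a hypothetical name.

```python
import threading
import time

def count_while_blocked(block_seconds=0.2):
    """Count how often a worker thread runs while the main thread blocks."""
    ticks = 0
    stop = threading.Event()

    def worker():
        nonlocal ticks
        while not stop.is_set():
            ticks += 1
            time.sleep(0.001)

    thread = threading.Thread(target=worker)
    thread.start()
    time.sleep(block_seconds)  # blocking call that releases the GIL
    stop.set()
    thread.join()
    return ticks

print(count_while_blocked() > 0)
```

If the blocking call kept the GIL for its whole duration, the worker could not make progress until the call returned. That is the situation described for Python threads during gdb.execute today.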
Doug Evans
2014-10-17 16:40:58 UTC
Post by Ömer Sinan Ağacan
gdb.execute("continue")
print "continue returned"
This doesn't print anything, until the script fails with "maximum
recursion depth". Then it prints lots of "continue returned" lines.
So the problem is that `gdb.execute` doesn't return immediately, and
that's causing the Python stack to grow: GDB calls the handler again
before the previous calls have returned.
One thing to keep in mind here is that gdb.execute is akin to typing
the command in at the (gdb) prompt.
IOW, if you as a user typed "continue" at the (gdb) prompt what would
you want to happen?
The definition of the "continue" command is that the inferior is
resumed until it stops and then the "continue" command completes.
Post by Ömer Sinan Ağacan
I think I need a version of `gdb.execute` that returns immediately.
ie. async version or something like that. Is such a thing possible?
There is "continue &" for the case at hand.
There is no async version of gdb.execute itself.
Alas there is no corresponding "wait" command for "continue &" (I have
one in a sandbox that I get to when I'm able).
So once the inferior is running you've kinda lost programmatic control.
There is the "interrupt" command but it is broken in the sense that it
is implicitly async (the "&" is implicitly present).
I have a sandbox that fixes this too (getting this submitted tripped
over another gdb bug which I'm needing to fix first - a not uncommon
occurrence in gdb-land).

Other things come into play here like all-stop vs non-stop
https://sourceware.org/gdb/current/onlinedocs/gdb/Thread-Stops.html#Thread-Stops

but the async version of gdb.execute("continue") is gdb.execute("continue &").

Also note that resuming the inferior in a breakpoint handler is
supported, but further commands after the continue are not.
This isn't enforced in the python API, so I'm not sure what might
happen. Some things may work, others may not.
https://sourceware.org/gdb/current/onlinedocs/gdb/Break-Commands.html#Break-Commands
Phil Muldoon
2014-10-17 17:35:06 UTC
Post by Doug Evans
Also note that resuming the inferior in a breakpoint handler is
supported, but further commands after the continue are not. This isn't
enforced in the python API, so I'm not sure what might happen. Some
things may work, others may not.
https://sourceware.org/gdb/current/onlinedocs/gdb/Break-Commands.html#Break-Commands
Yeah, we can't police it, only document it, right now. Until Python has
discrete control of the inferior (instead of issuing commands through
gdb.execute) we would have to parse the code looking for "forbidden"
operations. That's a deep dark hole to go into. ;)

Hopefully one day when Guile and/or Python have rich and discrete
inferior control we could better police what the user should or should
not do at various states.

Cheers

Phil

Doug Evans
2014-10-17 16:45:20 UTC
Post by Ömer Sinan Ağacan
Post by P***@Dell.com
Is that the handler for a breakpoint? Does the completion of the “si” command invoke the breakpoint handler? If yes, that’s your answer.
I see what you mean now. I think you're right...
Yikes. If there is another breakpoint at the next instruction then
ok, otherwise that feels unfortunate.
[implementation detail bubbling up into the API, bleah]
Phil Muldoon
2014-10-16 15:12:50 UTC
Post by P***@Dell.com
Post by Phil Muldoon
Post by Ömer Sinan Ağacan
Hi all,
I'm setting some breakpoints and then running some actions when the
program reaches those points, using the Python API.
File "/home/omer/gdb_script/script.py", line 71, in handle_breakpoint
self.breakpoint_jump_addrs[bp.location].add(addr)
RuntimeError: maximum recursion depth exceeded
That's an error from Python. It tells me something in script.py is
not quite right. Impossible to tell without seeing script.py in
general. You can increase the recursion depth by doing something like
import sys
sys.setrecursionlimit(9000)
Where '9000' is a limit you can set and vary. However, this might
indeed be papering over the cracks, and not fixing the fault.
I would expect that you could wrap the script in a try/except block, to catch the stack overflow and print a Python stack trace when that happens.
Should GDB print a Python backtrace when the Python script fails, just as executing a Python script standalone would do?
It prints an abbreviated backtrace by default. "set python print-stack full" will enable the full backtrace (we had a request to do this for pretty printers).

Cheers

Phil
Ömer Sinan Ağacan
2014-10-16 15:15:55 UTC
Post by Phil Muldoon
It prints an abbreviated backtrace by default. "set python print-stack full" will enable the full backtrace (we had a request to do this for pretty printers).
I already have that enabled, even in my first email it's enabled.