Time when the Heap snapshot is taken when dumping Core
Abhishek Karoliya
2014-07-25 15:23:39 UTC
We have a POSIX multi-threaded C++ program running on Linux 2.6.32
that dumps core in one of its threads. Analysing the core file with a
cross-compiled gdb-7.2, we see that the faulting instruction is here:

0x11491178 <+208>: lwz r0,8(r9)

and registers in the frame show:

(gdb) info reg
r0 0x0 0
….
r9 0xdeaddead 3735936685

Which makes sense, as r9 holds an invalid address (in fact the heap
scrub pattern we write) in the context of the process/thread.

The confusing bit is that r9 is loaded like this:

0x1149116c <+196>: lwz r9,0(r4)

and r4 contains the value of the (first and only) function parameter
"data". GDB tells me the following about data:

(gdb) p data
$6 = (TextProcessorIF *) 0x4b3fe858

(gdb) p *data
$7 = {_vptr.TextProcessorIF = 0x128b5390}

(gdb) info symbol 0x128b5390
vtable for TextProcessorT<unsigned short> + 8 in section .rodata

All of which is correct in this context. So r9 should have held
0x128b5390 rather than the pattern 0xdeaddead, which is written when
memory is freed and given back to the heap.
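
For reference, the two loads look like what a virtual call through data
would be expected to compile to: read the vptr from the object, then read
a slot from the vtable. A minimal sketch with hand-rolled, hypothetical
types (FakeObject, FakeVTable and call_slot are illustrative names, not
the real TextProcessorIF layout), assuming 4-byte pointers so that
offset 8 presumably corresponds to slot index 2:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration only.
using Slot = int (*)();

struct FakeVTable {
    Slot slots[3];             // virtual function pointers, one per slot
};

struct FakeObject {
    const FakeVTable* vptr;    // first word of the object, like _vptr.TextProcessorIF
};

int slot0() { return 0; }
int slot1() { return 1; }
int slot2() { return 2; }

const FakeVTable kTable = {{slot0, slot1, slot2}};

// Model of a virtual call: two dependent loads, then an indirect call.
int call_slot(const FakeObject* data, std::size_t index) {
    assert(index < 3);
    const FakeVTable* table = data->vptr;   // like: lwz r9,0(r4)
    Slot fn = table->slots[index];          // like: lwz r0,8(r9) for index 2 with 4-byte pointers
    return fn();
}
```

If the first load returns a scrub pattern instead of a real vptr, the
second load is the one that faults, which matches the faulting
instruction above.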

So why does register r9 contain the scrubbed value when the memory
contains a valid object? My theory is that the core contains a snapshot
of memory taken as the process died, which is well after the actual
crash happened. After the SIGSEGV has been raised, this heap location
can still be used by other threads, since they keep logging data until
the process dies. So it is possible that the memory pointed to by data
was allocated again and was in use (or had been used) by the time the
memory snapshot was taken and preserved in the core.
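
The ordering this theory assumes can be replayed deterministically. The
sketch below is a single-threaded model of the sequence, not a
reproduction of the race itself; scrub_block, load_first_word, Timeline
and replay are hypothetical names, and the two constants are simply the
values shown in the GDB output above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t kScrub = 0xdeaddeadu;   // heap scrub pattern
constexpr std::uint32_t kVptr  = 0x128b5390u;   // vptr value printed by GDB

// Overwrite a block with the scrub pattern, as a scrubbing free() might.
void scrub_block(std::uint32_t* block, std::size_t words) {
    for (std::size_t i = 0; i < words; ++i) block[i] = kScrub;
}

// Model of `lwz r9,0(r4)`: read the first word of the object.
std::uint32_t load_first_word(const std::uint32_t* block) {
    assert(block != nullptr);
    return block[0];
}

struct Timeline {
    std::uint32_t seen_by_faulting_thread;   // value left in the register
    std::uint32_t seen_in_core;              // value found in memory later
};

Timeline replay() {
    std::uint32_t block[4] = {kVptr, 0, 0, 0};   // live object
    scrub_block(block, 4);                       // 1. block is freed and scrubbed
    std::uint32_t r9 = load_first_word(block);   // 2. stale pointer is dereferenced
    block[0] = kVptr;                            // 3. block is reused for a new object
    return {r9, load_first_word(block)};         // 4. memory is snapshotted afterwards
}
```

Under this ordering the register keeps 0xdeaddead while the memory
image shows a valid vptr again, which is the combination seen in the
core.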

My questions are:
A) Is my theory correct?
B) Am I right in presuming that the heap memory snapshot is not taken
at the time of the crash (when the signal is raised) but in the final
moments of the process?
C) Can the address/location that caused a SIGSEGV still be used (by other threads)?

Thanks!
Pedro Alves
2014-07-30 14:52:24 UTC
Post by Abhishek Karoliya
A) Is my theory correct?
It's plausible at least. Though I'd suspect a simpler
explanation is more likely.
Post by Abhishek Karoliya
B) Am I right in presuming that the heap memory snapshot is not taken
at the time of the crash (when the signal is raised) but in the final
moments of the process?
C) Can the address/location that caused a SIGSEGV still be used (by other threads)?
Both correct, though if you don't have a SIGSEGV handler, the race
window is very narrow.
--
Thanks,
Pedro Alves
Abhishek Karoliya
2014-07-31 14:46:02 UTC
Thanks!

I am now sure that the core won't be generated until abort() is
called, which happens after we have printed out our diagnostics and
heap parameters. To narrow the race window you are talking about, we
will have to sacrifice most of these.
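
One way to keep the window small while still emitting something is a
one-shot handler that does only async-signal-safe work and then
returns, so the faulting instruction re-executes under the default
disposition and the kernel dumps core. A minimal sketch, assuming
Linux/POSIX sigaction(); on_segv and install_minimal_segv_handler are
hypothetical names, not taken from the program discussed here:

```cpp
#include <cassert>
#include <signal.h>
#include <unistd.h>

// Hypothetical handler: async-signal-safe calls only, no heap use, no locks.
void on_segv(int) {
    static const char msg[] = "SIGSEGV caught, re-faulting to dump core\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    // Returning re-executes the faulting instruction. SA_RESETHAND has
    // already restored SIG_DFL, so the second fault terminates the
    // process and produces the core.
}

void install_minimal_segv_handler() {
    struct sigaction sa = {};
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;   // one-shot: default action is restored on entry
    int rc = sigaction(SIGSEGV, &sa, nullptr);
    assert(rc == 0);
    (void)rc;
}
```

Other threads keep running while the handler executes, so this only
narrows the window; it does not close it.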