[Openmcl-devel] mystery SEGV starting 64bit ccl on linux
bitwiddler at gmail.com
Thu Nov 18 17:57:58 CST 2010
I'm still not sure what the problem is, since I've been dealing
with so many different versions of ccl over the last few days.
I distinctly remember seeing MAP_GROWSDOWN in the
call to mmap() from MapMemoryForStack(), but I just looked
in the 1.5 release, and it isn't there. So, I must have been
running something else.
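
For reference, here is a minimal sketch of what mapping a stack without MAP_GROWSDOWN could look like. It is not the actual MapMemoryForStack() from the ccl sources; map_stack_region() is a hypothetical helper, and the guard page at the low end is an assumption based on the "Mapping stack" / "Protecting memory" lines in the log quoted below.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (NOT CCL's MapMemoryForStack()): map `size` bytes of
   anonymous read/write memory for use as a stack, without MAP_GROWSDOWN,
   and turn the lowest page into an inaccessible guard page.
   Returns NULL on failure. */
static void *map_stack_region(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || size < 2 * (size_t)page)
        return NULL;

    /* Plain anonymous mapping: the whole region is mapped up front,
       so the returned address really is usable memory. */
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Guard page at the low end, so an overflow faults instead of
       silently running into whatever is mapped below. */
    if (mprotect(base, (size_t)page, PROT_NONE) != 0) {
        munmap(base, size);
        return NULL;
    }
    return base;
}
```

A caller would use the high end of the region as the initial stack pointer and release it with munmap(base, size) when the thread exits.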
I'll keep an eye out from now on by running ccl under gdb
with the right .gdbinit, and keep you guys posted if
anything shows up.
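
To convince myself that a SIGSEGV can be perfectly normal for a program, here is a small sketch of a process that handles its own faults and carries on. It uses a different technique from the one Gary describes (it unprotects the faulting page and lets the instruction retry, rather than skipping over an interrupt instruction), and all the names in it are made up for illustration. Run under gdb, a program like this stops at the fault unless gdb is told something like "handle SIGSEGV nostop noprint pass".

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static long page_size;                        /* cached before the handler is installed */
static volatile sig_atomic_t faults_handled;  /* number of faults the handler repaired */

/* Illustrative handler: make the faulting page accessible and return,
   which lets the kernel re-run the instruction that faulted. */
static void on_segv(int sig, siginfo_t *info, void *context)
{
    (void)sig;
    (void)context;
    uintptr_t page = (uintptr_t)info->si_addr & ~((uintptr_t)page_size - 1);
    if (mprotect((void *)page, (size_t)page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);                             /* a fault we cannot repair */
    faults_handled++;
}

/* Writes to a PROT_NONE page and returns how many faults were handled
   along the way, or -1 on error. */
static int touch_protected_page(void)
{
    struct sigaction sa, old;

    page_size = sysconf(_SC_PAGESIZE);
    faults_handled = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    volatile char *p = mmap(NULL, (size_t)page_size, PROT_NONE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        sigaction(SIGSEGV, &old, NULL);
        return -1;
    }

    p[0] = 42;                                /* faults once; the handler fixes it up */
    int ok = (p[0] == 42);

    munmap((void *)p, (size_t)page_size);
    sigaction(SIGSEGV, &old, NULL);           /* restore the previous disposition */
    return ok ? (int)faults_handled : -1;
}
```

Outside a debugger the fault is invisible to the user: the write simply succeeds after one trip through the handler.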
Thanks for all the help, BTW. I appreciate how quickly
you've responded on this matter.
On Thu, Nov 18, 2010 at 3:02 PM, Gary Byers <gb at clozure.com> wrote:
> When running under GDB, it's necessary to tell GDB to ignore the
> signals that CCL's own exception-handling mechanisms handle. See
> (You basically need to tell GDB to "source" a platform-specific
> .gdbinit file that tells it which signals should be quietly passed
> to the application.)
> What you're seeing here is very likely expected behavior: the initial lisp
> thread has started running, tries to allocate some
> lisp object, finds that it doesn't have any memory to cons in,
> and executes a software interrupt that (on Linux) maps to SIGSEGV.
> GDB's proudly announcing that it's noticed this, but hasn't yet
> passed the signal on to CCL's handler, which should try to give
> the thread a chunk of memory to cons in, skip over the interrupt
> instruction, and allow the thread to resume execution.
> A relatively recent change in the Linux kernel (described in
> causes one of those calls to mmap() (the first one that tries to map memory
> for a stack) to fail, since the address returned by mmap() in that case
> isn't actually mapped. This usually causes a very hard crash on startup; I
> think that that happens pretty deterministically, but I'm not sure of that (and
> if not, it'd be a likely explanation for sporadic crashes on startup.)
> That problem seems to be triggered by use of the MAP_GROWSDOWN option in
> the call to mmap that allocates stacks. MAP_GROWSDOWN doesn't do what we
> thought it does, and we've removed it from the sources in svn in 1.5 and
> I don't know if that's the cause of the problem that you're (sometimes)
> seeing; if you haven't already done so, it'd probably be a good idea to
> disable that option (if only to remove a variable from the equation.)
> The output below indicates that you got past that, started to run
> some lisp code, that lisp code consed, and GDB (mistakenly) thought
> that that was notable. It isn't. We don't know whether you would
> have run into some other problem after consing or whether what you've
> been seeing is just the problem described in ticket 731; if you have
> been seeing that problem, the workaround seems to work reliably.
> On Thu, 18 Nov 2010, Bit Twiddler wrote:
>> I'm getting sporadic crashes starting ccl on various linuxes
>> (CentOS 5.5, Scientific Linux 5.5, and Open Suse 11.3)
>> Current directory is ~/p/cl/ccl/1.5/release/ccl/
>> GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5_5.1)
>> Copyright (C) 2009 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu".
>> For bug reporting instructions, please see:
>> Reading symbols from
>> (gdb) run
>> Starting program: /mnt/data1/home1/xxx/p/cl/ccl/1.5/release/ccl/lx86cl64
>> [Thread debugging using libthread_db enabled]
>> Reserving heap at 0x300000000000, size 0x8000000000
>> Committing memory at 0x302000000000, size 0x540000
>> Committing memory at 0x307c00000000, size 0xb000
>> Committing memory at 0x307e3f800000, size 0xb000
>> Committing memory at 0x302000540000, size 0x2000000
>> Committing memory at 0x307c0000b000, size 0x40000
>> Unprotecting memory at 0x307c0000b000, size 0x40000
>> Committing memory at 0x307e3f80b000, size 0x40000
>> Unprotecting memory at 0x307e3f80b000, size 0x40000
>> Mapping stack of size 0x24d000
>> Protecting memory at 0x2aaaaaacb000, size 0x1000
>> Protecting memory at 0x2aaaaaacc000, size 0x19000
>> Mapping stack of size 0x51000
>> Protecting memory at 0x2aaaaad18000, size 0x10000
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x0000300000ab133f in ?? ()
>> (gdb) bt
>> #0  0x0000300000ab133f in ?? ()
>> #1  0x000030200052f8af in ?? ()
>> #2  0x00002aaaaad16ff0 in ?? ()
>> #3  0x00000000004122c4 in toplevel_loop () at ../x86-subprims64.s:60
>> Backtrace stopped: frame did not save the PC
>> I don't understand why gdb can't display the backtrace; I have
>> the optimizer turned off, and -g turned on.
>> Previously, I was able to get a crash after a mmap call to allocate
>> a stack segment, but now the program runs after that, and I wind
>> up with the above situation.
>> Does anybody know of any debugging code that I can enable?
>> I turned on DEBUG_MEMORY by setting it to 1 in memory.c