[Openmcl-devel] Advice needed to debug shared library problem on Mac and Linux
gb at clozure.com
Tue Mar 29 20:42:40 CDT 2011
Well, the first theory (that it's a name conflict) seems to explain the mysterious
allocation failure on Linux. A function called 'get_arcs_and_states()' in the
library calls 'create_stack', and gets the CCL kernel's definition of that function
instead of its own. CCL's create_stack() interprets an argument as the size of
stack to create, reports that it can't create a stack that large, and just dies.
If I change things so that that doesn't happen, the call to load_net() runs to
completion on 64-bit Linux, and returns a plausible-looking pointer. I would
have thought that stripping external symbols from the CCL kernel would have had
the same effect, but you said that it didn't work for you (and indeed it doesn't.
Time to RTFM.)
What I did instead was to remove the "export_dynamic" linker option from the
linuxx8664 Makefile and then run 'make clean' and 'make' from the kernel
linuxx8664 directory. We might want some of the other effects of that option,
but one thing that the option does is to force name conflicts between things
defined in the executable and things defined in libraries to be resolved in
favor of the executable's version. We don't want that here and probably don't
want that in general.
The option is expressed as
in the Makefile's build rule for ../../lx86cl64, around line 74 in the Makefile.
If you just remove that string, so that the long line contains
... $(CDEBUG) $(HASH_STYLE) ...
, save the Makefile, and force the kernel to be rebuilt, that may fix things
for you on Linux. (It seems to do so for me.)
The linuxx8632 Makefile (and likely several others) also uses this
linker option. We generally only want external symbols in the lisp
kernel to be visible for the sake of debugging; it's rarely useful for
a shared library to call much of anything in the lisp kernel and even
less likely to be useful for the lisp kernel version to override a
definition from the calling library.
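
If the library's source can be changed, the same kind of conflict can also be
avoided from the library side. The following is only a minimal sketch in C,
assuming GCC or Clang: helpers that are used only inside the library are given
hidden visibility (or made 'static'), so calls to them are bound inside the
library and can't be redirected to a same-named function exported by the
executable. The bodies, signatures, and the 'struct lib_stack' type are
hypothetical stand-ins; only the names create_stack and get_arcs_and_states
come from the discussion above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the library's own stack type. */
struct lib_stack {
    size_t depth;
};

/* Hidden visibility: the symbol is not exported from the shared library,
   so the call in get_arcs_and_states() is resolved inside the library
   when the library is linked, not at load time. */
__attribute__((visibility("hidden")))
struct lib_stack *create_stack(void)
{
    return calloc(1, sizeof(struct lib_stack));
}

/* Exported entry point (default visibility).
   Returns 0 on success, -1 if the stack could not be allocated. */
int get_arcs_and_states(void)
{
    struct lib_stack *s = create_stack();  /* the library's own version */
    if (s == NULL)
        return -1;
    free(s);
    return 0;
}
```

This doesn't replace the Makefile change; it just shows that a library can
protect its internal helpers regardless of what the host executable exports.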
Essentially the same thing is happening on Darwin (the lisp kernel's
create_stack() function is getting called instead of the library's.)
Because of the (arbitrary) address at which the library loads on Darwin,
the first arg to the lisp kernel's create_stack() function looks like
a more plausible stack size and the kernel's notion of a stack (a largish
memory region) is created and returned. It's not surprising that cfsm_push()
doesn't like that.
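
To make the difference between the two platforms concrete, here is a small C
sketch of a function that treats its argument as a byte count when the caller
actually passed an address. Everything in it is illustrative: the function
names and the 4 GiB ceiling are assumptions, not taken from the CCL sources.

```c
#include <stdint.h>

/* Hypothetical ceiling on what a kernel-style create_stack() would accept. */
#define MAX_STACK_BYTES ((uint64_t)1 << 32)   /* 4 GiB, illustrative only */

/* Stand-in for a function that treats its first argument as a byte count.
   Returns 1 if it would go ahead and allocate, 0 if it would refuse. */
int size_looks_plausible(uint64_t size_in_bytes)
{
    return size_in_bytes > 0 && size_in_bytes <= MAX_STACK_BYTES;
}

/* The misdirected call: the caller passed an address, but the callee
   reads the same bits as a size. */
int misdirected_call_accepted(uint64_t address_argument)
{
    return size_looks_plausible(address_argument);
}
```

With an address such as 0x7f0000000000 (the region where 64-bit Linux
typically maps shared libraries) the size is absurd and the call is refused.
With an address such as 0x2b62000 (the same magnitude as the addresses in the
register dump quoted below, roughly 43 MiB when read as a size) the call is
accepted, which is the quieter and more confusing failure.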
I need significant caffeination before R'ing TFM to see how to avoid that
on Darwin; I'll get back to you when I've done so.
On Tue, 29 Mar 2011, Paul Meurer wrote:
> I tried your suggestion to use a larger stack, but the outcome is the same. There seems to be an infinite recursion going on:
> paul@kakadu:~/lisp/ccl> ccl64 -Z 10M --no-init
> Welcome to Clozure Common Lisp Version 1.7-dev-r14406M-trunk (DarwinX8664)!
> ? (load "~/lisp/projects/iness/cl-fst/test-case.lisp")
> ; Loading system definition from /Users/paul/lisp/lib/cffi/cffi.asd into #<Package "ASDF0">
> ; Registering #<SYSTEM CFFI> as CFFI
> ; Loading system definition from /Users/paul/lisp/lib/babel/babel.asd into #<Package "ASDF0">
> ; Registering #<SYSTEM BABEL> as BABEL
> ; Loading system definition from /Users/paul/lisp/lib/alexandria/alexandria.asd into #<Package "ASDF0">
> ; Registering #<SYSTEM :ALEXANDRIA> as ALEXANDRIA
> ; Loading system definition from /Users/paul/lisp/lib/trivial-features/trivial-features.asd intoUnhandled exception 10 at 0x2a09677, context->regs at #xb0a9b6f0
> Exception occurred while executing foreign code
> at cfsm_push + 23
> received signal 10; faulting address: 0x0
> ? for help
>  Clozure CL kernel debugger: t
> Current Thread Context Record (tcr) = 0x6050b0
> Control (C) stack area: low = 0xb0038000, high = 0xb0a9c000
> Value (lisp) stack area: low = 0x1900000, high = 0x231a000
> Exception stack pointer = 0xb0a9bbc0
>  Clozure CL kernel debugger: r
> %rax = 0x0000000000000000 %r8 = 0x00000000ffffffff
> %rcx = 0x000000000082ea00 %r9 = 0x0000000000000000
> %rdx = 0x0000000000000000 %r10 = 0x0000000000001002
> %rbx = 0x0000000002b62000 %r11 = 0x0000000000000206
> %rsp = 0x00000000b0a9bbc0 %r12 = 0x0000000000000002
> %rbp = 0x00000000b0a9bbd0 %r13 = 0x0000000002b06e38
> %rsi = 0x0000000002b62000 %r14 = 0x0000000002b06d20
> %rdi = 0x0000000000833a00 %r15 = 0x00007fff703abf40
> %rip = 0x0000000002a09677 %rflags = 0x00010206
>  Clozure CL kernel debugger:
> I'll send you the libraries in a separate mail, it is OK for Lauri.