[Bug-openmcl] Debugging crashes of dppccl
Gary Byers
gb at clozure.com
Fri Mar 5 00:40:02 MST 2004
--On Thursday, March 4, 2004 3:21 PM -0800 Erik Pearson
<erik at adaptations.com> wrote:
> Well, that is not quite true. The last time this happened the message
> "Abort trap" was printed.
>
> This is running under the bleeding edge, checked out a couple of days ago
> with a few manual tweaks to the threading code.
>
I was going to say that OpenMCL never calls abort() itself, or does
anything else that would cause that message to be printed. I'd have been
wrong; in "ccl:lisp-kernel;lisp-exceptions.c", there's something like:
#define MACH_CHECK_ERROR(x) if (x != KERN_SUCCESS) {abort();}
MACH_CHECK_ERROR is used in three places that I can see, all of which are
related to setting up a thread's exception handling. None of these things
should ever fail, but things that can't happen should probably throw
themselves at the mercy of the kernel debugger instead of aborting the
whole process ...
I'm not sure if my kernel sources would even compile at the moment. If you
get a chance, could you please try replacing the macro above with
something like:
#define MACH_CHECK_ERROR(context,x) if (x != KERN_SUCCESS) \
  {Bug(NULL, "Mach error while %s : %d", context, x);}
and replacing the three calls to it (all in lisp-exceptions.c) with:
MACH_CHECK_ERROR("allocating thread exception_ports",kret);
MACH_CHECK_ERROR("renaming exception_port",kret);
MACH_CHECK_ERROR("adding send right to exception_port",kret);
If that's what's causing the abort() you're getting, it should result in
some sort of message instead. You probably can't do much in the debugger
((b)acktrace might work), but the message would at least tell us which
case is failing.
If I had to bet (just because I've seen it before), I'd bet on the second
(port rename) case. In Mach, a "port name" is just a 32-bit number; what
we're doing here is allocating a port, making that port be the place where
the kernel will send exception messages, and then "renaming" that port so
that its name is the new thread's TCR (that is, the TCR's address). The
callback function that's called when an exception occurs on a thread
receives the thread's exception port as an argument; making the TCR -be-
the exception port is a sleazy way to get our hands on the thread's TCR.
If a TCR is deallocated, its address is still a valid Mach port name; the
function darwin_exception_cleanup() is what makes the TCR stop being a
valid port. If we're somehow freeing a TCR without calling
darwin_exception_cleanup() on it, and subsequently allocating a new TCR at
the same address, that address is still in use as a port name and the
rename call will fail.
(The only reason that I know this is that Apple's Exception Manager has the
same bug.)
If it's one of the other cases, I don't have a good theory; if it's not one
of the cases above, I get to give my speech about How Horrible It Is To
Just Call abort().