[Bug-openmcl] problem with timed-wait-on-semaphore ?
Gary Byers
gb at clozure.com
Thu Feb 26 15:58:27 MST 2004
> >> Also, something is very slow here, as it takes over 10 minutes to
> >> complete this test on a 500 MHz iBook, and consumes over 50% CPU. I
> >> suppose further tests could reveal what the performance culprit is.
> >>
I suspect that there may be other factors involved, but creating and
destroying 10000 threads is probably never going to be exactly quick ...
> >>
> >
> > I haven't tried this yet, but ...
I probably should try it and see if I can see what's happening.
> >
> > If this code was reorganized so that a single semaphore was allocated
> > outside the loop, does it still fail ?
> >
> > CCL:CREATE-SEMAPHORE's work is actually done in the kernel (in the C
> > function new_semaphore() in "ccl:lisp-kernel;thread_manager.c"). It
> > uses semaphore_create() under Darwin, but nothing checks the return
> > value from that call.
>
>
> I noticed this problem when doing far fewer (on the order of 5) of these in
> succession. I just tried the test with a single semaphore and the results
> are much better, although not perfect. (2 errors compared to 15.) However,
> this is not the best solution, since I want the timeout mechanism to be as
> transparent as possible. Also, I don't want to have to build in a retry
> loop just to handle the case of the timeout mechanism failing...
>
The other thing that can cause a (spurious) timeout is a signal. The
GC needs to stop other threads from running during the GC; it sends them
a POSIX signal to do this.
Most BSD system calls (well, "many" if not "most") are set up to quietly
retry themselves if they're interrupted; the Mach syscall used by
TIMED-WAIT-ON-SEMAPHORE may not be. If that's what's happening, then
TIMED-WAIT-ON-SEMAPHORE should probably notice if it got interrupted,
try to have some idea of how much longer it -would- have waited if
it hadn't been interrupted, and wait until the original timeout expires
or the semaphore's acquired.
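
A rough C sketch of that retry logic follows. It is not the OpenMCL
kernel code: every name in it (wait_result, wait_once_fn,
timed_wait_retrying, fake_sem, fake_wait_once) is made up for
illustration, and the low-level wait is passed in as a callback. On
Darwin the callback would presumably wrap the Mach call and map
whatever "interrupted" status it reports to WAIT_INTERRUPTED. The idea
is to fix a deadline once, then after each interruption wait only for
the time that is left.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <time.h>

/* Outcomes of a single low-level wait attempt (hypothetical names). */
enum wait_result { WAIT_ACQUIRED, WAIT_TIMED_OUT, WAIT_INTERRUPTED };

/* One attempt to wait on `sem` for at most `timeout_ns` nanoseconds.
   Assumption: the real OS call is wrapped behind this callback. */
typedef enum wait_result (*wait_once_fn)(void *sem, int64_t timeout_ns);

/* Current monotonic clock reading in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Wait until the semaphore is acquired or the ORIGINAL timeout expires.
   An interrupted attempt is retried with the remaining time only. */
static enum wait_result
timed_wait_retrying(wait_once_fn wait_once, void *sem, int64_t timeout_ns)
{
    assert(wait_once != NULL);
    int64_t deadline = now_ns() + timeout_ns;
    int64_t remaining = timeout_ns;

    for (;;) {
        enum wait_result r = wait_once(sem, remaining);
        if (r != WAIT_INTERRUPTED)
            return r;                 /* acquired, or a genuine timeout */
        /* Interrupted (e.g. by a signal): work out how much is left. */
        remaining = deadline - now_ns();
        if (remaining <= 0)
            return WAIT_TIMED_OUT;
    }
}

/* Simulated semaphore for demonstration only: reports an interruption
   `interrupts_left` times, then reports that it was acquired. */
struct fake_sem { int interrupts_left; int calls; };

static enum wait_result fake_wait_once(void *sem, int64_t timeout_ns)
{
    struct fake_sem *f = (struct fake_sem *)sem;
    (void)timeout_ns;                 /* the fake never actually sleeps */
    f->calls++;
    if (f->interrupts_left > 0) {
        f->interrupts_left--;
        return WAIT_INTERRUPTED;
    }
    return WAIT_ACQUIRED;
}
```

With the fake semaphore set to report three interruptions, the wrapper
makes four attempts and returns WAIT_ACQUIRED; the caller never sees
the spurious wakeups. If the interruptions never stop, it gives up once
the original deadline has passed.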
(I just did a Google search on "semaphore_timedwait"; one of the few
pages it found was from OpenMCL's source code. Oh well.)
As an experiment, you might try disabling the ephemeral GC:
? (ccl:egc nil)
That'll cause GCs to be a bit longer but less frequent. If the
spurious timeouts are being caused by the GC signaling other threads,
you should get far fewer of them if GC's happening less often.
> > None of this stuff does too much on the lisp side: it's just a pretty
> > thin wrapper around a few (OS-dependent) system calls.
> >
>
> Yeah, I noticed, it makes it tempting to get in there and play. Would you
> say this is a benefit of using native threads? That you can fairly easily
> utilize OS level process api features without too much lisp magic?
>
Yes, though some of the stuff that the OS provides is kind of ... weak.
(Unless you ask for the Advanced Model of POSIX mutex, trying to lock
a mutex that you already own deadlocks. Etc. Etc.)
That example means that (dumb) POSIX mutexes wouldn't be good choices
for lisp locks, but if a lisp thread calls code that waits on a mutex
or POSIX condition variable or anything like that it'll behave just like
a non-lisp thread will.
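
To make the relocking point concrete, here is a small C sketch,
assuming that "the Advanced Model" refers to the non-default mutex
types POSIX lets you request through a mutex attribute. The helper name
relock_result is made up for illustration. It locks a fresh mutex twice
from the same thread and returns the result of the second attempt. It
should only be called with PTHREAD_MUTEX_ERRORCHECK or
PTHREAD_MUTEX_RECURSIVE; with a plain PTHREAD_MUTEX_NORMAL mutex the
second lock would simply hang.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Lock a fresh mutex of the given type twice from one thread and
   return the result of the second lock attempt.
   Do NOT pass PTHREAD_MUTEX_NORMAL: the second lock would deadlock. */
static int relock_result(int mutex_type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc, second;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, mutex_type);
    assert(rc == 0);
    rc = pthread_mutex_init(&m, &attr);
    assert(rc == 0);
    (void)rc;
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);
    second = pthread_mutex_lock(&m);  /* relock by the current owner */

    /* Unlock once for every lock that actually succeeded. */
    if (second == 0)
        pthread_mutex_unlock(&m);
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return second;
}
```

An error-checking mutex reports the relock as EDEADLK instead of
hanging, and a recursive mutex allows it (the second lock returns 0).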
>
> BTW - Are there other os mechanisms that are feasible, such as condition
> variables? I noticed in the semaphore wait code that it uses the Mach API,
> rather than sys/semaphore, as the Linux code does.
I haven't even checked in Panther, but in Jaguar POSIX semaphores
weren't really implemented: the header files declared the functions
existed and there was some glue in the C library, but the actual
system calls weren't there.
>
> Thanks,
>
> Erik.
>
>
>
> --
> Erik Pearson
> Adaptations
> "Adaptation: It's not just for finches anymore."
>
>