[Bug-openmcl] problem with timed-wait-on-semaphore ?

Gary Byers gb at clozure.com
Thu Feb 26 15:58:27 MST 2004


> >> Also, something is very slow here, as it takes over 10 minutes to
> >> complete this test on a 500mhz ibook, and consumes over 50% cpu. I
> >> suppose further tests could reveal what the performance culprit is.
> >>

I suspect that there may be other factors involved, but creating and
destroying 10000 threads is probably never going to be exactly quick ...

> >>
> >
> > I haven't tried this yet, but ...

I probably should try it and see if I can see what's happening.

> >
> > If this code was reorganized so that a single semaphore was allocated
> > outside the loop, does it still fail ?
> >
> > CCL:CREATE-SEMAPHORE's work is actually done in the kernel (in the C
> > function new_semaphore() in "ccl:lisp-kernel;thread_manager.c"). It
> > uses semaphore_create() under darwin, but nothing checks the return
> > value from that call.
>
>
> I noticed this problem when doing far fewer (on the order of 5) of these in
> succession. I just tried the test with a single semaphore and the results
> are much better, although not perfect. (2 errors compared to 15.) However,
> this is not the best solution, since I want the timeout mechanism to be as
> transparent as possible. Also, I don't want to have to build in a retry
> loop just to handle the case of the timeout mechanism failing...
>

The other thing that can cause a (spurious) timeout is a signal.  The
GC needs to stop other threads from running during the GC; it sends them
a POSIX signal to do this.

Most BSD system calls (well, "many" if not "most") are set up to quietly
retry themselves if they're interrupted; the Mach syscall used by
TIMED-WAIT-ON-SEMAPHORE may not be.  If that's what's happening, then
TIMED-WAIT-ON-SEMAPHORE should probably notice if it got interrupted,
try to have some idea of how much longer it -would- have waited if
it hadn't been interrupted, and wait until the original timeout expires
or the semaphore's acquired.
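
A minimal sketch in C of the retry described above. This is not OpenMCL's
actual code: the wait primitive is passed in as a function pointer, the
names wait_result, wait_fn, timed_wait_retrying and fake_wait are all
hypothetical, and clock_gettime(CLOCK_MONOTONIC) stands in for whatever
clock the platform really provides. The deadline is computed once, and
each interruption recomputes the remaining time against it:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical result codes for the underlying wait primitive. */
enum wait_result { WAIT_ACQUIRED, WAIT_TIMED_OUT, WAIT_INTERRUPTED };

/* Hypothetical primitive: wait on sem for at most rel_ns nanoseconds. */
typedef enum wait_result (*wait_fn)(void *sem, int64_t rel_ns);

/* Current time on a monotonic clock, in nanoseconds. */
static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Wait until the semaphore is acquired or the ORIGINAL timeout expires,
   re-issuing the wait with the time still remaining after an interrupt. */
enum wait_result timed_wait_retrying(wait_fn wait, void *sem,
                                     int64_t timeout_ns) {
    assert(timeout_ns >= 0);
    int64_t deadline = now_ns() + timeout_ns;
    int64_t remaining = timeout_ns;
    for (;;) {
        enum wait_result r = wait(sem, remaining);
        if (r != WAIT_INTERRUPTED)
            return r;                 /* acquired, or a genuine timeout */
        remaining = deadline - now_ns();
        if (remaining <= 0)
            return WAIT_TIMED_OUT;    /* deadline passed while interrupted */
    }
}

/* Stand-in primitive for experiments: sem points to an int counting how
   many more times the wait should report an interruption. */
enum wait_result fake_wait(void *sem, int64_t rel_ns) {
    int *pending = (int *)sem;
    (void)rel_ns;
    if (*pending > 0) {
        (*pending)--;
        return WAIT_INTERRUPTED;
    }
    return WAIT_ACQUIRED;
}
```

With fake_wait set up to be interrupted three times, a one-second
timed_wait_retrying call acquires on the fourth attempt instead of
reporting a spurious timeout.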

(I just did a Google search on "semaphore_timedwait"; one of the few
pages it found was from OpenMCL's source code.  Oh well.)

As an experiment, you might try disabling the ephemeral GC:

? (ccl:egc nil)

That'll cause GCs to be a bit longer but less frequent.  If the
spurious timeouts are being caused by the GC signaling other threads,
you should get far fewer of them if GC's happening less often.

> > None of this stuff does too much on the lisp side: it's just a pretty
> > thin wrapper around a few (OS-dependent) system calls.
> >
>
> Yeah, I noticed, it makes it tempting to get in there and play. Would you
> say this is a benefit of using native threads? That you can fairly easily
> utilize OS level process api features without too much lisp magic?
>

Yes, though some of the stuff that the OS provides is kind of ... weak.
(Unless you ask for the Advanced Model of POSIX mutex, trying to lock
a mutex that you already own deadlocks.  Etc.  Etc.)

That example means that (dumb) POSIX mutexes wouldn't be good choices
for lisp locks, but if a lisp thread calls code that waits on a mutex
or POSIX condition variable or anything like that it'll behave just like
a non-lisp thread will.
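
For illustration, and assuming the "Advanced Model" above means the
recursive mutex type, this is roughly what asking for one looks like in
C. The helper names init_recursive_mutex and lock_twice are hypothetical;
the pthread calls are the standard ones. Treat it as a sketch, not as
anything OpenMCL does:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Initialize *m as a recursive mutex.  Returns 0 on success, or the
   error number from the first pthread call that failed. */
int init_recursive_mutex(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Lock m twice from the calling thread, then release both holds.
   Returns 0 if every step succeeded.  With a recursive mutex the second
   lock just bumps an ownership count; with a plain mutex it can block
   forever. */
int lock_twice(pthread_mutex_t *m) {
    int rc = pthread_mutex_lock(m);
    if (rc != 0)
        return rc;
    rc = pthread_mutex_lock(m);       /* relock by the current owner */
    if (rc != 0) {
        pthread_mutex_unlock(m);
        return rc;
    }
    pthread_mutex_unlock(m);
    return pthread_mutex_unlock(m);
}
```

A recursive mutex has to be unlocked as many times as it was locked
before another thread can take it.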

>
> BTW - Are there other os mechanisms that are feasible, such as condition
> variables? I noticed in the semaphore wait code that it uses the mach api,
> rather than sys/semaphore, as the Linux code does.

I haven't even checked in Panther, but in Jaguar POSIX semaphores
weren't really implemented: the header files declared the functions
existed and there was some glue in the C library, but the actual
system calls weren't there.

>
> Thanks,
>
> Erik.
>
>
>
> --
> Erik Pearson
> Adaptations
> "Adaptation: It's not just for finches anymore."
>
>

