[Bug-openmcl] problem with timed-wait-on-semaphore ?
gb at clozure.com
Thu Feb 26 15:58:27 MST 2004
> >> Also, something is very slow here, as it takes over 10 minutes to
> >> complete this test on a 500 MHz iBook, and consumes over 50% cpu. I
> >> suppose further tests could reveal what the performance culprit is.
I suspect that there may be other factors involved, but creating and
destroying 10000 threads is probably never going to be exactly quick ...
> > I haven't tried this yet, but ...
I probably should try it and see if I can see what's happening.
> > If this code was reorganized so that a single semaphore was allocated
> > outside the loop, does it still fail ?
> > CCL:CREATE-SEMAPHORE's work is actually done in the kernel (in the C
> > function new_semaphore() in "ccl:lisp-kernel;thread_manager.c"). It
> > uses semaphore_create() under darwin, but nothing checks the return
> > value from that call.
> I noticed this problem when doing far fewer (on the order of 5) of these in
> succession. I just tried the test with a single semaphore and the results
> are much better, although not perfect. (2 errors compared to 15.) However,
> this is not the best solution, since I want the timeout mechanism to be as
> transparent as possible. Also, I don't want to have to build in a retry
> loop just to handle the case of the timeout mechanism failing...
The other thing that can cause a (spurious) timeout is a signal. The
GC needs to stop other threads from running during the GC; it sends them
a POSIX signal to do this.
Most BSD system calls (well, "many" if not "most") are set up to quietly
retry themselves if they're interrupted; the Mach syscall used by
TIMED-WAIT-ON-SEMAPHORE may not be. If that's what's happening, then
TIMED-WAIT-ON-SEMAPHORE should probably notice if it got interrupted,
try to have some idea of how much longer it -would- have waited if
it hadn't been interrupted, and wait until the original timeout expires
or the semaphore's acquired.
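
A minimal sketch of that "notice the interruption and keep waiting" logic, in C. This is not the OpenMCL kernel code: the names wait_result, wait_fn, timed_wait_resuming, interrupted_n_times and always_interrupted are all hypothetical, and the low-level wait is passed in as a function pointer so that the sketch does not depend on any particular OS call. It assumes only that the underlying primitive takes a relative timeout and can report that it was interrupted. The idea is to fix an absolute deadline once, then re-issue the wait with whatever time remains.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Result codes for the (hypothetical) low-level wait primitive. */
typedef enum { WAIT_OK, WAIT_TIMED_OUT, WAIT_INTERRUPTED } wait_result;

/* Hypothetical primitive: wait on ctx for at most rel_ns nanoseconds. */
typedef wait_result (*wait_fn)(void *ctx, int64_t rel_ns);

/* Current monotonic time in nanoseconds. */
int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Wait until the primitive succeeds or the ORIGINAL timeout expires.
   An interrupted wait is re-issued with the time that is still left,
   so an interruption never shows up as a spurious timeout. */
wait_result timed_wait_resuming(wait_fn wait, void *ctx, int64_t timeout_ns)
{
    assert(wait != NULL);
    int64_t deadline = now_ns() + timeout_ns;   /* fixed once, up front */

    for (;;) {
        int64_t remaining = deadline - now_ns();
        if (remaining <= 0)
            return WAIT_TIMED_OUT;              /* genuine timeout */

        wait_result r = wait(ctx, remaining);
        if (r != WAIT_INTERRUPTED)
            return r;                           /* acquired, or timed out */
        /* Interrupted (e.g. by a signal): loop and wait for the rest. */
    }
}

/* Stand-in primitive for experiments: reports "interrupted" for the
   first *ctx calls, then reports success. */
wait_result interrupted_n_times(void *ctx, int64_t rel_ns)
{
    int *left = ctx;
    (void)rel_ns;
    if (*left > 0) {
        (*left)--;
        return WAIT_INTERRUPTED;
    }
    return WAIT_OK;
}

/* Stand-in primitive that is interrupted on every call. */
wait_result always_interrupted(void *ctx, int64_t rel_ns)
{
    (void)ctx;
    (void)rel_ns;
    return WAIT_INTERRUPTED;
}
```

With the first stand-in, three interruptions followed by a success come back as WAIT_OK. With the second, the call returns WAIT_TIMED_OUT only once the full original timeout has elapsed. A real version would wrap the platform's timed semaphore wait and map its "interrupted" return code to WAIT_INTERRUPTED.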
(I just did a Google search on "semaphore_timedwait"; one of the few
pages it found was from OpenMCL's source code. Oh well.)
As an experiment, you might try disabling the ephemeral GC:
? (ccl:egc nil)
That'll cause GCs to be a bit longer but less frequent. If the
spurious timeouts are being caused by the GC signaling other threads,
you should get far fewer of them if GC's happening less often.
> > None of this stuff does too much on the lisp side: it's just a pretty
> > thin wrapper around a few (OS-dependent) system calls.
> Yeah, I noticed, it makes it tempting to get in there and play. Would you
> say this is a benefit of using native threads? That you can fairly easily
> utilize OS level process api features without too much lisp magic?
Yes, though some of the stuff that the OS provides is kind of ... weak.
(Unless you ask for the Advanced Model of POSIX mutex, trying to lock
a mutex that you already own deadlocks. Etc. Etc.)
That example means that (dumb) POSIX mutexes wouldn't be good choices
for lisp locks, but if a lisp thread calls code that waits on a mutex
or POSIX condition variable or anything like that it'll behave just like
a non-lisp thread will.
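
To make the mutex remark concrete, here is a small C sketch, reading "the Advanced Model" as the non-default POSIX mutex types. The helper names init_mutex_of_type and relock_result are hypothetical. The sketch locks a mutex twice from the same thread: a PTHREAD_MUTEX_RECURSIVE mutex should allow the second lock, and a PTHREAD_MUTEX_ERRORCHECK mutex should refuse it with EDEADLK. A default mutex is deliberately not exercised, since relocking one can simply hang.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialise *m as a mutex of the given type; returns 0 on success. */
int init_mutex_of_type(pthread_mutex_t *m, int type)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Lock a fresh mutex of the given type twice from the calling thread
   and return the result of the SECOND lock attempt.
   Only call this with PTHREAD_MUTEX_RECURSIVE or
   PTHREAD_MUTEX_ERRORCHECK: with a default mutex the second lock
   may block forever. */
int relock_result(int type)
{
    pthread_mutex_t m;
    int rc = init_mutex_of_type(&m, type);
    if (rc != 0)
        return rc;

    rc = pthread_mutex_lock(&m);        /* first lock: always fine */
    assert(rc == 0);

    int second = pthread_mutex_lock(&m);
    if (second == 0)
        pthread_mutex_unlock(&m);       /* undo the recursive lock */

    pthread_mutex_unlock(&m);           /* undo the first lock */
    pthread_mutex_destroy(&m);
    return second;
}
```

Calling relock_result(PTHREAD_MUTEX_RECURSIVE) should return 0, and relock_result(PTHREAD_MUTEX_ERRORCHECK) should return EDEADLK. On some systems the program needs to be built with the compiler's pthread option (for example -pthread).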
> BTW - Are there other os mechanisms that are feasible, such as condition
> variables? I noticed in the semaphore wait code that it uses the mach api,
> rather than sys/semaphore, as the Linux code does.
I haven't even checked in Panther, but in Jaguar POSIX semaphores
weren't really implemented: the header files declared the functions
existed and there was some glue in the C library, but the actual
system calls weren't there.
> Erik Pearson
> "Adaptation: It's not just for finches anymore."