[Bug-openmcl] problem with timed-wait-on-semaphore ?

Erik Pearson erik at adaptations.com
Thu Feb 26 16:34:38 MST 2004


--On Thursday, February 26, 2004 3:58 PM -0700 Gary Byers <gb at clozure.com> 
wrote:

>> >> Also, something is very slow here, as it takes over 10 minutes to
>> >> complete this test on a 500 MHz iBook, and consumes over 50% CPU. I
>> >> suppose further tests could reveal what the performance culprit is.
>> >>
>
> I suspect that there may be other factors involved, but creating and
> destroying 10000 threads is probably never going to be exactly quick ...
>

This may be true with native threads, but there are some very fast 
non-native thread systems out there (although none that I know of in the 
Lisp world.)

>> >>
>> >
>> > I haven't tried this yet, but ...
>
> I probably should try it and see if I can see what's happening.
>
>> >
>> > If this code was reorganized so that a single semaphore was allocated
>> > outside the loop, does it still fail ?
>> >
>> > CCL:CREATE-SEMAPHORE's work is actually done in the kernel (in the C
>> > function new_semaphore() in "ccl:lisp-kernel;thread_manager.c"). It
>> > uses semaphore_create() under Darwin, but nothing checks the return
>> > value from that call.
>>
>>
>> I noticed this problem when doing far fewer (on the order of 5) of these
>> in succession. I just tried the test with a single semaphore and the
>> results are much better, although not perfect. (2 errors compared to
>> 15.) However, this is not the best solution, since I want the timeout
>> mechanism to be as transparent as possible. Also, I don't want to have
>> to build in a retry loop just to handle the case of the timeout
>> mechanism failing...
>>
>
> The other thing that can cause a (spurious) timeout is a signal.  The
> GC needs to stop other threads from running during the GC; it sends them
> a POSIX signal to do this.
>
> Most BSD system calls (well, "many" if not "most") are set up to quietly
> retry themselves if they're interrupted; the Mach syscall used by
> TIMED-WAIT-ON-SEMAPHORE may not be.  If that's what's happening, then
> TIMED-WAIT-ON-SEMAPHORE should probably notice if it got interrupted,
> try to have some idea of how much longer it -would- have waited if
> it hadn't been interrupted, and wait until the original timeout expires
> or the semaphore's acquired.

And that would be Apple's realm, unless one is hacking Darwin?
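
The "notice the interruption, work out how much time is left, and wait again" idea quoted above can be sketched in portable C. This is only an illustration under stated assumptions, not OpenMCL's kernel code: the wait_result codes, the timed_wait_fn callback type, and the flaky_wait stand-in are all hypothetical names, and the callback stands in for whatever relative-timeout primitive (and whatever "interrupted" status) the real system call provides.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical status codes standing in for the real syscall's results. */
typedef enum { WAIT_ACQUIRED, WAIT_TIMED_OUT, WAIT_INTERRUPTED } wait_result;

/* Hypothetical primitive: wait on `sem` for at most `rel_ns` nanoseconds.
   It may return WAIT_INTERRUPTED early, e.g. when a signal arrives. */
typedef wait_result (*timed_wait_fn)(void *sem, long long rel_ns);

/* Current time on a clock that is not affected by wall-clock changes. */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Fix the deadline once, then keep re-waiting for the remaining time
   until the semaphore is acquired or the original timeout expires. */
wait_result wait_until_deadline(timed_wait_fn wait_fn, void *sem,
                                long long timeout_ns)
{
    assert(wait_fn != NULL);
    long long deadline = now_ns() + timeout_ns;

    for (;;) {
        long long remaining = deadline - now_ns();
        if (remaining <= 0)
            return WAIT_TIMED_OUT;        /* the real timeout has passed */

        wait_result r = wait_fn(sem, remaining);
        if (r != WAIT_INTERRUPTED)
            return r;                     /* acquired, or timed out for real */
        /* Interrupted: loop and wait for whatever time is left. */
    }
}

/* Demo stand-in: reports an interruption on its first two calls and then
   succeeds. `sem` points at an int used as a call counter. */
wait_result flaky_wait(void *sem, long long rel_ns)
{
    int *calls = sem;
    (void)rel_ns;
    return (++*calls < 3) ? WAIT_INTERRUPTED : WAIT_ACQUIRED;
}
```

With an int counter starting at 0, wait_until_deadline(flaky_wait, &counter, 1000000000LL) rides out the two interruptions and returns WAIT_ACQUIRED on the third call, instead of reporting a spurious timeout to the caller.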

>
> (I just did a Google search on "semaphore_timedwait"; one of the few
> pages it found was from OpenMCL's source code.  Oh well.)
>
> As an experiment, you might try disabling the ephemeral GC:
>
> ? (ccl:egc nil)

YUP, that appears to have done the trick. Zero errors. I'm happy now, and 
will try to understand the gory details later.

>
> That'll cause GCs to be a bit longer but less frequent.  If the
> spurious timeouts are being caused by the GC signaling other threads,
> you should get far fewer of them if GC's happening less often.
>
>> > None of this stuff does too much on the lisp side: it's just a pretty
>> > thin wrapper around a few (OS-dependent) system calls.
>> >
>>
>> Yeah, I noticed, it makes it tempting to get in there and play. Would you
>> say this is a benefit of using native threads? That you can fairly easily
>> utilize OS level process api features without too much lisp magic?
>>
>
> Yes, though some of the stuff that the OS provides is kind of ... weak.
> (Unless you ask for the Advanced Model of POSIX mutex, trying to lock
> a mutex that you already own deadlocks.  Etc.  Etc.)
>
> That example means that (dumb) POSIX mutexes wouldn't be good choices
> for lisp locks, but if a lisp thread calls code that waits on a mutex
> or POSIX condition variable or anything like that, it'll behave just like
> a non-lisp thread will.
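
The mutex behavior described above can be seen with a small C sketch. It assumes the "Advanced Model" means the non-default POSIX mutex types, PTHREAD_MUTEX_RECURSIVE and PTHREAD_MUTEX_ERRORCHECK; relock_result is a hypothetical helper written for this illustration. A default mutex is deliberately not exercised, since relocking it from the owning thread may simply hang.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Lock a mutex of the given type twice from the same thread and return
   the result of the second lock attempt. */
int relock_result(int type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    /* Only the two types that cannot hang on a relock are accepted. */
    assert(type == PTHREAD_MUTEX_RECURSIVE ||
           type == PTHREAD_MUTEX_ERRORCHECK);

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, type);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);
    rc = pthread_mutex_lock(&m);    /* second lock by the current owner */
    if (rc == 0)
        pthread_mutex_unlock(&m);   /* undo the nested lock if it succeeded */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

relock_result(PTHREAD_MUTEX_RECURSIVE) returns 0 because the owner may lock again, while relock_result(PTHREAD_MUTEX_ERRORCHECK) returns EDEADLK instead of hanging.
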
>
>>
>> BTW - Are there other os mechanisms that are feasible, such as condition
>> variables? I noticed in the semaphore wait code that it uses the Mach
>> API, rather than sys/semaphore, as the Linux code does.
>
> I haven't even checked in Panther, but in Jaguar POSIX semaphores
> weren't really implemented: the header files declared the functions
> existed and there was some glue in the C library, but the actual
> system calls weren't there.

Oh, yes, this is on Panther.

Thanks for the info and the day-saving gc tip.

Erik.

>
>>
>> Thanks,
>>
>> Erik.



--
Erik Pearson
Adaptations
"Adaptation: It's not just for finches anymore."

