[Bug-openmcl] problem with timed-wait-on-semaphore ?

Erik Pearson erik at adaptations.com
Thu Feb 26 14:59:08 MST 2004


Thanks for the detailed post.

Comments interspersed below.

--On Thursday, February 26, 2004 10:24 AM -0700 Gary Byers <gb at clozure.com> 
wrote:

>
>
> On Thu, 26 Feb 2004, Erik Pearson wrote:
>
>> Hi,
>>
>> Using "Welcome to OpenMCL Version (Beta: Darwin) 0.14.1!" on a
>> 500 MHz iBook.
>>
>> I'm attempting to write a "with-timeout" and friends for running
>> time-constrained code in threads. In the process (you know, the process
>> of programming) I've encountered some odd behavior from
>> timed-wait-on-semaphore. The test function below should create a
>> semaphore, pass it to a function which is run in a new process, which
>> will signal the semaphore when (if) it finishes, and back in the
>> original thread we wait on the semaphore. This is run by another
>> function 10000 times back to back. The timed-wait-on-semaphore should
>> never fail because it is passed a timeout of 10000 and the function does
>> very little (which should complete well within the time limit).
>
>
> The limit's in SECONDS, so I'd hope so.
>
>>
>> The result is the output below, which prints a line for every
>> timed-wait-on-semaphore failure (i.e. timeout), with the "bad count"
>> and "total count" following. (Code is below.)
>>
>> BAD: 1 / 2034
>> BAD: 2 / 2092
>> BAD: 3 / 2149
>> BAD: 4 / 2178
>> BAD: 5 / 2647
>> BAD: 6 / 2809
>> BAD: 7 / 3087
>> BAD: 8 / 3507
>> BAD: 9 / 3551
>> BAD: 10 / 3623
>> BAD: 11 / 3638
>> BAD: 12 / 4013
>> BAD: 13 / 4027
>> BAD: 14 / 4212
>> BAD: 15 / 5132
>> 7528^C
>> > Break in process listener(1):
>> > While executing: #<Anonymous Function #x513E12E>
>> > Type :GO to continue, :POP to abort.
>> > If continued: Return from BREAK.
>> Type :? for other options.
>>
>> Oops, I gave up because it was taking too long, and seemed to be slowing
>> down.
>>
>> Of course, timeouts should not really happen under these conditions. So
>> what is happening?
>>
>> Also, something is very slow here, as it takes over 10 minutes to
>> complete this test on a 500 MHz iBook, and consumes over 50% CPU. I
>> suppose further tests could reveal what the performance culprit is.
>>
>> Code is:
>>
>> (defun test ()
>>   (let* ((sem (ccl:make-semaphore))
>>          (process (ccl:process-run-function
>>                    "test"
>>                    #'(lambda (s)
>>                        (format nil "Do nothing productive.")
>>                        (ccl:signal-semaphore s))
>>                    sem)))
>>     (ccl:timed-wait-on-semaphore sem 10000)))
>>
>> (defun test2 ()
>>   (let ((bad 0))
>>     (dotimes (i 10000)
>>       (format t "~A~A" #\Return i)
>>       (unless (test)
>>         (incf bad)
>>         (format t "~ABAD: ~A / ~A~&" #\Return bad i)))))
>>
>>
>> Run as
>>
>> (compile-file "test-timeout.cl")
>> (load "test-timeout")
>> (test2)
>>
>>
>> Any comments welcome!
>>
>> Thanks,
>>
>> Erik.
>> _______________________________________________
>> Bug-openmcl mailing list
>> Bug-openmcl at clozure.com
>> http://clozure.com/mailman/listinfo/bug-openmcl
>>
>>
>
> I haven't tried this yet, but ...
>
> If this code was reorganized so that a single semaphore was allocated
> outside the loop, does it still fail ?
>
> CCL:CREATE-SEMAPHORE's work is actually done in the kernel (in the C
> function new_semaphore() in "ccl:lisp-kernel;thread_manager.c"). It
> uses semaphore_create() under Darwin, but nothing checks the return
> value from that call.


I first noticed this problem when running far fewer of these (on the order 
of 5) in succession. I just tried the test with a single semaphore and the 
results are much better, although not perfect (2 errors compared to 15). 
However, this is not the best solution, since I want the timeout mechanism 
to be as transparent as possible. Also, I don't want to have to build in a 
retry loop just to handle the case of the timeout mechanism failing...
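
For reference, a minimal sketch of the single-semaphore reorganization 
suggested above. It reuses the calls from the original test; the names 
TEST-SHARED and TEST2-SHARED and the optional iteration count are 
hypothetical, chosen only for illustration, and this is not necessarily the 
exact code that produced the numbers above.

```lisp
;; Sketch only: TEST-SHARED and TEST2-SHARED are hypothetical names,
;; not part of the original test file.

(defun test-shared (sem)
  "Start a trivial worker that signals SEM, then wait on SEM.
The timeout passed to TIMED-WAIT-ON-SEMAPHORE is in seconds."
  (ccl:process-run-function "test"
                            #'(lambda (s)
                                (format nil "Do nothing productive.")
                                (ccl:signal-semaphore s))
                            sem)
  (ccl:timed-wait-on-semaphore sem 10000))

(defun test2-shared (&optional (n 10000))
  "Call TEST-SHARED N times on one shared semaphore.
Returns the number of waits that timed out."
  (let ((sem (ccl:make-semaphore)) ; allocated once, outside the loop
        (bad 0))
    (dotimes (i n bad)
      (unless (test-shared sem)
        (incf bad)
        (format t "BAD: ~A / ~A~%" bad i)))))
```

One caveat with sharing the semaphore: if a wait does time out and its 
worker signals later, that late signal can satisfy a subsequent wait, so the 
count from this variant may understate the number of failures.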

>
> One of the possible error returns is KERN_RESOURCE_SHORTAGE, which
> is Mach's way of telling you that it's too busy to do what you want.
> new_semaphore() returns the semaphore that it expects semaphore_create()
> to have initialized; if semaphore_create() fails, new_semaphore()'s return
> value is either (a) not a semaphore or (b) some other semaphore that
> just happened to be sitting at that stack address, perhaps leftover
> from the last call.
>
> Something somewhere should clearly notice the fact that new_semaphore()
> could fail and either tell you about Mach's unfortunate resource problems
> or try again on your behalf.  By the time they get exposed to lisp code,
> semaphores are more-or-less first class objects; the GC will free them
> if they become unreferenced, but there's no other advertised way to
> destroy them.
>
> If your code was reorganized to allocate a single semaphore outside of
> the loop and then repeatedly signal/wait on it, these particular
> scenarios wouldn't likely be involved.
>
> None of this stuff does too much on the lisp side: it's just a pretty
> thin wrapper around a few (OS-dependent) system calls.
>

Yeah, I noticed. It makes it tempting to get in there and play. Would you 
say this is a benefit of using native threads, that you can fairly easily 
use OS-level process API features without too much Lisp magic?


BTW - Are there other OS mechanisms that are feasible, such as condition 
variables? I noticed that the semaphore wait code uses the Mach API, 
rather than sys/semaphore, as the Linux code does.

Thanks,

Erik.



--
Erik Pearson
Adaptations
"Adaptation: It's not just for finches anymore."

