[Bug-openmcl] problem with timed-wait-on-semaphore ?
erik at adaptations.com
Thu Feb 26 14:59:08 MST 2004
Thanks for the detailed post.
Comments interspersed below.
--On Thursday, February 26, 2004 10:24 AM -0700 Gary Byers <gb at clozure.com>
> On Thu, 26 Feb 2004, Erik Pearson wrote:
>> Using "Welcome to OpenMCL Version (Beta: Darwin) 0.14.1!" on an iBook
>> I'm attempting to write a "with-timeout" and friends for running
>> time-constrained code in threads. In the process (you know, the process
>> of programming) I've encountered some odd behavior from
>> timed-wait-on-semaphore. The test function below should create a
>> semaphore, pass it to a function which is run in a new process, which
>> will signal the semaphore when (if) it finishes, and back in the
>> original thread we wait on the semaphore. This is run by another
>> function 10000 times back to back. The timed-wait-on-semaphore should
>> never fail because it is passed a timeout of 10000 and the function does
>> very little (which should complete well within the time limit).
> The limit's in SECONDS, so I'd hope so.
>> The result is the output below, which prints a line for every
>> timed-wait-on-semaphore failure (i.e. timeout), with the "bad count"
>> and "total count" following. (Code is below.)
>> BAD: 1 / 2034
>> BAD: 2 / 2092
>> BAD: 3 / 2149
>> BAD: 4 / 2178
>> BAD: 5 / 2647
>> BAD: 6 / 2809
>> BAD: 7 / 3087
>> BAD: 8 / 3507
>> BAD: 9 / 3551
>> BAD: 10 / 3623
>> BAD: 11 / 3638
>> BAD: 12 / 4013
>> BAD: 13 / 4027
>> BAD: 14 / 4212
>> BAD: 15 / 5132
>> > Break in process listener(1):
>> > While executing: #<Anonymous Function #x513E12E>
>> > Type :GO to continue, :POP to abort.
>> > If continued: Return from BREAK.
>> Type :? for other options.
>> Oops, I gave up because it was taking too long, and it seemed to be slowing down.
>> Of course, timeouts should not really happen under these conditions. So
>> what is happening?
>> Also, something is very slow here: it takes over 10 minutes to
>> complete this test on a 500 MHz iBook, and consumes over 50% CPU. I
>> suppose further tests could reveal what the performance culprit is.
>> Code is:
>> (defun test ()
>>   (let* ((sem (ccl:make-semaphore))
>>          (process (ccl:process-run-function
>>                    "test"
>>                    #'(lambda (s)
>>                        (format nil "Do nothing productive.")
>>                        (ccl:signal-semaphore s))
>>                    sem)))
>>     (ccl:timed-wait-on-semaphore sem 10000)))
>> (defun test2 ()
>>   (let ((bad 0))
>>     (dotimes (i 10000)
>>       (format t "~A~A" #\Return i)
>>       (unless (test)
>>         (incf bad)
>>         (format t "~ABAD: ~A / ~A~&" #\Return bad i)))))
>> Run as
>> (compile-file "test-timeout.cl")
>> (load "test-timeout")
>> Any comments welcome!
> I haven't tried this yet, but ...
> If this code was reorganized so that a single semaphore was allocated
> outside the loop, does it still fail?
> CCL:MAKE-SEMAPHORE's work is actually done in the kernel (in the C
> function new_semaphore() in "ccl:lisp-kernel;thread_manager.c"). It
> uses semaphore_create() under Darwin, but nothing checks the return
> value from that call.
I noticed this problem when doing far fewer (on the order of 5) of these in
succession. I just tried the test with a single semaphore and the results
are much better, although not perfect. (2 errors compared to 15.) However,
this is not the best solution, since I want the timeout mechanism to be as
transparent as possible. Also, I don't want to have to build in a retry
loop just to handle the case of the timeout mechanism failing...
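For reference, the "with-timeout" Erik describes can be built from the same primitives as his test. This is a minimal sketch assuming current Clozure CL semantics: TIMED-WAIT-ON-SEMAPHORE takes a duration in seconds and returns NIL on timeout, and CCL:PROCESS-KILL terminates a process. CALL-WITH-THREAD-TIMEOUT and WITH-THREAD-TIMEOUT are hypothetical names, not part of OpenMCL. It still allocates a fresh semaphore per call, so on 0.14.1 it would inherit the unchecked semaphore_create() problem Gary describes in this thread.

```lisp
;; Hypothetical helper (not an OpenMCL API): run THUNK in its own process
;; and wait at most SECONDS for it.  Returns (values result t) if it
;; finishes in time; otherwise kills the worker and returns (values nil nil).
(defun call-with-thread-timeout (seconds thunk)
  (let* ((done (ccl:make-semaphore))   ; signaled by the worker when finished
         (result nil)
         (worker (ccl:process-run-function
                  "timeout worker"
                  #'(lambda ()
                      (setf result (funcall thunk))
                      (ccl:signal-semaphore done)))))
    ;; Assumed semantics: duration in seconds, NIL return means timeout.
    (if (ccl:timed-wait-on-semaphore done seconds)
        (values result t)
        (progn
          (ccl:process-kill worker)
          (values nil nil)))))

;; Hypothetical macro front end for CALL-WITH-THREAD-TIMEOUT.
(defmacro with-thread-timeout ((seconds) &body body)
  `(call-with-thread-timeout ,seconds #'(lambda () ,@body)))
```

For example, `(with-thread-timeout (5) (+ 1 2))` should return `3` and `T`. If the body signals an unhandled error, the worker enters its own break loop instead of signaling the semaphore, so the caller simply sees a timeout.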
> One of the possible error returns is KERN_RESOURCE_SHORTAGE, which
> is Mach's way of telling you that it's too busy to do what you want.
> new_semaphore() returns the semaphore that it expects semaphore_create()
> to have initialized; if semaphore_create() fails, new_semaphore()'s return
> value is either (a) not a semaphore or (b) some other semaphore that
> just happened to be sitting at that stack address, perhaps leftover
> from the last call.
> Something somewhere should clearly notice the fact that new_semaphore()
> could fail and either tell you about Mach's unfortunate resource problems
> or try again on your behalf. By the time they get exposed to lisp code,
> semaphores are more-or-less first class objects; the GC will free them
> if they become unreferenced, but there's no other advertised way to
> destroy them.
> If your code was reorganized to allocate a single semaphore outside of
> the loop and then repeatedly signal/wait on it, these particular
> scenarios wouldn't likely be involved.
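Gary's suggested reorganization might look like the sketch below, under the same assumptions (timeout in seconds, NIL on timeout). One semaphore is created before the loop and reused for every signal/wait round trip, so semaphore creation drops out of the loop entirely. SHARED-SEMAPHORE-TEST is a hypothetical name.

```lisp
;; Hypothetical variant of TEST2: a single semaphore for all iterations.
;; Returns the number of waits that timed out.
(defun shared-semaphore-test (&optional (iterations 10000))
  (let ((sem (ccl:make-semaphore))   ; allocated once, outside the loop
        (bad 0))
    (dotimes (i iterations bad)
      ;; Each worker signals the shared semaphore exactly once ...
      (ccl:process-run-function "test"
                                #'(lambda (s) (ccl:signal-semaphore s))
                                sem)
      ;; ... and the loop consumes one signal per iteration.
      (unless (ccl:timed-wait-on-semaphore sem 10000)
        (incf bad)
        (format t "~&BAD: ~A / ~A~%" bad (1+ i))))))
```

`(shared-semaphore-test)` runs the full 10000 iterations. As Erik reports earlier in the thread, this arrangement reduced but did not eliminate the timeouts on 0.14.1.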
> None of this stuff does too much on the lisp side: it's just a pretty
> thin wrapper around a few (OS-dependent) system calls.
Yeah, I noticed; it makes it tempting to get in there and play. Would you
say this is a benefit of using native threads? That you can fairly easily
use OS-level process API features without too much Lisp magic?
BTW, are there other OS mechanisms that are feasible, such as condition
variables? I noticed that the semaphore wait code uses the Mach API
rather than sys/semaphore, as the Linux code does.
"Adaptation: It's not just for finches anymore."