[Bug-openmcl] problem with timed-wait-on-semaphore ?
Gary Byers
gb at clozure.com
Thu Feb 26 17:40:18 MST 2004
On Thu, 26 Feb 2004, Erik Pearson wrote:
> --On Thursday, February 26, 2004 3:58 PM -0700 Gary Byers <gb at clozure.com>
> wrote:
>
> >> >> Also, something is very slow here, as it takes over 10 minutes to
> >> >> complete this test on a 500mhz ibook, and consumes over 50% cpu. I
> >> >> suppose further tests could reveal what the performance culprit is.
> >> >>
> >
> > I suspect that there may be other factors involved, but creating and
> > destroying 10000 threads is probably never going to be exactly quick ...
> >
>
> This may be true with native threads, but there are some very fast
> non-native thread systems out there (although none that I know of in the
> Lisp world.)

It's possible that some of the overhead could be reduced here: when a
thread's created, a few largish (~1MB) blocks of memory are mapped as
stacks; when it exits, those blocks of memory get unmapped. If you're
going to be creating and destroying a bunch of short-lived threads,
that memory could be managed by the lisp.
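
A minimal sketch of that idea, written in Python purely for illustration
(it is not OpenMCL's code): exited threads hand their stack-sized blocks
back to a small cache instead of unmapping them, and new threads take a
cached block when one is available. The names StackPool, acquire, release
and max_cached are hypothetical; the ~1MB size comes from the paragraph
above.

```python
import mmap
import threading

STACK_SIZE = 1 << 20  # ~1MB per block, as described above


class StackPool:
    """Hypothetical cache of stack-sized anonymous memory mappings."""

    def __init__(self, max_cached=16):
        self._free = []                # blocks released by exited threads
        self._max_cached = max_cached  # cap so idle memory stays bounded
        self._lock = threading.Lock()
        self.mapped = 0                # number of fresh mappings created

    def acquire(self):
        """Return a cached block if there is one, else map a new one."""
        with self._lock:
            if self._free:
                return self._free.pop()
            self.mapped += 1
        return mmap.mmap(-1, STACK_SIZE)  # anonymous mapping

    def release(self, block):
        """Keep the block for reuse, or unmap it if the cache is full."""
        with self._lock:
            if len(self._free) < self._max_cached:
                self._free.append(block)
                return
        block.close()  # unmaps the memory
```

With a pool like this, a burst of short-lived threads would pay the
map/unmap cost only for the first few blocks; the cap on cached blocks is
an assumed policy to keep idle memory from growing without bound.
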
> > Most BSD system calls (well, "many" if not "most") are set up to quietly
> > retry themselves if they're interrupted; the Mach syscall used by
> > TIMED-WAIT-ON-SEMAPHORE may not be. If that's what's happening, then
> > TIMED-WAIT-ON-SEMAPHORE should probably notice if it got interrupted,
> > try to have some idea of how much longer it -would- have waited if
> > it hadn't been interrupted, and wait until the original timeout expires
> > or the semaphore's acquired.
>
> And that would be Apple's realm, unless one is hacking Darwin?
>
Dick Gabriel wrote a paper on this several years ago; he used the case
of interrupted system calls to contrast what he called "the MIT school
of design" with "the New Jersey school". Since UNIX and Mach are
more nearly products of the latter school, it's up to the programmer
(or TIMED-WAIT-ON-SEMAPHORE, in this case) to explicitly check for
an error return that indicates that the wait was prematurely interrupted.
(Gabriel's old paper is at:
<http://www.ai.mit.edu/docs/articles/good-news/good-news.html>.)

> > As an experiment, you might try disabling the ephemeral GC:
> >
> > ? (ccl:egc nil)
>
> YUP, that appears to have done the trick. Zero errors. I'm happy now, and
> will try to understand the gory details later.
>
If a thread's waiting for a semaphore and another thread triggers a GC,
the GC will interrupt the first thread to suspend it (so that it can see
where its registers are pointing and be sure that they're correctly
updated if the GC moves things around in memory.) When the GC's finished,
the thread is allowed to resume.

If the thread was in the middle of a system call when it was interrupted,
the system call would either need to be restarted (with the timeout parameter
adjusted for elapsed time) or aborted (with an appropriate return value.)
It's easier to implement the latter.
>
> Oh, yes, this is on Panther.
>
> Thanks for the info and the day-saving gc tip.

The real fix is to make TIMED-WAIT-ON-SEMAPHORE keep trying until it
either gets the semaphore or the original timeout has elapsed. If
it's just getting interrupted as a side-effect of GC activity, that's
not really very interesting ...
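
The retry logic described above might look roughly like the following
sketch. It is in Python for illustration only and is not the actual
OpenMCL fix. raw_wait is a hypothetical stand-in for the underlying
timed wait (the Mach call in this discussion); it is assumed to take a
timeout in seconds and report one of three outcomes. The status names
and wait_until_deadline are likewise made up for the example.

```python
import time

# Hypothetical outcomes of a single low-level timed wait.
ACQUIRED = "acquired"
TIMED_OUT = "timed-out"
INTERRUPTED = "interrupted"


def wait_until_deadline(raw_wait, timeout, clock=time.monotonic):
    """Wait for up to `timeout` seconds in total, retrying interrupted waits.

    raw_wait(seconds) is assumed to return ACQUIRED, TIMED_OUT or
    INTERRUPTED. Returns True if the semaphore was acquired, False if
    the original timeout elapsed first.
    """
    deadline = clock() + timeout  # fixed once, before the first attempt
    remaining = timeout
    while True:
        status = raw_wait(remaining)
        if status != INTERRUPTED:
            return status == ACQUIRED
        # Interrupted (e.g. by GC suspending the thread): wait again,
        # but only for whatever is left of the original timeout.
        remaining = deadline - clock()
        if remaining <= 0:
            return False
```

Computing the deadline once, up front, is what keeps repeated
interruptions from stretching the total wait beyond the timeout the
caller asked for.
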
(The first theory - that the OS was just running out of semaphores -
doesn't seem to have been the problem, but the lisp should clearly
check to make sure that a semaphore's really been created.)
>
> Erik.