[Bug-openmcl] processes not gc'd?

Gary Byers gb at clozure.com
Mon Mar 1 09:42:21 MST 2004



On Mon, 1 Mar 2004, Erik Pearson wrote:

> Thanks for the detailed info, Gary -- comments below.
>
> > for concurrency issues.
>
> Is it a big task to make this stuff thread-safe? And do you think it is
> a long road to get the codebase fully thread-safe?

No and no, but I actually thought that the all_areas list was already
locked.

>
> >
> > That can certainly lead to all kinds of bad behavior.  It's not clear
> > to me whether the fprintf's enabled by DEBUG_THREAD_CLEANUP really tell
> > us what's going on, since I'm not sure how thread-safe fprintf is.
>
> Another interesting fprintf "debug" clue is that the cleanup mischief
> seems to occur after a cluster of interleaved threads. That is, the test
> code runs fine while threads are serialized, or even while a couple of
> them are interleaved, but cleanup seems to stop once the fprintf output
> shows more than three threads active and possibly in the process of
> being cleaned up. Not very scientific, though.
>
> >
> > Even if all of this stuff works reliably, there's still a leak: for no
> > good reason, the objects that sort of sit between a lisp PROCESS and
> > the underlying thread are kept in a weak list that's marked as being
> > finalizable.  (This is leftover from the old cooperative scheduler,
> > where the only way to know that a thread died - and the only way
> > to free up its stacks and other resources - was to be told by the
> > GC that it's about to become garbage.)  The short version is that
> > the GC thinks that a lisp thread needs to be held onto for finalization/
> > termination, but nothing bothers to look at the list of "finalized
> > threads" anymore (there's no reason to do so.)
>
> I removed what I think is this code with no discernible effect (neither
> good nor bad.)

There's a "population" (weak list header) that holds lisp thread
objects (called *LISP-THREAD-POPULATION* or something similar.)  It's
created with support for termination, but (a) it doesn't need to be
and (b) nothing ever processes its "termination queue", so it just
fills up with thread objects that would otherwise be garbage.  The last arg in the
call to %CONS-POPULATION that creates that should be NIL instead of T.

Fixing that won't keep threads from chewing on all_areas at the same
time, but there's probably at least 1KB of data associated with each
thread that's being held onto because the thread is.  That isn't
necessarily everything that TOP claims is there, but it does account
for changes reported by ROOM.  I have gotten the kernel thread stuff
straightened out to the point where things aren't getting chewed on;
it may still need some additional locking to ensure that readers
of the all_areas data structure see a consistent view of it.

> (The old MCL comments in the form of an essay are in
> there too!)

It's all Bill's fault.

