Re: MM deadlock [was: Re: arca-vm-8...]

MOLNAR Ingo (mingo@chiara.csoma.elte.hu)
Tue, 12 Jan 1999 02:39:56 +0100 (CET)


Linus Torvalds wrote:

> And I don't see any way of getting rid of it without another spinlock. I
> _could_ possibly do it with something like
>
> up(sem) {
>         if (!sem.count)
>                 sem.owner = 0;
>         old_up();
> }
>
> but I can't convince myself that that always works either.

i don't think this is correct, because !sem.count does not mean that we
'drop ownership' of the critical region. All waiters decrease sem.count
without increasing it while sleeping, so the 'owner' does not know the
'depth' of recursion.

i think we can do it without another spinlock, but it's not nearly as
cheap as the original semaphore code. 'owner' is only meaningful to the
owner of the critical region; nobody else is supposed to write or
evaluate that field. We should not access the owner field outside the
critical region at all. So i don't think anything could go wrong there:
the bug was not some subtle multi-CPU race, but the fact that the 'owner'
field was simply wrong after we had exited the critical region.

something like this should work:

new sem.depth field (sigh), and:

down()
{
        if (old_down() == JUST_GOT_OWNERSHIP) {
                sem.owner = esp;
                /* sem.depth == 0 is true here */
        }
}

up()
{
        if (!sem.depth--)
                sem.owner = 0;
        old_up();
}
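
To make the owner/depth bookkeeping easier to trace, here is a minimal
single-threaded model in C. It is only a sketch under loud assumptions:
struct rsem, rsem_down() and rsem_up() are hypothetical names, the 'count'
field is a plain stand-in for old_down()/old_up(), a contended down()
returns 0 instead of sleeping, and nested levels do not touch 'count' at
all (unlike the pseudocode above, where old_up() always runs). It is not
the kernel's semaphore code and not SMP-safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, NOT the kernel's struct semaphore. */
struct rsem {
    int   count;   /* 1 = free, 0 = held (stand-in for the real counter) */
    void *owner;   /* token identifying the current holder, NULL when free */
    int   depth;   /* nested acquisitions beyond the first one */
};

#define RSEM_INIT { 1, NULL, 0 }

/*
 * Returns 1 if the caller now holds the lock, 0 if somebody else holds it.
 * Assumption: a real down() would sleep here instead of returning 0.
 */
static int rsem_down(struct rsem *s, void *me)
{
    if (s->count == 0 && s->owner == me) {
        s->depth++;            /* recursive entry by the current owner */
        return 1;
    }
    if (s->count == 0)
        return 0;              /* held by a different owner */

    s->count = 0;              /* "just got ownership" case */
    s->owner = me;
    assert(s->depth == 0);     /* depth is always 0 on first entry */
    return 1;
}

static void rsem_up(struct rsem *s, void *me)
{
    assert(s->owner == me);    /* only the owner may release */
    if (s->depth > 0) {
        s->depth--;            /* leaving a nested level, still the owner */
        return;
    }
    s->owner = NULL;           /* clear owner while still inside the region */
    s->count = 1;              /* stand-in for old_up() */
}
```

The point of the model is that 'owner' is written only by the holder,
and it is cleared before the release, so it is never left pointing at a
task that has already exited the critical region.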

i think up() could be 4 instructions on newer CPUs and newer compilers:

decl 12(%0) # depth
cmovl $0, 4(%0) # owner with conditional move
lock;
... # old up() stuff

but it's not too cheap. OTOH, the depth-related instructions are free to
be reordered into the critical region, so it's still much cheaper than 2
spinlocks.

Ingo
