Re: [BUG] long freezes on thinkpad t60

From: Ravikiran G Thirumalai
Date: Tue Jun 19 2007 - 00:22:54 EST


On Mon, Jun 18, 2007 at 01:20:55AM -0700, Andrew Morton wrote:
> On Mon, 18 Jun 2007 10:12:04 +0200 Ingo Molnar <mingo@xxxxxxx> wrote:
>
> > ---------------------------------------------------->
> > Subject: [patch] x86: fix spin-loop starvation bug
> > From: Ingo Molnar <mingo@xxxxxxx>
> >
> > Miklos Szeredi reported very long pauses (several seconds, sometimes
> > more) on his T60 (with a Core2Duo) which he managed to track down to
> > wait_task_inactive()'s open-coded busy-loop. He observed that an
> > interrupt on one core tries to acquire the runqueue-lock but does not
> > succeed in doing so for a very long time - while wait_task_inactive() on
> > the other core loops waiting for the first core to deschedule a task
> > (which it wont do while spinning in an interrupt handler).
> >
> > The problem is: both the spin_lock() code and the wait_task_inactive()
> > loop uses cpu_relax()/rep_nop(), so in theory the CPU should have
> > guaranteed MESI-fairness to the two cores - but that didnt happen: one
> > of the cores was able to monopolize the cacheline that holds the
> > runqueue lock, for extended periods of time.
> >
> > This patch changes the spin-loop to assert an atomic op after every REP
> > NOP instance - this will cause the CPU to express its "MESI interest" in
> > that cacheline after every REP NOP.
>
> Kiran, if you're still able to reproduce that zone->lru_lock starvation problem,
> this would be a good one to try...

We tried this approach a week back (speaking of coincidences), and it did
not help. I'd changed the spin_lock calls on zone->lru_lock to do
spin_trylock in a while loop with cpu_relax instead. This was on top of
2.6.17 kernels. But the good news is that 2.6.21, as is, does not have the
starvation issue -- that is, zone->lru_lock does not seem to get contended
that much under the same workload.
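
For illustration, the trylock-in-a-loop-with-cpu_relax pattern described
above can be sketched in userspace C11 roughly as follows. This is not the
patch that was tested: the demo_* names are hypothetical stand-ins for the
kernel's spinlock_t, spin_trylock(), spin_unlock() and cpu_relax(), and the
atomic_flag lock is only an assumption about how such a lock might be
modelled outside the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's spinlock_t. */
typedef struct {
	atomic_flag locked;
} demo_spinlock;

/* Stand-in for cpu_relax(): REP NOP (PAUSE) on x86, else a compiler barrier. */
static inline void demo_cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("rep; nop" ::: "memory");
#else
	atomic_signal_fence(memory_order_seq_cst);
#endif
}

/* One atomic attempt to take the lock; returns true on success. */
static inline bool demo_spin_trylock(demo_spinlock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->locked,
						  memory_order_acquire);
}

static inline void demo_spin_unlock(demo_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/*
 * Replacement for a plain lock call: retry the atomic trylock and
 * relax between attempts, so every iteration issues an atomic op
 * on the lock's cacheline.
 */
static inline void demo_spin_lock_relaxed(demo_spinlock *l)
{
	while (!demo_spin_trylock(l))
		demo_cpu_relax();
}
```

A caller would take the lock with demo_spin_lock_relaxed(&lock) and drop it
with demo_spin_unlock(&lock), where the lock was initialised with
ATOMIC_FLAG_INIT.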

However, this was not on the same hardware I reported zone->lru_lock
contention on (8-socket dual-core Opteron). I don't have access to it
anymore :(

Thanks,
Kiran