Re: idle issues running sembench on 128 cpus

From: Andi Kleen
Date: Wed May 04 2011 - 19:04:00 EST


> No, it does not even need refcounting. We can access it outside of the

Ok.

> lock as this is atomic context called on the cpu which is about to go
> idle and therefor the device cannot go away. Easy and straightforward
> fix.

Ok. Patch appended. Looks good?

BTW why must the lock be irqsave?

>
> > But yes it would be still good to fix Nehalem too.
> >
> > One fix would be to make all the masks hierarchical,
> > similar to what RCU does. Perhaps even some code
> > could be shared with RCU on that because it's a very
> > similar problem.
>
> In theory. It's not about the mask. The mask is uninteresting. It's
> about the expiry time, which we have to protect. There is nothing
> hierarchical about that. It all boils down on _ONE_ single functional

The mask can be used to see if another thread on this core is still
running. If yes you don't need that. Right now Linux doesn't
know that, but it could be taught. The only problem is that once
the other guy goes idle too their timeouts have to be merged.

This would cut contention in half.

Also if it's HPET you could actually use multiple independent HPET channels.
I remember us discussing this a long time ago... Not sure if it's worth
it, but it may be a small relief.

> device and you don't want to miss out your deadline just because you
> decided to be extra clever. RCU does not care much whether you run the
> callbacks a tick later on not. Time and timekeeping does.

You can at least check lockless if someone else has a <= timeout, right?

-Andi

---

Move C3 stop test outside lock

Avoid taking locks in the idle path for systems where the timer
doesn't stop in C3.

Signed-off-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index da800ff..9cf0415 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -456,23 +456,22 @@ void tick_broadcast_oneshot_control(unsigned long reason)
unsigned long flags;
int cpu;

- raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
-
/*
* Periodic mode does not care about the enter/exit of power
* states
*/
if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
- goto out;
+ return;

+ cpu = raw_smp_processor_id();
bc = tick_broadcast_device.evtdev;
- cpu = smp_processor_id();
td = &per_cpu(tick_cpu_device, cpu);
dev = td->evtdev;

if (!(dev->features & CLOCK_EVT_FEAT_C3STOP))
- goto out;
+ return;

+ raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
if (reason == CLOCK_EVT_NOTIFY_BROADCAST_ENTER) {
if (!cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/