Re: [RFC, v2] powerpc/powernv: Introduce kernel param to control fastsleep workaround behavior

From: Michael Ellerman
Date: Tue Mar 17 2015 - 04:57:19 EST

On Tue, 2015-17-03 at 04:13:53 UTC, "Shreyas B. Prabhu" wrote:
> Fastsleep is one of the idle states which the cpuidle subsystem currently
> uses on power8 machines. In this state L2 cache is brought down to a
> threshold voltage. Therefore when the core is in fastsleep, the
> communication between L2 and L3 needs to be fenced. But there is a bug
> in the current power8 chips surrounding this fencing. OPAL provides an
> interface to work around this bug, and in the current implementation,
> every time before a core enters fastsleep an OPAL call is made to 'apply'
> the workaround and when the core wakes up from fastsleep an OPAL call is
> made to 'undo' the workaround. These OPAL calls account for roughly
> 4000 cycles every time the core has to enter or wake up from fastsleep.

OK. The bit you don't explain is that while the workaround is applied there is
a risk ...

> The other alternative is to apply this workaround once at boot, and not
> undo it at all. While this would quicken fastsleep entry/wakeup path,
> downside is, any correctable error detected in L2 directory will result
> in a checkstop.

Of this happening.

Which is why we don't just always apply the workaround. Am I right?

> This patch adds a new kernel parameter
> pnv_fastsleep_workaround_once, which can be used to override
> the default behavior and apply the workaround once at boot and not undo
> it.

So my first preference is that you just bite the bullet and decide to either
always apply the workaround, or just stick with the current behaviour. That's a
trade-off between (I think) better idle latency but a risk of checkstops, vs.
slower idle latency but less (how much less?) risk of checkstops.

I think the reason you're proposing a kernel parameter is because we aren't
willing to make that decision, ie. we're saying that users should decide. Is
that right?

I'm not a big fan of kernel parameters. They are a pain to use, and are often
just pushing a decision down one layer for no reason. What I mean is that
individual users are probably just going to accept whatever the default value
is from their distro.

But anyway, that's a bit of a rant.

As far as this patch is concerned, I don't think it actually needs to be a
kernel parameter.

From what I can see below, the decision as to whether you apply the workaround
or not doesn't affect the list of idle states. So this could just as well be a
runtime parameter, ie. a sysfs file, which can then be set by the user whenever
they like? They might do it in a boot script, but that's up to them.

For simplicity I think it would also be fine to make it a write-once parameter,
ie. you don't need to handle undoing it.

I think the only complication that would add is that you'd need to be a little
careful about the order in which you nop out the calls vs applying the
workaround, in case some threads are idle when you're called.
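
To illustrate what I mean by write-once, here is a rough userspace sketch in C.
It is not kernel code and is untested against the real idle path. Every name in
it (workaround_applyonce, applyonce_store, fake_apply_workaround,
fastsleep_needs_per_idle_calls) is made up for illustration, and the "apply"
function is only a stand-in for whatever OPAL call the patch ends up using. It
models only the one-way 0 -> 1 transition. It deliberately does not try to
solve the ordering against threads that are already idle.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical state, not from the patch:
 *   0 = apply/undo the workaround around every fastsleep (current behaviour)
 *   1 = workaround applied once and left in place
 */
static atomic_int workaround_applyonce;

/* Counts calls so the behaviour can be observed; stand-in for the OPAL call. */
static int apply_calls;

static void fake_apply_workaround(void)
{
	apply_calls++;
}

/*
 * Models what a sysfs store handler might do with the parsed value.
 * Returns 0 on success and -1 on a rejected write (a real handler would
 * presumably return -EINVAL).
 */
static int applyonce_store(int val)
{
	int expected = 0;

	if (val != 0 && val != 1)
		return -1;

	/* Writing 0 is only OK while we are still in the default mode. */
	if (val == 0)
		return atomic_load(&workaround_applyonce) ? -1 : 0;

	/*
	 * Only the writer that wins the 0 -> 1 transition applies the
	 * workaround, so repeated or racing writes of 1 apply it once.
	 * NOTE: this sketch publishes the flag before applying, which is
	 * exactly the kind of ordering that needs care in the real code.
	 */
	if (atomic_compare_exchange_strong(&workaround_applyonce, &expected, 1))
		fake_apply_workaround();

	return 0;
}

/* What the idle entry/exit path would check before making its own calls. */
static bool fastsleep_needs_per_idle_calls(void)
{
	return atomic_load(&workaround_applyonce) == 0;
}
```

With this model, writing 1 twice applies the workaround once, and a later write
of 0 is rejected rather than undone, which is all I mean by not needing to
handle undoing it.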
