Re: [PATCH 1/6] x86-mce: Modify CMCI poll interval to adjust for small check_interval values.

From: Borislav Petkov
Date: Fri Jul 11 2014 - 16:22:23 EST


On Fri, Jul 11, 2014 at 11:56:11AM -0700, Havard Skinnemoen wrote:
> > * max number of CMCIs per second a system can sustain fine, i.e. the 100
> > above
>
> What's the definition of "fine"? 1% performance hit? 10%? How can we
> make that decision without knowing how hard the users are pushing
> their systems?

Those are definitely unchartered territories we're moving into so yes,
answering that won't be easy.

Let's try it: if the anount of time we spend per second in the CMCI
handler becomes higher than the amount of time we spend polling for that
same second, then we might just as well poll and save us the interrupt
load.

So, for example, say for 10ms poll rate and single poll duration of
2ms, if time spent in CMCI exceeds 200ms for that second, we switch to
polling. Hypothetical numbers, of course.

Can we measure it on every system? Probably not. So we need to
approximate it somehow.

>
> > * total polling duration during storm, i.e. the 1 second above
> >
> > and if those are chosen generously for every system out there, then we
> > don't need to dynamically adjust the polling interval.
>
> I'm not sure how we can be generous when there's a tradeoff involved.
> If we make the interval "generously low", we end up hurting
> performance. if we make it "generously high", we'll lose information.

Yeah, by "generous" I meant, choose values which fit all. But I realize
now that this is a dumb idea. Maybe we could measure it on each system,
read the TSC on CMCI entry and exit and thus get an average CMCI
duration...

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/