Re: [PATCH] x86: add workaround monitor bug

From: Jacob Pan
Date: Fri Jul 08 2016 - 15:14:45 EST


On Fri, 8 Jul 2016 14:07:12 +0200
Ingo Molnar <mingo@xxxxxxxxxx> wrote:

>
> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > On Fri, Jul 08, 2016 at 10:55:15AM +0200, Ingo Molnar wrote:
> >
> > > > static inline void mwait_idle_with_hints(unsigned long eax,
> > > > unsigned long ecx) {
> > > > - if (!current_set_polling_and_test()) {
> > > > + if (static_cpu_has_bug(X86_BUG_MONITOR)
> > > > || !current_set_polling_and_test()) {
> > >
> > > Hm, this might be suboptimal: if MONITOR/MWAIT is implemented by
> > > setting the exclusive flag for the monitored memory address and
> > > then snooping for cache invalidation requests for that cache
> > > line, then not modifying the ->flags value with
> > > TIF_POLLING_NRFLAG makes MWAIT not wake up - only the IPI would
> > > wake it up.
> >
> > Confused.. POLLING_NRFLAGS is not used to wake up ever. It is only
> > used to determine if we want to send IPIs or not.
>
> I called the IPI the 'wakeup' - it's the 'CPU wakeup' :-)
>
> > And since we _must_ send an IPI in this case, because the monitor is
> > busted, we cannot set this.
> >
> > > I think a better approach would be to still optimistically modify
> > > the ->flags value _AND_ to also send an IPI, to make sure the
> > > wakeup is not lost. This means that the woken CPU will wake up
> > > much faster (no IPI latency).
> >
> > This is exactly what is done. See resched_curr()'s use of
> > set_nr_and_not_polling(). That does:
> >
> > if (!(fetch_or(&flags, NEED_RESCHED) & POLLING_NRFLAG))
> > smp_send_reschedule(cpu);
> >
> > So we unconditionally set NEED_RESCHED, if, when we set that,
> > POLLING was set, we skip the IPI.
>
> Ah, indeed, we set NEED_RESCHED in the same memory address that
> __monitor() is watching so all is good.
>
> > So again, since monitor is busted, simply setting NEED_RESCHED will
> > not wake us, we must send the IPI, this is achieved by not setting
> > POLLING_NRFLAG.
>
> Yeah, so I got the impression that it might be broken in only certain
> circumstances, or is it completely busted?
>
That is right, monitor is only partially broken not completely busted. I
don't have the statistics but it is not rare to miss wakeup. The
typical symptom is random slowness and fails to boot, without this
patch.

So doing both can speed up wake up in some cases.

Jacob
> Thanks,
>
> Ingo

[Jacob Pan]