Re: Q: smp.c && barriers (Was: [PATCH 1/4] generic-smp: remove single ipi fallback for smp_call_function_many())

From: Ingo Molnar
Date: Wed Feb 18 2009 - 11:22:39 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Wed, 18 Feb 2009, Nick Piggin wrote:
> >
> > I agree with you both that we *should* make arch interrupt code
> > do the ordering, but given the subtle lockups on some architectures
> > in this new code, I didn't want to make it significantly weaker...
> >
> > Though perhaps it appears that I have, if I have removed an smp_mb
> > that x86 was relying on to emit an mfence to serialise the apic.
>
> The thing is, if the architecture doesn't order IPI wrt cache coherency,
> then the "smp_mb()" doesn't really do so _either_.
>
> It might hide some architecture-specific implementation issue, of course,
> so random amounts of "smp_mb()"s sprinkled around might well make some
> architecture "work", but it's in no way guaranteed. A smp_mb() does not
> guarantee that some separate IPI network is ordered - that may well take
> some random machine-specific IO cycle.
>
> That said, at least on x86, taking an interrupt should be a serializing
> event, so there should be no reason for anything on the receiving side.
> The _sending_ side might need to make sure that there is serialization
> when generating the IPI (so that the IPI cannot happen while the writes
> are still in some per-CPU write buffer and haven't become part of the
> cache coherency domain).
>
> And at least on x86 it's actually pretty hard to generate out-of-order
> accesses to begin with (_regardless_ of any issues external to the CPU).
> You have to work at it, and use a WC memory area, and I'm pretty sure we
> use UC for the apic accesses.

yeah, we do. I do remember an x2apic-related memory ordering
bug from the past 6 months or so though, lemme find the
commit.

This one is it:

d6f0f39: x86: add smp_mb() before sending INVALIDATE_TLB_VECTOR

attached below.
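
The gist of it, as a rough sketch from memory (the full patch is
attached below, and the names here only approximate the real
arch/x86/kernel/tlb_64.c code), is exactly the sending-side
ordering Linus describes: make the flush data globally visible
before the ICR access that raises the IPI:

/*
 * Rough sketch of the shootdown send path that commit touches
 * (structure and identifiers are approximate, see the patch below):
 */
static void flush_tlb_others_sketch(const struct cpumask *cpumask,
				    struct mm_struct *mm, unsigned long va)
{
	int sender = smp_processor_id() % NUM_INVALIDATE_TLB_VECTORS;
	union smp_flush_state *f = &flush_state[sender];

	/* Data the IPI handlers on the target CPUs will read: */
	f->flush_mm = mm;
	f->flush_va = va;
	cpumask_andnot(to_cpumask(f->flush_cpumask),
		       cpumask, cpumask_of(smp_processor_id()));

	/*
	 * The fix: make the stores above globally visible before the
	 * IPI is raised.  With the UC-mapped mmio apic this ordering
	 * came for free; with the x2apic MSR write apparently not.
	 */
	smp_mb();

	send_IPI_mask(to_cpumask(f->flush_cpumask),
		      INVALIDATE_TLB_VECTOR_START + sender);

	while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
		cpu_relax();	/* wait for the targets to clear their bit */
}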

The reason for that is that x2apic changes the access sequence
from mmio (which the old lapic used, and which was mapped UC)
to an MSR write sequence:

static inline void native_x2apic_icr_write(u32 low, u32 id)
{
	wrmsrl(APIC_BASE_MSR + (APIC_ICR >> 4), ((__u64) id) << 32 | low);
}
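
For comparison, the old lapic ICR write was a plain store into the
UC-mapped mmio window, roughly like this (a simplified sketch; the
real native_apic_mem_write() also carries an old-erratum workaround
that I'm omitting here):

static inline void native_apic_mem_write(u32 reg, u32 v)
{
	/* UC mapping keeps these stores strongly ordered on their own */
	*((volatile u32 *)(APIC_BASE + reg)) = v;
}

static inline void native_apic_icr_write(u32 low, u32 id)
{
	native_apic_mem_write(APIC_ICR2, SET_APIC_DEST_FIELD(id));
	native_apic_mem_write(APIC_ICR, low);
}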

But ... WRMSR should already be serializing - it is documented
as a serializing instruction.

I've cc:-ed Suresh & other APIC experts - exactly what type of
hang did that patch fix? Do certain CPUs perhaps cut
serialization corners, to speed up x2apic accesses?

Ingo

------------------->