Re: Q: smp.c && barriers (Was: [PATCH 1/4] generic-smp: remove single ipi fallback for smp_call_function_many())

From: Gleb Natapov
Date: Wed Feb 18 2009 - 11:40:55 EST


On Wed, Feb 18, 2009 at 05:21:16PM +0100, Ingo Molnar wrote:
>
> * Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Wed, 18 Feb 2009, Nick Piggin wrote:
> > >
> > > I agree with you both that we *should* make arch interrupt code
> > > do the ordering, but given the subtle lockups on some architectures
> > > in this new code, I didn't want to make it significantly weaker...
> > >
> > > Though perhaps it appears that I have, if I have removed an smp_mb
> > > that x86 was relying on to emit an mfence to serialise the apic.
> >
> > The thing is, if the architecture doesn't order IPI wrt cache coherency,
> > then the "smp_mb()" doesn't really do so _either_.
> >
> > It might hide some architecture-specific implementation issue, of course,
> > so random amounts of "smp_mb()"s sprinkled around might well make some
> > architecture "work", but it's in no way guaranteed. A smp_mb() does not
> > guarantee that some separate IPI network is ordered - that may well take
> > some random machine-specific IO cycle.
> >
> > That said, at least on x86, taking an interrupt should be a serializing
> > event, so there should be no reason for anything on the receiving side.
> > The _sending_ side might need to make sure that there is serialization
> > when generating the IPI (so that the IPI cannot happen while the writes
> > are still in some per-CPU write buffer and haven't become part of the
> > cache coherency domain).
> >
> > And at least on x86 it's actually pretty hard to generate out-of-order
> > accesses to begin with (_regardless_ of any issues external to the CPU).
> > You have to work at it, and use a WC memory area, and I'm pretty sure we
> > use UC for the apic accesses.
>
> yeah, we do. I do remember one x2apic related memory ordering
> bug though which happened in the past 6 months or so, lemme find
> the commit.
>
> This one is it:
>
> d6f0f39: x86: add smp_mb() before sending INVALIDATE_TLB_VECTOR
>
> attached below.
>
> The reason for that is that x2apic changes the access sequence
> from mmio (which old lapic used to be, and which was mapped UC),
> to an MSR sequence:
>
> static inline void native_x2apic_icr_write(u32 low, u32 id)
> {
> 	wrmsrl(APIC_BASE_MSR + (APIC_ICR >> 4), ((__u64) id) << 32 | low);
> }
>
> But ... WRMSR should already be serializing - it is documented
> as a serializing instruction.
>
FWIW this is what the Intel docs say:

To allow for efficient access to the APIC registers in x2APIC mode,
the serializing semantics of WRMSR are relaxed when writing to the APIC
registers. Thus, system software should not use “WRMSR to APIC registers
in x2APIC mode” as a serializing instruction. Read and write accesses
to the APIC registers will occur in program order.
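
So in x2apic mode the WRMSR to the ICR no longer orders the sender's earlier
stores for us, and the barrier has to be explicit. A minimal sketch of the
sending side, with made-up names (per_cpu_call_data, send_work_ipi and
MY_IPI_VECTOR are not real kernel symbols), just to illustrate the pattern
the patch below restores:

	/* Sketch only -- names are hypothetical, not actual kernel code. */
	struct call_data {
		void (*func)(void *info);
		void *info;
	};
	static struct call_data *per_cpu_call_data[NR_CPUS];

	static void send_work_ipi(int cpu, struct call_data *data)
	{
		per_cpu_call_data[cpu] = data;	/* plain store, may still sit
						 * in the store buffer */
		/*
		 * The x2apic MSR write behind the IPI is not serializing,
		 * so make the store above globally visible before the IPI
		 * can be delivered and acted upon.
		 */
		smp_mb();

		send_IPI_mask(cpumask_of(cpu), MY_IPI_VECTOR);
	}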

> I've cc:-ed Suresh & other APIC experts - exactly what type of
> hang did that patch fix? Do certain CPUs perhaps cut
> serialization corners, to speed up x2apic accesses?
>
> Ingo
>
> ------------------->
> From d6f0f39b7d05e62b347c4352d070e4afb3ade4b5 Mon Sep 17 00:00:00 2001
> From: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>
> Date: Tue, 4 Nov 2008 13:53:04 -0800
> Subject: [PATCH] x86: add smp_mb() before sending INVALIDATE_TLB_VECTOR
>
> Impact: fix rare x2apic hang
>
> On x86, x2apic mode accesses for sending IPIs don't have serializing
> semantics. If the IPI receiver refers (in a lock-free fashion) to some
> memory set up by the sender, the need for smp_mb() before sending the
> IPI becomes critical in x2apic mode.
>
> Add the smp_mb() in native_flush_tlb_others() before sending the IPI.
>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@xxxxxxxxx>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
> ---
> arch/x86/kernel/tlb_32.c | 6 ++++++
> arch/x86/kernel/tlb_64.c | 5 +++++
> 2 files changed, 11 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/tlb_32.c b/arch/x86/kernel/tlb_32.c
> index e00534b..f4049f3 100644
> --- a/arch/x86/kernel/tlb_32.c
> +++ b/arch/x86/kernel/tlb_32.c
> @@ -154,6 +154,12 @@ void native_flush_tlb_others(const cpumask_t *cpumaskp, struct mm_struct *mm,
>  	flush_mm = mm;
>  	flush_va = va;
>  	cpus_or(flush_cpumask, cpumask, flush_cpumask);
> +
> +	/*
> +	 * Make the above memory operations globally visible before
> +	 * sending the IPI.
> +	 */
> +	smp_mb();
>  	/*
>  	 * We have to send the IPI only to
>  	 * CPUs affected.
> diff --git a/arch/x86/kernel/tlb_64.c b/arch/x86/kernel/tlb_64.c
> index dcbf7a1..8f919ca 100644
> --- a/arch/x86/kernel/tlb_64.c
> +++ b/arch/x86/kernel/tlb_64.c
> @@ -183,6 +183,11 @@ void native_flush_tlb_others(const cpumask_t *cpumaskp, struct mm_struct *mm,
>  	cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask);
>
>  	/*
> +	 * Make the above memory operations globally visible before
> +	 * sending the IPI.
> +	 */
> +	smp_mb();
> +	/*
>  	 * We have to send the IPI only to
>  	 * CPUs affected.
>  	 */
> --

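And as Linus says above, nothing extra should be needed on the receiving side
on x86, because interrupt delivery itself is serializing. The counterpart of
the sketch above (again with hypothetical names) would just be:

	/* Receiver sketch -- hypothetical handler, not actual kernel code. */
	static void work_ipi_handler(void)
	{
		struct call_data *data = per_cpu_call_data[smp_processor_id()];

		/*
		 * No barrier here: taking the interrupt is a serializing
		 * event on x86, so the sender's pre-IPI stores (published
		 * by its smp_mb()) are already visible at this point.
		 */
		data->func(data->info);
	}
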
--
Gleb.