Re: [PATCH 10/17] RISC-V: Atomic and Locking Code
From: Arnd Bergmann
Date: Wed Jul 12 2017 - 09:13:26 EST
On Wed, Jul 12, 2017 at 3:31 AM, Palmer Dabbelt <palmer@xxxxxxxxxxx> wrote:
> +/*
> + * FIXME: I'm flip-flopping on whether or not we should keep this or enforce
> + * the ordering with I/O on spinlocks. The worry is that drivers won't get
> + * this correct, but I also don't want to introduce a fence into the lock code
> + * that otherwise only uses AMOs and LR/SC. For now I'm leaving this here:
> + * "w,o" is sufficient to ensure that all writes to the device has completed
> + * before the write to the spinlock is allowed to commit. I surmised this from
> + * reading "ACQUIRES VS I/O ACCESSES" in memory-barriers.txt.
> + */
> +#define mmiowb() __asm__ __volatile__ ("fence o,w" : : : "memory");
Not sure if this has been discussed before, but have you considered doing
it like powerpc with a per-cpu flag that gets checked by spin_unlock()?
> +/*
> + * Relaxed I/O memory access primitives. These follow the Device memory
> + * ordering rules but do not guarantee any ordering relative to Normal memory
> + * accesses. The memory barriers here are necessary as RISC-V doesn't define
> + * any ordering constraints on accesses to the device I/O space. These are
> + * defined to order the indicated access (either a read or write) with all
> + * other I/O memory accesses.
> + */
> +/*
> + * FIXME: The platform spec will define the default Linux-capable platform to
> + * have some extra IO ordering constraints that will make these fences
> + * unnecessary.
> + */
> +#define __iorrmb() __asm__ __volatile__ ("fence i,io" : : : "memory");
> +#define __iorwmb() __asm__ __volatile__ ("fence io,o" : : : "memory");
> +
> +#define readb_relaxed(c) ({ u8 __v = readb_cpu(c); __iorrmb(); __v; })
> +#define readw_relaxed(c) ({ u16 __v = readw_cpu(c); __iorrmb(); __v; })
> +#define readl_relaxed(c) ({ u32 __v = readl_cpu(c); __iorrmb(); __v; })
> +#define readq_relaxed(c) ({ u64 __v = readq_cpu(c); __iorrmb(); __v; })
> +
> +#define writeb_relaxed(v,c) ({ __iorwmb(); writeb_cpu((v),(c)); })
> +#define writew_relaxed(v,c) ({ __iorwmb(); writew_cpu((v),(c)); })
> +#define writel_relaxed(v,c) ({ __iorwmb(); writel_cpu((v),(c)); })
> +#define writeq_relaxed(v,c) ({ __iorwmb(); writeq_cpu((v),(c)); })
What are the ordering rules for a write followed by a read? You say
that there are no ordering constraints on I/O access at all, but you don't
add any barriers between that pair, which is probably the most important
to get right.
Also, can you say here why the architecture is defined like this?
I'm sure that there have been very careful considerations on what
the semantics should be, yet it appears that MMIO access is defined
to be so relaxed purely as an optimization that as a consequence
leads to much slower I/O than a more typical definition as the operating
system has to add much heavier barriers than would otherwise be
needed.
> +/*
> + * I/O memory access primitives. Reads are ordered relative to any
> + * following Normal memory access. Writes are ordered relative to any prior
> + * Normal memory access. The memory barriers here are necessary as RISC-V
> + * doesn't define any ordering between the memory space and the I/O space.
> + * They may be stronger than necessary ("i,r" and "w,o" might be sufficient),
> + * but I feel kind of queasy making these weaker in any manner than the relaxed
> + * versions above.
> + */
> +#define __iormb() __asm__ __volatile__ ("fence i,ior" : : : "memory");
> +#define __iowmb() __asm__ __volatile__ ("fence iow,o" : : : "memory");
> +
> +#define readb(c) ({ u8 __v = readb_cpu(c); __iormb(); __v; })
> +#define readw(c) ({ u16 __v = readw_cpu(c); __iormb(); __v; })
> +#define readl(c) ({ u32 __v = readl_cpu(c); __iormb(); __v; })
> +#define readq(c) ({ u64 __v = readq_cpu(c); __iormb(); __v; })
> +
> +#define writeb(v,c) ({ __iowmb(); writeb_cpu((v),(c)); })
> +#define writew(v,c) ({ __iowmb(); writew_cpu((v),(c)); })
> +#define writel(v,c) ({ __iowmb(); writel_cpu((v),(c)); })
> +#define writeq(v,c) ({ __iowmb(); writeq_cpu((v),(c)); })
These look good to me (once the _relaxed version has been clarified, the
same question applies here as well). I see that you don't override the
PCI I/O space accessors (inb/outb etc), which have even stricter
ordering requirements.
I think you also need a barrier after outb/outw/outl to ensure that the write
has completed on the bus before anything else happens.
Finally, I think you also need to override the string functions
(readsl/writesl/insl/outsl) that are lacking barriers in the asm-generic
version.
Arnd