Re: [PATCH 01/20] asm-generic/mmiowb: Add generic implementation of mmiowb() tracking

From: Nicholas Piggin
Date: Sat Mar 02 2019 - 20:43:31 EST


Will Deacon's on March 2, 2019 12:03 am:
> In preparation for removing all explicit mmiowb() calls from driver
> code, implement a tracking system in asm-generic based loosely on the
> PowerPC implementation. This allows architectures with a non-empty
> mmiowb() definition to have the barrier automatically inserted in
> spin_unlock() following a critical section containing an I/O write.

Is there a reason to call this "mmiowb"? We already have wmb() to
order cacheable stores vs. mmio stores, don't we?
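
For example, something like the following (foo_dev, FOO_REG etc. are
made up, this is just a sketch of the shape of pattern I mean):

/*
 * Sketch only: the wmb() orders the mmio store above it against the
 * later cacheable store that releases the lock.
 */
static void foo_kick(struct foo_dev *dev, u32 val)
{
        spin_lock(&dev->lock);
        writel(val, dev->regs + FOO_REG);       /* mmio store */
        wmb();                                  /* cacheable vs mmio ordering */
        spin_unlock(&dev->lock);                /* cacheable store (lock release) */
}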

Yes ia64 "sn2" is broken in that case, but that can be fixed (if
anyone really cares about the platform any more). Maybe that's
orthogonal to what you're doing here, I just don't like seeing
"mmiowb" spread.

This series works for spin locks, but you would want a driver to
be able to use wmb() to order locks vs. mmio when it is using a bit
lock or a mutex or whatever else. Calling your wmb-if-io-is-pending
version io_mb_before_unlock() would kind of match existing
patterns.
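
i.e. something like this for the mutex case (everything here except
the proposed io_mb_before_unlock() name is made up for illustration):

/*
 * Sketch of the naming I mean: a driver doing mmio under a mutex
 * rather than a spinlock calls the helper by hand before dropping
 * the lock.
 */
static void foo_update(struct foo_dev *dev, u32 val)
{
        mutex_lock(&dev->mutex);
        writel(val, dev->regs + FOO_REG);
        io_mb_before_unlock();          /* wmb-if-io-is-pending */
        mutex_unlock(&dev->mutex);
}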

> +static inline void mmiowb_set_pending(void)
> +{
> +        struct mmiowb_state *ms = __mmiowb_state();
> +        ms->mmiowb_pending = ms->nesting_count;
> +}
> +
> +static inline void mmiowb_spin_lock(void)
> +{
> +        struct mmiowb_state *ms = __mmiowb_state();
> +        ms->nesting_count++;
> +}
> +
> +static inline void mmiowb_spin_unlock(void)
> +{
> +        struct mmiowb_state *ms = __mmiowb_state();
> +
> +        if (unlikely(ms->mmiowb_pending)) {
> +                ms->mmiowb_pending = 0;
> +                mmiowb();
> +        }
> +
> +        ms->nesting_count--;
> +}

Humour me for a minute and tell me what this algorithm is doing, or
what was broken about the powerpc one, which is basically:

static inline void mmiowb_set_pending(void)
{
        struct mmiowb_state *ms = __mmiowb_state();
        ms->mmiowb_pending = 1;
}

static inline void mmiowb_spin_lock(void)
{
}

static inline void mmiowb_spin_unlock(void)
{
        struct mmiowb_state *ms = __mmiowb_state();

        if (unlikely(ms->mmiowb_pending)) {
                ms->mmiowb_pending = 0;
                mmiowb();
        }
}
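
For reference, my understanding of where these hooks end up sitting
(very rough sketch, not the real code; in the series the set_pending
call comes from the arch's mmio accessors and the spin hooks from the
spinlock code):

/*
 * Rough sketch of the call sites as I read the series, just to show
 * how the three hooks cooperate.
 */
void writel(u32 val, volatile void __iomem *addr)
{
        __raw_writel(val, addr);
        mmiowb_set_pending();           /* remember an mmio store happened */
}

void spin_lock(spinlock_t *lock)
{
        arch_spin_lock(lock);
        mmiowb_spin_lock();             /* your version: nesting_count++ */
}

void spin_unlock(spinlock_t *lock)
{
        mmiowb_spin_unlock();           /* mmiowb() only if a store is pending */
        arch_spin_unlock(lock);
}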

> diff --git a/include/asm-generic/mmiowb_types.h b/include/asm-generic/mmiowb_types.h
> new file mode 100644
> index 000000000000..8eb0095655e7
> --- /dev/null
> +++ b/include/asm-generic/mmiowb_types.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_GENERIC_MMIOWB_TYPES_H
> +#define __ASM_GENERIC_MMIOWB_TYPES_H
> +
> +#include <linux/types.h>
> +
> +struct mmiowb_state {
> +        u16     nesting_count;
> +        u16     mmiowb_pending;
> +};

Do we really need more than 255 nested spin locks? I had the idea that
16-bit operations were a bit more costly than 8-bit on some CPUs... that
may not be true, but at least the smaller size packs a bit better on
powerpc.
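
That is, I'm thinking of something like this instead, which packs into
half the space:

struct mmiowb_state {
        u8      nesting_count;
        u8      mmiowb_pending;
};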

Thanks,
Nick