Re: [PATCH v2 1/2] iopoll: Call cpu_relax() in busy loops

From: Tony Lindgren
Date: Thu May 11 2023 - 02:49:03 EST

* Geert Uytterhoeven <geert+renesas@xxxxxxxxx> [230510 13:23]:
> It is considered good practice to call cpu_relax() in busy loops, see
> Documentation/process/volatile-considered-harmful.rst. This can not
> only lower CPU power consumption or yield to a hyperthreaded twin
> processor, but also allows an architecture to mitigate hardware issues
> (e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the
> architecture-specific cpu_relax() implementation.
> In addition, cpu_relax() is also a compiler barrier. It is not
> immediately obvious that the @op argument "function" will result in an
> actual function call (e.g. in case of inlining).
> Where a function call is a C sequence point, this is lost on inlining.
> Therefore, with aggressive enough optimization it might be possible for
> the compiler to hoist the:
>         (val) = op(args);
> "load" out of the loop because it doesn't see the value changing. The
> addition of cpu_relax() would inhibit this.
> As the iopoll helpers lack calls to cpu_relax(), people are sometimes
> reluctant to use them, and may fall back to open-coded polling loops
> (including cpu_relax() calls) instead.
> Fix this by adding calls to cpu_relax() to the iopoll helpers:
>   - For the non-atomic case, it is sufficient to call cpu_relax() in
>     case of a zero sleep-between-reads value, as a call to
>     usleep_range() is a safe barrier otherwise. However, it doesn't
>     hurt to add the call regardless, for simplicity, and for similarity
>     with the atomic case below.
>   - For the atomic case, cpu_relax() must be called regardless of the
>     sleep-between-reads value, as there is no guarantee all
>     architecture-specific implementations of udelay() handle this.
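
For illustration only, the shape of a busy-poll loop with a relax call
between reads can be sketched in userspace C as below. This is not the
iopoll.h code: cpu_relax_sketch() is a stand-in that provides only the
compiler-barrier part of cpu_relax(), the retry counter replaces the
real timeout handling, and fake_dev, fake_read_status(), poll_sketch()
and poll_until_ready() are hypothetical names made up for the example.
It relies on GNU C extensions (statement expressions, inline asm).

```c
#include <stdint.h>

/* Stand-in for cpu_relax(): compiler barrier only (assumption, not arch code). */
static inline void cpu_relax_sketch(void)
{
	__asm__ __volatile__("" ::: "memory");
}

/* Hypothetical device whose status reads 0 until polled ready_after times. */
struct fake_dev {
	unsigned int reads;
	unsigned int ready_after;
};

static inline uint32_t fake_read_status(struct fake_dev *dev)
{
	dev->reads++;
	return dev->reads >= dev->ready_after ? 1u : 0u;
}

/*
 * Simplified poll loop: read, test the condition, give up after
 * max_iters retries, otherwise relax and read again.
 * Evaluates to 0 on success and -1 when the retries run out.
 */
#define poll_sketch(op, val, cond, max_iters, ...)		\
({								\
	unsigned int left_ = (max_iters);			\
	int ret_ = -1;						\
	for (;;) {						\
		(val) = op(__VA_ARGS__);			\
		if (cond) {					\
			ret_ = 0;				\
			break;					\
		}						\
		if (left_-- == 0)				\
			break;					\
		/* Barrier between reads: forces the next op() to re-read. */ \
		cpu_relax_sketch();				\
	}							\
	ret_;							\
})

/* Poll the fake device and report how many reads were issued. */
static int poll_until_ready(unsigned int ready_after, unsigned int max_iters,
			    unsigned int *reads_out)
{
	struct fake_dev dev = { .reads = 0, .ready_after = ready_after };
	uint32_t val;
	int ret = poll_sketch(fake_read_status, val, val != 0, max_iters, &dev);

	*reads_out = dev.reads;
	return ret;
}
```

In this sketch the barrier sits after the exit checks, so a successful
first read never pays for it, and every retry is preceded by one.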

Reviewed-by: Tony Lindgren <tony@xxxxxxxxxxx>