Re: [PATCH v2] iopoll: use udelay() for initial polling

From: Peter Collingbourne

Date: Fri May 29 2026 - 02:03:35 EST

On Fri, May 22, 2026 at 7:51 AM Mark Brown <broonie@xxxxxxxxxx> wrote:
>
> On Thu, May 21, 2026 at 10:03:28AM +0300, Jani Nikula wrote:
> > On Wed, 20 May 2026, Peter Collingbourne <peter@xxxxxxxxx> wrote:
>
> > > That's what I had in v1; we decided this approach would better handle
> > > misbehaving devices.
> > > https://lore.kernel.org/all/20260517150253.031dec09@pumpkin/
>
> > I think the problem with trying to adapt to everything within
> > read_poll_timeout() is that every step like this adapts to a *specific*
> > use case, and once it gets specific enough, it's no longer usable to
> > other scenarios.
>
> > Having to reimplement the whole thing in drivers is much worse than
> > having to do two calls.
>
> > Could a staggered approach work?
>
> > ret = read_poll_timeout_atomic("short delay/timeout")
> > if (ret)
> > ret = read_poll_timeout("longer delay/timeout")
>
> > Then you have better control of the behaviour in the driver, instead of
> > adapting a generic function to a specific use case.
>
> If we're doing that it still seems like it'd be better to have a helper
> than just takes the two timeouts and does the right thing with them. My
> original feedback here was that we should have helpers which encourage
> good patterns, and TBH I expect there's probably a bunch of scenarios
> where some fixed cutoff would be a worthwile improvement for very little
> effort on the part of users.

Thinking about it some more:

Do we really want to udelay() for the initial busy wait? Assuming that
we've decided to spend some time burning cycles, I think we may as
well use those cycles to check the condition so that we have a chance
of an early exit.

As an experiment, I open coded it in the SPI NAND driver:

diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
index a09371a075d2..cce34ada7611 100644
--- a/drivers/spi/spi-mem.c
+++ b/drivers/spi/spi-mem.c
@@ -1005,6 +1005,20 @@ int spi_mem_poll_status(struct spi_mem *mem,
usleep_range((initial_delay_us >> 2) + 1,
initial_delay_us);

+ {
+ ktime_t __start_time = ktime_get();
+ u64 __delay_timeout_us = 10;
+ ktime_t __delay_timeout =
+ ktime_add_us(__start_time, __delay_timeout_us);
+ while (ktime_compare(ktime_get(), __delay_timeout) <
+ 0) {
+ ret = spi_mem_read_status(mem, op, &status);
+ if (ret < 0)
+ return ret;
+ if ((status & mask) == match)
+ return 0;
+ }
+ }
ret = read_poll_timeout(spi_mem_read_status, read_status_ret,
(read_status_ret || ((status)
& mask) == match),
polling_delay_us, timeout_ms *
1000, false, mem,

And it did in fact improve performance on my target compared to status
quo and even v2.

With hrtimers enabled, timing the UBI scan, 3 runs of each:
- With no changes: 0.784s, 0.779s, 0.777s
- With my v2: 0.687s, 0.686s, 0.687s
- With the above patch: 0.665s, 0.665s, 0.665s

So 3 options:
1. Take my v2 and change read_poll_timeout() to do the check during
the busy wait.
2. Adjust read_poll_timeout_atomic() to busy wait using the check
instead of udelay(), then change the SPI NAND driver to do the 2
calls.
3. Introduce a new macro that does the 2 calls, and have the SPI NAND
driver use it.

I don't have a strong opinion, but my preference would be either 1 or
3. I agree with Mark that it would make the preferred usage code
easier to write (and to read), as there would be no need to duplicate
the argument list or compute the timeouts correctly in every caller.

Peter