Re: [PATCH v2] iopoll: use udelay() for initial polling

From: David Laight

Date: Sat May 30 2026 - 06:20:38 EST


On Fri, 29 May 2026 22:56:23 +0100
Mark Brown <broonie@xxxxxxxxxx> wrote:

> On Fri, May 29, 2026 at 12:20:16PM +0100, David Laight wrote:
>
> > I think I remember someone saying that the spi hardware interface normally
> > generates an interrupt when the request completes?
> > So for spi this is only fall-back code for a few systems.
>
> For something like spi-mem I'd expect to have interrupt support, we go
> all the way down to fully bitbanged though. You also often have
> copybreak cutoffs for smaller transfers (and quicker operations) where
> it's more efficient to poll for completion than use an interrupt, or
> PIO rather than DMA.

I do wonder about copybreak cutoffs, especially for systems with an iommu
(which these days includes x86 systems).
Many years ago a colleague got a figure of ~1200 bytes for a sparc sbus card.
I suspect that the iommu setup code has got more complex (and slower)
since then and the memory copies faster (especially if you can do overlong
aligned copies that include the required data).
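
As a rough sketch of that trade-off (plain C, not driver code): the names
and the threshold are made up for illustration, and the 1200 is just the old
sbus measurement above, not a suggested value for anything current.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff: at or below this, copying into an already-mapped
 * bounce buffer is assumed cheaper than setting up an iommu mapping. */
#define COPYBREAK_BYTES 1200

enum xfer_path { XFER_COPY, XFER_MAP };

static enum xfer_path choose_path(size_t len)
{
    return len <= COPYBREAK_BYTES ? XFER_COPY : XFER_MAP;
}

struct span { size_t start; size_t len; };

/* 'Overlong aligned copy': widen [offset, offset + len) outwards to
 * 'align'-byte boundaries so the copy loop only moves whole aligned words.
 * The caller must own the extra bytes at both ends. */
static struct span aligned_span(size_t offset, size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */

    size_t start = offset & ~(align - 1);
    size_t end = (offset + len + align - 1) & ~(align - 1);

    return (struct span){ start, end - start };
}
```

The real cutoff would have to be measured per platform, since it depends on
both the mapping cost and the copy speed.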

For spi-mem writes (max size is probably 256 bytes) that take 100s of us
it is best to sleep (waiting on a timer or the ISR).

spi-mem reads are another matter since the only delays are those needed
to bit-bang the physical device - which might be at over 100MHz and 4
(or even 8) data bits at a time.
Possibly worth using DMA for the data (to avoid leisurely PCIe reads)
but polling (ideally host memory) for completion to avoid the cost
of the ISR as well as avoiding context switches.

For the much slower smbus and i2c you pretty much always want to
offload the 'bit bang' to hardware and wait for an interrupt.
(I've seen ethernet drivers 'bugger' the system by repeatedly reading
the phy status - that could be done 1 bit every timer interrupt.)
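
The '1 bit every timer interrupt' idea is just a small state machine.
A sketch, with a made-up pin accessor and a fake pin standing in for the
hardware (the real MDIO framing, preamble and clocking are left out):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* State for reading one 16-bit value, MSB first, one bit per tick. */
struct bitbang_rd {
    unsigned int bits_left;
    uint16_t value;
    int (*read_pin)(void *ctx);   /* hypothetical data-pin accessor */
    void *ctx;
};

/* Call once per timer tick. Samples at most one bit, so the cost per tick
 * is a single slow read. Returns true once the value is complete. */
static bool bitbang_rd_tick(struct bitbang_rd *s)
{
    if (s->bits_left == 0)
        return true;

    s->value = (uint16_t)((s->value << 1) | (s->read_pin(s->ctx) & 1));
    return --s->bits_left == 0;
}

/* Fake pin for illustration: replays 'pattern' MSB first. */
struct fake_pin {
    uint16_t pattern;
    unsigned int pos;
};

static int fake_pin_read(void *ctx)
{
    struct fake_pin *p = ctx;

    assert(p->pos < 16);
    return (p->pattern >> (15u - p->pos++)) & 1;
}
```

The read takes 16 ticks to complete, which is fine for link status that
only changes on a human timescale.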

>
> > The code has this comment:
> > /* Wait for the write - typically 0.6ms (max 5ms).
> > * In spite of the datasheet values, I'm seeing 200us writes. */
> > It waits 200us and then polls every 50us for 2 seconds.

FWIW I wrote the comment, the code below it, and the logic on the fpga
that converts the PCIe slave cycles into signals to the memory chip.

What I/we never resolved was why some chips/boards failed to act on
the 'read status' command issued after the first delay.
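
For reference, the wait in the quoted comment has roughly the shape below.
This is a reconstruction in plain C with hypothetical callbacks and a fake
chip, not the driver code, and it assumes the 2 seconds includes the first
200us.

```c
#include <assert.h>
#include <stdbool.h>

#define FIRST_WAIT_US   200L      /* initial delay before the first poll */
#define POLL_US         50L       /* interval between status reads */
#define TIMEOUT_US      2000000L  /* give up after 2 seconds */

struct wait_ops {
    bool (*busy)(void *ctx);            /* hypothetical 'read status' */
    void (*delay_us)(void *ctx, long us);
    void *ctx;
};

/* Returns the total delay in us once the chip is idle, or -1 on timeout. */
static long wait_write_done(const struct wait_ops *ops)
{
    long waited = FIRST_WAIT_US;

    assert(ops && ops->busy && ops->delay_us);

    ops->delay_us(ops->ctx, FIRST_WAIT_US);
    while (ops->busy(ops->ctx)) {
        if (waited >= TIMEOUT_US)
            return -1;
        ops->delay_us(ops->ctx, POLL_US);
        waited += POLL_US;
    }
    return waited;
}

/* Fake chip for illustration: busy until 'done_at_us' of delay has passed. */
struct fake_chip {
    long now_us;
    long done_at_us;
};

static bool fake_busy(void *ctx)
{
    struct fake_chip *c = ctx;
    return c->now_us < c->done_at_us;
}

static void fake_delay(void *ctx, long us)
{
    struct fake_chip *c = ctx;
    c->now_us += us;
}
```

In-kernel the polling part would normally be read_poll_timeout() from
<linux/iopoll.h> rather than an open-coded loop.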

> You can also get fun with things like contention on shared buses.

Indeed - and in places you don't realise.
In some cases repeated reads of a slow device can restrict bus throughput
enough to make DMA requests underrun (eg trying to use an LCD panel on
a SA1100/SA1101 strongarm system).

-- David