Re: Documentation/io_ordering.txt is wrong

From: Jesse Barnes
Date: Mon Sep 20 2004 - 19:41:04 EST


On Saturday, September 18, 2004 10:57 am, Matthew Wilcox wrote:
> On Sat, Sep 18, 2004 at 12:10:01AM -0600, Grant Grundler wrote:
> > Jesse Barnes wrote:
> > ...
> >
> > > Btw Andrew (Vasquez), there's a small doc I put together that should
> > > describe when you have to worry about PCI posting. It's in the tree:
> > > Documentation/io_ordering.txt. If it's incomplete or confusing, just
> > > let me know and I'll update it.
> >
> > Jesse,
> > Both. incomplete and confusing.
> > "concrete example of a hypothetical driver" wasn't my first warning
> > this document needed work. :^)
>
> Not just incomplete and confusing, but actively wrong. spin_lock/
> spin_unlock should imply ordering of readb(). What you're describing
> there is __readb(). See Documentation/DocBook/deviceiobook.tmpl. Also,
> rmb() should ensure ordering of io reads; there should be no need to
> touch the device.

Is this any better? I just sent it off to Grant too.

Thanks,
Jesse
Dealing with posted writes
--------------------------

On some platforms platforms, driver writers are responsible for
ensuring that I/O writes to memory-mapped addresses on their device
arrive in the order intended. This is typically done by reading a
'safe' device or bridge register, causing the I/O chipset to flush
pending writes to the device before any reads are posted. A driver
would usually use this technique immediately prior to the exit of a
critical section of code protected by spinlocks. This would ensure
that subsequent writes to I/O space arrived only after all prior
writes (much like a memory barrier op, mb(), only with respect to
I/O).

Some pseudocode to illustrate the problem:

...
CPU A: spin_lock_irqsave(&dev_lock, flags)
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: spin_unlock_irqrestore(&dev_lock, flags)
...
CPU B: spin_lock_irqsave(&dev_lock, flags)
CPU B: ...
CPU B: writel(newval2, ring_ptr);
CPU B: spin_unlock_irqrestore(&dev_lock, flags)
...

In the case above, the device may receive newval2 before it receives newval,
which could cause problems. Fixing it is easy enough though:

...
CPU A: spin_lock_irqsave(&dev_lock, flags)
CPU A: ...
CPU A: writel(newval, ring_ptr);
CPU A: (void)readl(safe_register); /* maybe a config register? */
CPU A: spin_unlock_irqrestore(&dev_lock, flags)
...
CPU B: spin_lock_irqsave(&dev_lock, flags)
CPU B: ...
CPU B: writel(newval2, ring_ptr);
CPU B: (void)readl(safe_register); /* or read_relaxed() */
CPU B: spin_unlock_irqrestore(&dev_lock, flags)

Here, the reads from safe_register will cause the I/O chipset to flush any
pending writes before actually posting the read to the chipset, preventing
possible data corruption.

This sort of synchronization is only necessary for read/write calls,
not in/out calls, since they're by definition strongly ordered.

We should probably add a writeflush call or something to deal with the
above in an easier to read way. Some platforms could even implement
such a routine more efficiently than a regular read.