Re: [PATCH/RFC] I/O-check interface for driver's error handling

From: Linas Vepstas
Date: Wed Mar 02 2005 - 14:52:18 EST


On Wed, Mar 02, 2005 at 10:41:06AM -0800, Jesse Barnes was heard to remark:
> On Wednesday, March 2, 2005 10:22 am, Linas Vepstas wrote:
> > On Tue, Mar 01, 2005 at 08:49:45AM -0800, Linus Torvalds was heard to
> remark:
> > > The new API is what _allows_ a driver to care. It doesn't handle DMA, but
> > > I think that's because nobody knows how to handle it (ie it's probably
> > > hw-dependent and all existign implementations would thus be
> > > driver-specific anyway).
> >
> > ?
> > We could add a call
> >
> > int pci_was_there_an_error_during_dma (struct pci_dev);
> >
> > right? And it could return true/false, right? I can certainly
> > do that today with ppc64. I just can't tell you which dma triggered
> > the problem.
>
> One issue with that is how to notify drivers that they need to make this call.
> In may cases, DMA completion will be signalled by an interrupt, but if the
> DMA failed, that interrupt may never happen, which means the call to
> pci_unmap or the above function from the interrupt handler may never occur.

Hmm. Well, I notice that e100/e1000 has a heartbeat built into it, so
that if the card doesn't respond, it resets the card. So this would
be a natural place for them. And I suspect that *all* scsi drivers
have timeouts; they initiate a cascade of reset sequences if they
don't get data back after X seconds.

I see nothing wrong with adding a requirement, something that says
"If a device driver wants to be pci-error aware, then yea-verily it
must use a dma-timeout timer (say 15 seconds) and check for
pci errors in the dma-timeout handler".

> Some I/O errors are by nature asynchronous and unpredictable, so I think we
> need a separate thread and callback interface to deal with errors where the
> beginning of the transaction and the end of the transaction occur in
> different contexts, in addition to the PIO transaction interface already
> proposed.

I don't know what the pci-express implementation is like, but the
ppc64 implementation is *not* transactional, so I don't have that issue.

If some pci chipsets only report dma errors on a transactional basis,
then we need to modify pci_map_sg() and pci_map_single() and etc. to
take a cookie as well. It would be up to the device driver to alloc
and retire cookies as the dma's complete (and make sure it can find
its cookies in any context). Is this or something like this that
is needed?

--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/