amd5536udc interrupts bug

From: Vadim Lobanov
Date: Wed Jan 07 2009 - 18:18:20 EST


Hello,

>From perusing the code and playing with the module, it seems to me that
the amd5536udc driver's handling of interrupts is currently "bustificated".

The long story:

During the amd5536udc initialization sequence within udc_pci_probe(),
the code attempts to request a shared irq for the device thusly:
request_irq(pdev->irq, udc_irq, IRQF_SHARED, name, dev)
where 'dev' is the internal state structure. By the time this call is made,
the 'dev' structure is still not fully initialized and contains blank/zero
data for many of the fields, in particular both 'dev->lock' and 'dev->regs'
which are both clearly used within the udc_irq() handler. Those get
initialized a bit later, namely inside the udc_probe() call at the bottom of
udc_pci_probe().

It is my understanding that a handler for a shared interrupt can be
invoked at any time after the corresponding request_irq() call is made,
simply because some other device on the same interrupt may already
be active. This leaves us with a very noticeable race condition, where
udc_irq() can be invoked with uninitialized 'dev' data.

In particular, this effect is very noticeable when I try modprobing the
amd5536udc driver on a kernel that is built with CONFIG_DEBUG_SHIRQ
set. Given that the debug config option forces an invocation of the irq
handler from within the request_irq() function, the immediate effect is a
masterful kernel NULL-dereference OOPS within udc_irq().

The simple fix may be to say that amd5536udc does not support shared
irqs, and to remove the IRQF_SHARED flag from the request_irq() call. A
more complicated fix may be to try to shuffle all the code around to
make sure that 'dev' is fully initialized before we request the irq, but I
don't understand the code well enough (yet) to comfortably do this.

Comments? Thoughts?

On a side note, it occurs to me that the CONFIG_DEBUG_SHIRQ option
went into the kernel a bit less than two years ago, and that it exposes a
very immediate and reproducible OOPS in this driver. Does this mean
that noone uses the 5536 UDC functionality with any recent kernels?
Should I be worried? :)

-- Vadim Lobanov

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/