Re: [RFC] CONFIG_NET_DMA can hang the system if DMA engine driver uses tasklets

From: Dan Williams
Date: Thu Oct 07 2010 - 19:33:32 EST


On 10/7/2010 4:14 PM, Ilya Yanok wrote:
[..]
> We can see that the network stack calls the dma_memcpy_to_iovec() function
> from softirq context, and it never returns if the DMA driver runs out of
> descriptors, thus blocking the tasklet from being executed. We have a
> deadlock.
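
For reference, the spin comes from the descriptor-slot polling loop in
dma_memcpy_to_iovec() (drivers/dma/iovlock.c); trimmed and paraphrased,
it looks like this:

	while (len > 0) {
		/* ... pick the target page and copy size ... */
		dma_cookie = dma_async_memcpy_buf_to_pg(chan,
				page_list->pages[page_idx],
				iov_byte_offset, kdata, copy);
		/* poll for a descriptor slot */
		if (unlikely(dma_cookie < 0)) {
			dma_async_issue_pending(chan);
			continue;	/* in softirq context this spins
					 * forever: the cleanup tasklet
					 * that would free descriptors
					 * can never run on this cpu */
		}
		/* ... advance kdata/len and the iovec cursor ... */
	}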

> Dan, I'd like to ask your opinion: do you think this is a problem with the
> CONFIG_NET_DMA implementation, or should the DMA engine drivers be aware
> of it? How should we fix it?
>
> I can imagine the following possible solutions:
> 1. Allow dma_memcpy_to_iovec() to return a failure (and reschedule it
> from the upper level) to give tasklets a chance to be executed.
> 2. Place a restriction on the DMA drivers that descriptors must be freed
> from hard-irq context, not soft-irq, and fix the existing drivers.
> 3. Try to free the descriptors not only from the tasklet but also from
> the place where they are requested.
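
As an illustration of option 1, the polling loop above could give up
instead of spinning, along these lines (untested sketch; the net_dma
callers, dma_skb_copy_datagram_iovec() and friends, would then need to
fall back to a plain memcpy_toiovec() or retry from process context):

	dma_cookie = dma_async_memcpy_buf_to_pg(chan,
			page_list->pages[page_idx],
			iov_byte_offset, kdata, copy);
	if (unlikely(dma_cookie < 0)) {
		/* kick the engine once, then report failure instead
		 * of busy-waiting in softirq context */
		dma_async_issue_pending(chan);
		return -ENOMEM;
	}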

Option 3 is what ioatdma and iop-adma already do, i.e. they process descriptor reclaim from the allocation failure path. For example, in ioat2_check_space_lock():

/* progress reclaim in the allocation failure case we may be
 * called under bh_disabled so we need to trigger the timer
 * event directly
 */
if (jiffies > chan->timer.expires && timer_pending(&chan->timer)) {
	struct ioatdma_device *device = chan->device;

	mod_timer(&chan->timer, jiffies + COMPLETION_TIMEOUT);
	device->timer_fn((unsigned long) &chan->common);
}

The assumption is that a freed descriptor is never more than a short delay away.

> Maybe somebody has a better solution.

Not really, but extending dmatest with a test for this expectation would help make it clearer. It would need a config option that injects descriptor allocation failures, though.
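
Something like the below in a driver's prep path is what I have in mind,
gated by a new Kconfig option (untested sketch; CONFIG_DMA_FAULT_INJECTION
and the xxx_prep_memcpy() names are all made up):

#ifdef CONFIG_DMA_FAULT_INJECTION	/* hypothetical new option */
static unsigned int fail_interval;	/* fail one prep in every N, 0 = off */
module_param(fail_interval, uint, 0644);

static struct dma_async_tx_descriptor *
xxx_prep_memcpy(struct dma_chan *chan, dma_addr_t dst, dma_addr_t src,
		size_t len, unsigned long flags)
{
	static atomic_t nr_calls = ATOMIC_INIT(0);

	/* simulate descriptor exhaustion every fail_interval calls so
	 * that callers' allocation-failure paths get exercised */
	if (fail_interval &&
	    atomic_inc_return(&nr_calls) % fail_interval == 0)
		return NULL;

	return __xxx_prep_memcpy(chan, dst, src, len, flags);
}
#endif

Alternatively this could hook into the generic fault-injection framework
(CONFIG_FAULT_INJECTION) rather than adding a one-off knob per driver.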

--
Dan