Re: [PATCH 3/5] dmaengine: plx-dma: Introduce PLX DMA engine PCI driver skeleton

From: Logan Gunthorpe
Date: Mon Nov 11 2019 - 12:50:23 EST




On 2019-11-09 10:35 a.m., Vinod Koul wrote:
> On 22-10-19, 15:46, Logan Gunthorpe wrote:
>> +static irqreturn_t plx_dma_isr(int irq, void *devid)
>> +{
>> + return IRQ_HANDLED;
>
> ??

Yes, sorry, this is more of an artifact of how I chose to split up the
patches. The ISR is filled in by patch 4.


>> + */
>> + schedule_work(&plxdev->release_work);
>> +}
>> +
>> +static void plx_dma_put(struct plx_dma_dev *plxdev)
>> +{
>> + kref_put(&plxdev->ref, plx_dma_release);
>> +}
>> +
>> +static int plx_dma_alloc_chan_resources(struct dma_chan *chan)
>> +{
>> + struct plx_dma_dev *plxdev = chan_to_plx_dma_dev(chan);
>> +
>> + kref_get(&plxdev->ref);
>
> why do you need to do this?

This has to do with being able to properly unbind while a channel is in
use. If we don't hold a reference to the struct plx_dma_dev between
alloc_chan_resources() and free_chan_resources() then the kernel will
panic if a callback is called after plx_dma_remove(). The way I've done
it, once a device is removed, subsequent calls to dma_prep_memcpy() will
fail (see ring_active).

struct plx_dma_dev needs to stay alive between plx_dma_probe() and
plx_dma_remove(), and between calls to alloc_chan_resources() and
free_chan_resources(), so we use a reference count to ensure this.
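
To make that concrete, the counterpart looks roughly like this (a
heavily simplified sketch of what patch 4 adds; the descriptor ring
teardown, the locking around ring_active and the descriptor setup are
all elided, so don't take the exact shape literally):

static void plx_dma_free_chan_resources(struct dma_chan *chan)
{
	struct plx_dma_dev *plxdev = chan_to_plx_dma_dev(chan);

	/* descriptor ring teardown elided in this sketch */

	/*
	 * Drop the reference taken in alloc_chan_resources(). If the
	 * PCI device has already been removed this is the last
	 * reference and plx_dma_release() runs now.
	 */
	plx_dma_put(plxdev);
}

static struct dma_async_tx_descriptor *plx_dma_prep_memcpy(
		struct dma_chan *c, dma_addr_t dma_dst,
		dma_addr_t dma_src, size_t len, unsigned long flags)
{
	struct plx_dma_dev *plxdev = chan_to_plx_dma_dev(c);

	/*
	 * ring_active is cleared on remove, so once the device is
	 * gone new transfers fail cleanly instead of touching
	 * hardware that is no longer there.
	 */
	if (!plxdev->ring_active)
		return NULL;

	/* descriptor setup elided in this sketch */
	return NULL;
}

So plx_dma_remove() clears ring_active and drops the reference taken in
plx_dma_probe(), and whichever of remove or the last
free_chan_resources() runs last is what ends up triggering
plx_dma_release().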

>> +}
>> +
>> +static void plx_dma_release_work(struct work_struct *work)
>> +{
>> + struct plx_dma_dev *plxdev = container_of(work, struct plx_dma_dev,
>> + release_work);
>> +
>> + dma_async_device_unregister(&plxdev->dma_dev);
>> + put_device(plxdev->dma_dev.dev);
>> + kfree(plxdev);
>> +}
>> +
>> +static void plx_dma_release(struct kref *ref)
>> +{
>> + struct plx_dma_dev *plxdev = container_of(ref, struct plx_dma_dev, ref);
>> +
>> + /*
>> + * The dmaengine reference counting and locking is a bit of a
>> + * mess so we have to work around it a bit here. We might put
>> + * the reference while the dmaengine holds the dma_list_mutex
>> + * which means we can't call dma_async_device_unregister() directly
>> + * here and it must be delayed.
>
> why is that, i have not heard any complaints about locking, can you
> elaborate on why you need to do this?

Per the above explanation, we need to call plx_dma_put() in
plx_dma_free_chan_resources(), and plx_dma_release() is where we can
finally call dma_async_device_unregister(), seeing as that's when we
know there are no longer any active channels.

However, dma_chan_put() (which calls device_free_chan_resources()) holds
the dma_list_mutex, and dma_async_device_unregister() tries to take the
dma_list_mutex, so if we called unregister inside free_chan_resources()
we would deadlock.
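
To spell out the ordering, if plx_dma_release() called the unregister
directly, the call chain would look like this:

  dma_chan_put()                         (dma_list_mutex already held)
    -> device_free_chan_resources()      = plx_dma_free_chan_resources()
      -> plx_dma_put()                   (drops the last reference)
        -> plx_dma_release()
          -> dma_async_device_unregister()
            -> mutex_lock(&dma_list_mutex)   <-- deadlock

So plx_dma_release() only schedules release_work, and the work item
calls dma_async_device_unregister() later, outside of dma_list_mutex.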

I suspect that most dmaengine drivers will WARN[1] and then panic if
they are unbound while in use. The locking, reference counting and
structure of the API don't seem to consider this case, so extra
reference counts and workarounds are required to make it work correctly
at all.

This is the mess I'm referring to and would require some significant
restructuring to fix generally.

Logan

[1]
https://elixir.bootlin.com/linux/latest/source/drivers/dma/dmaengine.c#L1119