Re: [PATCH v2 1/1] dmaengine: switchtec-dma: Introduce Switchtec DMA engine PCI driver

From: Logan Gunthorpe
Date: Tue Apr 11 2023 - 11:47:23 EST




On 2023-04-10 10:42, Christoph Hellwig wrote:
> On Mon, Apr 03, 2023 at 11:06:28AM -0700, Kelvin Cao wrote:
>> +#define HALT_RETRY 100
>> +static int halt_channel(struct switchtec_dma_chan *swdma_chan)
>> +{
>> + u32 status;
>> + struct chan_hw_regs __iomem *chan_hw = swdma_chan->mmio_chan_hw;
>> + int retry = HALT_RETRY;
>> + struct pci_dev *pdev;
>> + int ret;
>> +
>> + rcu_read_lock();
>> + pdev = rcu_dereference(swdma_chan->swdma_dev->pdev);
>> + if (!pdev) {
>> + ret = -ENODEV;
>> + goto unlock_and_exit;
>> + }
>
> This whole RCU critical section around every access to ->pdev scheme
> looks a bit bothersome to me. This means that all the low-level
> PCI ops are done in RCU critical section. Is this something you
> came up with or is it copied from other drivers?

I suspect they copied it from plx_dma driver that I wrote ;(, though
that driver uses rcu_dereference a bit more sparingly (only on stop,
issue_pending and when allocating and freeing a channel).

> Normally we'd do an unregistration from the dmaengine subsystem
> first, which might do a RCU synchronization at a high level,
> and then we're sure that none of the methods gets called again
> on the unregistered device.
>
> Can't this driver (and the dmaengine core) support an operation
> mode where you set a shutdown flag at the beginning

The dmaengine code didn't support hot unplug at all. I believe most
drivers are likely to crash if this happens. When I wrote the plx-dma
engine, I had to make a bunch of changes to dmaengine just so the
framework didn't crash when I tested this. The framework is pretty thin,
so there's not much to synchronize on to indicate other threads are not
in the middle of issuing new IO when a flag is set.

>> + tasklet_schedule(&swdma_dev->chan_status_task);
>
> What speaks against simply using threaded irqs here instead of the
> tasklets?

Almost all the dmaengine drivers use tasklets. I don't know if this is
the best approach, but my understanding was that it was due to needing
low latency in processing the completed descriptors, otherwise it can be
hard to reach the full bandwidth of the dma engine.

Logan