Re: [PATCH v6 1/1] dmaengine: switchtec-dma: Introduce Switchtec DMA engine PCI driver

From: Vinod Koul
Date: Mon Oct 09 2023 - 01:42:12 EST


On 06-10-23, 22:34, Kelvin.Cao@xxxxxxxxxxxxx wrote:
> On Fri, 2023-10-06 at 16:00 +0530, Vinod Koul wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you
> > know the content is safe
> >
> > On 05-10-23, 18:35, Kelvin.Cao@xxxxxxxxxxxxx wrote:
> >
> > > > > > +static struct dma_async_tx_descriptor *
> > > > > > +switchtec_dma_prep_wimm_data(struct dma_chan *c, dma_addr_t
> > > > > > dma_dst, u64 data,
> > > > > > +                          unsigned long flags)
> > > > >
> > > > > can you please explain what this wimm data refers to...
> > > > >
> > > > > I think adding imm callback was a mistake, we need a better
> > > > > justification for another user for this, who programs this,
> > > > > what
> > > > > gets
> > > > > programmed here
> > > >
> > > > Sure. I think it's an alternative method to prep_mem and would be
> > > > more
> > > > convenient to use when the write is 8-byte and the data to be
> > > > moved
> > > > is
> > > > not in a DMA mapped memory location. For example, we write to a
> > > > doorbell register with the value from a local variable which is
> > > > not
> > > > associated with a DMA address to notify the receiver to consume
> > > > the
> > > > data, after confirming that the previously initiated DMA
> > > > transactions
> > > > of the data have completed. I agree that the use scenario would
> > > > be
> > > > very
> > > > limited.
> >
> > Can you please explain more about this 'value' where is it derived
> > from?
> > Who programs it and how...
>
> Sure. Think of a producer/consumer use case where the producer is a
> host DMA client driver and the consumer is a PCIe EP. On the EP, there

What are the examples of DMA clients here?

> is a memory-mapped data buffer for data receiving and a memory-mapped
> doorbell buffer for triggering data consuming. Each time for a bulk
> data transfer, the DMA client driver first DMA the data of size X to
> the memory-mapped data buffer, then it DMA the value X to the doorbell
> buffer to trigger data consumption on device. On receiving a doorbell
> writing, the device starts to consume the data of size X from the data
> buffer.
>
> For the first DMA operation, the DMA client would use dma_prep_memory()
> like what most DMA clients do. However, for the second transfer, value
> X is held in a local variable like below.
>
> u64 size_to_transfer;

Why cant the client driver write to doorbell, is there anything which
prevents us from doing so?

>
> In this case, the client would use dma_prep_wimm_data() to DMA value X
> to the doorbell buffer, like below.
>
> dma_prep_wimm_data(chan, dst_for_db_buffer, size_to_transfer, flag);
>
> Hope this example explains the thing. People would argue that the
> client could use the same dma_prep_memory() for the doorbell ringing. I
> would agree, this API just adds an alternative way to do so when the
> data is as little as 64 bit and it also saves a call site to
> dma_alloc_coherent() to allocate a source DMA buffer just for holding
> the 8-byte value, which is required by dma_prep_memcpy().

I would prefer that, (after the option of mmio write from client), there
is nothing special about this, another txn. You can queue up both and
have dmaengine write to doorbell immediately after the buffer completes

>
>
> > > > > > +     /* set sq/cq */
> > > > > > +     writel(lower_32_bits(swdma_chan->dma_addr_sq),
> > > > > > &chan_fw-
> > > > > > > sq_base_lo);
> > > > > > +     writel(upper_32_bits(swdma_chan->dma_addr_sq),
> > > > > > &chan_fw-
> > > > > > > sq_base_hi);
> > > > > > +     writel(lower_32_bits(swdma_chan->dma_addr_cq),
> > > > > > &chan_fw-
> > > > > > > cq_base_lo);
> > > > > > +     writel(upper_32_bits(swdma_chan->dma_addr_cq),
> > > > > > &chan_fw-
> > > > > > > cq_base_hi);
> > > > > > +
> > > > > > +     writew(SWITCHTEC_DMA_SQ_SIZE, &swdma_chan-
> > > > > > >mmio_chan_fw-
> > > > > > > sq_size);
> > > > > > +     writew(SWITCHTEC_DMA_CQ_SIZE, &swdma_chan-
> > > > > > >mmio_chan_fw-
> > > > > > > cq_size);
> > > > >
> > > > > what is write happening in the descriptor alloc callback, that
> > > > > does
> > > > > not
> > > > > sound correct to me
> > > >
> > > > All the queue descriptors of a channel are pre-allocated, so I
> > > > think
> > > > it's proper to convey the queue address/size to hardware at this
> > > > point.
> > > > After this initialization, we only need to assign cookie in
> > > > submit
> > > > and
> > > > update queue head to hardware in issue_pending.
> >
> > Sorry that is not right, you can prepare multiple descriptors and
> > then
> > submit. Only at submit is the cookie assigned which is in order, so
> > this
> > should be moved to when we start the txn and not in this call
> >
> The hardware assumes the SQ/CQ is a contiguous circular buffer of fix
> sized element. And the above code passes the address and size of SQ/CQ
> to the hardware. At this point hardware will do nothing but take note
> of the SQ/CQ location/size. 
>
> When do dma_issue_pending(), the actual SQ head write will occur to
> update hardware with the current SQ head, as below:
>
> writew(swdma_chan->head, &swdma_chan->mmio_chan_hw->sq_tail);

But if the order of txn submission is changed you will issue dma txn in
wrong order, so it should be written only when desc is submitted not in
the prep invocation!

--
~Vinod