Re: [PATCH v7 1/3] dmaengine: Add support for APM X-Gene SoC DMA engine driver

From: Rameshwar Sahu
Date: Tue Mar 17 2015 - 05:33:28 EST

Hi Vinod,

On Mon, Mar 16, 2015 at 11:01 PM, Rameshwar Sahu <rsahu@xxxxxxx> wrote:
> Hi Vinod,
> On Mon, Mar 16, 2015 at 9:56 PM, Vinod Koul <vinod.koul@xxxxxxxxx> wrote:
>> On Mon, Mar 16, 2015 at 05:24:34PM +0530, Rameshwar Sahu wrote:
>>> >> >> +static void xgene_dma_free_desc_list_reverse(struct xgene_dma_chan *chan,
>>> >> >> + struct list_head *list)
>>> >> > do we really care about free order?
>>> >>
>>> >> Yes, it starts deallocation of descriptors from the tail.
>>> > and why by tail is not clear.
>>> We can free the allocated descriptors in forward order from the head or in
>>> reverse order; I just followed the fsldma.c driver here.
>>> Does this make sense?
>> No, you have two APIs to free list. Why do you need two?
> Yes, basically we have two APIs to free the list.
> xgene_dma_free_desc_list_reverse is called if memory allocation from
> the DMA pool fails in the prep routines.
> E.g. in a prep routine we have some descriptors allocated and still
> need more descriptors to complete the DMA request when a failure happens,
> so we need to free all the allocated descriptors.
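
To make the failure path above concrete, here is a minimal userspace
sketch. It is not the driver code: every sketch_* name is hypothetical,
calloc()/free() stand in for the DMA pool, and the "fail" flag only
simulates an allocation failure. It shows a prep-style loop that unwinds
the partially built chain from the tail when one allocation fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DMA descriptor; not the driver's struct. */
struct sketch_desc {
    struct sketch_desc *prev;
    struct sketch_desc *next;
};

/* Doubly linked chain of descriptors with head and tail pointers. */
struct sketch_chain {
    struct sketch_desc *head;
    struct sketch_desc *tail;
    size_t count;
};

/* Append one descriptor; a nonzero 'fail' simulates pool exhaustion. */
static int sketch_chain_add(struct sketch_chain *c, int fail)
{
    struct sketch_desc *d = fail ? NULL : calloc(1, sizeof(*d));

    if (!d)
        return -1;
    d->prev = c->tail;
    if (c->tail)
        c->tail->next = d;
    else
        c->head = d;
    c->tail = d;
    c->count++;
    return 0;
}

/* Free every descriptor, walking from the tail back to the head. */
static size_t sketch_chain_free_reverse(struct sketch_chain *c)
{
    size_t expected = c->count;
    size_t freed = 0;
    struct sketch_desc *d = c->tail;

    while (d) {
        struct sketch_desc *prev = d->prev; /* save before freeing */
        free(d);
        freed++;
        d = prev;
    }
    assert(freed == expected);
    c->head = NULL;
    c->tail = NULL;
    c->count = 0;
    return freed;
}

/*
 * Build a chain of 'want' descriptors. The allocation at index 'fail_at'
 * is made to fail (pass fail_at >= want for no failure). On failure the
 * partial chain is released and -1 is returned.
 */
static int sketch_prep(struct sketch_chain *c, size_t want, size_t fail_at)
{
    size_t i;

    for (i = 0; i < want; i++) {
        if (sketch_chain_add(c, i == fail_at) < 0) {
            sketch_chain_free_reverse(c);
            return -1;
        }
    }
    return 0;
}
```

In this sketch a forward walk from the head would release the same set
of descriptors; the tail-first order only mirrors the helper discussed
above.
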
>>> >
>>> >> > where are you mapping dma buffers?
>>> >>
>>> >> I didn't get you here. Can you please explain what you mean?
>>> >> As per my understanding the client should map the dma buffer and give the
>>> >> physical address and size to these prep callback routines.
>>> > not for memcpy, that is true for slave transfers
>>> >
>>> > For memcpy the idea is that drivers will do buffer mapping
>>> I am still not clear here: why will memcpy do the buffer mapping? I see other
>>> drivers and also async_memcpy.c; they only map it and pass the mapped
>>> physical dma address to the driver.
>>> By buffer mapping do you mean dma_map_xxx here? Am I correct?
>> Yes
> I am confused here; I don't see any driver doing dma buffer mapping in
> prep_dma_memcpy.
> Can you please clarify whether the driver does this on behalf of the client,
> ideally with an example, so that I can proceed further?

Any comments here?

>>> >
>>> >> > why are you calling this here, status check shouldnt do this...
>>> >>
>>> >> Okay, I will remove it.
>>> >>
>>> >>
>>> >> >> + spin_unlock_bh(&chan->lock);
>>> >> >> + return DMA_IN_PROGRESS;
>>> >> > residue here is size of transaction.
>>> >>
>>> >> We can't calculate the residue size here. We don't have any controller
>>> >> register which tells us the remaining transaction size.
>>> > Okay if you can't calculate residue why do we have this fn?
>>> So basically the case here for me is that the completion order of dma
>>> descriptors submitted to hw is not the same as the order of submission.
>>> The scenario comes up with multiple threads running: e.g. let's assume we
>>> have submitted two descriptors, the first has cookie 1001 and the second
>>> has 1002. Now 1002 completes first, so last_completed_cookie is updated
>>> to 1002 but dma_tx_status is not yet checked; then the first cookie
>>> completes and updates last_completed_cookie to 1001. Now the second
>>> transaction checks tx_status and gets DMA_IN_PROGRESS, because
>>> last_completed_cookie (1001) is less than the second transaction's
>>> cookie (1002).
>>> Due to this issue I am looking for that transaction in the pending list
>>> and the running list; if it is not there, it means we are done.
>>> Does this make sense?
>> That only convinces me that there is something not so correct.
>> To help me understand pls let me know if below is fine:
>> - for a physical channel, do you submit multiple transactions?
> Yes
>> - if yes, how does DMA deal with multiple transactions, how does it schedule
>> them?
> So, basically we submit multiple descriptors to a dma physical channel,
> and the dma engine executes them one by one and gives us a completion
> callback. So in this way we expect callbacks in the same order as the
> submission order, and that is what happens, no issue.
> But the problem is with supporting P+Q offload: here the P
> functionality is supported in dma physical channel 0 and the Q
> functionality in dma physical channel 1. So for PQ we need to submit two
> descriptors, one to channel 0 and the second to channel 1. In this case we
> can't predict the completion order, because channel 0 can finish P
> before Q or vice versa, and we need to wait for both to complete before
> calling the client callback() and completing the cookie.
> Second, we submit memcpy and sg on the same channel, and they can
> complete earlier even if they were submitted after PQ.

So our SoC dma engine hw design idea was to get more throughput by
running two channels concurrently and calculating P and Q together,
but today we hit a scenario where running P and Q on
different channels causes the dma engine to hang (some hw bug). So now I am
going to support P and Q generation in the same channel, so the
cookie status scenario mentioned above will never occur.
I will send you the patch for review.
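
For reference, a minimal userspace sketch of the cookie status scenario
mentioned above. It is not driver code: the sketch_* names are
hypothetical, and the check is a simplified "cookie <= last completed"
comparison that ignores cookie wraparound. It only shows why recording
the most recently finished cookie misreports status when completions
arrive out of order, and why in-order completion on one channel avoids
that.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_status { SKETCH_COMPLETE, SKETCH_IN_PROGRESS };

/* Hypothetical channel state: only the last completed cookie is kept. */
struct sketch_chan {
    int last_completed;
};

/* Record a completion; whichever cookie finishes last overwrites the field. */
static void sketch_complete(struct sketch_chan *ch, int cookie)
{
    assert(ch != NULL);
    ch->last_completed = cookie;
}

/* Simplified status check (assumption: no cookie wraparound handling). */
static enum sketch_status sketch_tx_status(const struct sketch_chan *ch,
                                           int cookie)
{
    assert(ch != NULL);
    return cookie <= ch->last_completed ? SKETCH_COMPLETE
                                        : SKETCH_IN_PROGRESS;
}
```

Completing 1002 and then 1001 leaves last_completed at 1001, so a status
query for 1002 reports in progress although it has finished. Completing
1001 and then 1002, as expected when both run in order on one channel,
reports both as complete.
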

>> --
>> ~Vinod