Re: [PATCH 2/5] dmaengine: Add support for custom data mapping

From: Vinod Koul
Date: Thu Jan 19 2017 - 00:02:12 EST


On Mon, Jan 02, 2017 at 10:12:33AM -0600, Andy Gross wrote:
> On Mon, Jan 02, 2017 at 07:55:37PM +0530, Abhishek Sahu wrote:
> > >>>>>>> So a couple of thoughts on how to deal with this:
> > >>>>>>>
> > >>>>>>> 1) Define a virtual channel for the command descriptors vs a normal DMA
> > >>>>>>> transaction. This lets you use the same hardware channel, but lets you discern
> > >>>>>>> which descriptor format you need to use. The only issue I see with this is the
> > >>>>>>> required change in device tree binding to target the right type of channel (cmd
> > >>>>>>> vs normal).
> > >>>>>>
> > >>>>>>Or mark the descriptor as cmd and write accordingly...
> > >>>>>
> > >>>>>The only issue I see with that is knowing how much to pre-allocate
> > >>>>>during the prep call. The flag set API would be called on the
> > >>>>>allocated tx descriptor, so you'd have to know up front and be able
> > >>>>>to specify it.
> > >>>>>
> > >>>>>>
> > >>>>>>>
> > >>>>>>> 2) Provide an API to set flags for the descriptor on a whole descriptor basis.
> > >>>>>>>
> > >>>>>>> 3) If you have a set of transactions described by an sgl that has disparate use
> > >>>>>>> of flags, you split the list and use a separate transaction. In other words, we
> > >>>>>>> need to enforce that the flag set API will be applied to all descriptors
> > >>>>>>> described by an sgl. This means that the whole transaction may be comprised of
> > >>>>>>> multiple async TX descriptors.
> > >>>>
> > >>>>Each async TX descriptor will generate a BAM interrupt, so we are
> > >>>>deviating from the main purpose of DMA, where ideally we should get
> > >>>>the interrupt at the end of the transfer. This is the main reason for
> > >>>>going for this patch.
> > >>>
> > >>>If the client queues 1 descriptor or 5 descriptors, it doesn't matter
> > >>>that much. The client knows when it is done by waiting for the
> > >>>descriptors to complete. It is less efficient than grouping them all,
> > >>>but it should still work.
> > >>>
> > >>Yes, the client will wait for the final descriptor completion. But
> > >>these interrupts will be overhead for the CPU. For 1-2 pages it won't
> > >>matter much, but I guess it will be significant for a complete chip
> > >>read/write (during boot and FS, i.e. JFFS, operations).
> > >>>>
> > >>>>With the submitted patch, only 1 interrupt per channel is required
> > >>>>for a complete NAND page, and it solves the setting of the BAM
> > >>>>specific flags also. The only issue with this patch is adding a new
> > >>>>API in the DMA framework itself. But this API can be used by other
> > >>>>DMA engines in the future for which the mapping cannot be done with
> > >>>>the available APIs and where the mapping is vendor specific.
> > >>>
> > >>>I guess the key point in all of this is that the DMA operation being
> > >>>done is not a normal data flow to/from the device. It's direct remote
> > >>>register access to the device using address information contained in
> > >>>the sgl. And you are collating the standard data access along with
> > >>>the special command access in the same API call.
> > >>Yes. Normally a DMA engine (the QCA ADM DMA engine also) allows
> > >>reading/writing memory-mapped registers just like data. But BAM is
> > >>different, since it is not a global DMA engine and is coupled with a
> > >>peripheral. Also, this different flag requirement is not just for
> > >>command descriptors but for data descriptors also.
> > >>
> > >>BAM data access and command access differ only in the flag and the
> > >>register read/write list. The register read/write list is simply an
> > >>array of struct bam_cmd_element, added in this patch:
> > >>
> > >>struct bam_cmd_element {
> > >>	__le32 addr:24;
> > >>	__le32 command:8;
> > >>	__le32 data;
> > >>	__le32 mask;
> > >>	__le32 reserved;
> > >>};
> > >>
> > >>The address and size of the array will be passed in the data and size
> > >>fields of the SGL. If we want to form the SGL for the mentioned list,
> > >>then we will have an SGL of size 15 with just one descriptor.
> > >>
> > >>Now we require a different flag for each SG entry. Currently the SG
> > >>does not have a flag parameter, and we can't add a flag parameter to
> > >>the generic SG just for our requirement. So we have added the custom
> > >>mapping function and passed a modified SG as the parameter, which is
> > >>the generic SG plus a flag.
> > >
> > >I really think that we need some additional API that allows for the
> > >flag munging for the descriptors instead of overriding the
> > >prep_slave_sg. We already needed to change the way the flags are
> > >passed anyway. And instead of building up a special sg list, the API
> > >should take a structure that has a 1:1 mapping of the flags to the
> > >descriptors. And you would call this API on your descriptor before
> > >issuing it.

Munging won't be a good idea, but for some of the cases the current flags
can be used, and if need be, we can add additional flags.
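
FWIW, the "generic SG plus flag" wrapper described above would presumably
look something on these lines. This is only a sketch to make the discussion
concrete; the qcom_bam_sgl and qcom_bam_sg_init_cmd names are illustrative,
not taken from the patch:

#include <linux/scatterlist.h>
#include <linux/types.h>

/*
 * Illustrative wrapper: one generic SG entry plus the BAM flags to be
 * applied to the hardware descriptor it maps to.
 */
struct qcom_bam_sgl {
	struct scatterlist sgl;
	u32 dma_flags;
};

/*
 * Illustrative helper: point one wrapped SG entry at an array of
 * bam_cmd_element register reads/writes (the struct quoted above),
 * i.e. one command descriptor.
 */
static void qcom_bam_sg_init_cmd(struct qcom_bam_sgl *bam_sg,
				 struct bam_cmd_element *cmds,
				 unsigned int count, u32 flags)
{
	sg_init_table(&bam_sg->sgl, 1);
	sg_set_buf(&bam_sg->sgl, cmds, count * sizeof(*cmds));
	bam_sg->dma_flags = flags;
}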

> > >
> > >So build up the sglist. Call the prep_slave_sg. You get back a tx
> > >descriptor that underneath is a bam descriptor. Then call the API,
> > >giving the descriptor and the structure that defines the flags for
> > >the descriptors. Then submit the descriptor.
> > >
> > >Something like:
> > >void qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx,
> > >				     u16 *flags)
> > >{
> > >	struct bam_async_desc *async_desc = container_of(tx,
> > >							 struct bam_async_desc,
> > >							 vd.tx);
> > >	int i;
> > >
> > >	for (i = 0; i < async_desc->num_desc; i++)
> > >		async_desc->desc[i].flags = cpu_to_le16(flags[i]);
> > >}
> > >
> > >EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags);

This makes it BAM specific and causes issues if we want to use this code in
generic libs. Yes, this is an option, but it should be a last resort.
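
For completeness, the client-side flow being proposed would look roughly
like the sketch below, assuming the qcom_bam_apply_descriptor_flags()
helper quoted above exists; the direction and error codes here are
placeholders, not taken from any driver:

#include <linux/dmaengine.h>
#include <linux/scatterlist.h>

/*
 * Sketch of the proposed flow: one tx descriptor for the whole sgl
 * (so one completion interrupt), with the per-descriptor BAM flags
 * applied 1:1 before submission.
 */
static int bam_issue_with_flags(struct dma_chan *chan,
				struct scatterlist *sgl,
				unsigned int sg_len, u16 *flags)
{
	struct dma_async_tx_descriptor *tx;
	dma_cookie_t cookie;

	tx = dmaengine_prep_slave_sg(chan, sgl, sg_len, DMA_MEM_TO_DEV,
				     DMA_PREP_INTERRUPT);
	if (!tx)
		return -EIO;

	/* 1:1 mapping of flags[] to the underlying hardware descriptors */
	qcom_bam_apply_descriptor_flags(tx, flags);

	cookie = dmaengine_submit(tx);
	if (dma_submit_error(cookie))
		return -EIO;

	dma_async_issue_pending(chan);
	return 0;
}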

--
~Vinod