Re: [PATCH v2 4/4] dmaengine: dma-axi-dmac: Defer freeing DMA descriptors
From: Frank Li
Date: Wed Apr 01 2026 - 18:39:18 EST
On Wed, Apr 01, 2026 at 05:14:16PM +0100, Nuno Sá wrote:
> On Tue, Mar 31, 2026 at 04:21:06PM +0100, Nuno Sá wrote:
> > On Tue, Mar 31, 2026 at 10:16:09AM -0400, Frank Li wrote:
> > > On Tue, Mar 31, 2026 at 09:53:45AM +0100, Nuno Sá wrote:
> > > > On Mon, Mar 30, 2026 at 11:24:34AM -0400, Frank Li wrote:
> > > > > On Fri, Mar 27, 2026 at 04:58:41PM +0000, Nuno Sá wrote:
> > > > > > From: Eliza Balas <eliza.balas@xxxxxxxxxx>
> > > > > >
> > > > > > This IP core can be used in architectures (like Microblaze) where DMA
> > > > > > descriptors are allocated with vmalloc().
> > > > >
> > > > > strange, why use vmalloc()?
> > > >
> > > > It's just one of the paths in dma_alloc_coherent(). It should be
> > > > architecture dependent.
> > >
> > > Which architectures? This may be a common problem, as dma_alloc/free_coherent()
> > > is quite common in other dma-engine drivers.
> >
> > I'll double check this but I believe this was triggered on microblaze
> > where we also use this IP. Will come back with confirmation!
> >
>
> Hi Frank,
>
> I got to the bottom of the issue! The problem is that on archs like
> microblaze and arm64 we have DMA_DIRECT_REMAP, which means that when
> calling dma_alloc_coherent() in [1] we will get into the code path in
> [2]. Now, I did some research and we might have another solution for
> this that does not involve this refcount craziness plus async work. But
> I need to test it. FYI, what I have in mind is similar to what
> loongson2-apb-dma.c does. That means using the dma pool API. IIUC, with
> the pool we only actually free the memory (dma_free_coherent()) in the
> .terminate_all() callback (when destroying the pool), which should not
> happen in interrupt context, right?
I think so. If your DMA engine descriptor is a linked list, I suggest
using a dma pool. If it is a cyclic buffer, I suggest pre-allocating
enough descriptors when the channel is requested.
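
A rough sketch of the dma pool pattern (untested and generic, not taken
from this driver; the pool name and variables are illustrative only):

	struct dma_pool *pool;
	struct axi_dmac_hw_desc *hw;
	dma_addr_t hw_phys;

	/* Create once per channel, from sleepable context. */
	pool = dma_pool_create("axi_dmac_desc", dev,
			       sizeof(struct axi_dmac_hw_desc),
			       __alignof__(struct axi_dmac_hw_desc), 0);

	/* Pool blocks can be allocated and freed in atomic/softirq context. */
	hw = dma_pool_alloc(pool, GFP_ATOMIC, &hw_phys);
	dma_pool_free(pool, hw, hw_phys);

	/*
	 * Only dma_pool_destroy() hands the backing memory to
	 * dma_free_coherent(), so keep it in sleepable context
	 * (e.g. the free_chan_resources/terminate path).
	 */
	dma_pool_destroy(pool);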
Frank
>
> [1]: https://elixir.bootlin.com/linux/v7.0-rc6/source/drivers/dma/dma-axi-dmac.c#L549
> [2]: https://elixir.bootlin.com/linux/v7.0-rc6/source/kernel/dma/direct.c#L278
>
> - Nuno Sá
>
> > - Nuno Sá
> > >
> > > Frank
> > >
> > > >
> > > > - Nuno Sá
> > > >
> > > > >
> > > > > Frank
> > > > >
> > > > > > Hence, given that freeing the
> > > > > > descriptors happens in softirq context, vunmap() will BUG().
> > > > > >
> > > > > > To solve the above, we set up a work item during allocation of the
> > > > > > descriptors and schedule it from softirq context. Hence, the actual
> > > > > > freeing happens in process context.
> > > > > >
> > > > > > Also note that, to account for the possible race where the struct axi_dmac
> > > > > > object is gone between scheduling the work and actually running it, we
> > > > > > now take and store a reference to the struct device when allocating the
> > > > > > descriptor (given that's all we need in axi_dmac_free_desc()) and
> > > > > > release it in axi_dmac_free_desc().
> > > > > >
> > > > > > Signed-off-by: Eliza Balas <eliza.balas@xxxxxxxxxx>
> > > > > > Co-developed-by: Nuno Sá <nuno.sa@xxxxxxxxxx>
> > > > > > Signed-off-by: Nuno Sá <nuno.sa@xxxxxxxxxx>
> > > > > > ---
> > > > > > drivers/dma/dma-axi-dmac.c | 50 ++++++++++++++++++++++++++++++++++------------
> > > > > > 1 file changed, 37 insertions(+), 13 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/dma/dma-axi-dmac.c b/drivers/dma/dma-axi-dmac.c
> > > > > > index 70d3ad7e7d37..46f1ead0c7d7 100644
> > > > > > --- a/drivers/dma/dma-axi-dmac.c
> > > > > > +++ b/drivers/dma/dma-axi-dmac.c
> > > > > > @@ -25,6 +25,7 @@
> > > > > > #include <linux/regmap.h>
> > > > > > #include <linux/slab.h>
> > > > > > #include <linux/spinlock.h>
> > > > > > +#include <linux/workqueue.h>
> > > > > >
> > > > > > #include <dt-bindings/dma/axi-dmac.h>
> > > > > >
> > > > > > @@ -133,6 +134,9 @@ struct axi_dmac_sg {
> > > > > > struct axi_dmac_desc {
> > > > > > struct virt_dma_desc vdesc;
> > > > > > struct axi_dmac_chan *chan;
> > > > > > + struct device *dev;
> > > > > > +
> > > > > > + struct work_struct sched_work;
> > > > > >
> > > > > > bool cyclic;
> > > > > > bool cyclic_eot;
> > > > > > @@ -666,6 +670,25 @@ static void axi_dmac_issue_pending(struct dma_chan *c)
> > > > > > spin_unlock_irqrestore(&chan->vchan.lock, flags);
> > > > > > }
> > > > > >
> > > > > > +static void axi_dmac_free_desc(struct axi_dmac_desc *desc)
> > > > > > +{
> > > > > > + struct axi_dmac_hw_desc *hw = desc->sg[0].hw;
> > > > > > + dma_addr_t hw_phys = desc->sg[0].hw_phys;
> > > > > > +
> > > > > > + dma_free_coherent(desc->dev, PAGE_ALIGN(desc->num_sgs * sizeof(*hw)),
> > > > > > + hw, hw_phys);
> > > > > > + put_device(desc->dev);
> > > > > > + kfree(desc);
> > > > > > +}
> > > > > > +
> > > > > > +static void axi_dmac_free_desc_schedule_work(struct work_struct *work)
> > > > > > +{
> > > > > > + struct axi_dmac_desc *desc = container_of(work, struct axi_dmac_desc,
> > > > > > + sched_work);
> > > > > > +
> > > > > > + axi_dmac_free_desc(desc);
> > > > > > +}
> > > > > > +
> > > > > > static struct axi_dmac_desc *
> > > > > > axi_dmac_alloc_desc(struct axi_dmac_chan *chan, unsigned int num_sgs)
> > > > > > {
> > > > > > @@ -681,6 +704,7 @@ axi_dmac_alloc_desc(struct axi_dmac_chan *chan, unsigned int num_sgs)
> > > > > > return NULL;
> > > > > > desc->num_sgs = num_sgs;
> > > > > > desc->chan = chan;
> > > > > > + desc->dev = get_device(dmac->dma_dev.dev);
> > > > > >
> > > > > > hws = dma_alloc_coherent(dev, PAGE_ALIGN(num_sgs * sizeof(*hws)),
> > > > > > &hw_phys, GFP_ATOMIC);
> > > > > > @@ -703,21 +727,18 @@ axi_dmac_alloc_desc(struct axi_dmac_chan *chan, unsigned int num_sgs)
> > > > > > /* The last hardware descriptor will trigger an interrupt */
> > > > > > desc->sg[num_sgs - 1].hw->flags = AXI_DMAC_HW_FLAG_LAST | AXI_DMAC_HW_FLAG_IRQ;
> > > > > >
> > > > > > + /*
> > > > > > + * We need to set up a work item because this IP can be used on archs
> > > > > > + * that rely on vmalloced memory for descriptors. And given that freeing
> > > > > > + * the descriptors happens in softirq context, vunmap() will BUG().
> > > > > > + * Hence, set up the worker so that we can queue it and free the
> > > > > > + * descriptor in threaded context.
> > > > > > + */
> > > > > > + INIT_WORK(&desc->sched_work, axi_dmac_free_desc_schedule_work);
> > > > > > +
> > > > > > return desc;
> > > > > > }
> > > > > >
> > > > > > -static void axi_dmac_free_desc(struct axi_dmac_desc *desc)
> > > > > > -{
> > > > > > - struct axi_dmac *dmac = chan_to_axi_dmac(desc->chan);
> > > > > > - struct device *dev = dmac->dma_dev.dev;
> > > > > > - struct axi_dmac_hw_desc *hw = desc->sg[0].hw;
> > > > > > - dma_addr_t hw_phys = desc->sg[0].hw_phys;
> > > > > > -
> > > > > > - dma_free_coherent(dev, PAGE_ALIGN(desc->num_sgs * sizeof(*hw)),
> > > > > > - hw, hw_phys);
> > > > > > - kfree(desc);
> > > > > > -}
> > > > > > -
> > > > > > static struct axi_dmac_sg *axi_dmac_fill_linear_sg(struct axi_dmac_chan *chan,
> > > > > > enum dma_transfer_direction direction, dma_addr_t addr,
> > > > > > unsigned int num_periods, unsigned int period_len,
> > > > > > @@ -958,7 +979,10 @@ static void axi_dmac_free_chan_resources(struct dma_chan *c)
> > > > > >
> > > > > > static void axi_dmac_desc_free(struct virt_dma_desc *vdesc)
> > > > > > {
> > > > > > - axi_dmac_free_desc(to_axi_dmac_desc(vdesc));
> > > > > > + struct axi_dmac_desc *desc = to_axi_dmac_desc(vdesc);
> > > > > > +
> > > > > > + /* See the comment in axi_dmac_alloc_desc() for the why! */
> > > > > > + schedule_work(&desc->sched_work);
> > > > > > }
> > > > > >
> > > > > > static bool axi_dmac_regmap_rdwr(struct device *dev, unsigned int reg)
> > > > > >
> > > > > > --
> > > > > > 2.53.0
> > > > > >