[no subject]

In reply to: Karim Manaouil: "Re: [PATCH 6/7] drivers/migrate_offload: add DMA batch copy driver (dcbm)"

From: Shivank Garg

Date: Wed Jun 10 2026 - 08:34:50 EST

...
> I'm still testing, but the initial implementation I wrote with
> DMAEngine had too much overhead because of the sgtable allocations
> and the conversion between kernel scatterlists to device descriptors.
> So I entirely bypassed the DMAEngine API by directly passing the folios
> lists to the driver.
>
> I know it depends on the use case. If you just want to offload with no
> latency requirements, then DMAEngine is fine, but if the goal is to
> achieve high bandwidth with minimal latency, then it's a problem.
>
> Another example, if you have to do several independent copies of 256 or
> 512 4KiB pages in a short period of time, there will be too much stress on
> sgtable allocations.
>
> Another problem for low latency is DMA mapping.
>
> Anyway, I need to collect more numbers. I will try to share my insights
> with idxd asap.

Thanks, looking forward to those insights and numbers.

An IDXD-specific implementation is good for experimentation, but for the
upstream path I think it would be hard to maintain and would duplicate
logic. The cleanest approach is the DMA_MEMCPY_SG API, so that a single offload
driver can drive any engine that implements it. dmaengine_prep_dma_memcpy_sg()
submits a whole src/dst scatterlist as one transaction, which cuts the
per-descriptor setup overhead that dominates for 4KB pages.
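
As a rough illustration of why one scatterlist transaction helps, here is a
minimal userspace C model. It is not kernel code and does not use the
DMAEngine API: struct seg, struct mock_engine, copy_per_page() and copy_sg()
are hypothetical names, and memcpy() stands in for the hardware copy. The
model only counts how many transactions get set up for the same set of pages.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096

/* Hypothetical stand-in for one scatterlist entry: buffer address + length. */
struct seg {
	void *addr;
	size_t len;
};

/* Hypothetical mock engine: only counts transaction setups. */
struct mock_engine {
	unsigned long transactions;
};

/* Per-page path: one transaction is prepared and submitted per segment. */
static int copy_per_page(struct mock_engine *e, const struct seg *dst,
			 const struct seg *src, size_t nents)
{
	for (size_t i = 0; i < nents; i++) {
		if (dst[i].len != src[i].len)
			return -1;
		e->transactions++;	/* setup cost paid for every page */
		memcpy(dst[i].addr, src[i].addr, src[i].len);
	}
	return 0;
}

/* Batched path: the whole src/dst list is handed over as one transaction. */
static int copy_sg(struct mock_engine *e, const struct seg *dst,
		   const struct seg *src, size_t nents)
{
	/* Validate the whole list before anything is submitted. */
	for (size_t i = 0; i < nents; i++)
		if (dst[i].len != src[i].len)
			return -1;

	e->transactions++;		/* setup cost paid once per batch */
	for (size_t i = 0; i < nents; i++)
		memcpy(dst[i].addr, src[i].addr, src[i].len);
	return 0;
}
```

In this model, copying N pages costs N transaction setups on the per-page path
and one on the batched path. How much of that saving a real engine sees
depends on how its driver translates the lists into hardware descriptors.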

I've added a patch for dmaengine_prep_dma_memcpy_sg(). Could you look into
wiring up the device_prep_dma_memcpy_sg hook in IDXD?

This will keep it generic and address the bandwidth/latency problem for
small transfers.

Best Regards,
Shivank

---