[PATCH 00/17] dmaengine: dw-edma: Support dynamic LL appends
From: Koichiro Den
Date: Mon Jun 15 2026 - 11:46:16 EST
Hi,
This series is a reworked version of Frank's earlier RFT series:
https://lore.kernel.org/dmaengine/20260109-edma_dymatic-v1-0-9a98c9c98536@xxxxxxx/
After discussing the HDMA test results with Frank, I am sending this as a
standalone series that keeps the main dynamic-append direction, while adding the
fixes and HDMA handling needed to make it work reliably on both eDMA and HDMA.
Several patches are kept from, or based on, Frank's RFT series; the individual
patches carry the corresponding attribution.
The series has been tested on both eDMA and HDMA systems. Both completed the fio
test set reliably; performance results are shown below.
Dependencies
============
1). [PATCH v7 0/9] dmaengine: Add new API to combine configuration and descriptor preparation
https://lore.kernel.org/dmaengine/20260521-dma_prep_config-v7-0-1f73f4899883@xxxxxxx/
2). [PATCH v2 00/11] dmaengine: dw-edma: flatten desc structions and simple code
https://lore.kernel.org/dmaengine/20260109-edma_ll-v2-0-5c0b27b2c664@xxxxxxx/
Performance measurements
========================
"Before" means the dependency series applied, without this series.
"After" means the same tree plus this series.
The fio test cases follow the set used in Frank's original RFT series.
Each full fio test set was run three times in alternating order (B-A-B-A-B-A),
with runtime=30s and ramp_time=5s. The tables below report mean bandwidth; the
detailed per-test rows also include standard deviation.
Note:
- These results are from one eDMA platform and one HDMA platform, so the exact
deltas should NOT be read as generic numbers for all dw-edma integrations.
- Both endpoint setups used nvmet_pci_epf with a namespace backed by a
null_blk device.
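For readers who want to reproduce the table columns, the following is a minimal sketch of how a mean, a standard deviation and a delta could be derived from three runs per configuration. It assumes the sample standard deviation and a delta computed from the two means; the cover letter does not state either choice. The function name and the per-run numbers are hypothetical, not measured data.

```python
from statistics import mean, stdev


def summarize(before_runs, after_runs):
    """Return (before_mean, before_sd, after_mean, after_sd, delta_pct).

    Assumption: sd is the sample standard deviation and the delta is
    taken between the two means.
    """
    b_mean = mean(before_runs)
    a_mean = mean(after_runs)
    delta_pct = (a_mean - b_mean) / b_mean * 100.0
    return b_mean, stdev(before_runs), a_mean, stdev(after_runs), delta_pct


# Hypothetical per-run bandwidths in MiB/s (three runs each).
before = [100.0, 110.0, 90.0]
after = [150.0, 160.0, 140.0]

b_mean, b_sd, a_mean, a_sd, delta = summarize(before, after)
print(f"{b_mean:.1f} (sd {b_sd:.1f}) -> {a_mean:.1f} (sd {a_sd:.1f}) {delta:+.1f}%")
```

With only three runs per side, the standard deviation is a rough indicator of run-to-run noise rather than a tight bound.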
Summary by group (BW delta %)
           all    read   write    qd32      q1   small 4K   large >=128K
  eDMA   +54.6   +46.5   +66.3   +56.1   +53.5      +82.0          +46.3
  HDMA    +9.0    +5.5   +14.1   +14.9    -0.7      +24.5           +4.3
The eDMA setup shows broad improvement across the test set. On HDMA, the main
gains are in high queue-depth and small-block write cases; low queue-depth cases
are mostly neutral, with some run-to-run noise. For HDMA, watermark interrupts
are needed to obtain a reliable running HDMA_LLP_* progress point. These
interrupts are mostly overhead for low queue-depth workloads, where the current
descriptor fits in the LL ring and there is no later descriptor to append.
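The role of a progress point in appending to a running LL ring can be illustrated with a toy model. This is only a sketch under stated assumptions: the class and method names are hypothetical, the counters are free-running integers, and details such as link entries, cycle bits and interrupt placement are left out. It is not the driver's data structure. The ring size of 170 is taken from the ll_max value reported for both testbeds.

```python
class LLRing:
    """Toy model of a circular linked-list (LL) ring (hypothetical names)."""

    def __init__(self, size):
        self.size = size  # number of LL entries, e.g. ll_max
        self.head = 0     # total entries written by software so far
        self.tail = 0     # total entries known to be consumed by hardware

    def free(self):
        """Entries that software may still write without overrunning hardware."""
        return self.size - (self.head - self.tail)

    def append(self, nbursts):
        """Queue up to nbursts entries; return how many were actually placed."""
        n = min(nbursts, self.free())
        self.head += n
        return n

    def progress(self, hw_done):
        """Reclaim entries after a progress event reports hw_done total entries."""
        if not self.tail <= hw_done <= self.head:
            raise ValueError("progress point outside the issued window")
        self.tail = hw_done

    def slot(self, count):
        """Map a free-running counter to a ring index."""
        return count % self.size


ring = LLRing(170)
first = ring.append(200)    # only part of a 200-entry request fits
free_when_full = ring.free()
ring.progress(100)          # hardware reports 100 entries consumed
second = ring.append(200 - first)
next_slot = ring.slot(ring.head)
```

In this model a request that fits entirely in the ring never needs a progress event before completion, which matches the observation that such events are mostly overhead at low queue depth.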
eDMA:
- Testbed:
    * Endpoint: RK3588 (Rock 5B)
      controller IP version: v5.60a
      ll_max: 170
- Summary by group (BW delta %)
    all           n=26 mean= +54.6 median= +38.4 min= +16.3 max=+119.0
    read          n=14 mean= +46.5 median= +37.5 min= +18.7 max=+119.0
    write         n=11 mean= +66.3 median= +68.1 min= +16.3 max=+117.2
    qd32          n=16 mean= +56.1 median= +46.8 min= +18.7 max=+117.2
    q1            n= 9 mean= +53.5 median= +36.8 min= +16.3 max=+119.0
    small 4K      n= 6 mean= +82.0 median= +93.6 min= +18.7 max=+117.2
    large >=128K  n=20 mean= +46.3 median= +37.6 min= +16.3 max=+119.0
- Before mean -> After mean (MiB/s)
  Case                        Before            After             Delta
  --------------------------- ----------------- ----------------- -------
  Rnd read 4KB q1 1j            22.7 (sd   7.7)   48.3 (sd  11.3) +112.8%
  Rnd read 4KB q32 1j          206.3 (sd  23.8)  245.0 (sd  21.7)  +18.7%
  Rnd read 4KB q32 4j          213.3 (sd  28.0)  332.7 (sd  45.6)  +55.9%
  Rnd read 128KB q1 1j         512.7 (sd 193.6)  644.0 (sd 152.8)  +25.6%
  Rnd read 128KB q32 1j       2285.7 (sd  15.5) 3071.7 (sd   4.2)  +34.4%
  Rnd read 128KB q32 4j       2392.0 (sd   6.1) 3290.0 (sd   1.0)  +37.5%
  Rnd read 512KB q1 1j         634.3 (sd   7.8)  788.7 (sd  15.2)  +24.3%
  Rnd read 512KB q32 1j       2388.7 (sd   5.5) 3282.0 (sd   2.6)  +37.4%
  Rnd read 512KB q32 4j       2391.7 (sd   5.5) 3293.0 (sd   0.0)  +37.7%
  Rnd write 4KB q1 1j           24.4 (sd  10.2)   42.8 (sd  13.2)  +75.8%
  Rnd write 4KB q32 1j         109.0 (sd  13.0)  230.3 (sd  27.1) +111.3%
  Rnd write 4KB q32 4j         110.3 (sd  14.4)  239.7 (sd  34.4) +117.2%
  Rnd write 128KB q1 1j        339.0 (sd  41.1)  498.7 (sd 102.9)  +47.1%
  Rnd write 128KB q32 1j      1027.3 (sd  33.5) 1617.0 (sd  14.8)  +57.4%
  Rnd write 128KB q32 4j       951.3 (sd  72.6) 1599.0 (sd   3.6)  +68.1%
  Seq read 128KB q1 1j         379.7 (sd 120.1)  831.3 (sd  89.9) +119.0%
  Seq read 128KB q32 1j       2291.7 (sd   6.1) 3091.3 (sd  22.8)  +34.9%
  Seq read 512KB q1 1j         644.7 (sd  34.4)  882.0 (sd  28.5)  +36.8%
  Seq read 512KB q32 1j       2387.7 (sd   5.7) 3284.0 (sd   2.6)  +37.5%
  Seq read 1MB q32 1j         2390.0 (sd   5.3) 3292.3 (sd   2.1)  +37.8%
  Seq write 128KB q1 1j        354.0 (sd  88.4)  438.0 (sd  65.1)  +23.7%
  Seq write 128KB q32 1j       934.3 (sd  46.0) 1620.0 (sd  15.6)  +73.4%
  Seq write 512KB q1 1j        552.7 (sd  14.6)  642.7 (sd  38.1)  +16.3%
  Seq write 512KB q32 1j      1041.0 (sd  39.5) 1621.3 (sd   1.5)  +55.7%
  Seq write 1MB q32 1j         808.3 (sd  22.7) 1479.7 (sd   3.5)  +83.1%
  Rnd rdwr 4K..1MB q8 4j       846.7 (sd  18.8) 1177.7 (sd  23.1)  +39.1%
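As a cross-check of how the group summaries relate to the per-test rows, the sketch below recomputes the eDMA "small 4K" group from the six 4KB deltas listed in the table above. It assumes each group statistic is taken over the per-case delta percentages; because the inputs here are the rounded, published deltas, the result can differ from the summary row in the last digit.

```python
from statistics import mean, median

# Delta values (%) of the six 4KB rows in the eDMA table:
# three random-read cases followed by three random-write cases.
small_4k = [112.8, 18.7, 55.9, 75.8, 111.3, 117.2]

group_n = len(small_4k)
group_mean = mean(small_4k)
group_median = median(small_4k)
group_min = min(small_4k)
group_max = max(small_4k)

print(
    f"small 4K n={group_n} mean={group_mean:+.2f} "
    f"median={group_median:+.2f} min={group_min:+.1f} max={group_max:+.1f}"
)
```

The recomputed mean and median land within rounding distance of the +82.0 and +93.6 reported in the eDMA summary, and the minimum and maximum match the +18.7 and +117.2 entries exactly.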
HDMA:
- Testbed:
    * Endpoint: SpacemiT K3
      controller IP version: v6.30a
      ll_max: 170
- Summary by group (BW delta %)
    all           n=26 mean=  +9.0 median=  +6.9 min= -15.2 max= +50.2
    read          n=14 mean=  +5.5 median=  +6.4 min= -15.2 max= +24.0
    write         n=11 mean= +14.1 median=  +9.0 min=  -0.2 max= +50.2
    qd32          n=16 mean= +14.9 median=  +9.1 min=  +5.7 max= +50.2
    q1            n= 9 mean=  -0.7 median=  +0.2 min= -15.2 max=  +5.2
    small 4K      n= 6 mean= +24.5 median= +21.5 min=  -0.2 max= +50.2
    large >=128K  n=20 mean=  +4.3 median=  +6.4 min= -15.2 max=  +9.8
- Before mean -> After mean (MiB/s)
  Case                        Before            After             Delta
  --------------------------- ----------------- ----------------- -------
  Rnd read 4KB q1 1j            68.5 (sd   5.7)   72.0 (sd   6.8)   +5.1%
  Rnd read 4KB q32 1j          310.7 (sd  38.0)  385.3 (sd  43.6)  +24.0%
  Rnd read 4KB q32 4j          324.0 (sd  45.1)  385.7 (sd   9.5)  +19.0%
  Rnd read 128KB q1 1j         737.7 (sd  63.3)  746.0 (sd  47.1)   +1.1%
  Rnd read 128KB q32 1j       1513.0 (sd  24.0) 1617.0 (sd   2.0)   +6.9%
  Rnd read 128KB q32 4j       1552.7 (sd   7.0) 1641.0 (sd  29.9)   +5.7%
  Rnd read 512KB q1 1j         828.3 (sd  16.9)  815.7 (sd  14.0)   -1.5%
  Rnd read 512KB q32 1j       1550.0 (sd   8.5) 1661.7 (sd  14.3)   +7.2%
  Rnd read 512KB q32 4j       1547.3 (sd  20.4) 1670.0 (sd  27.0)   +7.9%
  Rnd write 4KB q1 1j           67.2 (sd   5.1)   67.1 (sd   5.5)   -0.2%
  Rnd write 4KB q32 1j         207.7 (sd   6.8)  309.7 (sd   3.8)  +49.1%
  Rnd write 4KB q32 4j         208.0 (sd   5.6)  312.3 (sd   4.0)  +50.2%
  Rnd write 128KB q1 1j        545.0 (sd  42.5)  573.3 (sd  45.7)   +5.2%
  Rnd write 128KB q32 1j      1251.3 (sd  16.0) 1363.3 (sd   6.7)   +9.0%
  Rnd write 128KB q32 4j      1251.0 (sd  17.1) 1365.3 (sd   4.9)   +9.1%
  Seq read 128KB q1 1j         803.3 (sd  78.2)  681.0 (sd 110.1)  -15.2%
  Seq read 128KB q32 1j       1513.3 (sd  23.5) 1618.3 (sd   4.0)   +6.9%
  Seq read 512KB q1 1j         846.7 (sd  26.9)  797.7 (sd  73.9)   -5.8%
  Seq read 512KB q32 1j       1522.0 (sd  36.2) 1671.0 (sd   1.7)   +9.8%
  Seq read 1MB q32 1j         1544.0 (sd  21.8) 1636.3 (sd  25.1)   +6.0%
  Seq write 128KB q1 1j        544.3 (sd  13.3)  572.3 (sd  28.4)   +5.1%
  Seq write 128KB q32 1j      1251.3 (sd  15.5) 1364.3 (sd   4.9)   +9.0%
  Seq write 512KB q1 1j        772.7 (sd  23.0)  774.3 (sd  64.1)   +0.2%
  Seq write 512KB q32 1j      1251.3 (sd  17.0) 1365.0 (sd   5.2)   +9.1%
  Seq write 1MB q32 1j        1250.3 (sd  16.5) 1366.0 (sd   5.3)   +9.3%
  Rnd rdwr 4K..1MB q8 4j       875.0 (sd   9.0)  884.3 (sd   4.5)   +1.1%
Best regards,
Koichiro
Frank Li (5):
  dmaengine: dw-edma: Add dw_edma_core_ll_cur_idx() to get current LL
    entry index
  dmaengine: dw-edma: Move dw_hdma_set_callback_result() up
  dmaengine: dw-edma: Make DMA link list work as a circular buffer
  dmaengine: dw-edma: Dynamically append requests while running
  dmaengine: dw-edma: Add trace support
Koichiro Den (12):
  dmaengine: dw-edma: Fix residue burst index in tx_status()
  dmaengine: dw-edma: Fix HDMA channel status register access
  dmaengine: dw-edma: Terminate STOP requests without callbacks
  dmaengine: dw-edma: Clean up vchan descriptors on termination
  dmaengine: dw-edma: Serialize channel state checks
  dmaengine: dw-edma: Add LL interrupt placement policy
  dmaengine: dw-edma: Reclaim issued descriptors from LL progress
  dmaengine: dw-edma: Use HDMA watermarks as progress events
  dmaengine: dw-edma: Clear LL data entries on reset
  dmaengine: dw-edma: Dispatch DONE interrupts by channel request
  dmaengine: dw-edma: Reset LL state after terminate and abort
  dmaengine: dw-edma: Recover stopped HDMA from tx_status
drivers/dma/dw-edma/Makefile | 3 +
drivers/dma/dw-edma/dw-edma-core.c | 577 +++++++++++++++++++++-----
drivers/dma/dw-edma/dw-edma-core.h | 63 ++-
drivers/dma/dw-edma/dw-edma-trace.c | 4 +
drivers/dma/dw-edma/dw-edma-trace.h | 150 +++++++
drivers/dma/dw-edma/dw-edma-v0-core.c | 50 ++-
drivers/dma/dw-edma/dw-hdma-v0-core.c | 125 +++++-
drivers/dma/dw-edma/dw-hdma-v0-regs.h | 1 +
8 files changed, 847 insertions(+), 126 deletions(-)
create mode 100644 drivers/dma/dw-edma/dw-edma-trace.c
create mode 100644 drivers/dma/dw-edma/dw-edma-trace.h
--
2.51.0