Re: DPAA2 triggers, [PATCH] dma debug: report -EEXIST errors in add_dma_entry

From: Karsten Graul
Date: Thu Sep 30 2021 - 09:48:29 EST


On 14/09/2021 17:45, Ioana Ciornei wrote:
> On Wed, Sep 08, 2021 at 10:33:26PM -0500, Jeremy Linton wrote:
>> +DPAA2, netdev maintainers
>> Hi,
>>
>> On 5/18/21 7:54 AM, Hamza Mahfooz wrote:
>>> Since, overlapping mappings are not supported by the DMA API we should
>>> report an error if active_cacheline_insert returns -EEXIST.
>>
>> It seems this patch found a victim. I was trying to run iperf3 on a
>> honeycomb (5.14.0, fedora 35) and the console is blasting this error message
>> at 100% cpu. So, I changed it to a WARN_ONCE() to get the call trace, which
>> is attached below.
>>
>
> These frags are allocated by the stack, transformed into a scatterlist
> by skb_to_sgvec and then DMA mapped with dma_map_sg. It was not the
> dpaa2-eth's decision to use two fragments from the same page (that will
> also end up in the same cacheline) in two different in-flight skbs.
>
> Is this behavior normal?
>

We see the same problem here and it started with 5.15-rc2 in our nightly CI runs.
The CI has panic_on_warn enabled so we see the panic every day now.

It's always the same pattern: module SMC calls dma_map_sg_attrs(), which ends
up in the EEXIST warning sooner or later.

It would be better to revert this patch for now, until the checking logic for
overlapping areas is better understood.
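
To make the discussion concrete, here is a rough user-space sketch of how I
understand a cacheline-overlap check of this kind to behave. It is not the
code in kernel/dma/debug.c: track_mapping(), untrack_mapping(), the fixed-size
table and the 256-byte line size are all assumptions made for illustration.
The point is only that two mappings whose start addresses fall into the same
cache line collide, even when they belong to different buffers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 256-byte cache lines; adjust for the platform. */
#define CACHELINE_SHIFT 8
/* Hypothetical fixed-size table standing in for the real tracking structure. */
#define MAX_ACTIVE 64

static uint64_t active[MAX_ACTIVE];
static size_t nr_active;

/* Cache line number that a physical address falls into. */
static uint64_t cacheline_of(uint64_t phys)
{
	return phys >> CACHELINE_SHIFT;
}

/* Record a mapping; -EEXIST if its cache line is already tracked. */
static int track_mapping(uint64_t phys)
{
	uint64_t cln = cacheline_of(phys);
	size_t i;

	for (i = 0; i < nr_active; i++)
		if (active[i] == cln)
			return -EEXIST;

	if (nr_active == MAX_ACTIVE)
		return -ENOMEM;

	active[nr_active++] = cln;
	return 0;
}

/* Drop the tracked cache line for a mapping, if it is present. */
static void untrack_mapping(uint64_t phys)
{
	uint64_t cln = cacheline_of(phys);
	size_t i;

	for (i = 0; i < nr_active; i++) {
		if (active[i] == cln) {
			active[i] = active[--nr_active];
			return;
		}
	}
}
```

With these assumptions, tracking 0x1000 and then 0x1040 returns -EEXIST for
the second call, because both addresses share one 256-byte line, while 0x1100
is accepted. That matches the situation described above, where two fragments
from the same page are mapped while both are still in flight.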

Thank you.


The call trace for reference:

[ 864.189864] DMA-API: mlx5_core 0662:00:00.0: cacheline tracking EEXIST, overlapping mappings aren't supported
[ 864.189883] WARNING: CPU: 0 PID: 33720 at kernel/dma/debug.c:570 add_dma_entry+0x208/0x2c8
...
[ 864.190747] CPU: 0 PID: 33720 Comm: smcapp Not tainted 5.15.0-20210928.rc3.git0.a59bf04db7bb.300.fc34.s390x+debug #1
[ 864.190758] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
[ 864.190766] Krnl PSW : 0704d00180000000 00000000fa6239fc (add_dma_entry+0x20c/0x2c8)
[ 864.190783] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
[ 864.190795] Krnl GPRS: c0000000ffffbfff 0000000080000000 0000000000000061 0000000000000000
[ 864.190804] 0000000000000001 0000000000000001 0000000000000001 0000000000000001
[ 864.190813] 0700000000000001 000000000020ff00 00000000ffffffff 000000008137b300
[ 864.190822] 0000000020020100 0000000000000001 00000000fa6239f8 00000380074536f8
[ 864.190837] Krnl Code: 00000000fa6239ec: c020007a4964 larl %r2,00000000fb56ccb4
00000000fa6239f2: c0e5005ef2ff brasl %r14,00000000fb201ff0
#00000000fa6239f8: af000000 mc 0,0
>00000000fa6239fc: ecb60057007c cgij %r11,0,6,00000000fa623aaa
00000000fa623a02: c01000866149 larl %r1,00000000fb6efc94
00000000fa623a08: e31010000012 lt %r1,0(%r1)
00000000fa623a0e: a774ff73 brc 7,00000000fa6238f4
00000000fa623a12: c010008a9227 larl %r1,00000000fb775e60
[ 864.202949] Call Trace:
[ 864.202959] [<00000000fa6239fc>] add_dma_entry+0x20c/0x2c8
[ 864.202971] ([<00000000fa6239f8>] add_dma_entry+0x208/0x2c8)
[ 864.202981] [<00000000fa624988>] debug_dma_map_sg+0x140/0x160
[ 864.202992] [<00000000fa61eadc>] __dma_map_sg_attrs+0x9c/0xd8
[ 864.203002] [<00000000fa61eb3a>] dma_map_sg_attrs+0x22/0x40
[ 864.203012] [<000003ff80483bde>] smc_ib_buf_map_sg+0x5e/0x90 [smc]
[ 864.203036] [<000003ff80486b44>] smcr_buf_map_link.part.0+0x12c/0x1e8 [smc]
[ 864.203053] [<000003ff80486cb6>] _smcr_buf_map_lgr+0xb6/0xf8 [smc]
[ 864.203071] [<000003ff8048b91c>] smcr_buf_map_lgr+0x4c/0x90 [smc]
[ 864.211496] [<000003ff80490ac2>] smc_llc_cli_add_link+0x152/0x420 [smc]
[ 864.211522] [<000003ff8047acbc>] smcr_clnt_conf_first_link+0x124/0x1e0 [smc]
[ 864.211537] [<000003ff8047bfb2>] smc_connect_rdma+0x25a/0x2e8 [smc]
[ 864.211551] [<000003ff8047da4a>] __smc_connect+0x38a/0x650 [smc]
[ 864.211566] [<000003ff8047de70>] smc_connect+0x160/0x190 [smc]
[ 864.211580] [<00000000faf10c70>] __sys_connect+0x98/0xd0
[ 864.211592] [<00000000faf12e9a>] __do_sys_socketcall+0x16a/0x350
[ 864.211603] [<00000000fb216752>] __do_syscall+0x1c2/0x1f0
[ 864.211616] [<00000000fb229148>] system_call+0x78/0xa0

--
Karsten