[...]
> For the reference count debugging, I have sent a patch series here:

We've tested your patch on our servers and ran into an issue.
With heavy I/O load the aoe device had stale I/Os (e.g. rsync waiting
indefinitely on one core) that can be "fixed" by running aoe-revalidate on
that device.

> [RFC PATCH 0/2] tracking the references of net_device in aoe
> https://lore.kernel.org/lkml/20241002040616.25193-1-jlee@xxxxxxxx/T/#t
>
> Based on my testing, the numbers of dev_hold(nd) and dev_put(nd) calls in
> aoe are balanced after the 'aoe: fix the potential use-after-free problem
> in more places' patch is applied on the v6.11 kernel. I have tested
> adding, modifying, and deleting files on a remote target through aoe. My
> testing was not heavy I/O testing, but the result is balanced.
>
> Could you please help try the above debug patch series to look at the
> refcnt values in aoe on your side?