Re: [PATCH][v3] rtnetlink: add rtnl_lock debug log

From: Jakub Kicinski
Date: Wed May 12 2021 - 13:39:23 EST


On Tue, 11 May 2021 19:32:57 +0800 Rocco yue wrote:
> We often encounter system hangs caused by a process
> holding rtnl_lock for a long time. Although Linux already has
> a lock detection mechanism, it is somewhat cumbersome and
> affects system performance. We would like to add a lightweight
> debugging mechanism for detecting rtnl_lock problems.
>
> Up to now, we have discovered and solved some potential bugs
> through this lightweight rtnl_lock debugging mechanism, which
> is helpful for us.
>
> When you say Y to RTNL_LOCK_DEBUG, the kernel will detect
> whether any function holds rtnl_lock for too long and print
> some key information to help locate the problem.
>
> For example, from the following logs we can clearly see that
> the RfxSender_4 process (pid=2206) holds rtnl_lock for a long
> time, causing the system to hang. We can also infer that the
> delay happens in devinet_ioctl(), which is why rtnl_lock is
> not released in time.

You can achieve that with a pair of fexit/fentry hooks or kprobes,
and maybe a bit of BPF. No need for config options or hardcoded
parameters.
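To make the suggestion concrete, here is a rough sketch of the bookkeeping such a hook pair would do, written as plain userspace C so it can be compiled and checked on its own. It is an illustration only, not an actual BPF program. In a real libbpf tool, on_lock_acquired() would be the body of an fexit program on rtnl_lock() and on_unlock() the body of an fentry program on rtnl_unlock(). The timestamps would come from bpf_ktime_get_ns(), the task id from bpf_get_current_pid_tgid(), and the state would live in a BPF map. The names on_lock_acquired, on_unlock, slow_holds and HOLD_THRESHOLD_NS are all hypothetical, and the 5 s threshold is an arbitrary assumption. Other acquisition paths such as rtnl_trylock() are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical reporting threshold (5 s, an arbitrary assumption). With BPF
 * this would be set by the loader at runtime instead of a Kconfig constant. */
#define HOLD_THRESHOLD_NS (5ULL * 1000 * 1000 * 1000)

/* rtnl_lock is a single global mutex, so at most one task holds it at a time
 * and one slot is enough. In BPF this would be a one-entry array map. */
static struct {
    uint64_t tid;      /* task that acquired the lock */
    uint64_t start_ns; /* timestamp taken right after acquisition */
    int valid;         /* nonzero while an acquisition is recorded */
} holder;

static unsigned slow_holds; /* holds that exceeded HOLD_THRESHOLD_NS */

/* Models a hypothetical fexit hook on rtnl_lock(): the lock was just taken. */
static void on_lock_acquired(uint64_t tid, uint64_t now_ns)
{
    holder.tid = tid;
    holder.start_ns = now_ns;
    holder.valid = 1;
}

/* Models a hypothetical fentry hook on rtnl_unlock(). Returns the hold time
 * in ns, or 0 if no acquisition by this task was recorded. */
static uint64_t on_unlock(uint64_t tid, uint64_t now_ns)
{
    if (!holder.valid || holder.tid != tid)
        return 0;
    holder.valid = 0;

    uint64_t held_ns = now_ns - holder.start_ns;
    if (held_ns > HOLD_THRESHOLD_NS) {
        slow_holds++;
        printf("tid %llu held rtnl_lock for %llu ms\n",
               (unsigned long long)tid,
               (unsigned long long)(held_ns / 1000000));
    }
    return held_ns;
}
```

The point of this design is that both the threshold and the report format stay in the userspace tool. The kernel only provides the attach points, so nothing has to be compiled in ahead of time.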