Re: [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock
From: Stephen Hemminger
Date: Thu Jul 13 2023 - 18:23:48 EST
On Tue, 11 Jul 2023 17:47:50 -0700
Ziqi Zhao <astrajoan@xxxxxxxxx> wrote:
> The following 3 locks would race against each other, causing the
> deadlock situation in the Syzbot bug report:
>
> - j1939_socks_lock
> - active_session_list_lock
> - sk_session_queue_lock
>
> A reasonable fix is to change j1939_socks_lock to an rwlock, since in
> the rare situations where a write lock is required for the linked list
> that j1939_socks_lock is protecting, the code does not attempt to
> acquire any more locks. This would break the circular lock dependency,
> where, for example, the current thread already locks j1939_socks_lock
> and attempts to acquire sk_session_queue_lock, and at the same time,
> another thread attempts to acquire j1939_socks_lock while holding
> sk_session_queue_lock.
>
> NOTE: This patch alone does not fix the unregister_netdevice bug
> reported by Syzbot; instead, it solves a deadlock situation to prepare
> for one or more further patches to actually fix the Syzbot bug, which
> appears to be a reference counting problem within the j1939 codebase.
>
> #syz test:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
> Signed-off-by: Ziqi Zhao <astrajoan@xxxxxxxxx>
> ---
Reader-writer locks are not the best way to fix a lock hierarchy problem.
Instead, either fix the lock ordering so that every path acquires the
locks in the same order, or use RCU.
Other devices don't have this problem, so perhaps the unique locking
in this device is the problem.
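
To illustrate what fixing the lock ordering means, here is a minimal
userspace sketch using pthread mutexes. It is not j1939 code: socks_lock
and queue_lock are hypothetical stand-ins for j1939_socks_lock and
sk_session_queue_lock, and the worker and run_ordered_demo names are
invented for the example. The only point is that both paths take the
outer lock before the inner one, so the circular wait described in the
patch description cannot form.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two locks in the quoted example. */
static pthread_mutex_t socks_lock = PTHREAD_MUTEX_INITIALIZER; /* outer */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER; /* inner */
static long shared_counter;

/* Single documented order: socks_lock first, then queue_lock. */
static void update_in_order(void)
{
	pthread_mutex_lock(&socks_lock);
	pthread_mutex_lock(&queue_lock);
	shared_counter++;
	pthread_mutex_unlock(&queue_lock);
	pthread_mutex_unlock(&socks_lock);
}

/* Path that starts from the socket list. */
static void *socket_side_worker(void *arg)
{
	long n = *(long *)arg;

	for (long i = 0; i < n; i++)
		update_in_order();
	return NULL;
}

/*
 * Path that conceptually starts from the session queue. It must not
 * hold queue_lock while waiting for socks_lock, so it uses the same
 * outer-then-inner helper as the other path.
 */
static void *session_side_worker(void *arg)
{
	long n = *(long *)arg;

	for (long i = 0; i < n; i++)
		update_in_order();
	return NULL;
}

/* Runs both paths concurrently; returns the final counter, or -1. */
long run_ordered_demo(long iterations)
{
	pthread_t a, b;

	shared_counter = 0;
	if (pthread_create(&a, NULL, socket_side_worker, &iterations))
		return -1;
	if (pthread_create(&b, NULL, session_side_worker, &iterations)) {
		pthread_join(a, NULL);
		return -1;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return shared_counter;
}
```

Build with -pthread. If session_side_worker instead took queue_lock
first and then socks_lock, the two threads could each hold one lock
while waiting for the other, which is the deadlock pattern lockdep
reports.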