Re: [PATCH] kernfs: switch global kernfs_rwsem lock to per-fs lock

From: Minchan Kim
Date: Mon Nov 29 2021 - 15:24:12 EST


On Fri, Nov 26, 2021 at 12:54:45PM +0100, Marek Szyprowski wrote:
> Hi,
>
> On 19.11.2021 00:00, Minchan Kim wrote:
> > The kernfs implementation has coarse lock granularity (kernfs_rwsem), so
> > every kernfs-based filesystem (e.g., sysfs, cgroup) competes for the same
> > lock. As a result, some callers end up waiting on the global lock for a
> > long time even though their contexts are completely independent of each
> > other.
> >
> > A typical example: process A enters direct reclaim while holding the lock
> > after accessing a file in sysfs, process B waits for the lock in exclusive
> > mode, and process C then has to wait until process B gets the lock from
> > process A and finishes its job.
> >
> > This patch switches the global kernfs_rwsem to a per-fs lock by moving
> > the rwsem into kernfs_root.
> >
> > Suggested-by: Tejun Heo <tj@xxxxxxxxxx>
> > Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
>
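
For context, the change described above amounts to something like the
following (a minimal sketch, not the actual diff; most kernfs_root fields
and all error handling are omitted):

	/* Before: one rwsem shared by every kernfs-based filesystem. */
	static DECLARE_RWSEM(kernfs_rwsem);

	/* After: each filesystem instance carries its own lock ... */
	struct kernfs_root {
		struct kernfs_node	*kn;		/* root node */
		struct rw_semaphore	kernfs_rwsem;	/* was global */
		/* ... */
	};

	/* ... and lock sites reach the rwsem through the node's root,
	 * so e.g. sysfs and cgroup no longer contend with each other:
	 */
	struct kernfs_root *root = kernfs_root(kn);

	down_write(&root->kernfs_rwsem);
	/* ... modify the tree under @kn ... */
	up_write(&root->kernfs_rwsem);
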
> This patch landed recently in linux-next (20211126) as commit
> 393c3714081a ("kernfs: switch global kernfs_rwsem lock to per-fs lock").
> In my tests I've found that it causes the following warning during the
> system reboot:
>
>  =========================
>  WARNING: held lock freed!
>  5.16.0-rc2+ #10984 Not tainted
>  -------------------------
>  kworker/1:0/18 is freeing memory ffff00004034e200-ffff00004034e3ff,
> with a lock still held there!
>  ffff00004034e348 (&root->kernfs_rwsem){++++}-{3:3}, at:
> __kernfs_remove+0x310/0x37c
>  3 locks held by kworker/1:0/18:
>   #0: ffff000040107938 ((wq_completion)cgroup_destroy){+.+.}-{0:0}, at:
> process_one_work+0x1f0/0x6f0
>   #1: ffff80000b55bdc0
> ((work_completion)(&(&css->destroy_rwork)->work)){+.+.}-{0:0}, at:
> process_one_work+0x1f0/0x6f0
>   #2: ffff00004034e348 (&root->kernfs_rwsem){++++}-{3:3}, at:
> __kernfs_remove+0x310/0x37c
>
>  stack backtrace:
>  CPU: 1 PID: 18 Comm: kworker/1:0 Not tainted 5.16.0-rc2+ #10984
>  Hardware name: Raspberry Pi 4 Model B (DT)
>  Workqueue: cgroup_destroy css_free_rwork_fn
>  Call trace:
>   dump_backtrace+0x0/0x1ac
>   show_stack+0x18/0x24
>   dump_stack_lvl+0x8c/0xb8
>   dump_stack+0x18/0x34
>   debug_check_no_locks_freed+0x124/0x140
>   kfree+0xf0/0x3a4
>   kernfs_put+0x1f8/0x224
>   __kernfs_remove+0x1b8/0x37c
>   kernfs_destroy_root+0x38/0x50
>   css_free_rwork_fn+0x288/0x3d4
>   process_one_work+0x288/0x6f0
>   worker_thread+0x74/0x470
>   kthread+0x188/0x194
>   ret_from_fork+0x10/0x20
>
> Let me know if you need more information or help in reproducing this issue.

Hi Marek,

Thanks for the report. It looks like __kernfs_remove now drops the last
reference on the root node while root->kernfs_rwsem is still held, so the
final kernfs_put() frees the kernfs_root that embeds the rwsem we are
about to release. Could you try this one?
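
To spell out what I think is going on, here is the sequence in simplified
form (my reading of the backtrace, not the patch itself; the code below is
a sketch of the fs/kernfs paths involved, not the exact source):

	void kernfs_destroy_root(struct kernfs_root *root)
	{
		kernfs_remove(root->kn);	/* ends up freeing @root */
	}

	void kernfs_remove(struct kernfs_node *kn)
	{
		struct kernfs_root *root = kernfs_root(kn);

		down_write(&root->kernfs_rwsem);  /* lock lives inside @root */
		__kernfs_remove(kn);		  /* drops the last reference;
						   * kernfs_put() reaches the
						   * root node and kfree()s
						   * @root, i.e. the memory
						   * holding the rwsem we
						   * still own, which is the
						   * "held lock freed!" splat */
		up_write(&root->kernfs_rwsem);	  /* touches freed memory */
	}

One way to avoid this (illustrative only, not necessarily what the patch
that follows does) is to pin the root node with kernfs_get(root->kn)
before the removal and drop that reference with kernfs_put(root->kn)
afterwards, so the final kfree() of @root happens only after the rwsem
has been released.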