Re: [PATCH] vfs: stop shrinker while fs is freezed

From: Dave Chinner
Date: Sun Dec 15 2019 - 23:11:37 EST


On Fri, Dec 13, 2019 at 04:40:12PM -0800, Darrick J. Wong wrote:
> [adding mm to cc]
>
> On Fri, Dec 13, 2019 at 02:24:40PM -0800, Junxiao Bi wrote:
> > Shrinker could be blocked by freeze while dropping the last reference of
> > some inode that had been removed. As "s_umount" lock was acquired by the
> > Shrinker before blocked, the thaw will hung by this lock. This caused a
> > deadlock.
> >
> > crash7latest> set 132
> > PID: 132
> > COMMAND: "kswapd0:0"
> > TASK: ffff9cdc9dfb5f00 [THREAD_INFO: ffff9cdc9dfb5f00]
> > CPU: 6
> > STATE: TASK_UNINTERRUPTIBLE
> > crash7latest> bt
> > PID: 132 TASK: ffff9cdc9dfb5f00 CPU: 6 COMMAND: "kswapd0:0"
> > #0 [ffffaa5d075bf900] __schedule at ffffffff8186487c
> > #1 [ffffaa5d075bf998] schedule at ffffffff81864e96
> > #2 [ffffaa5d075bf9b0] rwsem_down_read_failed at ffffffff818689ee
> > #3 [ffffaa5d075bfa40] call_rwsem_down_read_failed at ffffffff81859308
> > #4 [ffffaa5d075bfa90] __percpu_down_read at ffffffff810ebd38
> > #5 [ffffaa5d075bfab0] __sb_start_write at ffffffff812859ef
> > #6 [ffffaa5d075bfad0] xfs_trans_alloc at ffffffffc07ebe9c [xfs]
> > #7 [ffffaa5d075bfb18] xfs_free_eofblocks at ffffffffc07c39d1 [xfs]
> > #8 [ffffaa5d075bfb80] xfs_inactive at ffffffffc07de878 [xfs]
> > #9 [ffffaa5d075bfba0] __dta_xfs_fs_destroy_inode_3543 at ffffffffc07e885e [xfs]
> > #10 [ffffaa5d075bfbd0] destroy_inode at ffffffff812a25de
> > #11 [ffffaa5d075bfbe8] evict at ffffffff812a2b73
> > #12 [ffffaa5d075bfc10] dispose_list at ffffffff812a2c1d
> > #13 [ffffaa5d075bfc38] prune_icache_sb at ffffffff812a421a
> > #14 [ffffaa5d075bfc70] super_cache_scan at ffffffff812870a1
> > #15 [ffffaa5d075bfcc8] shrink_slab at ffffffff811eebb3
> > #16 [ffffaa5d075bfdb0] shrink_node at ffffffff811f4788
> > #17 [ffffaa5d075bfe38] kswapd at ffffffff811f58c3
> > #18 [ffffaa5d075bff08] kthread at ffffffff810b75d5
> > #19 [ffffaa5d075bff50] ret_from_fork at ffffffff81a0035e
> > crash7latest> set 31060
> > PID: 31060
> > COMMAND: "safefreeze"
> > TASK: ffff9cd292868000 [THREAD_INFO: ffff9cd292868000]
> > CPU: 2
> > STATE: TASK_UNINTERRUPTIBLE
> > crash7latest> bt
> > PID: 31060 TASK: ffff9cd292868000 CPU: 2 COMMAND: "safefreeze"
> > #0 [ffffaa5d10047c90] __schedule at ffffffff8186487c
> > #1 [ffffaa5d10047d28] schedule at ffffffff81864e96
> > #2 [ffffaa5d10047d40] rwsem_down_write_failed at ffffffff81868f18
> > #3 [ffffaa5d10047dd8] call_rwsem_down_write_failed at ffffffff81859367
> > #4 [ffffaa5d10047e20] down_write at ffffffff81867cfd
> > #5 [ffffaa5d10047e38] thaw_super at ffffffff81285d2d
> > #6 [ffffaa5d10047e60] do_vfs_ioctl at ffffffff81299566
> > #7 [ffffaa5d10047ee8] sys_ioctl at ffffffff81299709
> > #8 [ffffaa5d10047f28] do_syscall_64 at ffffffff81003949
> > #9 [ffffaa5d10047f50] entry_SYSCALL_64_after_hwframe at ffffffff81a001ad
> > RIP: 0000000000453d67 RSP: 00007ffff9c1ce78 RFLAGS: 00000206
> > RAX: ffffffffffffffda RBX: 0000000001cbe92c RCX: 0000000000453d67
> > RDX: 0000000000000000 RSI: 00000000c0045878 RDI: 0000000000000014
> > RBP: 00007ffff9c1cf80 R8: 0000000000000000 R9: 0000000000000012
> > R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000401fb0
> > R13: 0000000000402040 R14: 0000000000000000 R15: 0000000000000000
> > ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
> >
> > Signed-off-by: Junxiao Bi <junxiao.bi@xxxxxxxxxx>
> > ---
> > fs/super.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/fs/super.c b/fs/super.c
> > index cfadab2cbf35..adc18652302b 100644
> > --- a/fs/super.c
> > +++ b/fs/super.c
> > @@ -80,6 +80,11 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
> > if (!trylock_super(sb))
> > return SHRINK_STOP;
> >
> > + if (sb->s_writers.frozen != SB_UNFROZEN) {
> > + up_read(&sb->s_umount);
> > + return SHRINK_STOP;
> > + }
>
> Whatever happened to "let's just fsfreeze the filesystems shortly before
> freezing the system? Did someone find a reason why that wouldn't work?
>
> Also, uh, doesn't this disable memory reclaim for frozen filesystems?
>
> Maybe we all need to go review the xfs io-less inode reclaim series so
> we can stop running transactions in reclaim... I can't merge any of it
> until the mm changes go upstream.

IO-less reclaim doesn't prevent ->destroy_inode from having to run
transactions. e.g. this is the path through which unlink does inode
freeing. Background inode inactivation is the patchset that
addresses this problem... :)

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx