Re: [PATCH] xfs: fix possible assert failed in xfs_fs_put_super() when do cpu offline
From: Darrick J. Wong
Date: Tue Mar 14 2023 - 23:06:49 EST
[add lfsdevel to cc to spread the, um, love]
TLDR: percpu_counter_sum doesn't add in the values from CPUs in the
dying mask. As a result, the summation can race with cpu hotunplug
and return the wrong values.
On Wed, Mar 15, 2023 at 09:13:05AM +1100, Dave Chinner wrote:
> On Tue, Mar 14, 2023 at 09:31:00AM -0700, Darrick J. Wong wrote:
> > On Tue, Mar 14, 2023 at 05:06:49PM +0800, Ye Bin wrote:
> > > From: Ye Bin <yebin10@xxxxxxxxxx>
> > >
> > > There's a issue when do cpu offline test:
> > > CPU: 48 PID: 1168152 Comm: umount Kdump: loaded Tainted: G L
> > > pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> > > pc : assfail+0x8c/0xb4
> > > lr : assfail+0x38/0xb4
> > > sp : ffffa00033ce7c40
> > > x29: ffffa00033ce7c40 x28: ffffa00014794f30
> > > x27: ffffa00014f6ca20 x26: 1fffe0120b2e2030
> > > x25: ffff009059710188 x24: ffff00886c0a4650
> > > x23: 1fffe0110d8148ca x22: ffff009059710180
> > > x21: ffffa00015155680 x20: ffff00886c0a4000
> > > x19: 0000000000000001 x18: 0000000000000000
> > > x17: 0000000000000000 x16: 0000000000000000
> > > x15: 0000000000000007 x14: 1fffe00304cef265
> > > x13: ffff00182642b200 x12: ffff8012d37757bf
> > > x11: 1fffe012d37757be x10: ffff8012d37757be
> > > x9 : ffffa00010603a0c x8 : 0000000041b58ab3
> > > x7 : ffff94000679cf44 x6 : 00000000ffffffc0
> > > x5 : 0000000000000021 x4 : 00000000ffffffca
> > > x3 : 1ffff40002a27ee1 x2 : 0000000000000004
> > > x1 : 0000000000000000 x0 : ffffa0001513f000
> > > Call trace:
> > > assfail+0x8c/0xb4
> > > xfs_destroy_percpu_counters+0x98/0xa4
> > > xfs_fs_put_super+0x1a0/0x2a4
> > > generic_shutdown_super+0x104/0x2c0
> > > kill_block_super+0x8c/0xf4
> > > deactivate_locked_super+0xa4/0x164
> > > deactivate_super+0xb0/0xdc
> > > cleanup_mnt+0x29c/0x3ec
> > > __cleanup_mnt+0x1c/0x30
> > > task_work_run+0xe0/0x200
> > > do_notify_resume+0x244/0x320
> > > work_pending+0xc/0xa0
> > >
> > > We analyzed the data in vmcore is correct. But triggered above issue.
> > > As f689054aace2 ("percpu_counter: add percpu_counter_sum_all interface")
> > > commit describes there is a small race window between the online CPUs traversal
> > > of percpu_counter_sum and the CPU offline callback. This means percpu_counter_sum()
> > > may return incorrect result during cpu offline.
> > > To solve above issue use percpu_counter_sum_all() interface to make sure
> > > result is correct to prevent false triggering of assertions.
> >
> > How about the other percpu_counter_sum callsites inside XFS? Some of
> > them are involved in writing ondisk metadata (xfs_log_sb) or doing
> > correctness checks (fs/xfs/scrub/*); shouldn't those also be using the
> > _all variant?
>
> Ugh. I kinda wish that the percpu_counter_sum_all() patch had been
> cc'd to lists for subsystems that use percpu_counter_sum()
> extensively, or just to people who have modified that code in the
> past.
>
> The problem is that it uses cpu_possible_mask, which means it
> walks all possible CPUs that can be added to the system even if the
> CPUs aren't physically present. That can be a lot in the case of
> systems that can have cpu-capable nodes hotplugged into them, and
> that makes percpu_counter_sum_all() excitingly expensive for no good
> reason.
>
> AFAICT, if we are trying to close a race condition between iterating
> online CPUs not summing dying CPUs and the cpu dead notification
> updating the sum, then shouldn't we be using
> cpu_mask_or(cpu_online_mask, cpu_dying_mask) for summing iteration
> rather than just cpu_online_mask?
>
> i.e. when a CPU is being taken down, it gets added to the
> cpu_dying_mask, then removed from the cpu_online_mask, then the
> offline notifications are run (i.e. the percpu counter dead
> callback), and when the CPU reaches the CPUHP_TEARDOWN_CPU state,
> it is removed from the cpu_dying_mask because it is now dead and all
> the "cpu dying" callbacks have been run.
>
> Except when a hotplug event is being processed, cpu_dying_mask will
> be empty, hence there is little change in summing overhead. But it
> will close the summing race condition when a CPU is being
> offlined...
>
> That, I think, is the solution we want for XFS. Having the percpu
> counters just do the right thing is far better than always having to
> wonder if summation interface we are using is correct in the face of
> CPU hotplug. I'll put a patchset together to do:
>
> 1. fix percpu_counter_sum() to include the dying mask in it's
> iteration. This should fix the XFS issue.
I took a quick look at ext4 and btrfs usage of percpu_counter_sum. I
/think/ they're less impacted because most of the usage seems to be for
things like statfs which are inherently racy. That said, mixing in the
dying mask sounds like a cheap fix.
> 2. change the only user of percpu_counter_sum_all() to only use
> percpu_counter_sum() because percpu_counter_sum_all() is now
> redundant.
> 3. remove percpu_counter_sum_all() because it is unused.
Sounds reasonable to /me. :)
--D
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx