Re: [PATCH v2] ext4: prevent data-race that occur when read/write ext4_group_desc structure members

From: Jeongjun Park
Date: Tue Oct 01 2024 - 08:26:54 EST


Andreas Dilger <adilger@xxxxxxxxx> wrote:
>
> On Sep 20, 2024, at 9:00 AM, Jeongjun Park <aha310510@xxxxxxxxx> wrote:
> >
> > Currently, data-race like [1] occur in fs/ext4/ialloc.c
> >
> > find_group_other() and find_group_orlov() read multiple ext4_groups but
> > do not protect them with locks, which causes data-race. I think it would
> > be appropriate to add ext4_lock_group() at an appropriate location to solve
> > this.
> >
> > [1]
> >
> > ==================================================================
> > BUG: KCSAN: data-race in ext4_free_inodes_count / ext4_free_inodes_set
> >
> > write to 0xffff88810404300e of 2 bytes by task 6254 on cpu 1:
> > ext4_free_inodes_set+0x1f/0x80 fs/ext4/super.c:405
> > __ext4_new_inode+0x15ca/0x2200 fs/ext4/ialloc.c:1216
> > ext4_symlink+0x242/0x5a0 fs/ext4/namei.c:3391
> > vfs_symlink+0xca/0x1d0 fs/namei.c:4615
> > do_symlinkat+0xe3/0x340 fs/namei.c:4641
> > __do_sys_symlinkat fs/namei.c:4657 [inline]
> > __se_sys_symlinkat fs/namei.c:4654 [inline]
> > __x64_sys_symlinkat+0x5e/0x70 fs/namei.c:4654
> > x64_sys_call+0x1dda/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:267
> > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
> > entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > read to 0xffff88810404300e of 2 bytes by task 6257 on cpu 0:
> > ext4_free_inodes_count+0x1c/0x80 fs/ext4/super.c:349
> > find_group_other fs/ext4/ialloc.c:594 [inline]
> > __ext4_new_inode+0x6ec/0x2200 fs/ext4/ialloc.c:1017
> > ext4_symlink+0x242/0x5a0 fs/ext4/namei.c:3391
> > vfs_symlink+0xca/0x1d0 fs/namei.c:4615
> > do_symlinkat+0xe3/0x340 fs/namei.c:4641
> > __do_sys_symlinkat fs/namei.c:4657 [inline]
> > __se_sys_symlinkat fs/namei.c:4654 [inline]
> > __x64_sys_symlinkat+0x5e/0x70 fs/namei.c:4654
> > x64_sys_call+0x1dda/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:267
> > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
> > entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > value changed: 0x185c -> 0x185b
>
> I see now after you sent this patch that all of these cases are for
> read-only access to the free inodes count, which doesn't really
> matter if the value is racy. These values are only used in heuristics
> for block group selection, and if the value is wrong then creating a
> new subdirectory may be in a different group, but that doesn't make
> much difference.
>
> It looks like the write side of all these accesses are already under
> ext4_group_lock(), so the code is actually correct and not in danger
> of two threads updating bg_free_inodes_count_lo/hi inconsistently.
>
> We probably *do not* want locking in the read case, as it will cause
> unnecessary lock contention scanning groups for subdirectory allocation.
>
> My suggestion at this point would be to go back to using READ_ONCE() and
> WRITE_ONCE() in ext4_free_inodes_count()/ext4_free_inodes_set() like in
> your original patch. but *only* for functions used by find_group_*(). We
> want to be warned by KASAN if any of the other fields are accessed without
> a proper ext4_group_lock(), since READ_ONCE()/WRITE_ONCE() does not fix
> _lo/_hi tearing.
>
> It probably also makes sense to add comments to all of these functions
> that they should hold ext4_group_lock() for access/updates, *except*
> ext4_free_inodes_count() can be called to read the inode count without it
> if the result does not need to be totally accurate.

Thanks for the comment! Then it seems appropriate to annotate
ext4_free_inodes_count() called in find_group_* so that KCSAN does
not report it. I will write a v3 patch that implements this and send it to
you right away.

Regards,
Jeongjun Park

>
> Cheers, Andreas
>
> > Fixes: ac27a0ec112a ("[PATCH] ext4: initial copy of files from ext3")
> > Signed-off-by: Jeongjun Park <aha310510@xxxxxxxxx>
> > ---
> > fs/ext4/ialloc.c | 27 ++++++++++++++++++++++++---
> > 1 file changed, 24 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> > index 9dfd768ed9f8..5cae247ff21f 100644
> > --- a/fs/ext4/ialloc.c
> > +++ b/fs/ext4/ialloc.c
> > @@ -500,11 +500,14 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> > for (i = 0; i < flex_size; i++) {
> > if (grp+i >= real_ngroups)
> > break;
> > + ext4_lock_group(sb, grp+i);
> > desc = ext4_get_group_desc(sb, grp+i, NULL);
> > if (desc && ext4_free_inodes_count(sb, desc)) {
> > *group = grp+i;
> > + ext4_unlock_group(sb, grp+i);
> > return 0;
> > }
> > + ext4_unlock_group(sb, grp+i);
> > }
> > goto fallback;
> > }
> > @@ -544,14 +547,17 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> > parent_group = EXT4_I(parent)->i_block_group;
> > for (i = 0; i < ngroups; i++) {
> > grp = (parent_group + i) % ngroups;
> > + ext4_lock_group(sb, grp);
> > desc = ext4_get_group_desc(sb, grp, NULL);
> > if (desc) {
> > grp_free = ext4_free_inodes_count(sb, desc);
> > if (grp_free && grp_free >= avefreei) {
> > *group = grp;
> > + ext4_unlock_group(sb, grp);
> > return 0;
> > }
> > }
> > + ext4_unlock_group(sb, grp);
> > }
> >
> > if (avefreei) {
> > @@ -590,11 +596,14 @@ static int find_group_other(struct super_block *sb, struct inode *parent,
> > if (last > ngroups)
> > last = ngroups;
> > for (i = parent_group; i < last; i++) {
> > + ext4_lock_group(sb, i);
> > desc = ext4_get_group_desc(sb, i, NULL);
> > if (desc && ext4_free_inodes_count(sb, desc)) {
> > *group = i;
> > + ext4_unlock_group(sb, i);
> > return 0;
> > }
> > + ext4_unlock_group(sb, i);
> > }
> > if (!retry && EXT4_I(parent)->i_last_alloc_group != ~0) {
> > retry = 1;
> > @@ -616,10 +625,14 @@ static int find_group_other(struct super_block *sb, struct inode *parent,
> > * Try to place the inode in its parent directory
> > */
> > *group = parent_group;
> > + ext4_lock_group(sb, *group);
> > desc = ext4_get_group_desc(sb, *group, NULL);
> > if (desc && ext4_free_inodes_count(sb, desc) &&
> > - ext4_free_group_clusters(sb, desc))
> > + ext4_free_group_clusters(sb, desc)) {
> > + ext4_unlock_group(sb, *group);
> > return 0;
> > + }
> > + ext4_unlock_group(sb, *group);
> >
> > /*
> > * We're going to place this inode in a different blockgroup from its
> > @@ -640,10 +653,14 @@ static int find_group_other(struct super_block *sb, struct inode *parent,
> > *group += i;
> > if (*group >= ngroups)
> > *group -= ngroups;
> > + ext4_lock_group(sb, *group);
> > desc = ext4_get_group_desc(sb, *group, NULL);
> > if (desc && ext4_free_inodes_count(sb, desc) &&
> > - ext4_free_group_clusters(sb, desc))
> > + ext4_free_group_clusters(sb, desc)) {
> > + ext4_unlock_group(sb, *group);
> > return 0;
> > + }
> > + ext4_unlock_group(sb, *group);
> > }
> >
> > /*
> > @@ -654,9 +671,13 @@ static int find_group_other(struct super_block *sb, struct inode *parent,
> > for (i = 0; i < ngroups; i++) {
> > if (++*group >= ngroups)
> > *group = 0;
> > + ext4_lock_group(sb, *group);
> > desc = ext4_get_group_desc(sb, *group, NULL);
> > - if (desc && ext4_free_inodes_count(sb, desc))
> > + if (desc && ext4_free_inodes_count(sb, desc)) {
> > + ext4_unlock_group(sb, *group);
> > return 0;
> > + }
> > + ext4_unlock_group(sb, *group);
> > }
> >
> > return -1;
> > --
>
>
> Cheers, Andreas
>
>
>
>
>