Re: [PATCH v3] ovl: use a dedicated semaphore for dir upperfile caching

From: Amir Goldstein
Date: Sat Jan 16 2021 - 12:11:40 EST


On Tue, Jan 5, 2021 at 8:47 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Tue, Jan 5, 2021 at 2:36 AM Icenowy Zheng <icenowy@xxxxxxx> wrote:
> >
> > The function ovl_dir_real_file() currently uses the semaphore of the
> > inode to synchronize write to the upperfile cache field.
>
> Although the inode lock is an rw_sem, it is referred to as the "inode lock",
> and you also left "semaphore" in the commit subject.
> No need to re-post. This can be fixed on commit.
>
> >
> > However, this function will get called by ovl_ioctl_set_flags(), which
> > utilizes the inode semaphore too. In this case ovl_dir_real_file() will
> > try to claim a lock that is owned by a function in its call stack, which
> > won't get released before ovl_dir_real_file() returns.
> >
> > Define a dedicated semaphore for the upperfile cache, so that the
> > deadlock won't happen.
> >
> > Fixes: 61536bed2149 ("ovl: support [S|G]ETFLAGS and FS[S|G]ETXATTR ioctls for directories")
> > Cc: stable@xxxxxxxxxxxxxxx # v5.10
> > Signed-off-by: Icenowy Zheng <icenowy@xxxxxxx>
> > ---
> > Changes in v2:
> > - Fixed missing replacement in error handling path.
> > Changes in v3:
> > - Use mutex instead of semaphore.
> >
> > fs/overlayfs/readdir.c | 10 +++++-----
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> > index 01620ebae1bd..3980f9982f34 100644
> > --- a/fs/overlayfs/readdir.c
> > +++ b/fs/overlayfs/readdir.c
> > @@ -56,6 +56,7 @@ struct ovl_dir_file {
> >         struct list_head *cursor;
> >         struct file *realfile;
> >         struct file *upperfile;
> > +       struct mutex upperfile_mutex;
>
> That's a very specific name.
> This mutex protects members of struct ovl_dir_file, which could evolve
> into struct ovl_file one day (there is no reason to cache only the dir
> upper file), so I would go with a more generic name, but let's leave it
> to Miklos to decide.
>
> He could have a different idea altogether for fixing this bug.
>

Miklos,

Please fast-track this or an alternative fix.
It fixes an easy-to-reproduce deadlock introduced in 5.10.
Icenowy Zheng has written a simple xfstest reproducer, but it has not been
posted. Best to avoid hanging testers' machines until a fix is merged...
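
For anyone who wants to see the locking pattern without the reproducer,
here is a rough userspace sketch. It is not the kernel code and not the
xfstest: it assumes POSIX threads and uses an error-checking mutex so that
the second lock attempt returns EDEADLK instead of hanging. All demo_*
names are made up for illustration; only upperfile_mutex comes from the
patch.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct ovl_dir_file plus the inode lock. */
struct demo_dir_file {
	pthread_mutex_t inode_lock;      /* stands in for the inode lock */
	pthread_mutex_t upperfile_mutex; /* dedicated lock, as in the patch */
	int upperfile_cached;            /* stands in for the cached upperfile */
};

/* Initialise both locks as error-checking mutexes; returns 0 on success. */
static int demo_init(struct demo_dir_file *od)
{
	pthread_mutexattr_t attr;
	int err = pthread_mutexattr_init(&attr);

	if (err)
		return err;
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!err)
		err = pthread_mutex_init(&od->inode_lock, &attr);
	if (!err)
		err = pthread_mutex_init(&od->upperfile_mutex, &attr);
	pthread_mutexattr_destroy(&attr);
	od->upperfile_cached = 0;
	return err;
}

/* Mimics the caching helper: fills the cache under the given lock. */
static int demo_real_file(struct demo_dir_file *od, pthread_mutex_t *lock)
{
	int err = pthread_mutex_lock(lock);

	if (err)
		return err; /* EDEADLK if this thread already holds the lock */
	od->upperfile_cached = 1;
	pthread_mutex_unlock(lock);
	return 0;
}

/* Mimics the ioctl path: holds the inode lock while calling the helper. */
static int demo_set_flags(struct demo_dir_file *od, int use_dedicated_lock)
{
	int err = pthread_mutex_lock(&od->inode_lock);

	if (err)
		return err;
	err = demo_real_file(od, use_dedicated_lock ? &od->upperfile_mutex
						    : &od->inode_lock);
	pthread_mutex_unlock(&od->inode_lock);
	return err;
}
```

After demo_init(), calling demo_set_flags(&od, 0) re-takes the lock the
caller already holds, which is the case that hangs in the kernel, while
demo_set_flags(&od, 1) uses the dedicated lock and completes.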

Thanks,
Amir.