Re: [RFC PATCH] ovl: keep merged and impure readdir caches separate
From: Amir Goldstein
Date: Tue May 12 2026 - 15:23:01 EST
On Tue, May 12, 2026 at 8:28 PM Nirmoy Das <nirmoyd@xxxxxxxxxx> wrote:
>
> Hi Amir,
>
> On 11.05.26 23:54, Amir Goldstein wrote:
> > Hi Nirmoy,
> >
> > Thanks for the patch.
> >
> > On Mon, May 11, 2026 at 8:21 AM Nirmoy Das <nirmoyd@xxxxxxxxxx> wrote:
> >> Overlayfs uses one inode cache slot for two readdir cache users with
> >> different lifetime rules. Merged directory iteration pins the cache from
> >> open directory files with cache->refcount. Impure real-directory iteration
> >> uses the inode cache as an unrefcounted lookup table.
> >>
> >> Those caches cannot be reused interchangeably. If merged iteration finds
> >> an impure cache in the inode slot, it can pin and seek through a cache
> >> that was not built for merged iteration. If impure iteration finds a merged
> >> cache, it can walk an object whose lifetime is controlled by open directory
> >> files. Either direction can leave ovl_iterate() using stale cache entries.
> >>
> >> Add ovl_dir_cache_drop() to detach the inode cache before freeing it. Keep
> >> refcounted merged caches alive until ovl_cache_put(), stop publishing new
> >> merged caches through the inode slot,
> > This is unacceptable - invalidating merged dir cache to paper over
> > another root bug, which was not really fixed.
> >
> >> and let impure iteration reuse only
> >> unrefcounted caches.
> > All those guards are nice, but how does a directory inode change from
> > being merged to impure or vice versa?
> > This should never happen.
> >
> > It took a lot of arguing with Sonnet about wrong leads to find the real bug.
> >
> > Real bug was introduced by:
> > b79e05aaa1667 ("ovl: no direct iteration for dir with origin xattr")
> >
> > It changed the test from:
> >
> > od->is_real = !OVL_TYPE_MERGE(type);
> > od->is_upper = OVL_TYPE_UPPER(type);
> >
> > to:
> >
> > od->is_real = !ovl_test_flag(OVL_WHITEOUTS, d_inode(dir));
> > od->is_upper = OVL_TYPE_UPPER(type);
> >
> > But there is a race window in copy up of a directory where
> > upper is set and published before the OVL_WHITEOUTS flag
> > is set.
> >
> > an opendir() observing this state inside the race window will
> > wrongly start to iterate_real and another opendir later will
> > observe the flag and start iterate_merged - boom!
> >
> > Attached is what I think is a correct fix.
> > WDYT?
>
> This looks correct with my limited understanding of the overlay code. I
> still see the issue when running with
>
> the syzkaller-derived C reproducer in a loop inside an arm64 virtme/qemu
> VM with a KASAN/debug kernel.
>
>
> Let me try to debug it more and get back.
Thanks! I appreciate it.
I wasn't sure how reliably the reproducer works.
If you do find a fix to the root cause, do feel free to use my patch as is
or modified and post with changes and proper commit message.
b.t.w I am not against adding WARN_ON() assertions in readdir
code to make sure we do not hit a problem like this again,
but as assertions, not as code that looks like it should really
work in this situation (i.e. no dropping of wrong cache just error).
FYI, I had contemplated an alternative fix, but went with the
smallest one:
implement ovl_i_path_type(struct inode *inode)
and make ovl_path_type(struct dentry *dentry) a wrapper
static inline bool ovl_dir_is_real(struct inode *dir)
{
return !ovl_test_flag(OVL_WHITEOUTS, dir) &&
!OVL_TYPE_MERGE(ovl_i_path_type(dir));
}
This fix is a bit nicer to understand and the OVL_WHITEOUTS
flags becomes an optimization for OVL_TYPE_MERGE()
in addition to handling the remnant whiteouts case.
If this happens to survive the reproducer for some reason,
do feel free to post it instead of my patch.
Thanks,
Amir.