Re: [PATCH] mm: do file ownership checks with the proper mount idmap
From: Christian Brauner
Date: Tue Jun 30 2026 - 03:07:36 EST
On 2026-06-29 19:30 +0100, Pedro Falcato wrote:
> On Mon, Jun 29, 2026 at 02:15:19PM +0200, Christian Brauner wrote:
> > On 2026-06-26 16:19:18+02:00, Jan Kara wrote:
> > > On Thu 25-06-26 16:38:53, Pedro Falcato wrote:
> > >
> > > > Ever since idmapped mounts were introduced, inode ownership checks
> > > > (for side-channel protection) in mincore() and madvise(MADV_PAGEOUT) were
> > > > done against the nop_mnt_idmap, which completely ignores the file's mount's
> > > > idmap. This results in odd edgecases like:
> > > >
> > > > 1) mount/bind-mount with an idmap userA:userB:1
> > > > 2) userB runs an owner_or_capable() check on file that is owned by userA
> > > > on-disk/in-memory, but owned by userB after idmap translation
> > > > 3) owner_or_capable() mysteriously fails as the correct idmap wasn't supplied
> > > >
> > > > In the case of mincore/madvise MADV_PAGEOUT, this is usually benign, because
> > > > file_permission(file, MAY_WRITE) will probably succeed, as it uses the proper
> > > > idmap internally, but it does not need to be the case on e.g a 0444 file
> > > > where even the owner itself doesn't have permissions to write to it.
> > > >
> > > > Since this is clearly not trivial to get right, introduce a
> > > > file_owner_or_capable() that can carry the correct semantics, and switch
> > > > the various users in mm to it.
> > > >
> > > > The issue was found by manual code inspection & an off-list discussion with
> > > > Jan Kara.
> > > >
> > > > Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP")
> > > > Cc: stable@xxxxxxxxxxxxxxx
> > > > Signed-off-by: Pedro Falcato <pfalcato@xxxxxxx>
> > >
> > > This looks good to me. I'm a bit curious why Christian initially (in 2021)
> > > used init_user_ns here instead of the file namespace... Anyway feel free to
> > > add:
> >
> > Back when this was added only the do_mincore() codepath existed and that
> > was intentionally left unconverted because it exposes the cache
> > residency status. So it was effectively a massive side-channel.
>
> Hmm. I'm not sure what you mean by this. Wouldn't it be more correct to respect
> the mount idmap (given that a mount-ns-capable user mounted it with an idmap for
> someone else, or itself) for mincore? Am I missing something? Or maybe I'm
> misunderstanding that paragraph.
TL;DR: It is unclear to me why mincore() would actually be used in
containers.