Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

From: Linus Torvalds
Date: Wed Jan 16 2019 - 00:50:10 EST


On Wed, Jan 16, 2019 at 4:54 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Jan 16, 2019 at 11:45 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > I'm assuming that you can invalidate the page cache reliably by a
> > means that does not repeated require probing to detect invalidation
> > has occurred. I've mentioned one method in this discussion
> > already...
>
> Yes. And it was made clear to you that it was a bug in xfs dio and
> what the right thing to do was.

Side note: I actually think we *do* the right thing. Even for xfs. I
couldn't find the alleged place that invalidates the page cache on dio
reads.

The *generic* dio code only does it for writes (which is correct and
fine). And maybe xfs has some extra invalidation, but I don't see it.

So I actually hope your "you can use direct-io read to do directed
invalidating of the page cache" isn't true. I admittedly did *not* try
to delve very deeply into it, but the invalidates I found looked
correct. The generic code does it for writes, and at least ext4 does
the "writeback and wait" for reads.

There *does* seem to be a 'invalidate_inode_pages2_range()' call in
iomap_dio_rw(). That has a *comment* that says it only is for writes,
but it looks to me like it would trigger for reads too.

Just a plain bug/oversight? Or me misreading things.

So yes, maybe xfs does that "invalidate on read", but it really seems
to be just a bug. If the xfs people insist on keeping the bug, fine
(looks like gfs2 and xfs are the only users), but it seems kind of
sad.

Linus