Re: [PATCH v11 09/12] mm: implement LUF(Lazy Unmap Flush) defering tlb flush when folios get unmapped

From: Michal Hocko
Date: Tue Jun 11 2024 - 07:55:23 EST


On Tue 11-06-24 09:55:23, Byungchul Park wrote:
> On Mon, Jun 10, 2024 at 03:23:49PM +0200, Michal Hocko wrote:
> > On Tue 04-06-24 09:34:48, Byungchul Park wrote:
> > > On Mon, Jun 03, 2024 at 06:01:05PM +0100, Matthew Wilcox wrote:
> > > > On Mon, Jun 03, 2024 at 09:37:46AM -0700, Dave Hansen wrote:
> > > > > Yeah, we'd need some equivalent of a PTE marker, but for the page cache.
> > > > > Presumably some xa_value() that means a reader has to go do a
> > > > > luf_flush() before going any farther.
> > > >
> > > > I can allocate one for that. We've got something like 1000 currently
> > > > unused values which can't be mistaken for anything else.
> > > >
> > > > > That would actually have a chance at fixing two issues: One where a new
> > > > > page cache insertion is attempted. The other where someone goes to look
> > > > > in the page cache and takes some action _because_ it is empty (I think
> > > > > NFS is doing some of this for file locks).
> > > > >
> > > > > LUF is also pretty fundamentally built on the idea that files can't
> > > > > change without LUF being aware. That model seems to work decently for
> > > > > normal old filesystems on normal old local block devices. I'm worried
> > > > > about NFS, and I don't know how seriously folks take FUSE, but it
> > > > > obviously can't work well for FUSE.
> > > >
> > > > I'm more concerned with:
> > > >
> > > > - page goes back to buddy
> > > > - page is allocated to slab
> > >
> > > At this point, tlb flush needed will be performed in prep_new_page().
> >
> > But that does mean that an unaware caller would get an additional
> > overhead of the flushing, right? I think it would be just a matter of
>
> pcp for locality is already a better source of side channel attack. FYI,
> tlb flush gets barely performed only if pending tlb flush exists.

Right, but rare and hard-to-predict latencies are much worse than
consistent ones.

> > time before somebody can turn that into a side channel attack, not to
> > mention unexpected latencies introduced.
>
> Nope. The pending tlb flush performed in prep_new_page() is the one
> that would've done already with the vanilla kernel. It's not additional
> tlb flushes but it's subset of all the skipped ones.

But those skipped ones could have happened in a completely different
context (e.g. a different process or even a different security domain),
right?
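
To make the concern concrete, here is a toy userspace model, not kernel
code. Every name in it is made up for illustration, and it only assumes
what was stated above: the flush is skipped at unmap time and performed
at allocation time if one is still pending.

```c
/* Toy userspace model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

struct toy_page {
	bool luf_pending;	/* a TLB flush was skipped when this page was unmapped */
};

static struct toy_page pages[NPAGES];
static unsigned int flushes_paid[2];	/* flushes charged per context: 0 = A, 1 = B */

/* Context 'ctx' unmaps and frees a page; the flush is deferred. */
static void toy_unmap_and_free(int ctx, size_t idx)
{
	assert(idx < NPAGES);
	(void)ctx;		/* the unmapping context pays nothing here */
	pages[idx].luf_pending = true;
}

/* Stand-in for the allocation-time check: flush only if one is pending. */
static void toy_alloc(int ctx, size_t idx)
{
	assert(idx < NPAGES && (ctx == 0 || ctx == 1));
	if (pages[idx].luf_pending) {
		flushes_paid[ctx]++;	/* the allocating context pays the deferred cost */
		pages[idx].luf_pending = false;
	}
}
```

If context A calls toy_unmap_and_free(0, 1) and context B later calls
toy_alloc(1, 1), the flush is charged to B even though B had nothing to
do with the unmap. The total number of flushes does not grow, but who
pays for them and when does change.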

> It's worth noting all the existing mm reclaim mechanisms have already
> introduced worse unexpected latencies.

Right, but reclaim, especially direct reclaim, is expected to be
slow. It is quite different to see latency spikes on a system with a lot
of memory.
--
Michal Hocko
SUSE Labs