Re: [PATCH v10 00/12] LUF(Lazy Unmap Flush) reducing tlb numbers over 90%

From: Byungchul Park
Date: Sun May 26 2024 - 21:57:57 EST


On Fri, May 24, 2024 at 10:16:39AM -0700, Dave Hansen wrote:
> On 5/9/24 23:51, Byungchul Park wrote:
> > To achieve that:
> >
> > 1. For the folios that map only to non-writable tlb entries, prevent
> > tlb flush during unmapping but perform it just before the folios
> > actually become used, out of buddy or pcp.
>
> Is this just _pure_ unmapping (like MADV_DONTNEED), or does it apply to
> changing the memory map, like munmap() itself?

I think it can be applied to any unmapping of ro ones but LUF for now is
working only with unmapping during folio migrion and reclaim.

> > 2. When any non-writable ptes change to writable e.g. through fault
> > handler, give up luf mechanism and perform tlb flush required
> > right away.
> >
> > 3. When a writable mapping is created e.g. through mmap(), give up
> > luf mechanism and perform tlb flush required right away.
>
> Let's say you do this:
>
> fd = open("/some/file", O_RDONLY);
> ptr1 = mmap(-1, size, PROT_READ, ..., fd, ...);
> foo1 = *ptr1;
>
> You now have a read-only PTE pointing to the first page of /some/file.
> Let's say try_to_unmap() comes along and decides it can_luf_folio().
> The page gets pulled out of the page cache and freed, the PTE is zeroed.
> But the TLB is never flushed.
>
> Now, someone does:
>
> fd2 = open("/some/other/file", O_RDONLY);
> ptr2 = mmap(ptr1, size, PROT_READ, MAP_FIXED, fd, ...);
> foo2 = *ptr2;
>
> and they overwrite the old VMA. Does foo2 have the contents of the new
> "/some/other/file" or the old "/some/file"? How does the new mmap()

Good point. It should've give up LUF at the 2nd mmap() in this case.
I will fix it by introducing a new flag in task_struct indicating if LUF
has left stale maps for the task so that LUF can give up and flush right
away in mmap().

> know that there was something to flush?
>
> BTW, the same thing could happen without a new mmap(). Someone could
> modify the file in the middle, maybe even from another process.

Thank you for the pointing out. I will fix it too by introducing a new
flag in inode or something to make LUF aware if updating the file has
been tried so that LUF can give up and flush right away in the case.

Plus, I will add another give-up at code changing the permission of vma
to writable.

Thank you very much.

Byungchul

> fd = open("/some/file", O_RDONLY);
> ptr1 = mmap(-1, size, PROT_READ, ..., fd, ...);
> foo1 = *ptr1;
> // LUF happens here
> // "/some/file" changes
> foo2 = *ptr1; // Does this see the change?