Re: [PATCH 01/13] mm: Update ptep_get_lockless()s comment

From: Nadav Amit
Date: Thu Oct 27 2022 - 16:15:39 EST


On Oct 27, 2022, at 11:13 AM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> Anybody willing to try to write up the rules (and have each rule
> document *why* it's a rule - not just "by fiat", but an actual "these
> are the rules and this is *why* they are the rules").
>
> Because right now I think all of our rules are almost entirely just
> encoded in the code, with a couple of comments, and a few people who
> just remember why we do what we do.

I think it might be easier to come up with new rules instead of phrasing the
existing ones.

The approach I suggested before [1] is something like:

1. Turn x86’s TLB-generation mechanism to be generic. Turn the
TLB-generation into “pending TLB-generation”.

2. For each mm track “completed TLB-generation”, whenever an actual flush
takes place.

3. When you defer a TLB-flush, while holding the PTL:
a. Increase the TLB-generation.
b. Save the updated “table generation" in a new field in the
page-table’s page-struct.

4. When you are about to rely on a PTE value that is read from a page-table,
first check if a TLB flush is needed. The check is performed by comparing
the “table generation” with the “completed generation”. If the “table
generation” is behind, a TLB flush is needed.

[ You rely on the PTE value when you install new PTEs or change them ]

That’s about it. I might have not covered some issues with fast-GUP. But in
general I think it is a simple scheme. The thing I like about this scheme
the most is that it avoids relying on almost all the OS data-structures
(e.g., PageAnon()), making it much easier to grasp.

I can revive the patch-set if the overall approach is agreeable.


[1] https://lore.kernel.org/lkml/20210131001132.3368247-1-namit@xxxxxxxxxx/