Re: [RFC PATCH v2 1/7] mm: pgtable: make pte_offset_map_nolock() return pmdval

From: Qi Zheng
Date: Fri Aug 16 2024 - 05:22:16 EST


Hi David,

On 2024/8/16 16:59, David Hildenbrand wrote:
On 12.08.24 08:21, Qi Zheng wrote:
Hi David,

On 2024/8/10 00:54, David Hildenbrand wrote:
On 07.08.24 05:08, Qi Zheng wrote:
Hi David,

On 2024/8/6 22:16, David Hildenbrand wrote:
On 06.08.24 04:40, Qi Zheng wrote:
Hi David,

On 2024/8/5 22:43, David Hildenbrand wrote:
On 05.08.24 14:55, Qi Zheng wrote:
Make pte_offset_map_nolock() return pmdval so that we can recheck the
*pmd once the lock is taken. This is a preparation for freeing empty
PTE pages, no functional changes are expected.

Skimming the patches, only patch #4 updates one of the callsites
(collapse_pte_mapped_thp).

In addition, retract_page_tables() and reclaim_pgtables_pmd_entry()
also used the pmdval returned by pte_offset_map_nolock().

Right, and I am questioning if only touching these two is sufficient,
and how we can make it clearer when someone actually has to recheck the
PMD.



Wouldn't we have to recheck if the PMD val changed in more cases after
taking the PTL?

If not, would it make sense to have a separate function that
returns the
pmdval and we won't have to update each and every callsite?

pte_offset_map_nolock() had already obtained the pmdval previously,
just
hadn't returned it. And updating those callsite is simple, so I think
there may not be a need to add a separate function.

Let me ask this way: why is retract_page_tables() and
reclaim_pgtables_pmd_entry() different to the other ones, and how would
someone using pte_offset_map_nolock() know what's to do here?

If we acuqire the PTL (PTE or PMD lock) after calling
pte_offset_map_nolock(), it means we may be modifying the corresponding
pte or pmd entry. In that case, we need to perform a pmd_same() check
after holding the PTL, just like in pte_offset_map_lock(), to prevent
the possibility of the PTE page being reclaimed at that time.

Okay, what I thought.


If we call pte_offset_map_nolock() and do not need to acquire the PTL
afterwards, it means we are only reading the PTE page. In this case, the
rcu_read_lock() in pte_offset_map_nolock() will ensure that the PTE page
cannot be reclaimed.


IIUC, we must check the PMDVAL after taking the PTL in case

(a) we want to modify the page table to turn pte_none() entries to
       !pte_none(). Because it could be that the page table was
removed and
       now is all pte_none()

(b) we want to remove the page table ourselves and want to check if it
       has already been removed.

Is that it?

Yes.


So my thinking is if another function variant can make that clearer.

OK, how about naming it pte_offset_map_before_lock?

That's the issue with some of the code: for example in
filemap_fault_recheck_pte_none() we'll call pte_offset_map_nolock() and
conditionally take the PTL. But we won't be modifying the pages tables.

Maybe something like:

pte_offset_map_readonly_nolock()

and

pte_offset_map_maywrite_nolock()

The latter would require you to pass the PMD pointer such that you have
to really mess up to ignore what to do with it (check PMD same or not
check PMD same if you really know what you are douing).

The first would not take a PMD pointer at all, because there is no need to.

These two function names LGTM. Will do in the next version.

That is probably something you can send as a separate patch independent of this full series.

I think it makes sense. I am analyzing whether the existing path of
calling pte_offset_map_nolock + spin_lock will be concurrent with
the path of reclaiming PTE pages in THP. If so, it actually needs to
be fixed first.


Then we might also get more review+thoughts from other folks!

I hope so!

Thanks,
Qi