Re: [PATCH v1 2/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking

From: Baolin Wang
Date: Fri Jul 26 2024 - 05:39:08 EST

Next message: kernel test robot: "Re: [PATCH 2/2] ASoC: constify snd_soc_component_driver struct"
Previous message: Ron Economos: "Re: [PATCH 5.15 00/90] 5.15.164-rc2 review"
In reply to: David Hildenbrand: "Re: [PATCH v1 2/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking"
Next in thread: David Hildenbrand: "Re: [PATCH v1 2/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2024/7/26 16:04, David Hildenbrand wrote:

On 26.07.24 04:33, Baolin Wang wrote:

On 2024/7/26 02:39, David Hildenbrand wrote:

We recently made GUP's common page table walking code to also walk
hugetlb VMAs without most hugetlb special-casing, preparing for the
future of having less hugetlb-specific page table walking code in the
codebase. Turns out that we missed one page table locking detail: page
table locking for hugetlb folios that are not mapped using a single
PMD/PUD.

Assume we have hugetlb folio that spans multiple PTEs (e.g., 64 KiB
hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the
page tables, will perform a pte_offset_map_lock() to grab the PTE table
lock.

However, hugetlb that concurrently modifies these page tables would
actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the
locks would differ. Something similar can happen right now with hugetlb
folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS.

Let's make huge_pte_lockptr() effectively uses the same PT locks as any
core-mm page table walker would.

Thanks for raising the issue again. I remember fixing this issue 2 years
ago in commit fac35ba763ed ("mm/hugetlb: fix races when looking up a
CONT-PTE/PMD size hugetlb page"), but it seems to be broken again.

Ah, right! We fixed it by rerouting to hugetlb code that we then removed :D

Did we have a reproducer back then that would make my live easier?

I don't have any reproducers right now. I remember I added some ugly hack code (adding delay() etc.) in kernel to analyze this issue, and not easy to reproduce. :(

Next message: kernel test robot: "Re: [PATCH 2/2] ASoC: constify snd_soc_component_driver struct"
Previous message: Ron Economos: "Re: [PATCH 5.15 00/90] 5.15.164-rc2 review"
In reply to: David Hildenbrand: "Re: [PATCH v1 2/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking"
Next in thread: David Hildenbrand: "Re: [PATCH v1 2/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]