On 20.12.24 03:55, Donet Tom wrote:
On 12/19/24 18:28, David Hildenbrand wrote:
On 19.12.24 13:47, Donet Tom wrote:
The migration selftest is currently failing for shared anonymous
mappings due to a race condition.
During migration, the source folio's PTE is unmapped by nuking the
PTE, flushing the TLB, and then marking the page for migration
(by creating the swap entries). The issue arises when, immediately
after the PTE is nuked and the TLB is flushed, but before the page
is marked for migration, another thread accesses the page. This
triggers a page fault, and the page fault handler invokes
do_pte_missing() instead of do_swap_page(), as the page is not yet
marked for migration.
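The window can be pictured as a timeline (a simplified sketch; the function names are from the current unmap and fault paths, but the layout is illustrative, not verbatim kernel code):

```c
// CPU 0: migration (try_to_migrate_one)     CPU 1: another thread
// ----------------------------------------  ----------------------------
// pte = ptep_clear_flush(vma, addr, ptep);
//                                           access -> observes PTE none
//                                           handle_mm_fault()
//                                             -> do_pte_missing()
//                                                (no migration entry yet,
//                                                 so do_swap_page() is
//                                                 never reached)
// set_pte_at(mm, addr, ptep, swp_pte);      // migration entry: too late
```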
In the fault handling path, do_pte_missing() calls __do_fault()
->shmem_fault() -> shmem_get_folio_gfp() -> filemap_get_entry().
This eventually calls folio_try_get(), incrementing the reference
count of the folio undergoing migration. The faulting thread then
blocks on folio_lock(), as the migration path holds the lock. The
migration then fails in __migrate_folio(), which expects the folio's
reference count to be 2 but finds the extra reference taken by the
fault handler.
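The failing check corresponds to the expected-count test in the migration core; roughly (a simplified sketch of the logic in mm/migrate.c, not the exact code):

```c
// Simplified: for this shmem folio, expected_count is 2 (one reference
// from the page cache, one taken by the migration path itself).
if (folio_ref_count(folio) != expected_count)
        return -EAGAIN;  // the racing shmem_fault() holds a third reference
```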
The issue arises because, after nuking the PTE and before marking the
page for migration, the page is accessed. To address this, we have
updated the logic to first nuke the PTE, then mark the page for
migration, and only then flush the TLB. With this patch, if the page is
accessed immediately after nuking the PTE, the TLB entry is still
valid, so no fault occurs.
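The reordering can be sketched as follows (simplified from the ptep_clear_flush()/set_pte_at() pattern in try_to_migrate_one(); illustrative, not the actual patch):

```c
/* Current ordering (sketch): */
pteval = ptep_clear_flush(vma, address, pvmw.pte);  /* nuke + flush TLB */
/* ... */
set_pte_at(mm, address, pvmw.pte, swp_pte);         /* migration entry */

/* Proposed ordering (sketch): */
pteval = ptep_get_and_clear(mm, address, pvmw.pte); /* nuke, no flush  */
set_pte_at(mm, address, pvmw.pte, swp_pte);         /* migration entry */
flush_tlb_range(vma, address, address + PAGE_SIZE); /* flush last      */
```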
But what if the PTE is not in the TLB yet, and you get an access
from another CPU just after clearing the PTE (but before flushing the
TLB)? That other CPU will still observe PTE=none, trigger a fault, etc.
Yes, in this scenario, the migration will fail. Do you think the
migration test failure, even after a retry, should be considered a
major issue that must be fixed?
I think it is something we should definitely improve, but our page migration should handle this in a better way, not the unmap logic. I recall we discussed some ideas with Dev on how to improve that?
I'm pretty sure one can trigger a similar case using a tmpfs file and read/write in a loop instead of memory access -> page faults, so racing with page faults is completely out of the picture.
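A reproducer along those lines might look like this (a hedged sketch, not from the original thread: it assumes at least two NUMA nodes, the move_pages(2) syscall from <numaif.h>, and placeholder file names/node numbers; error handling omitted):

```c
// Sketch only. One thread keeps the folio busy via read/write on a
// tmpfs file (the page cache takes references directly, no faults);
// another thread repeatedly migrates the backing page.
int fd = open("/dev/shm/migrate-repro", O_CREAT | O_RDWR, 0600);
ftruncate(fd, 4096);
char buf[4096];

// Thread A:
for (;;) {
        pwrite(fd, buf, sizeof(buf), 0);
        pread(fd, buf, sizeof(buf), 0);
}

// Thread B:
void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
int node = 1, status;
move_pages(0, 1, &page, &node, &status, MPOL_MF_MOVE_ALL);
```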