On Sun, Aug 08, 2021 at 11:13:28PM +0800, Baolin Wang wrote:
On 2021/8/8 18:26, Matthew Wilcox wrote:
On Sun, Aug 08, 2021 at 10:55:30AM +0800, Baolin Wang wrote:
Hi,
On Fri, Aug 06, 2021 at 11:07:18AM +0800, Baolin Wang wrote:
Hi Matthew,
On Thu, Aug 05, 2021 at 11:05:56PM +0800, Baolin Wang wrote:
We already get the expected count for an anonymous or file page from
expected_page_refs() at the beginning of migrate_page_move_mapping(),
so we should move the page count validation a little earlier to
reduce duplicated code.
Please add an explanation to the changelog for why it's safe to pull
this out from under the i_pages lock.
Sure. In folio_migrate_mapping(), we are sure that the page being migrated
has been isolated from the LRU list and locked, so I think there is no race
in reading the page count without the i_pages lock. Please correct me if I
missed something else. Thanks.
Unless the page has been removed from i_pages, this isn't a correct
explanation. Even if it has been removed from i_pages, unless an
RCU grace period has passed, another CPU may still be able to inc the
refcount on it (temporarily). The same is true for the page tables,
by the way; if someone is using get_user_pages_fast(), they may still
be able to see the page.
I don't think this is an issue, because we've now established a migration PTE
for this page under the page lock. If a user wants to get the page with
get_user_pages_fast(), it will wait for the page migration to finish in
migration_entry_wait(). So I still think there is no need to check the
page count under the i_pages lock.
I don't know whether the patch is correct or not, but you aren't nearly
paranoid enough. Consider this sequence of events:
Thanks for describing this scenario.
CPU 0:                                          CPU 1:
get_user_pages_fast()
lockless_pages_from_mm()
local_irq_save()
gup_pgd_range()
gup_p4d_range()
gup_pud_range()
gup_pmd_range()
gup_pte_range()
pte_t pte = ptep_get_lockless(ptep);
                                                migrate_vma_collect_pmd()
                                                ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl)
                                                ptep_get_and_clear(mm, addr, ptep);
page = pte_page(pte);
                                                set_pte_at(mm, addr, ptep, swp_pte);
                                                migrate_page_move_mapping()
head = try_grab_compound_head(page, 1, flags);
On CPU 0, after grabbing the page reference, it validates the PTE again. If a
swap PTE has been established for this page, it drops the reference and goes
to the slow path.
if (unlikely(pte_val(pte) != pte_val(*ptep))) {
	put_compound_head(head, 1, flags);
	goto pte_unmap;
}
So CPU 1 cannot observe an abnormally high refcount in this case, unless I
have missed something.
This is a race between CPUs. There is no synchronisation between them,
so CPU 1 can absolutely see the refcount higher temporarily. Yes,
CPU 0 will eventually put the refcount, but CPU 1 can observe it high.