Re: [PATCH] mm : fix pte _PAGE_DIRTY bit when fallback migrate page

From: Chris Mason
Date: Fri Jul 17 2020 - 13:41:42 EST


On 16 Jul 2020, at 6:15, Robbie Ko wrote:

Kirill A. Shutemov æ 2020/7/15 äå4:11 åé:
On Wed, Jul 15, 2020 at 10:45:39AM +0800, Robbie Ko wrote:
Kirill A. Shutemov æ 2020/7/14 äå6:19 åé:
On Tue, Jul 14, 2020 at 11:46:12AM +0200, Vlastimil Babka wrote:
On 7/13/20 3:57 AM, Robbie Ko wrote:
Vlastimil Babka æ 2020/7/10 äå11:31 åé:
On 7/9/20 4:48 AM, robbieko wrote:
From: Robbie Ko <robbieko@xxxxxxxxxxxx>

When a migrate page occurs, we first create a migration entry
to replace the original pte, and then go to fallback_migrate_page
to execute a writeout if the migratepage is not supported.

In the writeout, we will clear the dirty bit of the page and use
page_mkclean to clear the dirty bit along with the corresponding pte,
but page_mkclean does not support migration entry.
I don't follow the scenario.

When we establish migration entries with try_to_unmap(), it transfers
dirty bit from PTE to the page.
Sorry, I mean is _PAGE_RW with pte_write

When we establish migration entries with try_to_unmap(),
we create a migration entry, and if pte_write we set it to SWP_MIGRATION_WRITE,
which will replace the migration entry with the original pte.

When migratepage, we go to fallback_migrate_page to execute a writeout
if the migratepage is not supported.

In the writeout, we call clear_page_dirty_for_io to clear the dirty bit of the page
and use page_mkclean to clear pte _PAGE_RW with pte_wrprotect in page_mkclean_one.

However, page_mkclean_one does not support migration entries, so the
migration entry is still SWP_MIGRATION_WRITE.

In writeout, then we call remove_migration_ptes to remove the migration entry,
because it is still SWP_MIGRATION_WRITE so set _PAGE_RW to pte via pte_mkwrite.

Therefore, subsequent mmap wirte will not trigger page_mkwrite to cause data loss.
Hm, okay.

Folks, is there any good reason why try_to_unmap(TTU_MIGRATION) should not
clear PTE (make the PTE none) for file page?

This, I'm not sure.
But I think that for the fs that support migratepage, when migratepage is finished,
the page should still be dirty, and the pte should still have _PAGE_RW,
when the next mmap write occurs, we don't need to trigger the page_mkwrite again.

I donât know the page migration code well, but youâll need this one as well on the 4.4 kernel you mentioned:

commit 25f3c5021985e885292980d04a1423fd83c967bb
Author: Chris Mason <clm@xxxxxx>
Date: Tue Jan 21 11:51:42 2020 -0500

Btrfs: keep pages dirty when using btrfs_writepage_fixup_worker

And this one as well:

commit 7703bdd8d23e6ef057af3253958a793ec6066b28
Author: Chris Mason <clm@xxxxxx>
Date: Wed Jun 20 07:56:11 2018 -0700

Btrfs: don't clean dirty pages during buffered writes

With those two in place, we havenât found lost data from the migration code, but we did see the fallback migration helper dirtying pages without going through page_mkwrite, which triggers the suboptimal btrfs fixup worker code path. This isnât a yea or nay on the patch, just additional info.

-chris