On 10/05/18 00:31, Andrew Morton wrote:
On Fri, 4 May 2018 11:11:46 +0800 Jia He <hejianet@xxxxxxxxx> wrote:
On our armv8a server (QDF2400), I noticed lots of WARN_ONs caused by
rmap_item->address not being PAGE_SIZE aligned under memory pressure
tests (starting 20 guests and running memhog in the host).
...
In rmap_walk_ksm, rmap_item->address might still have the STABLE_FLAG
set, so the start and end passed to handle_hva_to_gpa might not be
PAGE_SIZE aligned. This causes exceptions in handle_hva_to_gpa on arm64.
This patch fixes it by ignoring (not removing) the low bits of the
address when doing rmap_walk_ksm; see the sketch below.
Signed-off-by: jia.he@xxxxxxxxxxxxxxxx
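A minimal standalone sketch of the masking idea, not the exact
upstream diff: KSM keeps its tree flags in the otherwise-unused low
bits of the page-aligned rmap_item->address, so the walk can mask
them off when using the address while leaving the stored value
untouched. The flag values mirror mm/ksm.c; the address value and
4K page size are hypothetical.

#include <stdio.h>

#define PAGE_SIZE	(1UL << 12)
#define PAGE_MASK	(~(PAGE_SIZE - 1))
#define SEQNR_MASK	0x0ff	/* low bits of unstable tree seqnr */
#define UNSTABLE_FLAG	0x100	/* is a node of the unstable tree */
#define STABLE_FLAG	0x200	/* is listed from the stable tree */
#define KSM_FLAG_MASK	(SEQNR_MASK|UNSTABLE_FLAG|STABLE_FLAG)

int main(void)
{
	/* hypothetical stored value: page-aligned hva plus flag bits */
	unsigned long stored = 0x7f0000001000UL | STABLE_FLAG;

	/* ignore (do not clear) the flag bits when walking the rmap */
	unsigned long addr = stored & ~KSM_FLAG_MASK;

	printf("stored %#lx -> walked %#lx (aligned: %d)\n",
	       stored, addr, !(addr & ~PAGE_MASK));
	return 0;
}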
I assumed you wanted this patch to be committed as
From: jia.he@xxxxxxxxxxxxxxxx rather than From: hejianet@xxxxxxxxx, so I
made that change. Please let me know if this was inappropriate.
You can do this yourself by adding an explicit From: line to the very
start of the patch's email text.
Also, a storm of WARN_ONs is pretty poor behaviour. Is that the only
misbehaviour which this bug causes? Do you think the fix should be
backported into earlier kernels?
I think it's not just the WARN_ON(). We do more than what is probably
intended with an unaligned address, i.e., we could be modifying the
flags for other pages that were not affected.
e.g.:
In the original report [0], the trace looked like :
[  800.511498] [<ffff0000080b4f2c>] kvm_age_hva_handler+0xcc/0xd4
[  800.517324] [<ffff0000080b4838>] handle_hva_to_gpa+0xec/0x15c
[  800.523063] [<ffff0000080b6c5c>] kvm_age_hva+0x5c/0xcc
[  800.528194] [<ffff0000080a7c3c>] kvm_mmu_notifier_clear_flush_young+0x54/0x90
[  800.535324] [<ffff00000827a0e8>] __mmu_notifier_clear_flush_young+0x6c/0xa8
[  800.542279] [<ffff00000825a644>] page_referenced_one+0x1e0/0x1fc
[  800.548279] [<ffff00000827e8f8>] rmap_walk_ksm+0x124/0x1a0
[  800.553759] [<ffff00000825c974>] rmap_walk+0x94/0x98
[  800.558717] [<ffff00000825ca98>] page_referenced+0x120/0x180
[  800.564369] [<ffff000008228c58>] shrink_active_list+0x218/0x4a4
[  800.570281] [<ffff000008229470>] shrink_node_memcg+0x58c/0x6fc
[  800.576107] [<ffff0000082296c4>] shrink_node+0xe4/0x328
[  800.581325] [<ffff000008229c9c>] do_try_to_free_pages+0xe4/0x3b8
[  800.587324] [<ffff00000822a094>] try_to_free_pages+0x124/0x234
[  800.593150] [<ffff000008216aa0>] __alloc_pages_nodemask+0x564/0xf7c
[  800.599412] [<ffff000008292814>] khugepaged_alloc_page+0x38/0xb8
[  800.605411] [<ffff0000082933bc>] collapse_huge_page+0x74/0xd70
[  800.611238] [<ffff00000829470c>] khugepaged_scan_mm_slot+0x654/0xa98
[  800.617585] [<ffff000008294e0c>] khugepaged+0x2bc/0x49c
[  800.622803] [<ffff0000080ffb70>] kthread+0x124/0x150
[  800.627762] [<ffff0000080849f0>] ret_from_fork+0x10/0x1c
[  800.633066] ---[ end trace 944c130b5252fb01 ]---
Now, KSM wants to mark *a page* as referenced via page_referenced_one(),
passing it an unaligned address. This could eventually turn into
one of:
ptep_clear_flush_young_notify(address, address + PAGE_SIZE)
or
pmdp_clear_flush_young_notify(address, address + PMD_SIZE)
which now spans two pages/PMDs, and the notifier consumer might
take an action on the second page as well, which is not intended.
So, I do think the old behavior is wrong and has other side effects,
as mentioned above.
[0] https://lkml.kernel.org/r/1525244911-5519-1-git-send-email-hejianet@xxxxxxxxx
Suzuki