Re: [PATCH] virtio_balloon: fix another race between migration and ballooning

From: jiang.biao2
Date: Tue Jul 24 2018 - 20:40:52 EST


Ping.....
> Kernel panic under high memory pressure; the call trace looks like:
>
> PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java"
> #0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb
> #1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942
> #2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30
> #3 [ffff881ec7ed7778] oops_end at ffffffff816902c8
> #4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46
> #5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc
> #6 [ffff881ec7ed7838] __node_set at ffffffff81680300
> #7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f
> #8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5
> #9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8
> [exception RIP: _raw_spin_lock_irqsave+47]
> RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046
> RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8
> RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008
> RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098
> R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #10 [ffff881ec7ed79b0] balloon_page_putback at ffffffff811fbfb9
> #11 [ffff881ec7ed79e0] putback_movable_pages at ffffffff811e3155
> #12 [ffff881ec7ed7a10] compact_zone at ffffffff811a843f
> #13 [ffff881ec7ed7a60] compact_zone_order at ffffffff811a85ac
> #14 [ffff881ec7ed7b00] try_to_compact_pages at ffffffff811a8961
> #15 [ffff881ec7ed7b60] __alloc_pages_direct_compact at ffffffff816827d6
> #16 [ffff881ec7ed7bc0] __alloc_pages_slowpath at ffffffff81682f64
> #17 [ffff881ec7ed7cb0] __alloc_pages_nodemask at ffffffff8118b775
> #18 [ffff881ec7ed7d60] alloc_pages_vma at ffffffff811d2a6a
> #19 [ffff881ec7ed7dc8] do_huge_pmd_anonymous_page at ffffffff811ebf93
> #20 [ffff881ec7ed7e28] handle_mm_fault at ffffffff811b1c1f
> #21 [ffff881ec7ed7ec0] __do_page_fault at ffffffff81692f84
> #22 [ffff881ec7ed7f20] do_page_fault at ffffffff816932b5
> #23 [ffff881ec7ed7f50] page_fault at ffffffff8168f4c8
>
> It starts in the page fault handler and results in a double page fault
> during page compaction, which is entered when the memory allocation fails.
>
> Analysis of the vmcore shows that the page that triggers the second page
> fault is corrupted: its _mapcount is -256, but its private field is 0.
>
> It is caused by a race between migration and ballooning:
> virtballoon_migratepage() in the virtio_balloon driver calls
> balloon_page_delete() without holding vb_dev_info->pages_lock.
> This patch fixes the bug by taking that lock around the call.
>
> Signed-off-by: Jiang Biao <jiang.biao2@xxxxxxxxxx>
> Signed-off-by: Huang Chong <huang.chong@xxxxxxxxxx>
> ---
> drivers/virtio/virtio_balloon.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 6b237e3..3988c09 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -513,7 +513,9 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
>  	tell_host(vb, vb->inflate_vq);
>  
>  	/* balloon's page migration 2nd step -- deflate "page" */
> +	spin_lock_irqsave(&vb_dev_info->pages_lock, flags);
>  	balloon_page_delete(page);
> +	spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags);
>  	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
>  	set_page_pfns(vb, vb->pfns, page);
>  	tell_host(vb, vb->deflate_vq);
> --
> 2.7.4