[PATCH] virtio_balloon: fix another race between migration and ballooning
From: Jiang Biao
Date: Tue Jul 17 2018 - 22:31:06 EST
The kernel panics under high memory pressure. The call trace looks like this:
PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java"
#0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb
#1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942
#2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30
#3 [ffff881ec7ed7778] oops_end at ffffffff816902c8
#4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46
#5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc
#6 [ffff881ec7ed7838] __node_set at ffffffff81680300
#7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f
#8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5
#9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8
[exception RIP: _raw_spin_lock_irqsave+47]
RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046
RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8
RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008
RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098
R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff881ec7ed79b0] balloon_page_putback at ffffffff811fbfb9
#11 [ffff881ec7ed79e0] putback_movable_pages at ffffffff811e3155
#12 [ffff881ec7ed7a10] compact_zone at ffffffff811a843f
#13 [ffff881ec7ed7a60] compact_zone_order at ffffffff811a85ac
#14 [ffff881ec7ed7b00] try_to_compact_pages at ffffffff811a8961
#15 [ffff881ec7ed7b60] __alloc_pages_direct_compact at ffffffff816827d6
#16 [ffff881ec7ed7bc0] __alloc_pages_slowpath at ffffffff81682f64
#17 [ffff881ec7ed7cb0] __alloc_pages_nodemask at ffffffff8118b775
#18 [ffff881ec7ed7d60] alloc_pages_vma at ffffffff811d2a6a
#19 [ffff881ec7ed7dc8] do_huge_pmd_anonymous_page at ffffffff811ebf93
#20 [ffff881ec7ed7e28] handle_mm_fault at ffffffff811b1c1f
#21 [ffff881ec7ed7ec0] __do_page_fault at ffffffff81692f84
#22 [ffff881ec7ed7f20] do_page_fault at ffffffff816932b5
#23 [ffff881ec7ed7f50] page_fault at ffffffff8168f4c8
The panic happens in the page fault path: a second page fault is taken
during page compaction, which runs after a memory allocation fails.
Analysis of the vmcore shows that the page that triggers the second page
fault is corrupted, with _mapcount=-256 but private=0.
This is caused by a race between migration and ballooning:
virtballoon_migratepage() in the virtio_balloon driver calls
balloon_page_delete() without holding pages_lock.
This patch fixes the bug by taking pages_lock around that call.
Signed-off-by: Jiang Biao <jiang.biao2@xxxxxxxxxx>
Signed-off-by: Huang Chong <huang.chong@xxxxxxxxxx>
---
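An illustrative sketch, not part of the patch: the locking rule restored
here can be modelled in a few lines of user-space C. Everything in it is
an assumption made for illustration. mock_lock, mock_dev, mock_page,
mock_page_delete() and run_model() are hypothetical names, the lock only
records whether it is held, and the page is reduced to two fields. The
model is single-threaded, so it checks lock discipline and does not
reproduce the race itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a spinlock: it only records whether it is held. */
struct mock_lock {
    bool held;
};

static void mock_lock_acquire(struct mock_lock *lock)
{
    assert(!lock->held);    /* this model has no recursive locking */
    lock->held = true;
}

static void mock_lock_release(struct mock_lock *lock)
{
    assert(lock->held);
    lock->held = false;
}

/* Hypothetical device: a lock plus a counter of unprotected teardowns. */
struct mock_dev {
    struct mock_lock pages_lock;
    int unlocked_updates;
};

/* Hypothetical, heavily simplified page. */
struct mock_page {
    bool is_balloon;        /* models the balloon marker on the page */
    struct mock_dev *priv;  /* models the page's private back-pointer */
};

/*
 * Two-step teardown. Between the two stores the page is half torn down,
 * so the caller is expected to hold pages_lock; a call made without the
 * lock is counted as a violation.
 */
static void mock_page_delete(struct mock_dev *dev, struct mock_page *page)
{
    if (!dev->pages_lock.held)
        dev->unlocked_updates++;
    page->is_balloon = false;
    page->priv = NULL;
}

/*
 * Runs one teardown, with the lock (patched shape) or without it
 * (pre-patch shape), and returns the number of unprotected teardowns.
 */
int run_model(bool take_lock)
{
    struct mock_dev dev = { .pages_lock = { .held = false },
                            .unlocked_updates = 0 };
    struct mock_page page = { .is_balloon = true, .priv = &dev };

    if (take_lock)
        mock_lock_acquire(&dev.pages_lock);
    mock_page_delete(&dev, &page);
    if (take_lock)
        mock_lock_release(&dev.pages_lock);

    /* Either way the teardown itself completes. */
    assert(!page.is_balloon && page.priv == NULL);
    return dev.unlocked_updates;
}
```

In this model run_model(true) follows the patched shape and reports no
unprotected teardown, while run_model(false) follows the pre-patch shape
and reports one.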
drivers/virtio/virtio_balloon.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 6b237e3..3988c09 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -513,7 +513,9 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
 	tell_host(vb, vb->inflate_vq);
 
 	/* balloon's page migration 2nd step -- deflate "page" */
+	spin_lock_irqsave(&vb_dev_info->pages_lock, flags);
 	balloon_page_delete(page);
+	spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags);
 	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
 	set_page_pfns(vb, vb->pfns, page);
 	tell_host(vb, vb->deflate_vq);
--
2.7.4