Re: [PATCH] KVM: SEV: Add cond_resched() to loop in sev_clflush_pages()
From: Peter Gonda
Date: Tue Apr 05 2022 - 19:18:14 EST
On Thu, Mar 31, 2022 at 3:31 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Wed, Mar 30, 2022, Mingwei Zhang wrote:
> > On Wed, Mar 30, 2022 at 9:43 AM Peter Gonda <pgonda@xxxxxxxxxx> wrote:
> > >
> > > Add resched to avoid warning from sev_clflush_pages() with large number
> > > of pages.
> > >
> > > Signed-off-by: Peter Gonda <pgonda@xxxxxxxxxx>
> > > Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
> > > Cc: kvm@xxxxxxxxxxxxxxx
> > > Cc: linux-kernel@xxxxxxxxxxxxxxx
> > >
> > > ---
> > > Here is a warning similar to what I've seen many times running large SEV
> > > VMs:
> > > [ 357.714051] CPU 15: need_resched set for > 52000222 ns (52 ticks) without schedule
> > > [ 357.721623] WARNING: CPU: 15 PID: 35848 at kernel/sched/core.c:3733 scheduler_tick+0x2f9/0x3f0
> > > [ 357.730222] Modules linked in: kvm_amd uhaul vfat fat hdi2_standard_ftl hdi2_megablocks hdi2_pmc hdi2_pmc_eeprom hdi2 stg elephant_dev_num ccp i2c_mux_ltc4306 i2c_mux i2c_via_ipmi i2c_piix4 google_bmc_usb google_bmc_gpioi2c_mb_common google_bmc_mailbox cdc_acm xhci_pci xhci_hcd sha3_generic gq nv_p2p_glue accel_class
> > > [ 357.758261] CPU: 15 PID: 35848 Comm: switchto-defaul Not tainted 4.15.0-smp-DEV #11
> > > [ 357.765912] Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 30.20.2-gce 11/05/2021
> > > [ 357.779372] RIP: 0010:scheduler_tick+0x2f9/0x3f0
> > > [ 357.783988] RSP: 0018:ffff98558d1c3dd8 EFLAGS: 00010046
> > > [ 357.789207] RAX: 741f23206aa8dc00 RBX: 0000005349236a42 RCX: 0000000000000007
> > > [ 357.796339] RDX: 0000000000000006 RSI: 0000000000000002 RDI: ffff98558d1d5a98
> > > [ 357.803463] RBP: ffff98558d1c3ea0 R08: 0000000000100ceb R09: 0000000000000000
> > > [ 357.810597] R10: ffff98558c958c00 R11: ffffffff94850740 R12: 00000000031975de
> > > [ 357.817729] R13: 0000000000000000 R14: ffff98558d1e2640 R15: ffff98525739ea40
> > > [ 357.824862] FS: 00007f87503eb700(0000) GS:ffff98558d1c0000(0000) knlGS:0000000000000000
> > > [ 357.832948] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 357.838695] CR2: 00005572fe74b080 CR3: 0000007bea706006 CR4: 0000000000360ef0
> > > [ 357.845828] Call Trace:
> > > [ 357.848277] <IRQ>
> > > [ 357.850294] [<ffffffff94411420>] ? tick_setup_sched_timer+0x130/0x130
> > > [ 357.856818] [<ffffffff943ed60d>] ? rcu_sched_clock_irq+0x6ed/0x850
> > > [ 357.863084] [<ffffffff943fdf02>] ? __run_timers+0x42/0x260
> > > [ 357.868654] [<ffffffff94411420>] ? tick_setup_sched_timer+0x130/0x130
> > > [ 357.875182] [<ffffffff943fd35b>] update_process_times+0x7b/0x90
> > > [ 357.881188] [<ffffffff944114a2>] tick_sched_timer+0x82/0xd0
> > > [ 357.886845] [<ffffffff94400671>] __run_hrtimer+0x81/0x200
> > > [ 357.892331] [<ffffffff943ff222>] hrtimer_interrupt+0x192/0x450
> > > [ 357.898252] [<ffffffff950002fa>] ? __do_softirq+0x2fa/0x33e
> > > [ 357.903911] [<ffffffff94e02edc>] smp_apic_timer_interrupt+0xac/0x1d0
> > > [ 357.910349] [<ffffffff94e01ef6>] apic_timer_interrupt+0x86/0x90
> > > [ 357.916347] </IRQ>
> > > [ 357.918452] RIP: 0010:clflush_cache_range+0x3f/0x50
> > > [ 357.923324] RSP: 0018:ffff98529af89cc0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff12
> > > [ 357.930889] RAX: 0000000000000040 RBX: 0000000000038135 RCX: ffff985233d36000
> > > [ 357.938013] RDX: ffff985233d36000 RSI: 0000000000001000 RDI: ffff985233d35000
> > > [ 357.945145] RBP: ffff98529af89cc0 R08: 0000000000000001 R09: ffffb5753fb23000
> > > [ 357.952271] R10: 000000000003fe00 R11: 0000000000000008 R12: 0000000000040000
> > > [ 357.959401] R13: ffff98525739ea40 R14: ffffb5753fb22000 R15: ffff98532a58dd80
> > > [ 357.966536] [<ffffffffc07afd41>] svm_register_enc_region+0xd1/0x170 [kvm_amd]
> > > [ 357.973758] [<ffffffff94246e8c>] kvm_arch_vm_ioctl+0x84c/0xb00
> > > [ 357.979677] [<ffffffff9455980f>] ? handle_mm_fault+0x6ff/0x1370
> > > [ 357.985683] [<ffffffff9423412b>] kvm_vm_ioctl+0x69b/0x720
> > > [ 357.991167] [<ffffffff945dfd9d>] do_vfs_ioctl+0x47d/0x680
> > > [ 357.996654] [<ffffffff945e0188>] SyS_ioctl+0x68/0x90
> > > [ 358.001706] [<ffffffff942066f1>] do_syscall_64+0x71/0x110
> > > [ 358.007192] [<ffffffff94e00081>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > >
> > > Tested by running a large 256gib SEV VM several times, saw no warnings.
> > > Without the change warnings are seen.
>
> Clean up the splat (remove timestamps, everything with a ?, etc... I believe there
> is a kernel scripts/ to do this...) and throw it in the changelog. Documenting the
> exact problem is very helpful, e.g. future readers may wonder "what warning?".
Paolo has queued this I think, so I'll do this next time I am fixing a
warning. Thanks Sean.
>
>
> > > ---
> > > arch/x86/kvm/svm/sev.c | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > index 75fa6dd268f0..c2fe89ecdb2d 100644
> > > --- a/arch/x86/kvm/svm/sev.c
> > > +++ b/arch/x86/kvm/svm/sev.c
> > > @@ -465,6 +465,7 @@ static void sev_clflush_pages(struct page *pages[], unsigned long npages)
> > > page_virtual = kmap_atomic(pages[i]);
> > > clflush_cache_range(page_virtual, PAGE_SIZE);
> > > kunmap_atomic(page_virtual);
> > > + cond_resched();
> >
> > If you add cond_resched() here, the frequency (once per 4K) might be
> > too high. You may want to do it once per X pages, where X could be
> > something like 1G/4K?
>
> No, every iteration is perfectly ok. The "cond"itional part means that this will
> reschedule if and only if it actually needs to be rescheduled, e.g. if the task's
> timeslice as expired. The check for a needed reschedule is cheap, using
> cond_resched() in tight-ish loops is ok and intended, e.g. KVM does a reched
> check prior to enterring the guest.