Re: [RFC][PATCH] kvm: add suspend pm-notifier

From: Marc Zyngier
Date: Fri Jun 04 2021 - 06:04:02 EST


On Fri, 04 Jun 2021 10:20:28 +0100,
Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx> wrote:
>
> On (21/06/04 09:46), Marc Zyngier wrote:
> [..]
> > > +void kvm_arch_pm_notifier(struct kvm *kvm)
> > > +{
> > > +#ifdef CONFIG_PM
> > > + int c;
> > > +
> > > + mutex_lock(&kvm->lock);
> > > + for (c = 0; c < kvm->created_vcpus; c++) {
> > > + struct kvm_vcpu *vcpu = kvm->vcpus[c];
> > > + int r;
> > > +
> > > + if (!vcpu)
> > > + continue;
> >
> > Wouldn't kvm_for_each_vcpu() avoid this kind of checks?
>
> Right, that's what I do in v2, which I haven't posted yet.
>
> [..]
> > > +#include <linux/notifier.h>
> > > +
> > > #ifndef KVM_MAX_VCPU_ID
> > > #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS
> > > #endif
> > > @@ -579,6 +581,10 @@ struct kvm {
> > > pid_t userspace_pid;
> > > unsigned int max_halt_poll_ns;
> > > u32 dirty_ring_size;
> > > +
> > > +#ifdef CONFIG_PM
> > > + struct notifier_block pm_notifier;
> > > +#endif
> >
> > I'd certainly like to be able to opt out from this on architectures
> > that do not implement anything useful in the PM callbacks.
>
> Well on the other hand PM-callbacks are harmless on those archs, they
> won't overload the __weak function.

I don't care much for the callbacks. But struct kvm is bloated enough,
and I'd be happy not to have this structure embedded in it if I can
avoid it.

>
> > Please consider making this an independent config option that individual
> > archs can buy into.
>
> Sure, I can take a look into this, but how is this better than __weak
> function? (that's a real question)

Weak functions are OK AFAIC. More crud in struct kvm is what I'm not
OK with.

>
> [..]
> > > +#ifdef CONFIG_PM
> > > +static int kvm_pm_notifier_call(struct notifier_block *bl,
> > > + unsigned long state,
> > > + void *unused)
> > > +{
> > > + struct kvm *kvm = container_of(bl, struct kvm, pm_notifier);
> > > +
> > > + switch (state) {
> > > + case PM_HIBERNATION_PREPARE:
> > > + case PM_SUSPEND_PREPARE:
> > > + kvm_arch_pm_notifier(kvm);
> >
> > How about passing the state to the notifier callback? I'd expect it to
> > be useful to do something on resume too.
>
> For different states we can have different kvm_arch functions instead.
> kvm_arch_pm_notifier() can be renamed to kvm_arch_suspend_notifier(),
> so that we don't need to have `switch (state)` in every arch-code. Then
> for resume/post resume states we can have kvm_arch_resume_notifier()
> arch functions.

I'd rather we keep an arch API that is similar to the one the rest of
the kernel has, instead of a flurry of small helpers that need to grow
each time someone adds a new PM state. A switch() in the arch-specific
implementation is absolutely fine.

>
> > > + break;
> > > + }
> > > + return NOTIFY_DONE;
> > > +}
> > > +
> > > +static void kvm_init_pm_notifier(struct kvm *kvm)
> > > +{
> > > + kvm->pm_notifier.notifier_call = kvm_pm_notifier_call;
> > > + kvm->pm_notifier.priority = INT_MAX;
> >
> > How is this priority determined?
>
> Nothing magical here. I want this to be executed first, before we suspend
> ftrace, RCU and the like. Besides KVM is usually the last one to register
> its PM callbacks, so there can be something on the notifier list that
> returns NOTIFY_STOP_MASK in front of KVM PM-notifier list entry.

Which begs the question: should arch-specific code be able to veto
suspend and return an error itself? Always returning NOTIFY_DONE seems
highly optimistic.

M.

--
Without deviation from the norm, progress is not possible.