Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler

From: Halil Pasic
Date: Thu May 20 2021 - 04:38:42 EST


On Wed, 19 May 2021 20:22:02 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Wed, May 19, 2021 at 07:04:46PM -0400, Tony Krowiak wrote:
> >
> >
> > On 5/19/21 12:16 PM, Jason Gunthorpe wrote:
> > > On Wed, May 19, 2021 at 11:39:21AM -0400, Tony Krowiak wrote:
> > >
> > > > @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
> > > > if (!(vcpu->arch.sie_block->eca & ECA_AIV))
> > > > return -EOPNOTSUPP;
> > > > - apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
> > > > - mutex_lock(&matrix_dev->lock);
> > > > + rcu_read_lock();
> > > > + pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);
> > > > + if (!pqap_module_hook) {
> > > > + rcu_read_unlock();
> > > > + goto set_status;
> > > > + }
> > > > - if (!vcpu->kvm->arch.crypto.pqap_hook)
> > > > - goto out_unlock;
> > > > - matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
> > > > - struct ap_matrix_mdev, pqap_hook);
> > > > + matrix_mdev = pqap_module_hook->data;
> > > > + rcu_read_unlock();
> > > > + mutex_lock(&matrix_dev->lock);
> > > The matrix_mdev pointer was extracted from the pqap_module_hook,
> > > but now there is nothing protecting it since the rcu was dropped and
> > > it gets freed in vfio_ap_mdev_remove.
> >
> > Therein lies the rub. We can't hold the rcu_read_lock across the
> > entire time that the interception is being processed because of
> > wait conditions in the interception handler. Regardless of whether
> > the pointer to the matrix_mdev is retrieved as the container of
> > or extracted from the pqap_hook, there is nothing protecting it
> > and there appears to be no way to do so using RCU.
>
> RCU is a lock that should only be used for highly performance
> sensitive read work loads.

This is not a highly performance sensitive read workload.

> It eliminates one atomic from a lock, but
> if you go on to immediately do something like try_module_get, which
> has an atomic inside, the whole thing is pointless (assuming
> try_module_get was even the right thing to do)

I'm not sure about this argument, or that RCU should be used only for
highly performance sensitive read workloads. Can you please elaborate on
the argument above and also put your statement in perspective with
https://lwn.net/Articles/263130/?

@Christian: Since you proposed RCU for this, I guess your opinion
does not align with Jason's.

>
> Use a simple sleepable rwsem around the whole thing and forget about
> the module_get. Hold the write side when NULL'ing the pointer.
>

Yes a simple sleepable lock would work, and we wouldn't need
get_module(). Because before the vfio_ap module unloads, it
sets all vcpu->kvm->arch.crypto.pqap_hook instances to NULL. So if
we know that vcpu->kvm->arch.crypto.pqap_hook then we know
that the still have valid references to the module.

> > > And, again, module locking doesn't prevent vfio_ap_mdev_remove() from
> > > being called. None of these patches should be combining module locking
> > > with RCU.
> >
> > Is there any other way besides user interaction with the mdev's
> > sysfs remove interface for the remove callback to get invoked?
>
> There are more options after my re-organizing series.
>
> But I'm not sure why you ask in response to my point about module
> locking? Module locking is not really related to sysfs.
>
> > If I try to remove the mdev using the sysfs interface while the
> > mdev fd is still open by the guest, the remove hangs until the
> > fd is closed.
>
> Yes, it will wait when the vfio_device is being destroyed.
>
> > That being the case, the mdev release callback will get invoked
> > prior to the remove callback being invoked which renders this whole
> > debate moot. What am I missing here?
>
> AFAICT the control flow is such that release can immediately move on
> to remove on the same CPU without an additional synchronization. So
> the kfree can still race with the above.
>
> But the more I look at this the wonkier it is.. The real issue is not
> the matrix_mdev, it is the whole vcpu->kvm->arch.crypto.pqap_hook
>
> This is nonesense too:
>
> if (vcpu->kvm->arch.crypto.pqap_hook) {
> if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> return -EOPNOTSUPP;
> ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);


>
> It should have a lock around it of some kind, not a
> try_module_get. module_get is not la lock.

I tend to agree. In fact I asked for a lock around it several times:
https://www.lkml.org/lkml/2019/3/1/260
https://lkml.org/lkml/2020/12/3/987
https://lkml.org/lkml/2020/12/4/994

But in my opinion RCU is also viable (not this patch). I think, I prefer
a lock for simplicity, unless it is not (deadlocks) ;).

Many thanks for bringing your perspective to this. I'm optimistic about
getting this finally addressed properly.

Regards,
Halil