Re: [PATCH] s390/vfio-ap: do not use open locks during VFIO_GROUP_NOTIFY_SET_KVM notification
From: Halil Pasic
Date: Wed Jun 30 2021 - 18:39:53 EST
On Wed, 30 Jun 2021 10:31:22 -0400
Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:
> On 6/28/21 4:29 PM, Halil Pasic wrote:
> > On Fri, 25 Jun 2021 18:07:58 -0400
> > Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:
> >
> > What is a suitable base for this patch. I've tried the usual suspects,
> > but none of them worked.
>
> I discovered what the problem is here. The patch is based on our
> master branch along with the two pre-requisite patches that were
> recently reviewed and are currently being merged. The two patches
> of which I speak are:
> * [PATCH v6 1/2] s390/vfio-ap: clean up mdev resources when remove
> callback invoked
> Message ID: <20210621155714.1198545-2-akrowiak@xxxxxxxxxxxxx>
>
> * [PATCH v6 2/2] s390/vfio-ap: r/w lock for PQAP interception handler
> function pointer
> <20210621155714.1198545-3-akrowiak@xxxxxxxxxxxxx>
>
> I probably should have included those along with this one.
Either that, or state in the cover letter that those are prerequisites.
>
> >
> >> The fix to resolve a lockdep splat while handling the
> >> VFIO_GROUP_NOTIFY_SET_KVM event introduced a kvm_busy flag indicating that
> >> the vfio_ap device driver is busy setting or unsetting the KVM pointer.
> >> A wait queue was employed to allow functions requiring access to the KVM
> >> pointer to wait for the kvm_busy flag to be cleared. For the duration of
> >> the wait period, the mdev lock was unlocked then acquired again after the
> >> kvm_busy flag was cleared. This got rid of the lockdep report, but didn't
> >> really resolve the problem.
> > Can you please elaborate on the last point. You mean that we can have
> > circular locking even after 0cc00c8d4050, but instead of getting stuck in
> > on a lock we will get stuck on wait_event_cmd()? If that is it, please
> > state it clearly in the description, and if you can to it in the short
> > description.
>
> This patch was in response to the following review comments made by Jason
> Gunthorpe:
>
> * Message ID: <20210525162927.GC1002214@xxxxxxxxxx>
> "... the kvm_busy should be replaced by a proper rwsem,
> don't try to open code locks like that - it just defeats lockdep
> analysis".
>
> * Message ID: <20210527112433.GX1002214@xxxxxxxxxx>
> "Usually when people start open coding locks it is often
> because lockdep complained. Open coding a lock makes
> lockdep stop because the lockdep code
> is removed, but it doesn't fix anything. The kvm_busy
> should be replaced by a proper rwsem, don't try to
> open code locks like that - it just defeats lockdep
> analysis."
>
> I will paraphrase and include the information from Jason's
> comments in the description.
>
This does not answer my questions.
I'm in favor of Jason's proposal, because it is much easier to
comprehend simple rwsem protected than a mutex + wait_queue dance.
I think Jason was talking about open coding locks in general. I don't
consider it as proof of commit 0cc00c8d4050 not doing what it
advertised. You can add a Suggested-by tag if you like, but you should
be able to tell us what is the merit of your patch.
Regards,
Halil