Re: [PATCH v5] locking/rwsem: Make handoff bit handling more consistent

From: Doug Anderson
Date: Tue Aug 30 2022 - 12:18:42 EST


Hi,

On Fri, Aug 5, 2022 at 12:16 PM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> On Fri, Aug 5, 2022 at 12:02 PM Waiman Long <longman@xxxxxxxxxx> wrote:
> >
> >
> > On 8/5/22 13:14, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Fri, Jul 22, 2022 at 5:17 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
> > >> On Fri, 22 Jul 2022 07:02:42 -0700 Doug Anderson wrote:
> > >>> Thanks! I added this diff to your previous diff and my simple test
> > >>> still passes and I don't see your WARN_ON triggered.
> > >> Thanks!
> > >>> How do we move forward? Are you going to officially submit a patch
> > >>> with both of your diffs squashed together? Are we waiting for
> > >>> additional review from someone?
> > >> Given it is not unusual for us to miss anything important, lets take
> > >> a RWSEM_WAIT_TIMEOUT nap now or two.
> > > It appears that another fix has landed in the meantime. Commit
> > > 6eebd5fb2083 ("locking/rwsem: Allow slowpath writer to ignore handoff
> > > bit if not set by first waiter").
> > >
> > > ...unfortunately with that patch my test case still hangs. :(
> >
> > The aim of commit 6eebd5fb2083 ("locking/rwsem: Allow slowpath writer to
> > ignore handoff bit if not set by first waiter") is to restore slowpath
> > writer behavior to be the same as before commit d257cc8cb8d5
> > ("locking/rwsem: Make handoff bit handling more consistent").
>
> Ah, OK. I just saw another fix to the same commit and assumed that
> perhaps it was intended to address the same issue.
>
>
> > If the hang still exists, there may be other cause for it. Could you
> > share more information about what the test case is doing and any kernel
> > splat that you have?
>
> It's all described in my earlier reply including my full test case:
>
> https://lore.kernel.org/r/CAD=FV=URCo5xv3k3jWbxV1uRkUU5k6bcnuB1puZhxayEyVc6-A@xxxxxxxxxxxxxx
>
> Previously I tested Hillf's patches and they fixed it for me.

Hillf: do you have any plan here for your patches?

I spent some time re-testing this today atop mainline, specifically
atop commit dcf8e5633e2e ("tracing: Define the is_signed_type() macro
once"). Some notes:

1. I can confirm that my test case still reproduces a hang on
mainline, though it seems a bit harder to reproduce (sometimes I have
to run for a few minutes). I didn't spend lots of time confirming that
the hang is exactly the same, but the same testcase reproduces it so
it probably is. If it's important I can drop into kgdb and dig around
to confirm.

2. Blindly applying the first (and resolving the trivial merge
conflict) or both of your proposed patches no longer fixes the hang on
mainline.

3. Reverting Waiman's commit 6eebd5fb2083 ("locking/rwsem: Allow
slowpath writer to ignore handoff bit if not set by first waiter") and
then applying your two fixes _does_ still fix the hang on mainline. I
ran for 20 minutes w/ no reproduction.

So it seems like Waiman's recent commit interacts with your fix in a bad way. :(

-Doug