Re: [RFC][RT][PATCH 3/4] rtmutex: Revert Optimize rt lock wakeup
From: Steven Rostedt
Date: Mon Jan 03 2011 - 14:06:44 EST
On Tue, 2010-12-28 at 07:06 -0700, Gregory Haskins wrote:
> >>> On 12/23/2010 at 11:54 PM, in message
> > Sure, but as I said, it is mostly broken anyway. I could even insert
> > some tracepoints to show that this is always missed (heck I'll add an
> > unlikely and do the branch profiler ;-)
>
> Well, I think that would be a good datapoint and is one of the things I'd like to see.
OK, here it is, after running a simple "dbench 10":
 correct incorrect  %        Function                  File              Line
 ------- ---------  -        --------                  ----              ----
     150    726979 99        wakeup_next_waiter        rtmutex.c         581
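(For context: the correct/incorrect counts come from the kernel's
annotated branch profiler, CONFIG_PROFILE_ANNOTATED_BRANCHES, which
counts per call site whether each likely()/unlikely() prediction held.
Roughly, the idea is something like the sketch below; this is a
simplified model, not the actual ftrace implementation, and
struct branch_stat / profiled_unlikely() are made-up names.)

struct branch_stat {
	unsigned long correct;		/* prediction matched the outcome */
	unsigned long incorrect;	/* prediction was wrong */
};

/* unlikely(cond) predicts "false", so a true condition counts as a miss */
#define profiled_unlikely(cond, stat)			\
({							\
	int __r = !!(cond);				\
	if (__r)					\
		(stat)->incorrect++;			\
	else						\
		(stat)->correct++;			\
	__r;						\
})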
And the patch I added:
diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index 318d7ed..dacdcb6 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -578,7 +578,7 @@ static void wakeup_next_waiter(struct rt_mutex *lock, int savestate)
 	smp_mb();
 	/* If !RUNNING && !RUNNING_MUTEX */
-	if (pendowner->state & ~TASK_RUNNING_MUTEX)
+	if (unlikely(pendowner->state & ~TASK_RUNNING_MUTEX))
 		wake_up_process_mutex(pendowner);
 }
Interesting that we hit 150 cases where the new owner was already
awake. Perhaps the new owner was about to spin on a spinlock after it
had put itself into the TASK_INTERRUPTIBLE state, and was woken up by
someone else?
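(As a reminder of what that branch is testing: TASK_RUNNING is 0, so
the check is nonzero exactly when the task is in some real blocked
state, rather than RUNNING or the -rt-only RUNNING_MUTEX state. A
sketch with illustrative values; TASK_RUNNING_MUTEX's actual value in
the -rt tree may differ, and needs_mutex_wakeup() is a made-up helper:)

#define TASK_RUNNING		0	/* mainline value */
#define TASK_INTERRUPTIBLE	1	/* mainline value */
#define TASK_UNINTERRUPTIBLE	2	/* mainline value */
#define TASK_RUNNING_MUTEX	16	/* -rt only; value illustrative */

/* Zero for TASK_RUNNING (0) and for a pure TASK_RUNNING_MUTEX;
 * nonzero (wakeup needed) for any genuinely blocked state. */
static long needs_mutex_wakeup(long state)
{
	return state & ~TASK_RUNNING_MUTEX;
}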
>
> >
> > The reason is that adaptive spinners spin in some other state than
> > TASK_RUNNING, thus it does not help adaptive spinners at all. I first
> > tried to fix that, but it made dbench run even slower.
>
> This is why I am skeptical. You are essentially asserting there are two issues here, IIUC:
>
> 1) The intent of avoiding a wakeup is broken and we take the double whammy of a mb()
> plus the wakeup() anyway.
Yep, this is what I believe is happening.
>
> 2) mb() is apparently slower than wakeup().
Forget this point, as #1 seems to be the main issue. Also, I'm not sure
it is safe to "fix" this, as I discovered when I started to try.
>
> I agree (1) is plausible, though I would like to see the traces to confirm. It's been a long time
> since I looked at that code, but I think the original code either ran in RUNNING_MUTEX and was
> inadvertently broken in the meantime, or the other cpu would have transitioned to RUNNING on
> its own when we flipped the owner before the release-side check was performed. Or perhaps
> we just plain screwed this up and it was racy ;) I'm not sure. But as Peter (M) stated, it seems
> like a shame to walk away from the concept without further investigation. I think everyone can
> agree that at the very least, if it is in fact taking a double whammy, we should fix that.
[ Cut out all the #2 points, since I'm not worried about that now, even
if it turns out not to be a problem. ]
Now, the way I was going to fix this is to have the adaptive wait run
in the TASK_RUNNING state, and then change the state only if we are
going to sleep. But this can be a bit more racy, especially with Lai's
new changes: with those changes, the waiter->task does not get nulled
anymore.
The test to take the lock now needs to be done under the
lock->wait_lock spinlock. We have to grab that lock and check both
that the owner is NULL and that we are the next waiter in the queue.
Thus, on taking the lock we need something like this:
	if (adaptive_wait(&waiter, orig_owner))
		sleep = 1;
	else
		sleep = 0;

	if (sleep) {
		raw_spin_lock(&lock->wait_lock);
		saved_state = rt_set_current_block_state(saved_state);
		if (!lock->owner && &waiter == rt_mutex_top_waiter(lock))
			sleep = 0;
		raw_spin_unlock(&lock->wait_lock);
		if (sleep)
			schedule_rt_mutex(lock);
		saved_state = rt_restore_current_blocked_state(saved_state);
	}
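(The two rt_*_state() helpers in that snippet don't exist yet; here's a
minimal sketch of what I'd expect them to do, assuming they only save
and restore current->state around the block. The bodies are a guess,
not an actual implementation:)

static inline unsigned long rt_set_current_block_state(unsigned long saved_state)
{
	/* Remember what the task was doing before it blocked... */
	saved_state = current->state;
	/*
	 * ...and block in a state the release side's wakeup check can
	 * see. set_current_state() includes the memory barrier that
	 * orders the state write against the owner test that follows.
	 */
	set_current_state(TASK_UNINTERRUPTIBLE);
	return saved_state;
}

static inline unsigned long rt_restore_current_blocked_state(unsigned long saved_state)
{
	/* Put the task back in whatever state it blocked from. */
	__set_current_state(saved_state);
	/* Hand back a fresh value in case the caller loops and blocks again. */
	return current->state;
}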
Otherwise we risk wakeup_next_waiter() missing the wakeup.
To clarify: we want adaptive_wait() to run as TASK_RUNNING. Then, if
we must sleep, we set the state to TASK_UNINTERRUPTIBLE, test again
whether we can still take the lock, and only sleep if we cannot.
Otherwise, if a wakeup happens just before we set the state to
TASK_UNINTERRUPTIBLE, we miss the wakeup altogether.
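(This is just the classic sleep/wakeup ordering rule, in generic form;
condition_is_true() is a stand-in, and this is not the rtmutex code
itself:)

	for (;;) {
		/* Declare intent to sleep *before* testing the condition. */
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (condition_is_true())
			break;
		/*
		 * A wakeup that slips in after set_current_state() just
		 * puts us back to TASK_RUNNING, so schedule() returns
		 * right away instead of sleeping forever.
		 */
		schedule();
	}
	__set_current_state(TASK_RUNNING);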
I can make this change and see what impact it has.
I'm also curious whether this ever worked. If it did not, are you sure
the tests that showed its benefit were valid? I don't have a
large-scale box at my disposal ATM, so I can only see what this does
on 4-way machines.
-- Steve