Re: [PATCH RFC/RFT] sched/fair: Improve the behavior of sync flag

From: Joel Fernandes
Date: Sun Aug 27 2017 - 02:20:00 EST


(Sorry my last reply was incorrectly formatted, resending..)

Hi Mike,

On Sat, Aug 26, 2017 at 10:44 PM, Mike Galbraith <efault@xxxxxx> wrote:
> On Sat, 2017-08-26 at 18:02 -0700, Joel Fernandes wrote:
>> Binder (Android's IPC mechanism) uses sync wake ups during synchronous
>> transactions to indicate to the scheduler that the waker is about to
>> sleep soon. The current wake up path can be improved when the sync flag
>> is passed, resulting in higher binder performance. In this patch we more
>> strongly wake up the wakee on the waker's CPU if sync is passed, based on
>> a few other conditions such as wake_cap and cpus allowed. wake_wide is
>> checked only after the sync flag check so that it doesn't mess up sync.
>> Binder throughput tests see good improvement when waking up the wakee
>> (calling thread) on the waker's CPU (called thread) with this flag. Some
>> test results are below:
>
> Sync is not a contract, it's a hint. If you really want sync behavior,
> you need to create a contract signed in blood to signal that you really
> really are passing the baton.

Yes, that is the use case of binder: we really are passing the baton when
we pass sync. We also make binder not pass sync if there's more work to do
and more tasks to wake up. In all current and past products, we have been
using sync as a hard contract, as you said. Are you proposing the addition
of another flag to differentiate between the existing hint and the contract?

I also tried having sync ignored when wake_wide = 1, but that doesn't work
well for our use cases and hurts performance.
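
To make the ordering concrete, here is a rough userspace sketch of the
checks as described in the changelog quoted above. It is not the patch
itself: struct wake_ctx, enum wake_choice and pick_wake_path() are names
made up for illustration, and what happens after the sync check is
simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are kernel structures or APIs. */
struct wake_ctx {
	bool sync;        /* waker passed the sync flag */
	bool wake_cap;    /* capacity mismatch: waker's CPU is unsuitable */
	bool wake_wide;   /* fan-out heuristic says spread the wakee out */
	int waker_cpu;    /* CPU the waker is running on */
	uint64_t allowed; /* bitmask of CPUs the wakee may run on */
};

enum wake_choice { WAKE_ON_WAKER_CPU, WAKE_AFFINE_SEARCH, WAKE_SLOW_PATH };

/* True if 'cpu' is set in the wakee's allowed mask. */
static bool cpu_allowed(uint64_t mask, int cpu)
{
	return cpu >= 0 && cpu < 64 && ((mask >> cpu) & 1u);
}

static enum wake_choice pick_wake_path(const struct wake_ctx *c)
{
	assert(c != NULL);

	/* Sync is checked first, so wake_wide cannot override it. */
	if (c->sync && !c->wake_cap && cpu_allowed(c->allowed, c->waker_cpu))
		return WAKE_ON_WAKER_CPU;

	/* Only now is wake_wide consulted (assumed, simplified fallback). */
	if (c->wake_wide || c->wake_cap)
		return WAKE_SLOW_PATH;

	return WAKE_AFFINE_SEARCH;
}
```

With sync set, no capacity mismatch and the waker's CPU allowed, the sketch
picks the waker's CPU even when wake_wide is set, which is the ordering the
changelog describes.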

> Sync wakeups make tons of sense when the waker really really has one
> and only one wakee, AND really really is going to sleep immediately,

Yes, that is the case for binder. If we're going to be doing more work and
waking up others before going to sleep, we wouldn't pass sync.
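
Roughly, the rule on the binder side looks like the sketch below. This is
only an illustration of the policy described here, not binder's code:
struct reply_state and should_pass_sync() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of a thread about to wake its peer; not binder's types. */
struct reply_state {
	unsigned int pending_work;  /* work still queued for this thread */
	unsigned int tasks_to_wake; /* tasks this thread is about to wake */
};

/*
 * Pass sync only when we are really handing over the baton: exactly one
 * wakee, and nothing left to do before this thread goes to sleep.
 */
static bool should_pass_sync(const struct reply_state *s)
{
	assert(s != NULL);
	return s->pending_work == 0 && s->tasks_to_wake == 1;
}
```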

Binder is actually an RPC mechanism, where the calling thread and called
thread are essentially a single entity but are split across process
boundaries. By using thread pools, we increase the likelihood that there's
a single thread available for each caller which will go back to sleep after
replying.

> with zero overlap that can be converted to throughput by waking to an
> idle core.

That's exactly why the micro benchmark speeds up too: from our observation,
waking up an idle core increases the latency (and probably also wastes
power).

thanks,

-Joel