Re: [PATCH RFC 1/2] sched: Minimize the idle cpu selection race window.
From: Joel Fernandes
Date: Sat Nov 04 2017 - 20:58:17 EST
Hi Peter,
On Tue, Oct 31, 2017 at 1:20 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Tue, Oct 31, 2017 at 12:27:41AM -0500, Atish Patra wrote:
>> Currently, multiple tasks can wakeup on same cpu from
>> select_idle_sibiling() path in case they wakeup simulatenously
>> and last ran on the same llc. This happens because an idle cpu
>> is not updated until idle task is scheduled out. Any task waking
>> during that period may potentially select that cpu for a wakeup
>> candidate.
>>
>> Introduce a per cpu variable that is set as soon as a cpu is
>> selected for wakeup for any task. This prevents from other tasks
>> to select the same cpu again. Note: This does not close the race
>> window but minimizes it to accessing the per-cpu variable. If two
>> wakee tasks access the per cpu variable at the same time, they may
>> select the same cpu again. But it minimizes the race window
>> considerably.
>
> The very most important question; does it actually help? What
> benchmarks, give what numbers?
I collected some numbers with an Android benchmark called Jankbench.
Most tests didn't show an improvement or degradation with the patch.
However, one of the tests called "list view", consistently shows an
improvement. Particularly striking is the improvement at mean and 25
percentile.
For list_view test, Jankbench pulls up a list of text and scrolls the
list, this exercises the display pipeline in Android to render and
display the animation as the scroll happens. For Android, lower frame
times is considered quite important as that means we are less likely
to drop frames and give the user a good experience vs a perceivable
poor experience.
For each frame, Jankbench measures the total time a frame takes and
stores it in a DB (the time from which the app starts drawing, to when
the rendering completes and the frame is submitted for display).
Following is the distribution of frame times in ms.
count 16304 (@60 fps, 4.5 minutes)
Without patch With patch
mean 5.196633 4.429641 (+14.75%)
std 2.030054 2.310025
25% 5.606810 1.991017 (+64.48%)
50% 5.824013 5.716631 (+1.84%)
75% 5.987102 5.932751 (+0.90%)
95% 6.461230 6.301318 (+2.47%)
99% 9.828959 9.697076 (+1.34%)
Note that although Android uses energy aware scheduling patches, I
turned those off to bring the test as close to mainline as possible. I
also backported Vincent's and Brendan's slow path fixes to the 4.4
kernel that the Pixel 2 uses.
Personally I am in favor of this patch considering this test data but
also that in the past, I remember that our teams had to deal with the
same race issue and used cpusets to avoid it (although they probably
tested with "energy aware" CPU selection kept on).
thanks,
- Joel