Re: [RFC] The Linux Scheduler: a Decade of Wasted Cores Report

From: Mike Galbraith
Date: Sun Apr 24 2016 - 03:05:32 EST


On Sat, 2016-04-23 at 18:38 -0700, Brendan Gregg wrote:

> The bugs they found seem real, and their analysis is great (although
> using visualizations to find and fix scheduler bugs isn't new), and it
> would be good to see these fixed. However, it would also be useful to
> double check how widespread these issues really are. I suspect many on
> this list can test these patches in different environments.

Part of it sounded to me very much like they're meeting, and "fixing",
SMP group fairness. Take the worst case: a threads=cores group of
synchronized threads passing checkpoints in lockstep, competing with a
group containing a single hog. The synchronized threads that have a
core to themselves must wait (busy-spinning as they mentioned, or
sleeping) for the straggler thread, whose fair share is a small
fraction of a core (1/65 on a 64-core box), to catch up before the
group as a unit can proceed.

Without SMP fairness, intersecting groups compete as equals at any
given intersection (assuming shares have not been twiddled), so a
fully synchronized load can utilize up to 50% of a box [1], whereas
with SMP fairness, that worst-case load slams head-on into a one-core
wall. Pondering the progress-dependency thingy a bit, some degree of
it seems likely, so it logically follows that SMP fairness is likely
to find some non-zero delta to multiply by box size.
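The 1/65 figure and the ~50% ceiling above can be sketched with a toy
model. The weights below (a flat 1024 per group, split evenly across a
group's runnable threads) are illustrative assumptions, not the actual
CFS arithmetic:

```python
# Toy model of the worst case: a 64-thread lockstep group whose
# straggler shares one core with a single-thread "hog" group.

CORES = 64
GROUP_WEIGHT = 1024  # assumed per-group weight, split across threads

# With SMP (group) fairness, the hog carries its group's full weight
# on the contested core, while the straggler carries only 1/64 of its
# group's weight -- so the straggler gets 16/(16+1024) = 1/65 of it.
straggler_w = GROUP_WEIGHT / CORES            # 16
hog_w = GROUP_WEIGHT                          # 1024
straggler_share = straggler_w / (straggler_w + hog_w)  # 1/65

# Lockstep: the whole group advances at the straggler's pace, so the
# 64 threads together do about one core's worth of work.
group_throughput_fair = CORES * straggler_share

# Without SMP fairness, straggler and hog compete as equal tasks and
# split the contested core 50/50, so the group runs at half speed.
group_throughput_flat = CORES * 0.5

print(f"with group fairness:    ~{group_throughput_fair:.2f} cores of work")
print(f"without group fairness: ~{group_throughput_flat:.1f} cores (~50% of box)")
```

Same lockstep load, same box; only the fairness policy changes, and
the synchronized group's throughput drops from half the machine to
roughly a single core.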

This came up fairly recently, with a university math department admin
grumbling that cranky professors were beating him bloody. Testing, I
couldn't reproduce exactly what he was grumbling about (couldn't pin
down exactly what that was, actually), but thinking it over, combined
with what I was seeing, made me want to "fix" it too, by smacking it
squarely between the eyes with my BFH. Turned out it had grown a wart,
though: it isn't nearly as bad in the real world (defined as measuring
random generic stuff on my little box ;) as idle pondering, and
measurement of slightly dinged-up code, had indicated. Like everything
else, it cuts both ways.

-Mike

1. IOW, do NOT run a highly specialized load in a generic environment;
it is guaranteed to either suck rocks or suck gigantic frick'n boulders.