Re: Scheduler balancing statistics

From: Nick Piggin
Date: Fri Apr 02 2004 - 02:56:33 EST


Rick Lindsley wrote:
The important thing, as always, is that collecting the stats not impact
the action being taken. If you stick with incrementing counters and
not taking additional locks, then you've probably done what you can to
minimize any impact.


Yes, they're all simple increments without the need for any
locking.

From an analysis standpoint it would be nice to know which of the major
features are being activated for a particular load. So imbalance-driven
moves, power-driven moves, and the number of times each domain tried
to balance and failed would all be useful. I think your output covered
those.


It doesn't get into the finer points of how the imbalance
is derived, but maybe it should...

Another useful stat might be how many times the self-adjusting fields
(min, max) adjusted themselves. That might yield some insights on
whether that's working well (or correctly).


Might be a good idea.

When I started thinking about these stats, I started thinking about how to
identify the domains. "domain0" and "domain1" do uniquely identify some
data structures, but especially as they get hierarchical, can we easily
tie them to the cpus they manage? Perhaps the stats should include a
bitmap of what cpus are covered by the domain too.


Well, every domain that is reported here will cover the entire
system because it simply takes the sum of statistics from all
domains.

It is a good overview, but it probably would be a good idea to
be able to break down the views and zoom in a bit.

Looks very useful for those times when some workload causes the scheduler
to burp -- between scheduler stats and domain stats we may find it much
easier to track down issues.

Would you say these would be in addition to the schedstats or would
these replace them?

It will replace some of them, I think.
For example, all load_balance operations are done within the
context of a sched domain, so you would use the sched domain's
statistics there. However you have other statistics that are
specific to the runqueue, for example, which would stay where
they are.

Thanks
Nick
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/