The important thing, as always, is that collecting the stats not impact
the action being taken. If you stick with incrementing counters and
not taking additional locks, then you've probably done what you can to
minimize any impact.
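
To make that concrete, here is a minimal userspace sketch of the pattern,
not the actual patch. All names in it (sd_stat_inc, sd_stat_total, the
event enum, the 8-CPU limit, the 64-byte cache line) are made up for
illustration. Each CPU bumps only its own slot, so the hot path takes no
lock; the reader sums the slots and accepts a slightly stale total.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 8   /* hypothetical CPU count for this sketch */
#define CACHELINE      64  /* assumed cache line size */

/* Hypothetical event types, named after the stats discussed in this thread. */
enum sd_event {
    SD_IMBALANCE_MOVE,
    SD_POWER_MOVE,
    SD_BALANCE_FAILED,
    SD_NR_EVENTS
};

/*
 * One block of counters per CPU, padded (GCC/Clang attribute) so that two
 * CPUs never write to the same cache line.
 */
struct sd_stats {
    unsigned long count[SD_NR_EVENTS];
} __attribute__((aligned(CACHELINE)));

static struct sd_stats sd_stats_percpu[NR_CPUS_SKETCH];

/* Hot path: a plain increment on the caller's own slot, no lock taken. */
static inline void sd_stat_inc(unsigned int cpu, enum sd_event ev)
{
    assert(cpu < NR_CPUS_SKETCH);
    sd_stats_percpu[cpu].count[ev]++;
}

/* Reader side: sum the per-CPU slots; the total may be slightly stale. */
static unsigned long sd_stat_total(enum sd_event ev)
{
    unsigned long total = 0;

    for (unsigned int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        total += sd_stats_percpu[cpu].count[ev];
    return total;
}
```

The sketch assumes each slot is only ever written by its own CPU; if that
does not hold, the increment would need to be atomic.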

From an analysis standpoint it would be nice to know which of the major
features are being activated for a particular load. So imbalance-driven
moves, power-driven moves, and the number of times each domain tried
to balance and failed would all be useful. I think your output covered
those.

Another useful stat might be how many times the self-adjusting fields
(min, max) adjusted themselves. That might yield some insights on
whether that's working well (or correctly).

When I started thinking about these stats, I began to wonder how to
identify the domains. "domain0" and "domain1" do uniquely identify some
data structures, but especially as they get hierarchical, can we easily
tie them to the cpus they manage? Perhaps the stats should include a
bitmap of what cpus are covered by the domain too.
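
Something along these lines is what I have in mind; a rough userspace
sketch only. The struct, the helper names, and the "name followed by a hex
mask" output format are all assumptions for illustration, and it assumes no
more than 64 CPUs so the mask fits in one word.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_CPUS 64  /* assumption: at most 64 CPUs, so one 64-bit word suffices */

/* Hypothetical per-domain record: a name plus the set of CPUs it spans. */
struct domain_info {
    const char *name;
    unsigned long long span;  /* bit N set means CPU N belongs to the domain */
};

/* Mark a CPU as covered by the domain. */
static void domain_add_cpu(struct domain_info *d, unsigned int cpu)
{
    assert(cpu < MAX_CPUS);
    d->span |= 1ULL << cpu;
}

/* Write "<name> <hex mask>" into buf; returns whatever snprintf returns. */
static int domain_format(const struct domain_info *d, char *buf, size_t len)
{
    return snprintf(buf, len, "%s %016llx", d->name, d->span);
}
```

With CPUs 0 and 1 added to a domain named "domain0", domain_format produces
"domain0 0000000000000003", which ties the name to the CPUs it manages.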

Looks very useful for those times when some workload causes the scheduler
to burp -- between scheduler stats and domain stats we may find it much
easier to track down issues.

Would you say these would be in addition to the schedstats or would
these replace them?