To reiterate: this is probably reproducible on smaller SMP systems, too.
Just do a 'runon' (using sys_sched_setaffinity) of ~200 (or more) small
compute-bound processes on a single CPU.
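
For anyone without a 'runon' command handy, here is a rough userspace
sketch of that reproduction recipe. It is not the tool I used: the helper
names (spin, spawn_pinned, reap), the 30-second run time and the choice of
the lowest allowed CPU are all assumptions made for illustration. Each
child pins itself with sched_setaffinity after the fork, so it briefly
runs on the parent's CPU first, which may not match what 'runon' does.

```python
import os
import time


def spin(seconds):
    """Burn CPU in user space for roughly `seconds` of wall-clock time."""
    deadline = time.monotonic() + seconds
    x = 0
    while time.monotonic() < deadline:
        x += 1


def spawn_pinned(cpu, nproc, seconds):
    """Fork nproc compute-bound children pinned to `cpu`; return their PIDs."""
    pids = []
    for _ in range(nproc):
        pid = os.fork()
        if pid == 0:
            # Child: restrict itself to the one CPU, then spin.
            status = 1
            try:
                os.sched_setaffinity(0, {cpu})
                spin(seconds)
                status = 0
            finally:
                # Never fall back into the parent's code path.
                os._exit(status)
        pids.append(pid)
    return pids


def reap(pids):
    """Wait for every child; return how many exited with status 0."""
    ok = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            ok += 1
    return ok


if __name__ == "__main__":
    # Assumption: any CPU this process is allowed to run on will do.
    cpu = min(os.sched_getaffinity(0))
    pids = spawn_pinned(cpu, nproc=200, seconds=30)
    print(f"started {len(pids)} spinners on CPU {cpu}")
    reap(pids)
```

While it runs, watch the migration thread in 'top' from another terminal.
Raising nproc or seconds should make any bursts easier to spot.
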
My patch, which has load_balance() skip over the (busiest->active_balance = 1)
trigger that starts up active_load_balance(), does seem to reduce the
frequency of bursts of long-running activity of the migration thread, but
those bursts of activity are still there, with migration_thread consuming
75-95% of its CPU for several seconds (as observed by 'top'). I have not yet
determined what's happening. It might be an artifact of how long it takes to
do those 'runon' startups of the compute-bound processes.