Re: [RFC] CPU controllers?

From: MAEDA Naoaki
Date: Mon Jun 19 2006 - 04:41:07 EST


Sam Vilain wrote:
MAEDA Naoaki wrote:
ok, so basically the bit in cpu_rc_load() where for_each_cpu_mask() is
called, in Maeda Naoaki's patch "CPU controller - Add class load
estimation support", is where O(N) creeps in that could be remedied with
a token bucket algorithm. You don't want this because if you have 10,000
processes on a system in two resource groups, the aggregate performance
will suffer due to the large number of cacheline misses during the 5,000
size loop that runs every resched.
Thank you for looking the code.

cpu_rc_load() is never called unless sysadm tries to access the load
information via configfs from userland. In addition, it sums up per-CPU
group stats, so the size of loop is the number of CPU, not process in
the group.

However, there is a similer loop in cpu_rc_recalc_tsfactor(), which runs
every CPU_RC_RECALC_INTERVAL that is defined as HZ. I don't think it
will cause a big performance penalty.

Ok, so that's not as bad as it looked. So, while it is still O(N), the
fact that it is O(N/HZ) makes this not a problem until you get to
possibly impractical levels of runqueue length.

Do you mean N is the size of the loop? for_each_cpu_mask() loops
the number of CPUs times. It is not directly related to runqueue length.

I'm thinking it's probably worth doing anyway, just so that it can be
performance tested to see if this performance guestimate is accurate.

To apply the token bucket here, you would first change the per-CPU
struct cpu_rc to have the TBF fields; minimally:

[...]
I think that the characteristics of these two approaches are subtly
different. Both scale timeslices, but in a different way - instead of
estimating the load and scaling back timeslices up front, busy Resource
Groups are relied on to deplete their tokens in a timely manner, and get
shorter slices allocated because of that. No doubt from 10,000 feet they
both look the same.
Current 0(1) scheduler gives extra bonus for interactive tasks by
requeuing them to active array for a while. It would break
the controller's efforts. So, I'm planning to stop the interactive
task requeuing if the target share doesn't meet.

Are there a similar issue on the vserver scheduler?

Not an issue - those extra requeued timeslices are accounted for normally.

It's great.

Thanks,
MAEDA Naoaki

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/