Re: [RFC] CPU controllers?

From: Sam Vilain
Date: Mon Jun 19 2006 - 04:17:53 EST


MAEDA Naoaki wrote:
>> ok, so basically the bit in cpu_rc_load() where for_each_cpu_mask() is
>> called, in Maeda Naoaki's patch "CPU controller - Add class load
>> estimation support", is where the O(N) behaviour creeps in, and that is
>> what could be remedied with a token bucket algorithm. You don't want
>> this because if you have 10,000 processes on a system in two resource
>> groups, the aggregate performance will suffer due to the large number of
>> cacheline misses during the 5,000-iteration loop that runs on every
>> resched.
>>
>
> Thank you for looking at the code.
>
> cpu_rc_load() is never called unless the sysadmin tries to access the
> load information via configfs from userland. In addition, it sums up
> per-CPU group stats, so the size of the loop is the number of CPUs, not
> the number of processes in the group.
>
> However, there is a similar loop in cpu_rc_recalc_tsfactor(), which runs
> every CPU_RC_RECALC_INTERVAL, which is defined as HZ. I don't think it
> will cause a big performance penalty.
>

Ok, so that's not as bad as it looked. While it is still O(N), the fact
that it only runs once every HZ ticks (so effectively O(N/HZ) per tick)
means it won't become a problem until you get to impractically long
runqueues.
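
(To spell the point out for anyone else following along: the read-out
loops over CPUs, not tasks, so it has roughly the shape below. This is
only a sketch; the struct and field names are my own shorthand, not the
ones in the patch.)

/* per-group, per-CPU usage counter -- names assumed, not from the patch */
struct cpu_rc_stat {
	unsigned long run_ticks;	/* ticks this group has used on this CPU */
};

static unsigned long cpu_rc_load_sketch(struct cpu_rc_stat stat[NR_CPUS])
{
	unsigned long total = 0;
	int cpu;

	for_each_online_cpu(cpu)	/* loop length == number of CPUs */
		total += stat[cpu].run_ticks;

	return total;
}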

I'm thinking the token bucket is probably worth doing anyway, just so
that it can be performance tested to see whether this guesstimate is
accurate.
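
For the record, the shape I have in mind for the token bucket is roughly
the following. It is only a sketch: every struct, field and function name
here is invented for illustration, not taken from the Resource Groups or
vserver code.

/*
 * Minimal token-bucket sketch (names invented for illustration).
 * Each group earns fill_rate tokens every interval ticks, capped at
 * bucket_size; running costs one token per tick.
 */
struct rc_tokens {
	int tokens;			/* current balance */
	int fill_rate;			/* tokens added per interval */
	int interval;			/* refill period, in ticks */
	int bucket_size;		/* maximum balance */
	unsigned long last_fill;	/* jiffies at the last refill */
};

static void rc_tokens_refill(struct rc_tokens *tb, unsigned long now)
{
	unsigned long periods = (now - tb->last_fill) / tb->interval;

	if (periods) {
		tb->tokens += periods * tb->fill_rate;
		if (tb->tokens > tb->bucket_size)
			tb->tokens = tb->bucket_size;
		tb->last_fill += periods * tb->interval;
	}
}

/* busy groups run their balance down and get shorter slices for it */
static int rc_scale_timeslice(struct rc_tokens *tb, int base_slice)
{
	int slice;

	rc_tokens_refill(tb, jiffies);
	slice = base_slice * tb->tokens / tb->bucket_size;
	return slice > 0 ? slice : 1;	/* never hand out a zero slice */
}

The attraction is that nothing in there loops over tasks or CPUs; the
per-resched cost is a handful of arithmetic operations per group.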

>> To apply the token bucket here, you would first change the per-CPU
>> struct cpu_rc to have the TBF fields; minimally:
>>
>> [...]
>> I think that the characteristics of these two approaches are subtly
>> different. Both scale timeslices, but in a different way - instead of
>> estimating the load and scaling back timeslices up front, busy Resource
>> Groups are relied on to deplete their tokens in a timely manner, and get
>> shorter slices allocated because of that. No doubt from 10,000 feet they
>> both look the same.
>>
>
> The current O(1) scheduler gives an extra bonus to interactive tasks by
> requeuing them onto the active array for a while. That would break the
> controller's efforts, so I'm planning to stop the interactive task
> requeuing if the target share isn't being met.
>
> Is there a similar issue with the vserver scheduler?
>

Not an issue - those extra requeued timeslices are accounted for normally.
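
To spell that out: the charging happens per tick rather than per slice
grant, so a task that gets requeued onto the active array still pays for
every tick it runs. Roughly, using the rc_tokens sketch from earlier
(again, names invented, not the actual vserver code):

/* called from the scheduler tick for the currently running group */
static void rc_account_tick(struct rc_tokens *tb)
{
	if (tb->tokens > 0)
		tb->tokens--;	/* requeued or not, each tick drains a token */
}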

Sam.