On Mon, 10 Dec 2001, Alan Cox wrote:
> > Alan, you're mixing switch mm costs with cache image reload ones.
> > Note that equal mm does not mean matching cache image, at all.
> They are often close to the same thing. Take a look at the constraints
> on virtually cached processors like the ARM where they _are_ the same thing.
> Equal mm for cpu sucking tasks often means equal cache image. On the other
> hand unmatched mm pretty much definitively says "doesnt matter". The cost
> of getting the equal mm/shared cache case wrong is too horrific to handwave
> it out of existance using academic papers.
This is very difficult to prove and heavily depend on the application
Anyway i was just thinking that, if we scan the cpu bound queue like
usual, by correctly sorting out mm related tasks, we're going to blend
this time ( about 2.9us with rqlen=32 on a dual PIII 733, std scheduler )
inside the cpu bound average run time ( 30-60ms ).
This means ~ 0.005%
> > By having only two queues maintain the implementation simple and solves
> > 99% of common/extraordinary cases.
> > The cost of a tlb flush become "meaningful" for I/O bound tasks where
> > their short average run time is not sufficent to compensate the tlb flush
> > cost, and this is handled correctly/like-usual inside the I/O bound queue.
> You don't seem to solve either problem directly.
> Without per cpu queues you spend all your time bouncing stuff between
> processors which hurts. Without multiple queues for interactive tasks you
> walk the interactive task list so you don't scale. Without some sensible
> handling of mm/cpu binding you spend all day wasting ram bandwidth with
> cache writeback.
It has two queues ( cpubound + iobound-rttasks ) per CPU, as i wrote in my
By having split in a multi-queue model plus having a separate queue for
iobound tasks, i think it'll scale pretty well indeed.
Two queues against N means even a lower scheduler memory footprint.
The whole point is to understand where greater optimizations will not pay
for the greater complexity ( plus D/I cache footprint ).
> The single cpu sucker queue is easy, the cost of merging mm equivalent tasks
> in that queue is almost nil. Priority ordering interactive stuff using
> several queues is easy and very cheap if you need it (I think you do hence
> I have 7 priority bands and you have 1). In all these cases the hard bits
> end up being
> On a wake up which cpu do you give a task ?
The multi queue scheduler is not like the old one where the bunch of
running tasks virtually own to all cpu.
In a multi queue scheduler tasks are _local_by_default_.
> When does an idle cpu steal a task, who from and which task ?
> How do I define "imbalance" for cpu load balancing ?
As i wrote in previous messages the idle(s) is woken up at every timer
tick and check the balance status.
After N ( tunable ) consecutive ticks that the idle has found an
unbalanced status, it'll try to steal a tasks.
>From where, you could ask ?
This is a good question and the best answer to good questions is:
I don't know :)
Just kidding ( only in part ).
There're several concepts:
1) cpu load expressed in run queue length
2) cpu load expressed as the sum ( in jiffies ) of the average run time of
the currently running task set
3) distance/metric of the move ( think about NUMA )
4) last mm used by the idle
5) last time in jiffies the task is ran
The trick is to find a function :
F = F( RQL, JLD, DIS, LMM, JLT )
so we can sort out and select the best task to run on the idle cpu.
And more, the cpu selection must be done without locking remote runqueue.
So it should be a two-phase task:
1) cpu selection ( no locks held )
2) task selection inside the selected cpu ( remote queue lock held )
The first phase will use 1, 2 and 3 from the above variable set, while the
second will use 4 and 5 for tasks selection inside the queue.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to firstname.lastname@example.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Sat Dec 15 2001 - 21:00:16 EST