Re: Auto-Adaptive scheduler - Final chapter ( the numbers ) ...

From: Davide Libenzi (davidel@maticad.it)
Date: Fri Jan 28 2000 - 08:48:40 EST

Next message: Jens Axboe: "Re: (2.2.38) CDROM eject problems?"
Previous message: Buddy Smith: "Re: (2.2.38) CDROM eject problems?"
In reply to: Larry McVoy: "Re: Auto-Adaptive scheduler - Final chapter ( the numbers ) ..."
Next in thread: water modem: "Re: Auto-Adaptive scheduler - Final chapter ( the numbers ) ..."
Reply: water modem: "Re: Auto-Adaptive scheduler - Final chapter ( the numbers ) ..."
Reply: Marco Colombo: "Re: Auto-Adaptive scheduler - Final chapter ( the numbers ) ..."
Reply: Jamie Lokier: "A different metric for scheduler optimisation"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

I prefer to give a single response to points emerged in this thread.

Steve say :
> I really do urge you to go to a bookstore, find a recent book on computer
> architecture, and try to find out how a modern machine works. Larry
> recommended a book. Maybe you should try that one.

Larry say :
> > : However, I think that Davide knows something about caches as well.

> Hmm. With no offense intended, that's certainly not at all clear from
> this discussion. What is it that makes you think that?

I really like that guys that having read the last Intel ( or Digital. etc
... ) data sheet,
thinks to be God in earth, at a point to judge the knowledge of other guys.
I really like them.

My Mother taught me two basic things :

1) Have always respect of people that have respect of other people dignity
2) Always go in deep when analyzing things, and never stop at a superficial
view

Even if condition 1) has been repeatedly violated during this thread I'll
continue
to use educated and respective tones in my responses.
The first data sheet I've studied was the Intel 8088 one in fall 1984, and
I've continued until now since electronics and computer science is my
passion
as long as my work.
I don't want to bore other guys with cache lines architectures, computers
layouts, etc ...,
as long as I don't see this list as a way to recite data sheets to show
others own knowledge.
Reading data sheets don't makes You a guru, but simply a reader.
Using the gray material that You've inside that sphere that You've upon Your
shoulders,
this makes You a guru.
Anyway I've just placed an order to Amazon for "In search of clusters" and
"Computer Architecture, A Quantitative Approach".
These books will go to enrich my collection of readed and meditated books,
but I hope that the reading of these don't makes me another "reader guru",
that misses respect of other guys dignity.
It seems that this list has enough of them.

Horst say :
> No. Your scheduler dirties the cache much more than the standard one, so
it
> will blow out much more of the application (and get blown out of cache
more
> too).

Another thing that I'll like is that people that want to speak about cache
misses
of my patch, should take a read to it before ( it's few lines of code ).

> Most probably just one cache miss kills all your advantage (if there
> is one at all, Larry has said that a 3% difference could be due just to
> random factors).

And also I'll like that my responses being readed.
The 3% is not on my vantage ( mine !?!? ) but It's a perf loose emerging
at 759000 switch / sec.
And is not random, as I've said in another message :
"They are a lower limit of a 16 test set", as Larry taught me.

Horst say :
> Again: Low load is << 1; normal load is 1 or thereabouts; 2 or more is
high
> load; 10 or so is already ridiculously overloaded (UP case, for SMP
roughly
> multiply by #CPUs). This has been shown here with real world data from
> higly loaded servers all over the place.

How many switch / sec will do a system with RQ << 1 ?
Stay big say 500 ( keep this number in mind ).

Jamie say :
> Real world problems do not have that kind of load. Real world problems
> _do_ have cache issues.
> The cache issue is so important for real applications that optimising
> the scheduler under unusual RQ loads isn't worth doing.

Rik say :
> > Real world problems do not have that kind of load. Real world
> > problems _do_ have cache issues.

> Indeed. We should optimise the scheduler for these, not
> for the fastest possible switch times.

Take out another cache miss into the scheduler, multiply the time a cache
miss
takes for the number above ( 500 ), and weight it in one second.
What is Your improvement percent ?

<UNDERLINE>
More, the actual implementation, due to the fact that linearly scan RQ, have
a number
of "distant" memory loads that are directly proportional to the number of
processes in RQ.

DL = |p - p->next_run| >> Cache load cluster

While the patch :

1) Has a less number of process memory loads ( p = p->next_run ) given by
the fact
    that does not linear scan RQ, and that have "typically" 1 task / cluster
2) The "distance" of loads it makes to traverse clusters is 8 ( two
pointers ), so
    a single cache load can store more than one cluster ( depending on
architecture ).
    More again, typically a single cluster is accessed.

So if You optimize a cache miss with RQ < 1 You don't get, due to the low
switch / sec,
a relevant improvement.
If You want to give the cache miss optimization a relevant weight, You must
shift to
higher switch / sec, that are given by higher RQ, that will make You loosing
the
cache miss gain due to the linear scan.
</UNDERLINE>

The data cache cost of the patch with RQ < "Cluster scheduler boot point" is
a variabile
onto the stack ( 4 bytes ), that thinking on about, I can even remove.
The code cache cost of the patch with RQ < "Cluster scheduler boot point" is
0,
due to the fact that the code is the same of the previous scheduler.
There is the block of code of the cluster scheduler that can be moved out of
fast path,
but I think that It'll be better to keep it on fast path since we need it
faster under
high workloads not with RQ < 2.

But all of this topics would be clear if that guys that shoot over my patch
cache misses,
would have readed it.

[New coding]
I've moved some code to better optimize branch prediction ( with RQ <
boot_limit ),
and more important I've found a constant redundant loop ( with RQ >
boot_limit ) that
in SMP ( and only in SMP version ) makes the patch have longer times than
the patch
itself can give.

PS:
Tonight I'll try :

for size in 0 2 4 8 10 12 14 16 18 20 22 24 26 28 30 32
do lat_ctx -s $size 2 2 2 2 2 2
done

but I think to get the same results given the low cache overhead,
and the high noise figure that lat_ctx have when we've to measure
so closer performances.

With respect,
Davide.

-- All this stuff is IMVHO

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/

Next message: Jens Axboe: "Re: (2.2.38) CDROM eject problems?"
Previous message: Buddy Smith: "Re: (2.2.38) CDROM eject problems?"
In reply to: Larry McVoy: "Re: Auto-Adaptive scheduler - Final chapter ( the numbers ) ..."
Next in thread: water modem: "Re: Auto-Adaptive scheduler - Final chapter ( the numbers ) ..."
Reply: water modem: "Re: Auto-Adaptive scheduler - Final chapter ( the numbers ) ..."
Reply: Marco Colombo: "Re: Auto-Adaptive scheduler - Final chapter ( the numbers ) ..."
Reply: Jamie Lokier: "A different metric for scheduler optimisation"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

This archive was generated by hypermail 2b29 : Mon Jan 31 2000 - 21:00:21 EST