Re: A different metric for scheduler optimisation

From: Davide Libenzi (dlibenzi@maticad.it)
Date: Sun Jan 30 2000 - 09:33:48 EST


On Sun, 30 Jan 2000, Steve Underwood wrote:
> > We can stay here talking about my hypothesis for a while, You showing me an
> > app that confirms Your hypothesis and me one that confirms mine.
> > I've included the word "probability" meaning that there can exist app types
> > that fall outside my hypothesis.
> > Anyway consider the fastest-switching app that I know ( this can be an FTP
> > server for example ):
> >
> > char buffer[N];
> >
> > for (;;) {
> >         ssize_t n = read(infd, buffer, N);
> >         if (n <= 0)
> >                 break;
> >         write(outfd, buffer, n);
> > }
> >
> > for N = 1 this app "can switch very fast" and touches very few bytes, and
> > this seems to confirm my hypothesis.
>
> This is a very artificial, and seriously inefficient, example. When you said fast
> switching, I assumed your intention was fast _real_ application switching. Can we
> forget the silly benchmark-like examples, and talk about real applications? The
> design of some of those is pretty bad (Well, that may often be unfair. Some
> applications have been around, with minimal maintenance, for 30 years, and their
> design is just no longer appropriate for a modern machine).

This is only an example to show that if an app switches fast there is a higher
probability that it touches little data, and when I say "touch" I don't mean
only "for(i=0;i<N;i++) touch(data[i]);" but I also include I/O operations.
Fast switching implies that You have a small "time footprint", but to have a
small time footprint You must either load / store few bytes or have very fast
devices on both sides.
This does not mean that it is always that way, but, as I specified in the
previous message, there is a strong probability.
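
To make the "time footprint" concrete, here is a minimal sketch ( my
illustration, not part of the original argument; infd, outfd and N are the
placeholders from the loop above ) that estimates it by timing how long EIP
takes to come back to the top of the loop:

        #include <unistd.h>
        #include <sys/time.h>

        /* Average time ( usec ) for the instruction pointer to come
         * back to the top of the read / write loop: the "time
         * footprint" of one pass. */
        static double time_footprint(int infd, int outfd, char *buffer,
                                     size_t N, int iterations)
        {
                struct timeval t0, t1;
                int i;

                gettimeofday(&t0, NULL);
                for (i = 0; i < iterations; i++) {
                        if (read(infd, buffer, N) <= 0)
                                break;
                        write(outfd, buffer, N);
                }
                gettimeofday(&t1, NULL);

                if (i == 0)
                        return 0.0;
                return ((t1.tv_sec - t0.tv_sec) * 1e6 +
                        (t1.tv_usec - t0.tv_usec)) / i;
        }

With N = 1 on a pipe each pass is a few microseconds; with N = 1000000 against
a 10 MB/s device the read alone costs about 100 ms per pass, which is the
inverse proportionality to device speed described below.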

> > What if N = 1000000 ?
> > If N = 1000000 it flushes out every single bit of cache, but ... can this
> > app switch very fast ?
> > What is the "time footprint" of this app ?
> > The time footprint of this app can be defined as the average time taken by the
> > instruction pointer ( EIP for Intel ) to reach the same value.
> > The pattern of this app is :
> > RCW
> > R = read
> > C = compute ( 0 )
> > W = write
> > and the entire pattern must be used to define the time footprint.
> > A necessary condition to make an app switch fast, not once but continuously,
> > is to have a low time footprint, which this app doesn't have ( for that N ).
> > And the time footprint of this app is inversely proportional to the average
> > speed of the devices used to transfer the data.
> > Moreover, if You spawn M copies of this app, the switching rate is bounded
> > above by the lowest throughput of the devices used.
>
> What types of device are we talking about here? Disk, networking, etc. will behave
> differently. You won't get 1MB when you read 1MB from a TCP source, for example. I
> think for most real world I/O what I said still holds. Again, you are talking about
> something artificial.

The TCP/IP layer will split 1 MB into M MTU-sized chunks, but You must wait
anyway to get the whole 1 MB of data. The same holds for write.
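
( To spell out what "wait anyway" means: read() on a TCP socket returns
whatever chunk has arrived, so collecting the full 1 MB takes a loop like this
sketch; the function name is mine, purely illustrative. )

        #include <sys/types.h>
        #include <unistd.h>

        /* Read exactly "count" bytes from a stream socket, looping
         * over the short reads that TCP delivers chunk by chunk. */
        static ssize_t read_full(int fd, char *buf, size_t count)
        {
                size_t done = 0;

                while (done < count) {
                        ssize_t n = read(fd, buf + done, count - done);
                        if (n <= 0)
                                return n;       /* error or EOF */
                        done += n;
                }
                return done;
        }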

> What you seem to miss when we criticise your postings is that we aren't a
> stuck-in-the-mud bunch of old farts who won't listen to new ideas. We have just seen
> enough artificial tests give artificial results that we don't trust any examples or
> tests that can't be shown to be representative of what people have to contend with in
> the real world on a large scale. In many cases, even when it can be shown to be like
> real life, the response will be that the application design is dumb - fix that and
> not the OS. If you try to use examples and tests which are closer to real life you
> will get a far warmer reception here. This is a "no bench-marketers" zone!
>

> > This app example highlights a thing that I'd really like in a
> > processor :
> >
> > THE CONTROL OF CACHE LINES ( PAGES )
> >
> > If N = sizeof(cache) and infd and outfd are very fast devices, Your
> > hypothesis becomes true : cache thrashing + fast switching.
> > But We don't touch the data at all, so what if We could say ( and the IRQ
> > drivers too ) :
> >
> > disable_cache_map(buffer, N);
> >
> > We don't need this data loaded into the cache since we don't need to do any
> > computing on it ( and DMA operations read from main memory, not from the CPU
> > cache ) !
>
> The Pentium III can do this, more or less, with its media streaming facility. This
> allows the L2 cache (I think it's just L2, but it might be L1 as well - I've never
> actually worked with it) to be bypassed in a controlled manner, primarily for
> streaming multi-media DSP activities, where the data is read, processed, written, and
> never read again. I don't think Linux currently permits use of this functionality,
> but for certain key applications it has substantial potential for cache optimisation.
>
> However, for what you want here it isn't necessary anyway. If the data passes through
> the machine with bus-mastered (or DMA) read and write, the processor and its cache
> need never see it on most machines. Of course, an exception would be if the relevant
> main memory locations are already in cache, when bus snooping would update the cache.
> Whether this can be achieved in practice depends on the I/O device types.
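
For reference, the Pentium III facility here is the SSE non-temporal store
( MOVNTPS and friends ). A minimal user-space sketch, assuming a compiler that
provides the xmmintrin.h intrinsics, could look like this ( an illustration of
the mechanism, not an API Linux exposes ):

        #include <stddef.h>
        #include <xmmintrin.h>

        /* Copy n floats ( n a multiple of 4, both pointers 16-byte
         * aligned ) with MOVNTPS: the stores go straight to memory
         * without allocating cache lines, matching the "read, process,
         * write, never read again" streaming pattern. */
        static void stream_copy(float *dst, const float *src, size_t n)
        {
                size_t i;

                for (i = 0; i < n; i += 4) {
                        __m128 v = _mm_load_ps(src + i);  /* normal cached load */
                        _mm_stream_ps(dst + i, v);        /* non-temporal store */
                }
                _mm_sfence();   /* order the streamed stores */
        }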

-- 
All this stuff is IMVHO
