Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

From: Linus Torvalds
Date: Thu Feb 01 2007 - 15:08:40 EST




On Thu, 1 Feb 2007, Ingo Molnar wrote:
>
> there's almost no scheduling cost from being able to arbitrarily
> schedule a kernel thread - but there are /huge/ benefits in it.

That's a singularly *stupid* argument.

Of course scheduling is fast. That's the whole *point* of fibrils. They
still schedule. Nobody claimed anything else.

Bringing up RT kernels and scheduling latency is idiotic. It's like saying
"we should do this because the sky is blue". Sure, that's true, but what
the *hell* does Rayleigh scattering have to do with anything?

The cost has _never_ been scheduling. That was never the point. Why do you
even bring it up? Only to make an argument that makes no sense?

The cost of AIO is

- maintenance. It's a separate code-path, and it's one that simply doesn't
fit into anything else AT ALL. It works (mostly) for simple things, ie
reads and writes, but even there, it's really adding a lot of crud that
we could do without.

- setup and teardown costs: both in CPU and in memory. These are the big
costs. It's especially true since a lot of AIO actually ends up cached.
The user program just wants the data - 99% of the time it's likely to
be there, and the whole point of AIO is to get at it cheaply, but not
block if it's not there.

So your scheduling arguments are inane. They totally miss the point. They
have nothing to do with *anything*.

Ingo: everybody *agrees* that scheduling is cheap. Scheduling isn't the
issue. Scheduling isn't even needed in the perfect path where the AIO
didn't need to do any real IO (and that _is_ the path we actually would
like to optimize most).

So instead of talking about totally irrelevant things, please keep your
eyes on the ball.

So I claim that the ball is here:

- cached data (and that is *especially* true of some of the more
interesting things we can do with a more generic AIO thing: path
lookup, inode filling (stat/fstat) etc). These usually have hit-rates
in the 99% range, but missing even just 1% of the time can be deadly,
if the miss costs you a hundred msec of not doing anything else!

Do the math. A "stat()" system call generally takes on the order of a
couple of microseconds. But if it misses even just 1% of the time (and
takes 100 msec when it does that, because there is other IO also
competing for the disk arm), ON AVERAGE it takes 1ms.

So what you should aim for is improving that number. The cached case
should hopefully still be in the microseconds, and the uncached case
should be nonblocking for the caller.

- setup/teardown costs. Both memory and CPU. This is where the current
threads simply don't work. The setup cost of doing a clone/exit is
actually much higher than the cost of doing the whole operation, most
of the time. Remember: caches still work.

- maintenance. Clearly AIO will always have some special code, but if we
can move the special code *away* from filesystems and networking and
all the thousands of device drivers, and into core kernel code, we've
done something good. And if we can extend it from just pure read/write
into just about *anything*, then people will be happy.
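
[Editorial sketch, not part of the original message.] The stat() arithmetic
above can be checked with a few lines of Python. The 2 us hit cost is an
assumed reading of "a couple of microseconds", and the 5 us cost of handing
off a miss without blocking is a purely hypothetical figure chosen for
illustration; only the 1% miss rate and the 100 msec miss cost come from the
text.

```python
def expected_latency_us(hit_us, miss_us, miss_rate):
    """Average per-call latency for a given hit cost, miss cost, and miss rate."""
    return (1.0 - miss_rate) * hit_us + miss_rate * miss_us


HIT_US = 2.0              # assumption: "a couple of microseconds"
BLOCKING_MISS_US = 100_000.0  # 100 msec, from the text, in microseconds
NONBLOCKING_MISS_US = 5.0     # hypothetical: caller only pays to queue the request
MISS_RATE = 0.01          # 1% miss rate, from the text

# Caller blocks on every miss: the rare miss dominates the average.
blocking_avg_us = expected_latency_us(HIT_US, BLOCKING_MISS_US, MISS_RATE)

# Caller never blocks: the average stays close to the cached cost.
nonblocking_avg_us = expected_latency_us(HIT_US, NONBLOCKING_MISS_US, MISS_RATE)

print(f"blocking miss:     {blocking_avg_us:.2f} us per call on average")
print(f"non-blocking miss: {nonblocking_avg_us:.2f} us per call on average")
```

With these inputs the blocking average comes out at roughly 1 ms, matching
the figure in the text, while the non-blocking average stays in the low
microseconds. The exact non-blocking number depends entirely on the assumed
hand-off cost.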

So stop blathering about scheduling costs, RT kernels and interrupts.
Interrupts generally happen a few thousand times a second. This is
something you want to do a *million* times a second, without any IO
happening at all except for when it has to.

Linus