Re: [PATCH v4] sched: automated per session task groups
From: Linus Torvalds
Date: Sun Dec 05 2010 - 15:48:41 EST
On Sun, Dec 5, 2010 at 11:22 AM, Colin Walters <walters@xxxxxxxxxx> wrote:
>
> For the purposes of this discussion again, let's say "fixing nice"
> means say "group schedule each nice level above 0". There are
> obviously many possibilities here, but let's consider this one
> precisely.
THAT IS NOT HOW 'nice' WORKS!
For chrissake, how hard is it to understand?
The semantics of "nice" are not - and have never been - to put things
into process scheduling groups of their own.
When somebody says "nice xyzzy", they are explicitly stating that
"xyzzy" isn't as important as other processes. It's done for stuff
that you don't care about, and more specifically, for stuff that you
really don't want to impact anything else. So if there are other
things to be run, 'nice' means that those should get more CPU time.
(Obviously, negative nice levels work the other way around).
This is very much documented. People rely on it. Look at the man-page.
It talks about "most favorable" vs "least favorable" scheduling.
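
To make the "less favorable" semantics concrete, here is a minimal sketch using Python's standard `os.nice` and `os.getpriority` wrappers (Unix only). The helper name `niceness_of_child` and the increment of 5 are illustrative assumptions, not anything from this thread. It shows only that a positive increment raises the nice value, which asks the scheduler to favor the process less.

```python
import os
import subprocess
import sys


def niceness_of_child(increment):
    """Spawn a child interpreter, raise its nice value, return the result.

    Hypothetical helper for illustration. The child inherits the parent's
    niceness, then os.nice(increment) adds to it and returns the new value.
    """
    code = "import os; print(os.nice(%d))" % increment
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Niceness of the current process (0 unless it was started under nice).
parent_nice = os.getpriority(os.PRIO_PROCESS, 0)

# The child ends up "nicer" (less favored) than the parent, capped at 19.
child_nice = niceness_of_child(5)
print("parent:", parent_nice, "child:", child_nice)
```

Nothing here creates a scheduling group; the child is simply marked as less important than everything at a lower nice value.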
> Two people logged in would get their "make" jobs group scheduled
> together. What is the problem?
The problem is that you don't know what the hell you are talking about.
Different nice levels shouldn't get group scheduled together - they
should be scheduled *less*. And it's not about "make", since nobody
really ever uses nice on make anyway, it's about things like
pulseaudio (that wants higher priorities) and random background
filesystem indexers etc (that want lower priorities).
Nice levels are _not_ about group scheduling. They're about
priorities. And since the cgroup code doesn't even support priority
levels for the groups, it's a really *horrible* match.
And the thing is, the nice semantics are traditional. They are also
*horrible*, but that doesn't allow you to change their semantics.
People rely on those crazy traditional and mostly useless semantics.
Not very much (because they are mostly useless), but there really are
people who use it.
And they use it knowing that positive nice levels mean that something
is less important.
In contrast, giving processes a scheduling group doesn't imply "less
important". Not AT ALL. It doesn't really mean "more important"
either, it just means "somewhat insulated from other groups".
So let's say that you have a filesystem indexer, and you nice it up to
make sure that it doesn't steal CPU bandwidth from your "real work".
Now, let's say that you start a "make -j16" to build something
important.
Do you *really* think that the person who niced the filesystem indexer
down wants the indexer to get 50% of the CPU, just because it's
scheduled separately from the parallel make?
HELL NO!
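
The arithmetic behind that scenario can be sketched as follows. This is a rough model under stated assumptions, not the kernel's actual code: a single CPU, every task always runnable, per-task weights that shrink by about 1.25x per nice step from 1024 at nice 0 (an approximation of CFS-style weighting), and equal-weight groups, as the 50% figure above implies. The function names are hypothetical.

```python
def nice_weight(nice):
    """Approximate scheduler weight for a nice level.

    Assumption: 1024 at nice 0, dividing by roughly 1.25 per nice step.
    """
    return 1024 / (1.25 ** nice)


def flat_share(task_nice, all_nices):
    """CPU share of one task when all tasks compete by weight in one pool."""
    total = sum(nice_weight(n) for n in all_nices)
    return nice_weight(task_nice) / total


def grouped_share(busy_groups):
    """CPU share of each group when all busy groups have equal weight."""
    return 1 / busy_groups


# One indexer at nice 19 versus 16 parallel make jobs at nice 0.
tasks = [19] + [0] * 16

# Plain nice semantics: the indexer gets a tiny slice of the CPU.
indexer_flat = flat_share(19, tasks)

# Indexer in its own group, make jobs in another: the indexer is the only
# task in its group, so it receives the whole group share.
indexer_grouped = grouped_share(2)

print("flat: %.4f  grouped: %.4f" % (indexer_flat, indexer_grouped))
```

Under these assumptions the niced indexer goes from well under 1% of the CPU to half of it, which is the opposite of what the person who niced it asked for.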
So stop this idiocy. "nice" has absolutely nothing to do with group
scheduling. It cannot. It must not. It's a legacy interface, and it
has real semantics.
> Since Linus appears to be more interested in talking about nipples
> than explaining exactly what it would break, but you appear to agree
> with him, hopefully you'll be able to explain...
The reason I was talking about male nipples should be clear by now.
Think "legacy interface". Think "don't mess with it, because people
are used to it".
They may be useless, but dammit, they do what they do.
Don't try to turn male nipples into something they aren't. And don't
try to turn 'nice' into something it isn't.
Linus