Re: AIM7 40% regression with 2.6.26-rc1

From: Linus Torvalds
Date: Wed May 07 2008 - 23:30:15 EST




On Thu, 8 May 2008, Zhang, Yanmin wrote:
>
> Congratulations! The patch really fixes the regression completely!
> vmstat showed cpu idle is 0%, just like 2.6.25's.

Well, that shows that it was the BKL.

That said, "idle 0%" is easy when you spin. Do you also have actual
performance numbers? I'd hope that not only do we use full CPU time, it's
also at least as fast as the old semaphores were?

While I've been dissing sleeping locks (because their overhead is so
high), at least in _theory_ they can get better behavior when not
spinning. Now, that's not going to happen with the BKL, I'm 99.99% sure,
but I'd still like to hear actual performance numbers too, just to be
sure.

Anyway, at least the "was it the BKL or some other semaphore user"
question is entirely off the table.

So we need to

- fix the BKL. My patch may be a good starting point, but there are
alternatives:

(a) reinstate the old BKL code entirely

Quite frankly, I'd prefer not to. Not only did it have three
totally different cases, some of them were apparently broken (ie
BKL+regular preempt didn't cond_resched() right), and I just don't
think it's worth maintaining three different versions, when
distro's are going to pick one anyway. *We* should pick one, and
maintain it.

(b) screw the special BKL preemption - it's a spinlock, we don't
preempt other spinlocks, but at least fix BKL+preempt+cond_resched
thing.

This would be "my patch + fixes" where at least one of the fixes
is the known (apparently old) cond_preempt() bug.

(c) Try to keep the 2.6.25 code as closely as possible, but just
switch over to mutexes instead.

I dunno. I was never all that enamoured with the BKL as a sleeping
lock, so I'm biased against this one, but hey, it's just a
personal bias.

- get rid of the BKL anyway, at least in anything that is noticeable.

Matthew's patch to file locking is probably worth doing as-is,
simply because I haven't heard any better solutions. The BKL
certainly can't be it, and whatever comes out of the NFSD
discussion will almost certainly involve just making sure that
those leases just use the new fs/locks.c lock.

This is also why I'd actually prefer the simplest possible
(non-preempting) spinlock BKL. Because it means that we can get
rid of all that "saved_lock_depth" crud (like my patch already
did). We shouldn't aim for a clever BKL, we should aim for a BKL
that nobody uses.

I'm certainly open to anything. Regardless, we should decide fairly soon,
so that we have the choice made before -rc2 is out, and not drag this out,
since regardless of the choice it needs to be tested and people comfy with
it for the 2.6.26 release.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/