Re: [announce] "kill the Big Kernel Lock (BKL)" tree

From: Andi Kleen
Date: Wed May 14 2008 - 14:32:39 EST


Ingo Molnar <mingo@xxxxxxx> writes:

> As some of the latency junkies on lkml already know it, commit 8e3e076
> ("BKL: revert back to the old spinlock implementation") in v2.6.26-rc2
> removed the preemptible BKL feature and made the Big Kernel Lock a
> spinlock and thus turned it into non-preemptible code again. This commit
> returned the BKL code to the 2.6.7 state of affairs in essence.

It's a reasonable start, but have you considered doing this work
in tree instead? As in just add all the warnings, but don't actually
change the semantics yet. I suspect you would get far more users
this way and the work would go faster.

It would be reasonable to enable this in -mm if the warnings are
not too intrusive (self-disabling after they fire, etc.)

Also for fixing the ioctls I'm not sure that dynamic instrumentation
will really work because it would be tough to execute them all.

I suspect some variant of static code analysis would make sense
for the ioctls.

I used to do some auditing with cflow. That won't
catch indirect function calls unfortunately, but if there's
some way to find those and bail out, one could write an automated
tool that flags, for example, all the ioctls that don't sleep
(don't have any sleeping functions in the call chain -- this
might need some manual annotation, but hopefully not much).
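
A rough sketch of what such a tool could look like, assuming the call
graph has already been extracted (e.g. from cflow output) into a plain
dict. The handler names, the INDIRECT marker and the contents of the
"sleeping" set are all made up for illustration; only the idea (walk
the call chain, bail out on indirect calls) comes from the text above.

```python
# Hypothetical sketch: classify ioctl handlers by whether anything in
# their call chain may sleep. All function names here are illustrative.

INDIRECT = "<indirect>"  # marker for a call through a function pointer


def classify(entry, calls, sleeping):
    """Return 'sleeps', 'unknown' or 'safe' for the handler `entry`.

    calls:    dict mapping function name -> list of callee names
    sleeping: set of functions manually annotated as possibly sleeping
    """
    seen = set()
    stack = [entry]
    unknown = False
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        if fn in sleeping:
            return "sleeps"
        # Bail out (conservatively) on indirect calls and on functions
        # whose bodies we have no call information for.
        if fn == INDIRECT or fn not in calls:
            unknown = True
            continue
        stack.extend(calls[fn])
    return "unknown" if unknown else "safe"


# Toy call graph: three made-up handlers.
calls = {
    "foo_ioctl": ["foo_read_reg"],
    "foo_read_reg": [],
    "bar_ioctl": ["bar_wait"],
    "bar_wait": ["schedule_timeout"],
    "baz_ioctl": ["baz_dispatch"],
    "baz_dispatch": [INDIRECT],
}
sleeping = {"schedule_timeout"}  # manual annotation

for handler in ("foo_ioctl", "bar_ioctl", "baz_ioctl"):
    print(handler, classify(handler, calls, sleeping))
```

Only the handlers reported as 'safe' would be candidates for automatic
conversion; 'unknown' ones still need a manual look.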

Then it would be possible to safely switch those over to a blocking
mutex variant of BKL.

Now there could be some more automated analysis here: for example the
other main user of BKL is character device open. I suspect that to really
make progress here you would also need an open_unlocked() and
would have to do the same for all the open functions etc.

> According to my quick & dirty git-log analysis, at the current pace of
> BKL removal we'd have to wait more than 10 years to remove most BKL
> critical sections from the kernel and to get acceptable latencies again.

Hmm, is the BKL really still so common that it's a latency problem?
The few VFS cases like locks can be fixed without extreme measures.

Most of the legacy users are unlikely to be latency problems,
simply because only very few people (or nobody) still have that hardware
and the code will never run.

Also I wouldn't lose sleep over e.g. letting ISDN continue to use the BKL forever.

-Andi