Re: INFO: rcu detected stall in ext4_write_checks

From: Theodore Ts'o
Date: Sun Jul 14 2019 - 15:05:59 EST


On Sun, Jul 14, 2019 at 05:48:00PM +0300, Dmitry Vyukov wrote:
> But short term I don't see any other solution than stop testing
> sched_setattr because it does not check arguments enough to prevent
> system misbehavior. Which is a pity because syzkaller has found some
> bad misconfigurations that were oversight on checking side.
> Any other suggestions?

Or maybe syzkaller can put its own limitations on what parameters are
sent to sched_setattr? In practice, there are any number of ways a
root user can shoot themselves in the foot when using sched_setattr or
sched_setaffinity, for that matter. I imagine there must be some such
constraints already --- or else syzkaller might have set a kernel
thread to run with priority SCHED_BATCH, with similar catastrophic
effects --- or do similar configurations to make system threads
completely unschedulable.

Real time administrators who know what they are doing --- and who know
that their real-time threads are well behaved --- will always want to
be able to do things that will be catastrophic if the real-time thread
is *not* well behaved. I don't it is possible to add safety checks
which would allow the kernel to automatically detect and reject unsafe
configurations.

An apt analogy might be civilian versus military aircraft. Most
airplanes are designed to be "inherently stable"; that way, modulo
buggy/insane control systems like on the 737 Max, the airplane will
automatically return to straight and level flight. On the other hand,
some military planes (for example, the F-16, F-22, F-36, the
Eurofighter, etc.) are sometimes designed to be unstable, since that
way they can be more maneuverable.

There are use cases for real-time Linux where this flexibility/power
vs. stability tradeoff is going to argue for giving root the
flexibility to crash the system. Some of these systems might
literally involve using real-time Linux in military applications,
something for which Paul and I have had some experience. :-)

Speaking of sched_setaffinity, one thing which we can do is have
syzkaller move all of the system threads to they run on the "system
CPU's", and then move the syzkaller processes which are testing the
kernel to be on the "system under test CPU's". Then regardless of
what priority the syzkaller test programs try to run themselves at,
they can't crash the system.

Some real-time systems do actually run this way, and it's a
recommended configuration which is much safer than letting the
real-time threads take over the whole system:

http://linuxrealtime.org/index.php/Improving_the_Real-Time_Properties#Isolating_the_Application

- Ted