Re: INFO: rcu detected stall in ext4_write_checks

From: Paul E. McKenney
Date: Mon Jul 15 2019 - 10:03:40 EST


On Mon, Jul 15, 2019 at 03:39:38PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 15, 2019 at 06:01:01AM -0700, Paul E. McKenney wrote:
> > Title: Making SCHED_DEADLINE safe for kernel kthreads
> >
> > Abstract:
> >
> > Dmitry Vyukov's testing work identified some (ab)uses of sched_setattr()
> > that can result in SCHED_DEADLINE tasks starving RCU's kthreads for
> > extended time periods, not millisecond, not seconds, not minutes, not even
> > hours, but days. Given that RCU CPU stall warnings are issued whenever
> > an RCU grace period fails to complete within a few tens of seconds,
> > the system did not suffer silently. Although one could argue that people
> > should avoid abusing sched_setattr(), people are human and humans make
> > mistakes. Responding to simple mistakes with RCU CPU stall warnings is
> > all well and good, but a more severe case could OOM the system, which
> > is a particularly unhelpful error message.
> >
> > It would be better if the system were capable of operating reasonably
> > despite such abuse. Several approaches have been suggested.
> >
> > First, sched_setattr() could recognize parameter settings that put
> > kthreads at risk and refuse to honor those settings. This approach
> > of course requires that we identify precisely what combinations of
> > sched_setattr() parameters settings are risky, especially given that there
> > are likely to be parameter settings that are both risky and highly useful.
>
> So we (the people poking at the DEADLINE code) are all aware of this,
> and on the TODO list for making DEADLINE available for !priv users is
> the item:
>
> - put limits on deadline/period
>
> And note that that is both an upper and lower limit. The upper limit
> you've just found why we need it, the lower limit is required because
> you can DoS the hardware by causing deadlines/periods that are equal (or
> shorter) than the time it takes to program the hardware.
>
> There might have even been some patches that do some of this, but I've
> held off because we have bigger problems and they would've established
> an ABI while it wasn't clear it was sufficient or the right form.

So I should withdraw the proposal?

Thanx, Paul