Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
From: Sergey Senozhatsky
Date: Wed Nov 08 2017 - 00:29:46 EST
Hello Tejun,
On (11/07/17 05:23), Tejun Heo wrote:
> Hello, Sergey.
>
> On Tue, Nov 07, 2017 at 11:04:34AM +0900, Sergey Senozhatsky wrote:
> > just to make sure. there is a typo in Steven's patch:
> >
> > while (!READ_ONCE(console_waiter))
> >
> > should be
> >
> > while (READ_ONCE(console_waiter))
> >
> > is this the "tweaking" you are talking about?
>
> Oh, I was talking about tweaking the repro, but I'm not sure the above
> would change anything. The problem that the repro demonstrates is a
> message deluge involving a non-sleepable flusher + local irq (or
> other atomic contexts) message producer.
>
> In the above case, none of the involved contexts can serve as the
> flusher for a long time without messing up the system. If you wanna
> allow printks to be async without falling into these lockups, you
> gotta introduce an independent safe context to flush from.
we are in agreement.
I Cc-ed you to another thread, let's merge discussions.
> > > > there are some concerns, like a huge number of printk-s happening while
> > > > console_sem is locked. e.g. console_lock()/console_unlock() on one of the
> > > > CPUs, or console_lock(); printk(); ... printk(); console_unlock();
> > >
> > > Unless we make all messages fully synchronous, I don't think there's a
> > > good solution for that and I don't think we wanna make everything
> > > fully synchronous.
> >
> > this is where it becomes complicated. offloading logic is not binary,
> > unfortunately. we normally want to offload; but not always. things
> > like sysrq or late PM warnings, or kexec, etc. want to stay fully sync,
> > regardless of the consequences. some of the sysrq printouts even do
> > touch_nmi_watchdog() and touch_all_softlockup_watchdogs(). current
> > printk-kthread patch set tries to consider those cases and to avoid
> > any offloading.
>
> Yeah, sure, selectively opting out of asynchronous operation is a
> different (solvable) issue. Also, just to be clear, the proposed
> patch doesn't make any of these worse in any meaningful way - e.g. we
> could end up trapping a nice 20 task pinned to an overloaded CPU in
> the flusher role.
>
> The following is a completely untested patch to show how we can put
> the console in full sync mode, just the general idea. I'm a bit
> skeptical we really wanna do this given that we already (with or
> without the patch) stay sync for most of these events due to the way
> we go async, but, yeah, if we wanna do that, we can do that.
we've been going in a slightly different direction in printk-kthread.
we keep printk sync by default [as opposed to the previous "immediately
offload" approach]. people asked for it, some people demanded it. we
offload to printk-kthread only when we detect that this particular
task on this particular CPU has been printing (without rescheduling)
for 1/2 of the watchdog threshold value. IOW, if we see that we are heading
towards the lockup limit then we offload. otherwise we let it loop in
console_unlock().
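
to illustrate the offloading decision, here is a rough userspace sketch.
it is not the printk-kthread code: struct printing_state, printing_reset(),
should_offload() and the force_sync flag are all hypothetical names, and
the clock is just a nanosecond counter passed in by the caller. the only
things taken from the description above are the "1/2 of watchdog
threshold" cut-off and the fact that some paths (sysrq, kexec, etc.) must
never offload.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * hypothetical per-printer state (illustration only, not from the
 * printk-kthread patch set).
 */
struct printing_state {
	/* when this task started printing on this CPU without rescheduling */
	uint64_t start_ns;
};

/* call when a new flush starts or after the task has been rescheduled */
static void printing_reset(struct printing_state *ps, uint64_t now_ns)
{
	ps->start_ns = now_ns;
}

/*
 * returns true when the printing should be handed over to the
 * offloading context: the caller has been printing for at least
 * half of the watchdog threshold. force_sync models the paths that
 * must stay fully synchronous and therefore never offload.
 */
static bool should_offload(const struct printing_state *ps, uint64_t now_ns,
			   uint64_t watchdog_thresh_ns, bool force_sync)
{
	if (force_sync)
		return false;

	return now_ns - ps->start_ns >= watchdog_thresh_ns / 2;
}
```

a flusher would call printing_reset() when it starts printing, then check
should_offload() between messages; as long as it returns false, the task
keeps looping in its console_unlock()-like loop.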
-ss