Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

From: Steven Rostedt
Date: Tue Jan 23 2018 - 10:41:32 EST


On Wed, 24 Jan 2018 00:21:30 +0900
Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx> wrote:

> On (01/23/18 09:56), Steven Rostedt wrote:
> [..]
> > > Why do we even use irq_work for printk_safe?
> >
> > Why not?
> >
> > Really, I think you are trying to solve a symptom and not the problem.
> > If we are having issues with irq_work, we are going to have issues with
> > a work queue. It's just spreading out the problem instead of fixing it.
>
> I don't want to have heuristics in print_safe, I don't want to have a magic
> number controlled by a user-space visible knob, I don't want to have the
> first 3 lines of a lockdep splat.

We can have more. But if printk is causing printks, that's a major bug.
And work queues are not going to fix it, it will just spread out the
pain. Have it be 100 printks, it needs to be fixed if it is happening.
And having all printks just generate more printks is not helpful. Even
if we slow them down. They will still never end.

A printk causing a printk is a special case, and we need to just show
enough to let the user know that its happening, and why printks are
being throttled. Yes, we may lose data, but if every printk that goes
out causes another printk, then there's going to be so much noise that
we wont know what other things went wrong. Honestly, if someone showed
me a report where the logs were filled with printks that caused
printks, I'd stop right there and tell them that needs to be fixed
before we do anything else. And if that recursion is happening because
of another problem, I don't want to see the recursion printks. I want
to see the printks that show what is causing the recursions.



> The problem is - we flush printk_safe too soon and printing CPU ends up
> in a lockup - it log_store()-s new messages while it's printing the pending

No, the problem is that printks are causing more printks. Yes that will
make flushing them soon more likely to lock up the system. But that's
not the problem. The problem is printks causing printks.

> ones. It's fine to do so when CPU is in preemptible context. Really, we
> should not care in printk_safe as long as we don't lockup the kernel. The
> misbehaving console must be fixed. If CPU is not in preemptible context then
> we do lockup the kernel. Because we flush printk_safe regardless of the
> current CPU context. If we will flush printk_safe via WQ then we automatically

And if we can throttle recursive printks, then we should be able to
stop that from happening.

> add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we
> will not lockup it up." thing. Yes, we fill up the logbuf with probably needed
> and appreciated or unneeded messages. But we should not care in printk_safe.
> We don't lockup the kernel... And the misbehaving console must be fixed.

I agree.

>
> I disagree with "If we are having issues with irq_work, we are going to have
> issues with a work queue". There is a tremendous difference between irq_work
> on that CPU and queue_work_on(smp_proessor_id()). One does not care about CPU
> context, the other one does.

But switching to work queue does not address the underlining problem
that printks are causing more printks.

-- Steve