Re: [PATCH v10 1/2] printk: Make printk() completely async

From: Petr Mladek
Date: Tue Aug 16 2016 - 05:04:40 EST


On Mon 2016-08-15 17:26:50, Vladislav Levenetz wrote:
> On 08/12/2016 12:44 PM, Petr Mladek wrote:
> >But I was curious if we could hit a printk from the wake_up_process().
> >The change above causes using the fair scheduler and there is
> >the following call chain [*]
> >
> > vprintk_emit()
> > -> wake_up_process()
> > -> try_to_wake_up()
> > -> ttwu_queue()
> > -> ttwu_do_activate()
> > -> ttwu_activate()
> > -> activate_task()
> > -> enqueue_task()
> > -> enqueue_task_fair() via p->sched_class->enqueue_task
> > -> cfs_rq_of()
> > -> task_of()
> > -> WARN_ON_ONCE(!entity_is_task(se))
> >
> >We should never trigger this because printk_kthread is a task.
> >But what if the date gets inconsistent?
> >
> >Then there is the following chain:
> >
> > vprintk_emit()
> > -> wake_up_process()
> > -> try_to_wake_up()
> > -> ttwu_queue()
> > -> ttwu_do_activate()
> > -> ttwu_activate()
> > -> activate_task()
> > -> enqueue_task()
> > -> enqueue_task_fair() via p->sched_class->enqueue_task
> > ->hrtick_update()
> > -> hrtick_start_fair()
> > -> WARN_ON(task_rq(p) != rq)
> >
> >This looks like another paranoid consistency check that might be
> >triggered when the scheduler gets messed.
> >
> >I see few possible solutions:
> >
> >1. Replace the WARN_ONs by printk_deferred().
> >
> > This is the usual solution but it would make debugging less convenient.
> >
> >
> >2. Force synchronous printk inside WARN()/BUG() macros.
> >
> > This would make sense even from other reasons. These are printed
> > when the system is in a strange state. There is no guarantee that
> > the printk_kthread will get scheduled.
> >
> >
> >3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU
> > printk_func.
> >
> > It might be elegant. But we do not want this outside the scheduler
> > code. Therefore we would need special variants of WARN_*_SCHED()
> > BUG_*_SCHED() macros.
> >
> >
> >I personally prefer the 2nd solution. What do you think about it,
> >please?
> >
> >
> >Best Regards,
> >Petr
>
> Hi Petr,
>
> Sorry with for the late reply.

No problem.

> Hitting a WARN()/BUG() from wake_up calls will lead to a deadlock if
> only a single CPU is running.

I think that the deadlock might happen also with more CPUs if
the async_printk() is enabled. I mean:

printk_emit()
wake_up_process()
try_to_wake_up()
raw_spin_lock_irqsave(&p->pi_lock, flags) !!!!
ttwu_queue()
ttwu_do_activate()
ttwu_activate()
activate_task()
enqueue_task()
enqueue_task_fair() via p->sched_class->enqueue_task
hrtick_update()
hrtick_start_fair()
WARN_ON(task_rq(p) != rq)
printk()
vprintk_emit()
wake_up_process()
try_to_wake_up()
raw_spin_lock_irqsave(&p->pi_lock,
flags)

There is a deadlock because p->pi_lock is already taken by
the first try_to_wake_up().

By other words, I think that the single running CPU was only
a symptom but it was not the root cause of the deadlock.

> We already had such a situation with system suspend. During a
> specific test on our device sometimes we hit a WARN from the time
> keeping core. (Cannot recall which one exactly. Viresh have it) from
> a printk wake_up path during system suspend and with already only
> one CPU running.
> So we were forced to make printing synchronous in the suspend path
> prior disabling all non-boot cpu's.
>
> Your solution number 2) sounds reasonable to me.

Good.

> I'm wondering if we could hit a WARN()/BUG() somewhere from the fair
> scheduler like the example you made for the RT sched?

Unfortunately, it looks like. The example above actually is from
the fair scheduler.

Best Regards,
Petr