Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

From: Steven Rostedt
Date: Thu Jan 11 2018 - 21:55:57 EST


On Thu, 11 Jan 2018 20:30:57 -0500
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> I have to say that your analysis here really does point out the benefit
> of my patch.
>
> Today, printk() can print for a time of A * B, where, as you state
> above:
>
> A is the amount of data to print in the worst case
> B the time call_console_drivers() needs to print a single
> char to all registered and enabled consoles
>
> In the worse case, the current approach is A is infinite. That is,
> printk() never stops, as long as there is a printk happening on another
> CPU before B can finish. A will keep growing. The call to printk() will
> never return. The more CPUs you have, the more likely this will occur.
> All it takes is a few CPUs doing periodic printks. If there is a slow
> console, where the periodic printk on other CPUs occur quicker than the
> first can finish, the first one will be stuck forever. Doesn't take
> much to have this happen.
>
> With my patch, A is fixed to the size of the buffer. A single printk()
> can never print more than that. If another CPU comes in and does a
> printk, then it will take over the task of printing, and release the
> first printk.

In fact, below is a module I made (starting with Tejun's crazy stress
test, then removing all the craziness). This simple module locks up the
system without my patch. After applying my patch, the system runs fine.

All I did was start off a work queue on each CPU, and each CPU does one
printk() followed by a millisecond sleep. No 10,000 printks, nothing
in an interrupt handler. Preemption is disabled while the printk
happens, but that's normal.

This is much closer to an OOM happening all over the system, where OOMs
stack dumps are occurring on different CPUS.

I ran this on a box with 4 CPUs and a serial console (so it has a slow
console). Again, all I have is each CPU doing exactly ONE printk()!
then sleeping for a full millisecond! It will cause a lot of output,
and perhaps slow the system down. But it should not lock up the system.
But without my patch, it does!

Try it!

Test it on a box, and it will lock up. Then add my patch and see what
the results are. I think this speaks very loudly in favor of applying
my patch.

Again, the below module locks up my system immediately without my
patch. With my patch, no problem. In fact, it's still running, while I
wrote this email, and it hardly shows a slow down in the system.


-- Steve

#include <linux/module.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/hrtimer.h>

static bool stop_testing;

static void preempt_printk_workfn(struct work_struct *work)
{
while (!READ_ONCE(stop_testing)) {
preempt_disable();
printk("%5d%-75s\n", smp_processor_id(), " XXX PREEMPT");
preempt_enable();
msleep(1);
}
}

static struct work_struct __percpu *works;

static void finish(void)
{
int cpu;

WRITE_ONCE(stop_testing, true);
for_each_online_cpu(cpu)
flush_work(per_cpu_ptr(works, cpu));
free_percpu(works);
}

static int __init test_init(void)
{
int cpu;

works = alloc_percpu(struct work_struct);
if (!works)
return -ENOMEM;

/*
* This is just a test module. This will break if you
* do any CPU hot plugging between loading and
* unloading the module.
*/

for_each_online_cpu(cpu) {
struct work_struct *work = per_cpu_ptr(works, cpu);

INIT_WORK(work, &preempt_printk_workfn);
schedule_work_on(cpu, work);
}

return 0;
}

static void __exit test_exit(void)
{
finish();
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");