Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async

From: Tetsuo Handa
Date: Sun Mar 06 2016 - 06:06:21 EST


Sergey Senozhatsky wrote:
> On (03/06/16 16:18), Tetsuo Handa wrote:
> > Moreover, I don't like use of a workqueue even if printk_wq was allocated
> > with WQ_MEM_RECLAIM bit. As you can see in the discussion of the OOM reaper,
> > the OOM reaper chose a dedicated kernel thread rather than a workqueue
> > ( http://lkml.kernel.org/r/1454505240-23446-2-git-send-email-mhocko@xxxxxxxxxx ).
> >
> > Blocking actual printing until ongoing workqueue item calls schedule_timeout_*()
> > is not nice (see commit 373ccbe59270 and 564e81a57f97).
>
> do you mean a new worker allocation delay and a MAYDAY timer delay?
>

I don't know what MAYDAY is. I'm talking about a situation where printing_work
work item is not processed (i.e. printing_work_func() is not called) until
current work item calls schedule_timeout_*().

We had a problem that since vmstat_work work item was using system_wq,
vmstat_work work item was not processed (i.e. vmstat_update() was not called) if
kworker was looping inside memory allocator without calling schedule_timeout_*()
due to disk_events_workfn() doing GFP_NOIO allocation).

----------
[ 271.579276] MemAlloc: kworker/0:56(7399) gfp=0x2400000 order=0 delay=129294
[ 271.581237] ffff88007c78fa08 ffff8800778f8c80 ffff88007c790000 ffff8800778f8c80
[ 271.583329] 0000000002400000 0000000000000000 ffff8800778f8c80 ffff88007c78fa20
[ 271.585391] ffffffff8162aa9d 0000000000000001 ffff88007c78fa30 ffffffff8162aac7
[ 271.587463] Call Trace:
[ 271.588512] [<ffffffff8162aa9d>] preempt_schedule_common+0x18/0x2b
[ 271.590243] [<ffffffff8162aac7>] _cond_resched+0x17/0x20
[ 271.591830] [<ffffffff8111fafe>] __alloc_pages_nodemask+0x64e/0xcc0
[ 271.593561] [<ffffffff8116a3b2>] ? __kmalloc+0x22/0x190
[ 271.595119] [<ffffffff81160ce7>] alloc_pages_current+0x87/0x110
[ 271.596778] [<ffffffff812e95f4>] bio_copy_kern+0xc4/0x180
[ 271.598342] [<ffffffff810a6a00>] ? wait_woken+0x80/0x80
[ 271.599878] [<ffffffff812f25f0>] blk_rq_map_kern+0x70/0x130
[ 271.601481] [<ffffffff812ece35>] ? blk_get_request+0x75/0xe0
[ 271.603100] [<ffffffff814433fd>] scsi_execute+0x12d/0x160
[ 271.604657] [<ffffffff81443524>] scsi_execute_req_flags+0x84/0xf0
[ 271.606339] [<ffffffffa01db742>] sr_check_events+0xb2/0x2a0 [sr_mod]
[ 271.608141] [<ffffffff8109cbfc>] ? set_next_entity+0x6c/0x6a0
[ 271.609830] [<ffffffffa01cf163>] cdrom_check_events+0x13/0x30 [cdrom]
[ 271.611610] [<ffffffffa01dbb85>] sr_block_check_events+0x25/0x30 [sr_mod]
[ 271.613429] [<ffffffff812fc7eb>] disk_check_events+0x5b/0x150
[ 271.615065] [<ffffffff812fc8f1>] disk_events_workfn+0x11/0x20
[ 271.616699] [<ffffffff810827c5>] process_one_work+0x135/0x310
[ 271.618321] [<ffffffff81082abb>] worker_thread+0x11b/0x4a0
[ 271.620018] [<ffffffff810829a0>] ? process_one_work+0x310/0x310
[ 271.622022] [<ffffffff81087e53>] kthread+0xd3/0xf0
[ 271.623533] [<ffffffff81087d80>] ? kthread_create_on_node+0x1a0/0x1a0
[ 271.625487] [<ffffffff8162f09f>] ret_from_fork+0x3f/0x70
[ 271.627175] [<ffffffff81087d80>] ? kthread_create_on_node+0x1a0/0x1a0
----------

Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item,
printing_work_func() will not be called until current work item calls
schedule_timeout_*(). That will be an undesirable random delay. If you use
a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM,
we can avoid this random delay.

> > > +static void printing_work_func(struct work_struct *work)
> > > +{
> > > + console_lock();
> > > + console_unlock();
> > > +}
> >
> > Is this safe? If somebody invokes the OOM killer between console_lock()
> > and console_unlock(), won't this cause OOM killer messages not printed?
>
> if you mean something like
>
> console_lock();
> for (...) {
> do_foo() {
> ...
> pr_err(" ... foo message ...\n");
> ...
> }
> }
> console_unlock();
>
> then yes, nothing will be printed until that process executes console_unlock(),
> because it's console_unlock() that pushes the messages to console drivers.

Yes, I meant a sequence like

console_lock();
ptr = kmalloc(GFP_KERNEL);
kfree(ptr);
console_unlock();

and kmalloc() prints OOM killer messages rather than failing that allocation.
Are we sure that there is no such usage?