Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

From: Tetsuo Handa
Date: Wed Dec 20 2017 - 07:06:57 EST


Sergey Senozhatsky wrote:
> Steven said that this scenario is possible, but is not of any particular
> interest, because printk from IRQ or from any other atomic context is a
> bad thing, which should happen only when something wrong is going on in
> the system. but we are in OOM or has just returned from the OOM. which _is_
> "something bad going on", isn't it? can we instead say - OOM makes that
> printk from atomic context more likely? if it does happen, will there be
> non-atomic printk-s to take over printing from atomic CPUz? we can't tell.
> I don't know much about Tetsuo's test, but I assume that his VM does not
> have any networking activities during the test. I probably wouldn't be so
> surprised to see a bunch of printk-s from atomic contexts under OOM.

I'm using VMware Workstation Player, and my VM does not have any network
activity other than an ssh login session. Fortunately, VMware's serial console
(written to a file on the host) is reliable enough to allow a
console=ttyS0,115200n8 configuration. But there is virtualization software
whose serial console is so weak that I have to choose netconsole instead. Also,
there are enterprise servers where a very slow speed (e.g. 1200 or 9600 baud)
has to be used for the serial console because the serial device is emulated
using system management interrupts instead of real hardware. Therefore, while
it is true that any approach would survive in my environment, it is dangerous
to assume that any approach is safe for my customer's enterprise servers.
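
To put rough numbers on the difference between these console speeds, here is a
minimal sketch. It assumes 8N1 framing (1 start + 8 data + 1 stop bit, i.e. 10
bits on the wire per character), which is what the "n8" in
console=ttyS0,115200n8 implies. The function names and the 80-character line
length used below are assumptions chosen for illustration, and the result is
only an upper bound that ignores SMI emulation overhead.

```c
/* Bits on the wire per character with 8N1 framing: 1 start + 8 data + 1 stop. */
#define BITS_PER_CHAR_8N1 10

/* Upper bound on the characters per second a serial console can emit. */
unsigned int serial_chars_per_sec(unsigned int baud)
{
	return baud / BITS_PER_CHAR_8N1;
}

/*
 * Milliseconds needed to push one line of line_len characters, rounded up.
 * Returns 0 if the baud rate is too low to send even one character per second.
 */
unsigned int serial_line_ms(unsigned int baud, unsigned int line_len)
{
	unsigned int cps = serial_chars_per_sec(baud);

	if (cps == 0)
		return 0;
	return (line_len * 1000 + cps - 1) / cps;
}
```

With these assumptions, serial_line_ms(1200, 80) is 667 while
serial_line_ms(115200, 80) is 7, so a burst of messages that is harmless at
115200 can keep a 1200 baud console busy for roughly a hundred times longer.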

Thanks for summarizing the pointers. The safest way to avoid overflowing
printk() would be to use mutex_lock(&oom_lock) in __alloc_pages_may_oom() (and
yield the CPU to the thread flushing the logbuf), but so far we have not come
to an agreement. Fortunately, since warn_alloc() for reporting allocation
stalls was killed in 4.15-rc1, the risk of overflowing printk() under OOM has
been reduced a lot. But yes, since my VM has little network activity, printk()
flooding due to allocation failures might happen in different VMs.

Anyway, the rule "do not try to printk() faster than the kernel can write to
consoles" will remain no matter how printk() changes. I think that all
printk() users have to be careful not to waste CPU resources. MM's direct
reclaim + back off combination is a user that really loves to waste CPU
resources while someone is printk()ing.
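
One common way for a caller to respect that rule is to cap how many messages it
emits per time window. The sketch below is a simplified userspace model of that
idea, not the kernel's implementation (in the kernel, printk_ratelimited() is
the usual tool). The struct, its fields, and the function names are
hypothetical, and time is passed in as an abstract tick count so that the
behaviour is deterministic.

```c
#include <stdbool.h>

/* Hypothetical rate limiter state: allow at most burst messages per interval. */
struct msg_ratelimit {
	unsigned long interval;	/* window length, in caller-defined ticks */
	unsigned int burst;	/* messages allowed per window */
	unsigned long begin;	/* tick at which the current window started */
	unsigned int printed;	/* messages allowed so far in this window */
	unsigned int missed;	/* messages suppressed so far in this window */
};

/* Start the first window at tick now. */
void msg_ratelimit_init(struct msg_ratelimit *rl, unsigned long interval,
			unsigned int burst, unsigned long now)
{
	rl->interval = interval;
	rl->burst = burst;
	rl->begin = now;
	rl->printed = 0;
	rl->missed = 0;
}

/* Return true if the caller may emit a message at tick now. */
bool msg_ratelimit_ok(struct msg_ratelimit *rl, unsigned long now)
{
	/* Open a fresh window once the current one has expired. */
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	/* Over budget: count the suppressed message instead of printing it. */
	rl->missed++;
	return false;
}
```

A caller would check msg_ratelimit_ok() before each message and skip the
message when it returns false. The interval and burst would have to be chosen
with the slowest expected console in mind, and this only bounds the output of a
single caller; it does not address the CPU time burned by other threads that
keep retrying while the message is being written.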