Re: [Query] Preemption (hogging) of the work handler

From: Viresh Kumar
Date: Mon Jul 11 2016 - 15:03:31 EST


Hi Jan,

On 11-07-16, 12:26, Jan Kara wrote:
> Yes. We have similar problems as you observe on machines when they do a lot
> of printing (usually due to device discovery or similar reasons). The
> problem is not fully solved even upstream as Andrew is reluctant to merge
> the patches. Sergey (added to CC) has the latest version of the series [1].

Yeah, I saw these patches last Thursday. I backported all the printk patches
between 3.10 and mainline to my 3.10 branch and applied your patches on top.

It did work for my case (thanks) and I wanted to give a Tested-by on the thread
[1], but by that time it was late Friday for me :)

I did see an issue with that, though.

[ 12.874909] sched: RT throttling activated for rt_rq ffffffc0ac13fcd0 (cpu 0)
[ 12.874909] potential CPU hogs:
[ 12.874909] printk (292)

On my system, the excessive printing happens during suspend/resume, and this
happened after all the non-boot CPUs were offlined. So only CPU 0 was left, and
it was printing for a long time, hence these errors :)
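
As a rough illustration of why a printing thread at RT priority gets throttled
on a single CPU, here is a simplified sketch of the RT bandwidth arithmetic. The
values are assumptions: they are the usual defaults of the
kernel.sched_rt_period_us and kernel.sched_rt_runtime_us sysctls, not numbers
taken from this system, and the helper name is hypothetical.

```python
# Simplified model of RT throttling on one CPU (no runtime borrowing
# from other CPUs, which matches the "only CPU 0 online" situation).

# Assumed defaults of kernel.sched_rt_period_us / kernel.sched_rt_runtime_us.
ASSUMED_PERIOD_US = 1_000_000
ASSUMED_RUNTIME_US = 950_000


def rt_throttle_window_us(runtime_us: int, period_us: int) -> int:
    """Return how long (in microseconds) RT tasks stay throttled per period.

    A negative runtime means throttling is disabled, and a runtime that
    covers the whole period never triggers throttling either.
    """
    if runtime_us < 0 or runtime_us >= period_us:
        return 0
    return period_us - runtime_us


if __name__ == "__main__":
    window = rt_throttle_window_us(ASSUMED_RUNTIME_US, ASSUMED_PERIOD_US)
    print(f"RT tasks throttled for {window} us of every {ASSUMED_PERIOD_US} us")
```

Under these assumptions, an RT thread that never sleeps uses up its budget in
each period and is then held off the CPU for the remainder of that period.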

It eventually resulted in some print messages going missing, probably because
the scheduler didn't schedule this thread for some time after that.

Would it be fine to lower the priority of this kthread somewhat, or something
along those lines?

> If you are interested, I can send you the patches for 3.12 kernel which we
> carry in SLES kernels and which fix the issue for us. It is significantly
> different from current upstream version but it works good enough for us.

Thanks, that will be a good thing to have. I am currently backporting 100+
patches from 3.10 to mainline for printk :)

Please send them to me (and please make sure that you send all the patches
touching drivers/printk/ after 3.12, so that I am not left solving merge
conflicts forever :).

--
viresh