Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
From: Petr Mladek
Date: Fri Dec 15 2017 - 03:32:24 EST
On Fri 2017-12-15 14:06:07, Sergey Senozhatsky wrote:
> Hello,
>
> On (12/14/17 22:18), Steven Rostedt wrote:
> > > Steven, your approach works ONLY when we have the following preconditions:
> > >
> > > a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> > > etc) context
> > >
> > > what guarantees that? what happens if there is NO non-atomic
> > > CPU, or if that non-atomic CPU simply misses the console_owner != false
> > > point? are we going to conclude
> > >
> > > "if printk() doesn't work for you, it's because you are holding it wrong"?
> > >
> > >
> > > what if that non-atomic CPU does not call printk(), but instead
> > > it does console_lock()/console_unlock()? why is there no handoff then?
> > >
> > > CPU0                          CPU1 ~ CPU10
> > >                               in atomic contexts [!], ping-ponging
> > >                               console_sem ownership to each other,
> > >                               while what they really need to do is
> > >                               to simply up() and let CPU0 handle it.
> > >
> > >                               printk
> > > console_lock()
> > >  schedule()
> > >                               ...
> > >                               printk
> > >                               printk
> > >                               ...
> > >                               printk
> > >                               printk
> > >
> > >                               up()
> > >
> > > // woken up
> > > console_unlock()
> > >
> > > why do we put the emphasis on fixing vprintk_printk()?
Is the above scenario really dangerous? The console_lock() owner is
able to sleep, therefore there is no risk of a softlockup. Sure, many
messages will get stacked in the meantime and the console ownership
may then get passed to another owner in atomic context. But do you
really see this in real life?
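
Just to make sure that we are talking about the same thing: the
hand-off in Steven's patch works roughly like the sketch below. This
is only an illustration; the names and the locking are simplified
and the real patch handles more corner cases (interrupts, recursion,
and so on):

static DEFINE_RAW_SPINLOCK(console_owner_lock);
static struct task_struct *console_owner;
static bool console_waiter;

/* Printing side, inside the console_unlock() print loop: */

	raw_spin_lock(&console_owner_lock);
	console_owner = current;		/* advertise: "I am printing" */
	raw_spin_unlock(&console_owner_lock);

	call_console_drivers(ext_text, ext_len, text, len);

	raw_spin_lock(&console_owner_lock);
	waiter = READ_ONCE(console_waiter);	/* did somebody ask for it? */
	console_owner = NULL;
	raw_spin_unlock(&console_owner_lock);

	if (waiter) {
		WRITE_ONCE(console_waiter, false);
		/* The spinning printk() caller now owns console_sem
		 * and continues printing; we leave without up(). */
		return;
	}

/* printk() side, when console_sem is already taken: */

	raw_spin_lock(&console_owner_lock);
	if (console_owner && console_owner != current)
		WRITE_ONCE(console_waiter, true);	/* request hand-off */
	raw_spin_unlock(&console_owner_lock);

	while (READ_ONCE(console_waiter))
		cpu_relax();		/* spin until the owner hands over */

The important point is that the hand-off only happens when another
printk() caller is actively spinning on the owner. If nobody spins,
the current owner keeps printing everything alone, which is exactly
the situation discussed above.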
> > Where do we do the above? And has this been proven to be an issue?
>
> um... hundreds of cases.
>
> I work for a company that has several thousand engineers spread
> across the globe. and people do use printk(), and issues do happen.
Do people see these issues with the current upstream printk(), or
even with Steven's patch applied?
My current view is that Steven's patch cannot make things worse.
I was afraid of a possible deadlock, but it seems that I was wrong.
Other than that, the patch should make things strictly better,
because it allows the owner to pass the printing work on, from time
to time, in a safe way. Of course, there is a chance that the work
gets passed from a safe context to an atomic one. But there was the
same chance that the work had already started in the atomic context.
Therefore, statistically, this should not make things worse.
This is why I suggest starting with Steven's solution. If people
still see problems in real life, then we can think about how to fix
them. It is quite likely that we will need to add offloading to
kthreads in the end, but there is a chance...
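
For the record, the offloading itself boils down to something like
the following sketch. The names printk_kthread_func, printk_pending,
and printk_kthread_wait are only illustrative; the real series is
more careful about when to offload and when to flush directly:

static bool printk_pending;
static DECLARE_WAIT_QUEUE_HEAD(printk_kthread_wait);

static int printk_kthread_func(void *data)
{
	for (;;) {
		wait_event_interruptible(printk_kthread_wait,
					 READ_ONCE(printk_pending));
		WRITE_ONCE(printk_pending, false);

		/* console_unlock() flushes everything that has been
		 * stored into the log buffer in the meantime. */
		console_lock();
		console_unlock();
	}
	return 0;
}

/* printk() in atomic context would then only store the message
 * into the log buffer and kick the thread: */

	WRITE_ONCE(printk_pending, true);
	wake_up(&printk_kthread_wait);

/* The thread would be created during boot, e.g.: */

	kthread_run(printk_kthread_func, NULL, "printk");

The loop itself is trivial. The hard part, and the source of the
complexity in the patchset, is deciding when it is safe to rely on
the scheduler to run this thread at all.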
In any case, I think it is better to split this into two or more
steps than to introduce one mega-complex change. And given the
many years of resistance against offloading, I tend to start with
Steven's approach.
Does this make sense?
Best Regards,
Petr