Re: [RFC] kdb: Switch kdb_printf to use safer console poll APIs
From: Sumit Garg
Date: Tue May 26 2020 - 03:46:32 EST
On Fri, 22 May 2020 at 21:33, Daniel Thompson
<daniel.thompson@xxxxxxxxxx> wrote:
>
> On Fri, May 22, 2020 at 08:04:31PM +0530, Sumit Garg wrote:
> > In kgdb NMI context, polling driver APIs are more safer to use instead
> > of console APIs since the polling drivers know they will execute from
> > all sorts of crazy places. And for the most common use cases this would
> > also result in no console handler ever being called. So switch to use
> > polling driver APIs in case a particular console supports polling mode.
>
> This comment seems rather half hearted, not least because it doesn't
> explain what the current problem is nor why using the polling API is
> safer.
>
TBH, some sentences in the above comment were borrowed from your
suggestion here [1]. But I agree that it doesn't portray the complete
picture. So how about:
====
In kgdb NMI context, calling console handlers isn't safe due to locks
used in those handlers which could lead to a deadlock. Although, using
oops_in_progress increases the chance to bypass locks in most console
handlers but it might not be sufficient enough in case a console uses
more locks (VT/TTY is good example).
So instead switch to use lockless polling driver APIs in case a
particular console supports polling mode which is common for most kdb
use-cases and would result in no console handler ever being called.
====
[1] https://lkml.org/lkml/2020/5/20/356
> Compare the above against the advice in
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
> and I think it comes up short. Perhaps also consider Ingo Molnar's much
> more concise suggestion on describing changes:
>
> : Please use the customary changelog style we use in the kernel:
> : " Current code does (A), this has a problem when (B).
> : We can improve this doing (C), because (D)."
> -- http://lkml.iu.edu/hypermail//linux/kernel/1311.1/01157.html
Thanks for the pointers.
>
>
> > Suggested-by: Daniel Thompson <daniel.thompson@xxxxxxxxxx>
> > Signed-off-by: Sumit Garg <sumit.garg@xxxxxxxxxx>
> > ---
> > kernel/debug/kdb/kdb_io.c | 39 +++++++++++++++++++++++++++++++++------
> > 1 file changed, 33 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
> > index 3a5a068..8e0d581 100644
> > --- a/kernel/debug/kdb/kdb_io.c
> > +++ b/kernel/debug/kdb/kdb_io.c
> > @@ -24,6 +24,7 @@
> > #include <linux/kgdb.h>
> > #include <linux/kdb.h>
> > #include <linux/kallsyms.h>
> > +#include <linux/tty_driver.h>
> > #include "kdb_private.h"
> >
> > #define CMD_BUFLEN 256
> > @@ -699,11 +700,24 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
> > }
> > }
> > for_each_console(c) {
> > + int line;
> > + struct tty_driver *p;
> > +
> > if (!(c->flags & CON_ENABLED))
> > continue;
> > - ++oops_in_progress;
> > - c->write(c, cp, retlen - (cp - kdb_buffer));
> > - --oops_in_progress;
> > + p = c->device ? c->device(c, &line) : NULL;
> > + if (p && p->ops && p->ops->poll_put_char) {
>
> What prevents this logic from matching an active console that hasn't
> been selected as the polling driver?
Yes you are correct and it could lead to invoking poll_put_char()
without poll_init(). And we couldn't invoke poll_init() here as that
still comes with locks and could sleep. So one way to overcome this
would be to pass selected polling driver via dbg_io_ops and use
polling APIs only if the underlying console driver matches that
polling driver.
>
>
> > + len = retlen - (cp - kdb_buffer);
> > + cp2 = cp;
> > + while (len--) {
> > + p->ops->poll_put_char(p, line, *cp2);
> > + cp2++;
> > + }
>
> Assuming it is possible to identify the console that matches the
> currently selected polling driver can't we just drop the
> is_console test and get rid of this branch entirely.
Have a look at my suggested approach above.
>
> The only reason for the is_console test is to avoid issuing messages
> twice so if we are able to suppress the c->write() for the same UART
> then is_console check becomes pointless and can go.
I did consider removing is_console check but it looks like it's not
only limited to polling drivers but also used at other places (see [1]
[2]) as well.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/early/ehci-dbgp.c#n1061
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/tty/serial/kgdb_nmi.c#n48
>
>
> > + } else {
> > + ++oops_in_progress;
> > + c->write(c, cp, retlen - (cp - kdb_buffer));
> > + --oops_in_progress;
> > + }
> > touch_nmi_watchdog();
> > }
> > }
> > @@ -765,11 +779,24 @@ int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
> > }
> > }
> > for_each_console(c) {
> > + int line;
> > + struct tty_driver *p;
> > +
> > if (!(c->flags & CON_ENABLED))
> > continue;
> > - ++oops_in_progress;
> > - c->write(c, moreprompt, strlen(moreprompt));
> > - --oops_in_progress;
> > + p = c->device ? c->device(c, &line) : NULL;
> > + if (p && p->ops && p->ops->poll_put_char) {
> > + len = strlen(moreprompt);
> > + cp = moreprompt;
> > + while (len--) {
> > + p->ops->poll_put_char(p, line, *cp);
> > + cp++;
> > + }
> > + } else {
> > + ++oops_in_progress;
> > + c->write(c, moreprompt, strlen(moreprompt));
> > + --oops_in_progress;
> > + }
>
> Maybe also consider pulling the string emit to a separate function.
>
Okay.
-Sumit
>
> Daniel.