Re: [PATCH 3/7] kgdb: Add request_nmi() to the io ops table for kgdboc
From: Sumit Garg
Date: Tue Jun 30 2020 - 02:09:46 EST
On Mon, 29 Jun 2020 at 17:15, Daniel Thompson
<daniel.thompson@xxxxxxxxxx> wrote:
>
> On Fri, Jun 26, 2020 at 12:44:15PM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Jun 23, 2020 at 3:59 AM Daniel Thompson
> > <daniel.thompson@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Jun 23, 2020 at 02:07:47PM +0530, Sumit Garg wrote:
> > > > On Mon, 22 Jun 2020 at 21:33, Daniel Thompson
> > > > <daniel.thompson@xxxxxxxxxx> wrote:
> > > > > > + irq_set_status_flags(irq, IRQ_NOAUTOEN);
> > > > > > + res = request_nmi(irq, fn, IRQF_PERCPU, "kgdboc", dev_id);
> > > > >
> > > > > Why do we need IRQF_PERCPU here. A UART interrupt is not normally
> > > > > per-cpu?
> > > > >
> > > >
> > > > Have a look at this comment [1] and corresponding check in
> > > > request_nmi(). So essentially yes UART interrupt is not normally
> > > > per-cpu but in order to make it an NMI, we need to request it in
> > > > per-cpu mode.
> > > >
> > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq/manage.c#n2112
> > >
> > > Thanks! This is clear.
> > >
> > > > > > + if (res) {
> > > > > > + res = request_irq(irq, fn, IRQF_SHARED, "kgdboc", dev_id);
> > > > >
> > > > > IRQF_SHARED?
> > > > >
> > > > > Currrently there is nothing that prevents concurrent activation of
> > > > > ttyNMI0 and the underlying serial driver. Using IRQF_SHARED means it
> > > > > becomes possible for both drivers to try to service the same interrupt.
> > > > > That risks some rather "interesting" problems.
> > > > >
> > > >
> > > > Could you elaborate more on "interesting" problems?
> > >
> > > Er... one of the serial drivers we have allowed the userspace to open
> > > will, at best, be stone dead and not passing any characters.
> > >
> > >
> > > > BTW, I noticed one more problem with this patch that is IRQF_SHARED
> > > > doesn't go well with IRQ_NOAUTOEN status flag. Earlier I tested it
> > > > with auto enable set.
> > > >
> > > > But if we agree that both shouldn't be active at the same time due to
> > > > some real problems(?) then I can rid of IRQF_SHARED as well. Also, I
> > > > think we should unregister underlying tty driver (eg. /dev/ttyAMA0) as
> > > > well as otherwise it would provide a broken interface to user-space.
> > >
> > > I don't have a particular strong opinion on whether IRQF_SHARED is
> > > correct or not correct since I think that misses the point.
> > >
> > > Firstly, using IRQF_SHARED shows us that there is no interlocking
> > > between kgdb_nmi and the underlying serial driver. That probably tells
> > > us about the importance of the interlock than about IRQF_SHARED.
> > >
> > > To some extent I'm also unsure that kgdb_nmi could ever actually know
> > > the correct flags to use in all cases (that was another reason for the
> > > TODO comment about poll_get_irq() being a bogus API).
> >
> > I do wonder a little bit if the architecture of the "kgdb_nmi_console"
> > should change. I remember looking at it in the past and thinking it a
> > little weird that if I wanted to get it to work I'd need to change my
> > "console=" command line to go through this new driver and (I guess)
> > change the agetty I have running on my serial port to point to
> > ttyNMI0. Is that how it's supposed to work? Then if I want to do a
> > build without kgdb then I need to go in and change my agetty to point
> > back at my normal serial port?
> >
> > It kinda feels like a better way to much of what the driver does would be to:
> >
> > 1. Allow kgdb to sniff incoming serial bytes on a port and look for
> > its characters. We already have this feature in the kernel to a small
> > extent for sniffing a break / sysrq character.
> >
> > 2. If userspace doesn't happen to have the serial port open then
> > ideally we could open the port (using all the standard APIs that
> > already exist) from in the kernel and just throw away all the bytes
> > (since we already sniffed them). As soon as userspace tried to open
> > the port when it would get ownership and if userspace ever closed the
> > port then we'd start reading / throwing away bytes again.
> >
> > If we had a solution like that:
> >
> > a) No serial drivers would need to change.
> >
> > b) No kernel command line parameters would need to change.
> >
> > Obviously that solution wouldn't magically get you an NMI, though.
> > For that I'd presume the right answer would be to add a parameter for
> > each serial driver that can support it to run its rx interrupt in NMI
> > mode.
>
Thanks Doug for the suggestions.
> ... or allow modal changes to the uart driver when kgdboc comes up?
>
> We already allow UART drivers to de-optimize themselves and use
> different code paths when polling is enabled so its not totally crazy
> ;-).
>
>
> > Of course, perhaps I'm just confused and crazy and the above is a
> > really bad idea.
>
> Thanks for bringing this up.
>
> Sumit and I were chatting last week and our discussion went in a similar
> direction (I think not exactly the same which is why it is good to
> see your thoughts too).
>
> Personally I think it comes down to how intrusive adding NMI support is
> to serial drivers. kgdb_nmi is rather hacky and feels a bit odd to
> enable. It is clearly intended to avoid almost all changes to the UART
> driver. On our side we have been wondering whether the serial core can
> add helpers to make it easy for a serial driver to implement an simple,
> safe but not optimal NMI implementation. Making it easy to have
> safety-first might make NMI more palatable.
>
I am currently working on a PoC in this direction and hopeful to come
up with least intrusive NMI support to serial drivers.
>
> > Speaking of confused: is there actually any way to use the existing
> > kgdb NMI driver (CONFIG_SERIAL_KGDB_NMI) in mainline without out of
> > tree patches? When I looked before I assumed it was just me that was
> > outta luck because I didn't have NMI at the time, but I just did some
> > grepping and I can't find anyplace in mainline where
> > "arch_kgdb_ops.enable_nmi" would not be NULL. Did I miss it, or do we
> > need out-of-tree patches to enable this?
>
> Out-of-tree...
Yeah and this patch-set derived from Daniel's work was one of them.
>
> If, after looking at other approaches, we do all agree to nuke kgdb_nmi
> then there shouldn't be much impediment (nor that many tears).
>
Makes sense.
-Sumit
>
> Daniel.