Re: [PATCH v2 2/7] serial: qcom-geni: fix shutdown race
From: Johan Hovold
Date: Fri Oct 11 2024 - 02:52:13 EST
On Thu, Oct 10, 2024 at 03:30:05PM -0700, Doug Anderson wrote:
> On Wed, Oct 9, 2024 at 7:10 AM Johan Hovold <johan@xxxxxxxxxx> wrote:
> > On Thu, Oct 03, 2024 at 11:30:08AM -0700, Doug Anderson wrote:
> > > Hmmm, when I look at that commit it makes me think that the problem
> > > that commit e83766334f96 ("tty: serial: qcom_geni_serial: No need to
> > > stop tx/rx on UART shutdown") was fixing was re-introduced by commit
> > > d8aca2f96813 ("tty: serial: qcom-geni-serial: stop operations in
> > > progress at shutdown"). ...and indeed, it was. :(
> > >
> > > I can't interact with kgdb if I do this:
> > >
> > > 1. ssh over to DUT
> > > 2. Kill the console process (on ChromeOS stop console-ttyMSM0)
> > > 3. Drop in the debugger (echo g > /proc/sysrq-trigger)
> >
> > Yeah, don't do that then. ;)
>
> The problem is, I don't always have a choice. As talked about in the
> message of commit e83766334f96 ("tty: serial: qcom_geni_serial: No
> need to stop tx/rx on UART shutdown"), the above steps attempt to
> simulate what happened organically: a crash in late shutdown. During
> shutdown the agetty has been killed by the init system and I don't
> have a choice about it. If I get a kernel crash then (which isn't
> uncommon since shutdown code tends to trigger seldom-used code paths),
> I can't debug it. :(
Ok, thanks for clarifying.
> > Not sure how your "console process" works, but this should only happen
> > if you do not enable the serial console (console=ttyMSM0) and then try
> > to use a polled console (as enabling the console will prevent port
> > shutdown from being called).
>
> That simply doesn't seem to be the case for me. The port shutdown
> seems to be called. To confirm, I put a printout at the start of
> qcom_geni_serial_shutdown(). I see in my /proc/cmdline:
>
> console=ttyMSM0,115200n8
>
> ...and I indeed verify that I see console messages on my UART. I then run:
>
> stop console-ttyMSM0
>
> ...and I see on the UART:
>
> [ 92.916964] DOUG: qcom_geni_serial_shutdown
> [ 92.922703] init: console-ttyMSM0 main process (611) killed by TERM signal
>
> Console messages keep coming out the UART even though the agetty isn't
> there.
And this is with a Chromium kernel, not mainline?
If you take a look at tty_port_shutdown(), there's a hack in there for
consoles, added back in 2010, that prevents shutdown() from being
called for console ports.
But perhaps you manage to hit shutdown() via some other path. Serial
core does not yet use tty_port_hangup(), so a hangup might trigger
that...
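To make the two paths concrete, here is a minimal userspace model, not kernel code. All struct and function names are hypothetical. The close path mirrors the console check in tty_port_shutdown(), and the hangup path models the route suspected above, where serial core reaches the driver's shutdown() without that check. It is a sketch under those assumptions, not a description of the actual call chain.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a tty/uart port. */
struct model_port {
	bool console;      /* port is registered as the system console */
	bool initialized;  /* port has been started up */
	int  shutdowns;    /* number of times the driver's shutdown() ran */
	bool rx_enabled;   /* receiver state that shutdown() tears down */
};

/* Stand-in for a driver shutdown() that stops RX. */
static void model_driver_shutdown(struct model_port *port)
{
	port->shutdowns++;
	port->rx_enabled = false;
}

/* Close path: models the console check in tty_port_shutdown(). */
static void model_close_path(struct model_port *port)
{
	if (port->console)
		return;                 /* console ports skip shutdown() */
	if (port->initialized) {
		port->initialized = false;
		model_driver_shutdown(port);
	}
}

/*
 * Hangup path: models the suspected route where serial core (not yet
 * using tty_port_hangup()) shuts the port down without the console
 * check above.
 */
static void model_hangup_path(struct model_port *port)
{
	if (port->initialized) {
		port->initialized = false;
		model_driver_shutdown(port);
	}
}
```

With console set, only the hangup path clears rx_enabled in this model. Distinguishing those two call chains is what a dump_stack() in qcom_geni_serial_shutdown() would do on real hardware.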
Could you check that with a dump_stack()?
> Now I (via ssh) drop into the debugger:
>
> echo g > /proc/sysrq-trigger
>
> I see the "kgdb" prompt but I can't interact with it because
> qcom_geni_serial_shutdown() stopped RX.
How about simply amending poll_get_char() so that it enables the
receiver if it's not already enabled?
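The suggestion can be sketched as a small userspace model. Every name here is hypothetical: MODEL_NO_POLL_CHAR stands in for the kernel's NO_POLL_CHAR, and model_start_rx() stands in for whatever the driver actually uses to enable its receiver, which this sketch does not reproduce. Characters arriving while RX is disabled are dropped. The polled read re-enables RX on demand instead of relying on startup() having run.

```c
#include <stdbool.h>

#define MODEL_NO_POLL_CHAR (-1)   /* stand-in for NO_POLL_CHAR */
#define MODEL_FIFO_SIZE    16     /* power of two, used as a mask below */

/* Hypothetical model of a UART with an RX FIFO. */
struct model_uart {
	bool rx_enabled;                 /* cleared by the driver's shutdown() */
	int rx_fifo[MODEL_FIFO_SIZE];
	unsigned int head, tail;         /* free-running producer/consumer indices */
};

static void model_start_rx(struct model_uart *u) { u->rx_enabled = true; }
static void model_stop_rx(struct model_uart *u)  { u->rx_enabled = false; }

/* A character arriving on the wire is only latched while RX is enabled. */
static void model_line_receive(struct model_uart *u, int c)
{
	if (!u->rx_enabled || u->head - u->tail >= MODEL_FIFO_SIZE)
		return;                  /* dropped: RX off or FIFO full */
	u->rx_fifo[u->head & (MODEL_FIFO_SIZE - 1)] = c;
	u->head++;
}

/* Polled read that enables the receiver if it is not already enabled. */
static int model_poll_get_char(struct model_uart *u)
{
	int c;

	if (!u->rx_enabled)
		model_start_rx(u);

	if (u->head == u->tail)
		return MODEL_NO_POLL_CHAR;

	c = u->rx_fifo[u->tail & (MODEL_FIFO_SIZE - 1)];
	u->tail++;
	return c;
}
```

In this model, the first poll after shutdown finds nothing (input sent while RX was off is lost), but subsequent input reaches the debugger, which is the behaviour the kgdb case needs.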
Johan