Re: [PATCH v2 2/7] serial: qcom-geni: fix shutdown race

From: Doug Anderson
Date: Thu Oct 10 2024 - 18:30:44 EST


Hi,

On Wed, Oct 9, 2024 at 7:10 AM Johan Hovold <johan@xxxxxxxxxx> wrote:
>
> On Thu, Oct 03, 2024 at 11:30:08AM -0700, Doug Anderson wrote:
> > On Tue, Oct 1, 2024 at 5:51 AM Johan Hovold <johan+linaro@xxxxxxxxxx> wrote:
> > >
> > > A commit adding back the stopping of tx on port shutdown failed to add
> > > back the locking which had also been removed by commit e83766334f96
> > > ("tty: serial: qcom_geni_serial: No need to stop tx/rx on UART
> > > shutdown").
> >
> > Hmmm, when I look at that commit it makes me think that the problem
> > that commit e83766334f96 ("tty: serial: qcom_geni_serial: No need to
> > stop tx/rx on UART shutdown") was fixing was re-introduced by commit
> > d8aca2f96813 ("tty: serial: qcom-geni-serial: stop operations in
> > progress at shutdown"). ...and indeed, it was. :(
> >
> > I can't interact with kgdb if I do this:
> >
> > 1. ssh over to DUT
> > 2. Kill the console process (on ChromeOS stop console-ttyMSM0)
> > 3. Drop in the debugger (echo g > /proc/sysrq-trigger)
>
> Yeah, don't do that then. ;)

The problem is, I don't always have a choice. As described in the
commit message of e83766334f96 ("tty: serial: qcom_geni_serial: No
need to stop tx/rx on UART shutdown"), the above steps attempt to
simulate what happened organically: a crash in late shutdown. During
shutdown the agetty has been killed by the init system and I don't
have a choice about it. If I get a kernel crash at that point (which
isn't uncommon, since shutdown code tends to exercise seldom-used code
paths) then I can't debug it. :(

We need to fix this.


> Not sure how your "console process" works, but this should only happen
> if you do not enable the serial console (console=ttyMSM0) and then try
> to use a polled console (as enabling the console will prevent port
> shutdown from being called).

That simply doesn't seem to be the case for me. The port shutdown
seems to be called. To confirm, I put a printout at the start of
qcom_geni_serial_shutdown(). I see in my /proc/cmdline:

console=ttyMSM0,115200n8

...and I indeed verify that I see console messages on my UART. I then run:

stop console-ttyMSM0

...and I see on the UART:

[ 92.916964] DOUG: qcom_geni_serial_shutdown
[ 92.922703] init: console-ttyMSM0 main process (611) killed by TERM signal

Console messages keep coming out the UART even though the agetty isn't
there. Now I (via ssh) drop into the debugger:

echo g > /proc/sysrq-trigger

I see the "kgdb" prompt but I can't interact with it because
qcom_geni_serial_shutdown() stopped RX.
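
Put together, the sequence above amounts to roughly the following
sketch. The helper names has_console and repro are made up here for
illustration, and it assumes a ChromeOS-style init where
"stop console-ttyMSM0" kills the agetty, plus a kernel with kgdb and
sysrq enabled. Run it over ssh, since the UART is the thing under test:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line in $1 contains
# console=$2, with or without trailing options such as ",115200n8".
has_console() {
    for arg in $1; do
        case "$arg" in
            console="$2"|console="$2",*) return 0 ;;
        esac
    done
    return 1
}

# Hypothetical wrapper around the manual steps. It is only defined here,
# not called, because the last step halts the machine in the debugger.
repro() {
    cmdline=$(cat /proc/cmdline)
    if ! has_console "$cmdline" ttyMSM0; then
        echo "console=ttyMSM0 not on the kernel command line" >&2
        return 1
    fi
    stop console-ttyMSM0           # kills the agetty; port shutdown runs
    echo g > /proc/sysrq-trigger   # drop into kgdb
}
```

Calling repro as root should reproduce the hang at the "kgdb" prompt;
the has_console check is only there to rule out the "console not
enabled" explanation before the debugger takes over.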


-Doug