Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

From: Sibi Sankar
Date: Thu Jun 04 2020 - 14:17:40 EST


On 2020-06-03 15:37, govinds@xxxxxxxxxxxxxx wrote:
Hi Mani,

On 2020-06-03 05:57, Manivannan Sadhasivam wrote:
On Tue, Jun 02, 2020 at 01:04:26PM -0700, Brian Norris wrote:
On Tue, Jun 2, 2020 at 12:40 PM John Stultz <john.stultz@xxxxxxxxxx> wrote:
> On Tue, Jun 2, 2020 at 12:16 PM Brian Norris <briannorris@xxxxxxxxxxxx> wrote:
> > On Mon, Jun 1, 2020 at 10:25 PM John Stultz <john.stultz@xxxxxxxxxx> wrote:
> > >
> > > Ever since 5.7-rc1, if we call
> > > ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
> > > reboot, resulting in the device getting stuck in the usb crash
> > > debug mode and not coming back up without a hard power off.
> > >
> > > This hack avoids the issue by returning early in
> > > ath10k_qmi_event_server_exit().
> > >
> > > A better solution is very much desired!
> >
> > Any chance you can bisect what caused this? There are a lot of
> > non-ath10k pieces involved in this stuff.
>
> Amit had spent some time chasing it down to the in-kernel qrtr-ns
> work, and reported it here:
> https://lists.infradead.org/pipermail/ath10k/2020-April/014970.html
>
> But that discussion seemingly stalled out, so I came up with this hack
> to work around it for us.

If I'm reading it right, then that means we should revert this stuff
from v5.7-rc1:

0c2204a4ad71 net: qrtr: Migrate nameservice to kernel from userspace

At least, until people can resolve the tail end of that thread. New
features (ath11k, etc.) are not a reason to break existing features
(ath10k/wcn3990).

I don't agree with this. If you read through the replies to the bug report,
it is clear that NS migration uncovered a corner case or even a bug. So we
should try to fix that indeed.

Govind: Did you get a chance to work on fixing this issue?


I have done basic testing by moving the msa map/unmap from the qmi service
callbacks to the init/de-init path.
I will send a patch for review.
The reason for the del_server needs to be investigated from the rproc side.

Govind,
On receiving SIGTERM, rmtfs would try
to perform a graceful shutdown of the
modem; that should be the source of
the del_server.
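
To illustrate the sequence described above, here is a minimal sketch of
the usual "SIGTERM sets a flag, shutdown happens outside the handler"
pattern. It is not rmtfs source code: every name in it
(stop_requested, on_sigterm, install_sigterm_handler,
shutdown_modem_if_requested) is hypothetical, and the modem is reduced
to a boolean. It only shows where, in such a flow, the remote side's
services would go away and a del_server could be expected.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not rmtfs source code. */
static volatile sig_atomic_t stop_requested;

/* Signal handler: only record the request, do no real work here. */
static void on_sigterm(int signo)
{
	(void)signo;
	stop_requested = 1;
}

/* Install the SIGTERM handler; returns 0 on success, -1 on failure. */
static int install_sigterm_handler(void)
{
	struct sigaction sa = { 0 };

	sa.sa_handler = on_sigterm;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGTERM, &sa, NULL);
}

/*
 * Stand-in for the graceful modem shutdown. Returns true if a shutdown
 * was performed on this call. Assumption: stopping the remote processor
 * is the point at which its services disappear, which is what the other
 * side would then see as a del_server.
 */
static bool shutdown_modem_if_requested(bool *modem_running)
{
	if (!stop_requested || !*modem_running)
		return false;

	*modem_running = false;
	return true;
}
```

A daemon's main loop would call install_sigterm_handler() once at
startup and shutdown_modem_if_requested() on each iteration, so a
reboot (which sends SIGTERM to userspace) ends up stopping the modem
before the kernel itself goes down.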


Thanks,
Mani


Brian

Thanks,
Govind

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.