Re: [PATCH 1/2] mwifiex: Use non-posted PCI register writes

From: Jonas Dreßler
Date: Thu Sep 30 2021 - 12:22:53 EST


On 9/30/21 6:19 PM, Pali Rohár wrote:
On Thursday 30 September 2021 18:14:04 Jonas Dreßler wrote:
On 9/30/21 5:42 PM, Pali Rohár wrote:
On Thursday 30 September 2021 17:38:43 Jonas Dreßler wrote:
On 9/23/21 10:22 PM, Pali Rohár wrote:
On Thursday 23 September 2021 22:41:30 Andy Shevchenko wrote:
On Thu, Sep 23, 2021 at 6:28 PM Jonas Dreßler <verdre@xxxxxxx> wrote:
On 9/22/21 2:50 PM, Jonas Dreßler wrote:

...

- Just calling mwifiex_write_reg() once and then blocking until the card
wakes up using my delay-loop doesn't fix the issue; it's actually
writing multiple times that fixes it

These observations sound a lot like writes (and even reads) are actually
being dropped, don't they?

It sounds like you're writing into a device that is not ready (not fully
powered on).

This reminds me of a discussion with Bjorn about the CRS response returned
after a firmware crash / reset when the device is not ready yet:
https://lore.kernel.org/linux-pci/20210922164803.GA203171@bhelgaas/

Could this be a similar issue? You could check it by reading the
PCI_VENDOR_ID register from config space. If it does not hold a valid
value, then the card is not really ready yet.

To check this, try adding a busy loop that keeps reading and checks the
result until it is 0.

Something like

	unsigned int count = 1000;

	do {
		if (mwifiex_read_reg(...) == 0)
			break;
	} while (--count);
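
For anyone who wants to try the bounded-poll idea outside the driver, here is a minimal userspace sketch. It does not touch real hardware: struct fake_reg, fake_read_reg() and poll_until_ready() are hypothetical names invented for this illustration, and the only thing borrowed from the thread is the convention that a read returning all ones counts as a failure.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a device register: reads return all ones
 * until `ready_after` reads have been issued, then return `value`. */
struct fake_reg {
	unsigned int reads;
	unsigned int ready_after;
	uint32_t value;
};

/* Assumed convention (taken from this thread): all ones means the read failed. */
static int fake_read_reg(struct fake_reg *reg, uint32_t *data)
{
	*data = (reg->reads++ < reg->ready_after) ? UINT32_MAX : reg->value;
	return (*data == UINT32_MAX) ? -1 : 0;
}

/* Poll at most max_tries times. Returns the number of reads that were
 * needed, or 0 if no read succeeded within the budget. */
static unsigned int poll_until_ready(struct fake_reg *reg,
				     unsigned int max_tries, uint32_t *data)
{
	unsigned int tries;

	assert(max_tries > 0);
	for (tries = 1; tries <= max_tries; tries++) {
		if (fake_read_reg(reg, data) == 0)
			return tries;
	}
	return 0;
}
```

Returning the number of attempts rather than a plain success flag makes it easy to log how long the device took to respond, which is the kind of information being asked for here.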


--
With Best Regards,
Andy Shevchenko

I've now tried both reading PCI_VENDOR_ID and reading the firmware status
in a busy loop, but sadly neither of them worked. It looks like the card
always replies with the correct values even though it sometimes won't wake
up after that.

I do have one new observation though, although I've no clue what could be
happening here: when reading PCI_VENDOR_ID 1000 times to wake up the card,
we can "predict" the wakeup failure because exactly one (usually around the
20th) of those 1000 reads will fail.

What does "fail" mean here?

ioread32() returns all ones, which is interpreted as a failure by
mwifiex_read_reg().
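
A rough sketch of how such an experiment could record which of the reads fail, written as plain userspace C so it can be run anywhere. The read_fn callback, log_failed_reads() and mock_read() are all hypothetical helpers made up for this example; the mock simply fails on the 20th read to mimic the pattern described above and says nothing about why the real card behaves that way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read callback: raw 32-bit result of read number `index`. */
typedef uint32_t (*read_fn)(unsigned int index);

/* Issue `count` reads, store the 0-based index of each all-ones result in
 * `failed` (at most `cap` entries) and return the total number of failures. */
static size_t log_failed_reads(read_fn read, unsigned int count,
			       unsigned int *failed, size_t cap)
{
	size_t n = 0;
	unsigned int i;

	assert(read != NULL);
	for (i = 0; i < count; i++) {
		if (read(i) == UINT32_MAX) {
			if (n < cap)
				failed[n] = i;
			n++;
		}
	}
	return n;
}

/* Mock device: only the 20th read (index 19) returns all ones.
 * The non-failing value is arbitrary. */
static uint32_t mock_read(unsigned int index)
{
	return index == 19 ? UINT32_MAX : 0x12345678u;
}
```

Calling log_failed_reads(mock_read, 1000, failed, 4) with a four-entry failed array reports one failure and stores index 19.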

Ok. And can you check whether the PCI bridge above this card has the CRSSVE
bit enabled (CRSVisible in RootCtl+RootCap in the lspci output)? That would
determine whether the bridge could convert a CRS response to all-ones as a
failed transaction.


Seems like that bit is disabled:
> RootCap: CRSVisible-
> RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
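
For reference, the two lspci flags above correspond to single bits in the PCIe Root Capabilities and Root Control registers. The sketch below decodes them in userspace C; the bit values are the ones defined in Linux's include/uapi/linux/pci_regs.h, while crs_visibility_enabled() is a hypothetical helper written for this illustration and operates on raw register values that you would have to obtain yourself (for example from a config space dump).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit definitions as found in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_RTCTL_PMEIE	0x0008	/* PME Interrupt Enable */
#define PCI_EXP_RTCTL_CRSSVE	0x0010	/* CRS Software Visibility Enable */
#define PCI_EXP_RTCAP_CRSVIS	0x0001	/* CRS Software Visibility capability */

/* True only if the root port advertises CRS Software Visibility
 * (RootCap: CRSVisible+) and has it enabled (RootCtl: CRSVisible+). */
static bool crs_visibility_enabled(uint16_t rootcap, uint16_t rootctl)
{
	return (rootcap & PCI_EXP_RTCAP_CRSVIS) &&
	       (rootctl & PCI_EXP_RTCTL_CRSSVE);
}
```

With the output quoted above (RootCap with CRSVisible-, RootCtl with only PMEIntEna+), the inputs would be rootcap = 0x0000 and rootctl = PCI_EXP_RTCTL_PMEIE, for which the helper returns false.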



Maybe the firmware actually tries to wake up,
encounters an error somewhere in its wakeup routines and then goes down a
special failure code path. That code path keeps the card's CPU so busy that
at some point a PCI_VENDOR_ID request times out?

Or well, maybe the card actually wakes up fine, but we don't receive the
interrupt on our end, so many possibilities...