Re: QCA6174 pcie wifi: Add pci quirks

From: Ingmar Klein
Date: Sat Jun 05 2021 - 10:46:45 EST


Hi Pali and Bjorn,

finally found the time to test.
Pali's v3 patch seems to work like a charm for my card with "0x003e" id
as well.
Just finished compiling a pve-kernel v5.11.21 with Pali's patch,
slightly adjusted for my test card and the Ubuntu kernel source (no
functional differences, just minor adjustments to make it fit the
Proxmox pve-kernel).

The system works just fine, in contrast to the unpatched kernel. Of
course, there are no long-term tests yet. However, it is looking really good.
Thanks guys!

Best regards,
Ingmar


Am 28.05.2021 um 20:47 schrieb Ingmar Klein:
Hi Pali,
sorry for not checking that detail!
Of course no problem that you couldn't test that ID. Will be glad to
do so.

I'll let you know how this turns out.

Best regards,
Ingmar


Am 28.05.2021 um 20:21 schrieb Pali Rohár:
Hello Ingmar!

Now I see that in your patch you have Atheros card with id 0x003e:
https://lore.kernel.org/linux-pci/08982e05-b6e8-5a8d-24ab-da1488ee50a8@xxxxxx/


With my patch I have tested 5 different Atheros cards but none has id
0x003e:
https://lore.kernel.org/linux-pci/20210505163357.16012-1-pali@xxxxxxxxxx/


So my patch does not fix that issue for your 0x003e card. I just do not
have such a card for testing.

Could you try applying my patch and then adding your ID 0x003e to the
quirk list, to see if it helps?

On Friday 28 May 2021 20:08:52 Ingmar Klein wrote:
Thanks to both of you, Bjorn and Pali!
I had hoped that Pali would come up with an appropriate fix. Good to know
that this is taken care of.

Will test ASAP, but I am confident that it will work anyway.
Should it unexpectedly not fix my issues, I'll let you know.
Have a nice weekend!
Best regards,
Ingmar


Am 26.05.2021 um 00:12 schrieb Bjorn Helgaas:
On Thu, Apr 15, 2021 at 09:53:38PM +0200, Pali Rohár wrote:
Hello!

On Thursday 15 April 2021 13:01:19 Alex Williamson wrote:
[cc +Pali]

On Thu, 15 Apr 2021 20:02:23 +0200
Ingmar Klein <ingmar_klein@xxxxxx> wrote:

First, thanks to you both, Alex and Bjorn!
I am in no way an expert on this topic, so I have to fully rely on your
feedback concerning this issue.

If you should have any other solution approach, in the form of a
patch-set, I would be glad to test it out. Just let me know what you
think might make sense.
I will wait for your further feedback on the issue. In the meantime I
have my current workaround via the quirk entry.

By the way, my layman's question:
Do you think that the following topic might also apply to the QCA6174?
https://www.spinics.net/lists/linux-pci/msg106395.html
I have been testing more ath cards and I'm going to send a new version
of this patch that includes more PCI IDs.
Dropping this patch in favor of Pali's new version.

Or in other words, should a similar approach be tried for the QCA6174
and, if yes, would it bring any benefit at all?
I hope you can excuse me in case the questions do not make too much
sense.
If you run lspci -vvv on your device, what do LnkCap and LnkSta report
under the express capability?  I wonder if your device even supports
Gen1 speeds, mine does not.
I would not expect that patch to be relevant to you based on your
report.  I understand it to resolve an issue during link retraining to a
higher speed on boot, not during a bus reset.  Pali can correct if I'm
wrong.  Thanks,
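
For reference, the link speeds that lspci prints under LnkCap and LnkSta
can also be read from sysfs. The following is only a minimal sketch,
assuming a kernel that exposes the max_link_speed and current_link_speed
attributes; the helper name read_link_speeds and the address
0000:01:00.0 are placeholders for illustration, not taken from this
thread.

```python
from pathlib import Path


def read_link_speeds(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return (current, maximum) link speed strings for one PCI device.

    A missing or unreadable attribute is reported as None.
    `bdf` is the domain:bus:device.function address, e.g. "0000:01:00.0".
    """
    dev = Path(sysfs_root) / bdf

    def read_attr(name):
        try:
            return (dev / name).read_text().strip()
        except OSError:
            return None

    # current_link_speed roughly corresponds to LnkSta, max_link_speed to LnkCap.
    return read_attr("current_link_speed"), read_attr("max_link_speed")


if __name__ == "__main__":
    # Placeholder address; substitute the address of the wifi card.
    print(read_link_speeds("0000:01:00.0"))
```

The sysfs root is a parameter only so that the helper can be tried
against a scratch directory instead of a live system.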
These two issues are related. Both operations (PCIe Hot Reset and
PCIe Link Retraining) cause a reset of ath chips, and it seems that they
cause a double reset. After a reset, these chips read their
configuration from internal EEPROM/OTP, and if another reset is
triggered before the chip finishes this internal configuration read, it
stops working. My testing showed that ath10k chips completely disappear
from the PCIe bus, some ath9k chips work fine but start reporting an
incorrect PCI ID (0xABCD), and some other ath9k chips report the correct
PCI ID but do not work. I had a discussion with Adrian Chadd, who
probably knows everything about ath9k, and he confirmed to me that this
issue exists with ath9k and ath10k chips.
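
The symptoms described above (device gone from the bus, or device
present but reporting ID 0xABCD) can be checked from userspace. This is
a rough sketch only, assuming the standard sysfs "device" attribute; the
function name, the state labels, and the expected ID passed by the
caller are hypothetical choices for illustration.

```python
from pathlib import Path

# Incorrect ID that some ath9k chips report in the broken state,
# as described in the message above.
BOGUS_DEVICE_ID = 0xABCD


def classify_device(bdf, expected_id, sysfs_root="/sys/bus/pci/devices"):
    """Classify a PCI device by the device ID that sysfs reports for it.

    Returns "missing" if the device is not on the bus, "bogus-id" if it
    reports 0xABCD, "unexpected-id" for any other mismatch, and "id-ok"
    if the ID matches `expected_id`.
    """
    id_file = Path(sysfs_root) / bdf / "device"
    try:
        # sysfs prints the ID as hex text such as "0x003e".
        device_id = int(id_file.read_text().strip(), 16)
    except FileNotFoundError:
        return "missing"
    if device_id == BOGUS_DEVICE_ID:
        return "bogus-id"
    if device_id != expected_id:
        return "unexpected-id"
    return "id-ok"
```

Note that "id-ok" does not prove the card works: as stated above, some
ath9k chips report the correct PCI ID and still do not function.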

He wrote to me that the workaround to bring a card back from this
"broken" state is to do a PCIe Cold Reset of the card, which means
turning off the power supply for the particular PCIe slot. This is not
supported on many low-end boards, so the workaround cannot be applied
there.

I was able to recover my testing cards from this "broken" state by a
PCIe Warm Reset (= reset via the PERST# pin).

I have tried many other reset methods (PCIe PM reset, Link Down, PCIe
Hot Reset with a bigger interval, ...) but nothing worked. So it seems
that the only workaround is to do a PCIe Cold Reset or a PCIe Warm Reset.

I will send V2 of my patch with details and explanation.

As the kernel does not have an API for doing a PCIe Warm Reset, I think
this is another argument for why the kernel really needs one.

I do not have any QCA6174 card for testing, but based on the fact that I
reproduced this issue with several ath9k and ath10k cards and Adrian
confirmed that the above reset issue exists, I think that it affects all
AR9xxx and QCAxxxx cards handled by the ath9k and ath10k drivers.

I was told that AMI was patching the BIOSes found in notebooks to
avoid triggering this issue on the notebooks' ath9k cards.

Alex

Am 15.04.2021 um 04:36 schrieb Alex Williamson:
On Wed, 14 Apr 2021 16:03:50 -0500
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

[+cc Alex]

On Fri, Apr 09, 2021 at 11:26:33AM +0200, Ingmar Klein wrote:
Edit: Retry, as I did not consider that my mail client would make this
partly HTML.

Dear maintainers,
I recently encountered an issue on my Proxmox server system, which
includes a Qualcomm QCA6174 M.2 PCIe wifi module.
https://deviwiki.com/wiki/AIRETOS_AFX-QCA6174-NX

On system boot and subsequent virtual machine start (with passed-through
QCA6174), the VM would just freeze/hang at the point where the ath10k
driver loads.
A quick search in the Proxmox-related topics brought me to the following
discussion, which suggested a PCI quirk entry for the QCA6174 in the
kernel:
https://forum.proxmox.com/threads/pcie-passthrough-freezes-proxmox.27513/


I then went ahead, got the Proxmox kernel source (v5.4.106) and applied
the attached patch.
The effect was as hoped: the VM hangs are now gone. The system boots and
runs as intended.

Judging by the existing quirk entries for Atheros, I would think that
my proposed "fix" could be included in the vanilla kernel.
As far as I saw, there is no entry yet, even in the latest kernel
sources.
This would need a signed-off-by; see
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v5.11#n361


This is an old issue, and likely we'll end up just applying this as
yet another quirk.  But looking at c3e59ee4e766 ("PCI: Mark Atheros
AR93xx to avoid bus reset"), where it started, it seems to be
connected to 425c1b223dac ("PCI: Add Virtual Channel to save/restore
support").

I'd like to dig into that a bit more to see if there are any clues.
AFAIK Linux itself still doesn't use VC at all, and 425c1b223dac added
a fair bit of code.  I wonder if we're restoring something out of
order or making some simple mistake in the way to restore VC config.
I don't really have any faith in that bisect report in commit
c3e59ee4e766.  To double check I dug out the card from that commit,
installed an old Fedora release so I could build kernel v3.13,
pre-dating 425c1b223dac and tested triggering a bus reset both via
setpci and by masking PM reset so that sysfs can trigger the bus reset
path with the kernel save/restore code.  Both result in the system
hanging when the device is accessed either restoring from the kernel
bus reset or reading from the device after the setpci reset.
Thanks,

Alex
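
As background to the setpci test above: the message does not give the
exact commands, but a bus reset from userspace is normally done by
toggling the Secondary Bus Reset bit in the Bridge Control register of
the upstream bridge. The sketch below only computes the register values
under that assumption; the constant and function names are illustrative,
and it deliberately does not write to any hardware.

```python
# Bridge Control register: 16 bits wide, at offset 0x3E in a Type 1
# (PCI-to-PCI bridge) configuration header.
BRIDGE_CONTROL_OFFSET = 0x3E
# Secondary Bus Reset is bit 6 of that register.
SECONDARY_BUS_RESET = 1 << 6


def assert_sbr(bridge_control):
    """Return the register value with Secondary Bus Reset set."""
    return (bridge_control | SECONDARY_BUS_RESET) & 0xFFFF


def deassert_sbr(bridge_control):
    """Return the register value with Secondary Bus Reset cleared."""
    return bridge_control & ~SECONDARY_BUS_RESET & 0xFFFF
```

A reset sequence would write the first value to the bridge, wait, then
write the second; all other bits of the register are left unchanged.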