Re: Regression: QCA6390 fails with "mm/page_alloc: place pages to tail in __free_pages_core()"
From: Kalle Valo
Date: Wed Nov 11 2020 - 14:24:21 EST
David Hildenbrand <david@xxxxxxxxxx> writes:
>> Am 05.11.2020 um 11:42 schrieb Vlastimil Babka <vbabka@xxxxxxx>:
>>
>> On 11/5/20 10:04 AM, Kalle Valo wrote:
>>> (changing the subject, adding more lists and people)
>>> Pavel Procopiuc <pavel.procopiuc@xxxxxxxxx> writes:
>>>> Op 04.11.2020 om 10:12 schreef Kalle Valo:
>>>>> Yeah, it is unfortunately time consuming but it is the best way to get
>>>>> bottom of this.
>>>>
>>>> I have found the commit that breaks things for me, it's
>>>> 7fef431be9c9ac255838a9578331567b9dba4477 mm/page_alloc: place pages to
>>>> tail in __free_pages_core()
>>>>
>>>> I've reverted it on top of the 5.10-rc2 and ath11k driver loads fine
>>>> and I have wifi working.
>>> Oh, very interesting. Thanks a lot for the bisection, otherwise we would
>>> have never found out whats causing this.
>>> David & mm folks: Pavel noticed that his QCA6390 Wi-Fi 6 device (driver
>>> ath11k) failed on v5.10-rc1. After bisecting he found that the commit
>>> below causes the regression. I have not been able to reproduce this and
>>> for me QCA6390 works fine. I don't know if this needs a specific kernel
>>> configuration or what's the difference between our setups.
>>> Any ideas what might cause this and how to fix it?
>>> Full discussion:
>>> http://lists.infradead.org/pipermail/ath11k/2020-November/000501.html
>>> commit 7fef431be9c9ac255838a9578331567b9dba4477
>>> Author: David Hildenbrand <david@xxxxxxxxxx>
>>> AuthorDate: Thu Oct 15 20:09:35 2020 -0700
>>> Commit: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>>> CommitDate: Fri Oct 16 11:11:18 2020 -0700
>>> mm/page_alloc: place pages to tail in __free_pages_core()
>>
>> Let me paste from the ath11k discussion:
>>
>>> * Relevant errors from the log:
>>> # journalctl -b | grep -iP '05:00|ath11k'
>>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
>>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: reg 0x10: [mem
>>> 0xd2100000-0xd21fffff 64bit]
>>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
>>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: 4.000 Gb/s
>>> available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at
>>> 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
>>> Nov 02 10:41:26 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
>>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: WARNING:
>>> ath11k PCI support is experimental!
>>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: BAR 0:
>>> assigned [mem 0xd2100000-0xd21fffff 64bit]
>>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: enabling
>>> device (0000 -> 0002)
>>> Nov 02 10:41:27 razor kernel: mhi 0000:05:00.0: Requested to power ON
>>> Nov 02 10:41:27 razor kernel: mhi 0000:05:00.0: Power on setup success
>>> Nov 02 10:41:27 razor kernel: ath11k_pci 0000:05:00.0: Respond mem
>>> req failed, result: 1, err: 0
>>
>> This seems to be ath11k_qmi_respond_fw_mem_request(). Why is it
>> failure with error 0? No idea.
>>
>> What would happen if all the GFP_KERNEL in the file were changed to GFP_DMA32?
>>
>> I'm thinking the hardware perhaps doesn't like too high physical
>> addresses or something. But if I think correctly, freeing to tail
>> should actually move them towards head. So it's weird.
>
> It depends in which order memory is exposed to MM, which might depend
> on other factors in some configurations.
>
> This smells like it exposes an existing bug. Can you reproduce also
> with zone shuffling enabled?
I think I can reproduce this bug now on my NUC box by disabling vt-d in
the BIOS, but I'm not sure yet if it really is the same problem or not.
I included some debug messages to this ath11k patch:
https://patchwork.kernel.org/project/linux-wireless/patch/1605121102-14352-1-git-send-email-kvalo@xxxxxxxxxxxxxx/
Pavel, can you test with that patch on v5.10-rc2 and provide the ath11k
log messages? Preferably both before and after reverting commit
7fef431be9c9. Do note that I'm not expecting the debug patch to fix
anything, in your case it's just for providing more debug info.
With vt-d disabled on v5.10-rc2 before the revert I see:
ath11k_pci 0000:06:00.0: WARNING: ath11k PCI support is experimental!
ath11k_pci 0000:06:00.0: BAR 0: assigned [mem 0xdb000000-0xdbffffff 64bit]
ath11k_pci 0000:06:00.0: enabling device (0000 -> 0002)
ath11k_pci 0000:06:00.0: MSI vectors: 1
NET: Registered protocol family 42
mhi 0000:06:00.0: Requested to power ON
mhi 0000:06:00.0: Power on setup success
ath11k_pci 0000:06:00.0: Respond mem req failed, result: 1, err: 0
ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-22
ath11k_pci 0000:06:00.0: req mem_seg[0] 0x1580000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[1] 0x1600000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[2] 0x1680000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[3] 0x1700000 294912 1
ath11k_pci 0000:06:00.0: req mem_seg[4] 0x1780000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[5] 0x1800000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[6] 0x1880000 458752 1
ath11k_pci 0000:06:00.0: req mem_seg[7] 0x1520000 131072 1
ath11k_pci 0000:06:00.0: req mem_seg[8] 0x1900000 524288 4
ath11k_pci 0000:06:00.0: req mem_seg[9] 0x1980000 360448 4
ath11k_pci 0000:06:00.0: req mem_seg[10] 0x1540000 16384 1
ath11k_pci 0000:06:00.0: qmi failed memory request, err = -110
ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-110
With vt-d disabled on v5.10-rc2 and reverting commit 7fef431be9c9 I see:
ath11k_pci 0000:06:00.0: WARNING: ath11k PCI support is experimental!
ath11k_pci 0000:06:00.0: BAR 0: assigned [mem 0xdb000000-0xdbffffff 64bit]
ath11k_pci 0000:06:00.0: MSI vectors: 1
mhi 0000:06:00.0: Requested to power ON
mhi 0000:06:00.0: Power on setup success
ath11k_pci 0000:06:00.0: Respond mem req failed, result: 1, err: 0
ath11k_pci 0000:06:00.0: qmi failed to respond fw mem req:-22
ath11k_pci 0000:06:00.0: req mem_seg[0] 0x76300000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[1] 0x76380000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[2] 0x76a00000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[3] 0x76a80000 294912 1
ath11k_pci 0000:06:00.0: req mem_seg[4] 0x76b00000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[5] 0x76b80000 524288 1
ath11k_pci 0000:06:00.0: req mem_seg[6] 0x76400000 458752 1
ath11k_pci 0000:06:00.0: req mem_seg[7] 0x761a0000 131072 1
ath11k_pci 0000:06:00.0: req mem_seg[8] 0x76480000 524288 4
ath11k_pci 0000:06:00.0: req mem_seg[9] 0x76500000 360448 4
ath11k_pci 0000:06:00.0: req mem_seg[10] 0x76580000 16384 1
ath11k_pci 0000:06:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0xffffffff
ath11k_pci 0000:06:00.0: fw_version 0x101c06cc fw_build_timestamp 2020-06-24 19:50 fw_build_id
--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches