Re: [PATCH v2 00/13] Qcom: LLCC/EDAC: Fix base address used for LLCC banks

From: Krzysztof Kozlowski
Date: Tue Dec 13 2022 - 13:47:57 EST


On 13/12/2022 18:57, Manivannan Sadhasivam wrote:
> On Tue, Dec 13, 2022 at 05:54:56PM +0100, Krzysztof Kozlowski wrote:
>> On 13/12/2022 06:28, Manivannan Sadhasivam wrote:
>>> On Mon, Dec 12, 2022 at 01:23:40PM -0600, Andrew Halaney wrote:
>>>> On Mon, Dec 12, 2022 at 06:02:58PM +0530, Manivannan Sadhasivam wrote:
>>>>> The Qualcomm LLCC/EDAC drivers were using a fixed register stride for
>>>>> accessing the CSRs (Control and Status Registers) of each LLCC bank.
>>>>> This offset only works for some SoCs like SDM845 for which driver support
>>>>> was initially added.
>>>>>
>>>>> But later SoCs use register strides that vary between the banks, with
>>>>> holes in between, so a single register stride cannot be used for
>>>>> accessing the CSRs of each bank. Doing so could result in a crash with
>>>>> the current drivers. So far this crash has not been reported, since the
>>>>> EDAC_QCOM driver is not enabled in the ARM64 defconfig and no one has
>>>>> tested the driver extensively by triggering the EDAC IRQ (that's where
>>>>> each bank's CSRs are accessed).
>>>>>
>>>>> To fix this issue, let's obtain the base address of each LLCC bank from
>>>>> devicetree and get rid of the fixed stride.
>>>>>
>>>>> This series affects multiple platforms but I have only tested this on
>>>>> SM8250 and SM8450. Testing on other platforms is welcome.
>>>>>
>>>>
>>>> Tested-by: Andrew Halaney <ahalaney@xxxxxxxxxx> # sa8540p-ride
>>>>
>>>
>>> Thanks!
>>>
>>>> I took this for a quick spin on the qdrive3 I've got access to without
>>>> any issue:
>>>>
>>>> [root@localhost ~]# modprobe qcom_edac
>>>> [root@localhost ~]# dmesg | grep -i edac
>>>> [ 0.620723] EDAC MC: Ver: 3.0.0
>>>> [ 1.165417] ghes_edac: GHES probing device list is empty
>>>> [ 594.688103] EDAC DEVICE0: Giving out device to module qcom_llcc_edac controller llcc: DEV qcom_llcc_edac (INTERRUPT)
>>>> [root@localhost ~]# cat /proc/interrupts | grep ecc
>>>> 174: 0 0 0 0 0 0 0 0 GICv3 614 Level llcc_ecc
>>>> [root@localhost ~]#
>>>>
>>>> Potentially stupid question, but are users expected to manually load the
>>>> driver as I did? I don't see how it would be loaded automatically in the
>>>> current state, but thought it was funny that I needed to modprobe
>>>> myself.
>>>>
>>>> Please let me know if you want me to do any further testing!
>>>>
>>>
>>> Well, I always ended up using the driver as a built-in. I do build it as a
>>> module for build testing but never really used it as one, so I didn't catch
>>> this issue.
>>>
>>> This is because the qcom_edac driver does not export a module alias. The
>>> diff below allows the kernel to autoload it:
>>>
>>> diff --git a/drivers/edac/qcom_edac.c b/drivers/edac/qcom_edac.c
>>> index f7afb5375293..13919d01c22d 100644
>>> --- a/drivers/edac/qcom_edac.c
>>> +++ b/drivers/edac/qcom_edac.c
>>> @@ -419,3 +419,4 @@ module_platform_driver(qcom_llcc_edac_driver);
>>>
>>> MODULE_DESCRIPTION("QCOM EDAC driver");
>>> MODULE_LICENSE("GPL v2");
>>> +MODULE_ALIAS("platform:qcom_llcc_edac");
>>
>> While this is one way to fix it, instead of creating aliases for wrong
>> names, either a correct name should be used or the driver should receive
>> an ID table.
>>
>
> I'm not sure how you'd fix it with a _correct_ name here.

Hm, I assumed that it would be enough if the driver name matched the device
name. Currently these two are not in sync. Maybe that's not enough when
built as a module?
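
For reference, a minimal userspace sketch of why a name match alone may not
help with autoloading: the platform bus reports a modalias of the form
"platform:<device name>" in its uevent, and module loading is driven by the
alias patterns a module exports, not by the driver name. The function names
below are hypothetical and fnmatch() only stands in for the glob-style alias
lookup done in userspace; this is not the kernel or kmod implementation.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <stdio.h>

/* Build "platform:<device name>"; returns 0 on success, -1 if buf is too small. */
int build_modalias(const char *dev_name, char *buf, size_t len)
{
	assert(dev_name != NULL && buf != NULL);
	int n = snprintf(buf, len, "platform:%s", dev_name);
	return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* Return 1 if any alias pattern exported by the module matches the modalias. */
int module_answers(const char *const *aliases, size_t count, const char *modalias)
{
	for (size_t i = 0; i < count; i++)
		if (fnmatch(aliases[i], modalias, 0) == 0)
			return 1;
	return 0;
}
```

With an empty alias list module_answers() returns 0 whatever the driver is
called, which would be consistent with the manual modprobe Andrew needed.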

> Also, an ID table is
> overkill since there is only one driver that makes use of it. Moreover,
> there is no definite ID to use.

Drivers supporting only a single device usually have an ID table too, and
it's not a problem...
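
To illustrate what an ID table buys, here is a simplified userspace model of
platform bus matching. It assumes the usual order (ID table first, driver
name only as a fallback when there is no table) and skips the OF/ACPI and
driver_override checks. The struct and function names are made up for the
sketch; the real table uses struct platform_device_id with an empty name as
the terminator, and MODULE_DEVICE_TABLE(platform, ...) on it is what
generates the "platform:<name>" alias.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct platform_device_id. */
struct sketch_id {
	const char *name;
};

/* Hypothetical stand-in for the parts of struct platform_driver used here. */
struct sketch_driver {
	const char *name;                  /* driver name */
	const struct sketch_id *id_table;  /* NULL, or terminated by { NULL } */
};

/* Return 1 if the device name matches the driver, 0 otherwise. */
int sketch_platform_match(const char *dev_name, const struct sketch_driver *drv)
{
	assert(dev_name != NULL && drv != NULL);

	/* When an ID table exists, only its entries are consulted. */
	if (drv->id_table) {
		for (const struct sketch_id *id = drv->id_table; id->name; id++)
			if (strcmp(dev_name, id->name) == 0)
				return 1;
		return 0;
	}

	/* Fallback: the driver name must equal the device name. */
	return drv->name != NULL && strcmp(dev_name, drv->name) == 0;
}
```

In this model a single-entry table is enough for the device to match even
when the driver name differs from the device name.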

Best regards,
Krzysztof