Re: [PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller

From: Leizhen (ThunderTown)
Date: Sat Jan 30 2021 - 04:51:37 EST




On 2021/1/29 21:54, Leizhen (ThunderTown) wrote:
>
>
> On 2021/1/29 18:26, Arnd Bergmann wrote:
>> On Fri, Jan 29, 2021 at 9:16 AM Arnd Bergmann <arnd@xxxxxxxxxx> wrote:
>>> On Fri, Jan 29, 2021 at 8:23 AM Leizhen (ThunderTown)
>>> <thunder.leizhen@xxxxxxxxxx> wrote:
>>>> On 2021/1/28 22:24, Arnd Bergmann wrote:
>>>>> On Sat, Jan 16, 2021 at 4:27 AM Zhen Lei <thunder.leizhen@xxxxxxxxxx> wrote:
>>>>>> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
>>>>>> +
>>>>>> +static void l3cache_maint_common(u32 range, u32 op_type)
>>>>>> +{
>>>>>> + u32 reg;
>>>>>> +
>>>>>> + reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
>>>>>> + reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
>>>>>> + reg |= range | op_type;
>>>>>> + reg |= L3_MAINT_STATUS_START;
>>>>>> + writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
>>>>>
>>>>> Are there contents of L3_MAINT_CTRL that need to be preserved
>>>>> across calls and can not be inferred? A 'readl()' is often expensive,
>>>>> so it might be more efficient if you can avoid that.
>>>>
>>>> Right, this readl() can be replaced with readl_relaxed(). Thanks.
>>>>
>>>> I'll check and correct the readl() and writel() in other places.
>>>
>>> What I meant is that if you want to replace them, you should provide
>>> performance numbers that show how much difference this makes
>>> and add comments in the source code explaining how you proved that
>>> the _relaxed() version is actually correct.
>>
>> Another clarification, as there are actually two independent
>> points here:
>>
>> * if you can completely remove the readl() above and just write a
>> hardcoded value into the register, or perhaps read the original
>> value once at boot time, that is probably a win because it
>> avoids one of the barriers in the beginning. The datasheet should
>> tell you if there are any bits in the register that have to be
>> preserved
>
> Code coupling will become very strong.
>
>>
>> * Regarding the _relaxed() accessors, it's a lot harder to know
>> whether that is safe, as you first have to show, in particular in case
>> any of the accesses stop being guarded by the spinlock in that
>> case, and whether there may be a case where you have to
>> serialize the memory access against accesses that are still in the
>> store queue or prefetched.
>>
>> Whether this matters at all depends mostly on the type of devices
>> you are driving on your SoC. If you have any high-speed network
>> interfaces that are unable to do cache coherent DMA, any extra
>> instruction here may impact the number of packets you can transfer,
>> but if all your high-speed devices are connected to a coherent
>> interconnect, I would just go with the obvious approach and use
>> the safe MMIO accessors everywhere.
>
> In fact, this driver has been running on an earlier version for several years
> and has not received any feedback about the performance issue. So I didn't
> try to optimize it when I first sent these patches. I had to reconsider it
> until you noticed it.
>
> How about keeping it unchanged for the moment? It'll take a lot of time and
> energy to retest.

In the spirit of code excellence, it's still necessary to optimize it.
Yesterday, my family urged me to go back, I wrote it in a hurry.

>
>>
>> Arnd
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>
>>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
> .
>