Re: [PATCH] arm64: dts: renesas: r8a7796: Add CMT device nodes

From: Marc Zyngier
Date: Thu Nov 22 2018 - 10:56:01 EST


On 22/11/2018 15:30, Daniel Lezcano wrote:
>
> Added Marc Zyngier in Cc.

Thanks for looping me in.

>
> On 22/11/2018 16:16, Biju Das wrote:
>> Hello Daniel,
>>
>> Thanks for the feedback.
>>
>>>>> Subject: Re: [PATCH] arm64: dts: renesas: r8a7796: Add CMT device
>>>>> nodes
>>>>>
>>>>> On 19/11/2018 16:50, Biju Das wrote:
>>>>>> Hi Daniel,
>>>>>>
>>>>>> Thanks for the feedback.
>>>>>>
>>>>>>>>> Subject: Re: [PATCH] arm64: dts: renesas: r8a7796: Add CMT device
>>>>>>>>> nodes
>>>>>>>>>
>>>>>>>>> On 26/10/2018 10:25, Biju Das wrote:
>>>>>>>>>> This patch adds CMT{0|1|2|3} device nodes for r8a7796 SoC.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Biju Das <biju.das@xxxxxxxxxxxxxx>
>>>>>>>>>> ---
>>>>>>>>>> This patch is tested against renesas-dev
>>>>>>>>>>
>>>>>>>>>> I have executed on inconsistency-check, nanosleep and
>>>>>>>>>> clocksource_switch selftests on this arm64 SoC. The
>>>>>>>>>> inconsistency-check and nanosleep tests are working fine.The
>>>>>>>>>> clocksource_switch asynchronous test is failing due to
>>>>>>>>>> inconsistency-check
>>>>>>>>> failure on "arch_sys_counter".
>>>>>>>>>>
>>>>>>>>>> But if i skip the clocksource_switching of "arch_sys_counter",
>>>>>>>>>> the asynchronous test is passing for CMT0/1/2/3 timer.
>>>>>>>>>>
>>>>>>>>>> Has any one noticed this issue?
>>>>>>>>>
>>>>>>>>> So now that you mention that, I've been through the
>>>>>>>>> clocksource_switch on another ARM64 platform (hikey960) and
>>>>>>>>> disabled the
>>>>>>>>> ARM64_ERRATUM_858921 config option. I can see the same issue.
>>>>>>>>>
>>>>>>>>> Is this option set on your config ?
>>>>>>>>
>>>>>>>> No. As per " config ARM64_ERRATUM_858921", it is "Workaround for
>>>>>>> Cortex-A73 erratum 858921"
>>>>>>>>
>>>>>>>> Our SoC is 2xCA-57 + 4 x CA-53. Does it impact CA-57 + CA_53?

Erratum 858921 is strictly an A73 thing, and neither A57 nor A53 suffer
from this issue. If you're seeing something that looks similar, that's
probably because the integration of the global counter is less than
perfect (insert massive understatement here), similar to the way it is
broken with FSL's A008585 erratum.

>>>>>>>
>>>>>>> Dunno :/
>>>>>>>
>>>>>>>> Any way I will enable this config option and will provide you the
>>> results.
>>>>>>>
>>>>>>> Ok, thanks!
>>>>>>
>>>>>> The following config is enabled by default on upstream
>>>>>> kernel(4.20-rc3) CONFIG_ARM_ARCH_TIMER_EVTSTREAM=y
>>>>>> CONFIG_ARM_ARCH_TIMER_OOL_WORKAROUND=y
>>>>>> CONFIG_FSL_ERRATUM_A008585=y
>>>>>> CONFIG_HISILICON_ERRATUM_161010101=y
>>>>>> CONFIG_ARM64_ERRATUM_858921=y
>>>>>>
>>>>>> For a quick testing, I have activated the erratum using the
>>>>>> property
>>>>> "fsl,erratum-a008585" on device tree.
>>>>>> With this I confirm the issue is fixed.
>>>>>>
>>>>>> I have some questions on this.
>>>>>> 1) Based on the test result ,do you think renesas soc also
>>>>>> impacted by the
>>>>> ARM64_ERRATUM_858921?
>>>>>> 2) Is there any way to find, is this Erratum actually causing the
>>>>> asynchronous test to fail?
>>>>>
>>>>> I guess, you can hack the __fsl_a008585_read_reg macro and check if
>>>>> the invalid condition is reached.
>>>>>
>>>>> This thread https://lkml.org/lkml/2018/5/10/773 will give you all the
>>>>> answers you are looking for (well very likely).
>>>>>
>>>>> Let me know if it helped.
>>>>
>>>> In our case , Delta: 174760 ns
>>>>
>>>> 1530553351:205762284
>>>> 1530553351:205762404
>>>> --------------------
>>>> 1530553351:205951226
>>>> 1530553351:205776466
>>>> --------------------
>>>>
>>>> I have tried the workaround for ARM64_ERRATUM_858921, that also fixes
>>> the issue.
>>>>
>>>> But all the workaround disables ARM64 VDSO. How do we conclude that is
>>> it VDSO issue or ARM64_ERRATUM issue?

The VDSO is just a userspace proxy for the counter, and nothing else. If
the counter is broken, we have no choice but to disable the VDSO and get
everyone on the slow path.

>>>
>>> May be disable all errata and set vdso_default to false?
>>>
>>> [ ... ]
>>>
>>> -static bool vdso_default = true;
>>> +static bool vdso_default = false;
>>>
>>> [ ... ]
>>
>> I have disabled the activation of errata from device tree and set vdso_default=false.
>> With this also it works fine. So looks like arm64 vdso is the issue in our case.

That doesn't make much sense, as all it means is that you take an extra
trap to access the exact same data.

>>
>> Do you agree with the conclusion?
>
> I agree we have an element to investigate the issue. That is very
> specific to this timer and a good knowledge of the internals is required.
>
> Certainly Marc Zyngier or Mark Rutland can help here.
>
>> [ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=6
>> [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
>> [ 0.000000] arch_timer: cp15 timer(s) running at 8.32MHz (virt).
>> [ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x1eb398c07, max_idle_ns: 440795202503 ns
>> [ 0.000004] sched_clock: 56 bits at 8MHz, resolution 120ns, wraps every 2199023255503ns

Now that is a bit more interesting: How is this kernel booted? The fact
that is uses the virtual timer and not the physical one makes me think
that it is booted at EL1 instead of EL2. The Â100 question is whether
CNTVOFF_EL2 is consistently set to the same value across CPUs.

Thanks,

M.
--
Jazz is not dead. It just smells funny...