Re: Banana Pi-R1 stabil

From: Gerhard Wiesinger
Date: Tue Mar 05 2019 - 14:21:33 EST


On 05.03.2019 10:28, Maxime Ripard wrote:
On Sat, Mar 02, 2019 at 09:42:08AM +0100, Gerhard Wiesinger wrote:
On 01.03.2019 10:30, Maxime Ripard wrote:
On Thu, Feb 28, 2019 at 08:41:53PM +0100, Gerhard Wiesinger wrote:
On 28.02.2019 10:35, Maxime Ripard wrote:
On Wed, Feb 27, 2019 at 07:58:14PM +0100, Gerhard Wiesinger wrote:
On 27.02.2019 10:20, Maxime Ripard wrote:
On Sun, Feb 24, 2019 at 09:04:57AM +0100, Gerhard Wiesinger wrote:
Hello,

I have 3 Banana Pi R1 boards: one runs a self-compiled kernel
4.7.4-200.BPiR1.fc24.armv7hl on old Fedora 25 and is VERY STABLE; the 2
others run the latest Fedora 29 with kernel 4.20.10-200.fc29.armv7hl. I
tried many kernels from around 4.11
(kernel-4.11.10-200.fc25.armv7hl) up to 4.20.10, but all of them either
crashed without any output on the serial console or hit kernel panics after
a short time (minutes, hours, at most days).

Latest known working and stable self compiled kernel: kernel
4.7.4-200.BPiR1.fc24.armv7hl:

https://www.wiesinger.com/opensource/fedora/kernel/BananaPi-R1/

With 4.8.x the DSA b53 switch infrastructure was introduced, which
didn't work (until ca8931948344c485569b04821d1f6bcebccd376b and kernel
4.18.x):

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/dsa/b53?h=v4.20.12

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/drivers/net/dsa/b53?h=v4.20.12

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/net/dsa/b53?h=v4.20.12&id=ca8931948344c485569b04821d1f6bcebccd376b

It has been fixed with kernel 4.18.x:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/drivers/net/dsa/b53?h=linux-4.18.y


So the current status is that the kernel crashes regularly; see some samples
below. It is typically an "Unable to handle kernel paging request at virtual addres"

Another interesting thing: a Banana Pro (which also has an Allwinner A20
in the same revision) works well running the same Fedora 29 and latest
kernels (e.g. kernel 4.20.10-200.fc29.armv7hl).

Since it happens on 2 different devices with different power supplies
(all with enough power), and the same type of device works well with the
old kernel, a hardware issue is very unlikely.

I guess it has something to do with virtual memory.

Any ideas?
[47322.960193] Unable to handle kernel paging request at virtual addres 5675d0
That line is a bit suspicious

Anyway, cpufreq is known to cause this kind of error when the
voltage / frequency association is not correct.

Given the stack trace and that the BananaPro doesn't have cpufreq
enabled, my first guess would be that it's what's happening. Could you
try using the performance governor and see if it's more stable?

If it is, then using this:
https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test

will help you find the offending voltage-frequency couple.
To me it looks like they all have the same config regarding the CPU governor
(Banana Pro, old kernel stable one, new kernel unstable ones)
The Banana Pro doesn't have a regulator set up, so it will only change
the frequency, not the voltage.

They all have the ondemand governor set:

I set the following on the 2 unstable "new kernel" Banana Pi R1 boards:

# Set to max performance
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
What are the results?
Stable for more than 1.5 days now. Normally they would have crashed well
within such a long uptime. So it looks like the performance governor fixes it.

I guess the crashes occur because CPU voltage and clock changes lead to
invalid data (e.g. invalid RAM contents might be read, register
problems, etc.).

Any ideas how to fix it for ondemand mode, too?
Run https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test

But it doesn't explain why it works with kernel 4.7.4 without any
problems.
My best guess would be that cpufreq wasn't enabled at that time, or
was enabled without voltage scaling.

Where can I see the voltage scaling parameters?

In the DTS I don't see any difference between kernel 4.7.4 and 4.20.10 regarding
voltage:

dtc -I dtb -O dts \
    -o /boot/dtb-4.20.10-200.fc29.armv7hl/sun7i-a20-lamobo-r1.dts \
    /boot/dtb-4.20.10-200.fc29.armv7hl/sun7i-a20-lamobo-r1.dtb
This can also be due to configuration being changed, driver support, etc.

Where exactly are the voltages for scaling set (drivers, etc.)?
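
On boards driven by the generic device-tree cpufreq driver, the
frequency/voltage pairs normally come from the CPU node of the device tree.
As a minimal sketch, assuming the decompiled DTS carries a legacy
"operating-points" property made of <kHz microvolt> pairs on a single line
(the helper name decode_opp is hypothetical, and a tree that uses
operating-points-v2 tables would need different parsing), the pairs can be
printed in decimal like this:

```shell
#!/bin/sh
# Print the (kHz, microvolt) pairs of a legacy "operating-points" property
# found in a decompiled device tree read from stdin.
# Assumption: the property sits on one line, as dtc emits it, and holds
# <freq-kHz voltage-uV> pairs; values may be hex (0x...) or decimal.
decode_opp() {
    grep -m1 'operating-points *=' |
    sed -e 's/.*<//' -e 's/>.*//' |
    tr -s ' \t' '\n\n' |
    while read -r khz; do
        # Skip an empty first field left over from leading whitespace.
        [ -n "$khz" ] || continue
        # Each frequency must be followed by its voltage.
        read -r uv || break
        printf '%d kHz  %d uV\n' "$khz" "$uv"
    done
}
```

Usage, with the DTB path from the dtc call above:
dtc -I dtb -O dts /boot/dtb-4.20.10-200.fc29.armv7hl/sun7i-a20-lamobo-r1.dtb | decode_opp
Running it against the DTBs of both kernels and diffing the two outputs
would show whether the operating points changed between 4.7.4 and 4.20.10.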



There is another strange thing (tested with
kernel-5.0.0-0.rc8.git1.1.fc31.armv7hl, kernel-4.19.8-300.fc29.armv7hl,
kernel-4.20.13-200.fc29.armv7hl, kernel-4.20.10-200.fc29.armv7hl):

There is ALWAYS high CPU usage of around 10% in kworker:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
18722 root      20   0       0      0      0 I   9.5   0.0 0:47.52 [kworker/1:3-events_freezable_power_]

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
  776 root      20   0       0      0      0 I   8.6   0.0 0:02.77 [kworker/0:4-events]
The first one looks like it's part of the workqueue code.


Any idea what the reason for that might be?



Therefore the CPU doesn't switch to low frequencies (see below).
You said previously that those crashes were happening when the board
was changing frequency, so I'm confused?


For the ondemand setting: due to the high load from kworker, the frequency does not often drop to lower values (but it does sometimes, and the board also crashes regularly).

For the performance setting: the frequency is fixed (to the maximum in the current configuration) and the board is stable.
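
How often the lower frequencies are actually reached can be checked from
the cpufreq statistics. A minimal sketch, assuming the kernel was built with
cpufreq statistics (CONFIG_CPU_FREQ_STAT) so that a stats/time_in_state file
with "<kHz> <ticks>" lines exists; the helper name freq_residency is
hypothetical:

```shell
#!/bin/sh
# Summarise a cpufreq time_in_state table (lines of "<kHz> <ticks>") read
# from stdin as the share of time spent at each frequency.
freq_residency() {
    awk '{ khz[NR] = $1; ticks[NR] = $2; total += $2 }
         END {
             for (i = 1; i <= NR; i++)
                 printf "%d kHz %.1f%%\n", khz[i],
                        (total ? 100 * ticks[i] / total : 0)
         }'
}
```

Usage:
freq_residency < /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
If the low operating points show almost no residency under ondemand, that
would fit the picture of rare drops to low frequencies.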



Any ideas?
Run the cpustress program I told you to use already twice.

Had no time to try it yet. Will do. See also my comment below regarding idle CPU and high CPU.



BTW: Still stable after about 2.5 days on both devices. So the solution IS the
performance governor.
No, the performance governor prevents any change in frequency. My
guess is that a lower frequency operating point is not working and is
crashing the CPU.


Yes, there might be at least 2 scenarios:

1.) Frequency switching itself is the problem.

2.) Lower frequency/voltage operating points are not stable.

For both scenarios: the crash might happen on an idle CPU, under high CPU load, or just randomly. Therefore just "waiting" might be a better test than 100% CPU utilization. But I will also test with 100% CPU.

Therefore it would be good to see where the voltages for the different SoC frequencies are defined (to compare).


I'm currently testing 2 different settings on the 2 new Banana Pi R1 boards with the newest kernel (see below), i.e. 2 static frequencies:

# Set to specific frequency 144000 (currently testing on Banana Pi R1 #1)

# Set to specific frequency 312000 (currently testing on Banana Pi R1 #2)

If that's fine, I'll also test further frequencies (with different loads).

Thnx.

Ciao,

Gerhard


# Set to max performance (stable)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "144000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "144000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to ondemand (not stable)
echo "ondemand" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "ondemand" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "144000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "144000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to specific frequency 144000 (currently testing on Banana Pi R1 #1)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "144000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "144000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "144000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "144000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to specific frequency 312000 (currently testing on Banana Pi R1 #2)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "312000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "312000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "312000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "312000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to specific frequency 528000 (untested)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "528000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "528000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "528000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "528000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to specific frequency 720000 (untested)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "720000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "720000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "720000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "720000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to specific frequency 864000 (untested)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "864000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "864000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "864000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "864000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to specific frequency 912000 (untested)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "912000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "912000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "912000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "912000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to specific frequency 960000 (untested)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "960000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "960000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
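
The per-frequency blocks above all follow one pattern, so they can be folded
into a helper. A minimal sketch, assuming a POSIX shell; set_cpufreq and the
CPUFREQ_ROOT override are hypothetical names introduced here for
illustration. It writes scaling_min_freq once before and once after
scaling_max_freq, on the assumption that the kernel may reject a minimum
above the current maximum (or a maximum below the current minimum) when
moving from one pinned frequency to another:

```shell
#!/bin/sh
# Apply one governor and one min/max frequency pair to every CPU that has a
# cpufreq directory. CPUFREQ_ROOT is a hypothetical override for dry runs
# against a scratch directory; it defaults to the real sysfs location.
set_cpufreq() {
    governor=$1
    min_khz=$2
    max_khz=$3
    root=${CPUFREQ_ROOT:-/sys/devices/system/cpu}
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        echo "$governor" > "$dir/scaling_governor"
        # First attempt may fail if min_khz is above the current maximum.
        echo "$min_khz" > "$dir/scaling_min_freq" 2>/dev/null || true
        echo "$max_khz" > "$dir/scaling_max_freq"
        # Second write succeeds once the maximum has been adjusted.
        echo "$min_khz" > "$dir/scaling_min_freq"
    done
}
```

Usage (as root): "set_cpufreq performance 144000 144000" pins all CPUs to
144 MHz, and "set_cpufreq ondemand 144000 960000" restores the full range
with the ondemand governor.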