Re: Orange PI 5 MAX: very unstable using kernel 6.19.0 and 6.18.10, 6.18.9 perfectly stable
From: Qu Wenruo
Date: Thu Feb 12 2026 - 17:48:34 EST
On 2026/2/13 08:53, David Arendt wrote:
On 2/12/26 10:05 PM, Qu Wenruo wrote:
On 2026/2/13 06:41, David Arendt wrote:
Hello,
I am running a Kubernetes cluster with 3 Orange PI5 MAX nodes. The data is stored on a btrfs filesystem as the backend. With kernel 6.19.0 or 6.18.10 I have experienced many crashes under high IO load on all 3 nodes. Reverting to 6.18.9 solves the problems completely. Unfortunately the crashes are spontaneous reboots that leave no trace in any logfile, so I have no stack trace of them. After the crashes I sometimes have incorrect btrfs csums for a file, but these may also be the result of a partial write due to the crash. On one node I had a btrfs error logged without a crash, but I am not sure whether this is the root cause or the result of a prior crash. A scrub after reboot returned no errors with 6.19.0.
The offending tree dump items are:
Feb 10 13:31:07 opi02 kernel: item 92 key (13218356101120
Feb 10 13:31:07 opi02 kernel: item 93 key (13216208642048
Feb 10 13:31:07 opi02 kernel: item 94 key (13218356162560
Obviously the key of item 93 is smaller than the keys of both the previous and the next item.
hex(13218356101120) = 0xc05a36b8000
hex(13216208642048) = 0xc05236be000
hex(13218356162560) = 0xc05a36c7000
It looks like something flipped, "0xc05a3" -> "0xc0523".
0xa -> 0x2 is exactly one bit flipped.
So either the memory hardware has a fault resulting in a stuck bit (always 0), or something inside the kernel is touching memory it shouldn't.
And this matches the symptoms exactly: if random bits in kernel memory are being changed, crashes are to be expected.
Can you run a memtest first to make sure it is not a hardware problem?
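The single-bit-flip reading of the tree dump above can be checked with a short script. This is only a sketch, not part of the original analysis: it uses just the first number of each key as shown in the (truncated) dump, and the helper name single_bit_repairs is made up for illustration. It tries every single-bit flip of the item 93 value and keeps those that would restore the ordering between items 92 and 94.

```python
# First key element of each item, copied from the tree dump in this thread.
# The rest of each key is truncated in the dump and is not used here.
prev_key = 13218356101120  # item 92
bad_key = 13216208642048   # item 93 (out of order)
next_key = 13218356162560  # item 94


def single_bit_repairs(bad, lo, hi, width=64):
    """Hypothetical helper: list (bit, candidate) pairs where flipping
    exactly one bit of `bad` gives a value strictly between lo and hi."""
    repairs = []
    for bit in range(width):
        candidate = bad ^ (1 << bit)
        if lo < candidate < hi:
            repairs.append((bit, candidate))
    return repairs


repairs = single_bit_repairs(bad_key, prev_key, next_key)
for bit, candidate in repairs:
    print(f"bit {bit}: {hex(bad_key)} -> {hex(candidate)}")
# prints: bit 31: 0xc05236be000 -> 0xc05a36be000
```

Exactly one single-bit flip (bit 31, cleared in the bad value) puts item 93 back in order, which is consistent with a 1 -> 0 flip. It does not say whether the cause is hardware or software.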
Hello,
I don't know of anything like memtest86 for the arm64 platform that tests the whole memory, so I used the user space memtester to check the 14G of unused RAM on all 3 machines while running kernel 6.18.10.
Here is the result of the first iteration (same on every machine):
memtester version 4.7.1 (64-bit)
Copyright (C) 2001-2024 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 14000MB (14680064000 bytes)
got 14000MB (14680064000 bytes), trying mlock ...locked.
Loop 1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
I don't think it is a hardware failure, as it is happening on 3 different machines. Crashes occur after somewhere between 30 minutes and 12 hours on all 3 machines. These machines have been running without a single crash for more than a year with older kernel versions, including 4 days with 6.18.9 and all versions from 6.18.0 to 6.18.9, so it seems to be caused by something that changed between 6.18.9 and 6.18.10.
Then I'm afraid you have to try bisecting.
On the other hand, I also have an arm64 board (Orion O6) as a VM host.
The testing arm64 VM is running a kernel very close to v6.19.0, but has never hit such a crash/corruption.
So I'm wondering whether some driver specific to RK3588 is randomly corrupting memory and causing the problem.
In the past (several years ago), we had the amd sfh driver causing random corruptions on x86_64, which led to exactly the same problem (random crashes, btrfs corruption detected, etc.).
So I guess it can be the same situation.
Thanks,
Qu
Thanks,
David Arendt
Thanks,
Qu
Unfortunately I don't have more information at the moment.
Thanks in advance,
David Arendt