Re: Regression: memory corruption on Atmel SAMA5D31
From: Peter Rosin
Date: Thu Mar 03 2022 - 04:17:45 EST
On 2022-03-03 04:02, Saravana Kannan wrote:
> On Wed, Mar 2, 2022 at 4:29 PM Peter Rosin <peda@xxxxxxxxxx> wrote:
>>
>> Hi!
>>
>> I'm seeing a weird problem, and I'd like some help with further
>> things to try in order to track down what's going on. I have
>> bisected the issue to
>>
>> f9aa460672c9 ("driver core: Refactor fw_devlink feature")
>
> I skimmed through your email and I'll read it more closely tomorrow,
> but it wasn't clear if you see this on Linus's tip of the tree too.
> Asking because of:
> https://lore.kernel.org/lkml/20210930085714.2057460-1-yangyingliang@xxxxxxxxxx/
>
> Also, a couple of other data points that _might_ help. Try kernel
> command line option fw_devlink=permissive vs fw_devlink=on (I forget
> if this was the default by 5.10) vs fw_devlink=off.
>
> I'm expecting "off" to fix the issue for you. But if permissive vs on
> shows a difference driver issues would start becoming a real
> possibility.
>
> -Saravana
Thanks for the quick reply! I don't think I tested the very tip of
Linus' tree before, only the latest rc or something like that, but
now I have, i.e.
5859a2b19911 ("Merge branch 'ucount-rlimit-fixes-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
It would have been typical if an issue that had existed for a couple
of years had been fixed in the last few weeks, but alas, no.
On that kernel, and with whatever the default fw_devlink value is, the
issue is there. It's a bit hard to tell whether the failure rate is
the same across the different fw_devlink arguments, but it's roughly
so, and I do not have to wait long to get a bad hash with the first
reproducer
while :; do cat testfile | sha256sum; done
The output is typical:
78464c59faa203413aceb5f75de85bbf4cde64f21b2d0449a2d72cd2aadac2a3 -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
e03c5524ac6d16622b6c43f917aae730bc0793643f461253c4646b860c1a7215 -
1b8db6218f481cb8e4316c26118918359e764cc2c29393fd9ef4f2730274bb00 -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
7d60bf848911d3b919d26941be33c928c666e9e5666f392d905af2d62d400570 -
212e1fe02c24134857ffb098f1834a2d87c655e0e5b9e08d4929f49a070be97c -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
7e33e751eb99a0f63b4f7d64b0a24f3306ffaf7c4bc4b27b82e5886c8ea31bc3 -
d7a1f08aa9d0374d46d828fc3582f5927e076ff229b38c28089007cd0599c645 -
4fc963b7c7b14df9d669500f7c062bf378ff2751f705bb91eecd20d2f896f6fe -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
9360d886046c12d983b8bc73dd22302c57b0aafe58215700604fa977b4715fbe -
4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d -
Setting fw_devlink=off makes no difference, AFAICT.
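To put a number on "roughly the same" when comparing fw_devlink
settings, a counting variant of the reproducer might help. This is
only a sketch: count_bad_hashes is a made-up helper name, and it
assumes the hash that dominates the output above (4f9173f6...) is the
correct one for testfile. It keeps the cat-through-a-pipe form of the
original reproducer on purpose.

```shell
#!/bin/sh
# Hypothetical helper, not part of any existing tool.
# Usage: count_bad_hashes FILE RUNS KNOWN_GOOD_SHA256
# Hashes FILE through a pipe RUNS times and reports how many of the
# resulting hashes differ from KNOWN_GOOD_SHA256.
count_bad_hashes() {
    file=$1
    runs=$2
    good=$3
    bad=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Same data path as the reproducer: cat into sha256sum.
        sum=$(cat "$file" | sha256sum | cut -d' ' -f1)
        if [ "$sum" != "$good" ]; then
            bad=$((bad + 1))
        fi
        i=$((i + 1))
    done
    echo "$runs runs, $bad bad"
}
```

Something like

  count_bad_hashes testfile 350 4f9173f63cb2e13d1470e59e1b5c657f3b0f4f2e9a55ab6facffbb03f34ce04d

run once per fw_devlink setting would then give counts that can be
compared directly.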
So, just to double-check I went back to 5.11.22 with the two
mentioned patches reverted [1], plus an added backport of
c73960bb0a43 ("gpiolib: allow line names from device props to override driver names")
in order to make userspace behave as similarly as possible.
I left that running for an hour or so, with 350-ish hashes
calculated correctly. That is no proof that there is no latent
issue, of course, but it is at the very least a great deal more
stable than later kernels.
Cheers,
Peter
[1]
f9aa460672c9 ("driver core: Refactor fw_devlink feature")
2d09e6eb4a6f ("driver core: Delete pointless parameter in fwnode_operations.add_links")