Re: [Question] UCE on pud-sized hugepage leads to kernel panic on lts5.10

From: HORIGUCHI NAOYA(堀口 直也)
Date: Wed Dec 14 2022 - 20:01:21 EST


On Wed, Dec 14, 2022 at 09:33:10AM +0800, mawupeng wrote:
> On the current arm64 stable 5.10 (v5.10.158), if a UCE happens on a
> pud-sized hugepage, the kernel will panic, because memory failure handling
> has not supported this kind of error since commit 31286a8484a8 ("mm:
> hwpoison: disable memory error handling on 1GB hugepage").
>
> The latest kernel (v6.0) can handle this UCE since commit 6f4614886baa ("mm,
> hwpoison: enable memory error handling on 1GB hugepage"). We are trying to
> backport this patchset to stable 5.10; however, too many other patches
> would need to be backported, since there are huge differences between 5.10
> and 6.0. The full patch list will be shown at the end of this mail.
>
> We do not think backporting all of these patches is doable for stable 5.10.
> Is there any better way to fix this problem?

Sorry, I have no idea about this. I think that backporting to stable kernels
is done only for small bug fixes, which is not the case for enabling the
handling of uncorrected errors on 1GB hugepages. So, as Greg commented, using
the latest (stable) kernel seems to me the second best.

Thanks,
Naoya Horiguchi