Re: [PATCH v18 04/19] EDAC: Add memory repair control feature
From: Mauro Carvalho Chehab
Date: Wed Jan 15 2025 - 06:35:55 EST
On Tue, 14 Jan 2025 11:35:21 -0800
Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> > There is no way to tell that the topology hasn't changed.
> > For the reasons above, I don't think we care. Instead of trying to stop
> > userspace repairing the wrong memory, make sure it is safe for it to do that.
> > (The kernel is rarely in the business of preventing the slightly stupid)
>
> If the policy is "error records with SPA from the current boot can be
> trusted" and "userspace requests outside of current boot error records
> must only be submitted to known offline" then I think we are aligned.
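The two-rule policy above could be expressed roughly as the check below. This is only a sketch: `struct repair_req`, its fields and `repair_req_allowed()` are hypothetical names for illustration, not existing EDAC or CXL kernel code, and a "boot ID" of 0 is assumed to mean "no matching error record".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a userspace repair request (not a real kernel struct). */
struct repair_req {
	uint64_t record_boot_id; /* boot ID of the error record behind the request, 0 if none */
	bool target_offline;     /* true if the target memory is known to be offline */
};

/* Returns true if the request may proceed under the two-rule policy. */
static bool repair_req_allowed(const struct repair_req *req, uint64_t current_boot_id)
{
	/* Rule 1: error records (with SPA) from the current boot are trusted. */
	if (req->record_boot_id != 0 && req->record_boot_id == current_boot_id)
		return true;

	/* Rule 2: any other request must target memory known to be offline. */
	return req->target_offline;
}
```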
Sure, userspace cannot infer whether past errors at a given SPA refer to the
same DPA block, but it may still decide between soft and hard PPR based on
different criteria adopted by the machine admins, or use memory sparing instead.
So, yes, sanity checks at the kernel level to identify the "trust" level, based
on whether DPA data is available, make sense, but the final decision about
which action to take should be made in userspace.
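As an illustration of that split, a userspace tool could take the trust level reported by the kernel and apply an admin-defined policy to pick an action. The enums, `struct admin_policy` and the threshold rule below are all hypothetical, just one possible set of criteria; soft PPR is assumed to be non-persistent and hard PPR permanent.

```c
#include <stdbool.h>

/* Hypothetical trust levels, as the kernel might report them. */
enum trust_level { TRUST_NONE, TRUST_SPA_ONLY, TRUST_DPA };

/* Possible repair actions chosen by userspace. */
enum repair_action { ACTION_NONE, ACTION_SOFT_PPR, ACTION_HARD_PPR, ACTION_SPARING };

/* Hypothetical admin-defined policy knobs. */
struct admin_policy {
	bool prefer_sparing;             /* use memory sparing instead of PPR */
	unsigned int hard_ppr_threshold; /* errors seen before a permanent repair */
};

static enum repair_action choose_action(enum trust_level trust, unsigned int error_count,
					const struct admin_policy *pol)
{
	/* No usable location information: take no action. */
	if (trust == TRUST_NONE)
		return ACTION_NONE;

	/* The admin prefers sparing over PPR. */
	if (pol->prefer_sparing)
		return ACTION_SPARING;

	/* Permanent (hard) PPR only when DPA data was available and the threshold is met. */
	if (trust == TRUST_DPA && error_count >= pol->hard_ppr_threshold)
		return ACTION_HARD_PPR;

	/* Otherwise fall back to a non-persistent (soft) PPR. */
	return ACTION_SOFT_PPR;
}
```

Keeping these criteria in userspace means each site can tune them without kernel changes.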
Thanks,
Mauro