Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

From: Michal Hocko
Date: Wed Dec 06 2023 - 08:50:00 EST


On Wed 06-12-23 12:08:05, Philipp Rudo wrote:
> On Fri, 1 Dec 2023 17:59:02 +0100
> Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> > On Fri 01-12-23 16:51:13, Philipp Rudo wrote:
> > > On Fri, 1 Dec 2023 12:55:52 +0100
> > > Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > > On Fri 01-12-23 12:33:53, Philipp Rudo wrote:
> > > > [...]
> > > > > And yes, those are all what-if concerns but unfortunately that is all
> > > > > we have right now.
> > > >
> > > > Should theoretical concerns without actual evidence (e.g. multiple
> > > > drivers known to be broken) become a roadblock for this otherwise useful
> > > > feature?
> > >
> > > Those concerns aren't just theoretical. They come from experience with
> > > a related feature that regularly suffers from exactly the same problem,
> > > which wouldn't exist if everybody simply did things "properly".
> >
> > What is the related feature?
>
> kexec

OK, but that is a completely different thing, no? The crashkernel parameter
doesn't affect kexec. Or what is the actual relation?

> > > And yes, even purely theoretical concerns can become a roadblock for a
> > > feature when the cost of those theoretical concerns exceeds the benefit
> > > of the feature. The thing is that bugs will be reported against kexec.
> > > So _we_ need to figure out which of the shitty drivers caused the
> > > problem. That puts additional burden on _us_. What we are trying to
> > > evaluate at the moment is whether the benefit outweighs the extra burden,
> > > given the information currently available.
> >
> > I do understand your concerns! But I am pretty sure you realize that
> > it is really hard to argue about theoretical risks. Let me restate what
> > I consider facts. Hopefully we can agree on these points:
> > - The CMA region can be used for user space memory, which is a
> >   great advantage because the memory is not wasted, and our
> >   experience has shown that users care about this a lot. We
> >   _know_ that pressure to make those reservations smaller
> >   results in less reliable crash dumps and more resources spent
> >   on tuning and testing (especially after major upgrades). A
> >   larger reservation that is not completely wasted during
> >   normal runtime addresses that concern.
> > - There is no other known mechanism to make the crash kernel
> >   memory reusable, and so stop the waste, without a much more
> >   intrusive code/API impact (e.g. a separate zone or a
> >   dedicated interface to prevent any hazardous usage like RDMA).
> > - Implementation-wise, the patch has a very small footprint. It
> >   uses existing infrastructure (CMA) and adds only minimal
> >   hooks into the crashkernel configuration.
> > - The only risk identified so far is RDMA acting on this memory
> >   without using the proper pinning interface. If it helps to have a
> >   statement from RDMA maintainers/developers, then we can pull
> >   them in for further discussion, of course.
> > - The feature requires an explicit opt-in, so it doesn't bring
> >   any new risk to existing crash kernel users until they decide
> >   to use it. AFAIU there is no way to tell that the crash kernel
> >   memory used to be CMA-based in the primary kernel. If you
> >   believe that having that information available for
> >   debuggability would help, then I believe it shouldn't be hard
> >   to add. I think it would even make sense to mark this feature
> >   experimental to make it clear to users that it needs some
> >   time before it can be considered production ready.
> >
> > I hope I haven't really missed anything important. The final
>
> If I understand Documentation/core-api/pin_user_pages.rst correctly, you
> missed case 1, Direct IO. In that case "short term" DMA is allowed for
> pages pinned without FOLL_LONGTERM. That means there is a way to corrupt
> the CMA area, and with it the crash kernel, after the production kernel
> has panicked.

Could you expand on this? How exactly does a direct IO request survive
into the kdump kernel? I do understand the RDMA case, because the IO is
async and outside the control of the receiving end.

Also, if direct IO is a problem, how come it is not a problem for kexec
in general? The new kernel usually shares all the memory with the first
kernel.

/me confused.
--
Michal Hocko
SUSE Labs