Re: [PATCH 0/4] kdump: crashkernel reservation from CMA
From: Baoquan He
Date: Mon Nov 27 2023 - 21:11:42 EST
On 11/28/23 at 09:12am, Tao Liu wrote:
> Hi Jiri,
>
> On Sun, Nov 26, 2023 at 5:22 AM Jiri Bohac <jbohac@xxxxxxx> wrote:
> >
> > Hi Tao,
> >
> > On Sat, Nov 25, 2023 at 09:51:54AM +0800, Tao Liu wrote:
> > > Thanks for the idea of using CMA as part of memory for the 2nd kernel.
> > > However I have a question:
> > >
> > > What if there is on-going DMA/RDMA access on the CMA range when 1st
> > > kernel crash? There might be data corruption when 2nd kernel and
> > > DMA/RDMA write to the same place, how to address such an issue?
> >
> > The crash kernel CMA area(s) registered via
> > cma_declare_contiguous() are distinct from the
> > dma_contiguous_default_area or device-specific CMA areas that
> > dma_alloc_contiguous() would use to reserve memory for DMA.
> >
> > Kernel pages will not be allocated from the crash kernel CMA
> > area(s), because they are not GFP_MOVABLE. The CMA area will only
> > be used for user pages.
> >
> > User pages for RDMA, should be pinned with FOLL_LONGTERM and that
> > would migrate them away from the CMA area.
> >
> > But you're right that DMA to user pages pinned without
> > FOLL_LONGTERM would still be possible. Would this be a problem in
> > practice? Do you see any way around it?
Thanks for the effort to bring this up, Jiri.

I am wondering how you will use this crashkernel=,cma parameter, I mean
the scenario for crashkernel=,cma. I am asking because I don't know how
SUSE deploys kdump in SUSE distros. In SUSE distros, is the kdump kernel's
initramfs the same as the 1st kernel's, or does it only contain the
kernel modules for the devices needed for dumping? E.g. if we dump to
local disk, will the NIC driver be filtered out? In the latter case, it's
possible to hit the in-flight DMA issue, e.g. the NIC has a DMA buffer in
the CMA area, but is not reset during kdump bootup because the NIC driver
is not loaded to initialize it. I'm not 100% sure about this, but is it
possible in theory?

Recently we have been seeing an issue on an HPE system where PCI error
messages are always seen in the kdump kernel. It's a local dump, so the
NIC device is not needed and the igb driver is not loaded. Adding the igb
driver to the kdump initramfs works around it. It's similar to the
in-flight DMA issue above.

crashkernel=,cma requires that no userspace data be dumped. From our
support engineers' feedback, customers have never said that they don't
need to dump user space data. Assume a server with a huge database
deployed, where the database has often crashed recently and the database
provider claims that it's not the database's fault, so the OS needs to
prove its innocence. What will you do?

So this looks like a nice-to-have to me. At least in Fedora/RHEL's
usage, we may only backport this patch and add one sentence to our
user guide saying "there's a crashkernel=,cma added, can be used with
crashkernel= to save memory. Please feel free to try if you like".
That is, unless SUSE or other distros decide to use it as the default
config or something like that. Please correct me if I missed anything or
got anything wrong.

Thanks
Baoquan