Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

From: David Hildenbrand
Date: Wed Dec 06 2023 - 06:23:35 EST


On 06.12.23 12:08, Philipp Rudo wrote:
On Fri, 1 Dec 2023 17:59:02 +0100
Michal Hocko <mhocko@xxxxxxxx> wrote:

On Fri 01-12-23 16:51:13, Philipp Rudo wrote:
On Fri, 1 Dec 2023 12:55:52 +0100
Michal Hocko <mhocko@xxxxxxxx> wrote:
On Fri 01-12-23 12:33:53, Philipp Rudo wrote:
[...]
And yes, those are all what-if concerns but unfortunately that is all
we have right now.

Should theoretical concerns without actual evidence (e.g. multiple
drivers known to be broken) become a roadblock for this otherwise useful
feature?

Those concerns aren't just theoretical. They are experiences we have
from a related feature that suffers exactly the same problem regularly
which wouldn't exist if everybody would simply work "properly".

What is the related feature?

kexec

And yes, even purely theoretical concerns can become a roadblock for a
feature when the cost of those theoretical concerns exceeds the benefit
of the feature. The thing is that bugs will be reported against kexec.
So _we_ need to figure out which of the shitty drivers caused the
problem. That puts additional burden on _us_. What we are trying to
evaluate at the moment is if the benefit outweighs the extra burden
with the information we have at the moment.

I do understand your concerns! But I am pretty sure you do realize that
it is really hard to argue theoreticals. Let me restate what I consider
facts. Hopefully we can agree on these points:
- The CMA region can be used by user space memory, which is a
  great advantage because the memory is not wasted, and our
  experience has shown that users do care about this a lot. We
  _know_ that pressure on making those reservations smaller
  results in a less reliable crashdump and more resources spent
  on tuning and testing (especially after major upgrades). A
  larger reservation which is not completely wasted for the
  normal runtime addresses that concern.
- There is no other known mechanism to achieve the reusability
  of the crash kernel memory to stop the wastage without much
  more intrusive code/API impact (e.g. a separate zone or a
  dedicated interface to prevent any hazardous usage like RDMA).
- Implementation-wise the patch has a very small footprint. It
  uses existing infrastructure (CMA) and adds minimal hooking
  into the crashkernel configuration.
- The only identified risk so far is RDMA acting on this memory
  without using the proper pinning interface. If it helps to have
  a statement from RDMA maintainers/developers then we can pull
  them in for a further discussion of course.
- The feature requires an explicit opt-in, so this doesn't bring
  any new risk to existing crash kernel users until they decide
  to use it. AFAIU there is no way to tell that the crash kernel
  memory used to be CMA based in the primary kernel. If you
  believe that having that information available for
  debuggability would help then I believe this shouldn't be hard
  to add. I think it would even make sense to mark this feature
  experimental to make it clear to users that this needs some
  time before it can be marked production ready.

I hope I haven't really missed anything important. The final

If I understand Documentation/core-api/pin_user_pages.rst correctly you
missed case 1 (Direct IO). In that case "short term" DMA is allowed for
pages without FOLL_LONGTERM. Meaning that there is a way you can
corrupt the CMA and with that the crash kernel after the production
kernel has panicked.

With that I don't see a chance this series can be included unless
someone can explain to me that the documentation is wrong or that I
understood it wrong.

I think you are right. We'd have to disallow any FOLL_PIN on these CMA pages, or find other ways of handling that (detect that there are no short-term pins any more).

But, I'm also wondering how MMU-notifier-based approaches might interfere, where CMA pages might be transparently mapped into secondary MMUs, possibly having DMA going on.

Are we sure that all these secondary MMUs are inactive as soon as we kexec?
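
To make the short-term pin concern concrete, here is a toy userspace model in C. It is only a sketch under loud assumptions: toy_page, toy_pin(), toy_unpin() and toy_cma_has_inflight_dma() are made-up names, not kernel interfaces, and "migration" is reduced to flipping a field. The only behaviour borrowed from the kernel is that a FOLL_LONGTERM pin moves the page out of CMA first, while a pin without FOLL_LONGTERM is taken in place.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_region { TOY_NORMAL, TOY_CMA };

struct toy_page {
	enum toy_region region;	/* where the page currently lives */
	int pincount;		/* outstanding DMA pins on this page */
};

/*
 * Model a pin request. A long-term pin first moves the page out of
 * CMA (stand-in for the migration done for FOLL_LONGTERM); a
 * short-term pin is taken on the page wherever it is.
 */
static void toy_pin(struct toy_page *p, bool longterm)
{
	if (longterm && p->region == TOY_CMA)
		p->region = TOY_NORMAL;
	p->pincount++;
}

/* Drop one pin, e.g. when the modelled I/O has completed. */
static void toy_unpin(struct toy_page *p)
{
	p->pincount--;
}

/*
 * True if some page that is still in CMA has a pin, i.e. DMA might
 * still write into memory the crash kernel is about to reuse.
 */
static bool toy_cma_has_inflight_dma(const struct toy_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pages[i].region == TOY_CMA && pages[i].pincount > 0)
			return true;
	return false;
}
```

In this model a long-term pin never leaves a pinned page inside CMA, but a short-term pin does until it is dropped. A check like toy_cma_has_inflight_dma() is roughly what "detect that there are no short-term pins any more" would have to answer before the CMA range is handed to the crash kernel.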

--
Cheers,

David / dhildenb