Re: [PATCH v2] x86/kexec: Exclude GART aperture from vmcore

From: Baoquan He
Date: Wed Dec 27 2017 - 02:49:38 EST


On 12/19/17 at 06:58pm, Jiri Bohac wrote:

Sorry for the late response. Please see the inline comments.

>
> On Tue, Dec 19, 2017 at 09:58:04AM +0800, Baoquan He wrote:
> > Hmm, as I have said in the first replying mail, the v2 will introduce
> > issues:
> >
> > 1) If 'iommu=off' is specified in 1st kernel but not in kdump kernel, it
> > will ignore the RAM we need to dump.
>
> yes, instead of crashing the machine (because GART may be initialized in the
> 2nd kernel, overlapping the 1st kernel memory, which the 2nd kernel with its
> fake e820 map sees as unused).
>
> I'd say this is an improvement.

I don't follow. If 'iommu=off' is specified only in the 1st kernel, the
kdump kernel will treat the memory that the GART BAR points to as a hole,
even though it is RAM that needs to be dumped. That is incorrect, so I
don't see the improvement.
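
To make the concern concrete, here is a minimal standalone sketch of how
much first-kernel RAM a dump would lose if the aperture range is skipped.
It is not kernel code: gart_range() and bytes_lost() are hypothetical
helpers, and the size formula (32 MiB << order) assumes the usual AMD64
GART aperture encoding.

```c
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct phys_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Assumption: aperture size is 32 MiB << order (usual AMD64 GART encoding).
 * gart_range() is a hypothetical helper, not a kernel function.
 */
static struct phys_range gart_range(uint64_t aper_base, uint32_t aper_order)
{
    struct phys_range r;

    r.start = aper_base;
    r.end = aper_base + ((uint64_t)(32u << 20) << aper_order);
    return r;
}

/* Bytes of `ram` that would be missing from the dump if `hole` is skipped. */
static uint64_t bytes_lost(struct phys_range ram, struct phys_range hole)
{
    uint64_t lo = ram.start > hole.start ? ram.start : hole.start;
    uint64_t hi = ram.end < hole.end ? ram.end : hole.end;

    return hi > lo ? hi - lo : 0;
}
```

With an illustrative base of 0x80000000 and order 1, the range is
[0x80000000, 0x84000000), so any first-kernel RAM covering it would lose
64 MiB from the vmcore.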

>
> > 2) If 'iommu=off' is specified in kdump kernel, but not in 1st kernel,
> > it won't get the GART region, this patch doesn't work.
>
> No. It will work:
>
> First kernel initializes the GART (either in a hole properly provided by the
> BIOS or overlapping e820 RAM).
>
> Second kernel will start with the GART initialized. In gart_iommu_hole_init()
> the setting is read from the northbridge registers and verified as valid. It
> does not overlap e820 memory, because the second kernel has a fake e820 map
> only spanning the crashkernel= reserved range. "fix" is never set to 1, so it
> will exclude GART from vmcore in this path:
>
> out:
>     if (!fix && !fallback_aper_force) {
>         if (last_aper_base) {
>             exclude_from_vmcore(last_aper_base, last_aper_order);
>             return 1;
>
> (fix is never set to 1)
> no_iommu is only checked after that.

It seems so. The interesting thing is that 'iommu=off' doesn't even take
effect here, right? I don't know why the GART hardware/firmware/implementation
is so ..., well, freaky. Even though 'iommu=off' is specified explicitly,
it gets initialized anyway.
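
The ordering described in the quoted snippet can be sketched as a small
standalone model. This is only an illustration under stated assumptions:
struct gart_state, struct gart_outcome and model_out_path() are
hypothetical names, the field names mirror the variables in the snippet,
and everything after the early return is elided because the snippet does
not show it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the variables named in the quoted snippet. */
struct gart_state {
    bool fix;
    bool fallback_aper_force;
    bool no_iommu;              /* 'iommu=off' was given */
    uint64_t last_aper_base;    /* nonzero if a valid aperture was found */
    uint32_t last_aper_order;
};

struct gart_outcome {
    bool excluded;          /* models a call to exclude_from_vmcore() */
    bool no_iommu_checked;  /* true if control got as far as no_iommu */
};

static struct gart_outcome model_out_path(const struct gart_state *s)
{
    struct gart_outcome o = { false, false };

    if (!s->fix && !s->fallback_aper_force) {
        if (s->last_aper_base)
            o.excluded = true;
        /*
         * Early return: no_iommu has not been looked at yet.
         * Assumption: the path with last_aper_base == 0 also returns here.
         */
        return o;
    }

    /* Assumption: the rest is elided; only the ordering matters here. */
    o.no_iommu_checked = true;
    return o;
}
```

In this model, a state with fix and fallback_aper_force both false and a
nonzero last_aper_base ends with the range excluded while no_iommu is never
consulted, which matches the behaviour discussed above.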
>
>
> > 3) If people enable GART in BIOS, there's a RAM memory hole for GART.
> > Nothing needs to be done, while the kdump kernel doesn't know whether GART
> > is enabled in BIOS and will try to avoid it anyway. It won't hurt anything,
> > though logically it's not advisable since it will bring in confusion.
>
> I don't have easy access to the HP machines. I have a machine right here in our
> lab that has this issue. It has no "enable GART" setting in BIOS. It has a
> "enable IOMMU" setting. The bug stays there regardless of the setting.
> It's old. No one will fix the firmware. The patch fixes it.

OK, then we need to fix it. In fact, in my personal opinion, we should
avoid touching this code if there is any chance to, because:
..GART is too old, and systems with GART are rarely seen nowadays;
..The code is too freaky and has no clear comments. As you can see, we
  usually clean up the surrounding code when we fix an issue, but
  there's no good place to start cleaning up the GART code, and it's not
  worth doing.

I understand you may have received a bug report from someone and have to
fix it as the assignee. Since this fix is confined to aperture_64.c, I am
fine with doing it this way. If you are interested, you could also try
the approach I suggested: only remove the region from the io resource,
without touching anything else.

So if we have to do this, could you add some code comments around your fix
to tell people why this code was introduced? The commit log helps in
understanding added code, but file moves can sometimes make it very hard
to trace back.

Thanks
Baoquan