Re: [PATCH] x86/kexec: Exclude GART aperture from vmcore
From: Jiri Bohac
Date: Wed Nov 29 2017 - 07:27:42 EST
On Wed, Nov 29, 2017 at 10:43:07AM +0800, Baoquan He wrote:
> On 11/28/17 at 10:58pm, Jiri Bohac wrote:
> > On Sun, Nov 12, 2017 at 04:04:26PM +0800, Baoquan He wrote:
> > > Solution:
> > > 1) Remove the code which supports GART IOMMU when it's not enabled in
> > > BIOS. This has been done in the new generation of hardware IOMMU like
> > > intel vt-d IOMMU and amd-Vi IOMMU. We should not make GART IOMMU be
> > > exceptional.
> >
> > Wouldn't this break old machines with actual AGP and
> > misconfigured/buggy BIOSes? Wasn't that the reason why we have the
> > workaround of mapping the hole over real memory?
>
> Hmm, a quick question, does it work when GART support is enabled in
> bios? In intel vt-d and amd-vi iommu, if user doesn't enable it in bios,
> the functionality will be disabled in kernel, why would we not do that
> for GART IOMMU? and why is GART so special?
My feeling is that there is no technical reason (perhaps there
were more broken BIOSes in the AGP days?). The main reason is
that the kernel has included the workaround practically forever,
and removing it now will break these machines. Even if most of
them could be fixed by properly configuring the BIOS, the user
will still regard it as a regression, which I don't think is
acceptable.
Currently everything works with the workaround, except kdump.
That's why I came up with a fix for [half of] kdump.
> > > 2) Remove those pages from mm subsystem since they are not seen any more
> > > though they have been added into mm subsystem, because CPU can't see
> > > them.
Could you explain what exactly you mean and how it would fix the
vmcore issue?
> > > 3) Remove the aperture region from /proc/iomem so that pages in that
> > > region can't be seen by kdump kernel. This is easier, but just a work
> > > around.
> >
> > I like this idea, but won't this cause pci_claim_resource() fail
> > after the call to pci_find_parent_resource() ? See my previous
> > mail.
>
> Not very sure; I have no time right now to investigate why it causes a failure.
>
> I tried to find a system with GART in our lab, but failed. Those
> machines are too old
The vmcore problem is reproducible on modern AMD systems. It
was originally reported by a customer running an "HP ProLiant
BL465c Gen8". I later reproduced it on a "ProLiant DL385p Gen8",
as well as on many older systems.
So I believe kdump/vmcore needs fixing - it is broken on machines
sold today.
What became a bit "old" is the reason why we don't include the
"GART" region in the iomem resources: that was a fix for a resource
conflict resulting in malfunctioning AGP hardware, as reported in
bko#72201; see my original mail for the full info. But I don't
think we should break this old hardware again just to make the
kdump fix more beautiful.
> If there is an easy fix, it's worth a try.
The fix for kexec_file_load-based kdump is easy and tested. I
don't insist on fixing the old kexec, because we don't use it on
x86_64. But I'm willing to help if you think it needs to be fixed
as well.
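To illustrate the idea only (this is not the code from the patch): a
minimal userspace sketch, assuming the fix amounts to cutting the
aperture out of the list of RAM ranges that end up described in the
vmcore. The names `struct mem_range` and `exclude_range()` and all
addresses are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive range of physical addresses. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Remove [ex_start, ex_end] from ranges[0..*nr).
 * Returns 0 on success, -1 if splitting a range would overflow the array.
 */
static int exclude_range(struct mem_range *ranges, size_t *nr, size_t max,
			 uint64_t ex_start, uint64_t ex_end)
{
	size_t i = 0;

	while (i < *nr) {
		struct mem_range *r = &ranges[i];

		/* No overlap with this entry: keep it unchanged. */
		if (ex_end < r->start || ex_start > r->end) {
			i++;
			continue;
		}
		/* Entry fully covered by the hole: drop it. */
		if (ex_start <= r->start && ex_end >= r->end) {
			for (size_t j = i; j + 1 < *nr; j++)
				ranges[j] = ranges[j + 1];
			(*nr)--;
			continue;
		}
		/* Hole covers the head of the entry: trim the start. */
		if (ex_start <= r->start) {
			r->start = ex_end + 1;
			i++;
			continue;
		}
		/* Hole covers the tail of the entry: trim the end. */
		if (ex_end >= r->end) {
			r->end = ex_start - 1;
			i++;
			continue;
		}
		/* Hole lies strictly inside the entry: split it in two. */
		if (*nr >= max)
			return -1;
		for (size_t j = *nr; j > i + 1; j--)
			ranges[j] = ranges[j - 1];
		ranges[i + 1].start = ex_end + 1;
		ranges[i + 1].end = r->end;
		r->end = ex_start - 1;
		(*nr)++;
		i += 2;
	}
	return 0;
}
```

With a made-up 64 MB aperture at 0x80000000 sitting in the middle of a
RAM range, the range is split in two and the aperture pages are never
offered to the kdump kernel for reading.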
Regards,
--
Jiri Bohac <jbohac@xxxxxxx>
SUSE Labs, Prague, Czechia