Re: [PATCH 1/3] iommu/vt-d: skip RMRR entries that fail the sanity check
From: Barret Rhoden
Date: Mon Dec 23 2019 - 15:28:02 EST
On 12/17/19 2:19 PM, Chen, Yian wrote:
>> Regardless, I have two other patches in this series that could resolve
>> the problem for me and probably other people. I'd just like at least
>> one of the three patches to get merged so that my machine boots when
>> the original commit f036c7fa0ab6 ("iommu/vt-d: Check VT-d RMRR region
>> in BIOS is reported as reserved") gets released.
> When a firmware bug appears, the potential problem may be beyond the
> scope of its visible impacts, so introducing a workaround in the
> official implementation should be considered very carefully.
Agreed. I think that in the RMRR case, it wouldn't surprise me if these
problems are already occurring, and we just didn't know about it, so I'd
like to think about sane workarounds. I only noticed it on a kexec, and
I'm not sure how many people with similarly broken firmware are
kexecing linus/master kernels yet.
Specifically, my firmware reports an RMRR with start == 0 and end == 0
(end should be page-aligned-minus-one). The only reason commit
f036c7fa0ab6 didn't catch it on a full reboot is that trim_bios_range()
reserved the first page, assuming that the BIOS meant to reserve it but
just didn't tell us in the e820 map. My firmware didn't mark that first
page E820_RESERVED. On a kexec, the range that got trimmed was
0x100-0xfff instead of 0x000-0xfff. In both cases, the kernel won't use
the region the broken RMRR points to, but in the kexec case, it wasn't
E820_RESERVED, so the new commit aborted the DMAR setup.
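
To make the two cases concrete, here is a minimal user-space sketch. It
is not the kernel's code: struct reserved_range, rmrr_is_page_aligned()
and range_fully_reserved() are hypothetical names, and the memory map is
reduced to a sorted array of reserved ranges. It only shows why
start == 0, end == 0 is malformed, and why a "fully reserved" test on
that range passes when 0x000-0xfff is reserved but fails when only
0x100-0xfff is.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000ULL

/* Hypothetical stand-in for one reserved entry of the memory map. */
struct reserved_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* inclusive */
};

/*
 * A well-formed range covers whole pages: start is page-aligned and
 * end is page-aligned-minus-one. start == 0, end == 0 fails this.
 */
static bool rmrr_is_page_aligned(uint64_t start, uint64_t end)
{
	return end > start &&
	       (start % SKETCH_PAGE_SIZE) == 0 &&
	       ((end + 1) % SKETCH_PAGE_SIZE) == 0;
}

/*
 * Return true if every byte of [start, end] lies in a reserved range.
 * Assumes the map is sorted by start and has no overlapping entries.
 */
static bool range_fully_reserved(const struct reserved_range *map, size_t n,
				 uint64_t start, uint64_t end)
{
	uint64_t next = start;	/* first byte not yet known to be covered */

	for (size_t i = 0; i < n; i++) {
		if (map[i].end < next)
			continue;	/* entry lies entirely below the gap */
		if (map[i].start > next)
			return false;	/* hole before this entry */
		if (map[i].end >= end)
			return true;	/* covered through the last byte */
		next = map[i].end + 1;
	}
	return false;
}
```

With a map holding only 0x000-0xfff (my full reboot), the byte at 0 is
covered, so the reserved check passes even though the RMRR itself is
bogus. With only 0x100-0xfff (my kexec), byte 0 is not covered and the
check fails.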
> If the workaround is really needed at this point, I would recommend
> adding a WARN_TAINT with TAINT_FIRMWARE_WORKAROUND to signal that the
> workaround is in place.
Sounds good. I can rework the patchset so that whenever I skip an RMRR
entry or whatnot, I'll put in a WARN_TAINT. I see a few other examples
in dmar.c to work from.
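
Roughly, the shape I have in mind is "skip the bad entry, warn, and
record that a firmware workaround was applied". The sketch below is a
user-space illustration only, not the patch: struct rmrr_entry,
rmrr_entry_is_sane(), filter_rmrr_entries() and the
firmware_workaround_tainted flag are all hypothetical stand-ins, with
fprintf() and the flag playing the role that WARN_TAINT with
TAINT_FIRMWARE_WORKAROUND would play in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified view of one RMRR: an inclusive address range. */
struct rmrr_entry {
	uint64_t base;
	uint64_t end;
};

/* Stand-in for the kernel's taint flag; set once any entry is skipped. */
static bool firmware_workaround_tainted;

/* Assumed sanity rule for this sketch: whole pages, non-empty. */
static bool rmrr_entry_is_sane(const struct rmrr_entry *e)
{
	return e->end > e->base &&
	       (e->base % 0x1000) == 0 &&
	       ((e->end + 1) % 0x1000) == 0;
}

/*
 * Copy the sane entries from in[] to out[] and return how many were kept.
 * Each skipped entry is reported and marks the workaround flag.
 */
static size_t filter_rmrr_entries(const struct rmrr_entry *in, size_t n,
				  struct rmrr_entry *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (!rmrr_entry_is_sane(&in[i])) {
			fprintf(stderr,
				"firmware bug: skipping RMRR [%#llx-%#llx]\n",
				(unsigned long long)in[i].base,
				(unsigned long long)in[i].end);
			firmware_workaround_tainted = true;
			continue;
		}
		out[kept++] = in[i];
	}
	return kept;
}
```

The point of the flag is that a later bug report shows the machine was
running with a firmware workaround, rather than silently ignoring what
the BIOS said.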
If any of the three changes is too aggressive, I'm OK with you all
taking just one of them. I'd like to be able to kexec with the new
kernel. I'm likely not the only one with bad firmware, and any bug that
only shows up on a kexec is often a pain to detect.
Thanks,
Barret