Re: [PATCH 1/3] iommu/vt-d: skip RMRR entries that fail the sanity check

From: Barret Rhoden
Date: Mon Dec 23 2019 - 15:28:02 EST


On 12/17/19 2:19 PM, Chen, Yian wrote:
>> Regardless, I have two other patches in this series that could resolve the problem for me and probably other people. I'd just like at least one of the three patches to get merged so that my machine boots when the original commit f036c7fa0ab6 ("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved") gets released.

> When a firmware bug appears, the potential problem may go beyond the scope of its visible impacts, so introducing a workaround in the official implementation should be considered very carefully.

Agreed. In the RMRR case, it wouldn't surprise me if these problems are already occurring and we just don't know about it, so I'd like to think about sane workarounds. I only noticed it on a kexec. I'm not sure how many people with similarly broken firmware are kexecing linus/master kernels yet.

Specifically, my firmware reports an RMRR with start == 0 and end == 0 (end should be page-aligned-minus-one). The only reason commit f036c7fa0ab6 didn't catch it on a full reboot is that trim_bios_range() reserved the first page, assuming that the BIOS meant to reserve it but just didn't tell us in the e820 map. My firmware didn't mark that first page E820_RESERVED. On a kexec, the range that got trimmed was 0x100-0xfff instead of 0x000-0xfff. In both cases, the kernel won't use the region the broken RMRR points to, but in the kexec case, it wasn't E820_RESERVED, so the new commit aborted the DMAR setup.

> If the workaround is really needed at this point, I would recommend adding a WARN_TAINT with TAINT_FIRMWARE_WORKAROUND, to signal that the workaround is in place.

Sounds good. I can rework the patchset so that whenever I skip an RMRR entry or whatnot, I'll put in a WARN_TAINT. I see a few other examples in dmar.c to work from.

If any of the three changes is too aggressive, I'm OK with you all taking just one of them. I'd like to be able to kexec with the new kernel. I'm likely not the only one with bad firmware, and any bug that only shows up on a kexec is often a pain to detect.

Thanks,

Barret