Re: [PATCH 1/3] iommu/vt-d: skip RMRR entries that fail the sanity check

From: Chen, Yian
Date: Tue Dec 17 2019 - 14:19:32 EST




On 12/16/2019 11:35 AM, Barret Rhoden wrote:
On 12/16/19 2:07 PM, Chen, Yian wrote:


On 12/11/2019 11:46 AM, Barret Rhoden wrote:
RMRR entries describe memory regions that are DMA targets for devices
outside the kernel's control.

RMRR entries that fail the sanity check are pointing to regions of
memory that the firmware did not tell the kernel are reserved or
otherwise should not be used.

Instead of aborting DMAR processing, this commit skips these RMRR
entries. They will not be mapped into the IOMMU, but the IOMMU can
still be utilized. If anything, when the IOMMU is on, those devices
will not be able to clobber RAM that the kernel has allocated from those
regions.

Signed-off-by: Barret Rhoden <brho@xxxxxxxxxx>
---
 drivers/iommu/intel-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index f168cd8ee570..f7e09244c9e4 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4316,7 +4316,7 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
     rmrr = (struct acpi_dmar_reserved_memory *)header;
     ret = arch_rmrr_sanity_check(rmrr);
     if (ret)
-        return ret;
+        return 0;
     rmrru = kzalloc(sizeof(*rmrru), GFP_KERNEL);
     if (!rmrru)

The RMRR parsing function should report the error to its caller. How to respond to the error can then be
chosen by a caller further up the call stack, for example, dmar_walk_remapping_entries().
My concern is that ignoring a detected firmware bug might have side effects, even though
it seems safe in your case.

That's a little difficult given the current code. Once we are in
dmar_walk_remapping_entries(), the specific function (parse_one_rmrr) is called via callback:

    ret = cb->cb[iter->type](iter, cb->arg[iter->type]);
    if (ret)
        return ret;

If there's an error of any sort, it aborts the walk. Handling the specific errors here is difficult, since we don't know what the errors mean to the specific callback. Is there some errno we can use that means "there was a problem, but it's not so bad that you have to abort, but I figured you ought to know"? Not that I think that's a good idea.

The knowledge of whether or not a specific error is worth aborting all DMAR functionality is best known inside the specific callback. The only handling to do is print a warning and either skip it or abort.
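
To make the distinction concrete, here is a minimal userspace sketch of a walk that aborts on any nonzero return, with two variants of the RMRR callback. All names in it (struct entry, walk_entries, parse_rmrr_strict, parse_rmrr_skip) are hypothetical stand-ins, not the kernel's DMAR types or functions; only the "if (ret) return ret;" shape is taken from the snippet quoted above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for DMAR table entries; not the kernel's types. */
enum entry_type { ENTRY_DRHD = 0, ENTRY_RMRR = 1, ENTRY_TYPE_COUNT };

struct entry {
	enum entry_type type;
	int sane;	/* 0 models an RMRR that fails the sanity check */
};

typedef int (*entry_cb)(const struct entry *e, void *arg);

/* Same shape as the quoted walk: the first nonzero return ends it. */
static int walk_entries(const struct entry *entries, size_t n,
			const entry_cb cb[ENTRY_TYPE_COUNT], void *arg)
{
	for (size_t i = 0; i < n; i++) {
		int ret = cb[entries[i].type](&entries[i], arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* Accepts every DRHD entry and counts it in *arg. */
static int parse_drhd(const struct entry *e, void *arg)
{
	(void)e;
	++*(int *)arg;
	return 0;
}

/* Strict variant: a bad RMRR returns an error, which aborts the walk. */
static int parse_rmrr_strict(const struct entry *e, void *arg)
{
	if (!e->sane)
		return -EINVAL;
	++*(int *)arg;
	return 0;
}

/* Skipping variant: warn, drop the bad RMRR, and let the walk continue. */
static int parse_rmrr_skip(const struct entry *e, void *arg)
{
	if (!e->sane) {
		fprintf(stderr, "skipping RMRR that failed the sanity check\n");
		return 0;
	}
	++*(int *)arg;
	return 0;
}
```

With the strict callback, entries after the bad RMRR are never visited. With the skipping callback, only the bad entry is lost and the walker itself needs no knowledge of what the error meant.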

I think skipping the entry for a bad RMRR is better than aborting completely, though I understand if people don't like that. It's debatable. By aborting, we lose the ability to use the IOMMU at all, but we are still in a situation where the devices using the RMRR regions might be clobbering kernel memory, right? Using the IOMMU (with no mappings for the bad RMRRs) would stop those devices from clobbering memory.

Regardless, I have two other patches in this series that could resolve the problem for me and probably other people. I'd just like at least one of the three patches to get merged so that my machine boots when the original commit f036c7fa0ab6 ("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved") gets released.

When a firmware bug appears, the underlying problem may extend beyond its visible impact, so introducing a workaround in the official implementation should be considered very carefully.

If the workaround is really needed at this point, I would recommend adding a WARN_TAINT with TAINT_FIRMWARE_WORKAROUND, to signal that the workaround is in place.
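
A rough sketch of what that could look like, written as a userspace mock so it can be compiled outside the kernel. The WARN_TAINT() macro and TAINT_FIRMWARE_WORKAROUND constant below are simplified stand-ins for the kernel's own definitions, and struct mock_rmrr, mock_rmrr_sanity_check() and mock_parse_one_rmrr() are hypothetical names, not the code in intel-iommu.c.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Userspace stand-ins, for illustration only. In the kernel these come
 * from the kernel headers; the flag value here is arbitrary for the mock.
 */
#define TAINT_FIRMWARE_WORKAROUND 11
static unsigned long mock_tainted_mask;

#define WARN_TAINT(cond, taint, ...)				\
	do {							\
		if (cond) {					\
			mock_tainted_mask |= 1UL << (taint);	\
			fprintf(stderr, __VA_ARGS__);		\
		}						\
	} while (0)

/* Hypothetical, simplified RMRR descriptor. */
struct mock_rmrr {
	unsigned long long base_address;
	unsigned long long end_address;
	int reported_reserved;	/* models what the firmware memory map says */
};

/* Placeholder for the arch sanity check: nonzero means the RMRR is bad. */
static int mock_rmrr_sanity_check(const struct mock_rmrr *rmrr)
{
	return rmrr->reported_reserved ? 0 : -EINVAL;
}

/* Skips a bad RMRR, but warns and taints so the workaround is visible. */
static int mock_parse_one_rmrr(const struct mock_rmrr *rmrr, int *kept)
{
	if (mock_rmrr_sanity_check(rmrr)) {
		WARN_TAINT(1, TAINT_FIRMWARE_WORKAROUND,
			   "firmware bug: skipping bad RMRR [%#llx-%#llx]\n",
			   rmrr->base_address, rmrr->end_address);
		return 0;
	}
	++*kept;
	return 0;
}
```

The point of the taint is that the skip is not silent: the warning lands in the log and the taint flag stays set, so later bug reports show that a firmware workaround was active.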

Thanks
Yian

Thanks,

Barret