Re: [PATCH v2] iommu/amd: Disable IOMMU on Stoney Ridge systems

From: Christian König
Date: Fri Dec 06 2019 - 03:57:41 EST


On 04.12.19 at 17:08, Deucher, Alexander wrote:
-----Original Message-----
From: Deucher, Alexander
Sent: Monday, December 2, 2019 11:37 AM
To: Lucas Stach <dev@xxxxxxxxxx>; Kai-Heng Feng
<kai.heng.feng@xxxxxxxxxxxxx>; joro@xxxxxxxxxx; Koenig, Christian
(Christian.Koenig@xxxxxxx) <Christian.Koenig@xxxxxxx>
Cc: iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Subject: RE: [PATCH v2] iommu/amd: Disable IOMMU on Stoney Ridge
systems

-----Original Message-----
From: Lucas Stach <dev@xxxxxxxxxx>
Sent: Sunday, December 1, 2019 7:43 AM
To: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>; joro@xxxxxxxxxx
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>;
iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: [PATCH v2] iommu/amd: Disable IOMMU on Stoney Ridge
systems

On Friday, 29.11.2019 at 22:21 +0800, Kai-Heng Feng wrote:
Serious screen flickering when Stoney Ridge outputs to a 4K monitor.

According to Alex Deucher, IOMMU isn't enabled on Windows, so let's
do the same here to avoid screen flickering on 4K monitor.

This doesn't seem like a good solution, especially if there isn't a
method for the user to opt-out. Some users might prefer having the
IOMMU support to 4K display output.

But before using the big hammer of disabling or breaking one of those
features, we should take a look at what the issue is here. Screen
flickering caused by the IOMMU being active hints at the IOMMU not
being able to sustain the translation bandwidth required by the
high-bandwidth isochronous transfers caused by 4K scanout, most likely
due to insufficient TLB space.

As far as I know the framebuffer memory for the display buffers is
located in stolen RAM, and thus contiguous in memory. I don't know the
details of the GPU integration on those APUs, but maybe there even is
a way to bypass the IOMMU for the stolen VRAM regions?

If there isn't and all GPU traffic passes through the IOMMU when
active, we should check if the stolen RAM is mapped with hugepages on
the IOMMU side. All the stolen RAM can most likely be mapped with a
few hugepage mappings, which should reduce IOMMU TLB demand by a
large margin.
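
To put rough numbers on the TLB argument above, here is a minimal sketch that counts how many page-sized translations one scanout buffer needs. It assumes a linear 3840x2160 buffer at 4 bytes per pixel with no pitch padding or tiling, and 4 KiB versus 2 MiB IOMMU page sizes; the helper names are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Size in bytes of one linear scanout buffer (assumption: no pitch
 * alignment or tiling overhead). */
static uint64_t framebuffer_bytes(uint32_t width, uint32_t height,
				  uint32_t bytes_per_pixel)
{
	return (uint64_t)width * height * bytes_per_pixel;
}

/* Number of pages, and therefore distinct translations, needed to
 * cover `bytes` with pages of `page_size` bytes (rounded up). */
static uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
	return (bytes + page_size - 1) / page_size;
}
```

Under these assumptions a 4K buffer is 33,177,600 bytes, which is 8100 translations with 4 KiB pages but only 16 with 2 MiB pages.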

There is no issue when we scan out of the carve-out region. The issue occurs
when we scan out of regular system memory (scatter/gather). Many newer
laptops have very small carve-out regions (e.g., 32 MB), so we have to use
regular system pages to support multiple high-resolution displays. The
problem is that the latency gets too high at some point when the IOMMU is
involved. Huge pages would probably help in this case, but I'm not sure if
there is any way to guarantee that we get huge pages for system memory. I
guess we could use CMA or something like that.

Thomas recently sent out a patch set to add huge page support to ttm:
https://patchwork.freedesktop.org/series/70090/
We'd still need a way to guarantee huge pages for the display buffer.
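
As a rough illustration of why a small carve-out forces scan-out from system memory, the sketch below counts how many linear buffers of a given mode fit into a carve-out region. It assumes "32 MB" means 32 MiB, 4 bytes per pixel, and that nothing else lives in the carve-out; the function name is hypothetical.

```c
#include <stdint.h>

/* How many whole linear buffers of the given mode fit into a carve-out
 * of `carveout_bytes` (assumption: the carve-out holds nothing else). */
static uint32_t buffers_that_fit(uint64_t carveout_bytes, uint32_t width,
				 uint32_t height, uint32_t bytes_per_pixel)
{
	uint64_t one = (uint64_t)width * height * bytes_per_pixel;

	if (one == 0)
		return 0;
	return (uint32_t)(carveout_bytes / one);
}
```

With these assumptions a 32 MiB carve-out holds exactly one 3840x2160 buffer, so there is no room for a second buffer for page flipping or for another display.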

That unfortunately won't help in this case, since the TTM work Thomas is doing only affects the CPU page tables.

In addition, we already allocate huge pages for the display buffer on a best-effort basis, and it doesn't seem to help.

If I understood the hardware guys correctly, even transparent mode adds too much latency, so the display block might run into an underflow.

The only solution documented to work is to either disable the IOMMU or not use scan-out from system memory.

Alex, we should probably kick off another internal discussion with the hardware guys about that.

Christian.


Alex

Alex

Regards,
Lucas

Cc: Alex Deucher <alexander.deucher@xxxxxxx>
Bug: https://gitlab.freedesktop.org/drm/amd/issues/961
Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
---
v2:
- Find Stoney graphics instead of host bridge.

drivers/iommu/amd_iommu_init.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 568c52317757..139aa6fdadda 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2516,6 +2516,7 @@ static int __init early_amd_iommu_init(void)
 	struct acpi_table_header *ivrs_base;
 	acpi_status status;
 	int i, remap_cache_sz, ret = 0;
+	u32 pci_id;
 
 	if (!amd_iommu_detected)
 		return -ENODEV;
@@ -2603,6 +2604,16 @@ static int __init early_amd_iommu_init(void)
 	if (ret)
 		goto out;
 
+	/* Disable IOMMU if there's Stoney Ridge graphics */
+	for (i = 0; i < 32; i++) {
+		pci_id = read_pci_config(0, i, 0, 0);
+		if ((pci_id & 0xffff) == 0x1002 && (pci_id >> 16) == 0x98e4) {
+			pr_info("Disable IOMMU on Stoney Ridge\n");
+			amd_iommu_disabled = true;
+			break;
+		}
+	}
+
 	/* Disable any previously enabled IOMMUs */
 	if (!is_kdump_kernel() || amd_iommu_disabled)
 		disable_iommus();
@@ -2711,7 +2722,7 @@ static int __init state_next(void)
 		ret = early_amd_iommu_init();
 		init_state = ret ? IOMMU_INIT_ERROR : IOMMU_ACPI_FINISHED;
 		if (init_state == IOMMU_ACPI_FINISHED && amd_iommu_disabled) {
-			pr_info("AMD IOMMU disabled on kernel command-line\n");
+			pr_info("AMD IOMMU disabled\n");
 			init_state = IOMMU_CMDLINE_DISABLED;
 			ret = -EINVAL;
 		}
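
For readers unfamiliar with the ID check in the hunk above: the dword at PCI config offset 0 carries the vendor ID in its low 16 bits and the device ID in its high 16 bits. The standalone sketch below mirrors that test outside the kernel. The macro and function names are invented for illustration; only the values 0x1002 and 0x98e4 come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the patch; the macro names are hypothetical. */
#define EXAMPLE_VENDOR_ID_ATI	0x1002
#define EXAMPLE_DEVICE_ID_STONEY	0x98e4

/* Low 16 bits of the dword at config offset 0: vendor ID. */
static uint16_t pci_vendor(uint32_t id_dword)
{
	return (uint16_t)(id_dword & 0xffff);
}

/* High 16 bits of the dword at config offset 0: device ID. */
static uint16_t pci_device(uint32_t id_dword)
{
	return (uint16_t)(id_dword >> 16);
}

/* Same comparison as in the patch, applied to an already-read dword. */
static bool is_stoney_gfx(uint32_t id_dword)
{
	return pci_vendor(id_dword) == EXAMPLE_VENDOR_ID_ATI &&
	       pci_device(id_dword) == EXAMPLE_DEVICE_ID_STONEY;
}
```

A slot with no device typically reads back as 0xffffffff, which this check rejects, so scanning all 32 slots on bus 0 as the patch does is harmless for empty slots.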