Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()

From: Qian Cai
Date: Tue Apr 07 2020 - 11:36:11 EST




> On Apr 6, 2020, at 10:12 PM, Qian Cai <cai@xxxxxx> wrote:
>
> fetch_pte() could race with increase_address_space() because it held no
> lock from iommu_unmap_page(). On the CPU that runs fetch_pte() it could
> see a stale domain->pt_root and a new increased domain->mode from
> increase_address_space(). As the result, it could trigger invalid
> accesses later on. Fix it by using a pair of smp_[w|r]mb in those
> places.

After further testing, the change along is insufficient. What I am chasing right
now is the swap device will go offline after heavy memory pressure below. The
symptom is similar to what we have in the commit,

754265bcab78 (âiommu/amd: Fix race in increase_address_space()â)

Apparently, it is no possible to take the domain->lock in fetch_pte() because it
could sleep.

Thoughts?

[ 7292.727377][ T814] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xffffffffffd80000 flags=0x0010]
[ 7292.740571][ T814] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xffffffffffd81000 flags=0x0010]
[ 7292.753268][ T814] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xffffffffffd81900 flags=0x0010]
[ 7292.766078][ T814] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xffffffffffd81d00 flags=0x0010]
[ 7292.778447][ T814] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xffffffffffd82000 flags=0x0010]
[ 7292.790724][ T814] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xffffffffffd82400 flags=0x0010]
[ 7292.803195][ T814] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xffffffffffd82800 flags=0x0010]
[ 7292.815664][ T814] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xffffffffffd82c00 flags=0x0010]
[ 7292.828089][ T814] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xffffffffffd83000 flags=0x0010]
[ 7292.840468][ T814] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xffffffffffd83400 flags=0x0010]
[ 7292.852795][ T814] AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x0027 address=0xffffffffffd83800 flags=0x0010]
[ 7292.864566][ T814] AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x0027 address=0xffffffffffd83c00 flags=0x0010]
[ 7326.583908][ C8] smartpqi 0000:23:00.0: controller is offline: status code 0x14803
[ 7326.592386][ C8] smartpqi 0000:23:00.0: controller offline
[ 7326.663728][ C66] sd 0:1:0:0: [sda] tag#467 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=6s
[ 7326.664321][ T2738] smartpqi 0000:23:00.0: resetting scsi 0:1:0:0
[ 7326.673849][ C66] sd 0:1:0:0: [sda] tag#467 CDB: opcode=0x28 28 00 02 ee 2e 60 00 00 08 00
[ 7326.680118][ T2738] smartpqi 0000:23:00.0: reset of scsi 0:1:0:0: FAILED
[ 7326.688612][ C66] blk_update_request: I/O error, dev sda, sector 49163872 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 7326.695456][ T2738] sd 0:1:0:0: Device offlined - not ready after error recovery
[ 7326.706632][ C66] Read-error on swap-device (254:1:45833824)
[ 7326.714030][ T2738] sd 0:1:0:0: Device offlined - not ready after error recovery
[ 7326.723382][T45523] sd 0:1:0:0: rejecting I/O to offline device
[ 7326.727352][ T2738] sd 0:1:0:0: Device offlined - not ready after error recovery
[ 7326.727379][ T2738] sd 0:1:0:0: Device offlined - not ready after error recovery
[ 7326.727442][ T2738] sd 0:1:0:0: Device offlined - not ready after error recovery

>
> kernel BUG at drivers/iommu/amd_iommu.c:1704!
> BUG_ON(unmapped && !is_power_of_2(unmapped));
> Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> RIP: 0010:amd_iommu_unmap+0x1b2/0x1d0
> Call Trace:
> <IRQ>
> __iommu_unmap+0x106/0x320
> iommu_unmap_fast+0xe/0x10
> __iommu_dma_unmap+0xdc/0x1a0
> iommu_dma_unmap_sg+0xae/0xd0
> scsi_dma_unmap+0xe7/0x150
> pqi_raid_io_complete+0x37/0x60 [smartpqi]
> pqi_irq_handler+0x1fc/0x13f0 [smartpqi]
> __handle_irq_event_percpu+0x78/0x4f0
> handle_irq_event_percpu+0x70/0x100
> handle_irq_event+0x5a/0x8b
> handle_edge_irq+0x10c/0x370
> do_IRQ+0x9e/0x1e0
> common_interrupt+0xf/0xf
> </IRQ>
>
> Signed-off-by: Qian Cai <cai@xxxxxx>
> ---
> drivers/iommu/amd_iommu.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 20cce366e951..22328a23335f 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -1434,6 +1434,11 @@ static bool increase_address_space(struct protection_domain *domain,
> *pte = PM_LEVEL_PDE(domain->mode,
> iommu_virt_to_phys(domain->pt_root));
> domain->pt_root = pte;
> + /*
> + * Make sure fetch_pte() will see the new domain->pt_root before it
> + * snapshots domain->mode.
> + */
> + smp_wmb();
> domain->mode += 1;
>
> ret = true;
> @@ -1460,6 +1465,8 @@ static u64 *alloc_pte(struct protection_domain *domain,
> *updated = increase_address_space(domain, address, gfp) || *updated;
>
> level = domain->mode - 1;
> + /* To pair with smp_wmb() in increase_address_space(). */
> + smp_rmb();
> pte = &domain->pt_root[PM_LEVEL_INDEX(level, address)];
> address = PAGE_SIZE_ALIGN(address, page_size);
> end_lvl = PAGE_SIZE_LEVEL(page_size);
> @@ -1545,6 +1552,8 @@ static u64 *fetch_pte(struct protection_domain *domain,
> return NULL;
>
> level = domain->mode - 1;
> + /* To pair with smp_wmb() in increase_address_space(). */
> + smp_rmb();
> pte = &domain->pt_root[PM_LEVEL_INDEX(level, address)];
> *page_size = PTE_LEVEL_PAGE_SIZE(level);
>
> --
> 2.21.0 (Apple Git-122.2)
>