Re: [RFC PATCH] iommu/amd: fix a race in fetch_pte()
From: Qian Cai
Date: Mon Apr 20 2020 - 09:26:17 EST
> On Apr 18, 2020, at 2:34 PM, Joerg Roedel <joro@xxxxxxxxxx> wrote:
>
> On Sat, Apr 18, 2020 at 09:01:35AM -0400, Qian Cai wrote:
>> Hard to tell without testing further. Iâll leave that optimization in
>> the future, and focus on fixing those races first.
>
> Yeah right, we should fix the existing races first before introducing
> new ones ;)
>
> Btw, THANKS A LOT for tracking down all these race condition bugs, I am
> not even remotely able to trigger them with the hardware I have around.
>
> I did some hacking and the attached diff shows how I think this race
> condition needs to be fixed. I boot-tested this fix on-top of v5.7-rc1,
> but did no further testing. Can you test it please?
No dice. There could be some other races. For example,
> @@ -1536,16 +1571,19 @@ static u64 *fetch_pte(struct protection_domain *domain,
> unsigned long address,
> unsigned long *page_size)
...
> amd_iommu_domain_get_pgtable(domain, &pgtable);
>
> if (address > PM_LEVEL_SIZE(pgtable.mode))
> return NULL;
>
> level = pgtable.mode - 1;
> pte = &pgtable.root[PM_LEVEL_INDEX(level, address)];
<â increase_address_space()
> *page_size = PTE_LEVEL_PAGE_SIZE(level);
>
while (level > 0) {
*page_size = PTE_LEVEL_PAGE_SIZE(level);
Then in iommu_unmap_page(),
while (unmapped < page_size) {
pte = fetch_pte(dom, bus_addr, &unmap_size);
â
bus_addr = (bus_addr & ~(unmap_size - 1)) + unmap_size;
bus_addr would be unsync with dom->mode when it enter fetch_pte() again.
Could that be a problem?
[ 5159.274829][ T4057] LTP: starting oom02
[ 5160.382787][ C52] perf: interrupt took too long (7443 > 6208), lowering kernel.perf_event_max_sample_rate to 26800
[ 5167.951785][ T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc0000 flags=0x0010]
[ 5167.964540][ T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc1000 flags=0x0010]
[ 5167.977442][ T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc1900 flags=0x0010]
[ 5167.989901][ T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc1d00 flags=0x0010]
[ 5168.002291][ T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc2000 flags=0x0010]
[ 5168.014665][ T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc2400 flags=0x0010]
[ 5168.027132][ T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc2800 flags=0x0010]
[ 5168.039566][ T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc2c00 flags=0x0010]
[ 5168.051956][ T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc3000 flags=0x0010]
[ 5168.064310][ T812] smartpqi 0000:23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0027 address=0xfffffffffffc3400 flags=0x0010]
[ 5168.076652][ T812] AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x0027 address=0xfffffffffffc3800 flags=0x0010]
[ 5168.088290][ T812] AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x0027 address=0xfffffffffffc3c00 flags=0x0010]
[ 5183.695829][ C8] smartpqi 0000:23:00.0: controller is offline: status code 0x14803
[ 5183.704390][ C8] smartpqi 0000:23:00.0: controller offline
[ 5183.756594][ C101] blk_update_request: I/O error, dev sda, sector 22306304 op 0x1:(WRITE) flags 0x8000000 phys_seg 4 prio class 0
[ 5183.756628][ C34] sd 0:1:0:0: [sda] tag#655 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.756759][ C56] blk_update_request: I/O error, dev sda, sector 58480128 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.756810][ C79] sd 0:1:0:0: [sda] tag#234 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.756816][ C121] sd 0:1:0:0: [sda] tag#104 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.756837][ T53] sd 0:1:0:0: [sda] tag#4 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.756882][ C121] sd 0:1:0:0: [sda] tag#104 CDB: opcode=0x2a 2a 00 00 4d d4 00 00 02 00 00
[ 5183.756892][ C79] sd 0:1:0:0: [sda] tag#234 CDB: opcode=0x2a 2a 00 02 03 e4 00 00 02 00 00
[ 5183.756909][ C121] blk_update_request: I/O error, dev sda, sector 5100544 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.756920][ C79] blk_update_request: I/O error, dev sda, sector 33809408 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.756939][ T53] sd 0:1:0:0: [sda] tag#4 CDB: opcode=0x2a 2a 00 02 4b f8 00 00 02 00 00
[ 5183.756967][ T53] blk_update_request: I/O error, dev sda, sector 38533120 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.756989][ C49] blk_update_request: I/O error, dev sda, sector 30181376 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.757045][ C51] sd 0:1:0:0: [sda] tag#452 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.757107][ C51] sd 0:1:0:0: [sda] tag#452 CDB: opcode=0x2a 2a 00 02 95 06 00 00 02 00 00
[ 5183.757125][ C51] blk_update_request: I/O error, dev sda, sector 43320832 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.757199][ C82] blk_update_request: I/O error, dev sda, sector 10187776 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.757209][ C109] blk_update_request: I/O error, dev sda, sector 29812736 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.757215][ C77] sd 0:1:0:0: [sda] tag#849 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.757222][ C110] sd 0:1:0:0: [sda] tag#558 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.757237][ C92] blk_update_request: I/O error, dev sda, sector 6410240 op 0x1:(WRITE) flags 0x8004000 phys_seg 4 prio class 0
[ 5183.757244][ C91] sd 0:1:0:0: [sda] tag#73 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.757251][ C68] sd 0:1:0:0: [sda] tag#416 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.757458][ C77] sd 0:1:0:0: [sda] tag#849 CDB: opcode=0x2a 2a 00 02 78 a4 00 00 02 00 00
[ 5183.757467][ C110] sd 0:1:0:0: [sda] tag#558 CDB: opcode=0x2a 2a 00 03 58 94 00 00 02 00 00
[ 5183.757515][ C122] sd 0:1:0:0: [sda] tag#747 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00 cmd_age=15s
[ 5183.757525][ C68] sd 0:1:0:0: [sda] tag#416 CDB: opcode=0x2a 2a 00 01 0e 32 00 00 02 00 00
[ 5183.757536][ C91] sd 0:1:0:0: [sda] tag#73 CDB: opcode=0x2a 2a 00 01 a2 86 00 00 02 00 00
[ 5183.757727][ C122] sd 0:1:0:0: [sda] tag#747 CDB: opcode=0x2a 2a 00 02 a7 24 00 00 02 00 00
[ 5183.758530][ T53] Write-error on swap-device (254:1:64823296)
[ 5183.758758][ T53] Write-error on swap-device (254:1:35201024)
[ 5183.758811][ C105] Write-error on swap-device (254:1:52690944)
[ 5183.758959][ C82] Write-error on swap-device (254:1:6856704)