[+cc Christian, Xinhui, amd-gfx]
On Fri, Jan 06, 2023 at 01:48:11PM +0800, Baolu Lu wrote:
> On 1/5/23 11:27 PM, Felix Kuehling wrote:
> > On 2023-01-05 at 09:46, Deucher, Alexander wrote:
> > > > -----Original Message-----
> > > > From: Hegde, Vasant <Vasant.Hegde@xxxxxxx>
> > > >
> > > > On 1/5/2023 4:07 PM, Baolu Lu wrote:
> > > > > On 2023/1/5 18:27, Vasant Hegde wrote:
> > > > > > On 1/5/2023 6:39 AM, Matt Fagnani wrote:
> > > > > > > I built 6.2-rc2 with the patch applied. The same black
> > > > > > > screen problem happened with 6.2-rc2 with the patch. I
> > > > > > > tried to use early kdump with 6.2-rc2 with the patch
> > > > > > > twice by panicking the kernel with sysrq+alt+c after the
> > > > > > > black screen happened. The system rebooted after about
> > > > > > > 10-20 seconds both times, but no kdump and dmesg files
> > > > > > > were saved in /var/crash. I'm attaching the lspci -vvv
> > > > > > > output as requested. ...
> > > > > >
> > > > > > Looking into lspci output, it doesn't list ACS feature
> > > > > > for Graphics card. So with your fix it didn't enable PASID
> > > > > > and hence it failed to boot. ...
> > > > >
> > > > > So do you mind telling why does the PASID need to be enabled
> > > > > for the graphic device? Or in another word, what does the
> > > > > graphic driver use the PASID for? ...
> > >
> > > The GPU driver uses the pasid for shared virtual memory between
> > > the CPU and GPU. I.e., so that the user apps can use the same
> > > virtual address space on the GPU and the CPU. It also uses
> > > pasid to take advantage of recoverable device page faults using
> > > PRS. ...
> >
> > Agreed. This applies to GPU computing on some older AMD APUs that
> > take advantage of memory coherence and IOMMUv2 address translation
> > to create a shared virtual address space between the CPU and GPU.
> > In this case it seems to be a Carrizo APU. It is also true for
> > Raven APUs. ...
>
> Thanks for the explanation.
>
> This is actually the problem that commit 201007ef707a was trying to
> fix. The PCIe fabric routes Memory Requests based on the TLP
> address, ignoring any PASID (PCIe r6.0, sec 2.2.10.4), so a TLP with
> PASID that should go upstream to the IOMMU may instead be routed as
> a P2P Request if its address falls in a bridge window.
>
> In SVA case, the IOMMU shares the address space of a user
> application. The user application side has no knowledge about the
> PCI bridge window. It is entirely possible that the device is
> programmed with a P2P address and results in a disaster.

Is this stalled? We explored the idea of changing the PCI core so
that for devices that use ATS/PRI, we could enable PASID without
checking for ACS [1], but IIUC we ultimately concluded that it was
based on a misunderstanding of how ATS Translation Requests are routed
and that an AMD driver change would be required [2].

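
For anyone joining via the cc, the routing concern quoted above can be
sketched roughly as follows. This is a toy model under simplifying
assumptions, not kernel code: struct bridge_window, route_mem_request()
and the single acs_redirect flag are hypothetical names made up for
illustration, and a real fabric has more windows and more ACS controls
than this shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not kernel code: one peer bridge window. */
struct bridge_window {
	uint64_t base;   /* first address claimed by the peer bridge */
	uint64_t limit;  /* last address claimed (inclusive) */
};

enum route { ROUTE_UPSTREAM, ROUTE_P2P };

static bool in_window(const struct bridge_window *w, uint64_t addr)
{
	assert(w->base <= w->limit);	/* sanity check on the model */
	return addr >= w->base && addr <= w->limit;
}

/*
 * Decide where a Memory Request goes. has_pasid is accepted only to
 * show that it plays no part in the decision: routing is by address.
 * acs_redirect stands in for ACS forcing P2P requests upstream.
 */
static enum route route_mem_request(const struct bridge_window *peer,
				    uint64_t addr, bool has_pasid,
				    bool acs_redirect)
{
	(void)has_pasid;	/* ignored, as described in the quoted text */

	if (in_window(peer, addr) && !acs_redirect)
		return ROUTE_P2P;
	return ROUTE_UPSTREAM;
}
```

In this model a request carrying a PASID whose address happens to land
in the peer window still goes P2P unless the redirect is on, which is
the case the ACS check is meant to rule out.
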
So it seems like we still have this regression, and we're running out
of time before v6.2.
[1] https://lore.kernel.org/all/20230114073420.759989-1-baolu.lu@xxxxxxxxxxxxxxx/
[2] https://lore.kernel.org/all/Y91X9MeCOsa67CC6@xxxxxxxxxx/