Re: [PATCH] drm/amdkfd: dqm fence memory corruption

From: Qu Huang
Date: Fri Mar 26 2021 - 06:10:21 EST


On 2021/1/28 5:50, Felix Kuehling wrote:
Am 2021-01-27 um 7:33 a.m. schrieb Qu Huang:
The amdgpu driver uses a 4-byte data type for the DQM fence memory
and passes the GPU address of that memory to the microcode through
the query status PM4 message. However, both the query status PM4
message definition and the microcode treat the fence as an 8-byte
value. Only 4 bytes are allocated for the fence, but the microcode
writes 8 bytes, so the 4 bytes following the fence are overwritten
and memory is corrupted.
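
To make the overrun described above concrete, here is a minimal user-space sketch, not driver code. The struct gtt_model, the offsets, and both helper functions are hypothetical names invented for illustration. The sketch assumes the allocator places another 4-byte object directly after a 4-byte fence slot, and it models the microcode as a plain 8-byte store.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a sub-allocated region: a 4-byte fence slot
 * followed directly by 4 bytes that belong to another allocation. */
struct gtt_model {
	unsigned char bytes[8];
};

#define FENCE_OFFSET     0
#define NEIGHBOUR_OFFSET 4

/* Stand-in for the microcode: it always stores a 64-bit fence value. */
static void model_microcode_write_fence(struct gtt_model *gtt, uint64_t value)
{
	memcpy(&gtt->bytes[FENCE_OFFSET], &value, sizeof(value));
}

/* Returns 1 if a neighbouring allocation was clobbered by the fence write,
 * 0 otherwise. fence_alloc_size is the size reserved for the fence. */
static int model_neighbour_corrupted(size_t fence_alloc_size)
{
	struct gtt_model gtt;
	const uint32_t sentinel = 0xAAAAAAAAu;
	uint32_t after;

	assert(fence_alloc_size == 4 || fence_alloc_size == 8);

	memset(&gtt, 0, sizeof(gtt));

	/* With a 4-byte fence, the next allocation starts at offset 4. */
	if (fence_alloc_size == 4)
		memcpy(&gtt.bytes[NEIGHBOUR_OFFSET], &sentinel, sizeof(sentinel));

	model_microcode_write_fence(&gtt, 0x1111111122222222ull);

	/* With an 8-byte fence, all 8 bytes belong to the fence itself. */
	if (fence_alloc_size == 8)
		return 0;

	memcpy(&after, &gtt.bytes[NEIGHBOUR_OFFSET], sizeof(after));
	return after != sentinel;
}
```

Calling model_neighbour_corrupted(4) reports that the neighbour was overwritten, while model_neighbour_corrupted(8) does not, which is the effect the one-line change to the allocation size aims for.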

Thank you for pointing out that discrepancy. That's a good catch!

I'd prefer to fix this properly by making dqm->fence_addr a u64 pointer.
We should probably also fix up the query_status and
amdkfd_fence_wait_timeout function interfaces to use 64-bit fence
values everywhere, to be consistent.
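
A rough sketch of why the pointer type matters for the suggestion above, again in user-space C rather than kernel code. The two structs are hypothetical stand-ins for the queue manager, and the 4-byte field type in the "before" struct is an assumption based on the commit message. Once the pointee type is 64 bits wide, the existing sizeof(*dqm->fence_addr) expression yields 8 without hard-coding a type at the call site.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: fence pointer with a 4-byte pointee (assumed). */
struct dqm_before {
	uint32_t *fence_addr;
};

/* Hypothetical stand-in: fence pointer with an 8-byte pointee. */
struct dqm_after {
	uint64_t *fence_addr;
};

/* sizeof does not evaluate its operand, so the null pointer is never
 * dereferenced; only the pointee type is inspected. */
static size_t fence_size_before(void)
{
	struct dqm_before dqm = { NULL };

	return sizeof(*dqm.fence_addr);
}

static size_t fence_size_after(void)
{
	struct dqm_after dqm = { NULL };

	return sizeof(*dqm.fence_addr);
}
```

With this shape the allocation size follows the field's type, so a later change to the fence width would not need a matching edit at the allocation call.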

Regards,
  Felix
Hi Felix,

Thanks for your advice. Please check v2 at
https://lore.kernel.org/patchwork/patch/1372584/

Thanks,
Qu.



Signed-off-by: Qu Huang <jinsdb@xxxxxxx>
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e686ce2..8b38d0c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1161,7 +1161,7 @@ static int start_cpsch(struct device_queue_manager *dqm)
 	pr_debug("Allocating fence memory\n");
 	/* allocate fence memory on the gart */
-	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(*dqm->fence_addr),
+	retval = kfd_gtt_sa_allocate(dqm->dev, sizeof(uint64_t),
 					&dqm->fence_mem);
 	if (retval)