[PATCH AUTOSEL 5.14 008/252] drm/amdgpu: Fix koops when accessing RAS EEPROM

From: Sasha Levin
Date: Thu Sep 09 2021 - 07:41:53 EST


From: Luben Tuikov <luben.tuikov@xxxxxxx>

[ Upstream commit 1d9d2ca85b32605ac9c74c8fa42d0c1cfbe019d4 ]

Debugfs RAS EEPROM files are available when
the ASIC supports RAS, and when the debugfs is
enabled, an also when "ras_enable" module
parameter is set to 0. However in this case,
we get a kernel oops when accessing some of
the "ras_..." controls in debugfs. The reason
for this is that struct amdgpu_ras::adev is
unset. This commit sets it, thus enabling access
to those facilities. Note that this facilitates
EEPROM access and not necessarily RAS features or
functionality.

Cc: Alexander Deucher <Alexander.Deucher@xxxxxxx>
Cc: John Clements <john.clements@xxxxxxx>
Cc: Hawking Zhang <Hawking.Zhang@xxxxxxx>
Signed-off-by: Luben Tuikov <luben.tuikov@xxxxxxx>
Acked-by: Alexander Deucher <Alexander.Deucher@xxxxxxx>
Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index fc66aca28594..95d5842385b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1966,11 +1966,20 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev)
bool exc_err_limit = false;
int ret;

- if (adev->ras_enabled && con)
- data = &con->eh_data;
- else
+ if (!con)
+ return 0;
+
+ /* Allow access to RAS EEPROM via debugfs, when the ASIC
+ * supports RAS and debugfs is enabled, but when
+ * adev->ras_enabled is unset, i.e. when "ras_enable"
+ * module parameter is set to 0.
+ */
+ con->adev = adev;
+
+ if (!adev->ras_enabled)
return 0;

+ data = &con->eh_data;
*data = kmalloc(sizeof(**data), GFP_KERNEL | __GFP_ZERO);
if (!*data) {
ret = -ENOMEM;
@@ -1980,7 +1989,6 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev)
mutex_init(&con->recovery_lock);
INIT_WORK(&con->recovery_work, amdgpu_ras_do_recovery);
atomic_set(&con->in_recovery, 0);
- con->adev = adev;

max_eeprom_records_len = amdgpu_ras_eeprom_get_record_max_length();
amdgpu_ras_validate_threshold(adev, max_eeprom_records_len);
--
2.30.2