Re: [PATCH] drm/msm/adreno: De-spaghettify the use of memory barriers

From: Konrad Dybcio
Date: Tue Jun 04 2024 - 13:36:00 EST




On 5/14/24 20:38, Akhil P Oommen wrote:
On Wed, May 08, 2024 at 07:46:31PM +0200, Konrad Dybcio wrote:
Memory barriers help ensure instruction ordering, NOT time and order
of actual write arrival at other observers (e.g. memory-mapped IP).
On architectures employing weak memory ordering, the latter can be a
giant pain point, and it has been as part of this driver.

Moreover, the gpu_/gmu_ accessors already use non-relaxed versions of
readl/writel, which include r/w (respectively) barriers.

Replace the barriers with a readback that ensures the previous writes
have exited the write buffer (as the CPU must flush the write to the
register it's trying to read back) and subsequently remove the hack
introduced in commit b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt
status in hw_init").

Fixes: b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt status in hw_init")
Signed-off-by: Konrad Dybcio <konrad.dybcio@xxxxxxxxxx>
---
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 5 ++---
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 14 ++++----------
2 files changed, 6 insertions(+), 13 deletions(-)

I prefer this version compared to the v2. A helper routine is
unnecessary here because:
1. there are very few scenarios where we have to read back the same
register.
2. we may accidently readback a write only register.

Which would still trigger an address dependency on the CPU, no?



diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 0e3dfd4c2bc8..4135a53b55a7 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -466,9 +466,8 @@ static int a6xx_rpmh_start(struct a6xx_gmu *gmu)
int ret;
u32 val;
- gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, 1 << 1);
- /* Wait for the register to finish posting */
- wmb();
+ gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, BIT(1));
+ gmu_read(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ);

This is unnecessary because we are polling on a register on the same port below. But I think we
can replace "wmb()" above with "mb()" to avoid reordering between read
and write IO instructions.

Ok on the dropping readback part

+ AFAIU from Will's response, we can drop the barrier as well


ret = gmu_poll_timeout(gmu, REG_A6XX_GMU_RSCC_CONTROL_ACK, val,
val & (1 << 1), 100, 10000);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 973872ad0474..0acbc38b8e70 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1713,22 +1713,16 @@ static int hw_init(struct msm_gpu *gpu)
}
/* Clear GBIF halt in case GX domain was not collapsed */
+ gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);

We need a full barrier here to avoid reordering. Also, lets add a
comment about why we are doing this odd looking sequence.

+ gpu_read(gpu, REG_A6XX_GBIF_HALT);
if (adreno_is_a619_holi(adreno_gpu)) {
- gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, 0);
- /* Let's make extra sure that the GPU can access the memory.. */
- mb();

We need a full barrier here.

+ gpu_read(gpu, REG_A6XX_RBBM_GPR0_CNTL);
} else if (a6xx_has_gbif(adreno_gpu)) {
- gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
- /* Let's make extra sure that the GPU can access the memory.. */
- mb();

We need a full barrier here.

Not sure we do between REG_A6XX_GBIF_HALT & REG_A6XX_RBBM_(GBIF_HALT/GPR0_CNTL),
but I suppose keeping the one after REG_A6XX_RBBM_(GBIF_HALT/GPR0_CNTL) makes
sense to avoid the possibility of configuring the GPU before it can access DRAM..


+ gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT);
}
- /* Some GPUs are stubborn and take their sweet time to unhalt GBIF! */
- if (adreno_is_a7xx(adreno_gpu) && a6xx_has_gbif(adreno_gpu))
- spin_until(!gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK));
-

Why is this removed?

Because it was a hack in the first place and the enforcement of GBIF
unhalt requests coming through before proceeding further removes the
necessity to check this (unless there's some hw-mandated delay we should
keep in mind, but kgsl doesn't have that and there doesn't seem to be
any from testing on 8[456]50).

Konrad