Re: [PATCH v4 00/14] GMU-less A6xx support (A610, A619_holi)
From: Konrad Dybcio
Date: Tue Mar 28 2023 - 21:12:52 EST
On 14.03.2023 16:28, Konrad Dybcio wrote:
> v3 -> v4:
> - Drop the mistakenly-included and wrong A3xx-A5xx bindings changes
> - Improve bindings commit messages to better explain what GMU Wrapper is
> - Drop the A680 highest bank bit value adjustment patch
> - Sort UBWC config variables in a reverse-Christmas-tree fashion [4/14]
> - Don't alter any UBWC config values in [4/14]
> - Do so for a619_holi in [8/14]
> - Rebase on next-20230314 (shouldn't matter at all)
After Johan's recent runtime PM fix, this kinda broke..
When entering the error-fail-retry path (e.g. when not embedding
the firmware in initrd, then starting a DE and letting the kernel
get the fw from the root partition), the GPU does not wake up fully:
[ 24.744344] msm_dpu 5e01000.display-controller: [drm:adreno_wait_ring] *ERROR* timeout waiting for space in ringbuffer 0
[ 25.744343] [drm:a6xx_idle] *ERROR* A619: a6xx_hw_init: timeout waiting for GPU to idle: status 00800005 irq 00800000 rptr/wptr 12/12
[ 25.744401] msm_dpu 5e01000.display-controller: [drm:adreno_load_gpu] *ERROR* gpu hw init failed: -22
[ 25.744494] adreno 5900000.gpu: [drm:a6xx_irq] *ERROR* gpu fault ring 0 fence ffffff00 status 00800005 rb 000c/000c ib1 0000000000000000/0000 ib2 0000000000000000/0000
[ 25.744544] msm_dpu 5e01000.display-controller: [drm:recover_worker] *ERROR* A619: hangcheck recover!
Adding a random 1s sleep in hw_init() fixes it. Because of course it does.
I'm investigating that; merging this will be suboptimal until it's sorted out..
Konrad
>
> v3: https://lore.kernel.org/r/20230223-topic-gmuwrapper-v3-0-5be55a336819@xxxxxxxxxx
>
> v2 -> v3:
> New dependencies:
> - https://lore.kernel.org/linux-arm-msm/20230223-topic-opp-v3-0-5f22163cd1df@xxxxxxxxxx/T/#t
> - https://lore.kernel.org/linux-arm-msm/20230120172233.1905761-1-konrad.dybcio@xxxxxxxxxx/
>
> Sidenote: A speedbin rework is in progress, the of_machine_is_compatible
> calls in A619_holi are ugly (but well, necessary..) but they'll be
> replaced with socid matching in this or the next kernel cycle.
>
> Due to the new way of identifying GMU wrapper GPUs, configuring 6350
> to use wrapper would cause the wrong fuse values to be checked, but that
> will be solved by the conversion + the ultimate goal is to use the GMU
> whenever possible with the wrapper left for GMU-less Adrenos and early
> bringup debugging of GMU-equipped ones.
>
> - Ship dt-bindings in this series as we're referencing the compatible now
>
> - "De-staticize" -> "remove static keyword" [3/15]
>
> - Track down all the values in [4/15]
>
> - Add many comments and explanations in [4/15]
>
> - Fix possible return-before-mutex-unlock [5/15]
>
> - Explain the GMU wrapper a bit more in the commit msg [5/15]
>
> - Separate out pm_resume/suspend for GMU-wrapper GPUs to make things
> cleaner [5/15]
>
> - Don't check if `info` exists, it has to at this point [5/15]
>
> - Assign gpu->info early and clean up following if statements in
> a6xx_gpu_init [5/15]
>
> - Determine whether we use GMU wrapper based on the GMU compatible
> instead of a quirk [5/15]
>
> - Use a struct field to annotate whether we're using gmu wrapper so
> that it can be assigned at runtime (turns out a619 holi-ness cannot
> be determined by patchid + that will make it easier to test out GMU
> GPUs without actually turning on the GMU if anybody wants to do so)
> [5/15]
>
> - Unconditionally hook up gx to the gmu wrapper (otherwise our gpu
> will not get power) [5/15]
>
> - Don't check for gx domain presence in gmu_wrapper paths, it's
> guaranteed [5/15]
>
> - Use opp set rate in the gmuwrapper suspend path [5/15]
>
> - Call opp functions on the GPU device and not on the DRM device of
> mdp4/5/DPU1 half the time (WHOOOOPS!) [5/15]
>
> - Disable the memory clock in a6xx_pm_suspend instead of enabling it
> (moderate oops) [5/15]
>
> - Call the forgotten clk_bulk_disable_unprepare in a6xx_pm_suspend [5/15]
>
> - Set rate to FMIN (a6xx really doesn't like rate=0 + that's what
> msm-5.x does anyway) before disabling core clock [5/15]
>
> - pm_runtime_get_sync -> pm_runtime_resume_and_get [5/15]
>
> - Don't annotate no cached BO support with a quirk, as A619_holi is
> merged into the A619 entry in the big const struct - this means
> that all GPUs operating in gmu wrapper configuration will be
> implicitly treated as if they didn't have this feature [7/15]
>
> - Drop OPP rate & icc related patches, they're a part of a separate
> series now; rebase on it
>
> - Clean up extra parentheses [8/15]
>
> - Identify A619_holi by checking the compatible of its GMU instead
> of patchlevel [8/15]
>
> - Drop "Fix up A6XX protected registers" - unnecessary, Rob will add
> a comment explaining why
>
> - Fix existing UBWC values for A680, new patch [10/15]
>
> - Use adreno_is_aXYZ macros in speedbin matching [13/15] - new patch
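The pm_runtime_get_sync -> pm_runtime_resume_and_get item above matters on the failure path: pm_runtime_get_sync() leaves the usage counter incremented even when the resume fails, while pm_runtime_resume_and_get() drops it again before returning the error. Below is a minimal user-space sketch of that difference. struct mock_dev and the mock_* helpers are made up for illustration and only model the counter; they are not the kernel implementation.

```c
/* Hypothetical stand-in for struct device; only models the usage counter. */
struct mock_dev {
	int usage_count;
	int resume_err;	/* value the mocked resume returns, 0 on success */
};

/* Models pm_runtime_get_sync(): the counter is bumped even if resume fails. */
int mock_get_sync(struct mock_dev *dev)
{
	dev->usage_count++;
	return dev->resume_err;
}

/* Models pm_runtime_put_noidle(): drop the reference without idling. */
void mock_put_noidle(struct mock_dev *dev)
{
	dev->usage_count--;
}

/* Models pm_runtime_resume_and_get(): the reference is dropped on failure. */
int mock_resume_and_get(struct mock_dev *dev)
{
	int ret = mock_get_sync(dev);

	if (ret < 0) {
		mock_put_noidle(dev);
		return ret;
	}

	return 0;
}
```

With resume_err set to a negative errno, mock_get_sync() leaves usage_count at 1 while mock_resume_and_get() leaves it at 0, so the caller has no reference to undo in its error path.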
>
> v2: https://lore.kernel.org/linux-arm-msm/20230214173145.2482651-1-konrad.dybcio@xxxxxxxxxx/
>
> v1 -> v2:
> - Fix A630 values in [2/14]
> - Fix [6/14] for GMU-equipped GPUs
>
> Link to v1: https://lore.kernel.org/linux-arm-msm/20230126151618.225127-1-konrad.dybcio@xxxxxxxxxx/
>
> This series concludes my couple-weeks-long suffering of figuring out
> the ins and outs of the "non-standard" A6xx GPUs which feature no GMU.
>
> The GMU functionality is emulated by parting out a
> "GMU wrapper" region, which is essentially just a register space
> within the GPU. It's modeled to be as similar to the actual GMU
> as possible while staying as unnecessary as we can make it - there are
> no IRQs, no communication with a microcontroller, no RPMh communication
> etc. etc. I tried to reuse as much code as possible without making
> a mess where every even line is used for GMU and every odd line is
> used for GMU wrapper..
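To illustrate the approach described above (a runtime flag derived from the GMU compatible, with separate resume handling for wrapper GPUs), here is a rough user-space sketch. Everything in it is hypothetical: struct sketch_gpu, the sketch_* functions and the counters are invented for the example, and the compatible string is assumed to match the new binding. It is not the driver code from this series.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed compatible string for the wrapper; check the bindings patch. */
#define SKETCH_GMU_WRAPPER_COMPAT "qcom,adreno-gmu-wrapper"

/* Hypothetical, heavily simplified stand-in for the driver's GPU state. */
struct sketch_gpu {
	bool gmu_is_wrapper;	/* assigned at init time, not baked into a const table */
	int gmu_calls;		/* counts paths that would talk to a real GMU */
	int direct_calls;	/* counts paths that handle the GPU directly */
};

/* Decide wrapper vs. real GMU from the GMU node's compatible. */
void sketch_gpu_init(struct sketch_gpu *gpu, const char *gmu_compatible)
{
	gpu->gmu_is_wrapper = strcmp(gmu_compatible, SKETCH_GMU_WRAPPER_COMPAT) == 0;
	gpu->gmu_calls = 0;
	gpu->direct_calls = 0;
}

/* One entry point that branches once, instead of if-ing every other line. */
int sketch_pm_resume(struct sketch_gpu *gpu)
{
	if (gpu->gmu_is_wrapper) {
		/* No microcontroller, IRQs or RPMh: the driver does the work itself. */
		gpu->direct_calls++;
		return 0;
	}

	/* GMU-equipped GPU: hand power-up over to the GMU path. */
	gpu->gmu_calls++;
	return 0;
}
```

Keeping the flag in a struct field rather than a static quirk means the same GPU entry could be run in either mode, which is what makes testing a GMU-equipped GPU without turning on the GMU possible in principle.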
>
> This series contains:
> - plumbing for non-GMU operation, if-ing out GMU calls based on
> GMU presence
> - GMU wrapper support
> - A610 support (w/ speedbin)
> - A619 support (w/ speedbin)
> - couple of minor fixes and improvements
> - VDDCX/VDDGX scaling fix for non-GMU GPUs (concerns more than just
> A6xx)
> - Enablement of opp interconnect properties
>
> A619_holi works perfectly fine using the already-present A619 support
> in mesa. A610 needs more work on that front, but can already replay
> command traces captured on downstream.
>
> NOTE: the "drm/msm/a6xx: Add support for A619_holi" patch contains
> two occurrences of 0x18 used in place of a register #define, as it's
> supposed to be RBBM_GPR0_CNTL, but that will only be present after
> mesa-side changes are merged and headers are synced from there.
>
> Speedbin patches depend on:
> https://lore.kernel.org/linux-arm-msm/20230120172233.1905761-1-konrad.dybcio@xxxxxxxxxx/
>
> Signed-off-by: Konrad Dybcio <konrad.dybcio@xxxxxxxxxx>
> ---
> Konrad Dybcio (14):
> dt-bindings: display/msm: gpu: Document GMU wrapper-equipped A6xx
> dt-bindings: display/msm/gmu: Add GMU wrapper
> drm/msm/a6xx: Remove static keyword from sptprac en/disable functions
> drm/msm/a6xx: Extend and explain UBWC config
> drm/msm/a6xx: Introduce GMU wrapper support
> drm/msm/a6xx: Remove both GBIF and RBBM GBIF halt on hw init
> drm/msm/adreno: Disable has_cached_coherent in GMU wrapper configurations
> drm/msm/a6xx: Add support for A619_holi
> drm/msm/a6xx: Add A610 support
> drm/msm/a6xx: Fix some A619 tunables
> drm/msm/a6xx: Use "else if" in GPU speedbin rev matching
> drm/msm/a6xx: Use adreno_is_aXYZ macros in speedbin matching
> drm/msm/a6xx: Add A619_holi speedbin support
> drm/msm/a6xx: Add A610 speedbin support
>
> .../devicetree/bindings/display/msm/gmu.yaml | 49 +-
> .../devicetree/bindings/display/msm/gpu.yaml | 57 ++-
> drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 57 ++-
> drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 2 +
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 494 ++++++++++++++++++---
> drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 +
> drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 14 +-
> drivers/gpu/drm/msm/adreno/adreno_device.c | 17 +-
> drivers/gpu/drm/msm/adreno/adreno_gpu.h | 33 +-
> 9 files changed, 632 insertions(+), 92 deletions(-)
> ---
> base-commit: 647ef0d33d52a103b50469d7109b63d453686c11
> change-id: 20230223-topic-gmuwrapper-b4fff5fd7789
>
> Best regards,