Re: [RFC] drm/msm: Add UABI to request perfcntr usage
From: Akhil P Oommen
Date: Mon Dec 16 2024 - 12:03:14 EST
On 12/13/2024 10:40 PM, Antonino Maniscalco wrote:
> On 12/13/24 5:50 PM, Akhil P Oommen wrote:
>> On 12/12/2024 9:44 PM, Antonino Maniscalco wrote:
>>> On 12/12/24 4:58 PM, Akhil P Oommen wrote:
>>>> On 12/5/2024 10:24 PM, Rob Clark wrote:
>>>>> From: Rob Clark <robdclark@xxxxxxxxxxxx>
>>>>>
>>>>> Performance counter usage falls into two categories:
>>>>>
>>>>> 1. Local usage, where the counter configuration, start, and end read
>>>>> happen within (locally to) a single SUBMIT. In this case,
>>>>> there is
>>>>> no dependency on counter configuration or values between submits,
>>>>> and
>>>>> in fact counters are normally cleared on context switches,
>>>>> making it
>>>>> impossible to rely on cross-submit state.
>>>>>
>>>>> 2. Global usage, where a single privilaged daemon/process is sampling
>>>>> counter values across all processes for profiling.
>>>>>
>>>>> The two categories are mutually exclusive. While you can have many
>>>>> processes making local counter usage, you cannot combine global and
>>>>> local usage without the two stepping on each others feet (by changing
>>>>> counter configuration).
>>>>>
>>>>> For global counter usage, there is already a SYSPROF param (since
>>>>> global
>>>>> counter usage requires disabling counter clearing on context switch).
>>>>> This patch adds a REQ_CNTRS param to request local counter usage. If
>>>>> one or more processes has requested counter usage, then a SYSPROF
>>>>> request will fail with -EBUSY. And if SYSPROF is active, then
>>>>> REQ_CNTRS
>>>>> will fail with -EBUSY, maintaining the mutual exclusivity.
>>>>>
>>>>> This is purely an advisory interface to help coordinate userspace.
>>>>> There is no real means of enforcement, but the worst that can
>>>>> happen if
>>>>> userspace ignores a REQ_CNTRS failure is that you'll get nonsense
>>>>> profiling data.
>>>>>
>>>>> Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
>>>>> ---
>>>>> kgsl takes a different approach, which involves a lot more UABI for
>>>>> assigning counters to different processes. But I think by taking
>>>>> advantage of the fact that mesa (freedreno+turnip) reconfigure the
>>>>> counters they need in each SUBMIT, for their respective gl/vk perf-
>>>>> counter extensions, we can take this simpler approach.
>>>>
>>>> KGSL's approach is preemption and ifpc safe (also whatever HW changes
>>>> that will come up in future generations). How will we ensure that here?
>>>>
>>>> I have plans to bring up IFPC support in near future. Also, I
>>>> brought up
>>>> this point during preemption series. But from the responses, I felt
>>>> that
>>>> profiling was not considered a serious usecase. Still I wonder how the
>>>> perfcounter extensions work accurately with preemption.
>>>
>>> So back then I implemented the postamble IB to clear perf counters and
>>> that gets disabled when sysprof (so global usage) is happening. The
>>> kernel is oblivious to "Local isage" of profiling but in that case
>>> really what we want to do is disable preemption which in my
>>> understanding can be done from userspace with a PKT. In my understanding
>>> this had us covered for all usecases.
>>
>> I think this wasn't mentioned at that time. Which UMD PKT in a6x+ did
>> you mean?
>
> Ah, I thought it wasmentioned, sorry.
> The packet I was referring to is:
> <doc> Make next dword 1 to disable preemption, 0 to re-enable it. </
> doc>
> <value name="CP_PREEMPT_DISABLE" value="0x6c" variants="A6XX"/>
Ah! Okay. I think this packet is not used by the downstream blob. IMO,
disabling preemption is still a suboptimal solution.
>
> BTW you mentioned wanting to look into IFPC. Since I too wanted to look
> into implementing it wonder if you could let me know when you planned on
> working on it.
I have few patches in progress. Nothing final yet and need verification
on the hw side. Also, I need to do some housekeeping here to debug gmu
issues since IFPC increases the probability of those a lot.
I will try to send out the patches very soon.
-Akhil.
>
>>
>> -Akhil.
>>
>>>
>>> So what would you expect instead we should do kernel side to make
>>> profiling preemption safe?
>>>
>>>>
>>>> -Akhil
>>>>
>>>>>
>>>>> drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +
>>>>> drivers/gpu/drm/msm/msm_drv.c | 5 ++-
>>>>> drivers/gpu/drm/msm/msm_gpu.c | 1 +
>>>>> drivers/gpu/drm/msm/msm_gpu.h | 29 +++++++++++++-
>>>>> drivers/gpu/drm/msm/msm_submitqueue.c | 52 +++++++++++++++++++
>>>>> +++++-
>>>>> include/uapi/drm/msm_drm.h | 1 +
>>>>> 6 files changed, 85 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/
>>>>> drm/msm/adreno/adreno_gpu.c
>>>>> index 31bbf2c83de4..f688e37059b8 100644
>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>> @@ -441,6 +441,8 @@ int adreno_set_param(struct msm_gpu *gpu, struct
>>>>> msm_file_private *ctx,
>>>>> if (!capable(CAP_SYS_ADMIN))
>>>>> return UERR(EPERM, drm, "invalid permissions");
>>>>> return msm_file_private_set_sysprof(ctx, gpu, value);
>>>>> + case MSM_PARAM_REQ_CNTRS:
>>>>> + return msm_file_private_request_counters(ctx, gpu, value);
>>>>> default:
>>>>> return UERR(EINVAL, drm, "%s: invalid param: %u", gpu-
>>>>>> name, param);
>>>>> }
>>>>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/
>>>>> msm_drv.c
>>>>> index 6416d2cb4efc..bf8314ff4a25 100644
>>>>> --- a/drivers/gpu/drm/msm/msm_drv.c
>>>>> +++ b/drivers/gpu/drm/msm/msm_drv.c
>>>>> @@ -377,9 +377,12 @@ static void msm_postclose(struct drm_device
>>>>> *dev, struct drm_file *file)
>>>>> * It is not possible to set sysprof param to non-zero if gpu
>>>>> * is not initialized:
>>>>> */
>>>>> - if (priv->gpu)
>>>>> + if (ctx->sysprof)
>>>>> msm_file_private_set_sysprof(ctx, priv->gpu, 0);
>>>>> + if (ctx->counters_requested)
>>>>> + msm_file_private_request_counters(ctx, priv->gpu, 0);
>>>>> +
>>>>> context_close(ctx);
>>>>> }
>>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/
>>>>> msm_gpu.c
>>>>> index 82f204f3bb8f..013b59ca3bb1 100644
>>>>> --- a/drivers/gpu/drm/msm/msm_gpu.c
>>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.c
>>>>> @@ -991,6 +991,7 @@ int msm_gpu_init(struct drm_device *drm, struct
>>>>> platform_device *pdev,
>>>>> gpu->nr_rings = nr_rings;
>>>>> refcount_set(&gpu->sysprof_active, 1);
>>>>> + refcount_set(&gpu->local_counters_active, 1);
>>>>> return 0;
>>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/
>>>>> msm_gpu.h
>>>>> index e25009150579..83c61e523b1b 100644
>>>>> --- a/drivers/gpu/drm/msm/msm_gpu.h
>>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.h
>>>>> @@ -195,12 +195,28 @@ struct msm_gpu {
>>>>> int nr_rings;
>>>>> /**
>>>>> - * sysprof_active:
>>>>> + * @sysprof_active:
>>>>> *
>>>>> - * The count of contexts that have enabled system profiling.
>>>>> + * The count of contexts that have enabled system profiling plus
>>>>> one.
>>>>> + *
>>>>> + * Note: refcount_t does not like 0->1 transitions.. we want to
>>>>> keep
>>>>> + * the under/overflow checks that refcount_t provides, but allow
>>>>> + * multiple on/off transitions so we track the logical value
>>>>> plus one.)
>>>>> */
>>>>> refcount_t sysprof_active;
>>>>> + /**
>>>>> + * @local_counters_active:
>>>>> + *
>>>>> + * The count of contexts that have requested local (intra-submit)
>>>>> + * performance counter usage plus one.
>>>>> + *
>>>>> + * Note: refcount_t does not like 0->1 transitions.. we want to
>>>>> keep
>>>>> + * the under/overflow checks that refcount_t provides, but allow
>>>>> + * multiple on/off transitions so we track the logical value
>>>>> plus one.)
>>>>> + */
>>>>> + refcount_t local_counters_active;
>>>>> +
>>>>> /**
>>>>> * lock:
>>>>> *
>>>>> @@ -383,6 +399,13 @@ struct msm_file_private {
>>>>> */
>>>>> int sysprof;
>>>>> + /**
>>>>> + * @counters_requested:
>>>>> + *
>>>>> + * Has the context requested local perfcntr usage.
>>>>> + */
>>>>> + bool counters_requested;
>>>>> +
>>>>> /**
>>>>> * comm: Overridden task comm, see MSM_PARAM_COMM
>>>>> *
>>>>> @@ -626,6 +649,8 @@ void msm_submitqueue_destroy(struct kref *kref);
>>>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx,
>>>>> struct msm_gpu *gpu, int sysprof);
>>>>> +int msm_file_private_request_counters(struct msm_file_private *ctx,
>>>>> + struct msm_gpu *gpu, int reqcntrs);
>>>>> void __msm_file_private_destroy(struct kref *kref);
>>>>> static inline void msm_file_private_put(struct msm_file_private
>>>>> *ctx)
>>>>> diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/
>>>>> msm/msm_submitqueue.c
>>>>> index 7fed1de63b5d..1e1e21e6f7ae 100644
>>>>> --- a/drivers/gpu/drm/msm/msm_submitqueue.c
>>>>> +++ b/drivers/gpu/drm/msm/msm_submitqueue.c
>>>>> @@ -10,6 +10,15 @@
>>>>> int msm_file_private_set_sysprof(struct msm_file_private *ctx,
>>>>> struct msm_gpu *gpu, int sysprof)
>>>>> {
>>>>> + int ret = 0;
>>>>> +
>>>>> + mutex_lock(&gpu->lock);
>>>>> +
>>>>> + if (sysprof && (refcount_read(&gpu->local_counters_active) >
>>>>> 1)) {
>>>>> + ret = UERR(EBUSY, gpu->dev, "Local counter usage active");
>>>>> + goto out_unlock;
>>>>> + }
>>>>> +
>>>>> /*
>>>>> * Since pm_runtime and sysprof_active are both refcounts, we
>>>>> * call apply the new value first, and then unwind the previous
>>>>> @@ -18,7 +27,8 @@ int msm_file_private_set_sysprof(struct
>>>>> msm_file_private *ctx,
>>>>> switch (sysprof) {
>>>>> default:
>>>>> - return UERR(EINVAL, gpu->dev, "Invalid sysprof: %d",
>>>>> sysprof);
>>>>> + ret = UERR(EINVAL, gpu->dev, "Invalid sysprof: %d", sysprof);
>>>>> + goto out_unlock;
>>>>> case 2:
>>>>> pm_runtime_get_sync(&gpu->pdev->dev);
>>>>> fallthrough;
>>>>> @@ -43,7 +53,45 @@ int msm_file_private_set_sysprof(struct
>>>>> msm_file_private *ctx,
>>>>> ctx->sysprof = sysprof;
>>>>> - return 0;
>>>>> +out_unlock:
>>>>> + mutex_unlock(&gpu->lock);
>>>>> +
>>>>> + return ret;
>>>>> +}
>>>>> +
>>>>> +int msm_file_private_request_counters(struct msm_file_private *ctx,
>>>>> + struct msm_gpu *gpu, int reqctrs)
>>>>> +{
>>>>> + int ret = 0;
>>>>> +
>>>>> + mutex_lock(&gpu->lock);
>>>>> +
>>>>> + if (reqctrs && (refcount_read(&gpu->sysprof_active) > 1)) {
>>>>> + ret = UERR(EBUSY, gpu->dev, "System profiling active");
>>>>> + goto out_unlock;
>>>>> + }
>>>>> +
>>>>> + if (reqctrs) {
>>>>> + if (ctx->counters_requested) {
>>>>> + ret = UERR(EINVAL, gpu->dev, "Already requested");
>>>>> + goto out_unlock;
>>>>> + }
>>>>> +
>>>>> + ctx->counters_requested = true;
>>>>> + refcount_inc(&gpu->local_counters_active);
>>>>> + } else {
>>>>> + if (!ctx->counters_requested) {
>>>>> + ret = UERR(EINVAL, gpu->dev, "Not requested");
>>>>> + goto out_unlock;
>>>>> + }
>>>>> + refcount_dec(&gpu->local_counters_active);
>>>>> + ctx->counters_requested = false;
>>>>> + }
>>>>> +
>>>>> +out_unlock:
>>>>> + mutex_unlock(&gpu->lock);
>>>>> +
>>>>> + return ret;
>>>>> }
>>>>> void __msm_file_private_destroy(struct kref *kref)
>>>>> diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
>>>>> index 2342cb90857e..ae7fb355e4a1 100644
>>>>> --- a/include/uapi/drm/msm_drm.h
>>>>> +++ b/include/uapi/drm/msm_drm.h
>>>>> @@ -91,6 +91,7 @@ struct drm_msm_timespec {
>>>>> #define MSM_PARAM_UBWC_SWIZZLE 0x12 /* RO */
>>>>> #define MSM_PARAM_MACROTILE_MODE 0x13 /* RO */
>>>>> #define MSM_PARAM_UCHE_TRAP_BASE 0x14 /* RO */
>>>>> +#define MSM_PARAM_REQ_CNTRS 0x15 /* WO: request "local" (intra-
>>>>> submit) perfcntr usage */
>>>>> /* For backwards compat. The original support for preemption was
>>>>> based on
>>>>> * a single ring per priority level so # of priority levels equals
>>>>> the #
>>>>
>>>
>>>
>>> Best regards,
>>
>
>
> Best regards,