Re: [PATCH 0/2] drm/amd: fix VRR race condition during IRQ handling

From: Harry Wentland
Date: Mon Sep 09 2024 - 13:18:53 EST




On 2024-09-09 13:11, Alex Deucher wrote:
> On Sun, Sep 8, 2024 at 7:23 AM Tobias Jakobi
> <tjakobi@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On 9/8/24 09:35, Christopher Snowhill wrote:
>>
>>> On Mon Sep 2, 2024 at 2:40 AM PDT, tjakobi wrote:
>>>> From: Tobias Jakobi <tjakobi@xxxxxxxxxxxxxxxxxxxxx>
>>>>
>>>> Hello,
>>>>
>>>> this fixes a nasty race condition in the set_drr() callbacks for DCN10
>>>> and DCN35 that has existed now since quite some time, see this GitLab
>>>> issue for reference.
>>>>
>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3142
>>>>
>>>> The report just focuses von DCN10, but the same problem also exists in
>>>> the DCN35 code.
>>> Does the problem not exist in the following references to funcs->set_drr?
>>>
>>> drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
>>> drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
>>> drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx[i]->stream_res.tg->funcs->set_drr(
>>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
>>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
>>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
>>> drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
>>> drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
>>> drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
>>> drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
>>> drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
>>
>> Maybe. But the big difference I see here, is that in this code there
>> isn't even any kind of NULL check applied to tg. Or most of the members
>> of *pipe_ctx. If there really is the same kind of problem here, then one
>> would need to rewrite a bit more code to fix stuff.
>>
>> E.g. in the case of dcn31_hwseq.c, the questionable code is in
>> dcn31_reset_back_end_for_pipe(), which is static and only called from
>> dcn31_reset_hw_ctx_wrap(). Which is assigned to the .reset_hw_ctx_wrap
>> callback. And this specific callback, from what I can see, is only
>> called from dce110_reset_hw_ctx_wrap(). Which is then assigned to the
>> .apply_ctx_to_hw callback. The callback is only called from
>> dc_commit_state_no_check(). That one is static again, and called from
>> dc_commit_streams().
>>
>> I could trace this even further. My point is: I don't think this is
>> called from any IRQ handler code. And given the depth and complexity of
>> the callgraph, I have to admit, that, at least at this point, this is a
>> bit over my head.
>>
>> Sure, I could now sprinkle a bunch of x != NULL in the code, but that
>> would be merely voodoo. And I usually try to have a theoretical basis
>> when I apply changes to code.
>>
>> Maybe if someone from the AMD display team could give some insight if
>> there still is potentially vulnerable code in some of the instances that
>> Christopher has posted, then I would gladly take a look.
>
> @Wentland, Harry can you confirm this?
>

As Tobias said, without extensive analysis and trace of the code in all
possible use-case it's hard to say there's no possible way the other
set_drr calls could potentially have a similar issue.

I think Tobias' analysis is sound and this fixes a number of issues, hence
my RB.

Harry

> Alex
>
>>
>> With best wishes,
>> Tobias
>>
>>>
>>>> With best wishes,
>>>> Tobias
>>>>
>>>> Tobias Jakobi (2):
>>>> drm/amd/display: Avoid race between dcn10_set_drr() and
>>>> dc_state_destruct()
>>>> drm/amd/display: Avoid race between dcn35_set_drr() and
>>>> dc_state_destruct()
>>>>
>>>> .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 20 +++++++++++--------
>>>> .../amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 20 +++++++++++--------
>>>> 2 files changed, 24 insertions(+), 16 deletions(-)