Re: [PATCH 0/2] drm/amd: fix VRR race condition during IRQ handling
From: Christopher Snowhill
Date: Mon Sep 09 2024 - 04:29:04 EST
On Sun Sep 8, 2024 at 4:23 AM PDT, Tobias Jakobi wrote:
> On 9/8/24 09:35, Christopher Snowhill wrote:
>
> > On Mon Sep 2, 2024 at 2:40 AM PDT, tjakobi wrote:
> >> From: Tobias Jakobi <tjakobi@xxxxxxxxxxxxxxxxxxxxx>
> >>
> >> Hello,
> >>
> >> this fixes a nasty race condition in the set_drr() callbacks for DCN10
> >> and DCN35 that has existed now since quite some time, see this GitLab
> >> issue for reference.
> >>
> >> https://gitlab.freedesktop.org/drm/amd/-/issues/3142
> >>
> >> The report just focuses von DCN10, but the same problem also exists in
> >> the DCN35 code.
> > Does the problem not exist in the following references to funcs->set_drr?
> >
> > drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
> > drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
> > drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c: pipe_ctx[i]->stream_res.tg->funcs->set_drr(
> > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
> > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
> > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
> > drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
> > drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
> > drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
> > drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: if (pipe_ctx->stream_res.tg->funcs->set_drr)
> > drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c: pipe_ctx->stream_res.tg->funcs->set_drr(
>
> Maybe. But the big difference I see here, is that in this code there
> isn't even any kind of NULL check applied to tg. Or most of the members
> of *pipe_ctx. If there really is the same kind of problem here, then one
> would need to rewrite a bit more code to fix stuff.
>
> E.g. in the case of dcn31_hwseq.c, the questionable code is in
> dcn31_reset_back_end_for_pipe(), which is static and only called from
> dcn31_reset_hw_ctx_wrap(). Which is assigned to the .reset_hw_ctx_wrap
> callback. And this specific callback, from what I can see, is only
> called from dce110_reset_hw_ctx_wrap(). Which is then assigned to the
> .apply_ctx_to_hw callback. The callback is only called from
> dc_commit_state_no_check(). That one is static again, and called from
> dc_commit_streams().
>
> I could trace this even further. My point is: I don't think this is
> called from any IRQ handler code. And given the depth and complexity of
> the callgraph, I have to admit, that, at least at this point, this is a
> bit over my head.
>
> Sure, I could now sprinkle a bunch of x != NULL in the code, but that
> would be merely voodoo. And I usually try to have a theoretical basis
> when I apply changes to code.
>
> Maybe if someone from the AMD display team could give some insight if
> there still is potentially vulnerable code in some of the instances that
> Christopher has posted, then I would gladly take a look.
Sorry, I was taking a note from someone else who mentioned set_drr functions, and wasn't aware that none of the other implementations happen to be called from IRQ handlers. Thanks for looking into this.
-Christopher
> With best wishes,
> Tobias
>
> >
> >> With best wishes,
> >> Tobias
> >>
> >> Tobias Jakobi (2):
> >> drm/amd/display: Avoid race between dcn10_set_drr() and
> >> dc_state_destruct()
> >> drm/amd/display: Avoid race between dcn35_set_drr() and
> >> dc_state_destruct()
> >>
> >> .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 20 +++++++++++--------
> >> .../amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 20 +++++++++++--------
> >> 2 files changed, 24 insertions(+), 16 deletions(-)