RE: [PATCH v2] drm/dp: Fix aux->transfer NULL pointer dereference on drm_dp_dpcd_access
From: Yuan, Perry
Date: Thu Nov 11 2021 - 21:17:15 EST
[AMD Official Use Only]
Hi Harry.
> -----Original Message-----
> From: Wentland, Harry <Harry.Wentland@xxxxxxx>
> Sent: Wednesday, November 10, 2021 11:32 PM
> To: Yuan, Perry <Perry.Yuan@xxxxxxx>; Jani Nikula
> <jani.nikula@xxxxxxxxxxxxxxx>; Maarten Lankhorst
> <maarten.lankhorst@xxxxxxxxxxxxxxx>; Maxime Ripard <mripard@xxxxxxxxxx>;
> Thomas Zimmermann <tzimmermann@xxxxxxx>; David Airlie <airlied@xxxxxxxx>;
> Daniel Vetter <daniel@xxxxxxxx>
> Cc: Huang, Shimmer <Xinmei.Huang@xxxxxxx>; Huang, Ray
> <Ray.Huang@xxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> dri-devel@xxxxxxxxxxxxxxxxxxxxx; Limonciello, Mario <Mario.Limonciello@xxxxxxx>
> Subject: Re: [PATCH v2] drm/dp: Fix aux->transfer NULL pointer dereference on
> drm_dp_dpcd_access
>
> On 2021-11-05 03:35, Yuan, Perry wrote:
> > Hi Jani:
> >
> >
> >> -----Original Message-----
> >> From: Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx>
> >> Sent: Wednesday, November 3, 2021 7:31 PM
> >> To: Yuan, Perry <Perry.Yuan@xxxxxxx>; Maarten Lankhorst
> >> <maarten.lankhorst@xxxxxxxxxxxxxxx>; Maxime Ripard
> >> <mripard@xxxxxxxxxx>; Thomas Zimmermann <tzimmermann@xxxxxxx>;
> >> David Airlie <airlied@xxxxxxxx>; Daniel Vetter <daniel@xxxxxxxx>
> >> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> >> Huang, Shimmer <Xinmei.Huang@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>
> >> Subject: RE: [PATCH v2] drm/dp: Fix aux->transfer NULL pointer
> >> dereference on drm_dp_dpcd_access
> >>
> >> [CAUTION: External Email]
> >>
> >> On Wed, 03 Nov 2021, "Yuan, Perry" <Perry.Yuan@xxxxxxx> wrote:
> >>> Hi Jani:
> >>>
> >>>> -----Original Message-----
> >>>> From: Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx>
> >>>> Sent: Tuesday, November 2, 2021 4:40 PM
> >>>> To: Yuan, Perry <Perry.Yuan@xxxxxxx>; Maarten Lankhorst
> >>>> <maarten.lankhorst@xxxxxxxxxxxxxxx>; Maxime Ripard
> >>>> <mripard@xxxxxxxxxx>; Thomas Zimmermann <tzimmermann@xxxxxxx>;
> >>>> David Airlie <airlied@xxxxxxxx>; Daniel Vetter <daniel@xxxxxxxx>
> >>>> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> >>>> Huang, Shimmer <Xinmei.Huang@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>
> >>>> Subject: RE: [PATCH v2] drm/dp: Fix aux->transfer NULL pointer
> >>>> dereference on drm_dp_dpcd_access
> >>>>
> >>>> On Tue, 02 Nov 2021, "Yuan, Perry" <Perry.Yuan@xxxxxxx> wrote:
> >>>>> Hi Jani:
> >>>>> Thanks for your comments.
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx>
> >>>>>> Sent: Monday, November 1, 2021 9:07 PM
> >>>>>> To: Yuan, Perry <Perry.Yuan@xxxxxxx>; Maarten Lankhorst
> >>>>>> <maarten.lankhorst@xxxxxxxxxxxxxxx>; Maxime Ripard
> >>>>>> <mripard@xxxxxxxxxx>; Thomas Zimmermann <tzimmermann@xxxxxxx>;
> >>>>>> David Airlie <airlied@xxxxxxxx>; Daniel Vetter <daniel@xxxxxxxx>
> >>>>>> Cc: Yuan, Perry <Perry.Yuan@xxxxxxx>;
> >>>>>> dri-devel@xxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> >>>>>> Huang, Shimmer <Xinmei.Huang@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>
> >>>>>> Subject: Re: [PATCH v2] drm/dp: Fix aux->transfer NULL pointer
> >>>>>> dereference on drm_dp_dpcd_access
> >>>>>>
> >>>>>> On Mon, 01 Nov 2021, Perry Yuan <Perry.Yuan@xxxxxxx> wrote:
> >>>>>>> Fix the crash below by adding a check in drm_dp_dpcd_access()
> >>>>>>> which ensures that aux->transfer was actually initialized earlier.
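The check proposed in this patch can be modeled in userspace as a minimal sketch. Assumptions: `struct dp_aux`, `struct aux_msg`, `dpcd_access()` and `fake_transfer()` are hypothetical, simplified stand-ins for `struct drm_dp_aux`, `struct drm_dp_aux_msg` and `drm_dp_dpcd_access()`, not the real kernel API, and the locking and retry loop of the real helper are left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for drm_dp_aux_msg / drm_dp_aux. */
struct aux_msg {
	uint8_t request;
	unsigned int address;
	void *buffer;
	size_t size;
};

struct dp_aux {
	/* Left NULL when the driver never set up this AUX channel. */
	long (*transfer)(struct dp_aux *aux, struct aux_msg *msg);
};

/* Model of the v2 guard: bail out before calling through aux->transfer.
 * The real helper also takes aux->hw_mutex and retries; omitted here. */
int dpcd_access(struct dp_aux *aux, uint8_t request, unsigned int offset,
		void *buffer, size_t size)
{
	struct aux_msg msg = {
		.request = request,
		.address = offset,
		.buffer = buffer,
		.size = size,
	};

	/* No transfer function is set, so not an available DP connector. */
	if (!aux->transfer)
		return -EINVAL;

	return (int)aux->transfer(aux, &msg);
}

/* Fake callback for illustration: every byte reads back as zero. */
long fake_transfer(struct dp_aux *aux, struct aux_msg *msg)
{
	(void)aux;
	memset(msg->buffer, 0, msg->size);
	return (long)msg->size;
}
```

With the guard, a read through an uninitialized channel fails with -EINVAL instead of jumping to address 0 (the `RIP: 0010:0x0` in the oops). It does not explain why such a channel was reachable in the first place, which is the point the rest of the thread pursues.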
> >>>>>>
> >>>>>> Gut feeling says this is papering over a real usage issue
> >>>>>> somewhere else. Why is the aux being used for transfers before
> >>>>>> ->transfer has been set? Why should the dp helper be defensive
> >>>>>> against all kinds of misprogramming?
> >>>>>>
> >>>>>>
> >>>>>> BR,
> >>>>>> Jani.
> >>>>>>
> >>>>>
> >>>>> The issue was found by the Intel IGT test suite, in the amdgpu bypass
> >>>>> test case:
> >>>>>
> >>>>> https://gitlab.freedesktop.org/drm/igt-gpu-tools
> >>>>>
> >>>>> Normal use cases will not hit the issue.
> >>>>> To keep this issue from happening again when we run the test case,
> >>>>> it would be nice to add a check before ->transfer is called.
> >>>>> We can see that a check is really needed here to keep IGT and the
> >>>>> kernel happy.
> >>>>
> >>>> You're missing my point. What is the root cause? Why do you have
> >>>> the aux device or connector registered before the ->transfer function
> >>>> is initialized? I don't think you should do that.
> >>>>
> >>>> BR,
> >>>> Jani.
> >>>>
> >>>
> >>> One potential IGT fix patch to resolve the test case failure is:
> >>>
> >>> tests/amdgpu/amd_bypass.c
> >>>  data->pipe_crc = igt_pipe_crc_new(data->drm_fd, data->pipe_id,
> >>> -                                  AMDGPU_PIPE_CRC_SOURCE_DPRX);
> >>> +                                  INTEL_PIPE_CRC_SOURCE_AUTO);
> >>>
> >>> The kernel panic goes away after changing "dprx" to "auto" in the IGT test.
> >>>
> >>> In my view, the IGT amdgpu bypass test does some common setup work,
> >>> including the CRC pipe and its source.
> >>> When IGT sets up a new CRC pipe capture source for the amdgpu bypass
> >>> test, the source is set to "dprx" instead of "auto".
> >>> This makes amdgpu_dm_crtc_set_crc_source() fail to pick a valid AUX
> >>> channel: the one it uses has no ->transfer function set.
> >>> The system I tested has its monitor connected to an HDMI port.
> >>>
> >>> amdgpu_dm_crtc_set_crc_source ->
> >>>     (aux = (aconn->port) ? &aconn->port->aux : &aconn->dm_dp_aux.aux;)
> >>>   drm_dp_start_crc ->
> >>>     drm_dp_dpcd_readb -> aux->transfer is NULL, issue here.
> >>> The fix uses the "auto" keyword, which lets the driver select a
> >>> default source of frame CRCs for this CRTC.
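To make the call chain above concrete, here is a simplified userspace model of how the AUX channel gets picked. Everything here is hypothetical except the selection expression, which mirrors the one quoted from amdgpu_dm_crtc_set_crc_source(). The assumption that ->transfer is only filled in for DP connectors (via amdgpu_dm_initialize_dp_connector()) should be checked against the driver source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of the amdgpu_dm objects. */
enum conn_type { CONN_HDMI, CONN_DP };

struct dp_aux {
	int (*transfer)(struct dp_aux *aux);
};

struct mst_port {
	struct dp_aux aux;
};

struct dm_dp_aux_wrapper {
	struct dp_aux aux;
};

struct dm_connector {
	enum conn_type type;
	struct mst_port *port;              /* non-NULL only for MST connectors */
	struct dm_dp_aux_wrapper dm_dp_aux; /* embedded for every connector type */
};

int dp_transfer(struct dp_aux *aux)
{
	(void)aux;
	return 0;
}

/* Assumption: ->transfer is only set up for DP connectors, so an HDMI
 * connector keeps a NULL transfer pointer in its embedded AUX. */
void init_connector(struct dm_connector *c, enum conn_type type)
{
	c->type = type;
	c->port = NULL;
	c->dm_dp_aux.aux.transfer = NULL;
	if (type == CONN_DP)
		c->dm_dp_aux.aux.transfer = dp_transfer;
}

/* Same selection expression as quoted from amdgpu_dm_crtc_set_crc_source(). */
struct dp_aux *select_aux(struct dm_connector *aconn)
{
	return (aconn->port) ? &aconn->port->aux : &aconn->dm_dp_aux.aux;
}

/* What drm_dp_start_crc() -> drm_dp_dpcd_readb() implicitly assumes. */
bool aux_usable(const struct dp_aux *aux)
{
	return aux->transfer != NULL;
}
```

For an HDMI connector without an MST port, select_aux() returns the embedded dm_dp_aux.aux whose ->transfer was never set, which is the pointer the DPCD read then calls.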
> >>>
> >>> Please correct me if I have anything wrong.
> >>
> >> Apparently I'm completely failing to communicate my POV to you.
> >>
> >> If you have a kernel oops, the fix needs to be in the kernel, not IGT.
> >>
> >> The question is, why is it possible for IGT (or any userspace) to
> >> trigger AUX communication when the ->transfer function is not set? In
> >> my opinion, the kernel driver should not have exposed the interface
> >> at all if the ->transfer function is not set. The interface is useless
> >> without the ->transfer function.
> >> IMO, that's the bug.
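The alternative described here, refusing to expose the interface at all until ->transfer exists, could look like the following minimal sketch. The functions aux_register() and aux_user_request() and the registered flag are hypothetical; they are not drm_dp_aux_register() and do not describe what that function checks.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an AUX channel and its userspace visibility. */
struct dp_aux {
	int (*transfer)(struct dp_aux *aux);
	bool registered; /* models "interface exposed to userspace" */
};

/* Refuse to expose a channel that has no way to talk to the hardware. */
int aux_register(struct dp_aux *aux)
{
	if (!aux->transfer)
		return -EINVAL;
	aux->registered = true;
	return 0;
}

/* Userspace-reachable path: only registered channels get here, so
 * ->transfer is known to be non-NULL. */
int aux_user_request(struct dp_aux *aux)
{
	if (!aux->registered)
		return -ENODEV;
	return aux->transfer(aux);
}

/* Example transfer callback that always succeeds. */
int ok_transfer(struct dp_aux *aux)
{
	(void)aux;
	return 0;
}
```

The difference from a per-call NULL check is where the failure surfaces: a driver bug is caught once, at setup time, instead of on every userspace request.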
> >>
> >
> > Yes, you are correct: ->transfer shouldn't be called before it is ready!
> >
> > Let me explain in more detail.
> > Maybe in this case the root cause is not that aux->transfer is called
> > before it is registered.
> > I suspect the issue is triggered by the wrong CRC pipe source.
> >
> > Actually, aux->transfer is registered when amdgpu DM is initialized at
> > kernel boot.
> > The IGT test was run after logging in to the GNOME desktop.
> >
> > amdgpu_dm_initialize_dp_connector->
> > aconnector->dm_dp_aux.aux.transfer = dm_dp_aux_transfer;
> >
> > The test case fails when IGT sets a "DPRX" CRC pipe source while only an
> > HDMI monitor is connected.
> > At that point aux->transfer is NULL, and the DP helper does not check
> > whether the transfer pointer is NULL.
> > It calls ->transfer for the DPCD read, and you get the kernel panic log.
> >
> > amdgpu_dm_crtc_funcs-> amdgpu_dm_crtc_set_crc_source->
> > drm_dp_start_crc
> >
> > * If only a DP cable is connected, the issue does not happen and the test passes.
> > * If I change the CRC source to "auto", the kernel does not panic either.
> > Maybe the failing test case needs to run on DP instead of HDMI; I am not
> > sure yet.
> >
>
> Two things need to happen:
> 1) IGT should skip tests requiring DPRX CRC source if not on a
> DP connector.
> 2) Driver should return EINVAL (or another appropriate error) if
> DPRX CRC source is requested when the CRTC is not connected to
> a DP display. Alternatively we could make sure that DPRX is
> not advertised as a CRC source in this case but I'm not sure
> how difficult that would be.
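A minimal userspace sketch of the driver-side half of this proposal (point 2). The source strings "auto" and "dprx" come from this thread; the enums, verify_crc_source() and list_crc_sources() are hypothetical and only roughly correspond to the `verify_crc_source` and `get_crc_sources` callbacks of `struct drm_crtc_funcs`, so treat the names as illustrative.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: only the two source names discussed in the thread. */
enum conn_type { CONN_HDMI, CONN_DP };

enum crc_source { CRC_SRC_AUTO, CRC_SRC_DPRX, CRC_SRC_INVALID };

enum crc_source parse_crc_source(const char *name)
{
	if (name && strcmp(name, "auto") == 0)
		return CRC_SRC_AUTO;
	if (name && strcmp(name, "dprx") == 0)
		return CRC_SRC_DPRX;
	return CRC_SRC_INVALID;
}

/* Option 1: reject DPRX when the CRTC does not drive a DP sink. */
int verify_crc_source(const char *name, enum conn_type type)
{
	enum crc_source src = parse_crc_source(name);

	if (src == CRC_SRC_INVALID)
		return -EINVAL;
	if (src == CRC_SRC_DPRX && type != CONN_DP)
		return -EINVAL;
	return 0;
}

/* Option 2: do not advertise DPRX on non-DP outputs in the first place.
 * Writes at most max names into out and returns how many were written. */
size_t list_crc_sources(enum conn_type type, const char **out, size_t max)
{
	size_t n = 0;

	if (n < max)
		out[n++] = "auto";
	if (type == CONN_DP && n < max)
		out[n++] = "dprx";
	return n;
}
```

Rejecting the request covers the first option; filtering the list covers the alternative of not advertising DPRX on non-DP outputs. Point 1 still applies on the IGT side.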
>
> Like Jani said, I don't think the current patch is the correct one as it doesn't
> get to the root cause. The root cause fix should be in the CRC debugfs handling code.
>
> Harry
Understood.
I will send two new patches as you suggested.
Thanks for your feedback.
Perry.
>
> >
> > Hope this info helps.
> >
> > Perry.
> >
> >
> >>
> >> BR,
> >> Jani.
> >>
> >>>
> >>> Thank you!
> >>> Perry.
> >>>
> >>>>
> >>>>>
> >>>>> Perry.
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> BUG: kernel NULL pointer dereference, address: 0000000000000000
> >>>>>>> PGD 0 P4D 0
> >>>>>>> Oops: 0010 [#1] SMP NOPTI
> >>>>>>> RIP: 0010:0x0
> >>>>>>> Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
> >>>>>>> RSP: 0018:ffffa8d64225bab8 EFLAGS: 00010246
> >>>>>>> RAX: 0000000000000000 RBX: 0000000000000020 RCX: ffffa8d64225bb5e
> >>>>>>> RDX: ffff93151d921880 RSI: ffffa8d64225bac8 RDI: ffff931511a1a9d8
> >>>>>>> RBP: ffffa8d64225bb10 R08: 0000000000000001 R09: ffffa8d64225ba60
> >>>>>>> R10: 0000000000000002 R11: 000000000000000d R12: 0000000000000001
> >>>>>>> R13: 0000000000000000 R14: ffffa8d64225bb5e R15: ffff931511a1a9d8
> >>>>>>> FS: 00007ff8ea7fa9c0(0000) GS:ffff9317fe6c0000(0000) knlGS:0000000000000000
> >>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>>> CR2: ffffffffffffffd6 CR3: 000000010d5a4000 CR4: 0000000000750ee0
> >>>>>>> PKRU: 55555554
> >>>>>>> Call Trace:
> >>>>>>> drm_dp_dpcd_access+0x72/0x110 [drm_kms_helper]
> >>>>>>> drm_dp_dpcd_read+0xb7/0xf0 [drm_kms_helper]
> >>>>>>> drm_dp_start_crc+0x38/0xb0 [drm_kms_helper]
> >>>>>>> amdgpu_dm_crtc_set_crc_source+0x1ae/0x3e0 [amdgpu]
> >>>>>>> crtc_crc_open+0x174/0x220 [drm]
> >>>>>>> full_proxy_open+0x168/0x1f0
> >>>>>>> ? open_proxy_open+0x100/0x100
> >>>>>>> do_dentry_open+0x156/0x370
> >>>>>>> vfs_open+0x2d/0x30
> >>>>>>>
> >>>>>>> v2: fix some typos
> >>>>>>>
> >>>>>>> Signed-off-by: Perry Yuan <Perry.Yuan@xxxxxxx>
> >>>>>>> ---
> >>>>>>> drivers/gpu/drm/drm_dp_helper.c | 4 ++++
> >>>>>>> 1 file changed, 4 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/drivers/gpu/drm/drm_dp_helper.c b/drivers/gpu/drm/drm_dp_helper.c
> >>>>>>> index 6d0f2c447f3b..76b28396001a 100644
> >>>>>>> --- a/drivers/gpu/drm/drm_dp_helper.c
> >>>>>>> +++ b/drivers/gpu/drm/drm_dp_helper.c
> >>>>>>> @@ -260,6 +260,10 @@ static int drm_dp_dpcd_access(struct drm_dp_aux *aux, u8 request,
> >>>>>>> msg.buffer = buffer;
> >>>>>>> msg.size = size;
> >>>>>>>
> >>>>>>> + /* No transfer function is set, so not an available DP connector */
> >>>>>>> + if (!aux->transfer)
> >>>>>>> + return -EINVAL;
> >>>>>>> +
> >>>>>>> mutex_lock(&aux->hw_mutex);
> >>>>>>>
> >>>>>>> /*
> >>>>>>
> >>>>>> --
> >>>>>> Jani Nikula, Intel Open Source Graphics Center