Re: [PATCH] phy: tegra: xusb: Fix dangling pointer on probe failure
From: Thierry Reding
Date: Wed Oct 28 2020 - 18:26:43 EST
On Tue, Oct 13, 2020 at 10:58:20AM +0100, Marc Zyngier wrote:
> If, for some reason, the xusb PHY fails to probe, it leaves
> a dangling pointer attached to the platform device structure.
>
> This would normally be harmless, but the Tegra XHCI driver then
> goes and extracts that pointer from the PHY device. Things go
> downhill from there:
>
> [ 8.752082] [004d554e5145533c] address between user and kernel address ranges
> [ 8.752085] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [ 8.752088] Modules linked in: max77620_regulator(E+) xhci_tegra(E+) sdhci_tegra(E+) xhci_hcd(E) sdhci_pltfm(E) cqhci(E) fixed(E) usbcore(E) scsi_mod(E) sdhci(E) host1x(E+)
> [ 8.752103] CPU: 4 PID: 158 Comm: systemd-udevd Tainted: G S W E 5.9.0-rc7-00298-gf6337624c4fe #1980
> [ 8.752105] Hardware name: NVIDIA Jetson TX2 Developer Kit (DT)
> [ 8.752108] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--)
> [ 8.752115] pc : kobject_put+0x1c/0x21c
> [ 8.752120] lr : put_device+0x20/0x30
> [ 8.752121] sp : ffffffc012eb3840
> [ 8.752122] x29: ffffffc012eb3840 x28: ffffffc010e82638
> [ 8.752125] x27: ffffffc008d56440 x26: 0000000000000000
> [ 8.752128] x25: ffffff81eb508200 x24: 0000000000000000
> [ 8.752130] x23: ffffff81eb538800 x22: 0000000000000000
> [ 8.752132] x21: 00000000fffffdfb x20: ffffff81eb538810
> [ 8.752134] x19: 3d4d554e51455300 x18: 0000000000000020
> [ 8.752136] x17: ffffffc008d00270 x16: ffffffc008d00c94
> [ 8.752138] x15: 0000000000000004 x14: ffffff81ebd4ae90
> [ 8.752140] x13: 0000000000000000 x12: ffffff81eb86a4e8
> [ 8.752142] x11: ffffff81eb86a480 x10: ffffff81eb862fea
> [ 8.752144] x9 : ffffffc01055fb28 x8 : ffffff81eb86a4a8
> [ 8.752146] x7 : 0000000000000001 x6 : 0000000000000001
> [ 8.752148] x5 : ffffff81dff8bc38 x4 : 0000000000000000
> [ 8.752150] x3 : 0000000000000001 x2 : 0000000000000001
> [ 8.752152] x1 : 0000000000000002 x0 : 3d4d554e51455300
> [ 8.752155] Call trace:
> [ 8.752157] kobject_put+0x1c/0x21c
> [ 8.752160] put_device+0x20/0x30
> [ 8.752164] tegra_xusb_padctl_put+0x24/0x3c
> [ 8.752170] tegra_xusb_probe+0x8b0/0xd10 [xhci_tegra]
> [ 8.752174] platform_drv_probe+0x60/0xb4
> [ 8.752176] really_probe+0xf0/0x504
> [ 8.752179] driver_probe_device+0x100/0x170
> [ 8.752181] device_driver_attach+0xcc/0xd4
> [ 8.752183] __driver_attach+0xb0/0x17c
> [ 8.752185] bus_for_each_dev+0x7c/0xd4
> [ 8.752187] driver_attach+0x30/0x3c
> [ 8.752189] bus_add_driver+0x154/0x250
> [ 8.752191] driver_register+0x84/0x140
> [ 8.752193] __platform_driver_register+0x54/0x60
> [ 8.752197] tegra_xusb_init+0x40/0x1000 [xhci_tegra]
> [ 8.752201] do_one_initcall+0x54/0x2d0
> [ 8.752205] do_init_module+0x68/0x29c
> [ 8.752207] load_module+0x2178/0x26c0
> [ 8.752209] __do_sys_finit_module+0xb0/0x120
> [ 8.752211] __arm64_sys_finit_module+0x2c/0x40
> [ 8.752215] el0_svc_common.constprop.0+0x80/0x240
> [ 8.752218] do_el0_svc+0x30/0xa0
> [ 8.752220] el0_svc+0x18/0x50
> [ 8.752223] el0_sync_handler+0x90/0x318
> [ 8.752225] el0_sync+0x158/0x180
> [ 8.752230] Code: a9bd7bfd 910003fd a90153f3 aa0003f3 (3940f000)
> [ 8.752232] ---[ end trace 90f6c89d62d85ff5 ]---
>
> Resetting the pointer on probe failure fixes the issue.
>
> Fixes: 53d2a715c2403 ("phy: Add Tegra XUSB pad controller support")
> Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> ---
> drivers/phy/tegra/xusb.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/phy/tegra/xusb.c b/drivers/phy/tegra/xusb.c
> index de4a46fe1763..ad88d74c1884 100644
> --- a/drivers/phy/tegra/xusb.c
> +++ b/drivers/phy/tegra/xusb.c
> @@ -1242,6 +1242,7 @@ static int tegra_xusb_padctl_probe(struct platform_device *pdev)
> reset:
> reset_control_assert(padctl->rst);
> remove:
> + platform_set_drvdata(pdev, NULL);
> soc->ops->remove(padctl);
> return err;
> }
Sorry, I missed this earlier. Why is this necessary? The driver core
already calls dev_set_drvdata(dev, NULL) when probe fails, which is
equivalent to the platform_set_drvdata(pdev, NULL) you add here.
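One possible explanation is that we somehow end up in the error cleanup
path with err == 0. Probe would then report success, so the driver core
would not clear the driver data, even though soc->ops->remove() has
already torn down padctl.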
Do you have more information on when this happens, so that I can
reproduce and investigate? Alternatively, if you still have this set up,
could you do a quick test to check whether "err" is indeed a negative
error code when we get here?
Thierry