Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
From: mika.westerberg@xxxxxxxxxxxxxxx
Date: Tue Nov 13 2018 - 10:07:56 EST
On Tue, Nov 13, 2018 at 01:28:04PM +0000, Shameerali Kolothum Thodi wrote:
>
> > -----Original Message-----
> > From: mika.westerberg@xxxxxxxxxxxxxxx
> > [mailto:mika.westerberg@xxxxxxxxxxxxxxx]
> > Sent: 13 November 2018 12:59
> > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>
> > Cc: linux-pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Wangzhou (B)
> > <wangzhou1@xxxxxxxxxxxxx>; Linuxarm <linuxarm@xxxxxxxxxx>; Lukas
> > Wunner <lukas@xxxxxxxxx>
> > Subject: Re: Qemu Guest kernel 4.20-rc1 PCIe hotplug issue
> >
> > On Tue, Nov 13, 2018 at 12:36:20PM +0000, Shameerali Kolothum Thodi wrote:
> > > > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller
> > > > *ctrl,
> > > > u16 cmd,
> > > > slot_ctrl |= (cmd & mask);
> > > > ctrl->cmd_busy = 1;
> > > > smp_mb();
> > > > + ctrl->slot_ctrl = slot_ctrl;
> > >
> > > Actually I tried this one, but it doesn't help in this case as the
> > > initial
> > > pcie_capability_read_word() returns the slot_ctrl without
> > > PCI_EXP_SLTCTL_HPIE bit set. It looks to me
> > > pcie_enable_notification() function enables this,
> > >
> > > if (!pciehp_poll_mode)
> > > cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;
> > >
> > > I don't know this is as per the spec or not as the initial cap read
> > > doesn't seems to have the PCI_EXP_SLTCTL_HPIE bit set.
> >
> > If I read the code right cmd value should end up in ctrl->slot_ctrl properly from
> > pcie_enable_notification().
>
> Right. As I mentioned in my previous mail, I missed the fact that you are updating
> the ctrl->slot_ctrl with cmd value while in my test I did my update with the value
> returned by pcie_capability_read_word().
OK, I see.
> > However, I think we are missing check for PCI_EXP_SLTCTL_CCIE in
> > pciehp_isr().
>
> Ok.
>
> > Here's an updated patch, can you try and see if it makes any difference?
>
> I just tried this and it works. Thanks.
Can you still check that the previous one (without _CCIE check) works?
> See few comments below.
>
> > diff --git a/drivers/pci/hotplug/pciehp_hpc.c
> > b/drivers/pci/hotplug/pciehp_hpc.c
> > index 7dd443aea5a5..da2cbe892444 100644
> > --- a/drivers/pci/hotplug/pciehp_hpc.c
> > +++ b/drivers/pci/hotplug/pciehp_hpc.c
> > @@ -156,9 +156,9 @@ static void pcie_do_write_cmd(struct controller *ctrl,
> > u16 cmd,
> > slot_ctrl |= (cmd & mask);
> > ctrl->cmd_busy = 1;
> > smp_mb();
> > + ctrl->slot_ctrl = slot_ctrl;
>
> Does it make more sense if we can move this before smp_mb()?. Also I am not
> sure updating the ctrl->slot_ctrl before actually the hardware is programmed
> with that value will result in any other race conditions? TBH, I am not that familiar
> with this code and I leave that to you :)
Both are good questions :)
For the moving ctrl->slot_ctrl before pcie_capability_write_word(), I
think we should be fine and this is actually more correct because if we
are unmasking interrupts they may trigger immediately making
pciehp_isr() find wrong values in ctrl->slot_ctrl (as can be seen in the
issue you reported).
The smb_mb() thing is not that clear (at least to me) because it is used
in two places in the driver and both seem to be making write to
ctrl->cmd_busy visible to other CPUs but I don't see where we deal with
the read part.
I may be missing something, though.