Re: Kirkwood PCI Express and bridges

From: Chris Packham
Date: Mon Jun 24 2019 - 22:05:14 EST


On 24/06/19 4:08 PM, Chris Packham wrote:
> Hi Thomas,
>
> On 21/06/19 6:17 PM, Thomas Petazzoni wrote:
>> Hello Chris,
>>
>> On Fri, 21 Jun 2019 04:03:27 +0000
>> Chris Packham <Chris.Packham@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>>> I'm in the process of updating the kernel version used on our products
>>> from 4.4 -> 5.1.
>>>
>>> We have one product that uses a Kirkwood CPU, IDT PCI bridge and Marvell
>>> Switch ASIC. The Switch ASIC presents as multiple PCI devices.
>>>
>>> The hardware setup looks like this
>>>                                       __________
>>> [ Kirkwood ] --- [ IDT 5T5 ] ---+--- |          |
>>>                                 +--- |  Switch  |
>>>                                 +--- |          |
>>>                                 +--- |__________|
>>>
>>> On the 4.4 based kernel things are fine
>>>
>>> [root@awplus flash]# lspci -t
>>> -[0000:00]---01.0-[01-06]----00.0-[02-06]--+-02.0-[03]----00.0
>>>                                            +-03.0-[04]----00.0
>>>                                            +-04.0-[05]----00.0
>>>                                            \-05.0-[06]----00.0
>>>
>>> But on the 5.1 based kernel things get a little weird
>>>
>>> [root@awplus flash]# lspci -t
>>> -[0000:00]---01.0-[01-06]--+-00.0-[02-06]--
>>>                            +-01.0
>>>                            +-02.0-[02-06]--
>>>                            +-03.0-[02-06]--
>>>                            +-04.0-[02-06]--
>>>                            +-05.0-[02-06]--
>>>                            +-06.0-[02-06]--
>>>                            +-07.0-[02-06]--
>>>                            +-08.0-[02-06]--
>>>                            +-09.0-[02-06]--
>>>                            +-0a.0-[02-06]--
>>>                            +-0b.0-[02-06]--
>>>                            +-0c.0-[02-06]--
>>>                            +-0d.0-[02-06]--
>>>                            +-0e.0-[02-06]--
>>>                            +-0f.0-[02-06]--
>>>                            +-10.0-[02-06]--
>>>                            +-11.0-[02-06]--
>>>                            +-12.0-[02-06]--
>>>                            +-13.0-[02-06]--
>>>                            +-14.0-[02-06]--
>>>                            +-15.0-[02-06]--
>>>                            +-16.0-[02-06]--
>>>                            +-17.0-[02-06]--
>>>                            +-18.0-[02-06]--
>>>                            +-19.0-[02-06]--
>>>                            +-1a.0-[02-06]--
>>>                            +-1b.0-[02-06]--
>>>                            +-1c.0-[02-06]--
>>>                            +-1d.0-[02-06]--
>>>                            +-1e.0-[02-06]--
>>>                            \-1f.0-[02-06]--+-02.0-[03]----00.0
>>>                                            +-03.0-[04]----00.0
>>>                                            +-04.0-[05]----00.0
>>>                                            \-05.0-[06]----00.0
>>>
>>>
>>> I'll start bisecting to see where things started going wrong. I just
>>> wondered if this rings any bells for anyone.
>>
>> I am almost sure that the culprit is
>> 1f08673eef1236f7d02d93fcf596bb8531ef0d12 ("PCI: mvebu: Convert to PCI
>> emulated bridge config space").
>
> The problem seems to pre-date this commit. I've gone back as far as 4.18
> and the problem still exists (in fact there are more duplicate devices).
> I'll keep going back (unfortunately due to our platform being out of
> tree it's not a simple bisect).
>
>> I still think it makes sense to share the bridge emulation code between
>> the mvebu and aardvark drivers, but this sharing has required making
>> the code very different, with lots of subtle differences in behavior in
>> how registers are emulated.
>
> Agreed. Bugs love to hide in duplicated code.
>
> I will admit to being ignorant about the need for an emulated bridge. I
> know it has something to do with the type of transaction used for the
> downstream devices. I also know that these systems won't work without an
> emulated bridge.
>
>> Unfortunately, I don't have access to one of these complicated PCI
>> setups with a HW switch on the way, so I couldn't test this kind of
>> setup.
>>
>> Do you mind helping with figuring out what the issues are? That would
>> be really nice.
>
> No problem. As I said I'll keep going to find a point where behaviour
> turns bad for me. I suspect we might find other problems along the way.
>

Some progress. Our defconfig had CONFIG_CMDLINE="pci=pcie_scan_all" in
it. This dates back to before we were using a devicetree with our
Kirkwood platforms. At some point this option started having an effect
on the emulated bridge.
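
For anyone wanting to check whether their own system is booting with the
same option, here is a minimal sketch in Python. It assumes the built-in
CONFIG_CMDLINE string ends up in /proc/cmdline on the running kernel and
that pci= options are comma-separated. The helper names pci_options() and
has_pcie_scan_all() are made up for illustration and are not part of any
existing tool.

```python
from typing import List


def pci_options(cmdline: str) -> List[str]:
    """Collect every option passed via pci= on a kernel command line.

    Assumption: options after "pci=" are comma-separated, e.g.
    "pci=nomsi,pcie_scan_all".
    """
    opts: List[str] = []
    for token in cmdline.split():
        if token.startswith("pci="):
            # Drop the "pci=" prefix and skip empty entries ("pci=a,,b").
            opts.extend(o for o in token[len("pci="):].split(",") if o)
    return opts


def has_pcie_scan_all(cmdline: str) -> bool:
    """True if pcie_scan_all appears among the pci= options."""
    return "pcie_scan_all" in pci_options(cmdline)


if __name__ == "__main__":
    # Assumption: /proc/cmdline reflects the command line actually in use.
    with open("/proc/cmdline") as f:
        print("pcie_scan_all set:", has_pcie_scan_all(f.read()))
```

If this reports the option as set and lspci -t shows the same bridge
repeated across slots 00 to 1f, removing pcie_scan_all from the command
line is probably worth trying first.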