Re: [RFC 2/4] PCI: generic: Add support for ARM64 and MSI(x)

From: Lorenzo Pieralisi
Date: Tue Oct 07 2014 - 10:48:01 EST


On Tue, Oct 07, 2014 at 02:52:27PM +0100, Arnd Bergmann wrote:
> On Tuesday 07 October 2014 13:06:59 Lorenzo Pieralisi wrote:
> > On Wed, Oct 01, 2014 at 10:38:45AM +0100, Arnd Bergmann wrote:
> >
> > [...]
> >
> > > pci_mmap_page_range could either get generalized some more in an attempt
> > > to have a __weak default implementation that works on ARM, or it could
> > > be changed to lose the dependency on pci_sys_data instead. In either
> > > case, the change would involve using the generic pci_host_bridge_window
> > > list.
> >
> > On ARM pci_mmap_page_range requires pci_sys_data to retrieve its
> > mem_offset parameter. I had a look, and I do not understand *why*
> > it is required in that function, so I am asking. That function
> > is basically used to map PCI resources to userspace, IIUC, through
> > /proc or /sysfs file mappings. As far as I understand those mappings
> > expect VMA pgoff to be the CPU address when files representing resources
> > are mmapped from /proc and 0 when mmapped from /sys (I mean from
> > userspace, then VMA pgoff should be updated by the kernel to map the
> > resource).
>
> Applying the mem_offset is certainly the more intuitive way, since
> that lets you read the PCI BAR values from a device and access the
> device with the appropriate offsets.

Ok, but I am referring to this snippet (drivers/pci/pci-sysfs.c):

/* pci_mmap_page_range() expects the same kind of entry as coming
 * from /proc/bus/pci/ which is a "user visible" value. If this is
 * different from the resource itself, arch will do necessary fixup.
 */
pci_resource_to_user(pdev, i, res, &start, &end);

--> Here start represents a CPU physical address if pci_resource_to_user()
does not fix it up, correct?

vma->vm_pgoff += start >> PAGE_SHIFT;

[...]

return pci_mmap_page_range(...);

pci_mmap_page_range() adds (mem_offset >> PAGE_SHIFT) to pgoff in the
ARM implementation.

Isn't there a mismatch here on platforms where mem_offset != 0?
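
To make the arithmetic concrete, here is a minimal userspace sketch of the
sysfs path as I read it. This is not kernel code: the model_* helpers are
hypothetical names, PAGE_SHIFT = 12 is an assumption, and the sketch assumes
pci_resource_to_user() does no fixup, so start is the CPU physical address.

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Hypothetical model of a pci_resource_to_user() that does no fixup:
 * the "user visible" start is simply the CPU physical address.
 */
static uint64_t model_resource_to_user(uint64_t cpu_start)
{
	return cpu_start;
}

/*
 * Hypothetical model of the pgoff adjustment described above for the
 * ARM pci_mmap_page_range(): mem_offset is added, in pages.
 */
static uint64_t model_arm_mmap_pfn(uint64_t vm_pgoff, uint64_t mem_offset)
{
	return vm_pgoff + (mem_offset >> PAGE_SHIFT);
}

/*
 * Model of the sysfs path: userspace passes user_pgoff (0 for the start
 * of the resource), the sysfs code adds start >> PAGE_SHIFT, and the
 * result is then handed to the ARM-style adjustment.
 */
static uint64_t model_sysfs_mmap_pfn(uint64_t cpu_start, uint64_t user_pgoff,
				     uint64_t mem_offset)
{
	uint64_t start = model_resource_to_user(cpu_start);
	uint64_t vm_pgoff = user_pgoff + (start >> PAGE_SHIFT);

	return model_arm_mmap_pfn(vm_pgoff, mem_offset);
}
```

With made-up values, a resource at CPU address 0x80000000 and mem_offset = 0
gives pfn 0x80000, which is the resource itself. With mem_offset = 0x40000000
the same call gives pfn 0xC0000, i.e. the offset is added on top of an
address that is already a CPU address.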

> > Question is: why should pci_mmap_page_range() apply an additional
> > shift to the VMA pgoff based on pci_sys_data.mem_offset, which represents
> > the CPU-to-bus address offset? I do not understand that. PowerPC
> > does not seem to apply that fix-up (in PowerPC __pci_mmap_make_offset there
> > is commented out code which prevents the pci_mem_offset shift from being
> > applied). I think it all boils down to what the userspace interface is
> > expecting when the memory areas are mmapped; if anyone has comments on
> > this, they are appreciated.
>
> The important part is certainly that whatever transformation is done
> by pci_resource_to_user() gets undone by __pci_mmap_make_offset().

Exactly. That does not seem to be the case above, which is why I asked.

> In case of PowerPC and Microblaze, the mem_offset handling is commented
> out in both, to work around X11 trying to use the same values on
> /dev/mem. However, they do have the respective fixup for io_offset.
>
> sparc applies the offset in both places for both io_offset and mem_offset.
> xtensa applies only io_offset in __pci_mmap_make_offset but neither
> in pci_resource_to_user. This probably works because the mem_offset is
> always zero there.
> mips applies a different fixup (for 36-bit addressing), but not the
> mem_offset.
>
> Every other architecture applies no offset here, neither in
> __pci_mmap_make_offset/pci_mmap_page_range nor in pci_resource_to_user.
>
> The only hint I could find for how the ARM version came to be is
> from the historic kernel tree git log for linux-2.5.42, which added
> the current code as
>
> 2002/10/13 11:05:47+01:00 rmk
> [ARM] Update pcibios_enable_device, supply pci_mmap_page_range()
> Update pcibios_enable_device to only enable requested resources,
> mainly for IDE. Supply a pci_mmap_page_range() function to allow
> user space to mmap PCI regions.
>
> At that point, only two platforms had a nonzero mem_offset:
> footbridge/dc21285 and integrator/pci_v3. Both were using VGA,
> and presumably used this to make X work. (rmk might remember
> details).

I think that, as I mentioned, it boils down to what value each userspace
interface (/proc and /sys, and they seem to differ) expects userspace
processes to pass upon mmap.

> The code at the time matched what powerpc and sparc did, but then
> both implemented pci_resource_to_user() in order for libpciaccess
> to work correctly (bcea1db16b for sparc, 463ce0e103f for powerpc),
> and later powerpc changed it again to not apply the offset in
> pci_resource_to_user or pci_mmap_page_range in 396a1a5832ae.

I will keep investigating. Thank you for your help; any further comments
are appreciated.

Lorenzo
