Re: [PATCH 2/2] PCI: Revoke mappings like devmem

From: Daniel Vetter
Date: Sat Mar 13 2021 - 17:37:43 EST


On Sat, Mar 13, 2021 at 10:57 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> [+cc Krzysztof, Pali, Oliver]
>
> On Thu, Feb 04, 2021 at 05:58:31PM +0100, Daniel Vetter wrote:
> > Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> > the region") /dev/kmem zaps ptes when the kernel requests exclusive
> > acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> > the default for all driver uses.
> >
> > Except there's two more ways to access PCI BARs: sysfs and proc mmap
> > support. Let's plug that hole.
>
> IIUC, the idea is that if a driver calls request_mem_region() on a PCI
> BAR, we prevent access to the BAR via sysfs. I guess I'm OK with that
> if it's a real security improvement or something.

Yup.

> But the downside of this implementation is that it depends on
> iomem_get_mapping(), which doesn't work until after fs_initcalls,
> which means the sysfs files cannot be static attributes of devices
> added before that. PCI devices are typically enumerated in
> subsys_initcall.
>
> Krzysztof is converting PCI sysfs files (config, rom, reset, vpd, etc)
> to static attributes. This is a major improvement that could get rid
> of pci_create_sysfs_dev_files(), the late_initcall pci_sysfs_init(),
> and the "sysfs_initialized" hack. This would fix a race reported by
> Pali [1] (thanks to Oliver for the idea [2]).
>
> EXCEPT that this revoke change means the "resource%d", "legacy_io",
> and "legacy_mem" files cannot be static attributes because of
> iomem_get_mapping().
>
> Any ideas on how to deal with this? Having to keep the
> pci_sysfs_init() initcall just for these few files seems like the tail
> wagging the dog.

It's a bit "pick your ugly". Either we have the late init call (not
pretty), or the sysfs side needs a callback to fish out the
address_space for the mmap at open() time, which didn't stir up much
enthusiams with Greg because we need a new callback just for these
mmio files. Either approach works.
-Daniel

> [1] https://lore.kernel.org/r/20200716110423.xtfyb3n6tn5ixedh@pali
> [2] https://lore.kernel.org/r/CAOSf1CHss03DBSDO4PmTtMp0tCEu5kScn704ZEwLKGXQzBfqaA@xxxxxxxxxxxxxx
>
> > For revoke_devmem() to work we need to link our vma into the same
> > address_space, with consistent vma->vm_pgoff. ->pgoff is already
> > adjusted, because that's how (io_)remap_pfn_range works, but for the
> > mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is
> > to adjust this at at ->open time:
> >
> > - for sysfs this is easy, now that binary attributes support this. We
> > just set bin_attr->mapping when mmap is supported
> > - for procfs it's a bit more tricky, since procfs pci access has only
> > one file per device, and access to a specific resources first needs
> > to be set up with some ioctl calls. But mmap is only supported for
> > the same resources as sysfs exposes with mmap support, and otherwise
> > rejected, so we can set the mapping unconditionally at open time
> > without harm.
> >
> > A special consideration is for arch_can_pci_mmap_io() - we need to
> > make sure that the ->f_mapping doesn't alias between ioport and iomem
> > space. There's only 2 ways in-tree to support mmap of ioports: generic
> > pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single
> > architecture hand-rolling. Both approach support ioport mmap through a
> > special pfn range and not through magic pte attributes. Aliasing is
> > therefore not a problem.
> >
> > The only difference in access checks left is that sysfs PCI mmap does
> > not check for CAP_RAWIO. I'm not really sure whether that should be
> > added or not.
> >
> > Acked-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>
> > Cc: Stephen Rothwell <sfr@xxxxxxxxxxxxxxxx>
> > Cc: Jason Gunthorpe <jgg@xxxxxxxx>
> > Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> > Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > Cc: John Hubbard <jhubbard@xxxxxxxxxx>
> > Cc: Jérôme Glisse <jglisse@xxxxxxxxxx>
> > Cc: Jan Kara <jack@xxxxxxx>
> > Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> > Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > Cc: linux-mm@xxxxxxxxx
> > Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> > Cc: linux-samsung-soc@xxxxxxxxxxxxxxx
> > Cc: linux-media@xxxxxxxxxxxxxxx
> > Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > Cc: linux-pci@xxxxxxxxxxxxxxx
> > ---
> > drivers/pci/pci-sysfs.c | 4 ++++
> > drivers/pci/proc.c | 1 +
> > 2 files changed, 5 insertions(+)
> >
> > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > index 0c45b4f7b214..f8afd54ca3e1 100644
> > --- a/drivers/pci/pci-sysfs.c
> > +++ b/drivers/pci/pci-sysfs.c
> > @@ -942,6 +942,7 @@ void pci_create_legacy_files(struct pci_bus *b)
> > b->legacy_io->read = pci_read_legacy_io;
> > b->legacy_io->write = pci_write_legacy_io;
> > b->legacy_io->mmap = pci_mmap_legacy_io;
> > + b->legacy_io->mapping = iomem_get_mapping();
> > pci_adjust_legacy_attr(b, pci_mmap_io);
> > error = device_create_bin_file(&b->dev, b->legacy_io);
> > if (error)
> > @@ -954,6 +955,7 @@ void pci_create_legacy_files(struct pci_bus *b)
> > b->legacy_mem->size = 1024*1024;
> > b->legacy_mem->attr.mode = 0600;
> > b->legacy_mem->mmap = pci_mmap_legacy_mem;
> > + b->legacy_io->mapping = iomem_get_mapping();
> > pci_adjust_legacy_attr(b, pci_mmap_mem);
> > error = device_create_bin_file(&b->dev, b->legacy_mem);
> > if (error)
> > @@ -1169,6 +1171,8 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
> > res_attr->mmap = pci_mmap_resource_uc;
> > }
> > }
> > + if (res_attr->mmap)
> > + res_attr->mapping = iomem_get_mapping();
> > res_attr->attr.name = res_attr_name;
> > res_attr->attr.mode = 0600;
> > res_attr->size = pci_resource_len(pdev, num);
> > diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
> > index 3a2f90beb4cb..9bab07302bbf 100644
> > --- a/drivers/pci/proc.c
> > +++ b/drivers/pci/proc.c
> > @@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct file *file)
> > fpriv->write_combine = 0;
> >
> > file->private_data = fpriv;
> > + file->f_mapping = iomem_get_mapping();
> >
> > return 0;
> > }
> > --
> > 2.30.0
> >
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch