Re: [PATCH 15/25] x86, pkeys: check VMAs and PTEs for protection keys

From: Jerome Glisse
Date: Thu Oct 22 2015 - 18:26:06 EST


On Thu, Oct 22, 2015 at 02:23:08PM -0700, Dave Hansen wrote:
> On 10/22/2015 01:57 PM, Jerome Glisse wrote:
> > I have not read all the patches, but here I assume that for GUP you do
> > not first call arch_vma_access_permitted(). So the issue I see is that GUP
> > for a process might happen inside another process, and that process might
> > have different PKRU protection keys, effectively randomly allowing or
> > forbidding a device driver to perform a GUP from, say, some workqueue that
> > just happens to be scheduled against a different processor/thread than the
> > one on whose behalf it is doing the GUP.
>
> There are some places where there is no real context from which we can
> determine access rights. ptrace is a good example. We don't enforce
> PKEYs when walking _another_ process's page tables.
>
> Can you give an example of where a process might be doing a gup and it
> is completely separate from the CPU context that it's being executed under?

drivers/iommu/amd_iommu_v2.c does this, though that is on AMD platforms. I
also believe that in InfiniBand one can have GUP calls from a workqueue that
can run at any time. GPU drivers also use GUP, though at this point we do
not allow another process to access a buffer that was populated by GUP from
a different process.

I am also mainly talking here about what future GPUs will do, where the CPU
will service page faults from the GPU inside a workqueue that can run at any
point in time.

>
> > The second and more fundamental issue I have is that these PKRU keys
> > are centric to the CPU's point of view, i.e. this is a CPU feature. So I
> > do not believe that a device driver should be forbidden to do GUP based
> > on PKRU keys.
>
> I don't think of it as something necessarily central to the CPU, but
> something central to things that walk page tables. We mark page tables
> with PKEYs and things that walk them will have certain rights.

My point is that we are seeing devices that want to walk the page table, and
they do it from a workqueue inside the kernel, which can run in the context
of a different process than the one whose page table they are walking.
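
To make the concern concrete, here is a minimal userspace model, not kernel
code. It assumes the GUP check consults the PKRU value of whichever task
happens to be running the worker. The names model_task, pkru_allows() and
model_gup_checks_running_task() are hypothetical and exist only for this
illustration. The PKRU bit layout (two bits per key, access-disable then
write-disable) is the architectural one.

```c
#include <stdbool.h>
#include <stdint.h>

/* PKRU holds two bits per key: bit 2*k is access-disable (AD),
 * bit 2*k+1 is write-disable (WD). */
#define PKRU_AD(key) (UINT32_C(1) << (2 * (key)))
#define PKRU_WD(key) (UINT32_C(1) << (2 * (key) + 1))

/* Hypothetical stand-in for a task: only its PKRU value matters here. */
struct model_task {
	uint32_t pkru;
};

/* Would an access to a page tagged with pkey pass under this PKRU value? */
static bool pkru_allows(uint32_t pkru, unsigned int pkey, bool write)
{
	if (pkru & PKRU_AD(pkey))
		return false;
	if (write && (pkru & PKRU_WD(pkey)))
		return false;
	return true;
}

/* Model of a GUP whose outcome depends on the task running the worker,
 * not on the process whose page table is being walked. */
static bool model_gup_checks_running_task(const struct model_task *running,
					  unsigned int pkey, bool write)
{
	return pkru_allows(running->pkru, pkey, write);
}
```

In this model the same walk of the same page table succeeds when the worker
runs under a task with PKRU 0 and fails when it runs under a task that has
access-disable set for that key, which is the arbitrary behaviour I am
worried about.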

I am sure there are already upstream device drivers that do so, though I
have not checked all of them to confirm.


> > Tying this to the PKRU register value of whatever processor happens to be
> > running some device driver kernel function that tries to do a GUP seems
> > broken to me.
>
> That's one way to look at it. Another way is that PKRU is specifying
> some real _intent_ about whether we want access to be allowed to some
> memory.

I think I expressed myself poorly here. Yes, PKRU is about specifying
intent, but specifying it for a CPU thread, not for a device thread. GPUs,
for instance, have threads that run on behalf of a given process, and I
would rather see some kind of coherent way to specify that for each device,
like you allow it to be specified on a per-CPU-thread basis.


> > So as a first step I would just allow GUP to always work and then come up
> > with a syscall to allow setting a pkey on a device file. This obviously is
> > a lot more work, as you need to go over all device drivers using GUP.
>
> I wouldn't be opposed to adding some context to the thread (like
> pagefault_disable()) that indicates whether we should enforce protection
> keys. If we are in some asynchronous context, disassociated from the
> running CPU's protection keys, we could set a flag.

I was simply thinking of having a global set of pkeys attached to the
process mm struct, which would be the default global setting for all device
GUP access. This global set could be overridden by userspace on a per-device
basis, allowing some devices to have more access than others.
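
A rough userspace sketch of that idea, not a patch. It assumes the per-mm
default and the per-device override are both stored as PKRU-style masks.
Every name here (model_mm, model_dev, dev_pkru_default, dev_effective_pkru(),
dev_gup_allowed()) is hypothetical and nothing like it exists in the tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same two-bits-per-key layout as PKRU: AD at bit 2*k, WD at bit 2*k+1. */
#define PKRU_AD(key) (UINT32_C(1) << (2 * (key)))
#define PKRU_WD(key) (UINT32_C(1) << (2 * (key) + 1))

/* Hypothetical: default rights for all device GUP into this address space. */
struct model_mm {
	uint32_t dev_pkru_default;
};

/* Hypothetical: optional per-device override set by userspace. */
struct model_dev {
	bool has_override;
	uint32_t pkru_override;
};

/* The override wins when present, otherwise the mm-wide default applies. */
static uint32_t dev_effective_pkru(const struct model_mm *mm,
				   const struct model_dev *dev)
{
	return dev->has_override ? dev->pkru_override : mm->dev_pkru_default;
}

/* The decision depends only on the mm and the device, never on which CPU
 * thread happens to run the worker doing the GUP. */
static bool dev_gup_allowed(const struct model_mm *mm,
			    const struct model_dev *dev,
			    unsigned int pkey, bool write)
{
	uint32_t pkru = dev_effective_pkru(mm, dev);

	if (pkru & PKRU_AD(pkey))
		return false;
	if (write && (pkru & PKRU_WD(pkey)))
		return false;
	return true;
}
```

With this, a device without an override gets the mm default, and a device
that userspace trusts more can be given a less restrictive mask, whatever
context the workqueue runs in.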


> I'd really appreciate if you could point to some concrete examples here
> which could actually cause a problem, like workqueues doing gups.

Well, I could grep for all current users of GUP, but I can tell you that
this is going to be the model for GPU threads, i.e. a kernel workqueue is
going to handle page faults on behalf of the GPU and will perform the
equivalent of GUP. The same applies to InfiniBand ODP, which is upstream.

Cheers,
Jérôme