Re: New subsystem for acceleration devices

From: Jason Gunthorpe
Date: Tue Aug 09 2022 - 10:22:32 EST


On Tue, Aug 09, 2022 at 02:46:36PM +0200, Arnd Bergmann wrote:
> On Tue, Aug 9, 2022 at 2:18 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> > On Tue, Aug 09, 2022 at 10:32:27AM +0200, Arnd Bergmann wrote:
> > > On Tue, Aug 9, 2022 at 10:04 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> > >
> > > I think for devices with hardware MMU contexts you actually want to
> > > bind the context to a 'mm_struct', and then ensure that the context
> > > is only ever used from a process that shares this mm_struct,
> > > regardless of who else has access to the same file or inode.
> >
> > I can't think of a security justification for this.
> >
> > If process A stuffs part of its address space into the device and
> > passes the FD to process B which can then access that address space,
> > how is it any different from process A making a tmpfs, mmaping it, and
> > passing it to process B which can then access that address space?
> >
> > IMHO the 'struct file' is the security domain and a process must be
> > careful to only allow FDs to be created that meet its security needs.
> >
> > The kernel should not be involved in security here any further than
> > using the standard FD security mechanisms.
> >
> > Who is to say there isn't a meaningful dual process use case for the
> > accelerator? We see dual-process designs regularly in networking
> > accelerators.
>
> I think there is a lot of value in keeping things simple here, to
> make sure users and developers don't make a mess of it.

I don't think the kernel should enforce policy on userspace. As long
as the kernel side is simple and reasonable then it should let
userspace make whatever mess it likes.

We have a lot of experience here now, and userspaces do take advantage
of this flexibility in the better-established accelerator subsystems,
like DRM and RDMA.
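
To make the tmpfs analogy above concrete, here is a minimal userspace
sketch - plain POSIX plus memfd_create(2), with "sock" assumed to be a
connected unix stream socket, and error handling omitted for brevity:

#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

static void send_memory(int sock, size_t len)
{
	int fd = memfd_create("shared", 0);
	ftruncate(fd, len);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	memset(p, 0xAA, len);	/* A's data, visible to B once B mmaps */

	/* SCM_RIGHTS: the fd itself crosses the process boundary */
	char dummy = 'x';
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union { struct cmsghdr h; char buf[CMSG_SPACE(sizeof(int))]; } u = { 0 };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
	c->cmsg_level = SOL_SOCKET;
	c->cmsg_type = SCM_RIGHTS;
	c->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(c), &fd, sizeof(int));
	sendmsg(sock, &msg, 0);
}

Once B receives the fd and mmaps it, the two processes share that
memory with no kernel policy beyond ordinary FD passing - which is
exactly the position an accelerator FD is in.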

> If the accelerator has access to the memory of one process but I run
> it from another process, I lose the benefits of the shared page
> tables,

There are several accelerator "ASID" models I've seen - devices can
have one or many ASIDs, and an ASID can be map/unmap style or forced
1:1 with a mm_struct (usually called SVA/SVM).

Userspace is responsible for figuring out how to use this stuff. With
map/unmap there should be no kernel restriction on what mappings can
be created, though sane userspaces will often want to stick to 1:1
map/unmap with a single process.
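
As a sketch of how the two styles differ at the uAPI level - every
name and number below is invented for illustration, not taken from any
existing driver's ABI:

/* Hypothetical uAPI sketch, purely illustrative */
#include <linux/ioctl.h>
#include <linux/types.h>

struct accel_asid_map {
	__u32 asid;       /* which hardware context to populate */
	__u32 pad;
	__u64 user_addr;  /* process virtual address to map */
	__u64 iova;       /* device address the hardware will use */
	__u64 length;
};

/* map/unmap style: userspace builds the device address space piecewise */
#define ACCEL_ASID_MAP     _IOW('A', 0x00, struct accel_asid_map)
#define ACCEL_ASID_UNMAP   _IOW('A', 0x01, struct accel_asid_map)

/* SVA/SVM style: the ASID is tied 1:1 to the caller's mm_struct, so
 * the device walks exactly the CPU page tables of the binding process */
#define ACCEL_ASID_BIND_MM _IOW('A', 0x02, __u32)

With the map/unmap style the kernel only validates each mapping as it
is created; with the bind-mm style the hardware context tracks the
caller's page tables and nothing else.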

> E.g. attaching a debugger to single-step through the accelerator code
> would then involve at least four address spaces:
>
> - the addresses of the debugger
> - the addresses local to the accelerator
> - addresses in the shared address space of the process that owns
> the memory
> - addresses in the process that owns the accelerator context
>
> which is at least one more than I'd like to deal with.

It is an FD. There is no "owns the accelerator context" - that concept
is an abuse of the FD model, IMHO.

If you are debugging, you have the mm_struct of each of the threads
you are debugging and each of the ASIDs loaded in the accelerator to
deal with - it is inherent in the hardware design.

> This is somewhat different for accelerators that have coherent
> access to a process address space and only access explicitly
> shared buffers. On these you could more easily allow sharing the
> file descriptor between any number of processes.

That is just a multi-ASID accelerator and userspace has linked a
unique SVA ASID to each unique process using the FD.

The thing to understand is that the FD represents the security
context, so it is very reasonable that on a multi-ASID device I could
share the same security context with two co-operating processes,
create two ASIDs, and do accelerator operations that work jointly
across both memory spaces. For instance, I might consume a source from
process B, process it, and deliver the result into process A, where
process A will then send it over the network or something. We have
these kinds of use models already.
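
In terms of the invented ioctls sketched earlier, that flow might look
like this - again purely illustrative, not a real ABI:

#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* repeated from the sketch above, so this compiles standalone */
#define ACCEL_ASID_BIND_MM _IOW('A', 0x02, __u32)

struct accel_copy {
	__u32 src_asid, dst_asid;
	__u64 src_iova, dst_iova, length;
};
#define ACCEL_DO_COPY _IOW('A', 0x10, struct accel_copy)  /* invented op */

/* Process B, after receiving the fd over SCM_RIGHTS:
 * expose B's address space to the device as ASID 1. */
void b_setup_source(int fd)
{
	__u32 asid = 1;
	ioctl(fd, ACCEL_ASID_BIND_MM, &asid);
}

/* Process A: bind ASID 0 to A's mm, then queue one op that reads the
 * source through ASID 1 and delivers the result through ASID 0. */
void a_run_pipeline(int fd, __u64 src, __u64 dst, __u64 len)
{
	__u32 asid = 0;
	ioctl(fd, ACCEL_ASID_BIND_MM, &asid);

	struct accel_copy op = {
		.src_asid = 1, .src_iova = src,
		.dst_asid = 0, .dst_iova = dst,
		.length   = len,
	};
	ioctl(fd, ACCEL_DO_COPY, &op);
}

Both processes hold the same struct file, so the security question was
settled the moment A decided to pass the FD to B.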

Jason