Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

From: Jason Gunthorpe
Date: Thu Mar 25 2021 - 13:17:42 EST


On Thu, Mar 25, 2021 at 10:02:36AM -0700, Jacob Pan wrote:
> Hi Jean-Philippe,
>
> On Thu, 25 Mar 2021 11:21:40 +0100, Jean-Philippe Brucker
> <jean-philippe@xxxxxxxxxx> wrote:
>
> > On Wed, Mar 24, 2021 at 03:12:30PM -0700, Jacob Pan wrote:
> > > Hi Jason,
> > >
> > > On Wed, 24 Mar 2021 14:03:38 -0300, Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > wrote:
> > > > On Wed, Mar 24, 2021 at 10:02:46AM -0700, Jacob Pan wrote:
> > > > > > Also wondering about device driver allocating auxiliary domains
> > > > > > for their private use, to do iommu_map/unmap on private PASIDs (a
> > > > > > clean replacement to super SVA, for example). Would that go
> > > > > > through the same path as /dev/ioasid and use the cgroup of
> > > > > > current task?
> > > > >
> > > > > For the in-kernel private use, I don't think we should restrict
> > > > > based on cgroup, since there is no affinity to user processes. I
> > > > > also think the PASID allocation should just use kernel API instead
> > > > > of /dev/ioasid. Why would user space need to know the actual PASID
> > > > > # for device private domains? Maybe I missed your idea?
> > > >
> > > > There is not much in the kernel that isn't triggered by a process, I
> > > > would be careful about the idea that there is a class of users that
> > > > can consume a cgroup controlled resource without being inside the
> > > > cgroup.
> > > >
> > > > We've got into trouble before overlooking this and with something
> > > > greenfield like PASID it would be best built in to the API to prevent
> > > > a mistake. eg accepting a cgroup or process input to the allocator.
> > > >
> > > Makes sense. But I think we only allow charging the current cgroup, how
> > > about I add the following to ioasid_alloc():
> > >
> > > 	misc_cg = get_current_misc_cg();
> > > 	ret = misc_cg_try_charge(MISC_CG_RES_IOASID, misc_cg, 1);
> > > 	if (ret) {
> > > 		put_misc_cg(misc_cg);
> > > 		return ret;
> > > 	}
> >
> > Does that allow PASID allocation during driver probe, in kernel_init or
> > modprobe context?
> >
> Good point. Yes, you can get cgroup subsystem state in kernel_init for
> charging/uncharging. I would think module_init should work also since it is
> after kernel_init. I have tried the following:
> static int __ref kernel_init(void *unused)
> {
> 	int ret;
> +	struct cgroup_subsys_state *css;
> +	css = task_get_css(current, pids_cgrp_id);
>
> But that would imply:
> 1. IOASID has to be built-in, not as module
> 2. IOASIDs charged on PID1/init would not be subject to cgroup limit since it
> will be in the root cgroup and we don't support migration nor will migrate.
>
> Then it comes back to the question of why do we try to limit in-kernel
> users per cgroup if we can't enforce these cases.

Are these real use cases? Why would a driver binding to a device
create a single kernel pasid at bind time? Why wouldn't it use
untagged DMA?

When someone needs it they can rework it and explain why they are
doing something sane.

Jason