Re: [RFC v3 5/8] dmabuf: Add gpu cgroup charge transfer function

From: T.J. Mercier
Date: Wed Mar 23 2022 - 19:38:08 EST


On Tue, Mar 22, 2022 at 9:47 AM T.J. Mercier <tjmercier@xxxxxxxxxx> wrote:
>
> On Tue, Mar 22, 2022 at 2:52 AM Michal Koutný <mkoutny@xxxxxxxx> wrote:
> >
> > On Mon, Mar 21, 2022 at 04:54:26PM -0700, "T.J. Mercier"
> > <tjmercier@xxxxxxxxxx> wrote:
> > > Since the charge is duplicated in two cgroups for a short period
> > > before it is uncharged from the source cgroup I guess the situation
> > > you're thinking about is a global (or common ancestor) limit?
> >
> > The common ancestor was on my mind (after the self-shortcut).
> >
> > > I can see how that would be a problem for transfers done this way and
> > > an alternative would be to swap the order of the charge operations:
> > > first uncharge, then try_charge. To be certain the uncharge is
> > > reversible if the try_charge fails, I think I'd need either a mutex
> > > used at all gpucg_*charge call sites or access to the gpucg_mutex,
> >
> > Yes, that'd provide safe conditions for such operations, although I'm
> > not sure these special types of memory can afford global lock on their
> > fast paths.
>
> I have a benchmark I think is suitable, so let me try this change to
> the transfer implementation and see how it compares.

I added a mutex to struct gpucg which is locked when the cgroup is
first charged during allocation, and also (for the source cgroup
only) during dma_buf_charge_transfer. Then I used a multithreaded
benchmark where each thread allocates 4, 8, 16, or 32 DMA buffers and
then sends them through Binder to another process with charge
transfer enabled. This was intended to generate contention on the
mutex in dma_buf_charge_transfer. The results show that the
difference between a mutex-protected charge transfer and an
unprotected one is within measurement noise; the worst data point
shows about 3% overhead for the mutex.
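
For reference, the locking in the prototype I benchmarked looks
roughly like this (minimal sketch; aside from the css, the layout and
comments here are placeholders rather than the exact struct from this
series):

struct gpucg {
        struct cgroup_subsys_state css;

        /* ...per-device charge bookkeeping from the series lives here... */

        /*
         * Serializes charges against this cgroup so that the uncharge
         * done as the first half of a transfer can be re-applied if
         * charging the destination fails.
         */
        struct mutex lock;
};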

So I'll prep this change for the next revision. Thanks for pointing it out.
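
Roughly what I have in mind for the transfer itself (untested sketch:
I'm reusing the gpucg_try_charge()/gpucg_uncharge() names from the
series but guessing at the exact signatures, the "force" re-charge
helper is a placeholder, and interaction with ancestor limits on the
revert path still needs a closer look):

int dma_buf_charge_transfer(struct gpucg *src, struct gpucg *dst,
                            struct gpucg_device *device, u64 size)
{
        int ret = 0;

        if (src == dst)
                return 0;

        /* Only the source cgroup's mutex is taken, as benchmarked above. */
        mutex_lock(&src->lock);

        /*
         * Uncharge the source first so a common ancestor's limit never
         * sees the usage counted twice, then try to charge the
         * destination.
         */
        gpucg_uncharge(src, device, size);

        ret = gpucg_try_charge(dst, device, size);
        if (ret) {
                /*
                 * Put the charge back. Holding src->lock means nothing
                 * else charged into the headroom we just freed, so this
                 * cannot fail against src's own limit (placeholder
                 * helper).
                 */
                gpucg_charge_force(src, device, size);
        }

        mutex_unlock(&src->lock);
        return ret;
}
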
>
> >
> > > which implies adding transfer support to gpu.c as part of the gpucg_*
> > > API itself and calling it here. Am I following correctly here?
> >
> > My idea was to provide a special API (apart from
> > gpucg_{try_charge,uncharge}) to facilitate transfers...
> >
> > > This series doesn't actually add limit support just accounting, but
> > > I'd like to get it right here.
> >
> > ...which could be implemented (or changed) depending on how the charging
> > is realized internally.
> >
> >
> > Michal