Re: regression 4.4: deadlock in with cgroup percpu_rwsem
From: Christoph Hellwig
Date: Tue Jan 26 2016 - 09:52:08 EST
On Mon, Jan 25, 2016 at 02:38:36PM -0500, Tejun Heo wrote:
> On Mon, Jan 25, 2016 at 09:49:42AM +0100, Christoph Hellwig wrote:
> > FYI, my use case was also related to percpu-ref. The percpu ref API
> > is unfortunately really hard to use and will almost always involve
> > a work queue due to the complex interaction between percpu_ref_kill
> > and percpu_ref_exit. One thing that would help a lot of callers would
>
> That's interesting. Can you please elaborate on how kill and exit
> interact to make things complex?

The problem is the ordering. We first need to call percpu_ref_kill to
tear down the reference. We then get the release callback in the calling
context of the last percpu_ref_put, but percpu_ref_exit has to be called
from process context again. This means that if any percpu_ref_put can
come from non-process context we will always need a work_struct or
similar to schedule the final percpu_ref_exit. Except when..

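To make the sequence above concrete, here is a minimal single-threaded
userspace model of it. This is only a sketch and not kernel code: the
model_* names, the in_process_context flag and the exit_work_pending
field are hypothetical stand-ins for percpu_ref, the execution context
and a queued work_struct, and the process-context requirement on exit is
modelled as described in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct percpu_ref; not the kernel type. */
struct model_ref {
	long count;			/* starts out holding the initial reference */
	bool exited;
	void (*release)(struct model_ref *ref);
};

/* Hypothetical object embedding the ref, e.g. a 'queue'. */
struct model_queue {
	struct model_ref ref;
	bool exit_work_pending;		/* models a queued work_struct */
};

/* Simulated context: true = process context, false = e.g. irq context. */
static bool in_process_context = true;

static void model_ref_put(struct model_ref *ref)
{
	if (--ref->count == 0)
		ref->release(ref);	/* runs in the context of the last put */
}

/* Models kill: drop the initial reference so the count can reach zero. */
static void model_ref_kill(struct model_ref *ref)
{
	model_ref_put(ref);
}

/* Models exit: only legal from process context in this model. */
static void model_ref_exit(struct model_ref *ref)
{
	assert(in_process_context);
	ref->exited = true;
}

/* Release callback: may run in irq context, so it only defers the exit. */
static void queue_release(struct model_ref *ref)
{
	struct model_queue *q = (struct model_queue *)
		((char *)ref - offsetof(struct model_queue, ref));

	q->exit_work_pending = true;	/* models scheduling the work item */
}

/* Models the worker running the deferred item in process context. */
static void queue_run_pending_work(struct model_queue *q)
{
	if (q->exit_work_pending) {
		q->exit_work_pending = false;
		model_ref_exit(&q->ref);
	}
}

/* Returns 0 if the exit ran only after being deferred to process context. */
int demo_deferred_exit(void)
{
	/* Initial reference plus one in-flight I/O. */
	struct model_queue q = {
		.ref = { .count = 2, .release = queue_release },
	};

	model_ref_kill(&q.ref);		/* process context, count 2 -> 1 */

	in_process_context = false;	/* I/O completes in irq context */
	model_ref_put(&q.ref);		/* last put, release fires here */
	assert(!q.ref.exited);		/* exit has not run yet */
	in_process_context = true;

	queue_run_pending_work(&q);	/* deferred exit, process context */
	return q.ref.exited ? 0 : -1;
}
```

The extra hop through the pending-work flag is the part that corresponds
to the work_struct every such caller ends up carrying.
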
> > be a percpu_ref_exit_sync that kills the ref and waits for all references
> > to go away synchronously.
>
> That shouldn't be difficult to implement. One minor concern is that
> it's almost guaranteed that there will be cases where the
> synchronicity is exposed to userland. Anyways, can you please
> describe the use case?

We use a completion scheme in which percpu_ref_exit is done from the
same context as percpu_ref_kill, after that context has waited for the
last reference to be dropped. But for these cases exposing the
synchronicity to the caller (including userland) actually is intentional.

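A rough userspace model of that completion scheme, again single-threaded
and not kernel code, might look like the sketch below. All sim_* names
are hypothetical. In particular sim_wait_for_ref stands in for a
sleeping wait on a completion: since nothing else runs in this model, it
drives the outstanding I/O completions itself until the release callback
fires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct percpu_ref; not the kernel type. */
struct sim_ref {
	long count;			/* starts out holding the initial reference */
	bool exited;
	void (*release)(struct sim_ref *ref);
};

/* Hypothetical 'queue' with a completion-like flag for the last put. */
struct sim_queue {
	struct sim_ref ref;
	bool ref_done;			/* models a struct completion */
	int inflight;			/* I/Os still holding a reference */
};

/* Simulated context: true = process context, false = e.g. irq context. */
static bool in_process_context = true;

static void sim_ref_put(struct sim_ref *ref)
{
	if (--ref->count == 0)
		ref->release(ref);	/* runs in the context of the last put */
}

/* Models kill: drop the initial reference so the count can reach zero. */
static void sim_ref_kill(struct sim_ref *ref)
{
	sim_ref_put(ref);
}

/* Models exit: only legal from process context in this model. */
static void sim_ref_exit(struct sim_ref *ref)
{
	assert(in_process_context);
	ref->exited = true;
}

/* Release callback: only signals the waiter, safe from any context. */
static void sim_queue_release(struct sim_ref *ref)
{
	struct sim_queue *q = (struct sim_queue *)
		((char *)ref - offsetof(struct sim_queue, ref));

	q->ref_done = true;
}

/* Models the blocking wait by completing the outstanding I/Os one by one. */
static void sim_wait_for_ref(struct sim_queue *q)
{
	while (!q->ref_done) {
		assert(q->inflight > 0);
		q->inflight--;
		in_process_context = false;	/* completion in irq context */
		sim_ref_put(&q->ref);
		in_process_context = true;
	}
}

/* Synchronous teardown: kill, wait, and exit all from the same context. */
static void sim_queue_teardown_sync(struct sim_queue *q)
{
	sim_ref_kill(&q->ref);
	sim_wait_for_ref(q);
	sim_ref_exit(&q->ref);
}

/* Returns 0 if teardown returned only after all I/O had finished. */
int demo_sync_teardown(void)
{
	/* Initial reference plus three in-flight I/Os. */
	struct sim_queue q = {
		.ref = { .count = 1 + 3, .release = sim_queue_release },
		.inflight = 3,
	};

	sim_queue_teardown_sync(&q);
	return (q.ref.exited && q.inflight == 0) ? 0 : -1;
}
```

No deferred work item is needed here because the release callback never
calls the exit function itself. The price is that the teardown caller
blocks until the last reference is gone.
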
My use case is a new storage target, broadly similar to the SCSI target,
which happens to exhibit the same behavior. In that case we only want
to return from the teardown function when all I/O on a 'queue' of sorts
has finished, for example during module removal.