Re: [PATCH v2] locking/percpu-rwsem: Optimize readers and reduce global impact

From: John Stultz
Date: Wed Aug 24 2016 - 17:17:14 EST

On Fri, Aug 12, 2016 at 6:44 PM, Om Dhyade <odhyade@xxxxxxxxxxxxxx> wrote:
> Update from my tests:
> Use-case: Android application launches.
> I tested the patches on android N build, i see max latency ~7ms.
> In my tests, the wait is due to: copy_process(fork.c) blocks all threads in
> __cgroup_procs_write including threads which are not part of the forking
> process's thread-group.
> Dimtry had provided a hack patch which reverts to per-process rw-sem which
> had max latency of ~2ms.
> android user-space binder library does 2 cgroup write operations per
> transaction, apart from the copy_process(fork.c) wait, i see pre-emption in
> _cgroup_procs_write causing waits.

Hey Peter, Tejun, Oleg,
So while your tweaks for the percpu-rwsem have greatly helped the
regression folks were seeing (many thanks, by the way), as noted
above, performance with the global lock is still ~3x slower than on
earlier kernels (though again, much better than the 80x slower that
was seen earlier).

So I was wondering if patches to go back to the per-signal_struct
locking would still be considered? Or is the global lock approach the
only way forward?

At a higher level, I'm worried that Android's use of cgroups as a
priority enforcement mechanism is at odds with developers focusing on
it as a container enforcement mechanism, as in the latter it's not
common for tasks to change between cgroups, but with the former
priority adjustments are quite common.