Re: [PATCH RFC 01/10] rcu: Make CONFIG_SRCU unconditionally enabled

From: Kirill Tkhai
Date: Thu Aug 09 2018 - 06:22:46 EST


On 09.08.2018 00:31, Dave Chinner wrote:
> On Wed, Aug 08, 2018 at 12:27:34PM +0200, Michal Hocko wrote:
>> [CC Josh - the whole series is
>> http://lkml.kernel.org/r/153365347929.19074.12509495712735843805.stgit@xxxxxxxxxxxxxxxxxxxxx]
>>
>> On Wed 08-08-18 13:17:44, Kirill Tkhai wrote:
>>> On 08.08.2018 10:20, Michal Hocko wrote:
>>>> On Tue 07-08-18 18:37:36, Kirill Tkhai wrote:
>>>>> This patch kills all CONFIG_SRCU defines and
>>>>> the code under !CONFIG_SRCU.
>>>>
>>>> The last time somebody tried to do this there was a pushback due to
>>>> kernel tinyfication. So this should really give some numbers about the
>>>> code size increase. Also why can't we make this depend on MMU. Is
>>>> anybody else than the reclaim asking for unconditional SRCU usage?
>>>
>>> I don't know one. The size numbers (sparc64) are:
>>>
>>> $ size image.srcu.disabled
>>> text data bss dec hex filename
>>> 5117546 8030506 1968104 15116156 e6a77c image.srcu.disabled
>>> $ size image.srcu.enabled
>>> text data bss dec hex filename
>>> 5126175 8064346 1968104 15158625 e74d61 image.srcu.enabled
>>> The difference is: 15158625-15116156 = 42469 ~41Kb
>>>
>>> Please, see the measurement details to my answer to Stephen.
>>>
>>>> Btw. I totaly agree with Steven. This is a very poor changelog. It is
>>>> trivial to see what the patch does but it is far from clear why it is
>>>> doing that and why we cannot go other ways.
>>> We possibly can go another way, and there is comment to [2/10] about this.
>>> Percpu rwsem may be used instead, the only thing, it is worse, is it will
>>> make shrink_slab() wait unregistering shrinkers, while srcu-based
>>> implementation does not require this.
>>
>> Well, if unregisterring doesn't do anything subtle - e.g. an allocation
>> or take locks which depend on allocation - and we can guarantee that
>> then blocking shrink_slab shouldn't be a big deal.
>
> unregister_shrinker() already blocks shrink_slab - taking a rwsem in
> write mode blocks all readers - so using a per-cpu rwsem doesn't
> introduce anything new or unexpected. I'd like to see numbers of the
> different methods before anything else.

The difference is percpu_rw_semaphore makes readers to wait till RCU
grace period is finished. Sometimes this takes unpredictable time on
big machines with many CPUs, which is not good.

> IMO, the big deal is that the split unregister mechanism seems to
> imply superblock shrinkers can be called during sb teardown or
> /after/ the filesystem has been torn down in memory (i.e. after
> ->put_super() is called). That's a change of behaviour, but it's
> left to the filesystem to detect and handle that condition. That's
> exceedingly subtle and looks like a recipe for disaster to me. I
> note that XFS hasn't been updated to detect and avoid this landmine.
>
> And, FWIW, filesystems with multiple shrinkers (e.g. XFS as 3 per
> mount) will take the SCRU penalty multiple times during unmount, and
> potentially be exposed to multiple different "use during/after
> teardown" race conditions.
>
>> It is subtle though.
>> Maybe subtle enough to make unconditional SRCU worth it. This all should
>> be in the changelog though.
>
> IMO, we've had enough recent bugs to deal with from shrinkers being
> called before the filesystem is set up and from trying to handle
> allocation errors during setup. Do we really want to make shrinker
> shutdown just as prone to mismanagement and subtle, hard to hit
> bugs? I don't think we do - unmount is simply not a critical
> performance path.

There are possible different situations, people use linux like they want.
Imagine, you want to reboot NFS server, but you want to enter clients
and umount them over ssh, and the time is critical. Something like this.
I believe there are many examples, people need this.