Re: [PATCH v3] drivers/base/memory.c: cache blocks in radix tree to accelerate lookup

From: David Hildenbrand
Date: Fri Dec 20 2019 - 05:51:06 EST


On 19.12.19 18:33, Scott Cheloha wrote:
> On Wed, Dec 18, 2019 at 10:00:49AM +0100, David Hildenbrand wrote:
>> On 17.12.19 20:32, Scott Cheloha wrote:
>>> Searching for a particular memory block by id is slow because each block
>>> device is kept in an unsorted linked list on the subsystem bus.
>>>
>>> Lookup is much faster if we cache the blocks in a radix tree. Memory
>>> subsystem initialization and hotplug/hotunplug is at least a little faster
>>> for any machine with more than ~100 blocks, and the speedup grows with
>>> the block count.
>>>
>>> Signed-off-by: Scott Cheloha <cheloha@xxxxxxxxxxxxxxxxxx>
>>> Acked-by: David Hildenbrand <david@xxxxxxxxxx>
>>> ---
>>> v2 incorporates suggestions from David Hildenbrand.
>>>
>>> v3 changes:
>>> - Rebase atop "drivers/base/memory.c: drop the mem_sysfs_mutex"
>>>
>>> - Be conservative: don't use radix_tree_for_each_slot() in
>>> walk_memory_blocks() yet. It introduces RCU which could
>>> change behavior. Walking the tree "by hand" with
>>> find_memory_block_by_id() is slower but keeps the patch
>>> simple.
>>
>> Fine with me (splitting it out, e.g., into an addon patch), however, as
>> readers/writers run mutually exclusive, there is nothing to worry about
>> here. RCU will not make a difference.
>
> Oh. In that case, can you make heads or tails of this CI failure
> email I got about the v2 patch? I've inlined it at the end of this
> mail. "Suspicious RCU usage". Unclear if it's a false positive. My
> thinking was that I'd hold off on using radix_tree_for_each_slot() and
> introducing a possible regression until I understood what was
> triggering the robot.

Ah, did not see that message. See below.

>
> Also, is there anyone else I should shop this patch to? I copied the
> maintainers reported by scripts/get_maintainer.pl but are there others
> who might be interested?

On memory hotplug (and related) things, it's usually a good idea to CC
Andrew (who picks up basically all MM stuff), Michal Hocko, and Oscar
Salvador. (cc-ing them)

>
> --
>
> Here's that CI mail. I've stripped the attached config because it's
> huge.
>
> Date: Sun, 1 Dec 2019 23:05:23 +0800
> From: kernel test robot <lkp@xxxxxxxxx>
> To: Scott Cheloha <cheloha@xxxxxxxxxxxxxxxxxx>
> Cc: 0day robot <lkp@xxxxxxxxx>, LKP <lkp@xxxxxxxxxxxx>
> Subject: 86dc301f7b ("drivers/base/memory.c: cache blocks in radix tree .."):
> [ 1.341517] WARNING: suspicious RCU usage
> Message-ID: <20191201150523.GE18573@shao2-debian>
>
> Greetings,
>
> 0day kernel testing robot got the below dmesg and the first bad commit is
>
> https://github.com/0day-ci/linux/commits/Scott-Cheloha/drivers-base-memory-c-cache-blocks-in-radix-tree-to-accelerate-lookup/20191124-104557
>
> commit 86dc301f7b4815d90e3a7843ffed655d64efe445
> Author: Scott Cheloha <cheloha@xxxxxxxxxxxxxxxxxx>
> AuthorDate: Thu Nov 21 13:59:52 2019 -0600
> Commit: 0day robot <lkp@xxxxxxxxx>
> CommitDate: Sun Nov 24 10:45:59 2019 +0800
>
> drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
>
> Searching for a particular memory block by id is slow because each block
> device is kept in an unsorted linked list on the subsystem bus.
>
> Lookup is much faster if we cache the blocks in a radix tree. Memory
> subsystem initialization and hotplug/hotunplug is at least a little faster
> for any machine with more than ~100 blocks, and the speedup grows with
> the block count.
>
> Signed-off-by: Scott Cheloha <cheloha@xxxxxxxxxxxxxxxxxx>
> Acked-by: David Hildenbrand <david@xxxxxxxxxx>
>
> 0e4a459f56 tracing: Remove unnecessary DEBUG_FS dependency
> 86dc301f7b drivers/base/memory.c: cache blocks in radix tree to accelerate lookup
> +---------------------------------------------------------------------+------------+------------+
> | | 0e4a459f56 | 86dc301f7b |
> +---------------------------------------------------------------------+------------+------------+
> | boot_successes | 8 | 0 |
> | boot_failures | 25 | 11 |
> | WARNING:possible_circular_locking_dependency_detected | 25 | 8 |
> | WARNING:suspicious_RCU_usage | 0 | 11 |
> | include/linux/radix-tree.h:#suspicious_rcu_dereference_check()usage | 0 | 11 |
> +---------------------------------------------------------------------+------------+------------+
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <lkp@xxxxxxxxx>
>
> [ 1.335279] random: get_random_bytes called from kcmp_cookies_init+0x29/0x4c with crng_init=0
> [ 1.336883] ACPI: bus type PCI registered
> [ 1.338295] PCI: Using configuration type 1 for base access
> [ 1.340735]
> [ 1.341049] =============================
> [ 1.341517] WARNING: suspicious RCU usage
> [ 1.342266] 5.4.0-rc5-00070-g86dc301f7b481 #1 Tainted: G T
> [ 1.343494] -----------------------------
> [ 1.344226] include/linux/radix-tree.h:167 suspicious rcu_dereference_check() usage!
> [ 1.345516]
> [ 1.345516] other info that might help us debug this:
> [ 1.345516]
> [ 1.346962]
> [ 1.346962] rcu_scheduler_active = 2, debug_locks = 1
> [ 1.348134] no locks held by swapper/0/1.
> [ 1.348866]
> [ 1.348866] stack backtrace:
> [ 1.349525] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G T 5.4.0-rc5-00070-g86dc301f7b481 #1
> [ 1.351230] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [ 1.352720] Call Trace:
> [ 1.353187] ? dump_stack+0x9a/0xde
> [ 1.353507] ? node_access_release+0x19/0x19
> [ 1.353507] ? walk_memory_blocks+0xe6/0x184
> [ 1.353507] ? set_debug_rodata+0x20/0x20
> [ 1.353507] ? link_mem_sections+0x39/0x3d
> [ 1.353507] ? topology_init+0x74/0xc8
> [ 1.353507] ? enable_cpu0_hotplug+0x15/0x15
> [ 1.353507] ? do_one_initcall+0x13d/0x30a
> [ 1.353507] ? kernel_init_freeable+0x18e/0x23b
> [ 1.353507] ? rest_init+0x173/0x173
> [ 1.353507] ? kernel_init+0x10/0x151
> [ 1.353507] ? rest_init+0x173/0x173
> [ 1.353507] ? ret_from_fork+0x3a/0x50
> [ 1.410829] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
> [ 1.427389] cryptd: max_cpu_qlen set to 1000
> [ 1.457792] ACPI: Added _OSI(Module Device)
> [ 1.458615] ACPI: Added _OSI(Processor Device)
> [ 1.459428] ACPI: Added _OSI(3.0 _SCP Extensions)
>


We get a complaint that we do a rcu_dereference() without proper protection.

We come via

rest_init()->kernel_init()->kernel_init_freeable()->do_basic_setup()->
do_initcalls()->do_initcall_level()->do_one_initcall()->
topology_init()->register_one_node()->link_mem_sections()->
walk_memory_blocks()

E.g., we add ACPI memory similarly via
...do_initcalls()...acpi_init()->acpi_scan_init()
also without holding the device hotplug lock. AFAIK, we can't get
races/concurrent hot(un)plug activity here (we even documented that
for ACPI). And if we would, the code would already be wrong ;)

I assume the simplest and cheapest way to suppress the warning would be
to add rcu_read_lock()/rcu_read_unlock() around the
radix_tree_for_each_slot().

Another way to suppress the warning would be to take the device hotplug
lock before calling register_one_node() in all arch specific code - but
we had a similar discussion due to the ACPI code back then and decided
to not take the device hotplug lock if not really necessary.

... but after all we would want a radix_tree_for_each_slot() variant
that does not perform such checks here. We don't hold any locks and
don't need any locks.

--
Thanks,

David / dhildenb