Re: [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations

From: Noam Stolero
Date: Thu Dec 05 2019 - 11:16:33 EST


On 12/3/2019 6:08 PM, Catalin Marinas wrote:
> On Tue, Dec 03, 2019 at 03:51:50PM +0000, Noam Stolero wrote:
>> On 8/12/2019 7:06 PM, Catalin Marinas wrote:
>>> Following the discussions on v2 of this patch(set) [1], this series
>>> takes slightly different approach:
>>>
>>> - it implements its own simple memory pool that does not rely on the
>>> slab allocator
>>>
>>> - drops the early log buffer logic entirely since it can now allocate
>>> metadata from the memory pool directly before kmemleak is fully
>>> initialised
>>>
>>> - CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE option is renamed to
>>> CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE
>>>
>>> - moves the kmemleak_init() call earlier (mm_init())
>>>
>>> - to avoid a separate memory pool for struct scan_area, it makes the
>>> tool robust when such allocations fail as scan areas are rather an
>>> optimisation
>>>
>>> [1] http://lkml.kernel.org/r/20190727132334.9184-1-catalin.marinas@xxxxxxx
>>>
>>> Catalin Marinas (3):
>>> mm: kmemleak: Make the tool tolerant to struct scan_area allocation
>>> failures
>>> mm: kmemleak: Simple memory allocation pool for kmemleak objects
>>> mm: kmemleak: Use the memory pool for early allocations
>>>
>>> init/main.c | 2 +-
>>> lib/Kconfig.debug | 11 +-
>>> mm/kmemleak.c | 325 ++++++++++++----------------------------------
>>> 3 files changed, 91 insertions(+), 247 deletions(-)
>> We observe severe degradation in our network performance affecting all
>> of our NICs. The degradation is directly linked to this patch.
>>
>> What we run:
>> Simple Iperf TCP loopback with 8 streams on ConnectX5-100GbE.
>> Since it's a loopback test, traffic goes from the socket through the IP
>> stack and back to the socket, without going through the NIC driver.
> Something similar was reported before. Can you try commit 2abd839aa7e6
> ("kmemleak: Do not corrupt the object_list during clean-up") and see if
> it fixes the problem for you? It was merged in 5.4-rc4.

I've tested this commit, as well as 5.4.0-rc6 and 5.4.0 GA versions. We
don't see the issue I've mentioned.

Thank you for the quick response and the assistance.