Re: [PATCH -next] mm/kmemleak: annotate a data race in checksum

From: Qian Cai
Date: Tue Mar 17 2020 - 09:42:22 EST




> On Mar 17, 2020, at 9:31 AM, Marco Elver <elver@xxxxxxxxxx> wrote:
>
> On Tue, 17 Mar 2020 at 14:22, Qian Cai <cai@xxxxxx> wrote:
>>
>> Even if KCSAN is disabled for kmemleak, update_checksum() could still
>> call crc32() (which is outside of kmemleak.c) to dereference
>> object->pointer. Thus, the value of object->pointer could be accessed
>> concurrently as noticed by KCSAN,
>>
>> BUG: KCSAN: data-race in crc32_le_base / do_raw_spin_lock
>>
>> write to 0xffffb0ea683a7d50 of 4 bytes by task 23575 on cpu 12:
>> do_raw_spin_lock+0x114/0x200
>> debug_spin_lock_after at kernel/locking/spinlock_debug.c:91
>> (inlined by) do_raw_spin_lock at kernel/locking/spinlock_debug.c:115
>> _raw_spin_lock+0x40/0x50
>> __handle_mm_fault+0xa9e/0xd00
>> handle_mm_fault+0xfc/0x2f0
>> do_page_fault+0x263/0x6f9
>> page_fault+0x34/0x40
>>
>> read to 0xffffb0ea683a7d50 of 4 bytes by task 839 on cpu 60:
>> crc32_le_base+0x67/0x350
>> crc32_le_base+0x67/0x350:
>> crc32_body at lib/crc32.c:106
>> (inlined by) crc32_le_generic at lib/crc32.c:179
>> (inlined by) crc32_le at lib/crc32.c:197
>> kmemleak_scan+0x528/0xd90
>> update_checksum at mm/kmemleak.c:1172
>> (inlined by) kmemleak_scan at mm/kmemleak.c:1497
>> kmemleak_scan_thread+0xcc/0xfa
>> kthread+0x1e0/0x200
>> ret_from_fork+0x27/0x50
>>
>> If a shattered value was returned due to a data race, it will be
>> corrected in the next scan. Thus, annotate it as an intentional data
>> race using the data_race() macro.
>>
>> Signed-off-by: Qian Cai <cai@xxxxxx>
>> ---
>> mm/kmemleak.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
>> index e362dc3d2028..d3327756c3a4 100644
>> --- a/mm/kmemleak.c
>> +++ b/mm/kmemleak.c
>> @@ -1169,7 +1169,12 @@ static bool update_checksum(struct kmemleak_object *object)
>> u32 old_csum = object->checksum;
>>
>> kasan_disable_current();
>
> Suggested:
> + kcsan_disable_current();
>
>> - object->checksum = crc32(0, (void *)object->pointer, object->size);
>> + /*
>> + * crc32() will dereference object->pointer. If an unstable value was
>> + * returned due to a data race, it will be corrected in the next scan.
>> + */
>> + object->checksum = data_race(crc32(0, (void *)object->pointer,
>> + object->size));
>
> This will work with the default config, because for word-sized-aligned
> writes no marking is enforced. But this will still cause a data race
> if the write is e.g. due to a memcpy.

I saw this splat but just decided to reuse an old one to save some time.

Looks like that "head->func = func;" is not aligned.

[77392.095571][ T839] BUG: KCSAN: data-race in call_rcu / crc32_le_base
[77392.102066][ T839]
[77392.104297][ T839] write to 0xffff898ea73a8748 of 8 bytes by task 114682 on cpu 79:
[77392.112111][ T839] call_rcu+0xe8/0x4b0
__call_rcu at kernel/rcu/tree.c:2701
(inlined by) call_rcu at kernel/rcu/tree.c:2777
[77392.116084][ T839] __fput+0x23a/0x3d0
[77392.119970][ T839] ____fput+0x1e/0x30
[77392.123852][ T839] task_work_run+0xba/0x120
[77392.128257][ T839] do_syscall_64+0x7d7/0xb05
[77392.132753][ T839] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[77392.138544][ T839]
[77392.140760][ T839] INFO: lockdep is turned off.
[77392.145478][ T839] irq event stamp: 0
[77392.149270][ T839] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[77392.156307][ T839] hardirqs last disabled at (0): [<ffffffffb0ab4d42>] copy_process+0x1122/0x3240
[77392.165348][ T839] softirqs last enabled at (0): [<ffffffffb0ab4d42>] copy_process+0x1122/0x3240
[77392.174384][ T839] softirqs last disabled at (0): [<0000000000000000>] 0x0
[77392.181405][ T839]
[77392.183625][ T839] read to 0xffff898ea73a8748 of 4 bytes by task 839 on cpu 46:
[77392.191088][ T839] crc32_le_base+0x67/0x350
[77392.195498][ T839] kmemleak_scan+0x3ee/0x9f0
[77392.199992][ T839] kmemleak_scan_thread+0x9f/0xc4
[77392.204921][ T839] kthread+0x1cd/0x1f0
[77392.208894][ T839] ret_from_fork+0x27/0x50

>
> There are already markers for KASAN around, so the most reliable thing
> is to just disable KCSAN in this region.

OK, I'll test that a bit first.

>
>> kasan_enable_current();
>
> Suggested:
> + kcsan_enable_current();
>
> Thanks,
> -- Marco
>
>> return object->checksum != old_csum;
>> --
>> 2.21.0 (Apple Git-122.2)
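
For reference, here is a rough user-space sketch of how update_checksum() might look with the two suggested kcsan_*_current() lines applied and the data_race() wrapper dropped. It is only an illustration, not the final patch: the kasan_*/kcsan_* functions below are hypothetical stand-ins that merely count nesting, struct kmemleak_object is cut down to the three fields used here, and crc32() is a simple bitwise substitute for the kernel's implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel helpers: they only track nesting. */
static int kasan_depth, kcsan_depth;
static void kasan_disable_current(void) { kasan_depth++; }
static void kasan_enable_current(void)  { kasan_depth--; }
static void kcsan_disable_current(void) { kcsan_depth++; }
static void kcsan_enable_current(void)  { kcsan_depth--; }

/* Assumption: reduced to the fields update_checksum() touches. */
struct kmemleak_object {
	unsigned long pointer;
	size_t size;
	uint32_t checksum;
};

/* Bitwise little-endian CRC32, standing in for the kernel's crc32(). */
static uint32_t crc32(uint32_t crc, const void *p, size_t len)
{
	const unsigned char *b = p;

	while (len--) {
		crc ^= *b++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320u : 0);
	}
	return crc;
}

static bool update_checksum(struct kmemleak_object *object)
{
	uint32_t old_csum = object->checksum;

	/*
	 * crc32() reads the tracked memory while other tasks may write to
	 * it, so both sanitizers are switched off for this region only.
	 */
	kasan_disable_current();
	kcsan_disable_current();
	object->checksum = crc32(0, (void *)object->pointer, object->size);
	kasan_enable_current();
	kcsan_enable_current();

	return object->checksum != old_csum;
}
```

Compared with data_race(), disabling KCSAN for the whole region does not depend on how the concurrent writer performs its store, which is the concern raised above about memcpy-style writes.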