Re: [External] Re: [PATCH] mm/hugetlb: Fix a race between hugetlb sysctl handlers
From: Mike Kravetz
Date: Tue Aug 25 2020 - 20:03:38 EST
On 8/24/20 8:01 PM, Muchun Song wrote:
> On Tue, Aug 25, 2020 at 5:21 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>>
>> I too am looking at this now and do not completely understand the race.
>> It could be that:
>>
>> hugetlb_sysctl_handler_common
>> ...
>> table->data = &tmp;
>>
>> and, do_proc_doulongvec_minmax()
>> ...
>> return __do_proc_doulongvec_minmax(table->data, table, write, ...
>> with __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, ...
>> ...
>> i = (unsigned long *) data;
>> ...
>> *i = val;
>>
>> So, __do_proc_doulongvec_minmax can be dereferencing and writing to the pointer
>> in one thread when hugetlb_sysctl_handler_common is setting it in another?
>
> Yes, you are right.
>
>>
>> Another confusing part of the message is the stack trace which includes
>> ...
>> ? set_max_huge_pages+0x3da/0x4f0
>> ? alloc_pool_huge_page+0x150/0x150
>>
>> which are 'downstream' from these routines. I don't understand why these
>> are in the trace.
>
> I am also confused. But this issue can be reproduced easily by letting more
> than one thread write to `/proc/sys/vm/nr_hugepages`. With this patch applied,
> the issue can not be reproduced and disappears.

There certainly is an issue here as one thread can modify data in another.
However, I am having a hard time seeing what causes the 'kernel NULL pointer
dereference'.
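
To make the interleaving concrete, here is a minimal single-threaded sketch
that replays one bad ordering of two writers. It is a user-space model only,
not kernel code: struct ctl_table_model, model_store() and replay_race() are
hypothetical names, and the two "threads" A and B are simply steps executed
in a fixed order.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ctl_table; only ->data matters here. */
struct ctl_table_model {
	void *data;
};

/* Models the "*i = val" store done through the data pointer. */
static void model_store(void *data, unsigned long val)
{
	unsigned long *i = (unsigned long *)data;

	*i = val;
}

/*
 * Replays one interleaving of two writers, A and B, that share one table.
 * On return, *tmp_a_out and *tmp_b_out hold the final values of the two
 * per-writer temporaries.
 */
void replay_race(unsigned long val_a,
		 unsigned long *tmp_a_out, unsigned long *tmp_b_out)
{
	struct ctl_table_model table = { NULL };	/* shared by A and B */
	unsigned long tmp_a = 0;			/* A's stack temporary */
	unsigned long tmp_b = 0;			/* B's stack temporary */

	table.data = &tmp_a;		/* A: table->data = &tmp; */
	table.data = &tmp_b;		/* B: table->data = &tmp; replaces A's pointer */
	model_store(table.data, val_a);	/* A: stores through table->data, now B's tmp */

	*tmp_a_out = tmp_a;
	*tmp_b_out = tmp_b;
}
```

In this ordering A's value lands in B's temporary and A's own temporary is
never written. That is data corruption between threads, but nothing in the
sequence makes the pointer NULL.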

I tried to reproduce the issue myself but was unsuccessful. I have 16 threads
writing to /proc/sys/vm/nr_hugepages in an infinite loop. After several hours
running, I did not hit the issue. Just curious, what architecture is the
system? Are there any special config or compiler options?

If you can easily reproduce, can you post the detailed oops message?
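
For anyone who wants to try this, a rough sketch of a multi-threaded writer
follows. It is not the exact test used above: hammer_sysctl(), writer(),
struct writer_args and MAX_THREADS are hypothetical names, the loop is
bounded rather than infinite, and each thread writes its own index as the
value. Pointing it at /proc/sys/vm/nr_hugepages requires root and will
resize the huge page pool.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_THREADS 64	/* arbitrary cap for this sketch */

struct writer_args {
	const char *path;	/* file to write, e.g. the nr_hugepages sysctl */
	int iterations;		/* number of open/write/close cycles */
	int value;		/* decimal value written on each cycle */
	long ok;		/* successful writes, private to this thread */
};

static void *writer(void *p)
{
	struct writer_args *a = p;
	char buf[32];
	int len = snprintf(buf, sizeof(buf), "%d\n", a->value);

	for (int n = 0; n < a->iterations; n++) {
		/* Reopen every time so each write starts at offset 0. */
		int fd = open(a->path, O_WRONLY);

		if (fd < 0)
			continue;
		if (write(fd, buf, (size_t)len) == (ssize_t)len)
			a->ok++;
		close(fd);
	}
	return NULL;
}

/* Returns the total number of successful writes, or -1 on a bad nthreads. */
long hammer_sysctl(const char *path, int nthreads, int iterations)
{
	pthread_t tid[MAX_THREADS];
	struct writer_args args[MAX_THREADS];
	int started = 0;
	long total = 0;

	if (nthreads < 1 || nthreads > MAX_THREADS)
		return -1;

	for (int t = 0; t < nthreads; t++) {
		args[t] = (struct writer_args){ .path = path,
						.iterations = iterations,
						.value = t, .ok = 0 };
		if (pthread_create(&tid[t], NULL, writer, &args[t]) != 0)
			break;
		started++;
	}
	for (int t = 0; t < started; t++) {
		pthread_join(tid[t], NULL);
		total += args[t].ok;
	}
	return total;
}
```

A call such as hammer_sysctl("/proc/sys/vm/nr_hugepages", 16, 100000) would
approximate the 16-thread setup. Build with -pthread.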

The 'NULL pointer' seems strange because after the first assignment to
table->data the value should never be NULL. Certainly it can be modified
by another thread, but I cannot see how it can be NULL. At the beginning
of __do_proc_doulongvec_minmax, there is a check for NULL pointer with:

	if (!data || !table->maxlen || !*lenp || (*ppos && !write)) {
		*lenp = 0;
		return 0;
	}

I looked at the code my compiler produced for __do_proc_doulongvec_minmax.
It appears to use the same value/register for the pointer throughout the
routine. IOW, I do not see how the pointer can be NULL for the assignment
when the routine does:

	*i = val;
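
The reasoning can be modeled in user space as below. This is only a sketch:
struct ctl_table_model and model_doulongvec() are hypothetical names, and the
line that clears table->data stands in for a concurrent modification by
another thread. It is deliberately a worse case than anything the handler
does, since the handler only ever stores the address of a stack variable.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ctl_table; only ->data matters here. */
struct ctl_table_model {
	void *data;
};

/*
 * Same shape as __do_proc_doulongvec_minmax(): the pointer arrives as the
 * by-value parameter 'data', is checked once, and is then used for the
 * store. Returns 0 on the early-return path and 1 after a store.
 */
int model_doulongvec(void *data, struct ctl_table_model *table,
		     unsigned long val)
{
	unsigned long *i;

	if (!data)
		return 0;	/* models the NULL check at the top */

	i = (unsigned long *)data;

	/* Assumed interference: someone changes table->data after the check. */
	table->data = NULL;

	*i = val;		/* still uses the local copy taken before the change */
	return 1;
}
```

Because the callee never rereads table->data, a later change to the table
cannot turn the local pointer into NULL. The store can still go to another
thread's variable, but it does not go to address zero.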

Again, your analysis/patch points out a real issue. I just want to get
a better understanding to make sure there is not another issue causing
the NULL pointer dereference.
--
Mike Kravetz