Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common()

From: Mike Kravetz
Date: Mon Feb 25 2019 - 11:49:36 EST


On 2/24/19 7:17 PM, David Rientjes wrote:
> On Sun, 24 Feb 2019, Mike Kravetz wrote:
>
>>> A user can change a node-specific hugetlb count, e.g. via
>>> /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
>>> The calculated value of count is the total number of huge pages. It can
>>> overflow when a user enters an extremely high value, in which case the
>>> total number of huge pages ends up a small value the user did not expect.
>>> We can simply fix it by setting count to ULONG_MAX and continuing. This
>>> may be more in line with the user's intention of allocating as many huge
>>> pages as possible.
>>>
>>> Signed-off-by: Jing Xiangfeng <jingxiangfeng@xxxxxxxxxx>
>>
>> Thank you.
>>
>> Acked-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>>
>>> ---
>>> mm/hugetlb.c | 7 +++++++
>>> 1 file changed, 7 insertions(+)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index afef616..6688894 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -2423,7 +2423,14 @@ static ssize_t __nr_hugepages_store_common(bool obey_mempolicy,
>>> * per node hstate attribute: adjust count to global,
>>> * but restrict alloc/free to the specified node.
>>> */
>>> + unsigned long old_count = count;
>>> count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
>>> + /*
>>> + * If user specified count causes overflow, set to
>>> + * largest possible value.
>>> + */
>>> + if (count < old_count)
>>> + count = ULONG_MAX;
>>> init_nodemask_of_node(nodes_allowed, nid);
>>> } else
>>> nodes_allowed = &node_states[N_MEMORY];
>>>
>
> Looks like this fixes the overflow issue, but isn't there already a
> possible underflow since we don't hold hugetlb_lock? Even if
> count == 0, what prevents h->nr_huge_pages_node[nid] being greater than
> h->nr_huge_pages here? I think the per hstate values need to be read with
> READ_ONCE() and stored on the stack to do any sane bounds checking.

Yes, without holding the lock there is the potential for issues. Looking
back to when the node-specific code was added, there is a comment about
"re-use/share as much of the existing global hstate attribute initialization
and handling". I suspect that is the reason these calculations are done
outside the lock.

As you mention above, nr_huge_pages_node[nid] could be greater than
nr_huge_pages. This is true even if we use READ_ONCE(). So, the code would
need to test for this condition and 'fix up' the values or re-read them. It
is simply racy without holding the lock.

If that is too ugly, then we could just add code for the node-specific
adjustments. set_max_huge_pages() is only called from here. It would be
pretty easy to modify set_max_huge_pages() to take the node-specific value
and do the calculations/adjustments under the lock.
--
Mike Kravetz