Re: [PATCH] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common()
From: Mike Kravetz
Date: Tue Feb 19 2019 - 18:45:41 EST
On 2/18/19 1:27 AM, Michal Hocko wrote:
> On Sat 16-02-19 21:31:12, Jingxiangfeng wrote:
>> From: Jing Xiangfeng <jingxiangfeng@xxxxxxxxxx>
>>
>> We can use the following command to dynamically allocate huge pages:
>> echo NR_HUGEPAGES > /proc/sys/vm/nr_hugepages
>> The count in __nr_hugepages_store_common() is parsed from the value written
>> to /proc/sys/vm/nr_hugepages. The largest value count can hold is ULONG_MAX,
>> so the operation 'count += h->nr_huge_pages - h->nr_huge_pages_node[nid]' can
>> overflow and leave count holding a wrong (wrapped) number.
>
> Could you be more specific of what is the runtime effect on the
> overflow? I haven't checked closer but I would assume that we will
> simply shrink the pool size because count will become a small number.
>
Well, the first thing to note is that this code only applies to the case where
someone is changing a node-specific hugetlb count, e.g.
/sys/devices/system/node/node1/hugepages/hugepages-2048kB
In this case, the calculated value of count is a maximum or minimum for the
total number of huge pages across all nodes. However, only the number of huge
pages on the specified node is adjusted to try to achieve this maximum or minimum.
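For reference, the node-specific path does roughly the following (simplified
and paraphrased here, not the exact kernel source):

	/*
	 * Node-specific store: the value the user wrote is converted into
	 * a new global target by adding the huge pages currently sitting
	 * on all other nodes, and allocation/freeing is then restricted to
	 * the requested node.  The addition is what can wrap around.
	 */
	if (nid != NUMA_NO_NODE) {
		count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
		init_nodemask_of_node(nodes_allowed, nid);
	}
	h->max_huge_pages = set_max_huge_pages(h, count, nodes_allowed);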
So, in the case of overflow the number of huge pages on the specified node
could be reduced. I say 'could' because it really depends on the previous
values. For example, if a user writes ULONG_MAX to the node1 file while the
other nodes hold, say, 100 huge pages, count wraps around to 99 and the code
will then try to shrink node1's pool until the global total drops to 99. In
other situations the node-specific value will instead be increased.
A minor point is that the description in the commit message does not match
the code being changed.
> Is there any reason to report an error in that case? We do not report
> errors when we cannot allocate the requested number of huge pages so why
> is this case any different?
Another issue to consider is that h->nr_huge_pages is an unsigned long,
and h->nr_huge_pages_node[] is an unsigned int. The sysfs store routines
treat them both as unsigned longs. Ideally, the store routines should
distinguish between the two.
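If we wanted the store routines to account for that difference, a hypothetical
(untested) check for the node-specific case could be as simple as:

	/*
	 * Hypothetical sketch, not current kernel code: a node-specific
	 * count ultimately lands in the unsigned int nr_huge_pages_node[]
	 * counter, so reject values that cannot fit in an unsigned int.
	 */
	if (nid != NUMA_NO_NODE && count > UINT_MAX)
		return -EINVAL;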
In reality, an actual overflow is unlikely. If my math is correct (not
likely), it would take something like 8 petabytes of 2MB huge pages to
overflow the node-specific counts.
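For the curious, the back-of-the-envelope arithmetic (assuming 2MB huge
pages) is:

	2^32 pages (UINT_MAX) * 2 MiB per page = 2^53 bytes, which is 8 PiB.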
In the case of a user entering a crazy high value and causing an overflow,
an error return might not be out of line. Another option would be to simply
set count to ULONG_MAX if we detect overflow (or UINT_MAX if we are paranoid)
and continue on. This may be more in line with the user's intention of
allocating as many huge pages as possible.
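If we went the saturate-and-continue route, the idea would be something like
the following untested sketch:

	/*
	 * Untested sketch: saturate instead of wrapping.  "other" is the
	 * number of huge pages currently on all nodes other than nid.
	 */
	unsigned long other = h->nr_huge_pages - h->nr_huge_pages_node[nid];

	if (count > ULONG_MAX - other)
		count = ULONG_MAX;	/* as many as possible */
	else
		count += other;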
Thoughts?
--
Mike Kravetz