Re: [PATCH v2 4/4] mm/hugetl.c: warn out if expected count of huge pages adjustment is not achieved

From: Baoquan He
Date: Fri Jul 24 2020 - 11:00:14 EST

On 07/23/20 at 11:21am, Mike Kravetz wrote:
> On 7/23/20 2:11 AM, Baoquan He wrote:
> > On 07/23/20 at 11:46am, Anshuman Khandual wrote:
> >>
> >>
> >> On 07/23/2020 08:52 AM, Baoquan He wrote:
> >>> A customer complained that no message is logged wh en the number of
> >>> persistent huge pages is not changed to the exact value written to
> >>> the sysfs or proc nr_hugepages file.
> >>>
> >>> In the current code, a best effort is made to satisfy requests made
> >>> via the nr_hugepages file. However, requests may be only partially
> >>> satisfied.
> >>>
> >>> Log a message if the code was unsuccessful in fully satisfying a
> >>> request. This includes both increasing and decreasing the number
> >>> of persistent huge pages.
> >>
> >> But is kernel expected to warn for all such situations where the user
> >> requested resources could not be allocated completely ? Otherwise, it
> >> does not make sense to add an warning for just one such situation.
> >
> > It's not for just one such situation, we have already had one to warn
> > out in mm/hugetlb.c, please check hugetlb_hstate_alloc_pages().
> Those are a little different in that they are warnings based on kernel
> command line parameters.
> > As Mike said, in one time of persistent huge page number setting,
> > comparing the old value with the new vlaue is good enough for customer
> > to get the information. However, if customer want to detect and analyze
> > previous setting failure, logging message will be helpful. So I think
> > logging the failure or partial success makes sense.
> I can understand the argument against adding a new warning for this.
> You could even argue that this condition has existed since the time
> hugetlb was added to the kernel which was long ago. And, nobody has
> complained enough to add a warning. I have even heard of a sysadmin
> practice of asking for a ridiculously large amount of hugetlb pages
> just so that the kernel will allocate as many as possible. They do
> not 'expect' to get the ridiculous amount they asked for. In such
> cases, this will be a new warning in their log.
> As mentioned in a previous e-mail, when one sets nr_hugepages by writing
> to the sysfs or proc file, one needs to read the file to determine if the
> number of requested pages were actually allocated. Anyone who does not
> do this is just asking for trouble. Yet, I imagine that it may happen.
> To be honest, I do not see this log message as something that would be
> helpful to end users. Rather, I could see this as being useful to support
> people. Support always asks for system logs and this could point out a
> possible issue with hugetlb usage.
> I do not feel strongly one way or another about adding the warning. Since
> it is fairly trivial and could help diagnose issues I am in favor of adding
> it. If people feel strongly that it should not be added, I am open to
> those arguments.

Seems it's all done, and very fair. I appreciate your understanding on
this issue. Will see if any strong concern is raised on the log adding.