Re: fd3e45436660 ("ACPI / NUMA: ia64: Parse all entries of SRAT memory affinity table")

From: Michal Hocko
Date: Thu May 10 2018 - 04:22:40 EST


On Thu 10-05-18 13:36:11, Ganapatrao Kulkarni wrote:
> On Thu, May 10, 2018 at 1:00 PM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > On Thu 10-05-18 08:27:35, Ganapatrao Kulkarni wrote:
> >> On Wed, May 9, 2018 at 6:26 PM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >> > On Wed 09-05-18 18:07:16, Ganapatrao Kulkarni wrote:
> >> >> Hi Michal
> >> >>
> >> >>
> >> >> On Wed, May 9, 2018 at 5:54 PM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >> >> > On Wed 11-04-18 12:48:32, Michal Hocko wrote:
> >> >> >> Hi,
> >> >> >> my attention was brought to the %subj commit and either I am missing
> >> >> >> something or the patch is quite dubious. What is it actually trying to
> >> >> >> fix? If a BIOS/FW provides more memblocks than the limit then we would
> >> >> >> get misleading numa topology (numactl -H output) but is the situation
> >> >> >> much better with it applied? Numa init code will refuse to init more
> >> >> >> memblocks than the limit and falls back to dummy_numa_init (AFAICS)
> >> >> >> which will break the topology again and numactl -H will have a
> >> >> >> misleading output anyway.
> >> >>
> >> >> IIRC, the MEMBLOCK beyond max limit getting dropped from visible
> >> >> memory(partial drop from a node).
> >> >> this patch removed any upper limit on memblocks and allowed to parse
> >> >> all entries of SRAT.
> >> >
> >> > Yeah I've understood that much. My question is, however, why do we care
> >> > about parsing the NUMA topology when we fallback into a single NUMA node
> >> > anyway? Or do I misunderstand the code? I do not have any platform with
> >> > that many memblocks.
> >>
> >> IMHO, this fix is very much logical by removing the SRAT parsing restriction.
> >> below is the crash log which made us to debug and eventually fix with
> >> this patch.
> >
> > Ohh, I am not saying that the current code handles too many memblocks
> > correctly. I just think that your fix is not correct or incomplete at
> > least. Assuming that my understanding is correct which you haven't
> > disputed yet. So can we focus on the proper solution now? Do we actually
> > need the memblock restrictions? We do not need those for reserved
> > memblocks so I do not see any real reason to simply remove the
> > restriction altogether. Have you explored that option?
>
> my logic was simple, when i added this patch, when the cap on max
> memblocks is arch specific, why to restrict SRAT parsing which is not
> arch specific.

Because there are other parts which are still restricted and that we
want to have all parts in sync.

> other way around argument is, why the restriction added in the first
> place itself!

Exactly. So have you explored that path?
--
Michal Hocko
SUSE Labs