Re: NUMA x86: add constraints check for nid parameters

From: Petr Holasek
Date: Fri Dec 02 2011 - 05:55:49 EST


On Thu, 01 Dec 2011, Andrew Morton wrote:
>
> On Fri, 2 Dec 2011 00:14:42 +0100
> Petr Holasek <pholasek@xxxxxxxxxx> wrote:
>
> > On Thu, 01 Dec 2011, Andrew Morton wrote:
> >
> > > On Thu, 1 Dec 2011 12:45:07 +0100
> > > Petr Holasek <pholasek@xxxxxxxxxx> wrote:
> > >
> > > > This patch adds constraints checks into __node_distance() and
> > > > numa_set_distance() functions. If from or to parameters are
> > > > lower than zero, it results into oops now.
> > >
> > > Passing negative numbers into __node_distance() sounds like a bug in
> > > the caller, and this patch will remove our means of detecting that bug.
> >
> > That's true, but upper boundary is checked now, so why not to check lower?
>
> Because it adds more code to the kernel and can hide bugs?
>
> > Seems inconsistent to me - from this point of view even don't check anything
> > would be better for detecting bug in the caller.
> >
> > >
> > > Perhaps we need to be told more about this patch. Is the bug
> > > user-triggerable? If so, how? How was this fault triggered?
> > > Etcetera.
> > >
> >
> > AFAIK, neither __node_distance() nor numa_set_distance() aren't in any
> > path from user-space inputs. Their paramaters are based on ACPI tables
> > provided by HW vendors.
>
> That didn't answer my questions. Have you observed any problems in
> this code? If so, please fully describe them. Or was it purely from
> code inspection?

Code inspection only, nothing from real life.

>
> If what we're doing here is to be defensive against buggy BIOS tables
> (a good idea) then we should validate the BIOS table values as close as
> possible to the point where they were read frmo the BIOS. And we should
> (probably) emit a warning if a bad table entry is detected, rather than
> silently fixing it up.

numa_set_distance() does exactly what you described above, only emits a
warning. I agree with your objections with __node_distance() checks, it
really can hide bugs in caller. So silent fix-up is the main problem and
we shouldn't check anything so the caller will be advised when using
wrong nid by oops with a benefit of less code for us. Do I understand your
opinion on this type of code?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/