Re: [PATCH 0 of 8] x86: use PTE_MASK consistently

From: Linus Torvalds
Date: Tue May 20 2008 - 14:36:29 EST




On Tue, 20 May 2008, Jeremy Fitzhardinge wrote:

> Hugh Dickins wrote:
> > I'll leave it to you and Linus whether your way of defining PTE_MASK is
> > satisfactory as is, or needs to be improved to his way. I've not tried
> > his suggestion of doing the _PAGE_BIT definitions: certainly it's
> > seemed odd to me that they were defined with L, but I've little
> > appetite to mess around with them now myself.
>
> Yes, well, that was me too, with the intention of making ~_PAGE_FOO generate
> an appropriately sized mask. I guess it would be better to use pteval_t these
> days.

The thing is, making _PAGE_FOO be a signed long in no way guarantees any
appropriately sized masks.

Why? Because 'long' mixes with 'unsigned long' and in the C type system,
unsigned is stronger than signed.

As a result, if you depend on sign-extension of things like a binary 'not'
operation, you're always going to have cases where it simply doesn't work,
just because you happen to mix it up with other things that may have type
'unsigned long' (or, on 32-bit, 'unsigned int' which has the same size:
the 'unsigned' is so strong that it will convert to 'long' because it has
the same size.

Try this one in a 32-bit environment:

/*
* 'A' is the low bits mask - and explicitly signed so
* that you can do '~A' to get the high bits of a PTE
*/
#define A 0xfffl

/*
* Some random value that just happens to be unsigned for other
* reasons - perhaps it's a sizeof expression or whatever.
*/
#define B 5U

int main(int argc, char **argv)
{
unsigned long long pte;

pte = B | ~A;
printf("%llx\n", pte);

pte = ~A;
pte |= B;
printf("%llx\n", pte);

pte = B;
pte |= ~A;
printf("%llx\n", pte);
}

and see what it generates. It does the EXACT SAME thing in all three
cases: the expression is "B | ~A" in various guises, but they are not the
same values because the order of the type expansion matters. Note how in
one case the sign bits of the binary not did *not* percolate up to the
upper 32 bits.

So making something signed and just assuming that it means that when it is
expanded to a bigger type the bits will always be set is simply not true.
It's *often* true, but it's untrue in many trivial cases.

If you want the upper bits to behave reliably in a bigger type, then you
have to actually make the values *be* of that type.

Or do people really want to live with a kernel where

pte = B | ~A;

is different from

pte = B;
pte |= ~A;

and making the "obvious" simplification will just generate a totaly
different result?

I don't think we want to continue to rely on that kind of subtle C type
tricks, especially since we know they haven't worked (ie we had this
*exact* issue with mixing in PAGE_MASK that happened to be unsigned, and
changed all the arithmetic for the signed longs that depended on sign
extensions to bigger types).

Obviously, we could just choose to make *everything* be of a signed type,
but we know that doesn't work very well, and it interacts badly with
things like right-shifting (which we do quite a lot on the things
involved: we extract fields like the PFN with a right shift and a mask).

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/