Re: Unsigned widening casts of binary "not" operations..

From: Linus Torvalds
Date: Mon Apr 22 2013 - 20:23:13 EST


[ Ugh, resending because I had mistakenly set the html bit and all the
mailing lists just refused the original... ]

So I was playing with sparse again, because the MIPS people had hit a
bug with the fact that they had made PAGE_MASK an *unsigned* type, and
then doing the ~ (binary not) operation on it does the wrong thing
when you operate on bigger types, like hardware 36-bit physical
addresses.

See commit 3b5e50edaf50 (and commit c17a6554 that it reverts).

So the issue is that let's say that you have a constant (or variable,
for that matter) that is of type "unsigned int", and then you use that
to mask a variable of a larger size (say it's an "unsigned long" on a
64-bit arch, or it's a phys_addr_t on a 32-bit arch with PAE). So the
code looks basically like a variation of something like this:

u64 value = ...;

value &= ~bitmask;

What happens?

What happens is that the "~bitmask" is done in the *narrower* type,
and then - because the narrower type is unsigned - the cast to the
wider type is done as an *unsigned* cast, so what you *think* happens
is that it clears the bits that are set in "bitmask", but what
*actually* happens is that yes, you clear the bits that are set in
:bitmask", but you *also* clear the upper bits of value.

Now, why am I posting about this MIPS-specific small detail on to x86 people?

Because I hacked up a sparse patch that looks for the pattern of
unsigned casts of a (bit) negation to a wider type, and there's a fair
number of them.

Now, it may well be that my sparse hack (and it really is a hack) is a
broken piece of crap, but from a quick look the warnings it gives look
reasonable.

And there's quite a lot of them. Even in my (fairly small) config I
use on my desktop. And the first warnings I see are in x86 code:

arch/x86/kernel/traps.c:405:16: warning: implicit unsigned
widening cast of a '~' expression
arch/x86/kernel/cpu/perf_event_p4.c:912:15: warning: implicit
unsigned widening cast of a '~' expression

that particular one is something that has an *explicit* cast to "u64",
and then it does binary 'and' operations with these things that are of
a 32-bit unsigned type with a binary not in front of them. IOW, the
types in that expression are *very* confused.

Here's a ext4 code snippet that looks like an actual bug (but seems to
only hit read-ahead):

ext4_fsblk_t b, block;

b = block & ~(EXT4_SB(sb)->s_inode_readahead_blks-1);

where "b" actually ends up having the upper bits cleared, because the
s_inode_readahead_blks thing is an unsigned int, so you're masking off
not just the low bits, but the high bits too. Ted? Of course, it's
just read-ahead, so it probably doesn't matter, but.

We have a number of generic code examples where this kind of thing
seems harmless:

*vm_flags &= ~VM_MERGEABLE;

(we only have flags in the low 32 bits, and the only reason we get
warnings for VM_MERGEABLE is because 0x80000000 is an implicitly
unsigned constant, while 0x40000000 is *not*), or

kernel/trace/trace.c:2910:32: warning: implicit unsigned widening
cast of a '~' expression

where "trace_flags" is "unsigned long" and "mask" is "unsigned int",
and the expression

trace_flags &= ~mask;

actually clears the upper 32 bits too, but presumably they are always
clear anyway so we don't *really care. But it's an example of code
that is potentially very subtly dangerous.

Now, a lot of the other warnings I get seem to be more benign - the
networking code results in a lot of these because of code like this:

#define NLMSG_ALIGNTO 4U
#define NLMSG_ALIGN(len) ( ((len)+NLMSG_ALIGNTO-1) & ~(NLMSG_ALIGNTO-1) )

which technically has the same problem if "len" is 64-bit (which
sizeof() is on x86-64), but we don't really *care* because it turns
out that the sizeof values will always have the high bits clear
*anyway*. So I'm not sure the hacky sparse warning is useful, because
my code isn't smart enough to figure out when this kind of widening
cast is a problem, and when it isn't.

That said, I'm cc'ing David and netdev too, just in case. There is
likely some *reason* why it uses an "unsigned int" for a constant that
is then commonly expanded with a binary "not" and the upper bits end
up being surprising. So this thing doesn't look like a bug, but it
does cause these warnings:

net/netlink/af_netlink.c:1889:38: warning: implicit unsigned
widening cast of a '~' expression

and maybe the networking people care about this and maybe they don't.

(It turns out that any use of "UINT_MAX" for a "long" value also
results in this warning, because we define it as "~0ul", so there are
other cases of spurious things where we *intentionally* drop the high
bits).

ANYWAY.

Only a few of the warnings looked like they might be bugs, but I'm
attaching my sparse patch here in case somebody wants to play with the
hacky thing..

Linus

Attachment: sparse.diff
Description: Binary data