Re: [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache()

From: Linus Torvalds
Date: Sat Feb 28 2009 - 12:43:07 EST

On Sat, 28 Feb 2009, Arjan van de Ven wrote:
>
> it invalidates all caches in the hierarchy

Yeah, now that I look at the Intel PDFs, I see that.

> afaik this is what Intel cpus do; but I also thought this behavior was
> quite architectural as well...

Ok, in that case I really think we should not use non-temporal stores for
anything smaller than one full page. In fact, I wonder whether any of the
old streaming benchmarks still hold. I thought the data would still stay
in the L3, but yes, it literally seems to make the access totally
noncached and WC (write-combining).
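
To make the "nothing smaller than one full page" rule concrete, here is a
rough user-space sketch. It is not the kernel patch and not the real
__copy_from_user_*nocache() code. The names SKETCH_PAGE_SIZE,
want_nocache() and copy_chunk() are made up for illustration, a 4 KiB page
is assumed, and SSE2 intrinsics stand in for the kernel's own assembly.
The only point is that the decision is made on the 'total' size of the
whole operation, not on the length of the chunk being copied right now.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Assumption: 4 KiB pages, as on common x86 configurations. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical policy helper: consider non-temporal stores only when
 * the whole operation covers at least one full page. */
static int want_nocache(size_t total)
{
    return total >= SKETCH_PAGE_SIZE;
}

/* Hypothetical copy routine. 'len' is this chunk, 'total' is the size
 * of the overall transfer the chunk belongs to. */
static void copy_chunk(void *dst, const void *src, size_t len, size_t total)
{
    int nocache = want_nocache(total);

    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    /* _mm_stream_si128 needs a 16-byte aligned destination. */
    if (nocache && ((uintptr_t)dst % 16) == 0) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        size_t i = 0;

        for (; i + 16 <= len; i += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
            _mm_stream_si128((__m128i *)(d + i), v); /* bypasses caches */
        }
        _mm_sfence();                  /* order the WC stores */
        memcpy(d + i, s + i, len - i); /* cached copy for the tail */
        return;
    }
#endif
    (void)nocache;
    memcpy(dst, src, len);             /* small or unaligned: stay cached */
}
```

With this shape, a 100-byte write() keeps its data in the cache, while a
chunk that is part of a multi-page write may take the streaming path.
Whether one page is the right cutoff is exactly the open question here;
the constant is a placeholder.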

That's almost unacceptable in the long run. With an 8MB L3 cache - and a
compile sequence, do we really want to go out to memory to write the .S
file, and then have the assembler go out to memory to read it back? For a
compile, that _probably_ is all fine (the compiler in particular will have
enough data structures around that it's not going to fit in the cache
anyway), but I'm seeing leaner compilers and other cases where forcing
things out all the way on the bus is simply the wrong thing.

Linus