Re: CPA patchset

From: Ingo Molnar
Date: Fri Jan 11 2008 - 02:20:34 EST



* Andi Kleen <ak@xxxxxxx> wrote:

> > > > but that's not too smart: why don't they use WB plus clflush
> > > > instead?
> > >
> > > Because they need to access it WC for performance.
> >
> > I think you have it fundamentally backwards: the best for
> > performance is WB + clflush. What would WC offer for performance that
> > clflush cannot do?
>
> Cached requires the cache line to be read first before you can write
> it.

nonsense, and you should know it. It is perfectly possible to construct
fully written cachelines without reading the cacheline first. MOVDQ is
SSE1, so it is available on basically every CPU today - it is 16-byte
aligned and can generate full cacheline writes, _without_ filling in the
cacheline first. Bulk ops (string ops, etc.) will do full cacheline
writes too, without filling in the cacheline. Especially with
high-performance 3D ops we do _NOT_ need any funky reads from anywhere,
because 3D software can stream a lot of writes out: we construct a full
frame or a portion of a frame, or upload vertices, shader scripts,
textures, etc.

( also, _even_ when there is a cache fill pending for a partially
written cacheline, that fill might go on in parallel, and it does not
necessarily hold up the CPU unless the CPU has an actual data dependency
on it. )
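
A minimal sketch of the two write patterns under discussion, not taken from
any driver: filling a buffer one whole cacheline at a time with 16-byte
non-temporal SSE2 stores, and filling it through the cache and then flushing
each line with clflush. The function names, the 64-byte line size and the
choice of the non-temporal store intrinsic are assumptions made for
illustration; the mail does not say which exact instruction sequence was
meant.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_stream_si128, _mm_clflush, ... */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHELINE 64    /* assumed cacheline size in bytes */

/*
 * Hypothetical helper: fill 'len' bytes at 'dst' with 'value' using
 * non-temporal 16-byte stores, four per assumed 64-byte line.
 * 'dst' must be 16-byte aligned and 'len' a multiple of CACHELINE.
 */
static void fill_streaming(void *dst, uint8_t value, size_t len)
{
    char *p = dst;
    __m128i v = _mm_set1_epi8((char)value);

    assert(((uintptr_t)dst % 16) == 0);
    assert(len % CACHELINE == 0);

    for (size_t off = 0; off < len; off += CACHELINE) {
        _mm_stream_si128((__m128i *)(p + off),      v);
        _mm_stream_si128((__m128i *)(p + off + 16), v);
        _mm_stream_si128((__m128i *)(p + off + 32), v);
        _mm_stream_si128((__m128i *)(p + off + 48), v);
    }
    _mm_sfence();       /* order the streaming stores before later stores */
}

/*
 * Hypothetical helper: the "WB + clflush" pattern. Write through the
 * cache with ordinary stores, then flush every touched line.
 * 'len' must be a multiple of CACHELINE.
 */
static void fill_and_flush(void *dst, uint8_t value, size_t len)
{
    char *p = dst;

    assert(len % CACHELINE == 0);

    memset(p, value, len);
    for (size_t off = 0; off < len; off += CACHELINE)
        _mm_clflush(p + off);
    _mm_mfence();       /* wait for the flushes to complete */
}
```

Both helpers leave the same bytes in memory; which one is faster on a given
CPU and memory type is exactly what is being argued here, and the sketch does
not settle it.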

but that's totally beside the point anyway. Whether it uses WC or WB
accesses, if a 3D app or a driver does high-freq change_page_attr()
calls, it will _lose_ the performance game:

> > also, it's irrelevant to change_page_attr() call frequency. Just map
> > in everything from the card and use it. In graphics, if you remap
> > anything on the fly and it's not a slowpath you've lost the
> > performance game even before you began it.
>
> The typical case would be lots of user space DRI clients supplying
> their own buffers on the fly. There's not really a fixed pool in this
> case, but it all varies dynamically. In some scenarios that could
> happen quite often.

in what scenarios? Please give me in-tree examples of such high-freq
change_page_attr() cases, where the driver authors would like to call it
with high frequency but are unable to do it and see performance problems
due to the WBINVD.

Ingo