Re: CPA patchset

From: Andi Kleen
Date: Fri Jan 11 2008 - 06:26:22 EST


> It is perfectly possible to construct
> fully written cachelines, without reading the cacheline first. MOVDQ is

Only if you write an aligned, full 64 (or 128) byte area, and even then you
can get occasional reads, which can be either painfully slow or even incorrect.
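
A minimal user-space sketch of such a full-line write, assuming SSE2,
a 64-byte line size and non-temporal stores. The helper name
fill_cacheline_nt and the CACHELINE constant are made up for illustration,
and nothing here guarantees the CPU never reads the line:

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_set1_epi8, _mm_stream_si128 */
#include <stdint.h>

#define CACHELINE 64    /* assumed line size; some CPUs use 128 */

/* Overwrite one whole, aligned cacheline using non-temporal stores. */
static void fill_cacheline_nt(void *dst, uint8_t value)
{
    /* The whole point is a full aligned line, so insist on alignment. */
    assert(((uintptr_t)dst % CACHELINE) == 0);

    __m128i v = _mm_set1_epi8((char)value);
    __m128i *p = (__m128i *)dst;

    /* Four 16-byte streaming stores cover the 64-byte line. */
    for (int i = 0; i < CACHELINE / 16; i++)
        _mm_stream_si128(&p[i], v);

    /* Order the streaming stores before any later stores. */
    _mm_sfence();
}
```

Anything less than the full aligned line (a partial write, or a misaligned
start) no longer qualifies, which is why the alignment check is there.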

> but that's totally besides the point anyway. WC or WB accesses, if a 3D
> app or a driver does high-freq change_page_attr() calls, it will _lose_
> the performance game:

Yes, high frequency as in doing it in fast paths is not a good idea, but
reasonably low frequency (as in acceptable process exit latencies, for
example) is something to aim for. Right now, with WBINVD and other problems,
it is too slow.

> > > in everything from the card and use it. In graphics, if you remap
> > > anything on the fly and it's not a slowpath you've lost the
> > > performance game even before you began it.
> >
> > The typical case would be lots of user space DRI clients supplying
> > their own buffers on the fly. There's not really a fixed pool in this
> > case, but it all varies dynamically. In some scenarios that could
> > happen quite often.
>
> in what scenarios? Please give me in-tree examples of such high-freq
> change_page_attr() cases, where the driver authors would like to call it
> with high frequency but are unable to do it and see performance problems
> due to the WBINVD.

Some workloads do regular mapping into the GART aperture, but it is
not too critical yet.

It is not widely used because it is too slow, but I've had requests
from various parties over the years for a more efficient c_p_a().
It's a chicken'n'egg problem -- you're asking for users, but the users
don't use it yet because it's too slow.

-Andi