Re: Interacting with coherent memory on external devices

From: Jerome Glisse
Date: Fri Apr 24 2015 - 12:01:03 EST


On Fri, Apr 24, 2015 at 09:30:40AM -0500, Christoph Lameter wrote:
> On Thu, 23 Apr 2015, Paul E. McKenney wrote:
>
> > If by "entire industry" you mean everyone who might want to use hardware
> > acceleration, for example, including mechanical computer-aided design,
> > I am skeptical.
>
> The industry designs GPUs with super fast special ram and accelerators
> with special ram designed to do fast searches and you think you can demand page
> that stuff in from the main processor?
>

Why do you think AMD and NVIDIA are adding page fault support to their GPUs
in the first place? They are not doing this on a whim; they have thought
about it carefully.

Are you saying you know better than the two biggest GPU designers on the
planet? And who do you think is pushing for this in the kernel? Do you think
we are working on this on a whim, because we woke up one day and thought that
it would be cool and that it should be done this way?


Yes, if all your GPU does is page fault, it will be disastrous, but is that
what we usually see on the CPU? No! Are people complaining about the numerous
page faults that happen over a day? No, the vast majority of users are
completely oblivious to page faults. This is how it works on the CPU, and yes,
it can work for the GPU too. What happens on the CPU? The CPU can switch to
a different thread or a different application altogether. The same thing will
happen on the GPU. If you have enough jobs, your GPU will stay busy and you
will never worry about page faults, because overall your GPU will deliver the
same kind of throughput as if there were no page faults. Their cost can very
well be buried in the overall noise if the ratio of runnable threads to
page-faulting threads is high enough, which is the case most of the time on
the CPU. Why would the same assumption not hold on the GPU?
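
To put rough numbers on that ratio argument, here is a toy model in Python.
It is only a sketch under idealized assumptions (faults overlap perfectly
with other work, switching threads is free, all threads behave the same).
The function name and the time units are made up for illustration and do not
describe any real GPU or driver.

```python
def utilization(threads, compute_time, fault_time):
    """Fraction of time the device is busy in an idealized latency-hiding model.

    Each thread alternates between `compute_time` units of useful work and
    `fault_time` units of waiting on a page fault. While one thread waits,
    another runnable thread is assumed to take over at no cost.
    """
    if threads <= 0 or compute_time <= 0:
        return 0.0
    # One thread keeps the device busy for compute_time out of every
    # (compute_time + fault_time) units; N threads can fill up to N times
    # that share, capped at a fully busy device.
    busy_share = threads * compute_time / (compute_time + fault_time)
    return min(1.0, busy_share)


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        print(n, "threads ->", utilization(n, compute_time=1.0, fault_time=9.0))
```

In this model a single thread that computes for 1 unit and then waits 9 units
on a fault keeps the device 10% busy, while ten such threads hide the fault
latency completely. Real hardware will not be this clean, but it shows why
the ratio is what matters.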

Note that I am not dismissing the low-latency folks. I know they exist, I
know they hate page faults, and in no way will what we propose make things
worse for them. They will be able to keep the same kind of control they
cherish, but this does not mean you should go on a holy crusade to pretend
that other people's workloads do not exist. They do exist. Page faults are not
evil, and they have proved useful to the whole computer industry on the CPU.


To be sure you are not misinterpreting what we propose: in no way are we
saying we are going to migrate things on page fault for everyone. We are
saying that, first, the device driver decides where things need to be (system
memory or local memory). The device driver can get hints/requests from
userspace for this (as it does today). So no change whatsoever here; people
who hand-tune things will keep being able to do so.

Now we want to add the case where the device driver does not get any kind of
directive or hint from userspace. So, as autonuma does, simply collect
information from the GPU on what is accessed often and then migrate it
transparently (yes, this can happen without interrupting the GPU). So you
are migrating from memory that has 16GB/s or 32GB/s of bandwidth to device
memory that has 500GB/s.
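
As a rough illustration of that policy, here is a minimal Python sketch. It
is not how any driver or autonuma implements this. The function name, the
per-page access counters, the threshold and the "pinned" set standing in for
userspace hints are all hypothetical, chosen only to show the idea: pages
that userspace placed explicitly are left alone, and among the rest the most
frequently accessed ones are picked for migration to device memory.

```python
def pick_pages_to_migrate(access_counts, pinned_to_system, threshold,
                          free_device_pages):
    """Return page addresses to move into device memory, hottest first.

    access_counts     -- dict: page address -> accesses seen from the device
                         (hypothetical counters collected from the GPU)
    pinned_to_system  -- set of pages userspace asked to keep in system
                         memory; hand-tuned placement is never overridden
    threshold         -- minimum access count for a page to count as hot
    free_device_pages -- how many pages still fit in device memory
    """
    candidates = [
        (page, count)
        for page, count in access_counts.items()
        if count >= threshold and page not in pinned_to_system
    ]
    # Hottest pages first; ties are broken by address so the result is stable.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    limit = max(free_device_pages, 0)
    return [page for page, _ in candidates[:limit]]


if __name__ == "__main__":
    counts = {0x1000: 50, 0x2000: 3, 0x3000: 120, 0x4000: 50, 0x5000: 900}
    pinned = {0x5000}  # userspace said: keep this one in system memory
    chosen = pick_pages_to_migrate(counts, pinned, threshold=10,
                                   free_device_pages=2)
    print([hex(page) for page in chosen])
```

With these made-up numbers the page at 0x5000 is the hottest but stays put
because of the hint, 0x2000 is too cold, and the two free slots go to 0x3000
and 0x1000. Going by the bandwidth figures above, each migrated page moves to
memory that is roughly 15 to 30 times faster for the device.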

This is a valid use case. There are many people out there who do not want
to learn about hand-tuning their applications for the GPU, but who could
nonetheless benefit from it.

Cheers,
Jérôme