RE: [PATCH 00/83] AMD HSA kernel driver

From: Bridgman, John
Date: Tue Jul 15 2014 - 13:07:14 EST

>-----Original Message-----
>From: Dave Airlie [mailto:airlied@xxxxxxxxx]
>Sent: Tuesday, July 15, 2014 12:35 AM
>To: Christian KÃnig
>Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
>kernel@xxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx; Deucher,
>Alexander; akpm@xxxxxxxxxxxxxxxxxxxx
>Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>On 14 July 2014 18:37, Christian KÃnig <deathsimple@xxxxxxxxxxx> wrote:
>>> I vote for HSA module that expose ioctl and is an intermediary with
>>> the kernel driver that handle the hardware. This gives a single point
>>> for HSA hardware and yes this enforce things for any hardware
>>> I am more than happy to tell them that this is it and nothing else if
>>> they want to get upstream.
>> I think we should still discuss this single point of entry a bit more.
>> Just to make it clear the plan is to expose all physical HSA capable
>> devices through a single /dev/hsa device node to userspace.
>This is why we don't design kernel interfaces in secret foundations, and
>expect anyone to like them.

Understood and agree. In this case though this isn't a cross-vendor interface designed by a secret committee, it's supposed to be more of an inoffensive little single-vendor interface designed *for* a secret committee. I'm hoping that's better ;)

>So before we go any further, how is this stuff planned to work for multiple

Three classes of "multiple" :

1. Single CPU with IOMMUv2 and multiple GPUs:

- all devices accessible via /dev/kfd
- topology information identifies CPU + GPUs, each has "node ID" at top of userspace API, "global ID" at user/kernel interface
(don't think we've implemented CPU part yet though)
- userspace builds snapshot from sysfs info & exposes to HSAIL runtime, which in turn exposes the "standard" API
- kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2 (fast for APU, relatively less so for dGPU over PCIE)
- to-be-added memory operations allow allocation & residency control (within existing gfx driver limits) of buffers in VRAM & carved-out system RAM
- queue operations specify a node ID to userspace library, which translates to "global ID" before calling kfd

2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or more GPUs:

- topology information exposes CPUs & GPUs, along with affinity info showing what is connected to what
- everything else works as in (1) above

3. Multiple CPUs not connected via fabric (eg a blade server) each with 0 or more GPUs

- no attempt to cover this with HSA topology, each CPU and associated GPUs is accessed independently via separate /dev/kfd instances

>Do we have a userspace to exercise this interface so we can see how such a
>thing would look?

Yes -- initial IP review done, legal stuff done, sanitizing WIP, hoping for final approval this week

There's a separate test harness to exercise the userspace lib calls, haven't started IP review or sanitizing for that but legal stuff is done

N‹§²æ¸›yú²X¬¶ÇvØ–)Þ{.nlj·¥Š{±‘êX§¶›¡Ü}©ž²ÆzÚj:+v‰¨¾«‘êZ+€Êzf£¢·hšˆ§~†­†Ûÿû®w¥¢¸?™¨è&¢)ßf”ùy§m…á«a¶Úÿ 0¶ìå