Re: [RFC 0/6] drm/fences: add in-fences to DRM

From: Daniel Stone
Date: Fri Mar 25 2016 - 08:10:56 EST


Hi all,

On 25 March 2016 at 11:58, Rob Clark <robdclark@xxxxxxxxx> wrote:
> On Thu, Mar 24, 2016 at 7:49 PM, Inki Dae <inki.dae@xxxxxxxxxxx> wrote:
>> It's definitely a different case. This tries to add new user-space interfaces to expose fences to user-space. At least implicit interfaces are embedded into drivers.
>> So I'd like to ask a question: why is exposing fences to user-space required? To provide an easy-to-debug solution for the rendering pipeline? To provide a merge-fence feature?
>
> Well, implicit sync and explicit sync are two different cases.
> Implicit sync ofc remains the default, but userspace could opt-in to
> explicit sync instead. For example, on the gpu side of things,
> depending on flags userspace passes in to the submit ioctl we would
> either attach the fence to all the written buffers (implicit) or
> return it as a fence fd to userspace (explicit), which userspace could
> then pass in to atomic ioctl to synchronize pageflip.
>
> And vice versa, we can pass the pageflip (atomic) completion fence
> back in to gpu so it doesn't start rendering the next frame until the
> buffer is off screen.
>
> fwiw, currently android is the first user of explicit sync (although I
> expect wayland/weston to follow suit).

Second, really. Vulkan avoids implicit sync entirely, and exposes
fence-like primitives throughout its whole API. These include passing
prerequisite fences for display (what Gustavo is adding here:
something to block on before the buffer is displayed), and receiving
another prerequisite fence when the user acquires a buffer as a
render target (the other side of what Gustavo is suggesting, i.e. a
fence which triggers when the buffer is no longer displayed and
becomes available for rendering).

In order to implement this correctly, and avoid performance bubbles,
we need a primitive like this exposed through the KMS API, from both
sides. This is especially important when you take the case of
userspace suballocation, where userspace allocates larger blocks and
divides the allocation internally for different uses. Implicit sync
does not work at all for that case.
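
To make the "both sides" point concrete, here is a rough sketch of
the userspace side using libdrm's atomic helpers. The object and
property IDs (plane_id, crtc_id, prop_fb_id, prop_in_fence_fd,
prop_out_fence_ptr) are placeholders looked up elsewhere via the
usual property enumeration, and the in-fence/out-fence property
semantics are the ones proposed in this series rather than anything
already in mainline; error handling is elided:

  #include <stdint.h>
  #include <xf86drm.h>
  #include <xf86drmMode.h>

  int queue_flip(int drm_fd, uint32_t plane_id, uint32_t crtc_id,
                 uint32_t prop_fb_id, uint32_t prop_in_fence_fd,
                 uint32_t prop_out_fence_ptr, uint32_t fb_id,
                 int render_done_fd, int32_t *out_fence_fd)
  {
      drmModeAtomicReq *req = drmModeAtomicAlloc();
      int ret;

      if (!req)
          return -1;

      /* Point the plane at the newly-rendered framebuffer. */
      drmModeAtomicAddProperty(req, plane_id, prop_fb_id, fb_id);

      /* In-fence: don't scan the buffer out until rendering is done. */
      drmModeAtomicAddProperty(req, plane_id, prop_in_fence_fd,
                               render_done_fd);

      /* Out-fence: the kernel writes back an fd which signals once the
       * previous buffer has left the screen and can be rendered to. */
      drmModeAtomicAddProperty(req, crtc_id, prop_out_fence_ptr,
                               (uint64_t)(uintptr_t)out_fence_fd);

      ret = drmModeAtomicCommit(drm_fd, req, DRM_MODE_ATOMIC_NONBLOCK,
                                NULL);
      drmModeAtomicFree(req);
      return ret;
  }

The fd written to out_fence_fd can then go straight back into the
GPU's submit ioctl, so rendering of the next frame is gated on the
buffer leaving the screen without userspace ever having to block.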

As stated before, there are other benefits, including much better
traceability. I would expect Wayland/Weston to also start pushing
support for this API relatively soon.

> A couple linaro folks have
> android running with an upstream kernel + mesa + atomic/kms hwc on a
> couple devices (nexus7 and db410c with freedreno, and qemu with
> virgl). But there are some limitations due to missing the
> EGL_ANDROID_native_fence_sync extension in mesa. I plan to implement
> that, but I ofc need the fence fd stuff in order to do so ;-)

Yes, having that would be a godsend for a lot of people.
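
For reference, this is roughly what the extension looks like from the
GL side once it is implemented, per the EGL_ANDROID_native_fence_sync
spec (context setup and error handling elided; a sketch, not mesa
code):

  #include <EGL/egl.h>
  #include <EGL/eglext.h>
  #include <GLES2/gl2.h>

  int get_render_fence_fd(EGLDisplay dpy)
  {
      PFNEGLCREATESYNCKHRPROC create_sync =
          (PFNEGLCREATESYNCKHRPROC)eglGetProcAddress("eglCreateSyncKHR");
      PFNEGLDESTROYSYNCKHRPROC destroy_sync =
          (PFNEGLDESTROYSYNCKHRPROC)eglGetProcAddress("eglDestroySyncKHR");
      PFNEGLDUPNATIVEFENCEFDANDROIDPROC dup_fence_fd =
          (PFNEGLDUPNATIVEFENCEFDANDROIDPROC)
              eglGetProcAddress("eglDupNativeFenceFDANDROID");

      /* Create a native fence sync covering the GL work issued so far. */
      EGLSyncKHR sync = create_sync(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID,
                                    NULL);

      /* The underlying fence fd only exists once the commands have been
       * flushed to the hardware. */
      glFlush();

      /* Dup out an fd that can be handed to KMS, or to another device's
       * submit ioctl. Returns EGL_NO_NATIVE_FENCE_FD_ANDROID on error. */
      int fd = dup_fence_fd(dpy, sync);

      destroy_sync(dpy, sync);
      return fd;
  }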

>> And if we really need to expose fences to user-space, and there are real users, then we already have good candidates: DMA-BUF-IOCTL-SYNC, or maybe the fcntl system call, because we already share DMA buffers between CPU <-> DMA and DMA <-> DMA using DMABUF.
>> As for DMA-BUF-IOCTL-SYNC, I think you remember that was what I tried a long time ago, because you were there. Several years ago I tried to couple exposing fences to user-space with cache operations, although at that time I really misunderstood the fence mechanism. That attempt was also aimed at potential users.
>
> Note that this is not (just) about sw sync, but also sync between
> multiple hw devices.

Sync isn't quite good enough, because it's a mandatory blocking point
for userspace. We want to push the explicit fences further down the
line, so userspace can parallelise its work.
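
To put it another way, here is a rough sketch of the difference, with
drm_submit() standing in as a hypothetical driver submit wrapper that
can take an optional in-fence fd (it is not a real API):

  #include <poll.h>

  /* Hypothetical: submit a batch, optionally making the hardware wait
   * on in_fence_fd (-1 for none) before it starts executing. */
  int drm_submit(int dev_fd, void *batch, int in_fence_fd);

  /* Blocking model: wait for the fence in userspace, then submit.
   * The process stalls here even though only the hardware needs to
   * wait, so nothing further can be queued behind this work. */
  void submit_blocking(int dev_fd, void *batch, int fence_fd)
  {
      struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
      poll(&pfd, 1, -1);
      drm_submit(dev_fd, batch, -1);
  }

  /* Explicit-fence model: hand the fd to the kernel and return at
   * once; the wait happens in the driver/hardware, and userspace can
   * keep queueing work behind it. */
  void submit_explicit(int dev_fd, void *batch, int fence_fd)
  {
      drm_submit(dev_fd, batch, fence_fd);
  }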

Even if none of the above requirements held true, I don't think being
able to support Android is a bad thing. It's completely right to be
worried about pushing in Android work and APIs for the sake of it -
which is why we didn't take ADF! - but in this case it's definitely a
good thing. This is also the model that ChromeOS is moving towards, so
it becomes more important from that point of view as well.

Cheers,
Daniel