Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

From: Christian König
Date: Mon Aug 15 2022 - 06:51:16 EST


On 15.08.22 at 12:47, Dmitry Osipenko wrote:
On 8/15/22 13:18, Dmitry Osipenko wrote:
On 8/15/22 13:14, Christian König wrote:
On 15.08.22 at 12:11, Christian König wrote:
On 15.08.22 at 12:09, Dmitry Osipenko wrote:
On 8/15/22 13:05, Christian König wrote:
On 15.08.22 at 11:54, Dmitry Osipenko wrote:
Higher order pages allocated using alloc_pages() aren't refcounted and
they need to be refcounted, otherwise it's impossible to map them by
KVM. This patch sets the refcount of the tail pages and fixes the KVM
memory mapping faults.

Without this change the guest virgl driver can't map host buffers into
the guest and can't provide OpenGL 4.5 profile support to the guest. The
host mappings are also needed for enabling the Venus driver using host
GPU drivers that are utilizing TTM.

Based on a patch proposed by Trigger Huang.
Well, I can't count how often I have repeated this: This is an
absolutely clear NAK!

TTM pages are not reference counted in the first place and because of
this giving them to virgl is illegal.
A? The first page is refcounted when allocated, the tail pages are not.
No, they aren't. The first page is just by coincidence initialized with
a refcount of 1. This refcount is completely ignored and not used at all.

Incrementing the reference count and by this mapping the page into
some other address space is illegal and corrupts the internal state
tracking of TTM.
See this comment in the source code as well:

        /* Don't set the __GFP_COMP flag for higher order allocations.
         * Mapping pages directly into an userspace process and calling
         * put_page() on a TTM allocated page is illegal.
         */

I have absolutely no idea how somebody had the idea he could do this.
I saw this comment, but it doesn't make sense because it doesn't explain
why it's illegal. Hence it looks like a bogus comment since the
refcounting certainly works, at least to some degree, because I haven't
noticed any problems in practice, maybe by luck :)

I'll try to dig out the older discussions, thank you for the quick reply!
Are you sure it was really discussed in public previously? All I can
find is your two answers to similar patches where you're saying that
this is a wrong solution, without an in-depth explanation or further
discussion.

Yeah, that's my problem as well: I can't find it offhand.

But yes it certainly was discussed in public.


Maybe it was discussed privately? In this case I will be happy to get
more info from you about the root of the problem so I could start to
look at how to fix it properly. It's not apparent where the problem is
to a TTM newbie like me.


Well, this is completely unfixable. The whole purpose of TTM is to allow tracking which parts of a buffer object are mapped where.

If you circumvent that and increase the page reference yourself, then that whole functionality can't work correctly any more.

Regards,
Christian.
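
For readers following the refcount argument in this thread, here is a toy userspace model in C of the page state being discussed. It is only a sketch and not kernel code: struct toy_page, toy_alloc_pages() and toy_try_get_page() are hypothetical names made up for illustration. It assumes that a higher-order allocation without __GFP_COMP leaves the first page with a refcount of 1 and the tail pages at 0, and that the consumer takes references in a get_page_unless_zero() style, which is presumably why mapping the tail pages fails.

```c
#include <stddef.h>

/* Toy stand-in for struct page: only the reference count is modeled. */
struct toy_page {
	int refcount;
};

/*
 * Models a higher-order allocation WITHOUT __GFP_COMP (assumption for
 * illustration): the first page starts with refcount 1, the tail pages
 * stay at 0. The caller provides an array of (1 << order) entries.
 */
static void toy_alloc_pages(struct toy_page *pages, unsigned int order)
{
	size_t n = (size_t)1 << order;

	for (size_t i = 0; i < n; i++)
		pages[i].refcount = (i == 0) ? 1 : 0;
}

/*
 * Models a "get the page unless its refcount is zero" style consumer.
 * Returns 1 and takes a reference on success, 0 if the page is not
 * refcounted and therefore cannot be pinned this way.
 */
static int toy_try_get_page(struct toy_page *page)
{
	if (page->refcount == 0)
		return 0;

	page->refcount++;
	return 1;
}
```

With an order-2 allocation, toy_try_get_page() succeeds on entry 0 and fails on entries 1 to 3. The model says nothing about whether taking such a reference is legal for TTM pages; per the replies above, it bypasses TTM's own tracking of what is mapped where.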