Re: [PATCH v2 1/2] drm/ttm: Only allocate huge pages with new flag TTM_PAGE_FLAG_TRANSHUGE

From: Ilia Mirkin
Date: Sat Apr 28 2018 - 12:30:54 EST


On Fri, Apr 27, 2018 at 9:08 AM, Michel Dänzer <michel@xxxxxxxxxxx> wrote:
> From: Michel Dänzer <michel.daenzer@xxxxxxx>
>
> Previously, TTM would always (with CONFIG_TRANSPARENT_HUGEPAGE enabled)
> try to allocate huge pages. However, not all drivers can take advantage
> of huge pages, yet they would incur the overhead of allocating and
> freeing them anyway.
>
> Now, drivers which can take advantage of huge pages need to set the new
> flag TTM_PAGE_FLAG_TRANSHUGE to get them. Drivers not setting this flag
> no longer incur any overhead for allocating or freeing huge pages.
>
> v2:
> * Also guard swapping of consecutive pages in ttm_get_pages
> * Reword commit log, hopefully clearer now
>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Michel Dänzer <michel.daenzer@xxxxxxx>

I am still seeing plenty of issues with this as late as 4.16.4, and
based on reports, so are lots of other people. Admittedly I'm on
nouveau, but others have reported issues with radeon/amdgpu as well.
It's been going on since the feature was merged in v4.15, with what
seems like little investigation from the feature's authors.

We now have *two* broken releases, v4.15 and v4.16 (anything that
spews error messages and stack traces ad infinitum in dmesg is, by
definition, broken). You're putting this behind a flag now (finally),
but should it be enabled anywhere? Why is it being flipped on for
amdgpu by default, despite the still-existing problems?

Reverting this feature without simply resetting the code back to
v4.14 is painful, but why make Joe User suffer by enabling it while
you're still working out the kinks?

-ilia