Re: [PATCH] drm/mgag200: Flush the cache to improve latency

From: Daniel Vetter
Date: Tue Feb 06 2024 - 08:30:49 EST


On Mon, Feb 05, 2024 at 05:02:58PM +1000, Dave Airlie wrote:
> On Mon, 6 Nov 2023 at 20:47, Jocelyn Falempe <jfalempe@xxxxxxxxxx> wrote:
> >
> > On 23/10/2023 10:30, Jocelyn Falempe wrote:
> > > On 20/10/2023 14:06, Thomas Zimmermann wrote:
> > >> (cc'ing lkml for feedback)
> > >>
> > >> Hi Jocelyn
> > >>
> > >> Am 19.10.23 um 15:55 schrieb Jocelyn Falempe:
> > >>> We found a regression in v5.10 on a real-time server, using the
> > >>> rt-kernel and the mgag200 driver. It's a really specialized
> > >>> workload, with a <10us latency expectation on an isolated core.
> > >>> After v5.10, the real-time tasks miss their <10us latency target
> > >>> whenever something prints on the screen (fbcon or printk).
> > >>
> > >> I'd like to hear the opinion of the RT-devs on this patch. Because
> > >> AFAIK we never did such a workaround in other drivers. And AFAIK
> > >> printk is a PITA anyway.
> > >
> > > Most other drivers use DMA, which means this workaround can't
> > > apply to them.
> > >
> > >>
> > >> IMHO if that RT system cannot handle differences in framebuffer
> > >> caching, it's under-powered. It's just a matter of time until
> > >> something else changes and the problem returns. And (honest question)
> > >> as it's an x86-64, how do they handle System Management Mode?
> > >
> > > I think it's not big news that the Matrox G200 from 1999 is
> > > under-powered.
> > > I was also a bit surprised that flushing the cache would have such
> > > an effect on latency. The tests we run can go for 24h with the
> > > workaround without any interrupt taking more than 10us. Without
> > > the workaround, an interrupt misses its 10us target every ~30s.
> > >
> > >>
> > >>>
> > >>> The regression has been bisected to 2 commits:
> > >>> 0b34d58b6c32 ("drm/mgag200: Enable caching for SHMEM pages")
> > >>> 4862ffaec523 ("drm/mgag200: Move vmap out of commit tail")
> > >>>
> > >>> The first one changed the system-memory framebuffer from
> > >>> Write-Combine to the default caching.
> > >>> Before the second commit, the mgag200 driver used to unmap the
> > >>> framebuffer after each frame, which implicitly flushes the cache.
> > >>> Both regressions are fixed by the following patch, which forces a
> > >>> cache flush after each frame, mostly restoring the v5.9 behavior.
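For reference, the idea of the workaround boils down to something like
the sketch below. This is a sketch only, not the patch itself: 'vaddr'
and 'size' stand in for the vmapped shadow-buffer mapping, and the
helper name is made up. drm_clflush_virt_range() flushes the CPU caches
covering a kernel virtual range, roughly restoring the implicit flush
that the pre-v5.10 vunmap used to provide.

    #include <drm/drm_cache.h>

    /* Sketch only: flush the CPU caches covering the vmapped shadow
     * buffer once the frame has been copied to the device. Names are
     * made up; this is not the actual patch.
     */
    static void flush_shadow_fb(void *vaddr, size_t size)
    {
            drm_clflush_virt_range(vaddr, size);
    }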
> > >>
> > >> With that second commit, we essentially never unmap an active
> > >> framebuffer console. But with commit
> > >>
> > >> 359c6649cd9a ("drm/gem: Implement shadow-plane {begin, end}_fb_access
> > >> with vmap")
> > >>
> > >> we now again unmap the console framebuffer after each pageflip.
> > >>
> > >> So how does the latest kernel behave with respect to the problem?
> > >
> > > The regression was found when upgrading the server from v5.4 to
> > > v5.14, so we didn't test with later kernels.
> > > We will test with v6.3 (which should have 359c6649cd9a) and see
> > > how it behaves.
> >
> > I don't have a clear explanation, but testing with v6.3 and forcing
> > Write-Combine doesn't fix the latency issue, so forcing the cache
> > flush is still needed.
> >
> > Also, some systems use an "isolated cpu" to handle RT tasks, but
> > with a standard kernel (so without CONFIG_PREEMPT_RT).
> > So I'm wondering if we can use a kernel module parameter for this,
> > so that users who want to achieve low latency can opt in?
> >
> > Something like mgag200.force_cache_flush=1 or mgag200.low_latency=1?
>
> I think we should either add a config option or command line parameter here.

Yeah, I think the situation is cursed enough that we need that, and a
module option for mgag200 sounds like the most reasonable way.
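A minimal sketch of what such an opt-in could look like in the driver
(the parameter name, default and wiring are assumptions, not merged
code):

    #include <linux/module.h>

    /* Hypothetical opt-in knob along the lines suggested above. */
    static bool force_cache_flush; /* off by default */
    module_param(force_cache_flush, bool, 0444);
    MODULE_PARM_DESC(force_cache_flush,
                     "Flush CPU caches after each frame to reduce RT latency (default: off)");

That would be enabled with mgag200.force_cache_flush=1 on the kernel
command line, or force_cache_flush=1 at modprobe time.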

> I don't think adding nopat to the kernel command line is a good
> suggestion in the long run. Servers often have other cards plugged
> into them, like nvidia gpus or rdma hardware, and you don't want to
> cripple those because you want reduced latency on the crappy on-board
> chip.
>
> I'd rather we put the default back to what it used to be, though,
> which was flushing the cache. I'm not sure why we have any objection
> to doing that; it used to work and was clearly fine in operation, so
> why undo it?

Uh, that default is a few patches back, and I think it's not great if
we change it, since it means all drivers using shadow buffers for fbdev
will again suffer from rendering fbcon into a WC instead of a WB
buffer.
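(For anyone following along: the WB-vs-WC distinction here is just a
mapping attribute on the same pages. Illustrative sketch only, not
driver code:)

    #include <linux/vmalloc.h>

    /* Illustrative: map the same pages cached (WB) vs write-combined
     * (WC); fbcon rendering into the WC mapping is what hurts.
     */
    static void map_both_ways(struct page **pages, unsigned int npages)
    {
            void *wb = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
            void *wc = vmap(pages, npages, VM_MAP,
                            pgprot_writecombine(PAGE_KERNEL));
            /* ... use, then vunmap(wb); vunmap(wc); */
    }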

But Jocelyn has meanwhile dug out more info in another reply; I'll
reply there.

Cheers, Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch