Re: [PATCH] mm: clear 1G pages with streaming stores on x86

From: Arvind Sankar
Date: Wed Mar 11 2020 - 14:32:48 EST


On Wed, Mar 11, 2020 at 11:16:07AM +0300, Kirill A. Shutemov wrote:
> On Tue, Mar 10, 2020 at 11:35:54PM -0400, Arvind Sankar wrote:
> >
> > The rationale for MOVNTI instruction is supposed to be that it avoids
> > cache pollution. Aside from the bench that shows MOVNTI to be faster for
> > the move itself, shouldn't it have an additional benefit in not trashing
> > the CPU caches?
> >
> > As string instructions improve, why wouldn't the same improvements be
> > applied to MOVNTI?
>
> String instructions inherently more flexible. Implementation can choose
> caching strategy depending on the operation size (cx) and other factors.
> Like if operation is large enough and cache is full of dirty cache lines
> that expensive to free up, it can choose to bypass cache. MOVNTI is more
> strict on semantics and more opaque to CPU.

But with today's processors, wouldn't writing 1G via the string
operations empty out almost the whole cache? Or are there already
optimizations to prevent one thread from hogging the L3?

If we do want to just use the string operations, it seems like the
clear_page routines should just call memset instead of duplicating it.

>
> And more importantly string instructions, unlike MOVNTI, is something that
> generated often by compiler and used in standard libraries a lot. It is
> and will be focus of optimization of CPU architects.
>
> --
> Kirill A. Shutemov