Re: [PATCH] RFC: clear 1G pages with streaming stores on x86
From: Cannon Matthews
Date: Tue Jul 24 2018 - 22:50:41 EST
On Tue, Jul 24, 2018 at 1:53 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, 24 Jul 2018 13:46:39 -0700 Cannon Matthews <cannonmatthews@xxxxxxxxxx> wrote:
>
> > Reimplement clear_gigantic_page() to clear gigabyte pages using the
> > non-temporal streaming store instructions that bypass the cache
> > (movnti), since an entire 1GiB region will not fit in the cache anyway.
> >
> > ...
> >
> > Tested:
> > Time to `mlock()` a 512GiB region on broadwell CPU
> >                     AVG time (s)   % imp.   ms/page
> > clear_page_erms     133.584        -        261
> > clear_page_nt       34.154         74.43%   67
>
> A gigantic improvement!
>
> > --- a/arch/x86/include/asm/page_64.h
> > +++ b/arch/x86/include/asm/page_64.h
> > @@ -56,6 +56,9 @@ static inline void clear_page(void *page)
> >
> > void copy_page(void *to, void *from);
> >
> > +#define __HAVE_ARCH_CLEAR_GIGANTIC_PAGE
> > +void __clear_page_nt(void *page, u64 page_size);
>
> Nit: the modern way is
>
> #ifndef __clear_page_nt
> void __clear_page_nt(void *page, u64 page_size);
> #define __clear_page_nt __clear_page_nt
> #endif
>
> Not sure why, really. I guess it avoids adding two symbols and
> having to remember and maintain the relationship between them.
>
That makes sense, changed to this style. Thanks.
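
As a rough illustration of that pattern (not the patch itself), here is a
minimal userspace sketch. The name zero_region and both bodies are
hypothetical stand-ins: the first definition plays the role of the arch
header, the second the generic fallback that is compiled only when no
override has been announced.

```c
#include <stddef.h>
#include <string.h>

/*
 * "Arch" side: provide the optimized version, then define a macro with
 * the same name so later headers can detect it. A macro that expands to
 * its own name is not expanded again, so callers are unaffected.
 * zero_region is an illustrative name, not a kernel symbol.
 */
static inline void zero_region(void *p, size_t len)
{
	/* Stand-in for an arch-specific implementation. */
	memset(p, 0, len);
}
#define zero_region zero_region

/*
 * "Generic" side: only supply the fallback if nobody defined the
 * symbol above. With the #define in place this block is skipped.
 */
#ifndef zero_region
static inline void zero_region(void *p, size_t len)
{
	unsigned char *c = p;

	while (len--)
		*c++ = 0;
}
#define zero_region zero_region
#endif
```

The single name doubles as its own feature flag, so there is no separate
__HAVE_ARCH_* symbol to keep in sync with the function.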
> > --- /dev/null
> > +++ b/arch/x86/lib/clear_gigantic_page.c
> > @@ -0,0 +1,30 @@
> > +#include <asm/page.h>
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/mm.h>
> > +#include <linux/sched.h>
> > +
> > +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
> > +#define PAGES_BETWEEN_RESCHED 64
> > +void clear_gigantic_page(struct page *page,
> > + unsigned long addr,
> > + unsigned int pages_per_huge_page)
> > +{
> > + int i;
> > + void *dest = page_to_virt(page);
> > + int resched_count = 0;
> > +
> > + BUG_ON(pages_per_huge_page % PAGES_BETWEEN_RESCHED != 0);
> > + BUG_ON(!dest);
> > +
> > + might_sleep();
>
> cond_resched() already does might_sleep() - it doesn't seem needed here.
Ah gotcha, removed it. The original implementation called both, which
does seem redundant.
>
> > + for (i = 0; i < pages_per_huge_page; i += PAGES_BETWEEN_RESCHED) {
> > + __clear_page_nt(dest + (i * PAGE_SIZE),
> > + PAGES_BETWEEN_RESCHED * PAGE_SIZE);
> > + resched_count += cond_resched();
> > + }
> > + /* __clear_page_nt requires an `sfence` barrier. */
> > + wmb();
> > + pr_debug("clear_gigantic_page: rescheduled %d times\n", resched_count);
> > +}
> > +#endif
>
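
For anyone who wants to experiment outside the kernel, the loop above can
be approximated in userspace. This is only a sketch under assumptions:
clear_nt and clear_region_chunked are hypothetical names, the
_mm_stream_si64 intrinsic stands in for the movnti loop in
__clear_page_nt, and non-x86-64 builds fall back to memset. The
destination is assumed to be 8-byte aligned with a length that is a
multiple of 8.

```c
#include <stddef.h>
#include <string.h>
#if defined(__x86_64__)
#include <emmintrin.h>	/* _mm_stream_si64 (movnti), _mm_sfence */
#endif

/* Work per chunk, mirroring PAGES_BETWEEN_RESCHED * PAGE_SIZE (64 * 4KiB). */
#define CHUNK_BYTES (64 * 4096UL)

/* Zero len bytes at dst using non-temporal stores where available. */
static void clear_nt(void *dst, size_t len)
{
#if defined(__x86_64__)
	long long *p = dst;

	for (size_t i = 0; i < len / sizeof(*p); i++)
		_mm_stream_si64(&p[i], 0);	/* store that bypasses the cache */
#else
	memset(dst, 0, len);			/* portable fallback */
#endif
}

/* Clear a large region chunk by chunk; returns the number of chunks. */
static size_t clear_region_chunked(void *dst, size_t len)
{
	size_t chunks = 0;

	for (size_t off = 0; off < len; off += CHUNK_BYTES) {
		size_t n = len - off < CHUNK_BYTES ? len - off : CHUNK_BYTES;

		clear_nt((char *)dst + off, n);
		chunks++;
		/* The kernel version calls cond_resched() at this point. */
	}
#if defined(__x86_64__)
	/* Streaming stores are weakly ordered; fence once at the end. */
	_mm_sfence();
#endif
	return chunks;
}
```

Issuing the fence once after the loop, rather than per chunk, matches the
single wmb() in the patch.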