Re: [PATCH] numa: fix /proc/<pid>/numa_maps on s390
From: Andrew Morton
Date: Thu Jan 21 2016 - 14:19:27 EST
On Thu, 21 Jan 2016 15:57:32 +0100 Gerald Schaefer <gerald.schaefer@xxxxxxxxxx> wrote:
> > --- a/fs/proc/task_mmu.c~numa-fix-proc-pid-numa_maps-on-s390-fix
> > +++ a/fs/proc/task_mmu.c
> > @@ -1523,6 +1523,7 @@ static int gather_pte_stats(pmd_t *pmd,
> > pte_t *pte;
> >
> > ptl = pmd_trans_huge_lock(pmd, vma);
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > if (ptl) {
> > pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
> > struct page *page;
> > @@ -1534,6 +1535,7 @@ static int gather_pte_stats(pmd_t *pmd,
> > spin_unlock(ptl);
> > return 0;
> > }
> > +#endif
> >
> > if (pmd_trans_unstable(pmd))
> > return 0;
>
> Hi Andrew,
>
> Unfortunately this seems to be a lot more complicated than we thought.
> huge_ptep_get() is only defined when CONFIG_HUGETLB_PAGE=y, independent
> from CONFIG_TRANSPARENT_HUGEPAGE. This is because asm/hugetlb.h is only
> included from linux/hugetlb.h when CONFIG_HUGETLB_PAGE=y.
>
> So this fix won't fix the build error when CONFIG_HUGETLBFS=n and
> CONFIG_TRANSPARENT_HUGEPAGE=y.
>
> Since the THP code did not repeat the flaws of the hugtelbfs code, i.e.
> it is actually working on PMD entries and not PTE entries, there was
> no need for huge_ptep_get() for THP so far.
>
> Now it seems that the THP code in gather_pte_stats() is an exception to
> this, as it is not working on a PMD like the rest of the THP code, but
> also on a fake "PTE" like the hugetlbfs code.
>
> I guess this needs more thinking, two options are crossing my mind:
> - Fix the THP code in gather_pte_stats() to properly use a PMD instead of
> PTE. This would probably require something like a "_pmd" version of
> "can_gather_numa_stats()" and a pmd_dirty() check for the
> gather_stats() parameter.
> - Make huge_ptep_get() also available for CONFIG_HUGETLBFS=n, perhaps
> by introducing something like HAVE_ARCH_HUGE_PTEP_GET and implementing
> the default NOP version in linux/hugetlbfs.h instead of the individual
> asm/hugetlbfs.h files for all archs.
>
> The first option seems more correct, but it might entail other problems.
> The second option would also introduce new problems on s390, where the
> implementation of huge_ptep_get() in arch/s390/mm/hugetlbpage.c is currently
> only built with CONFIG_HUGETLBFS=y, but I guess we could handle that.
>
> Any thoughts / more ideas?
>
The first option does of course sound better.
But you need numa_maps fixed, presumably in 4.5 and possibly backported
into -stable? (The changelog doesn't describe the end-user-visible
effects of the bug. Naughty changelog!)
So is there some minimal thing we can do for now to get things working
properly and fix it for-real in 4.6?