Re: [RFC][PATCH 0/6] more detailed per-process transparent hugepage statistics

From: Andrea Arcangeli
Date: Tue Feb 01 2011 - 19:08:12 EST


On Tue, Feb 01, 2011 at 12:56:41PM -0800, Dave Hansen wrote:
> On Tue, 2011-02-01 at 21:39 +0100, Andrea Arcangeli wrote:
> > So now the speedup
> > from hugepages needs to also offset the cost of the more frequent
> > split/collapse events that didn't happen before.
>
> My concern here is the downward slope. I interpret that as saying that
> we'll eventually have _zero_ THPs. Plus, the benefits are decreasing
> constantly, even though the scanning overhead is fixed (or increasing
> even).

It doesn't look like a downward slope to me; it seems to level out at
900000 pages. As shown by the other chart, a faster khugepaged scan
rate would make it level out at a higher percentage of memory being
huge, at an increased cost in khugepaged, but it may very well pay off
for the final performance (especially on many-core systems).

> I guess we could also try and figure out whether the khugepaged CPU
> overhead really comes from the scanning or the collapsing operations
> themselves. Should be as easy as some oprofiling.

Actually I already know: the scanning is super fast, so it's no big
deal to increase the scan rate. It only becomes a big deal if there
are plenty more collapse/split events. Compared to the KSM scan, the
khugepaged scan costs nothing.
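
To give an idea of the asymmetry: the jhash2() parameters below are
the ones mm/ksm.c really uses for its checksum, but the two helpers
are purely illustrative, just to show the O(PAGE_SIZE) vs O(1)
per-page difference between the two scanners.

#include <linux/jhash.h>
#include <linux/mm.h>

/* ksmd: must read all of the page contents per page scanned */
static u32 ksm_scan_cost(const u32 *page_contents)
{
        return jhash2(page_contents, PAGE_SIZE / 4, 17);
}

/* khugepaged: only reads the pte, never the page contents */
static bool khugepaged_scan_cost(pte_t pte)
{
        return pte_present(pte);
}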

> If it really is the scanning, I bet we could be a lot more efficient
> with khugepaged as well. In the case of KVM guests, we're going to have
> awfully fixed virtual addresses and processes where collapsing can take
> place.
>
> It might make sense to just have split_huge_page() stick the vaddr and
> the mm in a queue. khugepaged could scan those addresses first instead
> of just going after the system as a whole.

That would apply only to KSM and swapping though, not to all
split_huge_page callers. It may not be a bad idea, but the scanning
really is fast, so it may not be necessary. Clearly the more memory
you have, the faster the scanning has to be to keep the same
percentage of memory in hugepages in the presence of KSM.
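
If somebody wanted to prototype your queue idea anyway, it could look
something like the below (all names are hypothetical, and the mm
lifetime and locking questions are glossed over; a real implementation
would need to hold a reference on the mm or drop stale hints when the
mm goes away):

#include <linux/huge_mm.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct collapse_hint {
        struct list_head list;
        struct mm_struct *mm;
        unsigned long address;  /* start of the split hugepage */
};

static LIST_HEAD(collapse_hints);
static DEFINE_SPINLOCK(collapse_hints_lock);

/* called from split_huge_page: remember where the split happened */
static void remember_split(struct mm_struct *mm, unsigned long address)
{
        struct collapse_hint *hint = kmalloc(sizeof(*hint), GFP_ATOMIC);

        if (!hint)
                return; /* best effort, the regular scan will catch it */
        hint->mm = mm;
        hint->address = address & HPAGE_PMD_MASK;
        spin_lock(&collapse_hints_lock);
        list_add_tail(&hint->list, &collapse_hints);
        spin_unlock(&collapse_hints_lock);
}

khugepaged would then drain the hint list at the start of each scan
pass before falling back to the full mm walk as usual.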

Also note: did you tune the ksmd scanning values, or did you only run
"echo 1 > run"? Clearly, if you increased the ksmd scan rate by
decreasing sleep_millisecs or increasing pages_to_scan, you have to
increase the khugepaged scanning values accordingly too. I'm not
saying the current default is ok for such a huge system as the one
you're using, but I doubt the KSM default is ok for such a huge system
either. So if you tweak ksmd up towards 100% cpu load (which will also
cause more false sharing, as the interval between the checksum
comparisons before a page is added to the unstable tree decreases
significantly) and khugepaged doesn't collapse the false-shared
regions back, that's expected. (In that case either slowing down KSM
or speeding up khugepaged would help; slowing down KSM may actually
lead to better performance without giving up much of the memory
savings.)
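
For reference, keeping the two scan rates in proportion boils down to
something like the below (the sysfs paths are the real knobs; the
factor of 4 and the assumption of the current defaults, 100 pages per
wakeup for ksmd and 4096 for khugepaged, are just for illustration; a
couple of echos from a shell do the same job):

#include <stdio.h>
#include <stdlib.h>

static void write_knob(const char *path, long val)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                exit(1);
        }
        fprintf(f, "%ld\n", val);
        fclose(f);
}

int main(void)
{
        /* scan 4x more pages per ksmd wakeup (default 100) */
        write_knob("/sys/kernel/mm/ksm/pages_to_scan", 400);
        /* scale khugepaged up by the same factor (default 4096) */
        write_knob("/sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan",
                   16384);
        return 0;
}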

In fact the "keep track" of split_huge_page location for khugepaged,
may actually hide issues in KSM if there's a piece of memory flipped
twice fast but that stays at the same value all the time, then the
cksum heuristic that decides if the page is constant enough to be
added to the unstable tree, may get false positives from the cksum. If
we notice in KSM the page cows fast after after sharing it for a
couple of times we should stop merging it. It'd be better to improve
KSM intelligence to avoid false sharing in that case. Now I don't have
enough data (I don't even know what runs in guest) clearly to tell if
this could ever be because of 1) undetectable false sharing from KSM
through the cksum, 2) a too fast ksm scan invalidating the ckshm, 3)
or genuine khugepaged scan too slow not keeping up with KSM optimal
changes (which would be perfectly normal if ksmd scan rate has been
increased a lot but khugepaged wasn't accordingly).
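
For reference, the checksum heuristic in question boils down to
something like this (loosely following mm/ksm.c, where calc_checksum
and rmap_item->oldchecksum are the real names): a page only graduates
to the unstable tree if its checksum is unchanged since the previous
scan, so a page that flips and comes back to the same contents between
two scans passes the check anyway, which is case 1) above.

/*
 * loosely after mm/ksm.c: skip pages whose contents changed since
 * the last scan, they're unlikely to stay mergeable
 */
static int constant_enough_for_unstable_tree(struct rmap_item *rmap_item,
                                             struct page *page)
{
        u32 checksum = calc_checksum(page);

        if (rmap_item->oldchecksum != checksum) {
                rmap_item->oldchecksum = checksum;
                return 0;       /* changed, retry on the next scan */
        }
        return 1;               /* looked constant, candidate to merge */
}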

In short, I don't see issues with this workload at the moment if
increasing the khugepaged scan rate (or slowing down KSM) optimizes it.

> For cases where the page got split, but wasn't modified, should we have
> a non-copying, non-allocating fastpath to re-merge it?

Even if it's been modified it's ok: as long as the pages are still
physically contiguous, the collapse can happen in place. I never
dreamed of attempting it, purely because of the increased complexity;
__split_huge_page_refcount is complex enough already, so since I could
avoid converting from regular pages to a hugepage in place, I happily
avoided it ;). That BUG_ON(page_mapcount != mapcount) gives me
nightmares at night, so I was pleased not to add more of those. Anyway
I think it's not an important optimization; we should spend more
energy on making sure split_huge_page is never called for nothing.
However, in the mid term I'm not against it; it can always happen that
a hugepage is split by memory pressure but then the subpages aren't
actually swapped out.
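
For the record, the feasibility test for such an in-place collapse
would conceptually be something like the below (a hypothetical helper,
with locking and the pmd/pte walk setup elided): the 2M virtual range
must still map the original physically contiguous run of subpages,
otherwise a copying collapse is needed as today.

/* hypothetical: can this 2M range be re-collapsed without copying? */
static bool range_still_contiguous(pte_t *pte, unsigned long pfn_base)
{
        int i;

        for (i = 0; i < HPAGE_PMD_NR; i++, pte++) {
                if (!pte_present(*pte))
                        return false;   /* a subpage was swapped out */
                if (pte_pfn(*pte) != pfn_base + i)
                        return false;   /* a subpage was COWed/migrated */
        }
        return true;    /* ok to rebuild the pmd without a copy */
}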

There's also one other optimization that I think has higher priority
(it's not going to benefit KVM though): collapsing readonly shared
anon pages, which khugepaged currently can't do.