On Wed, 29 May 2019, Yang Shi wrote:

> > Right, we've also encountered this. I talked to Kirill about it a week or
> > so ago where the suggestion was to split all compound pages on the
> > deferred split queues under the presence of even memory pressure.
> >
> > That breaks cgroup isolation and perhaps unfairly penalizes workloads that
> > are running attached to other memcg hierarchies that are not under
> > pressure because their compound pages are now split as a side effect.
> > There is a benefit to keeping these compound pages around while not under
> > memory pressure if all pages are subsequently mapped again.
>
> Yes, I do agree. I tried other approaches too; it sounds like making the
> deferred split queue per memcg is the optimal one.
>
> > The approach we went with was to track the actual counts of compound
> > pages on the deferred split queue for each pgdat for each memcg and then
> > invoke the shrinker for memcg reclaim and iterate those not charged to the
> > hierarchy under reclaim. That's suboptimal and was a stop gap measure
> > under time pressure: it's refreshing to see the optimal method being
> > pursued, thanks!
>
> Exactly the same in my case :) We were likely looking at the exact same
> issue at the same time.
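To make the per-memcg variant concrete, this is roughly the shape I picture:
a SHRINKER_MEMCG_AWARE shrinker walking a queue hung off the memcg. The
sketch below is illustrative only; the deferred_split_queue field on the
memcg and the split_from_queue() helper are made up for this example, not
the real code.

static unsigned long deferred_split_count(struct shrinker *shrink,
                                          struct shrink_control *sc)
{
        /*
         * sc->memcg is set for SHRINKER_MEMCG_AWARE shrinkers during memcg
         * reclaim; report how many THPs that memcg has queued for deferred
         * split (the per-memcg queue and counter are hypothetical here).
         */
        return READ_ONCE(sc->memcg->deferred_split_queue.split_queue_len);
}

static unsigned long deferred_split_scan(struct shrinker *shrink,
                                         struct shrink_control *sc)
{
        /* Only split THPs charged to the memcg under reclaim. */
        return split_from_queue(&sc->memcg->deferred_split_queue,
                                sc->nr_to_scan);
}

static struct shrinker deferred_split_shrinker = {
        .count_objects  = deferred_split_count,
        .scan_objects   = deferred_split_scan,
        .seeks          = DEFAULT_SEEKS,
        .flags          = SHRINKER_MEMCG_AWARE,
};

With the flag set and the shrinker registered, memcg reclaim only calls into
the queue of the hierarchy that is actually under pressure, which is what
keeps the isolation intact.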
> > I'm curious if your internal applications team is also asking for
> > statistics on how much memory can be freed if the deferred split queues
> > can be shrunk? We have applications that monitor their own memory usage
> > through memcg stats or usage and proactively try to reduce that usage when
> > it is growing too large. The deferred split queues have significantly
> > increased both memcg usage and rss when they've upgraded kernels.
>
> No, but this reminds me. The THPs on the deferred split queue should be
> accounted into available memory too.

Right, and we have also seen this for users of MADV_FREE that have both an
increased rss and memcg usage and don't realize that the memory is freed
under pressure. I'm thinking that we need some kind of MemAvailable for
memcg hierarchies to be the authoritative source of what can be reclaimed
under pressure.
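To sketch what such a number could add up, purely as an illustration (every
helper below is hypothetical and stands in for a counter the memcg would
have to expose):

/* Illustrative only: a per-memcg analogue of MemAvailable. */
static unsigned long memcg_mem_available(struct mem_cgroup *memcg)
{
        unsigned long pages = 0;

        pages += memcg_file_lru_pages(memcg);       /* droppable page cache */
        pages += memcg_reclaimable_slab(memcg);     /* shrinkable slab */
        pages += memcg_lazyfree_pages(memcg);       /* MADV_FREE memory */
        pages += memcg_deferred_split_pages(memcg); /* freeable by splitting queued THPs */

        return pages;
}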
> > How are your applications monitoring how much memory from deferred split
> > queues can be freed on memory pressure? Any thoughts on providing it as a
> > memcg stat?
>
> I don't think they have such a monitor. I saw rss_huge was abnormal in the
> memcg stat even after the application was killed by oom, so I realized the
> deferred split queue may play a role here.
>
> The memcg stat doesn't have counters for available memory like the global
> vmstat does. It may be better to have such statistics, or to extend
> reclaimable "slab" to shrinkable/reclaimable "memory".

Have you considered following how NR_ANON_MAPPED is tracked for each pgdat
and using that as an indicator of when to modify a memcg stat to track the
amount of memory on a compound page? I think this would be necessary for
userspace to know what their true memory usage is.
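Something along the lines of the sketch below is what I have in mind: adjust
a memcg counter at the same points where the pgdat-level NR_ANON_MAPPED is
adjusted. The MEMCG_THP_MAPPED index and the helper are made up for
illustration, not an existing interface.

/*
 * Illustrative only: mirror the per-pgdat NR_ANON_MAPPED accounting with a
 * per-memcg counter when a compound page is mapped or unmapped, so that
 * memory.stat can show how much of the usage sits in compound pages.
 */
static void account_anon_mapped(struct page *page, int nr)
{
        /* pgdat-level accounting, as the rmap code does today */
        __mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, nr);

        /* hypothetical memcg-level counterpart */
        if (PageTransHuge(page))
                mod_memcg_page_state(page, MEMCG_THP_MAPPED, nr);
}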